DSA-U1-Notes 23-24
UNIT-1
Algorithms: Introduction, Algorithm Specifications, Recursive Algorithms,
Performance Analysis of an algorithm- Time and Space Complexity, Asymptotic
Notations.
Arrays: Arrays-ADT, Polynomials, Sparse matrices, Strings-ADT, Pattern Matching.
Objective:
To develop proficiency in the specification, representation, and implementation of abstract
data types and data structures.
Outcome: Implement data structure concepts using abstract data types, and
evaluate an algorithm using algorithmic performance measures.
Algorithm
An algorithm is a step-by-step procedure, which defines a set of instructions to be executed in a certain
order to get the desired output.
Algorithms are generally created independent of underlying languages, i.e. an algorithm can be
implemented in more than one programming language.
Pseudo code: An algorithm expressed in a syntax close to that of a high-level language, but not in an
executable form, so it is nearer to the actual code. The pseudo code form of a solution usually
contains the following two parts.
Heading Part
Body of the algorithm part
1. Heading Part:
Syntax: algorithm Name (Parameters list)
2. Body Part: The body of the algorithm contains the steps or statements that, when executed, perform
a specific task. These statements may be of the following types.
// Comments
{
Declarations: of data items
Assignments : <Variable>=<Value>
If condition Statements
Looping statements
Input and Output statements
Function call statements
}
Data Structures and Algorithms Unit-I
From the data structures point of view, the following are some important categories of algorithms
used in many computing applications.
Sorting
Searching
Compression
Encoding
Fast Fourier transforms
Geometric
Pattern matching
Parsing
Sub Categories of Algorithms: Based on the types of statements (steps) used, an algorithm can be
classified as one of the following
Sequence
Selection and
Iteration algorithms
Sequence: The steps described in the algorithm are performed one after another without skipping
any step. The sequence of steps should be simple and easy to understand. In this type, every
instruction of the algorithm is executed, because a sequence algorithm contains no selection or
conditional branching statements.
Example:
//Adding two numbers
Step1: start
Step 2: read a, b
Step3: Sum= a+b
Step4: write Sum
Step5: stop
Selection: The sequential algorithms are not sufficient to solve complex problems. Some solutions
need decision making or option selection. So, in solving such problems we write selection type of
algorithms. The general format of a Selection statement is given below:
if(condition)
Statement-1;
else
Statement-2;
The above syntax specifies that if the condition is true, Statement-1 is executed; otherwise,
Statement-2 is executed.
In algorithms involving selection, not all statements are executed every time.
Depending on the truth value of the condition, either Statement-1 or Statement-2 is executed.
Following two examples involve selections
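Selection can be sketched in C as below. This is a minimal illustration only; the function names larger and is_even are hypothetical and do not come from the notes.

```c
/* Selection: exactly one of the two branches runs on each call. */
int larger(int a, int b)
{
    if (a > b)
        return a;   /* Statement-1: runs when the condition is true */
    else
        return b;   /* Statement-2: runs when the condition is false */
}

/* Returns 1 when n is even, 0 otherwise. */
int is_even(int n)
{
    if (n % 2 == 0)
        return 1;
    else
        return 0;
}
```

Calling larger(3, 7) takes the else branch, while larger(9, 2) takes the if branch.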
Iteration: These types of algorithms are used in solving problems which involve repetition of certain
operations. So, some statements are repeatedly executed in solving the given task.
Example 1:
//Sum of the digits of a number
Step1: start
Step2: read n; set s = 0
Step3: repeat step4 while n > 0
Step4: (a) r = n mod 10
       (b) s = s + r
       (c) n = n / 10
Step5: write s
Step6: stop
In this algorithm, step 3 and step 4 are executed repeatedly.
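The iteration example can be sketched in C as follows. The function name digit_sum is an assumption made for illustration, and the sketch assumes n is non-negative.

```c
/* Sum of the decimal digits of a non-negative integer n. */
int digit_sum(int n)
{
    int s = 0;              /* the running sum must start at 0 */
    while (n > 0) {         /* repeat while digits remain */
        int r = n % 10;     /* (a) extract the last digit */
        s = s + r;          /* (b) add it to the sum */
        n = n / 10;         /* (c) drop the last digit */
    }
    return s;
}
```

For n = 1234 the loop body runs four times and the result is 1 + 2 + 3 + 4 = 10.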
Recursive Algorithms
A function is a self-contained block of statements that performs a specific task when invoked (called)
by another function. When a function is called, it executes its code and then returns control back to its
caller.
This perspective normally ignores the fact that functions can call themselves (direct recursion) or they
may call other functions that invoke the calling function again (indirect recursion).
For example, functions that calculate factorials and Fibonacci numbers fit into this category, as do
binomial coefficients, where C(n, m) = n! / (m! (n-m)!) can be computed recursively as
C(n, m) = C(n-1, m) + C(n-1, m-1).
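A minimal sketch of a directly recursive binomial coefficient, based on Pascal's rule. The function name binomial is an assumption, and the sketch assumes 0 <= m <= n with values small enough to fit in a long.

```c
/* C(n, m) by Pascal's rule: C(n, m) = C(n-1, m) + C(n-1, m-1).
   Assumes 0 <= m <= n. */
long binomial(int n, int m)
{
    if (m == 0 || m == n)       /* boundary cases stop the recursion */
        return 1;
    return binomial(n - 1, m) + binomial(n - 1, m - 1);
}
```

This version recomputes the same subproblems many times, so it is meant to show the shape of the recursion rather than an efficient method.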
Example 1: A recursive solution to the well-known Towers of Hanoi problem is much easier to write
than its iterative version. The problem statement and its solution for n >= 1 disks are given in the
following lines.
The number of disks is n. To get the largest disk from tower A to the bottom of tower B, we move the
remaining n-1 disks to tower C and then move the largest to tower B. Now we are left with the task of
moving the disks from tower C to tower B. To do this, we have towers A and B available. The fact that
tower B has a disk on it can be ignored as the disk is larger than the disks being moved from tower C
and so any disk can be placed on top of it.
A recursive solution for the n disk ToH problem is TowersOfHanoi(n,A,B,C). We can observe that
our solution for an n-disk problem is formulated in terms of solutions to two (n-1) disk sub problems.
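The two (n-1)-disk subproblems can be sketched in C as below. The function name towers_of_hanoi, its parameter order, and the returned move count are assumptions made for illustration.

```c
#include <stdio.h>

/* Move n disks from tower `from` to tower `to`, using `via` as the spare.
   Returns the number of single-disk moves performed. */
long towers_of_hanoi(int n, char from, char to, char via)
{
    long moves;
    if (n < 1)
        return 0;                                       /* nothing to move */
    moves = towers_of_hanoi(n - 1, from, via, to);      /* clear the top n-1 disks */
    printf("Move disk %d from %c to %c\n", n, from, to);
    moves += 1;                                         /* the largest disk */
    moves += towers_of_hanoi(n - 1, via, to, from);     /* put the n-1 disks back on top */
    return moves;
}
```

A call such as towers_of_hanoi(3, 'A', 'B', 'C') prints seven moves; in general n disks need 2^n - 1 moves.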
A simple algorithm can be obtained by looking at the case of four elements (a,b,c,d). The answer can
be constructed by writing
OR
We can replace the looping statements in the iterative function with equivalent recursive calls.
Creating recursive calls that move us closer to a solution is also simple, since it requires only
passing the new left or right index as a parameter in the next recursive call.
The following algorithm implements the recursive binary search method.
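A minimal sketch of recursive binary search, assuming the array is sorted in ascending order. The names binsearch, searchnum, left, and right are illustrative choices, not necessarily those of the original program.

```c
/* Search list[left..right] (sorted ascending) for searchnum.
   Returns the index of searchnum, or -1 if it is not present. */
int binsearch(const int list[], int searchnum, int left, int right)
{
    int middle;
    if (left > right)
        return -1;                                  /* empty range: not found */
    middle = left + (right - left) / 2;             /* avoids overflow of left + right */
    if (list[middle] == searchnum)
        return middle;
    if (list[middle] < searchnum)
        return binsearch(list, searchnum, middle + 1, right);   /* new left index */
    return binsearch(list, searchnum, left, middle - 1);        /* new right index */
}
```

Each call halves the range, so only the new left or right index changes between calls.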
There are many criteria (parameters) upon which we can judge a program.
Although many parameters can be used in the performance evaluation of algorithms, the following
two metrics are the most widely used.
1) Space Complexity 2) Time Complexity
1) Space Complexity
Definition: The space complexity of a program is the amount of memory that it needs to run to
completion. An algorithm is said to be efficient if it occupies less space and requires the minimum
amount of time to complete its execution.
The space needed by a program is the sum of the following components:
1. Fixed space requirements(c): This component refers to space requirements that do not depend on the
number and size of the program’s inputs and outputs.
The fixed requirements component includes
The instruction space (space needed to store the code),
Space for simple variables,
Fixed-size structured variables (such as structs), and
Constants.
2. Variable space requirements (Sp(I)): This component consists of the space needed by structured
variables whose size depends on the particular instance of the problem being solved.
It also includes the additional space required when a function uses recursion.
The variable space requirement of a program P is denoted Sp(I) and is usually defined as a function of
the essential instance characteristics (IC) of the problem instance I.
Commonly used characteristics include the number, size, and values of the inputs and outputs
associated with I. For example, if our input is an array containing n numbers then n is an instance
characteristic. If n is the only instance characteristic, we wish to use when computing Sp(I), we will
use Sp(n) to represent Sp(I).
We can express the total space requirement of an algorithm (or program) P, denoted S(P),
as:
S(P) = c + Sp(I)
where c is a constant representing the fixed space requirements.
When analyzing the space complexity of a program, we are usually concerned only with the variable
space requirements. The following examples work through programs that include both fixed and
variable space components.
Example 1:
Example 2: We want to add a list of numbers. Although the output is a simple value, the input includes
an array. Therefore, the variable space requirement depends on how the array is passed into the function
and also the size, n, of the array.
C passes all parameters by value. When an array is passed as an argument to a function, C passes
the address of the first element of the array; it does not copy the entire array.
Therefore Ssum(n) = 0.
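A minimal sketch of an iterative summing function of the kind this example describes. The names sum, list, and tsum are assumptions; the original program is not reproduced in these notes.

```c
/* Iteratively add the n numbers in list.
   Only the address of list is passed, so no copy of the array is made
   and the variable space requirement does not grow with n. */
float sum(const float list[], int n)
{
    float tsum = 0;
    int i;
    for (i = 0; i < n; i++)
        tsum += list[i];
    return tsum;
}
```

The locals tsum and i and the parameters are fixed-size, so they belong to the fixed part c of S(P).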
Example 3:
The following program also adds a list of numbers, but this time the summation is handled recursively.
This means that the compiler must save the parameters, the local variables, and the return address for
each recursive call.
If each call needs 6 bytes and MAX_SIZE = 1000, the variable space needed by the recursive version is 6*1000 = 6,000 bytes.
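A minimal sketch of the recursive version, written to match the expression rsum(list,n-1)+list[n-1] quoted later in these notes. The exact signature is an assumption.

```c
/* Recursively add the first n numbers in list.
   Each call with n > 0 makes one further call, so n + 1 calls are active
   at the deepest point, and each one keeps its own parameters and
   return address on the stack. */
float rsum(const float list[], int n)
{
    if (n)
        return rsum(list, n - 1) + list[n - 1];
    return 0;
}
```

The depth of recursion, and therefore the variable space, grows linearly with n, unlike the iterative version.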
The limitation of the operation counts method discussed above is that we must know how the compiler
translates the program into object code, and must also know the assembly language of the underlying
machine. In addition, assembly programs are machine dependent and not portable.
Alternatively, we can count program steps.
Program step: any syntactically and semantically meaningful segment of a program whose execution
time is independent of the instance characteristics of the problem.
In this method, we first define a program step, and then determine the total number of steps the
program requires to run to completion.
To estimate the step counts,
1. We introduce a global variable count, which is initialized to 0.
2. We increment count each time an executable statement (step) is executed. (Note that the
step count of non-executable statements, such as declarations and function headers, is
counted as 0.)
3. We obtain the total value in count as a function of the instance characteristics, i.e., f(n).
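The three steps above can be sketched on an iterative summing function. This is an illustrative instrumentation under the assumption that each assignment, each loop test, and the return count as one step; the name sum_counted is hypothetical.

```c
int count = 0;   /* global step counter, initialized to 0 */

/* Iterative sum with a count increment after every executable step. */
float sum_counted(const float list[], int n)
{
    float tsum = 0;
    int i;
    count++;                    /* step: tsum = 0 */
    for (i = 0; i < n; i++) {
        count++;                /* step: a loop test that succeeds */
        tsum += list[i];
        count++;                /* step: the assignment in the loop body */
    }
    count++;                    /* step: the final loop test, which fails */
    count++;                    /* step: the return statement */
    return tsum;
}
```

Under these assumptions the loop contributes 2n steps and the other statements contribute 3, so count ends at f(n) = 2n + 3.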
The step count table of the iterative function that sums an array of values is constructed as follows:
1. Enter the steps per execution (s/e) for each statement.
2. Fill in the frequency column.
3. Obtain the total steps per statement (s/e * frequency).
4. Add the total steps of all statements in the algorithm.
Note: The for loop at line 5 complicates matters slightly. Since the loop starts at 0 and terminates
when i is equal to n, its frequency is n + 1. (A typical for(i = 0; i < n; i++) statement is executed
n + 1 times: for the first n times the condition is satisfied and the loop body is executed, and the
(n+1)th time the condition fails and the loop terminates.) The body of the loop (line 6) executes only
n times, since it is not executed when i = n.
We then obtain the total steps for each statement and the final step count as a function of n.
Now, let’s apply the step counts method on the recursive array sum algorithm and compute the total
steps required.
Example 2: [Recursive summing of a list of numbers]
We want to obtain the step count for the recursive version of the array summing function. To determine
the step count, we first need to figure out the step count for the boundary condition n = 0.
When n = 0, only the if conditional and the second return statement are executed, so the total step
count for n = 0 is 2. For n > 0, the if conditional and the first return statement are executed, so
each recursive call with n > 0 adds 2 to the count. Since there are n such function calls, followed
by one with n = 0, the step count for the function
is 2n + 2.
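The same count can be checked by instrumenting a recursive summing function. This is a sketch only; the names rcount and rsum_counted are hypothetical, and the if conditional and each return are assumed to count as one step each.

```c
int rcount = 0;   /* global step counter for the recursive version */

/* Recursive sum with the step counter incremented for each step. */
float rsum_counted(const float list[], int n)
{
    rcount++;                   /* step: the if conditional */
    if (n) {
        rcount++;               /* step: the first return (recursive case) */
        return rsum_counted(list, n - 1) + list[n - 1];
    }
    rcount++;                   /* step: the second return (n == 0) */
    return 0;
}
```

Every call adds 2 to rcount and there are n + 1 calls, which gives 2(n + 1) = 2n + 2.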
Advantage: This method is simple and gives a machine-independent estimate of performance,
which can be used in further analysis.
Limitations: The concept of a program step is abstract and may not give exact values, because a
simple assignment like tsum=0 and a complex statement like rsum(list,n-1)+list[n-1]
are both counted as a single step.
Complexity of Algorithms:
The complexity of an algorithm M is the function f(n) which gives the running time and/or storage
space requirement of the algorithm in terms of the size "n" of the input data. Mostly, the storage space
required by an algorithm is simply a multiple of the data size "n", so complexity shall refer to the
running time of the algorithm. The function f(n), which gives the running time of an algorithm,
depends not only on the size "n" of the input data but also on the particular data.
The complexity estimate, f(n) is obtained for different cases of the solution. Specifically, f(n) may be
for the
Best Case: In this case, f(n) denotes the minimum possible number of steps or operations.
Average Case: In this case, f(n) denotes the average number of steps required.
Worst Case: In this case, f(n) is the maximum number of steps required.
The field of computer science that studies efficiency of algorithms is known as analysis of algorithms.
Algorithms can be evaluated by a variety of criteria. More often we shall be interested in the rate of
growth of the time or space needed to solve larger and larger instances of a problem.
We will associate with the problem an integer, called the size of the problem, which is a measure of the
quantity of input data.
ASYMPTOTIC NOTATIONS
Asymptotic analysis is a method of analyzing algorithms. In asymptotic analysis, we evaluate the performance of
an algorithm in terms of instance characteristics and input size (we don't measure the actual
running time). We calculate how the time (or space) taken by an algorithm increases with the input
size.
Asymptotic notation is a way to describe the running time or space complexity of an algorithm based
on the input size. It is commonly used in complexity analysis to describe how an algorithm performs
as the size of the input grows.
Asymptotic analysis is input bound, i.e., if there is no input to the algorithm, it is concluded to work in
constant time. Other than the input, all other factors are considered constant.
For example, the running time of one operation may be computed as f(n) = n and that of another
operation as g(n) = n^2. This means the running time of the first operation will increase linearly with
n, while the running time of the second operation will increase quadratically as n increases.
Similarly, the running times of both operations will be nearly the same if n is small.
The three most commonly used asymptotic notations for expressing the running time complexity of
an algorithm are Big 'oh' (O), Big omega (Ω), and Theta (θ).
Big 'oh' (O) Notation
Definition [Big "oh"]: f(n) = O(g(n)) (read as "f of n is big oh of g of n") if and only if there exist
positive constants c and n0 such that f(n) ≤ c*g(n) for all n > n0.
The time required by an algorithm can be estimated for the following three cases:
Best Case: Minimum time required for program execution.
Average Case: Average time required for program execution.
Worst Case: Maximum time required for program execution.
The notation Ο(n) is the formal way to express an upper bound on an algorithm's running time. It
is most often used to state the worst-case time complexity, i.e., the longest amount of time an
algorithm can possibly take to complete.
For example, for a function f(n), Ο(f(n)) = { g(n) : there exist c > 0 and n0 such that g(n) ≤ c*f(n) for
all n > n0 }
The notation Ω(n) is the formal way to express a lower bound on an algorithm's running time. It
is most often used to state the best-case time complexity, i.e., the least amount of time an
algorithm can possibly take to complete.
For example, for a function f(n), Ω(f(n)) = { g(n) : there exist c > 0 and n0 such that g(n) ≥ c*f(n)
for all n > n0 }
From the above graph, it can be observed that θ(f(n)) = { g(n) : g(n) = Ο(f(n)) and g(n) = Ω(f(n)) },
that is, g(n) is bounded both above and below by constant multiples of f(n) for all n > n0.
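A worked example may make the three definitions concrete. The function f(n) = 3n + 2 and the constants chosen below are for illustration only and do not come from the notes.

```latex
\begin{align*}
3n + 2 &\le 4n && \text{for all } n \ge 2, \text{ so } 3n + 2 = O(n) \text{ with } c = 4,\ n_0 = 2 \\
3n + 2 &\ge 3n && \text{for all } n \ge 1, \text{ so } 3n + 2 = \Omega(n) \text{ with } c = 3,\ n_0 = 1 \\
3n \;\le\; 3n + 2 &\le 4n && \text{for all } n \ge 2, \text{ so } 3n + 2 = \Theta(n)
\end{align*}
```

The first line holds because 2 ≤ n whenever n ≥ 2; the third line simply combines the first two.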
DATA ABSTRACTION
Definition: A data type is a collection of objects and a set of operations that act on those objects.
The C language provides two mechanisms for grouping data together: the array and the
structure.
Arrays are collections of elements of the same data type. They are declared implicitly; for
example, int list[5] defines a five-element array of integers whose legitimate subscripts are in the
range 0 ... 4.
structs are collections of elements whose data types need not be the same. They are explicitly
defined. For example,
struct student {
    char last_name;
    int student_id;
    char grade;
};
defines a structure with three fields, two of type character and one of type integer. The structure name
is student.
C also provides the pointer data type. For every basic data type there is a corresponding pointer
data type, such as pointer-to-an-int, pointer-to-a-real, pointer-to-a-char, and pointer-to-a-float. A
pointer is denoted by placing an asterisk, *, before a variable's name.
So, int i, *pi; declares i as an integer and pi as a pointer to an integer.
All programming languages provide at least a minimal set of predefined data types, plus the ability to
construct new, or user-defined types.
In general, abstraction is a creative process of focusing attention on the main problems by ignoring
lower-level (implementation) details.
In programming, we encounter two particular kinds of abstraction: procedural abstraction and data
abstraction.
A procedural abstraction is a mental model of what we want a subprogram to do (but not how to do
it).
Example: if you wanted to compute the length of the hypotenuse of a right triangle, you might
write something like:
double hypotenuse = sqrt(side1*side1 + side2*side2);
We can write this, understanding that the sqrt function is supposed to compute a square root, even if
we have no idea of how that square root actually gets computed.
Data Abstraction
Data abstraction works much the same way. A data abstraction is a mental model of what can be
done to a collection of data. It deliberately excludes details of how to do it.
Elapsed time refers to a period or extent of time, as opposed to an instant in time that you might read
in a single glance at a clock.
Elapsed time is generally measured in a mixture of hours, minutes, and seconds.
Definition: An abstract data type (ADT) is a data type that is organized in such a way that the
specification of the objects and the specification of the operations on the objects is separated from the
representation of the objects and the implementation of the operations.
An abstract data type is implementation-independent. Furthermore, it is possible to classify the
functions of an ADT into several categories:
1. Creator/constructor: These functions create a new instance of the designated type.
2. Transformers: These functions also create an instance of the designated type, generally by
using one or more other instances. The difference between constructors and transformers will
become clearer with some examples.
3. Observers/reporters: These functions provide information about an instance of the type, but
they do not change the instance.
Typically, an ADT definition will include at least one function from each of these three categories.
Array ADT:
The array is an abstract data type (ADT) that holds a collection of elements accessible by an
index.
The Create(j, list) function produces a new, empty array of the appropriate size. All of the items
are initially undefined.
Retrieve(A,i): accepts an array and an index. It returns the value associated with the index if the
index is valid, or an error if the index is invalid.
Store(A,i,x): accepts an array, an index, and an item, and returns the original array augmented
with the new pair.
The advantage of this ADT definition is that it clearly points out the fact that the array is a more
general structure than "a consecutive set of memory locations."
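The three ADT functions can be sketched in C as below. This is one possible realization under stated assumptions: the type name Array and the function names array_create, array_retrieve, array_store, and array_free are hypothetical, errors are reported through a 0/1 return value, and items are zero-filled on creation rather than left undefined.

```c
#include <stdlib.h>

typedef struct {
    int  size;      /* number of valid indices: 0 .. size-1 */
    int *items;     /* storage for the values */
} Array;

/* Create: set up an array with `size` slots. Returns 1 on success, 0 on failure. */
int array_create(Array *a, int size)
{
    if (size <= 0)
        return 0;
    a->items = calloc((size_t)size, sizeof *a->items);   /* zero-filled */
    if (a->items == NULL)
        return 0;
    a->size = size;
    return 1;
}

/* Retrieve: copy the value at index i into *out. Returns 0 if i is invalid. */
int array_retrieve(const Array *a, int i, int *out)
{
    if (i < 0 || i >= a->size)
        return 0;
    *out = a->items[i];
    return 1;
}

/* Store: associate value x with index i. Returns 0 if i is invalid. */
int array_store(Array *a, int i, int x)
{
    if (i < 0 || i >= a->size)
        return 0;
    a->items[i] = x;
    return 1;
}

/* Release the storage owned by the array. */
void array_free(Array *a)
{
    free(a->items);
    a->items = NULL;
    a->size = 0;
}
```

Callers see only indices and values; whether the items sit in consecutive memory is a detail hidden behind these functions.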
Consider the implementation of one-dimensional arrays. When the compiler encounters an array
declaration such as the one used above to create list, it allocates five consecutive memory locations.
Each memory location is large enough to hold a single integer. The address of the first element list[0],
is called the base address. If the size of an integer on your machine is denoted by sizeof(int), then
the address of element list[i] is the base address + i*sizeof(int), for each of the five elements of list[].
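The address arithmetic can be observed with a short sketch. The function name print_addresses is hypothetical; the addresses printed depend on the machine and on sizeof(int).

```c
#include <stdio.h>

/* Print the address and contents of each of the n elements starting at ptr.
   Consecutive addresses differ by sizeof(int) bytes. */
void print_addresses(const int *ptr, int n)
{
    int i;
    printf("Address\t\tContents\n");
    for (i = 0; i < n; i++)
        printf("%p\t%d\n", (const void *)(ptr + i), *(ptr + i));
}
```

Calling print_addresses(list, 5) on int list[5] prints five rows, each address sizeof(int) bytes after the previous one.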
The elements stored in an array can be of any type, including primitive types such as integers and
characters.
An element is stored at each index and can be retrieved later by specifying the same index.
The array ADT is usually implemented by an array (data structure).
ADT Operations:
• Insertion: adds an element at the given index.
• Deletion: deletes an element at the given index.
• Traverse: prints all the array elements one by one.
• Search: searches for an element using the given index or by value.
• Update: updates an element at the given index.
// C implementation of the Array ADT
#include <stdio.h>

#define SIZE 5

int a[SIZE], i, j, n, idx, value;

/* Insertion: read SIZE integers into the array. */
void insert(void)
{
    printf("Enter %d integers: ", SIZE);
    for (i = 0; i < SIZE; ++i)
        scanf("%d", &a[i]);
}

/* Deletion: remove the first occurrence of n, shift the remaining
   elements one place left, and store 0 in the freed last slot. */
void delete(void)
{
    printf("\nEnter an element to delete\n");
    scanf("%d", &n);
    for (i = 0; i < SIZE; i++) {
        if (a[i] == n) {
            for (j = i; j < SIZE - 1; j++)
                a[j] = a[j + 1];
            a[SIZE - 1] = 0;
            break;
        }
    }
}

/* Traverse: print all the array elements one by one. */
void display(void)
{
    for (i = 0; i < SIZE; i++)
        printf("%d ", a[i]);
    printf("\n");
}

/* Search: report every index at which n occurs. */
void search(void)
{
    printf("\nEnter an element to search\n");
    scanf("%d", &n);
    for (i = 0; i < SIZE; i++) {
        if (a[i] == n)
            printf("\nThe searching element %d found at %d index", n, i);
    }
}

/* Update: overwrite the element at a valid index with a new value. */
void update(void)
{
    printf("Enter the index number and value which you want to update: ");
    scanf("%d%d", &idx, &value);
    if (idx >= 0 && idx < SIZE)
        a[idx] = value;
    else
        printf("Invalid index\n");
}

int main(void)
{
    int ch;
    do {
        printf("\n****MENU****");
        printf("\n1:insert()\n2:delete()\n3:display()\n4:search()\n5:update()\n0:exit\n");
        printf("\nEnter Your Choice:");
        if (scanf("%d", &ch) != 1)
            break;                  /* stop on end of input or invalid input */
        switch (ch) {
        case 1: insert();
                break;
        case 2: delete();
                break;
        case 3: display();
                break;
        case 4: search();
                break;
        case 5: update();
                break;
        case 0: break;              /* leave the menu loop */
        default: printf("choose correct option");
        }
    } while (ch != 0);
    return 0;
}
Figure 2.1 shows the results we obtained when we ran print 1. (Notice that the addresses increase by
two. This is what we would expect on an Intel 386 machine.)
The following algorithm works by comparing terms from the two polynomials until one or both of the
polynomials become empty. The switch statement performs the comparisons and adds the proper term
to the new polynomial, d. If one of the polynomials becomes empty, we copy the remaining terms from
the nonempty polynomial into d. With these insights, suppose we now consider the representation
question more carefully. One way to represent polynomials in C is to use typedef to create the type
polynomial as below:
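One possible typedef of this kind stores every coefficient up to a fixed maximum degree. The sketch below is an assumption rather than the notes' own declaration: the limit MAX_DEGREE, the layout in which coef[i] multiplies x^i, and the helper poly_add are all illustrative.

```c
#define MAX_DEGREE 101      /* maximum degree + 1; an assumed limit */

typedef struct {
    int   degree;
    float coef[MAX_DEGREE]; /* coef[i] multiplies x^i (assumed layout) */
} polynomial;

/* Return a + b by adding the coefficients of equal powers.
   The degree is not reduced if the leading coefficients cancel. */
polynomial poly_add(const polynomial *a, const polynomial *b)
{
    polynomial d = {0};
    int i;
    d.degree = a->degree > b->degree ? a->degree : b->degree;
    for (i = 0; i <= d.degree; i++) {
        float ca = i <= a->degree ? a->coef[i] : 0;
        float cb = i <= b->degree ? b->coef[i] : 0;
        d.coef[i] = ca + cb;
    }
    return d;
}
```

This layout is simple but wastes space when most coefficients are zero, for example in 2x^1000 + 1.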
This representation does not impose any limit on the number of polynomials that we can place in terms.
The only stipulation is that the total number of nonzero terms must be no more than MAX_TERMS. It
is worth pointing out the difference between our specification and our representation. Our specification
used poly to refer to a polynomial, and our representation translated poly into a <start, finish> pair. Therefore, to use
A(x) we must pass in starta and finisha. Any polynomial A that has n nonzero terms has starta and
finisha such that finisha = starta + n - 1.
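A minimal sketch of the shared terms array and of polynomial addition over <start, finish> pairs. The names term, avail, attach, and padd, the value of MAX_TERMS, and the convention that terms are stored in decreasing order of exponent are assumptions made for illustration.

```c
#define MAX_TERMS 100       /* assumed capacity of the shared array */

typedef struct {
    float coef;
    int   expon;
} term;

term terms[MAX_TERMS];      /* all polynomials share this array */
int  avail = 0;             /* index of the next free slot */

/* Append one term to terms[]. Returns 0 when the array is full. */
int attach(float coef, int expon)
{
    if (avail >= MAX_TERMS)
        return 0;
    terms[avail].coef = coef;
    terms[avail].expon = expon;
    avail++;
    return 1;
}

/* D(x) = A(x) + B(x). Terms are assumed to be in decreasing exponent order. */
void padd(int starta, int finisha, int startb, int finishb,
          int *startd, int *finishd)
{
    *startd = avail;
    while (starta <= finisha && startb <= finishb) {
        if (terms[starta].expon < terms[startb].expon) {
            attach(terms[startb].coef, terms[startb].expon);
            startb++;
        } else if (terms[starta].expon > terms[startb].expon) {
            attach(terms[starta].coef, terms[starta].expon);
            starta++;
        } else {
            float c = terms[starta].coef + terms[startb].coef;
            if (c != 0)
                attach(c, terms[starta].expon);   /* drop terms that cancel */
            starta++;
            startb++;
        }
    }
    for (; starta <= finisha; starta++)           /* copy the rest of A(x) */
        attach(terms[starta].coef, terms[starta].expon);
    for (; startb <= finishb; startb++)           /* copy the rest of B(x) */
        attach(terms[startb].coef, terms[startb].expon);
    *finishd = avail - 1;
}
```

With A(x) = 2x^1000 + 1 in slots 0..1 and B(x) = x^4 + 10x^3 + 3x^2 + 1 in slots 2..5, the sum occupies slots 6..10, and finisha = starta + n - 1 holds for each polynomial.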
In Figure 2.3(b), many of the entries are zero. We call this a sparse
matrix. Although it is difficult to determine exactly whether a matrix is sparse or not, intuitively we
can recognize a sparse matrix when we see one. In Figure 2.3(b), only 8 of 36 elements are nonzero,
and that certainly is sparse. Since a sparse matrix stored as a two-dimensional array wastes space, we
must consider alternate forms of representation. The standard two-dimensional array implementation
simply does not work when the matrices are large, since most compilers impose limits on array sizes.
For example, consider the space requirements necessary to store a 1000 x 1000 matrix. If this matrix
contains mostly zero entries we have wasted a tremendous amount of space. Therefore, our
representation of sparse matrices should store only nonzero elements.
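Storing only the nonzero elements can be sketched with an array of <row, column, value> triples. The type name term, the helper to_sparse, the limit MAX_TERMS, and the use of slot 0 for the matrix dimensions are assumptions made for illustration.

```c
#define MAX_TERMS 101       /* maximum number of triples + 1; an assumed limit */

typedef struct {
    int row;
    int col;
    int value;
} term;

/* Convert a rows x cols matrix m (stored row by row) into triples.
   a[0] holds <rows, cols, number of nonzero entries>; a[1..] hold the
   nonzero entries in row-major order. Returns the number of nonzero
   entries, or -1 if they do not fit in a[]. */
int to_sparse(const int *m, int rows, int cols, term a[])
{
    int r, c, k = 0;
    for (r = 0; r < rows; r++) {
        for (c = 0; c < cols; c++) {
            int v = m[r * cols + c];
            if (v != 0) {
                if (k + 1 >= MAX_TERMS)
                    return -1;
                k++;
                a[k].row = r;
                a[k].col = c;
                a[k].value = v;
            }
        }
    }
    a[0].row = rows;
    a[0].col = cols;
    a[0].value = k;
    return k;
}
```

A 1000 x 1000 matrix with 50 nonzero entries would need 51 triples here instead of one million array elements.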
• A matrix can be defined as a two-dimensional array having 'm' rows and 'n' columns, representing
an m*n matrix.
• A matrix in which the number of zero values is greater than the number of non-zero values is called a sparse matrix.
A minimal set of operations includes matrix creation, addition, multiplication, and transpose.
Structure 2.3 contains our specification of the matrix ADT.
String ADT
As an ADT, we define a string to have the form S = S0, ..., Sn-1, where the Si are characters taken from the
character set of the programming language. If n = 0, then S is an empty or null string. There are several
useful operations we could specify for strings.
Some of these operations are similar to those required for other ADTs:
• creating a new empty string,
• reading a string or printing it out,
• appending two strings together (called concatenation), or copying a string.
However, there are other operations that are unique to our new ADT, including:
• comparing strings,
• inserting a substring into a string,
• removing a substring from a string, or finding a pattern in a string.
We have listed the essential operations in Structure 2.4, which contains our specification of the
string ADT. Actually there are many more operations on strings, as we shall see when we look at
part of C's string library in Figure 2.7.
By string literal:
char ch[] = "Hello";
In this case, '\0' is appended at the end of the string by the compiler.
Difference between char array and string literal:
• In a char array, we need to add the null character '\0' at the end of the array ourselves, whereas
it is appended internally by the compiler in the case of a string literal.
• A string literal cannot be reassigned to another set of characters, whereas we can reassign the
characters of the array.
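The two ways of building the same string can be compared in a short sketch. The function name same_hello and the variable names are hypothetical.

```c
#include <string.h>

/* Returns 1 when a char array filled element by element and an array
   initialized from a string literal hold the same string. */
int same_hello(void)
{
    char by_array[6] = {'H', 'e', 'l', 'l', 'o', '\0'};   /* we add '\0' ourselves */
    char by_literal[] = "Hello";                           /* compiler adds '\0' */

    by_array[0] = 'J';      /* characters of an array can be changed ... */
    by_array[0] = 'H';      /* ... and changed back */

    return strcmp(by_array, by_literal) == 0
        && sizeof by_literal == 6;    /* 5 characters plus the '\0' */
}
```

Without the explicit '\0' in by_array, functions such as strcmp would read past the end of the array.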
We can read a string from the keyboard in two ways:
• Using the scanf() function: with %s it reads only a single word, because reading stops at the first
whitespace character. %s is also used to print a string with printf().
• Using the gets() function: it reads a whole line of text, including multi-word strings. Note that
gets() performs no bounds checking and was removed from the C standard in C11; fgets() is the
safe replacement.
• The puts() function is similar to printf(): it prints on the console a string that was previously
read using gets() or scanf(), followed by a newline.
Operations on strings: length, concatenation, copy and compare
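The four operations map onto the standard library functions strlen(), strcat(), strcpy(), and strcmp() from <string.h>. The helper join below is a hypothetical example that combines three of them; its name and its -1 error convention are assumptions.

```c
#include <string.h>

/* Copy a into out and append b to it. `cap` is the capacity of out in bytes.
   Returns the length of the result, or -1 if it would not fit. */
int join(char out[], size_t cap, const char *a, const char *b)
{
    if (strlen(a) + strlen(b) + 1 > cap)    /* length: room for both plus '\0' */
        return -1;
    strcpy(out, a);                         /* copy */
    strcat(out, b);                         /* concatenation */
    return (int)strlen(out);
}
```

The result can then be compared with strcmp(), which returns 0 for equal strings and a negative or positive value when the first string sorts before or after the second.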
PATTERN MATCHING
Pattern matching is the process of checking whether a string (called the pattern) is found in another
string. Unlike pattern recognition, the match usually has to be exact.
A pattern can be a series of digits or a string arranged in an order. The order is really important in
pattern matching.
Assume that we have two strings, string and pat, where pat is a pattern to be searched for in
string.
There are many different ways to perform pattern matching including the following.
1) Using library functions of C
2) Using user defined algorithms
a. Naïve Pattern matching algorithm
b. Brute force algorithm and
c. Knuth-Morris-Pratt(KMP) algorithm
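As a baseline for the user-defined algorithms listed above, a naive (brute force) matcher tries every starting position in turn. This is a minimal sketch; the function name naive_match and its return convention are assumptions.

```c
#include <string.h>

/* Try every starting position i in string and compare pat character by
   character. Returns the index of the first match, or -1 if none exists. */
int naive_match(const char *string, const char *pat)
{
    size_t n = strlen(string), m = strlen(pat), i, j;
    if (m > n)
        return -1;
    for (i = 0; i + m <= n; i++) {
        for (j = 0; j < m && string[i + j] == pat[j]; j++)
            ;                               /* advance while characters agree */
        if (j == m)
            return (int)i;                  /* all m characters matched */
    }
    return -1;
}
```

In the worst case this compares about n*m characters, which is the cost the KMP algorithm avoids.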
Pattern Matching using C library functions
The easiest way to determine if pat is in string is to use the built-in function strstr().
We declare two strings: pat[] to store the pattern, and string[] to store the string in
which we search.
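A minimal sketch of the library approach. strstr() is declared in <string.h> and returns a pointer to the first occurrence of pat in string, or NULL if there is none; the wrapper name find_with_strstr is hypothetical.

```c
#include <string.h>

/* Returns the index of the first occurrence of pat in string, or -1. */
int find_with_strstr(const char *string, const char *pat)
{
    const char *t = strstr(string, pat);
    if (t == NULL)
        return -1;                  /* pat does not occur in string */
    return (int)(t - string);       /* pointer difference gives the index */
}
```

For string "ababbaabaa" and pat "aab", the match starts at index 5.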
This pattern matching rule translates into function pmatch (Program 2.13). The following declarations
are assumed:
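A sketch of KMP matching built around a failure function, in the spirit of pmatch. It is not a reproduction of Program 2.13: the limit MAX_PATTERN_SIZE, the helper name fail, the local failure array, and the return convention are assumptions.

```c
#include <string.h>

#define MAX_PATTERN_SIZE 100    /* assumed upper bound on the pattern length */

/* failure[j] is the largest i < j such that pat[0..i] is a suffix of
   pat[0..j], or -1 if there is no such i. */
static void fail(const char *pat, int failure[])
{
    int n = (int)strlen(pat);
    int i, j;
    failure[0] = -1;
    for (j = 1; j < n; j++) {
        i = failure[j - 1];
        while (i >= 0 && pat[j] != pat[i + 1])
            i = failure[i];
        failure[j] = (pat[j] == pat[i + 1]) ? i + 1 : -1;
    }
}

/* Returns the index of the first occurrence of pat in string, or -1. */
int pmatch(const char *string, const char *pat)
{
    int failure[MAX_PATTERN_SIZE];
    int i = 0, j = 0;
    int lens = (int)strlen(string);
    int lenp = (int)strlen(pat);
    if (lenp == 0)
        return 0;
    if (lenp > MAX_PATTERN_SIZE)
        return -1;
    fail(pat, failure);
    while (i < lens && j < lenp) {
        if (string[i] == pat[j]) {
            i++;
            j++;
        } else if (j == 0) {
            i++;                            /* no partial match to reuse */
        } else {
            j = failure[j - 1] + 1;         /* reuse the matched prefix */
        }
    }
    return (j == lenp) ? i - lenp : -1;
}
```

The index i into string never moves backwards, so matching takes time proportional to the combined lengths of string and pat.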