DSA Unit - I Notes
Unit – I
Algorithms & Arrays
A data structure should be seen as a logical concept that must address two fundamental concerns.
1. First, how the data will be stored, and
2. Second, what operations will be performed on it.
Because a data structure is a scheme for organizing data, its functional definition should
be independent of its implementation. This functional definition is known as an ADT
(Abstract Data Type). The way in which the data is organized
affects the performance of a program for different tasks. Computer programmers decide which data
structures to use based on the nature of the data and the processes that need to be performed on that data.
Some of the more commonly used data structures include lists, arrays, stacks, queues, heaps, trees, and
graphs.
GLWEC, Hyd PC301CS - DATA STRUCTURES AND ALGORITHMS Anitha.V
Simple Data Structure: A simple data structure is built from primitive data
structures. A primitive data structure represents one of the standard data types of a programming
language. Variables, arrays, pointers, structures, unions, etc. are examples of primitive data structures.
Compound Data Structure: A compound data structure is built from one or more
primitive data structures and has a specific functionality. It can be designed by the user. It can be
classified as
Linear data structure
Non-linear data structure
Writing an Algorithm:-
Algorithm algorithm_name(parameters)
{
Statements
}
Eg:
Types of Algorithms:-
There are several types of algorithms available some of which are listed below:
1. Brute Force Algorithm: It is the simplest approach to a problem. A brute force algorithm
is the first approach that comes to mind when we see a problem.
2. Recursive Algorithm: A recursive algorithm is based on recursion. The problem
is broken into smaller sub-problems, and the same function is called again and again on them.
3. Backtracking Algorithm: The backtracking algorithm builds the solution by
searching among all possible solutions. We keep building the solution step by step
according to the given criteria. Whenever a partial solution fails, we trace back to the failure
point and try the next alternative, continuing until we find a solution or all possible solutions
have been explored.
4. Searching Algorithm: Searching algorithms are the ones that are used for searching
elements or groups of elements from a particular data structure. They can be of different types
based on their approach or the data structure in which the element should be found.
5. Sorting Algorithm: Sorting is arranging a group of data in a particular manner according to
the requirement. The algorithms which help in performing this function are called sorting
algorithms. Generally sorting algorithms are used to sort groups of data in an increasing or
decreasing manner.
6. Hashing Algorithm: Hashing algorithms work similarly to searching algorithms, but
they use an index computed from a key. In hashing, a key is assigned to specific data.
7. Divide and Conquer Algorithm: This algorithm breaks a problem into sub-problems,
solves each sub-problem, and merges the solutions together to get the final solution. It
consists of the following three steps:
Divide
Solve
Combine
8. Greedy Algorithm: In this type of algorithm the solution is built part by part. Each
next part is chosen based on its immediate benefit: the option
giving the most benefit at that step is chosen as the next part of the solution.
9. Dynamic Programming Algorithm: This algorithm stores solutions that have already been
found to avoid recomputing the same part of the problem. It divides the
problem into smaller overlapping subproblems and solves each of them once.
10. Randomized Algorithm: A randomized algorithm uses random numbers to make some of its
decisions. The random choices influence the expected outcome, such as the expected running time.
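As a small illustration of two of the categories above, the following sketch computes Fibonacci numbers first by plain recursion and then by dynamic programming. Fibonacci is chosen here only because its sub-problems overlap; it is not an example from these notes, and the names fib_recursive and fib_dp are illustrative. Both functions assume n >= 0.

```c
/* Plain recursion: the same sub-problems are recomputed many times. */
unsigned long long fib_recursive(int n)
{
    if (n < 2)
        return (unsigned long long)n;   /* base cases: fib(0) = 0, fib(1) = 1 */
    return fib_recursive(n - 1) + fib_recursive(n - 2);
}

/* Dynamic programming (bottom-up): each sub-problem is solved once
   and its result is reused for the next one. */
unsigned long long fib_dp(int n)
{
    unsigned long long prev = 0, curr = 1;   /* fib(0) and fib(1) */
    if (n == 0)
        return 0;
    for (int i = 2; i <= n; i++) {
        unsigned long long next = prev + curr;   /* reuse the two stored results */
        prev = curr;
        curr = next;
    }
    return curr;
}
```

The recursive version makes a number of calls that grows exponentially with n, while the dynamic programming version runs a single loop of about n iterations.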
Advantages of Algorithms:
It is easy to understand.
An algorithm is a step-wise representation of a solution to a given problem.
In an algorithm the problem is broken down into smaller pieces or steps; hence, it is easier for
the programmer to convert it into an actual program.
Disadvantages of Algorithms:
Writing an algorithm takes a long time, so it is time-consuming.
Understanding complex logic through algorithms can be very difficult.
Branching and looping statements are difficult to show in algorithms.
Recursive Algorithms:-
A recursive algorithm is an algorithm which calls itself with "smaller (or simpler)" input
values, and which obtains the result for the current input by applying simple operations to the
returned value for the smaller (or simpler) input. More generally if a problem can be solved
utilizing solutions to smaller versions of the same problem, and the smaller versions reduce to
easily solvable cases, then one can use a recursive algorithm to solve that problem. For example,
the elements of a recursively defined set, or the value of a recursively defined function can be
obtained by a recursive algorithm.
If a set or a function is defined recursively, then a recursive algorithm to compute its members or
values mirrors the definition. Initial steps of the recursive algorithm correspond to the basis
clause of the recursive definition and they identify the basic elements. They are then followed by
steps corresponding to the inductive clause, which reduce the computation for an element of one
generation to that of elements of the immediately preceding generation.
In general, recursive computer programs require more memory and computation than
iterative algorithms, but they are simpler, and in many cases recursion is a natural way of
thinking about the problem.
Example: Even(k) computes the k-th even natural number, where Even(1) = 0.
Algorithm Even(k)
{
if k = 1, then return 0;
else return Even(k-1) + 2;
}
Here the computation of Even(k) is reduced to that of Even for a smaller input value, that
is Even(k-1). Even(k) eventually becomes Even(1), which is 0 by the first line. For example,
Even(3) = Even(2) + 2 = (Even(1) + 2) + 2 = 0 + 2 + 2 = 4.
The same value can be computed iteratively:
Algorithm Even(k)
{
int i, even;
i := 1;
even := 0;
while( i < k ) {
even := even + 2;
i := i + 1;
}
return even;
}
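The two versions of Even(k) above can be sketched in C as follows. The function names even_recursive and even_iterative are illustrative; both assume k >= 1, since the definition starts at Even(1) = 0.

```c
#include <assert.h>

/* Recursive version: mirrors Even(1) = 0, Even(k) = Even(k-1) + 2. */
int even_recursive(int k)
{
    assert(k >= 1);                      /* the definition starts at k = 1 */
    if (k == 1)
        return 0;                        /* basis clause */
    return even_recursive(k - 1) + 2;    /* inductive clause */
}

/* Iterative version: the same result with a loop instead of recursive calls. */
int even_iterative(int k)
{
    assert(k >= 1);
    int even = 0;
    for (int i = 1; i < k; i++)
        even += 2;
    return even;
}
```

Both functions return 2 * (k - 1). The recursive version uses one stack frame per call, which is the extra memory cost mentioned above.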
Types of Recursion: There are two types of recursion:
1) Direct recursion: When a function calls itself from within its own body, it is
called direct recursion.
Tail Recursion: If a recursive function calls itself and that recursive call is the last
statement in the function, it is known as tail recursion. After that call the
function performs nothing. All processing is done
at the time of calling, and nothing is done at returning time.
Head Recursion: If a recursive function calls itself and that recursive call is the first
statement in the function, it is known as head recursion. There is no statement and no
operation before the call. The function does no processing
at the time of calling; all operations are done at returning time.
2) Indirect recursion: When a function calls another function which in turn calls the first
function, in a circular manner, it is called indirect recursion.
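The three kinds of recursion described above can be sketched in C as follows. All function names are illustrative. The first two functions write into an array so that the order of the work is visible: fill_tail works before the recursive call, fill_head works after it.

```c
#include <stdbool.h>

/* Tail recursion: the work (storing n) is done before the call, and the
   recursive call is the last statement. Writes n, n-1, ..., 1 into out. */
void fill_tail(int n, int *out)
{
    if (n <= 0)
        return;
    *out = n;
    fill_tail(n - 1, out + 1);
}

/* Head recursion: the recursive call comes first, and the work is done
   while returning. Writes 1, 2, ..., n into out. */
void fill_head(int n, int *out)
{
    if (n <= 0)
        return;
    fill_head(n - 1, out);
    out[n - 1] = n;
}

/* Indirect recursion: is_even calls is_odd, which calls is_even again. */
bool is_odd(unsigned n);   /* forward declaration */

bool is_even(unsigned n)
{
    if (n == 0)
        return true;
    return is_odd(n - 1);
}

bool is_odd(unsigned n)
{
    if (n == 0)
        return false;
    return is_even(n - 1);
}
```

In each case the output array must have room for at least n elements.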
The performance analysis of algorithms can be measured on the scales of time and space. The
performance of a program is the amount of computer memory and time needed to run a program.
We use two approaches to determine the performance of a program or an algorithm. One is
analytical and the other is experimental. In performance analysis we use analytical methods,
while in performance measurement we conduct experiments.
Time Complexity: The time complexity of an algorithm or a program is a function of the
running time of the algorithm or a program. In other words, it is the amount of computer time it
needs to run to completion.
Space Complexity: The space complexity of an algorithm or program is a function of the space
needed by the algorithm or program to run to completion.
algorithm sum(A, n)
{
    s = 0;                         - 1
    for (i = 0; i < n; i++)        - n + 1
        s = s + A[i];              - n
    return s;                      - 1
}
f(n) = 1 + (n + 1) + n + 1 = 2n + 3 -> O(n)
Space: s(n) = n + 3 -> O(n), counting n words for A and one word each for n, s and i.
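The step count f(n) = 2n + 3 for sum can be checked with a small experiment. The sketch below is a C version of sum with an added counter; the name sum_counted and the steps parameter are illustrative additions, and the loop is written with an explicit test so that each evaluation of i < n can be counted.

```c
/* Sums the first n elements of A and reports, through *steps, the number
   of steps counted in the same way as in the analysis of sum. */
int sum_counted(const int A[], int n, long *steps)
{
    long count = 0;
    int s = 0;
    count++;                    /* s = 0: 1 step */
    for (int i = 0; ; i++) {
        count++;                /* loop test i < n: evaluated n + 1 times */
        if (!(i < n))
            break;
        s = s + A[i];
        count++;                /* loop body: executed n times */
    }
    count++;                    /* return s: 1 step */
    *steps = count;
    return s;
}
```

For an array of 5 elements the counter ends at 2 * 5 + 3 = 13, which agrees with the formula.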
algorithm Add(A, B, n)
{
    for (i = 0; i < n; i++)                  - n + 1
        for (j = 0; j < n; j++)              - n * (n + 1)
            c[i][j] = A[i][j] + B[i][j];     - n * n
}
f(n) = (n + 1) + (n^2 + n) + n^2 = 2n^2 + 2n + 1 -> O(n^2)
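The count for Add can be checked in the same experimental way. This is a sketch for a fixed illustrative size N = 3; the macro N, the name add_counted and the returned counter are assumptions added for the example.

```c
#define N 3   /* illustrative matrix size */

/* Adds two N x N matrices into C and returns the number of steps,
   counted as in the analysis of Add. */
long add_counted(int A[N][N], int B[N][N], int C[N][N])
{
    long count = 0;
    for (int i = 0; ; i++) {
        count++;                    /* outer test: n + 1 times */
        if (!(i < N))
            break;
        for (int j = 0; ; j++) {
            count++;                /* inner test: n * (n + 1) times */
            if (!(j < N))
                break;
            C[i][j] = A[i][j] + B[i][j];
            count++;                /* assignment: n * n times */
        }
    }
    return count;
}
```

With N = 3 the function returns 4 + 12 + 9 = 25, which matches 2n^2 + 2n + 1 for n = 3.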
Asymptotic Notations:-
Asymptotic notation is a meaningful way of representing time complexity. It is often used to
describe how the size of the input data affects an algorithm's usage of computational resources.
The running time of an algorithm is described as a function of the input size n for large n.
The notations are
i) Big Oh [ O (g(n)) ] { Upper bound }
ii) Big Omega [ Ω (g(n)) ] { Lower bound }
iii) Big Theta [ Θ (g(n)) ] { Tight bound: both upper and lower }
The order of growth increases in this sequence:
O (1), O (log n), O (n), O (n^2), O (n^3) . . .
Big oh (O): Definition: f(n) = O(g(n)) (read as f of n is big oh of g of n) if there exist a
positive integer n0 and a positive number c such that |f(n)| ≤ c|g(n)| for all n ≥ n0. Here g(n)
is the upper bound of the function f(n).
Theta (Θ): Definition: f(n) = Θ(g(n)) (read as f of n is theta of g of n) if there exist a
positive integer n0 and two positive constants c1 and c2 such that c1 |g(n)| ≤ |f(n)| ≤ c2 |g(n)|
for all n ≥ n0. The function g(n) is both an upper bound and a lower bound for the function f(n)
for all values of n, n ≥ n0.
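Big Omega (Ω): Definition: f(n) = Ω(g(n)) (read as f of n is big omega of g of n) if there exist
a positive integer n0 and a positive number c such that |f(n)| ≥ c|g(n)| for all n ≥ n0. Here g(n)
is the lower bound of the function f(n).
Example: f(n) = 3n + 4
3n + 4 ≥ c . g(n)
3n + 4 ≥ 3n (Lower bound)
Since 3n + 4 ≥ 3n, we take c = 3 and g(n) = n, so f(n) = Ω(n)
If n=1 then 7 ≥ 3
If n=2 then 10 ≥ 6
If n=3 then 13 ≥ 9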
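For comparison, the same function 3n + 4 can be bounded from above as well. The following worked derivation is a sketch; the constants c = 4 and n0 = 4 are one possible choice, not the only one.

```latex
\begin{align*}
3n + 4 &\le 4n && \text{for all } n \ge 4
  &&\Rightarrow\ 3n + 4 = O(n) \text{ with } c = 4,\ n_0 = 4 \\
3n \le 3n + 4 &\le 4n && \text{for all } n \ge 4
  &&\Rightarrow\ 3n + 4 = \Theta(n) \text{ with } c_1 = 3,\ c_2 = 4,\ n_0 = 4
\end{align*}
```

The first line holds because 3n + 4 ≤ 4n is equivalent to n ≥ 4. Combining it with the lower bound 3n + 4 ≥ 3n gives the tight bound in the second line.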
Arrays – ADT:-
Arrays are defined as the collection of similar types of data items stored at contiguous memory
locations. It is one of the simplest data structures where each data element can be randomly
accessed by using its index number.
In C programming, they are the derived data types that can store the primitive type of data such
as int, char, double, float, etc. For example, if we want to store the marks of a student in 6
subjects, then we don't need to define a different variable for the marks in different subjects.
Instead, we can define an array that can store the marks in each subject at the contiguous
memory locations.
Properties of array
An array has the following properties -
o Each element in an array is of the same data type and has the same size (for example, 4 bytes
for an int on many systems).
o Elements in the array are stored at contiguous memory locations, with the first
element stored at the smallest memory address.
o Elements of the array can be randomly accessed since we can calculate the address of
each element of the array from the base address and the size of the data element.
Representation of an array
We can represent an array in various ways in different programming languages. As an
illustration, let's see the declaration of array in C language -
As per the above illustration, there are some of the following important points -
o Index starts with 0.
o The array's length is 10, which means we can store 10 elements.
o Each element in the array can be accessed via its index.
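The points above can be sketched in C as follows. The array name marks, its values and the helper functions are hypothetical and are used only to show a declaration of length 10 and access by index.

```c
#include <stddef.h>

/* An array of 10 ints; the values are made up for the example. */
int marks[10] = {56, 78, 88, 76, 56, 89, 45, 67, 90, 71};

/* The first element is at index 0. */
int first_mark(void)
{
    return marks[0];
}

/* With 10 elements, the last valid index is 9. */
int last_mark(void)
{
    return marks[9];
}

/* Number of elements: total size divided by the size of one element. */
size_t marks_length(void)
{
    return sizeof marks / sizeof marks[0];
}
```

Accessing marks[10] would be out of bounds, because valid indices run from 0 to 9.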
Why are arrays required?
Arrays are useful because -
o Sorting and searching a value in an array is easier.
o Arrays are best to process multiple values quickly and easily.
o Arrays are good for storing multiple values in a single variable - In computer
programming, most cases require storing a large amount of data of a similar type. Without
arrays, we would need to define a large number of variables to store such data, and it would be
very difficult to remember the names of all the variables while writing the programs.
Instead of giving every variable a different name, it is better to define an array
and store all the elements in it.
Memory allocation of an array
As stated above, all the data elements of an array are stored at contiguous locations in the main
memory. The name of the array represents the base address or the address of the first element in
the main memory. Each element of the array is represented by proper indexing.
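Because the elements are contiguous, the address of any element follows from the base address and the element size. The sketch below computes that address by hand for an int array; the function name element_address is illustrative.

```c
#include <stddef.h>

/* Address of arr[i], computed as base address + i * (size of one element).
   The arithmetic is done on a char pointer so that it counts in bytes. */
const void *element_address(const int *arr, size_t i)
{
    const char *base = (const char *)arr;   /* the array name gives the base address */
    return base + i * sizeof(int);
}
```

For any valid index i, element_address(arr, i) is the same address as &arr[i], which is why indexed access takes the same time for every element.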
Consider the memory allocation of an array arr of size 5. The array
follows a 0-based indexing approach. The base address of the array is 100. It is the address
of arr[0]. Here, the size of the data type used is 4 bytes; therefore, each element takes 4 bytes
in memory, and arr[i] is stored at address 100 + 4 * i.
Basic operations supported in the array are
Traversal - This operation visits each element of the array in turn, for example to print it.
Insertion - It is used to add an element at a particular index.
Deletion - It is used to delete an element from a particular index.
Search - It is used to find an element by its index or by its value.
Update - It updates an element at a particular index.
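The operations listed above can be sketched in C as follows. The function names are illustrative, and the array is assumed to be a plain int array whose capacity is managed by the caller.

```c
#include <assert.h>

/* Insertion: places value at index pos and shifts later elements right.
   Returns the new length. The caller must ensure there is room for n + 1 elements. */
int array_insert(int arr[], int n, int pos, int value)
{
    assert(pos >= 0 && pos <= n);
    for (int i = n; i > pos; i--)
        arr[i] = arr[i - 1];
    arr[pos] = value;
    return n + 1;
}

/* Deletion: removes the element at index pos and shifts later elements left.
   Returns the new length. */
int array_delete(int arr[], int n, int pos)
{
    assert(pos >= 0 && pos < n);
    for (int i = pos; i < n - 1; i++)
        arr[i] = arr[i + 1];
    return n - 1;
}

/* Search by value: traverses the array and returns the index of the
   first match, or -1 if the value is not present. */
int array_search(const int arr[], int n, int value)
{
    for (int i = 0; i < n; i++)
        if (arr[i] == value)
            return i;
    return -1;
}
```

Update needs no helper: arr[i] = value overwrites the element at index i directly. Insertion and deletion take time proportional to the number of elements shifted.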
Polynomials:-
Polynomials and sparse matrices are two important applications of arrays and linked lists. A
polynomial is composed of terms, each of which holds a coefficient and an
exponent. This section covers the representation of polynomials using linked lists and
arrays.
A polynomial p(x) is an expression in the variable x of the form (ax^n + bx^(n-1) + …. +
jx + k), where a, b, c, …., k are real numbers and n is a non-negative integer,
which is called the degree of the polynomial.
An essential characteristic of the polynomial is that each term in the polynomial expression
consists of two parts:
one is the coefficient
other is the exponent
Example:
10x^2 + 26x, here 10 and 26 are the coefficients and 2 and 1 are the exponents.
Points to note:
The sign of each coefficient and exponent is stored within the coefficient and the
exponent itself.
Terms having equal exponents can be added together into a single term.
The storage allocation for each term in the polynomial must be done in ascending or
descending order of exponent.
Representation of a Polynomial:-
Each term is represented by two fields: Coefficient and Exponent.
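One possible array representation stores each term as a (coefficient, exponent) pair, kept in descending order of exponent. The sketch below is an assumption about the layout, not the representation from the original figure; the names struct Term and poly_add are illustrative, and integer coefficients are used for simplicity.

```c
/* One term of a polynomial: coeff * x^exp. */
struct Term {
    int coeff;
    int exp;
};

/* Adds polynomial a (na terms) and polynomial b (nb terms), both sorted by
   descending exponent, into out. Returns the number of terms written.
   out must have room for na + nb terms. */
int poly_add(const struct Term a[], int na,
             const struct Term b[], int nb,
             struct Term out[])
{
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i].exp > b[j].exp) {
            out[k++] = a[i++];              /* term only in a */
        } else if (a[i].exp < b[j].exp) {
            out[k++] = b[j++];              /* term only in b */
        } else {
            int c = a[i].coeff + b[j].coeff;    /* equal exponents: add coefficients */
            if (c != 0) {
                out[k].coeff = c;
                out[k].exp = a[i].exp;
                k++;
            }
            i++;
            j++;
        }
    }
    while (i < na)
        out[k++] = a[i++];                  /* remaining terms of a */
    while (j < nb)
        out[k++] = b[j++];                  /* remaining terms of b */
    return k;
}
```

For example, adding 10x^2 + 26x and 5x^2 + 3 produces the three terms 15x^2, 26x and 3. Keeping the terms sorted is what allows the two arrays to be merged in a single pass.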
Sparse matrices:-
A matrix is a two-dimensional data object made of m rows and n columns, therefore having
total m x n values. If most of the elements of the matrix have 0 value, then it is called a sparse
matrix.
Why use a sparse matrix instead of a simple matrix?
Storage: There are fewer non-zero elements than zeros, so less memory is needed
if only those elements are stored.
Computing time: Computing time can be saved by designing a data structure
that traverses only the non-zero elements.
Example:
0 0 3 0 4
0 0 5 7 0
0 0 0 0 0
0 2 6 0 0
Representing a sparse matrix by a 2D array leads to wastage of lots of memory as zeroes in the
matrix are of no use in most of the cases. So, instead of storing zeroes with non-zero elements,
we only store non-zero elements. This means storing non-zero elements with triples- (Row,
Column, value).
A sparse matrix can be represented in many ways. The following are two common
representations:
1. Array representation
2. Linked list representation
Method 1: Using Arrays:
A 2D array is used to represent a sparse matrix, with three rows named Row, Column and Value.
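The (Row, Column, value) triples for the example matrix can be built with a short C function. This is a sketch: the macros ROWS, COLS and MAX_NONZERO and the name to_triples are assumptions for the 4 x 5 example, and the three rows of the result hold row indices, column indices and values respectively.

```c
#define ROWS 4
#define COLS 5
#define MAX_NONZERO (ROWS * COLS)

/* Scans the matrix m and stores one column per non-zero element:
   triples[0][k] = row index, triples[1][k] = column index,
   triples[2][k] = value. Returns the number of non-zero elements. */
int to_triples(int m[ROWS][COLS], int triples[3][MAX_NONZERO])
{
    int k = 0;
    for (int r = 0; r < ROWS; r++) {
        for (int c = 0; c < COLS; c++) {
            if (m[r][c] != 0) {
                triples[0][k] = r;
                triples[1][k] = c;
                triples[2][k] = m[r][c];
                k++;
            }
        }
    }
    return k;
}
```

For the example matrix there are 6 non-zero elements, so only 3 x 6 = 18 integers are needed instead of the 20 in the full matrix. The saving grows quickly as the matrix gets larger and sparser.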
Strings-ADT:- An Abstract Data Type (ADT) consists of a set of values, a defined set of
properties of these values, and a set of operations for processing the values. The values of the
string ADT are all sequences of characters up to a specified length.
o Properties
The component characters are from the ASCII character set
They are comparable in lexicographic order
They have a length, from 0 to the specified length
o Operations on the string ADT include
Input
Output
Initialization and assignment
Comparison greater, equal, less
Determination of length
Concatenation
Accessing component characters and substrings
String functions in C
strncpy() - Copies at most the first n characters of one string into another.
strcmp() - Compares two strings.
strncmp() - Compares the first n characters of two strings.
strcmpi() - Compares two strings without regard to case (the "i" indicates that this
function ignores case).
stricmp() - Compares two strings regardless of case (identical to strcmpi).
strnicmp() - Compares the first n characters of two strings, ignoring
case.
strdup() - Duplicates a string.
strchr() - Finds the first instance of a character in a string.
strrchr() - Finds the last instance of a character in a string.
strstr() - Finds the first instance of a string in another string.
strset() - Changes all characters in a string to a specific character.
strnset() - Changes the first n characters of a string to a specific character.
strrev() - Reverses a string.
Note: strcmpi(), stricmp(), strnicmp(), strset(), strnset() and strrev() are compiler-specific
extensions and are not part of the standard C library.
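A few of the standard functions from the list can be used as in the sketch below. The wrapper names (copy_prefix, first_index, last_index, find_substring) are illustrative; only strncpy, strchr, strrchr and strstr come from the C library.

```c
#include <string.h>
#include <stddef.h>

/* Copies at most dst_size - 1 characters of src into dst and always
   terminates dst, because strncpy does not add '\0' when src is too long. */
void copy_prefix(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* Index of the first occurrence of ch in s, or -1 if it does not occur. */
long first_index(const char *s, char ch)
{
    const char *p = strchr(s, ch);
    return p ? (long)(p - s) : -1;
}

/* Index of the last occurrence of ch in s, or -1 if it does not occur. */
long last_index(const char *s, char ch)
{
    const char *p = strrchr(s, ch);
    return p ? (long)(p - s) : -1;
}

/* Index at which needle first appears in haystack, or -1 if it does not. */
long find_substring(const char *haystack, const char *needle)
{
    const char *p = strstr(haystack, needle);
    return p ? (long)(p - haystack) : -1;
}
```

strchr, strrchr and strstr return a pointer into the string (or NULL), so the index is obtained by subtracting the start of the string from that pointer.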
Pattern Matching:- Pattern matching is the process of checking whether a specific sequence of
characters/tokens/data exists among the given data. Many programming languages make use of
regular expressions (regex) for pattern matching. Pattern matching is used to determine whether
source files of high-level languages are syntactically correct. It is also used to find and replace a
matching pattern in a text or code with another text/code. Any application that supports search
functionality uses pattern matching in one way or another.
Exact string matching algorithms find one, several, or all occurrences of a defined
string (pattern) in a larger string (text or sequence) such that each match is perfect: every
character of the pattern must match the corresponding character of the matched substring.
Algorithms based on character comparison:
● Naive Algorithm: It slides the pattern over the text one position at a time and checks for a
match. If a match is found, it slides by 1 again to check for subsequent matches.
● KMP (Knuth Morris Pratt) Algorithm: The idea is that whenever a mismatch is detected, we
already know some of the characters in the text of the next window. So, we take advantage
of this information to avoid matching the characters that we know will anyway match.
● Boyer Moore Algorithm: This algorithm starts matching from the last character of the pattern
and uses the bad character and good suffix heuristics to skip positions that cannot match.
● Using the Trie data structure: A trie is an efficient information retrieval data structure.
It stores the keys character by character along paths from the root, so keys with a common
prefix share nodes.
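The first two algorithms in the list can be sketched in C as follows. The function names naive_search and kmp_search are illustrative, and both return only the index of the first match (or -1) rather than all occurrences. In kmp_search, an allocation failure is also reported as -1, which is a simplification for this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Naive algorithm: try every shift s and compare character by character. */
int naive_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    for (size_t s = 0; s + m <= n; s++) {
        size_t j = 0;
        while (j < m && text[s + j] == pat[j])
            j++;
        if (j == m)
            return (int)s;              /* all m characters matched */
    }
    return -1;
}

/* KMP: lps[i] is the length of the longest proper prefix of pat[0..i]
   that is also a suffix of it. After a mismatch, the pattern index falls
   back to lps[j - 1] and the text index never moves backwards. */
int kmp_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    if (m == 0)
        return 0;
    size_t *lps = malloc(m * sizeof *lps);
    if (lps == NULL)
        return -1;                      /* simplification: treated as "not found" */

    lps[0] = 0;
    for (size_t i = 1, len = 0; i < m; ) {
        if (pat[i] == pat[len])
            lps[i++] = ++len;
        else if (len > 0)
            len = lps[len - 1];
        else
            lps[i++] = 0;
    }

    int result = -1;
    for (size_t i = 0, j = 0; i < n; ) {
        if (text[i] == pat[j]) {
            i++;
            j++;
            if (j == m) {
                result = (int)(i - m);  /* match ends at i - 1 */
                break;
            }
        } else if (j > 0) {
            j = lps[j - 1];             /* reuse what is already known to match */
        } else {
            i++;
        }
    }
    free(lps);
    return result;
}
```

The naive version may compare up to m characters at each of about n shifts, while KMP performs a number of comparisons proportional to n + m because characters already known to match are not compared again.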