Study Material On Data Structure and Algorithms
Course Outcomes:
CO1 Students will be able to acquire and remember the knowledge of fundamental data
structures.
CO2 Students will be able to implement any problem by writing their own algorithms.
CO3 Students will be able to analyze the algorithm for a given problem using different data
structures.
CO4 Students will be able to learn various data structure approaches and techniques to
develop and design projects.
Detailed Syllabus:
Module 1: 10L
Introduction: Why do we need data structure? [1L]; Concepts of data structures: a) Data and data
structure b) Abstract Data Type and Data Type [2L]; Applications Algorithms and programs [1L]; the
basic idea of pseudo-code [2L]; Algorithm efficiency and analysis [1L]; time and space complexity
analysis of algorithms – order notations [3L].
Module 2: 10L
Linear Data Structures
Array: Different representations – row-major, column-major [1L]; Sparse matrix - its implementation
and usage [1L]; Array representation of polynomials [1L];
Linked List: Singly linked list, circular linked list, doubly linked list, linked list representation of
polynomials and applications [2L];
Stack and Queue: Stack and its implementations (using array, using linked list), applications (Infix to
Postfix conversion, Evaluation of Postfix expression etc.) [2L]; Queue, circular queue, dequeue [1L];
Implementation of queue- both linear and circular (using array, using linked list), Applications [1L];
Recursion: Principles of recursion – use of the stack, differences between recursion and iteration, tail
recursion. Applications - Tower of Hanoi [1L].
Module 3: 12L
Nonlinear Data structures:
Trees: Basic terminologies, tree representation (using array, using linked list) [1L]; Binary trees -
binary tree traversal (pre-, in-, post- order), recursive and non-recursive traversal algorithms of binary
tree, threaded binary tree (left, right, full), and expression tree [2L]; Binary search tree- operations
(creation, insertion, deletion, searching) [1L];Height balanced binary tree – AVL tree (insertion,
deletion with examples only) [1L]; B- Trees –operations (insertion, deletion with examples only) [1L];
B+ Trees – operations (insertion, deletion with examples only) [1L];
Graphs: Graph definitions and concepts (directed/undirected graph, weighted/un-weighted edges,
subgraph, degree, cut vertex/ articulation point, pendant node, clique, complete graph, connected
components – strongly connected component, weakly connected component, path, shortest path, and
isomorphism) [1L]; Graph representations/storage implementations – adjacency matrix, adjacency list,
adjacency multi-list [1L]; Graph traversal and connectivity – Depth-first search (DFS), Breadth-first
search (BFS) – concepts of edges used in DFS and BFS (tree-edge, back-edge, cross-edge, forward-
edge) [2L]; applications. Minimal spanning tree– Prim’s algorithm, Kruskal’s algorithm (basic idea of
greedy methods) [2L].
Module 4: 4L
Searching and Sorting:
Sorting Algorithms: Bubble sort, insertion sort, shell sort, selection sort, merge sort, quick sort, heap
sort (concept of max heap, application – priority queue), radix sort[2L]; Time and space complexity
derivations [1L];
Searching: Sequential search, binary search, interpolation search. Time and space complexity
derivations. Hashing: Hashing functions, collision resolution techniques [2L].
CO1 3 2 3 1 3 1 3
CO2 3 2 3 2 3 2 3
CO3 2 1 3 3 3 2 3
CO4 1 1 3 1 3 2 3
3, 2, 1: - Indicate strong (3), medium (2) and weak (1) correlation respectively.
Module 1:
What is Data Structure?
Arrangement of data, either in computer’s memory or disk storage.
Algorithms are used to insert new data, append and/or delete existing data in data structures.
Not only storage, but retrieval of data also is different for different data structures.
Types of data structures:
Linear: E.g. Arrays, Linked Lists, Stacks, Queues.
Non-linear: E.g. Trees, Graphs.
Algorithms
A set of instructions to solve a class of problems or perform a computation, satisfying the following
properties:
Input: Zero or more quantities should be externally supplied.
Output: At least one quantity should be produced.
Definiteness: Each instruction must be clear and unambiguous.
Finiteness: Must terminate after finite number of steps.
Effectiveness: Each instruction must be feasible and can be carried out.
Can be expressed within a finite amount of space and time.
Can be expressed in many kinds of notations, like natural languages, flowcharts, pseudocode, etc.
Exhibits the following three features:
Sequence
Decision
Repetition
Time complexity: Given by the number of steps taken by the algorithm to compute the function it was written for.
Asymptotic Notations
Expressions that are used to represent the complexity of an algorithm.
We perform three types of analysis on a particular algorithm:
Best Case: Analyzing performance of an algorithm for the input, for which the algorithm takes
minimum time or space.
Worst Case: Analyzing performance of an algorithm for the input, for which the algorithm
takes maximum time or space.
Average Case: Lies between best and worst case, giving average performance of algorithm.
Types of Asymptotic Notations
Big-oh Notation (O) – Describes an asymptotic upper bound; commonly used to state the worst case.
Omega Notation (Ω) – Describes an asymptotic lower bound; commonly used to state the best case.
Theta Notation (Θ) – Describes a tight bound, i.e. both an upper and a lower bound.
Little-oh Notation (o) – Means a loose (not tight) upper bound of f(n).
Little omega Notation (ω) – Means a loose (not tight) lower bound of f(n).
Big-OH Notation
The function f(n)=O(g(n)) iff there exist positive constants c and n0 such that f(n) ≤ c*g(n) for all n,
n≥n0.
Example:
Function 3n+2=O(n) as 3n+2 ≤ 4n for all n ≥2.
Big-Omega Notation
The function f(n) =Ω(g(n)) iff there exist positive constants c and n0 such that f(n) ≥ c*g(n) for all n, n ≥
n0.
Example:
Function 3n+2 = Ω(n) as 3n+2 ≥ 3n for n ≥ 1.
Theta Notation
The function f(n) = Θ(g(n)) iff there exist positive constants c1, c2, and n0 such that c1*g(n) ≤ f(n) ≤ c2*g(n)
for all n, n ≥ n0.
Example:
Function 3n+2 = Θ(n) as 3n+2 ≥ 3n for all n ≥ 2 and 3n+2 ≤ 4n for all n ≥ 2, so c1 = 3, c2 = 4, and n0 =
2.
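The constants in the examples above can be spot-checked numerically. The sketch below is only a finite check over a range of n, not a proof; the helper name holds_for_range and the upper limit of 10,000 are assumptions made for illustration.

```python
def f(n):
    """The example function f(n) = 3n + 2."""
    return 3 * n + 2


def holds_for_range(c1, c2, n0, upto=10_000):
    """Check c1*n <= f(n) <= c2*n for every n with n0 <= n <= upto.

    This is a finite spot check, not a proof of the asymptotic bound.
    """
    return all(c1 * n <= f(n) <= c2 * n for n in range(n0, upto + 1))


# Constants from the Theta example: c1 = 3, c2 = 4, n0 = 2.
print(holds_for_range(3, 4, 2))
# With n0 = 1 the upper bound fails, because f(1) = 5 > 4 * 1.
print(holds_for_range(3, 4, 1))
```

The second call shows why n0 = 2 is needed for the upper bound 4n.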
Little-OH Notation
Function f(n) is o(g(n)) if for any real constant c > 0, there exists an integer constant n0 ≥ 1 such that
0 ≤ f(n) < c*g(n) for every integer n ≥ n0.
Mathematically, f(n) = o(g(n)) iff
lim_{n→∞} f(n)/g(n) = 0
Example:
Function 3n+2 = o(n^2) since
lim_{n→∞} (3n+2)/n^2 = 0
Little-Omega Notation
Function f(n) is ω(g(n)) if for any real constant c > 0, there exists an integer constant n0 ≥ 1 such that
f(n) > c * g(n) ≥ 0 for every integer n ≥ n0.
Mathematically, f(n) = ω(g(n)) iff
lim_{n→∞} f(n)/g(n) = ∞
Example:
Function 4n+6 = ω(1) since lim_{n→∞} (4n+6)/1 = ∞
Questions:
Each entry gives the serial number, the question, the course outcome (CO), the Bloom's level and the hints.

1. Question: Compare and contrast the advantages and disadvantages of using a linear data structure (e.g., array) versus a non-linear data structure (e.g., tree) in terms of memory usage and search efficiency.
CO: CO3. Bloom's level: 4.
Hints: The choice between a linear data structure (e.g., array) and a non-linear data structure (e.g., tree) depends on the specific requirements of the application. If the dataset has a fixed size and frequently requires random access, an array might be a better choice due to its memory efficiency and constant-time access.

2. Question: Analyze the time complexity of different data structures (e.g., arrays, linked lists, stacks) and evaluate their efficiency for various operations such as insertion, deletion, and retrieval.
CO: CO3. Bloom's level: 4.
Hints: Start by recalling the meaning of time complexity. Consider various fundamental operations with linear data structures.

3. Question: Compare and contrast the advantages and limitations of different techniques for implementing a stack (e.g., array-based stack, linked list-based stack), considering factors such as memory efficiency and the ability to handle variable-sized data.
CO: CO3. Bloom's level: 5.
Hints: An array-based stack offers better memory efficiency and random access, but it is limited by its fixed size and the need for resizing. On the other hand, a linked list-based stack can handle variable-sized data more efficiently without the overhead of resizing, but it sacrifices random access and has slightly higher memory overhead due to pointers.

4. Question: Develop a data structure that maintains a running sum of a stream of numbers, allowing constant-time queries for the sum of a given range of elements, and analyze its time complexity.
CO: CO4. Bloom's level: 6.
Hints: Start by recalling the concept of the modified version of Binary Tree which is known as Binary Indexed Tree.

5. Question: Justify the significance of asymptotic notation in algorithm analysis.
CO: CO3. Bloom's level: 5.
Hints: Start by recalling what notations we use to compare orders of growth. Elaborate on each of them. Remember to explain the concepts of both upper and lower bounds with respect to the asymptotic function.

6. Question: Justify the statement: ‘A trade-off has to be maintained between time and space complexity’.
CO: CO3. Bloom's level: 5.
Hints: Start by recalling the meaning of time and space complexity. Consider the implications of each. Finally, highlight the balance required between both.

7. Question: Design and write an algorithm having worst case time complexity as O(log n).
CO: CO4. Bloom's level: 6.
Hints: All the steps of the algorithm must be very simple, clear and basic.

8. Question: Appraise the complexity of an algorithm which performs a task of calculating a sum of 'n' numbers.
CO: CO3. Bloom's level: 5.
Hints: Estimate the number of steps of the problem and the frequency of execution of each.

9. Question: Bob wants to search for the element 42 in a given array {12, 13, 14, 32, 21, 42, 54}, but his computer cannot handle algorithms with time complexity of the order of n^2. Suggest a way for Bob to find the element using binary search. Also determine the overall time complexity.
CO: CO4. Bloom's level: 6.
Hints: You have to find the mid value using the start and end index of the array. If the search element is at mid then return the value, otherwise you have to change the start value or end value.

10. Question: Searching in a phone book: A phone book is stored in a text file, containing names of people, their city names and phone numbers. Choose an appropriate data structure to search a person’s phone number based on his / her first name and city.
CO: CO3. Bloom's level: 4.
Hints: Try Quick sort for this purpose.

11. Question: Determine the time complexity of T(n) = 2T(n/2) + O(n), taking O(n) = n.
CO: CO3. Bloom's level: 5.
Hints: Learn about the time complexity then solve the question.

12. Question: The user enters the array elements: 10, 20, 30, 30, 10, 9, 40. Solve the given problems by creating a suitable algorithm:
(a) How do you find duplicate numbers in an array if it contains multiple duplicates?
(b) How to remove duplicates from a given array?
CO: CO4. Bloom's level: 4.
Hints: Duplicate elements can be found using two loops. The outer loop will iterate through the array from 0 to the length of the array and select an element. The inner loop will be used to compare the selected element with the rest of the elements of the array.

13. Question: Using the abstract data type "Stack," design a program to evaluate a simple arithmetic expression in postfix notation. Implement the necessary methods to push operands onto the stack and perform calculations using operators. The expression consists of integers and four basic arithmetic operators (+, -, *, /).
CO: CO4. Bloom's level: 4.
Hints: 1. Explain the concept of postfix notation and its relationship with stacks. 2. Design the required methods for the stack to handle operands and operators efficiently. 3. Provide a step-by-step example of how the stack evaluates a postfix expression.

14. Question: You are given a string containing parentheses (e.g., "((()))()()"). Design a program using the stack data structure to determine if the parentheses in the string are balanced. The program should return "True" if the parentheses are balanced and "False" otherwise.
CO: CO5. Bloom's level: 5.
Hints: 1. Design the necessary methods for the stack to process the given string efficiently. 2. Provide a step-by-step example of how the stack checks the balance of parentheses in a sample string.

15. Question: Suppose the manager of a call centre wants to optimize the way customer inquiries are handled by your team. Design a program using the Queue data structure to simulate a call queue system. Implement the necessary methods to add incoming calls to the queue, process them one by one as the agents become available, and display the current queue status.
CO: CO4. Bloom's level: 4.
Hints: 1. Explain the concept of a Queue data structure and its analogy with a real-life queue. 2. Design the required methods for the queue to manage calls effectively. 3. Provide a step-by-step example of how the queue operates as calls join and leave the queue.

16. Question: Imagine you are in charge of organizing a small amusement park. You want to manage the ride queues efficiently to ensure a smooth experience for visitors. Design a program using the Queue data structure to simulate the queueing system for one of the popular rides. Implement the necessary methods to add visitors to the queue, remove them after they have enjoyed the ride, and display the current queue status.
CO: CO5. Bloom's level: 5.
Hints: 1. Explain the concept of a Queue data structure and its analogy with a real-life queue. 2. Design the required methods for the queue to manage visitors effectively. 3. Provide a step-by-step example of how the queue operates as visitors join and leave the ride.
Module 2:
Linked List
A linear collection of data elements called nodes.
Acts as building block to implement other data structures like stacks, queues etc.
A sequence of nodes in which each node contains one or more data fields and a pointer to the next node.
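[Figure: a singly linked list. START points to the first node; nodes 1 to 7 are linked in sequence and the last node stores NULL (X).]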
• Every node contains two parts- one data and the other a pointer to the next node.
• The left part of the node i.e. data part may be a simple data type, an array or a structure.
• The right part of the node contains a pointer to the next node (or address of the next node in sequence).
• The last node will have no next node connected to it, so it will store a special value called NULL.
Singly Linked List
Simplest type of linked list
Every node contains some data and a pointer to the next node of the same data type i.e. it stores the
address of the next node in sequence.
Algorithm for traversing a linked list
Step 1: [INITIALIZE] SET PTR = START
Step 2: Repeat Steps 3 and 4 while PTR ≠ NULL
Step 3: Write PTR->DATA
Step 4: SET PTR = PTR->NEXT
[END OF LOOP]
Step 5: EXIT
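The traversal steps above can be sketched in Python as follows. The class name Node, the field name next_node and the function name traverse are assumptions chosen for illustration; the sketch collects the values in a list instead of writing them out.

```python
class Node:
    """One node of a singly linked list: a data field and a pointer to the next node."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.next_node = next_node


def traverse(start):
    """Visit every node from START until PTR becomes NULL (None) and collect the data."""
    values = []
    ptr = start                    # Step 1: SET PTR = START
    while ptr is not None:         # Step 2: repeat while PTR != NULL
        values.append(ptr.data)    # Step 3: process PTR->DATA
        ptr = ptr.next_node        # Step 4: SET PTR = PTR->NEXT
    return values


start = Node(1, Node(2, Node(3)))
print(traverse(start))
```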
Algorithm to insert a new node in the beginning of the linked list
Step 1: IF AVAIL = NULL, then
Step 2: Write OVERFLOW
Step 3: Go to Step 10
Step 4: [END OF IF]
Step 5: SET New_Node = AVAIL
Step 6: SET AVAIL = AVAIL->NEXT
Step 7: SET New_Node->DATA = VAL
Step 8: SET New_Node->Next = START
Step 9: SET START = New_Node
Step 10: EXIT
Algorithm to insert a new node after a certain given node in a linked list
Step 1: IF AVAIL = NULL, then
Step 2: Write OVERFLOW
Step 3: Go to Step 13
Step 4: [END OF IF]
Step 5: SET New_Node = AVAIL
Step 6: SET AVAIL = AVAIL->NEXT
Step 7: SET New_Node->DATA = VAL
Step 8: SET PTR = START
Step 9: Repeat Step 10 while PTR->DATA ≠ NUM
Step 10: PTR = PTR->NEXT
[END OF LOOP]
Step 11: SET New_Node->NEXT = PTR->NEXT
Step 12: PTR->NEXT = New_Node
Step 13: EXIT
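A minimal Python sketch of the two insertion algorithms above. Python allocates nodes on demand, so the AVAIL free list and the OVERFLOW check are omitted here; that omission, the names insert_at_beginning, insert_after and to_list, and the error raised when NUM is not found are all assumptions of this sketch.

```python
class Node:
    """One node of a singly linked list."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.next_node = next_node


def insert_at_beginning(start, val):
    """Create a node whose NEXT is the old START and return it as the new START."""
    return Node(val, start)


def insert_after(start, num, val):
    """Insert val immediately after the first node whose data equals num."""
    ptr = start
    while ptr is not None and ptr.data != num:
        ptr = ptr.next_node
    if ptr is None:
        # Not covered by the pseudocode, which assumes NUM is present.
        raise ValueError("value to insert after was not found")
    ptr.next_node = Node(val, ptr.next_node)
    return start


def to_list(start):
    """Helper for inspection: copy the node data into a Python list."""
    values = []
    while start is not None:
        values.append(start.data)
        start = start.next_node
    return values


start = None
start = insert_at_beginning(start, 3)
start = insert_at_beginning(start, 1)
start = insert_after(start, 1, 2)
print(to_list(start))
```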
Deletion in Singly Linked List
Algorithm to delete the first node from the linked list
Step 1: IF START = NULL, then
Step 2: Write UNDERFLOW
Step 3: Go to Step 9
Step 4: [END OF IF]
Step 5: SET PTR = START
Step 6: SET START = START->NEXT
Step 7: PTR->NEXT=NULL
Step 8: FREE PTR
Step 9: EXIT
Algorithm to delete a given node from the linked list
Step 1: IF START = NULL, then
Step 2: Write UNDERFLOW
Step 3: Go to Step 14
Step 4: [END OF IF]
Step 5: SET PTR = START
Step 6: SET PREPTR = PTR
Step 7: Repeat Step 8 while PTR->DATA ≠ NUM
Step 8: SET PTR = PTR->NEXT
[END OF LOOP]
Step 9: Repeat Step 10 while PREPTR->NEXT ≠ PTR
Step 10: SET PREPTR = PREPTR->NEXT
[END OF LOOP]
Step 11: SET PREPTR->NEXT = PTR->NEXT
Step 12: SET PTR->NEXT = NULL
Step 13: FREE PTR
Step 14: EXIT
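The two deletion algorithms can be sketched as below. Unlike the pseudocode, which locates PREPTR with a second loop, this sketch keeps PREPTR one step behind PTR during a single pass; that change, the function names, and the exceptions used for UNDERFLOW and for a missing value are assumptions of the sketch. Python reclaims the unlinked node automatically, so there is no explicit FREE step.

```python
class Node:
    """One node of a singly linked list."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.next_node = next_node


def delete_first(start):
    """Unlink the first node and return the new START."""
    if start is None:
        raise IndexError("UNDERFLOW: the list is empty")
    new_start = start.next_node
    start.next_node = None
    return new_start


def delete_value(start, num):
    """Unlink the first node whose data equals num and return START."""
    if start is None:
        raise IndexError("UNDERFLOW: the list is empty")
    if start.data == num:
        return delete_first(start)
    preptr, ptr = start, start.next_node
    while ptr is not None and ptr.data != num:
        preptr, ptr = ptr, ptr.next_node
    if ptr is None:
        raise ValueError("value to delete was not found")
    preptr.next_node = ptr.next_node   # bypass the deleted node
    ptr.next_node = None
    return start


def to_list(start):
    """Helper for inspection: copy the node data into a Python list."""
    values = []
    while start is not None:
        values.append(start.data)
        start = start.next_node
    return values


start = Node(1, Node(2, Node(3, Node(4))))
start = delete_first(start)
start = delete_value(start, 3)
print(to_list(start))
```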
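[Figures: three further linked-list diagrams, each beginning at START: nodes 1 1 2 3 4 with X (NULL) at both ends; nodes 1 2 3 4 5 6 7 with no terminating NULL; nodes 1 1 2 3 4 with no terminating NULL.]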
Stack
A linear data structure which can be implemented either using an array or a linked list.
The elements in a stack are added and removed only from one end, which is called top.
Stack is called a LIFO (Last In First Out) data structure as the element that was inserted last is the first one to be
taken out.
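Three basic operations are possible on a stack: push (insert an element at the top), pop (remove the element at the top) and peek (read the top element without removing it).
When a stack is implemented using a linked list:
• Either the beginning or the end of the linked list can be considered as the top of the stack.
• A TOP pointer always keeps track of the top of the stack.
• All insertions and deletions are to be done from the TOP of the stack.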
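A minimal sketch of a linked-list stack, taking the beginning of the list as the top. The class name LinkedStack and the method names push, pop, peek and is_empty are assumptions chosen for illustration.

```python
class Node:
    """One node of the linked list that backs the stack."""

    def __init__(self, data, next_node=None):
        self.data = data
        self.next_node = next_node


class LinkedStack:
    """Stack whose TOP pointer refers to the first node of a singly linked list."""

    def __init__(self):
        self.top = None

    def is_empty(self):
        return self.top is None

    def push(self, val):
        # The new node becomes the top and points to the old top.
        self.top = Node(val, self.top)

    def pop(self):
        if self.is_empty():
            raise IndexError("UNDERFLOW: the stack is empty")
        val = self.top.data
        self.top = self.top.next_node
        return val

    def peek(self):
        if self.is_empty():
            raise IndexError("the stack is empty")
        return self.top.data


stack = LinkedStack()
stack.push(1)
stack.push(2)
print(stack.peek(), stack.pop(), stack.pop(), stack.is_empty())
```

Because every operation touches only the top node, each one takes constant time.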
Applications
Conversion of Infix to Postfix Expression
Infix Notation: X + Y (Operators are written between their operands)
Postfix Notation: X Y + (Operators are written after their operands)
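One common way to perform the conversion with a stack is sketched below. It assumes a well-formed expression whose operands are single letters or digits and whose operators are +, -, * and /, all left-associative; the names PRECEDENCE and infix_to_postfix are hypothetical.

```python
# Higher number means the operator binds more tightly.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def infix_to_postfix(expr):
    """Convert a well-formed infix expression with single-character operands to postfix."""
    output, stack = [], []
    for ch in expr.replace(" ", ""):
        if ch.isalnum():
            output.append(ch)                 # operands go straight to the output
        elif ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack[-1] != "(":           # pop back to the matching "("
                output.append(stack.pop())
            stack.pop()                       # discard the "("
        else:
            # Pop operators of higher or equal precedence before pushing this one.
            while stack and stack[-1] != "(" and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]:
                output.append(stack.pop())
            stack.append(ch)
    while stack:
        output.append(stack.pop())
    return " ".join(output)


print(infix_to_postfix("X + Y"))
print(infix_to_postfix("4 + 5 * 6"))
```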
Evaluation of a Postfix Expression
E.g. Expressions like 4 5 6 * + can be evaluated easily.
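The evaluation can be sketched with a stack of operands as follows. The sketch assumes tokens separated by spaces and a well-formed expression; the names OPS and evaluate_postfix are assumptions, and operands are read as floating-point numbers so that division behaves uniformly.

```python
import operator

# Map each operator symbol to the function that applies it.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate_postfix(expr):
    """Evaluate a space-separated postfix expression using a stack of operands."""
    stack = []
    for token in expr.split():
        if token in OPS:
            right = stack.pop()     # the operand pushed last is the right operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    return stack.pop()


print(evaluate_postfix("4 5 6 * +"))   # 5 * 6 = 30, then 4 + 30 = 34
```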
Recursion
A function calling itself again and again breaking a problem down into smaller and smaller sub-
problems, until we get a small enough problem that can be solved trivially.
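The Tower of Hanoi, listed in the syllabus as an application of recursion, gives a compact illustration. In this sketch the function name hanoi, the peg labels and the list-of-moves return value are assumptions; a problem of size n is reduced to two problems of size n-1 until the trivial case n = 0 is reached.

```python
def hanoi(n, source, target, spare, moves=None):
    """Return the list of (from_peg, to_peg) moves that transfers n discs."""
    if moves is None:
        moves = []
    if n == 0:                      # trivial sub-problem: nothing to move
        return moves
    hanoi(n - 1, source, spare, target, moves)   # clear the n-1 smaller discs
    moves.append((source, target))               # move the largest disc
    hanoi(n - 1, spare, target, source, moves)   # put the smaller discs back on top
    return moves


moves = hanoi(3, "A", "C", "B")
print(len(moves), moves[:3])
```

For n discs the recursion produces 2^n - 1 moves, so three discs need 7 moves.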
Queues
Data structure which stores its elements in an ordered manner. Take for example the analogies given
below:
People moving on an escalator. The people who got on the escalator first will be the first one to step out
of it.
People waiting for bus. The first person standing in the line will be the first one to get into the bus.
A queue is a FIFO (First In First Out) data structure in which the element that was inserted first is the
first one to be taken out.
The elements in a queue are added at one end called the rear and removed from the other end called the
front.
Every queue will have front and rear variables that will point to the position from where deletions and
insertions can be done respectively.
Consider the queue shown in the figure.
[Figure: array-based queue states, one per row: 12 9 7 18 14 36; then 12 9 7 18 14 36 45; then 9 7 18 14 36 45; then 7 18 14 36 45 21 99 72.]
Solutions:
1. Shifting elements to left so that the vacant space can be occupied and utilized efficiently. Very time
consuming (especially when the queue is quite large).
2. Using a circular queue. In circular queue, the first index comes right after the last index.
[Figure: a circular queue with positions Q[0] to Q[6] arranged in a ring.]
[Figure: queue contents 90 49 7 18 14 36 45 21.]
• If front ≠ 0 and rear = MAX - 1, the queue is not full. So set rear = 0 and insert the new element there, as shown in the figure.
[Figure: queue contents 7 18 14 36 45 21 99 72.]
• When deleting, if the queue is not empty and front = MAX - 1, then after returning the value at front, front is set to 0. This is shown in the figure.
[Figure: queue contents 72 63 9 18 27 39 81.]
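A minimal array-based circular queue, assuming front = rear = -1 marks an empty queue. The class and method names are hypothetical; the wrap-around from MAX - 1 back to 0 is written with the modulo operator.

```python
class CircularQueue:
    """Fixed-capacity queue in which index 0 follows index MAX - 1."""

    def __init__(self, capacity):
        self.items = [None] * capacity
        self.max = capacity
        self.front = -1      # -1 marks an empty queue (assumed convention)
        self.rear = -1

    def is_empty(self):
        return self.front == -1

    def is_full(self):
        # Covers both front = 0, rear = MAX - 1 and front = rear + 1.
        return (self.rear + 1) % self.max == self.front

    def enqueue(self, val):
        if self.is_full():
            raise OverflowError("OVERFLOW: the queue is full")
        if self.is_empty():
            self.front = 0
        self.rear = (self.rear + 1) % self.max   # wraps from MAX - 1 to 0
        self.items[self.rear] = val

    def dequeue(self):
        if self.is_empty():
            raise IndexError("UNDERFLOW: the queue is empty")
        val = self.items[self.front]
        if self.front == self.rear:              # the last element was removed
            self.front = self.rear = -1
        else:
            self.front = (self.front + 1) % self.max
        return val


queue = CircularQueue(3)
for value in (1, 2, 3):
    queue.enqueue(value)
print(queue.dequeue())     # frees index 0
queue.enqueue(4)           # rear wraps around to index 0
print(queue.rear, queue.is_full())
```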
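[Figure: a linked queue 1 7 3 4 2 6 5 ending in NULL (X), with FRONT at the first node and REAR at the last. After an insert operation the queue is 1 7 3 4 2 6 5 9 and REAR moves to 9. After a delete operation the queue is 7 3 4 2 6 5 9 and FRONT moves to 7.]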
Dequeue
A queue in which elements can be inserted or deleted at either end.
Two variants of a double ended queue are:
Input restricted dequeue: Insertions can be done only at rear while deletions can be done from both the
ends.
Output restricted dequeue: Deletions can be done only from front while insertions can be done on both
the ends.
Priority Queue
An abstract data type in which each element is assigned a priority.
The priority of the element is used to determine the order in which these elements will be processed.
The general rule of processing elements of a priority queue can be given as:
An element with higher priority is processed before an element with lower priority.
Two elements with same priority are processed on a first come first served (FCFS) basis.
A modified queue in which the element with the highest priority is retrieved first. The priority of the element can be
set based upon distinct factors.
Widely used in operating systems to execute the highest priority process first. The priority of the
process may be set based upon the CPU time it needs to get executed completely.
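The two processing rules can be sketched with Python's heapq module. The sketch assumes that a larger number means a higher priority and uses an insertion counter to keep first come first served order among equal priorities; the class and method names are hypothetical.

```python
import heapq
import itertools


class PriorityQueue:
    """Priority queue in which ties are served in order of arrival."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # arrival order, used to break ties

    def insert(self, item, priority):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), item))

    def remove(self):
        if not self._heap:
            raise IndexError("the priority queue is empty")
        return heapq.heappop(self._heap)[2]

    def is_empty(self):
        return not self._heap


pq = PriorityQueue()
pq.insert("a", 1)
pq.insert("b", 3)
pq.insert("c", 3)
pq.insert("d", 2)
order = []
while not pq.is_empty():
    order.append(pq.remove())
print(order)
```

Here "b" and "c" share the highest priority, so "b", which arrived first, is processed first.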
Questions
Each entry gives the serial number, the question, the course outcome (CO), the Bloom's level and the hints.

1. Question: Analyze the time and space complexity of different sorting algorithms (e.g., bubble sort, merge sort, quicksort) and evaluate their efficiency for various input sizes.
CO: CO3. Bloom's level: 4.
Hints: Start by recalling the meaning of time complexity of different sorting algorithms.

2. Question: Evaluate the trade-offs between using a stack and a queue in different problem-solving scenarios, considering factors like data access patterns and required operations.
CO: CO4. Bloom's level: 4.
Hints: Start by recalling the concepts of linear data structures and Abstract Data Types (ADT).

3. Question: Compare and contrast different strategies for implementing a priority queue (e.g., using an array, a linked list, or a heap) in terms of time complexity, space complexity, and their suitability for specific applications.
CO: CO3. Bloom's level: 5.
Hints: Start by recalling the meaning of time complexity of linear data structures.

4. Question: Create an algorithm that efficiently merges two sorted arrays into a single sorted array, optimizing time complexity.
CO: CO2. Bloom's level: 6.
Hints: Start by recalling the meaning of time complexity of different sorting algorithms.

5. Question: Explain how to implement a polynomial ADT using a linked list. Discuss its advantages and disadvantages.
CO: CO4. Bloom's level: 4.
Hints: Each node of the linked list must hold three items: the coefficient, the power, and the address of the next node.

6. Question: Help the programmer to identify the data structure if he finds two overflow conditions in the same algorithm:
(i) Front=0 & Rear=Max-1
(ii) Front=Rear+1
Also, create a diagram to show the connection between both the overflow conditions.
CO: CO3. Bloom's level: 4.
Hints: Learn the circular queue and solve the problem.

7. Question: Convert the following infix expression into postfix notation: A+B-(C+D)/E*F-(G+H)/I.
CO: CO4. Bloom's level: 5.
Hints: Using a stack you can do this.

8. Question: Use the stack data structure to find the consecutive number pairs. Follow these rules to create the algorithm:
(i) The pair can be increasing or decreasing.
(ii) In addition, if the stack has an odd number of elements then ignore the last element.
CO: CO2. Bloom's level: 6.
Hints: In a stack we use an arithmetic expression with parentheses; if there is an open and a closed parenthesis then it is ok.

9. Question: Write a program to combine two sorted sub-arrays so that the combined array is also sorted. Do not use any sorting algorithm.
CO: CO4. Bloom's level: 4.
Hints: Use a third array to execute the algorithm.

10. Question: You are developing a task management application that requires handling a list of tasks with varying priorities. Design a program using a singly linked list data structure to represent the task list. Implement the necessary methods to add tasks with their corresponding priorities, remove completed tasks, and display the current task list in priority order.
CO: CO5. Bloom's level: 5.
Hints: Implement insertion and deletion of nodes from any position of a singly linked list.

11. Question: Explain the concept of reversing a singly linked list and how it can be achieved using an iterative approach. Provide a step-by-step explanation of the iterative algorithm to reverse the linked list.
CO: CO5. Bloom's level: 5.
Hints: Traverse each node using a loop and reverse the links by address swapping.

12. Question: Explain how a singly linked list can be utilized to represent a polynomial expression efficiently. Provide a step-by-step explanation of how each node in the linked list can store the coefficients and exponents of individual terms in the polynomial.
CO: CO4. Bloom's level: 4.
Hints: Store the coefficient and exponent values in the nodes.
Module 3:
Trees
Binary Trees
In a binary tree every node has 0, 1 or at the most 2 successors.
A node that has no successor is the leaf node or the terminal node.
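[Figure: a binary tree with nodes numbered 1 to 12. Node 1 is the root node; its left subtree T1 is rooted at node 2 and its right subtree T2 is rooted at node 3.]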
A collection of elements called nodes. Every node contains a "left" pointer, a "right" pointer, and a data
element.
The topmost node of the tree is the root element, which is pointed to by a "root" pointer. If root = NULL, the tree is
empty.
If the root node R is not NULL, then the two trees T1 and T2 are the left and right subtrees of R.
If T1 is non-empty, then T1 is said to be the left successor of R. Likewise, if T2 is non-empty then, it is
called the right successor of R.
Complete Binary Tree
A complete binary tree is a binary tree which satisfies two properties.
First, in a complete binary tree every level, except possibly the last, is completely filled.
Second, all nodes appear as far left as possible.
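[Figure: a complete binary tree with nodes numbered 1 to 13.]
[Figure: a binary tree with nodes A to I, used in the traversal examples below.]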
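Pre-order traversal
Following operations are performed recursively at each node. The algorithm starts with the root node of the tree
and continues by:
Visiting the parent node.
Traversing the left subtree.
Traversing the right subtree.
Pre-order traversal of the above tree: A, B, D, C, E, F, G, H and I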
In-order traversal
Following operations are performed recursively at each node. The algorithm starts with the root node of the tree
and continues by:
Traversing the left subtree.
Visiting the parent node.
Traversing the right subtree.
In-order traversal of the above tree: B, D, A, E, H, G, I, F and C
Post-order traversal
Following operations are performed recursively at each node. The algorithm starts with the root node of the tree
and continues by:
Traversing the left subtree.
Traversing the right subtree.
Visiting the parent node.
Post-order traversal of the above tree: D, B, H, I, G, F, E, C and A
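The three traversals can be sketched recursively as below. The tree in this sketch is reconstructed from the traversal sequences listed in this section (an assumption, since the figure itself is not reproduced here); the class name TreeNode and the function names are hypothetical.

```python
class TreeNode:
    """Binary tree node with a data element, a left pointer and a right pointer."""

    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def preorder(node):
    """Parent, then left subtree, then right subtree."""
    if node is None:
        return []
    return [node.data] + preorder(node.left) + preorder(node.right)


def inorder(node):
    """Left subtree, then parent, then right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.data] + inorder(node.right)


def postorder(node):
    """Left subtree, then right subtree, then parent."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.data]


# Tree consistent with the three sequences given in the text.
g = TreeNode("G", TreeNode("H"), TreeNode("I"))
f = TreeNode("F", g)
e = TreeNode("E", None, f)
c = TreeNode("C", e)
b = TreeNode("B", None, TreeNode("D"))
root = TreeNode("A", b, c)

print("".join(preorder(root)))
print("".join(inorder(root)))
print("".join(postorder(root)))
```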
Binary Search Trees
A variant of binary tree in which the nodes are arranged in order such that:
All the nodes in the left sub-tree have a value less than that of the root node, and all the nodes in the
right sub-tree have a value either equal to or greater than the root node. The same rule is applicable to
every sub-tree in the tree.
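A minimal sketch of insertion and searching under the ordering rule above, with keys equal to a node's value placed in its right sub-tree. The names BSTNode, insert, search and inorder, and the sample keys and their insertion order, are assumptions chosen for illustration.

```python
class BSTNode:
    """Binary search tree node holding one key."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) root of the sub-tree."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:                       # equal or greater keys go to the right
        root.right = insert(root.right, key)
    return root


def search(root, key):
    """Return True if key is present, walking down one branch only."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None


def inorder(root):
    """In-order traversal of a binary search tree lists the keys in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


root = None
for key in (39, 27, 45, 18, 29, 40, 54):
    root = insert(root, key)
print(inorder(root), search(root, 29), search(root, 30))
```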
[Figure: a binary search tree with root 39; keys shown: 39, 27, 45, 18, 29, 40, 54, 9, 21, 28, 36, 59, 10, 19, 65 and 60.]
[Figures: AVL trees with the balance factor written beside each node, including a right-heavy AVL tree and a balanced AVL tree.]
B-Trees
A specialized m-way tree. Each node of a B-tree of order m can have a maximum of m-1 keys and m pointers to its
sub-trees.
A B-tree is designed to store sorted data and allows search, insert, and delete operations to be performed
in logarithmic time. A B-tree of order m (the maximum number of children that each node can have) is a
tree with all the properties of an m-way search tree and in addition has the following properties:
Every node in the B-tree has at most (maximum) m children.
Every node in the B-tree except the root node and leaf nodes has at least (minimum) ⌈m/2⌉
children.
The root node has at least two children if it is not a terminal (leaf) node.
All leaf nodes are at the same level.
An internal node (other than the root) in the B-tree can have n children, where ⌈m/2⌉ ≤ n ≤ m. It is not
necessary that every node has the same number of children; the only restriction is that the
node should have at least ⌈m/2⌉ children.
B+ Trees
A variant of a B tree which stores sorted data in a way that allows for efficient insertion, retrieval and
removal of records, each of which is identified by a key. While a B tree can store both keys and records
in its interior nodes, a B+ tree, in contrast, stores all records at the leaf level of the tree; only keys are
stored in interior nodes.
The leaf nodes of the B+ tree are often linked to one another in a linked list.
B+-tree stores data only in the leaf nodes. All other nodes (internal nodes) are called index nodes or i-
nodes and store index values which allow us to traverse the tree from the root down to the leaf node that
stores the desired data item.
Graphs
• A graph G is defined as an ordered pair (V, E), where V(G) represents the set of vertices and E(G)
represents the set of edges that connect the vertices.
• The figure given shows a graph with V(G) = {A, B, C, D, E} and E(G) = {(A, B), (B, C), (A, D), (B,
D), (D, E), (C, E)}. Note that there are 5 vertices or nodes and 6 edges in the graph.
[Figure: an undirected graph on vertices A, B, C, D and E.]
A graph can be directed or undirected. In an undirected graph, the edges do not have any direction
associated with them. That is, if an edge is drawn between nodes A and B, then the nodes can be traversed from
A to B as well as from B to A. The above figure shows an undirected graph because it does not give any
information about the direction of the edges.
[Figure: a directed graph on vertices A, B, C, D and E.]
The given figure shows a directed graph. In a directed graph, each edge is an ordered pair. If there is
an edge from A to B, then there is a path from A to B but not necessarily from B to A. The edge (A, B) is said to initiate
from node A (also known as initial node) and terminate at node B (terminal node).
• Adjacent Nodes or Neighbors: For every edge e = (u, v) that connects nodes u and v, the nodes u and v
are the end-points and are said to be the adjacent nodes or neighbors.
• Degree of a node: Degree of a node u, deg(u), is the total number of edges containing the node u. If
deg(u) = 0, it means that u does not belong to any edge and such a node is known as an isolated node.
• Regular graph: A regular graph is a graph where each vertex has the same number of neighbors. That is,
every node has the same degree. A regular graph with vertices of degree k is called a k-regular graph or
regular graph of degree k.
• Path: A path P, written as P = (v0, v1, v2, ..., vn), of length n from a node u to v is defined as a
sequence of (n+1) nodes. Here, u = v0, v = vn and vi-1 is adjacent to vi for i = 1, 2, 3, ..., n.
• Closed path: A path P is known as a closed path if the path has the same end-points. That is, if v0 = vn.
• Simple path: A path P is known as a simple path if all the nodes in the path are distinct with an
exception that v0 may be equal to vn. If v0 = vn, then the path is called a closed simple path.
• Cycle: A closed simple path with length 3 or more is known as a cycle. A cycle of length k is called a
k-cycle.
• Connected graph: A graph in which there exists a path between any two of its nodes is called a
connected graph. That is to say that there are no isolated nodes in a connected graph. A connected graph
that does not have any cycle is called a tree.
• Complete graph: A graph G is said to be complete if all its nodes are fully connected, that is, there is an
edge between every pair of distinct nodes in the graph. A complete graph has n(n-1)/2 edges, where n is
the number of nodes in G.
• Labeled graph or weighted graph: A graph is said to be labeled if every edge in the graph is assigned
some data. In a weighted graph, the edges of the graph are assigned some weight or length. Weight of
the edge, denoted by w(e), is a positive value which indicates the cost of traversing the edge.
• Multiple edges: Distinct edges which connect the same end-points are called multiple edges.
That is, e = (u, v) and e’ = (u, v) are known as multiple edges of G.
• Loop: An edge that has identical end-points is called a loop. That is, e = (u, u).
• Multi-graph: A graph with multiple edges and/or a loop is called a multi-graph.
• Size of the graph: The size of a graph is the total number of edges in it.
REPRESENTATION OF GRAPHS
There are three common ways of storing graphs in computer’s memory. They are:
Sequential representation by using an adjacency matrix
Sequential representation by using an incidence matrix
Linked representation by using an adjacency list that stores the neighbors of a node using a linked list
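As an illustration, the sketch below builds the adjacency matrix and the adjacency list for the undirected example graph with V(G) = {A, B, C, D, E} given earlier in this module. Python lists stand in for the linked lists, and the variable names are assumptions; the incidence matrix is not shown.

```python
vertices = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("B", "D"), ("D", "E"), ("C", "E")]

# Row and column position of each vertex in the matrix.
index = {v: i for i, v in enumerate(vertices)}

# Adjacency matrix: entry [i][j] is 1 when vertices i and j share an edge.
matrix = [[0] * len(vertices) for _ in vertices]

# Adjacency list: each vertex maps to the list of its neighbors.
adj_list = {v: [] for v in vertices}

for u, v in edges:
    # The graph is undirected, so every edge is recorded in both directions.
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1
    adj_list[u].append(v)
    adj_list[v].append(u)

for v in vertices:
    print(v, matrix[index[v]], adj_list[v])
```

The matrix always needs n × n entries, while the adjacency list stores each undirected edge exactly twice, which is usually more compact for sparse graphs.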
Adjacency matrix representation
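[Figure: a graph on vertices A, B, C and D together with its linked representation, in which each node's list of neighbors ends in NULL (X).]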
Example:
[Figure: a graph with vertices A to I, used for the traversals below.]
Breadth First Search of the above graph will give: A B C D E G F H I
Depth First Search of the above graph will give: H I F C G B E
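Because the edges of the example figure are not recoverable from the text, the sketch below runs both traversals on the five-vertex undirected graph with V(G) = {A, B, C, D, E} defined earlier in this module. The function names bfs and dfs and the alphabetical neighbor order are assumptions; a different neighbor order gives a different but equally valid visiting sequence.

```python
from collections import deque

# Undirected example graph as an adjacency list, neighbors in alphabetical order.
graph = {
    "A": ["B", "D"],
    "B": ["A", "C", "D"],
    "C": ["B", "E"],
    "D": ["A", "B", "E"],
    "E": ["C", "D"],
}


def bfs(graph, start):
    """Breadth-first search: visit nodes level by level using a queue."""
    visited = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)
    return visited


def dfs(graph, start, visited=None):
    """Depth-first search: follow one branch as far as possible before backtracking."""
    if visited is None:
        visited = []
    visited.append(start)
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)
    return visited


print(bfs(graph, "A"))
print(dfs(graph, "A"))
```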
[Figure: step-by-step construction of a minimum spanning tree for a weighted graph on vertices A, B, C and D with edge weights 2, 3, 4, 5, 6 and 7. Total cost of the resulting tree = 9.]
Prim’s Algorithm
• Choose a starting vertex
• Branch out from the starting vertex and during each iteration select a new vertex and edge. Basically,
during each iteration of the algorithm, we have to select a vertex from the fringe vertices in such a way
that the edge connecting the tree vertex and the new vertex has minimum weight assigned to it.
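A minimal sketch of Prim's algorithm using a min-heap for the fringe edges. The weighted graph below is an assumed example (its edge placement is not taken from the figure), and the function name prim is hypothetical.

```python
import heapq


def prim(graph, start):
    """Return the minimum spanning tree edges as (tree_vertex, new_vertex, weight)."""
    visited = {start}
    # Fringe: edges leading from the tree to vertices outside it, ordered by weight.
    fringe = [(weight, start, v) for v, weight in graph[start].items()]
    heapq.heapify(fringe)
    tree = []
    while fringe and len(visited) < len(graph):
        weight, u, v = heapq.heappop(fringe)    # lightest fringe edge
        if v in visited:
            continue                            # both ends already in the tree
        visited.add(v)
        tree.append((u, v, weight))
        for nxt, w in graph[v].items():
            if nxt not in visited:
                heapq.heappush(fringe, (w, v, nxt))
    return tree


# Assumed example: undirected weighted graph on four vertices.
graph = {
    "A": {"B": 3, "C": 4, "D": 7},
    "B": {"A": 3, "C": 5, "D": 6},
    "C": {"A": 4, "B": 5, "D": 2},
    "D": {"A": 7, "B": 6, "C": 2},
}

tree = prim(graph, "A")
print(tree, sum(weight for _, _, weight in tree))
```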
Kruskal’s Algorithm
• Like Prim’s algorithm, Kruskal's algorithm is used to find a minimum spanning tree for a
connected weighted graph. That is, the algorithm aims to find a subset of the edges that forms a tree that
includes every vertex, such that the total weight of all the edges in the tree is minimized. However, if
the graph is not connected initially, then it finds a minimum spanning forest (note that a forest is a collection of
trees; similarly, a minimum spanning forest is a collection of minimum spanning trees).
• Kruskal's algorithm is an example of a greedy algorithm as it makes the locally optimal choice at each
stage with the hope of finding the global optimum.
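The greedy choice can be sketched as follows: edges are considered in increasing order of weight and an edge is kept only if it joins two different components. The union-find helper, the function name kruskal and the example graph are assumptions of this sketch.

```python
def kruskal(vertices, edges):
    """Return the edges of a minimum spanning tree (or forest) as (u, v, weight)."""
    parent = {v: v for v in vertices}   # each vertex starts as its own component

    def find(v):
        """Return the representative of v's component, shortening the path on the way."""
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(edges):  # locally optimal choice: lightest edge first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # the edge joins two components, so no cycle
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Assumed example: edges given as (weight, u, v).
vertices = ["A", "B", "C", "D"]
edges = [
    (3, "A", "B"), (4, "A", "C"), (7, "A", "D"),
    (5, "B", "C"), (6, "B", "D"), (2, "C", "D"),
]

tree = kruskal(vertices, edges)
print(tree, sum(weight for _, _, weight in tree))
```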
Dijkstra’s Algorithm
• Given a graph G with non-negative edge weights and a source node A, the algorithm finds the shortest
path (the one having the lowest cost) between A and every other node. It can therefore also be used to
find the cost of the shortest path from a source node to a single destination node.
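A minimal sketch of the algorithm using a priority queue is given below. The function name `dijkstra`, the adjacency-list layout and the example graph are assumptions made for illustration; all weights are assumed to be non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Return a dict of shortest-path costs from source to every reachable node.

    graph maps each node to a list of (neighbour, weight) pairs.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        for neighbour, weight in graph[node]:
            new_d = d + weight
            if new_d < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_d
                heapq.heappush(heap, (new_d, neighbour))
    return dist

# Hypothetical directed weighted graph.
graph = {
    "A": [("B", 3), ("C", 4), ("D", 7)],
    "B": [("D", 5)],
    "C": [("D", 2)],
    "D": [],
}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 3, 'C': 4, 'D': 6}
```

Here the direct edge A to D costs 7, but the path through C costs 4 + 2 = 6, so the recorded cost of D is lowered when C is processed.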
Questions
Sl. No. | Question | CO | Bloom's level | Hints
1 | Analyze the advantages and disadvantages of using a binary search tree over a hash table for storing and retrieving data. | CO3 | 4 | Binary search trees are advantageous for their ordered structure, dynamic size, and lack of collision resolution overhead. On the other hand, hash tables offer constant-time operations on average, especially for large datasets, and are more suitable for scenarios where element ordering is not a primary concern.
2 | Compare and contrast the time efficiency in implementing a self-balancing binary search tree and threaded binary tree. | CO3 | 4 | Start by recalling the meaning of time complexity of binary search tree and threaded binary tree.
3 | Given a sorted array, design a function to construct a Balanced Binary Search Tree (BST) from it. | CO4 | 5 | Start by recalling the meaning of time complexity of binary search tree.
4 | Create an algorithm that finds the shortest path between two nodes in a directed graph with weighted edges, considering time efficiency. | CO4 | 6 | Start by recalling the meaning of time complexity of shortest path algorithm using non-linear data structure.
5 | Construct a tree for the given inorder and postorder traversals. Inorder: DGBAHEICF. Postorder: GDBHIEFCA. | CO3 | 5 | In postorder, the root node is found at the end of the sequence. Locate that root in the inorder sequence: the part to its left forms the left subtree and the part to its right forms the right subtree.
6 | Draw a directed graph with five vertices and seven edges. Exactly one of the edges should be a loop, and there should be no multiple edges. | CO4 | 6 | Learn about directed graphs and try to draw the graph.
7 | Define an AVL tree and write the steps to follow while inserting an element 3 into a given AVL tree containing elements 13, 10, 15, 5, 11, 16, 4, 8. | CO3 | 4 | An AVL tree is a binary search tree in which the balance factor of every node is -1, 0 or 1. If inserting a value takes a balance factor outside this range, apply the appropriate rotation.
Module 4:
Searching
Linear Search
• The linear (or sequential) search algorithm on an array:
– Sequentially scan the array, comparing each array item with the searched value.
– If a match is found, return the index of the matched element; otherwise return –1.
• Linear search can be applied to both sorted and unsorted arrays.
Time Complexity
• Worst Case:
– Time Complexity T(n)=O(n)
• Best Case:
– Time Complexity T(n)=O(1)
• Average Case:
– Time Complexity T(n)=O(n/2)=O(n)
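The scan described above can be written in a few lines. This is a minimal sketch; the function name `linear_search` is chosen for illustration.

```python
def linear_search(items, target):
    """Return the index of the first element equal to target, or -1."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1

print(linear_search([7, 3, 9, 3], 9))   # 2
print(linear_search([7, 3, 9, 3], 5))   # -1
```

The worst case occurs when the target is the last element or is absent, since every element is then compared once.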
Binary Search
• Binary search uses a recursive method to search an array to find a specified value
• The array must be a sorted array:
a[0]≤a[1]≤a[2]≤. . . ≤a[finalIndex]
• If the value is found, its index is returned
• If the value is not found, -1 is returned
• Note: Each execution of the recursive method reduces the search space by about a half
Time Complexity
• Best case:
– Time Complexity T(n)=O(1)
• Worst Case:
– Time Complexity T(n) = O(log2 n)
• Average Case:
– T(n) = O(log2 n/2) = O(log2 n)
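A recursive version matching the description above might look as follows. The parameter names `low` and `high` are assumptions of this sketch; the input list is assumed to be sorted in ascending order.

```python
def binary_search(a, target, low=0, high=None):
    """Return the index of target in sorted list a, or -1 if absent."""
    if high is None:
        high = len(a) - 1
    if low > high:
        return -1                     # empty search space: not found
    mid = (low + high) // 2
    if a[mid] == target:
        return mid
    if a[mid] < target:
        return binary_search(a, target, mid + 1, high)   # search right half
    return binary_search(a, target, low, mid - 1)        # search left half

data = [12, 23, 34, 45, 55, 62, 71, 85, 96]
print(binary_search(data, 85))  # 7
```

For the key 85 the middle elements inspected are 55, 71 and 85, so three probes suffice for nine elements.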
Interpolation Search
• The Interpolation Search is an improvement over Binary Search for instances where the values in a
sorted array are uniformly distributed.
• Binary Search always goes to middle element to check. On the other hand, interpolation search may go
to different locations according to the value of key being searched.
• It parallels how humans search through a telephone book for a particular name, the key value by which
the book's entries are ordered.
Time Complexity
• Worst Case:
– Time Complexity T(n)=O(n)
• Best Case:
– Time Complexity T(n)=O(1), when the first probe lands on the key
• Average Case:
– T(n)=O(log2(log2 n)) for uniformly distributed keys
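The probe position is usually estimated by linear interpolation between the values at the two ends of the current range. The sketch below assumes a sorted list of integers; the function name `interpolation_search` is chosen for illustration.

```python
def interpolation_search(a, target):
    """Return the index of target in sorted integer list a, or -1."""
    low, high = 0, len(a) - 1
    while low <= high and a[low] <= target <= a[high]:
        if a[low] == a[high]:
            # All remaining values are equal; avoid dividing by zero.
            return low if a[low] == target else -1
        # Estimate the position in proportion to where target lies in the value range.
        pos = low + (target - a[low]) * (high - low) // (a[high] - a[low])
        if a[pos] == target:
            return pos
        if a[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return -1

data = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(interpolation_search(data, 70))  # 6
```

Because the keys in this example are evenly spaced, the very first estimate, 0 + (70 - 10) * 8 // 80 = 6, is already the correct index.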
Hashing
• The problem at hand is to speed up searching. Consider the problem of searching an array for a given
value.
• If the array is not sorted, the search might require examining every element of the array.
• If the array is sorted, we can use binary search, and therefore reduce the worst case runtime
complexity to O(log n).
• We could search even faster if we knew in advance the index at which that value is located in the array.
• With a hash function that computes this index from the value, the search may be reduced to just one
probe, giving a constant runtime O(1) on average (under reasonable assumptions) and O(n) in the worst case.
Hashing is an improvement over Direct Access Table.
Load Factor
A critical statistic for a hash table is the load factor, that is, the number of entries divided by the number of
buckets:
Load factor = n / M, where n is the number of entries and M is the number of buckets.
Hash Functions
• A hash function is a function which when given a key, generates an address in the table.
• Problem of finding a hash function: There are ‘n’ keys such that all key values are between “a” and
“b”. It is required to find a hash function f(key) that transforms a key into an address in the range 0 to
(M-1), where M>n.
• An ideal hash function should distribute the keys uniformly over the range (0, M-1).
• Three popular hashing methods include: Division method, Mid Square method and Folding method.
Division Method
• Key is divided by M and the remainder is taken to be the address.
• H(k) =k mod M
• Produces addresses exactly in the range 0 to (M-1).
• M should be chosen carefully. Some choices are not satisfactory.
• Examples:
– Keys are decimal integers and M is chosen to be a power of 10, say 100. Then all keys having identical
last 2 digits will hash into the same address.
– M is chosen to be an even integer. Then all even keys will hash into even addresses and all odd keys
into odd addresses.
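The effect of the choice of M can be checked with a short experiment. The keys below are hypothetical, and the function name `division_hash` is chosen for this sketch; the prime 97 is only one example of a better choice.

```python
def division_hash(key, m):
    """Division method: the address is the remainder of key divided by m."""
    return key % m

keys = [1123, 5123, 9123]          # hypothetical keys sharing the last two digits

print([division_hash(k, 100) for k in keys])  # [23, 23, 23]
print([division_hash(k, 97) for k in keys])   # [56, 79, 5]
```

With M = 100 only the last two digits of the key matter, so all three keys collide; with a prime M every digit of the key influences the address.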
Mid Square method
• Key is squared and a portion of the squared value is selected from the middle which is chosen as the
address.
• Say, a q-digit address is to be generated from a p-digit key.
• Example:
• Let p=4, q=3, key=3271.
• Square of 3271 is 10699441.
• 3 digits extracted from middle. 994 or 699 may be selected as address.
• Flexible method and can be modified if needed.
• When key is large, then a selected part of the key may be squared.
• Usually found to produce addresses that are uniformly distributed over the range of the hashing
function.
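The worked example above can be reproduced as follows. The `start` parameter, which says where in the squared value the q digits begin (counting from 0 at the left), is an assumption of this sketch, as is the function name.

```python
def mid_square_hash(key, q, start):
    """Square the key and return q digits of the square beginning at position start."""
    squared = str(key * key)
    return int(squared[start:start + q])

print(3271 * 3271)                  # 10699441
print(mid_square_hash(3271, 3, 3))  # 994
print(mid_square_hash(3271, 3, 2))  # 699
```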
Folding Method
• From a p-digit key, a q-digit address is to be generated.
• Digits of a key are partitioned into groups of q-digits from the right. Groups are added and rightmost q-
digits of sum selected as address.
• Example:
• p=8, q=3, key=39427829.
• When partitioned, there are 3 groups: 39/427/829
• Adding 39, 427 and 829, we get 1295. Selecting last 3-digits, desired address is 295.
• Flexible and can be modified as required.
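A minimal sketch of the folding method, reproducing the example above, is shown below; the function name `folding_hash` is chosen for illustration and the key is assumed to be a non-negative integer.

```python
def folding_hash(key, q):
    """Split the key into groups of q digits from the right, add them,
    and keep the rightmost q digits of the sum."""
    digits = str(key)
    total = 0
    while digits:
        total += int(digits[-q:])   # rightmost group (the last group may be shorter)
        digits = digits[:-q]
    return total % (10 ** q)

print(folding_hash(39427829, 3))  # 295
```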
Collision/Conflict
• Different keys which are transformed to the same address are referred to as synonyms.
• Since a hash function gets us a small number for a big key, there is possibility that two keys result in
same value.
• The situation where a newly inserted key maps to an already occupied slot in hash table is called
collision.
• Example:
• Suppose M=3, key1=9, key2=27.
• Using division method results in same address for both the keys:
h(key1)=9 mod 3=0 and h(key2)=27 mod 3=0
• Hence, collision occurs.
• Needs to be resolved using collision resolution methods/techniques.
• Two types of collision resolution techniques: open addressing and chaining.
Open Addressing
• All entry records are stored in the bucket array itself.
• When a new entry has to be inserted, the buckets are examined, starting with the hashed-to slot and
proceeding in some probe sequence, until an unoccupied slot is found.
• When searching for an entry, the buckets are scanned in the same sequence, until either the target record
is found, or an unused array slot is found, which indicates that there is no such key in the table.
• Well-known probe sequences include:
– Linear probing: in which the interval between probes is fixed (usually 1).
– Quadratic probing: in which the interval between probes is increased by adding the successive
outputs of a quadratic polynomial to the starting value given by the original hash computation.
– Double hashing: in which the interval between probes is computed by another hash function.
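Insertion and search with linear probing (interval 1) can be sketched as follows. The class name, the use of integer keys with the division method, and the example keys are assumptions for illustration. Deletion is left out because it needs special "deleted" markers so that later searches do not stop early.

```python
class LinearProbingTable:
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size

    def _probe(self, key):
        """Yield slot indices starting at the hashed-to slot, stepping by 1."""
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def insert(self, key):
        for index in self._probe(key):
            if self.slots[index] is None or self.slots[index] == key:
                self.slots[index] = key
                return index
        raise RuntimeError("hash table is full")

    def search(self, key):
        for index in self._probe(key):
            if self.slots[index] is None:
                return -1   # unused slot reached: the key is not in the table
            if self.slots[index] == key:
                return index
        return -1

table = LinearProbingTable(10)
print([table.insert(k) for k in (23, 43, 13, 27)])  # [3, 4, 5, 7]
```

The keys 23, 43 and 13 all hash to slot 3, so the second and third are pushed to slots 4 and 5.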
Double Hashing
• Uses the idea of applying a second hash function to key when a collision occurs.
• Can be done using:
(hash1(key)+i*hash2(key)) % M
• First hash function is typically hash1(key) = key % M.
• A popular second hash function is hash2(key) = PRIME – (key % PRIME), where PRIME is a prime
smaller than M.
• The value of i = 0, 1, . . ., M – 1. We start from i = 0 and increase i until a free slot is found.
• A good second hash function:
– Must never evaluate to zero
– Must make sure that all cells can be probed.
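The probe formula above can be sketched as a small insertion routine. The function name `double_hash_insert`, the table size 7, the value PRIME = 5 and the keys are all assumptions chosen for illustration.

```python
def double_hash_insert(table, key, prime):
    """Insert key using (hash1(key) + i * hash2(key)) % M and return its slot."""
    m = len(table)
    step = prime - (key % prime)    # hash2(key): always between 1 and prime, never 0
    for i in range(m):
        index = (key % m + i * step) % m
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found on this probe sequence")

table = [None] * 7                   # M = 7
print([double_hash_insert(table, k, 5) for k in (10, 17, 24)])  # [3, 6, 4]
```

All three keys have hash1 = 3, but their step sizes differ (17 steps by 3, 24 steps by 1), so they do not follow each other along the same probe sequence. Choosing a prime M helps ensure that every cell can be reached.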
Clustering
• Linear probing suffers from primary clustering.
• A collision at address “i” means that more than one key maps to “i”.
• Linear probing does not spread these keys over the hash table: they are placed in the slots immediately
following “i”.
• The keys therefore cluster around the slot, which increases the search and insertion time.
• One way to reduce this is to use quadratic probing, where the locations j, (j+1), (j+4), (j+9), etc. are
searched.
• Quadratic probing does not ensure that all slots in the hash table are examined.
• It is therefore possible that a key cannot be inserted even when the hash table is not full.
• Quadratic probing gives rise to secondary clustering.
Chaining/Closed Addressing
• In the strategy known as separate chaining, direct chaining, or simply chaining, each slot of the bucket
array is a pointer to a linked list that contains the key-value pairs that hashed to the same location.
• Lookup requires scanning the list for an entry with the given key.
• Insertion requires adding a new entry record to either end of the list belonging to the hashed slot.
• Deletion requires searching the list and removing the element.
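The three operations can be sketched as follows. The class name and the use of integer keys with the division method are assumptions of this sketch, and a Python list stands in for each linked list.

```python
class ChainedHashTable:
    def __init__(self, size):
        self.buckets = [[] for _ in range(size)]   # one chain per slot

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # key already present: update it
                return
        bucket.append((key, value))        # add at the end of the chain

    def lookup(self, key):
        for k, v in self._bucket(key):     # scan the chain
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False

table = ChainedHashTable(3)
table.insert(9, "first")
table.insert(27, "second")     # 9 and 27 both hash to slot 0
print(table.lookup(27))        # second
```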
Sorting
Bubble Sort
• Simplest sorting algorithm.
• Works by repeatedly swapping the adjacent elements if they are in wrong order.
• Stable sorting algorithm.
• In-place sorting algorithm.
Example:
Time Complexity
Worst case time complexity: T(N) = O(N^2)
Best case time complexity: T(N) = O(N)
Average case time complexity: T(N) = O(N^2)
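A minimal sketch of bubble sort is given below. The early exit when a pass makes no swap is what gives the O(N) best case on already sorted input; the function name and the sample list are chosen for illustration.

```python
def bubble_sort(a):
    """Sort list a in place in ascending order and return it."""
    n = len(a)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            if a[i] > a[i + 1]:               # adjacent pair in wrong order
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:
            break                             # no swaps: the list is sorted
    return a

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```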
Selection Sort
• Works by repeatedly finding the minimum element from unsorted part and putting it at the beginning.
• Unstable sorting algorithm.
• In-place sorting algorithm.
Example:
Time Complexity
Worst case time complexity: T(N) = O(N^2)
Best case time complexity: T(N) = O(N^2)
Average case time complexity: T(N) = O(N^2)
Suited for small (sorted/unsorted) set of inputs.
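Selection sort can be sketched as follows; the function name and sample list are chosen for illustration.

```python
def selection_sort(a):
    """Sort list a in place in ascending order and return it."""
    n = len(a)
    for i in range(n - 1):
        min_index = i
        for j in range(i + 1, n):             # find the minimum of the unsorted part
            if a[j] < a[min_index]:
                min_index = j
        a[i], a[min_index] = a[min_index], a[i]   # put it at the beginning
    return a

print(selection_sort([64, 25, 12, 22, 11]))  # [11, 12, 22, 25, 64]
```

The inner loop always scans the whole unsorted part, which is why the best case is no better than the worst case.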
Insertion Sort
• Builds the final sorted list one item at a time.
• Stable sorting algorithm.
• In-place sorting algorithm.
Example:
Time Complexity
Worst case time complexity: T(N) = O(N^2)
Best case time complexity: T(N) = O(N)
Average case time complexity: T(N) = O(N^2)
Much less efficient on large set of inputs.
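A minimal sketch of insertion sort follows; the function name and sample list are chosen for illustration.

```python
def insertion_sort(a):
    """Sort list a in place in ascending order and return it."""
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        # Shift larger elements of the sorted part one place to the right.
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current        # insert the item into its place
    return a

print(insertion_sort([12, 11, 13, 5, 6]))  # [5, 6, 11, 12, 13]
```

On sorted input the inner loop never runs, giving the O(N) best case; the strict comparison `a[j] > current` keeps equal elements in their original order, which is what makes the sort stable.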
Quick Sort
• Efficient sorting algorithm.
• It is a divide-and-conquer algorithm.
• It works by selecting a pivot element from the array and partitioning the other elements into two
sub-arrays, according to whether they are less than or greater than the pivot. The sub-arrays are
then sorted recursively.
• Unstable sorting algorithm.
• In-place sorting algorithm.
Example:
Time Complexity
• Worst Case time complexity: T(N) = O(N^2)
• Best Case time complexity: T(N) = O(N log N)
• Average Case time complexity: T(N) = O(N log N)
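One in-place variant is sketched below. Choosing the last element as the pivot is an assumption of this sketch, and other pivot rules are equally valid; the function names are chosen for illustration.

```python
def partition(a, low, high):
    """Place the pivot a[high] in its final position and return that position."""
    pivot = a[high]
    i = low
    for j in range(low, high):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[high] = a[high], a[i]
    return i

def quick_sort(a, low=0, high=None):
    """Sort list a in place in ascending order and return it."""
    if high is None:
        high = len(a) - 1
    if low < high:
        p = partition(a, low, high)
        quick_sort(a, low, p - 1)     # elements less than the pivot
        quick_sort(a, p + 1, high)    # elements greater than or equal to the pivot
    return a

print(quick_sort([9, 3, 7, 1, 8, 2]))  # [1, 2, 3, 7, 8, 9]
```

With this pivot rule an already sorted input produces the most unbalanced partitions, which is the O(N^2) worst case.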
Merge Sort
• Efficient, general-purpose, comparison based sorting algorithm.
• It is a divide-and-conquer algorithm.
• It works by breaking down the list into several sublists until each sublist consists of a single element,
and then merging those sublists in a manner that results into a sorted list.
• Stable sorting algorithm.
• Not an in-place sorting algorithm.
Example
Time Complexity
• T(N) = O(N logN)
• Worst Case = Best case = Average Case
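A minimal sketch of merge sort is given below; it returns a new list, which reflects the fact that the algorithm is not in-place. The function name and sample list are chosen for illustration.

```python
def merge_sort(items):
    """Return a new list with the elements of items in ascending order."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # <= keeps equal elements in order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])           # at most one of these is non-empty
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```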
Radix Sort
• An integer sorting algorithm that sorts data with integer keys by grouping the keys by individual digits
that share the same significant position and value.
• Stable sorting algorithm.
• Not an in-place sorting algorithm.
Example
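A least-significant-digit version of radix sort can be sketched as follows. It assumes non-negative decimal integers; the function name and the sample list are chosen for illustration.

```python
def radix_sort(items):
    """Return a new list of non-negative integers in ascending order."""
    a = list(items)
    if not a:
        return a
    place = 1                                   # 1, 10, 100, ...
    while max(a) // place > 0:
        buckets = [[] for _ in range(10)]       # one bucket per digit value
        for value in a:
            buckets[(value // place) % 10].append(value)
        # Collect the buckets in order; appending keeps each pass stable.
        a = [value for bucket in buckets for value in bucket]
        place *= 10
    return a

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

The stability of each pass is essential: it preserves the order established by the less significant digits.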
Heap Sort
• Comparison based sorting algorithm.
• Divides its input into a sorted and an unsorted region, and iteratively shrinks the unsorted region by
extracting the largest element from it and inserting it into the sorted region.
• Maintains the unsorted region in a heap data structure to more quickly find the largest element in each
step.
• Unstable sorting algorithm.
• In-place sorting algorithm.
Example:
[Figure: Building a Heap]
[Figure: Sorting a Heap]
Time Complexity
• T(N) = O(N logN)
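The two phases, building a heap and then sorting it, can be sketched with a max-heap stored in the array itself. The helper name `sift_down` and the sample list are assumptions of this sketch.

```python
def heap_sort(a):
    """Sort list a in place in ascending order and return it."""
    n = len(a)

    def sift_down(root, size):
        # Move a[root] down until both children are no larger than it.
        while True:
            largest = root
            left, right = 2 * root + 1, 2 * root + 2
            if left < size and a[left] > a[largest]:
                largest = left
            if right < size and a[right] > a[largest]:
                largest = right
            if largest == root:
                return
            a[root], a[largest] = a[largest], a[root]
            root = largest

    # Building a heap: sift down every internal node, last one first.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # Sorting the heap: move the largest element to the end, shrink the heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a

print(heap_sort([4, 10, 3, 5, 1]))  # [1, 3, 4, 5, 10]
```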
Shell Sort
• Comparison based sorting algorithm.
• Generalization of sorting by exchange or sorting by insertion.
• Starts by sorting pairs of elements far apart from each other, then progressively reducing the gap
between elements to be compared.
• Also called diminishing increment sort.
• Unstable sorting algorithm.
• In-place sorting algorithm.
Example:
Time Complexity:
• Worst Case Complexity: less than or equal to O(n^2).
• Best Case Complexity: T(n) = O(n log n)
• Average Case Complexity: T(n) = O(n log n); the exact bound depends on the gap sequence used.
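A minimal sketch of shell sort follows. Halving the gap each round is an assumption of this sketch; other gap sequences are possible and change the running time. The function name and sample list are chosen for illustration.

```python
def shell_sort(a):
    """Sort list a in place in ascending order and return it."""
    gap = len(a) // 2
    while gap > 0:
        # Insertion sort on the elements that are gap positions apart.
        for i in range(gap, len(a)):
            current = a[i]
            j = i
            while j >= gap and a[j - gap] > current:
                a[j] = a[j - gap]
                j -= gap
            a[j] = current
        gap //= 2                 # diminish the increment
    return a

print(shell_sort([23, 12, 1, 8, 34, 54, 2, 3]))  # [1, 2, 3, 8, 12, 23, 34, 54]
```

The final round, with gap 1, is an ordinary insertion sort, but by then the list is nearly sorted and few shifts are needed.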
Sl. No. | Question | CO | Bloom's level | Hints
1 | Examine and elaborate the worst case scenario for linear search. | CO3 | 4 | Recall the linear search operation procedure. Try to judge the condition that might take this search operation to consume maximum number of search matches.
2 | Justify the number of interchanges required to sort 5, 1, 6, 24 in ascending order using Bubble Sort. | CO3 | 5 | Recall how Bubble sort works by comparing and swapping elements whenever necessary. Then try to figure out the number of interchanges required for the problem given in the question.
3 | Relate to a method of sorting which resembles the technique of playing cards. | CO3 | 4 | Recall a sorting technique where any new element coming in is adjusted in such a way that the sorted order of the existing others is not disturbed.
4 | Test radix sort on the following list: 189, 205, 986, 421, 97, 192, 535, 839, 562, 674 | CO3 | 4 | Recall that in radix sort, the procedure of sorting is performed from least significant digit to the most significant digit.
5 | Apply the binary search algorithm to locate a target element in a sorted array. Explain each step of the algorithm and analyze its time complexity. | CO4 | 4 | Start by recalling the meaning of time complexity of different searching techniques.
6 | Given a partially sorted array, propose an algorithm that takes advantage of the existing order and optimizes the sorting process. | CO3 | 4 | Start by recalling the concepts of merge sort with time complexity.
7 | Evaluate the impact of different collision resolution strategies in hash tables (e.g., chaining, open addressing) and propose a new approach or improvement that mitigates their limitations. | CO4 | 5 | Recall different collision resolution strategies in hashing.
8 | Create a function to implement the Insertion Sort algorithm. The function should take an array of integers as input and sort it in ascending order using the Insertion Sort technique. | CO4 | 6 | Start by recalling the concepts of insertion sort with time complexity.
9 | Explain Binary Search procedure for the following list of elements and assume the key element is 85. 12, 23, 34, 45, 55, 62, 71, 85, 96 | CO3 | 4 | Find the mid value using the start and end index of the array. If the search element is at mid, return it; otherwise change the start value or the end value.
10 | Write the name of the sorting technique which is used in the game of playing cards. Write a procedure for sorting a given list of numbers using that technique. 14, 25, 36, 74, 85, 6, 53, 62, 41 | CO4 | 6 | Perform insertion sort in this sorting technique.
11 | What is the idea behind Selection sort? Sort the following list of elements using that idea. Array A = [7, 5, 4, 2] needs to be sorted in ascending order. | CO2 | 4 | Set Min to location 0. Look for the smallest element on the list. Swap it with the value at location Min.
12 | Given the input {4371, 1323, 6173, 4199, 4344, 9679, 1989} and a hash function of h(X) = X (mod 10), show the resulting: a. Open addressing hash table using linear probing | CO2 | 5 | In linear probing, the algorithm simply looks for the next available slot in the hash table and places the collided key there. If that slot is also occupied, the algorithm continues searching for the next available slot until an empty slot is found.
13 | Analyze the time complexity for the algorithm to calculate sum of natural numbers using loops and using direct formula. | CO1 | 4 | Write both algorithms and use step count method to calculate time complexity.
14 | "All algorithms cannot have tight bound." Analyze the justification of the statement. | CO1 | 4 | Go through the definition of theta notation.
15 | Show that f(n) is having upper bound of O(n^2) with proper constant and minimum input value; f(n) = 10n^2 - 5n + 3. | CO1 | 5 | Go through the definition of Big-Oh notation.
16 | Design an algorithm of time complexity O(log(log n)). | CO1 | 6 | Study step count method.
17 | Analyze the benefits of using array of pointers over 2D array with proper example. | CO2 | 4 | Take variable length multiple strings and compare storage requirement.
18 | Design an algorithm to calculate the memory address of any element in an N-dimensional array. | CO2 | 6 | Take the idea from 2D array formula.
19 | Design an algorithm to check whether a linked list contains any loop or not. | CO2 | 6 | Use two pointers, say fast and slow. Fast moves 2 steps at a time and slow moves 1 step. If they meet then there is a loop.
20 | There is a rat running from one door to another door. The owner of the house is a lazy person and waits at own room only. Design an algorithm of the path in such a way that after a regular interval, the owner will get the opportunity to catch hold of the rat. | CO2 | 6 | Use the concept of circular linked list.
21 | "If a binary tree is full or complete, then only it will be efficient." Analyze the justification of the statement. | CO3 | 4 | Use the concept of how the height of the tree influences search time in a tree.
22 | Use a data structure to remove duplicates and sort a given set of data {34, 8, -4, 56, 34, 9, 7, 2, -4, 8, 23, 7, 45, -5, 14, 56, 9, 65}. | CO3 | 5 | Construct a BST and perform inorder traversal of the tree.
23 | Use a data structure to arrange different jobs associated with different months with their priorities, so that the office clerk can understand in which order the jobs need to be done. Written as (month, (priority)) pair. Small number indicates higher priority. March(1), February(3), April(2), May(2), January(4), December(4), September(5), October(5), June(2), July(3), August(4), November(5). | CO3 | 5 | Construct a Heap tree.
24 | In a palace, a group of kids are playing a hide and seek game. They place themselves in different rooms, one person in one room. One of them has to find the rest by exploring the path after finding anyone; this procedure will continue till that kid is able to find all. There may also be some rooms that no one can find. Suggest a method to conclude whether the kid can reach all the rooms or not. | CO3 | 6 | Use Depth First Search algorithm to check the connectedness of any graph.
25 | Analyze the requirement of sorted data in regular intervals for the application of interpolation search. Explain with an example. | CO4 | 4 | Take two types of sorted data, one with regular interval and another with irregular interval. Apply the algorithm and count number of steps to find the data.
26 | Analyze the performance of quick sort if input data is already sorted. | CO4 | 4 | Discuss worst case scenario of quick sort.
27 | In an office, similar types of confidential documents are kept in the same vaults. Similar types of documents can be recognized by their level of confidentiality. Also, vaults are of limited capacity, each of them can keep 'x' number of documents. If there are total 'n' number of documents, then suggest a method to manage this situation. | CO4 | 6 | A heap tree can be used to categorize files as per their confidentiality level. As it is a complete tree, calculate the minimum size of the array (number of vaults). Also, calculate the maximum number of vaults and use a simple modulo hash function to accommodate files with linear probing.
28 | Consider a double hashing scheme in which the primary hash function is h1(k) = k mod 23, and the secondary hash function is h2(k) = 1 + (k mod 19). Assume that the table size is 23. Then calculate the address returned by probe 1 in the probe sequence (assume that the probe sequence begins at probe 0) for key value k = 90. | CO4 | 5 | Study double hashing method.
29 | Explain the concept of Insertion Sort and how it works to sort a list of elements. Provide a step-by-step example to illustrate the sorting process. Discuss the time complexity of Insertion Sort and its best and worst-case scenarios. | CO4 | 4 | Discuss the working process of insertion sort.