
SREE VIDYANIKETHAN ENGINEERING COLLEGE 

 
Tutorial 1 - Study Material (Trees & Graphs)

Index

Topics

1) Trees
a) Binary/N-ary Trees
b) Binary Search Tree
c) Heaps/Priority Queues

2) Graphs
a) Graph Representation
b) Breadth First Search
c) Depth First Search
d) Minimum Spanning Tree
e) Shortest Path Algorithms
f) Flood-Fill Algorithm
g) Articulation Points and Bridges
h) Biconnected Components
i) Strongly Connected Components
j) Topological Sort
k) Hamiltonian Path
l) Maximum Flow
m) Minimum Cost Maximum Flow
n) Min-Cut

Topic-1 - Trees

Binary/ N-ary Trees

A binary tree is a structure comprising nodes, where each node has the
following 3 components:

1. Data element: Stores any kind of data in the node


2. Left pointer: Points to the tree on the left side of node
3. Right pointer: Points to the tree on the right side of the node

As the name suggests, the data element stores any kind of data in the
node.
The left and right pointers point to binary trees on the left and right side of
the node respectively.

If a tree is empty, it is represented by a null pointer.

The following image explains the various components of a tree.

Commonly-used terminologies

● Root: Top node in a tree


● Child: A node directly connected to another node when moving away from the root
● Parent: Converse notion of child
● Siblings: Nodes with the same parent
● Descendant: Node reachable by repeated proceeding from parent to
child
● Ancestor: Node reachable by repeated proceeding from child to
parent.
● Leaf: Node with no children
● Internal node: Node with at least one child
● External node: Node with no children

Structure code of a tree node

In programming, trees are declared as follows:
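The original declaration appears as an image; a minimal C++ sketch of such a node type (the names Node, data, left and right are illustrative) might look like this:

// A binary tree node: a data element plus pointers to the left and right subtrees.
struct Node {
    int data;       // data element (an int is used for illustration)
    Node* left;     // pointer to the left subtree (nullptr if empty)
    Node* right;    // pointer to the right subtree (nullptr if empty)
};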

Creating nodes

Simple node

Pointer to a node

In this case, you must explicitly allocate memory for the node and assign its address to the pointer (this is the preferred method).

Utility function returning node
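The code for these three cases is shown as images in the original; a hedged C++ sketch, reusing the Node type above, might be:

// Utility function that allocates and returns a fresh node.
Node* newNode(int data) {
    Node* node = new Node;              // explicit allocation (preferred method)
    node->data = data;
    node->left = node->right = nullptr;
    return node;
}

int main() {
    Node simple;                        // simple node with automatic storage
    simple.data = 1;
    simple.left = simple.right = nullptr;

    Node* root = newNode(2);            // pointer to an explicitly allocated node
    delete root;                        // release the allocated memory
    return 0;
}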

Maximum depth/height of a tree

The idea is to do a post-order traversal: recursively compute the depths of the left and right subtrees and return the maximum of the two, plus one for the current node.
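A minimal C++ sketch of this idea (the function name maxDepth is an assumption):

#include <algorithm>

// Depth of a tree: recurse into both subtrees (post-order), then return
// one plus the larger of the two depths.
int maxDepth(Node* root) {
    if (root == nullptr) return 0;      // an empty tree has depth 0
    int leftDepth  = maxDepth(root->left);
    int rightDepth = maxDepth(root->right);
    return 1 + std::max(leftDepth, rightDepth);
}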

Time complexity

O(n)

Application of trees
1. Manipulate hierarchical data
2. Make information easy to search (see tree traversal)
3. Manipulate sorted lists of data
4. Use as a workflow for compositing digital images for visual effects
5. Use in router algorithms

Practice
Link to the Practice Problems:
Practice Questions

Binary Search Tree

For a binary tree to be a binary search tree, the data of all the nodes in the
left sub-tree of the root node should be ≤ the data of the root. The data of
all the nodes in the right subtree of the root node should be > the data of
the root.

Example

In Fig. 1, consider the root node with data = 10.

● Data in the left subtree is: [5,1,6]


● All data elements are < 10
● Data in the right subtree is: [19,17]
● All data elements are > 10

Also, considering the root node with data=5, its children also satisfy the specified ordering. Similarly, the root node with data=19 also satisfies this ordering. Applied recursively, every subtree satisfies the left and right subtree ordering.

Such a tree is known as a Binary Search Tree or BST.

Traversing the tree

There are mainly three types of tree traversals.

Pre-order traversal

In this traversal technique the traversal order is root-left-right, i.e.:

● First, process the data of the root node
● Then, traverse the left subtree completely
● Finally, traverse the right subtree

Post-order traversal

In this traversal technique the traversal order is left-right-root:

● First, traverse the left subtree completely
● Then, traverse the right subtree
● Finally, process the data of the root node

In-order traversal

In in-order traversal, do the following:


● First process left subtree (before processing root node)
● Then, process current root node
● Process right subtree

Consider the in-order traversal of a sample BST

● The 'inorder( )' procedure is called with root equal to node with
data=10
● Since the node has a left subtree, 'inorder( )' is called with root equal
to node with data=5
● Again, the node has a left subtree, so 'inorder( )' is called with root=1

The function call stack is as follows:

● Node with data=1 does not have a left subtree. Hence, this node is
processed.
● Node with data=1 does not have a right subtree. Hence, nothing is
done.
● inorder(1) gets completed and this function call is popped from the
call stack.
The stack is as follows:

● Left subtree of node with data=5 is completely processed. Hence, this


node gets processed.
● Right subtree of this node with data=5 is non-empty. Hence, the right
subtree gets processed now. 'inorder(6)' is then called.
Note

'inorder(6)' is just shorthand for inorder(pointer to node with data=6).


The notation has been used for brevity.

The function call stack is as follows:

Again, the node with data=6 has no left subtree; therefore, it is processed. It also has no right subtree, so 'inorder(6)' is then completed.

Both the left and right subtrees of node with data=5 have been completely
processed. Hence, inorder(5) is then completed.

● Now, node with data=10 is processed


● Right subtree of this node gets processed in a similar way as
described until step 10
● After right subtree of this node is completely processed, entire
traversal of the BST is complete

The order in which BST in Fig. 1 is visited is: 1, 5, 6, 10, 17, 19. The
in-order traversal of a BST gives a sorted ordering of the data elements
that are present in the BST. This is an important property of a BST.

Insertion in BST

Consider the insertion of data=20 in the BST.

Algorithm

Compare the data of the root node with the element to be inserted:


1. If the data of the root node is greater than or equal to the element, and a left subtree exists, then repeat the comparison with root = root of the left subtree. Else, insert the element as the left child of the current root.

2. If the data of the root node is smaller than the element, and a right subtree exists, then repeat the comparison with root = root of the right subtree. Else, insert the element as the right child of the current root.

Implementation
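The implementation is an image in the original; a minimal recursive C++ sketch, reusing newNode from earlier, might be:

// Insert `data` into the BST rooted at `root` and return the (possibly new) root.
Node* insert(Node* root, int data) {
    if (root == nullptr) return newNode(data);   // empty spot found: place node here
    if (root->data >= data)
        root->left = insert(root->left, data);   // smaller or equal keys go left
    else
        root->right = insert(root->right, data); // larger keys go right
    return root;
}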

Practice
Link to the Practice Problems:
Practice Questions

Heaps/Priority Queues

Heaps

A heap is a tree-based data structure in which all the nodes of the tree are
in a specific order.

For example, if X is the parent node of Y, then the value of X follows a
specific order with respect to the value of Y and the same order will be
followed across the tree.
The maximum number of children of a node in a heap depends on the type of heap. However, in the more commonly used heap type, a node has at most 2 children; this is known as a Binary Heap.

In a binary heap, if the heap is a complete binary tree with N nodes, then it has the smallest possible height, which is log₂N.

In the diagram above, you can observe a particular ordering, i.e. each node has a greater value than any of its children.

Suppose there are N jobs in a queue to be done, each with its own priority. The job with the maximum priority should be completed before the others. At each instant, we complete the job with the maximum priority, and at the same time we may also want to insert a new job into the queue with its own priority.

So at each instant we have to find the job with the maximum priority to complete it, and also insert new jobs as they arrive. This task can be executed very easily using a heap by considering the N jobs as N nodes of the tree.

As you can see in the diagram below, we can use an array to store the
nodes of the tree. Let’s say we have 7 elements with values {6, 4, 5, 3, 2, 0,
1}.
Note: An array can be used to simulate a tree in the following way. If we store an element at index i in array Arr, then its parent is stored at index i/2 (unless it is the root, as the root has no parent) and can be accessed as Arr[i/2]; its left child can be accessed as Arr[2∗i] and its right child as Arr[2∗i+1]. The index of the root is 1.

There can be two types of heap:

Max Heap: In this type of heap, the value of parent node will always be
greater than or equal to the value of child node across the tree and the
node with highest value will be the root node of the tree.

Implementation:

Let's assume that we have a heap with some elements stored in array Arr. The way to convert this array into a heap structure is as follows: we pick a node in the array, check that the left subtree and the right subtree are max heaps in themselves, and check that the node itself forms a max heap (its value should be greater than the values of all its children).

To do this, we will implement a function that can maintain the max-heap property (i.e. each element's value should be greater than or equal to that of any of its children and smaller than or equal to that of its parent).
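The function itself is an image in the original; a C++ sketch of max_heapify on a 1-indexed array (the exact signature is an assumption) might be:

#include <algorithm>

// Restore the max-heap property at index i, assuming the subtrees rooted at
// 2*i and 2*i+1 are already max-heaps. Arr is 1-indexed and N is the heap size.
void max_heapify(int Arr[], int i, int N) {
    int left = 2 * i, right = 2 * i + 1;
    int largest = i;
    if (left <= N && Arr[left] > Arr[largest])   largest = left;
    if (right <= N && Arr[right] > Arr[largest]) largest = right;
    if (largest != i) {
        std::swap(Arr[i], Arr[largest]);   // float the larger child up
        max_heapify(Arr, largest, N);      // continue sifting down
    }
}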

Complexity: O(logN)

Example:
In the diagram below, initially the 1st node (root node) is violating the max-heap property, as it has a smaller value than its children, so we perform the max_heapify function on this node having value 4.

As 8 is greater than 4, 8 is swapped with 4 and max_heapify is performed again on 4, at its new position. Now in step 2, 6 is greater than 4, so 4 is swapped with 6 and we get a max heap; as 4 is now a leaf node, a further call to max_heapify will not have any effect on the heap.

So, as we can see, we can maintain a max-heap by using the max_heapify function.

Before moving ahead, let's observe a property which states: an N element heap stored in an array has its leaves at indices N/2+1, N/2+2, N/2+3, ..., up to N.

Let's observe this with an example. Take the above example of 7 elements having values {8, 7, 6, 3, 2, 4, 5}.

So you can see that the elements 3, 2, 4, 5 are at indices N/2+1 (i.e. 4), N/2+2 (i.e. 5), N/2+3 (i.e. 6) and N/2+4 (i.e. 7) respectively.

Building MAX HEAP:

Now let's say we have N elements stored in the array Arr, indexed from 1 to N, that do not currently satisfy the max-heap property. We can use the max_heapify function to make a max heap out of the array.

How?

From the above property, we observed that the elements from Arr[N/2+1] to Arr[N] are leaf nodes, and each of them is already a 1-element heap. We can apply the max_heapify function to the remaining nodes in a bottom-up manner, so that every node of the tree is covered.
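A sketch of build_maxheap under the same 1-indexed conventions:

// Leaves (indices N/2+1 .. N) are already 1-element heaps, so heapify the
// remaining nodes bottom-up.
void build_maxheap(int Arr[], int N) {
    for (int i = N / 2; i >= 1; i--)
        max_heapify(Arr, i, N);
}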

Complexity: O(N). The max_heapify function has complexity O(logN) and the build_maxheap function calls it only about N/2 times, but the amortized complexity of this function is actually linear. For more details, you can refer to this.

Example:
Suppose you have 7 elements stored in array Arr.

Here N=7, so starting from the node at index N/2=3 (which has value 3 in the diagram above), we will call max_heapify from index N/2 down to 1.

In the diagram below:

In step 1, in max_heapify(Arr, 3), as 10 is greater than 3, 3 and 10 are swapped, and a further call to max_heapify(Arr, 7) will have no effect, as 3 is now a leaf node.

In step 2, calling max_heapify(Arr, 2) (the node at index 2 has value 4), 4 is swapped with 8, and a further call to max_heapify(Arr, 5) will have no effect, as 4 is now a leaf node.

In step 3, calling max_heapify(Arr, 1) (the node at index 1 has value 1), 1 is swapped with 10.

Step 4 is a subpart of step 3: after swapping 1 with 10, a recursive call of max_heapify(Arr, 3) is performed, and 1 is swapped with 9. A further call to max_heapify(Arr, 7) will have no effect, as 1 is now a leaf node.

In step 5, we finally get a max-heap, and the elements in the array Arr will be:
Min Heap: In this type of heap, the value of the parent node will always be less than or equal to the value of its child nodes across the tree, and the node with the lowest value will be the root node of the tree.

As you can see in the above diagram, each node has a value smaller than the values of its children.
We can perform the same operations as we performed while building the max-heap.

First we will write a function that can maintain the min-heap property if some element is violating it.
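A sketch: min_heapify is the same as max_heapify with the comparisons flipped, so the smaller child floats up.

// Restore the min-heap property at index i (1-indexed array, heap size N).
void min_heapify(int Arr[], int i, int N) {
    int left = 2 * i, right = 2 * i + 1;
    int smallest = i;
    if (left <= N && Arr[left] < Arr[smallest])   smallest = left;
    if (right <= N && Arr[right] < Arr[smallest]) smallest = right;
    if (smallest != i) {
        std::swap(Arr[i], Arr[smallest]);
        min_heapify(Arr, smallest, N);
    }
}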

Complexity: O(logN).

Example:

Suppose you have the elements stored in array Arr: {4, 5, 1, 6, 7, 3, 2}. As you can see in the diagram below, the element at index 1 is violating the min-heap property, so performing min_heapify(Arr, 1) will restore the min-heap.

Now let's use the above function to build a min-heap. We will run it on all the nodes other than the leaves, as leaf nodes are already 1-element heaps.

Complexity: O(N). The complexity calculation is similar to that of building


max heap.

Example:
Consider the elements in the array {10, 8, 9, 7, 6, 5, 4}. We will run min_heapify on the nodes indexed from N/2 down to 1. Here the node at index N/2 has value 9. At last, we will get a min-heap.

Heaps can be considered partially ordered trees: as you can see in the above examples, the nodes of the tree do not follow any order with respect to their siblings (nodes on the same level). Heaps are mainly used when we give priority to the smallest or the largest node in the tree, as we can extract that node very efficiently.

APPLICATIONS:

1) Heap Sort:

We can use heaps to sort elements in a specific order efficiently.
Let’s say we want to sort elements of array Arr in ascending order. We can
use max heap to perform this operation.

Idea: We build the max heap of elements stored in Arr, and the maximum
element of Arr will always be at the root of the heap.

Leveraging this idea we can sort an array in the following manner.

Processing:

● Initially we will build a max heap of elements in Arr.


● Now the root element, that is Arr[1], contains the maximum element of Arr. After that, we will exchange this element with the last element of Arr and again build a max heap excluding the last element, which is already in its correct position, and decrease the length of the heap by one.
● We will repeat step 2 until all the elements are in their correct positions.
● We will get a sorted array.

Implementation:
Suppose there are N elements stored in array Arr.
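The implementation is an image in the original; a C++ sketch of heap_sort built on the two functions above:

// Build a max-heap, then repeatedly move the root (the maximum) to the end
// and shrink the heap by one.
void heap_sort(int Arr[], int N) {
    build_maxheap(Arr, N);
    for (int i = N; i >= 2; i--) {
        std::swap(Arr[1], Arr[i]);     // place the current maximum at its final slot
        max_heapify(Arr, 1, i - 1);    // restore the heap on the shortened range
    }
}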

Complexity: As we know, max_heapify has complexity O(logN) and build_maxheap has complexity O(N), and we run max_heapify N−1 times in the heap_sort function; therefore the complexity of the heap_sort function is O(NlogN).
Example:
In the diagram below, initially there is an unsorted array Arr having 6 elements. We begin by building a max-heap.

After building max-heap, the elements in the array Arr will be:

Processing:

Step 1: 8 is swapped with 5.


Step 2: 8 is disconnected from heap as 8 is in correct position now.
Step 3: Max-heap is created and 7 is swapped with 3.
Step 4: 7 is disconnected from heap.
Step 5: Max heap is created and 5 is swapped with 1.
Step 6: 5 is disconnected from heap.
Step 7: Max heap is created and 4 is swapped with 3.
Step 8: 4 is disconnected from heap.
Step 9: Max heap is created and 3 is swapped with 1.
Step 10: 3 is disconnected.

After all the steps, we will get a sorted array.

2) Priority Queue:

A priority queue is similar to a queue, where we insert an element at the back and remove an element from the front, with the difference that the logical order of elements in the priority queue depends on their priority. The element with the highest priority will be moved to the front of the queue and the one with the lowest priority will move to the back of the queue. Thus it is possible that when you enqueue an element at the back of the queue, it moves to the front because it has the highest priority.

Example:

Let’s say we have an array of 5 elements : {4, 8, 1, 7, 3} and we have to


insert all the elements in the max-priority queue.

First, as the priority queue is empty, 4 is inserted initially.

Now when 8 is inserted, it moves to the front, as 8 is greater than 4.

While inserting 1, as it is the current minimum element in the priority queue, it remains at the back of the priority queue.

Now 7 is inserted between 8 and 4, as 7 is smaller than 8.

Now 3 is inserted before 1, as it is the 2nd minimum element in the priority queue. All the steps are represented in the diagram below:

We can think of many ways to implement the priority queue.

Naive Approach:
Suppose we have N elements that we have to insert in the priority queue. We can use a list, insert the elements in O(N) time, and sort them to maintain the priority queue in O(NlogN) time.

Efficient Approach:
We can use heaps to implement the priority queue. It will take O(logN) time
to insert and delete each element in the priority queue.

Based on the heap structure, priority queues are also of two types: max-priority queue and min-priority queue.

Let’s focus on Max Priority Queue.

Max Priority Queue is based on the structure of a max heap and can perform the following operations:

maximum(Arr): returns the maximum element from Arr.
extract_maximum(Arr): removes and returns the maximum element from Arr.
increase_val(Arr, i, val): increases the key of the element stored at index i in Arr to the new value val.
insert_val(Arr, val): inserts the element with value val into Arr.

Implementation:

length = number of elements in Arr.

Maximum :

Complexity: O(1)

Extract Maximum: In this operation, the maximum element is returned. The last element of the heap is placed at index 1, and max_heapify is performed on node 1, since placing the last element at index 1 violates the max-heap property.

Complexity: O(logN).

Increase Value: Increasing the value of a node may violate the max-heap property, so we may have to swap the parent's value with the node's value until the parent holds the larger value.

Complexity: O(logN).

Insert Value :

Complexity: O(logN).
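The four operations are shown as images in the original; hedged C++ sketches on the 1-indexed max-heap from before (passing `length` by reference is an assumption of this sketch) might be:

#include <climits>

int maximum(int Arr[]) {                           // O(1); assumes a non-empty heap
    return Arr[1];
}

int extract_maximum(int Arr[], int& length) {      // O(logN)
    int mx = Arr[1];
    Arr[1] = Arr[length--];            // move the last element to the root
    max_heapify(Arr, 1, length);       // and sift it down
    return mx;
}

void increase_val(int Arr[], int i, int val) {     // O(logN)
    Arr[i] = val;
    while (i > 1 && Arr[i / 2] < Arr[i]) {         // bubble up past smaller parents
        std::swap(Arr[i], Arr[i / 2]);
        i /= 2;
    }
}

void insert_val(int Arr[], int& length, int val) { // O(logN)
    Arr[++length] = INT_MIN;           // append a minimal sentinel key
    increase_val(Arr, length, val);    // then raise it to val
}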

Example:
Initially there are 5 elements in priority queue.
Operation: Insert Value(Arr, 6)
In the diagram below, inserting another element having value 6 violates the max-priority-queue property, so it is swapped with its parent having value 4, thus maintaining the max-priority queue.

Operation: Extract Maximum:

In the diagram below, after removing 8, placing 4 at node 1 violates the max-priority-queue property. So max_heapify(Arr, 1) is performed, which restores the property of the max-priority queue.

As discussed above, like heaps, we can use priority queues in the scheduling of jobs. When there are N jobs in the queue, each having its own priority, the job with the maximum priority is completed first and removed from the queue; we can use the priority queue's extract_maximum operation here. If at any instant we have to add a new job to the queue, we can use the insert_val operation, as it inserts the element in O(logN) and also maintains the max-heap property.
Practice
Link to the Practice Problems: ​Practice Questions

Topic - 2 - Graphs

Graph Representation

Graphs are mathematical structures that represent pairwise relationships


between objects. A graph is a flow structure that represents the relationship
between various objects. It can be visualized by using the following two
basic components:

● Nodes: These are the most important components in any graph.


Nodes are entities whose relationships are expressed using edges. If a graph comprises 2 nodes A and B and an undirected edge between them, then it expresses a bi-directional relationship between the two nodes.
● Edges: Edges are the components that are used to represent the
relationships between various nodes in a graph. An edge between
two nodes expresses a one-way or two-way relationship between the
nodes.

Types of nodes

● Root node: The root node is the ancestor of all other nodes in a
graph. It does not have any ancestor. Each graph consists of exactly
one root node. Generally, you must start traversing a graph from the
root node.
● Leaf nodes: In a graph, leaf nodes represent the nodes that do not
have any successors. These nodes only have ancestor nodes. They
can have any number of incoming edges but they will not have any
outgoing edges.

Types of graphs

● Undirected: An undirected graph is a graph in which all the edges are


bi-directional i.e. the edges do not point in any specific direction.

● Directed: A directed graph is a graph in which all the edges are


uni-directional i.e. the edges point in a single direction.

Weighted: In a weighted graph, each edge is assigned a weight or cost.
Consider a graph of 4 nodes as in the diagram below. As you can see each
edge has a weight/cost assigned to it. If you want to go from vertex 1 to
vertex 3, you can take one of the following 3 paths:

● 1 -> 2 -> 3
● 1 -> 3
● 1 -> 4 -> 3
Therefore, the total cost of each path will be as follows:

● The total cost of 1 -> 2 -> 3 will be (1 + 2) i.e. 3 units
● The total cost of 1 -> 3 will be 1 unit
● The total cost of 1 -> 4 -> 3 will be (3 + 2) i.e. 5 units

Cyclic: A graph is cyclic if the graph comprises a path that starts from a
vertex and ends at the same vertex. That path is called a cycle. An acyclic
graph is a graph that has no cycle.

A tree is an undirected graph in which any two vertices are connected by exactly one path. A tree is an acyclic graph and has N − 1 edges, where N is the number of vertices. Each node in a graph may have one or multiple parent nodes. However, in a tree, each node (except the root node) has exactly one parent node.
Note​: A root node has no parent.

A tree cannot contain any cycles or self loops, however, the same does not
apply to graphs.

Graph representation

You can represent a graph in many ways. The two most common ways of
representing a graph is as follows:

Adjacency matrix

An adjacency matrix is a V×V binary matrix A. Element A[i][j] is 1 if there is an edge from vertex i to vertex j; otherwise A[i][j] is 0.

Note​: A binary matrix is a matrix in which the cells can have only one of two
possible values - either a 0 or 1.

The adjacency matrix can also be modified for a weighted graph: instead of storing 0 or 1 in A[i][j], the weight or cost of the edge is stored.

In an undirected graph, if A[i][j] = 1, then A[j][i] = 1. In a directed graph, if A[i][j] = 1, then A[j][i] may or may not be 1.

The adjacency matrix provides constant-time access (O(1)) to determine whether there is an edge between two nodes. The space complexity of the adjacency matrix is O(V²).

The adjacency matrix of the following graph is:


i/j: 1 2 3 4
1:   0 1 0 1
2:   1 0 1 0
3:   0 1 0 1
4:   1 0 1 0

The adjacency matrix of the following graph is:


i/j: 1 2 3 4
1:   0 1 0 0
2:   0 0 0 1
3:   1 0 0 1
4:   0 1 0 0

Consider the directed graph given above. Let's create this graph using an
adjacency matrix and then show all the edges that exist in the graph.

Input file
4    // nodes
5    // edges
1 2  // edge from node 1 to node 2
2 4  // edge from node 2 to node 4
3 1  // edge from node 3 to node 1
3 4  // edge from node 3 to node 4
4 2  // edge from node 4 to node 2
Code
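The code is an image in the original; a minimal C++ sketch that reads this input into an adjacency matrix and checks the two queries from the sample output:

#include <iostream>
using namespace std;

int adj[15][15];    // small fixed bound, enough for this example

int main() {
    int nodes, edges;
    cin >> nodes >> edges;
    for (int i = 0; i < edges; i++) {
        int u, v;
        cin >> u >> v;          // directed edge u -> v
        adj[u][v] = 1;
    }
    cout << (adj[3][4] ? "There is an edge between 3 and 4."
                       : "There is no edge between 3 and 4.") << '\n';
    cout << (adj[2][3] ? "There is an edge between 2 and 3."
                       : "There is no edge between 2 and 3.") << '\n';
    return 0;
}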

Output

There is an edge between 3 and 4.

There is no edge between 2 and 3.

Adjacency list

The other way to represent a graph is by using an adjacency list. An


adjacency list is an array A of separate lists. Each element A[i] of the array is a list which contains all the vertices that are adjacent to vertex i.

For a weighted graph, the weight or cost of the edge is stored along with the vertex in the list using pairs. In an undirected graph, if vertex j is in list A[i], then vertex i will be in list A[j].

The space complexity of an adjacency list is O(V + E), because it stores information only for those edges that actually exist in the graph. In a lot of cases, where the matrix is sparse, an adjacency matrix may not be very useful, because it takes up a lot of space in which most of the elements are 0. In such cases, using an adjacency list is better.

Note:​ A sparse matrix is a matrix in which most of the elements are zero,
whereas a dense matrix is a matrix in which most of the elements are
non-zero.

Consider the same undirected graph from an adjacency matrix. The


adjacency list of the graph is as follows:

A1 → 2 → 4
A2 → 1 → 3
A3 → 2 → 4
A4 → 1 → 3
Consider the same directed graph from an adjacency matrix. The
adjacency list of the graph is as follows:

A1 → 2
A2 → 4
A3 → 1 → 4
A4 → 2

Consider the directed graph given above. The code for this graph is as
follows:

Input file

4    // nodes
5    // edges
1 2  // edge from node 1 to node 2
2 4  // edge from node 2 to node 4
3 1  // edge from node 3 to node 1
3 4  // edge from node 3 to node 4
4 2  // edge from node 4 to node 2

Code
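The code is an image in the original; a minimal C++ sketch that builds the adjacency list and prints it in the format of the output below:

#include <iostream>
#include <vector>
using namespace std;

int main() {
    int nodes, edges;
    cin >> nodes >> edges;
    vector<vector<int>> adj(nodes + 1);   // adj[i] lists the neighbours of i
    for (int i = 0; i < edges; i++) {
        int u, v;
        cin >> u >> v;                    // directed edge u -> v
        adj[u].push_back(v);
    }
    for (int i = 1; i <= nodes; i++) {
        cout << "Adjacency list of node " << i << ":";
        for (size_t j = 0; j < adj[i].size(); j++)
            cout << (j ? " --> " : " ") << adj[i][j];
        cout << '\n';
    }
    return 0;
}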
Output

● Adjacency list of node 1: 2


● Adjacency list of node 2: 4
● Adjacency list of node 3: 1 --> 4
● Adjacency list of node 4: 2

Practice
Link to the Practice Problems:​ Practice Questions

Breadth First Search

Graph traversals

Graph traversal means visiting every vertex and edge exactly once in a well-defined order. While using certain graph algorithms, you must ensure that each vertex of the graph is visited exactly once. The order in which the vertices are visited is important and may depend upon the algorithm or question that you are solving.

During a traversal, it is important that you track which vertices have been
visited. The most common way of tracking vertices is to mark them.

Breadth First Search (BFS)

There are many ways to traverse graphs. BFS is the most commonly used
approach.
BFS is a traversing algorithm where you should start traversing from a
selected node (source or starting node) and traverse the graph layerwise
thus exploring the neighbour nodes (nodes which are directly connected to
source node). You must then move towards the next-level neighbour
nodes.

As the name BFS suggests, you are required to traverse the graph
breadthwise as follows:

1. First move horizontally and visit all the nodes of the current layer
2. Move to the next layer

The nodes in layer 1 are closer to the source than the nodes in layer 2. Therefore, in BFS, you must traverse all the nodes in layer 1 before you move to the nodes in layer 2.

Traversing child nodes

A graph can contain cycles, which may bring you to the same node again
while traversing the graph. To avoid processing of same node again, use a
boolean array which marks the node after it is processed. While visiting the
nodes in the layer of a graph, store them in a manner such that you can
traverse the corresponding child nodes in a similar order.

In the earlier diagram, start traversing from 0 and visit its child nodes 1, 2,
and 3. Store them in the order in which they are visited. This will allow you
to visit the child nodes of 1 first (i.e. 4 and 5), then of 2 (i.e. 6 and 7), and
then of 3 (i.e. 7) etc.

To make this process easy, use a queue to store the nodes, and mark each node as 'visited' as soon as it is pushed, so that it is not pushed again while its neighbours (vertices that are directly connected to it) are explored. The queue follows the First In First Out (FIFO) queuing method, and therefore the neighbours of a node will be visited in the order in which they were inserted into the queue, i.e. the node that was inserted first will be visited first, and so on.

Pseudocode
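The original pseudocode is an image; a C++ sketch of the same idea, assuming an adjacency-list graph (the function name bfs is illustrative):

#include <queue>
#include <vector>
using namespace std;

// BFS from source s; returns the order in which the nodes are visited.
vector<int> bfs(int s, const vector<vector<int>>& adj) {
    vector<bool> visited(adj.size(), false);
    vector<int> order;
    queue<int> q;
    q.push(s);
    visited[s] = true;               // mark when enqueued, not when popped
    while (!q.empty()) {
        int p = q.front(); q.pop();
        order.push_back(p);          // process node p
        for (int nxt : adj[p])
            if (!visited[nxt]) {     // ignore already-visited neighbours
                visited[nxt] = true;
                q.push(nxt);
            }
    }
    return order;
}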

Traversing process

The traversal will start from the source node: push s into the queue and mark s as 'visited'.

First iteration

● s will be popped from the queue


● Neighbors of s i.e. 1 and 2 will be traversed
● 1 and 2, which have not been traversed earlier, are traversed. They will be:
○ Pushed into the queue
○ Marked as visited

Second iteration

● 1 is popped from the queue


● Neighbors of 1 i.e. s and 3 are traversed
● s is ignored because it is marked as 'visited'
● 3, which has not been traversed earlier, is traversed. It is:
○ Pushed in the queue
○ Marked as visited

Third iteration

● 2 is popped from the queue


● Neighbors of 2 i.e. s, 3, and 4 are traversed
● 3 and s are ignored because they are marked as 'visited'
● 4, which has not been traversed earlier, is traversed. It is:
○ Pushed in the queue
○ Marked as visited

Fourth iteration

● 3 is popped from the queue


● Neighbors of 3 i.e. 1, 2, and 5 are traversed
● 1 and 2 are ignored because they are marked as 'visited'
● 5, which has not been traversed earlier, is traversed. It is:
○ Pushed in the queue
○ Marked as visited

Fifth iteration

● 4 will be popped from the queue


● Neighbors of 4 i.e. 2 is traversed
● 2 is ignored because it is already marked as 'visited'

Sixth iteration

● 5 is popped from the queue


● Neighbors of 5 i.e. 3 is traversed
● 3 is ignored because it is already marked as 'visited'

The queue is empty and it comes out of the loop. All the nodes have been
traversed by using BFS.

If all the edges in a graph are of the same weight, then BFS can also be
used to find the minimum distance between the nodes in a graph.

Example

As in this diagram, start from the source node to find the distance to node 1. If you do not follow the BFS algorithm, you can go from the source node to node 2 and then to node 1. This approach would calculate the distance between the source node and node 1 as 2, whereas the minimum distance is actually 1. The minimum distance can be calculated correctly by using the BFS algorithm.

Complexity

The time complexity of BFS is O(V + E), where V is the number of nodes
and E is the number of edges.

Applications

1. How to determine the level of each node in the given tree?

As you know in BFS, you traverse level wise. You can also use BFS to
determine the level of each node.

Implementation

This code is similar to the BFS code with only the following difference:
level[ v[ p ][ i ] ] = level[ p ]+1;

In this code, while you visit each node, the level of that node is set with an
increment in the level of its parent node. This is how the level of each node
is determined.
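The implementation is an image in the original; a C++ sketch (reusing the headers from the BFS sketch above, and keeping the text's adjacency-list name v) might be:

// BFS that also records level[]: a child's level is one more than its
// parent's, via the line level[v[p][i]] = level[p] + 1 quoted above.
vector<int> bfsLevels(int s, const vector<vector<int>>& v) {
    vector<int> level(v.size(), -1);   // -1 marks "not visited yet"
    queue<int> q;
    q.push(s);
    level[s] = 0;                      // the source sits on level 0
    while (!q.empty()) {
        int p = q.front(); q.pop();
        for (size_t i = 0; i < v[p].size(); i++)
            if (level[v[p][i]] == -1) {
                level[v[p][i]] = level[p] + 1;
                q.push(v[p][i]);
            }
    }
    return level;
}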

2. 0-1 BFS

This type of BFS is used to find the shortest distance between two nodes in a graph, provided that the edges in the graph have weights 0 or 1. If you apply the BFS explained earlier in this article, you will get an incorrect result for the optimal distance between 2 nodes.

In this approach, a boolean array is not used to mark the nodes, because the condition of the optimal distance is checked when you visit each node. A double-ended queue (deque) is used to store the nodes. In 0-1 BFS, if the weight of the edge = 0, the node is pushed to the front of the deque. If the weight of the edge = 1, the node is pushed to the back of the deque.

Implementation
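The implementation is an image in the original; a C++17 sketch matching the edges[v][i] pair convention described below:

#include <deque>
#include <vector>
#include <climits>
using namespace std;

// 0-1 BFS: edges[v][i] = {neighbour, weight} with weight 0 or 1.
vector<int> zeroOneBFS(int s, const vector<vector<pair<int,int>>>& edges) {
    vector<int> distance(edges.size(), INT_MAX);
    deque<int> Q;
    distance[s] = 0;
    Q.push_front(s);
    while (!Q.empty()) {
        int v = Q.front(); Q.pop_front();
        for (auto [to, w] : edges[v])
            if (distance[v] + w < distance[to]) {   // relax the edge
                distance[to] = distance[v] + w;
                if (w == 0) Q.push_front(to);       // weight 0: front of the deque
                else        Q.push_back(to);        // weight 1: back of the deque
            }
    }
    return distance;
}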

Here, edges[ v ][ i ] is an adjacency list that exists in the form of pairs, i.e. edges[ v ][ i ].first will contain the node to which v is connected, and edges[ v ][ i ].second will contain the distance between v and edges[ v ][ i ].first.

Q is a double-ended queue. distance is an array where distance[ v ] will contain the distance from the start node to node v. Initially, the distance defined from the source node to each node is infinity.

Let’s understand this code with the following graph:

The adjacency list of the graph will be as follows:
Here 's' is considered to be 0 or source node.

0 -> 1 -> 3 -> 2


edges[ 0 ][ 0 ].first = 1 , edges[ 0 ][ 0 ].second = 1
edges[ 0 ][ 1 ].first = 3 , edges[ 0 ][ 1 ].second = 0
edges[ 0 ][ 2 ].first = 2 , edges[ 0 ][ 2 ].second = 1

1 -> 0 -> 4
edges[ 1 ][ 0 ].first = 0 , edges[ 1 ][ 0 ].second = 1
edges[ 1 ][ 1 ].first = 4 , edges[ 1 ][ 1 ].second = 0

2 -> 0 -> 3
edges[ 2 ][ 0 ].first = 0 , edges[ 2 ][ 0 ].second = 0
edges[ 2 ][ 1 ].first = 3 , edges[ 2 ][ 1 ].second = 0

3 -> 0 -> 2 -> 4

edges[ 3 ][ 0 ].first = 0 , edges[ 3 ][ 0 ].second = 0
edges[ 3 ][ 1 ].first = 2 , edges[ 3 ][ 1 ].second = 0
edges[ 3 ][ 2 ].first = 4 , edges[ 3 ][ 2 ].second = 0

4 -> 1 -> 3
edges[ 4 ][ 0 ].first = 1 , edges[ 4 ][ 0 ].second = 0
edges[ 4 ][ 1 ].first = 3 , edges[ 4 ][ 1 ].second = 0

If you use the plain BFS algorithm, the result will be incorrect, because it will report the optimal distance between s and node 1, and between s and node 2, as 1. This is because it visits the children of s and calculates the distance between s and its children as 1. The actual optimal distance is 0 in both cases.

Processing

Starting from the source node, i.e. 0, it will move towards 1, 2, and 3. Since the edge weights between 0 and 1 and between 0 and 2 are both 1, 1 and 2 will be pushed to the back of the queue. However, since the edge weight between 0 and 3 is 0, 3 will be pushed to the front of the queue. The distances are maintained in the distance array accordingly.

3 will then be popped from the queue and the same process will be applied
to its neighbours, and so on.

Practice

Link to the Practice Problems: ​Practice Questions

Depth First Search

Depth First Search (DFS)

The DFS algorithm is a recursive algorithm that uses the idea of


backtracking. It involves exhaustive searches of all the nodes by going
ahead, if possible, else by backtracking.

Here, the word backtrack means that when you are moving forward and there are no more nodes along the current path, you move backwards on the same path to find nodes to traverse. All the nodes on the current path are visited until all the unvisited nodes have been traversed, after which the next path is selected.

This recursive nature of DFS can be implemented using stacks. The basic
idea is as follows:
Pick a starting node and push all its adjacent nodes into a stack.

Pop a node from the stack to select the next node to visit, and push all its adjacent nodes into the stack.
Repeat this process until the stack is empty. However, ensure that the
nodes that are visited are marked. This will prevent you from visiting the
same node more than once. If you do not mark the nodes that are visited
and you visit the same node more than once, you may end up in an infinite
loop.

Pseudocode
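The original pseudocode is an image; the recursive form can be sketched in C++ like this:

#include <vector>
using namespace std;

// Recursive DFS: visit a node, then recurse into each unvisited neighbour.
void dfs(int u, const vector<vector<int>>& adj, vector<bool>& visited) {
    visited[u] = true;               // mark so u is never processed twice
    for (int nxt : adj[u])
        if (!visited[nxt])
            dfs(nxt, adj, visited);  // go deeper before backtracking
}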

The following image shows how DFS works.

Time complexity O(V+E), when implemented using an adjacency list.

Applications

How to find connected components using DFS?

A graph is said to be disconnected if it is not connected, i.e. if two nodes


exist in the graph such that there is no edge in between those nodes. In an
undirected graph, a connected component is a set of vertices in a graph
that are linked to each other by paths.

Consider the example given in the diagram. Graph G is a disconnected


graph and has the following 3 connected components.

● First connected component is 1 -> 2 -> 3 as they are linked to each


other
● Second connected component 4 -> 5
● Third connected component is vertex 6

In DFS, if we start from a start node it will mark all the nodes connected to
the start node as visited. Therefore, if we choose any node in a connected
component and run DFS on that node it will mark the whole connected
component as visited.

Input File

6    // nodes
4    // edges
1 2
2 3
1 3
4 5

Code
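The code is an image in the original; a minimal C++ sketch that reads the input above and counts components with DFS:

#include <iostream>
#include <vector>
using namespace std;

vector<vector<int>> adj;
vector<bool> visited;

void dfs(int u) {
    visited[u] = true;
    for (int nxt : adj[u])
        if (!visited[nxt]) dfs(nxt);
}

int main() {
    int n, e;
    cin >> n >> e;
    adj.assign(n + 1, {});
    visited.assign(n + 1, false);
    for (int i = 0; i < e; i++) {
        int u, v;
        cin >> u >> v;              // undirected edge
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
    int components = 0;
    for (int u = 1; u <= n; u++)
        if (!visited[u]) {          // each DFS marks one whole component
            components++;
            dfs(u);
        }
    cout << "Number of connected components: " << components << '\n';
    return 0;
}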

Output
Number of connected components: 3
Time complexity O(V+E), when implemented using the adjacency list.

Practice

Link to the Practice Problems:​ Practice Questions

Minimum Spanning Tree

Pre-requisites: Graphs, Trees

What is a Spanning Tree?


Given an undirected and connected graph G=(V,E), a spanning tree of the
graph G is a tree that spans G (that is, it includes every vertex of G) and is
a subgraph of G (every edge in the tree belongs to G).

What is a Minimum Spanning Tree?


The cost of a spanning tree is the sum of the weights of all the edges in the tree. There can be many spanning trees. The minimum spanning tree is the spanning tree whose cost is the smallest among all spanning trees. There can also be many minimum spanning trees.
The minimum spanning tree has direct applications in the design of networks. It is used in algorithms approximating the travelling salesman problem, the multi-terminal minimum cut problem and minimum-cost weighted perfect matching. Other practical applications are:
1. Cluster Analysis
2. Handwriting recognition
3. Image segmentation

There are two famous algorithms for finding the Minimum Spanning Tree:

Kruskal’s Algorithm
Kruskal's Algorithm builds the spanning tree by adding edges one by one into a growing spanning tree. Kruskal's algorithm follows a greedy approach: in each iteration it finds the edge with the least weight and adds it to the growing spanning tree.
Algorithm Steps:

● Sort the graph edges with respect to their weights.


● Start adding edges to the MST from the edge with the smallest weight
until the edge of the largest weight.
● Only add edges that don't form a cycle, i.e. edges that connect only disconnected components.

So now the question is how to check whether 2 vertices are connected or not. This could be done using a DFS that starts from the first vertex and then checks whether the second vertex has been visited. But DFS makes the time complexity large, as it has an order of O(V+E), where V is the number of vertices and E is the number of edges. So the best solution is "Disjoint Sets":
Disjoint sets are sets whose intersection is the empty set, meaning that they don't have any element in common.
Consider following example:

In Kruskal’s algorithm, at each iteration we will select the edge with the
lowest weight. So, we will start with the lowest weighted edge first i.e., the
edges with weight 1. After that we will select the second lowest weighted
edge i.e., edge with weight 2. Notice these two edges are totally disjoint.
Now, the next edge will be the third lowest weighted edge i.e., edge with
weight 3, which connects the two disjoint pieces of the graph. Now, we are
not allowed to pick the edge with weight 4, as that would create a cycle and we
can’t have any cycles. So we will select the fifth lowest weighted edge i.e.,
edge with weight 5. Now the other two edges will create cycles so we will
ignore them. In the end, we end up with a minimum spanning tree with total
cost 11 ( = 1 + 2 + 3 + 5).

Implementation:
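The implementation is an image in the original; a C++ sketch of Kruskal's algorithm using a simple disjoint-set with path compression (names and the {weight, {u, v}} edge layout are assumptions of this sketch):

#include <algorithm>
#include <vector>
using namespace std;

// Disjoint-set (union-find) with path compression.
vector<int> parent;
int root(int x) { return parent[x] == x ? x : parent[x] = root(parent[x]); }

// Edges as {weight, {u, v}}. Returns the total cost of the MST.
long long kruskal(int n, vector<pair<int, pair<int,int>>> edges) {
    parent.resize(n + 1);
    for (int i = 0; i <= n; i++) parent[i] = i;
    sort(edges.begin(), edges.end());          // cheapest edges first
    long long cost = 0;
    for (auto& e : edges) {
        int u = root(e.second.first), v = root(e.second.second);
        if (u != v) {                          // different components: no cycle
            cost += e.first;
            parent[u] = v;                     // union the two sets
        }
    }
    return cost;
}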

Time Complexity:

In Kruskal's algorithm, the most time-consuming operation is sorting, because the total complexity of the Disjoint-Set operations is O(ElogV), which is also the overall time complexity of the algorithm.

Prim’s Algorithm
Prim’s Algorithm also use Greedy approach to find the minimum spanning
tree. In Prim’s Algorithm we grow the spanning tree from a starting position.
Unlike an edge in Kruskal's, we add vertex to the growing spanning tree in
Prim's.
Algorithm Steps:
● Maintain two disjoint sets of vertices. One containing vertices that are
in the growing spanning tree and other that are not in the growing
spanning tree.
● Select the cheapest vertex that is connected to the growing spanning
tree and is not in the growing spanning tree and add it into the
growing spanning tree. This can be done using Priority Queues.
Insert the vertices, that are connected to growing spanning tree, into
the Priority Queue.
● Check for cycles. To do that, mark the nodes which have been
already selected and insert only those nodes in the Priority Queue
that are not marked.

Consider the example below:

64
In Prim’s Algorithm, we will start with an arbitrary node (it doesn’t matter
which one) and mark it. In each iteration we will mark a new vertex that is
adjacent to the one that we have already marked. As a greedy algorithm,
Prim’s algorithm will select the cheapest edge and mark the vertex. So we
will simply choose the edge with weight 1. In the next iteration we have
three options, edges with weight 2, 3 and 4. So, we will select the edge with
weight 2 and mark the vertex. Now again we have three options, edges
with weight 3, 4 and 5. But we can’t choose edge with weight 3 as it is
creating a cycle. So we will select the edge with weight 4 and we end up
with the minimum spanning tree of total cost 7 ( = 1 + 2 +4).

Implementation:
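The implementation is an image in the original; a C++17 sketch of lazy Prim's with a min-priority queue (starting from node 1; adj[u] holding {weight, vertex} pairs is an assumption):

#include <queue>
#include <vector>
using namespace std;

long long prim(int n, const vector<vector<pair<int,int>>>& adj) {
    vector<bool> marked(n + 1, false);
    priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> pq;
    pq.push({0, 1});                      // (edge cost, vertex); start anywhere
    long long cost = 0;
    while (!pq.empty()) {
        auto [w, u] = pq.top(); pq.pop();
        if (marked[u]) continue;          // skip vertices already in the tree
        marked[u] = true;
        cost += w;                        // cheapest edge into the tree
        for (auto [wt, v] : adj[u])
            if (!marked[v]) pq.push({wt, v});
    }
    return cost;
}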

Time Complexity:
The time complexity of Prim's Algorithm is O((V+E)logV), because each vertex is inserted in the priority queue only once and insertion in the priority queue takes logarithmic time.
Practice
Link to the Practice Problems: ​Practice Questions

Shortest Path Algorithms

The shortest path problem is about finding a path between 2 vertices in a graph such that the total sum of the edge weights is minimum.
This problem could be solved easily using BFS if all the edge weights were 1, but here weights can take any value. Three different algorithms are discussed below, depending on the use-case.

Bellman Ford's Algorithm:


Bellman Ford's algorithm is used to find the shortest paths from the source vertex to all other vertices in a weighted graph. It depends on the following concept: a shortest path contains at most n−1 edges, because a shortest path cannot contain a cycle.
So why can't a shortest path contain a cycle?
There is no need to pass through a vertex again, because the shortest paths to all other vertices can be found without needing a second visit to any vertex.
Algorithm Steps:
● The outer loop traverses from 0 : n−1.
● Loop over all edges, check if the next node distance > current node
distance + edge weight, in this case update the next node distance to
"current node distance + edge weight".
This algorithm depends on the relaxation principle where the shortest
distance for all vertices is gradually replaced by more accurate values until
eventually reaching the optimum solution. In the beginning all vertices have
a distance of "Infinity", but only the distance of the source vertex = 0, then
update all the connected vertices with the new distances (source vertex
distance + edge weights), then apply the same concept for the new vertices
with new distances and so on.
Implementation:

Assume the source node has a number (0):
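The implementation is an image in the original; a C++ sketch under those assumptions (source = node 0, edge-list representation; the Edge struct is illustrative):

#include <vector>
using namespace std;

const int INF = 1e9;
struct Edge { int u, v, w; };     // directed edge u -> v with weight w

// Bellman Ford from source 0: relax every edge n-1 times.
vector<int> bellmanFord(int n, const vector<Edge>& edges) {
    vector<int> dist(n, INF);
    dist[0] = 0;                                  // the source node is numbered 0
    for (int i = 0; i < n - 1; i++)               // a shortest path has <= n-1 edges
        for (const Edge& e : edges)
            if (dist[e.u] != INF && dist[e.u] + e.w < dist[e.v])
                dist[e.v] = dist[e.u] + e.w;      // relaxation step
    return dist;
}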


A very important application of Bellman Ford is to check if there is a negative cycle in the graph.
The time complexity of the Bellman Ford algorithm is relatively high: O(V⋅E), which in the case E = V² becomes O(V³).
Let's discuss an optimized algorithm.

Dijkstra's Algorithm
Dijkstra's algorithm has many variants but the most common one is to find
the shortest paths from the source vertex to all other vertices in the graph.
Algorithm Steps:
● Set all vertices distances = infinity except for the source vertex, set
the source distance = 0.
● Push the source vertex in a min-priority queue in the form (distance ,
vertex), as the comparison in the min-priority queue will be according
to vertices distances.
● Pop the vertex with the minimum distance from the priority queue (at first the popped vertex = source).
● Update the distances of the vertices connected to the popped vertex if "current vertex distance + edge weight < next vertex distance", then push the vertex with the new distance into the priority queue.
● If the popped vertex has been visited before, just continue without using it.
● Apply the same algorithm again until the priority queue is empty.

Implementation:

Assume the source vertex = 1.
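The implementation is an image in the original; a C++17 sketch with source vertex 1 (adj[u] holding {neighbour, weight} pairs is an assumption):

#include <queue>
#include <vector>
using namespace std;

vector<long long> dijkstra(int n, const vector<vector<pair<int,int>>>& adj) {
    const long long INF = 1e18;
    vector<long long> dist(n + 1, INF);
    priority_queue<pair<long long,int>,
                   vector<pair<long long,int>>, greater<>> pq;  // min-priority queue
    dist[1] = 0;
    pq.push({0, 1});                      // (distance, vertex)
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;        // stale entry: u was already finalized
        for (auto [v, w] : adj[u])
            if (dist[u] + w < dist[v]) {  // relax the edge u -> v
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
    }
    return dist;
}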

The time complexity of Dijkstra's Algorithm is O(V²), but with a min-priority queue it drops down to O(V + ElogV).
However, if we have to find the shortest path between all pairs of vertices, both of the above methods would be expensive in terms of time. Discussed below is another algorithm designed for this case.

Floyd–Warshall's Algorithm
Floyd–Warshall's Algorithm is used to find the shortest paths between all pairs of vertices in a graph, where each edge in the graph has a weight which can be positive or negative. The biggest advantage of using this algorithm is that all the shortest distances between any 2 vertices can be calculated in O(V³), where V is the number of vertices in the graph.
The Algorithm Steps:
For a graph with N vertices:
● Initialize the shortest paths between any 2 vertices with Infinity.
● Find all pair shortest paths that use 0 intermediate vertices, then find the shortest paths that use 1 intermediate vertex, and so on, until using all N vertices as intermediate nodes.
● Minimize the shortest paths between any 2 pairs in the previous
operation.
● For any 2 vertices (i,j) , one should actually minimize the distances
between this pair using the first K nodes, so the shortest path will be:
min(dist[i][k]+dist[k][j],dist[i][j]).
dist[i][k] represents the shortest path that only uses the first K vertices, and dist[k][j] represents the shortest path between the pair k, j. The shortest path will be a concatenation of the shortest path from i to k and the shortest path from k to j.
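A minimal C++ sketch of the triple loop (dist is assumed to be pre-initialized with 0 on the diagonal, edge weights where an edge exists, and infinity elsewhere):

#include <algorithm>
#include <vector>

void floydWarshall(std::vector<std::vector<long long>>& dist, int V) {
    const long long INF = 1e18;
    for (int k = 1; k <= V; k++)           // allow k as an intermediate vertex
        for (int i = 1; i <= V; i++)
            for (int j = 1; j <= V; j++)
                if (dist[i][k] != INF && dist[k][j] != INF)  // avoid INF overflow
                    dist[i][j] = std::min(dist[i][j], dist[i][k] + dist[k][j]);
}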

The time complexity of Floyd–Warshall's Algorithm is O(V³), where V is the number of vertices in the graph.
Practice
Link to the Practice Problems: ​Practice Questions

Flood-fill Algorithm

Flood fill algorithm helps in visiting each and every point in a given area. It
determines the area connected to a given cell in a multi-dimensional array.
Following are some famous implementations of flood fill algorithm:

Bucket Fill in Paint:

Clicking in an area with this tool selected fills that area with the selected
color.

Solving a Maze:

Given a matrix with some starting point and some destination, with some obstacles in between, this algorithm helps to find a path from the source to the destination.

Minesweeper:

When a blank cell is discovered, this algorithm helps in revealing


neighboring cells. This step is done recursively till cells having numbers are
discovered.

The flood fill algorithm can simply be modeled as a graph traversal problem: represent the given area as a matrix and consider every cell of that matrix as a vertex that is connected to the points above it, below it, to the right of it, and to the left of it (and, in the case of 8-connections, to the points at both diagonals as well). For example, consider the image given below.

It clearly shows how the cell in the middle is connected to the cells around it. For instance, with 8-connections, as in Minesweeper (clicking on any cell that turns out to be blank reveals the 8 cells around it, which contain a number or are blank), the cell (1,1) is connected to (0,0), (0,1), (0,2), (1,0), (1,2), (2,0), (2,1), (2,2).

In general any cell (x,y) is connected to (x−1,y−1), (x−1,y), (x−1,y+1),


(x,y−1), (x,y+1), (x+1,y−1), (x+1,y), (x+1,y+1). Of course, the boundary
conditions are to be kept in mind.

Now that the given area has been modeled as a graph, a DFS or BFS can
be applied to traverse that graph. The pseudo code is given below.
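The pseudo code is an image in the original; a C++ sketch using DFS with 4-connectivity (add the diagonal offsets for 8-connectivity) might be:

#include <vector>
using namespace std;

const int dx[] = {-1, 1, 0, 0};   // row offsets: up, down
const int dy[] = {0, 0, -1, 1};   // column offsets: left, right

void floodFill(int x, int y, int n, int m, vector<vector<bool>>& visited) {
    visited[x][y] = true;                       // visit cell (x, y)
    for (int k = 0; k < 4; k++) {
        int nx = x + dx[k], ny = y + dy[k];
        if (nx >= 0 && nx < n && ny >= 0 && ny < m && !visited[nx][ny])
            floodFill(nx, ny, n, m, visited);   // boundary conditions checked above
    }
}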

The above code visits each and every cell of a matrix of size n×m, starting from some source cell. The time complexity of the above algorithm is O(n×m).

Another use of the flood fill algorithm is in solving a maze. Given a matrix, a source cell, a destination cell, some cells which cannot be visited, and some valid moves, check whether the destination cell can be reached from the source cell. The matrix given in the image below shows one such problem.

The source is cell (0,0) and the destination is cell (3,4). Cells containing X
cannot be visited. Let's assume there are 4 valid moves: move up, move down, move left and move right.

The following pseudo code solves the problem given above.
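The pseudo code is an image in the original; a C++ sketch reusing the dx/dy offsets above (mat holding 'X' for blocked cells is an assumption of this sketch):

// Returns true once the destination (destx, desty) is reached.
bool solveMaze(int x, int y, int destx, int desty, int n, int m,
               const vector<vector<char>>& mat, vector<vector<bool>>& visited) {
    if (x == destx && y == desty) return true;  // destination found
    visited[x][y] = true;
    for (int k = 0; k < 4; k++) {               // up, down, left, right
        int nx = x + dx[k], ny = y + dy[k];
        if (nx >= 0 && nx < n && ny >= 0 && ny < m
            && !visited[nx][ny] && mat[nx][ny] != 'X')
            if (solveMaze(nx, ny, destx, desty, n, m, mat, visited))
                return true;                    // propagate success up the call stack
    }
    return false;
}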

The code given above is the same as that given previously, with slight changes. It takes three more parameters, including the given matrix, to check whether the current cell is marked X or not, and the coordinates of the destination cell (destx, desty). If the current cell is equal to the destination cell, it returns True; consequently, all the previous calls in the stack return True, because there is no use in visiting any cells further once it has been discovered that there is a path between the source and destination cell.

So, for the matrix given in the image above, the code returns True.
If, in the given matrix, the cell (1,2) were also marked X, then the code would have returned False, as there would have been no path to reach from S to D in that case.

Practice:

Link to the Practice Problems:​ Practice Questions

Articulation Points and Bridges

Articulation Point
In a graph, a vertex is called an articulation point if removing it and all the
edges associated with it results in the increase of the number of connected
components in the graph. For example consider the graph given in
following figure.

If in the above graph, vertex 1 and all the edges associated with it, i.e. the
edges 1-0, 1-2 and 1-3 are removed, there will be no path to reach any of
the vertices 2, 3 or 4 from the vertices 0 and 5, that means the graph will
split into two separate components. One consisting of the vertices 0 and 5
and another one consisting of the vertices 2, 3 and 4 as shown in the
following figure.

Likewise removing the vertex 0 will disconnect the vertex 5 from all other
vertices. Hence the given graph has two articulation points: 0 and 1.

Articulation points represent vulnerabilities in a network. In order to find all the articulation points in a given graph, the brute force approach is to check, for every vertex, whether it is an articulation point or not, by removing it and then counting the number of connected components in the graph. If the number of components increases, then the vertex under consideration is an articulation point; otherwise it is not.

Here's the pseudo code of the brute force approach; it returns the total number of articulation points in the given graph.
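The pseudo code is an image in the original; a C++ sketch of the brute force idea (remove each vertex in turn and recount components) might be:

#include <vector>
using namespace std;

// Count components while pretending vertex `skip` has been removed (-1 = none).
int countComponents(int skip, int V, const vector<vector<int>>& adj) {
    vector<bool> visited(V, false);
    int comps = 0;
    for (int s = 0; s < V; s++) {
        if (s == skip || visited[s]) continue;
        comps++;
        vector<int> st = {s};            // iterative DFS ignoring `skip`
        visited[s] = true;
        while (!st.empty()) {
            int u = st.back(); st.pop_back();
            for (int v : adj[u])
                if (v != skip && !visited[v]) { visited[v] = true; st.push_back(v); }
        }
    }
    return comps;
}

int articulationPoints(int V, const vector<vector<int>>& adj) {
    int base = countComponents(-1, V, adj), count = 0;
    for (int u = 0; u < V; u++)
        if (countComponents(u, V, adj) > base) count++;  // removal splits the graph
    return count;
}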

The above algorithm iterates over all the vertices and in one iteration
applies a Depth First Search to find connected components, so time
complexity of above algorithm is O(V×(V+E)), where V is the number of
vertices and E is the number of edges in the graph.

Clearly the brute force approach will fail for bigger graphs.

There is an algorithm that can help find all the articulation points in a given
graph by a single Depth First Search, that means with complexity O(V+E),
but it involves a new term called "Back Edge" which is explained below:

Given a DFS tree of a graph, a Back Edge is an edge that connects a vertex to a vertex discovered before its parent. For example, consider the graph given in Fig. 1. The figure given below depicts a DFS tree of the graph.

In the above case, the edge 4-2 connects 4 to an ancestor of its parent, i.e. 3, so it is a Back Edge. Similarly, 3-1 is also a Back Edge. But why bother about Back Edges? The presence of a back edge means the presence of an alternative path in case the parent of the vertex is removed. Suppose a vertex u has a child v such that none of the vertices in the subtree rooted at v have a back edge to any vertex discovered before u. Then, if vertex u is removed, there will be no path left for vertex v, or any of the vertices in the subtree rooted at v, to reach any vertex discovered before u. That implies that the subtree rooted at v gets disconnected from the rest of the graph; thus the number of components increases and u counts as an articulation point. On the other hand, if the subtree rooted at v has a vertex x with a back edge that connects it to a vertex discovered before u, say y, then there will be a path for any vertex in the subtree rooted at v to reach y even after the removal of u; and if that is the case for all the children of u, then u does not count as an articulation point.

So ultimately it all comes down to finding a back edge for every vertex. To do that, apply a DFS, record the discovery time of every vertex, and maintain, for every vertex v, the earliest discovered vertex that can be reached from any of the vertices in the subtree rooted at v. If a vertex u has a child v such that the earliest discovered vertex reachable from the subtree rooted at v has a discovery time greater than or equal to that of u, then v has no back edge past u, and thus u is an articulation point.

So far the algorithm says that if all children of a vertex u have a back edge, then u is not an articulation point. But what happens when u is the root of the tree, as the root has no ancestors? Well, it is very easy to check if the root is an articulation point or not: if the root has more than one child, then it is an articulation point; otherwise it is not. How does that work? Suppose the root has two children, v1 and v2. If there had been an edge between vertices in the subtree rooted at v1 and those in the subtree rooted at v2, then they would have been part of the same subtree.

Here's the pseudo code of the above algorithm:
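The pseudo code is an image in the original; a C++ sketch of the same single-DFS algorithm, over the adjacency matrix and arrays described below (the name APUtil is illustrative):

#include <algorithm>
#include <vector>
using namespace std;

void APUtil(int u, int V, const vector<vector<int>>& adj,
            vector<int>& disc, vector<int>& low, vector<bool>& visited,
            vector<int>& parent, vector<bool>& AP, int& timer) {
    visited[u] = true;
    disc[u] = low[u] = ++timer;       // newly discovered: earliest known is itself
    int children = 0;
    for (int v = 0; v < V; v++) {
        if (!adj[u][v]) continue;     // no edge u-v
        if (!visited[v]) {
            children++;
            parent[v] = u;
            APUtil(v, V, adj, disc, low, visited, parent, AP, timer);
            low[u] = min(low[u], low[v]);             // propagate back edges upward
            if (parent[u] == -1 && children > 1)      // root with 2+ children
                AP[u] = true;
            if (parent[u] != -1 && low[v] >= disc[u]) // no back edge past u
                AP[u] = true;
        } else if (v != parent[u]) {
            low[u] = min(low[u], disc[v]);            // back edge found
        }
    }
}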

Here's what everything means:

adj[][] : It is an N×N matrix denoting the adjacency matrix of the given


graph.

disc[] : It is an array of N elements which stores the discovery time of every


vertex. It is initialized by 0.

low[] : It is an array of N elements which stores, for every vertex v, the


discovery time of the earliest discovered vertex to which v or any of the
vertices in the subtree rooted at v is having a back edge. It is initialized by
INFINITY.

visited[] : It is an array of size N which denotes whether a vertex is visited


or not during the DFS. It is initialized by false.

parent[] : It is an array of size N which stores the parent of each vertex. It is


initialized by NIL.

AP[] : It is an array of size N. AP[i] = true if the i-th vertex is an articulation point.

vertex: The vertex under consideration.

V : Number of vertices.

time : Current value of discovery time.


The above algorithm starts with an initial vertex, say u, marks it visited, and records its discovery time, disc[u]. Since it has just been discovered, the earliest vertex it is known to be connected to is itself, so low[u] is also set equal to the vertex's discovery time.
It keeps a counter called child to count the number of children of a vertex. The algorithm then iterates over every vertex in the graph to see if it is connected to u. If it finds a vertex v that is connected to u but has already been visited, it updates low[u] to the minimum of low[u] and the discovery time of v, i.e. disc[v]. But if the vertex v is not yet visited, it sets parent[v] to u and calls the DFS again with vertex = v. So the same things that just happened with u will happen for v as well. When that DFS call returns, low[v] holds the discovery time of the earliest discovered vertex reachable from any vertex in the subtree rooted at v, so low[u] is set to the minimum of low[v] and itself. Finally, if u is not the root, the algorithm checks whether low[v] is greater than or equal to disc[u], and if so, marks AP[u] as true. If u is the root, it checks whether it has more than one child, and if so, marks AP[u] as true.

The following image shows the value of array disc[] and low[] for DFS tree
given in Fig. 3.

Clearly only for vertices 0 and 1, low[5]≥disc[0] and low[2]≥disc[1], so these
are the only two articulation points in the given graph.

Bridges

An edge in a graph between vertices, say u and v, is called a Bridge if, after removing it, there is no path left between u and v. Its definition is very similar to that of articulation points. Just like them, it also represents vulnerabilities in the given network. For the graph given in Fig. 1, if the edge 0-1 is removed, there will be no path left to reach from 0 to 1; similarly, if edge 0-5 is removed, there will be no path left that connects 0 and 5. So in this case, the edges 0-1 and 0-5 are the bridges in the given graph.

The brute force approach to finding all the bridges in a given graph is to check, for every edge, whether it is a bridge or not, by first removing it and then checking whether the vertices that it was connecting are still connected. Following is the pseudo code of this approach:
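The pseudo code is an image in the original; a C++ sketch of the brute force idea, using BFS to test reachability with one edge dropped, might be:

#include <queue>
#include <vector>
using namespace std;

// Is dst reachable from src if the undirected edge `skip` is removed?
bool reachable(int src, int dst, pair<int,int> skip,
               const vector<vector<int>>& adj) {
    vector<bool> visited(adj.size(), false);
    queue<int> q;
    q.push(src);
    visited[src] = true;
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int v : adj[u]) {
            if ((u == skip.first && v == skip.second) ||
                (u == skip.second && v == skip.first)) continue;  // edge removed
            if (!visited[v]) { visited[v] = true; q.push(v); }
        }
    }
    return visited[dst];
}

int countBridges(const vector<pair<int,int>>& edges,
                 const vector<vector<int>>& adj) {
    int bridges = 0;
    for (auto& e : edges)
        if (!reachable(e.first, e.second, e, adj)) bridges++;
    return bridges;
}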

The above code uses BFS to check if the vertices that were connected by the removed edge are still connected. It does so for every edge, and thus its complexity is O(E×(V+E)). Clearly it will fail for big values of V and E.

To check if an edge is a bridge or not, the above algorithm checks whether the vertices that the edge connects are still connected after the removal of the edge. If they are still connected, this implies the existence of an alternate path. So, just like in the case of articulation points, the concept of the Back Edge can be used to check for the existence of an alternate path. For any edge u−v (u having a discovery time less than v), if the earliest discovered vertex that can be visited from any vertex in the subtree rooted at v has a discovery time strictly greater than that of u, then u−v is a bridge; otherwise it is not. Unlike with articulation points, the root is not a special case here. Following is the pseudo code for the algorithm:
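The pseudo code is an image in the original; a C++ sketch of the single-DFS bridge-finding routine (adjacency-list form; the name bridgeUtil is illustrative):

#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

// disc[] initialized to 0 (0 = unvisited), parent[] to -1, timer to 0.
void bridgeUtil(int u, const vector<vector<int>>& adj,
                vector<int>& disc, vector<int>& low,
                vector<int>& parent, int& timer) {
    disc[u] = low[u] = ++timer;
    for (int v : adj[u]) {
        if (disc[v] == 0) {                      // tree edge
            parent[v] = u;
            bridgeUtil(v, adj, disc, low, parent, timer);
            low[u] = min(low[u], low[v]);
            if (low[v] > disc[u])                // strictly greater: no back edge over u-v
                cout << "Bridge: " << u << " - " << v << '\n';
        } else if (v != parent[u]) {
            low[u] = min(low[u], disc[v]);       // back edge
        }
    }
}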

For the graph given in Fig. 1, the low[] and disc[] values obtained for its DFS tree (shown in Fig. 3) by the above pseudo code will be the same as those obtained in the case of articulation points. The values of arrays low[] and disc[] are shown in Fig. 4. Clearly, for only two edges, i.e. 0-1 and 0-5, low[1] > disc[0] and low[5] > disc[0]; hence those are the only two bridges in the given graph.

Practice:

Link to the Practice Problems:​ Practice Questions

Biconnected Components

Pre-Requisite: ​Articulation Points

Before Biconnected Components, let's first try to understand what a Biconnected Graph is and how to check whether a given graph is Biconnected.
A graph is said to be Biconnected if:

1. It is connected, i.e. it is possible to reach every vertex from every other vertex by a simple path.
2. Even after removing any vertex, the graph remains connected.

For example, consider the graph in the following figure

The given graph is clearly connected. Now try removing the vertices one by
one and observe. Removing any of the vertices does not increase the
number of connected components. So the given graph is Biconnected.

Now consider the following graph, which is a slight modification of the previous graph.

In the above graph if the vertex 2 is removed, then here's how it will look:

Clearly the number of connected components has increased. Similarly, if vertex 3 is removed, there will be no path left to reach vertex 0 from any of the vertices 1, 2, 4 or 5. And the same goes for vertices 4 and 1: removing vertex 4 will disconnect 1 from all the other vertices 0, 2, 3 and 5. So the graph is not Biconnected.

Now, what should we look for in a graph to check whether it is Biconnected? As stated above, a graph is Biconnected if it has no vertex such that its removal increases the number of connected components in the graph, and if there exists such a vertex then it is not Biconnected. A vertex whose removal increases the number of connected components is called an Articulation Point.

So simply check whether the given graph has any articulation point. If it has no articulation point then it is Biconnected, otherwise not. Here's the pseudo code:

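A C++ sketch of this check, assuming an adjacency list adj over vertices 0 to n-1, might look like this:

    #include <bits/stdc++.h>
    using namespace std;

    // Returns false as soon as an articulation point is found.
    bool dfsCheck(int u, int parent, vector<vector<int>>& adj,
                  vector<int>& disc, vector<int>& low, int& timer_) {
        disc[u] = low[u] = ++timer_;
        int child = 0;
        for (int v : adj[u]) {
            if (disc[v] == 0) {
                ++child;
                if (!dfsCheck(v, u, adj, disc, low, timer_)) return false;
                low[u] = min(low[u], low[v]);
                if (parent != -1 && low[v] >= disc[u]) return false; // u is an articulation point
                if (parent == -1 && child > 1) return false;         // root with 2+ children
            } else if (v != parent) {
                low[u] = min(low[u], disc[v]);
            }
        }
        return true;
    }

    bool isBiconnected(vector<vector<int>>& adj) {
        int n = adj.size(), timer_ = 0;
        vector<int> disc(n, 0), low(n, 0);
        if (!dfsCheck(0, -1, adj, disc, low, timer_)) return false;
        for (int d : disc)
            if (d == 0) return false;   // some vertex was never reached: not even connected
        return true;
    }
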
The code above is exactly the same as that for Articulation Points, with one difference: it returns false as soon as it finds an Articulation Point.

The image below shows how the DFS tree will look for the graph in Fig. 2 according to the algorithm, along with the values of the arrays low[] and disc[].

Clearly, for vertex 4 and its child 1, low[1] ≥ disc[4], which means 4 is an articulation point. The algorithm returns false as soon as it discovers that 4 is an articulation point and does not go on to check low[] for vertices 0, 2 and 3. The value of low[] for all vertices is shown just for clarification.

The following image shows the DFS tree and the values of the arrays low[] and disc[] for the graph in Fig. 1.

Clearly there does not exist any vertex u with a child v such that low[v] ≥ disc[u], i.e. the graph has no articulation point, so the algorithm returns true, which means the graph is Biconnected.

Now let's move on to Biconnected Components. For a given graph, a Biconnected Component is one of its subgraphs which is Biconnected. For example, for the graph given in Fig. 2, the following are the 4 biconnected components of the graph:

Biconnected components in a graph can be determined by using the previous algorithm with a slight modification, which is to maintain a stack of edges. Keep adding edges to the stack in the order they are visited, and when an articulation point is detected, i.e. a vertex u has a child v such that no vertex in the subtree rooted at v has a back edge (low[v] ≥ disc[u]), pop and print all the edges in the stack until the edge u−v is found, as all those edges, including the edge u−v, form one biconnected component.

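A C++ sketch of this modification, assuming a connected simple graph stored as an adjacency list adj, might look like this:

    #include <bits/stdc++.h>
    using namespace std;

    void dfsBCC(int u, int parent, vector<vector<int>>& adj, vector<int>& disc,
                vector<int>& low, int& timer_, stack<pair<int,int>>& st) {
        disc[u] = low[u] = ++timer_;
        for (int v : adj[u]) {
            if (disc[v] == 0) {
                st.push({u, v});                         // tree edge goes on the stack
                dfsBCC(v, u, adj, disc, low, timer_, st);
                low[u] = min(low[u], low[v]);
                if (low[v] >= disc[u]) {                 // u separates v's subtree
                    cout << "Biconnected component:";
                    pair<int,int> e;
                    do {                                 // pop until (and including) edge u-v
                        e = st.top(); st.pop();
                        cout << " " << e.first << "-" << e.second;
                    } while (e != make_pair(u, v));
                    cout << "\n";
                }
            } else if (v != parent && disc[v] < disc[u]) {
                st.push({u, v});                         // back edge goes on the stack
                low[u] = min(low[u], disc[v]);
            }
        }
    }

    // usage: disc.assign(n, 0); low.assign(n, 0); int timer_ = 0;
    //        stack<pair<int,int>> st; dfsBCC(0, -1, adj, disc, low, timer_, st);
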
Let's see how it works for the graph shown in Fig. 2.

First, it finds that visited[0] is false, so it starts with vertex 0, discovers the edge 0-3 first and pushes it onto the stack.

It then discovers that low[1] ≥ disc[4], i.e. that 4 is an articulation point. So all the edges inserted after the edge 4-1, along with the edge 4-1 itself, form the first biconnected component. It pops and prints the edges until the last edge is 4-1, then prints that too and pops it from the stack.

Then it discovers the edge 4-5 and pushes it onto the stack.

For 5 it discovers the back edge 5-2 and pushes it onto the stack.

After that, no more edges are connected to 5, so it goes back to 4. For 4 also no more edges are connected, and low[5] ≱ disc[4].

Then it goes back to 2 and discovers that low[4] ≥ disc[2], which means 2 is an articulation point. So all the edges inserted after the edge 4-2, along with the edge 4-2 itself, form the second biconnected component. It prints and pops all the edges until the last edge is 4-2, then prints and pops that too.

Then it goes to 3 and discovers that 3 is an articulation point, as low[2] ≥ disc[3], so it prints and pops until the last edge is 3-2, then prints and pops that too. That forms the third biconnected component.

Now finally it discovers that for the edge 0-3, too, low[3] ≥ disc[0], so it pops it from the stack and prints it as the fourth biconnected component.

Then it checks the visited[] values of the other vertices, and as it is true for all of them, the algorithm terminates.

So ultimately the algorithm discovers all the 4 biconnected components shown in Fig. 6.

The time complexity of the algorithm is the same as that of DFS: if V is the number of vertices and E is the number of edges, then the complexity is O(V+E).

Practice:

Link to the Practice Problems:​ Practice Questions

Strongly Connected Components

Connectivity in an undirected graph means that every vertex can reach every other vertex via some path. If the graph is not connected, it can be broken down into Connected Components.

Strong Connectivity applies only to directed graphs. A directed graph is strongly connected if there is a directed path from any vertex to every other vertex. This is the same as connectivity in an undirected graph, the only difference being that strong connectivity applies to directed graphs and there should be directed paths instead of just paths. Similar to connected components, a directed graph can be broken down into Strongly Connected Components.

Basic/Brute Force method to find Strongly Connected Components:

Strongly connected components can be found one by one: first the strongly connected component including node 1 is found. Then, if node 2 is not included in the strongly connected component of node 1, a similar process, outlined below, can be used for node 2; otherwise the process moves on to node 3, and so on.

So, how do we find the strongly connected component which includes node 1? Start with a list containing all the nodes; nodes will be deleted from it one by one once it is sure that a particular node does not belong to the strongly connected component of node 1. So, initially all nodes from 1 to N are in the list. Let the length of the list be LEN, the current index be IND and the element at the current index be ELE. Now, for each of the elements at indices IND+1, ..., LEN (call such an element OtherElement), it can be checked whether there is a directed path from OtherElement to ELE by a single O(V+E) DFS, and whether there is a directed path from ELE to OtherElement, again by a single O(V+E) DFS. If not, OtherElement can be safely deleted from the list.

After all these steps, the list has the following property: every element can reach ELE, and ELE can reach every element via a directed path. But the elements of this list may or may not form a strongly connected component, because it is not yet confirmed that every vertex in the list (excluding ELE) can reach every other vertex of the list (excluding ELE).

To do this, a similar process to the one mentioned above is applied to the next element of the list (at index IND+1). This process checks whether the elements at indices IND+2, ..., LEN have a directed path to the element at index IND+1, and whether the element at index IND+1 has a directed path to those vertices. If not, such nodes are deleted from the list. In this way, one by one, the process keeps deleting elements that cannot be in the Strongly Connected Component of node 1.

In the end, the list will contain the Strongly Connected Component that includes node 1. Now, to find the other Strongly Connected Components, a similar process must be applied to the next element (that is, node 2), but only if it has not already been a part of some previous Strongly Connected Component (here, the Strongly Connected Component of node 1). Else, the process continues to node 3, and so on.

The time complexity of the above algorithm is O(V³).

Kosaraju's linear time algorithm to find Strongly Connected Components:

This algorithm just does DFS twice and has a much better complexity, O(V+E), than the brute force approach. First, define a Condensed Component Graph as a graph with ≤V nodes and ≤E edges, in which every node is a Strongly Connected Component and there is an edge from C to C′, where C and C′ are Strongly Connected Components, if there is an edge from any node of C to any node of C′.

It can be proved that the Condensed Component Graph is a Directed Acyclic Graph (DAG). To prove it, assume the contrary, i.e. that it is not a DAG and there is a cycle. Now observe that on the cycle, every strongly connected component can reach every other strongly connected component via a directed path, which in turn means that every node on the cycle can reach every other node on the cycle, because in a strongly connected component every node can be reached from any other node of the component. So if there is a cycle, the cycle can be replaced with a single node, because all the Strongly Connected Components on that cycle together form one Strongly Connected Component.

Therefore, the Condensed Component Graph is a DAG. Now, a DAG has the property that there is at least one node with no incoming edges and at least one node with no outgoing edges. Call these two nodes the Source and Sink nodes. Now observe that if a DFS is done from any node in the Sink (which is a collection of nodes, as it is a Strongly Connected Component), only the nodes in the Strongly Connected Component of the Sink are visited. Removing the sink also results in a DAG, with possibly another sink, so the above process can be repeated until all Strongly Connected Components are discovered. So at each step some node of the current Sink must be known, and this must be found efficiently.

Now a property can be proved for any two nodes C and C′ of the Condensed Component Graph that share an edge, say C→C′: the finish time of the DFS of some node in C is always higher than the finish time of all nodes of C′.

Proof: There are 2 cases, depending on whether DFS first discovers a node in C or a node in C′.

Case 1: DFS first discovers a node in C. At some time during the DFS, nodes of C′ start getting discovered (because there is an edge from C to C′); then all nodes of C′ are discovered and their DFS finishes in some time (why? because C′ is a Strongly Connected Component and the DFS visits everything it can before it backtracks to the node in C from which the first visited node of C′ was called). Therefore, in this case, the finish time of some node of C is always higher than the finish time of all nodes of C′.

Case 2: DFS first discovers a node in C′. No node of C has been discovered yet. The DFS of C′ visits every node of C′, and maybe more Strongly Connected Components if there is an edge from C′ to them. Observe that no node of C will be discovered during this, because there is no edge from C′ to C. Therefore the DFS of every node of C′ is already finished before the DFS of any node of C has even started. So clearly the finish time of some node (in this case, all nodes) of C is higher than the finish time of all nodes of C′.

So, if there is an edge from C to C′ in the condensed component graph, the finish time of some node of C is higher than the finish time of all nodes of C′. In other words, a topological sorting (a linear arrangement of nodes in which edges go from left to right) of the condensed component graph can be done, and then some node in the leftmost Strongly Connected Component has a higher finishing time than all nodes in the Strongly Connected Components to its right in the topological sorting.

Now the only problem left is how to find some node in the sink Strongly Connected Component of the condensed component graph. The condensed component graph can be reversed; then all the sources become sinks and all the sinks become sources. Note that the Strongly Connected Components of the reversed graph are the same as the Strongly Connected Components of the original graph.

Now a DFS can be done on the new sinks, which will again lead to finding Strongly Connected Components. And now the order in which DFS on the new sinks needs to be done is known: the order of decreasing finishing times in the DFS of the original graph. This is because it was already proved that an edge from C to C′ in the original condensed component graph means that the finish time of some node of C is always higher than the finish time of all nodes of C′. So when the graph is reversed, the sink will be the Strongly Connected Component containing the node with the highest finishing time. Since the edges are reversed, a DFS from the node with the highest finishing time visits only its own Strongly Connected Component.

Now a DFS can be done from the next valid node (valid means not yet visited in a previous DFS) which has the next highest finishing time. In this way all Strongly Connected Components will be found. The complexity of the above algorithm is O(V+E), and it requires only 2 DFSs.
The algorithm in steps can be described as below:

1) Do a DFS on the original graph, keeping track of the finish time of each node. This can be done with a stack: when the DFS of some vertex finishes, push that vertex onto the stack. This way the node with the highest finishing time will be on top of the stack.

2) Reverse the original graph; this can be done efficiently if the data structure used to store the graph is an adjacency list.

3) Do DFS on the reversed graph, with the source vertex being the vertex on top of the stack. When the DFS finishes, all the nodes visited will form one Strongly Connected Component. If any more nodes remain unvisited, this means there are more Strongly Connected Components, so pop vertices from the top of the stack until a valid unvisited node is found; it will have the highest finishing time of all currently unvisited nodes. This step is repeated until all nodes are visited.

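A compact C++ sketch of these three steps, assuming the graph g is an adjacency list over vertices 0 to n-1 and using recursion for brevity, might look like this:

    #include <bits/stdc++.h>
    using namespace std;

    void fillOrder(int u, vector<vector<int>>& g, vector<bool>& vis, stack<int>& st) {
        vis[u] = true;
        for (int v : g[u]) if (!vis[v]) fillOrder(v, g, vis, st);
        st.push(u);                                   // pushed when the DFS of u finishes
    }

    void collect(int u, vector<vector<int>>& gr, vector<bool>& vis, vector<int>& comp) {
        vis[u] = true;
        comp.push_back(u);
        for (int v : gr[u]) if (!vis[v]) collect(v, gr, vis, comp);
    }

    vector<vector<int>> kosaraju(int n, vector<vector<int>>& g) {
        stack<int> st;
        vector<bool> vis(n, false);
        for (int u = 0; u < n; ++u)                   // step 1: order vertices by finish time
            if (!vis[u]) fillOrder(u, g, vis, st);

        vector<vector<int>> gr(n);                    // step 2: reverse every edge
        for (int u = 0; u < n; ++u)
            for (int v : g[u]) gr[v].push_back(u);

        fill(vis.begin(), vis.end(), false);
        vector<vector<int>> sccs;
        while (!st.empty()) {                         // step 3: DFS in decreasing finish time
            int u = st.top(); st.pop();
            if (!vis[u]) {
                sccs.push_back({});
                collect(u, gr, vis, sccs.back());
            }
        }
        return sccs;
    }
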
Practice:

Link to the Practice Problems:​ Practice Questions

Topological Sort

Topological sorting of the vertices of a Directed Acyclic Graph is an ordering of the vertices v1, v2, ..., vn in such a way that if there is an edge directed from vertex vi towards vertex vj, then vi comes before vj.

For example consider the graph given below:

A topological sorting of this graph is: 1 2 3 4 5

There can be multiple topological sortings for a graph. For the graph given above, another topological sorting is: 1 2 3 5 4.
In order to have a topological sorting, the graph must not contain any cycles. To prove it, let's assume there is a cycle made of the vertices v1, v2, v3, ..., vn. That means there is a directed edge between vi and vi+1 (1 ≤ i < n) and between vn and v1. Now, if we do a topological sorting, then vn must come before v1 because of the directed edge from vn to v1. But clearly vi+1 comes after vi, because of the directed edge from vi to vi+1, which means v1 must come before vn. We have reached a contradiction here. So topological sorting can be achieved only for directed acyclic graphs.

Let's see how we can find a topological sorting in a graph. Basically we want to find a permutation of the vertices in which, for every vertex vi, all the vertices vj having edges coming out of them and directed towards vi come before vi. We'll maintain an array T that will denote our topological sorting. So, for a graph having N vertices, we keep an array in_degree[] of size N whose i-th element tells the number of vertices which are not already inserted in T and have an edge incident on the vertex numbered i. We append a vertex vi to the array T, and when we do that we decrease the value of in_degree[vj] by 1 for every edge from vi to vj. Doing this means that we have accounted for one more inserted vertex having an edge directed towards vj. So at any point we can insert only those vertices for which the value of in_degree[] is 0.

The algorithm using a BFS traversal is given below:

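As a concrete reference, here is a minimal C++ sketch of this BFS-based approach, assuming the graph is an adjacency list adj over vertices 0 to n-1:

    #include <bits/stdc++.h>
    using namespace std;

    vector<int> topoSortBFS(int n, vector<vector<int>>& adj) {
        vector<int> in_degree(n, 0), T;
        for (int u = 0; u < n; ++u)
            for (int v : adj[u]) ++in_degree[v];
        queue<int> q;
        for (int u = 0; u < n; ++u)
            if (in_degree[u] == 0) q.push(u);        // vertices with no incoming edges
        while (!q.empty()) {
            int u = q.front(); q.pop();
            T.push_back(u);
            for (int v : adj[u])
                if (--in_degree[v] == 0) q.push(v);  // all predecessors of v are now in T
        }
        return T;   // T has fewer than n elements iff the graph contains a cycle
    }
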
Let's take a graph and see the algorithm in action. Consider the graph
given below:

Initially in_degree[0] = 0 and T is empty.

So, we delete 0 from the Queue and append it to T. The vertices directly connected to 0 are 1 and 2, so we decrease their in_degree[] by 1. Now in_degree[1] = 0, so 1 is pushed into the Queue.

Next we delete 1 from the Queue and append it to T. Doing this we decrease in_degree[2] by 1; it now becomes 0, and 2 is pushed into the Queue.

So, we continue like this, and further iterations look as follows:

So at last we get our topological sorting in T, i.e.: 0, 1, 2, 3, 4, 5

The solution using a DFS traversal, unlike the one using BFS, does not need any special in_degree[] array. Following is the pseudo code of the DFS solution:

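A minimal C++ sketch of the DFS solution, under the same assumptions as before, might look like this:

    #include <bits/stdc++.h>
    using namespace std;

    void dfsTopo(int u, vector<vector<int>>& adj, vector<bool>& vis, stack<int>& st) {
        vis[u] = true;
        for (int v : adj[u]) if (!vis[v]) dfsTopo(v, adj, vis, st);
        st.push(u);   // u finishes only after everything reachable from it
    }

    vector<int> topoSortDFS(int n, vector<vector<int>>& adj) {
        vector<bool> vis(n, false);
        stack<int> st;
        for (int u = 0; u < n; ++u)
            if (!vis[u]) dfsTopo(u, adj, vis, st);
        vector<int> T;
        while (!st.empty()) { T.push_back(st.top()); st.pop(); }  // reverse finish order
        return T;
    }
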
The following image shows the state of the stack and of the array T in the above code for the same graph shown above.

Practice:

Link to the Practice Problems:​ Practice Questions

Hamiltonian Path

A Hamiltonian Path is a path in a directed or undirected graph that visits each vertex exactly once. The problem of checking whether a graph (directed or undirected) contains a Hamiltonian Path is NP-complete, and so is the problem of finding all the Hamiltonian Paths in a graph. The following images explain the idea behind Hamiltonian Paths more clearly.

The graph shown in Fig. 1 does not contain any Hamiltonian Path. The graph shown in Fig. 2 contains two Hamiltonian Paths, which are highlighted in Fig. 3 and Fig. 4.

Following are some ways of checking whether a graph contains a Hamiltonian Path or not.

1) A Hamiltonian Path in a graph having N vertices is nothing but a permutation [v1, v2, v3, ..., vN-1, vN] of the vertices of the graph, such that there is an edge between vi and vi+1 for every 1 ≤ i ≤ N-1. So it can be checked, for all permutations of the vertices, whether any of them represents a Hamiltonian Path. For example, for the graph given in Fig. 2 there are 4 vertices, which means 24 possible permutations in total, out of which only the following represent a Hamiltonian Path:

0-1-2-3
3-2-1-0
0-1-3-2
2-3-1-0

Following is the pseudo code of the above algorithm:

The function get_next_permutation(p) generates the lexicographically next greater permutation than p.

Following is the C++ implementation:

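A minimal C++ sketch of this method, using std::next_permutation in place of get_next_permutation and assuming an n×n boolean adjacency matrix adj, might look like this:

    #include <bits/stdc++.h>
    using namespace std;

    bool hasHamiltonianPath(int n, vector<vector<bool>>& adj) {
        vector<int> p(n);
        iota(p.begin(), p.end(), 0);                  // first permutation: 0, 1, ..., n-1
        do {
            bool ok = true;
            for (int i = 0; i + 1 < n && ok; ++i)
                if (!adj[p[i]][p[i + 1]]) ok = false; // consecutive vertices must share an edge
            if (ok) return true;
        } while (next_permutation(p.begin(), p.end())); // lexicographically next permutation
        return false;
    }
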
The time complexity of the above method can be easily derived. For a graph having N vertices it visits all the permutations of the vertices, i.e. N! iterations, and in each of those iterations it traverses the permutation to see if adjacent vertices are connected or not, i.e. N iterations, so the complexity is O(N × N!).
2) There is an algorithm given by Bellman, Held, and Karp which uses dynamic programming to check whether a Hamiltonian Path exists in a graph. Here's the idea: for every subset S of vertices, check whether there is a path that visits EACH and ONLY the vertices in S exactly once and ends at a vertex v. Do this for all v ∈ S. A path exists that visits each vertex in subset S exactly once and ends at vertex v ∈ S iff v has a neighbor w in S and there is a path that visits each vertex in the set S−{v} exactly once and ends at w. If there is such a path, then adding the edge w-v to it extends it to visit v, and as it already visits every vertex in S−{v}, the new path visits every vertex in S.

For example, consider the graph given in Fig. 2, and let S = {0, 1, 2} and v = 2. Clearly 2 has a neighbor in the set, i.e. 1. A path exists that visits 0, 1, and 2 exactly once and ends at 2 if there is a path that visits each vertex in the set S−{2} = {0, 1} exactly once and ends at 1. Well yes, there exists such a path, i.e. 0-1, and adding the edge 1-2 to it makes the new path 0-1-2. So there is a path that visits 0, 1 and 2 exactly once and ends at 2.
Following is the pseudo code for the above algorithm; it uses bitmasking to represent subsets (learn about bitmasking here):

Let's try to understand it. The cell dp[j][i] tells whether there is a path that visits each vertex in the subset represented by mask i and ends at vertex j. In the first 3 lines every cell of the table dp is initialized as false, and in the following two lines the cells (i, 2^i), 0 ≤ i < n, are initialized as true. In the binary representation of 2^i only the i-th bit is 1, which means 2^i represents the subset containing only the vertex i. So the cell dp[i][2^i] tells whether there is a path that visits the vertex i exactly once and ends at vertex i, and of course for every vertex it should be true.

The next loop iterates over all the bitmasks from 0 to 2^n − 1, that is, over all the subsets of the vertices. The loop inside it checks which of the vertices 0 to n−1 are present in the subset S represented by bitmask i. The third loop inside that checks, for every vertex j present in S, which of the vertices 0 to n−1 are present in S and are neighbors of j. Then, for every such vertex k, it checks whether the cell dp[k][i XOR 2^j] is true or not.

What does this cell represent? In the binary representation of i XOR 2^j, every bit which is 1 in i remains 1 except the j-th bit. So i XOR 2^j represents the subset S−{j}, and the cell dp[k][i XOR 2^j] tells whether there is a path that visits each vertex in the subset S−{j} exactly once and ends at k. If there is such a path, then adding the edge k-j extends it to a path that visits each vertex in S exactly once and ends at j. So dp[j][i] will be true if there is such a path.

Finally there is a loop that iterates over all the vertices 0 to n−1 and checks whether the cell dp[i][2^n − 1] is true, where 0 ≤ i < n. In the binary representation of 2^n − 1 every bit is 1, so it represents the set containing all the vertices, and the cell dp[i][2^n − 1] tells whether there is a path that visits every vertex exactly once and ends at i. If there is such a path, the function returns true, i.e. there is a Hamiltonian Path in the given graph. The last line returns false, meaning no Hamiltonian Path was found in the given graph.
Following is the C++ implementation of the above method:

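A minimal C++ sketch of this dynamic programming method, assuming an n×n boolean adjacency matrix adj, might look like this:

    #include <bits/stdc++.h>
    using namespace std;

    // dp[j][i] is true iff some path visits exactly the vertices in mask i and ends at j.
    bool hamiltonianPathDP(int n, vector<vector<bool>>& adj) {
        vector<vector<bool>> dp(n, vector<bool>(1 << n, false));
        for (int i = 0; i < n; ++i) dp[i][1 << i] = true;    // single-vertex paths
        for (int mask = 0; mask < (1 << n); ++mask)
            for (int j = 0; j < n; ++j) {
                if (!(mask & (1 << j))) continue;            // j must belong to the subset
                for (int k = 0; k < n; ++k)
                    if (k != j && (mask & (1 << k)) && adj[k][j] && dp[k][mask ^ (1 << j)]) {
                        dp[j][mask] = true;                  // extend the path ending at k by edge k-j
                        break;
                    }
            }
        for (int i = 0; i < n; ++i)
            if (dp[i][(1 << n) - 1]) return true;            // full vertex set, any end vertex
        return false;
    }
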
Here's how the table dp looks for the graph given in Fig. 2, filled up to the mask 6.

Let's fill it for the mask 7, i.e. for the subset S = {0, 1, 2}.

For the cell dp[0][7]: 0 has a neighbor in S, i.e. 1. Check if there is a path that visits each vertex in the subset represented by 7 XOR 2^0 = 6, i.e. {1, 2}, exactly once and ends at 1, i.e. the cell dp[1][6]. It is true, so dp[0][7] will also be true.

For the cell dp[1][7]: 1 has two neighbors in S, i.e. 0 and 2. So check for the bitmask 7 XOR 2^1 = 5, i.e. the subset {0, 2}. Here both the cells dp[0][5] and dp[2][5] are false, so the cell dp[1][7] remains false.

For the cell dp[2][7]: 2 has a neighbor in S, i.e. 1. Check if there is a path that visits each vertex in the subset represented by 7 XOR 2^2 = 3, i.e. {0, 1}, exactly once and ends at 1, i.e. the cell dp[1][3]. It is true, so dp[2][7] will also be true.

For the cell dp[3][7]: clearly 3 ∉ {0, 1, 2}, so dp[3][7] remains false.

Here's how the complete table will look.

Now clearly the cells dp[0][15], dp[2][15] and dp[3][15] are true, so the graph contains a Hamiltonian Path.

The time complexity of the above algorithm is O(2^n · n^2).

3) Depth first search and backtracking can also help check whether a Hamiltonian Path exists in a graph. Simply apply depth first search starting from every vertex v and label the vertices. Each vertex is labelled either "IN STACK" or "NOT IN STACK": a vertex is labelled "IN STACK" if it lies on the path currently being explored, and "NOT IN STACK" otherwise.

If at any instant the number of vertices with the label "IN STACK" is equal to the total number of vertices in the graph, then a Hamiltonian Path exists in the graph.

The following image shows how this algorithm works for the graph shown in Fig. 1.

The above image shows how it works when the DFS is started from vertex 1. Clearly, the number of vertices having the label IN_STACK never becomes equal to 4 at any stage, which means there is no Hamiltonian Path that starts at 1. When DFS is applied starting from vertices 2, 3 and 4, the same result is obtained. So there is no Hamiltonian Path in the given graph.
Following is the C++ implementation:

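A minimal C++ sketch of this approach, assuming an n×n boolean adjacency matrix adj and representing the IN STACK / NOT IN STACK labels with a boolean array inStack[], might look like this:

    #include <bits/stdc++.h>
    using namespace std;

    bool dfsHam(int u, int count, int n, vector<vector<bool>>& adj, vector<bool>& inStack) {
        if (count == n) return true;             // the current path already has all n vertices
        inStack[u] = true;                       // label u as IN STACK
        for (int v = 0; v < n; ++v)
            if (adj[u][v] && !inStack[v] && dfsHam(v, count + 1, n, adj, inStack))
                return true;
        inStack[u] = false;                      // backtrack: label u as NOT IN STACK
        return false;
    }

    bool hasHamiltonianPath(int n, vector<vector<bool>>& adj) {
        vector<bool> inStack(n, false);
        for (int s = 0; s < n; ++s)              // try every starting vertex
            if (dfsHam(s, 1, n, adj, inStack)) return true;
        return false;
    }
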
Worst case complexity of using DFS and backtracking is O(N!).

Practice:

Link to the Practice Problems:​ Practice Questions

Maximum flow

In graph theory, a flow network is defined as a directed graph involving a source (S) and a sink (T) and several other nodes connected with edges. Each edge has an individual capacity, which is the maximum limit of flow that the edge can allow.
Flow in the network should satisfy the following conditions:
● For any non-source and non-sink node, the input flow is equal to the output flow.
● For any edge Ei in the network, 0 ≤ flow(Ei) ≤ Capacity(Ei).
● The total flow out of the source node is equal to the total flow into the sink node.
● Net flow in the edges follows skew symmetry, i.e. F(u,v) = −F(v,u), where F(u,v) is the flow from node u to node v. This leads to the conclusion that, to find the net flow between two nodes initially, you have to sum up all the flows between them (in either direction).
Maximum Flow:
It is defined as the maximum amount of flow that the network allows from source to sink. Multiple algorithms exist for solving the maximum flow problem. Two major algorithms for these kinds of problems are the Ford-Fulkerson algorithm and Dinic's algorithm. They are explained below.
Ford-Fulkerson Algorithm:
It was developed by L. R. Ford, Jr. and D. R. Fulkerson in 1956. A pseudocode for this algorithm is given below.
Inputs required are the network graph G, the source node S and the sink node T.

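As a hedged sketch, a C++ version of Ford-Fulkerson that finds augmenting paths with BFS (the Edmonds-Karp variant), assuming capacities are given as an n×n matrix cap, might look like this:

    #include <bits/stdc++.h>
    using namespace std;

    // Ford-Fulkerson with BFS (Edmonds-Karp). cap is copied and used as the residual graph.
    int maxFlow(vector<vector<int>> cap, int s, int t) {
        int n = cap.size(), flow = 0;
        while (true) {
            vector<int> parent(n, -1);
            parent[s] = s;
            queue<int> q;
            q.push(s);
            while (!q.empty() && parent[t] == -1) {      // BFS for an augmenting path
                int u = q.front(); q.pop();
                for (int v = 0; v < n; ++v)
                    if (parent[v] == -1 && cap[u][v] > 0) {
                        parent[v] = u;
                        q.push(v);
                    }
            }
            if (parent[t] == -1) break;                  // no augmenting path: flow is maximum
            int bottleneck = INT_MAX;
            for (int v = t; v != s; v = parent[v])
                bottleneck = min(bottleneck, cap[parent[v]][v]);
            for (int v = t; v != s; v = parent[v]) {     // update the residual graph
                cap[parent[v]][v] -= bottleneck;         // subtract along the path
                cap[v][parent[v]] += bottleneck;         // add the reverse edge
            }
            flow += bottleneck;
        }
        return flow;
    }
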
An augmenting path is a simple path from source to sink which does not include any cycles and passes only through edges with positive residual capacity. A residual network graph indicates how much more flow is allowed in each edge of the network graph. If no augmenting paths are possible from S to T, then the flow is maximum. The result, i.e. the maximum flow, is the total flow out of the source node, which is also equal to the total flow into the sink node.

A demonstration of the working of the Ford-Fulkerson algorithm is shown below with the help of diagrams.

Implementation:

● An augmenting path in the residual graph can be found using DFS or BFS.
● Updating the residual graph includes the following steps (refer to the diagrams for better understanding):
○ For every edge in the augmenting path, the value of the minimum capacity along the path is subtracted from all the edges of that path.
○ An edge of an equal amount is added to the edges in the reverse direction for every pair of successive nodes in the augmenting path.

The complexity of the Ford-Fulkerson algorithm cannot be accurately computed, as it depends entirely on the paths chosen from source to sink. For example, considering the network shown below, if each time the paths chosen are S−A−B−T and S−B−A−T alternately, then it can take a very long time; instead, choosing only the paths S−A−T and S−B−T would also generate the maximum flow.

Dinic's Algorithm

In 1970, Y. A. Dinitz developed a faster algorithm for calculating the maximum flow over networks. It includes the construction of level graphs and residual graphs, and the finding of augmenting paths along with blocking flows.

A level graph is one where the value of each node is its shortest distance from the source. A blocking flow is a flow in which every path from the source to the sink in the level graph contains at least one saturated (full-capacity) edge, so no more flow can be pushed along the current level graph. Residual graphs and augmenting paths were discussed previously.

Pseudocode for Dinic's algorithm is given below.

Inputs required are the network graph G, the source node S and the sink node T.

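A compact C++ sketch of Dinic's algorithm, assuming edges are added through the addEdge method shown, might look like this:

    #include <bits/stdc++.h>
    using namespace std;

    struct Dinic {
        struct Edge { int to, rev; long long cap; };
        vector<vector<Edge>> g;
        vector<int> level, iter;
        Dinic(int n) : g(n), level(n), iter(n) {}

        void addEdge(int u, int v, long long cap) {
            g[u].push_back({v, (int)g[v].size(), cap});
            g[v].push_back({u, (int)g[u].size() - 1, 0});   // residual (reverse) edge
        }

        bool bfs(int s, int t) {                 // build the level graph
            fill(level.begin(), level.end(), -1);
            queue<int> q;
            level[s] = 0; q.push(s);
            while (!q.empty()) {
                int u = q.front(); q.pop();
                for (auto& e : g[u])
                    if (e.cap > 0 && level[e.to] < 0) {
                        level[e.to] = level[u] + 1;
                        q.push(e.to);
                    }
            }
            return level[t] >= 0;                // false: sink unreachable, flow is maximum
        }

        long long dfs(int u, int t, long long f) {   // push flow along the level graph
            if (u == t) return f;
            for (int& i = iter[u]; i < (int)g[u].size(); ++i) {
                Edge& e = g[u][i];
                if (e.cap > 0 && level[e.to] == level[u] + 1) {
                    long long d = dfs(e.to, t, min(f, e.cap));
                    if (d > 0) { e.cap -= d; g[e.to][e.rev].cap += d; return d; }
                }
            }
            return 0;
        }

        long long maxflow(int s, int t) {
            long long flow = 0, f;
            while (bfs(s, t)) {                  // one phase per level graph
                fill(iter.begin(), iter.end(), 0);
                while ((f = dfs(s, t, LLONG_MAX)) > 0) flow += f;  // blocking flow
            }
            return flow;
        }
    };
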
Updating the level graph includes the removal of edges with full capacity, and the removal of nodes that are dead ends and are not the sink. A demonstration of the working of Dinic's algorithm is shown below with the help of diagrams.

Practice:

Link to the Practice Problems:​ Practice Questions

Minimum Cost Maximum Flow

The Minimum Cost Flow problem is a way of minimizing the cost required to deliver the maximum amount of flow possible in the network. It can be seen as an extension of the maximum flow problem with an added constraint on the cost (per unit flow) of each edge. One other difference of min-cost flow from normal max flow is that here the source and sink have strict bounds on the flow they can produce or take in respectively: B(s) > 0, B(t) < 0. Intermediate nodes have no bounds, which can be represented as B(x) = 0.

Cycle Cancelling Algorithm:

This algorithm is used to find the min cost flow in a flow network. Pseudo code for this algorithm is provided below.

A negative cycle in the cost network (Gc) is a cycle in which the sum of the costs of all the edges is negative. Negative cycles can be detected using the Bellman-Ford algorithm. They should be eliminated because, practically, flow through such cycles cannot be allowed. Consider a negative cost cycle: if flow has to pass through this cycle at all, the total cost keeps reducing for every completed traversal of the cycle, which would result in an infinite loop in the desire to minimize the total cost. So, whenever a cost network includes a negative cycle, it implies that the cost can be further minimized (by flowing through the other side of the cycle instead of the side currently considered). A negative cycle, once detected, is removed by flowing the bottleneck capacity through all the edges in the cycle.
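As a sketch of this detection step, assuming the cost network is given as a plain edge list (the Edge struct and function name here are illustrative), a Bellman-Ford based negative-cycle finder in C++ might look like this:

    #include <bits/stdc++.h>
    using namespace std;

    struct Edge { int u, v; long long cost; };

    // Returns the vertices of one negative-cost cycle (first vertex repeated at the
    // end to close the cycle), or an empty vector if there is none.
    vector<int> findNegativeCycle(int n, const vector<Edge>& edges) {
        vector<long long> dist(n, 0);      // all-zero start acts like a virtual super source
        vector<int> pred(n, -1);
        int x = -1;
        for (int round = 0; round < n; ++round) {   // n rounds of relaxation
            x = -1;
            for (const Edge& e : edges)
                if (dist[e.u] + e.cost < dist[e.v]) {
                    dist[e.v] = dist[e.u] + e.cost;
                    pred[e.v] = e.u;
                    x = e.v;
                }
        }
        if (x == -1) return {};            // nothing relaxed in round n: no negative cycle
        for (int i = 0; i < n; ++i) x = pred[x];    // walk back n steps to land on the cycle
        vector<int> cycle;
        for (int v = x;; v = pred[v]) {
            cycle.push_back(v);
            if (v == x && cycle.size() > 1) break;
        }
        reverse(cycle.begin(), cycle.end());
        return cycle;
    }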

There are various applications of the minimum cost flow problem, one of which is solving minimum weighted bipartite matching. A bipartite graph B is a graph whose nodes can be divided into two disjoint sets (P and Q) such that every edge of the graph joins a node in P to a node in Q. Matching means that no two edges in the final flow touch each other (share a common node). It can be considered a multi-source, multi-destination graph. Convert such a graph to a single-source, single-destination graph by creating a source node S joined to all nodes in set P, and a destination node T joined to all nodes in set Q. Now the above algorithm can be applied to find the min cost max flow in graph B.

Hungarian Algorithm:
A variant of the weighted bipartite matching problem is known as the assignment problem. In simple terms, the assignment problem can be described as having N jobs and N workers, where each worker does a job for a particular cost. Also, each worker should be given only one job and each job should be assigned to only one worker. This can be solved using the Hungarian algorithm. Pseudocode for this problem is given below.

Input will be an N×N matrix showing the cost charged by each worker for each job.

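The pseudocode referred to above works on a matrix of reduced costs; as a hedged alternative sketch, the standard O(N³) potentials-based formulation of the Hungarian algorithm can be written in C++ as follows, assuming a 1-indexed N×N cost matrix a (all names here are illustrative):

    #include <bits/stdc++.h>
    using namespace std;

    // a[i][j] = cost of worker i doing job j (1-indexed). Returns the minimum total cost.
    int hungarian(int n, vector<vector<int>>& a) {
        const int INF = INT_MAX / 2;
        vector<int> u(n + 1, 0), v(n + 1, 0);    // potentials for rows and columns
        vector<int> p(n + 1, 0), way(n + 1, 0);  // p[j] = row currently matched to column j
        for (int i = 1; i <= n; ++i) {
            p[0] = i;
            int j0 = 0;
            vector<int> minv(n + 1, INF);
            vector<bool> used(n + 1, false);
            do {                                 // Dijkstra-like search for an augmenting path
                used[j0] = true;
                int i0 = p[j0], delta = INF, j1 = 0;
                for (int j = 1; j <= n; ++j)
                    if (!used[j]) {
                        int cur = a[i0][j] - u[i0] - v[j];
                        if (cur < minv[j]) { minv[j] = cur; way[j] = j0; }
                        if (minv[j] < delta) { delta = minv[j]; j1 = j; }
                    }
                for (int j = 0; j <= n; ++j)
                    if (used[j]) { u[p[j]] += delta; v[j] -= delta; }
                    else minv[j] -= delta;
                j0 = j1;
            } while (p[j0] != 0);
            do {                                 // augment the matching along the found path
                int j1 = way[j0];
                p[j0] = p[j1];
                j0 = j1;
            } while (j0);
        }
        int cost = 0;
        for (int j = 1; j <= n; ++j) cost += a[p[j]][j];
        return cost;
    }
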
FindMinCost does an optimal selection of 0s in the matrix X such that N cells are selected and none of them lie in the same row or column. The values of the cells in C corresponding to the selected cells in X are added up and returned as the answer for the minimum cost that is to be calculated.

Practice:

Link to the Practice Problems:​ Practice Questions

Min-cut

The Min-Cut of a weighted graph is defined as the minimum sum of weights of (at least one) edges that, when removed from the graph, divide the graph into two groups. Mechthild Stoer and Frank Wagner proposed an algorithm in 1995 to find the minimum cut in an undirected weighted graph.

The algorithm works by shrinking the graph, merging the most tightly connected vertices, until only one node is left in the graph; for each phase performed, the weight of the cut of that phase is stored in a list. The minimum value in the list is the minimum cut value of the graph.

Pseudocode for the algorithm is given below:

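As a hedged sketch, assuming the graph is given as an n×n symmetric weight matrix mat (0 where there is no edge, and weights small enough to avoid integer overflow), the whole algorithm can be written in C++ as follows:

    #include <bits/stdc++.h>
    using namespace std;

    // Stoer-Wagner global min cut. mat is taken by value and modified by merging.
    int stoerWagner(vector<vector<int>> mat) {
        int n = mat.size(), best = INT_MAX;
        for (int phase = 1; phase < n; ++phase) {
            vector<int> w = mat[0];                  // connectivity of every node to A = {0}
            int s = 0, t = 0;
            for (int it = 0; it < n - phase; ++it) {
                w[t] = INT_MIN;                      // t joins A
                s = t;
                t = max_element(w.begin(), w.end()) - w.begin();   // most tightly connected
                for (int i = 0; i < n; ++i) w[i] += mat[t][i];
            }
            best = min(best, w[t] - mat[t][t]);      // cut-of-the-phase: last vertex vs rest
            for (int i = 0; i < n; ++i) mat[s][i] += mat[t][i];    // merge t into s
            for (int i = 0; i < n; ++i) mat[i][s] = mat[s][i];
            mat[0][t] = INT_MIN;                     // t is never picked again
        }
        return best;
    }
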
The vertex a can be arbitrary and remains the same throughout the whole algorithm. In MinCutPhase, A is a collection of vertices which starts with the arbitrary vertex a and adds the other vertices of graph G one at a time until it equals V, the collection of all vertices. The most tightly connected vertex to A is the vertex whose total edge weight to the vertices in A is maximum; mathematically, it is the vertex z ∉ A such that W(A,z) = max{W(A,y) | y ∉ A}. The cut_of_the_phase is the sum of the weights of the edges connecting the vertex added last with the rest of the graph. The minimum value of cut_of_the_phase over all phases is the required result.
rest of the graph. Minimum value of cut_of_the_phase is the required
result.

The last two vertices added to the set A are merged by creating a single node for both of them, and the edges joining them are deleted. Edges connecting these vertices to other vertices are replaced with edges whose weight is the sum of the weights of the edges to both vertices. Example: let vertices P and Q be the two vertices added last to the set A, and let there be three edges connecting P and Q to the rest of the graph: E(X,P) with W(X,P)=10, E(X,Q) with W(X,Q)=20, and E(Y,Q) with W(Y,Q)=15. As they were added last, vertices P and Q are merged into a single node, say R. The edges connected to R will then be E(X,R) with W(X,R)=30 and E(Y,R) with W(Y,R)=15.

For a flow network, with the vertex a in the algorithm above being either S or T, the min-cut will be the maximum flow of the network from S to T. This is because the minimum cut is the bottleneck capacity of the flow network: all the flow from the source side to the sink side has to pass through the set of edges that are cut. Working on paper, the source should be on the left and the destination on the right. The cut should always start from the top of the graph and move to the bottom of the graph (it should not stop in between). The edges to be considered in the min-cut are the ones that go from the left of the cut to the right of the cut. The sum of the capacities of all these edges is the min-cut, which is also equal to the max-flow of the network.

The calculation of the max flow of a directed graph using the min-cut concept is shown in the image below.

A few possible cuts in the graph are shown, and their weights are as follows: Cut1: 25, Cut2: 12, Cut3: 16, Cut4: 10, Cut5: 15. As mentioned, these are only a few of the possible cuts, but no valid cut has a weight less than Cut4. So Cut4 is the min-cut of this graph, which is also the max-flow of the graph, as explained above.

Practice:

Link to the Practice Problems:​ Practice Questions
