Graph Algorithms
Definition of graph
A graph G is a pair G = (V, E), where V is the set of vertices, while E is the set of (undirected)
edges. An (undirected) edge connecting vertices u, v is denoted by { u, v }.
For instance, the sets V = {1, 2, 3, 4, 5} and E = {{1, 2}, {2, 3}, {3, 4}, {4, 5}} define a graph
with 5 vertices and 4 edges.
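The example graph above can be stored directly as Python sets; this is only a sketch of the abstract definition, not a practical representation:

```python
# The example graph above, stored directly as vertex and edge sets.
# frozenset is used because an undirected edge is an unordered pair.
V = {1, 2, 3, 4, 5}
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4}), frozenset({4, 5})}

print(len(V), len(E))  # 5 vertices, 4 edges
```

Note that `frozenset({1, 2})` and `frozenset({2, 1})` are the same object, mirroring the fact that {1, 2} and {2, 1} denote the same undirected edge.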
Basic graph concepts
1. Directed graph - When the edges in a graph have a direction, the graph is called
a directed graph or digraph, and the edges are called directed edges or arcs.
2. Undirected graph - The edges of an undirected graph are unordered pairs of nodes,
written using curly braces. For example, an undirected edge {2, 3} from vertex 2 to
vertex 3 is the same thing as an undirected edge {3, 2} from vertex 3 to vertex 2.
3. A directed acyclic graph or dag is a directed graph with no directed cycles. Any vertex
in a dag that has no incoming edges is called a source; any vertex with no outgoing
edges is called a sink. Every dag has at least one source and one sink. (Do you see why?)
4. Path - A path in a graph is a sequence of vertices and edges. The length of a path is the
number of edges on the path, which is always one less than the number of vertices.
• A simple path is a path in which all of the vertices are distinct, with the possible
exception of the first and last.
• A cycle is a path of length at least 1 in which the first and last vertices are the same.
For undirected graphs, the edges of a cycle must be distinct.
5. Connected graph vs disconnected graph - A graph is said to be connected if there
exists a path from any vertex to any other vertex of the graph. A graph is said to be
disconnected if it is not connected, i.e., if we start from one vertex of the graph and
cannot reach some other vertices, then there is no connection with those vertices.
6. Loop - A loop is an edge or arc connecting a vertex with itself.
7. Adjacent vs incident edge - Two vertices are called adjacent if they share a common
edge, in which case the common edge is said to join the two vertices. An edge and a
vertex on that edge are called incident.
8. Degree - The number of edges which connect a node.
• In Degree:Number of edges pointing to a node.
• Out Degree: Number of edges going out of a node.
REPRESENTATION OF GRAPHS
In order to run graph algorithms on a computer, we have to decide how to store the graph.
The way we draw a graph with circles and lines on the blackboard is not the way it will be
stored in the computer, i.e., computers aren't very good at interpreting that sort of input. Instead we
need a representation closer to the abstract definition of a graph. There are many ways of
representing graphs. They differ in efficiency depending on the algorithm, whether the graph is
dense or sparse, and other factors.
There are two standard ways of maintaining a graph G in the memory of a computer.
In the first case we store a matrix (two-dimensional array) of size N×N, where N is the number
of vertices. For each edge between the vertices i and j we store the value 1
(A[i][j] = 1), and 0 otherwise.
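A minimal sketch of this adjacency-matrix representation in Python (the function name and the sample edge list are made up for illustration; vertices are assumed to be numbered 0 to N-1):

```python
def adjacency_matrix(n, edges):
    """Build an n x n 0/1 adjacency matrix for an undirected graph
    whose vertices are numbered 0 .. n-1."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # the matrix is symmetric for an undirected graph
    return A

# A small path graph on 4 vertices (made-up example for illustration).
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
```

The matrix uses Θ(N²) space regardless of the number of edges, which is why it suits dense graphs.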
Example 1
The linked-list representation is also called the adjacency-list representation: we store the graph
as a linked structure. First we store all the vertices of the graph in a list, and then each adjacent
vertex is represented by a linked-list node. The terminal vertex of an edge is stored in a structure
node and linked to the corresponding initial vertex in the list.
Example 1
Below are representations of an undirected graph. (a) An undirected graph G having five vertices
and seven edges. (b) An adjacency-list representation of G. (c) The adjacency-matrix
representation of G.
Below is a representation of a directed graph. (a) A directed graph G having six vertices and
eight edges. (b) An adjacency-list representation of G. (c) The adjacency-matrix representation
of G.
Adjacency list representation of a graph is very memory efficient when the graph has a large
number of vertices but very few edges.
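A minimal sketch of an adjacency-list representation in Python, using a dictionary of neighbour lists in place of linked-list nodes (the function name is an assumption; the edges are those of the five-vertex example graph from the definition section):

```python
from collections import defaultdict

def adjacency_list(edges):
    """Map each vertex to a list of its neighbours (undirected graph)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)   # v is adjacent to u ...
        adj[v].append(u)   # ... and u is adjacent to v
    return adj

# The example graph from the definition section: V = {1, ..., 5}.
adj = adjacency_list([(1, 2), (2, 3), (3, 4), (4, 5)])
```

Space usage is Θ(V + E), which is why this representation is preferred for sparse graphs.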
GRAPH SEARCHING (TRAVERSAL)
Breadth-first search is a traversal through a graph that touches all of the vertices reachable from a
particular source vertex. In addition, the order of the traversal is such that the algorithm visits all
vertices at distance k from the source before visiting any vertex at distance k + 1.
To keep track of progress, breadth-first search colors each vertex. Each vertex of the graph is in
one of three states:
1. Undiscovered;
2. Discovered but not fully explored; and
3. Fully explored.
These three states are represented by the colors white, gray, and black, respectively.
All vertices start out white and may later become gray and then black. A vertex is discovered the
first time it is encountered during the search, at which time it becomes nonwhite. Gray and black
vertices, therefore, have been discovered, but breadth-first search distinguishes between them to
ensure that the search proceeds in a breadth-first manner. If (u, v) ∈ E and vertex u is black, then
vertex v is either gray or black; that is, all vertices adjacent to black vertices have been
discovered. Gray vertices may have some adjacent white vertices; they represent the frontier
between discovered and undiscovered vertices.
Breadth-first search (BFS) uses a queue data structure for storing nodes which are not yet
processed. In the initialization phase the algorithm starts at the source node and places it into the
queue. Then BFS processes the head H of the queue and inserts all its unprocessed descendants
into the queue. BFS repeats this step until the queue is empty.
BFS(G, s)
1  for each vertex u ∈ V[G] - {s}
2      do color[u] ← WHITE
3         d[u] ← ∞
4         π[u] ← NIL
5  color[s] ← GRAY
6  d[s] ← 0
7  π[s] ← NIL
8  Q ← {s}
9  while Q ≠ Ø
10     do u ← head[Q]
11        for each v ∈ Adj[u]
12            do if color[v] = WHITE
13                  then color[v] ← GRAY
14                       d[v] ← d[u] + 1
15                       π[v] ← u
16                       ENQUEUE(Q, v)
17        DEQUEUE(Q)
18        color[u] ← BLACK
In the breadth-first-search procedure above, the color of each vertex u ∈ V is stored in the variable
color[u], and the predecessor of u is stored in the variable π[u]. If u has no predecessor (for
example, if u = s or u has not been discovered), then π[u] = NIL. The distance from the source s to
vertex u computed by the algorithm is stored in d[u]. The algorithm also uses a first-in, first-out
queue Q to manage the set of gray vertices.
The procedure BFS works as follows. Lines 1-4 paint every vertex white, set d[u] to infinity for
every vertex u, and set the parent of every vertex to be NIL. Line 5 paints the source vertex s
gray, since it is considered to be discovered when the procedure begins. Line 6 initializes d[s] to 0,
and line 7 sets the predecessor of the source to be NIL. Line 8 initializes Q to the queue
containing just the vertex s; thereafter, Q always contains the set of gray vertices.
The loop in lines 9-18 iterates as long as there remain gray vertices, which are discovered
vertices that have not yet had their adjacency lists fully examined. Line 10 determines the gray
vertex u at the head of the queue Q. The for loop of lines 11-16 considers each vertex v in the
adjacency list of u. If v is white, then it has not yet been discovered, and the algorithm discovers
it by executing lines 13-16. It is first grayed, and its distance d[v] is set to d[u] + 1. Then, u is
recorded as its parent. Finally, it is placed at the tail of the queue Q. When all the vertices on u’s
adjacency list have been examined, u is removed from Q and blackened in lines 17-18.
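The procedure can be sketched in Python as follows (a direct transcription of the pseudocode; `adj` is assumed to map each vertex to a list of its neighbours, and NIL is represented by `None`):

```python
from collections import deque

WHITE, GRAY, BLACK = 0, 1, 2

def bfs(adj, s):
    """Breadth-first search from source s. Returns the distance table d
    and the predecessor table pi, as in the pseudocode above."""
    color = {u: WHITE for u in adj}
    d = {u: float('inf') for u in adj}
    pi = {u: None for u in adj}
    color[s] = GRAY                  # the source is discovered at the start
    d[s] = 0
    Q = deque([s])                   # Q holds exactly the gray vertices
    while Q:
        u = Q[0]                     # examine the head of the queue
        for v in adj[u]:
            if color[v] == WHITE:    # v is discovered for the first time
                color[v] = GRAY
                d[v] = d[u] + 1
                pi[v] = u
                Q.append(v)          # place v at the tail of the queue
        Q.popleft()                  # u's adjacency list is fully examined
        color[u] = BLACK
    return d, pi

# Hypothetical sample graph: s -- a, s -- b, b -- c.
d, pi = bfs({'s': ['a', 'b'], 'a': ['s'], 'b': ['s', 'c'], 'c': ['b']}, 's')
```

On this sample, d records the number of edges on a shortest path from s (for instance d['c'] = 2, reached via b).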
ANALYSIS
The operations of enqueuing and dequeuing take O(1) time, so the total time devoted to queue
operations is O(V). Because the adjacency list of each vertex is scanned only when the vertex is
dequeued, the adjacency list of each vertex is scanned at most once. Since the sum of the lengths
of all the adjacency lists is Θ(E), at most O(E) time is spent in total scanning adjacency lists.
The overhead for initialization is O(V), and thus the total running time of BFS is O(V + E). Thus,
breadth-first search runs in time linear in the size of the adjacency- list representation of G.
The following figure (from CLRS) illustrates the progress of breadth-first search on the
undirected sample graph.
A depth first search (DFS) visits all the vertices in a graph. When choosing which edge to
explore next, this algorithm always chooses to go ``deeper'' into the graph. That is, it will pick
the next adjacent unvisited vertex until reaching a vertex that has no unvisited adjacent vertices.
The algorithm will then backtrack to the previous vertex and continue along any as-yet
unexplored edges from that vertex. After DFS has visited all the reachable vertices from a
particular source vertex, it chooses one of the remaining undiscovered vertices and continues the
search. This process creates a set of depth-first trees which together form the depth-first forest.
DFS uses a stack for storing unprocessed nodes.
As in breadth-first search, vertices are colored during the search to indicate their state.
Each vertex is initially white, is grayed when it is discovered in the search, and is blackened
when it is finished, that is, when its adjacency list has been examined completely. This technique
guarantees that each vertex ends up in exactly one depth-first tree, so that these trees are disjoint.
The procedure DFS below records when it discovers vertex u in the variable d[u] and
when it finishes vertex u in the variable f[u]. These timestamps are integers between 1 and 2|V|,
since there is one discovery event and one finishing event for each of the |V| vertices. For every
vertex u, d[u] < f[u]: vertex u is WHITE before time d[u], GRAY between time d[u] and time
f[u], and BLACK thereafter.
Algorithm
The following pseudocode is the basic depth-first-search algorithm. The input graph G may be
undirected or directed. The variable time is a global variable that we use for timestamping.
DFS(G)
1  for each vertex u ∈ V[G]
2      do color[u] ← WHITE
3         π[u] ← NIL
4  time ← 0
5  for each vertex u ∈ V[G]
6      do if color[u] = WHITE
7            then DFS-VISIT(u)

DFS-VISIT(u)
1  color[u] ← GRAY          ▷ White vertex u has just been discovered.
2  d[u] ← time ← time + 1
3  for each v ∈ Adj[u]      ▷ Explore edge (u, v).
4      do if color[v] = WHITE
5            then π[v] ← u
6                 DFS-VISIT(v)
7  color[u] ← BLACK         ▷ Blacken u; it is finished.
8  f[u] ← time ← time + 1
Procedure DFS works as follows. Lines 1-3 paint all vertices white and initialize their π fields to
NIL. Line 4 resets the global time counter. Lines 5-7 check each vertex in V in turn and, when a
white vertex is found, visit it using DFS-VISIT. Every time DFS-VISIT(u) is called in line 7,
vertex u becomes the root of a new tree in the depth-first forest. When DFS returns, every vertex
u has been assigned a discovery time d[u] and a finishing time f[u].
(Notes prepared by Peninah J. Limo.)
In each call DFS-VISIT(u), vertex u is initially white. Line 1 paints u gray, and line 2 records the
discovery time d[u] by incrementing and saving the global variable time. Lines 3-6 examine each
vertex v adjacent to u and recursively visit v if it is white. As each vertex v ∈ Adj[u] is considered
in line 3, we say that edge (u, v) is explored by the depth-first search. Finally, after every edge
leaving u has been explored, lines 7-8 paint u black and record the finishing time in f[u].
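The two procedures can be sketched together in Python, with the global time counter captured in a closure (`adj` is assumed to map each vertex to a list of its neighbours):

```python
WHITE, GRAY, BLACK = 0, 1, 2

def dfs(adj):
    """Depth-first search over the whole graph, following the pseudocode
    above. Returns discovery times d, finishing times f and predecessors pi."""
    color = {u: WHITE for u in adj}
    pi = {u: None for u in adj}
    d, f = {}, {}
    time = 0                          # global timestamp counter

    def visit(u):                     # corresponds to DFS-VISIT(u)
        nonlocal time
        color[u] = GRAY               # u has just been discovered
        time += 1
        d[u] = time
        for v in adj[u]:              # explore each edge (u, v)
            if color[v] == WHITE:
                pi[v] = u
                visit(v)
        color[u] = BLACK              # u is finished
        time += 1
        f[u] = time

    for u in adj:                     # restart from every undiscovered vertex
        if color[u] == WHITE:
            visit(u)
    return d, f, pi

# Hypothetical sample: a chain u -> v -> w plus an isolated vertex x.
d, f, pi = dfs({'u': ['v'], 'v': ['w'], 'w': [], 'x': []})
```

Note how the timestamps nest: d[u] < d[v] < d[w] < f[w] < f[v] < f[u], and the largest finishing time is 2|V|.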
Example 1
In the following figure, the solid edges represent discovery or tree edges and the dashed edges
show back edges. Furthermore, each vertex has two timestamps: the first timestamp records
when the vertex is first discovered, and the second records when the search finishes
examining the adjacency list of the vertex.
Example 2
Starting from node U we can either discover node V or Y. Suppose that we discover node V,
which has a single outgoing edge to node W. W has no outgoing edges, so this node is finished,
and we return to V. From V there is no other choice, so this node is also finished and we return
to U. From node U we can continue to discover Y and its descendants, and the procedure
continues similarly. At stage (l) we have discovered and finished nodes U, V, W, X, Y. Selecting
node Q as a new starting node, we can discover the remaining nodes (in this case Z).
Classification of edges
The DFS algorithm can be modified to classify edges as it encounters them. The key idea is that
each edge (u, v) can be classified by the color of the vertex v that is reached when the edge is
first explored (except that forward and cross edges are not distinguished):
This edge classification can be used to glean important information about a graph.
1. Tree edges - If v is visited for the first time as we traverse the edge (u, v), then the edge is a
tree edge. A tree edge is an edge that connects a vertex with its parent in the depth-first forest.
2. Back edges - If v is an ancestor of u, then edge (u, v) is a back edge. A back edge is a non-tree
edge that connects a vertex with an ancestor.
3. Forward edges - If v is a descendant of u (but not its child in the tree), then edge (u, v) is a
forward edge.
4. Cross edges - If v is neither an ancestor nor a descendant of u, then edge (u, v) is a cross edge.
TOPOLOGICAL SORT
At the heart of topological sort is a depth-first search. A topological sort or topological
ordering of a directed graph is a linear ordering of its vertices such that for every directed
edge uv from vertex u to vertex v, u comes before v in the ordering. A topological sort of a graph
can be viewed as an ordering of its vertices along a horizontal line so that all directed edges go
from left to right. Topological sorting is thus different from the usual kind of "sorting".
For instance, the vertices of the graph may represent tasks to be performed, and the edges
may represent constraints that one task must be performed before another; in this application, a
topological ordering is just a valid sequence for the tasks. A topological ordering is possible if
and only if the graph has no directed cycles, that is, if it is a DAG.
Example
The following figure gives an example that arises when Professor Bumstead gets dressed in the
morning. The DAG of dependencies for putting on clothing is given (not shown here). (a)
The discovery and finishing times from depth-first search are shown next to each vertex. (b)
The same DAG shown topologically sorted. The topologically sorted order, by decreasing
finishing time, is: socks (18), underpants (16), pants (15), shoes (14), watch (10), shirt (8),
belt (7), tie (5), jacket (4).
It should be clear from the above that we don't need to sort by finish times. We can just output
vertices as they are finished and understand that we want the reverse of this list. Or we can put
each vertex onto the front of a linked list as it is finished, so that the list ends up in topologically
sorted order.
Example 2
The following figure shows a DAG and a topological ordering for the graph.
Topological-Sort(G)
1 call DFS(G) to compute the finishing time f[v] for each vertex v
2 as each vertex is finished, insert it onto the front of a linked list
3 return the linked list of vertices
We can perform a topological sort in time Θ(V + E), since depth-first search takes Θ(V + E) time
and it takes O(1) time to insert each of the |V| vertices onto the front of the linked list.
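The procedure can be sketched in Python, prepending each vertex to the output list as its DFS call finishes (a plain Python list stands in for the linked list; the task names in the sample are taken from the dressing example):

```python
def topological_sort(adj):
    """Topological sort of a DAG: run DFS and put each vertex at the
    front of the output list as it finishes, so the list ends up in
    topological order. Assumes adj describes a graph with no cycles."""
    visited = set()
    order = []

    def visit(u):
        visited.add(u)
        for v in adj.get(u, ()):
            if v not in visited:
                visit(v)
        order.insert(0, u)   # u is finished: put it at the front

    for u in adj:
        if u not in visited:
            visit(u)
    return order

# A fragment of the dressing DAG: shirt before belt and tie, both before jacket.
order = topological_sort({'shirt': ['belt', 'tie'],
                          'tie': ['jacket'],
                          'belt': ['jacket'],
                          'jacket': []})
```

Every directed edge goes left to right in the returned list, which is exactly the topological-ordering property.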
A minimum spanning tree (MST) of a weighted graph G is a spanning tree of G whose edges
sum to minimum weight. In other words, a minimum spanning tree is a tree formed from a subset
of the edges in a given undirected graph, with two properties:
• it spans the graph, i.e., it includes every vertex in the graph, and
• it is a minimum, i.e., the total weight of all the edges is as low as possible.
Minimum spanning trees have direct application in the design of networks. They are also used in:
• approximation algorithms
• cluster analysis
• handwriting recognition
• image segmentation
There are two famous algorithms for finding the minimum spanning tree:
• Kruskal’s algorithm
• Prim’s algorithm
1. KRUSKAL’S ALGORITHM
Kruskal’s algorithm builds the spanning tree by adding edges one by one into a growing
spanning tree. Kruskal's algorithm follows a greedy approach: in each iteration it finds the edge
which has the least weight and adds it to the growing spanning tree.
To do this we simply sort the edges in increasing order of their weight and select the
edges one by one from the beginning. But sometimes, selecting edges in this order can create a
cycle. To avoid cycle creation, we have to check whether the selected edge creates a cycle. If
it creates a cycle then we simply ignore it; otherwise we add it into the growing spanning tree. This
can be done simply using the Disjoint Set data structure (Union-Find).
The Disjoint Set data structure can tell us whether two nodes are connected. So if the endpoints of an
edge have the same root, i.e., they are already connected, then this edge would create a cycle, so we
just ignore it.
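The steps above can be sketched in Python (a minimal union-find with path halving; a union-by-rank implementation would be needed for the stated asymptotics, and the sample triangle graph is made up for illustration):

```python
def kruskal(vertices, edges):
    """Kruskal's algorithm: scan edges in increasing weight order and use
    a union-find structure to skip edges that would create a cycle.
    edges is a list of (weight, u, v) triples."""
    parent = {v: v for v in vertices}

    def find(x):                      # root of x's set, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):     # increasing order of weight
        ru, rv = find(u), find(v)
        if ru != rv:                  # endpoints in different trees: no cycle
            parent[ru] = rv           # union the two trees
            mst.append((w, u, v))
    return mst

# Triangle A-B (1), B-C (2), A-C (3): the weight-3 edge closes a cycle.
mst = kruskal({'A', 'B', 'C'}, [(1, 'A', 'B'), (2, 'B', 'C'), (3, 'A', 'C')])
```

On the triangle, the edge of weight 3 is rejected because A and C already share a root, leaving a spanning tree of total weight 3.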
In Kruskal’s algorithm, at each iteration we select the edge with the lowest weight. So, we
start with the lowest weighted edge, i.e., the edge with weight 1. After that we
select the second lowest weighted edge, i.e., the edge with weight 2. Notice that these two edges are
totally disjoint. Now, the next edge is the third lowest weighted edge, i.e., the edge with weight
3, which connects the two disjoint pieces of the graph. Now, we are not allowed to pick the edge
with weight 4, as that would create a cycle and we can’t have any cycles. So, we select the fifth
lowest weighted edge, i.e., the edge with weight 5. The other two edges would create cycles, so we
ignore them. In the end, we end up with a minimum spanning tree of total cost 11 (= 1 +
2 + 3 + 5).
Consider the weighted graph below. Run Kruskal’s algorithm starting from vertex A. Write the
edges in the order in which they are added to the minimum spanning tree.
Solution
Pseudo code:
Time Complexity:
In Kruskal’s algorithm, the most time-consuming operation is sorting, because the total complexity
of the Disjoint-Set operations is O(E log* V). So the total running time is O(E lg E). But
since |E| < |V|², lg E = O(lg V). So, the time complexity of Kruskal’s algorithm is O(E lg V).
2. PRIM’S ALGORITHM
Prim’s algorithm also uses a greedy approach to find the minimum spanning tree. In Prim’s
algorithm we grow the spanning tree from a starting position. In contrast to Kruskal’s
algorithm, Prim’s algorithm selects one vertex at a time and adds it into the growing spanning
tree. Prim’s algorithm maintains two disjoint sets of vertices: one contains the vertices that are in
the growing spanning tree, and the other the vertices that are not.
Prim’s algorithm selects the cheapest vertex that is connected to the growing spanning tree and is
not already in it, and adds it into the growing spanning tree. This can be done
using priority queues: we insert the vertices that are connected to the growing spanning tree into
the priority queue, keyed by the weight of the edge connecting them, and extract the cheapest
one at each step.
Example
In Prim’s algorithm, we start with an arbitrary node (it doesn’t matter which one) and mark
it. In each iteration we mark a new vertex that is adjacent to one that we have already
marked. As a greedy algorithm, Prim’s algorithm selects the cheapest edge and marks the
vertex. So, we simply choose the edge with weight 1. In the next iteration we have three
options, edges with weights 2, 3 and 4. So, we select the edge with weight 2 and mark the
vertex. Now again we have three options, edges with weights 3, 4 and 5. But we can’t choose the
edge with weight 3 as it would create a cycle. So, we select the edge with weight 4 and we end
up with the minimum spanning tree of total cost 7 (= 1 + 2 + 4).
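The priority-queue version of Prim's algorithm can be sketched in Python with a binary heap; here the heap holds candidate edges rather than vertices, a common simplification (the sample triangle graph is made up for illustration):

```python
import heapq

def prim(adj, start):
    """Prim's algorithm grown from `start`. adj[u] is a list of
    (weight, v) pairs. A priority queue (binary heap) holds candidate
    edges leaving the growing spanning tree."""
    in_tree = {start}
    mst = []
    heap = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(heap)
    while heap:
        w, u, v = heapq.heappop(heap)     # cheapest candidate edge
        if v in in_tree:                  # both ends in the tree: would cycle
            continue
        in_tree.add(v)                    # grow the tree by one vertex
        mst.append((w, u, v))
        for w2, x in adj[v]:              # new candidate edges out of the tree
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return mst

# Triangle A-B (1), B-C (2), A-C (3), starting from A.
adj = {'A': [(1, 'B'), (3, 'C')],
       'B': [(1, 'A'), (2, 'C')],
       'C': [(3, 'A'), (2, 'B')]}
mst = prim(adj, 'A')
```

Each edge is pushed and popped at most once, giving the O((V + E) log V) bound discussed below; a decrease-key priority queue on vertices gives the textbook O(E log V) variant.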
Example 2
Consider the weighted graph below. Run Prim’s algorithm starting from vertex A. Write the
edges in the order in which they are added to the minimum spanning tree.
Pseudo code:
Time Complexity:
The time complexity of Prim’s algorithm is O((V + E) log V), because each vertex is
inserted into the priority queue only once and insertion into a priority queue takes logarithmic time.