
GRAPH ALGORITHMS

Definition of graph
A graph G is a pair G = (V, E), where V is the set of vertices, while E is the set of (undirected)
edges. An (undirected) edge connecting vertices u, v is denoted by { u, v }.
For instance, the sets V = {1, 2, 3, 4, 5} and E = {{1, 2}, {2, 3}, {3, 4}, {4, 5}} define a graph
with 5 vertices and 4 edges.
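As a sketch, the example graph above can be written down directly in Python; each undirected edge {u, v} is stored as a frozenset so that {1, 2} and {2, 1} compare equal:

```python
# The example graph above, stored as plain Python sets.
# Each undirected edge {u, v} is a frozenset so {1, 2} == {2, 1}.
V = {1, 2, 3, 4, 5}
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4}), frozenset({4, 5})}

print(len(V))                  # 5 vertices
print(len(E))                  # 4 edges
print(frozenset({2, 1}) in E)  # True: edges are unordered
```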
Basic graph concepts
1. Directed graph vs undirected graph - When the edges in a graph have a direction, the graph
is called a directed graph or digraph, and the edges are called directed edges or arcs. In an
undirected graph, the edges are assumed to be unordered pairs of nodes, and we write them
using curly braces. For example, an undirected edge {2,3} from vertex 2 to vertex 3 is the
same thing as an undirected edge {3,2} from vertex 3 to vertex 2.

2. Weighted graph vs unweighted graph - An unweighted graph is a graph in which all the
relationships symbolized by edges are considered equivalent; such edges are rendered as
plain lines or arcs. In a weighted graph, edges symbolize relationships between nodes
which are considered to have some value, for instance, distance or lag time. Such edges
are usually annotated by a number or letter placed beside the edge. If edges have weights,
we can put the weights in the adjacency lists.
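As a sketch of keeping weights in the lists, a weighted graph can be stored as a dict of adjacency lists whose entries are (neighbour, weight) pairs (the graph here is invented for illustration):

```python
# Weighted undirected graph as a dict of adjacency lists.
# Each entry maps a vertex to a list of (neighbour, weight) pairs.
weighted = {
    'A': [('B', 4), ('C', 2)],
    'B': [('A', 4), ('C', 1)],
    'C': [('A', 2), ('B', 1)],
}

# Look up the weight of edge (A, B):
w_ab = dict(weighted['A'])['B']
print(w_ab)  # 4
```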

3. A directed acyclic graph or dag is a directed graph with no directed cycles. Any vertex
in a dag that has no incoming edges is called a source; any vertex with no outgoing
edges is called a sink. Every dag has at least one source and one sink (do you see why?),
but may have more than one of each. For example, in the graph with n vertices but no
edges, every vertex is both a source and a sink.

Notes Prepared by Peninah J. Limo Page 1

4. Path - A path in a graph is a sequence of vertices and edges. The length of a path is the
number of edges on a path, which is always equal to the number of vertices - 1.
• A simple path is a path where all of the vertices are distinct, with the exception
of the first and last.
• A cycle is a path of at least length 1 where the first and last nodes are the same.
The edges must be distinct for undirected graphs.
5. Connected graph vs disconnected graph - A graph is said to be connected if there
exists a path from any vertex to any other vertex of the graph. A graph is said to be
disconnected if it is not connected, i.e., if we start from one vertex of the graph and
cannot reach some vertices, there is no connection with those vertices.
6. Loop - A loop is an edge or arc connecting a vertex with itself.
7. Adjacent vs incident edge - Two vertices are called adjacent if they share a common
edge, in which case the common edge is said to join the two vertices. An edge and a
vertex on that edge are called incident.
8. Degree - The number of edges which connect a node.
• In Degree:Number of edges pointing to a node.
• Out Degree: Number of edges going out of a node.

9. The neighborhood of a vertex v in a graph G is the set of vertices adjacent to v. The
neighborhood is denoted N(v). The neighborhood does not include v itself. For example,
in the graph below N(5) = {4,2,1} and N(6) = {4}.

10. Connected component - In an undirected graph, a connected component is the set of
nodes that are reachable by traversal from some node. The connected components of an
undirected graph have the property that all nodes in the component are reachable from all
other nodes in the component.



In a directed graph, however, reachable usually means by a path in which all edges go in
the positive direction, i.e. from source to destination. In directed graphs, a vertex v may
be reachable from u but not vice-versa.

11. Strongly connected components - The strongly connected components in a directed
graph are defined in terms of the set of nodes that are mutually accessible from one
another. In other words, the strongly connected component of a node u is the set of all
nodes v such that v is reachable from u by a directed path and u is reachable from v by a
directed path. Equivalently, u and v lie on a directed cycle. One can show that this is an
equivalence relation on nodes, and the strongly connected components are the
equivalence classes.

REPRESENTATION OF GRAPHS
In order to perform graph algorithms in a computer, we have to decide how to store the graph.
The way we draw a graph with circles and lines on the blackboard is not the way it will be
stored in the computer, i.e., computers aren't very good at interpreting that sort of input. Instead we
need a representation closer to the abstract definition of a graph. There are many ways of
representing graphs. They differ in efficiency depending on the algorithm, whether the graph is
dense or sparse, and other factors.

There are two standard ways of maintaining a graph G in the memory of a computer.

• Adjacency matrix representation
• Adjacency list representation

1. Adjacency matrix representation

In the first case we store a matrix (two-dimensional array) with size NxN, where N is the number
of vertices. This means that for each edge between the vertices i and j we have the value of 1
(A[i][j] = 1), and 0 otherwise.

Example 1



Adjacency matrix representation of graphs is very simple to implement. However, the memory
requirement of the adjacency matrix representation of a graph wastes a lot of memory space,
since such matrices are often found to be very sparse. This representation requires space for n²
elements for a graph with n vertices. If the graph has e edges, then n² − e elements in the matrix
will be 0.
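As a sketch, building such a matrix in Python might look like this (the small graph here is invented for illustration):

```python
# Build an N x N adjacency matrix for the undirected graph with
# vertices {0, 1, 2, 3} and edges {(0, 1), (1, 2), (2, 3)}.
n = 4
edges = [(0, 1), (1, 2), (2, 3)]

A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = 1
    A[j][i] = 1  # undirected: the matrix is symmetric

print(A[0][1], A[1][0])  # 1 1
print(A[0][3])           # 0: no edge between 0 and 3
```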
Example 2
A directed graph and adjacency matrix:

2. Adjacency List Representation

The linked list representation is also called the adjacency list representation; we store a graph as
a linked structure. First we store all the vertices of the graph in a list, and then each adjacent
vertex is represented by a linked list node. Here the terminal vertex of an edge is stored in a
node and linked to the corresponding initial vertex in the list.

Example 1

Below is a representation of an undirected graph. (a) An undirected graph G having five vertices
and seven edges. (b) An adjacency-list representation of G. (c) The adjacency-matrix
representation of G.



Example 2

Below is a representation of a directed graph. (a) A directed graph G having six vertices and
eight edges. (b) An adjacency-list representation of G. (c) The adjacency-matrix representation
of G.

Adjacency list representation of a graph is very memory efficient when the graph has a large
number of vertices but very few edges.
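As a sketch, the same kind of graph can be stored as adjacency lists in Python with a dict mapping each vertex to its neighbours (the edges here are invented for illustration):

```python
from collections import defaultdict

# Adjacency-list representation: a dict that maps each vertex
# to the list of its neighbours.
edges = [(0, 1), (1, 2), (2, 3)]

adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)  # drop this line for a directed graph

print(adj[1])  # [0, 2]
print(adj[3])  # [2]
```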

GRAPH SEARCHING (TRAVERSAL)

Many applications of graphs require a structured system to examine the vertices and edges of a
graph G. That is a graph traversal, which means visiting all the nodes of the graph. There are
two graph traversal methods.

• Breadth First Search (BFS)
• Depth First Search (DFS)

(1) BREADTH-FIRST SEARCH

Breadth-first search is a traversal through a graph that touches all of the vertices reachable from a
particular source vertex. In addition, the order of the traversal is such that the algorithm will



explore all of the neighbors of a vertex before proceeding on to the neighbors of its neighbors. A
queue is used to keep track of the progress of traversing the neighbor nodes.

To keep track of progress, breadth-first-search colors each vertex. Each vertex of the graph is in
one of three states:

1. Undiscovered;
2. Discovered but not fully explored; and
3. Fully explored.

The state of a vertex, u, is stored in a color variable as follows:

1. color[u] = White - for the "undiscovered" state,


2. color [u] = Gray - for the "discovered but not fully explored" state, and
3. color [u] = Black - for the "fully explored" state.

To keep track of progress, breadth-first search colors each vertex white, gray, or black.
All vertices start out white and may later become gray and then black. A vertex is discovered the
first time it is encountered during the search, at which time it becomes nonwhite. Gray and black
vertices, therefore, have been discovered, but breadth-first search distinguishes between them to
ensure that the search proceeds in a breadth-first manner. If (u, v) ∈ E and vertex u is black, then
vertex v is either gray or black; that is, all vertices adjacent to black vertices have been
discovered. Gray vertices may have some adjacent white vertices; they represent the frontier
between discovered and undiscovered vertices.

Breadth-first search (BFS) uses a queue data structure for storing nodes which are not yet
processed. In the initialization phase the algorithm starts at an arbitrary node and places it into
the queue. Then BFS processes the head H of the queue and inserts all its unprocessed
descendants into the queue. BFS repeats this step until the queue is empty.

Algorithm for BFS

BFS(G, s)
1  for each vertex u ∈ V[G] − {s}
2       do color[u] ← WHITE
3          d[u] ← ∞
4          π[u] ← NIL
5  color[s] ← GRAY
6  d[s] ← 0
7  π[s] ← NIL
8  Q ← {s}
9  while Q ≠ ∅
10      do u ← head[Q]
11         for each v ∈ Adj[u]
12             do if color[v] = WHITE
13                   then color[v] ← GRAY
14                        d[v] ← d[u] + 1
15                        π[v] ← u
16                        ENQUEUE(Q, v)
17         DEQUEUE(Q)
18         color[u] ← BLACK

In the breadth-first-search procedure above, the color of each vertex u ∈ V is stored in the variable
color[u] and the predecessor of u is stored in the variable π[u]. If u has no predecessor (for
example, if u = s or u has not been discovered) then π[u] = NIL. The distance from the source s to
vertex u computed by the algorithm is stored in d[u]. The algorithm also uses a first-in, first-out
queue Q to manage the set of gray vertices.

The procedure BFS works as follows. Lines 1-4 paint every vertex white, set d[u] to infinity for
every vertex u, and set the parent of every vertex to be NIL. Line 5 paints the source vertex s
gray, since it is considered to be discovered when the procedure begins. Line 6 initializes d[s] to 0,
and line 7 sets the predecessor of the source to be NIL. Line 8 initializes Q to the queue
containing just the vertex s; thereafter, Q always contains the set of gray vertices.

The loop in lines 9-18 iterates as long as there remain gray vertices, which are discovered
vertices that have not yet had their adjacency lists fully examined. Line 10 determines the gray
vertex u at the head of the queue Q. The for loop of lines 11-16 considers each vertex v in the
adjacency list of u. If v is white, then it has not yet been discovered, and the algorithm discovers
it by executing lines 13-16. It is first grayed, and its distance d[v] is set to d[u] + 1. Then, u is
recorded as its parent. Finally, it is placed at the tail of the queue Q. When all the vertices on u’s
adjacency list have been examined, u is removed from Q and blackened in lines 17-18.
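As a concrete sketch, the procedure above translates to Python as follows; a deque plays the role of the FIFO queue Q, and the small example graph is invented for illustration:

```python
from collections import deque

WHITE, GRAY, BLACK = 0, 1, 2

def bfs(adj, s):
    """Breadth-first search following the BFS(G, s) procedure above.
    adj maps each vertex to its adjacency list; s is the source.
    Returns (color, d, pi): final state, distance from s, predecessor."""
    color = {u: WHITE for u in adj}
    d = {u: float('inf') for u in adj}
    pi = {u: None for u in adj}
    color[s], d[s] = GRAY, 0
    Q = deque([s])
    while Q:
        u = Q[0]                      # head of the queue
        for v in adj[u]:
            if color[v] == WHITE:     # v discovered for the first time
                color[v] = GRAY
                d[v] = d[u] + 1
                pi[v] = u
                Q.append(v)
        Q.popleft()                   # u's adjacency list fully examined
        color[u] = BLACK
    return color, d, pi

# Small undirected example graph:
adj = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
color, d, pi = bfs(adj, 1)
print(d)      # {1: 0, 2: 1, 3: 1, 4: 2}
print(pi[4])  # 2
```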

ANALYSIS

The operations of enqueuing and dequeuing take O(1) time, so the total time devoted to queue
operations is O(V). Because the adjacency list of each vertex is scanned only when the vertex is
dequeued, the adjacency list of each vertex is scanned at most once. Since the sum of the lengths
of all the adjacency lists is Θ(E), at most O(E) time is spent in total scanning adjacency lists.
The overhead for initialization is O(V), and thus the total running time of BFS is O(V + E). Thus,
breadth-first search runs in time linear in the size of the adjacency- list representation of G.



Example 1

The following figure (from CLRS) illustrates the progress of breadth-first search on the
undirected sample graph.

(2) DEPTH-FIRST SEARCH

A depth first search (DFS) visits all the vertices in a graph. When choosing which edge to
explore next, this algorithm always chooses to go ``deeper'' into the graph. That is, it will pick
the next adjacent unvisited vertex until reaching a vertex that has no unvisited adjacent vertices.
The algorithm will then backtrack to the previous vertex and continue along any as-yet
unexplored edges from that vertex. After DFS has visited all the reachable vertices from a
particular source vertex, it chooses one of the remaining undiscovered vertices and continues the
search. This process creates a set of depth-first trees which together form the depth-first forest.
DFS uses a stack for storing unprocessed nodes.

As in breadth-first search, vertices are colored during the search to indicate their state.
Each vertex is initially white, is grayed when it is discovered in the search, and is blackened
when it is finished, that is, when its adjacency list has been examined completely. This technique
guarantees that each vertex ends up in exactly one depth-first tree, so that these trees are disjoint.



Besides creating a depth-first forest, depth-first search also timestamps each vertex. Each
vertex v has two timestamps: the first timestamp d[v] records when v is first discovered (and
grayed), and the second timestamp f[v] records when the search finishes examining v's adjacency
list (and blackens v). These timestamps are used in many graph algorithms and are generally
helpful in reasoning about the behavior of depth-first search.

The procedure DFS below records when it discovers vertex u in the variable d[u] and
when it finishes vertex u in the variable f[u]. These timestamps are integers between 1 and 2|V|,
since there is one discovery event and one finishing event for each of the |V| vertices. For every
vertex u,

d[u] < f[u] .

Vertex u is WHITE before time d[u], GRAY between time d[u] and time f[u], and BLACK
thereafter.

Algorithm

The following pseudocode is the basic depth-first-search algorithm. The input graph G may be
undirected or directed. The variable time is a global variable that we use for timestamping.

DFS(G)
1  for each vertex u ∈ V[G]
2       do color[u] ← WHITE
3          π[u] ← NIL
4  time ← 0
5  for each vertex u ∈ V[G]
6       do if color[u] = WHITE
7             then DFS-VISIT(u)

DFS-VISIT(u)
1  color[u] ← GRAY              ▷ White vertex u has just been discovered.
2  d[u] ← time ← time + 1
3  for each v ∈ Adj[u]          ▷ Explore edge (u, v).
4       do if color[v] = WHITE
5             then π[v] ← u
6                  DFS-VISIT(v)
7  color[u] ← BLACK             ▷ Blacken u; it is finished.
8  f[u] ← time ← time + 1

Procedure DFS works as follows. Lines 1-3 paint all vertices white and initialize their fields to
NIL. Line 4 resets the global time counter. Lines 5-7 check each vertex in V in turn and, when a
white vertex is found, visit it using DFS-VISIT. Every time DFS-VISIT(u) is called in line 7,
vertex u becomes the root of a new tree in the depth-first forest. When DFS returns, every vertex
u has been assigned a discovery time d[u] and a finishing time f[u].

In each call DFS-VISIT(u), vertex u is initially white. Line 1 paints u gray, and line 2 records the
discovery time d[u] by incrementing and saving the global variable time. Lines 3-6 examine each
vertex v adjacent to u and recursively visit v if it is white. As each vertex v ∈ Adj[u] is considered
in line 3, we say that edge (u, v) is explored by the depth-first search. Finally, after every edge
leaving u has been explored, lines 7-8 paint u black and record the finishing time in f[u].
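As a sketch, the two procedures translate to Python as follows, with a nested function playing the role of DFS-VISIT and a closure variable playing the role of the global time (the small directed example graph is invented):

```python
WHITE, GRAY, BLACK = 0, 1, 2

def dfs(adj):
    """DFS with timestamps, following the DFS(G)/DFS-VISIT procedures.
    Returns (d, f, pi): discovery times, finishing times, predecessors."""
    color = {u: WHITE for u in adj}
    pi = {u: None for u in adj}
    d, f = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        color[u] = GRAY          # u has just been discovered
        time += 1
        d[u] = time
        for v in adj[u]:         # explore each edge (u, v)
            if color[v] == WHITE:
                pi[v] = u
                visit(v)
        color[u] = BLACK         # u is finished
        time += 1
        f[u] = time

    for u in adj:
        if color[u] == WHITE:
            visit(u)
    return d, f, pi

# Small directed example: u -> v -> w, plus u -> w.
adj = {'u': ['v', 'w'], 'v': ['w'], 'w': []}
d, f, pi = dfs(adj)
print(d['u'], f['u'])  # 1 6
```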

Example 1

In the following figure, the solid edges represent discovery (tree) edges and the dashed edges
show back edges. Furthermore, each vertex has two timestamps: the first timestamp records
when the vertex is first discovered and the second timestamp records when the search finishes
examining the adjacency list of the vertex.



The progress of the depth-first-search algorithm DFS on a directed graph. As edges are explored
by the algorithm, they are shown as either shaded (if they are tree edges) or dashed (otherwise).
Nontree edges are labeled B, C, or F according to whether they are back, cross, or forward edges.
Vertices are timestamped by discovery time/finishing time.

Example 2

Starting from node U we can either discover node V or Y. Suppose that we discover node V,
which has a single outgoing edge to node W. W has no outgoing edges, so this node is finished,
and we return to V. From V there is no other choice, so this node is also finished and we return
to U. From node U we can continue to discover Y and its descendants, and the procedure
continues similarly. At stage (l) we have discovered and finished nodes U, V, W, X, Y. Selecting
node Q as a new starting node, we can discover the remaining nodes (in this case Z).



Example 3

Figure below illustrates the progress of DFS on the graph shown.



The progress of the depth-first-search algorithm DFS on a directed graph. As edges are explored
by the algorithm, they are shown as either shaded (if they are tree edges) or dashed (otherwise).

Classification of edges

The DFS algorithm can be modified to classify edges as it encounters them. The key idea is that
each edge (u, v) can be classified by the color of the vertex v that is reached when the edge is
first explored (except that forward and cross edges are not distinguished):

1. WHITE indicates a tree edge,

2. GRAY indicates a back edge, and

3. BLACK indicates a forward or cross edge.

This edge classification can be used to glean important information about a graph.

1. Tree edges - If v is visited for the first time as we traverse the edge (u, v), then the edge is a
tree edge. A Tree Edge is an edge that connects a vertex with its parent.

2. Back edges - If v is an ancestor of u, then edge (u, v) is a back edge. Back Edge is a non-tree
edge that connects a vertex with an ancestor.

3. Forward edges - if v is a descendant of u, then edge (u, v) is a forward edge.

4. Cross edges - if v is neither an ancestor nor a descendant of u, then edge (u, v) is a cross edge.
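The color-based classification described above can be sketched in Python for a directed graph; as noted, forward and cross edges are not distinguished by this method, and the example graph is invented:

```python
WHITE, GRAY, BLACK = 0, 1, 2

def classify_edges(adj):
    """DFS that labels each directed edge by the colour of its endpoint
    when the edge is first explored:
    white -> tree edge, gray -> back edge, black -> forward/cross edge."""
    color = {u: WHITE for u in adj}
    labels = {}

    def visit(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == WHITE:
                labels[(u, v)] = 'tree'
                visit(v)
            elif color[v] == GRAY:
                labels[(u, v)] = 'back'
            else:
                labels[(u, v)] = 'forward/cross'
        color[u] = BLACK

    for u in adj:
        if color[u] == WHITE:
            visit(u)
    return labels

# a -> b -> c, and c -> a closes a directed cycle:
adj = {'a': ['b'], 'b': ['c'], 'c': ['a']}
print(classify_edges(adj))
# {('a', 'b'): 'tree', ('b', 'c'): 'tree', ('c', 'a'): 'back'}
```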

TOPOLOGICAL SORT
At the heart of topological sort is a depth-first search. A topological sort or topological
ordering of a directed graph is a linear ordering of its vertices such that for every directed
edge uv from vertex u to vertex v, u comes before v in the ordering. A topological sort of a graph
can be viewed as an ordering of its vertices along a horizontal line so that all directed edges go
from left to right. Topological sorting is thus different from the usual kind of "sorting".
For instance, the vertices of the graph may represent tasks to be performed, and the edges
may represent constraints that one task must be performed before another; in this application, a
topological ordering is just a valid sequence for the tasks. A topological ordering is possible if



and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any
DAG has at least one topological ordering, and algorithms are known for constructing a
topological ordering of any DAG in linear time. The canonical application of topological sorting
(topological order) is in scheduling a sequence of jobs or tasks based on their dependencies.
Topological orderings have many uses for problems ranging from job scheduling to
determining the order in which to compute quantities that depend on one another (e.g.,
spreadsheets, order of compilation of modules).

Example
The following figure gives an example that arises when Professor Bumstead gets dressed in the
morning. The DAG of dependencies for putting on clothing is shown. (a) The discovery and
finishing times from depth-first search are shown next to each vertex. (b) The same DAG shown
topologically sorted.

From the example above, the order of vertices (by decreasing finishing time) is:

18 socks
16 underpants
15 pants
14 shoes
10 watch
8 shirt
7 belt
5 tie
4 jacket

It should be clear from above that we don't need to sort by finish times. We can just output
vertices as they are finished and understand that we want the reverse of this list. Or we can put



vertices onto the front of a linked list as they are finished. When done, the list contains vertices
in topologically sorted order.

Example 2

The following figure shows a DAG and a topological ordering for the graph.

Topological Sort Algorithm

Topological-Sort(G)
1 call DFS(G) to compute finishing times f[v] for each vertex v
2 as each vertex is finished, insert it onto the front of a linked list
3 return the linked list of vertices

We can perform a topological sort in time Θ(V + E), since depth-first search takes Θ(V + E) time
and it takes O(1) time to insert each of the |V| vertices onto the front of the linked list.
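The procedure above can be sketched in Python; here a plain visited set replaces the DFS colors, and each vertex is prepended to a list as it finishes (the task graph is invented for illustration):

```python
def topological_sort(adj):
    """Topological sort via DFS: put each vertex onto the front of a
    list as it finishes, as in Topological-Sort(G) above."""
    visited = set()
    order = []

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        order.insert(0, u)   # vertex finished: prepend it

    for u in adj:
        if u not in visited:
            visit(u)
    return order

# Task graph: socks and pants must come before shoes.
adj = {'socks': ['shoes'], 'pants': ['shoes'], 'shoes': []}
print(topological_sort(adj))  # ['pants', 'socks', 'shoes']
```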

MINIMUM SPANNING TREE (MST)

A minimum spanning tree (MST) of a weighted graph G is a spanning tree of G whose edges
sum to minimum weight. In other words, a minimum spanning tree is a tree formed from a subset
of the edges in a given undirected graph, with two properties:
• it spans the graph, i.e., it includes every vertex in the graph, and
• it is a minimum, i.e., the total weight of all the edges is as low as possible.

Minimum spanning tree has direct application in the design of networks. It is used in algorithms
approximating:

• the travelling salesman problem,
• the multi-terminal minimum cut problem, and
• minimum-cost weighted perfect matching.

Other practical applications are:

1. Cluster Analysis
2. Handwriting recognition
3. Image segmentation

There are two famous algorithms for finding the Minimum Spanning Tree:

• Kruskal’s algorithm
• Prim’s algorithm

1. KRUSKAL’S ALGORITHM

Kruskal’s Algorithm builds the spanning tree by adding edges one by one into a growing
spanning tree. Kruskal's algorithm follows a greedy approach: in each iteration it finds an edge
with the least weight and adds it to the growing spanning tree.

To do this we simply sort the edges in increasing order of their weight and select the
edges one by one from the beginning. But sometimes, selecting edges in this order can create a
cycle. To avoid cycle creation, we have to check whether the selected edge creates a cycle. If
it creates a cycle then simply ignore it, otherwise add it to the growing spanning tree. This
can be done simply by using the Disjoint Set data structure (Union-Find).

The Disjoint Set data structure can tell us whether two nodes are connected or not. So if the
endpoints of an edge have the same root, i.e., they are connected, then this edge would create a
cycle, so just ignore it


otherwise add it into the growing spanning tree. So at the end of the algorithm we will end up
with a tree which is a minimum spanning tree.

In Kruskal’s algorithm, at each iteration we will select the edge with the lowest weight. So, we
will start with the lowest weighted edge first i.e., the edges with weight 1. After that we will
select the second lowest weighted edge i.e., edge with weight 2. Notice these two edges are
totally disjoint. Now, the next edge will be the third lowest weighted edge i.e., edge with weight
3, which connects the two disjoint pieces of the graph. Now, we are not allowed to pick the edge
with weight 4, that will create a cycle and we can’t have any cycles. So, we will select the fifth
lowest weighted edge i.e., edge with weight 5. Now the other two edges will create cycles so we
will ignore them. In the end, we end up with a minimum spanning tree with total cost 11 ( = 1 +
2 + 3 + 5).



Example 2

Consider the weighted graph below. Run Kruskal’s algorithm starting from vertex A. Write the
edges in the order which they are added to the minimum spanning tree.

Solution

CD, BE, AE, BC, FG, CG, DH

Pseudo code:

1. Sort the edges in nondecreasing order of the edge weights.
2. Select the edge with minimum weight and if a cycle is formed, ignore it; otherwise add this
edge into the spanning tree.
3. Repeat step 2 for all edges.

Time Complexity:
In Kruskal’s algorithm, the most time-consuming operation is sorting, since the total complexity
of the Disjoint-Set operations is O(E log* V). So the total running time is O(E lg E). But
since |E| < |V|², lg E = O(lg V). So, the time complexity of Kruskal’s algorithm is O(E lg V).
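The steps above can be sketched in Python with a minimal union-find (with path compression only; the 4-vertex example graph is invented for illustration):

```python
def kruskal(n, edges):
    """Kruskal's algorithm with a simple union-find.
    n: number of vertices labelled 0..n-1; edges: list of (weight, u, v).
    Returns the MST edges and their total weight."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):          # nondecreasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                       # different trees: no cycle
            parent[ru] = rv
            mst.append((u, v))
            total += w
    return mst, total

# 4-vertex example: the weight-3 edge would close a cycle and is skipped.
edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
mst, total = kruskal(4, edges)
print(total)  # 7
```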

2. PRIM’S ALGORITHM

Prim’s Algorithm also uses a greedy approach to find the minimum spanning tree. In Prim’s
Algorithm we grow the spanning tree from a starting position. In contrast to Kruskal’s
Algorithm, Prim’s Algorithm selects one vertex at a time and adds it into the growing spanning
tree. Prim’s Algorithm maintains two disjoint sets of vertices: one contains the vertices that are
in the growing spanning tree, and the other contains the vertices that are not.

Prim’s algorithm selects the cheapest vertex that is connected to the growing spanning tree but
not yet in it, and adds it into the growing spanning tree. This can be done
using priority queues. We will simply insert the vertices that are connected to the growing spanning


tree, into the Priority Queue. But just like Kruskal’s Algorithm, we need to check for cycles. To
do that we will mark the nodes which we have already selected and insert only those nodes in the
Priority Queue that are not marked.

Example

In Prim’s Algorithm, we will start with an arbitrary node (it doesn’t matter which one) and mark
it. In each iteration we will mark a new vertex that is adjacent to the one that we have already
marked. As a greedy algorithm, Prim’s algorithm will select the cheapest edge and mark the
vertex. So, we will simply choose the edge with weight 1. In the next iteration we have three
options, edges with weight 2, 3 and 4. So, we will select the edge with weight 2 and mark the
vertex. Now again we have three options, edges with weight 3, 4 and 5. But we can’t choose
edge with weight 3 as it is creating a cycle. So, we will select the edge with weight 4 and we end
up with the minimum spanning tree of total cost 7 (= 1 + 2 +4).

Example 2

Consider the weighted graph below. Run Prim’s algorithm starting from vertex A. Write the
edges in the order which they are added to the minimum spanning tree.



Solution

AE, BE, BC, CD, CG, FG, DH

Pseudo code:

1. Start with a random vertex and mark it.


2. Check all the vertices that are adjacent to any marked vertex, select the cheapest such
edge that does not create a cycle, and mark the new vertex.
3. Repeat step 2 until all the vertices are marked.

Time Complexity:
The time complexity of Prim’s Algorithm is O((V + E) log V), because each vertex is
inserted in the priority queue only once and insertion in a priority queue takes logarithmic time.
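A minimal Python sketch of this procedure using the standard-library heapq as the priority queue; here whole edges are pushed and stale entries are skipped lazily, and the example graph is invented:

```python
import heapq

def prim(adj, start):
    """Prim's algorithm with a priority queue (heapq).
    adj maps each vertex to a list of (weight, neighbour) pairs.
    Returns the total weight of the minimum spanning tree."""
    marked = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    total = 0
    while heap:
        w, v = heapq.heappop(heap)    # cheapest edge leaving the tree
        if v in marked:               # would create a cycle: skip
            continue
        marked.add(v)
        total += w
        for edge in adj[v]:
            if edge[1] not in marked:
                heapq.heappush(heap, edge)
    return total

# Triangle graph: the weight-4 edge would close a cycle and is skipped.
adj = {
    'a': [(1, 'b'), (4, 'c')],
    'b': [(1, 'a'), (2, 'c')],
    'c': [(4, 'a'), (2, 'b')],
}
print(prim(adj, 'a'))  # 3
```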

