Module 5 - Chapter 9 - Graphs and Algorithms - MSDS 6203 Data Systems and Algorithms N2A
Lesson
Graphs are a non-linear data structure in which a problem is represented as a network: a set of nodes connected by edges, as in a
telephone network or a social network. For example, the nodes of a graph can represent different cities, while the edges represent the
links between them. Graphs are among the most important data structures; they are used to solve many computing problems, especially
problems expressed in terms of objects and the connections between them, e.g. finding the shortest path from one city to another. In this
chapter, we discuss the most important and popular concepts related to graphs.
Graph
A graph is a finite set of vertices (also known as nodes) together with a set of edges, in which the edges are the links between vertices. Each
edge joins a pair of nodes or, in the case of a loop, a node to itself. Formally, a graph is a mathematical representation of a network: a graph G
is an ordered pair of a set V of vertices and a set E of edges, written G = (V, E).
• V = {A, B, C, D, E}
• E = {{A, B}, {A, C}, {B, C}, {B, D}, {C, D}, {D, D}, {B, E}, {D, E}}
• G = (V, E)
Let’s discuss some of the important definitions related to graphs. Common terminology includes:
• Node or vertex: A point or node in a graph is called a vertex. In the preceding diagram, the vertices or nodes are A, B, C, D, and E and
are denoted by a dot.
• Edge: This is a connection between two vertices. The line connecting A and B is an example of an edge.
• Loop: An edge that starts and ends at the same node forms a loop, e.g. the edge {D, D} on the D node.
• Degree of a vertex/node: The total number of edges that are incident on a given vertex is called the degree of that vertex. For example,
the degree of the B vertex in the previous diagram is 4.
• Adjacency: This refers to the connection(s) between any two nodes; thus, if there is a connection between any two vertices or nodes,
then they are said to be adjacent to each other. For example, the C node is adjacent to the A node because there is an edge between
them.
• Path: A sequence of vertices and edges between any two nodes represents a path. For example, CABE represents a path from
the C node to the E node.
• Leaf vertex (also called pendant vertex): A vertex or node is called a leaf vertex or pendant vertex if its degree is exactly one.
• Types of Graphs
◦ Directed Graph: Edges have a direction, from one vertex to another.
◦ Undirected Graph: Edges do not have a direction.
◦ Weighted Graph: Edges have weights or costs associated with them.
◦ Unweighted Graph: Edges do not have weights.
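The definitions above can be checked against the example G = (V, E) with a short Python sketch. The helper names (degrees, adjacent, is_path) are illustrative and not from the original text, and the sketch assumes the usual convention that a loop adds 2 to the degree of its vertex.

```python
from collections import Counter

# Vertex and edge sets copied from the example G = (V, E) above.
V = {"A", "B", "C", "D", "E"}
E = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
     ("C", "D"), ("D", "D"), ("B", "E"), ("D", "E")]

def degrees(vertices, edges):
    """Count edge endpoints per vertex (assumed convention: a loop counts twice)."""
    deg = Counter({v: 0 for v in vertices})
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def adjacent(edges, u, v):
    """Two vertices are adjacent if an undirected edge joins them."""
    return (u, v) in edges or (v, u) in edges

def is_path(edges, sequence):
    """A vertex sequence is a path if every consecutive pair is adjacent."""
    return all(adjacent(edges, a, b) for a, b in zip(sequence, sequence[1:]))

deg = degrees(V, E)
loops = [u for u, v in E if u == v]          # vertices carrying a loop
leaves = sorted(v for v in V if deg[v] == 1)  # leaf (pendant) vertices

print(deg["B"])               # degree of B
print(adjacent(E, "C", "A"))  # C is adjacent to A
print(is_path(E, "CABE"))     # C-A-B-E is a path
print(loops, leaves)
```

In this particular graph no vertex has degree one, so the list of leaf vertices is empty.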
Graphs are represented by the edges between the nodes. The connecting edges can be considered directed or undirected. If the connecting
edges in a graph are undirected, then the graph is called an undirected graph, and if the connecting edges in a graph are directed, then it is
called a directed graph. An undirected graph simply represents edges as lines between the nodes. There is no additional information about
the relationship between the nodes, other than the fact that they are connected. For example, in Figure 9.2, we demonstrate an undirected
graph of four nodes, A, B, C, and D, which are connected using edges:
In a directed graph, the edges provide information on the direction of the connection between any two nodes in a graph. If the edge from node A
to node B is directed, then the edge (A, B) is not equal to the edge (B, A). Directed edges are drawn as lines with arrows that point in
the direction in which the edge connects the two nodes.
For example, in Figure 9.3, we show a directed graph where many nodes are connected using directed edges:
Graph Representations
• Adjacency Matrix
A 2D array used to represent a graph where the element at row i and column j indicates the presence (and possibly the weight) of an edge
between vertices i and j.
• Adjacency List
A list where each vertex has a list of adjacent vertices. This representation is more space-efficient for sparse graphs.
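As a minimal sketch of the two representations, the following Python code builds both from the same edge list. The four-node edge set is an assumption chosen for illustration (it is not taken from any figure), and the function names are hypothetical.

```python
# Assumed example: an undirected, unweighted graph on four nodes.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

def to_adjacency_matrix(vertices, edges):
    """Return a V x V matrix where entry [i][j] is 1 if an edge joins i and j."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # undirected: mirror the entry
    return matrix

def to_adjacency_list(vertices, edges):
    """Return a dict mapping each vertex to the list of its neighbors."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

matrix = to_adjacency_matrix(vertices, edges)
adj = to_adjacency_list(vertices, edges)

for row in matrix:
    print(row)
print(adj)
```

The matrix always stores V × V entries, while the list stores one entry per edge endpoint, which is why the list form tends to be preferred for sparse graphs. For a weighted graph, the 1 entries could be replaced by edge weights.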
The arrow on an edge indicates its direction. One can only move from A to B, as shown in the preceding diagram, and not from B to A. In a
directed graph, each node (or vertex) has an indegree and an outdegree. Let’s have a look at what these are:
• Indegree: The total number of edges that come into a vertex in the graph is called the indegree of that vertex. For example, in the
previous diagram, the E node has an indegree of 1, due to edge CE coming into the E node.
• Outdegree: The total number of edges that go out from a vertex in the graph is called the outdegree of that vertex. For example,
the E node in the previous diagram has an outdegree of 2, as it has two edges, EF and ED, going out of that node.
• Isolated vertex: A node or vertex is called an isolated vertex when it has a degree of zero, as shown by the G node in Figure 9.3.
• Source vertex: A vertex is called a source vertex if it has an indegree of zero. For example, in the previous diagram, the A node is the
source vertex.
• Sink vertex: A vertex is a sink vertex if it has an outdegree of zero. For example, in the previous diagram, the F node is the sink vertex.
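The following Python sketch computes these quantities for a small directed graph. The edge list is an assumption: it is not Figure 9.3, only a hypothetical graph chosen to agree with the statements above (edges CE, EF, and ED exist, A is a source, F is a sink, and G is isolated). The sketch also assumes that isolated vertices are not counted as sources or sinks.

```python
# Hypothetical directed graph, consistent with the statements in the text.
vertices = ["A", "B", "C", "D", "E", "F", "G"]
edges = [("A", "B"), ("B", "C"), ("C", "E"),
         ("E", "D"), ("E", "F"), ("D", "B")]

def degree_tables(vertices, edges):
    """Return (indegree, outdegree) dicts for a directed edge list."""
    indeg = {v: 0 for v in vertices}
    outdeg = {v: 0 for v in vertices}
    for u, v in edges:
        outdeg[u] += 1  # the edge leaves u
        indeg[v] += 1   # and enters v
    return indeg, outdeg

indeg, outdeg = degree_tables(vertices, edges)

# Classification (assumption: isolated vertices are reported separately).
isolated = [v for v in vertices if indeg[v] == 0 and outdeg[v] == 0]
sources = [v for v in vertices if indeg[v] == 0 and outdeg[v] > 0]
sinks = [v for v in vertices if outdeg[v] == 0 and indeg[v] > 0]

print(indeg["E"], outdeg["E"])
print(isolated, sources, sinks)
```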
Now that we understand how directed graphs work, we can look into directed acyclic graphs.
A directed acyclic graph (DAG) is a directed graph with no cycles; in a DAG all the edges are directed from one node to another node so
that the sequence of edges never forms a closed loop. A cycle in a graph is formed when the starting node of the first edge is equal to the
ending node of the last edge in a sequence.
A DAG is shown in Figure 9.4 in which all the edges in the graph are directed and the graph does not have any cycles:
So, in a directed acyclic graph, if we follow any path starting from a given node, we never return to that same node. A DAG has many
applications, such as in job scheduling, citation graphs, and data compression.
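One way to test whether a directed graph is a DAG is to attempt a topological ordering: an ordering exists exactly when the graph has no cycle. The sketch below uses Kahn's algorithm, which is a standard technique but is not described in the text above. The function name and both example graphs are assumptions made for illustration.

```python
from collections import deque

def topological_order(graph):
    """Return a topological order of `graph`, or None if it contains a cycle.

    `graph` maps every vertex to the list of vertices its edges point to.
    """
    indeg = {v: 0 for v in graph}
    for u in graph:
        for v in graph[u]:
            indeg[v] += 1

    # Start from the vertices that have no incoming edges.
    queue = deque(sorted(v for v in graph if indeg[v] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            indeg[v] -= 1          # remove the edge u -> v
            if indeg[v] == 0:
                queue.append(v)

    # If a cycle exists, its vertices never reach indegree zero.
    return order if len(order) == len(graph) else None

dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}

print(topological_order(dag))
print(topological_order(cyclic))
```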
Weighted graphs
A weighted graph is a graph that has a numeric weight associated with the edges in the graph. A weighted graph can be either a directed or
an undirected graph. The numeric weight can be used to indicate distance or cost, depending on the purpose of the graph:
Figure 9.5: An example of a weighted graph
Let’s consider an example. Figure 9.5 shows different ways to travel from the A node to the D node. There are two possible paths: directly
from the A node to the D node, or along A-B-C-D through the B and C nodes. Depending on the weights associated with the edges,
either path can be considered better than the other. For example, assume the weights in this graph represent the distance
between two nodes, and we want to find the shortest path between the A and D nodes. The path A-D has an associated cost of 40,
and the path A-B-C-D has an associated cost of 25. In this case, the better path is A-B-C-D, which has the lower total distance.
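The comparison can be reproduced with a few lines of Python. The text gives only the two path totals (40 and 25), so the individual weights on A-B, B-C, and C-D below are assumptions chosen to sum to 25. The function name path_cost is also illustrative.

```python
# Edge weights for an undirected weighted graph.
# A-D = 40 is from the text; the split 5 + 10 + 10 = 25 is assumed.
weights = {
    ("A", "B"): 5,
    ("B", "C"): 10,
    ("C", "D"): 10,
    ("A", "D"): 40,
}

def path_cost(weights, path):
    """Sum the weights of the edges along a path given as a vertex list."""
    total = 0
    for u, v in zip(path, path[1:]):
        # Undirected graph: look the edge up in either orientation.
        total += weights[(u, v)] if (u, v) in weights else weights[(v, u)]
    return total

paths = [["A", "D"], ["A", "B", "C", "D"]]
best = min(paths, key=lambda p: path_cost(weights, p))

for p in paths:
    print("-".join(p), path_cost(weights, p))
print("best:", "-".join(best))
```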
Depth-first search
As the name suggests, the depth-first search (DFS) or traversal algorithm traverses a graph in much the same way as the preorder traversal
algorithm traverses a tree. DFS follows one particular path in the graph as deep as it can go before exploring alternatives. As such, child
nodes are visited before sibling nodes.
We start at the root node and visit it, then look at all the vertices adjacent to the current node and move to one of them. If an edge
leads to a visited node, we backtrack to the current node. If the edge leads to an unvisited node, we go to that node and continue
processing from there. We continue the same process until we reach a dead end, where no unvisited adjacent node remains; in that case, we
backtrack to previous nodes, and we stop when backtracking returns us to the root node and it has no unvisited neighbors left.
Let’s take an example to understand the working of the DFS algorithm using the graph shown in Figure 9.22:
We start by visiting the A node, and then we look at the neighbors of the A vertex, then a neighbor of that neighbor, and so on. After
visiting the A vertex, we visit one of its neighbors, B (in our example, we sort alphabetically; however, any neighbor can be added), as
shown in Figure 9.23:
Figure 9.23: Nodes A and B are visited in depth-first traversal
After visiting the B vertex, no unvisited vertex is connected to B, so we backtrack and look at another neighbor of A, that is, S. Next, we
look for the neighbors of the S vertex, which are the C and G vertices. We visit C as shown in Figure 9.24:
After visiting the C node, we visit its neighboring vertices, D and E, as shown in Figure 9.25:
Similarly, after visiting the E vertex, we visit the H and G vertices, as shown in Figure 9.26:
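A recursive version of this traversal can be sketched in Python as follows. The adjacency list is an assumption: the figure is not reproduced here, so the graph below is a hypothetical one chosen to be consistent with the walkthrough (for example, B connects only to A, and S connects to A, C, and G). Neighbors are visited in alphabetical order, as in the example.

```python
def dfs(graph, start, visited=None):
    """Return the vertices reachable from `start` in depth-first order."""
    if visited is None:
        visited = []
    visited.append(start)
    for neighbor in sorted(graph[start]):   # alphabetical tie-breaking
        if neighbor not in visited:
            dfs(graph, neighbor, visited)   # go deeper before trying siblings
    return visited                          # returning here is the backtrack step

# Assumed undirected graph, stored as an adjacency list.
graph = {
    "A": ["B", "S"],
    "B": ["A"],
    "S": ["A", "C", "G"],
    "C": ["D", "E", "F", "S"],
    "D": ["C"],
    "E": ["C", "H"],
    "F": ["C", "G"],
    "G": ["F", "H", "S"],
    "H": ["E", "G"],
}

order = dfs(graph, "A")
print(order)
```

The list-based membership test keeps the sketch short; for large graphs, a set of visited vertices alongside the ordered list would avoid the linear lookups.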
• Dijkstra's Algorithm
An algorithm for finding the shortest paths between nodes in a graph, which may represent, for example, road networks.
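A minimal sketch of Dijkstra's algorithm using Python's heapq module as the priority queue is shown here. It assumes non-negative edge weights, which the algorithm requires. The road network and its weights are hypothetical values for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Return the shortest distance from `source` to every vertex.

    `graph` maps each vertex to a dict of {neighbor: non-negative weight}.
    """
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale entry: a shorter route was found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:              # relax the edge u -> v
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Assumed undirected road network (each edge listed in both directions).
roads = {
    "A": {"B": 5, "D": 40},
    "B": {"A": 5, "C": 10},
    "C": {"B": 10, "D": 10},
    "D": {"A": 40, "C": 10},
}

dist = dijkstra(roads, "A")
print(dist)
```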
• Bellman-Ford Algorithm
An algorithm that computes shortest paths from a single source vertex to all of the other vertices in a weighted digraph.
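The Bellman-Ford idea can be sketched as repeated relaxation of every edge, at most V - 1 times, followed by one extra pass that detects negative-weight cycles. The example edges below are assumptions for illustration, including one negative weight, which this algorithm can handle and Dijkstra's algorithm cannot.

```python
def bellman_ford(vertices, edges, source):
    """Return shortest distances from `source` in a weighted digraph.

    `edges` is a list of (u, v, weight) tuples. Raises ValueError if a
    negative-weight cycle is reachable from the source.
    """
    dist = {v: float("inf") for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:     # relax the edge u -> v
                dist[v] = dist[u] + w
                changed = True
        if not changed:                   # distances are stable: stop early
            break
    for u, v, w in edges:                 # any further improvement means a cycle
        if dist[u] + w < dist[v]:
            raise ValueError("negative-weight cycle reachable from source")
    return dist

# Assumed example digraph with one negative edge.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B", 4), ("A", "C", 5), ("B", "C", -2), ("C", "D", 3)]

dist = bellman_ford(vertices, edges, "A")
print(dist)
```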
• Kruskal's Algorithm
An algorithm that finds a minimum spanning tree for a connected weighted graph by adding increasing cost arcs while avoiding cycles.
Kruskal’s algorithm is a widely used algorithm for finding the minimum spanning tree of a given weighted, connected, and undirected graph. It is
based on the greedy approach: we first find the edge with the lowest weight and add it to the tree, and then in each iteration, we add
the remaining edge with the lowest weight that does not form a cycle. Initially, we treat every vertex of the graph as a separate tree.
Each selected edge combines two of these separate trees, and they grow to form a spanning tree. We repeat this process until all the
vertices are connected, that is, until V - 1 edges have been selected. The algorithm works as follows:
We start by selecting the edge with the lowest weight (weight 1), as represented by the dotted line shown in Figure 9.29:
Figure 9.29: Selecting the first edge with the lowest weight in the spanning tree
After selecting the edge with weight 1, we select the edge with weight 2 and then the edge with weight 3, since these are the next lowest
weights, as shown in Figure 9.30:
Figure 9.30: Selecting edges with weights 2 and 3 in the spanning tree
Similarly, we select the next edges with weights 4 and 5 respectively as shown in Figure 9.31:
Figure 9.31: Selecting edges with weights 4 and 5 in the spanning tree
Next, we select the next edge with weight 6 and make it a dotted line. After that, we see that the lowest weight is 7 but if we select it, it
makes a cycle, so we ignore it. Next, we check the edge with weight 8, and then 9, which are also ignored because they will also form a
cycle. So, the next edge with the lowest weight, 10, is selected. This is shown in Figure 9.32:
Figure 9.32: Selecting edges with weights 6 and 10 in the spanning tree
Finally, we see the following spanning tree using Kruskal’s algorithm, as shown in Figure 9.33:
Figure 9.33: The final spanning tree created using Kruskal’s algorithm
Kruskal’s algorithm has many real-world applications, such as approximating solutions to the traveling salesman problem (TSP), in which,
starting from one city, we have to visit all the different cities in a network with the minimum total cost and without visiting the same city
twice. There are many other applications, such as TV networks, tour operations, LAN networks, and electric grids.
The time complexity of Kruskal’s algorithm is O(E log E), or equivalently O(E log V), where E is the number of edges and V is the number of
vertices.
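The procedure can be sketched in Python with a simple union-find structure to detect cycles. The edge list is an assumption: the figures are not reproduced here, so the endpoints below are hypothetical, chosen so that the edges with weights 1 to 6 and 10 are accepted and those with weights 7, 8, and 9 are rejected, as in the walkthrough.

```python
def kruskal(vertices, edges):
    """Return the MST edges of a connected, undirected, weighted graph.

    `edges` is a list of (weight, u, v) tuples.
    """
    parent = {v: v for v in vertices}     # each vertex starts as its own tree

    def find(v):
        """Return the root of the tree containing v (with path halving)."""
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    mst = []
    for w, u, v in sorted(edges):         # lowest weight first
        ru, rv = find(u), find(v)
        if ru != rv:                      # different trees: no cycle is formed
            parent[ru] = rv               # merge the two trees
            mst.append((w, u, v))
    return mst

# Assumed graph: endpoints are hypothetical, weights follow the walkthrough.
vertices = "ABCDEFGH"
edges = [(1, "A", "C"), (2, "C", "F"), (3, "B", "D"), (4, "D", "G"),
         (5, "A", "B"), (6, "E", "F"), (7, "C", "D"), (8, "C", "E"),
         (9, "E", "G"), (10, "G", "H")]

mst = kruskal(vertices, edges)
total = sum(w for w, _, _ in mst)
print(mst)
print(total)
```

Sorting the edges dominates the running time, which is where the O(E log E) bound comes from.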
• Prim's Algorithm
An algorithm that finds a minimum spanning tree for a weighted undirected graph by building the tree one vertex at a time.
Prim’s algorithm is also based on a greedy approach to find the minimum cost spanning tree. Prim’s algorithm is very similar to Dijkstra’s
algorithm for finding the shortest path in a graph. In this algorithm, we start with an arbitrary node, and then we check the edges that lead
from the nodes already selected to nodes not yet in the tree, and we take the edge that has the lowest cost (or weight). The terms cost and
weight are used interchangeably in this algorithm. So, starting from the selected node, we grow the tree by selecting, one by one, the edges
that have the lowest weight and do not form a cycle. The algorithm works as follows:
1. Create a dictionary that holds all the edges and their weights
2. Get the edges, one by one, that have the lowest cost from the dictionary and grow the tree in such a way that the cycle is not formed
3. Repeat step 2 until all the vertices are visited
Let us consider an example to understand the working of Prim’s algorithm. Assuming that we arbitrarily select the A node, we then check all the
outgoing edges from A. Here, we have two options, AB and AC; we select edge AC since it has the lower cost/weight (weight 1), as shown
in Figure 9.34:
Figure 9.34: Selecting edge AC in constructing the spanning tree using Prim’s algorithm
Next, we look for the lowest-weight outgoing edge from the nodes already in the tree, A and C. We have options AB, CD, CE, and CF, out of
which we select edge CF, which has the lowest weight of 2. Likewise, we grow the tree, and next we select the lowest weighted edge, i.e., AB, as
shown in Figure 9.35:
Figure 9.35: Selecting edge AB in constructing the spanning tree using Prim’s algorithm
Afterward, we select edge BD, which has a weight of 3, and similarly, next, we select edge DG, which has the lowest weight of 4. This is
shown in Figure 9.36:
Figure 9.36: Selecting edges BD and DG in constructing the spanning tree using Prim’s algorithm
Next, we select edges FE and GH, which have weights of 6 and 10 respectively, as shown in Figure 9.37:
Figure 9.37: Selecting edges FE and GH in constructing the spanning tree using Prim’s algorithm
Next, whenever we try to include any more edges, a cycle is formed, so we ignore those edges. Finally, we obtain the spanning tree, which is
shown below in Figure 9.38:
Prim’s algorithm also has many real-world applications. For all the applications where we can use Kruskal’s algorithm, we can also use
Prim’s algorithm. Other applications include road networks, game development, etc.
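A heap-based sketch of Prim's algorithm in Python follows. The graph is an assumption: the figures are not reproduced here, so the weight of AB (5) and the weights of the rejected edges CD, CE, and EG (7, 8, 9) are hypothetical values chosen so that the selection order matches the walkthrough (AC, CF, AB, BD, DG, FE, GH).

```python
import heapq

def prim(graph, start):
    """Return MST edges as (u, v, weight) in the order they are selected.

    `graph` maps each vertex to a dict of {neighbor: weight}.
    """
    visited = {start}
    heap = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(heap)
    mst = []
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)     # cheapest edge leaving the tree
        if v in visited:
            continue                      # both ends in the tree: would form a cycle
        visited.add(v)
        mst.append((u, v, w))
        for nxt, nw in graph[v].items():
            if nxt not in visited:
                heapq.heappush(heap, (nw, v, nxt))
    return mst

# Assumed undirected graph; several weights are hypothetical (see lead-in).
edge_list = [("A", "C", 1), ("C", "F", 2), ("B", "D", 3), ("D", "G", 4),
             ("A", "B", 5), ("F", "E", 6), ("C", "D", 7), ("C", "E", 8),
             ("E", "G", 9), ("G", "H", 10)]
graph = {}
for u, v, w in edge_list:
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w

mst = prim(graph, "A")
print(mst)
print(sum(w for _, _, w in mst))
```

This binary-heap version runs in O(E log V); the O(E + V log V) bound usually quoted for Prim's algorithm relies on a Fibonacci heap.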
Since both Kruskal’s and Prim’s MST algorithms are used for the same purpose, which one should be used? In general, it depends on the
structure of the graph. For a graph with V vertices and E edges, Kruskal’s algorithm’s worst-case time complexity is O(E log V), and Prim’s
algorithm has a time complexity of O(E + V log V) when implemented with a Fibonacci heap. So, we can observe that Prim’s algorithm works
better when we have a dense graph, whereas Kruskal’s algorithm is better when we have a sparse graph.
• Network Routing
Using graphs to model and find the most efficient paths in network routing.
• Social Network Analysis
Analyzing social networks by representing individuals as vertices and their interactions as edges.
• Dependency Resolution
Using graphs to resolve dependencies in tasks, packages, or modules.
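As a small illustration of dependency resolution, the sketch below uses graphlib.TopologicalSorter from the Python standard library (available from Python 3.9). The package names and their dependencies are made up for the example. The sorter produces an order in which every package appears after the packages it depends on.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical packages: each key maps to the set of packages it depends on.
deps = {
    "app": {"web", "db"},
    "web": {"http"},
    "db": {"http"},
    "http": set(),
}

# static_order() yields dependencies before the packages that need them.
order = list(TopologicalSorter(deps).static_order())
print(order)

# A circular dependency cannot be resolved and raises CycleError.
circular = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(circular).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```

The relative order of independent packages (here, web and db) is not fixed, so only the before/after constraints should be relied on.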
Performance Analysis
Summary
Lessons Learned from This Module:
Graphs are versatile data structures used to represent relationships between objects. This chapter covers graph representations, traversal
techniques (DFS and BFS), and key algorithms for shortest paths and minimum spanning trees. It includes practical applications in network
routing, social network analysis, and dependency resolution. Performance analysis, case studies, and best practices ensure a comprehensive
understanding of graph algorithms.