18: Graph Data Structures: Software Development 2 Bell College
Introduction
Describing graphs
Directed Graphs
Traversing a graph
EXERCISE: Traversal
Implementing a Graph
EXERCISE: Looking at a Graph
EXERCISE: Implementing Traversal
Networks
EXERCISE: Using Dijkstra’s Algorithm
Introduction
We looked previously at the binary tree data structure, which provides a useful way of
storing data for efficient searching. In a binary tree, each node can have up to two child
nodes. More general tree structures can be created in which different numbers of child
nodes are allowed for each node.
All tree structures are hierarchical. This means that each node can only have one
parent node. Trees can be used to store data which has a definite hierarchy; for
example a family tree or a computer file system.
Some data need to have connections between items which do not fit into a hierarchy
like this. Graph data structures can be useful in these situations. A graph consists of a
number of data items, each of which is called a vertex. Any vertex may be connected to
any other, and these connections are called edges.
The following figure shows a graph in which the vertices are the names of cities in North
America. The edges could represent flights between these cities, or possibly Wide Area
Network links between them.
[Figure: a graph whose vertices are Anchorage, Billings, Corvallis, Denver, Edmonton, Flagstaff, Grand Rapids and Houston]
Describing graphs
Two vertices in a graph are adjacent if they form an edge. For example, Anchorage and
Corvallis are adjacent, while Anchorage and Denver are not. Adjacent vertices are called
neighbours.
A path is a sequence of vertices in which each successive pair is an edge. For example:
Anchorage-Billings-Denver-Flagstaff
A cycle is a path in which the first and last vertices are the same and there are no
repeated edges. For example:
Anchorage-Billings-Denver-Edmonton-Anchorage
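The definitions of adjacency, path and cycle can be checked mechanically. The following is a minimal sketch, not part of the course code: the class name PathCheck and its methods are hypothetical, and the edge set is an assumption taken from the edge list given later in this chapter, treated here as undirected.

```java
import java.util.*;

class PathCheck {
    // Undirected graph: each edge is recorded in both directions.
    static final Map<String, Set<String>> adj = new HashMap<>();

    static void addEdge(String a, String b) {
        adj.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    static {
        // Assumed edge set for the example figure.
        addEdge("Anchorage", "Billings");
        addEdge("Anchorage", "Corvallis");
        addEdge("Anchorage", "Edmonton");
        addEdge("Billings", "Denver");
        addEdge("Billings", "Edmonton");
        addEdge("Corvallis", "Denver");
        addEdge("Denver", "Edmonton");
        addEdge("Denver", "Flagstaff");
        addEdge("Flagstaff", "Houston");
        addEdge("Grand Rapids", "Houston");
    }

    // Two vertices are adjacent if they form an edge.
    static boolean adjacent(String a, String b) {
        return adj.getOrDefault(a, Collections.emptySet()).contains(b);
    }

    // A path: every successive pair of vertices is an edge.
    static boolean isPath(String... seq) {
        for (int i = 0; i + 1 < seq.length; i++) {
            if (!adjacent(seq[i], seq[i + 1])) return false;
        }
        return seq.length > 0;
    }

    // A cycle: a path that ends where it started and repeats no edge.
    static boolean isCycle(String... seq) {
        if (seq.length < 2 || !isPath(seq)) return false;
        if (!seq[0].equals(seq[seq.length - 1])) return false;
        Set<String> used = new HashSet<>();
        for (int i = 0; i + 1 < seq.length; i++) {
            String a = seq[i], b = seq[i + 1];
            // Normalise so that A-B and B-A count as the same edge.
            String key = a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
            if (!used.add(key)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(adjacent("Anchorage", "Corvallis"));  // true
        System.out.println(adjacent("Anchorage", "Denver"));     // false
        System.out.println(isPath("Anchorage", "Billings", "Denver", "Flagstaff"));   // true
        System.out.println(isCycle("Anchorage", "Billings", "Denver", "Flagstaff"));  // false
        System.out.println(isCycle("Anchorage", "Billings", "Denver", "Edmonton", "Anchorage")); // true
    }
}
```

Every cycle is also a path, so isPath returns true for the last sequence as well; only the extra conditions on the end points and on repeated edges distinguish the two.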
An undirected graph is connected if, for any pair of vertices, there is a path between
them. The graph above is connected, while the following one is not, as there are no
paths to Corvallis.
[Figure: a graph with vertices Anchorage, Billings, Corvallis, Denver and Edmonton, in which no edge reaches Corvallis]
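One way to test whether an undirected graph is connected is to start at any vertex, collect everything reachable from it, and compare the count with the total number of vertices. The sketch below illustrates this; the class ConnectivityCheck, its helper undirected, and the particular edges used are all hypothetical, chosen only so that Corvallis has no edges.

```java
import java.util.*;

class ConnectivityCheck {
    // Builds an undirected adjacency map from a list of vertex pairs.
    static Map<String, List<String>> undirected(String[][] edges, String... vertices) {
        Map<String, List<String>> adj = new HashMap<>();
        for (String v : vertices) adj.put(v, new ArrayList<>());
        for (String[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    // Connected: every vertex can be reached from an arbitrary start vertex.
    static boolean isConnected(Map<String, List<String>> adj) {
        if (adj.isEmpty()) return true;
        String start = adj.keySet().iterator().next();
        Set<String> seen = new HashSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        seen.add(start);
        toVisit.add(start);
        while (!toVisit.isEmpty()) {
            String v = toVisit.remove();
            for (String n : adj.get(v)) {
                if (seen.add(n)) toVisit.add(n);  // add() is false if already seen
            }
        }
        return seen.size() == adj.size();
    }

    public static void main(String[] args) {
        String[] vertices = {"Anchorage", "Billings", "Corvallis", "Denver", "Edmonton"};
        // Hypothetical edges: Corvallis is left with no connections.
        String[][] edges = {
            {"Anchorage", "Billings"}, {"Anchorage", "Edmonton"},
            {"Billings", "Denver"}, {"Billings", "Edmonton"}, {"Denver", "Edmonton"}
        };
        System.out.println(isConnected(undirected(edges, vertices)));  // false

        String[][] withCorvallis = Arrays.copyOf(edges, edges.length + 1);
        withCorvallis[edges.length] = new String[] {"Corvallis", "Denver"};
        System.out.println(isConnected(undirected(withCorvallis, vertices)));  // true
    }
}
```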
A tree data structure can be described as a connected, acyclic graph with one element
designated as the root element. It is acyclic because there are no paths in a tree which
start and finish at the same element.
Directed Graphs
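In a directed graph, each edge has a direction, shown in a diagram as an arrow from one
vertex to another.
[Figure: the example graph redrawn as a directed graph, with an arrow on each edge]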
A path in a directed graph must follow the direction of the arrows. Note that there are
two edges in this example between Denver and Flagstaff, so it is possible to travel in
either direction. The following is a path in this graph:
Billings-Denver-Flagstaff
while the following is not, because there is no edge from Denver to Billings:
Flagstaff-Denver-Billings
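In a directed graph, checking a path means checking each step in the stated direction only. The sketch below shows this for the example graph; the class DirectedPathCheck is hypothetical, and the edges are assumed to be those listed in the addEdge calls later in this chapter.

```java
import java.util.*;

class DirectedPathCheck {
    // Directed graph: an edge is stored only under its starting vertex.
    static final Map<String, List<String>> adj = new HashMap<>();

    static void addEdge(String from, String to) {
        adj.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        adj.computeIfAbsent(to, k -> new ArrayList<>());
    }

    static {
        addEdge("Anchorage", "Billings");
        addEdge("Anchorage", "Corvallis");
        addEdge("Anchorage", "Edmonton");
        addEdge("Billings", "Denver");
        addEdge("Billings", "Edmonton");
        addEdge("Corvallis", "Denver");
        addEdge("Denver", "Edmonton");
        addEdge("Denver", "Flagstaff");
        addEdge("Flagstaff", "Denver");
        addEdge("Flagstaff", "Houston");
        addEdge("Grand Rapids", "Houston");
    }

    // Each successive pair must be an edge in the direction given.
    static boolean isPath(String... seq) {
        for (int i = 0; i + 1 < seq.length; i++) {
            List<String> out = adj.getOrDefault(seq[i], Collections.emptyList());
            if (!out.contains(seq[i + 1])) return false;
        }
        return seq.length > 0;
    }

    public static void main(String[] args) {
        System.out.println(isPath("Billings", "Denver", "Flagstaff"));  // true
        System.out.println(isPath("Flagstaff", "Denver", "Billings"));  // false: no edge Denver -> Billings
        System.out.println(isPath("Flagstaff", "Billings", "Denver"));  // false: no edge Flagstaff -> Billings
    }
}
```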
A directed graph is connected if, for any pair of vertices, there is a path between them.
The following example graph is not connected – can you see why? What single edge
could you change to make it connected?
[Figure: a directed graph with vertices Anchorage, Billings, Corvallis and Denver]
Traversing a graph
Traversal is the facility to move through a structure visiting each of the vertices once. We
looked previously at the ways in which a binary tree can be traversed. Two possible
traversal methods for a graph are breadth-first and depth-first.
Breadth-First Traversal
This method visits all the vertices, beginning with a specified start vertex. It can be
described roughly as “neighbours-first”. No vertex is visited more than once, and vertices
are only visited if they can be reached – that is, if there is a path from the start vertex.
Breadth-first traversal makes use of a queue data structure. The queue holds a list of
vertices which have not been visited yet but which should be visited soon. Since a queue
is a first-in first-out structure, vertices are visited in the order in which they are added to
the queue.
Visiting a vertex involves, for example, outputting the data stored in that vertex, and also
adding its neighbours to the queue. Neighbours are not added to the queue if they are
already in the queue, or have already been visited.
Step 1
[Figure: the directed example graph, shown at each step of the traversal]
Visited: Anchorage
Queue: Billings, Corvallis, Edmonton (visit Billings next)
Step 2
Visited: Anchorage, Billings
Queue: Corvallis, Edmonton, Denver (visit Corvallis next)
Note that we only add Denver to the queue as the other neighbours of Billings are
already in the queue.
Step 3
Visited: Anchorage, Billings, Corvallis
Queue: Edmonton, Denver (visit Edmonton next)
Note that nothing is added to the queue as Denver, the only neighbour of Corvallis, is
already in the queue.
Step 4
Visited: Anchorage, Billings, Corvallis, Edmonton
Queue: Denver (visit Denver next)
Step 5
Visited: Anchorage, Billings, Corvallis, Edmonton, Denver
Queue: Flagstaff (visit Flagstaff next)
Step 6
Visited: Anchorage, Billings, Corvallis, Edmonton, Denver, Flagstaff
Queue: Houston (visit Houston next)
Step 7
Visited: Anchorage, Billings, Corvallis, Edmonton, Denver, Flagstaff, Houston
Queue: empty
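Note that Grand Rapids was not added to the queue: the edge between Grand Rapids and
Houston points towards Houston, so there is no path from Houston to Grand Rapids. Since
the queue is empty, we must stop, so the traversal is complete. The order of traversal was:
Anchorage, Billings, Corvallis, Edmonton, Denver, Flagstaff, Houston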
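The breadth-first steps described in this section can be sketched with the Java Collections Framework. This is an illustration only, separate from the course classes used later in the chapter: the class BreadthFirstSketch and its methods are hypothetical names, and the edges are assumed to be those of the directed example graph.

```java
import java.util.*;

class BreadthFirstSketch {
    static void addEdge(Map<String, List<String>> adj, String from, String to) {
        adj.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        adj.computeIfAbsent(to, k -> new ArrayList<>());
    }

    // Directed example graph; neighbours are kept in the order they are added.
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> adj = new LinkedHashMap<>();
        addEdge(adj, "Anchorage", "Billings");
        addEdge(adj, "Anchorage", "Corvallis");
        addEdge(adj, "Anchorage", "Edmonton");
        addEdge(adj, "Billings", "Denver");
        addEdge(adj, "Billings", "Edmonton");
        addEdge(adj, "Corvallis", "Denver");
        addEdge(adj, "Denver", "Edmonton");
        addEdge(adj, "Denver", "Flagstaff");
        addEdge(adj, "Flagstaff", "Denver");
        addEdge(adj, "Flagstaff", "Houston");
        addEdge(adj, "Grand Rapids", "Houston");
        return adj;
    }

    // Returns the vertices in the order a breadth-first traversal visits them.
    static List<String> breadthFirst(Map<String, List<String>> adj, String start) {
        List<String> order = new ArrayList<>();
        Set<String> reached = new HashSet<>();     // visited or already queued
        Queue<String> queue = new ArrayDeque<>();  // first-in, first-out
        queue.add(start);
        reached.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            order.add(current);                    // "visit" the vertex
            for (String to : adj.getOrDefault(current, Collections.emptyList())) {
                if (reached.add(to)) queue.add(to);  // skip neighbours already reached
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // prints [Anchorage, Billings, Corvallis, Edmonton, Denver, Flagstaff, Houston]
        System.out.println(breadthFirst(exampleGraph(), "Anchorage"));
    }
}
```

Grand Rapids never appears in the result because no edge leads to it from any vertex reachable from Anchorage.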
EXERCISE: Traversal
1. Find the order of breadth-first traversals of the graph in the example starting at
(a) Billings and (b) Flagstaff.
2. Find the order of a depth-first traversal of the graph in the example starting at
Anchorage. (Answer should be: Anchorage, Edmonton, Corvallis, Denver, Flagstaff, Houston, Billings)
Implementing a Graph
The diagrams we have seen of graphs show the data and connections in a visual way.
To make a Java Graph class, we have to work out a way in which that information can
actually be stored and accessed. This is known as the internal representation.
There are several possible internal representations for a graph data structure (this is also
true for binary trees). We will look at one which stores information as follows:
• Vertices are stored as keys in a Map structure, which means a vertex can be quickly
looked up. This Map is known as the adjacency map.
• Edges starting from each vertex are stored as a List of the adjacent vertices. This List
is stored as the value associated with the appropriate key in the Map.
For example, the representation of the graph used in the examples above would consist
of a Map with the following entry representing Anchorage:
Key: “Anchorage”
Value: [“Billings”, “Corvallis”, “Edmonton”]
If the adjacency map is a HashMap, and the edges for each vertex are stored in a
LinkedList, then the following diagram shows part of the internal representation of the
example graph:
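[Figure: part of the adjacencyMap, drawn as a table with key and value columns; for example, the key Corvallis at position 3 refers to a list containing Denver, and unused positions are null]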
The following code shows a basic graph class. The HashMap and LinkedList classes are
the ones you have used in previous chapters. Alternatively, you could use the equivalent
Java Collections Framework classes.
/**
 * class Graph
 *
 * @author Jim
 * @version 1.0
 */
public class Graph
{
    protected HashMap adjacencyMap;

    /**
     * Initialize this Graph object to be empty.
     */
    public Graph()
    {
        adjacencyMap = new HashMap();
    }

    /**
     * Determines if this Graph contains no vertices.
     *
     * @return true - if this Graph contains no vertices.
     */
    public boolean isEmpty()
    {
        return adjacencyMap.isEmpty();
    }

    /**
     * Determines the number of vertices in this Graph.
     *
     * @return the number of vertices.
     */
    public int size()
    {
        return adjacencyMap.size();
    }

    /**
     * Returns the number of edges in this Graph object.
     *
     * @return the number of edges.
     */
    public int getEdgeCount()
    {
        int count = 0;
        for (int i=0;i<adjacencyMap.CAPACITY;i++){
            if (adjacencyMap.keys[i] != null){
                LinkedList edges = (LinkedList)
                    adjacencyMap.get(adjacencyMap.keys[i]);
                count += edges.size();
            }
        }
        return count;
    }

    /**
     * Adds a specified object as a vertex
     *
     * @param vertex - the specified object
     * @return true - if object was added by this call
     */
    public boolean addVertex (Object vertex)
    {
        if (adjacencyMap.containsKey(vertex))
            return false;
        adjacencyMap.put (vertex, new LinkedList());
        return true;
    }

    /**
     * Adds an edge, and vertices if not already present
     *
     * @param v1 - the beginning vertex object of the edge
     * @param v2 - the ending vertex object of the edge
     * @return true - if the edge was added by this call
     */
    public boolean addEdge (Object v1, Object v2)
    {
        addVertex (v1); addVertex (v2);
        LinkedList l = (LinkedList)adjacencyMap.get(v1);
        l.add(v2);
        return true;
    }
}
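For comparison, the same adjacency map can be built from the Java Collections Framework classes mentioned above. The following is a minimal sketch under that assumption, not a replacement for the course class: the name JcfGraph and the neighbours method are hypothetical, and generics are used in place of casts.

```java
import java.util.*;

class JcfGraph<V> {
    // Vertex -> list of vertices that its outgoing edges lead to.
    protected final Map<V, List<V>> adjacencyMap = new HashMap<>();

    public boolean isEmpty() {
        return adjacencyMap.isEmpty();
    }

    // Number of vertices.
    public int size() {
        return adjacencyMap.size();
    }

    // Number of edges: the sum of the lengths of all the edge lists.
    public int getEdgeCount() {
        int count = 0;
        for (List<V> edges : adjacencyMap.values()) {
            count += edges.size();
        }
        return count;
    }

    // Returns true only if the vertex was not already present.
    public boolean addVertex(V vertex) {
        if (adjacencyMap.containsKey(vertex)) return false;
        adjacencyMap.put(vertex, new LinkedList<>());
        return true;
    }

    // Adds a directed edge v1 -> v2, creating the vertices if necessary.
    public boolean addEdge(V v1, V v2) {
        addVertex(v1);
        addVertex(v2);
        adjacencyMap.get(v1).add(v2);
        return true;
    }

    // Read-only view of the neighbours of a vertex (empty if unknown).
    public List<V> neighbours(V vertex) {
        List<V> edges = adjacencyMap.get(vertex);
        return edges == null ? Collections.emptyList()
                             : Collections.unmodifiableList(edges);
    }

    public static void main(String[] args) {
        JcfGraph<String> g = new JcfGraph<>();
        g.addEdge("Anchorage", "Billings");
        g.addEdge("Anchorage", "Corvallis");
        g.addEdge("Anchorage", "Edmonton");
        System.out.println(g.neighbours("Anchorage"));  // [Billings, Corvallis, Edmonton]
        System.out.println(g.size() + " vertices, " + g.getEdgeCount() + " edges");
    }
}
```

Iterating over adjacencyMap.values() avoids any dependence on the internal layout of the map, which the course HashMap exposes through its keys array.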
EXERCISE: Looking at a Graph
Create a new BlueJ project called simplegraph. Add a new class Graph using the above
code. Add the HashMap class from your simplehashmap project. Add the List, Node and
LinkedList classes from your simplelist project.
Create a new instance of Graph called graph1. Call the addEdge method repeatedly to
add the following edges (this should construct a directed graph equivalent to the
example used earlier in this chapter):
addEdge("Anchorage", "Billings");
addEdge("Anchorage", "Corvallis");
addEdge("Anchorage", "Edmonton");
addEdge("Billings", "Denver");
addEdge("Billings", "Edmonton");
addEdge("Corvallis", "Denver");
addEdge("Denver", "Edmonton");
addEdge("Denver", "Flagstaff");
addEdge("Flagstaff", "Denver");
addEdge("Flagstaff", "Houston");
addEdge("Grand Rapids", "Houston");
Inspect graph1. The only field is adjacencyMap. Click the Inspect button in the Object
Inspector to inspect adjacencyMap. This allows you to access the keys and the values
stored in the map.
Inspect the keys array. Check that “Anchorage” is included. Note its position (6 in this
screenshot).
Inspect the values array. You should see an array which includes some object
references. Inspect the object at the same position as “Anchorage” occupied in the keys
array (6 in this case).
What do you expect to find if you inspect the value object with the same
position as “Denver”?
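EXERCISE: Implementing Traversal
Add the Queue class from your queues project. Add the following method to your Graph
class (it requires J2SE 5.0 or later):
    /**
     * Lists the vertices reached using a Breadth First
     * traversal with a specified starting point
     *
     * @param start - the starting vertex for the traversal
     */
    public void breadthFirstTraversal(Object start)
    {
        Queue queue = new Queue();
        HashMap reached = new HashMap();
        Object current;
        // mark every vertex as not yet reached
        for (int i=0;i<adjacencyMap.CAPACITY;i++){
            if (adjacencyMap.keys[i] != null){
                reached.put(adjacencyMap.keys[i], false);
            }
        }
        queue.add(start);
        reached.set (start, true);
        while (!(queue.isEmpty()))
        {
            Object to;
            current = queue.remove();
            LinkedList edgeList = (LinkedList)
                adjacencyMap.get (current);
            // Sketch of the remaining steps, assuming that your
            // LinkedList class provides a get(int) method
            System.out.println(current);
            for (int i=0;i<edgeList.size();i++){
                to = edgeList.get(i);
                // only queue neighbours which have not been reached
                if (!(Boolean) reached.get(to)){
                    reached.set (to, true);
                    queue.add(to);
                }
            }
        }
    }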
Create a new instance of Graph called graph1 and add the same edges as in the
previous exercise.
Call the breadthFirstTraversal method of graph1 and specify “Anchorage” as your start
vertex.
Compare the output with the worked example in the section “Traversing a graph” of this
chapter.
Add a new method depthFirstTraversal to your Graph class and test it. You will need to
make use of the Stack class from your stacks project.
Networks
Sometimes the edges in a graph have numbers, or weights, associated with them.
Weights in the example below could be based on, for example, costs of flights, or on
WAN bandwidth. A graph like this is called a weighted graph, or a network.
[Figure: the directed example graph with a weight shown on each edge]
In a network, each path has a total weight. For example, the path Anchorage-Billings-
Edmonton has a total weight of 4 + 10 = 14. This is in fact a shorter path than the direct
path Anchorage-Edmonton which has a weight of 15.
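The total weight of a path is simply the sum of the weights of its edges. The sketch below shows the calculation for the two paths just mentioned; the class PathWeight is hypothetical, and only the three edge weights stated in the text (4, 10 and 15) are included.

```java
import java.util.*;

class PathWeight {
    // from-vertex -> (to-vertex -> weight of that edge)
    static final Map<String, Map<String, Integer>> weights = new HashMap<>();

    static void addEdge(String from, String to, int weight) {
        weights.computeIfAbsent(from, k -> new HashMap<>()).put(to, weight);
    }

    static {
        // Only the weights quoted in the text are included here.
        addEdge("Anchorage", "Billings", 4);
        addEdge("Billings", "Edmonton", 10);
        addEdge("Anchorage", "Edmonton", 15);
    }

    // Adds up the edge weights along the path, failing if a step is not an edge.
    static int totalWeight(String... path) {
        int total = 0;
        for (int i = 0; i + 1 < path.length; i++) {
            Integer w = weights.getOrDefault(path[i], Collections.emptyMap())
                               .get(path[i + 1]);
            if (w == null) {
                throw new IllegalArgumentException(
                    "no edge from " + path[i] + " to " + path[i + 1]);
            }
            total += w;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalWeight("Anchorage", "Billings", "Edmonton"));  // 14
        System.out.println(totalWeight("Anchorage", "Edmonton"));              // 15
    }
}
```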
Finding the shortest path between two vertices is often important in answering questions
like “What is the cheapest way to fly from Anchorage to Flagstaff?” or “What is the best
way to route WAN traffic between Billings and Edmonton?”
The shortest path can be found using Dijkstra’s algorithm. This is similar to the breadth
first traversal we looked at earlier in this chapter, except that it uses a special kind of
queue data structure called a priority queue.
In the priority queue, items are removed in order of value rather than in order of being
added to the queue. When the target vertex is removed from the priority queue, the
shortest path has been found.
The following example finds the shortest path from Anchorage to Edmonton. At each step we
keep track of:
• The lowest total path weight, or weightsum, from the start point to each vertex
• The immediate predecessor in that path for each vertex
• The contents of the priority queue
The priority queue contains vertices in order of weightsum value, so the lowest is removed
first. All weightsums are set to a large value to start with.
Step 1
[Figure: the weighted example graph, repeated at each step; new paths added to the table in each step are highlighted]
Step 2
Step 3
Note that the new, shorter paths to Denver and Edmonton through Billings are
added to the priority queue. They do not replace the previous paths (to Denver
through Corvallis, and directly to Edmonton), but take priority over them because
the weightsums are lower.
The weightsums and predecessors in the table are updated to take account of
the new paths.
Step 4
Note that the new, shorter path to Edmonton through Denver is added to the
priority queue, as is a path to Flagstaff.
Step 5
Note that in this step we dequeued the path to Denver with weightsum 7. This value
is higher than the value of 5 in the table, so the table should not be updated.
Step 7
[Figure: the weighted example graph with the shortest path highlighted]
We have now removed a path, with weightsum 8, to the target, Edmonton, from the
queue. This must be the shortest path to Edmonton as the shortest paths to any
vertex are always removed first.
All we need to do now is to trace the path by looking at the predecessors in the table.
The predecessor of Edmonton is Denver; the predecessor of Denver is Billings; the
predecessor of Billings is Anchorage.
Anchorage-Billings-Denver-Edmonton
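The worked example can be sketched in Java using java.util.PriorityQueue. This is an illustration under stated assumptions rather than a definitive implementation: the class DijkstraSketch and its members are hypothetical, only the part of the graph that matters for this example is included, and the individual weights 2 and 5 on Anchorage-Corvallis-Denver are assumed (the text only gives their sum, 7). As in the example, a shorter path is added to the queue without removing the older entry, and an out-of-date entry is ignored when it is dequeued.

```java
import java.util.*;

class DijkstraSketch {
    static class Edge {
        final String to; final int weight;
        Edge(String to, int weight) { this.to = to; this.weight = weight; }
    }

    // One entry in the priority queue: a vertex and the weightsum of a path to it.
    static class QueueItem {
        final String vertex; final int weightsum;
        QueueItem(String vertex, int weightsum) { this.vertex = vertex; this.weightsum = weightsum; }
    }

    static final Map<String, List<Edge>> graph = new HashMap<>();

    static void addEdge(String from, String to, int weight) {
        graph.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, weight));
    }

    static {
        addEdge("Anchorage", "Billings", 4);
        addEdge("Anchorage", "Corvallis", 2);   // assumed split of 7 into 2 + 5
        addEdge("Anchorage", "Edmonton", 15);
        addEdge("Billings", "Denver", 1);       // implied by weightsum 5 for Denver
        addEdge("Billings", "Edmonton", 10);
        addEdge("Corvallis", "Denver", 5);      // assumed, see above
        addEdge("Denver", "Edmonton", 3);       // implied by weightsum 8 for Edmonton
    }

    // Returns the shortest path from start to target, or an empty list if none exists.
    static List<String> shortestPath(String start, String target) {
        Map<String, Integer> weightsum = new HashMap<>();   // a missing key means "large value"
        Map<String, String> predecessor = new HashMap<>();
        PriorityQueue<QueueItem> queue =
            new PriorityQueue<>(Comparator.comparingInt((QueueItem q) -> q.weightsum));
        weightsum.put(start, 0);
        queue.add(new QueueItem(start, 0));
        while (!queue.isEmpty()) {
            QueueItem item = queue.remove();                 // lowest weightsum first
            if (item.weightsum > weightsum.get(item.vertex)) continue;  // out-of-date entry
            if (item.vertex.equals(target)) break;           // shortest path found
            for (Edge e : graph.getOrDefault(item.vertex, Collections.emptyList())) {
                int candidate = item.weightsum + e.weight;
                if (candidate < weightsum.getOrDefault(e.to, Integer.MAX_VALUE)) {
                    weightsum.put(e.to, candidate);          // update the table
                    predecessor.put(e.to, item.vertex);
                    queue.add(new QueueItem(e.to, candidate));
                }
            }
        }
        if (!weightsum.containsKey(target)) return Collections.emptyList();
        LinkedList<String> path = new LinkedList<>();
        for (String v = target; v != null; v = predecessor.get(v)) {
            path.addFirst(v);                                // trace the predecessors back
        }
        return path;
    }

    public static void main(String[] args) {
        // prints [Anchorage, Billings, Denver, Edmonton]
        System.out.println(shortestPath("Anchorage", "Edmonton"));
    }
}
```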
EXERCISE: Using Dijkstra’s Algorithm
Use Dijkstra’s algorithm to find the shortest path from Corvallis to Edmonton in the
following graph (note that the graph is a bit different from the one in the example).
The answer is given in note 3 at the end of this chapter.
[Figure: a modified version of the weighted example graph]
Further Reading
Data Structures and the Java Collections Framework by W.J. Collins includes a full Java
implementation of a Network class which uses Dijkstra’s algorithm.
3
Answer should be: Corvallis-Denver-Edmonton, total weight 9