DS IV Unit Notes


UNIT-IV GRAPHS

Graphs Terminology
A graph consists of:
 A set, V, of vertices (nodes)
 A collection, E, of pairs of vertices from V, called edges (arcs)
Edges, also called arcs, are represented by (u, v) and are either:
Directed, if the pairs are ordered: in (u, v), u is the origin and v is the destination
Undirected, if the pairs are unordered
A graph is a pictorial representation of a set of objects where some pairs of objects are connected
by links. The interconnected objects are represented by points termed as vertices, and the
links that connect the vertices are called edges.
Formally, a graph is a pair of sets (V, E), where V is the set of vertices and E is the set of edges,
connecting the pairs of vertices. Take a look at the following graph −

In the above graph, V = {a, b, c, d, e}


E = {ab, ac, bd, cd, de}

Then a graph can be:


Directed graph (di-graph) if all the edges are directed

Undirected graph (graph) if all the edges are undirected

Mixed graph if some edges are directed and some are undirected


Weighted: In a weighted graph, each edge is assigned a weight or cost. Consider a graph of 4
nodes as in the diagram below. As you can see, each edge has a weight/cost assigned to it. If you
want to go from vertex 1 to vertex 3, you can take one of the following 3 paths:

o 1 -> 2 -> 3
o 1 -> 3
o 1 -> 4 -> 3

Therefore the total cost of each path will be as follows:

o The total cost of 1 -> 2 -> 3 will be (1 + 2), i.e. 3 units
o The total cost of 1 -> 3 will be 1 unit
o The total cost of 1 -> 4 -> 3 will be (3 + 2), i.e. 5 units
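The path costs above can be checked with a small Python sketch. The edge weights are taken from the example; the `weights` dictionary below is just one way to store them.

```python
# Edge weights from the 4-node example above:
# 1->2 costs 1, 2->3 costs 2, 1->3 costs 1, 1->4 costs 3, 4->3 costs 2.
weights = {(1, 2): 1, (2, 3): 2, (1, 3): 1, (1, 4): 3, (4, 3): 2}

def path_cost(path, weights):
    """Sum the weights of the consecutive edges along a path."""
    return sum(weights[(u, v)] for u, v in zip(path, path[1:]))

print(path_cost([1, 2, 3], weights))  # 3
print(path_cost([1, 3], weights))     # 1
print(path_cost([1, 4, 3], weights))  # 5
```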

Cyclic: A graph is cyclic if it contains a path that starts from a vertex and ends at
the same vertex. That path is called a cycle. An acyclic graph is a graph that has no cycle.

A tree is an undirected graph in which any two vertices are connected by exactly one path. A
tree is an acyclic graph and has N - 1 edges, where N is the number of vertices. A node in
a graph may have one or more parent nodes. However, in a tree, each node (except the
root node) has exactly one parent.

Note: A root node has no parent.

A tree cannot contain any cycles or self loops, however, the same does not apply to graphs.
Illustrate terms on graphs

End-vertices of an edge are the endpoints of the edge.


Two vertices are adjacent if they are endpoints of the same edge.
An edge is incident on a vertex if the vertex is an endpoint of the edge.
Outgoing edges of a vertex are the directed edges for which the vertex is the origin.
Incoming edges of a vertex are the directed edges for which the vertex is the destination.
Degree of a vertex v, denoted deg(v), is the number of edges incident on v.
Out-degree, outdeg(v), is the number of outgoing edges.
In-degree, indeg(v), is the number of incoming edges.
Parallel edges (or multiple edges) are edges of the same type that have the same end-vertices.
Self-loop is an edge whose two end-vertices are the same vertex.

Simple graphs have no parallel edges or self-loops


Properties
If a graph, G, has m edges then Σv∈G deg(v) = 2m
If a di-graph, G, has m edges then
Σv∈G indeg(v) = m = Σv∈G outdeg(v)
If a simple graph, G, has m edges and n vertices:
If G is also directed then m ≤ n(n-1)
If G is also undirected then m ≤ n(n-1)/2
So a simple graph with n vertices has O(n²) edges at most

More Terminology
Path is a sequence of alternating vertices and edges such that each successive vertex is
connected to the previous one by the edge between them. Frequently only the vertices are
listed, especially if there are no parallel edges.
Cycle is a path that starts and ends at the same vertex.
Simple path is a path with distinct vertices. Directed path is a path of only directed edges
Directed cycle is a cycle of only directed edges. Sub-graph is a subset of vertices and edges.
Spanning sub-graph contains all the vertices.
Connected graph has all pairs of vertices connected by at least one path.
Connected component is a maximal connected sub-graph of an unconnected graph. Forest is a
graph without cycles.
Tree is a connected forest (previous type of trees are called rooted trees, these are free trees)
Spanning tree is a spanning subgraph that is also a tree.
More Properties
If G is an undirected graph with n vertices and m edges:
 If G is connected then m ≥ n - 1
 If G is a tree then m = n - 1
 If G is a forest then m ≤ n – 1

 Graph operations:

Suppose we want the following operations:

• AddVertex:

Adds a new vertex to the graph.

For example, suppose there is a new city, G, that we want to add to our map of train
routes. AddVertex(graph, G) adds it.
• AddEdge:

Adds a new directed edge to the graph.

For example, adding the city was not enough; we also need to say how the rail lines
connect it to other cities. Thus, we might do AddEdge(graph, C, G), giving an edge from C to G.

• IsReachable:
Reports whether we can get there from here.
For example, we might want to know whether we can get to city E from city A:
IsReachable(graph, E, A) would report a true value.
Again, we might want to know whether we can get to city D from city E:
IsReachable(graph, D, E) would report a false value.

The basic operations provided by a graph data structure G usually include:

 adjacent(G, x, y): tests whether there is an edge from the vertex x to the vertex y;
 neighbors(G, x): lists all vertices y such that there is an edge from the vertex x to the vertex y;
 add_vertex(G, x): adds the vertex x, if it is not there;
 remove_vertex(G, x): removes the vertex x, if it is there;
 add_edge(G, x, y): adds the edge from the vertex x to the vertex y, if it is not there;
 remove_edge(G, x, y): removes the edge from the vertex x to the vertex y, if it is there;
 get_vertex_value(G, x): returns the value associated with the vertex x;
 set_vertex_value(G, x, v): sets the value associated with the vertex x to v.
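These operations can be sketched in Python with a directed graph stored as a dictionary of successor sets. This is a minimal illustration, not a canonical implementation; the method names mirror the list above.

```python
class Graph:
    """Directed graph stored as a dict mapping each vertex to its set of successors."""

    def __init__(self):
        self.adj = {}     # vertex -> set of successor vertices
        self.value = {}   # vertex -> associated value

    def add_vertex(self, x):
        self.adj.setdefault(x, set())          # add x if it is not there

    def remove_vertex(self, x):
        self.adj.pop(x, None)                  # remove x if it is there
        self.value.pop(x, None)
        for nbrs in self.adj.values():         # also drop edges pointing at x
            nbrs.discard(x)

    def add_edge(self, x, y):
        self.add_vertex(x)
        self.add_vertex(y)
        self.adj[x].add(y)

    def remove_edge(self, x, y):
        self.adj.get(x, set()).discard(y)

    def adjacent(self, x, y):
        return y in self.adj.get(x, set())     # is there an edge x -> y?

    def neighbors(self, x):
        return sorted(self.adj.get(x, set()))

    def get_vertex_value(self, x):
        return self.value.get(x)

    def set_vertex_value(self, x, v):
        self.value[x] = v

g = Graph()
g.add_edge('C', 'G')                 # the AddEdge example from the text
print(g.adjacent('C', 'G'))          # True
print(g.adjacent('G', 'C'))          # False (the edge is directed)
```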

Graph representation
 You can represent a graph in many ways. The two most common ways of representing a graph
are as follows:
Adjacency matrix
 An adjacency matrix is a V×V binary matrix A. Element A[i][j] is 1 if there is an edge from vertex i
to vertex j; otherwise A[i][j] is 0.
 Note: A binary matrix is a matrix in which each cell can hold only one of two possible values -
either 0 or 1.
 The adjacency matrix can also be modified for a weighted graph: instead of storing 0
or 1 in A[i][j], the weight or cost of the edge is stored.
 In an undirected graph, if A[i][j] = 1, then A[j][i] = 1. In a directed graph, if A[i][j] = 1, then A[j][i]
may or may not be 1.
 An adjacency matrix provides constant-time access (O(1)) to determine whether there is an edge
between two nodes. The space complexity of the adjacency matrix is O(V²).
 The adjacency matrix of the following undirected graph is:
 i/j : 1 2 3 4
 1 : 0 1 0 1
 2 : 1 0 1 0
 3 : 0 1 0 1
 4 : 1 0 1 0


 The adjacency matrix of the following directed graph is:
 i/j : 1 2 3 4
 1 : 0 1 0 0
 2 : 0 0 0 1
 3 : 1 0 0 1
 4 : 0 1 0 0
 Adjacency list
 The other way to represent a graph is by using an adjacency list. An adjacency list is an array A
of separate lists. Each element A[i] of the array is a list containing all the vertices
adjacent to vertex i.
 For a weighted graph, the weight or cost of each edge is stored along with the vertex in the list,
using pairs. In an undirected graph, if vertex j is in list A[i], then vertex i will be in list A[j].
 The space complexity of an adjacency list is O(V + E), because information is stored only for
those edges that actually exist in the graph. In many cases, where the matrix is sparse, using an
adjacency matrix is not very useful, because it takes up a lot of space in which most of the
elements are 0 anyway. In such cases, using an adjacency list is better.
 Note: A sparse matrix is a matrix in which most of the elements are zero, whereas a dense
matrix is a matrix in which most of the elements are non-zero.

Consider the same undirected graph from the adjacency matrix above. Its adjacency list
is as follows:
 A[1] → 2 → 4
 A[2] → 1 → 3
 A[3] → 2 → 4
 A[4] → 1 → 3
Consider the same directed graph from the adjacency matrix above. Its adjacency list is as
follows:
 A[1] → 2
 A[2] → 4
 A[3] → 1 → 4
 A[4] → 2
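Both representations can be built from the same edge set. The sketch below uses the undirected example above (edges 1-2, 1-4, 2-3, 3-4) and 1-indexed vertices; the function names are illustrative.

```python
def to_matrix(n, edges, directed=False):
    """Build an n x n adjacency matrix for 1-indexed vertices."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1
        if not directed:
            A[v - 1][u - 1] = 1        # undirected: mirror the entry
    return A

def to_adj_list(n, edges, directed=False):
    """Build an adjacency list as a dict of sorted neighbour lists."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    return {v: sorted(ns) for v, ns in adj.items()}

undirected = [(1, 2), (1, 4), (2, 3), (3, 4)]
print(to_matrix(4, undirected))
# [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(to_adj_list(4, undirected))
# {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
```

Note that the matrix rows match the 0101 / 1010 pattern shown earlier, and the list matches A[1] → 2 → 4, and so on.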

Graph Traversal:
Graph traversal is a technique for visiting the vertices of a graph. A graph traversal also
decides the order in which the vertices are visited during the search, and it chooses the
edges to follow so that no vertex is visited twice. That means that, using graph traversal,
we visit all the vertices of the graph without getting into a looping path.

There are two graph traversal techniques and they are as follows...

1. DFS (Depth First Search)

2. BFS (Breadth First Search)


Depth First Search:

The Depth First Search (DFS) algorithm traverses a graph in a depthward motion, using a stack to
remember where to resume the search when a dead end occurs in an iteration.

As in the example given above, the DFS algorithm traverses from S to A to D to G to E to B first,
then to F and lastly to C. It employs the following rules.
 Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Push it in a stack.
 Rule 2 − If no adjacent unvisited vertex is found, pop a vertex from the stack. (This will pop all
the vertices from the stack that have no unvisited adjacent vertices.)
 Rule 3 − Repeat Rule 1 and Rule 2 until the stack is empty.

Step-by-step traversal (the accompanying figures are omitted):

1. Initialize the stack.

2. Mark S as visited and put it onto the stack. Explore any unvisited node adjacent to S.
We have three such nodes and can pick any of them; for this example, we take the nodes in
alphabetical order.

3. Mark A as visited and put it onto the stack. Explore any unvisited node adjacent to A.
Both S and D are adjacent to A, but we are concerned with unvisited nodes only.

4. Visit D, mark it as visited, and put it onto the stack. Here we have B and C, which are
adjacent to D and both unvisited. However, we shall again choose in alphabetical order.

5. We choose B, mark it as visited, and put it onto the stack. Here B does not have any
unvisited adjacent node, so we pop B from the stack.

6. We check the stack top to return to the previous node and check whether it has any
unvisited nodes. Here, we find D on the top of the stack.

7. The only unvisited node adjacent to D is now C. So we visit C, mark it as visited, and
put it onto the stack.

As C does not have any unvisited adjacent node, we keep popping the stack until we find a
node that has an unvisited adjacent node. In this case, there's none, and we keep popping until the
stack is empty.
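The walkthrough above can be sketched as an iterative DFS with an explicit stack. The figure is not included, so the graph below is a hypothetical reconstruction consistent with the visit order in the steps (S, A, D, B, C); neighbours are pushed in reverse-sorted order so the alphabetically smallest one is explored first.

```python
def dfs(adj, start):
    """Iterative depth-first search using an explicit stack."""
    visited, order, stack = set(), [], [start]
    while stack:
        v = stack.pop()
        if v in visited:
            continue                       # already handled via another edge
        visited.add(v)
        order.append(v)
        # push in reverse alphabetical order so the smallest is popped first
        for w in sorted(adj[v], reverse=True):
            if w not in visited:
                stack.append(w)
    return order

# Assumed graph: S-A, S-B, S-C, A-D, B-D, C-D (undirected)
adj = {'S': ['A', 'B', 'C'], 'A': ['S', 'D'], 'B': ['S', 'D'],
       'C': ['S', 'D'], 'D': ['A', 'B', 'C']}
print(dfs(adj, 'S'))  # ['S', 'A', 'D', 'B', 'C']
```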
Breadth First Search
The Breadth First Search (BFS) algorithm traverses a graph in a breadthward motion, using a
queue to remember where to resume the search when a dead end occurs in an iteration.

As in the example given above, the BFS algorithm traverses from A to B to E to F first, then to C
and G, and lastly to D. It employs the following rules.
 Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Insert it in a queue.
 Rule 2 − If no adjacent vertex is found, remove the first vertex from the queue.
 Rule 3 − Repeat Rule 1 and Rule 2 until the queue is empty.

Step-by-step traversal (the accompanying figures are omitted):

1. Initialize the queue.

2. We start by visiting S (the starting node) and mark it as visited.

3. We then see an unvisited node adjacent to S. In this example, we have three such nodes,
but alphabetically we choose A, mark it as visited, and enqueue it.

4. Next, the unvisited node adjacent to S is B. We mark it as visited and enqueue it.

5. Next, the unvisited node adjacent to S is C. We mark it as visited and enqueue it.

6. Now S is left with no unvisited adjacent nodes. So, we dequeue and find A.

7. From A we have D as an unvisited adjacent node. We mark it as visited and enqueue it.

At this stage, we are left with no unmarked (unvisited) nodes. But as per the algorithm we keep on
dequeuing in order to find any remaining unvisited nodes. When the queue gets emptied, the
traversal is over.
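The steps above can be sketched with a FIFO queue. As with DFS, the figure is missing, so the graph below is an assumed reconstruction consistent with the visit order S, A, B, C, D; neighbours are taken in alphabetical order.

```python
from collections import deque

def bfs(adj, start):
    """Breadth-first search using a FIFO queue."""
    visited, order = {start}, [start]
    queue = deque([start])
    while queue:
        v = queue.popleft()                 # dequeue the oldest discovered node
        for w in sorted(adj[v]):            # alphabetical order, as in the steps
            if w not in visited:
                visited.add(w)
                order.append(w)
                queue.append(w)
    return order

# Assumed graph: S-A, S-B, S-C, A-D (undirected)
adj = {'S': ['A', 'B', 'C'], 'A': ['S', 'D'], 'B': ['S'],
       'C': ['S'], 'D': ['A']}
print(bfs(adj, 'S'))  # ['S', 'A', 'B', 'C', 'D']
```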
SORTING
Sorting refers to the operation or technique of arranging and rearranging sets of data in some
specific order. A collection of records is called a list, where every record has one or more fields. A
field that contains a unique value for each record is termed the key field. For example, a
phone-number directory can be thought of as a list where each record has three fields: the 'name' of
the person, the 'address' of that person, and their 'phone number'. Being unique, the phone number
can work as a key to locate any record in the list.

The techniques of sorting can be divided into two categories. These are:

 Internal Sorting
 External Sorting

Internal Sorting: If all the data to be sorted can be held at one time in main memory, an
internal sorting method is used.

External Sorting: When the data to be sorted cannot be accommodated in memory at the
same time and some of it has to be kept in auxiliary storage such as a hard disk, floppy disk, or
magnetic tape, external sorting methods are used.

Bubble Sort
We take an unsorted array for our example: 14, 33, 27, 35, 10. Bubble sort takes O(n²) time,
so we're keeping the example short and precise.

Bubble sort starts with the very first two elements, comparing them to check which one is
greater.

In this case, value 33 is greater than 14, so the two are already in sorted order. Next, we
compare 33 with 27.

We find that 27 is smaller than 33, so these two values must be swapped.

The array then becomes 14, 27, 33, 35, 10.

Next we compare 33 and 35. We find that both are already in sorted positions.

Then we move to the next two values, 35 and 10.

We know that 10 is smaller than 35. Hence they are not sorted.

We swap these values and find that we have reached the end of the array. After one
iteration, the array is 14, 27, 33, 10, 35.

After the second iteration, it is 14, 27, 10, 33, 35.
Notice that after each iteration, at least one value moves to the end.

And when no swap is required, bubble sort learns that the array is completely
sorted.

Now we should look into some practical aspects of bubble sort.


Algorithm
We assume list is an array of n elements. We further assume that the swap function
swaps the values of the given array elements.

begin BubbleSort(list)

   for all pairs of adjacent elements list[i], list[i+1]
      if list[i] > list[i+1]
         swap(list[i], list[i+1])
      end if
   end for

   return list

end BubbleSort
Pseudocode
We observe in the algorithm that bubble sort compares each pair of adjacent array elements
until the whole array is completely sorted in ascending order. This may cause some efficiency
issues: what if the array needs no more swapping because all the elements are already in
ascending order?
To address this, we use a flag variable, swapped, which helps us see whether any
swap has happened. If no swap has occurred, i.e. the array requires no more
processing to be sorted, we come out of the loop.
Pseudocode of BubbleSort algorithm can be written as follows −
procedure bubbleSort( list : array of items )

   loop = list.count

   for i = 0 to loop-1 do:
      swapped = false

      /* the last i elements are already in place */
      for j = 0 to loop-2-i do:

         /* compare the adjacent elements */
         if list[j] > list[j+1] then
            /* swap them */
            swap( list[j], list[j+1] )
            swapped = true
         end if

      end for

      /* if no number was swapped, the
         array is sorted now; break the loop. */
      if (not swapped) then
         break
      end if

   end for

   return list

end procedure
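The pseudocode above translates directly into a short Python sketch, including the early-exit `swapped` flag:

```python
def bubble_sort(items):
    """Bubble sort with the early-exit flag from the pseudocode."""
    a = list(items)
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):        # last i elements are already in place
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:                   # no swap: array is already sorted
            break
    return a

print(bubble_sort([14, 33, 27, 35, 10]))  # [10, 14, 27, 33, 35]
```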


Insertion Sort
We take an unsorted array for our example; its first elements are 14, 33, 27, 10.

Insertion sort compares the first two elements.

It finds that 14 and 33 are already in ascending order. For now, 14 is in the sorted
sub-list.

Insertion sort moves ahead and compares 33 with 27, and finds that 33 is not in the
correct position.

It swaps 33 with 27. It also checks 27 against all the elements of the sorted sub-list. Here
the sorted sub-list has only one element, 14, and 27 is greater than 14. Hence, the
sorted sub-list remains sorted after the swap.

By now we have 14 and 27 in the sorted sub-list. Next, it compares 33 with 10.

These values are not in sorted order, so we swap them.

However, the swap makes 27 and 10 unsorted.

Hence, we swap them too.

Again we find 14 and 10 in unsorted order.

We swap them again. By the end of the third iteration, we have a sorted sub-list of 4 items.

This process goes on until all the unsorted values are absorbed into the sorted sub-list. Now
we shall see some programming aspects of insertion sort.

Algorithm

Now we have a bigger picture of how this sorting technique works, so we can derive
simple steps by which we can achieve insertion sort.
Step 1 − If it is the first element, it is already sorted; return
Step 2 − Pick next element
Step 3 − Compare with all elements in the sorted sub-list
Step 4 − Shift all the elements in the sorted sub-list that is greater than the
value to be sorted
Step 5 − Insert the value
Step 6 − Repeat until list is sorted
Pseudocode
procedure insertionSort( A : array of items )
   int holePosition
   int valueToInsert

   for i = 1 to length(A) - 1 inclusive do:
      valueToInsert = A[i]
      holePosition = i

      while holePosition > 0 and A[holePosition-1] > valueToInsert do:
         A[holePosition] = A[holePosition-1]
         holePosition = holePosition - 1
      end while

      A[holePosition] = valueToInsert
   end for
end procedure
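The hole-position pseudocode above can be sketched in Python as follows:

```python
def insertion_sort(items):
    """Insertion sort: grow a sorted sub-list at the front of the array."""
    a = list(items)
    for i in range(1, len(a)):
        value = a[i]
        hole = i
        # shift larger elements of the sorted sub-list one place to the right
        while hole > 0 and a[hole - 1] > value:
            a[hole] = a[hole - 1]
            hole -= 1
        a[hole] = value                  # drop the value into the hole
    return a

print(insertion_sort([14, 33, 27, 10, 35, 19, 42, 44]))
# [10, 14, 19, 27, 33, 35, 42, 44]
```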
Selection Sort
Consider the following depicted array as an example.

For the first position in the sorted list, the whole list is scanned sequentially. The first
position presently holds 14; we search the whole list and find that 10 is the
lowest value.

So we swap 14 with 10. After one iteration 10, which happens to be the minimum
value in the list, appears in the first position of the sorted list.

For the second position, where 33 is residing, we start scanning the rest of the list in a
linear manner.

We find that 14 is the second lowest value in the list and it should appear at the second
place. We swap these values.

After two iterations, the two lowest values are positioned at the beginning in sorted
order.

The same process is applied to the rest of the items in the array.
Following is a pictorial depiction of the entire sorting process −
Now, let us learn some programming aspects of selection sort.

Algorithm

Step 1 − Set MIN to location 0


Step 2 − Search the minimum element in the list
Step 3 − Swap with value at location MIN
Step 4 − Increment MIN to point to next element
Step 5 − Repeat until list is sorted
Pseudocode

procedure selectionSort( list : array of items, n : size of list )

   for i = 1 to n - 1
      /* set current element as minimum */
      min = i

      /* find the true minimum of the remainder */
      for j = i+1 to n
         if list[j] < list[min] then
            min = j
         end if
      end for

      /* swap the minimum element with the current element */
      if min != i then
         swap list[min] and list[i]
      end if
   end for

end procedure
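A Python sketch of the same procedure, using 0-based indexing:

```python
def selection_sort(items):
    """Selection sort: repeatedly move the minimum of the unsorted
    remainder to the front."""
    a = list(items)
    n = len(a)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):        # scan the unsorted remainder
            if a[j] < a[min_idx]:
                min_idx = j
        if min_idx != i:                 # swap only if a smaller value was found
            a[i], a[min_idx] = a[min_idx], a[i]
    return a

print(selection_sort([14, 33, 27, 10, 35, 19, 42, 44]))
# [10, 14, 19, 27, 33, 35, 42, 44]
```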
External Sorting-Model for external sorting
External sorting is a term for a class of sorting algorithms that can handle massive amounts of
data. External sorting is required when the data being sorted do not fit into the main memory of a
computing device (usually RAM) and instead they must reside in the slower external memory
(usually a hard drive). External sorting typically uses a hybrid sort-merge strategy. In the sorting
phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a
temporary file. In the merge phase, the sorted sub-files are combined into a single larger file.
One example of external sorting is the external merge sort algorithm, which is a K-way merge
algorithm: it sorts chunks that each fit in RAM, then merges the sorted chunks together. We first
divide the file into runs such that the size of a run is small enough to fit into main memory. Then
we sort each run in main memory using a conventional sorting algorithm. Finally we merge the
resulting runs together into successively bigger runs, until the file is sorted.
The algorithm first sorts M items at a time and puts the sorted lists back into external memory.
For example, to sort 900 megabytes of data using only 100 megabytes of RAM:

1. Read 100 MB of the data in main memory and sort by some conventional method,
like quicksort.
2. Write the sorted data to disk.
3. Repeat steps 1 and 2 until all of the data is in sorted 100 MB chunks (there are 900MB /
100MB = 9 chunks), which now need to be merged into one single output file.
4. Read the first 10 MB (= 100MB / (9 chunks + 1)) of each sorted chunk into input buffers in
main memory and allocate the remaining 10 MB for an output buffer. (In practice, it might
provide better performance to make the output buffer larger and the input buffers slightly
smaller.)
5. Perform a 9-way merge and store the result in the output buffer. Whenever the output
buffer fills, write it to the final sorted file and empty it. Whenever any of the 9 input
buffers empties, fill it with the next 10 MB of its associated 100 MB sorted chunk until no
more data from the chunk is available. This is the key step that makes external merge sort
work externally: because the merge algorithm makes only one pass sequentially through
each of the chunks, each chunk does not have to be loaded completely; rather, sequential
parts of the chunk can be loaded as needed.
Historically, instead of a sort, sometimes a replacement-selection algorithm was used to perform
the initial distribution, to produce on average half as many output chunks of double the length.
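The sort-then-merge model can be sketched in miniature with Python. Temporary files stand in for the on-disk runs, and `chunk_size` plays the role of available main memory; `heapq.merge` performs the lazy k-way merge, reading each run sequentially just as the description above requires. The function name and file format are illustrative.

```python
import heapq
import os
import tempfile

def external_sort(values, chunk_size):
    """Sketch of external merge sort: sort fixed-size chunks, write each
    to a temporary file (a 'run'), then k-way merge the runs."""
    run_files = []
    # Sorting phase: sort chunks that fit in "memory" and write them out.
    for i in range(0, len(values), chunk_size):
        run = sorted(values[i:i + chunk_size])
        f = tempfile.NamedTemporaryFile('w+', delete=False)
        f.write('\n'.join(map(str, run)))
        f.close()
        run_files.append(f.name)

    def read_run(path):
        # Stream one run from disk, one value at a time.
        with open(path) as fh:
            for line in fh:
                yield int(line)

    # Merge phase: lazy k-way merge of all runs.
    merged = list(heapq.merge(*(read_run(p) for p in run_files)))
    for p in run_files:
        os.remove(p)
    return merged

print(external_sort([9, 1, 8, 2, 7, 3, 6, 4, 5], chunk_size=3))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```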

Merge Sort
To understand merge sort, we take an unsorted array as the following −

We know that merge sort first divides the whole array recursively into equal halves
until atomic (single-element) values are reached. We see here that an array of 8 items is divided
into two arrays of size 4.

This does not change the sequence of appearance of items in the original. Now we
divide these two arrays into halves.

We further divide these arrays until we reach atomic values which can no longer be
divided.

Now, we combine them in exactly the same manner as they were broken down.
We first compare the elements of each pair of lists and then combine them into another list in
sorted order. We see that 14 and 33 are in sorted positions. We compare 27 and 10,
and in the target list of 2 values we put 10 first, followed by 27. We change the order of
19 and 35, whereas 42 and 44 are placed sequentially.

In the next iteration of the combining phase, we compare lists of two data values and
merge them into lists of four data values, placing all in sorted order.

After the final merging, the list should look like this −

Now we should learn some programming aspects of merge sorting.


Algorithm
Merge sort keeps on dividing the list into equal halves until it can no longer be divided.
By definition, if there is only one element in the list, it is sorted. Then, merge sort combines
the smaller sorted lists, keeping the new list sorted too.
Step 1 − if there is only one element in the list, it is already sorted; return.
Step 2 − divide the list recursively into two halves until it can no longer be divided.
Step 3 − merge the smaller lists into a new list in sorted order.
Merge sort works with recursion and we shall see our implementation in the same way.
procedure mergesort( var a as array )
   if ( n == 1 ) return a

   var l1 as array = a[0] ... a[n/2]
   var l2 as array = a[n/2+1] ... a[n]

   l1 = mergesort( l1 )
   l2 = mergesort( l2 )

   return merge( l1, l2 )
end procedure
procedure merge( var a as array, var b as array )
   var c as array

   while ( a and b have elements )
      if ( a[0] > b[0] )
         add b[0] to the end of c
         remove b[0] from b
      else
         add a[0] to the end of c
         remove a[0] from a
      end if
   end while

   while ( a has elements )
      add a[0] to the end of c
      remove a[0] from a
   end while

   while ( b has elements )
      add b[0] to the end of c
      remove b[0] from b
   end while

   return c
end procedure
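The two procedures above correspond to the following Python sketch:

```python
def merge_sort(a):
    """Recursive merge sort: split, sort each half, merge."""
    if len(a) <= 1:
        return list(a)                   # a single element is already sorted
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    return merge(left, right)

def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    c, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            c.append(a[i]); i += 1
        else:
            c.append(b[j]); j += 1
    c.extend(a[i:])                      # drain whichever list still has elements
    c.extend(b[j:])
    return c

print(merge_sort([14, 33, 27, 10, 35, 19, 42, 44]))
# [10, 14, 19, 27, 33, 35, 42, 44]
```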

Heap Sort
Heap sort is a comparison-based sorting technique based on the Binary Heap data
structure. It is similar to selection sort in that we first find the maximum element and
place it at the end. We then repeat the same process for the remaining elements.
What is Binary Heap?

Let us first define a Complete Binary Tree. A complete binary tree is a binary tree in
which every level, except possibly the last, is completely filled, and all nodes are as far
left as possible
A Binary Heap is a Complete Binary Tree where items are stored in a special order
such that the value in a parent node is greater (or smaller) than the values in its two children
nodes. The former is called a max-heap and the latter a min-heap. The heap
can be represented by a binary tree or an array.
Why array based representation for Binary Heap?

Since a Binary Heap is a Complete Binary Tree, it can be easily represented as an array,
and the array-based representation is space efficient. If a parent node is stored at index i,
the left child can be found at index 2*i + 1 and the right child at 2*i + 2 (assuming the
indexing starts at 0).
Heap Sort Algorithm for sorting in increasing order:

1. Build a max heap from the input data.


2. At this point, the largest item is stored at the root of the heap. Replace it with the last
item of the heap and reduce the size of the heap by 1. Finally, heapify the root of the
tree.
3. Repeat the above steps while the size of the heap is greater than 1.
How to build the heap?

The heapify procedure can be applied to a node only if its children are already heapified. So
heapification must be performed in bottom-up order.
Let us understand this with the help of an example:
Input data: 4, 10, 3, 5, 1
4(0)
/ \
10(1) 3(2)
/ \
5(3) 1(4)

The numbers in brackets represent the indices in the array representation of the data.
Applying heapify procedure to index 1:
4(0)
/ \
10(1) 3(2)
/ \
5(3) 1(4)

Applying heapify procedure to index 0:


10(0)
/ \
5(1) 3(2)
/ \
4(3) 1(4)

The heapify procedure calls itself recursively, moving the value at a node down the
tree until the heap property holds.
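The heapify trace above, followed by the extract-maximum phase, can be sketched as:

```python
def heapify(a, n, i):
    """Sift a[i] down so the subtree rooted at i becomes a max-heap,
    assuming its children's subtrees already are max-heaps."""
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, n, largest)           # continue sifting down

def heap_sort(items):
    a = list(items)
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):  # build the max-heap, bottom up
        heapify(a, n, i)
    for end in range(n - 1, 0, -1):      # move the max to the end, re-heapify
        a[0], a[end] = a[end], a[0]
        heapify(a, end, 0)
    return a

print(heap_sort([4, 10, 3, 5, 1]))  # [1, 3, 4, 5, 10]
```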
Radix Sort
The lower bound for comparison-based sorting algorithms (merge sort, heap sort,
quicksort, etc.) is Ω(n log n), i.e., they cannot do better than n log n.
Counting sort is a linear-time sorting algorithm that sorts in O(n+k) time when the elements
are in the range from 1 to k.
What if the elements are in the range from 1 to n²?

We can't use counting sort, because counting sort would take O(n²), which is worse than
comparison-based sorting algorithms. Can we sort such an array in linear time? Radix
sort is the answer. The idea of radix sort is to sort digit by digit, starting from the least
significant digit and moving to the most significant digit. Radix sort uses counting sort as a
subroutine.
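A least-significant-digit radix sort for non-negative integers can be sketched as follows, with a stable counting sort as the per-digit subroutine:

```python
def counting_sort_by_digit(a, exp):
    """Stable counting sort of a by the decimal digit at place value exp."""
    count = [0] * 10
    for x in a:
        count[(x // exp) % 10] += 1
    for d in range(1, 10):               # prefix sums give final positions
        count[d] += count[d - 1]
    out = [0] * len(a)
    for x in reversed(a):                # reversed traversal keeps the sort stable
        count[(x // exp) % 10] -= 1
        out[count[(x // exp) % 10]] = x
    return out

def radix_sort(items):
    """LSD radix sort for non-negative integers."""
    a = list(items)
    if not a:
        return a
    exp = 1
    while max(a) // exp > 0:             # one pass per decimal digit
        a = counting_sort_by_digit(a, exp)
        exp *= 10
    return a

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

Stability of the per-digit pass is essential: it preserves the order established by the earlier (less significant) digits.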
