Data Structures Lecture Notes
(Autonomous)
Dundigal, Hyderabad - 500 043
Lecture Notes:
Drafted by :
Ms. D.Deepthisri (IARE10830)
Assistant Professor
Chapter 1
INTRODUCTION TO DATA
STRUCTURES, SEARCHING AND
SORTING
Course Outcomes
After successful completion of this module, students should be able to:
CO 1 Carry out the analysis of a range of algorithms in terms of algorithm analysis and express algorithm complexity using the O notation. (Understand)
CO 2 Make use of the recursive algorithm design technique in appropriate contexts. (Understand)
CO 3 Represent standard ADTs by means of appropriate data structures. (Understand)
A data structure is a way of storing data in a computer so that it can be used efficiently and allows the most efficient algorithm to be used. The choice of a data structure begins with the choice of an abstract data type (ADT). A well-designed data structure allows a variety of critical operations to be performed using as few resources, in both execution time and memory space, as possible.
A data structure should be seen as a logical concept that must address two fundamental
concerns.
Since a data structure is a scheme for data organization, its functional definition should be independent of its implementation. This functional definition is known as the ADT (Abstract Data Type).
The way in which the data is organized affects the performance of a program for different
tasks. Computer programmers decide which data structures to use based on the nature
of the data and the processes that need to be performed on that data. Some of the more
commonly used data structures include lists, arrays, stacks, queues, heaps, trees, and
graphs.
Simple data structures can be constructed with the help of primitive data structures. A primitive data structure is used to represent the standard data types of a programming language. Variables, arrays, pointers, structures, unions, etc. are examples of primitive data structures.
Compound data structures can be constructed with the help of primitive data structures and have a specific functionality. They can be designed by the user and classified as linear and non-linear data structures.
The following operations are applied on linear data structures:
1. Add an element
2. Delete an element
3. Traverse
4. Sort the list of elements
5. Search for a data element
The following operations are applied on non-linear data structures:
1. Add elements
2. Delete elements
1.2 Algorithms:
An algorithm is a finite sequence of well-defined steps for solving a problem. The steps of an algorithm are typically of the following kinds:
1. Input Step
2. Assignment Step
3. Decision Step
4. Repetitive Step
5. Output Step
4. Effectiveness: the operations of the algorithm must be basic enough to be carried out with pencil and paper. They should not be so complex as to warrant writing another algorithm for the operation.
5. Input-Output: the algorithm must have certain initial and precise inputs, and outputs that may be generated both at its intermediate and final steps.
An algorithm does not enforce a language or mode for its expression; it only demands adherence to its properties.
We analyze algorithms for two main reasons:
1. To save time (time complexity): a program that runs faster is a better program.
2. To save space (space complexity): a program that saves space over a competing program is considered desirable.
Efficiency of Algorithms:
The performances of algorithms can be measured on the scales of time and space. The
performance of a program is the amount of computer memory and time needed to run
a program. We use two approaches to determine the performance of a program. One
is analytical and the other is experimental. In performance analysis we use analytical
methods, while in performance measurement we conduct experiments.
Analyzing Algorithms
Suppose M is an algorithm, and suppose n is the size of the input data. Clearly, the complexity f(n) of M increases as n increases. We usually examine the rate of increase of f(n) by comparing it with some standard functions. The most common computing times are
O(1), O(log₂ n), O(n), O(n log₂ n), O(n²), O(n³), O(2ⁿ)
Asymptotic Notations:
Asymptotic notation is used to describe how an algorithm's usage of computational resources grows with the size of the input: the running time of an algorithm is described as a function of the input size n, for large n.
Time Complexity:
Reasons for analyzing algorithms: to predict the resources that the algorithm requires.
GCD Design: Given two integers a and b, the greatest common divisor is found recursively using the formula
gcd(a, b) = a                  if b = 0
          = gcd(b, a mod b)    otherwise
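A minimal recursive sketch of this design (function name and test values are illustrative):

def gcd(a, b):
    # base case: gcd(a, 0) = a
    if b == 0:
        return a
    # recursive case: gcd(a, b) = gcd(b, a mod b)
    return gcd(b, a % b)

print(gcd(48, 18))   # prints 6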
Fibonacci Design: To start a Fibonacci series, we need to know the first two numbers, 0 and 1; every subsequent number is the sum of the previous two.
1. A function is said to be recursive if it calls itself again and again within its body, whereas iterative functions are loop-based imperative functions.
3. Recursion uses more memory than iteration, as its concept is based on stacks.
6. Iteration terminates when the loop-continuation condition fails, whereas recursion terminates when a base case is recognized.
7. While using recursion, multiple activation records are created on the stack for each call, whereas in iteration everything is done in one activation record.
8. Infinite recursion can crash the system, whereas infinite looping uses CPU cycles repeatedly.
Types of Recursion:
Recursion is of two types depending on whether a function calls itself from within itself or whether two functions call one another mutually. The former is called direct recursion and the latter is called indirect recursion. Thus there are two types of recursion:
• Direct Recursion
• Indirect Recursion
Recursion may further be categorized as:
• Linear Recursion
• Binary Recursion
• Multiple Recursion
Linear Recursion:
It is the most common type of recursion, in which a function calls itself repeatedly until the base condition (termination case) is reached. Once the base case is reached, the result is returned to the caller function. If a recursive function makes only one recursive call per invocation, it is called linear recursion.
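A minimal sketch of a linearly recursive function, the classic factorial:

def factorial(n):
    # base case: 0! = 1
    if n == 0:
        return 1
    # exactly one recursive call per invocation: linear recursion
    return n * factorial(n - 1)

print(factorial(5))   # prints 120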
Binary Recursion:
Some recursive functions don’t just have one call to themselves; they have two (or more).
Functions with two recursive calls are referred to as binary recursive functions.
Example 1: The Fibonacci function fib provides a classic example of binary recursion. The Fibonacci numbers can be defined by the rule:
fib(n) = 0                      if n = 0
       = 1                      if n = 1
       = fib(n-1) + fib(n-2)    otherwise
For example:
Fib(0) = 0
Fib(1) = 1
# Program to display the Fibonacci sequence up to the n-th term
nterms = 10
n1 = 0
n2 = 1
count = 0
if nterms <= 0:
    print("Please enter a positive integer")
elif nterms == 1:
    print(n1)
else:
    while count < nterms:
        print(n1, end=' , ')
        nth = n1 + n2
        # update values
        n1 = n2
        n2 = nth
        count += 1
Tail Recursion:
Tail recursion is a form of linear recursion. In tail recursion, the recursive call is the last thing the function does. Often, the value of the recursive call is returned. As such, tail recursive functions can often be easily implemented in an iterative manner; by taking out the recursive call and replacing it with a loop, the same effect can generally be achieved. In fact, a good compiler can recognize tail recursion and convert it to iteration in order to optimize the performance of the code.
A good example of a tail recursive function is a function to compute the GCD, or Greatest Common Divisor, of two numbers:

def gcd(m, n):
    # the recursive call is the last operation performed: tail recursion
    if n == 0:
        return m
    return gcd(n, m % n)
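Because the recursive call is in tail position, it can be replaced with a loop; a sketch of the equivalent iterative version:

def gcd_iterative(m, n):
    # the tail call becomes a simple re-binding of the variables
    while n != 0:
        m, n = n, m % n
    return m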
Recursive algorithms for Factorial, GCD, Fibonacci Series and Towers of Hanoi:
Factorial(n)
Input: integer n ≥ 0
Output: n!
GCD(m, n)
Time complexity: O(log n)
Fibonacci(n)
Input: integer n ≥ 0
1. if n = 1 or n = 2
2. then Fibonacci(n) = 1
3. else Fibonacci(n) = Fibonacci(n-1) + Fibonacci(n-2)
Towers of Hanoi:
Input: The aim of the Towers of Hanoi problem is to move the initial n different-sized disks from needle A to needle C using a temporary needle B. The rules are that a larger disk may never be placed on top of a smaller disk on any needle, and only the top disk may be moved at a time, from any needle to any needle.
Output: the sequence of moves.
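A minimal recursive sketch of the solution (function name and needle labels are illustrative, not from the original notes):

def tower_of_hanoi(n, source, destination, auxiliary):
    # base case: a single disk moves directly
    if n == 1:
        print("Move disk 1 from", source, "to", destination)
        return
    # move n-1 disks out of the way, move the largest, then move them back on top
    tower_of_hanoi(n - 1, source, auxiliary, destination)
    print("Move disk", n, "from", source, "to", destination)
    tower_of_hanoi(n - 1, auxiliary, destination, source)

n = 4
tower_of_hanoi(n, 'A', 'C', 'B')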
Linear Search: Searching is the process of finding a particular data item in a collection of data items based on specific criteria. Every day we perform web searches to locate data items contained in various pages. A search is typically performed using a search key, and it answers either True or False based on whether the item is present in the list or not. The linear search algorithm is the simplest sequential search technique: it iterates over the sequence, checking one item at a time, until the desired item is found or all items have been examined. In Python, the in operator is used to find a desired item in a sequence of items; it makes the searching task simpler and hides the inner working details.
Consider an unsorted one-dimensional array of integers in which we need to check whether 31 is present. The search begins with the first element. As the first element does not contain the desired value, the next element is compared to 31, and this process continues until the desired element is found, say in the sixth position. Similarly, if we want to search for 8 in the same array, the search begins in the same manner, starting with the first element. In linear search, we cannot determine that a given search value is absent from the sequence until the entire array has been traversed.
Source Code:

def linear_search(obj, item):
    # scan the sequence one element at a time
    for i in range(len(obj)):
        if obj[i] == item:
            return i
    return -1

arr = [1, 2, 3, 4, 5, 6, 7, 8]
x = 4
result = linear_search(arr, x)
if result == -1:
    print("Item not found")
else:
    print("Item found at index", result)
Any algorithm is analyzed based on the unit of computation it performs. For linear search, we count the number of comparisons performed; in the best case the first comparison finds the item, while in the worst case all n elements are examined, so the worst-case time complexity is O(n).
Binary Search: In the binary search algorithm, the target key is looked for in a sorted sequence, starting with the middle item of the sequence.
a. If the middle item is the target value, then the search item is found and it returns True.
b. If the target item is less than the middle item, then search for the target value in the first half of the list.
c. If the target item is greater than the middle item, then search for the target value in the second half of the list.
Because the list is ordered, binary search can eliminate half of the remaining values in each iteration. Consider an example: suppose we want to search for 10 in a sorted array of elements. We first determine the middle element of the array. As the middle item contains 18, which is greater than the target value 10, we can discard the second half of the list and repeat the process on the first half of the array. This process is repeated until the desired target item is located in the list. If the item is found it returns True, otherwise False.
Source Code:

array = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def binary_search(searchfor, array):
    lowerbound = 0
    upperbound = len(array) - 1
    found = False
    while lowerbound <= upperbound and not found:
        midpoint = (lowerbound + upperbound) // 2
        if array[midpoint] == searchfor:
            found = True
        elif array[midpoint] < searchfor:
            lowerbound = midpoint + 1
        else:
            upperbound = midpoint - 1
    return found

searchfor = 6   # sample value to search for
if binary_search(searchfor, array):
    print("Item found")
else:
    print("Item not found")
In binary search, each comparison eliminates about half of the items from the list. Consider a list with n items: about n/2 items are eliminated after the first comparison, and after the second comparison n/4 items remain. If this process is repeated, after i comparisons there are n/2^i items left, and just one item remains when n/2^i = 1. Solving for i gives i = log₂ n. The maximum number of comparisons is therefore logarithmic, and the time complexity of binary search is O(log n).
Fibonacci Search: Fibonacci search is a comparison-based technique for sorted arrays that, like binary search, repeatedly narrows the search range, but uses Fibonacci numbers instead of halving to choose the probe positions.
Source Code:

# fibonacci search: returns the index of x in arr if present, else -1
def fibMonaccianSearch(arr, x, n):
    # initialize fibonacci numbers: fibMMm2 = F(m-2), fibMMm1 = F(m-1)
    fibMMm2 = 0
    fibMMm1 = 1
    fibM = fibMMm2 + fibMMm1
    # fibM becomes the smallest fibonacci number >= n
    while fibM < n:
        fibMMm2 = fibMMm1
        fibMMm1 = fibM
        fibM = fibMMm2 + fibMMm1
    # offset marks the eliminated range from the front
    offset = -1
    while fibM > 1:
        # check that offset+fibMMm2 is a valid location
        i = min(offset + fibMMm2, n - 1)
        if arr[i] < x:
            # cut the subarray from offset to i:
            # move the three fibonacci variables one step down
            fibM = fibMMm1
            fibMMm1 = fibMMm2
            fibMMm2 = fibM - fibMMm1
            offset = i
        elif arr[i] > x:
            # cut the subarray after i+1:
            # move the three fibonacci variables two steps down
            fibM = fibMMm2
            fibMMm1 = fibMMm1 - fibMMm2
            fibMMm2 = fibM - fibMMm1
        else:
            return i
    # compare the last remaining element with x
    if fibMMm1 and arr[offset + 1] == x:
        return offset + 1
    # element not found, return -1
    return -1

# Driver Code
arr = [10, 22, 35, 40, 45, 50, 80, 82, 85, 90, 100]
n = len(arr)
x = 80
print("Found at index:", fibMonaccianSearch(arr, x, n))
Sorting in general refers to various methods of arranging or ordering things based on criteria (numerical, chronological, alphabetical, hierarchical, etc.). There are many approaches to sorting data, and each has its own merits and demerits.
Bubble Sort:
This sorting technique is also known as exchange sort. It arranges values by iterating over the list several times; in each iteration the larger value bubbles up to the end of the list. The algorithm uses multiple passes. In each pass the first and second data items are compared, and if the first data item is bigger than the second, the two items are swapped. Next, the items in the second and third positions are compared, and if the first one is larger than the second, they are swapped; otherwise their order is unchanged. This process continues for each successive pair of data items until all items are sorted.
Source Code:

def bubbleSort(arr):
    n = len(arr)
    for i in range(n):
        # the last i elements are already in place
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]

arr = [64, 34, 25, 12, 22, 11, 90]   # sample data
bubbleSort(arr)
for i in range(len(arr)):
    print(arr[i], end=' ')
Step-by-step example:
Let us take the array of numbers "5 1 4 2 8" and sort it from lowest to greatest using bubble sort. In each step, the pair being compared is examined left to right. Three passes will be required.
First Pass:
( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), swap since 5 > 1
( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), swap since 5 > 4
( 1 4 5 2 8 ) → ( 1 4 2 5 8 ), swap since 5 > 2
( 1 4 2 5 8 ) → ( 1 4 2 5 8 ), now, since these elements are already in order (8 > 5), the algorithm does not swap them.
Second Pass:
( 1 4 2 5 8 ) → ( 1 4 2 5 8 )
( 1 4 2 5 8 ) → ( 1 2 4 5 8 ), swap since 4 > 2
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Now the array is already sorted, but our algorithm does not know whether it is completed. The algorithm needs one whole pass without any swap to know it is sorted.
Third Pass:
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Time Complexity:
For the basic version of bubble sort, the number of comparisons is independent of the initial arrangement of the data items. If the array contains n data items, the outer loop executes n-1 times, as the algorithm requires n-1 passes. In the first pass the inner loop executes n-1 times; in the second pass, n-2 times; in the third pass, n-3 times, and so on. The total number of iterations is (n-1) + (n-2) + ... + 1 = n(n-1)/2, resulting in a run time of O(n²).
Selection Sort:
Selection sort is one of the simplest sorting algorithms. It sorts the elements of an array by finding the minimum element of the unsorted part in each pass and placing it at the beginning. This technique improves on bubble sort by making only one exchange per pass. It maintains two subarrays: one that is already sorted and one that is unsorted. In each iteration, the minimum element (for ascending order) is picked from the unsorted subarray and moved to the sorted subarray.
Source Code:

# selection sort
import sys
A = [64, 25, 12, 22, 11]   # sample data
for i in range(len(A)):
    # find the minimum element in the remaining unsorted array
    min_idx = i
    for j in range(i + 1, len(A)):
        if A[j] < A[min_idx]:
            min_idx = j
    # swap the found minimum element with the first unsorted element
    A[i], A[min_idx] = A[min_idx], A[i]

print("Sorted array")
for i in range(len(A)):
    print("%d" % A[i])
Output:
11 12 22 25 64
Step-by-step example:
64 25 12 22 11
11 25 12 22 64
11 12 25 22 64
11 12 22 25 64
11 12 22 25 64
Time Complexity:
Selection sort is not difficult to analyze compared to other sorting algorithms, since none of the loops depend on the data in the array. Selecting the lowest element requires scanning all n elements (this takes n − 1 comparisons) and then swapping it into the first position. Finding the next lowest element requires scanning the remaining n − 1 elements, and so on, for (n − 1) + (n − 2) + ... + 2 + 1 = n(n − 1)/2 ∈ O(n²) comparisons. Each pass requires at most one swap, for n − 1 swaps in total (the final element is already in place).
Insertion Sort:
Insertion sort considers the elements one at a time, inserting each into its suitable place among those already considered (keeping them sorted). It is an example of an incremental algorithm: it builds the sorted sequence one element at a time. This is the natural sorting technique in playing card games. Insertion sort provides several advantages:
• Simple implementation
• Adaptive (i.e., efficient) for data sets that are already substantially sorted: the time complexity is O(n + d), where d is the number of inversions
• More efficient in practice than most other simple quadratic (i.e., O(n²)) algorithms such as selection sort or bubble sort; the best case (nearly sorted input) is O(n)
• Stable; i.e., does not change the relative order of elements with equal keys
• In-place; i.e., only requires a constant amount O(1) of additional memory space
Source Code:
def insertionSort(arr):
key = arr[i]
j = i-1
arr[j+1] = arr[j]
j -= 1
arr[j+1] = key
insertionSort(arr)
for i in range(len(arr)):
Step-by-step example:
1. The second element of the array is compared with the elements that appear before it (only the first element in this case). If the second element is smaller than the first element, it is inserted in the position of the first element. After the first step, the first two elements of the array will be sorted.
2. The third element of the array is compared with the elements that appear before it (the first and second elements). If the third element is smaller than the first element, it is inserted in the position of the first element. If the third element is larger than the first element but smaller than the second element, it is inserted in the position of the second element. If the third element is larger than both elements, it is kept in its position. After the second step, the first three elements of the array will be sorted.
3. Similarly, the fourth element of the array is compared with the elements that appear before it (the first, second and third elements), the same procedure is applied, and the element is inserted in the proper position. After the third step, the first four elements of the array will be sorted. If there are n elements to be sorted, this procedure is repeated n-1 times to obtain the sorted array.
Time Complexity:
In the best case (already sorted input) insertion sort makes n − 1 comparisons and runs in O(n) time; in the worst and average cases it runs in O(n²) time.
Quick Sort:
Quick sort is a divide and conquer algorithm. It first divides a large list into two smaller sub-lists, the low elements and the high elements, and then recursively sorts the sub-lists. The steps are:
1. Pick an element, called a pivot, from the list.
2. Reorder the list so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
3. Recursively apply the above steps to the sub-list of elements with smaller values and separately to the sub-list of elements with greater values.
The base case of the recursion is lists of size zero or one, which never need to be sorted.
Advantages:
• Does not need additional memory (the sorting takes place in the array - this is called
in-place processing).
Source Code:

# this function takes the last element as pivot, places it at its correct
# position, and puts smaller elements to the left of the pivot
def partition(arr, low, high):
    i = low - 1          # index of the smaller element
    pivot = arr[high]
    for j in range(low, high):
        # if the current element is smaller than or equal to pivot
        if arr[j] <= pivot:
            i = i + 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return (i + 1)

def quickSort(arr, low, high):
    if low < high:
        # pi is the partitioning index; arr[pi] is now at the right place
        pi = partition(arr, low, high)
        quickSort(arr, low, pi - 1)
        quickSort(arr, pi + 1, high)

arr = [10, 7, 8, 9, 1, 5]
n = len(arr)
quickSort(arr, 0, n - 1)
for i in range(n):
    print(arr[i], end=' ')
Time Complexity: Quick sort runs in O(n log n) time on average; the worst case (for example, an already sorted input with a poor pivot choice) is O(n²).
Merge Sort:
Merge() function: it merges two sorted halves into a single sorted list. (In index-based variants it takes the array together with the left-most, middle and right-most indices of the range to be merged as arguments.)
Source Code:

def merge(left, right):
    result = []
    i, j = 0, 0
    while True:
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
        # once one side is exhausted, copy the rest of the other side
        if i == len(left) or j == len(right):
            result.extend(left[i:] or right[j:])
            break
    return result

def mergesort(list):
    if len(list) < 2:
        return list
    middle = int(len(list) / 2)
    left = mergesort(list[:middle])
    right = mergesort(list[middle:])
    return merge(left, right)

seq = [3, 4, 2, 1, 7, 5, 8, 9, 0, 6]   # sample data
print(seq)
print("\n")
print(mergesort(seq))
Time Complexity: Merge sort runs in O(n log n) time in the best, average and worst cases; it requires O(n) auxiliary space.
Chapter 2
LINEAR DATA STRUCTURES
Course Outcomes
After successful completion of this module, students should be able to:
CO 7 Implement linked lists, stacks and queues in Python for problem solving. (Analyze)
CO 8 Explain the use of basic data structures such as arrays, stacks, queues and linked lists in program design. (Understand)
A stack is a container of objects that are inserted and removed according to the last-in first-out (LIFO) principle. In pushdown stacks only two operations are allowed: push an item onto the stack, and pop an item off the stack. A stack is a limited-access data structure: elements can be added to and removed from the stack only at the top. Push adds an item to the top of the stack; pop removes the item from the top. A helpful analogy is to think of a stack of books: you can remove only the top book, and you can add a new book only on the top. A stack may be implemented with a bounded capacity. If the stack is full and does not contain enough space to accept an entity to be pushed, the stack is considered to be in an overflow state. The pop operation removes an item from the top of the stack. A pop either reveals previously concealed items or results in an empty stack; if the stack is empty, it goes into an underflow state, which means no items are present in the stack to be removed.
A stack is an abstract data type (ADT) that works on the principle of Last In First Out (LIFO): the last element added to the stack is the first element to be deleted. Insertion and deletion take place at one end, called the TOP. A stack looks like a tube closed at one end.
• When we push, elements are added at the top of the stack.
• In the same way, when we pop, the element at the top of the stack is deleted.
Operations on a stack:
1. push
2. pop
While performing push and pop operations, the following tests must be conducted on the stack.
Push:
The push operation is used to add new elements to the stack. At the time of addition, first check whether the stack is full. If the stack is full, it generates an error message, "stack overflow".
Pop:
The pop operation is used to delete elements from the stack. At the time of deletion, first check whether the stack is empty. If the stack is empty, it generates an error message, "stack underflow".
Let us consider a stack with a capacity of 6 elements; this is called the size of the stack. The number of elements added should not exceed the maximum size of the stack. If we attempt to add a new element beyond the maximum size, we encounter a stack overflow condition. Similarly, you cannot remove elements beyond the base of the stack; if we try, we reach a stack underflow condition.
When an element is taken off the stack, the operation is performed by pop().
STACK: A stack is a linear data structure which works under the principle of last in, first out. Basic operations: push, pop, display.
1. PUSH: if (top == MAX), display "Stack overflow"; else read the data, set stack[top] = data and increment the top value (top++).
2. POP: if (top == 0), display "Stack underflow"; else print the element at the top of the stack and decrement the top value (top--).
3. DISPLAY: if (top == 0), display "Stack is empty"; else print the elements in the stack from stack[0] to stack[top].
# maxsize is used to return a "minus infinite" sentinel when the stack is empty
from sys import maxsize

def createStack():
    stack = []
    return stack

def isEmpty(stack):
    return len(stack) == 0

def push(stack, item):
    stack.append(item)

def pop(stack):
    if (isEmpty(stack)):
        print("stack empty")
        return str(-maxsize - 1)   # sentinel standing in for minus infinity
    return stack.pop()

stack = createStack()
push(stack, str(10))
push(stack, str(20))
push(stack, str(30))
print(pop(stack), "popped from stack")
We can also represent a stack as a linked list. In a stack, push and pop operations are performed at one end, called the top; we can perform similar operations at one end of a linked list using a top pointer.
class StackNode:
    def __init__(self, data):
        self.data = data
        self.next = None

class Stack:
    def __init__(self):
        self.root = None

    def isEmpty(self):
        return True if self.root is None else False

    def push(self, data):
        newNode = StackNode(data)
        newNode.next = self.root
        self.root = newNode

    def pop(self):
        if (self.isEmpty()):
            return float("-inf")
        temp = self.root
        self.root = self.root.next
        popped = temp.data
        return popped

    def peek(self):
        if self.isEmpty():
            return float("-inf")
        return self.root.data

stack = Stack()
stack.push(10)
stack.push(20)
stack.push(30)
print(stack.pop(), "popped from stack")
Stack Applications:
1. Stacks are used by compilers to check for balanced parentheses, brackets and braces (a sketch of this check follows the list).
4. In recursion, all intermediate arguments and return values are stored on the processor's stack.
5. During a function call the return address and arguments are pushed onto a stack, and on return they are popped off.
6. Depth-first search uses a stack data structure to find an element in a graph.
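As a sketch of application 1, a minimal parenthesis-balance check using a Python list as the stack (function name and test strings are illustrative):

def is_balanced(expr):
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in expr:
        if ch in '([{':
            stack.append(ch)            # push every opening symbol
        elif ch in pairs:
            # a closing symbol must match the most recent opening symbol
            if not stack or stack.pop() != pairs[ch]:
                return False
    return len(stack) == 0              # leftover openers mean imbalance

print(is_balanced("a+b*(c-d)"))    # True
print(is_balanced("(a+b))("))      # False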
Converting an infix expression to postfix: scan the infix expression from left to right, one symbol at a time.
a) If the scanned symbol is a left parenthesis, push it onto the stack.
b) If the scanned symbol is an operand, place it directly in the postfix expression (output).
c) If the scanned symbol is a right parenthesis, go on popping items from the stack and placing them in the postfix expression until the matching left parenthesis is found.
d) If the scanned symbol is an operator, go on removing operators from the stack and placing them in the postfix expression as long as the precedence of the operator on the top of the stack is greater than (or equal to) the precedence of the scanned operator; then push the scanned operator onto the stack.
Source Code:

import string

class Conversion:
    def __init__(self, capacity):
        self.top = -1
        self.capacity = capacity
        self.array = []
        # precedence setting
        self.output = []
        self.precedence = {'+': 1, '-': 1, '*': 2, '/': 2, '^': 3}

    def isEmpty(self):
        return True if self.top == -1 else False

    def peek(self):
        return self.array[-1]

    def pop(self):
        if not self.isEmpty():
            self.top -= 1
            return self.array.pop()
        else:
            return "$"

    def push(self, op):
        self.top += 1
        self.array.append(op)

    # check whether the given character is an operand
    def isOperand(self, ch):
        return ch.isalpha()

    # check whether the precedence of the scanned operator is not greater
    # than the precedence of the operator on top of the stack
    def notGreater(self, i):
        try:
            a = self.precedence[i]
            b = self.precedence[self.peek()]
            return True if a <= b else False
        except KeyError:
            return False

    # the main function that converts a given infix expression
    # to postfix expression
    def infixToPostfix(self, exp):
        for i in exp:
            # if the character is an operand, add it to output
            if self.isOperand(i):
                self.output.append(i)
            elif i == '(':
                self.push(i)
            elif i == ')':
                while not self.isEmpty() and self.peek() != '(':
                    a = self.pop()
                    self.output.append(a)
                if not self.isEmpty() and self.peek() != '(':
                    return -1
                else:
                    self.pop()
            # an operator is encountered
            else:
                while not self.isEmpty() and self.notGreater(i):
                    self.output.append(self.pop())
                self.push(i)
        # pop all remaining operators from the stack
        while not self.isEmpty():
            self.output.append(self.pop())
        result = "".join(self.output)
        print(result)

exp = "a+b*(c^d-e)^(f+g*h)-i"
obj = Conversion(len(exp))
obj.infixToPostfix(exp)
Procedure:
The postfix expression is evaluated easily by the use of a stack. When a number is seen,
it is pushed onto the stack; when an operator is seen, the operator is applied to the two
numbers that are popped from the stack and the result is pushed onto the stack.
Source Code:

class Evaluate:
    def __init__(self, capacity):
        self.top = -1
        self.capacity = capacity
        self.array = []

    def isEmpty(self):
        return True if self.top == -1 else False

    def peek(self):
        return self.array[-1]

    def pop(self):
        if not self.isEmpty():
            self.top -= 1
            return self.array.pop()
        else:
            return "$"

    def push(self, op):
        self.top += 1
        self.array.append(op)

    # evaluate a given postfix expression
    def evaluatePostfix(self, exp):
        for i in exp:
            # operands are pushed; operators pop two values and push the result
            if i.isdigit():
                self.push(i)
            else:
                val1 = self.pop()
                val2 = self.pop()
                self.push(str(eval(val2 + i + val1)))
        return int(self.pop())

exp = "231*+9-"
obj = Evaluate(len(exp))
print("postfix evaluation: %d" % (obj.evaluatePostfix(exp)))
A queue is a data structure that is best described as "first in, first out" (FIFO). A queue is another special kind of list, where items are inserted at one end, called the rear, and deleted at the other end, called the front. A real-world example of a queue is people waiting in line at a bank: as each person enters the bank, he or she is "enqueued" at the back of the line, and when a teller becomes available, the person at the front of the line is "dequeued".
In an array implementation, repeatedly deleting from the front forces a shift of the remaining elements. This difficulty can be overcome if we treat the queue position with index 0 as a position that comes after the position with the last index, i.e., if we treat the queue as a circular queue.
In order to create a queue we require a one-dimensional array Q(1:n) and two variables, front and rear. The conventions we adopt are that front is always one less than the actual front of the queue and rear always points to the last element in the queue. Thus front = rear if and only if there are no elements in the queue; the initial condition is front = rear = 0.
The various queue operations to perform creation, deletion and display of the elements in a queue are as follows:
Source Code:

front = 0
rear = 0
mymax = 3

def createQueue():
    queue = []
    return queue

def isEmpty(queue):
    return len(queue) == 0

def enqueue(queue, item):
    queue.append(item)

def dequeue(queue):
    if (isEmpty(queue)):
        return "Queue is empty"
    item = queue[0]
    del queue[0]
    return item

queue = createQueue()
while True:
    print("1 Enqueue")
    print("2 Dequeue")
    print("3 Display")
    print("4 Quit")
    ch = int(input("Enter choice "))
    if (ch == 1):
        if rear < mymax:
            item = input("enter item ")
            enqueue(queue, item)
            rear = rear + 1
        else:
            print("Queue is full")
    elif (ch == 2):
        print(dequeue(queue))
    elif (ch == 3):
        print(queue)
    else:
        break
Applications of Queues:
Queues are used wherever requests must be serviced in arrival order, for example CPU task scheduling, printer spooling and the handling of interrupts.
There are two problems associated with a linear queue:
• Time consuming: linear time is spent in shifting the elements to the beginning of the queue.
• Signaling queue full: the queue may report full even though vacant positions exist.
The round-robin (RR) scheduling algorithm is designed especially for time-sharing systems. It is similar to FCFS scheduling, but preemption is added to switch between processes. A small unit of time, called a time quantum or time slice, is defined; a time quantum is generally from 10 to 100 milliseconds. The ready queue is treated as a circular queue. To implement RR scheduling:
scheduling
• The CPU scheduler picks the first process from the ready queue, sets a timer to interrupt after 1 time quantum, and dispatches the process.
• The process may have a CPU burst of less than 1 time quantum.
o In this case, the process itself will release the CPU voluntarily.
o The scheduler will then proceed to the next process in the ready queue.
• Otherwise, if the CPU burst of the currently running process is longer than 1 time quantum,
o the timer will go off and cause an interrupt to the OS,
o a context switch will be executed, and the process will be put at the tail of the ready queue,
o the CPU scheduler will then select the next process in the ready queue.
The average waiting time under the RR policy is often long. Consider a set of processes that arrive at time 0, with the length of each CPU burst given in milliseconds and a time quantum of 4 milliseconds: each process in turn receives the CPU for at most 4 milliseconds and, if unfinished, rejoins the tail of the ready queue.
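A minimal simulation of this policy using Python's collections.deque as the circular ready queue (process names and burst times are illustrative):

from collections import deque

def round_robin(bursts, quantum):
    # ready queue of (process name, remaining burst time) pairs
    ready = deque(bursts.items())
    time = 0
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum, remaining)
        time += run
        if remaining > run:
            ready.append((name, remaining - run))   # re-queue the unfinished process
        else:
            print(name, "finishes at time", time)

round_robin({'P1': 24, 'P2': 3, 'P3': 3}, quantum=4)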
DEQUE (Double-Ended Queue):
A deque is a generalized queue in which insertions and deletions are allowed at both ends. An output-restricted deque allows deletions from only one end, and an input-restricted deque allows insertions at only one end. A deque can be constructed in two ways:
1) Using an array
2) Using a double linked list
Operations in DEQUE:
Insert at front, insert at rear, delete from front and delete from rear (see the sketch below).
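A minimal sketch of these four operations using Python's built-in collections.deque (values are illustrative):

from collections import deque

d = deque()
d.appendleft(10)     # insert at front
d.append(20)         # insert at rear
d.append(30)
print(d)             # deque([10, 20, 30])
print(d.popleft())   # delete from front -> 10
print(d.pop())       # delete from rear  -> 30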
Applications of DEQUE:
1. The A-Steal algorithm implements task scheduling for several processors (multiprocessor scheduling).
3. When one of the processors completes execution of its own threads, it can steal a thread from another processor.
4. It gets the last element from the deque of another processor and executes it.
Circular Queue:
A circular queue is a linear data structure that follows the FIFO principle, in which the last node is connected back to the first node to make a circle.
• Elements are added at the rear end and deleted at the front end of the queue.
• Initially, both the front and the rear pointers point to the beginning of the array.
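A minimal array-based circular queue sketch (the capacity, names and messages are illustrative); indices wrap around using the modulo operator:

class CircularQueue:
    def __init__(self, capacity):
        self.queue = [None] * capacity
        self.capacity = capacity
        self.front = 0      # index of the first element
        self.count = 0      # number of stored elements

    def enqueue(self, item):
        if self.count == self.capacity:
            print("Queue is full")
            return
        rear = (self.front + self.count) % self.capacity   # wrap around
        self.queue[rear] = item
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            print("Queue is empty")
            return None
        item = self.queue[self.front]
        self.front = (self.front + 1) % self.capacity      # wrap around
        self.count -= 1
        return item

cq = CircularQueue(3)
cq.enqueue(10); cq.enqueue(20); cq.enqueue(30)
print(cq.dequeue())   # 10
cq.enqueue(40)        # reuses the slot freed at index 0
print(cq.dequeue())   # 20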
Chapter 3
LINKED LISTS
Course Outcomes
After successful completion of this module, students should be able to:
CO 7 Implement linked lists, stacks and queues in Python for problem solving. (Understand)
CO 8 Explain the use of basic data structures such as arrays, stacks, queues and linked lists in program design. (Apply)
Linked lists and arrays are similar in that they both store collections of data. The array is the most common data structure used to store collections of elements. Arrays are convenient to declare and provide easy syntax for accessing any element by its index number. Once the array is set up, access to any element is convenient and fast. However, arrays have drawbacks:
• The size of the array is fixed, and most often this size is specified at compile time. This forces programmers to allocate arrays that seem "large enough" rather than exactly the size required.
• Inserting new elements at the front is potentially expensive, because existing elements need to be shifted over to make room.
• Deleting an element from an array and reclaiming its space is not possible, since the array size stays fixed.
Linked lists have their own strengths and weaknesses, but they happen to be strong where arrays are weak.
• Generally, an array allocates the memory for all its elements in one block, whereas linked lists use an entirely different strategy: they allocate memory for each element separately and only when necessary.
Linked lists have many advantages. Some of the most important are:
1. Linked lists are dynamic data structures, i.e., they can grow or shrink during the execution of a program.
2. Linked lists use memory efficiently. Memory is not preallocated: it is allocated whenever it is required and de-allocated (released) when it is no longer needed.
3. Insertions and deletions are easier and more efficient. Linked lists provide flexibility in inserting a data item at a specified position and deleting a data item from a given position.
4. Many complex applications can be carried out easily with linked lists.
The main disadvantage is:
1. A linked list consumes more space, because every node requires an additional pointer to store the address of the next node.
Basically, linked lists can be classified into the following four types:
A single linked list is one in which all nodes are linked together in some sequential manner. Hence, it is also called a linear linked list.
A double linked list is one in which all nodes are linked together by multiple links which
helps in accessing both the successor node (next node) and predecessor node (previous
node) from any arbitrary node within the list. Therefore each node in a double linked
list has two link fields (pointers) to point to the left node (previous) and the right node
(next). This helps to traverse in forward direction and backward direction.
A circular linked list is one, which has no beginning and no end. A single linked list can
be made a circular linked list by simply storing address of the very first node in the link
field of the last node. A circular double linked list is one, which has both the successor
pointer and predecessor pointer in the circular manner.
1. Linked lists are used to represent and manipulate polynomials. Polynomials are expressions containing terms with non-zero coefficients and exponents. For example: P(x) = a₀xⁿ + a₁xⁿ⁻¹ + … + aₙ₋₁x + aₙ
2. Represent very large numbers and operations on them, such as addition, multiplication and division.
3. Linked lists are used to implement stacks, queues, trees and graphs.
4. Linked lists are used to implement the symbol table in compiler construction.
A linked list allocates space for each element separately in its own block of memory, called a "node". The list gets its overall structure by using pointers to connect all its nodes together, like the links in a chain. Each node contains two fields: a "data" field to store the element, and a "next" field, which is a pointer used to link to the next node. In a C implementation, each node is allocated in the heap using malloc(), so the node memory continues to exist until it is explicitly de-allocated using free(); in Python, nodes are ordinary objects reclaimed by the garbage collector. The front of the list is a pointer to the "start" node.
The beginning of the linked list is stored in a ”start” pointer which points to the first
node. The first node contains a pointer to the second node. The second node contains a
pointer to the third node, ... and so on. The last node in the list has its next field set to
NULL to mark the end of the list. Code can access any node in the list by starting at the
start and following the next pointers.
The start pointer is an ordinary local pointer variable, so it is drawn separately on the
left top to show that it is in the stack. The list nodes are drawn on the right to show that
they are allocated in the heap.
The basic operations in a single linked list are: • Creation. • Insertion. • Deletion. •
Traversing.
Creating a singly linked list starts with creating a node. Sufficient memory has to be
allocated for creating a node. The information is stored in the memory.
Insertion of a Node:
One of the most primitive operations that can be done in a singly linked list is the insertion
of a node. Memory is to be allocated for the new node (in a similar way that is done while
creating a list) before reading the data. The new node will contain empty data field and
empty next field. The data field of the new node is then stored with the information read
from the user. The next field of the new node is assigned to NULL. The new node can
then be inserted at three different places namely:
Inserting a node into the single linked list at a specified intermediate position other than
beginning and end.
Figure 3.5: Inserting a node into the single linked list at a specified intermediate position
other than beginning and end.
Deletion of a node:
Another primitive operation on a singly linked list is the deletion of a node. Memory is released when a node is deleted. A node can be deleted from three different places in the list, namely the beginning, the end, or a specified intermediate position.
Traversing a list:
To display the information, you have to traverse (move along) the linked list, node by node from the first node, until the end of the list is reached.
• Display the information from the data field of each node. The function traverse() is used for traversing and displaying the information stored in the list from left to right.
Source Code for creating and inserting into the implementation of a Single Linked List:

class Item:
    def __init__(self, data):
        self._content = data
        self.next = None

class SingleLinkedList:
    def __init__(self):
        self._start = None
        self._count = 0

    def getItemAtIndex(self, pos):
        if self._start == None:
            return None
        i = 0
        cursor = self._start
        while cursor is not None:
            if i == pos:
                return cursor
            i += 1
            cursor = cursor.next
        return None

    # insert data at position pos (appends at the end by default)
    def insert(self, data, pos=None):
        if pos is None:
            pos = self._count
        item = Item(data)
        if self._start == None:
            self._start = item
        elif pos == 0:
            item.next = self._start
            self._start = item
        else:
            cursor = self.getItemAtIndex(pos - 1)
            item.next = cursor.next
            cursor.next = item
        self._count += 1

    def display(self):
        cursor = self._start
        while cursor is not None:
            print(cursor._content, end=' ')
            cursor = cursor.next
        print()

l = SingleLinkedList()
l.insert(10)
l.insert(20)
l.insert(30)
l.insert(40)
l.insert(50, 3)
l.display()
Source Code for creating, inserting and deleting in the implementation of a Single Linked List:

class Node(object):
    def __init__(self, data=None, next_node=None):
        self.data = data
        self.next_node = next_node

    def get_data(self):
        return self.data

    def get_next(self):
        return self.next_node

    def set_next(self, new_next):
        self.next_node = new_next

class LinkedList(object):
    def __init__(self, head=None):
        self.head = head

    def insert(self, data):
        new_node = Node(data)
        new_node.set_next(self.head)
        self.head = new_node

    def size(self):
        current = self.head
        count = 0
        while current:
            count += 1
            current = current.get_next()
        return count

    def search(self, data):
        current = self.head
        found = False
        while current and found is False:
            if current.get_data() == data:
                found = True
            else:
                current = current.get_next()
        if current is None:
            raise ValueError("Data not in list")
        return current

    def delete(self, data):
        current = self.head
        previous = None
        found = False
        while current and found is False:
            if current.get_data() == data:
                found = True
            else:
                previous = current
                current = current.get_next()
        if current is None:
            raise ValueError("Data not in list")
        if previous is None:
            self.head = current.get_next()
        else:
            previous.set_next(current.get_next())
Source Code for testing the implementation of the Single Linked List:

import sys
import os.path
sys.path.append(os.path.join(os.path.abspath(os.pardir), "/home/satya/PycharmProjects/DataStractures"))
import unittest
# the tests below assume the LinkedList class defined above is in scope

class TestLinkedList(unittest.TestCase):

    def setUp(self):
        self.list = LinkedList()

    def tearDown(self):
        self.list = None

    def test_insert(self):
        self.list.insert("David")
        self.assertTrue(self.list.head.get_data() == "David")
        self.assertTrue(self.list.head.get_next() is None)

    def test_insert_two(self):
        self.list.insert("David")
        self.list.insert("Thomas")
        self.assertTrue(self.list.head.get_data() == "Thomas")
        head_next = self.list.head.get_next()
        self.assertTrue(head_next.get_data() == "David")

    def test_nextNode(self):
        self.list.insert("Jacob")
        self.list.insert("Pallymay")
        self.list.insert("Rasmus")
        self.assertTrue(self.list.head.get_data() == "Rasmus")
        head_next = self.list.head.get_next()
        self.assertTrue(head_next.get_data() == "Pallymay")
        last = head_next.get_next()
        self.assertTrue(last.get_data() == "Jacob")

    def test_positive_search(self):
        self.list.insert("Jacob")
        self.list.insert("Pallymay")
        self.list.insert("Rasmus")
        found = self.list.search("Jacob")
        self.assertTrue(found.get_data() == "Jacob")
        found = self.list.search("Pallymay")
        self.assertTrue(found.get_data() == "Pallymay")
        found = self.list.search("Rasmus")
        self.assertTrue(found.get_data() == "Rasmus")

    def test_searchNone(self):
        self.list.insert("Jacob")
        self.list.insert("Pallymay")
        found = self.list.search("Jacob")
        self.assertTrue(found.get_data() == "Jacob")
        with self.assertRaises(ValueError):
            self.list.search("Vincent")

    def test_delete(self):
        self.list.insert("Jacob")
        self.list.insert("Pallymay")
        self.list.insert("Rasmus")
        self.list.delete("Rasmus")
        self.assertTrue(self.list.head.get_data() == "Pallymay")
        self.list.delete("Jacob")
        self.assertTrue(self.list.head.get_next() is None)

    def test_delete_value_not_in_list(self):
        self.list.insert("Jacob")
        self.list.insert("Pallymay")
        self.list.insert("Rasmus")
        with self.assertRaises(ValueError):
            self.list.delete("Sunny")

    def test_delete_empty_list(self):
        with self.assertRaises(ValueError):
            self.list.delete("Sunny")

    # test name reconstructed: deleting from the middle and front relinks correctly
    def test_delete_middle(self):
        self.list.insert("Cid")
        self.list.insert("Pallymay")
        self.list.insert("Rasmus")
        self.list.delete("Pallymay")
        self.list.delete("Cid")
        self.assertTrue(self.list.head.get_data() == "Rasmus")
Source Code for creating, inserting and deleting in the implementation of a Single Linked List (alternative version):

class Node(object):
    def __init__(self, data=None, next=None):
        self.data = data
        self.next = next

class SingleList(object):
    head = None
    tail = None

    def show(self):
        current_node = self.head
        while current_node is not None:
            print(current_node.data, end=' -> ')
            current_node = current_node.next
        print(None)

    def append(self, data):
        node = Node(data, None)
        if self.head is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def remove(self, node_value):
        current_node = self.head
        previous_node = None
        while current_node is not None:
            if current_node.data == node_value:
                if previous_node is not None:
                    previous_node.next = current_node.next
                else:
                    self.head = current_node.next
            previous_node = current_node
            current_node = current_node.next

s = SingleList()
s.append(31)
s.append(2)
s.append(3)
s.append(4)
s.show()
s.remove(31)
s.remove(3)
s.remove(2)
s.show()
A header node is a special dummy node found at the front of the list. The use of a header node is an alternative way to simplify removing the first node in a list. For example, the picture below shows how the list with data 10, 20 and 30 would be represented using a linked list without and with a header node.
Note that if your linked lists do include a header node, there is no need for the special-case code given above for the remove operation; node n can never be the first node in the list, so there is no need to check for that case. Similarly, having a header node can simplify the code that adds a node before a given node n.
Note that if you do decide to use a header node, you must remember to initialize an empty list to contain one (dummy) node, and you must remember not to include the header node in the count of "real" nodes in the list.
A header node is also useful when information other than that found in each node of the list is needed. For example, imagine an application in which the number of items in a list is often calculated. In a standard linked list, the list function to count the number of nodes has to traverse the entire list every time. However, if the current length is maintained in the header node, that information can be obtained very quickly.
3.5 Array-based linked lists:
Another alternative is to allocate the nodes in blocks. In fact, if you know the maximum size of a list ahead of time, you can pre-allocate the nodes in a single array. The result is a hybrid structure, an array-based linked list, in which all the nodes of a null-terminated single linked list are allocated contiguously in an array.
Figure 3.7: Single linked list with header node conceptual structure
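A minimal sketch of an array-based linked list using two parallel Python lists, one for data and one for next indices (all names and values are illustrative; -1 plays the role of the NULL terminator):

data = [10, 20, 30]
next_index = [1, 2, -1]   # node i links to node next_index[i]; -1 terminates
head = 0

# traverse exactly as with pointer-based nodes, following next indices
i = head
while i != -1:
    print(data[i], end=' ')
    i = next_index[i]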
A double linked list is a two-way list in which all nodes will have two links. This helps
in accessing both successor node and predecessor node from the given node position. It
provides bi-directional traversing. Each node contains three fields:
• Left link.
• Data.
• Right link.
The left link points to the predecessor node and the right link points to the successor node.
The data field stores the required data.
Many applications require searching forward and backward through the nodes of a list. For example, searching for a name in a telephone directory would need forward and backward scanning through a region of the whole list.
• Creation.
• Insertion.
• Deletion.
• Traversing.
The beginning of the double linked list is stored in a "start" pointer which points to the first node. The first node's left link and the last node's right link are set to NULL.
To display the information, you have to traverse the list, node by node from the first
node, until the end of the list is reached. The function traverse_left_right() is used for
traversing and displaying the information stored in the list from left to right.
To display the information from right to left, you have to traverse the list, node by node
from the first node, until the end of the list is reached. The function traverse_right_left()
is used for traversing and displaying the information stored in the list from right to left.
class Node:
    def __init__(self, data=None):
        self.data = data
        self.previous = None
        self.next = None

class DList:
    def __init__(self):
        self.head = None
        self.size = 0

    # insert a new node at the front of the list (method name is illustrative)
    def add(self, data):
        self.size += 1
        if self.head is None:
            self.head = Node(data)
        else:
            p = Node(data)
            p.next = self.head
            self.head.previous = p
            self.head = p

    # remove the node at the given index
    def remove(self, index):
        if self.head is None:
            raise ValueError("removing from an empty list")
        current = self.head
        if index == 0:
            self.head = self.head.next
            if self.head is not None:
                self.head.previous = None
        else:
            # walk to the node just before the one to delete
            for _ in range(index - 1):
                current = current.next
            p = current.next.next
            if p is None:
                current.next = None
            else:
                current.next = p
                p.previous = current
        self.size -= 1

    def __sizeof__(self):
        return self.size

    def __repr__(self):
        res = '[ '
        current = self.head
        while current is not None:
            res += str(current.data)
            res += ' '
            current = current.next
        res += ']'
        return res
class Node(object):
    def __init__(self, data=None, prev=None, next=None):
        self.data = data
        self.prev = prev
        self.next = next

class DoubleList(object):
    head = None
    tail = None

    def append(self, data):
        new_node = Node(data, None, None)
        if self.head is None:
            self.head = self.tail = new_node
        else:
            new_node.prev = self.tail
            new_node.next = None
            self.tail.next = new_node
            self.tail = new_node

    def remove(self, node_value):
        current_node = self.head
        while current_node is not None:
            if current_node.data == node_value:
                if current_node.prev is not None:
                    current_node.prev.next = current_node.next
                    if current_node.next is not None:
                        current_node.next.prev = current_node.prev
                else:
                    # otherwise we have no prev (it's None): the next node becomes
                    # the head and its prev becomes None
                    self.head = current_node.next
                    if current_node.next is not None:
                        current_node.next.prev = None
            current_node = current_node.next

    def show(self):
        print("Show list data:")
        current_node = self.head
        while current_node is not None:
            print(current_node.prev.data if hasattr(current_node.prev, "data") else None,
                  current_node.data,
                  current_node.next.data if hasattr(current_node.next, "data") else None)
            current_node = current_node.next
        print("*" * 50)

d = DoubleList()
d.append(5)
d.append(6)
d.append(50)
d.append(30)
d.show()
d.remove(50)
d.remove(5)
d.show()
Circular Single Linked List:
It is just a single linked list in which the link field of the last node points back to the address of the first node. A circular linked list has no beginning and no end. It is necessary to establish a special pointer, called the start pointer, always pointing to the first node of the list. Circular linked lists are frequently used instead of ordinary linked lists because many operations are much easier to implement. In a circular linked list no null pointers are used; hence all pointers contain valid addresses.
The basic operations in a circular single linked list are:
• Creation.
• Insertion.
• Deletion.
• Traversing.
from enum import Enum

class NodeConstants(Enum):
    FRONT_NODE = 1

class Node:
    def __init__(self, element=None, next_node=None):
        self.element = element
        self.next_node = next_node

    def __str__(self):
        if self.element:
            return self.element.__str__()
        else:
            return 'Empty Node'

    def __repr__(self):
        return self.__str__()

class CircularLinkedList:
    def __init__(self):
        # a dummy front node keeps the circle non-empty
        self.head = Node(element=NodeConstants.FRONT_NODE)
        self.head.next_node = self.head

    def size(self):
        count = 0
        current = self.head.next_node
        while current != self.head:
            count += 1
            current = current.next_node
        return count

    def insert_front(self, data):
        node = Node(element=data, next_node=self.head.next_node)
        self.head.next_node = node

    def insert_last(self, data):
        current_node = self.head.next_node
        while current_node.next_node != self.head:
            current_node = current_node.next_node
        node = Node(element=data, next_node=self.head)
        current_node.next_node = node

    def insert(self, data, position):
        if position == 0:
            self.insert_front(data)
        elif position == self.size():
            self.insert_last(data)
        elif 0 < position < self.size():
            current_node = self.head.next_node
            current_pos = 0
            while current_pos < position - 1:
                current_pos += 1
                current_node = current_node.next_node
            node = Node(element=data, next_node=current_node.next_node)
            current_node.next_node = node
        else:
            raise IndexError

    def remove_first(self):
        self.head.next_node = self.head.next_node.next_node

    def remove_last(self):
        current_node = self.head.next_node
        while current_node.next_node.next_node != self.head:
            current_node = current_node.next_node
        current_node.next_node = self.head

    def remove(self, position):
        if position == 0:
            self.remove_first()
        elif position == self.size() - 1:
            self.remove_last()
        elif 0 < position < self.size():
            current_node = self.head.next_node
            current_pos = 0
            while current_pos < position - 1:
                current_node = current_node.next_node
                current_pos += 1
            current_node.next_node = current_node.next_node.next_node
        else:
            raise IndexError

    def fetch(self, position):
        if 0 <= position < self.size():
            current_node = self.head.next_node
            current_pos = 0
            while current_pos < position:
                current_node = current_node.next_node
                current_pos += 1
            return current_node.element
        else:
            raise IndexError
import unittest

class TestCircularLinkedList(unittest.TestCase):
    # first name reconstructed to give the five entries the tests index into
    names = ['Bob Belcher',
             'Linda Belcher',
             'Tina Belcher',
             'Gene Belcher',
             'Louise Belcher']

    def test_init(self):
        dll = CircularLinkedList()
        self.assertIsNotNone(dll.head)
        self.assertEqual(dll.size(), 0)

    def test_insert_front(self):
        dll = CircularLinkedList()
        for name in TestCircularLinkedList.names:
            dll.insert_front(name)
        self.assertEqual(dll.fetch(0), TestCircularLinkedList.names[4])
        self.assertEqual(dll.fetch(1), TestCircularLinkedList.names[3])
        self.assertEqual(dll.fetch(2), TestCircularLinkedList.names[2])
        self.assertEqual(dll.fetch(3), TestCircularLinkedList.names[1])
        self.assertEqual(dll.fetch(4), TestCircularLinkedList.names[0])

    def test_insert_last(self):
        dll = CircularLinkedList()
        for name in TestCircularLinkedList.names:
            dll.insert_last(name)
        for i in range(len(TestCircularLinkedList.names)):
            self.assertEqual(dll.fetch(i), TestCircularLinkedList.names[i])

    def test_insert(self):
        dll = CircularLinkedList()
        for name in TestCircularLinkedList.names:
            dll.insert_last(name)
        pos = 2
        dll.insert('Teddy', pos)
        self.assertEqual(dll.fetch(pos), 'Teddy')

    def test_remove_first(self):
        dll = CircularLinkedList()
        for name in TestCircularLinkedList.names:
            dll.insert_last(name)
        for i in range(dll.size(), 0, -1):
            self.assertEqual(dll.size(), i)
            dll.remove_first()

    def test_remove_last(self):
        dll = CircularLinkedList()
        for name in TestCircularLinkedList.names:
            dll.insert_last(name)
        for i in range(dll.size(), 0, -1):
            self.assertEqual(dll.size(), i)
            dll.remove_last()

    def test_remove(self):
        dll = CircularLinkedList()
        for name in TestCircularLinkedList.names:
            dll.insert_last(name)
        dll.remove(1)
        self.assertEqual(dll.size(), len(TestCircularLinkedList.names) - 1)

if __name__ == '__main__':
    unittest.main()
A circular double linked list has both the successor pointer and the predecessor pointer arranged in a circular manner. The objective behind considering a circular double linked list is to simplify the insertion and deletion operations performed on a double linked list. In a circular double linked list, the right link of the right-most node points back to the start node, and the left link of the first node points to the last node.
Figure 3.10: Creating a Circular Double Linked List with ‘n’ number of nodes
• Creation.
• Insertion.
• Deletion.
• Traversing.
The major disadvantage of doubly linked lists (over singly linked lists) is that they require
more space (every node has two pointer fields instead of one). Also, the code to manipulate
doubly linked lists needs to maintain the prev fields as well as the next fields; the more
fields that have to be maintained, the more chance there is for errors.
The major advantage of doubly linked lists is that they make some operations (like the
removal of a given node, or a right-to-left traversal of the list) more efficient.
The major advantage of circular lists (over non-circular lists) is that they eliminate some special-case code for some operations (like deleting the last node). Also, some applications lead naturally to circular list representations. For example, a computer network might best be modeled using a circular list.
Polynomials:
Examples: 5x² + 3x + 1 and 12x³ - 4x.
A linked list structure can represent a polynomial such as 5x⁴ - 8x³ + 2x² + 4x¹ + 9x⁰, with one node per term holding the coefficient and the exponent.
Addition of Polynomials:
To add two polynomials we need to scan them once. If we find terms with the same exponent in the two polynomials, we add the coefficients; otherwise, we copy the term with the larger exponent into the sum and move on. When we reach the end of one of the polynomials, the remaining part of the other is copied into the sum.
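A minimal sketch of this addition over linked term nodes, each holding a coefficient and an exponent (class and function names are illustrative; terms are assumed sorted by decreasing exponent):

class Term:
    def __init__(self, coeff, exp, next=None):
        self.coeff = coeff
        self.exp = exp
        self.next = next

def add_poly(p, q):
    head = tail = Term(0, 0)          # dummy head simplifies appending
    while p and q:
        if p.exp == q.exp:            # same exponent: add the coefficients
            tail.next = Term(p.coeff + q.coeff, p.exp)
            p, q = p.next, q.next
        elif p.exp > q.exp:           # copy the term with the larger exponent
            tail.next = Term(p.coeff, p.exp)
            p = p.next
        else:
            tail.next = Term(q.coeff, q.exp)
            q = q.next
        tail = tail.next
    tail.next = p or q                # copy whatever remains of either polynomial
    return head.next

# (5x^2 + 3x + 1) + (12x^3 - 4x) = 12x^3 + 5x^2 - x + 1
p = Term(5, 2, Term(3, 1, Term(1, 0)))
q = Term(12, 3, Term(-4, 1))
r = add_poly(p, q)
while r:
    print(r.coeff, "x^", r.exp)
    r = r.next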
Chapter 4
NON LINEAR DATA STRUCTURES
Course Outcomes
After successful completion of this module, students should be able to:
CO 7 Implement linked lists, stacks and queues in Python for problem solving. (Understand)
CO 8 Explain the use of basic data structures such as arrays, stacks, queues and linked lists in program design. (Understand)
A tree is a non-empty set, one element of which is designated the root of the tree, while the remaining elements are partitioned into non-empty sets, each of which is a sub-tree of the root.
A tree T is a set of nodes storing elements such that the nodes have a parent-child relationship that satisfies the following:
• If T is not empty, T has a special node, called the root, that has no parent.
• Each node v of T other than the root has a unique parent node w; each node with parent w is a child of w.
Tree nodes have many useful properties. The depth of a node is the length of the path
(or the number of edges) from the root to that node. The height of a node is the longest
path from that node to its leaves. The height of a tree is the height of the root. A leaf
node has no children – its only path is up to its parent.
Binary Tree:
In a binary tree, each node can have at most two children. A binary tree is either empty
or consists of a node called the root together with two binary trees called the left subtree
and the right subtree.
Tree Terminology:
Leaf node: A node with no children is called a leaf (or external node). A node which is not a leaf is called an internal node.
Path: A sequence of nodes n₁, n₂, …, nₖ such that nᵢ is the parent of nᵢ₊₁ for i = 1, 2, …, k - 1. The length of a path is one less than the number of nodes on the path. Thus there is a path of length zero from a node to itself.
Ancestor and Descendant: If there is a path from node A to node B, then A is called an ancestor of B, and B is called a descendant of A.
Level: The level of the node refers to its distance from the root. The root of the tree has
level O, and the level of any other node in the tree is one more than the level of its parent.
Height: The maximum level in a tree determines its height. The height of a node in a tree
is the length of a longest path from the node to a leaf. The term depth is sometimes also
used for the height of the whole tree.
Depth:The depth of a node is the number of nodes along the path from the root to that
node.
Assigning level numbers and Numbering of nodes for a binary tree: The nodes of a binary
tree can be numbered in a natural way, level by level, left to right. The nodes of a
complete binary tree can be numbered so that the root is assigned the number 1, a left
child is assigned twice the number assigned its parent, and a right child is assigned one
more than twice the number assigned its parent.
Since a binary tree can contain at most one node at level 0 (the root), it can contain
at most 2^l nodes at level l.
If every non-leaf node in a binary tree has nonempty left and right subtrees, the tree is
termed a strictly binary tree. Thus the tree of figure 7.2.3(a) is strictly binary. A strictly
binary tree with n leaves always contains 2n - 1 nodes.
A full binary tree of height h has all its leaves at level h. Alternatively, all non-leaf nodes
of a full binary tree have two children, and the leaf nodes have no children. A full binary
tree with height h has 2^(h+1) − 1 nodes. A full binary tree of height h is a strictly binary
tree all of whose leaves are at level h.
A binary tree with n nodes is said to be complete if it contains all the first n nodes of the
above numbering scheme.
A complete binary tree of height h looks like a full binary tree down to level h-1, and the
level h is filled from left to right.
A binary tree is a Perfect Binary Tree if all internal nodes have two children and all
leaves are at the same level.
A Perfect Binary Tree of height h (where height is the number of nodes on the path from root to
leaf) has 2^h − 1 nodes.
An example of a perfect binary tree is the set of ancestors in a family: keep a person at the root, parents as
children, parents of parents as their children, and so on.
A binary tree is balanced if the height of the tree is O(log n), where n is the number of nodes. For
example, AVL trees maintain O(log n) height by making sure that the difference between the
heights of the left and right subtrees is at most 1. Red-Black trees maintain O(log n) height by making
sure that the number of black nodes on every root-to-leaf path is the same and that there are
no adjacent red nodes. Balanced binary search trees perform well because they
provide O(log n) time for search, insert and delete.
A binary tree can be represented in two ways:
1. Array-based (sequential).
2. Pointer-based (linked).
In the array representation, nodes are numbered (indexed) according to a scheme that gives 0 to the root. All
the nodes are then numbered from left to right, level by level, from top to bottom. Empty
nodes are also numbered. Each node with index i is then stored as the
ith element of the array.
In the figure shown below the nodes of binary tree are numbered according to the given
scheme.
The figure shows how a binary tree is represented as an array. The root 3 is the 0th
element, while its left child 5 is the 1st element of the array. Node 6 does not have any
children, so its children, i.e. the 7th and 8th elements of the array, are shown as null values.
In general, if n is the index of a node, then its left child occurs at position (2n + 1)
and its right child at position (2n + 2) of the array. If a node does not
have one of its children, a null value is stored at the corresponding index of the array.
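A small sketch of this index arithmetic in Python follows; the sample values are illustrative, not the values from the figure.

# Array representation: for a node at index n, the left child is at
# 2n + 1 and the right child at 2n + 2; None marks an absent child.
tree = ['A', 'B', 'C', None, 'D', None, None]

def left(n):  return 2 * n + 1
def right(n): return 2 * n + 2

def inorder(n=0):
    # Inorder traversal performed directly on the array
    if n < len(tree) and tree[n] is not None:
        inorder(left(n))
        print(tree[n], end=' ')
        inorder(right(n))

inorder()   # prints B D A C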
The following program builds a binary tree from a parent array, in which parent[i] holds
the index of node i's parent (−1 for the root), and then traverses the tree in inorder.
# Construct a binary tree from a parent array: parent[i] holds the
# index of node i's parent; the root has parent -1.

# A node structure
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

# Create node i (creating its ancestors first if necessary) and
# link it to its parent.
def createNode(parent, i, created, root):
    # If this node is already created, nothing to do
    if created[i] is not None:
        return
    created[i] = Node(i)
    # If i is the root, record it and return
    if parent[i] == -1:
        root[0] = created[i]
        return
    # Make sure the parent exists before linking
    if created[parent[i]] is None:
        createNode(parent, parent[i], created, root)
    p = created[parent[i]]
    # If first child, link on the left
    if p.left is None:
        p.left = created[i]
    # If second child, link on the right
    else:
        p.right = created[i]

# Create the tree and return its root
def createTree(parent):
    n = len(parent)
    created = [None] * n
    root = [None]
    for i in range(n):
        createNode(parent, i, created, root)
    return root[0]

# Inorder traversal
def inorder(root):
    if root is not None:
        inorder(root.left)
        print(root.key, end=' ')
        inorder(root.right)

# Driver code
parent = [-1, 0, 0, 1, 1, 3, 5]
root = createTree(parent)
inorder(root)   # prints 6 5 3 1 4 0 2
Binary trees can be represented by links where each node contains the address of the left
child and the right child. If any node has its left or right child empty then it will have in
its respective link field, a null value. A leaf node has null value in both of its links.
The following program converts a given singly linked list into a complete binary tree
(filling it level by level with the help of a queue) and prints the inorder traversal of the
constructed tree.

class ListNode:
    def __init__(self, data):
        self.data = data
        self.next = None

class BinaryTreeNode:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

class Conversion:
    def __init__(self):
        self.head = None        # head of the linked list
        self.root = None        # root of the binary tree

    # Insert a new node at the front of the linked list
    def push(self, new_data):
        new_node = ListNode(new_data)
        new_node.next = self.head
        self.head = new_node

    def convertList2Binary(self):
        q = []                  # queue of tree nodes waiting for children
        # Base Case: an empty list gives an empty tree
        if self.head is None:
            self.root = None
            return
        # The first list node becomes the root
        self.root = BinaryTreeNode(self.head.data)
        q.append(self.root)
        self.head = self.head.next
        while self.head:
            # The node at the front of the queue is the parent of
            # the next one or two list nodes
            parent = q.pop(0)
            leftChild = BinaryTreeNode(self.head.data)
            q.append(leftChild)
            self.head = self.head.next
            rightChild = None
            if self.head:
                rightChild = BinaryTreeNode(self.head.data)
                q.append(rightChild)
                self.head = self.head.next
            parent.left = leftChild
            parent.right = rightChild

    def inorderTraversal(self, root):
        if root:
            self.inorderTraversal(root.left)
            print(root.data, end=' ')
            self.inorderTraversal(root.right)

# Driver code
conv = Conversion()
conv.push(36)
conv.push(30)
conv.push(25)
conv.push(15)
conv.push(12)
conv.push(10)
conv.convertList2Binary()
conv.inorderTraversal(conv.root)   # prints 25 12 30 10 36 15
Traversal of a binary tree means visiting each node in the tree exactly once. Tree
traversal is used in all applications of trees.
In a linear list, nodes are visited from first to last; but a tree is a non-linear structure, so we
need definite rules. There are a number of ways to traverse a tree. All of them differ only in
the order in which they visit the nodes:
• Inorder Traversal
• Preorder Traversal
• Postorder Traversal
In all of them we do not need to do anything to traverse an empty tree. All the traversal
methods are based on recursive functions, since a binary tree is itself recursive: every
child of a node in a binary tree is itself a binary tree.
Inorder Traversal:
To traverse a non-empty tree in inorder, the following steps are followed recursively: traverse the left subtree in inorder, visit the root, then traverse the right subtree in inorder.
Preorder Traversal:
Visit the root first, then traverse the left subtree in preorder, then the right subtree in preorder.
Post-order Traversal:
Traverse the left subtree in postorder, then the right subtree in postorder, and visit the root last.
# Binary Tree node
class Node:
    def __init__(self, key):
        self.left = None
        self.right = None
        self.val = key

# Inorder: left subtree, root, right subtree
def printInorder(root):
    if root:
        printInorder(root.left)
        print(root.val, end=' ')
        printInorder(root.right)

# Postorder: left subtree, right subtree, root
def printPostorder(root):
    if root:
        printPostorder(root.left)
        printPostorder(root.right)
        print(root.val, end=' ')

# Preorder: root, left subtree, right subtree
def printPreorder(root):
    if root:
        print(root.val, end=' ')
        printPreorder(root.left)
        printPreorder(root.right)

# Driver code
root = Node(1)
root.left = Node(2)
root.right = Node(3)
root.left.left = Node(4)
root.left.right = Node(5)
printPreorder(root)    # 1 2 4 5 3
print()
printInorder(root)     # 4 2 5 1 3
print()
printPostorder(root)   # 4 5 2 3 1
A Binary Search Tree (BST) is a node-based binary tree data structure which has the following
properties:
• The left sub-tree of a node contains only nodes with keys less than the node's key.
• The right sub-tree of a node contains only nodes with keys greater than the node's key.
• The left and right sub-trees must each also be a binary search tree.
The above properties of Binary Search Tree provide an ordering among keys so that the
operations like search, minimum and maximum can be done fast. If there is no ordering,
then we may have to compare every key to search a given key.
Searching a key
To search a given key in a Binary Search Tree, we first compare it with the root; if the key is
present at the root, we return the root. If the key is greater than the root's key, we recur on the
right sub-tree of the root node; otherwise we recur on the left sub-tree.
def search(root, key):
    # Base cases: empty tree, or key present at the root
    if root is None or root.key == key:
        return root
    if key > root.key:
        return search(root.right, key)
    return search(root.left, key)
Priority Queues:
A priority queue is an extension of a queue with the following properties:
1) Every item has a priority associated with it.
2) An element with high priority is dequeued before an element with low priority.
3) If two elements have the same priority, they are served according to their order in the
queue.
Using an array, the insert() operation can be implemented by adding an item at the end of the array in O(1) time.
We can also use a linked list; the time complexity of all operations with a linked list remains the
same as with an array. The advantage of a linked list is that deleteHighestPriority() can be more
efficient, as we don't have to move items.
Applications of priority queues:
1) CPU Scheduling.
2) Graph algorithms like Dijkstra's shortest path algorithm, Prim's Minimum Spanning Tree, etc.
3) All queue applications where priority is involved.
Applications of Trees:
1) One reason to use trees might be that you want to store information that naturally
forms a hierarchy, for example the file system on a computer.
2) If we organize keys in the form of a tree (with some ordering, e.g. a BST), we can search for
a given key in moderate time (quicker than a linked list and slower than an array). Self-balancing
search trees like AVL and Red-Black trees guarantee an upper bound of O(log n) for search.
3) We can insert/delete keys in moderate time (quicker than arrays and slower than unordered
linked lists). Self-balancing search trees like AVL and Red-Black trees guarantee an upper
bound of O(log n) for insertion/deletion.
4) Like linked lists and unlike arrays, pointer implementations of trees have no upper limit
on the number of nodes, as nodes are linked using pointers.
5) Router algorithms.
Graphs:
A graph is a pair (V, E), where V is a finite set of vertices and E is a set of edges, each edge
being a pair of vertices (u, v). The pair is ordered in the case of a directed graph (di-graph),
because (u, v) is not the same as (v, u). The pair of form (u, v) indicates that there is an edge
from vertex u to vertex v. The edges may contain a weight/value/cost.
Graphs are used to represent many real-life applications: graphs are used to represent
networks, which may include paths in a city, a telephone network or a circuit
network. Graphs are also used in social networks like LinkedIn and Facebook. For example, in
Facebook, each person is represented with a vertex (or node). Each node is a structure and
contains information like person id, name, gender and locale.
Graphs are most commonly represented in the following two ways:
1. Adjacency Matrix
2. Adjacency List
There are other representations also like, Incidence Matrix and Incidence List. The choice
of the graph representation is situation specific. It totally depends on the type of operations
to be performed and ease of use.
Adjacency Matrix:
An adjacency matrix is a 2D array of size V × V, where V is the number of vertices; a cell
adj[i][j] = 1 indicates an edge from vertex i to vertex j.
Pros: The representation is easy to implement and follow. Removing an edge takes O(1)
time. Queries such as whether there is an edge from vertex u to vertex v are efficient and
can be done in O(1).
Cons: It consumes more space, O(V^2). Even if the graph is sparse (contains few
edges), it consumes the same space. Adding a vertex takes O(V^2) time.
Adjacency List:
An array of linked lists is used. Size of the array is equal to number of vertices. Let the
array be array[]. An entry array[i] represents the linked list of vertices adjacent to the ith
vertex. This representation can also be used to represent a weighted graph. The weights
of edges can be stored in nodes of linked lists. Following is adjacency list representation
of the above graph.
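A minimal sketch of this representation in Python, using a dictionary of lists in place of a raw array; the vertex labels and edges here are illustrative. For a weighted graph, each list could hold (vertex, weight) tuples instead.

from collections import defaultdict

adj = defaultdict(list)

def add_edge(u, v):
    # Undirected edge: store each endpoint in the other's list
    adj[u].append(v)
    adj[v].append(u)

add_edge(0, 1)
add_edge(0, 4)
add_edge(1, 2)

for u in sorted(adj):
    print(u, '->', adj[u])
# 0 -> [1, 4]
# 1 -> [0, 2]
# 2 -> [1]
# 4 -> [0]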
Breadth First Traversal (or Search) for a graph is similar to Breadth First Traversal of a
tree. The only catch here is that, unlike trees, graphs may contain cycles, so we may come to
the same node again. To avoid processing a node more than once, we use a boolean visited
array.
For example, in the following graph, we start traversal from vertex 2. When we come to
vertex 0, we look for all adjacent vertices of it. 2 is also an adjacent vertex of 0. If we don't
mark visited vertices, then 2 will be processed again and it will become a non-terminating
process. Breadth First Traversal of the following graph is 2, 0, 3, 1.
BFS(V, E, s)
    for each vertex u in V − {s}
        do color[u] ← WHITE
           d[u] ← infinity
           π[u] ← NIL
    color[s] ← GRAY
    d[s] ← 0
    π[s] ← NIL
    Q ← {}
    ENQUEUE(Q, s)
    while Q is non-empty
        do u ← DEQUEUE(Q)
           for each v adjacent to u
               do if color[v] = WHITE
                      then color[v] ← GRAY
                           d[v] ← d[u] + 1
                           π[v] ← u
                           ENQUEUE(Q, v)
           color[u] ← BLACK
Applications of Breadth First Traversal:
1) Shortest Path and Minimum Spanning Tree for unweighted graphs: In an unweighted graph,
the shortest path is the path with the least number of edges. With Breadth First, we always
reach a vertex from a given source using the minimum number of edges. Also, in the case of
unweighted graphs, any spanning tree is a Minimum Spanning Tree, and we can use either
Depth or Breadth First Traversal to find a spanning tree.
2) Peer to Peer Networks. In Peer to Peer Networks like BitTorrent, Breadth First Search
is used to find all neighbor nodes.
3) Crawlers in Search Engines: Crawlers build their index using Breadth First Traversal. The idea is to
start from the source page, follow all links from the source and keep doing the same. Depth First
Traversal can also be used for crawlers, but the advantage of Breadth First Traversal
is that the depth (levels) of the built tree can be limited.
4) Social Networking Websites: In social networks, we can find people within a given
distance ‘k’ from a person using Breadth First Search till ‘k’ levels.
5) GPS Navigation systems: Breadth First Search is used to find all neighboring locations.
7) In Garbage Collection: Breadth First Search is used in copying garbage collection using
Cheney’s algorithm.
8) Cycle detection in an undirected graph: In undirected graphs, either Breadth First Search
or Depth First Search can be used to detect a cycle. In a directed graph, cycle detection is
usually done with Depth First Search.
10) To test if a graph is bipartite: we can use either Breadth First or Depth First Traversal.
11) Path finding: we can use either Breadth First or Depth First Traversal to find whether there
is a path between two vertices.
12) Finding all nodes within one connected component: we can use either Breadth First
or Depth First Traversal to find all nodes reachable from a given node.
Depth First Traversal (or Search) for a graph is similar to Depth First Traversal of a tree.
The only catch here is, unlike trees, graphs may contain cycles, so we may come to the
same node again. To avoid processing a node more than once, we use a boolean visited
array. For example, in the following graph, we start traversal from vertex 2. When we
come to vertex 0, we look for all adjacent vertices of it. 2 is also an adjacent vertex of
0. If we don’t mark visited vertices, then 2 will be processed again and it will become a
non-terminating process. Depth First Traversal of the following graph is 2, 0, 1, 3
The DFS forms a depth-first forest composed of one or more depth-first trees. Each
tree is made of edges (u, v) such that u is gray and v is white when edge (u, v) is explored.
The following pseudocode for DFS uses a global timestamp time.
DFS(V, E)
    for each vertex u in V
        do color[u] ← WHITE
           π[u] ← NIL
    time ← 0
    for each vertex u in V
        do if color[u] = WHITE
               then DFS-Visit(u)

DFS-Visit(u)
    color[u] ← GRAY
    time ← time + 1
    d[u] ← time
    for each v adjacent to u
        do if color[v] = WHITE
               then π[v] ← u
                    DFS-Visit(v)
    color[u] ← BLACK
    time ← time + 1
    f[u] ← time
Applications of Depth First Traversal:
1) For an unweighted graph, DFS traversal produces a spanning tree; since every spanning
tree of an unweighted graph has the same number of edges, it is also a minimum spanning tree.
2) Detecting a cycle in a graph: A graph has a cycle if and only if we see a back edge during DFS.
So we can run DFS on the graph and check for back edges.
3) Path Finding
We can specialize the DFS algorithm to find a path between two given vertices u and z:
i) Call DFS with u as the start vertex.
ii) Use a stack S to keep track of the path between the start vertex and the current vertex.
iii) As soon as destination vertex z is encountered, return the path as the contents of the
stack.
4) Topological Sorting.
5) To test if a graph is bipartite: We can augment either BFS or DFS so that when we first
discover a new vertex, we color it opposite to its parent, and for every other edge we check that
it does not link two vertices of the same color. The first vertex in any connected component
can be either color.
DFS starts from the root node, then traverses into the left child node and continues deeper; if the item
is found it stops, otherwise it continues. The advantage of DFS is that it requires less memory
compared to Breadth First Search (BFS).
from collections import defaultdict

# This class represents a directed graph using adjacency list representation
class Graph:
    # Constructor
    def __init__(self):
        self.graph = defaultdict(list)

    # Add an edge from u to v
    def addEdge(self, u, v):
        self.graph[u].append(v)

    # Visit v, then recurse on its unvisited neighbours
    def DFSUtil(self, v, visited):
        visited[v] = True
        print(v, end=' ')
        for i in self.graph[v]:
            if visited[i] == False:
                self.DFSUtil(i, visited)

    # DFS traversal: mark all vertices unvisited, then call
    # the recursive DFSUtil()
    def DFS(self, v):
        visited = [False] * (len(self.graph))
        self.DFSUtil(v, visited)

# Driver code
g = Graph()
g.addEdge(0, 1)
g.addEdge(0, 2)
g.addEdge(1, 2)
g.addEdge(2, 0)
g.addEdge(2, 3)
g.addEdge(3, 3)
g.DFS(2)    # prints 2 0 1 3
from collections import defaultdict

# This class represents a directed graph using adjacency list
# representation; BFS prints the traversal from a source vertex s.
class Graph:
    # Constructor
    def __init__(self):
        self.graph = defaultdict(list)

    # Add an edge from u to v
    def addEdge(self, u, v):
        self.graph[u].append(v)

    # Print BFS traversal from a given source vertex s
    def BFS(self, s):
        visited = [False] * (len(self.graph))
        queue = []
        queue.append(s)
        visited[s] = True
        while queue:
            s = queue.pop(0)
            print(s, end=' ')
            for i in self.graph[s]:
                if visited[i] == False:
                    queue.append(i)
                    visited[i] = True

# Driver code
g = Graph()
g.addEdge(0, 1)
g.addEdge(0, 2)
g.addEdge(1, 2)
g.addEdge(2, 0)
g.addEdge(2, 3)
g.addEdge(3, 3)
g.BFS(2)    # prints 2 0 3 1
Chapter 5
BINARY TREES AND HASHING
Course Outcomes
After successful completion of this module, students should be able to:
CO 9 Extend their knowledge of data structures to more sophisticated data structures to solve problems involving balanced binary search trees, AVL Trees, B-trees and B+ trees, hashing, and basic graphs. (Understand)
CO 9 Design and contrast the benefits of dynamic and static data structure implementations and choose the appropriate data structure for a specified problem domain. (Apply)
An important special kind of binary tree is the binary search tree (BST). In a BST, each
node stores some information including a unique key value, and perhaps some associated
data. A binary tree is a BST iff, for every node n in the tree:
• All keys in n’s left subtree are less than the key in n, and
• All keys in n’s right subtree are greater than the key in n.
In other words, binary search trees are binary trees in which all values in a node's left
subtree are less than the node's value, and all values in the node's right subtree are greater than
the node's value.
Here are some BSTs in which each node just stores an integer key:
These trees, however, are not BSTs: in the left one, 5 is not greater than 6; in the right one, 6 is
not greater than 7. The reason binary search trees are important is that the following operations
can be implemented efficiently using a BST: inserting a key, searching for a key, and deleting a key.
In every BST:
• The keys in the left subtree are less than the key in the parent node.
• The keys in the right subtree are greater than the key in the parent node.
Inserting a node
A naïve algorithm for inserting a node into a BST: start from the root node; if the node to insert
is less than the root, go to the left child, otherwise go to the right child of the root. Continue this
process (each node is the root of some sub-tree) until we find a null pointer (or leaf node) where we
cannot go any further. Then insert the node as a left or right child of the leaf node, based on whether
it is less than or greater than the leaf node. Note that a new node is always inserted as a leaf node.
A recursive algorithm for inserting a node into a BST is as follows. Assume we insert a node N into
tree T: if the tree is empty, we return the new node N as the tree. Otherwise, the problem of inserting
is reduced to inserting the node N into the left or right sub-tree of T, depending on whether N is less
than or greater than T. A definition is as follows:
Insert(N, T) = N, if T is empty
             = Insert(N, T.left), if N < T
             = Insert(N, T.right), if N > T
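A Python sketch of this recursive insertion follows; the Node class here is a minimal stand-in introduced for the example, not defined at this point in the text.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # T is empty: the new node becomes the tree
    if root is None:
        return Node(key)
    # Otherwise recurse into the left or right sub-tree
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root    # duplicate keys are ignored

# Build a small BST with root 38
root = None
for k in [38, 15, 45, 13, 20]:
    root = insert(root, k)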
Searching for a node is similar to inserting a node: we start from the root, then go left
or right until we find the node (or fail to find it). A recursive definition of search is as follows:
if the node is equal to the root, we return true; if the root is null, we return false;
otherwise we recursively solve the problem for T.left or T.right, depending on whether N < T or
N > T. Search returns true or false, depending on whether the node is found:
Search(N, T) = false, if T is empty
             = true, if T = N
             = Search(N, T.left), if N < T
             = Search(N, T.right), if N > T
Deleting a node
A BST is a connected structure: all nodes in a tree are connected to some other node, and
each node has a parent unless it is the root. Therefore deleting a
node could affect all sub-trees of that node. For example, deleting node 5 from the tree
could result in losing the sub-trees that are rooted at 1 and 9.
Hence we need to be careful about deleting nodes from a tree. The best way to deal with
deletion is to consider special cases: What if the node to delete is a leaf node?
What if it is a node with just one child? What if it is an internal node (with
two children)? The last case is the hardest to resolve, but we will find a way to handle
this situation as well.
Case 1: The node to delete is a leaf node. This is a very easy case: just delete the node (e.g., node 46). We are done.
Case 2: The node to delete is a node with one child. This is also not too bad. If the node
to be deleted is a left child of its parent, then we connect the left pointer of the parent
(of the deleted node) to the single child. Otherwise, if the node to be deleted is a right
child of its parent, then we connect the right pointer of the parent (of the deleted node)
to the single child.
Case 3: The node to delete is a node with two children. This is a difficult case, as we need to
deal with two sub-trees. But there is an easy way to handle it. First we find a replacement
node (a leaf node or a node with one child) for the node to be deleted, in a way that
maintains the BST order property. Then we swap the replacement node with the node to
be deleted (swap the data) and delete the replacement node as in case 1 or case 2.
The next problem is finding a replacement node for the node to be deleted. We can find
it as follows: if the node to be deleted is N, then find the largest node in the left
sub-tree of N or the smallest node in the right sub-tree of N. These are the two candidates
that can replace the node to be deleted without losing the order property. For example,
consider the following tree and suppose we need to delete the root 38.
Then we find the largest node in the left sub-tree (15) or the smallest node in the right sub-tree
(45), replace the root with that node, and then delete that node. The following
set of images demonstrates this process; let's see what happens when we delete 13 from that tree.
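A sketch of deletion covering the three cases above, reusing the Node class and insert() from the previous sketch; for a node with two children it replaces the node's key with the smallest key in its right sub-tree (the inorder successor) and then deletes that node.

def min_node(node):
    # The smallest key in a BST is in the leftmost node
    while node.left is not None:
        node = node.left
    return node

def delete(root, key):
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Case 1 / Case 2: at most one child replaces the node
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Case 3: two children - copy the inorder successor's key,
        # then delete the successor from the right sub-tree
        succ = min_node(root.right)
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root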
A self-balancing (or height-balanced) binary search tree is any node-based binary search
tree that automatically keeps its height (maximal number of levels below the root) small
in the face of arbitrary item insertions and deletions. The red–black tree, which is a type
of self-balancing binary search tree, was originally called a symmetric binary B-tree. Self-balancing
binary search trees can be used in a natural way to construct and maintain ordered lists,
such as priority queues. They can also be used for associative arrays; key-value pairs are
simply inserted with an ordering based on the key alone. In this capacity, self-balancing
BSTs have a number of advantages and disadvantages over their main competitor, hash
tables. One advantage of self-balancing BSTs is that they allow fast (indeed, asymptoti-
cally optimal) enumeration of the items in key order, which hash tables do not provide.
One disadvantage is that their lookup algorithms get more complicated when there may
be multiple items with the same key. Self-balancing BSTs have better worst-case lookup
performance than hash tables (O(log n) compared to O(n)), but have worse average-case
performance (O(log n) compared to O(1)). Self-balancing BSTs can be used to implement
any algorithm that requires mutable ordered lists, to achieve optimal worst-case asymptotic
performance. For example, if binary tree sort is implemented with a self-balanced BST, we
have a very simple-to-describe yet asymptotically optimal O(n log n) sorting algorithm.
Similarly, many algorithms in computational geometry exploit variations on self-balancing
BSTs to solve problems such as the line segment intersection problem and the point loca-
tion problem efficiently. (For average-case performance, however, self-balanced BSTs may
be less efficient than other solutions. Binary tree sort, in particular, is likely to be slower
than merge sort, quicksort, or heapsort, because of the tree-balancing overhead as well as
cache access patterns.)
Self-balancing BSTs are flexible data structures, in that it’s easy to extend them to effi-
ciently record additional information or perform new operations. For example, one can
record the number of nodes in each subtree having a certain property, allowing one to count
the number of nodes in a certain key range with that property in O(log n) time. These
extensions can be used, for example, to optimize database queries or other list-processing
algorithms.
An AVL tree is another balanced binary search tree. Named after their inventors, Adelson-Velskii and Landis, they were the first dynamically balanced trees to be proposed. Like
red-black trees, they are not perfectly balanced, but pairs of sub-trees differ in height by
at most 1, maintaining an O(log n) search time. Addition and deletion operations also take
O(log n) time.
Definition of an AVL tree: An AVL tree is a binary search tree which has the following
properties:
1. The sub-trees of every node differ in height by at most one.
2. Every sub-tree is an AVL tree.
Is the tree in the figure an AVL tree? No: the sub-tree with root 8 has height 4 and the sub-tree
with root 18 has height 2.
An AVL tree implements the Map abstract data type just like a regular binary search
tree, the only difference is in how the tree performs. To implement our AVL tree we need
to keep track of a balance factor for each node in the tree. We do this by looking at the
heights of the left and right subtrees for each node. More formally, we define the balance
factor for a node as the difference between the height of the left subtree and the height of
the right subtree.
balanceFactor=height(leftSubTree)−height(rightSubTree)
Using the definition for balance factor given above we say that a subtree is left-heavy if
the balance factor is greater than zero. If the balance factor is less than zero then the
subtree is right heavy. If the balance factor is zero then the tree is perfectly in balance.
For the purposes of implementing an AVL tree, and gaining the benefit of having a balanced
tree, we will define a tree to be in balance if the balance factor is -1, 0, or 1. Once the
balance factor of a node in a tree is outside this range, we will need a procedure to
bring the tree back into balance. The figure shows an example of an unbalanced, right-heavy
tree and the balance factors of each node.
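A small sketch of computing heights and balance factors for the Node classes used earlier; taking the height of an empty subtree to be -1 is an implementation convention assumed here, not taken from the text.

def height(node):
    # Height of an empty subtree is -1, of a leaf is 0
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    return height(node.left) - height(node.right)

# balance_factor(n) > 0 : left-heavy; < 0 : right-heavy; 0 : balanced.
# The tree is an AVL tree if every node's factor is -1, 0 or 1.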
AVL trees are identical to standard binary search trees except that for every node in an
AVL tree, the height of the left and right subtrees can differ by at most 1 (Weiss, 1993,
p:108). AVL trees are HB-k trees (height balanced trees of order k) of order k = 1, i.e. HB-1 trees.
When storing an AVL tree, a field must be added to each node with one of three values:
1, 0, or -1. A value of 1 in this field means that the left subtree has a height one more
than the right subtree; a value of -1 denotes the opposite; and a value of 0 indicates that
the heights of both subtrees are the same. Updates of AVL trees may require up to O(log n) rotations,
whereas updating red-black trees can be done using only one or two rotations (up to color
changes). For this reason, AVL trees are considered a bit obsolete by some.
Sparse AVL trees are defined as AVL trees of height h with the fewest possible nodes.
Figure 3 shows sparse AVL trees of heights 0, 1, 2, and 3.
A multiway tree is a tree that can have more than two children. A multiway tree of order
m (or an m-way tree) is one in which each node can have up to m children.
As with the other trees that have been studied, the nodes in an m-way tree will be made
up of key fields, in this case m-1 key fields, and pointers to children.
To make the processing of m-way trees easier some type of order will be imposed on the
keys within each node, resulting in a multiway search tree of order m (or an m-way search
tree). By definition an m-way search tree is a m-way tree in which:
• The keys in the first i children are smaller than the ith key
• The keys in the last m-i children are larger than the ith key
M-way search trees give the same advantages to m-way trees that binary search trees gave
to binary trees - they provide fast information retrieval and update. However, they also
have the same problems that binary search trees had - they can become unbalanced, which
means that the construction of the tree becomes of vital importance.
5.4 B Trees:
An extension of a multiway search tree of order m is a B-tree of order m. This type of tree
will be used when the data to be accessed/stored is located on secondary storage devices
because they allow for large amounts of data to be stored in a node.
1. The root has at least two subtrees unless it is the only node in the tree.
2. Each nonroot and each nonleaf node has at most m nonempty children and at least
m/2 nonempty children.
3. The number of keys in each nonroot and each nonleaf node is one less than the number
of its nonempty children.
4. All leaves are on the same level.
These restrictions make B-trees always at least half full, have few levels, and remain
perfectly balanced.
An algorithm for finding a key in B-tree is simple. Start at the root and determine which
pointer to follow based on a comparison between the search value and key fields in the
root node. Follow the appropriate pointer to a child node. Examine the key fields in the
child node and continue to follow the appropriate pointers until the search value is found
or a leaf node is reached that doesn’t contain the desired search value.
The condition that all leaves must be on the same level forces a characteristic behavior of
B-trees, namely that B-trees are not allowed to grow at their leaves; instead they are
forced to grow at the root.
When inserting into a B-tree, a value is inserted directly into a leaf. This leads to three
common situations that can occur:
1. The leaf is not full: the value is simply inserted in order.
2. The leaf is full: the leaf is split into two, and the middle key is moved up into the parent.
3. The split propagates upward: if splits reach the root, the root itself splits and the tree
grows one level in height.
As usual, deletion is the hardest of the processes to apply. The deletion process is basically
a reversal of the insertion process: rather than splitting nodes, it's possible that nodes
will be merged so that the B-tree properties, namely the requirement that a node must be at
least half full, can be maintained.
Hashing is a technique used for performing almost constant time search for insertion,
deletion and find operations. As a very simple example, an array with its
index as the key is a hash table: each index (key) can be used to access
a value in constant search time. This mapping key must be simple to compute and
must help in identifying the associated value. A function which helps us in generating
such a key-value mapping is known as a Hash Function.
In a hashing system the keys are stored in an array which is called the Hash Table. A
perfectly implemented hash table would always promise an average insert/delete/retrieval
time of O(1).
Hash Function:
A hash function employs some algorithm to compute a key K of fixed size for each data element
in the set U. The same key K can be used to map data into a hash table, and all the operations
like insertion, deletion and searching should be possible using it. The values returned by a hash
function are also referred to as hash values, hash codes, hash sums, or hashes.
Hash Collision:
A situation in which the hash function maps two or more distinct keys to the same
location in the hash table is called a hash collision. In such a situation, two or more data
elements would qualify to be stored/mapped to the same location in the hash table.
Open Hashing is a technique in which the data is not directly stored at the hash key
index (k) of the hash table. Rather, the entry at key index (k) in the hash table is a
pointer to the head of a data structure where the data is actually stored. In the most
simple and common implementations, the data structure adopted for storing the elements
is a linked list.
In this technique, when data needs to be searched, it might become necessary (in the worst case)
to traverse all the nodes in the linked list to retrieve the data.
Note that the order in which the data is stored in each of these linked lists (or other data
structures) is completely based on implementation requirements. Some popular
criteria are insertion order, frequency of access, etc.
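A minimal sketch of open hashing in Python, with an ordinary list standing in for the linked list at each bucket; the class and method names are illustrative.

class ChainedHashTable:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]   # one chain per bucket

    def _hash(self, key):
        return hash(key) % self.size             # simple modulo hash function

    def insert(self, key, value):
        bucket = self.table[self._hash(key)]
        for pair in bucket:
            if pair[0] == key:                   # update an existing key
                pair[1] = value
                return
        bucket.append([key, value])              # collision: append to the chain

    def search(self, key):
        for k, v in self.table[self._hash(key)]:
            if k == key:
                return v
        return None

t = ChainedHashTable()
t.insert('apple', 1)
t.insert('melon', 2)
print(t.search('apple'))   # 1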
In closed hashing (open addressing), a hash table of pre-identified size is used, and all items
are stored in the hash table itself. In addition to the data, each hash bucket also maintains
one of three states: EMPTY, OCCUPIED, DELETED. While inserting, if a collision occurs,
alternative cells are tried until an empty bucket is found, using one of the following techniques:
1. Linear probing
2. Quadratic probing
3. Double hashing (in short, in case of a collision another hash function is applied to the
key to identify where in the open addressing scheme the data should actually be stored)
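A sketch of closed hashing with linear probing, maintaining the EMPTY/OCCUPIED/DELETED states described above; the names are illustrative.

EMPTY, DELETED = object(), object()    # sentinel markers for cell states

class ProbingHashTable:
    def __init__(self, size=11):
        self.size = size
        self.table = [EMPTY] * size

    def insert(self, key):
        i = hash(key) % self.size
        for _ in range(self.size):
            if self.table[i] in (EMPTY, DELETED):
                self.table[i] = key
                return
            i = (i + 1) % self.size    # linear probe: try the next cell
        raise RuntimeError('hash table is full')

    def search(self, key):
        i = hash(key) % self.size
        for _ in range(self.size):
            if self.table[i] is EMPTY: # a never-used cell ends the probe
                return False
            if self.table[i] == key:
                return True
            i = (i + 1) % self.size
        return False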
Applications of Hashing:
A hash function maps a variable length input string to fixed length output string – its hash
value, or hash for short. If the input is longer than the output, then some inputs must
map to the same output – a hash collision. Comparing the hash values for two inputs can
give us one of two answers: the inputs are definitely not the same, or there is a possibility
that they are the same. Hashing as we know it is used for performance improvement,
error checking, and authentication. One example of a performance improvement is the
common hash table, which uses a hash function to index into the correct bucket in the
hash table, followed by comparing each element in the bucket to find a match. In error
checking, hashes (checksums, message digests, etc.) are used to detect errors caused by
either hardware or software. Examples are TCP checksums, ECC memory, and MD5
checksums on downloaded files. In this case, the hash provides additional assurance that
the data we received is correct. Finally, hashes are used to authenticate messages. In this
case, we are trying to protect the original input from tampering, and we select a hash that
is strong enough to make malicious attack infeasible or unprofitable.
• Digital signature
• Timestamping