Unit 4 Notes


UNIT-4

NON-LINEAR DATA STRUCTURES


Trees – Binary Trees – Tree Traversals – Expression Trees – Binary Search Tree –
Hashing - Hash Functions – Separate Chaining – Open Addressing – Linear Probing–
Quadratic Probing – Double Hashing – Rehashing.

4.1 INTRODUCTION TO TREES


 A tree is a non-linear, hierarchical data structure consisting of a collection of
nodes, where each node stores a value and a list of references to other nodes
(its "children"). This data structure is a specialized way to organize and store
data in the computer so that it can be used more effectively.

4.1.1 Example of Tree data structure

Here is the example tree (reconstructed from the relationships listed below):

          A
        /   \
       B     C
      / \   / \
     D   E F   G

 Node A is the root node
 B is the parent of D and E
 D and E are siblings
 D, E, F and G are the leaf nodes
 A and B are the ancestors of E

4.1.2 Basic Terminologies in Tree Data Structure


 Parent Node: The node which is a predecessor of a node is called the parent
node of that node.
 Child Node: The node which is the immediate successor of a node is called the
child node of that node.
 Root Node: The topmost node of a tree or the node which does not have any
parent node is called the root node. A non-empty tree must contain exactly one
root node and exactly one path from the root to all other nodes of the tree.
 Leaf Node or External Node: The nodes which do not have any child nodes are
called leaf nodes.
 Ancestor of a Node: Any predecessor nodes on the path of the root to that node
are called Ancestors of that node.
 Descendant: Any successor node on the path from that node down to a leaf node.
 Sibling: Children of the same parent node are called siblings.
 Level of a node: The count of edges on the path from the root node to that node.
The root node has level 0.
 Internal node: A node with at least one child is called Internal Node.
 Neighbour of a Node: Parent or child nodes of that node are called neighbors of
that node.
 Subtree: Any node of the tree along with its descendant.

4.1.3 Properties of a Tree


 Number of edges: An edge can be defined as the connection between two
nodes. If a tree has N nodes, then it will have (N-1) edges. There is only one
path from each node to any other node of the tree.
 Depth of a node: The depth of a node is defined as the length of the path from
the root to that node. Each edge adds 1 unit of length to the path. So, it can also
be defined as the number of edges in the path from the root of the tree to the
node.
 Height of a node: The height of a node can be defined as the length of the
longest path from the node to a leaf node of the tree.
 Height of the Tree: The height of a tree is the length of the longest path from
the root of the tree to a leaf node of the tree.
 Degree of a Node: The total count of subtrees attached to that node is called the
degree of the node. The degree of a leaf node must be 0. The degree of a tree is
the maximum degree of a node among all the nodes in the tree.

4.1.4 Syntax for creating a node


struct Node
{
    int data;
    struct Node *left_child;
    struct Node *right_child;
};

4.2 BINARY TREES


 A Binary Tree is defined as a tree data structure in which each node has at most
2 children. Since each node can have only 2 children, we typically name them
the left and right child.

4.2.1 Binary Tree Representation


 A Binary tree is represented by a pointer to the topmost node of the tree. If the
tree is empty, then the value of the root is NULL.
 Binary Tree node contains the following parts:
 Data
 Pointer to left child
 Pointer to right child

4.2.2 Types of Binary Trees


 There are various types of binary trees, and each of these binary tree types has
unique characteristics. Here are each of the binary tree types in detail:
 Full Binary Tree
 It is a special kind of binary tree in which every node has either zero or
two children. In other words, every node in the tree is either an internal
node with exactly two child nodes, or a leaf (external) node.

 Complete Binary Tree


 A complete binary tree is another specific type of binary tree in which
all the tree levels are filled entirely with nodes, except possibly the
lowest level. In the last or lowest level of this binary tree, all nodes
must be positioned as far to the left as possible.
 Perfect Binary Tree
 A binary tree is said to be ‘perfect’ if all the internal nodes have exactly two
children, and every external or leaf node is at the same level or same depth
within the tree. A perfect binary tree with h levels has 2^h − 1 nodes.

 Balanced Binary Tree


 A balanced binary tree, also referred to as a height-balanced binary tree, is
defined as a binary tree in which the height of the left and right subtree of
any node differs by not more than 1.

 Degenerate Binary Tree


 A binary tree is said to be a degenerate binary tree or pathological binary
tree if every internal node has only a single child.

4.2.3 Benefits of Binary Trees:


 The search operation in a binary tree is faster as compared to other trees
 An inorder traversal of a binary search tree yields the elements in sorted order
 It is easy to pick out the maximum and minimum elements
 Graph traversal algorithms also make use of binary trees
 Converting between prefix, infix, and postfix expressions is possible using binary trees

4.3 TREE TRAVERSAL


 Tree traversal means visiting each node of the tree. The tree is a non-linear data
structure, and therefore its traversal is different from other linear data structures.
 There is only one way to visit each node/element in linear data structures, i.e.
starting from the first value and traversing in a linear order.

4.3.1 Types of Tree Traversal


 Preorder traversal
 In a preorder traversal, we process/visit the root node first. Then we traverse
the left subtree in a preorder manner. Finally, we visit the right subtree again
in a preorder manner.
 For example, consider the following tree:

 Here, the root node is A. All the nodes on the left of A are a part of the left
subtree whereas all the nodes on the right of A are a part of the right subtree.
Thus, according to preorder traversal, we will first visit the root node, so A
will print first and then move to the left subtree.
 B is the root node for the left subtree. So B will print next, and we will visit
the left and right nodes of B. In this manner, we will traverse the whole left
subtree and then move to the right subtree. Thus, the order of visiting the
nodes will be A→B→C→D→E→F→G→H→I.
 Algorithm for Preorder Traversal
o for all nodes of the tree:
 Step 1: Visit the root node.
 Step 2: Traverse left subtree recursively.
 Step 3: Traverse right subtree recursively.
 Pseudo-code for Preorder Traversal
void Preorder(struct node* ptr)
{
    if (ptr != NULL)
    {
        printf("%d ", ptr->data);
        Preorder(ptr->left);
        Preorder(ptr->right);
    }
}
 Uses of Preorder Traversal
o If we want to create a copy of a tree, we make use of preorder
traversal.
o Preorder traversal helps to give a prefix expression for the
expression tree.
 Inorder Traversal
 In an inorder traversal, we first visit the left subtree, then the root node and
then the right subtree in an inorder manner.
 Consider the following tree:
 In this case, as we visit the left subtree first, we get the node with the value
30 first, then 20 and then 40. After that, we will visit the root node and print
it. Then comes the turn of the right subtree. We will traverse the right
subtree in a similar manner. Thus, after performing the inorder traversal, the
order of nodes will be 30→20→40→10→50→70→60→80.
 Algorithm for Inorder Traversal
o for all nodes of the tree:
 Step 1: Traverse left subtree recursively.
 Step 2: Visit the root node.
 Step 3: Traverse right subtree recursively.
 Pseudo-code for Inorder Traversal
void Inorder(struct node* ptr)
{
if(ptr != NULL)
{
Inorder(ptr->left);
printf("%d", ptr-
>data); Inorder(ptr-
>right);
}
}
 Uses of Inorder Traversal
o In a binary search tree, inorder traversal gives the nodes in
non-decreasing (sorted) order.
o It helps to get the infix expression in an expression tree.
 Postorder Traversal
 Postorder traversal is a kind of traversal in which we first traverse the left
subtree in a postorder manner, then traverse the right subtree in a postorder
manner and at the end visit the root node.
 For example, in the following tree:
 The postorder traversal will be 7→5→4→20→60→30→10.
 Algorithm for Postorder Traversal
o for all nodes of the tree:
 Step 1: Traverse left subtree recursively.
 Step 2: Traverse right subtree recursively.
 Step 3: Visit the root node.
 Pseudo-code for Postorder Traversal
void Postorder(struct node* ptr)
{
    if (ptr != NULL)
    {
        Postorder(ptr->left);
        Postorder(ptr->right);
        printf("%d ", ptr->data);
    }
}
 Uses of Postorder Traversal
o It helps to delete the tree.
o It helps to get the postfix expression in an expression tree.
4.4 EXPRESSION TREES
 An expression tree is a binary tree used to represent expressions. In this tree,
the internal nodes always denote the operators and the leaf nodes always denote
the operands.
 For example, the expression tree for 3 + ((5+9)*2) would be:

          +
         / \
        3   *
           / \
          +   2
         / \
        5   9

4.4.1 Properties of an Expression tree


 In this tree, the internal nodes always denote the operators.
 The leaf nodes always denote the operands.
 The operations are always performed on these operands.
 An operator located deeper in the tree has higher priority than an operator
nearer the root; the operator at the root is applied last.
 The operands sit at the leaves, the deepest positions in the tree, so they are
available first when the expression is evaluated.

4.4.2 Construction of Expression Tree


 Let us consider a postfix expression given as the input for constructing an
expression tree. Following are the steps to construct an expression tree:
 Read one symbol at a time from the postfix expression.
 Check whether the symbol is an operand or an operator.
 If the symbol is an operand, create a one-node tree and push a pointer to
it onto a stack.
 If the symbol is an operator, pop two pointers from the stack, namely T1 &
T2 (T1 popped first), and form a new tree with the operator as the root and
T2 & T1 as its left and right children.
 A pointer to this new tree is then pushed onto the stack.
 Thus, an expression tree is constructed by reading the symbols or numbers
from the left. For an operand, create a node; for an operator, create a tree
with the operator as root and two pointers to the left and right subtrees.

4.4.3 Example - Postfix Expression Construction


 The input is: a b + c *
o The first two symbols are operands, so we create one-node trees and push
pointers to them onto the stack.

o Next, the '+' symbol is read, so two tree pointers are popped, a new tree
with '+' as the root is formed, and a pointer to it is pushed onto the stack.

o Next, 'c' is read; we create a one-node tree and push a pointer to it onto
the stack.
o Finally, the last symbol ' * ' is read; we pop two tree pointers and form a
new tree with ' * ' as the root, and a pointer to this final tree remains on
the stack.

4.4.4 Implementation of Expression tree in C Programming language


// C program for expression tree implementation
#include <stdio.h>
#include <stdlib.h>

/* The structure below is a node of a binary tree, consisting of the left child
   and the right child, along with the pointer nxt, which links nodes into the
   stack used during construction */
struct node
{
    char info;
    struct node* l;
    struct node* r;
    struct node* nxt;
};

struct node* head = NULL;   /* top of the stack */

/* Helper function that allocates a new node with
   the given data and NULL left and right pointers. */
struct node* newnode(char data)
{
    struct node* node = (struct node*) malloc(sizeof(struct node));
    node->info = data;
    node->l = NULL;
    node->r = NULL;
    node->nxt = NULL;
    return node;
}

void Inorder(struct node* node)
{
    if (node == NULL)
        return;
    Inorder(node->l);            /* first recur on left child */
    printf("%c ", node->info);   /* then print the data of node */
    Inorder(node->r);            /* now recur on right child */
}

void push(struct node* x)
{
    if (head == NULL)
        head = x;
    else
    {
        x->nxt = head;
        head = x;
    }
}

/* Pop the topmost element (pointed to by head) */
struct node* pop()
{
    struct node* n = head;
    head = head->nxt;
    return n;
}

int main()
{
    char t[] = { 'X', 'Y', 'Z', '*', '+', 'W', '/' };
    int n = sizeof(t) / sizeof(t[0]);
    int i;
    struct node *p, *q, *s;
    for (i = 0; i < n; i++)
    {
        /* if the character read is an operator, pop two
           elements from the stack and make a binary tree */
        if (t[i] == '+' || t[i] == '-' || t[i] == '*' || t[i] == '/' || t[i] == '^')
        {
            s = newnode(t[i]);
            p = pop();
            q = pop();
            s->l = q;
            s->r = p;
            push(s);
        }
        else
        {
            s = newnode(t[i]);
            push(s);
        }
    }
    printf("The Inorder Traversal of Expression Tree: ");
    Inorder(s);
    return 0;
}
The output of the above program is
X + Y * Z / W

4.4.5 Use of Expression tree


 The main objective of using expression trees is that complex expressions
can easily be evaluated using them.
 They are also used to find the associativity of each operator in the expression.
 They are also used in evaluating postfix, prefix, and infix expressions.

4.5 BINARY SEARCH TREE


 In a binary search tree, the value of every node in the left subtree must be
smaller than its parent node, and the value of every node in the right subtree
must be greater than its parent node. This rule is applied recursively to the
left and right subtrees of the root.
 In the above figure, we can observe that the root node is 40, all the nodes of
the left subtree are smaller than the root node, and all the nodes of the right
subtree are greater than the root node.
 Similarly, the left child of the root node is itself greater than its own left
child and smaller than its own right child, so each subtree also satisfies the
property of a binary search tree. Therefore, we can say that the tree in the
above image is a binary search tree.

4.5.1 Advantages of Binary search tree


 Searching for an element in a binary search tree is easy, as at every node we
always know which subtree must contain the desired element.
 As compared to array and linked lists, insertion and deletion operations are faster
in BST.

4.5.2 Example of creating a binary search tree


 Now, let's see the creation of binary search tree using an example. Suppose the
data elements are : 45, 15, 79, 90, 10, 55, 12, 20, 50
o First, we have to insert 45 into the tree as the root of the tree.
o Then, read the next element; if it is smaller than the root node, insert it as
the root of the left subtree, and move to the next element.
o Otherwise, if the element is larger than the root node, then insert it as the
root of the right subtree.
 Now, let's see the process of creating the Binary search tree using the given data
elements. The process of creating the BST is shown below.
 Step 1 - Insert 45 as the root node of the tree.
 Step 2 - Insert 15.
o As 15 is smaller than 45, insert it as the root node of the left subtree.

 Step 3 - Insert 79.


o As 79 is greater than 45, so insert it as the root node of the right subtree.

 Step 4 - Insert 90.


o 90 is greater than 45 and 79, so it will be inserted as the right subtree of 79.
 Step 5 - Insert 10
o 10 is smaller than 45 and 15, so it will be inserted as a left subtree of 15.

 Step 6 - Insert 55
o 55 is larger than 45 and smaller than 79, so it will be inserted as the left
subtree of 79.
 Step 7 - Insert 12
o 12 is smaller than 45 and 15 but greater than 10, so it will be inserted as the
right subtree of 10.

 Step 8 - Insert 20
o 20 is smaller than 45 but greater than 15, so it will be inserted as the right
subtree of 15.
 Step 9 - Insert 50.
o 50 is greater than 45 but smaller than 79 and 55. So, it will be inserted as a
left subtree of 55.

 Now, the creation of binary search tree is completed.

4.5.3 Operations performed on a Binary Search Tree


 We can perform insert, delete and search operations on the binary search tree.
4.5.3.1 Searching in Binary search tree
 Searching means to find or locate a specific element or node in a data structure.
In Binary search tree, searching a node is easy because elements in BST are
stored in a specific order.

4.5.3.1.1 Steps involved in Searching in a Binary Search Tree


 First, compare the element to be searched with the root element of the tree.
 If root is matched with the target element, then return the node's location.
 If it is not matched, then check whether the item is less than the root element, if
it is smaller than the root element, then move to the left subtree.
 If it is larger than the root element, then move to the right subtree.
 Repeat the above procedure recursively until the match is found.
 If the element is not found or not present in the tree, then return NULL.
 Now, let's understand the searching in binary tree using an example. We are
taking the binary search tree formed above. Suppose we have to find node
20 from the below tree.
4.5.3.1.2 Algorithm to search an element in Binary search tree
Search (root, item)
Step 1 - if (item = root → data) or (root = NULL)
return root
else if (item < root → data)
return Search(root → left, item)
else
return Search(root → right, item)
END if
Step 2 - END

4.5.3.2 Deletion in Binary Search tree


 In a binary search tree, we must delete a node while keeping in mind that the
property of the BST is not violated. To delete a node from a BST, three
possible situations occur -
 The node to be deleted is a leaf node, or
 The node to be deleted has only one child, or
 The node to be deleted has two children.

4.5.3.2.1 When the node to be deleted is the leaf node


 It is the simplest case to delete a node in BST. Here, we have to replace the leaf
node with NULL and simply free the allocated space.
 We can see the process to delete a leaf node from BST in the below image. In
the below image, suppose we have to delete node 90; as the node to be deleted is
a leaf node, it will be replaced with NULL, and the allocated space will be freed.
4.5.3.2.2 When the node to be deleted has only one child
 In this case, we have to replace the target node with its child, and then delete the
child node. It means that after replacing the target node with its child node, the
child node will now contain the value to be deleted. So, we simply have to
replace the child node with NULL and free up the allocated space.
 We can see the process of deleting a node with one child from BST in the below
image. In the below image, suppose we have to delete the node 79, as the node
to be deleted has only one child, so it will be replaced with its child 55.
 So, the replaced node 79 will now be a leaf node that can be easily deleted.

4.5.3.2.3 When the node to be deleted has two children


 This case of deleting a node in BST is a bit more complex than the other two.
In such a case, the steps to be followed are listed as follows -
o First, find the inorder successor of the node to be deleted.
o After that, replace the node's value with that of its inorder successor, and
repeat until the value to be deleted sits at a leaf of the tree.
o And at last, replace that leaf with NULL and free up the allocated space.
 The inorder successor is required when the right child of the node is not empty.
We can obtain the inorder successor by finding the minimum element in the
right subtree of the node.
 We can see the process of deleting a node with two children from BST in the
below image.
 In the below image, suppose we have to delete node 45 that is the root node, as
the node to be deleted has two children, so it will be replaced with its inorder
successor. Now, node 45 will be at the leaf of the tree so that it can be deleted
easily.
4.5.3.3 Insertion in Binary Search tree
 A new key in BST is always inserted at the leaf. To insert an element in BST, we
have to start searching from the root node; if the node to be inserted is less than
the root node, then search for an empty location in the left subtree.
 Else, search for the empty location in the right subtree and insert the data. Insert
in BST is similar to searching, as we always have to maintain the rule that the
left subtree is smaller than the root, and right subtree is larger than the root.

4.5.3.4 The complexity of the Binary Search tree


 Let's see the time and space complexity of the Binary search tree. We will see
the time complexity for insertion, deletion, and searching operations in best
case, average case, and worst case.
4.5.3.5 Implementation of Binary search tree
#include <iostream>
using namespace
std; struct Node {
int data;
Node *left;
Node *right;
};
Node* create(int item)
{
Node* node = new
Node; node->data =
item;
node->left = node->right = NULL;
return node;
}
/*Inorder traversal of the tree formed*/
void inorder(Node *root)
{
if (root == NULL)
return;
inorder(root->left); //traverse left subtree
cout<< root->data << " "; //traverse root node
inorder(root->right); //traverse right subtree
}
Node* findMinimum(Node* cur) /*To find the inorder successor*/
{
while(cur->left != NULL)
{ cur = cur->left;
}
return cur;
}
Node* insertion(Node* root, int item) /*Insert a node*/
{
if (root == NULL)
return create(item); /*return new node if tree is empty*/
if (item < root->data)
root->left = insertion(root->left, item);
else
root->right = insertion(root->right, item);
return root;
}
void search(Node* &cur, int item, Node* &parent)
{
while (cur != NULL && cur->data != item)
{
parent = cur;
if (item < cur->data)
cur = cur->left;
else
cur = cur->right;
}
}
void deletion(Node*& root, int item) /*function to delete a node*/
{
Node* parent = NULL;
Node* cur = root;
search(cur, item, parent); /*find the node to be deleted*/
if (cur == NULL)
return;
if (cur->left == NULL && cur->right == NULL) /*When node has no
children*/
{
if (cur != root)
{
if (parent->left == cur)
parent->left =
NULL;
else
parent->right = NULL;
}
else
root = NULL;
free(cur);
}
else if (cur->left && cur->right)
{
Node* succ = findMinimum(cur->right);
int val = succ->data;
deletion(root, succ-
>data); cur->data = val;
}
else
{
Node* child = (cur->left)? cur->left: cur->right;
if (cur != root)
{
if (cur == parent-
>left) parent->left =
child;
else
parent->right = child;
}
else
root = child;
free(cur);
}
}
int main()
{
Node* root = NULL;
root = insertion(root,
45); root =
insertion(root, 30); root
= insertion(root, 50);
root = insertion(root,
25); root =
insertion(root, 35); root
= insertion(root, 45);
root = insertion(root,
60); root =
insertion(root, 4);
printf("The inorder traversal of the given binary tree is - \n");
inorder(root);
deletion(root, 25);
printf("\nAfter deleting node 25, the inorder traversal of the given binary tree is
- \n");
inorder(root);
insertion(root,
2);
printf("\nAfter inserting node 2, the inorder traversal of the given binary tree is
- \n");
inorder(root);
return 0;
}
Output
The inorder traversal of the given binary tree is -
4 25 30 35 45 45 50 60
After deleting node 25, the inorder traversal of the given binary tree is -
4 30 35 45 45 50 60
After inserting node 2, the inorder traversal of the given binary tree is -
2 4 30 35 45 45 50 60

4.6 HASHING
 Hashing in the data structure is a technique of mapping a large chunk of data
into small tables using a hashing function. It is also known as the message digest
function. It is a technique that uniquely identifies a specific item from a
collection of similar items.
 It uses hash tables to store the data in an array format. Each value in the array
has been assigned a unique index number. Hash tables use a technique to
generate these unique index numbers for each value stored in an array format.
This technique is called the hash technique.
 You only need to find the index of the desired item, rather than finding the data.
With indexing, you can quickly scan the entire list and retrieve the item you
wish. Indexing also helps in inserting operations when you need to insert data at
a specific location. No matter how big or small the table is, you can update and
retrieve data within seconds.
 The hash table is basically an array of elements, and the hash-based search is
performed on a part of the item, i.e. the key. Each key is mapped to a number in
the range 0 to table size − 1.
 Hashing in a data structure is a two-step process:
o The hash function converts the item into a small integer or hash value. This
integer is used as an index to store the original data.
o The data is stored in a hash table. You can use a hash key to locate data
quickly.
4.6.1 Examples
 In schools, the teacher assigns a unique roll number to each student. Later, the
teacher uses that roll number to retrieve information about that student.
 A library has a very large number of books. The librarian assigns a unique number
to each book. This unique number helps in identifying the position of the books
on the bookshelf.

4.7 HASH FUNCTION


 The hash function in a data structure maps data of an arbitrary size to fixed-
sized data. It returns a small integer value (also known as the hash value),
hash codes, and hash sums. The index into the table is computed in two steps:
o hash = hashfunc(key)
o index = hash % array_size
 The hash function must satisfy the following requirements:
o A good hash function is easy to compute.
o A good hash function never gets stuck in clustering and distributes keys
evenly across the hash table.
o A good hash function avoids collision when two elements or items get
assigned to the same hash value.
 The three characteristics of the hash function in the data structure are:
o Collision free
o Property to be hidden
o Puzzle friendly

4.7.1 Hash Table


 Hashing in data structure uses hash tables to store the key-value pairs. The hash
table then uses the hash function to generate an index. Hashing uses this unique
index to perform insert, update, and search operations.
 It can be defined as a bucket where the data are stored in an array format. These
data have their own index value. If the index values are known then the process
of accessing the data is quicker.

4.7.2 How does Hashing in Data Structure Works?


 In hashing, the hashing function maps strings or numbers to a small integer
value. Hash tables retrieve the item from the list using a hashing function.
 The objective of hashing technique is to distribute the data evenly across an
array. Hashing assigns all the elements a unique key. The hash table uses this
key to access the data in the list.
 Hash table stores the data in a key-value pair. The key acts as an input to the
hashing function. Hashing function then generates a unique index number for
each value stored.
 The index number keeps the value that corresponds to that key. The hash
function returns a small integer value as an output. The output of the hashing
function is called the hash value.
 Let us understand hashing in a data structure with an example. Imagine you need
to store some items (arranged in a key-value pair) inside a hash table with 30
cells. The values are: (3,21) (1,72) (40,36) (5,30) (11,44) (15,33) (18,12) (16,80)
(38,99)
 The hash table will look like the following:
 The process of taking data of any size and converting it into a smaller hash
value, which can then be used as an index into the hash table, is what defines
hashing in a data structure.

4.8 SEPARATE CHAINING


 Separate Chaining is a collision resolution technique implemented using linked
lists. When two or more elements hash to the same location, these elements are
stored in a singly linked list, like a chain. Since this method uses extra
memory to resolve collisions, it is also known as open hashing.

4.8.1 Separate Chaining Hash Table


 In separate chaining, each slot of the hash table is a linked list. We will insert the
element into a specific linked list to store it in the hash table. If there is any
collision i.e. if more than one element after calculating the hashed value mapped
to the same key then we will store those elements in the same linked list. Given
below is the representation of the separate chaining hash table.

4.8.2 Example for Separate Chaining


 Let's understand with the help of examples. Given below is the hash function:
h(key) = key % table size
 In a hash table with size 7, keys 42 and 38 get 0 and 3 as hash indices
respectively.
 If we insert a new element 52, it also goes to index 3, because 52 % 7 is 3,
and collides with 38.

 The lookup cost will be scanning all the entries of the selected linked list for the
required key. If the keys are uniformly distributed, then the average lookup cost
will be an average number of keys per linked list.

4.8.3 How to Avoid Collision in Separate Chaining Method


 The separate chaining method handles collisions by creating a linked list at the
occupied buckets. So far we have only looked at a simple hash function where
collisions are imminent.
 It is important to choose a good hash function in order to minimize the number
of collisions so that all the key values are evenly distributed in the hash table.
 Some characteristics of a good hash function are:
 Minimizes collisions
 Is easy and quick to compute
 Distributes the inserted key values evenly in the hash table
 Keeps the load factor low for a given set of keys

4.8.4 Practice Problem Based on Separate Chaining


 Let's take an example to understand the concept more clearly. Suppose we have
the following hash function, and we have to insert certain elements in the hash
table by using separate chaining as the collision resolution technique.
 Hash function: h(key) = key % 6
 Elements: 24, 75, 65, 81, 42, and 63
 Step 1: First we will draw the empty hash table, which has a possible range of
hash values from 0 to 5 according to the hash function provided.

 Step 2: Now we will insert all the keys into the hash table one by one. The first
key to be inserted is 24. It maps to bucket number 0, computed by the hash
function: 24 % 6 = 0.
 Step 3: The next key to be inserted is 75. It maps to bucket number 3 because
75 % 6 = 3, so insert it into bucket number 3.

 Step 4: The next key is 65. It maps to bucket number 5 because 65 % 6 = 5, so
insert it into bucket number 5.

 Step 5: The next key is 81. Its bucket number is 81 % 6 = 3, but bucket 3 is
already occupied by key 75, so the separate chaining method handles the
collision by creating a linked list at bucket 3.
 Step 6: The next key is 42. Its bucket number is 42 % 6 = 0, but bucket 0 is
already occupied by key 24, so the separate chaining method again handles the
collision by creating a linked list at bucket 0.

 Step 7: The last key to be inserted is 63. It maps to bucket number 63 % 6 = 3.
Since bucket 3 is already occupied, a collision occurs, and the separate
chaining method handles it by extending the linked list at bucket 3.
 In this way the separate chaining method is used as the collision resolution
technique.

4.8.5 Advantages and Disadvantages of Separate Chaining


Advantages
 Separate Chaining is one of the simplest methods to implement and understand.
 We can add any number of elements to the chain.
 It is frequently used when the number of keys that will be inserted or deleted
is not known in advance.
Disadvantages
 The keys in the hash table are not evenly distributed.
 Some amount of wastage of space occurs.
 The complexity of searching becomes O(n) in the worst case when the chain
becomes long.

4.9 OPEN ADDRESSING


 Open addressing is another technique for collision resolution. Unlike chaining,
it does not insert elements into some other data structure; it inserts the data
into the hash table itself. The size of the hash table should therefore be
larger than the number of keys.
 There are three different popular methods for open addressing techniques. These
methods are −
 Linear Probing
 Quadratic Probing
 Double Hashing

4.10 LINEAR PROBING


 This is a simple method that sequentially tries successive locations until an
empty location is found in the table.
 For example, insert the keys {79, 28, 39, 68, 89} into a closed hash table by
using the same hash function and collision resolution technique as mentioned
before, with table size 10 (for easy understanding we are not using a prime
number for the table size). The array (hash table) is considered circular:
when the last slot is reached and no empty location has been found, the search
proceeds to the first location of the array.
 The hash function is hi(X) = (Hash(X) + F(i)) % TableSize, where F(i) = i,
for i = 0, 1, 2, 3, ... etc.

4.10.1 Solution

A Closed Hash Table using Linear Probing


Key  Hash Function h(X)                          Index  Collision               Alt Index
79   h0(79) = (Hash(79) + F(0)) % 10
            = ((79 % 10) + 0) % 10 = 9           9
28   h0(28) = (Hash(28) + F(0)) % 10
            = ((28 % 10) + 0) % 10 = 8           8
39   h0(39) = (Hash(39) + F(0)) % 10
            = ((39 % 10) + 0) % 10 = 9           9      First collision occurs
     h1(39) = (Hash(39) + F(1)) % 10
            = ((39 % 10) + 1) % 10 = 0           0                              0
68   h0(68) = (Hash(68) + F(0)) % 10
            = ((68 % 10) + 0) % 10 = 8           8      First collision occurs
     h1(68) = (Hash(68) + F(1)) % 10
            = ((68 % 10) + 1) % 10 = 9           9      Again collision occurs
     h2(68) = (Hash(68) + F(2)) % 10
            = ((68 % 10) + 2) % 10 = 0           0      Again collision occurs
     h3(68) = (Hash(68) + F(3)) % 10
            = ((68 % 10) + 3) % 10 = 1           1                              1
89   h0(89) = (Hash(89) + F(0)) % 10
            = ((89 % 10) + 0) % 10 = 9           9      Collision occurs
     h1(89) = (Hash(89) + F(1)) % 10
            = ((89 % 10) + 1) % 10 = 0           0      Again collision occurs
     h2(89) = (Hash(89) + F(2)) % 10
            = ((89 % 10) + 2) % 10 = 1           1      Again collision occurs
     h3(89) = (Hash(89) + F(3)) % 10
            = ((89 % 10) + 3) % 10 = 2           2                              2
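The probe sequences worked out in the table can be reproduced with a short sketch (assuming F(i) = i and a table of size 10; the function name is illustrative):

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing: h_i(X) = (Hash(X) + i) % TableSize."""
    size = len(table)
    for i in range(size):                # F(i) = i: try h0, h1, h2, ...
        index = (key % size + i) % size  # wraps around circularly
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")

table = [None] * 10
for key in [79, 28, 39, 68, 89]:
    linear_probe_insert(table, key)

print(table)  # [39, 68, 89, None, None, None, None, None, 28, 79]
```

The final placements match the table: 79 at index 9, 28 at 8, 39 at 0, 68 at 1, and 89 at 2.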

4.11 QUADRATIC PROBING


 Quadratic probing is an open addressing method for resolving collisions in the
hash table. This method is used to eliminate the primary clustering problem of
linear probing.
 This technique works by taking the original hash index and adding
successive values of an arbitrary quadratic polynomial until an empty location is
found. In linear probing, we would use the hash
sequence H+0, H+1, H+2, H+3, ..., H+k.
 Instead of this sequence, quadratic probing uses the
sequence H+1², H+2², H+3², ..., H+k². Therefore, the hash function for
quadratic probing is hi(X) = (Hash(X) + F(i)²) % TableSize, with F(i) = i.
 Let us examine quadratic probing with the same example.

Key  Hash Function h(X)                          Index  Collision               Alt Index
79   h0(79) = (Hash(79) + F(0)²) % 10
            = ((79 % 10) + 0) % 10 = 9           9
28   h0(28) = (Hash(28) + F(0)²) % 10
            = ((28 % 10) + 0) % 10 = 8           8
39   h0(39) = (Hash(39) + F(0)²) % 10
            = ((39 % 10) + 0) % 10 = 9           9      First collision occurs
     h1(39) = (Hash(39) + F(1)²) % 10
            = ((39 % 10) + 1) % 10 = 0           0                              0
68   h0(68) = (Hash(68) + F(0)²) % 10
            = ((68 % 10) + 0) % 10 = 8           8      Collision occurs
     h1(68) = (Hash(68) + F(1)²) % 10
            = ((68 % 10) + 1) % 10 = 9           9      Again collision occurs
     h2(68) = (Hash(68) + F(2)²) % 10
            = ((68 % 10) + 4) % 10 = 2           2                              2
89   h0(89) = (Hash(89) + F(0)²) % 10
            = ((89 % 10) + 0) % 10 = 9           9      Collision occurs
     h1(89) = (Hash(89) + F(1)²) % 10
            = ((89 % 10) + 1) % 10 = 0           0      Again collision occurs
     h2(89) = (Hash(89) + F(2)²) % 10
            = ((89 % 10) + 4) % 10 = 3           3                              3
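A minimal sketch of quadratic probing (F(i)² = i², table size 10; the function name is illustrative) reproduces the alternative indices found above. Note that, unlike linear probing, quadratic probing can fail to find a slot even when the table is not completely full, hence the explicit failure case:

```python
def quadratic_probe_insert(table, key):
    """Insert key using quadratic probing: h_i(X) = (Hash(X) + i*i) % TableSize."""
    size = len(table)
    for i in range(size):                    # try offsets 0, 1, 4, 9, ...
        index = (key % size + i * i) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no empty slot found by quadratic probing")

table = [None] * 10
positions = {key: quadratic_probe_insert(table, key) for key in [79, 28, 39, 68, 89]}
print(positions)  # {79: 9, 28: 8, 39: 0, 68: 2, 89: 3}
```

Compare the output with linear probing on the same keys: 68 and 89 now land at indices 2 and 3 instead of 1 and 2, because the probe jumps by squares rather than by ones.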

 Although quadratic probing eliminates primary clustering, it still has a
problem.
 When two keys hash to the same location, they probe the same sequence of
alternative locations. This may cause secondary clustering. In order to avoid
secondary clustering, the double hashing method is used, at the cost of extra
multiplications and divisions.

4.12 DOUBLE HASHING


 Double hashing uses 2 hash functions and is hence called double hashing. The first
hash function determines the initial location of the key, and the second
hash function determines the size of the jumps in the probe sequence.

4.12.1 Double Hashing - Hash Function 1 or First Hash Function – formula


 hi(X) = (Hash(X) + F(i)) % TableSize, where
 F(i) = i * hash2(X)
 X is the key or the number for which the hashing is done
 i is the ith time that hashing is done for the same value; hashing is
repeated only when a collision occurs
 TableSize is the size of the table in which hashing is done
 This F(i) will generate the sequence hash2(X), 2 * hash2(X), and so on.

4.12.2 Double Hashing - Hash Function 2 or Second Hash Function – formula


 The second hash function is used to resolve collisions in hashing. We use the
second hash function as
o hash2(X) = R - (X mod R)
 where
 R is a prime number which is slightly smaller than the TableSize.
 X is the key or the number for which the hashing is done

4.12.3 Double Hashing Example - Closed Hash Table


 Let us consider the same example in which we choose R = 7.

A Closed Hash Table using Double Hashing


Key  Hash Function h(X)                              Index  Collision               Alt Index
79   h0(79) = (Hash(79) + F(0)) % 10
            = ((79 % 10) + 0) % 10 = 9               9
28   h0(28) = (Hash(28) + F(0)) % 10
            = ((28 % 10) + 0) % 10 = 8               8
39   h0(39) = (Hash(39) + F(0)) % 10
            = ((39 % 10) + 0) % 10 = 9               9      First collision occurs
     h1(39) = (Hash(39) + F(1)) % 10
            = ((39 % 10) + 1 * (7 - (39 % 7))) % 10
            = (9 + 3) % 10 = 12 % 10 = 2             2                              2
68   h0(68) = (Hash(68) + F(0)) % 10
            = ((68 % 10) + 0) % 10 = 8               8      Collision occurs
     h1(68) = (Hash(68) + F(1)) % 10
            = ((68 % 10) + 1 * (7 - (68 % 7))) % 10
            = (8 + 2) % 10 = 10 % 10 = 0             0                              0
89   h0(89) = (Hash(89) + F(0)) % 10
            = ((89 % 10) + 0) % 10 = 9               9      Collision occurs
     h1(89) = (Hash(89) + F(1)) % 10
            = ((89 % 10) + 1 * (7 - (89 % 7))) % 10
            = (9 + 2) % 10 = 11 % 10 = 1             1                              1
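The double-hashing probe sequence can be sketched as follows (R = 7, table size 10; the function name is illustrative). Since hash2(X) = R - (X mod R) is never zero, the probe always advances. Note that for key 89 the first alternative probe is (9 + 2) % 10 = 11 % 10 = 1, and index 1 is free, so 89 is placed on the very first retry:

```python
def double_hash_insert(table, key, r=7):
    """Insert key using double hashing:
    h_i(X) = (Hash(X) + i * hash2(X)) % TableSize,
    where hash2(X) = R - (X % R), R a prime slightly smaller than TableSize."""
    size = len(table)
    step = r - (key % r)  # second hash; never zero, so the probe always moves
    for i in range(size):
        index = (key % size + i * step) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no empty slot found by double hashing")

table = [None] * 10
positions = {key: double_hash_insert(table, key) for key in [79, 28, 39, 68, 89]}
print(positions)  # {79: 9, 28: 8, 39: 2, 68: 0, 89: 1}
```

Because each key gets its own step size, keys that share an initial slot (79, 39 and 89 all start at 9) follow different probe sequences, which is exactly what defeats secondary clustering.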

 The problem with linear probing is primary clustering. This means that even if
the table is relatively empty, a key that hashes into a cluster may require
several attempts to resolve the collision, because the probe has to cross over
the whole block of occupied cells.
 These blocks of occupied cells form the primary clusters. If a key falls into a
cluster, we cannot predict the number of attempts needed to resolve the
collision, and these long probe paths degrade the performance of the hash table.
4.13 RE-HASHING
 Rehashing is the process of re-calculating the hash codes of already stored
entries (key-value pairs) in order to move them to a bigger hashmap when the
load factor threshold is reached or crossed.

4.13.1 Why Rehashing is done?


 Rehashing is done because whenever a new key-value pair is inserted into the map,
the load factor increases, and with it the expected complexity of operations. If
the load factor grows unchecked, our HashMap will no longer have constant O(1)
time complexity.
 Hence rehashing is done to redistribute the items across the hashmap, reducing
both the load factor and the complexity, so that get() and put() keep their
constant O(1) time complexity.
 After rehashing is done, existing items may fall in the same bucket or a different
bucket.

4.13.2 What is Load factor in HashMap?


 The load factor in a HashMap is basically a measure that decides when exactly to
increase the size of the HashMap to maintain the same time complexity of O(1).
 The load factor is defined as m/n, where n is the total size of the hash table and
m is the number of entries which can be inserted before an increase in the
size of the underlying data structure is required.
 If you are going to store a really large number of elements in the hashmap, it is
always good to create the HashMap with sufficient capacity upfront, so that
rehashing will not be done frequently; this is more efficient than letting it
perform automatic rehashing.

4.13.3 How Rehashing is Done?


 Let’s try to understand this with an example: say we had a hash table with an
initial capacity of 3, which is doubled to 6 once the load factor threshold is
crossed. We need to insert 4 keys: 100, 101, 102, 103.
 The hash function used is the division method: Key % ArraySize.
 Element1: Hash(100) = 100%6 = 4, so Element1 will be rehashed and will be
stored at index 4 in the newly resized hash table, instead of index 1
(100%3 = 1) as in the previous hash table.
 Element2: Hash(101) = 101%6 = 5, so Element2 will be rehashed and will be
stored at index 5 in the newly resized hash table, instead of index 2
(101%3 = 2) as in the previous hash table.
 Element3: Hash(102) = 102%6 = 0, so Element3 will be rehashed and will be
stored at index 0 in the newly resized hash table (102%3 = 0 in the previous
hash table as well).
 Since the load factor now is 3/6 = 0.5, we can still insert the 4th element.
 Element4: Hash(103) = 103%6 = 1, so Element4 will be stored at index 1 in this
newly resized hash table.

4.13.4 Rehashing Steps


 For each addition of a new entry to the map, check the current load factor.
 If it is greater than its pre-defined value, then rehash.
 To rehash, make a new array of double the previous size and make it the new
bucket array.
 Then traverse each element in the old bucket array and re-insert it into the
new, larger bucket array.

