
R23_DS_Unit V-1


Unit V [R23]

Trees: Introduction to Trees, Binary Search Tree – Insertion, Deletion & Traversal.
Hashing: Brief introduction to hashing and hash function, Collision resolution techniques: Chaining
and open addressing, Hash tables: basic implementation and operations, Applications of hashing in
unique identifier generation, caching etc.
Introduction to Trees:
• A tree is a non-linear data structure that organizes data in a hierarchical structure.
• It consists of nodes connected by edges, and the nodes are related as parents and children.
Definition: A tree is a finite collection of nodes in which one node is designated as the root
and all other nodes are partitioned into n >= 0 disjoint sets T1, T2, ..., Tn, where each of
these sets is itself a tree. T1, T2, ..., Tn are called the sub-trees of the root.

• The above figure represents a tree.


• Node A is the root of the tree and it has two sub-trees whose roots are B and C.
• A is the parent of B and C.
• B is a child of A and also the parent of D and E.
Basic terminology:
Name: Description
Node: An individual element of a tree. Every node stores data and links to the next elements
in the hierarchical structure.
Root: A special node in the tree. The entire tree is referenced through it. It does not have a
parent.
Predecessor & Successor: If there is a path leading down from A to B, then A is called a
predecessor (ancestor) of B and B is called a successor (descendant) of A.
Parent Node: The immediate predecessor of a node.
Child Node: All immediate successors of a node are its children.
Siblings: Nodes with the same parent are called siblings.
Edge: A connection between one node and another. It is drawn as a line between two nodes.
Path: A sequence of successive edges from a source node to a destination node.
Height of Node: The number of edges on the longest path from that node down to a leaf.
Height of Tree: The height of the root node.
Depth of Node: The number of edges from the root to that node.
Degree of Node: The number of children of that node.
Level of Node: The distance between the node and the root of the tree.
Leaf: A node with no children. Leaves are sometimes also called external nodes.
Internal nodes: The nodes that are not leaf nodes.

Dr. M. Purnachandra Rao, Assoc. Prof., Dept. of IT, KITS Page 1


Binary Trees:
• In a general tree, a node can have any number of children. A binary tree is a special type of
tree data structure in which every node has a maximum of 2 children. One is known as the
left child and the other as the right child.
• In a binary tree, every node has 0, 1 or 2 children.
Types of Binary trees: There are different types of binary trees.
1. Strictly Binary Tree
2. Complete Binary Tree
3. Extended Binary Tree
4. Binary Search Tree (BST)
1. Strictly Binary Tree: A binary tree in which every node has either zero or exactly two
children is called a Strictly Binary Tree. This is also called a Full Binary Tree, Proper
Binary Tree or 2-Tree.
2. Complete Binary Tree: A binary tree in which every internal node has exactly two children
and all the leaf nodes are at the same level is called a Complete Binary Tree. (Many textbooks
call this a perfect binary tree.)
3. Extended Binary Tree: A binary tree can be converted into a full binary tree by adding
dummy nodes wherever required. The full binary tree obtained in this way is called an
Extended Binary Tree.

Representation of binary trees:


A binary tree can be represented in two ways:
1. Array Representation
2. Linked list Representation
1) Array Representation (Sequential Representation): In this representation, we use a
one-dimensional array to represent a binary tree. The representation follows these rules:
• A one-dimensional array is used to store the elements of the tree.
• The root of the tree is always at the first location (location 1).
• The children of the element at location k are stored at locations 2k and 2k+1.
The following is a binary tree with its array representation:
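As an illustrative stand-in, the small tree from the introduction (root A with children B and C, where B has children D and E) can be laid out in an array as sketched below. This is a minimal sketch that assumes locations are counted from 1, so index 0 of the C array is left unused; the names `sample_tree`, `left_child`, `right_child` and `parent` are hypothetical and do not come from the notes.

```c
#include <assert.h>

/* Sample tree: A at location 1, B and C at 2 and 3, D and E at 4 and 5.
   Location 0 is unused so that the 2k / 2k+1 rule holds. */
const char sample_tree[] = { '\0', 'A', 'B', 'C', 'D', 'E' };

/* Index arithmetic for the array representation (1-based locations). */
int left_child(int k)  { assert(k >= 1); return 2 * k; }
int right_child(int k) { assert(k >= 1); return 2 * k + 1; }
int parent(int k)      { assert(k >= 2); return k / 2; }   /* root has no parent */
```

With this layout, `sample_tree[left_child(2)]` is D and `sample_tree[right_child(2)]` is E, the two children of B.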

2) Linked List Representation: In this representation, every node has 3 fields: a data field,
a pointer to the left child and a pointer to the right child. Because each node carries two
links, it resembles a node of a doubly linked list. In C, the binary tree node structure is given as

struct node {
    int data;              /* value stored in the node */
    struct node *left;     /* pointer to the left child */
    struct node *right;    /* pointer to the right child */
};
The following is the binary tree with linked list representation:

Binary Search Tree (BST):


• It is a special binary tree in which every node has only smaller values in its left subtree
and only larger values in its right subtree.
• This tree is mainly designed to make the search operation efficient.

Definition: A binary search tree is a binary tree that is either empty or satisfies the
following properties:


1. Every element has a unique key.
2. The keys in a nonempty left sub tree are smaller than the key in the root.
3. The keys in a nonempty right sub tree are larger than the key in the root.
4. The left and right sub trees are themselves binary search trees.

Operations on BST: The basic operations that can be performed on a BST are

1. Search: Search for an element in the BST
2. Insertion: Insert a new element into the BST
3. Deletion: Delete an element from the BST

Algorithm for Search Operation:


Step 1 - Read the search element from the user.
Step 2 - Compare the search element with the value of the root node of the tree.
Step 3 - If both match, display "Element found" and terminate the function.
Step 4 - If they do not match, check whether the search element is smaller or larger than
that node's value.
Step 5 - If the search element is smaller, continue the search in the left subtree.
Step 6 - If the search element is larger, continue the search in the right subtree.
Step 7 - Repeat the same until we find the element or until there is no subtree left to
search (the link to follow is NULL).
Step 8 - If we reach a node whose value equals the search value, display
"Element is found" and terminate the function.
Step 9 - If we run past a leaf node without finding a match, display
"Element is not found" and terminate the function.

Recursive subprogram for Search Operation:


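struct node *search(struct node *root, int key)
{
    if (root == NULL)
        return NULL;                        /* empty subtree: key not found */
    if (key < root->data)
        return search(root->left, key);     /* continue in the left subtree */
    if (key > root->data)
        return search(root->right, key);    /* continue in the right subtree */
    return root;                            /* key == root->data: found */
}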

Non-recursive subprogram for Search Operation:
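struct node *search(struct node *root, int key)
{
    while (root != NULL)
    {
        if (root->data == key)
            return root;            /* found */
        else if (key < root->data)
            root = root->left;      /* continue in the left subtree */
        else
            root = root->right;     /* continue in the right subtree */
    }
    return NULL;                    /* key not found */
}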
Algorithm for Insertion Operation:
Step 1 - Create a newNode with the given value and set its left and right links to NULL.
Step 2 - Check whether the tree is empty.
Step 3 - If the tree is empty, set root to newNode.
Step 4 - If the tree is not empty, check whether the value of newNode
is smaller or larger than the current node (initially the root node).
Step 5 - If newNode is smaller than the node, move to its left child. Otherwise
move to its right child.
Step 6 - Repeat the above steps until we move past a leaf node (i.e., reach NULL).
Step 7 - Insert newNode as the left child of that leaf node if newNode is smaller
than the leaf node, or else insert it as the right child.

Subprogram to insert a node in a Binary Search Tree:


/* Requires <stdlib.h> for malloc. The function returns the root of the tree,
   so the caller writes: root = insert(root, x); */
struct node *insert(struct node *root, int x)
{
    struct node *temp = (struct node *)malloc(sizeof(struct node));
    struct node *cur, *par;
    if (temp == NULL)
        return root;            /* allocation failed: tree is unchanged */
    temp->data = x;
    temp->left = NULL;
    temp->right = NULL;
    if (root == NULL)
        return temp;            /* empty tree: the new node becomes the root */
    cur = root;
    par = NULL;
    while (cur != NULL)         /* walk down until we fall off the tree */
    {
        par = cur;
        if (x < par->data)
            cur = cur->left;
        else
            cur = cur->right;
    }
    if (x < par->data)          /* par is the node that will hold the new leaf */
        par->left = temp;
    else
        par->right = temp;
    return root;
}
Algorithm for deletion:
Deleting a node from a BST is the most involved operation. Deletion in a BST has 3
cases, as follows:
1. Deleting a leaf node
2. Deleting a node with one child
3. Deleting a node with two children
Deleting a leaf node: We can use the following steps to delete a leaf node from a BST.
Step 1: Find the node to delete, and its parent, by using the search function.
Step 2: Set the parent's link to that node to NULL, delete the node by using the free
function and stop the process.

Deleting a node with one child: We can use the following steps to delete a node with one
child.
Step 1: Find the node to delete, and its parent, by using the search function.
Step 2: If it has one child, make the parent point to that child in place of the node, and
then delete the node by using the free function.
Step 3: Terminate the function.

Deleting a node with two children: We can use the following steps to delete a node with
two children.
Step 1: Find the node to be deleted by using the search function.
Step 2: If it has two children, find the largest node in its left sub tree or the smallest
node in its right sub tree.
Step 3: Swap the value of the node to be deleted with the value of the node found in step 2.
Step 4: The value to be deleted now sits in a node with at most one child, so check whether
that node falls under case 1 or case 2.
Step 5: If it comes to case 1, use case 1 logic to delete it.
Step 6: If it comes to case 2, use case 2 logic to delete it.
Step 7: The process ends once the node has been removed from the tree.
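
The three cases above can be sketched in C as follows. This is only one possible arrangement, not the notes' own subprogram: it is recursive, it uses the smallest node of the right sub tree in case 3, and it copies that node's value instead of swapping, which has the same effect. The names `delete_node`, `find_min` and `insert_node` are hypothetical helpers introduced for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *left;
    struct node *right;
};

/* Hypothetical helper: plain BST insertion, used only to build sample trees. */
struct node *insert_node(struct node *root, int x)
{
    if (root == NULL) {
        root = malloc(sizeof *root);
        assert(root != NULL);           /* sketch: stop on allocation failure */
        root->data = x;
        root->left = root->right = NULL;
    } else if (x < root->data) {
        root->left = insert_node(root->left, x);
    } else {
        root->right = insert_node(root->right, x);
    }
    return root;
}

/* Hypothetical helper: smallest node of a non-empty subtree. */
struct node *find_min(struct node *root)
{
    assert(root != NULL);
    while (root->left != NULL)
        root = root->left;
    return root;
}

/* Removes key from the subtree and returns the subtree's new root. */
struct node *delete_node(struct node *root, int key)
{
    struct node *child, *succ;
    if (root == NULL)
        return NULL;                                /* key is not in the tree */
    if (key < root->data) {
        root->left = delete_node(root->left, key);
    } else if (key > root->data) {
        root->right = delete_node(root->right, key);
    } else if (root->left != NULL && root->right != NULL) {
        /* Case 3: take the smallest value of the right sub tree, then
           delete that value from the right sub tree (case 1 or 2). */
        succ = find_min(root->right);
        root->data = succ->data;
        root->right = delete_node(root->right, succ->data);
    } else {
        /* Cases 1 and 2: the only child (or NULL) takes the node's place. */
        child = (root->left != NULL) ? root->left : root->right;
        free(root);
        return child;
    }
    return root;
}
```

A caller would use it as `root = delete_node(root, key);`, so that deleting the root itself is also handled.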

Binary Search Tree Traversals:


When we want to display a BST, we need to follow some order in which all the nodes of the BST
are displayed. In any binary tree, the order in which nodes are displayed depends on the
traversal method. There are 3 types of binary tree traversals.
1. In-order Traversal
2. Pre-order Traversal
3. Post-order Traversal
1. In-order Traversal:
In In-Order traversal, the root node is visited between its left subtree and right subtree. In
this traversal, the left subtree is visited first, then the root node, and finally the right
subtree. The same order is applied recursively to the root of every subtree in the tree. For a
BST, in-order traversal displays the keys in ascending order.
Algorithm:
Step 1: Recursively traverse left sub-tree
Step 2: Visit root
Step 3: Recursively traverse right sub-tree

Program segment:
void Inorder(struct node *root)
{
    if (root != NULL) {
        Inorder(root->left);            /* traverse the left sub-tree */
        printf("%d\t", root->data);     /* visit the root */
        Inorder(root->right);           /* traverse the right sub-tree */
    }
}
2. Pre-order Traversal:
In Pre-Order traversal, the root node is visited before the left child and right child nodes. In this
traversal, the root node is visited first, then its left child and later its right child. This pre-order
traversal is applicable for every root node of all subtrees in the tree.
Algorithm:
Step 1: Visit root
Step 2: Recursively traverse left sub-tree
Step 3: Recursively traverse right sub-tree
Program segment:
void Preorder(struct node *root)
{
    if (root != NULL) {
        printf("%d\t", root->data);     /* visit the root */
        Preorder(root->left);           /* traverse the left sub-tree */
        Preorder(root->right);          /* traverse the right sub-tree */
    }
}
3. Post-order Traversal:
In Post-Order traversal, the root node is visited after its left subtree and right subtree. In
this traversal, the left subtree is visited first, then the right subtree and then the root
node. This is performed recursively for every subtree, so the root of the whole tree is visited last.
Algorithm:
Step 1: Recursively traverse left sub-tree
Step 2: Recursively traverse right sub-tree
Step 3: Visit root
Program segment:
void Postorder(struct node *root)
{
    if (root != NULL) {
        Postorder(root->left);          /* traverse the left sub-tree */
        Postorder(root->right);         /* traverse the right sub-tree */
        printf("%d\t", root->data);     /* visit the root */
    }
}

Hashing:

• Hashing is the process of computing a table index from a key by using a mathematical
function known as a hash function.
• It is a very fast search technique: on average it takes much less time than the search
methods discussed earlier.
• It performs the search operation in almost constant time on average.
• However, this technique requires the data to be organized in a special manner.

Hash Table:

• The data in hashing is organized with the help of a table known as a hash table.
• It is an array of fixed size, whose indices range from 0 to tablesize - 1, containing the
items.
• Each key is mapped to some index in the range 0 to tablesize - 1 by using the hash function
and placed in the corresponding cell.
• Hence, a hash table is a data structure that stores data in an associative manner.
• Ideally, each key maps to its own index in the hash table.

Hash Function:

• This is the function that transforms or maps a key into a hash table index.
• This function should be simple to compute and should, as far as possible, map different
keys to different cells of the hash table.
• If H is the hash function and k is the key, then H(k) is called the hash of k and it gives
the index of the hash table at which the key k should be placed.
• So 0 <= H(k) < m, i.e. H(k) lies between 0 and m - 1, where m is the size of the hash table.

There are several ways to define a hash function. Some of them are

1. Division Method (Modular Arithmetic)
2. Mid Square Method
3. Folding Method
4. Multiplication Method
1. Division: This is the simplest method to generate the hash value. In this method, the key is
divided by the size of the hash table and the remainder acts as the index for the key, i.e.
H(k) = k % m, where m is the size of the hash table. Usually m is chosen to be a prime number.

2. Mid Square: It is a good hashing method. In this method, the hash function H is
computed by squaring the key and then using an appropriate number of digits (r) from the
middle of the square as the index.
For example: Suppose the hash table has 100 locations. Then r = 2, because two digits are
enough to address locations 0 to 99.
Let k = 60. Then k * k = 60 * 60 = 3600, and the middle two digits of 3600 are 60. So H(60) = 60.

3. Folding Method: In this method, the key is partitioned into parts such that all the parts,
except possibly the last part, are of equal length. The parts are then added in some convenient
way to obtain the hash address.

For example: k = 12345, then k1 = 12, k2 = 34 and k3 = 5, and hash key = 12 + 34 + 5 = 51

So H(12345) = 51.

4. Multiplication Method: This method involves the following steps:

1. Choose a constant A such that 0 < A < 1.
2. Multiply the key k by A.
3. Extract the fractional part of kA.
4. Multiply the result of step 3 by the size of the hash table, m.
5. Take the floor of the result of step 4 to obtain the hash value,
i.e. H(k) = floor(m (kA mod 1))

For example, let k = 12345, A = 0.357840 and m = 100. Then

H(12345) = floor(100 (12345 * 0.357840 mod 1))

= floor(100 (4417.5348 mod 1))

= floor(100 (0.5348)) = floor(53.48) = 53
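
The four methods above can be sketched in C as shown below. This is a minimal sketch for non-negative integer keys; the function names (`hash_division`, `hash_mid_square`, `hash_folding`, `hash_multiplication`) and the choice to split the key into decimal digit groups from the left are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Division method: H(k) = k % m. */
int hash_division(int k, int m)
{
    assert(k >= 0 && m > 0);
    return k % m;
}

/* Mid square method: the middle r decimal digits of k * k. */
int hash_mid_square(int k, int r)
{
    long long sq, t, mod = 1;
    int digits = 0, drop, i;
    assert(k >= 0 && r > 0);
    sq = (long long)k * k;
    for (t = sq; t > 0; t /= 10)
        digits++;                                  /* count digits of the square */
    drop = (digits > r) ? (digits - r) / 2 : 0;    /* low-order digits to discard */
    for (i = 0; i < drop; i++)
        sq /= 10;
    for (i = 0; i < r; i++)
        mod *= 10;
    return (int)(sq % mod);
}

/* Folding method: add groups of `width` decimal digits, taken from the left. */
int hash_folding(int k, int width)
{
    char buf[32];
    int len, i, sum = 0, part = 0, count = 0;
    assert(k >= 0 && width > 0);
    len = snprintf(buf, sizeof buf, "%d", k);
    for (i = 0; i < len; i++) {
        part = part * 10 + (buf[i] - '0');
        if (++count == width || i == len - 1) {    /* group complete or last digit */
            sum += part;
            part = 0;
            count = 0;
        }
    }
    return sum;
}

/* Multiplication method: H(k) = floor(m * (k*A mod 1)). */
int hash_multiplication(int k, double A, int m)
{
    double product, frac;
    assert(k >= 0 && A > 0.0 && A < 1.0 && m > 0);
    product = k * A;
    frac = product - (double)(long long)product;   /* fractional part of k*A */
    return (int)(m * frac);                        /* truncation = floor here */
}
```

With the values used in the examples, `hash_mid_square(60, 2)` gives 60, `hash_folding(12345, 2)` gives 51 and `hash_multiplication(12345, 0.357840, 100)` gives 53. The folded sum can exceed the table size for other keys, so in practice it would still be reduced, for example with the division method.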

Collision:

• Sometimes the hash function produces the same hash table index for two or more keys. This
situation is called a collision.
• So in hashing we must include an algorithm to handle collisions.
• Hence we must use a technique to resolve the collision such that keys which yield the
same index are placed in different cells.

Collision Resolution Techniques: These techniques are classified into 2 categories:

1. Open Addressing (also called Closed Hashing)
2. Chaining (also called Open Hashing)

1. Open Addressing: There are 3 methods to implement this technique. They differ only in the
way they find a vacant cell when a collision occurs. They are

1. Linear Probing
2. Quadratic Probing
3. Double Hashing
1. Linear Probing: This is the simplest method of handling collisions. Suppose we use the
division method for hashing the keys. When a collision occurs, we search sequentially for a
vacant cell, starting from the cell where the collision occurred and wrapping around to the
start of the table if the end is reached. The method is called linear probing because it steps
through the hash table cell by cell, along a line, until it finds a vacant cell.

For example: Consider inserting the following elements into a hash table by using the
division hash function.

31, 4, 7, 21, 5, 41, 61

With the division hash function, the keys are placed in the hash table as follows:

H(k) = k % tablesize, and let tablesize = 10

H(31) = 31 % 10 = 1

Index: 0   1   2   3   4   5   6   7   8   9
Key:   -   31  -   -   -   -   -   -   -   -

H(4) = 4 % 10 = 4

Index: 0   1   2   3   4   5   6   7   8   9
Key:   -   31  -   -   4   -   -   -   -   -

H(7) = 7 % 10 = 7

Index: 0   1   2   3   4   5   6   7   8   9
Key:   -   31  -   -   4   -   -   7   -   -

H(21) = 21 % 10 = 1

Now a collision occurs because cell 1 already holds 31. According to this method, 21 is placed
in the next vacant cell, which is index 2.

Index: 0   1   2   3   4   5   6   7   8   9
Key:   -   31  21  -   4   -   -   7   -   -

Problem in Linear Probing: Primary clustering is the main problem in linear probing, i.e.
occupied cells tend to form long contiguous blocks, and every key that hashes into a block
has to step through it, which makes the block grow further.
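
Linear probing insertion can be sketched in C as follows. This is a minimal sketch that assumes non-negative integer keys, a table of 10 cells and -1 as the marker for a vacant cell; the names `LP_SIZE`, `LP_EMPTY`, `lp_init` and `lp_insert` are hypothetical.

```c
#include <assert.h>

#define LP_SIZE  10      /* table size used in the example */
#define LP_EMPTY (-1)    /* marker for a vacant cell (assumes keys >= 0) */

/* Marks every cell of the table as vacant. */
void lp_init(int table[])
{
    int i;
    for (i = 0; i < LP_SIZE; i++)
        table[i] = LP_EMPTY;
}

/* Inserts key with linear probing.
   Returns the index used, or -1 if the table is full. */
int lp_insert(int table[], int key)
{
    int home, i, idx;
    assert(key >= 0);
    home = key % LP_SIZE;                 /* division hash function */
    for (i = 0; i < LP_SIZE; i++) {
        idx = (home + i) % LP_SIZE;       /* next cell, wrapping around */
        if (table[idx] == LP_EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Inserting 31, 4, 7, 21, 5, 41, 61 in that order places them at indices 1, 4, 7, 2, 5, 3 and 6. The keys 31, 21 and 41 all hash to cell 1, and 61 then has to step past the whole block of cells 1 to 5, which is the clustering effect described above.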

2. Quadratic Probing: This is another collision resolution method that eliminates the primary
clustering problem of linear probing. The idea behind this method is to probe more widely
separated cells instead of adjacent cells, i.e. the resolution function in this method is
quadratic: f(i) = i^2. In other words, if a collision occurs at index x, then the method probes
x + 1^2, x + 2^2, x + 3^2, etc., each taken modulo the table size.

Problem in Quadratic Probing: However, quadratic probing suffers from a different
clustering problem called secondary clustering, i.e. all keys that hash to a particular cell
follow the same sequence of probes when trying to find a vacant cell.
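
The probe sequence can be sketched in C as follows. This is a minimal sketch under the same assumptions as before (non-negative keys, table size 10, -1 for a vacant cell); the names `QP_SIZE`, `QP_EMPTY`, `qp_init` and `qp_insert` are hypothetical. The loop gives up after as many probes as there are cells, because quadratic probing is not guaranteed to reach every cell.

```c
#include <assert.h>

#define QP_SIZE  10      /* table size */
#define QP_EMPTY (-1)    /* marker for a vacant cell (assumes keys >= 0) */

/* Marks every cell of the table as vacant. */
void qp_init(int table[])
{
    int i;
    for (i = 0; i < QP_SIZE; i++)
        table[i] = QP_EMPTY;
}

/* Inserts key with quadratic probing: cells x, x+1^2, x+2^2, ... (mod size).
   Returns the index used, or -1 if no vacant cell was found. */
int qp_insert(int table[], int key)
{
    int home, i, idx;
    assert(key >= 0);
    home = key % QP_SIZE;
    for (i = 0; i < QP_SIZE; i++) {
        idx = (home + i * i) % QP_SIZE;   /* f(i) = i^2 */
        if (table[idx] == QP_EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Inserting 31, 21, 41 and 61, which all hash to cell 1, places them at indices 1, 2, 5 and 0. Every one of these keys follows the same probe sequence 1, 2, 5, 0, ..., which is the secondary clustering described above.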

3. Double Hashing: This is an efficient resolution method that can eliminate both the primary
and the secondary clustering problems. It is sometimes referred to as rehashing. In the double
hashing method, a second hash function is applied to the key when a collision occurs, and its
value is used as the step size between probes. However, there are two important rules for the
2nd hash function:

1. It must never evaluate to zero.
2. It must make sure that all cells can be probed (this is guaranteed when the table size is
a prime number).

A commonly used pair of functions for this technique is

H1(k) = k % tablesize

H2(k) = M - (k % M), where M is a prime number smaller than the table size.

For example: consider the following elements to be placed in a hash table of size 10.

37, 90, 45, 22, 17, 49, 55

For inserting 37, 90, 45 and 22, H1(k) alone is enough, i.e.

H1(37) = 37 % 10 = 7; H1(90) = 90 % 10 = 0; H1(45) = 45 % 10 = 5; H1(22) = 22 % 10 = 2

Now if 17 is to be inserted, then H1(17) = 17 % 10 = 7, and a collision occurs because that
cell is already occupied by 37. So we use the second hash function, with M = 7, to resolve the
collision. Hence

H2(17) = 7 - (17 % 7) = 7 - 3 = 4.

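Therefore 17 is inserted 4 cells after the cell of 37, wrapping around the end of the table:
(7 + 4) % 10 = 1. If that cell were also occupied, we would jump another 4 cells from there,
and so on. Next, 49 goes directly to cell 9. Finally, 55 collides with 45 at cell 5, and
H2(55) = 7 - (55 % 7) = 7 - 6 = 1, so 55 is placed in cell (5 + 1) % 10 = 6. The hash table becomes

Index: 0   1   2   3   4   5   6   7   8   9
Key:   90  17  22  -   -   45  55  37  -   49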
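
The example can be reproduced with a short C sketch. It assumes non-negative keys, table size 10, M = 7 and -1 as the marker for a vacant cell; the names `DH_SIZE`, `DH_PRIME`, `DH_EMPTY`, `dh_init` and `dh_insert` are hypothetical.

```c
#include <assert.h>

#define DH_SIZE  10      /* table size used in the example */
#define DH_PRIME 7       /* M: a prime smaller than the table size */
#define DH_EMPTY (-1)    /* marker for a vacant cell (assumes keys >= 0) */

/* Marks every cell of the table as vacant. */
void dh_init(int table[])
{
    int i;
    for (i = 0; i < DH_SIZE; i++)
        table[i] = DH_EMPTY;
}

/* Inserts key with double hashing.
   Returns the index used, or -1 if no vacant cell was found. */
int dh_insert(int table[], int key)
{
    int home, step, i, idx;
    assert(key >= 0);
    home = key % DH_SIZE;                    /* H1(k) */
    step = DH_PRIME - (key % DH_PRIME);      /* H2(k), always between 1 and M */
    for (i = 0; i < DH_SIZE; i++) {
        idx = (home + i * step) % DH_SIZE;   /* jump `step` cells each time */
        if (table[idx] == DH_EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Inserting 37, 90, 45, 22, 17, 49, 55 in that order fills cells 7, 0, 5, 2, 1, 9 and 6. Because the table size 10 is not prime, some step sizes (such as 4) visit only part of the table, so the sketch can return -1 even when vacant cells remain.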

2. Chaining: This is an alternative to open addressing for resolving collisions. In
open addressing, collisions are resolved by looking for an open cell in the hash table. This
method instead installs a linked list at each index of the hash table. When a key hashes to an
index, the item is inserted into the linked list at that index. Other items that hash to the
same index are simply added to the end of that linked list. There is no need to search for an
empty cell. This method is called chaining because items that collide are chained together in
separate linked lists.
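
Chaining can be sketched in C as follows. This is a minimal sketch that assumes non-negative integer keys, the division hash function and a table of 10 list heads that all start as NULL; the names `CH_SIZE`, `chain_node`, `chain_insert` and `chain_search` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define CH_SIZE 10       /* number of list heads in the table */

struct chain_node {
    int key;
    struct chain_node *next;
};

/* Adds key to the end of the list at index key % CH_SIZE.
   Returns 1 on success and 0 if memory could not be allocated. */
int chain_insert(struct chain_node *table[], int key)
{
    struct chain_node *n, **link;
    assert(key >= 0);
    n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->key = key;
    n->next = NULL;
    link = &table[key % CH_SIZE];
    while (*link != NULL)            /* walk to the end of the chain */
        link = &(*link)->next;
    *link = n;
    return 1;
}

/* Returns 1 if key is stored in the table, 0 otherwise. */
int chain_search(struct chain_node *table[], int key)
{
    struct chain_node *cur;
    assert(key >= 0);
    for (cur = table[key % CH_SIZE]; cur != NULL; cur = cur->next)
        if (cur->key == key)
            return 1;
    return 0;
}
```

After inserting 31, 21 and 4 into a table declared as `struct chain_node *table[CH_SIZE] = {0};`, the list at index 1 holds 31 followed by 21, and the list at index 4 holds 4. A search only has to walk the one list selected by the hash function.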
