R23_DS_Unit V-1
Trees: Introduction to Trees, Binary Search Tree – Insertion, Deletion & Traversal.
Hashing: Brief introduction to hashing and hash function, Collision resolution techniques: Chaining
and open addressing, Hash tables: basic implementation and operations, Applications of hashing in
unique identifier generation, caching etc.
Introduction to Trees:
A tree is a non-linear data structure that organizes data in a hierarchical structure.
It consists of nodes connected by edges, and these nodes have parent-child relationships.
Definition: A tree can be defined as a finite collection of nodes in which one node is
designated as the root and all other nodes are partitioned into n >= 0 disjoint sets T1, T2, ..., Tn, where
each of these sets is itself a tree. T1, T2, ..., Tn are called the sub-trees of the root.
2) Linked List Representation: In this representation, every node has 3 fields: a data field,
a pointer field to the left child and a pointer field to the right child. Because each node carries two
links, the structure resembles a doubly linked list. In C the binary tree node structure is given as
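The structure itself is not reproduced in these notes. A minimal sketch of such a node, assuming integer data and the hypothetical names `struct node`, `left`, `right` and `new_node` (none of which come from the original), could look like this:

```c
#include <stdlib.h>

/* One binary tree node: a data field plus pointers to the two children.
   The names used here are illustrative assumptions. */
struct node {
    int data;
    struct node *left;   /* left child, or NULL if there is none */
    struct node *right;  /* right child, or NULL if there is none */
};

/* Allocate a node with no children; returns NULL if malloc fails. */
struct node *new_node(int data)
{
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->data = data;
        n->left = NULL;
        n->right = NULL;
    }
    return n;
}
```

A tree is then built by storing the address of one node in the `left` or `right` field of another, and an empty tree is simply a NULL root pointer.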
Operations on BST: The basic operations that can be performed on BST are given as
Deleting a node with one child: We can use the following steps to delete a node with one
child.
Step 1: Find the node to delete using the search function.
Step 2: If it has one child, link the node's parent to that child, and then release the node using the free function.
Step 3: Terminate the function.
Deleting a node with two children: We can use the following steps to delete a node with
two children.
Step 1: Find the node to be deleted using the search function.
Step 2: If it has two children, find the largest node in its left sub-tree or the smallest node in
its right sub-tree.
Step 3: Swap the values of the node to be deleted and the node found in step 2.
Step 4: Check whether the node to be deleted is now in case 1 (no children) or case 2 (one child); otherwise go to step 2.
Step 5: If it is in case 1, use the case 1 logic to delete it.
Step 6: If it is in case 2, use the case 2 logic to delete it.
Step 7: Repeat the same process until the node is deleted from the tree.
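The deletion cases above can be sketched in C as follows. This is only one possible arrangement, not the notes' own code: the names `struct bst_node`, `bst_insert` and `bst_delete` are assumptions, keys are assumed to be distinct integers, and instead of swapping two nodes the sketch copies the smallest key of the right sub-tree into the node and then deletes that key from the right sub-tree.

```c
#include <stdlib.h>

/* Hypothetical BST node with an integer key. */
struct bst_node {
    int key;
    struct bst_node *left, *right;
};

/* Insert key and return the root of the subtree. Duplicates are ignored. */
struct bst_node *bst_insert(struct bst_node *root, int key)
{
    if (root == NULL) {
        struct bst_node *n = malloc(sizeof *n);
        if (n != NULL) {
            n->key = key;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    return root;
}

/* Delete key and return the new root of the subtree. */
struct bst_node *bst_delete(struct bst_node *root, int key)
{
    if (root == NULL)
        return NULL;                       /* key not found */
    if (key < root->key) {
        root->left = bst_delete(root->left, key);
    } else if (key > root->key) {
        root->right = bst_delete(root->right, key);
    } else if (root->left != NULL && root->right != NULL) {
        /* Two children: copy the smallest key of the right sub-tree here,
           then delete that key from the right sub-tree. */
        struct bst_node *succ = root->right;
        while (succ->left != NULL)
            succ = succ->left;
        root->key = succ->key;
        root->right = bst_delete(root->right, succ->key);
    } else {
        /* No children or one child: hand the child (or NULL) back to the
           parent, then free the node. */
        struct bst_node *child = (root->left != NULL) ? root->left : root->right;
        free(root);
        return child;
    }
    return root;
}
```

Because the smallest node of a right sub-tree never has a left child, the recursive call in the two-children branch always ends in the no-child or one-child branch, so no further repetition is needed.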
Hashing is the process of computing a value from a key using a mathematical
function known as a hash function.
It is a very fast search technique because it performs the search operation in far less
time than the search algorithms discussed previously.
On average, it performs the search operation in almost constant time.
However, this technique requires the data to be organized in a special manner.
Hash Table:
The data in hashing is organized with the help of a table known as a hash table.
It is an array of fixed size, indexed from 0 to tablesize - 1, that contains the
items.
Each key is mapped to some index in the range 0 to tablesize - 1 by the hash function
and placed in the corresponding cell.
Hence, the hash table is a data structure that stores data in an associative manner.
Ideally, each key in the hash table has its own unique index.
Hash Function:
This is the function that transforms or maps a key into a hash table index.
This function should be simple to compute and should, as far as possible, give any two different keys
different cells of the hash table.
If H is the hash function and k is the key, then H(k) is called the hash of k, and it gives the index of
the hash table at which the key k should be placed.
So 0 <= H(k) <= m - 1, where m is the size of the hash table.
There are several ways to define a hash function; some of them are
2. Mid Square: It is considered a good hashing method. In this method, the hash function H is
computed by squaring the key and then using the appropriate number of digits (r) from
the middle of the square to obtain the index.
For example: Suppose the hash table has 100 locations. Then r = 2, because two digits are required
to map the key to a memory location.
k = 60, k * k = 60 * 60 = 3600. The middle two digits of 3600 are 60. So H(60) = 60.
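A minimal sketch of the mid-square method for a table of 100 locations (r = 2) is given below. The function name `mid_square_hash` is an assumption, and so is the rule for squares with an odd number of digits, where the middle pair is ambiguous: the sketch simply drops (digits - 2) / 2 low-order digits, rounded down.

```c
/* Mid-square hash for a table of 100 cells (r = 2 digits).
   Assumed rule: drop (digits - 2) / 2 low-order digits of the square,
   then keep the next two digits. */
unsigned mid_square_hash(unsigned key)
{
    unsigned long long sq = (unsigned long long)key * key;

    /* Count the decimal digits of the square. */
    int digits = 0;
    for (unsigned long long t = sq; t > 0; t /= 10)
        digits++;

    /* Remove the digits that lie to the right of the middle pair. */
    int drop = (digits > 2) ? (digits - 2) / 2 : 0;
    for (int i = 0; i < drop; i++)
        sq /= 10;

    return (unsigned)(sq % 100);   /* the two middle digits */
}
```

For the key 60 the square 3600 has four digits, one low-order digit is dropped to give 360, and 360 % 100 = 60, which agrees with the example.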
3. Folding Method: In this method, the key is partitioned into parts such that all the parts,
except possibly the last part, are of equal length. The parts are then added in some convenient
way to obtain the hash address.
For example: k = 12345, then k1 = 12, k2 = 34 and k3 = 5, and hash address = 12 + 34 + 5 = 51
So H(12345) = 51.
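The folding example can be sketched in C as below, assuming parts of two decimal digits taken from the left of the key. The function name `fold_hash` is hypothetical, and the sketch returns the plain sum of the parts without any further reduction.

```c
#include <stdio.h>

/* Folding hash: split the decimal digits of key into 2-digit parts,
   starting from the left, and add the parts. */
unsigned fold_hash(unsigned key)
{
    char digits[16];
    int len = snprintf(digits, sizeof digits, "%u", key);
    unsigned sum = 0;

    for (int i = 0; i < len; i += 2) {
        unsigned part = (unsigned)(digits[i] - '0');
        if (i + 1 < len)                       /* the last part may be 1 digit */
            part = part * 10 + (unsigned)(digits[i + 1] - '0');
        sum += part;
    }
    return sum;
}
```

For k = 12345 the parts are 12, 34 and 5, so the function returns 51. For longer keys the sum can exceed the table size, so in practice a final reduction such as taking the remainder by the table size would be needed; that step is an assumption and is not part of the example above.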
= floor(100(0.5348)) = floor(53.48) = 53
Collision:
Sometimes the hash function produces the same hash table index for two or more keys;
this situation is called a collision.
So in hashing we must include an algorithm to handle collisions.
That is, we must use some technique to resolve the collision so that keys which yield the
same index are placed in different cells.
Collision Resolution Techniques: These techniques are classified into 2 categories, like
1. Open Addressing: There are 3 methods to implement this technique. These 3 methods differ
only in how they find a vacant cell when a collision occurs. They are
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
1. Linear Probing: This is the simplest method of handling collisions. Suppose we use
modular division for hashing the keys. When a collision occurs, according to this technique, we
search sequentially for a vacant cell, starting from the cell where the collision occurred. The method is
so called because it steps through the hash table cell by cell, in a line, until it finds a vacant cell.
For example: Consider inserting the following elements into a hash table of size 10 by using
the modular division hash function.
31, 4, 7, 21, 5, 41, 61
Using the division hash function, the keys are placed in the hash table as follows (- marks an empty cell).
H(31) = 31 % 10 = 1
Index: 0   1   2   3   4   5   6   7   8   9
Key:   -   31  -   -   -   -   -   -   -   -
H(4) = 4 % 10 = 4
Index: 0   1   2   3   4   5   6   7   8   9
Key:   -   31  -   -   4   -   -   -   -   -
H(7) = 7 % 10 = 7
Index: 0   1   2   3   4   5   6   7   8   9
Key:   -   31  -   -   4   -   -   7   -   -
H(21) = 21 % 10 = 1
Now a collision occurs, so according to this method 21 is placed in the next vacant cell, index 2.
Index: 0   1   2   3   4   5   6   7   8   9
Key:   -   31  21  -   4   -   -   7   -   -
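Linear probing insertion for this example can be sketched as follows. The names `TABLE_SIZE`, `EMPTY` and `lp_insert` are assumptions made for illustration, keys are assumed to be non-negative, and -1 is assumed to mark an empty cell.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)      /* assumed marker for a vacant cell */

/* Insert key using linear probing.
   Returns the index where the key was stored, or -1 if the table is full. */
int lp_insert(int table[TABLE_SIZE], int key)
{
    int home = key % TABLE_SIZE;            /* division hash function */

    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (home + i) % TABLE_SIZE;  /* next cell, wrapping around */
        if (table[idx] == EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;                              /* no vacant cell found */
}
```

Starting from a table filled with `EMPTY` and inserting 31, 4, 7 and 21 in that order stores them at indices 1, 4, 7 and 2, as in the tables above. Continuing with the same rule, 5, 41 and 61 go to indices 5, 3 and 6.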
Problem in Linear Probing: Primary clustering is the main problem in linear probing, i.e.
occupied cells form a contiguous block in the hash table, and every key that hashes into the block
must probe to its end, which makes the block grow further.
2. Quadratic Probing: This is another collision resolution method that eliminates the primary
clustering problem of linear probing. The idea behind this method is to probe more widely
separated cells instead of adjacent cells, i.e. the resolution function in this method is quadratic:
f(i) = i^2. In other words, if a collision occurs at index x, it will probe x+1^2, x+2^2, x+3^2,
etc.
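The probe sequence x+1^2, x+2^2, x+3^2 can be sketched in C as below, assuming a table of size 10, the division hash function for the home index, non-negative keys and -1 as the empty marker. The names `QP_SIZE`, `QP_EMPTY` and `qp_insert` are hypothetical.

```c
#define QP_SIZE 10
#define QP_EMPTY (-1)   /* assumed marker for a vacant cell */

/* Insert key using quadratic probing with f(i) = i * i.
   Returns the index used, or -1 if no vacant cell was found
   within QP_SIZE probes. */
int qp_insert(int table[QP_SIZE], int key)
{
    int home = key % QP_SIZE;                  /* index x of the collision */

    for (int i = 0; i < QP_SIZE; i++) {
        int idx = (home + i * i) % QP_SIZE;    /* x, x+1, x+4, x+9, ... */
        if (table[idx] == QP_EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Note that with a table size of 10 the offsets i * i only reach some of the cells, so this sketch can return -1 even when the table still has vacant cells.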
Problem in Quadratic Probing: However, quadratic probing suffers from a different
clustering problem called secondary clustering, i.e. all keys that hash to a particular cell
follow the same sequence of cells when trying to find a vacant cell.
3. Double Hashing: This is one of the most efficient resolution methods and can eliminate both primary as
well as secondary clustering problems. It is sometimes referred to as rehashing. In the double
hashing method, a second hash function is applied to the key when a collision occurs. However,
there are two important rules for the 2nd hash function; they are
Some functions that experts have found to work well for this technique are
For example: Consider the following elements to be placed in a hash table of size 10.
37, 90, 45, 22, 17, 49, 55
Now if 17 is to be inserted, then H1(17) = 17 % 10 = 7. A collision occurs because that cell is
already occupied by 37, so we use the second hash function to resolve the collision. Hence,
after all the keys are placed, the hash table is (- marks an empty cell)
Index: 0   1   2   3   4   5   6   7   8   9
Key:   90  17  22  -   -   45  55  37  -   49
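The second hash function used in this example is not shown in the notes. The sketch below assumes H2(k) = 7 - (k % 7), a common textbook choice, and probes the cells (H1(k) + i * H2(k)) % 10 for i = 0, 1, 2, and so on. Both the choice of H2 and the names `DH_SIZE`, `DH_EMPTY`, `dh_step` and `dh_insert` are assumptions. Keys are assumed to be non-negative.

```c
#define DH_SIZE 10
#define DH_EMPTY (-1)   /* assumed marker for a vacant cell */

/* Second hash function: an ASSUMED choice, not taken from the notes.
   It never returns 0, so every probe moves to a different cell. */
static int dh_step(int key)
{
    return 7 - (key % 7);
}

/* Insert key using double hashing.
   Returns the index used, or -1 if no vacant cell was found
   within DH_SIZE probes. */
int dh_insert(int table[DH_SIZE], int key)
{
    int home = key % DH_SIZE;      /* H1(k) */
    int step = dh_step(key);       /* H2(k) */

    for (int i = 0; i < DH_SIZE; i++) {
        int idx = (home + i * step) % DH_SIZE;
        if (table[idx] == DH_EMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Under this assumption H2(17) = 7 - 3 = 4, so 17 moves from index 7 to (7 + 4) % 10 = 1, and H2(55) = 7 - 6 = 1, so 55 moves from index 5 to index 6. Inserting 37, 90, 45, 22, 17, 49 and 55 in that order then places the keys in the same cells as the table above.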