Chapter 8 - Searching
Searching
Introduction
▪ Searching is the process of finding an element within a list of
elements stored in any order.
▪ Searching is divided into two categories:
• Linear Search and Binary Search
▪ Linear search is the basic and simplest method of searching.
▪ Binary search takes less time to find an element, but it requires a sorted
list of elements.
▪ So binary search is more efficient than linear search. Its only
drawback is that it works only on a sorted list, whereas linear search has no such
prerequisite.
Types of Searching
1. Linear Search:
Algorithm
Let 'a' be a linear array with 'n' elements, and let 'item' be the element to be searched.
The algorithm finds the location 'loc' of item in 'a' or gives a failure message.
1. Read the item to be searched.
2. [Initialize counter] Set loc = -1, j = 0
3. [Search for item]
   Repeat while j < n
      If a[j] = item
         Set loc = j
         Break
      Else
         Set j = j + 1
   Wend
4. [Successful] If loc >= 0
   Print the position (loc) of the searched item
5. [Unsuccessful] Else
   Print that the searched item was not found.
6. [END]
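The steps above can be sketched in C roughly as follows. This is only a minimal illustration: the function names linear_search and report_search are chosen here and are not part of the original algorithm, and the array is assumed to hold int values.

```c
#include <stdio.h>

/* Return the index of the first occurrence of item in a[0..n-1],
   or -1 if item is not present (loc = -1 in the algorithm). */
int linear_search(const int a[], int n, int item)
{
    int loc = -1;
    for (int j = 0; j < n; j++) {
        if (a[j] == item) {
            loc = j;        /* found: remember the position and stop */
            break;
        }
    }
    return loc;
}

/* Print the outcome in the way steps 4 and 5 describe. */
void report_search(const int a[], int n, int item)
{
    int loc = linear_search(a, n, item);
    if (loc >= 0)
        printf("%d found at position %d\n", item, loc);
    else
        printf("%d not found\n", item);
}
```

In the worst case (item absent or in the last position) the loop examines all n elements, which is why linear search needs no ordering of the list but is slow for large n.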
2. Binary Search:
Binary search is an extremely efficient algorithm. For an array of n elements it needs at
most floor(log2 n) + 1 probes of a middle element. To do a binary search, the array
elements must first be sorted (ascending order is assumed here). The logic behind this
technique is given below.
1. First find the middle element of the array.
2. Compare the middle element with the desired item.
3. There are three cases.
a) If it is the desired item, the search is successful.
b) If it is less than the desired item, search only in the second half of the array.
c) If it is greater than the desired item, search only in the first half of the array.
4. Repeat the same steps until the item is found or the search area is exhausted.
In this way, at each step we reduce the length of the list to be searched by half.
Requirements
i. The list must be ordered
ii. Rapid random access is required, so we cannot use binary search for a linked
list
Algorithm
Let 'a' be an array of size 'Maxsize', and let 'LB', 'UB' and 'mid' be variables that denote
the first, last and middle locations of a segment. This algorithm finds the location 'loc' of
'item' in array 'a' or reports failure (sets loc = NULL).
5. If a[mid] = item then
      Set loc = mid
      Print the value and its position
   Else
      Set loc = NULL
      Print search unsuccessful
   [End if]
6. Exit
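#include <stdio.h>

/* Binary search for item in the sorted (ascending) segment a[beg..end]. */
void binsrch(int a[], int beg, int end, int item)
{
    int mid;
    int count = 0;              /* number of middle elements examined */

    mid = (beg + end) / 2;
    count++;
    while (beg <= end && item != a[mid])
    {
        if (item < a[mid])
            end = mid - 1;      /* continue in the first half */
        else
            beg = mid + 1;      /* continue in the second half */
        mid = (beg + end) / 2;
        count++;
    }
    /* beg <= end guards against reading a[mid] after the segment is exhausted */
    if (beg <= end && item == a[mid])
    {
        printf("\nSearch Successful!!!\n It took %d iterations to find this item", count);
        printf("\nThe position is %d", mid);
    }
    else
        printf("\n Search Unsuccessful");
}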
Hashing
In other words, the process of mapping a large amount of data into a smaller table is called
hashing.
This searching scheme is easy to program compared to trees, but it is difficult to expand,
since it is based on arrays.
Hash Table
A hash table is simply an array that is addressed via a hash function. It is a data structure
made up of
• A table of some fixed size to hold a collection of records each uniquely identified
by some key
• A function called hash function that is used to generate index values in the table.
Hash Function
The basic idea in hashing is the transformation of a key into the corresponding location in
the hash table. This is done by a hash function. A hash function, usually denoted by H, can
be defined as a function that takes a key as input and transforms it into a hash table
index.
Hash of Key
Let H be a hash function and k a key; then H(k) is called the hash-of-key. The hash-of-key
is the index at which a record with key value k must be kept.
The result of a hash function tells us where to look for a particular element in order to
retrieve, modify or delete it. We could use the following algorithm for
retrieving a record.
retrieve(key)
{
int location;
location = hash(key); //the result of the hash function gives the location
record = info[location]; //get the record from the info array
return record;
}
When we insert a new record, we must put it into the correct slot according to the hash
function. The algorithm for inserting a record follows:
insert(key, record)
{
int location;
location = hash(key); //the hash function gives the location
info[location] = record; //insert the record into the info array at that location
}
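As a rough, compilable version of the two routines above, the sketch below assumes a table of 10 slots, non-negative integer keys, and the hash function key % TABLE_SIZE. The struct layout, the used flag and the value field are illustrative assumptions, not part of the original notes.

```c
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 10               /* assumed table size, for illustration */

struct record {
    int  key;
    int  value;
    bool used;                      /* false while the slot is empty */
};

static struct record info[TABLE_SIZE];   /* zero-initialised: all slots empty */

/* Assumed hash function: remainder of the key divided by the table size. */
int hash(int key)
{
    return key % TABLE_SIZE;
}

/* Store a record in the slot chosen by the hash function.
   No collision handling: a record already in that slot is overwritten. */
void insert(int key, int value)
{
    int location = hash(key);
    info[location].key = key;
    info[location].value = value;
    info[location].used = true;
}

/* Return a pointer to the record with this key, or NULL if the slot
   is empty or holds a different key. */
const struct record *retrieve(int key)
{
    int location = hash(key);
    if (info[location].used && info[location].key == key)
        return &info[location];
    return NULL;
}
```

Because insert simply overwrites the slot, two keys with the same hash value cannot both be stored by this sketch.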
What is collision?
A table has a finite number of indices, but the number of possible keys is much larger, so it
is clearly impossible to guarantee different indices for every pair of distinct keys. When
two distinct keys hash to the same index, a collision occurs.
Folding method
E.g. let the keys be four-digit numbers. Chopping the key into two parts and adding them yields
Hash(5421) = 54 + 21 = 75
So 75 is the index at which we should store or retrieve the record with key 5421.
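The folding step for four-digit keys might be written in C as below. The function name fold_hash is made up for this sketch, and keys are assumed to lie between 0 and 9999.

```c
/* Fold a four-digit key: split it into its two leading and two trailing
   digits and add the parts. Assumes 0 <= key <= 9999. */
int fold_hash(int key)
{
    int high = key / 100;   /* leading two digits, e.g. 54 for 5421 */
    int low  = key % 100;   /* trailing two digits, e.g. 21 for 5421 */
    return high + low;
}
```

The sum can be as large as 99 + 99 = 198, so the table needs at least 199 slots, or the result has to be reduced further (for example with % table size).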
Mid square method: The key is squared. We define the hash function in this case as
Hash(key) = p
where p is obtained by deleting digits from both ends of (key)². We emphasize that the
same positions of (key)² must be used for all the keys.
The following calculations are performed:
Key       = 5421     | 1825
(Key)²    = 29387241 | 3330625
Hash(key) = 87       | 30
Here the last three digits of (key)² are dropped and the two preceding digits are taken.
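The same calculation can be sketched in C. The name midsquare_hash is hypothetical, and the digit positions (drop the last three digits of the square, keep the two before them) are simply those used in the worked example.

```c
/* Mid-square hash as in the worked example: square the key, drop the last
   three digits of the square, and keep the two digits before them.
   Assumes a non-negative key. */
int midsquare_hash(int key)
{
    long long square = (long long)key * key;   /* wide type avoids overflow */
    return (int)((square / 1000) % 100);
}
```

For key 5421 the square is 29387241; dividing by 1000 leaves 29387, and the remainder modulo 100 is 87.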
To get a good distribution of indices, a prime number makes the best table size.
• Double hashing
• Chaining
Linear probing:
A simple approach to resolve collision is to store the colliding element in the next
available space. This technique is known as linear probing.
[Figure: linear probing example. Key = 32 is to be added, h(32) = 6; must probe two more times.]
E.g. let the hash function be h(key) = key % 10 (10 is the table size).
Then, using linear probing, let's insert the keys 20, 15, 89 and 18 into the hash table:
Index   Insert 20   Insert 15   Insert 89   Insert 18
  9                                 89          89
  8                                             18
  7
  6
  5                     15          15          15
  4
  3
  2
  1
  0         20          20          20          20
         20%10=0     15%10=5     89%10=9     18%10=8
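A minimal C sketch of insertion and search with linear probing is given below. It assumes non-negative integer keys, a table of 10 slots, the hash function key % 10, and no deletions; the names lp_init, lp_insert and lp_search and the marker LP_EMPTY are chosen only for this illustration.

```c
#define LP_SIZE  10        /* table size used in the example */
#define LP_EMPTY (-1)      /* marker for an unused slot (keys are assumed >= 0) */

/* Mark every slot as empty. */
void lp_init(int table[])
{
    for (int i = 0; i < LP_SIZE; i++)
        table[i] = LP_EMPTY;
}

/* Insert key using linear probing.
   Returns the slot used, or -1 if the table is full. */
int lp_insert(int table[], int key)
{
    int home = key % LP_SIZE;
    for (int i = 0; i < LP_SIZE; i++) {
        int slot = (home + i) % LP_SIZE;   /* next slot, wrapping around */
        if (table[slot] == LP_EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}

/* Search for key along the same probe sequence.
   Returns its slot, or -1 if an empty slot is reached first. */
int lp_search(const int table[], int key)
{
    int home = key % LP_SIZE;
    for (int i = 0; i < LP_SIZE; i++) {
        int slot = (home + i) % LP_SIZE;
        if (table[slot] == LP_EMPTY)
            return -1;
        if (table[slot] == key)
            return slot;
    }
    return -1;
}
```

Inserting 20, 15, 89 and 18 places them in slots 0, 5, 9 and 8 without any collision. If a further key such as 49 were then inserted, it would collide at slot 9, wrap around to slot 0 (occupied by 20) and finally land in slot 1.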
Clustering occurs when a hash function is biased towards the placement of keys into a
given region within the storage space. When the linear method is used to resolve
collisions, this clustering problem is compounded, because keys that collide are loaded
relatively close to the initial collision point.
Chaining
Chaining is another technique for dealing with collisions. In this method, for each location
in the table, we keep a linked list of the records that hash to that index.
The hash table contains pointers to linked list nodes; we can view these as the head
pointers.
Each time a record is inserted, it is added to the list at the location given by the hash
function.
When a collision occurs, the record is simply added to the list at the collision site.
9 -> 89 -> 49 -> 79
8 -> 18 -> 48
7
6
5 -> 15
4
3
2
1 -> 21
0 -> 20
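Chaining could be sketched in C as follows, assuming non-negative integer keys, 10 table slots and the hash function key % 10. The node layout and the names chain_insert and chain_search are illustrative. New records are appended at the end of the list so that the chains appear in insertion order; inserting at the head would work equally well.

```c
#include <stdlib.h>

#define CH_SIZE 10                 /* assumed number of table slots */

struct node {
    int key;
    struct node *next;
};

/* Append key to the chain at its hash location.
   Returns 0 on success, -1 if memory allocation fails. */
int chain_insert(struct node *table[], int key)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->next = NULL;

    /* walk to the end of the chain that starts at the head pointer */
    struct node **link = &table[key % CH_SIZE];
    while (*link != NULL)
        link = &(*link)->next;
    *link = n;
    return 0;
}

/* Return the node holding key, or NULL if it is not in the table. */
struct node *chain_search(struct node *table[], int key)
{
    for (struct node *p = table[key % CH_SIZE]; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}
```

The table itself is an array of head pointers, all initially NULL; a collision only lengthens one list and never displaces a record into another slot.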
Quadratic Probing
This method makes an attempt to correct the problem of clustering with linear probing. It
forces the problem key to move quickly a considerable distance from the initial collision.
When a key hashes to a location and a collision occurs, this method probes the table
locations
(h(key) + i²) % table size, for i = 1, 2, 3, ...
That is, the first rehash adds 1 to the hash value, the second rehash adds 4, the third adds
9, and so on.
It reduces primary clustering but suffers from secondary clustering: keys that hash to
the same initial slot will probe the same alternative cells.
There is no guarantee of finding an empty location once the table is more than half full.
Index   Insert 20   Insert 15   Insert 89   Insert 18
  9                                 89          89
  8                                             18
  7
  6
  5                     15          15          15
  4
  3
  2
  1
  0         20          20          20          20
         20%10=0     15%10=5     89%10=9     18%10=8
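Quadratic probing can be sketched in C as below, again assuming non-negative integer keys, a table of 10 slots and the hash function key % 10. The name qp_insert and the marker QP_EMPTY are chosen for this illustration only.

```c
#define QP_SIZE  10        /* table size used in the example */
#define QP_EMPTY (-1)      /* marker for an unused slot (keys are assumed >= 0) */

/* Insert key using quadratic probing: try home, home+1, home+4, home+9, ...
   (all modulo the table size). Returns the slot used, or -1 if none of the
   probed slots is empty. */
int qp_insert(int table[], int key)
{
    int home = key % QP_SIZE;
    for (int i = 0; i < QP_SIZE; i++) {
        int slot = (home + i * i) % QP_SIZE;
        if (table[slot] == QP_EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;   /* the probe sequence found no empty slot */
}
```

After 20, 15, 89 and 18 are inserted, a further key 49 would collide at slot 9, then at slot 0 (9 + 1), and be stored in slot 3 (9 + 4). With a table of size 10 the offsets i² % 10 only take the values 0, 1, 4, 5, 6 and 9, so an insertion can fail even though empty slots remain, which illustrates why an empty location is not guaranteed.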
Double Hashing
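When a collision occurs, a second hash function is applied to the key, and its value is
used as the step between successive probes. A common choice is
H2(key) = R - (key % R)
where R is a prime number smaller than the table size.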
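A possible C sketch of double hashing follows. It assumes non-negative integer keys, a prime table size of 11, R = 7 (a prime smaller than the table size), and the primary hash function key % 11; these values and the names dh_step and dh_insert are assumptions made for illustration. The i-th probe looks at (H(key) + i * H2(key)) % table size.

```c
#define DH_SIZE  11        /* assumed prime table size */
#define DH_R     7         /* assumed prime smaller than DH_SIZE */
#define DH_EMPTY (-1)      /* marker for an unused slot (keys are assumed >= 0) */

/* Second hash function H2(key) = R - (key % R); always between 1 and R. */
int dh_step(int key)
{
    return DH_R - (key % DH_R);
}

/* Insert key using double hashing.
   Returns the slot used, or -1 if no empty slot is found. */
int dh_insert(int table[], int key)
{
    int home = key % DH_SIZE;
    int step = dh_step(key);
    for (int i = 0; i < DH_SIZE; i++) {
        int slot = (home + i * step) % DH_SIZE;
        if (table[slot] == DH_EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}
```

Keys 22, 33 and 44 all have the primary hash value 0, but their steps differ (6, 2 and 5), so they follow different probe sequences. Since H2 never returns 0, a probe always moves to a new slot.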
Rehashing
If at any stage the hash table becomes almost full (when the packing density is more than 70%),
it becomes difficult to find a free slot, which increases execution time. If the hash
function produces collisions, we create a new hash table of double the size. We may use the
old hash value as input to a rehash function and compute a new hash value. For
rehashing with linear probing, we can use the rehash function:
(old hash value + constant) % array size
where the constant and the array size are relatively prime, i.e., the largest number that
divides both of them is 1. For example, given a 100-slot array, we may use the constant 3 in
the rehash function:
(old hash value + 3) % 100
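The role of the relatively prime constant can be checked with a small C sketch. The helper names gcd, rehash and slots_reached are invented for this illustration; the start value is assumed to lie between 0 and size - 1 and the constant to be positive.

```c
/* Greatest common divisor by Euclid's algorithm. */
int gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Rehash function: (old hash value + constant) % array size. */
int rehash(int old_hash, int constant, int size)
{
    return (old_hash + constant) % size;
}

/* Count how many distinct slots are visited when rehash is applied
   repeatedly, starting from start, until the sequence returns to start. */
int slots_reached(int start, int constant, int size)
{
    int count = 0;
    int h = start;
    do {
        count++;
        h = rehash(h, constant, size);
    } while (h != start);
    return count;
}
```

With a 100-slot array and constant 3 (gcd is 1) the sequence visits all 100 slots before repeating, whereas constant 4 (gcd is 4) reaches only 25 of them, so some free slots could never be found.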