Unit Nine
Searching
What is Searching?
• Searching is the process of finding the position of a given value in a list of
values.
• It decides whether a search key is present in the data or not.
• It is the algorithmic process of finding a particular item in a collection
of items.
• It can be performed on an internal data structure or on an external data structure.
• An element in a given array can be searched for in the following ways:
1. Sequential Search
2. Binary Search
Sequential Search
• Sequential search is also called Linear Search.
• Sequential search starts at the beginning of the list and checks every element of the list.
• It is a basic and simple search algorithm.
• Sequential search compares the search element with each element in the list in turn. If a
match is found, it returns the index of that element; otherwise it returns -1.
• The above figure shows how sequential search works. It examines the array one element at
a time until the desired value is found or the end of the array is reached. For example, a
search for the element 25 proceeds step by step in sequence order. Sequential search is
applied to unsorted or unordered lists, typically when the list has few elements.
Algorithm:
1. Start from the leftmost element of array and compare each element one by one with
x.
2. If x matches an element, return the index (hit).
3. If x doesn't match any of the elements, return -1 (miss).
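The three steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation; the function name sequential_search and the sample list are assumptions made for the example and do not come from the original figure.

```python
def sequential_search(items, x):
    """Return the index of the first element equal to x, or -1 if x is absent."""
    # Step 1: start from the leftmost element and compare one by one with x.
    for index, value in enumerate(items):
        if value == x:
            return index  # Step 2: hit
    return -1  # Step 3: miss


# Hypothetical sample data, chosen only for illustration.
sample = [10, 50, 30, 70, 80, 60, 25, 90, 40]
position = sequential_search(sample, 25)
missing = sequential_search(sample, 99)
```

In the worst case every element is compared once, so the running time grows linearly with the length of the list.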
Hashing
Hashing is another approach, in which the time required to search for an element does not
depend on the total number of elements. With a hash-based data structure, a given element
can be searched for in constant time on average. Hashing is an effective way to reduce the
number of comparisons needed to search for an element in a data structure.
In other words, hashing is the process of indexing and retrieving elements (data) in a data
structure so that an element can be found faster using a hash key.
This data structure uses a concept called a hash table to store data. All data values are
inserted into the hash table based on their hash key value, which maps each value to an
index in the hash table. The hash key is generated for every data value using a hash
function, so every entry in the hash table is placed according to the hash key value that
the hash function generates.
Components of Hashing
• There are three major components of hashing:
• Key: A key can be any string or integer that is fed as input to the hash function, the
technique that determines an index or location for storing an item in a data structure.
• Hash Function: The hash function receives the input key and returns the index of an
element in an array called a hash table. The index is known as the hash index.
• Hash Table: A hash table is a data structure that maps keys to values using a special
function called a hash function. It stores the data in an associative manner in an array
where each data value has its own unique index.
Hash Function
● A hash function is any function that can be used to map a data set of an arbitrary size to a
data set of a fixed size, which falls into the hash table.
● A common hash function is h(x)=x mod SIZE
○ if key=27 and SIZE=10 then hash address = 27 mod 10 = 7
● The values returned by a hash function are called hash values, hash codes, hash sums,
or simply hashes.
● To achieve a good hashing mechanism, it is important to have a good hash function that
meets the following basic requirements:
○ Easy to compute: It should be easy to compute and must not become an algorithm in
itself.
○ Uniform distribution: It should distribute keys uniformly across the hash table
and should not result in clustering.
○ Few collisions: Collisions occur when pairs of elements are mapped to the same hash
value. These should be kept to a minimum.
Types of Hash Function
Division method
● In this method, the hash function depends on the remainder of a division.
For example: suppose the records 52, 68, 99, 84 are to be placed in a hash table of
size 10.
Then: h(key) = key % table_size.
52 % 10 = 2; 68 % 10 = 8; 99 % 10 = 9; 84 % 10 = 4
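The division method can be sketched in a few lines of Python. The function name division_hash is an assumption made for illustration; the records and the table size are the ones used in the example above.

```python
def division_hash(key, table_size):
    """Division method: the hash address is the remainder of key / table_size."""
    return key % table_size


records = [52, 68, 99, 84]
# Map each record to its hash address in a table of size 10.
addresses = {key: division_hash(key, 10) for key in records}
```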
Mid square method
● In this method, the key is first squared and then the middle part of the result is taken as
the index.
For example: suppose we want to place the record 3101 and the size of the table is 1000.
Then 3101 * 3101 = 9616201, i.e. h(3101) = 162 (the middle 3 digits).
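A minimal Python sketch of the mid square method follows. The function name mid_square_hash and the digits parameter are assumptions made for illustration. When the squared value has no exact middle, this sketch takes the window that leans left, which is one possible convention and not something the text specifies.

```python
def mid_square_hash(key, digits=3):
    """Square the key and return the middle `digits` digits of the result."""
    squared = str(key * key)
    # Assumption: centre the window, leaning left when it cannot be exact.
    start = max(0, (len(squared) - digits) // 2)
    return int(squared[start:start + digits])


index = mid_square_hash(3101)  # 3101 * 3101 = 9616201, middle digits 162
```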
Digit folding method
● In this method, the key is divided into separate parts, and these parts are combined using
some simple operation to produce a hash key.
For example: the record 12465512 is divided into the parts 124, 655, and 12.
After dividing, the parts are combined by adding them: h(key) = 124 + 655 + 12 = 791
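The folding example can be sketched in Python as shown here. The function name folding_hash and the part_size parameter are assumptions made for illustration. The sketch only adds the parts, as the example does; reducing the sum modulo the table size would be an additional step that the text does not describe.

```python
def folding_hash(key, part_size=3):
    """Split the decimal digits of key into groups of part_size and add them."""
    digits = str(key)
    parts = [
        int(digits[i:i + part_size])
        for i in range(0, len(digits), part_size)
    ]
    return sum(parts)


hash_key = folding_hash(12465512)  # parts are 124, 655 and 12
```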
Hash Table
● Suppose we have a set of strings {"abc", "def", "ghi"} that we'd like to store in a table.
● Our objective is to find or update them quickly in the table, ideally in O(1).
● We are not concerned about ordering them or maintaining any order at all. Let us think of
a simple scheme to do this.
● Suppose we assign "a" = 1, "b" = 2, ... etc. to all alphabetical characters.
● We can then compute a number for each string by summing the values of its
characters as follows: "abc" = 1 + 2 + 3 = 6, "def" = 4 + 5 + 6 = 15, "ghi" = 7 + 8 + 9 = 24
● If we assume that we have a table of size 5 to store these strings, we can compute the
location of each string by taking the sum mod 5.
● So we will then store "abc" in 6 mod 5 = 1, "def" in 15 mod 5 = 0, and "ghi" in 24 mod 5 = 4.
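The letter-sum scheme described in this list can be sketched in Python as follows. The function name string_hash is an assumption made for illustration, and the sketch assumes the strings contain only the lowercase letters a to z.

```python
def string_hash(word, table_size=5):
    """Sum the letter values (a = 1, b = 2, ...) and reduce mod table_size."""
    # Assumes word contains only lowercase letters a..z.
    total = sum(ord(ch) - ord("a") + 1 for ch in word)
    return total % table_size


table = [None] * 5
for word in ["abc", "def", "ghi"]:
    table[string_hash(word)] = word
```

Note that this scheme gives the same value to any two strings made of the same letters, such as "abc" and "cab", so it produces collisions easily.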
Collision
The hashing process generates a small number for a big key, so there is a possibility that two
keys could produce the same value. The situation where a newly inserted key maps to an
already occupied slot is called a collision, and it must be handled using some collision
handling technique.
Collision Resolution Technique
● Collision resolution techniques are the techniques used for resolving or handling
collisions.
● The process of finding an alternate location is called collision resolution.
● Even though hash tables have collision problems, they are in many cases more efficient
than other data structures, such as search trees.
● There are a number of collision resolution techniques, and the most popular are direct
chaining and open addressing.
Separate Chaining
● In this technique, a linked list is created from the slot in which the collision has occurred,
and the new key is inserted into that linked list.
● This linked list of slots looks like a chain, so the technique is called separate chaining.
● It is used mostly when we do not know in advance how many keys will be inserted or deleted.
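Chaining can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class name ChainedHashTable and its methods are hypothetical, the hash function is key mod size, and each chain is a plain Python list standing in for the linked list described above.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by keeping a chain per slot."""

    def __init__(self, size=7):
        self.size = size
        # One chain per slot; a Python list stands in for a linked list.
        self.slots = [[] for _ in range(size)]

    def insert(self, key):
        chain = self.slots[key % self.size]
        if key not in chain:  # ignore duplicate keys
            chain.append(key)

    def search(self, key):
        return key in self.slots[key % self.size]

    def delete(self, key):
        chain = self.slots[key % self.size]
        if key in chain:
            chain.remove(key)
            return True
        return False


chained = ChainedHashTable(size=7)
for key in [50, 700, 76, 85, 92]:
    chained.insert(key)
```

With these sample keys, 50, 85 and 92 all hash to slot 1 and end up in the same chain, so the table can hold more keys than it has slots.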
Open Addressing
● Open addressing is a collision-resolution method that handles collisions within the
hash table itself.
● There is no key stored outside of the hash table.
● Therefore, the size of the hash table is always greater than or equal to the number of
keys.
● It is also called closed hashing.
● The following techniques are used in open addressing:
○ Linear probing
○ Quadratic probing
○ Double hashing.
Linear Probing
● In linear probing, we search the hash table sequentially, starting from the original hash
location.
● If a location is occupied, we check the next location.
● We wrap around from the last table location to the first table location if necessary.
● The function for rehashing is the following:
○ rehash(x, i) = (hash(x) + i) % tablesize, for i = 1, 2, 3, …
● If slot hash(x) % tablesize is full, then we try (hash(x) + 1) % tablesize.
● If (hash(x) + 1) % tablesize is also full, then we try (hash(x) + 2) % tablesize.
● If (hash(x) + 2) % tablesize is also full, then we try (hash(x) + 3) % tablesize.
● And so on, until an empty slot is found.
Example: inserting the keys 50, 700, 76, 85, 92 with the hash function key mod 7.
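The example with key mod 7 can be traced with the following Python sketch. The function names linear_probe_insert and linear_probe_search are assumptions made for illustration. The sketch represents empty slots with None and does not support deletion.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing; return the slot that was used."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i) % size  # wraps around to the first slot
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table is full")


def linear_probe_search(table, key):
    """Return the slot holding key, or -1 if key is not in the table."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i) % size
        if table[slot] is None:
            return -1  # an empty slot ends the probe sequence
        if table[slot] == key:
            return slot
    return -1


table = [None] * 7
used_slots = [linear_probe_insert(table, key) for key in [50, 700, 76, 85, 92]]
```

Here 50 goes to slot 1, 700 to slot 0 and 76 to slot 6. Both 85 and 92 also hash to slot 1, so they move on to slots 2 and 3, forming the kind of cluster that linear probing is prone to.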
Advantages
● It is easy to compute
Disadvantages
● The main disadvantage is clustering.
● Many consecutive occupied slots might form groups.
● Searching for an empty bucket might therefore be time consuming.
The worst-case time to search for an element is O(s), where s is the table size.
Quadratic Probing
● The problem of primary clustering can be reduced by using the quadratic probing method.
● In quadratic probing, when a collision occurs, we probe the i^2-th bucket in the i-th iteration.
That is, we start from the original hash location H. If a location is
occupied, we check the locations H + 1^2, H + 2^2, H + 3^2, H + 4^2, …, H + k^2, where i = 1 to k.
● In quadratic probing, f is a quadratic function of i, typically f(i) = i^2.
● For example: Table_Size = 10, hash(x) = x mod 10,
f(i) = i^2, h_i(x) = (hash(x) + f(i)) mod Table_Size.
Example: for table size = 11, index from 0…10
● Hash Function: h(x) = x mod 11
● Inserting keys: 20, 30, 2, 13, 25, 24, 10, 9
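The insertions in this example can be traced with a short Python sketch. The function name quadratic_probe_insert is an assumption made for illustration. The sketch uses f(i) = i^2, represents empty slots with None, and gives up after as many probes as there are slots.

```python
def quadratic_probe_insert(table, key):
    """Insert key using quadratic probing; return the slot that was used."""
    size = len(table)
    home = key % size
    for i in range(size):
        slot = (home + i * i) % size  # i = 0 is the original hash location
        if table[slot] is None:
            table[slot] = key
            return slot
    # The probe sequence does not necessarily visit every slot.
    raise OverflowError("no free slot found along the probe sequence")


table = [None] * 11
for key in [20, 30, 2, 13, 25, 24, 10, 9]:
    quadratic_probe_insert(table, key)
```

The keys 20, 30, 2 and 10 land in their home slots 9, 8, 2 and 10. Key 13 collides at slot 2 and moves to slot 3, 25 collides at slot 3 and moves to slot 4, 24 collides at slots 2 and 3 and lands in slot 6, and 9 collides at slots 9, 10 and 2 before landing in slot 7.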
Advantages
○ Quadratic probing is less likely to suffer from primary clustering and is
easier to implement than double hashing.
Disadvantages
○ Quadratic probing has secondary clustering: when two keys hash to the
same location, they follow the same probe sequence. So, it may take many attempts
before an insertion succeeds.
○ Also, a probe sequence may not visit every location in the table.
Double Hashing