
Unit Nine

Searching
What is Searching?
• Searching is the process of finding the position of a given value in a list of values.
• It decides whether a search key is present in the data or not.
• It is the algorithmic process of finding a particular item in a collection of items.
• It can be done on an internal data structure or on an external data structure.
• An element in a given array can be searched in the following ways:

1. Sequential Search
2. Binary Search
Sequential Search
• Sequential search is also called Linear Search.
• Sequential search starts at the beginning of the list and checks every element of the list.
• It is a basic and simple search algorithm.
• Sequential search compares the element with all the other elements in the list. If the
element is matched, it returns the index of the value; otherwise it returns -1.

• The figure above shows how sequential search works. It searches for an element or value in
an array, step by step in sequential order, until the desired element or value is found. If we
search for the element 25, it checks the elements one by one in sequence. Sequential search is
applied on an unsorted or unordered list when there are fewer elements in the list.
Algorithm:
1. Start from the leftmost element of the array and compare each element one by one with
x.
2. If x matches with an element, return the index (hit).
3. If x doesn't match with any of the elements, return -1 (miss).

Search for 12 in the list of elements {65, 20, 10, 55, 32, 12, 50, 99}
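A minimal Python sketch of sequential search on this list (the function name linear_search is our own label, not from the notes):

```python
def linear_search(arr, x):
    # Compare x with each element, from the leftmost one onward.
    for i, value in enumerate(arr):
        if value == x:
            return i      # hit: return the index of the match
    return -1             # miss: x is not in the list

# 12 sits at index 5, so the search makes 6 comparisons before the hit.
print(linear_search([65, 20, 10, 55, 32, 12, 50, 99], 12))   # prints 5
```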


Worst case scenario => The target element is not in the list, or is at the end
Total comparisons = n, so the complexity is O(n)
Best case scenario => The target element is at the first position
Total comparisons = 1, so the complexity is O(1)
Average case scenario => The target element can be anywhere in the list
Total comparisons = average of the possible comparisons,
i.e. [n + (n-1) + (n-2) + ... + 1] / n = (n+1)/2 comparisons, so the complexity is O(n)
Binary Search
● Binary search can be implemented only on a sorted list of items.
● If the elements are not already sorted, we need to sort them first.
● Binary search follows a divide and conquer approach in which the list is divided into two
halves and the item is compared with the middle element of the list.
● If a match is found, the location of the middle element is returned; otherwise, we
search in one of the two halves depending on the result of the comparison.
● The binary search algorithm reduces the time complexity of searching a sorted array to
O(log n).

Algorithm:

Step 1: Find the index of the middle (central) element as:
Mid = (first + last) / 2
Step 2: If the element is at mid, return mid.
Step 3: If the element at mid is greater than the element to be searched, i.e. element[mid] >
number, then update the last index as: last = mid - 1 and go to Step 1.
Step 4: If the element at mid is less than the element to be searched, i.e. element[mid] <
number, then update the first index as: first = mid + 1 and go to Step 1.
Step 5: Continue until first > last.
Step 6: If first > last, the search interval is empty; display "number not found".
Example:
Search for the element 46 in the list 4, 10, 16, 24, 32, 46, 76, 112, 144, 182 using the binary
search algorithm.
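A hedged Python sketch of these steps, run on the example list (the variable names first, last, and mid follow the algorithm above):

```python
def binary_search(arr, number):
    first, last = 0, len(arr) - 1
    while first <= last:                  # stop when the interval becomes empty
        mid = (first + last) // 2         # Step 1: index of the middle element
        if arr[mid] == number:
            return mid                    # Step 2: found at position mid
        elif arr[mid] > number:
            last = mid - 1                # Step 3: search the left half
        else:
            first = mid + 1               # Step 4: search the right half
    return -1                             # number not found

data = [4, 10, 16, 24, 32, 46, 76, 112, 144, 182]
print(binary_search(data, 46))            # prints 5 (46 is at index 5)
```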
Hashing
In all search techniques like linear search, binary search and search trees, the time required to
search for an element depends on the total number of elements present in that data structure. In
all these search techniques, as the number of elements increases, the time required to search
for an element also increases.

Hashing is another approach in which the time required to search for an element doesn't depend on
the total number of elements. Using a hashing data structure, a given element can be searched
for in constant time on average. Hashing is an effective way to reduce the number of
comparisons needed to search for an element in a data structure.
In other words, hashing is the process of indexing and retrieving an element (data) in a data
structure to provide a faster way of finding the element using a hash key.
In this data structure, we use a concept called a hash table to store data. All data values are
inserted into the hash table based on the hash key value. The hash key value is used to map
the data to an index in the hash table, and the hash key is generated for every data item using
a hash function. That means every entry in the hash table is based on the hash key value
generated by the hash function.
Components of Hashing
• There are three major components of hashing:
• Key: A key can be anything (a string or an integer) that is fed as input to the hash function, the
technique that determines an index or location for storage of an item in a data structure.
• Hash Function: The hash function receives the input key and returns the index of an
element in an array called a hash table. The index is known as the hash index.
• Hash Table: A hash table is a data structure that maps keys to values using a special function
called a hash function. A hash table stores the data in an associative manner in an array where
each data value has its own unique index.
Hash Function

● A hash function is any function that can be used to map a data set of an arbitrary size to a
data set of a fixed size, which falls into the hash table.
● A common hash function is h(x)=x mod SIZE
○ if key=27 and SIZE=10 then hash address = 27 mod 10 = 7
● The values returned by a hash function are called hash values, hash codes, hash sums,
or simply hashes.
● To achieve a good hashing mechanism, it is important to have a good hash function with
the following basic requirements:
○ Easy to compute: It should be easy to compute and must not become an algorithm in
itself.
○ Uniform distribution: It should provide a uniform distribution across the hash table
and should not result in clustering.
○ Fewer collisions: Collisions occur when pairs of elements are mapped to the same hash
value. These should be avoided.
Types of Hash Function

Division method
● In this method, the hash function depends on the remainder of a division.
For example: if the records 52, 68, 99, 84 are to be placed in a hash table and the table
size is 10,
then: h(key) = record % table_size
52 % 10 = 2; 68 % 10 = 8; 99 % 10 = 9; 84 % 10 = 4
Mid square method
● In this method the key is first squared and then the middle part of the result is taken as the index.
For example: consider that we want to place a record with key 3101 and the size of the table is 1000.
So 3101 * 3101 = 9616201, i.e. h(3101) = 162 (the middle 3 digits).
Digit folding method
● In this method the key is divided into separate parts and, by using some simple operations,
these parts are combined to produce a hash key.
For example: consider a record with key 12465512; it is divided into the parts 124, 655,
12.
After dividing, the parts are combined by adding them: h(key) = 124 + 655 + 12 = 791
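The three methods might be sketched in Python as follows; the function names and the choice of keeping the three middle digits (to match the example) are our own assumptions:

```python
def division_hash(key, table_size=10):
    # Division method: h(key) = key mod table_size
    return key % table_size

def mid_square_hash(key, digits=3):
    # Mid-square method: square the key, then keep the middle digits.
    square = str(key * key)                     # 3101 * 3101 = 9616201
    start = (len(square) - digits) // 2
    return int(square[start:start + digits])    # middle 3 digits -> 162

def digit_folding_hash(key, part_len=3):
    # Digit-folding method: split the key into parts and add them together.
    s = str(key)
    parts = [int(s[i:i + part_len]) for i in range(0, len(s), part_len)]
    return sum(parts)                           # 124 + 655 + 12 = 791

print(division_hash(52), mid_square_hash(3101), digit_folding_hash(12465512))
# prints: 2 162 791
```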
Hash Table

● A hash table is a data structure that is used to store key/value pairs.
● It uses a hash function to compute an index into an array at which an element will be
inserted or searched.
● By using a good hash function, hashing can work well.
● Under reasonable assumptions, the average time required to search for an element in a
hash table is O(1).
● Load Factor of the Hash Table: the load factor is defined as
load factor = (number of items in the table) / table size
Example: Assume that the table size is 10 and it holds 6 items; then the load factor
of the table is 6/10 = 0.6
Example:

● Suppose we have a set of strings {"abc", "def", "ghi"} that we'd like to store in a table.
● Our objective here is to find or update them quickly from the table, ideally in O(1).
● We are not concerned about ordering them or maintaining any order at all. Let us think of
a simple scheme to do this.
● Suppose we assign "a" = 1, "b" = 2, ... and so on to all alphabetical characters.
● We can then simply compute a number for each of the strings by using the sum of its
characters as follows: "abc" = 1 + 2 + 3 = 6, "def" = 4 + 5 + 6 = 15, "ghi" = 7 + 8 + 9 = 24
● If we assume that we have a table of size 5 to store these strings, we can compute the
location of a string by taking the sum mod 5.
● So we will then store "abc" at 6 mod 5 = 1, "def" at 15 mod 5 = 0, and "ghi" at 24 mod 5 = 4.
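A small Python sketch of this scheme, assuming the mapping a = 1, b = 2, ... and a table of size 5 as above:

```python
def string_sum_index(s, table_size=5):
    # Map 'a' -> 1, 'b' -> 2, ... and sum the values of the characters.
    total = sum(ord(ch) - ord('a') + 1 for ch in s)
    return total % table_size                # table slot for the string

for word in ["abc", "def", "ghi"]:
    print(word, string_sum_index(word))      # abc -> 1, def -> 0, ghi -> 4
```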
Collision
The hashing process generates a small number for a big key, so there is a possibility that two
keys could produce the same value. The situation where a newly inserted key maps to an
already occupied slot is called a collision, and it must be handled using some collision-handling
technique.
Collision Resolution Technique

● Collision Resolution Techniques are the techniques used for resolving or handling the
collision.
● The process of finding an alternate location is called collision resolution.
● Even though hash tables have collision problems, they are more efficient in many cases
compared to all other data structures, like search trees.
● There are a number of collision resolution techniques, and the most popular are direct
chaining and open addressing.

● Direct Chaining => Linked list based implementation


○ Separate Chaining
● Open Addressing => Array-based implementation
○ Linear Probing => linear search
○ Quadratic probing => nonlinear search
○ Double hashing => use of two hash functions
Separate Chaining

● In this technique, a linked list is created from the slot in which the collision has occurred,
after which the new key is inserted into that linked list.
● This linked list of slots looks like a chain, so it is called separate chaining.
● It is used more when we do not know how many keys will be inserted or deleted.

Example: hash the keys 50, 700, 76, 85, 92 using h(key) = key mod 7, and then insert 73 and
101 into the same table, chaining the keys that collide (see the sketch below).
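A minimal separate-chaining sketch in Python; it uses Python lists as the chains and h(key) = key mod 7, and the class name is our own:

```python
class ChainedHashTable:
    def __init__(self, size=7):
        self.size = size
        self.slots = [[] for _ in range(size)]   # one (initially empty) chain per slot

    def insert(self, key):
        index = key % self.size                  # h(key) = key mod 7
        self.slots[index].append(key)            # a collision just extends the chain

    def search(self, key):
        return key in self.slots[key % self.size]

table = ChainedHashTable()
for key in [50, 700, 76, 85, 92, 73, 101]:
    table.insert(key)
print(table.slots)
# slot 0: [700], slot 1: [50, 85, 92], slot 3: [73, 101], slot 6: [76]
```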
Time complexity
• Its worst-case complexity for searching is O(n).
• Its worst-case complexity for deletion is O(n).

Advantages of separate chaining


• It is easy to implement.
• The hash table never fills up, so we can always add more elements to the chains.
• It is less sensitive to the hash function used.

Disadvantages of separate chaining


• The cache performance of chaining is not good, since the chain nodes are scattered in memory.
• Memory wastage is high in this method.
• It requires extra space for the element links.
Open Addressing

● Open addressing is a collision-resolution method that is used to control collisions in the
hash table.
● No key is stored outside of the hash table.
● Therefore, the size of the hash table is always greater than or equal to the number of
keys.
● It is also called closed hashing.
● The following techniques are used in open addressing:
○ Linear probing
○ Quadratic probing
○ Double hashing
Linear Probing

● In linear probing, we search the hash table sequentially, starting from the original hash
location.
● If a location is occupied, we check the next location.
● We wrap around from the last table location to the first table location if necessary.
● The probing function is the following:
○ rehash(x) = (hash(x) + i) % tablesize, for i = 0, 1, 2, ...
● If slot hash(x) % tablesize is full, then we try (hash(x) + 1) % tablesize.
● If (hash(x) + 1) % tablesize is also full, then we try (hash(x) + 2) % tablesize.
● If (hash(x) + 2) % tablesize is also full, then we try (hash(x) + 3) % tablesize.
● And so on until we find an empty slot.
Example: insert the keys 50, 700, 76, 85, 92 into a table using h(key) = key mod 7 and linear
probing (see the sketch below).
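A Python sketch of linear probing for these keys (table size 7, h(key) = key mod 7, None marks an empty slot); it assumes the table is not already full:

```python
def linear_probe_insert(table, key):
    size = len(table)
    index = key % size                      # original hash location
    while table[index] is not None:         # slot occupied: check the next location
        index = (index + 1) % size          # wrap around if necessary
    table[index] = key

table = [None] * 7
for key in [50, 700, 76, 85, 92]:
    linear_probe_insert(table, key)
print(table)   # [700, 50, 85, 92, None, None, 76]
```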
Advantages
● It is easy to compute.

Disadvantages
● The main disadvantage is clustering.
● Many consecutive elements might form groups.
● It might be time-consuming to search for an empty bucket.
The worst-case time to search for an element is O(s), where s is the table size.
Quadratic Probing

● The problem of clustering can be eliminated if we use the quadratic probing method.
● In quadratic probing, when a collision occurs, we probe the (i²)-th bucket in the i-th iteration.
That is, we start from the original hash location H; if a location is occupied, we check the
locations H + 1², H + 2², H + 3², H + 4², ..., H + k², for i = 1 to k.
● In quadratic probing, f is a quadratic function of i, typically f(i) = i².
● For example: Table_Size = 10, hash(x) = x mod 10,
f(i) = i², h_i(x) = (hash(x) + f(i)) mod Table_Size.
Example: for table size = 11, index from 0…10
● Hash Function: h(x) = x mod 11
● Inserting keys: 20, 30, 2, 13, 25, 24, 10, 9
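A Python sketch of quadratic probing for this example (table size 11, h(x) = x mod 11, probe offsets 0, 1², 2², ...):

```python
def quadratic_probe_insert(table, key):
    size = len(table)
    home = key % size                        # h(x) = x mod 11
    for i in range(size):                    # try offsets 0, 1*1, 2*2, 3*3, ...
        index = (home + i * i) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no empty slot found along the probe sequence")

table = [None] * 11
for key in [20, 30, 2, 13, 25, 24, 10, 9]:
    quadratic_probe_insert(table, key)
print(table)   # [None, None, 2, 13, 25, None, 24, 9, 30, 20, 10]
```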
Advantages
○ Quadratic probing is less likely to have the problem of primary clustering and is
easier to implement than double hashing.

Disadvantages
○ Quadratic probing suffers from secondary clustering: when two keys hash to the
same location, they follow the same probe sequence. So it may take many attempts
before an insertion is made.
○ Also, the probe sequence does not necessarily probe all locations in the table.
Double Hashing

● For double hashing, one popular choice is f(i) = i * hash2(x).
● This formula says that we apply a second hash function to x and probe at distances
hash2(x), 2·hash2(x), 3·hash2(x), ..., and so on.
● The choice of hash2(x) is essential.
○ The function must never evaluate to zero.
○ It is important to make sure all cells can be probed.
○ A function such as hash2(x) = R - (x mod R), with R a prime smaller than the table
size, works well.
○ The table size needs to be prime.
○ The cost of double hashing is determined by the use of a second hash function.
Ex: Table size = 10, hash(x) = x mod 10
Insert keys: 89, 18, 49, 58, 69
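A Python sketch of this example; the notes do not fix R, so R = 7 (a prime smaller than the table size 10) is assumed here:

```python
def double_hash_insert(table, key, R=7):
    size = len(table)
    home = key % size                        # hash(x) = x mod 10
    step = R - (key % R)                     # hash2(x) = R - (x mod R), never zero
    for i in range(size):                    # probe home, home+step, home+2*step, ...
        index = (home + i * step) % size
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no empty slot found along the probe sequence")

table = [None] * 10
for key in [89, 18, 49, 58, 69]:
    double_hash_insert(table, key)
print(table)   # [69, None, None, 58, None, None, 49, None, 18, 89]
```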
Rehashing
● Rehashing is a technique in which the table is resized, i.e., the size of the table is doubled
by creating a new table.
● While increasing the size, it is preferable that the total size of the table be a prime number.
● Some situations in which rehashing is required:
○ When the table is completely full
○ With quadratic probing, when the table is half full
○ When insertions fail due to overflow
● In such situations, we have to transfer the entries from the old table to the new table by
re-computing their positions using the hash function.
● Example: Compute the hash for the keys (37, 90, 55, 22, 17, 49) with table size 10.
○ Here, H(key) = key mod tablesize
○ 37 % 10 = 7
○ 90 % 10 = 0
○ 55 % 10 = 5
○ 22 % 10 = 2
○ 17 % 10 = 7 → collision, resolved by linear probing
○ 49 % 10 = 9
○ Now this table is almost full, and if we try to insert more elements, collisions will
occur and eventually further insertions will fail. Hence we rehash by doubling the
table size. The old table size is 10, so we double it to 20. But since 20 is not a prime
number, we prefer to make the new table size 23.
● Now the new hash function will be: h(key) = key mod 23
● As seen, the hash table is now sufficiently large to accommodate new insertions.
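A short Python sketch of this rehashing step (linear probing is assumed for both tables, as in the example above):

```python
def probe_insert(table, key):
    # Linear-probing insert: start at key mod size and move to the next free slot.
    size = len(table)
    index = key % size
    while table[index] is not None:
        index = (index + 1) % size
    table[index] = key

def rehash(old_table, new_size=23):
    # Build a larger table and re-compute the position of every existing key.
    new_table = [None] * new_size
    for key in old_table:
        if key is not None:
            probe_insert(new_table, key)
    return new_table

old = [None] * 10
for key in [37, 90, 55, 22, 17, 49]:
    probe_insert(old, key)

new = rehash(old)   # 37 -> 14, 90 -> 21, 55 -> 9, 22 -> 22, 17 -> 17, 49 -> 3 (mod 23)
```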
