
Chapter 8 - Hashing


Hashing

Concept of Hashing

A hash table, or hash map, is a data structure that associates keys (names) with values (attributes).

Example

A small phone book as a hash table.

Dictionaries

A collection of (key, value) pairs. Each pair has a unique key.

Just An Idea

Hash table: a collection of pairs plus a lookup function (the hash function).

Hashing

Key-value pairs are stored in a fixed size table called a hash table.

A hash table is partitioned into b buckets, and each bucket has s slots. Each slot holds one record. A hash function f(x) transforms the identifier (key) x into an address in the hash table.

Hash table
[Figure: a hash table of b buckets, numbered 0 to b-1; each bucket has s slots, numbered 0 to s-1.]

Ideal Hashing

Uses an array table[0:b-1].


Each position of this array is a bucket. A bucket can normally hold only one dictionary pair.

Uses a hash function f that converts each key k into an index in the range [0, b-1]. Every dictionary pair (key, element) is stored in its home bucket table[f(key)].

Ideal Hashing Example


Pairs are: (22,a), (33,c), (3,d), (72,e), (85,f).
Hash table is ht[0:7], b = 8 (where b is the number of positions in the hash table).
Hash function f is key % b = key % 8.
Where are the pairs stored?

[0] (72,e)   [1] (33,c)   [2] -   [3] (3,d)   [4] -   [5] (85,f)   [6] (22,a)   [7] -
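The example can be checked with a short C sketch. It assumes non-negative integer keys, single-character values and one pair per bucket. `Pair`, `ht`, `home_bucket` and `ideal_insert` are illustrative names and do not come from the text.

```c
#define B 8                     /* number of buckets: ht[0:7] */

typedef struct {
    int key;
    char value;
    int used;                   /* 0 = empty bucket, 1 = occupied */
} Pair;

Pair ht[B];                     /* global, so every bucket starts empty */

/* Hash function f(key) = key % b (assumes key >= 0) */
int home_bucket(int key)
{
    return key % B;
}

/* Store (key, value) in its home bucket ht[f(key)].
   Returns 1 on success, 0 if the home bucket is already occupied. */
int ideal_insert(int key, char value)
{
    int i = home_bucket(key);
    if (ht[i].used)
        return 0;
    ht[i].key = key;
    ht[i].value = value;
    ht[i].used = 1;
    return 1;
}
```

Inserting the five pairs in order fills buckets 6, 1, 3, 0 and 5. A later call such as `ideal_insert(25, 'g')` returns 0 because bucket 1 is already taken.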

What Can Go Wrong? - Collision


[0] (72,e)   [1] (33,c)   [2] -   [3] (3,d)   [4] -   [5] (85,f)   [6] (22,a)   [7] -

Where does (25,g) go? Its home bucket is 25 % 8 = 1, which is already occupied by (33,c).

This situation is called a collision.

Keys that have the same home bucket are called synonyms: 25 and 33 are synonyms with respect to the hash function in use.

What Can Go Wrong? - Overflow


[0] (72,e)   [1] (33,c)   [2] -   [3] (3,d)   [4] -   [5] (85,f)   [6] (22,a)   [7] -

A collision occurs when the home bucket for a new pair is occupied by a pair with a different key.
An overflow occurs when there is no space in the home bucket for the new pair.
When a bucket can hold only one pair, collisions and overflows occur together.
We need a method to handle overflows.
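A collision and an overflow only become distinguishable when a bucket has more than one slot. The sketch below assumes s = 2 slots per bucket (the example above uses one) and distinct non-negative integer keys. `Bucket`, `bucket_insert` and the result codes are hypothetical names.

```c
#define B 8                 /* buckets; home bucket = key % 8 */
#define S 2                 /* slots per bucket (assumed) */

typedef struct {
    int keys[S];
    int count;              /* number of occupied slots */
} Bucket;

Bucket table[B];            /* global, so all buckets start empty */

typedef enum { INSERT_OK, INSERT_COLLISION, INSERT_OVERFLOW } InsertResult;

/* Assumes key >= 0 and that key is not already in the table. */
InsertResult bucket_insert(int key)
{
    Bucket *home = &table[key % B];
    if (home->count == S)
        return INSERT_OVERFLOW;                /* home bucket full: pair not stored */
    home->keys[home->count++] = key;
    return home->count > 1 ? INSERT_COLLISION  /* stored beside a synonym */
                           : INSERT_OK;
}
```

Suppose 22, 33, 3, 72 and 85 are already stored. Inserting 25 reports a collision, yet 25 still goes into the second slot of bucket 1. A further synonym such as 41 (41 % 8 = 1) then overflows.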

Some Issues

Choice of hash function: it should avoid collisions (two different pairs in the same bucket).

Size (number of buckets) of the hash table.

Overflow handling method: an overflow occurs when there is no space in the bucket for the new pair.

Choice of Hash Function

Requirements

Easy to compute.
Minimal number of collisions.

A good hashing function distributes the key values uniformly throughout the range.

Some hash functions

Division:

Choose a number m (a prime number) larger than the number n of keys in K. The hash function H is defined by
H(k) = k (mod m)
E.g. keys k = 3205, 7148, 2345 and 100 addresses (0-99). Let m = 97, the largest prime below 100. Then H(3205) = 3205 mod 97 = 4, H(7148) = 7148 mod 97 = 67, H(2345) = 2345 mod 97 = 17.
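The division method reduces to a single remainder operation. This minimal sketch assumes unsigned integer keys; `hash_division` is an illustrative name.

```c
/* Division method: H(k) = k mod m.
   m should be chosen as described above (e.g. 97 for addresses 0..99). */
unsigned int hash_division(unsigned int k, unsigned int m)
{
    return k % m;
}
```

Calling `hash_division(3205, 97)`, `hash_division(7148, 97)` and `hash_division(2345, 97)` gives 4, 67 and 17. A prime m is the usual choice because keys sharing a common factor with m would otherwise crowd into fewer addresses.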

Some hash functions


Folding:
Partition the key k into several parts k1, k2, ..., kr and add the parts together to obtain the hash address H(k) = k1 + k2 + ... + kr.
E.g. k = 12320324111220: partition k into 123, 203, 241, 112, 20; then H(k) = 123 + 203 + 241 + 112 + 20 = 699.
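Here is a sketch of folding. It assumes the key is a string of decimal digits, split left to right into groups of `width` digits; the last group may be shorter, like the 20 above. `hash_fold` is an illustrative name.

```c
#include <string.h>

/* Folding: split the digit string into groups of `width` digits
   (left to right; the last group may be shorter) and add the groups. */
unsigned long hash_fold(const char *digits, size_t width)
{
    unsigned long sum = 0;
    size_t len = strlen(digits);
    for (size_t start = 0; start < len; start += width) {
        unsigned long part = 0;
        for (size_t j = start; j < start + width && j < len; j++)
            part = part * 10 + (unsigned long)(digits[j] - '0');
        sum += part;          /* add this group to the running total */
    }
    return sum;
}
```

`hash_fold("12320324111220", 3)` returns 699. The slide stops at the sum. If the sum can exceed the table size, it still has to be mapped into the address range, for example with the division method.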

Some hash functions


Mid Square:
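The key k is squared. Then the hash function H is defined by H(k) = l, where l is obtained by deleting digits from both ends of k^2. The same digit positions must be used for every key.
E.g. k = 3205, 7148, 2345: k^2 = 10 272 025, 51 093 904, 5 499 025; H(k) = 72, 93, 99 (the 4th and 5th digits counting from the right).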
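The following sketch implements the mid-square method for the example above. It assumes the digits kept are always the 4th and 5th from the right; `hash_mid_square` is an illustrative name. Squaring in 64-bit `unsigned long long` avoids overflow for keys up to about 4 * 10^9.

```c
/* Mid-square: square the key, drop the 3 rightmost digits,
   then keep the next 2 digits (the 4th and 5th from the right). */
unsigned int hash_mid_square(unsigned long long k)
{
    unsigned long long sq = k * k;
    return (unsigned int)((sq / 1000ULL) % 100ULL);
}
```

For the keys 3205, 7148 and 2345 this returns 72, 93 and 99.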

Overflow Handling

An overflow occurs when the home bucket for a new pair (key, element) is full. We may handle overflows by:

Search the hash table in some systematic fashion for a bucket that is not full:

    Linear probing (linear open addressing).
    Quadratic probing.
    Rehashing.

Eliminate overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket:

    Array linear list.
    Chain.

Linear probing (linear open addressing)

Open addressing stores all elements directly in the hash table, so a collision must be resolved by finding another slot within the table itself. Linear probing resolves a collision by placing the data in the next open slot, wrapping around from the last bucket to bucket 0.

Linear Probing Get And Insert


divisor = b (number of buckets) = 17. Home bucket = key % 17.

Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45.

Resulting table:
[0] 34    [1] 0     [2] 45    [3] -     [4] -     [5] -
[6] 6     [7] 23    [8] 7     [9] -     [10] -    [11] 28
[12] 12   [13] 29   [14] 11   [15] 30   [16] 33
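The slide title covers both get and insert, but Program 8.3 shows only insertion. This sketch implements both for integer keys, assuming keys are non-negative and that -1 marks an empty bucket. `table`, `lp_init`, `lp_insert` and `lp_get` are hypothetical names.

```c
#include <stdbool.h>

#define B 17                       /* divisor = number of buckets */
#define EMPTY (-1)                 /* marks an unused bucket (keys assumed >= 0) */

int table[B];

void lp_init(void)
{
    for (int i = 0; i < B; i++)
        table[i] = EMPTY;
}

/* Insert: start at the home bucket key % B and step to the next bucket,
   wrapping around, until an empty one is found.
   Returns false on a duplicate key or a full table. */
bool lp_insert(int key)
{
    int home = key % B, i = home;
    do {
        if (table[i] == EMPTY) {
            table[i] = key;
            return true;
        }
        if (table[i] == key)
            return false;          /* duplicate key */
        i = (i + 1) % B;
    } while (i != home);
    return false;                  /* every bucket examined: table full */
}

/* Get: follow the same probe sequence; reaching an empty bucket means
   the key is absent. Returns the bucket index, or -1 if not found. */
int lp_get(int key)
{
    int home = key % B, i = home;
    do {
        if (table[i] == EMPTY)
            return -1;
        if (table[i] == key)
            return i;
        i = (i + 1) % B;
    } while (i != home);
    return -1;
}
```

Inserting the twelve keys in the order given places 45 in bucket 2 and 11 in bucket 14, so `lp_get(45)` returns 2. Stopping a search at the first empty bucket is only correct when nothing has been deleted. Supporting deletion would need a separate "deleted" marker.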

Linear Probing (program 8.3)


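    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Assumed supporting definitions (not shown on this slide): */
    #define TABLE_SIZE 13
    typedef struct { char key[10]; } element;  /* key "" marks an empty slot */
    int hash(const char *key)                  /* sum of characters mod TABLE_SIZE */
    {
        unsigned int sum = 0;
        while (*key)
            sum += (unsigned char)*key++;
        return (int)(sum % TABLE_SIZE);
    }

    /* Program 8.3: insert item into ht using linear probing */
    void linear_insert(element item, element ht[])
    {
        int i, hash_value;
        i = hash_value = hash(item.key);       /* start at the home bucket */
        while (strlen(ht[i].key)) {            /* slot i is occupied */
            if (!strcmp(ht[i].key, item.key)) {
                fprintf(stderr, "Duplicate entry\n");
                exit(1);
            }
            i = (i + 1) % TABLE_SIZE;          /* try the next bucket, wrapping around */
            if (i == hash_value) {             /* back at the home bucket: no free slot */
                fprintf(stderr, "The table is full\n");
                exit(1);
            }
        }
        ht[i] = item;
    }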

Problem of Linear Probing


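Identifiers tend to cluster together, which increases the search time: a key whose home bucket falls inside a cluster must probe past the whole cluster. With the keys inserted above, 45 has home bucket 11 but lands in bucket 2, after nine buckets have been examined.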

Quadratic Probing

Quadratic probing uses a quadratic function of i as the increment. With h = H(k), the i-th probe examines bucket (h + i^2) % b for i = 0, 1, 2, ..., i.e. buckets h, h+1, h+4, ..., h+i^2 (each taken % b).
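Here is a sketch of quadratic-probing insertion. For comparison it reuses the linear-probing keys with b = 17; that choice is made here, not on the slide. `qtable`, `qp_init` and `qp_insert` are hypothetical names, and -1 marks an empty bucket.

```c
#define B 17
#define EMPTY (-1)                 /* keys assumed >= 0 */

int qtable[B];

void qp_init(void)
{
    for (int i = 0; i < B; i++)
        qtable[i] = EMPTY;
}

/* Probe h, h+1, h+4, h+9, ... (mod B), where h = key % B.
   Returns the bucket used, or -1 if no empty bucket lies on the probe sequence. */
int qp_insert(int key)
{
    int h = key % B;
    for (int i = 0; i < B; i++) {
        int idx = (h + i * i) % B;
        if (qtable[idx] == EMPTY) {
            qtable[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

For the same twelve keys, 45 is placed in bucket 3 after examining buckets 11, 12, 15 and 3, instead of the nine buckets linear probing examines. When b is prime, the first (b + 1) / 2 probe positions are all distinct. An insert therefore always succeeds while the table is less than half full, but beyond that it can fail even though empty buckets remain.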

Rehashing

Rehashing: try H1, H2, ..., Hm in sequence if a collision occurs, where each Hi is a hash function. Double hashing is one of the best methods for dealing with collisions.

If the slot is full, a second hash function is calculated and combined with the first: H(k, i) = (H1(k) + i * H2(k)) % m, for i = 0, 1, 2, .... H2(k) must never be 0; otherwise every probe returns to the same slot.
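Here is a sketch of double hashing with m = 17, assuming H1(k) = k % m and H2(k) = 1 + k % (m - 1). The second function is an illustrative choice, not the slide's. It never returns 0, and because m is prime every step size is coprime to m, so the probe sequence visits every bucket. `dtable`, `dh_init` and `dh_insert` are hypothetical names.

```c
#define M 17
#define EMPTY (-1)                 /* keys assumed >= 0 */

int dtable[M];

void dh_init(void)
{
    for (int i = 0; i < M; i++)
        dtable[i] = EMPTY;
}

int h1(int k) { return k % M; }
int h2(int k) { return 1 + k % (M - 1); }   /* assumed second hash; range 1..M-1 */

/* Probe H(k, i) = (H1(k) + i * H2(k)) % M for i = 0, 1, 2, ...
   Returns the bucket used, or -1 if the table is full. */
int dh_insert(int key)
{
    for (int i = 0; i < M; i++) {
        int idx = (h1(key) + i * h2(key)) % M;
        if (dtable[idx] == EMPTY) {
            dtable[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

With the twelve linear-probing keys, 45 (H1 = 11, H2 = 14) lands in bucket 8 on its second probe, where linear probing needed nine.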

Data Structure for Chaining


The idea of chaining is to combine linked lists with the hash table to solve the overflow problem.

Hashing with Chains

A hash table can handle overflows using chaining. Each bucket keeps a chain of all pairs for which it is the home bucket. The chain may or may not be sorted by key.

Hash Table with Sorted Chains

Put in pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45. Home bucket = key % 17.
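This is a sketch of sorted chains for the keys above, assuming non-negative integer keys and a singly linked list per bucket. `Node`, `chains`, `chain_insert` and `chain_get` are hypothetical names. Because each chain stays sorted, an unsuccessful search can stop as soon as it meets a larger key.

```c
#include <stdlib.h>

#define B 17

typedef struct Node {
    int key;
    struct Node *next;
} Node;

Node *chains[B];                /* chains[i] heads the sorted chain for home bucket i */

/* Insert key into its home chain, keeping the chain in ascending order.
   Returns 0 on success, 1 if the key is already present, -1 if malloc fails. */
int chain_insert(int key)
{
    Node **link = &chains[key % B];
    while (*link != NULL && (*link)->key < key)
        link = &(*link)->next;  /* advance to the first node with a key >= key */
    if (*link != NULL && (*link)->key == key)
        return 1;               /* duplicate */
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->next = *link;            /* splice the new node in before *link */
    *link = n;
    return 0;
}

/* Search the home chain; stop early once a larger key is seen. */
Node *chain_get(int key)
{
    Node *p = chains[key % B];
    while (p != NULL && p->key < key)
        p = p->next;
    return (p != NULL && p->key == key) ? p : NULL;
}
```

After the twelve keys are inserted, bucket 11 holds the chain 11 -> 28 -> 45 and bucket 0 holds 0 -> 34.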
