Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Hash

Download as pdf or txt
Download as pdf or txt
You are on page 1of 5

2 Data Structure, Hashing, and Heap

• Hashing: Many applications require a dynamic set that supports only the
dictionary operations INSERT, SEARCH, and DELETE.
• The best strategy for searching so far is O(logn) for a sorted input.
• assume each element has a key drawn from the universe set U = {0, 1, ..., m−
1},we can use containers, or bins, each indexed by possible values of the
key.
• Ex. Student ID, University.

• 1: procedure Direct-Address-Search(T, k)
2: return T [k]
3: end procedure
4: procedure Direct-Address-Insert(T, x)
5: T [x.key] = x
6: end procedure
7: procedure Direct-Address-Delete(T, x)
8: T [x.key] = N il
9: end procedure

• Runtime:O(1)
• Downsides:
– If the universe U is large, a table T of size |U | may be impracti-
cal/impossible.
– Ex: student ID and university.
• When the set K of keys stored in a dictionary is much smaller than the
universe U of all possible keys, a hash table requires much less storage
than a direct address. table.

8
• We use hash function: h(k) : U → {0, 1, ..., m − 1}. m is the size of hash
table, m << |U |
• Collision: hash function reduces the size of the array, instead we might
have two keys that hash to the same slot.

• Chain hashing: we place all the elements that hash to the same slot into
the same linked list.

• 1: procedure Chained-Address-Search(T, k)
2: search for an element with key k in list T [h(k)]
3: end procedure
4: procedure Direct-Address-Insert(T, x)
5: Insert x at the head of list T [h(x.key)]
6: end procedure
7: procedure Direct-Address-Delete(T, x)
8: Delete x from the list T [h(x.key)]
9: end procedure

• Worst runtime for insertion is O(1)

• Worst runtime for searching depends on the length of the list.


• We can delete an element in O(1) if the lists are doubly linked./ with single
linked lists, both deletion and search would have the same asymptotic
running time.


• Given a hash table T with m slots that stores n elements, we define the
load factor α for T as n/m, that is, the average number of elements stored
in a chain.
• Worst case Θ(n)

9
• Simple uniform hashing: any given element is equally likely to hash
into any of the m slots, independently of where any other element has
hashed to.
• Theorem: In a hash table in which collisions are resolved by chaining, an
unsuccessful search takes average-case time Θ(1+α) under the assumption
of simple uniform hashing.
• Theorem: In a hash table in which collisions are resolved by chaining, a
successful search takes average-case time Θ(1 + α) under the assumption
of simple uniform hashing

• If the number of hash-table slots is at least proportional to the number of


elements in the table, we have n = O(m) and, consequently, α = n/m =
O(m)/m = O(1) Thus, searching takes constant time on average.
• Ex Find common elements:

– 1: for i ← 1 to n do
2: search for A[i] in B
3: end for

– linear search: O(n) → O(n2 )


– Binary search: (sorted) O(logn) → O(nlogn)
– Hashing: O(1) → O(n)

• 1: for i ← 1 to n do
2: add B[i] to T
3: end for
4: for i ← 1 to n do
5: search for A[i] in T
6: end for

• Prefix example: In: array A, is there a continuous sub-array of A that


sum of all elements of sub array is 0

– {1, 4,-2,-2, 5, −4, 3}


– {1, 5,3,1, 6, 2, 5}

10
• 1: P [1] = A[1]
2: Add P [1] to hash table T .
3: for i ← 2 to n do
4: P [i] = P [i − 1] + A[i]
5: search P [i] in T
6: if Found then return T rue
7: else
8: Add P [i] to hash table T .
9: end if
10: end forreturn F alse

• Hash Function:A good hash function satisfies (approximately) the as-


sumption of simple uniform hashing: each key is equally likely to hash to
any of the m slots, independently of where any other key has hashed to.
• Division method: map a key k into one of m slots by taking the remain-
der of k divided by m, h(k) = k mod m
• Ex.
– m = 12
– k = 100
– h(100) = 100mod12 = 4
• For the division method, we avoid certain values for m,for example
– m = 2p since h(k) will be the p lowest-order bit of k.
– m = 2p − 1 when k is a character string interpreted in radix 2p , since
permutation of characters of k does not change the hash value.
• A prime not too close to an exact power of 2 is often a good choice for m
• Multiplication method: First, we multiply the key k by a constant A
in the range 0 < A < 1 and extract the fractional part of kA. Then,
we multiply this value by m and take the floor of the result. h(k) =
⌊m(kAmod1)⌋

11
• Typically, m = 2p for some integer p
• Ex.
– k = 123
– n = 100
– A = 0.618033
– h(123) = 100(123.0.618033mod1)
– = 100(76.018059mod1)
– = 100(0.018059) = 1

• Java hashcode() method for string: Java hashcode method returns


the hashcode of a string. The hashcode value of an empty string is Zero.
• The formula behind the hashcode is

s[0] ∗ 31(n−1) + s[1] ∗ 31(n−2) + ... + s(n − 2)

where s[i] is the ith character of the string and n is the length of the
string.

12

You might also like