Hash
• Hashing: Many applications require a dynamic set that supports only the
dictionary operations INSERT, SEARCH, and DELETE.
• The best search time seen so far is O(log n), on sorted input.
• Direct addressing: assume each element has a key drawn from the universe
U = {0, 1, ..., m − 1}. We can use an array of containers, or bins, with one
slot indexed by each possible value of the key.
• Ex. Student ID, University.
• procedure Direct-Address-Search(T, k)
      return T[k]
  end procedure
  procedure Direct-Address-Insert(T, x)
      T[x.key] = x
  end procedure
  procedure Direct-Address-Delete(T, x)
      T[x.key] = Nil
  end procedure
• Runtime: O(1) for each operation.
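The direct-address operations above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the class name `DirectAddressTable` and the `(key, value)` interface are assumptions made for the example, and `None` stands in for Nil.

```python
class DirectAddressTable:
    """Direct-address table for integer keys in {0, 1, ..., m - 1}."""

    def __init__(self, m):
        # One slot per possible key; None plays the role of Nil.
        self.slots = [None] * m

    def search(self, key):
        # Direct-Address-Search: a single array access.
        return self.slots[key]

    def insert(self, key, value):
        # Direct-Address-Insert: store the element in the slot named by its key.
        self.slots[key] = value

    def delete(self, key):
        # Direct-Address-Delete: reset the slot to Nil.
        self.slots[key] = None


table = DirectAddressTable(10)
table.insert(3, "Alice")
found = table.search(3)      # element stored under key 3
table.delete(3)
missing = table.search(3)    # slot is empty again
```

Each operation touches exactly one array cell, which is where the O(1) bound comes from. The cost is the array itself, which needs one slot for every key in the universe.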
• Downsides:
– If the universe U is large, a table T of size |U| may be impractical
or impossible to store.
– Ex: student ID and university.
• When the set K of keys stored in a dictionary is much smaller than the
universe U of all possible keys, a hash table requires much less storage
than a direct-address table.
• We use a hash function h : U → {0, 1, ..., m − 1}, where m is the size of
the hash table and m ≪ |U|.
• Collision: because the hash function maps a large universe onto only m
slots, two distinct keys may hash to the same slot.
• Chain hashing: we place all the elements that hash to the same slot into
the same linked list.
• procedure Chained-Hash-Search(T, k)
      search for an element with key k in list T[h(k)]
  end procedure
  procedure Chained-Hash-Insert(T, x)
      insert x at the head of list T[h(x.key)]
  end procedure
  procedure Chained-Hash-Delete(T, x)
      delete x from the list T[h(x.key)]
  end procedure
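A minimal Python sketch of hashing with chaining is shown here. Several details are assumptions made for the example: the class name `ChainedHashTable`, the use of `collections.deque` in place of a linked list, the `(key, value)` interface, and the hash function `hash(key) % m`, which is only a stand-in for h.

```python
from collections import deque


class ChainedHashTable:
    """Hash table with m slots; collisions are resolved by chaining."""

    def __init__(self, m):
        self.m = m
        # One chain per slot; a deque stands in for the linked list.
        self.slots = [deque() for _ in range(m)]

    def _h(self, key):
        # Assumed hash function: built-in hash reduced modulo m.
        return hash(key) % self.m

    def insert(self, key, value):
        # Insert at the head of the chain, without checking for duplicates.
        self.slots[self._h(key)].appendleft((key, value))

    def search(self, key):
        # Scan only the chain that the key hashes to.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # Remove the first element in the chain with a matching key.
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return


t = ChainedHashTable(4)
t.insert(1, "a")
t.insert(5, "b")          # 1 and 5 both land in slot 1 when m = 4
value_1 = t.search(1)
value_5 = t.search(5)
t.delete(1)
after_delete = t.search(1)
```

Keys 1 and 5 collide in this example, so both sit in the same chain and a search walks that chain until it finds a matching key.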
• Given a hash table T with m slots that stores n elements, we define the
load factor α for T as n/m, that is, the average number of elements stored
in a chain.
• Worst case: a search takes Θ(n) time, when all n keys hash to the same slot
and form a single chain of length n.
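To make the load factor and the worst case concrete, the sketch below builds chains for the same 12 keys under two hash functions. The helper `build_chains` and both hash functions are hypothetical and chosen only for illustration: one spreads the keys evenly, the other sends every key to slot 0.

```python
def build_chains(keys, m, h):
    """Distribute keys into m chains according to hash function h."""
    chains = [[] for _ in range(m)]
    for k in keys:
        chains[h(k)].append(k)
    return chains


keys = list(range(12))   # n = 12
m = 4

alpha = len(keys) / m    # load factor n/m

# A hash function that spreads these keys evenly over the slots.
spread = build_chains(keys, m, lambda k: k % m)

# A degenerate hash function: every key collides in slot 0.
degenerate = build_chains(keys, m, lambda k: 0)

longest_spread = max(len(c) for c in spread)
longest_degenerate = max(len(c) for c in degenerate)
```

Both tables have the same load factor, but in the degenerate table one chain holds all n keys, so a search may have to scan every element.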
• Simple uniform hashing: any given element is equally likely to hash
into any of the m slots, independently of where any other element has
hashed to.
• Theorem: In a hash table in which collisions are resolved by chaining, an
unsuccessful search takes average-case time Θ(1+α) under the assumption
of simple uniform hashing.
• Theorem: In a hash table in which collisions are resolved by chaining, a
successful search takes average-case time Θ(1 + α) under the assumption
of simple uniform hashing.
– for i ← 1 to n do
      search for A[i] in B
  end for
• for i ← 1 to n do
      add B[i] to T
  end for
  for i ← 1 to n do
      search for A[i] in T
  end for
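The two loops above appear to answer the question of which elements of A also occur in B. Assuming that reading, a short Python sketch follows. The function name `members_found` is hypothetical, and Python's built-in `set` (itself a hash table) plays the role of T.

```python
def members_found(A, B):
    """For each element of A, report whether it also occurs in B."""
    # Build the hash table from B: n insertions.
    T = set()
    for b in B:
        T.add(b)
    # Look up each element of A: n searches.
    return [a in T for a in A]


result = members_found([1, 2, 3], [3, 4, 1])
```

Searching B directly costs a linear scan per element of A when B is unsorted, so Θ(n²) comparisons in the worst case. With the hash table, the n insertions and n searches take O(n) expected time in total under simple uniform hashing.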
• P[1] = A[1]
  add P[1] to hash table T
  for i ← 2 to n do
      P[i] = P[i − 1] + A[i]
      search for P[i] in T
      if found then
          return True
      else
          add P[i] to hash table T
      end if
  end for
  return False
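The prefix-sum procedure above can be sketched in Python as follows. The function name `has_repeated_prefix_sum` is an assumption, as is the use of a `set` for the hash table T. The sketch follows the pseudocode step by step and returns True exactly when two prefix sums are equal.

```python
def has_repeated_prefix_sum(A):
    """Return True if two prefix sums of A are equal."""
    if not A:
        return False
    seen = set()          # hash table T of prefix sums seen so far
    prefix = A[0]         # P[1] = A[1]
    seen.add(prefix)
    for x in A[1:]:
        prefix += x       # P[i] = P[i - 1] + A[i]
        if prefix in seen:
            return True
        seen.add(prefix)
    return False


repeated = has_repeated_prefix_sum([1, 2, -2, 5])   # prefix sums 1, 3, 1
no_repeat = has_repeated_prefix_sum([1, 2, 3])      # prefix sums 1, 3, 6
```

If P[i] = P[j] for i < j, then the elements A[i+1], ..., A[j] sum to zero, so a repeated prefix sum signals a zero-sum run of consecutive elements. As written, the procedure does not report a prefix that itself sums to zero; adding 0 to the table before the loop would cover that case.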
• Typically, m = 2^p for some integer p.
• Ex.
– k = 123
– m = 100
– A = 0.618033
– h(123) = ⌊100 · (123 · 0.618033 mod 1)⌋
– = ⌊100 · (76.018059 mod 1)⌋
– = ⌊100 · 0.018059⌋ = ⌊1.8059⌋ = 1
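The worked example above can be checked with a few lines of Python. The function name `multiplication_hash` is hypothetical. The sketch assumes the hash is computed as h(k) = ⌊m · (k · A mod 1)⌋, with the constant A = 0.618033 taken from the example.

```python
import math


def multiplication_hash(k, m, A=0.618033):
    """Hash key k into {0, ..., m - 1} as floor(m * (k * A mod 1))."""
    frac = (k * A) % 1.0         # fractional part of k * A
    return math.floor(m * frac)  # scale to the table size and round down


slot = multiplication_hash(123, 100)
```

For k = 123 and m = 100 this reproduces the slot computed by hand. Because the fractional part always lies in [0, 1), the result is always a valid index between 0 and m − 1.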
where s[i] is the ith character of the string and n is the length of the
string.