CLRS Linked Lists
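\begin{align*}
\mathrm{E}\left[\frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}X_{ij}\right)\right]
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}\mathrm{E}[X_{ij}]\right)
   && \text{(by linearity of expectation)} \\
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}\frac{1}{m}\right) \\
&= 1+\frac{1}{nm}\sum_{i=1}^{n}(n-i) \\
&= 1+\frac{1}{nm}\left(\sum_{i=1}^{n}n-\sum_{i=1}^{n}i\right) \\
&= 1+\frac{1}{nm}\left(n^{2}-\frac{n(n+1)}{2}\right) \\
&= 1+\frac{n-1}{2m} \\
&= 1+\frac{\alpha}{2}-\frac{\alpha}{2n}.
\end{align*}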
Thus, the total time required for a successful search (including the time for
computing the hash function) is $\Theta(2 + \alpha/2 - \alpha/(2n)) = \Theta(1 + \alpha)$.
What does this analysis mean? If the number of hash-table slots is at least
proportional to the number of elements in the table, we have $n = O(m)$ and,
consequently, $\alpha = n/m = O(m)/m = O(1)$. Thus, searching takes constant time
on average. Since insertion takes $O(1)$ worst-case time and deletion takes $O(1)$
worst-case time when the lists are doubly linked, we can support all dictionary
operations in $O(1)$ time on average.
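As a sanity check on the derivation above, the following Python sketch evaluates the sum $\frac{1}{n}\sum_{i=1}^{n}\bigl(1+\sum_{j=i+1}^{n}\frac{1}{m}\bigr)$ term by term in exact rational arithmetic and compares it with the closed form $1 + \alpha/2 - \alpha/(2n)$. The function names are illustrative and do not come from the text.

```python
from fractions import Fraction


def expected_examined_sum(n: int, m: int) -> Fraction:
    """Evaluate (1/n) * sum_{i=1}^{n} (1 + sum_{j=i+1}^{n} 1/m) term by term."""
    total = Fraction(0)
    for i in range(1, n + 1):
        # Each of the n - i keys inserted after key i lands in the same
        # slot as key i with probability 1/m under simple uniform hashing.
        total += 1 + Fraction(n - i, m)
    return total / n


def expected_examined_closed(n: int, m: int) -> Fraction:
    """Closed form 1 + alpha/2 - alpha/(2n), with load factor alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - alpha / (2 * n)


# The two expressions agree exactly for every (n, m) pair tried here.
for n, m in [(1, 1), (10, 7), (100, 25)]:
    assert expected_examined_sum(n, m) == expected_examined_closed(n, m)
```

Using `Fraction` rather than floating point keeps the comparison exact, so the equality check is not affected by rounding.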
Exercises
11.2-1
Suppose we use a hash function $h$ to hash $n$ distinct keys into an array $T$ of
length $m$. Assuming simple uniform hashing, what is the expected number of
collisions? More precisely, what is the expected cardinality of
$\{\{k, l\} : k \ne l \text{ and } h(k) = h(l)\}$?
11.2-2
Demonstrate what happens when we insert the keys 5, 28, 19, 15, 20, 33, 12, 17, 10
into a hash table with collisions resolved by chaining. Let the table have 9 slots,
and let the hash function be $h(k) = k \bmod 9$.
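One way to explore this exercise is to run the insertions mechanically. The sketch below is a minimal illustration, not a reference solution: it assumes that each new key is placed at the head of its chain (consistent with $O(1)$ worst-case insertion), and it uses Python lists as stand-ins for linked lists. The name `chained_insert_all` is hypothetical.

```python
def chained_insert_all(keys, m):
    """Insert keys into an m-slot table with chaining; new keys go at the head."""
    table = [[] for _ in range(m)]
    for k in keys:
        # Assumption: head insertion, mirroring O(1) insertion into a linked list.
        table[k % m].insert(0, k)
    return table


table = chained_insert_all([5, 28, 19, 15, 20, 33, 12, 17, 10], 9)
for slot, chain in enumerate(table):
    print(slot, chain)
```

Under the head-insertion assumption, slot 1 ends up holding 10, 19, 28 in that order, and slot 6 holds 33, 15. Inserting at the tail instead would reverse the order within each chain but leave the slot assignments unchanged.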
11.2-3
Professor Marley hypothesizes that he can obtain substantial performance gains by
modifying the chaining scheme to keep each list in sorted order. How does the
professor's modification affect the running time for successful searches, unsuccessful
searches, insertions, and deletions?
11.2-4
Suggest how to allocate and deallocate storage for elements within the hash table
itself by linking all unused slots into a free list. Assume that one slot can store
a flag and either one element plus a pointer or two pointers. All dictionary and
free-list operations should run in $O(1)$ expected time. Does the free list need to be
doubly linked, or does a singly linked free list suffice?
11.2-5
Suppose that we are storing a set of $n$ keys into a hash table of size $m$. Show that if
the keys are drawn from a universe $U$ with $|U| > nm$, then $U$ has a subset of size $n$
consisting of keys that all hash to the same slot, so that the worst-case searching
time for hashing with chaining is $\Theta(n)$.
11.2-6
Suppose we have stored $n$ keys in a hash table of size $m$, with collisions resolved by
chaining, and that we know the length of each chain, including the length $L$ of the
longest chain. Describe a procedure that selects a key uniformly at random from
among the keys in the hash table and returns it in expected time $O(L \cdot (1 + 1/\alpha))$.
11.3 Hash functions
In this section, we discuss some issues regarding the design of good hash functions
and then present three schemes for their creation. Two of the schemes, hashing by
division and hashing by multiplication, are heuristic in nature, whereas the third
scheme, universal hashing, uses randomization to provide provably good performance.
What makes a good hash function?
A good hash function satisfies (approximately) the assumption of simple uniform
hashing: each key is equally likely to hash to any of the m slots, independently of
where any other key has hashed to. Unfortunately, we typically have no way to
check this condition, since we rarely know the probability distribution from which
the keys are drawn. Moreover, the keys might not be drawn independently.
Occasionally we do know the distribution. For example, if we know that the
keys are random real numbers $k$ independently and uniformly distributed in the
range $0 \le k < 1$, then the hash function
$h(k) = \lfloor km \rfloor$
satisfies the condition of simple uniform hashing.
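The hash function $h(k) = \lfloor km \rfloor$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `hash_unit_interval`, the table size `m = 8`, the sample count, and the fixed seed are all choices made here for demonstration and do not come from the text.

```python
import math
import random


def hash_unit_interval(k: float, m: int) -> int:
    """h(k) = floor(k * m) for a real key k with 0 <= k < 1."""
    if not 0.0 <= k < 1.0:
        raise ValueError("key must lie in [0, 1)")
    return math.floor(k * m)


# Hash uniformly distributed keys and count how many land in each slot.
rng = random.Random(0)  # fixed seed, chosen only so the run is repeatable
m = 8
counts = [0] * m
for _ in range(80_000):
    counts[hash_unit_interval(rng.random(), m)] += 1
print(counts)
```

With keys drawn uniformly from $[0, 1)$, each slot corresponds to an interval of width $1/m$, so the counts should come out roughly equal, which is what simple uniform hashing requires.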
In practice, we can often employ heuristic techniques to create a hash function
that performs well. Qualitative information about the distribution of keys may be
useful in this design process. For example, consider a compiler's symbol table, in
which the keys are character strings representing identifiers in a program. Closely
related symbols, such as pt and pts, often occur in the same program. A good
hash function would minimize the chance that such variants hash to the same slot.
A good approach derives the hash value in a way that we expect to be independent
of any patterns that might exist in the data. For example, the division method
(discussed in Section 11.3.1) computes the hash value as the remainder when the
key is divided by a specified prime number. This method frequently gives good
results, assuming that we choose a prime number that is unrelated to any patterns
in the distribution of keys.
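To illustrate why the choice of modulus matters, the sketch below applies the division method to a set of keys that share a pattern. The key set (multiples of 4) and the two table sizes (16 and 17) are hypothetical values picked for the demonstration, and `hash_division` is an illustrative name.

```python
def hash_division(k: int, m: int) -> int:
    """Division method: the hash value is the remainder of k divided by m."""
    return k % m


# Hypothetical key set with a pattern: every key is a multiple of 4.
keys = [4 * i for i in range(1000)]

# m = 16 shares the factor 4 with every key, so only a few slots are used.
slots_composite = {hash_division(k, 16) for k in keys}
# m = 17 is prime and unrelated to the pattern, so every slot is used.
slots_prime = {hash_division(k, 17) for k in keys}

print(len(slots_composite), len(slots_prime))  # prints "4 17"
```

With $m = 16$ the keys reach only slots 0, 4, 8, and 12, whereas with the prime $m = 17$ they spread over all 17 slots.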
Finally, we note that some applications of hash functions might require stronger
properties than are provided by simple uniform hashing. For example, we might
want keys that are close in some sense to yield hash values that are far apart.
(This property is especially desirable when we are using linear probing, defined in
Section 11.4.) Universal hashing, described in Section 11.3.3, often provides the
desired properties.