Quicksort PDF
(CLRS 7)
• We previously saw how the divide-and-conquer technique can be used to design a sorting
algorithm: Merge-sort.
• Another possibility is to divide the elements such that there is no need for merging, that is:
– Partition A[1..n] into subarrays A' = A[1..q] and A'' = A[q+1..n] such that all elements
in A'' are larger than all elements in A'.
– Recursively sort A' and A''.
– (Nothing to combine/merge: A is already sorted after sorting A' and A''.)
If q = n/2 and we divide in Θ(n) time, we again get the recurrence T(n) = 2T(n/2) + Θ(n)
for the running time ⇒ T(n) = Θ(n log n).
The problem is that it is hard to develop a partition algorithm that always divides A into two
halves.
Partition(A, p, r)
    x = A[r]
    i = p − 1
    FOR j = p TO r − 1 DO
        IF A[j] ≤ x THEN
            i = i + 1
            Exchange A[i] and A[j]
        FI
    OD
    Exchange A[i + 1] and A[r]
    RETURN i + 1
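The pseudocode above can be sketched in Python as follows. This is a minimal sketch, assuming 0-based Python lists instead of the 1-based arrays used in these notes, so q = 4 in the notes corresponds to index 3 here. The names `partition` and `quicksort` are chosen for illustration; the recursive driver follows the standard scheme of partitioning and then sorting both sides of the pivot.

```python
def partition(a, p, r):
    """Partition a[p..r] (inclusive) around the pivot a[r]; return the pivot's final index."""
    x = a[r]                          # pivot: last element of the subarray
    i = p - 1                         # a[p..i] holds the elements <= x seen so far
    for j in range(p, r):             # j runs over p, ..., r - 1
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # move the pivot between the two regions
    return i + 1


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place by partitioning and recursing on both sides of the pivot."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)


data = [2, 8, 7, 1, 3, 5, 6, 4]
q = partition(data, 0, len(data) - 1)
print(q, data)        # 3 [2, 1, 3, 4, 7, 5, 6, 8]
quicksort(data)
print(data)           # [1, 2, 3, 4, 5, 6, 7, 8]
```

Nothing is merged after the two recursive calls: once both sides of the pivot are sorted, the whole subarray is sorted.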
Quicksort correctness:
• Easy to show, inductively, if Partition works correctly.
• Example (the pivot is x = A[8] = 4):
2 8 7 1 3 5 6 4 i=0, j=1
2 8 7 1 3 5 6 4 i=1, j=2
2 8 7 1 3 5 6 4 i=1, j=3
2 8 7 1 3 5 6 4 i=1, j=4
2 1 7 8 3 5 6 4 i=2, j=5
2 1 3 8 7 5 6 4 i=3, j=6
2 1 3 8 7 5 6 4 i=3, j=7
2 1 3 8 7 5 6 4 i=3, j=8
2 1 3 4 7 5 6 8 q=4
• Partition can be proved correct (by induction) using the loop invariant:
– A[k] ≤ x for p ≤ k ≤ i
– A[k] > x for i + 1 ≤ k ≤ j − 1
– A[k] = x for k = r
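The loop invariant can also be checked mechanically. The sketch below is an illustration only, assuming 0-based Python lists and a hypothetical helper name `partition_checked`; it asserts the three conditions before every iteration and once more at termination (j = r).

```python
def partition_checked(a, p, r):
    """Partition a[p..r] around a[r], asserting the loop invariant along the way."""
    x = a[r]
    i = p - 1

    def invariant_holds(j):
        # a[p..i] <= x, a[i+1..j-1] > x, and the pivot is still at a[r].
        return (all(a[k] <= x for k in range(p, i + 1))
                and all(a[k] > x for k in range(i + 1, j))
                and a[r] == x)

    for j in range(p, r):
        assert invariant_holds(j)     # holds at the start of every iteration
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    assert invariant_holds(r)         # termination: a[p..r-1] is fully classified
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


data = [2, 8, 7, 1, 3, 5, 6, 4]
print(partition_checked(data, 0, 7), data)   # 3 [2, 1, 3, 4, 7, 5, 6, 8]
```

At termination the invariant says that a[p..i] ≤ x and a[i+1..r−1] > x, so the final exchange puts the pivot exactly between the two regions.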
Quicksort analysis
• In the worst case, Partition always returns q = p or q = r and the running time becomes
T(n) = Θ(n) + T(0) + T(n − 1) ⇒ T(n) = Θ(n²).
– and what is maybe even worse, the worst case is when A is already sorted.
– Even if all the splits are relatively bad, we get Θ(n log n) time:
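∗ Example: Split is (9/10)n, (1/10)n.
\[ T(n) = T\left(\tfrac{9}{10}n\right) + T\left(\tfrac{1}{10}n\right) + n \]
Solution?
Guess: T(n) ≤ cn log n
Induction:
\[
\begin{aligned}
T(n) &= T\left(\tfrac{9}{10}n\right) + T\left(\tfrac{1}{10}n\right) + n \\
&\le \frac{9cn}{10}\log\left(\frac{9n}{10}\right) + \frac{cn}{10}\log\left(\frac{n}{10}\right) + n \\
&= \frac{9cn}{10}\log n + \frac{9cn}{10}\log\frac{9}{10} + \frac{cn}{10}\log n + \frac{cn}{10}\log\frac{1}{10} + n \\
&= cn\log n + \frac{9cn}{10}\log 9 - \frac{9cn}{10}\log 10 - \frac{cn}{10}\log 10 + n \\
&= cn\log n - n\left(c\log 10 - \frac{9c}{10}\log 9 - 1\right)
\end{aligned}
\]
T(n) ≤ cn log n if c log 10 − (9c/10) log 9 − 1 > 0, which is definitely true if c > 10/log 10.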
– So, if the splits happen at a constant fraction of n, we get Θ(n lg n). In other words,
it's almost never bad!
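The bound for the 9/10 : 1/10 split can be checked numerically. The sketch below rests on assumptions not stated in the notes: the split sizes are rounded to integers (floor(9n/10) and the remainder), T(n) = 0 for n ≤ 1, and logarithms are base 2. The name `t_skewed` is hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def t_skewed(n):
    """T(n) = T(floor(9n/10)) + T(n - floor(9n/10)) + n, with T(n) = 0 for n <= 1."""
    if n <= 1:
        return 0
    big = (9 * n) // 10               # size of the larger side of the split
    return t_skewed(big) + t_skewed(n - big) + n


# Any constant c > 10 / log 10 satisfies the condition derived above.
c = 10 / math.log2(10) + 0.01
for n in (10, 1_000, 100_000, 10_000_000):
    bound = c * n * math.log2(n)
    print(n, t_skewed(n), round(bound), t_skewed(n) <= bound)
```

Under these assumptions the last column should print True for every n, which is what the induction predicts for this choice of c.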
• If we run quicksort on a set of inputs that are all almost sorted, the average running time
will be close to the worst-case.
• Similarly, if we run quicksort on a set of inputs that give good splits, the average running
time will be close to the best-case.
• If we run quicksort on a set of inputs which are picked uniformly at random from the space
of all possible input permutations, then the average case will also be close to the best-case.
Why? Intuitively, if any input ordering is equally likely, then we expect at least as many good
splits as bad splits, therefore on the average a bad split will be followed by a good split, and
it gets “absorbed” in the good split.
So, under the assumption that all input permutations are equally likely, the average time of
Quicksort is Θ(n lg n) (intuitively). Is this assumption realistic?
• Not really. In many cases the input is almost sorted; think, for example, of rebuilding
indexes in a database.
The question is: how can we make Quicksort have a good average time irrespective of the
input distribution?
• Using randomization.
Randomization
We consider what we call randomized algorithms, that is, algorithms that make some random choices
during their execution.
• Running time of a randomized algorithm depends not only on input but also on the random
choices made by the algorithm.
• Randomized algorithms have best-case and worst-case running times, but the inputs for which
these are achieved are not known; they can be any of the inputs.
We are normally interested in analyzing the expected running time of a randomized algorithm,
that is, the expected (average) running time, taken over the algorithm's random choices, for
inputs of size n.
Randomized Quicksort
• We can enforce that all n! permutations are equally likely by randomly permuting the input
before the algorithm.
• Alternatively, we can modify Partition slightly and exchange the last element in A with a
random element in A before partitioning.
RandPartition(A, p, r)
    i = Random(p, r)
    Exchange A[r] and A[i]
    RETURN Partition(A, p, r)

RandQuicksort(A, p, r)
    IF p < r THEN
        q = RandPartition(A, p, r)
        RandQuicksort(A, p, q − 1)
        RandQuicksort(A, q + 1, r)
    FI
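A minimal Python sketch of RandPartition and RandQuicksort, assuming 0-based lists and using `random.randint`, which includes both endpoints, in the role of Random(p, r). The `rng` parameter is an addition made here so that runs can be reproduced with a seeded generator; it is not part of the pseudocode.

```python
import random


def partition(a, p, r):
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def rand_partition(a, p, r, rng):
    """Swap a uniformly random element of a[p..r] into position r, then partition."""
    i = rng.randint(p, r)             # randint includes both p and r
    a[r], a[i] = a[i], a[r]
    return partition(a, p, r)


def rand_quicksort(a, p=0, r=None, rng=random):
    """Sort a[p..r] in place, choosing every pivot at random."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = rand_partition(a, p, r, rng)
        rand_quicksort(a, p, q - 1, rng)
        rand_quicksort(a, q + 1, r, rng)


data = list(range(20))                # already sorted: bad for the fixed last-element pivot
rand_quicksort(data, rng=random.Random(42))
print(data == sorted(data))           # True
```

Because the pivot is chosen by the algorithm rather than dictated by the input order, an already sorted input is no worse than any other input.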
• Running time of RandQuicksort is the total running time spent in all Partition calls.
• One call of Partition takes O(1) time plus time proportional to the number of iterations of
FOR-loop.
⇓
If X is the number of comparisons A[j] ≤ x performed in Partition over the entire execution
of RandQuicksort then the running time is O(n + X).
⇓
E[T(n)] = E[O(n + X)] = O(n + E[X])
⇓
To analyze the expected running time we need to compute E[X]
• Let z_i denote the i-th smallest element of A. Each pair of elements z_i and z_j is compared
at most once (when either of them is the pivot).
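⇓
\[
X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}
\quad\text{where}\quad
X_{ij} =
\begin{cases}
1 & \text{if } z_i \text{ is compared to } z_j \\
0 & \text{if } z_i \text{ is not compared to } z_j
\end{cases}
\]
⇓
\[
\begin{aligned}
E[X] &= E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}\right] \\
&= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}] \\
&= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr[z_i \text{ compared to } z_j]
\end{aligned}
\]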
• To compute Pr[z_i compared to z_j] it is useful to consider when two elements are not compared.
Let Z_ij = {z_i, z_(i+1), ..., z_j} be the set of elements between z_i and z_j, inclusive.
In general, once a pivot x with z_i < x < z_j is chosen, we know that z_i and z_j cannot later be
compared.
On the other hand, if z_i is chosen as pivot before any other element in Z_ij, then it is compared
to each element in Z_ij. Similarly for z_j.
In the example: 7 and 9 are compared because 7 is the first item from Z_7,9 to be chosen as pivot,
and 2 and 9 are not compared because the first pivot in Z_2,9 is 7.
Prior to an element in Z_ij being chosen as pivot, the whole set Z_ij is together in the same partition
⇒ any element in Z_ij is equally likely to be the first element chosen as pivot ⇒ the probability
that z_i is chosen first in Z_ij is 1/(j − i + 1), and likewise for z_j.
⇓
\[ \Pr[z_i \text{ compared to } z_j] = \frac{2}{j-i+1} \]
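The probability 2/(j − i + 1) can be confirmed exactly for small n by conditioning on the rank of the first pivot. The sketch below assumes distinct elements and a pivot drawn uniformly from the current subarray, as in RandPartition; the function name `pr_compared` and the use of ranks lo..hi to describe a subarray are choices made for this illustration.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def pr_compared(lo, hi, i, j):
    """Exact probability that ranks i < j are compared while sorting ranks lo..hi."""
    total = Fraction(0)
    for k in range(lo, hi + 1):                     # k = rank of the chosen pivot
        if k == i or k == j:
            total += 1                              # pivot is z_i or z_j: compared now
        elif k < i:
            total += pr_compared(k + 1, hi, i, j)   # both end up right of the pivot
        elif k > j:
            total += pr_compared(lo, k - 1, i, j)   # both end up left of the pivot
        # i < k < j: the pivot separates them, so this case contributes 0
    return total / (hi - lo + 1)


n = 7
for i in range(1, n + 1):
    for j in range(i + 1, n + 1):
        assert pr_compared(1, n, i, j) == Fraction(2, j - i + 1)
print(pr_compared(1, n, 2, 5))                      # 1/2
```

The result does not depend on lo and hi, which matches the argument above: only the first pivot chosen from within Z_ij matters.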
• We now have:
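\[
\begin{aligned}
E[X] &= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr[z_i \text{ compared to } z_j] \\
&= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1} \\
&= \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} \frac{2}{k+1} \\
&< \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} \frac{2}{k} \\
&= \sum_{i=1}^{n-1} O(\log n) \\
&= O(n \log n)
\end{aligned}
\]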
• Since the best case is Θ(n lg n), E[X] = Ω(n lg n); combined with the upper bound above,
E[X] = Θ(n lg n) and therefore E[T(n)] = Θ(n lg n).
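To get a feel for the constants, the double sum for E[X] can be evaluated exactly and compared with the harmonic-sum upper bound used in the derivation. This is a small sketch using exact fractions; the helper names `expected_comparisons`, `harmonic`, and `upper_bound` are illustrative.

```python
from fractions import Fraction


def expected_comparisons(n):
    """E[X] = sum over 1 <= i < j <= n of 2 / (j - i + 1), as an exact fraction."""
    return sum((Fraction(2, j - i + 1)
                for i in range(1, n)
                for j in range(i + 1, n + 1)), Fraction(0))


def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m, as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, m + 1)), Fraction(0))


def upper_bound(n):
    """The bound from the derivation: sum over i of sum_{k=1}^{n-i} 2/k."""
    return sum((2 * harmonic(n - i) for i in range(1, n)), Fraction(0))


for n in (2, 3, 10, 100):
    print(n, float(expected_comparisons(n)), float(upper_bound(n)))
```

For n = 3 the exact value is 1 + 1 + 2/3 = 8/3: the two adjacent pairs are always compared, and z_1 and z_3 are compared with probability 2/3.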
Next time we will see how to make quicksort run in worst-case O(n log n) time.