Mergesort / Quicksort
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/skiena
Mergesort
Recursive algorithms are based on reducing large problems
into small ones.
A nice recursive approach to sorting involves partitioning
the elements into two groups, sorting each of the smaller
problems recursively, and then interleaving the two sorted
lists to totally order the elements.
Mergesort Implementation
mergesort(item_type s[], int low, int high)
{
    int i;                  /* counter */
    int middle;             /* index of middle element */

    if (low < high) {
        middle = (low + high) / 2;
        mergesort(s, low, middle);
        mergesort(s, middle + 1, high);
        merge(s, low, middle, high);
    }
}
Mergesort Animation
Splitting:
M E R G E S O R T
M E R G E           S O R T
M E R     G E       S O     R T
M E     R
M     E

Merging:
E M
E M R
E G
E E G M R
O S
R T
O R S T
E E G M O R R S T
Buffering
Although mergesort is O(n lg n), it is inconvenient to
implement with arrays, since we need extra space to merge
the lists.
Merging (4, 5, 6) and (1, 2, 3) would overwrite the first three
elements if they were packed in an array.
Writing the merged list to a buffer and recopying it uses extra
space but not extra time (in the big Oh sense).
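The merge routine called by mergesort is not shown on these slides.
A minimal sketch in C, assuming an item_type of int and using a
temporary buffer that is recopied exactly as just described, might
look like this:

/* Sketch only: merge the sorted halves s[low..middle] and
   s[middle+1..high] into one sorted run, using a temporary
   buffer (extra space, but only linear extra work). */

#include <stdlib.h>

typedef int item_type;                        /* assumed element type */

void merge(item_type s[], int low, int middle, int high)
{
    int i = low, j = middle + 1, k = 0;
    item_type *buffer = malloc((high - low + 1) * sizeof(item_type));

    while (i <= middle && j <= high)          /* take the smaller head */
        buffer[k++] = (s[i] <= s[j]) ? s[i++] : s[j++];
    while (i <= middle) buffer[k++] = s[i++]; /* drain the left half   */
    while (j <= high)   buffer[k++] = s[j++]; /* drain the right half  */

    for (k = 0, i = low; i <= high; i++, k++) /* recopy into place     */
        s[i] = buffer[k];

    free(buffer);
}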
External Sorting
Which O(n log n) algorithm you use for sorting doesn't
matter much until n is so big that the data does not fit in
memory.
Mergesort proves to be the basis for the most efficient
external sorting programs.
Disks are much slower than main memory, and benefit from
algorithms that read and write data in long streams rather
than by random access.
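To make the streaming idea concrete, here is a rough C sketch (not
from the slides; the file names and one-integer-per-line format are
invented for the example) of the core of an external merge: two
already-sorted runs on disk are merged into one longer run, touching
each file in a single sequential pass.

/* Sketch only: merge two sorted runs of integers (one per line)
   stored in files into a single sorted output file, reading each
   input in one long sequential stream.  Error handling omitted. */

#include <stdio.h>
#include <stdbool.h>

static bool read_next(FILE *f, int *x)        /* pull the next key, if any */
{
    return fscanf(f, "%d", x) == 1;
}

void merge_runs(const char *run1, const char *run2, const char *out_name)
{
    FILE *a = fopen(run1, "r");
    FILE *b = fopen(run2, "r");
    FILE *out = fopen(out_name, "w");
    int x, y;
    bool have_x = read_next(a, &x), have_y = read_next(b, &y);

    while (have_x && have_y) {                /* stream out the smaller head */
        if (x <= y) { fprintf(out, "%d\n", x); have_x = read_next(a, &x); }
        else        { fprintf(out, "%d\n", y); have_y = read_next(b, &y); }
    }
    while (have_x) { fprintf(out, "%d\n", x); have_x = read_next(a, &x); }
    while (have_y) { fprintf(out, "%d\n", y); have_y = read_next(b, &y); }

    fclose(a); fclose(b); fclose(out);
}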
Quicksort
In practice, the fastest internal sorting algorithm is Quicksort,
which uses partitioning as its main idea.
Example: pivot about 10.
Before: 17 12 6 19 23 8 5 10
After: 6 8 5 10 23 19 12 17
Partitioning places all the elements less than the pivot in the
left part of the array, and all elements greater than the pivot in
the right part of the array. The pivot fits in the slot between
them.
Note that the pivot element ends up in the correct place in the
total order!
Why Partition?
Since the partitioning step consists of at most n swaps, it
takes time linear in the number of keys. But what does it buy us?
1. The pivot element ends up in the position it retains in the
final sorted order.
2. After a partitioning, no element flops to the other side of
the pivot in the final sorted order.
Thus we can sort the elements to the left of the pivot and the
right of the pivot independently, giving us a recursive sorting
algorithm!
Quicksort Pseudocode
Sort(A)
    Quicksort(A, 1, n)

Quicksort(A, low, high)
    if (low < high)
        pivot-location = Partition(A, low, high)
        Quicksort(A, low, pivot-location - 1)
        Quicksort(A, pivot-location + 1, high)
Partition Implementation
Partition(A, low, high)
    pivot = A[low]
    leftwall = low
    for i = low+1 to high
        if (A[i] < pivot) then
            leftwall = leftwall + 1
            swap(A[i], A[leftwall])
    swap(A[low], A[leftwall])
    return(leftwall)
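Translated into runnable C (a sketch, assuming int keys; the names
quicksort and partition simply mirror the pseudocode), with the
earlier pivot-about-10 array as a test. Note that the slide's
partitioning example pivoted about the last element, while this code
pivots about A[low], so the intermediate states differ even though
the final order is the same.

#include <stdio.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Partition a[low..high] about the pivot a[low]; return the pivot's
   final index (the "leftwall" of the pseudocode). */
static int partition(int a[], int low, int high)
{
    int pivot = a[low];
    int leftwall = low;

    for (int i = low + 1; i <= high; i++)
        if (a[i] < pivot)
            swap(&a[i], &a[++leftwall]);
    swap(&a[low], &a[leftwall]);       /* drop the pivot into its slot */
    return leftwall;
}

static void quicksort(int a[], int low, int high)
{
    if (low < high) {
        int p = partition(a, low, high);
        quicksort(a, low, p - 1);
        quicksort(a, p + 1, high);
    }
}

int main(void)
{
    int a[] = {17, 12, 6, 19, 23, 8, 5, 10};   /* the example array */
    int n = sizeof(a) / sizeof(a[0]);

    quicksort(a, 0, n - 1);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);                   /* 5 6 8 10 12 17 19 23 */
    printf("\n");
    return 0;
}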
Quicksort Animation
Q U I C K S O R T
Q I C K S O R T U
Q I C K O R S T U
I C K O Q R S T U
I C K O Q R S T U
I C K O Q R S T U
Intuition: The Average Case for Quicksort
[Figure: the sorted positions 1 ... n/4 ... n/2 ... 3n/4 ... n; the center half runs from n/4 to 3n/4]
Half the time, the pivot element will be from the center half
of the sorted array.
Whenever the pivot element is from positions n/4 to 3n/4, the
larger remaining subarray contains at most 3n/4 elements.
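A short added note (not spelled out on the slide): if every pivot
landed in that center half, the larger subproblem would shrink by a
factor of at least 3/4 at every level, so the recursion would bottom
out after at most
    \left(\tfrac{3}{4}\right)^d n = 1 \;\Longrightarrow\; d = \log_{4/3} n \approx 2.41 \lg n
levels, and with O(n) partitioning work per level that gives
O(n lg n) total work.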
Average-Case Analysis of Quicksort
We will use one fact about the Harmonic numbers, namely
    H_n = \sum_{i=1}^{n} \frac{1}{i} \approx \ln n
It is important to understand (1) where the recurrence relation
comes from and (2) how the log comes out from the
summation. The rest is just messy algebra.
    T(n) = \sum_{p=1}^{n} \frac{1}{n}\,\bigl(T(p-1) + T(n-p)\bigr) + n - 1
    T(n) = \frac{2}{n} \sum_{p=1}^{n} T(p-1) + n - 1
    nT(n) = 2 \sum_{p=1}^{n} T(p-1) + n(n-1)                 (multiply by n)
    (n-1)T(n-1) = 2 \sum_{p=1}^{n-1} T(p-1) + (n-1)(n-2)     (apply to n-1)
Subtracting and dividing by n(n+1) gives
    \frac{T(n)}{n+1} = \frac{T(n-1)}{n} + \frac{2(n-1)}{n(n+1)}
so substituting a_n = T(n)/(n+1) yields
    a_n = a_{n-1} + \frac{2(n-1)}{n(n+1)} \approx 2 \sum_{i=1}^{n} \frac{1}{i+1} \approx 2 \ln n
We are really interested in A(n), the average time, so
    A(n) = (n+1)\,a_n \approx 2(n+1)\ln n = \Theta(n \lg n)
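As a rough empirical check (not part of the original notes), one can
count the key comparisons quicksort makes on random permutations and
compare the average against 2 n ln n; the constants here (n = 10000,
200 trials, the fixed seed) are arbitrary choices for the experiment.

/* Sketch only: count quicksort's key comparisons on random
   permutations and compare the average with 2 n ln n. */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static long comparisons;                      /* global comparison counter */

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

static int partition(int a[], int low, int high)
{
    int pivot = a[low], leftwall = low;

    for (int i = low + 1; i <= high; i++) {
        comparisons++;                        /* one key comparison */
        if (a[i] < pivot)
            swap(&a[i], &a[++leftwall]);
    }
    swap(&a[low], &a[leftwall]);
    return leftwall;
}

static void quicksort(int a[], int low, int high)
{
    if (low < high) {
        int p = partition(a, low, high);
        quicksort(a, low, p - 1);
        quicksort(a, p + 1, high);
    }
}

int main(void)
{
    const int n = 10000, trials = 200;
    int *a = malloc(n * sizeof(int));
    srand(12345);

    comparisons = 0;
    for (int t = 0; t < trials; t++) {
        for (int i = 0; i < n; i++) a[i] = i;   /* start sorted ...   */
        for (int i = n - 1; i > 0; i--)         /* ... then shuffle   */
            swap(&a[i], &a[rand() % (i + 1)]);
        quicksort(a, 0, n - 1);
    }
    printf("average comparisons: %.0f\n", (double)comparisons / trials);
    printf("2 n ln n           : %.0f\n", 2.0 * n * log(n));
    free(a);
    return 0;
}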
Randomized Quicksort
Suppose you are writing a sorting program to run on data
given to you by your worst enemy. Quicksort is good on
average, but bad on certain worst-case instances.
If you used Quicksort, what kind of data would your enemy
give you to run it on? Exactly the worst-case instance, to
make you look bad.
But instead of picking the median of three or the first element
as pivot, suppose you picked the pivot element at random.
Now your enemy cannot design a worst-case instance to give
to you, because no matter which data they give you, you
would have the same probability of picking a good pivot!
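One common way to realize this (a sketch; the names
randomized_partition and randomized_quicksort are mine, not the
slides'): pick a uniformly random index in [low, high], swap that
element into position low, and then run the ordinary
pivot-about-A[low] partition unchanged.

/* Sketch only: randomized pivot selection layered on the leftwall
   partition.  Swapping a random element into position low means the
   deterministic partition code needs no other changes. */

#include <stdlib.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

static int partition(int a[], int low, int high)   /* as shown earlier */
{
    int pivot = a[low];
    int leftwall = low;

    for (int i = low + 1; i <= high; i++)
        if (a[i] < pivot)
            swap(&a[i], &a[++leftwall]);
    swap(&a[low], &a[leftwall]);
    return leftwall;
}

static int randomized_partition(int a[], int low, int high)
{
    int r = low + rand() % (high - low + 1);   /* uniform index in [low, high] */
    swap(&a[low], &a[r]);                      /* random element becomes pivot */
    return partition(a, low, high);
}

void randomized_quicksort(int a[], int low, int high)
{
    if (low < high) {
        int p = randomized_partition(a, low, high);
        randomized_quicksort(a, low, p - 1);
        randomized_quicksort(a, p + 1, high);
    }
}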
Randomized Guarantees
Randomization is a very important and useful idea. By either
picking a random pivot or scrambling the permutation before
sorting it, we can say:
With high probability, randomized quicksort runs in
Θ(n lg n) time.
Where before, all we could say is:
If you give me random input data, quicksort runs in
expected Θ(n lg n) time.
Importance of Randomization
Since the time bound does not depend upon your input
distribution, this means that unless we are extremely unlucky
(as opposed to ill prepared or unpopular) we will certainly get
good performance.
Randomization is a general tool to improve algorithms with
bad worst-case but good average-case complexity.
The worst case is still there, but we almost certainly won't
see it.