Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Lecture Notes On Sorting: 15-122: Principles of Imperative Computation Frank Pfenning September 18, 2012

Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

Lecture Notes on

Sorting

15-122: Principles of Imperative Computation


Frank Pfenning

Lecture 7
September 18, 2012

1 Introduction
We begin this lecture by discussing how to compare running times of func-
tions in an abstract, mathematical way. The same underlying mathematics
can be used for other purposes, like comparing memory consumption or
the amount of parellism permitted by an algorithm. We then use this to
take a first look at sorting algorithms, of which there are many. In this lec-
ture it will be selection sort because of its simplicity.
In terms of our learning goals, we will work on:
Computational Thinking: Still trying to understand how order can lead
to efficient computation. Worst-case asymptotic complexity of func-
tions.

Algorithms and Data Structures: In-place sorting of arrays in general, and


selection sort in particular. Big-O notation.

Programming: More examples of programming with arrays and algorithm


invariants.

2 Big-O Notation
Our brief analysis in the last lecture already indicates that linear search
should take about n iterations of a loop while binary search take about
log2 (n) iterations, with a constant number of operations in each loop body.
This suggests that binary search should more efficient. In the design and

L ECTURE N OTES S EPTEMBER 18, 2012


Sorting L7.2

analysis of algorithms we try to make this mathematically precise by deriv-


ing so-called asymptotic complexity measures for algorithms. There are two
fundamental principles that guide our mathematical analysis.
1. We only care about the behavior of an algorithm on large inputs, that
is, when it takes a long time. It is when the inputs are large that differ-
ences between algorithms become really pronounced. For example,
linear search on a 10-element array will be practically the same as bi-
nary search on a 10-element array, but once we have an array of, say,
a million entries the difference will be huge.

2. We do not care about constant factors in the mathematical analysis.


For example, in analyzing the search algorithms we count how of-
ten we have to iterate, not exactly how many operations we have to
perform on each iteration. In practice, constant factors can make a
big difference, but they are influenced by so many factors (compiler,
runtime system, machine model, available memory, etc.) that at the
abstract, mathematical level a precise analysis is neither appropriate
nor feasible.
Let’s see how these two fundamental principles guide us in the comparison
between functions that measure the running time of an algorithm.
Let’s say we have functions f and g that measure the number of oper-
ations of an algorithm as a function of the size of the input. For example
f (n) = 3 ∗ n measures the number of comparisons performed in linear
search for an array of size n, and g(n) = 3 ∗ log(n) measures the number of
comparisons performed in binary search for an array of size n.
The simplest form of comparison would be
g ≤0 f if for every n ≥ 0, g(n) ≤ f (n).
However, this violates principle (1) because we compare the values and g
and f on all possible inputs n.
We can refine this by saying that eventually, g will always be smaller or
equal to f . We express “eventually” by requiring that there be a number n0
such that g(n) ≤ f (n) for all n that are greater than n0 .
g ≤1 f if there is some n0 such that for every n ≥ n0 it is the case
that g(n) ≤ f (n).
This now incorporates the first principle (we only care about the func-
tion on large inputs), but constant factors still matter. For example, accord-
ing to the last definition we have 3 ∗ n ≤1 5 ∗ n but 5 ∗ n 6≤1 3 ∗ n. But if

L ECTURE N OTES S EPTEMBER 18, 2012


Sorting L7.3

constants factors don’t matter, then the two should be equivalent. We can
repair this by allowing the right-hand side to be multiplied by an arbitrary
constant.

g ≤2 f if there is a constant c > 0 and some n0 such that for


every n ≥ n0 we have g(n) ≤ c ∗ f (n).

This definition is now appropriate.


The less-or-equal symbol ≤ is already overloaded with many meanings,
so we write instead:

g ∈ O(f ) if there is a constant c > 0 and some n0 such that for


every n ≥ n0 we have g(n) ≤ c ∗ f (n).

This notation derives from the view of O(f ) as a set of functions, namely
those that eventually are smaller than a constant times f .1 Just to be ex-
plicit, we also write out the definition of O(f ) as a set of functions:

O(f ) = {g | there are c > 0 and n0 s.t. for all n ≥ n0 , g(n) ≤ c ∗ f (n)}

With this definition we can check that O(f (n)) = O(c ∗ f (n)).
When we characterize the running time of a function using big-O nota-
tion we refer to it as the asymptotic complexity of the function. Here, asymp-
totic refers to the fundamental principles listed above: we only care about
the function in the long run, and we ignore constant factors. Usually, we
use an analysis of the worst case among the inputs of a given size. Trying
to do average case analysis is much harder, because it depends on the distri-
bution of inputs. Since we often don’t know the distribution of inputs it is
much less clear whether an average case analysis may apply in a particular
use of an algorithm.
The asymptotic worst-case time complexity of linear search is O(n),
which we also refer to as linear time. The worst-case asymptotic time com-
plexity of binary search is O(log(n)), which we also refer to as logarithmic
time. Constant time is usually described as O(1), expressing that the running
time is independent of the size of the input.
Some brief fundamental facts about big-O. For any polynomial, only
the highest power of n matters, because it eventually comes to dominate the
function. For example, O(5∗n2 +3∗n+83) = O(n2 ). Also O(log(n)) ⊆ O(n),
but O(n) 6⊆ O(log(n)).
1
In textbooks and research papers you may sometimes see this written as g = O(f ) but
that is questionable, comparing a function with a set of functions.

L ECTURE N OTES S EPTEMBER 18, 2012


Sorting L7.4

That is the same as to say O(log(n)) ( O(n), which means that O(log(n))
is a proper subset of O(n), that is, O(log(n)) is a subset (O(log(n)) ⊆ O(n)),
but they are not equal (O(log(n)) 6= O(n)). Logarithms to different (con-
stant) bases are asymptotically the same: O(log2 (n)) = O(logb (n)) because
logb (n) = log2 (n)/log2 (b).
As a side note, it is mathematically correct to say the worst-case running
time of binary search is O(n), because log(n) ∈ O(n). It is, however, a
looser characterization than saying that the running time of binary search
is O(log(n)), which is also correct. Of course, it would be incorrect to say
that the running time is O(1). Generally, when we ask you to characterize
the worst-case running time of an algorithm we are asking for the tightest
bound in big-O notation.

3 Sorting Algorithms
We have seen in the last lecture that sorted arrays drastically reduce the
time to search for an element when compared to unsorted arrays. Asymp-
totically, it is the difference between O(n) (linear time) and O(log(n)) (loga-
rithmic time), where n is the length of the input array. This suggests that it
may be important to establish this invariant, namely sorting a given array.
In practice, this is indeed the case: sorting is an important component of
many other data structures or algorithms.
There are many different algorithms for sorting: bucket sort, bubble
sort, insertion sort, selection sort, heap sort, etc. This is testimony to the
importance and complexity of the problem, despite its apparent simplicity.
In this lecture we discuss selection sort, which is one of the simplest
algorithms. In the next lecture we will discuss quicksort. Earlier course in-
stances used mergesort as another example of efficient sorting algorithms.

4 Selection Sort
Selection sort is based on the idea that on each iteration we select the small-
est element of the part of the array that has not yet been sorted and move it
to the end of the sorted part at the beginning of the array.
Let’s play this through for two steps on an example array. Initially, we
consider the whole array (from i = 0 to the end). We write this as A[0..n),
that is the segment of the array starting at 0 up to n, where n is excluded.

L ECTURE N OTES S EPTEMBER 18, 2012


Sorting L7.5

0   1   2   3   4   5   6   7   8   9   10  
A   12   87   21   3   2   78   97   16   89   21  

i  =  0   n  

We now find the minimal element of the array segment under consid-
eration (2) and move it to the front of the array. What do we do with the
element that is there? We move it to the place where 2 was (namely at A[4]).
In other words, we swap the first element with the minimal element. Swap-
ping is a useful operation when we sorting an array in place by modifying
it, because the result is clearly a permutation of the input. If swapping is
our only operation we are immediately guaranteed that the result is a per-
mutation of the input.

0   1   2   3   4   5   6   7   8   9   10  
A   2   87   21   3   12   78   97   16   89   21  

i   n  

Now 2 is in the right place, and we find the smallest element in the
remaining array segment and move it to the beginning of the segment (i =
1).

L ECTURE N OTES S EPTEMBER 18, 2012


Sorting L7.6

0   1   2   3   4   5   6   7   8   9   10  
A   2   3   21   87   12   78   97   16   89   21  

i   n  

Let’s pause and see if we can write down properties of the variables and
array segments that allow us to write the code correctly. First we observe
rather straightforwardly that

0≤i≤n

where i = n after the last iteration and i = 0 before the first iteration. Next
we observe that the elements to the left of i are already sorted.

A[0..i) sorted

These two invariants are not yet sufficient to prove the correctness of selec-
tion sort. We also need to know that all elements to the left of i are less or
equal to all element to the right of i. We abbreviate this:

A[0..i) ≤ A[i..n)

saying that every element in the left segment is smaller than or equal to
every element in the right segment.

L ECTURE N OTES S EPTEMBER 18, 2012


Sorting L7.7

We summarize the invariants


0≤i≤n
A[0..i) sorted
A[0..i) ≤ A[i..n)

Let’s reason through without any code (for the moment), why these invari-
ants are preserved. Let’s look at the picture again.

0   1   2   3   4   5   6   7   8   9   10  
A   2   3   21   87   12   78   97   16   89   21  

i   n  

In the next iteration we pick the minimal element among A[i..n), which
would be 12 = A[4]. We now swap this to i = 2 and increment i. We write
here i0 = i + 1 in order to distinguish the old value of i from the new one,
as we do in proofs of preservation of the loop invariant.

0   1   2   3   4   5   6   7   8   9   10  
A   2   3   12   87   21   78   97   16   89   21  

i   i’  =  i+1   n  

Since we only step when i < n, the bounds on i are preserved.


Why is A[0..i+1) sorted? We know by the third invariant that any ele-
ment in A[0..i) is less than any element in A[i..n) and in particular the one
we moved to A[i+1].
Why is A[0..i+1) ≤ A[i+1..n)? We know from the loop invariant before
the iteration that A[0..i) ≤ A[i+1..n). So it remains to show that A[i..i+1] ≤
A[i+1..n). But that is true since A[i] was a minimal element of A[i..n) which
is the same as saying that it is smaller or equal to all the elements in A[i..n)
and therefore also A[i+1..n) after we swap the old A[i] into its new position.

L ECTURE N OTES S EPTEMBER 18, 2012


Sorting L7.8

5 Programming Selection Sort


From the above invariants and description of the algorithm, the correct
code is simple to write, including its invariants. The function does not
return a value, since it modifies the given array A, so it has declaration:

void sort(int[] A, int n)


//@requires 0 <= n && n <= \length(A);
//@ensures is_sorted(A, 0, n);
;

We encourage you to now write the function, using the following aux-
iliary and contract functions:

1. is_sorted(A, lower, upper) which is true if the array segment


A[lower ..upper ) is sorted.

2. le_segs(A, lower1, upper1, lower2, upper2) which is true if


A[lower 1 ..upper 1 ) ≤ A[lower 2 ..upper 2 ) (which means all elements in
the first segment are less or equal to the all elements in the second
array segment).

3. swap(A, i, j) modifies the array A by swapping A[i] with A[j]. Of


course, if i = j, the array remains unchanged.

4. min_index(A, lower, upper) which returns the index m of a mini-


mal element in the segment A[lower ..upper ).

Please write it and then compare it to our version on the next page.

L ECTURE N OTES S EPTEMBER 18, 2012


Sorting L7.9

void sort(int[] A, int n)


//@requires 0 <= n && n <= \length(A);
//@ensures is_sorted(A, 0, n);
{
for (int i = 0; i < n; i++)
//@loop_invariant 0 <= i && i <= n;
//@loop_invariant is_sorted(A, 0, i);
//@loop_invariant le_segs(A, 0, i, i, n);
{
int m = min_index(A, i, n);
//@assert le_seg(A[m], A, i, n);
swap(A, i, m);
}
return;
}

At this point, let us verify that the loop invariants are initially satisfied.

• 0 ≤ i and i ≤ n since i = 0 and 0 ≤ n (by precondition (@requires)).

• A[0..i) is sorted, since for i = 0 the segment A[0..0) is empty (has no


elements) since the right bound is exclusive.

• A[0..i) ≤ A[i..n) is true since for i = 0 the segment A[0..0) has no


elements. The other segment, A[0..n), is the whole array.

We should also verify the assertion we added in the loop body. It ex-
presses that A[m] is less or equal to any element in the segment A[i..n),
abbreviated mathematically as A[m] ≤ A[i..n). This should be implies by
the postcondition of the min_index function.
How can we prove the postcondition (@ensures) of the sorting func-
tion? By the loop invariant 0 ≤ i ≤ n and the negation of the loop condition
i ≥ n we know i = n. The second loop invariant then states that A[0..n) is
sorted, which is the postcondition.

L ECTURE N OTES S EPTEMBER 18, 2012


Sorting L7.10

6 Auxiliary Functions
Besides the specification functions in contracts, we also used two auxiliary
functions: swap and min_index.
Here is the implementation of swap.

void swap(int[] A, int i, int j)


//@requires 0 <= i && i < \length(A);
//@requires 0 <= j && j < \length(A);
{
int tmp = A[i];
A[i] = A[j];
A[j] = tmp;
return;
}

For min_index, we recommend you follow the method used for selec-
tion sort: follow the algorithm for a couple of steps on a generic example,
write down the invariants in general terms, and then synthesize the simple
code and invariants from the result. What we have is below, for complete-
ness.

int min_index(int[] A, int lower, int upper)


//@requires 0 <= lower && lower < upper && upper <= \length(A);
//@ensures lower <= \result && \result < upper;
//@ensures le_seg(A[\result], A, lower, upper);
{
int m = lower;
int min = A[lower];
for (int i = lower+1; i < upper; i++)
//@loop_invariant lower < i && i <= upper;
//@loop_invariant le_seg(min, A, lower, i);
//@loop_invariant A[m] == min;
if (A[i] < min) {
m = i;
min = A[i];
}
return m;
}

L ECTURE N OTES S EPTEMBER 18, 2012


Sorting L7.11

7 Asymptotic Complexity Analysis


Previously, we have had to expend some effort to prove that functions actu-
ally terminate (like in the function for greatest common divisor). Here we
do more: we do counting in order to give a big-O classification of the num-
ber of operations. If we have an explicit bound on the number of operations
that, of course, implies termination.
The outer loop iterates n times, from i = 0 to i = n − 1. Actually, we
could stop one iteration earlier, but that does not effect the asymptotic com-
plexity, since it only involves a constant number of additional operations.
For each iteration of the outer loop (identified by the value for i), we
do a linear search through the array segment to the right of i and then a
simple swap. The linear search will take n − i iterations, and cannot be
easily improved since the array segment A[i..n) is not (yet) sorted. So the
total number of iterations (counting the number of inner iterations for each
outer one)
n(n + 1)
n + (n − 1) + (n − 2) + · · · + 1 =
2
During each of these iterations, we only perform a constant amount of op-
erations (some comparisons, assignments, and increments), so, asymptoti-
cally, the running time can be estimated as

n(n + 1) n2 n
O( ) = O( + ) = O(n2 )
2 2 2
The last equation follows since for a polynomial, as we remarked earlier,
only the degree matters.
We summarize this by saying that the worst-case running time of selec-
tion sort is quadratic. In this algorithm there isn’t a significant difference
between average case and worst case analysis: the number of iterations is
exactly the same, and we only save one or two assignments per iteration in
the loop body of the min_index function if the array is already sorted.

L ECTURE N OTES S EPTEMBER 18, 2012


Sorting L7.12

8 Empirical Validation
If the running time were really O(n2 ) and not asymptotically faster, we pre-
dict the following: for large inputs, its running time should be essentially
cn2 for some constant c. If we double the size of the input to 2n, then the
running time should roughly become c(2n)2 = 4(cn2 ) which means the
function should take approximately 4 times as many seconds as before.
We try this with the function sort_time(n, r) which generates a ran-
dom array of size n and then sorts it r times. You can find the C0 code at
sort-time.c0. We run this code several times, with different parameters.

% cc0 selectsort.c0 sort-time.c0


% time ./a.out -n 1000 -r 100
Timing array of size 1000, 100 times
0
0.700u 0.001s 0:00.70 100.0% 0+0k 0+0io 0pf+0w
% time ./a.out -n 2000 -r 100
Timing array of size 2000, 100 times
0
2.700u 0.001s 0:02.70 100.0% 0+0k 0+0io 0pf+0w
% time ./a.out -n 4000 -r 100
Timing array of size 4000, 100 times
0
10.790u 0.002s 0:10.79 100.0% 0+0k 0+0io 0pf+0w
% time ./a.out -n 8000 -r 100
Timing array of size 8000, 100 times
0
42.796u 0.009s 0:42.80 99.9% 0+0k 0+0io 0pf+0w
%

Calculating the ratios of successive running times, we obtain

n Time Ratio
1000 0.700
2000 2.700 3.85
4000 10.790 4.00
8000 42.796 3.97

We see that especially for the larger numbers, the ratio is almost exactly 4
when doubling the size of the input. Our conjecture of quadratic asymp-
totic running time has been experimentally confirmed.

L ECTURE N OTES S EPTEMBER 18, 2012

You might also like