
Sorting and Heaps

ADC/ADS

Rom Langerak
E-mail: r.langerak@utwente.nl

© GfH&JPK&RL
Sorting and Heaps ADC/ADS

Plan for today

Several sorting algorithms:

• Insertion sort
• Quicksort
• Mergesort

Heaps:

• What is a heap?
• How to build a heap?
• Priority queues, heapsort


Why look at sorting?


We look at sorting because

• sorting is done frequently and has many applications


• ideas from sorting give insight into how to design and improve algorithms
• ingenious and optimal algorithms have been found

We assume the following

• the set to be sorted is organized as an array (and not a list)


• data objects are sorted in nondecreasing order of their keys
• the measure of work is the number of comparisons of keys

You already know selection sort and bubble sort; here we study other algorithms

Besides the algorithmic principles, we will focus on their analysis!


Insertion sort – Strategy

0 12 17 17 19 | 8 | 25 3 6 69 26 4 2 13 34 41

already sorted   next to be sorted   still unsorted

• Traverse the (unsorted) array from left-to-right

• Pick the first element that is not considered so far

• Insert this in sorted (left) part by doing element-wise comparisons

• This algorithm works for other linear structures such as lists


Insertion sort – Algorithm

def insertionSort(E):
    for j in range(1, len(E)):
        v = E[j]
        i = j - 1                 # now insert v into the sorted prefix E[0..j-1]
        while i >= 0 and E[i] > v:
            E[i + 1] = E[i]       # shift larger keys one slot to the right
            i = i - 1
        E[i + 1] = v

Insertion sort is in-place, i.e., it does not require extra storage
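As a quick sanity check, the algorithm above can be run on a small example (the input list is arbitrary):

```python
def insertionSort(E):
    # insert E[j] into the already sorted prefix E[0..j-1]
    for j in range(1, len(E)):
        v = E[j]
        i = j - 1
        while i >= 0 and E[i] > v:
            E[i + 1] = E[i]   # shift larger keys one slot to the right
            i = i - 1
        E[i + 1] = v

data = [25, 3, 69, 12, 4, 2, 13]
insertionSort(data)
print(data)  # [2, 3, 4, 12, 13, 25, 69]
```

Note that the array is sorted in place: no second array is allocated.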


Analysis of insertion sort – 1


In the worst case, each next element ends up at the front of the array after
being moved to its new position. This is the case when the elements appear in
reverse order.

• this requires a comparison with all preceding elements

• in the worst case, finding the slot for the element at index i takes i comparisons

This yields: W(n) = Σ_{i=0}^{n−1} i = n·(n−1)/2 ∈ Θ(n²)


Analysis of insertion sort – 2

Average case analysis assumptions:

• all permutations of the elements are equally likely to occur


• all elements to be sorted are distinct

A(n) = Σ_{i=0}^{n−1} X_i
where X_i = expected # comparisons to find the slot for element i

Further calculations (see book) lead to A(n) ≈ n·(n+3)/4 − ln n ∈ Θ(n²)

Insertion sort is quadratic in worst and average case!


Quicksort – Strategy
41 26 17 25 19 17 8 3 6 69 12 4 2 13 34 0        (pivot: 17)

partition:

8 3 6 12 4 2 13 0 | 17 | 41 26 17 69 25 19 34
     < pivot       pivot        ≥ pivot

• Choose an element from the array to be sorted – called pivot

• Partition array in two parts: (i) smaller than and (ii) at least the pivot
several partitioning strategies do exist

• Sort parts recursively and squeeze the pivot in between the sorted parts


This is a prime example of the divide-and-conquer paradigm! [Hoare 1962]


Quicksort – Algorithm

def quickSort(E, left, right):
    if right > left:
        i = partition(E, left, right)  # i is the split point
        quickSort(E, left, i - 1)      # sort left part
        quickSort(E, i + 1, right)     # sort right part


Partitioning for quicksort – 1

Once a pivot is selected, partitioning can be done in O(n), e.g.:

• maintain 3 regions: < pivot, ≥ pivot, and “unexamined”

• move the left bound to the right as long as the element is < pivot
• move the right bound to the left as long as the element is ≥ pivot
• swap the two elements encountered at the left and right bounds
• continue until the left and right searches meet

This partitioning algorithm differs from the one in the book!


Partitioning for quicksort – 2

8 6 [17] 25 19 0 4 3 13 2 [12] 26 69 41 34 (17)
    left bound             right bound    pivot

search: the left bound stops at 17 (≥ pivot), the right bound stops after 12 (< pivot)

swap:

8 6 [12] 25 19 0 4 3 13 2 [17] 26 69 41 34 (17)
    < pivot                ≥ pivot

search continues …


Partitioning for quicksort – Algorithm


def partition(E, left, right):
    i, j = left, right
    pivot = E[right]                      # pick some pivot
    while i < j:
        while E[i] < pivot and i < j:
            i = i + 1                     # move left bound
        while E[j - 1] >= pivot and i < j:
            j = j - 1                     # move right bound
        if i < j:
            E[i], E[j - 1] = E[j - 1], E[i]
            i = i + 1
            j = j - 1
    if pivot < E[i]:
        E[i], E[right] = E[right], E[i]
    return i
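Putting quickSort and partition together on a small example (the input array is arbitrary):

```python
def partition(E, left, right):
    i, j = left, right
    pivot = E[right]                      # pick the rightmost element as pivot
    while i < j:
        while E[i] < pivot and i < j:
            i = i + 1                     # move left bound
        while E[j - 1] >= pivot and i < j:
            j = j - 1                     # move right bound
        if i < j:
            E[i], E[j - 1] = E[j - 1], E[i]
            i = i + 1
            j = j - 1
    if pivot < E[i]:
        E[i], E[right] = E[right], E[i]   # put the pivot at the split point
    return i

def quickSort(E, left, right):
    if right > left:
        i = partition(E, left, right)
        quickSort(E, left, i - 1)
        quickSort(E, i + 1, right)

E = [41, 26, 17, 25, 19, 17, 8, 3, 6, 0]
quickSort(E, 0, len(E) - 1)
print(E)  # [0, 3, 6, 8, 17, 17, 19, 25, 26, 41]
```

Duplicate keys (the two 17s) end up in the ≥-pivot part, as the partitioning scheme prescribes.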


Partitioning – Correctness invariants

{ E[left..i−1] < pivot, E[j..right] ≥ pivot }
while (i < j)
{   while (E[i] < pivot && i < j) i = i + 1;
    { i = j or E[i] ≥ pivot;  E[left..i−1] < pivot }
    while (E[j−1] ≥ pivot && i < j) j = j − 1;
    { i = j or E[j−1] < pivot;  E[j..right] ≥ pivot }
    if (i < j)        // here E[i] ≥ pivot, E[j−1] < pivot, and so i ≠ j−1
    { swap(E[i], E[j−1]); i = i + 1; j = j − 1 };
    { E[left..i−1] < pivot, E[j..right] ≥ pivot }
}
{ E[left..i−1] < pivot, E[i..right] ≥ pivot }


Quicksort – Space usage

At first sight, quicksort looks like an in-place sorting algorithm. It is not:

Recursive calls require storage of all the left and right parameters

In the worst case, partition splits off only a single element at a time

In the worst case, Θ(n) stack storage is needed for n elements

An optimization is possible (check the book!) to obtain Θ(log n) storage


Quicksort – Worst-case analysis

In the worst case, the pivot is the smallest (or largest) element in the array

• the split into the smaller and larger part is as unbalanced as possible
• one part is empty, whereas the other part contains all remaining elements
• this occurs, e.g., when the array is already ascending (or descending)!
• this yields n−1 levels in the recursion tree

This yields: W(n) = Σ_{i=0}^{n−1} i = n·(n−1)/2 ∈ Θ(n²)

This is as bad as insertion sort, bubble sort, selection sort, and so on

So what is quick about quicksort?


Quicksort – Best-case analysis

Divide-and-conquer works best if division is as equal as possible

• split the array of n elements into two sub-arrays of size n/2


• this yields log n levels in the recursion tree

Partitioning is linear in the size; i.e., on each level this is O(n)

This yields B(n) = 2·B(n/2) + c·n for n > 1, and B(1) = 1

Applying the Master theorem yields: B(n) ∈ Θ(n· log n)

It turns out that this also holds for the average case: A(n) ∈ Θ(n· log n)


Mergesort – Strategy
41 26 17 25 19 17 8 3 | 6 69 12 4 2 13 34 0      (split at the middle)
  sort recursively        sort recursively

3 8 17 17 19 25 26 41 | 0 2 4 6 12 13 34 69
                 merge

0 2 3 4 6 8 12 13 17 17 19 25 26 34 41 69

• Do an optimal split: divide the array to be sorted in two halves

• Sort parts recursively

• Merge the sorted sub-arrays into a single sorted array


Yet another prime example of the divide-and-conquer paradigm!


Mergesort – Algorithm

def mergeSort(E, left, right):
    if right > left:
        mid = (right + left) // 2
        mergeSort(E, left, mid)
        mergeSort(E, mid + 1, right)
        merge(E, left, mid, right)

Merging can be done in linear time; How?
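One possible linear-time merge (a sketch; the book's version may differ): copy both sorted halves into temporaries and repeatedly move the smaller front element back. The temporary copies are exactly what makes mergesort need Θ(n) extra space.

```python
def merge(E, left, mid, right):
    A = E[left:mid + 1]        # copy of the sorted left half
    B = E[mid + 1:right + 1]   # copy of the sorted right half
    i = j = 0
    for k in range(left, right + 1):
        # take from A when B is exhausted or A's front element is not larger
        if j >= len(B) or (i < len(A) and A[i] <= B[j]):
            E[k] = A[i]
            i = i + 1
        else:
            E[k] = B[j]
            j = j + 1

E = [3, 8, 17, 25, 2, 4, 13, 69]   # two sorted halves of length 4
merge(E, 0, 3, 7)
print(E)  # [2, 3, 4, 8, 13, 17, 25, 69]
```

Each position of E[left..right] is filled after exactly one comparison (or none, when one half is exhausted), so the work is linear in the size of the merged range. Taking A[i] on ties keeps the merge stable.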


Mergesort – Analysis

For the worst-case behaviour we obtain:

W (n) = W (⌊n/2⌋) + W (⌈n/2⌉) + n − 1 with W (1) = 1

By the Master theorem, this yields W (n) ∈ Θ(n· log n)

Average-case behaviour is in Θ(n· log n)

Space usage: Θ(n) for the copy of the array at merging

More details can be found in the book


Can sorting be more efficient?


[Decision tree for sorting three elements E[0], E[1], E[2]: the root asks E[0] < E[1]?; depending on the yes/no answers, further comparisons E[1] < E[2]? and E[0] < E[2]? follow; each leaf corresponds to one permutation of the input.]
View a comparison-based sorting algorithm as a decision tree:

– the decision tree describes the sequence of comparisons carried out
– sorting a different permutation of the input yields another path in the tree
– # comparisons in the worst case = length of the longest path = level k of the tree
– as only binary comparisons are used, it is a binary tree with n! leaves
– n! ≤ 2^k, and thus k ≥ ⌈log(n!)⌉ comparisons are needed in the worst case

as ⌈log(n!)⌉ ≈ n· log n − 1.44·n, we cannot do any better than n· log n
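The lower bound can be checked numerically with the standard library (the sizes chosen are arbitrary):

```python
import math

# minimum worst-case number of comparisons for any comparison sort,
# compared with n*log2(n)
for n in [10, 100]:
    lb = math.ceil(math.log2(math.factorial(n)))
    print(n, lb, round(n * math.log2(n)))
```

For n = 10 the bound is 22 comparisons, already noticeably below 10·log₂ 10 ≈ 33, reflecting the −1.44·n term.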


Complexity of sorting algorithms

Algorithm        Worst case     Average        Space usage

Insertion sort   Θ(n²)          Θ(n²)          in place
Quicksort        Θ(n²)          Θ(n· log n)    Θ(log n) extra
Mergesort        Θ(n· log n)    Θ(n· log n)    Θ(n) extra


Trees in arrays
index:  0  1  2  3  4  5  6  7  8  9
key:   16 14 10  8  7  9  3  2  4  1

[Tree: 16 at the root with children 14 and 10; below them 8, 7 and 9, 3; the lowest level holds 2, 4 and 1, filled from the left.]

The elements in an array E can be seen as the nodes of a binary tree.

• E[0] is the root of the tree

• E[2i+1] is the left- and E[2i+2] is the right-child of E[i]
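The index arithmetic can be captured in small helper functions (the names left, right, and parent are my own, not from the slides):

```python
E = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]   # the tree drawn above

def left(i):   return 2 * i + 1
def right(i):  return 2 * i + 2
def parent(i): return (i - 1) // 2

# node 1 holds key 14; its children are E[3] = 8 and E[4] = 7,
# and its parent is the root E[0] = 16
print(E[left(1)], E[right(1)], E[parent(1)])  # 8 7 16
```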


Heaps

A binary tree with such an array representation has special properties:

• all leaves are on at most two adjacent levels


• all levels – except possibly the lowest – are completely filled
• all leaves on the lowest level occur “to the left”

Such a special binary tree is a heap if, for the keys in the nodes, either:

• each node's key is at least that of all its children: this is a maxheap, or
• each node's key is at most that of all its children: this is a minheap

The previous slide shows a maxheap.


Inserting an element in a maxheap

To add an element x to a maxheap H with n elements

• put x into the first (left-most) open position


• if x is greater than its parent, swap them, and repeat at a higher level

Doing such an insertion requires O(log n) comparisons

• the number of levels k of a heap of n elements is bounded: n ≤ 2^(k+1) − 1
  ⇒ k = ⌈log(n + 1)⌉ − 1

To re-order an array into a maxheap, do not use repeated insertions, but heapify (or fixheap) instead [Floyd 1964]
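The sift-up insertion described above can be sketched as follows (heapInsert is a hypothetical helper, not from the slides; the list simply grows, so no heapsize bookkeeping is shown, and building a heap by repeated insertion here is for illustration only):

```python
def heapInsert(E, x):
    # put x into the first open position, then let it "bubble up"
    E.append(x)
    i = len(E) - 1
    while i > 0 and E[(i - 1) // 2] < E[i]:
        E[i], E[(i - 1) // 2] = E[(i - 1) // 2], E[i]   # swap with parent
        i = (i - 1) // 2

H = []
for x in [8, 14, 2, 16, 7]:
    heapInsert(H, x)
print(H[0])  # 16: the maximum sits at the root
```

Each insertion walks at most one root-to-leaf path upward, giving the O(log n) bound.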


Heapify – Strategy

Consider E[i] and assume its left and right subtrees are heaps

• but E[i] may be smaller than its children

Construct a heap from E[i] and its left and right subtrees: let the value of E[i] “float down” until the structure rooted at E[i] is a heap

• determine the maximum of the values E[i] and of its children


• if E[i] is the largest, then the subtree rooted at it is a heap. Done.
• otherwise, swap E[i] with the largest and heapify the corresponding subtree


Heapify – Algorithm

def heapify(E, i):
    left, right = 2 * i + 1, 2 * i + 2
    if left < E.heapsize and E[left] > E[i]:
        max = left
    else:
        max = i
    if right < E.heapsize and E[right] > E[max]:
        max = right           # max is index of max{E[i], E[left], E[right]}
    if max != i:              # so not a heap
        E[i], E[max] = E[max], E[i]
        heapify(E, max)       # heapify subtree with root max


Heapify – Example

[Figure: Heapify(E,0) on a tree whose root holds 4 while both subtrees are already heaps. The 4 is swapped with its larger child 16 (recursive call Heapify(E,1)), then with 14 (Heapify(E,3)), then with 8; the resulting maxheap is 16 14 10 8 7 9 3 2 4 1.]


Constructing a maxheap – Algorithm


def buildHeap(E):
    E.heapsize = len(E)
    lastparent = (len(E) - 2) // 2
    for i in range(lastparent, -1, -1):
        heapify(E, i)

This algorithm can also be written in a recursive way (easier to analyse!):

def constructHeap(E, i):
    if 2 * i + 1 <= len(E) - 1:
        constructHeap(E, 2 * i + 1)
    if 2 * i + 2 <= len(E) - 1:
        constructHeap(E, 2 * i + 2)
    heapify(E, i)
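A small check of buildHeap; since a plain Python list cannot carry a heapsize attribute, a trivial list subclass is used here as a workaround (my own device, not from the slides):

```python
class HeapList(list):
    pass   # a list that can carry the heapsize attribute used by heapify

def heapify(E, i):
    left, right = 2 * i + 1, 2 * i + 2
    largest = left if left < E.heapsize and E[left] > E[i] else i
    if right < E.heapsize and E[right] > E[largest]:
        largest = right
    if largest != i:
        E[i], E[largest] = E[largest], E[i]
        heapify(E, largest)            # float the value further down

def buildHeap(E):
    E.heapsize = len(E)
    for i in range((len(E) - 2) // 2, -1, -1):   # from the last parent upward
        heapify(E, i)

E = HeapList([4, 1, 3, 2, 16, 9, 10, 14, 8, 7])
buildHeap(E)
print(list(E))  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```

The result is a maxheap: the maximum 16 sits at index 0, and every node dominates its children.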


Constructing a heap – Example

[Figure: buildHeap on the 10-element array 4 1 3 2 16 9 10 14 8 7. The last parent is at index 4, so Heapify(E,4) and Heapify(E,3) are applied first.]


[Figure, continued: Heapify(E,2), then Heapify(E,1) (which recursively calls Heapify(E,4)), and finally Heapify(E,0); the result is the maxheap 16 14 10 8 7 9 3 2 4 1.]

Constructing a maxheap – Complexity intuition

constructHeap splits the problem of creating a heap of n elements into two subproblems of creating heaps, each with half the number of elements, and applies heapify to merge the two into a single maxheap in log n steps.

W(n) = 2·W(n/2) + log n

Apply the Master theorem!

b = c = 2, so E = 1; and f(n) = log n, so f(n) ∈ O(n^(E−0.5)). The theorem says:

W(n) ∈ Θ(n)


Priority queues
Consider objects that are equipped with a key (or priority)

• assume each key is associated with at most one data object

Objects are ordered according to their priority


A priority queue pq stores a collection of such objects and supports:

• isEmpty( ) returns true if pq is empty, and false otherwise

• insert(e, k) inserts element e with key k into the queue pq
• getMin( ) returns the element with the smallest key; requires non-empty pq
• delMin( ) deletes the element with the smallest key; requires non-empty pq
• getElt(k) returns the object with key k in pq; requires k to be in pq
• decrKey(e, k) sets the key of e to k; requires e in pq and k < getKey(pq, e)

can be very efficiently implemented using heaps!
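A minimal sketch of the heap-based implementation using Python's standard heapq module (the operation names follow the slide's interface; getElt and decrKey are omitted, since a bare heap offers no efficient lookup by key):

```python
import heapq

class PriorityQueue:
    def __init__(self):
        self._heap = []          # minheap of (key, element) pairs

    def isEmpty(self):
        return len(self._heap) == 0

    def insert(self, e, k):
        heapq.heappush(self._heap, (k, e))   # O(log n)

    def getMin(self):
        return self._heap[0][1]              # O(1); requires non-empty pq

    def delMin(self):
        heapq.heappop(self._heap)            # O(log n); requires non-empty pq

pq = PriorityQueue()
pq.insert("a", 12)
pq.insert("b", 7)
pq.insert("c", 4)
print(pq.getMin())  # c  (smallest key is 4)
```

heapq maintains a minheap over the pairs, so tuple comparison on the key gives exactly the ordering the slide's interface requires.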


Priority queue – Unsorted bounded array implementation

[Figure: the queue is an unsorted array of element/key pairs between head and tail, e.g. 17/4 and 12/8. ins(0,12) and ins(3,7) simply append at the tail; delMin() and getElt(12) must scan the whole array. black = element; red = key]


Priority queue – Sorted bounded array implementation

[Figure: the same queue kept sorted on key, with the minimum at the head. delMin() simply moves the head; ins(0,12) and ins(3,7) must shift elements to keep the order; getElt(12) can use binary search.]


Comparing priority queue implementations

Implementation    Unsorted array   Sorted array   Heap
Operation

isEmpty( )        Θ(1)             Θ(1)           Θ(1)
insert(e, k)      Θ(1)             Θ(n)∗          Θ(log n)
getMin( )         Θ(n)             Θ(1)           Θ(1)
delMin( )         Θ(n)             Θ(1)           Θ(log n)
getElt(k)         Θ(n)             Θ(log n)       Θ(n)
decrKey(e, k)     Θ(1)             Θ(n)           Θ(log n)

∗ this includes shifting all elements “to the right” of k


Heapsort – Algorithm

Idea: you know where the maximum element of a (max)heap is.

def heapSort(E):
    buildHeap(E)
    for i in range(len(E) - 1, 0, -1):
        E[0], E[i] = E[i], E[0]        # move the current maximum to its final slot
        E.heapsize = E.heapsize - 1
        heapify(E, 0)

This algorithm sorts array E in nondecreasing order.
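A self-contained run of heapSort (again using a small list subclass so the array can carry a heapsize attribute; the input is arbitrary):

```python
class HeapList(list):
    pass   # a list that can carry the heapsize attribute

def heapify(E, i):
    left, right = 2 * i + 1, 2 * i + 2
    largest = left if left < E.heapsize and E[left] > E[i] else i
    if right < E.heapsize and E[right] > E[largest]:
        largest = right
    if largest != i:
        E[i], E[largest] = E[largest], E[i]
        heapify(E, largest)

def buildHeap(E):
    E.heapsize = len(E)
    for i in range((len(E) - 2) // 2, -1, -1):
        heapify(E, i)

def heapSort(E):
    buildHeap(E)
    for i in range(len(E) - 1, 0, -1):
        E[0], E[i] = E[i], E[0]        # maximum goes to its final slot
        E.heapsize = E.heapsize - 1    # shrink the heap part
        heapify(E, 0)

E = HeapList([41, 26, 17, 25, 19, 17, 8, 3])
heapSort(E)
print(list(E))  # [3, 8, 17, 17, 19, 25, 26, 41]
```

All work happens inside the one array: the heap part shrinks from the left while the sorted part grows from the right.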


Heapsort – Complexity analysis

The worst-case complexity of Heapify is at most ⌊2· log n⌋ comparisons for n nodes

The worst case complexity of buildHeap is Θ(n)

This yields for heapsort:

W(n) = Σ_{i=1}^{n−1} 2·⌊log i⌋ ≤ 2·(log e)·∫_1^n ln x dx = 2·n· log n + c₁·n + c₂

No extra space is needed (tail recursion in Heapify can be removed):

heapsort sorts in Θ(n· log n) and sorts in-place


Complexity of sorting algorithms

Algorithm        Worst case     Average        Space usage

Insertion sort   Θ(n²)          Θ(n²)          in place
Quicksort        Θ(n²)          Θ(n· log n)    Θ(log n) extra
Mergesort        Θ(n· log n)    Θ(n· log n)    Θ(n) extra
Heapsort         Θ(n· log n)    Θ(n· log n)    in place
