
1

David Luebke 1 02/10/17
CS 332: Algorithms
Medians and Order Statistics

2

Order Statistics
● The ith order statistic in a set of n elements is
the ith smallest element
● The minimum is thus the 1st order statistic
● The maximum is (duh) the nth order statistic
● The median is the ⌊(n+1)/2⌋ order statistic (the lower median)
■ If n is even, there are 2 medians: the n/2 and (n/2 + 1) order statistics
● How can we calculate order statistics?
● What is the running time?

3

Order Statistics
● How many comparisons are needed to find the
minimum element in a set? The maximum?
● Can we find the minimum and maximum with
fewer than twice as many comparisons?
● Yes:
■ Walk through elements by pairs
○ Compare each element in pair to the other
○ Compare the larger to the current maximum, the smaller to the current minimum
■ Total cost: 3 comparisons per 2 elements, i.e. at most
3⌊n/2⌋ comparisons (vs. 2n − 2 for two separate scans)
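The pairwise scheme above can be sketched as follows. The function name `min_max` and the convention of returning a comparison count alongside the result are assumptions made for illustration; the count tallies one comparison inside each pair plus two against the running extremes.

```python
from typing import Sequence, Tuple


def min_max(a: Sequence[int]) -> Tuple[int, int, int]:
    """Return (minimum, maximum, comparisons) using the walk-by-pairs scheme."""
    if not a:
        raise ValueError("empty input")
    n = len(a)
    comparisons = 0
    if n % 2 == 1:
        # Odd n: seed both extremes with the first element (no comparison).
        lo = hi = a[0]
        start = 1
    else:
        # Even n: one comparison orders the first pair.
        comparisons += 1
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        start = 2
    # The remaining elements come in whole pairs.
    for j in range(start, n - 1, 2):
        x, y = a[j], a[j + 1]
        comparisons += 1                      # compare the pair to itself
        small, large = (x, y) if x < y else (y, x)
        comparisons += 2                      # smaller vs min, larger vs max
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons
```

For example, `min_max([3, 1, 4, 1, 5, 9, 2, 6])` returns `(1, 9, 10)`: 10 comparisons for 8 elements, where two separate scans would use 14.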

4

Finding Order Statistics:
The Selection Problem
● A more interesting problem is selection:
finding the ith smallest element of a set
● We will show:
■ A practical randomized algorithm with O(n)
expected running time
■ A cool algorithm of theoretical interest only with
O(n) worst-case running time

5

Randomized Selection
● Key idea: use partition() from quicksort
■ But, only need to examine one subarray
■ These savings show up in the running time: O(n) expected
● We will again use a slightly different partition
than the book:
q = RandomizedPartition(A, p, r)
(Diagram: after the call, A[p..q−1] ≤ A[q] ≤ A[q+1..r])
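A minimal sketch of a partition with the invariant in the diagram, assuming Lomuto's scheme with a uniformly random pivot (the slides' own partition code is not shown here, so this is a stand-in, not the author's version). Indices are inclusive, matching A[p..r].

```python
import random
from typing import List


def randomized_partition(a: List[int], p: int, r: int) -> int:
    """Partition a[p..r] (inclusive) around a random pivot; return its index q.

    Afterwards a[p..q-1] <= a[q] <= a[q+1..r]. Lomuto scheme (assumed).
    """
    s = random.randint(p, r)            # random pivot index, p <= s <= r
    a[s], a[r] = a[r], a[s]             # park the pivot at the right end
    pivot = a[r]
    i = p - 1                           # a[p..i] holds elements <= pivot
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]     # drop the pivot between the two regions
    return i + 1
```

Because the pivot is chosen uniformly at random, no fixed input can force the bad 0 : n−1 splits every time.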

6

Randomized Selection
RandomizedSelect(A, p, r, i)
    if (p == r) then return A[p];
    q = RandomizedPartition(A, p, r)
    k = q - p + 1;
    if (i == k) then return A[q];  // not in book
    if (i < k) then
        return RandomizedSelect(A, p, q-1, i);
    else
        return RandomizedSelect(A, q+1, r, i-k);
(Diagram: A[p..q−1] ≤ A[q] ≤ A[q+1..r]; A[q] is the kth smallest element of A[p..r], k = q − p + 1)
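A runnable Python rendering of RandomizedSelect(), offered as a sketch: the tail recursion is turned into a loop (same logic, no recursion-depth limit), indices are inclusive and i is 1-based as in the pseudocode, and the helper `_partition` is a hypothetical Lomuto-style random partition standing in for RandomizedPartition().

```python
import random
from typing import List


def _partition(a: List[int], p: int, r: int) -> int:
    """Hypothetical RandomizedPartition: a[p..q-1] <= a[q] <= a[q+1..r]."""
    s = random.randint(p, r)
    a[s], a[r] = a[r], a[s]
    pivot, i = a[r], p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a: List[int], p: int, r: int, i: int) -> int:
    """Return the ith smallest (1-based) element of a[p..r]; reorders a."""
    while True:
        if p == r:
            return a[p]
        q = _partition(a, p, r)
        k = q - p + 1                   # rank of the pivot within a[p..r]
        if i == k:
            return a[q]
        if i < k:
            r = q - 1                   # answer lies in the low side
        else:
            p, i = q + 1, i - k         # high side; skip the k smaller elements
```

Only one side is ever revisited, which is exactly where the savings over quicksort come from.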

7

Randomized Selection
● Analyzing RandomizedSelect()
■ Worst case: partition always 0:n-1
T(n) = T(n−1) + O(n) = ???
= O(n²) (arithmetic series)
○ No better than sorting!
■ “Best” case: suppose a 9:1 partition
T(n) = T(9n/10) + O(n) = ???
= O(n) (Master Theorem, case 3)
○ Better than sorting!
○ What if this had been a 99:1 split?
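Unrolling the recurrences, assuming the O(n) term is at most cn, turns each into a geometric series; the same unrolling answers the 99:1 question:

```latex
\text{9:1 split:}\quad T(n) \le cn\left(1 + \tfrac{9}{10} + \left(\tfrac{9}{10}\right)^{2} + \cdots\right) = \frac{cn}{1 - 9/10} = 10cn = O(n)

\text{99:1 split:}\quad T(n) \le cn\sum_{j \ge 0}\left(\tfrac{99}{100}\right)^{j} = \frac{cn}{1 - 99/100} = 100cn = O(n)
```

Any fixed split whose larger side is a fraction α < 1 of the input gives T(n) ≤ cn/(1 − α): a worse constant, but still linear.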

8

Randomized Selection
● Average case
■ For upper bound, assume ith element always falls
in larger side of partition:
■ Let’s show that T(n) = O(n) by substitution
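$$
\begin{aligned}
T(n) &\le \frac{1}{n}\sum_{k=0}^{n-1} T\big(\max(k,\, n-k-1)\big) + \Theta(n)\\
&\le \frac{2}{n}\sum_{k=n/2}^{n-1} T(k) + \Theta(n)
\end{aligned}
$$
What happened here? For even n, as k runs from 0 to n − 1, max(k, n − k − 1) takes each value from n/2 to n − 1 exactly twice.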

9

Randomized Selection
● Assume T(n) ≤ cn for sufficiently large c:
$$
\begin{aligned}
T(n) &\le \frac{2}{n}\sum_{k=n/2}^{n-1} T(k) + \Theta(n) && \text{The recurrence we started with}\\
&\le \frac{2}{n}\sum_{k=n/2}^{n-1} ck + \Theta(n) && \text{Substitute } T(k) \le ck\\
&= \frac{2c}{n}\left(\sum_{k=1}^{n-1} k - \sum_{k=1}^{n/2-1} k\right) + \Theta(n) && \text{``Split'' the sum}\\
&= \frac{2c}{n}\left(\frac{1}{2}(n-1)n - \frac{1}{2}\left(\frac{n}{2}-1\right)\frac{n}{2}\right) + \Theta(n) && \text{Expand arithmetic series}\\
&= c(n-1) - \frac{c}{n}\left(\frac{n}{2}-1\right)\frac{n}{2} + \Theta(n) && \text{Multiply it out}
\end{aligned}
$$

10

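Randomized Selection
● Assume T(n) ≤ cn for sufficiently large c:
$$
\begin{aligned}
T(n) &\le c(n-1) - \frac{c}{n}\left(\frac{n}{2}-1\right)\frac{n}{2} + \Theta(n) && \text{The recurrence so far}\\
&= cn - c - \frac{cn}{4} + \frac{c}{2} + \Theta(n) && \text{Multiply it out}\\
&= cn - \frac{cn}{4} - \frac{c}{2} + \Theta(n) && \text{Subtract } c/2\\
&= cn - \left(\frac{cn}{4} + \frac{c}{2} - \Theta(n)\right) && \text{Rearrange the arithmetic}\\
&\le cn \quad \text{(if } c \text{ is big enough)} && \text{What we set out to prove}
\end{aligned}
$$
● "Big enough": the last step needs cn/4 + c/2 ≥ Θ(n); if the Θ(n) term is at most an, any c ≥ 4a works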
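As a numerical sanity check on the constant, this sketch tabulates the upper-bound recurrence T(n) = (2/n) Σ_{k=⌊n/2⌋}^{n−1} T(k) + Θ(n) exactly, under two stated assumptions: the Θ(n) term is exactly n (so a = 1 and c = 4 should suffice), and T(0) = 0. The function name `expected_bound` is hypothetical; prefix sums keep the tabulation linear.

```python
from typing import List


def expected_bound(n_max: int) -> List[float]:
    """Tabulate T(n) = (2/n) * sum_{k=n//2}^{n-1} T(k) + n, with T(0) = 0.

    Assumption: the Theta(n) term is exactly n.
    """
    t = [0.0] * (n_max + 1)
    prefix = [0.0] * (n_max + 2)        # prefix[m] = t[0] + ... + t[m-1]
    for n in range(1, n_max + 1):
        lo = n // 2
        window = prefix[n] - prefix[lo]  # t[lo] + ... + t[n-1]
        t[n] = 2.0 * window / n + n
        prefix[n + 1] = prefix[n] + t[n]
    return t
```

Every tabulated value stays below 4n, matching the substitution argument with a = 1.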

11

Worst-Case Linear-Time Selection
● Randomized algorithm works well in practice
● What follows is a worst-case linear time
algorithm, really of theoretical interest only
● Basic idea:
■ Generate a good partitioning element
■ Call this element x

12

Worst-Case Linear-Time Selection
● The algorithm in words:
1. Divide the n elements into groups of 5
2. Find the median of each group (How? How long?)
3. Use Select() recursively to find the median x of the n/5 medians
4. Partition the n elements around x. Let k = rank(x)
5. if (i == k) then return x
   if (i < k) then use Select() recursively to find the ith smallest
   element in the first partition
   else (i > k) use Select() recursively to find the (i−k)th smallest
   element in the last partition
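The five steps can be sketched in Python as below. This is an illustrative version, not the course's code: lists are copied rather than partitioned in place, each group's median is found by sorting at most 5 elements, and step 4 uses a three-way split (< x, = x, > x) so that duplicate keys cannot stall the recursion, which is an adaptation beyond the slides.

```python
from typing import List


def select(a: List[int], i: int) -> int:
    """Return the ith smallest (1-based) element of a (median-of-medians sketch)."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # Steps 1-2: groups of 5; each median costs O(1) since a group has <= 5 items.
    medians = []
    for j in range(0, len(a), 5):
        group = sorted(a[j:j + 5])
        medians.append(group[(len(group) - 1) // 2])
    # Step 3: recursively find the median x of the medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x (three-way, to handle equal keys).
    low = [v for v in a if v < x]
    high = [v for v in a if v > x]
    k_low = len(low)
    k_eq = len(a) - k_low - len(high)   # copies of x, at least one
    # Step 5: return x or recurse into the side holding the answer.
    if i <= k_low:
        return select(low, i)
    if i <= k_low + k_eq:
        return x
    return select(high, i - k_low - k_eq)
```

With no duplicates, k_eq is 1 and the three branches reduce to the i < k, i == k, i > k cases of step 5 with k = k_low + 1.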

13

Worst-Case Linear-Time Selection
● (Sketch situation on the board)
● How many of the 5-element medians are ≤ x?
■ At least 1/2 of the medians = ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋
● How many elements are ≤ x?
■ At least 3⌊n/10⌋ elements (each such median plus the 2 smaller elements of its group)
● For large n, 3⌊n/10⌋ ≥ n/4 (How large?)
● So at least n/4 elements ≤ x
● Similarly: at least n/4 elements ≥ x
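The "(How large?)" question can be settled by writing n = 10m + r with 0 ≤ r ≤ 9, so that ⌊n/10⌋ = m:

```latex
3\left\lfloor \tfrac{n}{10} \right\rfloor = 3m \;\ge\; \frac{n}{4} = \frac{10m + r}{4}
\iff 12m \ge 10m + r
\iff 2m \ge r
```

Since r can be as large as 9, the inequality holds for every r once m ≥ 5, i.e. for all n ≥ 50; n = 49 (m = 4, r = 9) is the largest value where it fails.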

14

Worst-Case Linear-Time Selection
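● Thus, for large n, after partitioning around x, step 5 will
call Select() on at most 3n/4 elements
● The recurrence is therefore:
$$
\begin{aligned}
T(n) &\le T(\lfloor n/5 \rfloor) + T(3n/4) + \Theta(n)\\
&\le T(n/5) + T(3n/4) + \Theta(n) && \lfloor n/5 \rfloor \le n/5\\
&\le \frac{cn}{5} + \frac{3cn}{4} + \Theta(n) && \text{Substitute } T(k) \le ck\\
&= \frac{19cn}{20} + \Theta(n) && \text{Combine fractions}\\
&= cn - \left(\frac{cn}{20} - \Theta(n)\right) && \text{Express in desired form}\\
&\le cn \quad \text{(if } c \text{ is big enough)} && \text{What we set out to prove}
\end{aligned}
$$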

15

Worst-Case Linear-Time Selection
● Intuitively:
■ Work at each level is a constant fraction (19/20)
smaller
○ Geometric progression!
■ Thus the O(n) work at the root dominates
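To see the geometric decay numerically, this sketch tabulates the recurrence T(n) ≤ T(n/5) + T(3n/4) + Θ(n) with two assumptions made explicit: both arguments are floored, and the Θ(n) term is exactly n. Since 1/5 + 3/4 = 19/20, the substitution argument predicts every value stays at or below 20n. The name `worst_case_bound` is hypothetical.

```python
from typing import List


def worst_case_bound(n_max: int) -> List[int]:
    """Tabulate T(n) = T(n // 5) + T(3n // 4) + n with T(0) = 0.

    Assumptions: floors on both recursive sizes; Theta(n) term is exactly n.
    """
    t = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        # n // 5 < n and 3n // 4 < n, so both entries are already filled in.
        t[n] = t[n // 5] + t[3 * n // 4] + n
    return t
```

Each level of the recursion tree does at most 19/20 of the previous level's work, so the total is bounded by n(1 + 19/20 + (19/20)² + ...) = 20n.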

16

Linear-Time Median Selection
● Given a “black box” O(n) median algorithm,
what can we do?
■ ith order statistic:
○ Find median x
○ Partition input around x
○ if (i ≤ ⌊(n+1)/2⌋) recursively find the ith element of the first half
○ else recursively find the (i − ⌊(n+1)/2⌋)th element of the second half
○ T(n) = T(n/2) + O(n) = O(n)
■ Can you think of an application to sorting?
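Selection on top of a median black box can be sketched as below. The standard library's `statistics.median_low` plays the black box here; it returns an actual element of the data but sorts internally, so it is a stand-in and this sketch does not itself achieve O(n). The three-way split (< x, = x, > x) is an assumption added so that repeated keys are handled correctly.

```python
from statistics import median_low
from typing import List


def select_via_median(a: List[int], i: int) -> int:
    """Return the ith smallest (1-based) element of a using a median oracle.

    median_low stands in for an O(n) median black box (it is not linear itself).
    """
    x = median_low(a)                   # "black box": a median element of a
    low = [v for v in a if v < x]
    high = [v for v in a if v > x]
    n_eq = len(a) - len(low) - len(high)
    if i <= len(low):
        return select_via_median(low, i)           # at most about half of a
    if i <= len(low) + n_eq:
        return x
    return select_via_median(high, i - len(low) - n_eq)
```

Each call discards roughly half of its input, which is what gives T(n) = T(n/2) + O(n) once the black box is truly linear.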

17

Linear-Time Median Selection
● Worst-case O(n lg n) quicksort
■ Find median x and partition around it
■ Recursively quicksort two halves
■ T(n) = 2T(n/2) + O(n) = O(n lg n)
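A sketch of quicksort that always pivots on a median. As in the selection example, `statistics.median_low` is only a stand-in for a linear-time median (such as the worst-case selection algorithm above); with the stand-in, the median step costs O(n lg n), so this shows the structure rather than the bound. The version is not in-place and groups keys equal to x in the middle (both assumptions of this sketch).

```python
from statistics import median_low
from typing import List


def median_quicksort(a: List[int]) -> List[int]:
    """Quicksort pivoting on a median element, so each side has at most n/2 items."""
    if len(a) <= 1:
        return list(a)
    x = median_low(a)                   # stand-in for an O(n) median finder
    low = [v for v in a if v < x]
    equal = [v for v in a if v == x]
    high = [v for v in a if v > x]
    return median_quicksort(low) + equal + median_quicksort(high)
```

With a genuine O(n) median, both recursive calls get at most n/2 elements regardless of input order, giving the worst-case T(n) = 2T(n/2) + O(n) = O(n lg n).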

18

The End
