T02 - Algorithm Analysis and Design
Learning objectives
1. Understand why it is important to be able to compare the
complexity of algorithms
2. Measure the complexity of algorithms
3. Analyse the performance of algorithms
4. Be able to perform empirical analyses of algorithms
5. Develop new algorithms
RMIT Classification: Trusted
Array Search
Items Average number of tries
• 10 5
• 100 50
• 1000 500
• 10,000 5000
• 100,000 50,000
• 1,000,000 500,000
• 10,000,000 5,000,000
For an unsorted array of size N, the average number of tries needed to
find a value grows in proportion to N (about N/2)
Binary Search
Items Tries
• 10 4
• 100 7
• 1000 10
• 10,000 14
• 100,000 17
• 1,000,000 20
• 10,000,000 24
For a sorted array of size N, the number of tries needed to find a
value grows in proportion to log2(N)
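The two tables can be reproduced with a small experiment. The sketch below is illustrative only: the function names are made up for this example, and a "try" is assumed to mean one element examined.

```python
def linear_search_tries(items, target):
    """Count how many elements are examined before target is found."""
    for tries, value in enumerate(items, start=1):
        if value == target:
            return tries
    return len(items)  # target absent: every element was examined


def binary_search_tries(sorted_items, target):
    """Count how many midpoints are examined before target is found."""
    low, high = 0, len(sorted_items) - 1
    tries = 0
    while low <= high:
        tries += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return tries
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return tries  # target absent


def average_linear_tries(n):
    """Average tries over every possible target in a list of n items."""
    data = list(range(n))
    return sum(linear_search_tries(data, t) for t in data) / n


def worst_case_binary_tries(n):
    """Largest number of tries over every possible target."""
    data = list(range(n))
    return max(binary_search_tries(data, t) for t in data)


for n in (10, 100, 1000):
    print(n, average_linear_tries(n), worst_case_binary_tries(n))
```

The linear average comes out at (N + 1)/2, which the first table rounds to N/2, and the binary search figures match the second table's column for these sizes.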
Sorting
• This is a very interesting topic that we will cover extensively
in class as many problems require using or generating
sorted data
• But to answer the question on the medical company we
need to be able to determine how fast sorting can be done
Which is Faster?
• Assume that sorting takes 10*n*log(n) msec, linear search takes 2*n
msec and binary search takes 2*log(n) msec (log is the natural
logarithm here, so log(10^6) ≈ 13.8)
• If you do the search 10 times on 10^6 items, which approach is best?
1. Linear search: 10*2*10^6 = 20*10^6 msec = 20,000 sec
2. Sorting: 10*10^6*log(10^6) ≈ 10*10^6*13.8 msec ≈ 138,000 sec +
Searching: 10*2*log(10^6) ≈ 276 msec = 0.276 sec. Total ≈ 138,000 sec
• If you do the search 1000 times and the data size is 10^6, then
1. Linear search: 1000*2*10^6 = 2000*10^6 msec = 2,000,000 sec
2. Sorting: 10*10^6*log(10^6) ≈ 10*10^6*13.8 msec ≈ 138,000 sec +
Searching: 1000*2*log(10^6) ≈ 27,600 msec = 27.6 sec. Total ≈ 138,027.6 sec
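The arithmetic above can be checked, and the break-even point located, with a few lines of Python. This is a sketch under the slide's assumed cost formulas (natural logarithm, times in msec); the function and constant names are invented for the example.

```python
import math

SORT_FACTOR_MS = 10    # assumed: sorting costs 10 * n * ln(n) msec
LINEAR_FACTOR_MS = 2   # assumed: one linear search costs 2 * n msec
BINARY_FACTOR_MS = 2   # assumed: one binary search costs 2 * ln(n) msec


def linear_total_ms(n, searches):
    """Total cost of repeated linear searches on unsorted data."""
    return searches * LINEAR_FACTOR_MS * n


def sort_then_binary_total_ms(n, searches):
    """Cost of sorting once, then running repeated binary searches."""
    sort_ms = SORT_FACTOR_MS * n * math.log(n)
    search_ms = searches * BINARY_FACTOR_MS * math.log(n)
    return sort_ms + search_ms


def break_even_searches(n):
    """Smallest number of searches for which sorting first is cheaper."""
    threshold = (SORT_FACTOR_MS * n * math.log(n)) / (
        LINEAR_FACTOR_MS * n - BINARY_FACTOR_MS * math.log(n)
    )
    return math.floor(threshold) + 1


n = 10**6
for searches in (10, 1000):
    print(searches,
          linear_total_ms(n, searches) / 1000,
          sort_then_binary_total_ms(n, searches) / 1000)
print("sorting pays off from", break_even_searches(n), "searches")
```

With these particular constants, sorting first only becomes worthwhile once the data is searched a few dozen times; different constants would move that point.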
Program Performance
• Assume that we can find a formula that, for a
specific machine, determines how long a
program takes to run for various input sizes (n)
• When do programs A and B become impractical?
o If n = 10 then A takes 20 secs and B takes 50 secs
Comparing Performance
• Different algorithms that solve the same problem can have very
different performance
o How do we compare the performance of two programs that solve the same
problem?
o Should we just run them and compare their time requirements?
• We need to address issues such as: what computer to use, what data to
use, and how much space is needed?
• An inefficient algorithm (e.g., one whose running time grows
quadratically) may be acceptable on a small data set but infeasible to
run on a large one
What to measure?
• In this lecture, we look at the ways of estimating the
running time of a program and how to compare the running
times of two programs without ever implementing them
• It is vital to analyse the resource use of an algorithm well
before it is implemented and deployed
• Space is also important but we focus on time in this course
Time Efficiency
Basic operation
Operation(s) that contribute most towards the total running
time
• Examples:
o Compare ( i != j )
o Add ( i + j )
o Multiply ( i * j )
o Divide ( i / j )
o Assignment ( i = j )
Runtime Complexity
• Worst Case - Given an input of n items, what is the
maximum running time for any possible input?
• Best Case - Given an input of n items, what is the
minimum running time for any possible input?
• Average Case - Given an input of n items, what is the
average running time across all possible inputs?
NOTE: Average Case is not the average of the worst and best case.
Rather, it is the average performance across all possible inputs.
Runtime Complexity
• Sequential Search: search for the key by traversing
values in the array one-by-one
• Best-case:
o The best case input is when the item being searched for
is the first item in the list, so C_best(n) = 1.
• Worst-case:
o The worst case input is when the item being searched
for is the last item or is not present in the list, so C_worst(n) = n.
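A minimal sketch of sequential search that also counts its basic operation (the key comparison) makes both cases concrete. The function name and return shape are choices made for this example.

```python
def sequential_search(items, key):
    """Return (index, comparisons); index is -1 when key is absent."""
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1          # basic operation: one key comparison
        if value == key:
            return index, comparisons
    return -1, comparisons


data = [7, 3, 9, 4]
print(sequential_search(data, 7))  # best case: key is the first item
print(sequential_search(data, 5))  # worst case: key is not present
```

For a list of n items the first call performs 1 comparison and the second performs n, matching C_best(n) and C_worst(n).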
Runtime Complexity
• Average Case: What does average case mean?
o Recall: average across all possible inputs – how to
analyse this?
o Typically not straightforward
Runtime Complexity
Average Case Analysis: let p be the probability of a successful
search.
If the search is successful
C_avg(n) = (1 + 2 + … + n)/n = (n + 1)/2
(assuming the search value is equally likely to be at each
position)
If the search is unsuccessful
C_avg(n) = n
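Weighting the two cases by p gives the overall expectation C_avg(n) = p*(n + 1)/2 + (1 − p)*n. The sketch below computes it exactly with fractions; the function name is illustrative and the equal-likelihood assumption from above is kept.

```python
from fractions import Fraction


def average_comparisons(n, p):
    """Expected comparisons for sequential search on n items.

    p is the probability that the search is successful; a successful
    search is assumed equally likely to stop at each position.
    """
    p = Fraction(p)
    successful = Fraction(sum(range(1, n + 1)), n)  # (1 + 2 + ... + n) / n
    unsuccessful = n                                # every item is compared
    return p * successful + (1 - p) * unsuccessful


for p in (1, Fraction(1, 2), 0):
    print(p, average_comparisons(10, p))
```

With p = 1 the result is (n + 1)/2, and with p = 0 it is n, so the two formulas above are the extremes of the general expression.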
Summary
• Input size, basic operation
• Time complexity estimate using input size and
basic operation
• Best, worst, and average cases
Asymptotic Complexity
Asymptotic Complexity
• Problem:
o We now have a way to analyse the running time (a.k.a.
time complexity) of an algorithm, but every algorithm
has its own time complexity.
• T1 = c1 ∙ C1(n), T2 = c2 ∙ C2(n), T3 = c3 ∙ C3(n) …
o How to compare in a meaningful way?
Asymptotic Complexity
• Solution:
o Group them into equivalence classes (for
easier comparison and understanding), with
respect to the input size
o Focus of this part: asymptotic complexity and
equivalence classes
Asymptotic Complexity
Consider the running time
estimates of two algorithms:
• Algo 1: T1(n) = 5.1n
• Algo 2: T2(n) = 5.2n
Asymptotic Complexity
What about the
following:
• Algo 3: T3(n) = 5.1n^2
• Algo 4: T4(n) = 5.2n^2
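One way to see which differences matter is to look at ratios as n grows. The sketch below uses the four running-time estimates from these slides; the helper names t1 to t4 are just labels for this example.

```python
def t1(n):
    return 5.1 * n


def t2(n):
    return 5.2 * n


def t3(n):
    return 5.1 * n**2


def t4(n):
    return 5.2 * n**2


for n in (10, 1_000, 100_000):
    same_class = t2(n) / t1(n)       # stays near 5.2 / 5.1 for every n
    different_class = t3(n) / t1(n)  # grows in step with n
    print(n, round(same_class, 4), round(different_class, 1))
```

The ratio between Algo 1 and Algo 2 (and likewise between Algo 3 and Algo 4) stays constant, while the ratio between a quadratic and a linear estimate keeps growing, which is the motivation for grouping by growth rate rather than by constant factor.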
• In other words:
Asymptotic Complexity
Suppose an algorithm requires n^2 − 3*n + 10 seconds to solve a problem of size n. If
constants k and n0 exist such that
k*n^2 > n^2 − 3*n + 10 for all n ≥ n0
then the algorithm is of order n^2. (In fact, k = 3 and n0 = 2 work.)
3*n^2 > n^2 − 3*n + 10 for all n ≥ 2
Thus, the algorithm requires no more than
k*n^2 time units for n ≥ n0, where k = 3 and n0 = 2
So, it is O(n^2)
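The choice of k = 3 and n0 = 2 can be sanity-checked numerically. A finite check is not a proof (the proof is that 3*n^2 − (n^2 − 3*n + 10) = 2*n^2 + 3*n − 10 is positive for n ≥ 2), but it quickly shows why n0 = 1 is not enough. The function names and the upper limit are arbitrary choices for this sketch.

```python
def running_time(n):
    """The example cost function n^2 - 3n + 10."""
    return n**2 - 3 * n + 10


def bound_holds(k, n0, upto=10_000):
    """Check k * n^2 > running_time(n) for every n in [n0, upto]."""
    return all(k * n**2 > running_time(n) for n in range(n0, upto + 1))


print(bound_holds(3, 2))  # the constants used above
print(bound_holds(3, 1))  # fails at n = 1, where 3 is not greater than 8
```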
List sorted from best to worst. Many other options e.g. sqrt(n), log(log(n)), etc.
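Values of (n − 1)!/2 (for example, the number of distinct round trips through n cities):
n     (n − 1)!/2
6     60
7     360
8     2,520
9     20,160
10    181,440
25    310,224,200,866,619,719,680,000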
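The figures listed here equal (n − 1)!/2, which is the number of distinct round trips through n cities when the starting city and the direction of travel do not matter. Reading them as travelling-salesman tour counts is an assumption, since the slide's own headings did not survive; the function name below is likewise made up.

```python
import math


def distinct_tours(n):
    """Number of distinct round trips through n cities: (n - 1)! / 2."""
    return math.factorial(n - 1) // 2


for n in (6, 7, 8, 9, 10, 25):
    print(n, f"{distinct_tours(n):,}")
```

The jump from 10 to 25 cities shows how quickly factorial growth makes exhaustive enumeration impractical.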
• One example of the famous P versus NP problems: problems for which
the best known algorithms take exponential time to find a solution, but a
proposed solution is relatively easy (polynomial time) to check
o Examples - subset sum, jigsaw solving
1 -3 7 2 11 -9 -6
14 5 11 -10 -1 22 -11
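The gap between finding and checking can be sketched with subset sum on the first row of numbers above. The target of zero is an assumption (the slide does not state the question being asked), and both function names are invented for this example.

```python
from itertools import combinations


def find_zero_subset(numbers):
    """Brute force: try every non-empty subset (up to 2^n - 1 of them)."""
    for size in range(1, len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == 0:
                return subset
    return None


def check_zero_subset(numbers, subset):
    """Verify a proposed answer in polynomial time."""
    remaining = list(numbers)
    for value in subset:
        if value not in remaining:
            return False          # uses a value that is not available
        remaining.remove(value)
    return len(subset) > 0 and sum(subset) == 0


numbers = [1, -3, 7, 2, 11, -9, -6]
answer = find_zero_subset(numbers)
print(answer, check_zero_subset(numbers, answer))
```

Checking an answer takes a handful of additions and membership tests, while the search may have to examine a number of subsets that doubles with every extra input value.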
Is P = NP? (*)
• Open Question: if one can check an answer in polynomial time, is there
an algorithm to find the answer in polynomial time?
• Examples: sudoku, jigsaws, subset sum, graph colouring, map
colouring, travelling salesman, Hamiltonian path
Growth-Rate Functions
• Ignore low-order terms in the growth-rate function
o If an algorithm is O(n^3 + 4n^2 + 3n + 5), it is also O(n^3)
• Ignore a multiplicative constant in the highest-order term
o If an algorithm is O(5n^3), it is also O(n^3)
Answer: O(n^4)
Analysis of Algorithms
Example: a^n
Input size: n
C(n) = n
= n^2
Growth-Rate Functions – Ex 1
Statement              Cost   Occurrences
i = 1                  c1     1
sum = 0                c2     1
while (i <= n):        c3     n+1
    i = i + 1          c4     n
    sum = sum + i      c5     n
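Adding up cost times occurrences gives T(n) = c1 + c2 + (n+1)*c3 + n*c4 + n*c5, which is O(n). The occurrence counts can be confirmed by instrumenting the loop; the sketch below is one way to do that, with the counter dictionary and the name example_1 introduced purely for illustration.

```python
def example_1(n):
    """Run the Ex 1 loop and count how often each statement executes."""
    counts = {"c1": 0, "c2": 0, "c3": 0, "c4": 0, "c5": 0}
    i = 1
    counts["c1"] += 1
    total = 0                 # called `sum` on the slide
    counts["c2"] += 1
    while True:
        counts["c3"] += 1     # the loop test runs once more than the body
        if not i <= n:
            break
        i = i + 1
        counts["c4"] += 1
        total = total + i
        counts["c5"] += 1
    return total, counts


print(example_1(5))
```

For n = 5 the loop test is counted 6 times and each body statement 5 times, in line with the n+1 and n entries of the table.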
Growth-Rate Functions – Ex 2
Statement                  Cost   Occurrences
i = 1                      c1     1
sum = 0                    c2     1
while (i <= n):            c3     n+1
    j = 1                  c4     n
    while (j <= n):        c5     n*(n+1)
        sum = sum + i      c6     n*n
        j = j + 1          c7     n*n
    i = i + 1              c8     n
T(n) = c1 + c2 + (n+1)*c3 + n*c4 + n*(n+1)*c5 + n*n*c6 + n*n*c7 + n*c8
     = (c5+c6+c7)*n^2 + (c3+c4+c5+c8)*n + (c1+c2+c3) = a*n^2 + b*n + c
➔ So, the growth-rate function for this algorithm is O(n^2)
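The nested-loop counts can be checked the same way, by running the code with a counter per statement. This is an illustrative sketch; example_2 and the counter names are not part of the original slides.

```python
def example_2(n):
    """Run the Ex 2 nested loops and count each statement's executions."""
    counts = {f"c{k}": 0 for k in range(1, 9)}
    i = 1
    counts["c1"] += 1
    total = 0                     # called `sum` on the slide
    counts["c2"] += 1
    while True:
        counts["c3"] += 1         # outer loop test
        if not i <= n:
            break
        j = 1
        counts["c4"] += 1
        while True:
            counts["c5"] += 1     # inner loop test
            if not j <= n:
                break
            total = total + i
            counts["c6"] += 1
            j = j + 1
            counts["c7"] += 1
        i = i + 1
        counts["c8"] += 1
    return total, counts


n = 4
total, counts = example_2(n)
print(total, counts)
```

For n = 4 the inner test runs n*(n+1) = 20 times and the inner body n*n = 16 times, so the quadratic terms dominate as the analysis predicts.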
Recursion
• Recursion is a fundamental tool in computer science.
o A recursive program (or function) is one that calls itself.
o It must have a termination condition defined.
• Many interesting algorithms are simply expressed with a
recursive approach
Input size: n
Backward Substitution
Recurrence: 𝐶(𝑛) = 𝐶(𝑛 − 1) + 1 for 𝑛 > 1, and 𝐶(1) = 0
Backward Substitution
Aim of simplification and backward substitution: Convert
𝐶(𝑛) = 𝐶(𝑛 − 1) + 1 to 𝐶(𝑛) = 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛(𝑛), e.g., 𝐶(𝑛) = 𝑛 + 1
Backward Substitution
Recurrence: 𝑪(𝒏) = 𝑪(𝒏 − 𝟏) + 𝟏 for 𝒏 > 𝟏, and 𝑪(𝟏) = 𝟎
1. 𝐶(𝑛) = 𝐶(𝑛 − 1) + 1
2. Substitute 𝐶(𝑛 − 1) = 𝐶(𝑛 − 2) + 1 into original equation
3. 𝐶(𝑛) = [𝐶(𝑛 − 2) + 1] + 1 = 𝐶(𝑛 − 2) + 2
4. Substitute 𝐶(𝑛 − 2) = 𝐶(𝑛 − 3) + 1 into the equation from step 3
5. 𝐶(𝑛) = [𝐶(𝑛 − 3) + 1] + 2 = 𝐶(𝑛 − 3) + 3
6. We see the pattern 𝑪(𝒏) = 𝑪(𝒏 − 𝒊) + 𝒊 emerge, where 1 ≤ 𝑖 ≤ 𝑛 − 1
7. Now, we know 𝐶(1) = 0 and want to determine when 𝐶(𝑛 − 𝑖) = 𝐶(1), or when
𝑛 − 𝑖 = 1. This value is 𝑖 = 𝑛 − 1
𝐶(𝑛) = 𝐶(𝑛 − 𝑖) + 𝑖 = 𝐶(𝑛 − (𝑛 − 1)) + 𝑛 − 1 = 𝐶(1) + 𝑛 − 1 = 0 + 𝑛 − 1 = 𝑛 − 1
Hence 𝒕(𝒏) = 𝒄𝒐𝒑 ∙ 𝑪(𝒏) = 𝒄𝒐𝒑 ∙ (𝒏 − 𝟏) ∈ 𝑶(𝒏)
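A recursive function whose basic-operation count obeys this recurrence is a factorial with base case n = 1. That pairing is an assumption made for illustration, since the slide showing the original algorithm did not survive; the sketch returns the operation count alongside the result so the closed form can be compared directly.

```python
def factorial(n):
    """Return (n!, multiplications) for n >= 1.

    The count follows C(n) = C(n - 1) + 1 with C(1) = 0.
    """
    if n == 1:
        return 1, 0                       # termination condition: no multiplication
    previous, count = factorial(n - 1)    # C(n - 1)
    return n * previous, count + 1        # one more multiplication


for n in (1, 5, 10):
    value, multiplications = factorial(n)
    print(n, value, multiplications)
```

The multiplication count is n − 1 in each case, agreeing with the result of the backward substitution.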
Empirical Analysis
• Theoretical analysis of the complexity of an algorithm gives
an estimate of the running time and growth rate, but not the
real time
• Measuring the actual time an implementation takes in
the real world is very important, especially when comparing
two algorithms with the same time complexity
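A minimal timing harness might look like the sketch below. It compares Python's built-in sorted with a hand-written merge sort, both O(n log n); the helper names, input sizes, and the choice to keep the best of three runs are all assumptions of this example, and the measured times depend on the machine.

```python
import random
import time


def merge_sort(items):
    """Plain top-down merge sort returning a new sorted list."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def time_call(func, data, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        sample = list(data)               # fresh copy for every run
        start = time.perf_counter()
        func(sample)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    random.seed(0)                        # fixed seed for repeatable inputs
    for n in (1_000, 10_000, 100_000):
        data = [random.random() for _ in range(n)]
        print(n, time_call(sorted, data), time_call(merge_sort, data))
```

Both functions share the same growth rate, so any consistent gap between the two columns reflects constant factors that the theoretical analysis leaves out.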
Estimating Performance
• We can estimate the growth rate if we collect performance data on a number of
inputs of different sizes
• We can then apply it to other input sizes
Benchmarking Algorithms
Theoretical: Merge-sort and Quick-sort have O(n log(n))
complexity, while Selection-sort is O(n^2).
Empirical: Running times (in seconds) for different sorting
algorithms on a randomised list:
Input (list size)   500    2,500   10,000
Merge sort          0.8    8.1     39.8
Quick sort          0.3    1.3     5.3
Selection sort      1.5    35.0    534.7
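One simple way to estimate a growth rate from measurements like these is to assume time ≈ a * n^b and solve for b between consecutive rows: b = log(t2/t1) / log(n2/n1). The sketch below applies that to the table; the power-law model and the function name are assumptions of this example, and an n log(n) algorithm will show an exponent somewhat above 1 rather than exactly 1.

```python
import math

sizes = [500, 2_500, 10_000]
seconds = {
    "Merge sort": [0.8, 8.1, 39.8],
    "Quick sort": [0.3, 1.3, 5.3],
    "Selection sort": [1.5, 35.0, 534.7],
}


def growth_exponents(sizes, times):
    """Estimate b in time ~ a * n^b between consecutive measurements."""
    exponents = []
    for k in range(len(sizes) - 1):
        time_ratio = times[k + 1] / times[k]
        size_ratio = sizes[k + 1] / sizes[k]
        exponents.append(math.log(time_ratio) / math.log(size_ratio))
    return exponents


for name, times in seconds.items():
    print(name, [round(b, 2) for b in growth_exponents(sizes, times)])
```

For Selection sort both estimates come out close to 2, consistent with O(n^2); with only three noisy data points per algorithm, the other rows should be read as rough indications only.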
Benchmarking Algorithms - 2
Theoretical: Merge-sort and Quick-sort have O(n log(n))
complexity, while Selection-sort is O(n^2).
Empirical: Running Times (in seconds) for different sorting
algorithms on an ordered list:
Question
One of your data sources for predicting the weather is images
sent from satellites. When you receive this data, you find that
for 5, 10 and 20 images your code took 24 msec, 99 msec
and 401 msec respectively.
o What is the order of complexity of your algorithm? Justify your
answer
o Your final input file has 1,000,000 images. Without using a
calculator, can you determine approximately how many days it will
take your code to run?
Best Practices
• If the problem size is always small, we can probably ignore the
algorithm’s efficiency and should choose the simplest algorithm
• Use adaptive algorithms that work differently depending on
o Input size
o Input characteristics (e.g. random or partially sorted, range)
• We should compare both the style and efficiency of each algorithm
o No need for coding tricks if the gain is small
o Prefer easily understandable and maintainable programs
• Consider implementation complexity: how easy or hard an algorithm is to implement
Other Approaches
• Heuristic algorithms
o Do not guarantee that the best solution will be found, but will usually find a
solution close to the best one quickly.
o Sometimes these algorithms actually find the best solution, but the
algorithm is still called heuristic until this solution is proven to be the best.
• In general, if X is the found solution and Y is the optimal solution, then
o Deterministic algorithm: X = Y
o Approximation algorithm: X is close to Y
o Probabilistic algorithm: X is "usually" Y
o Heuristic algorithm: X is usually "close" to Y
Examples (*)
              Upper bound              Lower bound
t(n)          O(n)   O(n^2)  O(n^3)    Ω(n)   Ω(n^2)  Ω(n^3)
log2(n)       T      T       T         F      F       F
10n + 5       T      T       T         T      F       F
n(n − 1)/2    F      T       T         T      T       F
(n + 1)^3     F      F       T         T      T       T
2^n           F      F       F         T      T       T
Some Clarifications…
• Generally, O(n) (the upper bound) is the most commonly used…
• But exact bounds Θ(n) tell us the bounds are tight: the
algorithm grows neither faster nor slower than stated
• Lower bounds Ω(n) are useful to describe the (theoretical)
limits of whole classes of algorithms, and are also sometimes
useful to state how fast the best case can be.
Some clarifications…
• O(n) is not the same thing as “Worst Case Efficiency”
• Ω(n) is not the same thing as “Best Case Efficiency”
• Θ(n) is not the same thing as “Average Case Efficiency”