
4. THE GREEDY METHOD

Aleks Ignjatović, ignjat@cse.unsw.edu.au


office: K17 504
Course Admin: Song Fang, cs3121@cse.unsw.edu.au

School of Computer Science and Engineering


UNSW Sydney

Term 2, 2023
Table of Contents 2

1. Introduction

2. Assorted problems

3. Applications to graphs
3.1 Directed graph structure
3.2 Single source shortest paths
3.3 Minimum spanning trees

4. Puzzle
The Greedy Method 3

Question
What is a greedy algorithm?

Answer
A greedy algorithm is one that solves a problem in stages,
considering at each stage only the choice that appears best at that
point of the construction.

This obviously reduces the search space, but it works correctly only
in cases where the locally optimal choices lead to the globally
optimal outcome.

Suppose you are searching for the highest point in a mountain
range. If you always climb upwards from the current point in the
steepest possible direction, you will find a peak, but not necessarily
the highest point overall.
Table of Contents 4

1. Introduction

2. Assorted problems

3. Applications to graphs
3.1 Directed graph structure
3.2 Single source shortest paths
3.3 Minimum spanning trees

4. Puzzle
Activity Selection 5

Problem
Instance: A list of n activities, with starting times si and finishing
times fi . No two activities can take place simultaneously.

[Figure: each activity drawn as an interval from its start time si to its finish time fi]

Task: Find a maximum size subset of compatible activities.


Activity Selection 6

Attempt 1
Always choose the shortest activity which does not conflict with
the previously chosen activities, then remove the conflicting
activities and repeat.

In the above example, our proposed algorithm chooses the
activities in blue, then has to discard all the red activities, so
clearly this does not work.
Activity Selection 7

Attempt 2
Maybe we should always choose an activity which conflicts with
the fewest possible number of the remaining activities? It may
appear that in this way we minimally restrict our next choice . . .

As appealing as this idea is, the above figure shows that it again
does not work!
Activity Selection 8

Solution
Among those activities which do not conflict with the previously
chosen activities, always choose the activity with the earliest end
time (breaking ties arbitrarily).
Activity Selection 9

To prove the correctness of our algorithm, we will use an exchange
argument. We will show that any alternative solution can be
transformed into the greedy solution without decreasing the
number of activities.

Find the first place where the chosen activity violates the greedy
choice.

What if we replace that activity with the greedy choice?
Activity Selection 10

Does the new selection have any conflicts? No!

Does the new selection have the same number of activities? Yes!

So the greedy choice is actually just as good as the choice used in
the alternative solution! We replace it and repeat.

Continuing in this manner, we can eventually “morph” any
alternative solution into the greedy solution, thus proving the
greedy solution is optimal.
Activity Selection 11
Activity Selection 12

What is the time complexity of the algorithm?

We represent activities by ordered pairs of their starting and
finishing times and sort them in increasing order of their finishing
time (the second coordinate), in O(n log n) time.

We go through this sorted list in order. How do we tell whether
an activity conflicts with the already chosen activities?
Activity Selection 13

Suppose we are up to activity i, starting at si and finishing at fi,
with all earlier finishing activities already processed.

If all previously chosen activities finished before si, activity i can
be chosen without a conflict. Otherwise, there will be a clash, so
we discard activity i.

We would prefer not to go through all previously chosen activities
each time.
Activity Selection 14

We need only keep track of the latest finishing time among chosen
activities.

Since we process activities in increasing order of finishing time,
this is just the finishing time of the last activity to be chosen.

Every activity is therefore either chosen (and the last finishing
time updated) or discarded in constant time, so this part of the
algorithm takes O(n) time.

Thus, the algorithm runs in total time O(n log n), dominated by
sorting.
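
To make the above steps concrete, here is a minimal Python
sketch of the whole algorithm. The function name and the
representation of activities as (start, finish) pairs are illustrative
choices, not part of the slides.

    def select_activities(activities):
        # Sort by finishing time (the second coordinate): O(n log n).
        activities = sorted(activities, key=lambda a: a[1])
        chosen = []
        last_finish = float("-inf")  # latest finishing time among chosen
        for start, finish in activities:
            # No conflict iff this activity starts no earlier than the
            # finishing time of the last chosen activity.
            if start >= last_finish:
                chosen.append((start, finish))
                last_finish = finish
        return chosen

    # Example: picks (1, 3), (4, 6) and (7, 8), three compatible activities.
    print(select_activities([(1, 3), (2, 5), (4, 6), (5, 9), (7, 8)]))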
Activity Selection 15

A related problem
Instance: A list of n activities with starting times si and finishing
times fi = si + d; thus, all activities are of the same duration. No
two activities can take place simultaneously.

Task: Find a subset of compatible activities of maximal total
duration.

Solution
Since all activities are of the same duration, this is equivalent to
finding a selection with the largest number of non-conflicting
activities, i.e., the previous problem.
Activity Selection 16

Question
What happens if the activities are not all of the same duration and
we have to select activities of maximal total duration?

Solution
The greedy strategy no longer works - we will need a more
sophisticated technique.
Petrol stations 17

Problem
Instance: You are travelling by car on a road from Loololong (in
the West) to Goolagong (in the East). You start with a full tank
of petrol, which you know is sufficient to travel K kilometres. You
also know the distances di from Loololong to each of the N petrol
stations on the road, and you wish to reach Goolagong with the
minimal possible number of stops to refuel. How should you
choose at which petrol stations to stop?

Solution
Always travel to the furthest petrol station you can reach without
running out of petrol and always fill the tank to its capacity.
Petrol stations 18

Why is this optimal?

Clearly, for any given number of stops, the greedy strategy allows
you to travel the longest distance. So if there is a way to reach
Goolagong with n stops, it is also possible with the greedy
strategy.

The greedy solution always stays ahead.
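
A short Python sketch of this strategy, under the assumption that
the station distances are given in increasing order and that the
total distance D to Goolagong is known; all names are illustrative.

    def refuel_stops(K, stations, D):
        # K: full-tank range; stations: sorted distances from the start;
        # D: total distance. Returns the chosen stops, or None if the
        # trip is impossible.
        stops, position, i = [], 0, 0
        candidates = [s for s in stations if s <= D]
        while position + K < D:
            furthest = None
            # Skip ahead over every station within range of `position`.
            while i < len(candidates) and candidates[i] <= position + K:
                furthest = candidates[i]
                i += 1
            if furthest is None:
                return None  # the next station is out of range: stuck
            stops.append(furthest)
            position = furthest  # fill the tank to capacity here
        return stops

    # Two stops suffice: fill up at 300 km and again at 700 km.
    print(refuel_stops(K=400, stations=[150, 300, 500, 700, 900], D=1000))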
Greedy Method Correctness Proofs 19

Question
So, in general, how do we prove that a greedy algorithm produces
a correct solution?

Answer
There are two main methods of proof:

1. Exchange argument: consider an alternative solution, and
gradually transform it into the solution found by our proposed
algorithm without making it any worse (as we did in the Activity
Selection Problem).

2. Greedy stays ahead: prove that at every stage, no other
sequence of choices could do better than our proposed algorithm
(as we did in the Petrol Stations Problem).
Greedy Method Correctness Proofs 20

When does the greedy strategy work?

There is a theory dealing with this issue (matroids, presented in
the CLRS textbook), but it is not very practical. After solving a
sufficient number of problems you will develop intuition for when
the greedy strategy produces a correct solution.

It is important that you always prove the correctness of the
algorithm, because sometimes it is not obvious whether the greedy
method produces a correct solution!
Cell Towers 21

Problem
Instance: Along the long, straight road from Loololong (in the
West) to Goolagong (in the East), houses are scattered quite
sparsely, sometimes with long gaps between two consecutive
houses. Telstra must provide mobile phone service to people who
live alongside the road, and the range of Telstra’s cell tower is 5km.

[Figure: houses scattered along the road from Loololong (L) to Goolagong (G)]

Task: Design an algorithm for placing the minimal number of cell
towers along the road sufficient to cover all houses.
Cell Towers 22

Maybe we should put the first tower at the spot where it would
cover the largest number of houses, then remove the covered
houses and continue in this manner until all houses are covered.

Exercise: Show that this does not work by finding a
counterexample.
Cell Towers 23

Let us attempt a different greedy algorithm, processing the houses
west to east.

The first house must be covered by some tower, which we place
5km to the east of this house.

This tower may cover some other houses, but eventually we
should reach a house that is out of range of this tower. We then
place a second tower 5km to the east of that house.

Continue in this way until all houses are covered.
Cell Towers 24

At each house, we need to decide whether to place a new tower.
This can be done in constant time by referring to the most
recently created tower, which can itself be updated in constant
time if necessary.

Therefore this algorithm runs in O(n) time if the houses are
provided in order, and O(n log n) time otherwise.
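
A Python sketch of this west-to-east sweep, assuming houses are
given by their coordinates along the road; the names are
illustrative.

    def place_towers(houses, radius=5.0):
        houses = sorted(houses)  # O(n log n); skip if already in order
        towers = []
        for h in houses:
            # Place a new tower only if h is out of range of the most
            # recently created tower.
            if not towers or h > towers[-1] + radius:
                towers.append(h + radius)  # 5km east of the uncovered house
        return towers

    # Houses at 0 and 8 share the tower at 5; the house at 20 needs another.
    print(place_towers([0, 8, 20]))  # [5, 25]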

Exercise
Prove the correctness of this algorithm using an exchange
argument.
Cell Towers 25

One of Telstra’s engineers started with the house closest to
Loololong and put a tower 5km away to the east. He then found
the westernmost house not already in the range of the tower,
placed another tower 5km to the east of it, and continued in this
way till he reached Goolagong.

His junior associate did exactly the same but starting from
Goolagong and moving westwards, and claimed that his method
required fewer towers.

Is there a placement of houses for which the associate is right?
Minimising Job Lateness 26

Problem
Instance: A start time T0 and a list of n jobs, with duration times
ti and deadlines di . Only one job can be performed at any time; all
jobs have to be completed. If a job i is completed at a finishing
time fi > di then we say that it has incurred lateness li = fi − di .

Task: Schedule all the jobs so that the lateness of the job with the
largest lateness is minimised.
Minimising Job Lateness 27

Solution
Ignore job durations and schedule jobs in increasing order of
deadlines.
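
As a sketch, assuming jobs are given as (duration, deadline) pairs
with illustrative data:

    def schedule_by_deadline(jobs, T0=0):
        # Earliest-deadline-first: sort by deadline, ignoring durations.
        order = sorted(jobs, key=lambda job: job[1])
        t, max_lateness = T0, 0
        for duration, deadline in order:
            t += duration  # this job finishes at time t
            max_lateness = max(max_lateness, t - deadline)
        return order, max_lateness

    jobs = [(3, 6), (2, 8), (1, 9), (4, 9), (3, 14), (2, 15)]
    print(schedule_by_deadline(jobs))  # maximum lateness 1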

Proof of optimality
Consider any alternative schedule. We say that jobs i and j form
an inversion if job i is scheduled before job j but dj < di .

[Figure: a schedule with an inversion: job i, with deadline di, runs
before job j, although dj < di; their latenesses li and lj are
marked.]
Minimising Job Lateness 28

We will show that there exists a scheduling without inversions
which is at least as good.

Recall that bubble sort only swaps adjacent array entries, and
eventually sorts the array.

Thus, if we show that we can swap two adjacent inverted activities
without increasing the largest lateness, then we are guaranteed
that the schedule without any inversions is optimal.
Minimising Job Lateness 29

[Figure: two adjacent inverted jobs before and after the swap;
after swapping, the larger of the two latenesses is reduced.]

Note that swapping adjacent inverted jobs reduces the larger
lateness!
Tape Storage 30

Problem
Instance: A list of n files of lengths li which have to be stored on
a tape. Each file is equally likely to be needed. To retrieve a file,
one must start from the beginning of the tape and scan it until the
file is found and read.

Task: Order the files on the tape so that the average (expected)
retrieval time is minimised.
Tape Storage 31

If the files are stored in order l1, l2, . . . , ln, then the expected
time is proportional to

l1 + (l1 + l2) + (l1 + l2 + l3) + . . . + (l1 + l2 + . . . + ln)
  = n l1 + (n − 1) l2 + (n − 2) l3 + . . . + 2 ln−1 + ln.

This is minimised if l1 ≤ l2 ≤ l3 ≤ . . . ≤ ln, so we simply sort the
files by increasing order of length for an O(n log n) solution.
Tape Storage II 32

Problem
Instance: A list of n files of lengths li and probabilities pi of
being needed, where p1 + p2 + · · · + pn = 1, which have to be
stored on a tape. To retrieve a file, one must start from the
beginning of the tape and scan it until the file is found and read.

Task: Order the files on the tape so that the expected retrieval
time is minimised.
Tape Storage II 33

If the files are stored in order l1, l2, . . . , ln, then the expected
time is proportional to

l1 p1 + (l1 + l2)p2 + (l1 + l2 + l3)p3 + . . . + (l1 + l2 + . . . + ln)pn.

We now show that this is minimised if the files are ordered in
decreasing order of the ratio pi/li.
Tape Storage II 34

Let us see what happens if we swap two adjacent files, say files k
and k + 1. The expected times before and after the swap are,
respectively,

E = l1 p1 + (l1 + l2)p2 + (l1 + l2 + l3)p3 + . . .
    + (l1 + l2 + . . . + lk−1 + lk)pk
    + (l1 + l2 + . . . + lk−1 + lk + lk+1)pk+1
    + . . . + (l1 + l2 + . . . + ln)pn

and

E′ = l1 p1 + (l1 + l2)p2 + (l1 + l2 + l3)p3 + . . .
    + (l1 + l2 + . . . + lk−1 + lk+1)pk+1
    + (l1 + l2 + . . . + lk−1 + lk+1 + lk)pk
    + . . . + (l1 + l2 + . . . + ln)pn.
Tape Storage II 35

Thus, E − E′ = lk pk+1 − lk+1 pk, which is positive whenever
lk pk+1 > lk+1 pk, i.e. when pk/lk < pk+1/lk+1.

Consequently, E > E′ if and only if pk/lk < pk+1/lk+1, which
means that the swap decreases the expected time whenever there
is an inversion: file k + 1 with a larger ratio pk+1/lk+1 has been
put after file k with a smaller ratio pk/lk.

As long as the sequence is not sorted, there will be inversions of
consecutive files, and swapping will reduce the expected time.
Consequently, the optimal solution is the one with no inversions.
Tape Storage II 36

Note: We DO NOT use bubble sort in our algorithm; it is used
only in the proof that the optimal solution contains no inversions.

In the algorithm itself we compute all the ratios pk/lk in O(n)
time and then sort the files according to these ratios in O(n log n)
time.

Thus, the whole algorithm runs in time O(n log n), while bubble
sort can take up to Θ(n²) time.
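
The algorithm itself is then a one-line sort; a Python sketch,
assuming files are given as (length, probability) pairs:

    def order_files(files):
        # Sort by decreasing ratio p/l; this minimises the expected
        # retrieval time, as shown by the exchange argument above.
        return sorted(files, key=lambda f: f[1] / f[0], reverse=True)

    # The short, frequently needed file goes first.
    print(order_files([(10, 0.2), (2, 0.5), (8, 0.3)]))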
Interval Stabbing 37

Problem
Instance: Let X be a set of n intervals on the real line, described
by two arrays XL [1..n] and XR [1..n], representing their left and
right endpoints. We say that a set P of points stabs X if every
interval in X contains at least one point in P.

Task: Describe and analyse an efficient algorithm to compute the


smallest set of points that stabs X .
Interval Stabbing 38

Attempt 1
Is it a good idea to stab the largest possible number of intervals?

No! In the above example, this strategy needs three stabbing
points rather than two.
Interval Stabbing 39

Hint
The interval which ends the earliest has to be stabbed somewhere.
What is the best place to stab it?
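
One natural way to act on the hint is to stab each earliest-ending
unstabbed interval at its right endpoint: no later point can hit
that interval, and no earlier point hits more of the remaining
intervals. The slides leave this as an exercise, so the following
Python sketch anticipates the intended solution rather than
quoting it.

    def stab_intervals(intervals):
        # intervals: list of (left, right) pairs.
        intervals = sorted(intervals, key=lambda iv: iv[1])
        points = []
        for left, right in intervals:
            # Stab only intervals missed by the most recent point.
            if not points or left > points[-1]:
                points.append(right)
        return points

    # The point 3 stabs the first two intervals; 6 stabs the rest.
    print(stab_intervals([(1, 3), (2, 5), (4, 6), (5, 6)]))  # [3, 6]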


Fractional Knapsack 40

Problem
Instance: A list of n powder substances, described by their
weights wi and values vi, and a maximal weight limit W of your
knapsack. You can take any fraction of the available amount of
each substance.

Task: Select a non-negative quantity of each substance, with total
weight not exceeding W and maximal total value.
Fractional Knapsack 41

Solution
Take the maximal available amount of the substance with the
highest value per unit weight; while capacity remains, continue
with the next best substance, and so on.
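
A Python sketch, with substances given as (weight, value) pairs:

    def fractional_knapsack(items, W):
        # Greedy by value per unit weight, taking as much as fits.
        items = sorted(items, key=lambda it: it[1] / it[0], reverse=True)
        total_value, remaining = 0.0, W
        for weight, value in items:
            take = min(weight, remaining)
            total_value += value * take / weight
            remaining -= take
            if remaining == 0:
                break
        return total_value

    # With the 0-1 example on the next slide, fractions make greedy
    # optimal: all of A and B, plus 20 of the 30 kg of C, worth $240.
    print(fractional_knapsack([(10, 60), (20, 100), (30, 120)], W=50))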
0-1 Knapsack 42

Problem
Instance: A list of n discrete items described by their weights wi
and values vi , and a maximal weight limit W of your knapsack.

Task: Find a subset S of the items with total weight not


exceeding W and maximal total value.
0-1 Knapsack 43

Can we always choose the item of highest value per unit weight?

Assume there are just three items with weights and values:

A (10 kg, $60), B (20 kg, $100), C (30 kg, $120)

and a knapsack of capacity W = 50 kg.

The greedy strategy would choose items A and B, while the
optimal solution is to take items B and C!

So when does the greedy strategy work?? As we mentioned,
unfortunately there is no easy rule . . .
Array Merging 44

Assume you are given n sorted arrays of different sizes. You are
allowed to merge any two arrays into a single new sorted array,
and proceed in this manner until only one array is left.

Exercise
Design an algorithm which achieves this task and moves array
elements as few times as possible. Give an informal justification
why your algorithm is optimal.

This problem is somewhat related to the next problem, which is
arguably among the most important applications of the greedy
method!
The Huffman Code 45

Assume you are given a set of symbols, for example the English
alphabet plus punctuation marks and a blank space (to be used
between words).

You want to encode these symbols using binary strings, so that
sequences of such symbols can be decoded in an unambiguous
way.
The Huffman Code 46

One way of doing so is to reserve bit strings of equal and
sufficient length, given the number of distinct symbols to be
encoded. For example, if you have 26 letters and up to 6
punctuation symbols, you could use strings of 5 bits, as 2⁵ = 32.

To decode a piece of text you would partition the bit stream into
groups of 5 bits and use a lookup table to decode the text.
The Huffman Code 47

However, this might not be the most economical way: all the
symbols have codes of equal length, but the symbols are not
equally frequent.

One would prefer an encoding in which frequent symbols such as
‘a’, ‘e’, ‘i’ or ‘t’ have short codes, while infrequent ones, such as
‘q’, ‘x’ and ‘z’, can have longer codes.
The Huffman Code 48

However, if the codes are of variable length, then how can we
partition a bitstream uniquely into segments, each corresponding
to a code?

One way of ensuring unique readability of codes from a single
bitstream is to ensure that no code of a symbol is a prefix of the
code for another symbol.

Codes with this property are called prefix codes.
The Huffman Code 49

We can now formulate the problem as follows:

Given the frequencies (probabilities of occurrence) of each
symbol, design an optimal prefix code, i.e. a prefix code
which minimises the expected length of an encoded text.

Note that this amounts to saying that the average number of bits
per symbol in an “average” text is as small as possible.

We now sketch the algorithm informally; please see the textbook
for details and the proof of optimality.

MATH3411 Information, Codes & Ciphers covers this and
much more!
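
For reference, a compact Python sketch of the standard greedy
construction (repeatedly merge the two least frequent subtrees);
this follows the textbook algorithm rather than anything spelled
out on these slides, and the example frequencies are made up.

    import heapq

    def huffman_codes(freq):
        # freq: dict mapping symbol -> frequency (or probability).
        # Each heap entry is (total frequency, tie-breaker, codes so far).
        heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            f1, _, codes1 = heapq.heappop(heap)  # least frequent subtree
            f2, _, codes2 = heapq.heappop(heap)  # second least frequent
            merged = {s: "0" + c for s, c in codes1.items()}
            merged.update({s: "1" + c for s, c in codes2.items()})
            heapq.heappush(heap, (f1 + f2, counter, merged))
            counter += 1
        return heap[0][2]

    print(huffman_codes({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5}))
    # {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}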
Table of Contents 50

1. Introduction

2. Assorted problems

3. Applications to graphs
3.1 Directed graph structure
3.2 Single source shortest paths
3.3 Minimum spanning trees

4. Puzzle
Table of Contents 51

1. Introduction

2. Assorted problems

3. Applications to graphs
3.1 Directed graph structure
3.2 Single source shortest paths
3.3 Minimum spanning trees

4. Puzzle
Tsunami Warning 52

Problem
Instance: There are n radio towers for broadcasting tsunami
warnings. You are given the (x, y) coordinates of each tower and
its radius of range. When a tower is activated, all towers within its
radius of range will also activate, and those can cause other towers
to activate, and so on.

You need to equip some of these towers with seismic sensors so
that, when these sensors activate the towers where they are
located, all towers will eventually get activated and send a tsunami
warning.

Task: Design an algorithm which finds the minimum number of
towers you must equip with seismic sensors.
Tsunami Warning 53

[Figure: seven towers a–g with their ranges of activation]
Tsunami Warning 54

Activating a causes the activation of b and c, and therefore d is
activated also.

Activating e causes the activation of f.

g must be activated separately.

Therefore a minimum of three sensors is required.

Note that we could have placed the first sensor at c instead of a.
Tsunami Warning 55

[Figure: the same towers, now with the three chosen sensor locations marked]
Tsunami Warning 57

Attempt 1
Find the unactivated tower with the largest radius (breaking ties
arbitrarily), and place a sensor at this tower. Find and remove all
towers activated as a result. Repeat.

Attempt 2
Find the unactivated tower with the largest number of towers
within its range (breaking ties arbitrarily), and place a sensor at this
tower. Find and remove all towers activated as a result. Repeat.

Exercise
Give examples which show that neither of these algorithms solve
the problem correctly.
Tsunami Warning 58

It is useful to consider the towers as vertices of a directed graph,
where an edge from tower a to tower b indicates that the
activation of a directly causes the activation of b, that is, b is
within the radius of a.

[Figure: tower b within a’s radius gives the edge a → b; if a and b
are each within the other’s radius, we write a ↔ b]
Tsunami Warning 59

Observation
Suppose that activating tower a causes tower b to also be
activated, and vice versa. Then we never want to place sensors at
both towers; indeed, placing a sensor at a is equivalent to placing a
sensor at b.

How can we extend this notion to a larger number of towers?

Cycles also have this property. Can we do better?
Tsunami Warning 60

Example

[Figure: four towers; a, b and c form a cycle of activations, and
the fourth tower is within range of the cycle]

All four towers can be activated by placing just one sensor at a, b
or c.
Tsunami Warning 61

Observation
Let S be a subset of the towers such that activating any tower in
S causes the activation of all towers in S.

We never want to place more than one sensor in S, and if we place
one, then it doesn’t matter where we put it.

In this way, we can treat all of S as a unit; a super-tower.
Strongly Connected Components 62

Definition
Given a directed graph G = (V , E ) and a vertex v , the strongly
connected component of G containing v consists of all vertices
u ∈ V such that there is a path in G from v to u and a path from
u to v . We will denote it by Cv .

In terms of our problem, strongly connected components are
maximal super-towers.
Strongly Connected Components 63

How do we find the strongly connected component Cv ⊆ V
containing v?

Construct another graph Grev = (V, Erev) consisting of the same
set of vertices V but with the set of edges Erev obtained by
reversing the direction of all edges E of G.

Claim
u is in Cv if and only if u is reachable from v and v is reachable
from u.

Equivalently, u is in Cv if and only if u is reachable from v in both
G and Grev.
Strongly Connected Components 64

Suppose the original graph G = (V, E) is

[Figure: a directed graph G on the vertices a, b, c, d, e, f, g, h]

Then the set of vertices reachable from e is

Re = {d, e, f, g, h}.
Strongly Connected Components 65

The reverse graph Grev = (V, Erev) is

[Figure: the same vertices with every edge of G reversed]

Then the set of vertices reachable from e in Grev is

Re′ = {a, b, c, d, e, f}.
Strongly Connected Components 66

Combining
Re = {d, e, f, g, h}
with
Re′ = {a, b, c, d, e, f},
we have
Ce = Re ∩ Re′ = {d, e, f}.

Note that Cd and Cf are also the same set, namely {d, e, f}.

Similarly, we find that Ca = {a, b, c} and Cg = {g, h}.

Strongly Connected Components 67

Use BFS to find the set Rv ⊆ V of all vertices in V which are
reachable in G from v.

Similarly find the set Rv′ ⊆ V of all vertices which are reachable
in Grev from v.

The strongly connected component of G containing v is given by
Cv = Rv ∩ Rv′.
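
A Python sketch of this double-BFS computation. The adjacency
lists below are an assumption chosen to reproduce the worked
example's components; the exact edges of the figure are not
recoverable from the slides.

    from collections import deque

    def reachable(adj, v):
        # Standard BFS from v; adj maps a vertex to its out-neighbours.
        seen, queue = {v}, deque([v])
        while queue:
            u = queue.popleft()
            for w in adj.get(u, []):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        return seen

    def scc_of(adj, rev, v):
        return reachable(adj, v) & reachable(rev, v)  # Cv = Rv ∩ Rv′

    G = {'a': ['b'], 'b': ['c', 'd'], 'c': ['a'], 'd': ['e'],
         'e': ['f'], 'f': ['d', 'g'], 'g': ['h'], 'h': ['g']}
    Grev = {}
    for u, ws in G.items():
        for w in ws:
            Grev.setdefault(w, []).append(u)  # reverse every edge
    print(scc_of(G, Grev, 'e'))  # the component {'d', 'e', 'f'}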
Strongly Connected Components 68

Therefore the decomposition of G into strongly connected
components is

[Figure: G with its strongly connected components {a, b, c},
{d, e, f} and {g, h} circled]
Strongly Connected Components 69

Finding all strongly connected components in this way could
require O(V) traversals of the graph.

Each of these traversals is a BFS, requiring O(V + E) time.

Therefore the total time complexity is O(V(V + E)).

Faster algorithms exist! Kosaraju’s algorithm and Tarjan’s
algorithm find all strongly connected components of a directed
graph in linear time, i.e. O(V + E) (covered in COMP4128).
The Condensation Graph 70

It should be clear that distinct strongly connected components are
disjoint sets, so the strongly connected components form a
partition of V.

Let CG be the set of all strongly connected components of a
graph G.

Definition
Define the condensation graph ΣG = (CG, E∗), where

E∗ = {(Cu1, Cu2) | (u1, u2) ∈ E, Cu1 ≠ Cu2}.

The vertices of ΣG are the strongly connected components of G,
and the edges of ΣG correspond to those edges of G that are not
within a strongly connected component, with duplicates ignored.
The Condensation Graph 71

Recall our earlier example

[Figure: the directed graph G from the previous slides]
The Condensation Graph 72

The condensation graph is simply

[Figure: the condensation graph on the three vertices {a, b, c},
{d, e, f} and {g, h}]
The Condensation Graph 73

We begin our solution to the tsunami warning problem by finding
the condensation graph.

Now we have the set of super-towers, and we know for each
super-tower which others it can activate.

Our task is to decide which super-towers need a sensor installed in
order to activate all the super-towers.

We need to know one more property of the condensation graph.
The Condensation Graph 74

Claim
The condensation graph ΣG is a directed acyclic graph.

Proof Outline
Suppose there is a cycle in ΣG . Then the vertices on this cycle are
not maximal strongly connected sets, as they can be merged into
an even larger strongly connected set.
Tsunami Warning 75

Solution
The correct greedy strategy is to place a sensor in exactly those
super-towers which have no incoming edges in the condensation
graph.

Proof
These super-towers cannot be activated by another super-tower, so
they each require a sensor. This shows that there is no solution
using fewer sensors.
Tsunami Warning 76

Proof (continued)
We still have to prove that this solution activates all super-towers.

Consider a super-tower with one or more incoming edges. Follow
any of these edges backwards, and continue backtracking in this
way.

Since the condensation graph is acyclic, this path must end at
some super-tower without incoming edges. The sensor placed
there will then activate all super-towers along our path.

Therefore, all super-towers are activated as required.
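
Once the condensation graph is built, the counting step takes only
a few lines; a Python sketch with hypothetical super-tower labels:

    def sensors_needed(super_towers, edges):
        # edges: directed pairs (c1, c2) between distinct super-towers.
        has_incoming = {c2 for (c1, c2) in edges}
        return [c for c in super_towers if c not in has_incoming]

    # Hypothetical condensation: P -> Q -> R, plus an isolated
    # super-tower T; only P and T have no incoming edges.
    print(sensors_needed(['P', 'Q', 'R', 'T'], [('P', 'Q'), ('Q', 'R')]))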


Topological Sorting 77

Definition
Let G = (V , E ) be a directed graph, and let n = |V |. A
topological sort of G is a linear ordering (enumeration) of its
vertices σ : V → {1, . . . , n} such that if there exists an edge
(v , w ) ∈ E then v precedes w in the ordering, i.e., σ(v ) < σ(w ).

Property
A directed acyclic graph permits a topological sort of its vertices.

Note that the topological sort is not necessarily unique, i.e., there
may be more than one valid topological ordering of the vertices.
Topological Sorting 78

Algorithm
Maintain:
a list L of vertices, initially empty,
an array D consisting of the in-degrees of the vertices, and
a set S of vertices with no incoming edges.
Topological Sorting 79

Algorithm (continued)
While set S is non-empty, select a vertex u in the set. Remove it
from S and append it to L.

Then, for every outgoing edge e = (u, v) from this vertex, remove
the edge from the graph, and decrement D[v] accordingly. If D[v]
is now zero, insert v into S.

If there are no edges remaining, then L is a topological ordering.
Otherwise, the graph has a cycle.

This algorithm runs in O(V + E), that is, linear time.
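
A direct Python transcription of this algorithm (Kahn's
algorithm), with vertices labelled 0 to n − 1:

    from collections import deque

    def topological_sort(n, edges):
        adj = [[] for _ in range(n)]
        D = [0] * n  # in-degrees
        for u, v in edges:
            adj[u].append(v)
            D[v] += 1
        S = deque(v for v in range(n) if D[v] == 0)
        L = []
        while S:
            u = S.popleft()
            L.append(u)
            for v in adj[u]:
                D[v] -= 1  # "remove" the edge (u, v)
                if D[v] == 0:
                    S.append(v)
        return L if len(L) == n else None  # None signals a cycle

    print(topological_sort(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # [0, 1, 2, 3]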


Table of Contents 80

1. Introduction

2. Assorted problems

3. Applications to graphs
3.1 Directed graph structure
3.2 Single source shortest paths
3.3 Minimum spanning trees

4. Puzzle
Single Source Shortest Paths 81

Problem
Instance: a directed graph G = (V, E) with a non-negative
weight w(e) for each edge e, and a designated source vertex s ∈ V.

We will assume that for every v ∈ V there is a path from s to v .

Task: find the weight of the shortest path from s to v for every
v ∈ V.
Single Source Shortest Paths 82

Note
To find shortest paths from s in an undirected graph, simply
replace each undirected edge with two directed edges in opposite
directions.

Note
There isn’t necessarily a unique shortest path from s to each
vertex.
Dijkstra’s Algorithm 83

This task is accomplished by a very elegant greedy algorithm
developed by Edsger Dijkstra in 1959.

Algorithm Outline
Maintain a set S of vertices for which the shortest path weight has
been found, initially empty. S is represented by a boolean array.

For every vertex v, maintain a value dv which is the weight of the
shortest ‘known’ path from s to v, i.e. the shortest path using only
intermediate vertices in S. Initially ds = 0 and dv = ∞ for all
other vertices.

At each stage, we greedily add to S the vertex v ∈ V \ S which
has the smallest dv value. Record this value as the length of the
shortest path from s to v, and update the other dz values as
necessary.
Dijkstra’s Algorithm 84

[Figure: a worked example on the vertices s, u, v, w, z. With
S = {s}, the tentative values are d(s) = 0, d(u) = 3, d(v) = 5,
d(w) = 7 and d(z) = ∞. Adding u (the smallest d-value) gives
S = {s, u}, and the values improve to d(v) = 3 + 1 = 4 and
d(w) = 3 + 3 = 6.]
Dijkstra’s Algorithm 85

[Figure: continuing the example with S = {s, u}, vertex v now has
the smallest d-value, so it is added to give S = {s, u, v}; the
values improve to d(w) = 3 + 1 + 1 = 5 and d(z) = 3 + 1 + 2 = 6.]
Dijkstra’s Algorithm 86

This outline still leaves much work to do.

Why is it correct to always add the vertex outside S with the
smallest dv value?

When v is added to S, for which vertices z must we update dz,
and how do we do these updates?

What data structure should we use to represent the dv values?

What is the time complexity of this algorithm, and how is it
impacted by our choice of data structure?
Dijkstra’s Algorithm: Correctness 87

First, we will prove the correctness of Dijkstra’s algorithm.

Claim
Suppose v is the next vertex to be added to S. Then dv is the
length of the shortest path from s to v .

Proof

dv is the length of the shortest path from s to v using only


intermediate vertices in S. Let’s call this path p.

If this were not to be the shortest path from s to v , there


must be some shorter path p 0 which first leaves S at some
vertex y before later reaching v .
Dijkstra’s Algorithm: Correctness 88

Proof (continued)

Now, the portion of p′ up to y is a path from s to y using only
intermediate vertices in S.

Therefore, this portion of p′ has weight at least dy.

Since all edge weights are non-negative, p′ itself has weight at
least dy.
Dijkstra’s Algorithm: Correctness 89

Proof (continued)

But v was chosen to have the smallest d-value among all vertices
outside S!

So we know that dv ≤ dy, and hence the weight of path p is at
most that of p′.

Therefore, dv is indeed the weight of the shortest path from s
to v.
Dijkstra’s Algorithm: Correctness 90

[Figure: the hypothetical shorter path p′ from s leaves the old S
at vertex y before reaching v, the vertex just added to form the
new S.]
Dijkstra’s Algorithm: Updates 91

Question
Earlier, we said that when we add a vertex v to S, we may have to
update some dz values. What updates could be required?

Answer
If there is an edge from v to z with weight w (v , z), the shortest
known path to z may be improved by taking the shortest path to v
followed by this edge. Therefore we check whether

dz > dv + w (v , z),

and if so we update dz to the value dv + w (v , z).

As it turns out, these are the only updates we should consider!
Dijkstra’s Algorithm: Updates 92

Claim
If dz changes as a result of adding v to S, the new shortest known
path to z must have penultimate vertex v, i.e. the last edge must
go from v to z.

Proof

Suppose that adding v to S allows for a new shortest path through
S from s to z with penultimate vertex u ≠ v. Such a path must
include v, or else it would not be new. Thus the path is of the form

p = s → · · · → v → · · · → u → z.
Dijkstra’s Algorithm: Updates 93

Proof (continued)

Since u was added to S before v was, we know that there is a
shortest path p′ from s to u which does not pass through v.
Appending the edge from u to z to p′ produces a path through S
from s to z which is no longer than p.

This path was already a candidate for dz, so the weight of p is
greater than or equal to the existing dz value.

This is a contradiction, so the proof is complete.
Dijkstra’s Algorithm: Updates 94

[Figure: a path from s through v and then u to z; since u was
added to S before v, there is a shortest path to u avoiding v.]
Dijkstra’s Algorithm: Data Structures 95

Now, we are ready to consider data structures to maintain the dv
values.

We need to support two operations:

find the vertex v ∈ V \ S with smallest dv value, and

for each of its outgoing edges (v, z), update dz if necessary.

We’ll start with the simplest data structure: the array.
Dijkstra’s Algorithm: Array 96

Let n = |V| and m = |E|, and suppose the vertices of the graph
are labelled 1, . . . , n, where vertex s is the source.

Attempt 1
Store the di values in an array d[1..n].

At each stage:

Perform a linear search of array d, ignoring those vertices already
in S, and select the vertex v with smallest d[v] to be added to S.

For each outgoing edge from vertex v to some z ∈ V \ S, update
d[z] if necessary.
Dijkstra’s Algorithm: Array 97

Question
What is the time complexity of this algorithm?

Answer

At each of n steps, we perform a linear scan on an array of
length n.

We also run the update procedure (in constant time) at most once
for each edge.

The algorithm therefore runs in O(n² + m).
Dijkstra’s Algorithm: Array 98

In a simple graph (no self loops or parallel edges) we have
m ≤ n(n − 1), so we can simplify the time complexity expression
to just O(n²).

If the graph is dense, this is fine. But this is not guaranteed!
Can we do better when m ≪ n²?
Dijkstra’s Algorithm: Data Structures 99

Recall the two operations we need to support:

find the vertex v ∈ V \ S with smallest dv value, and

for each of its outgoing edges (v, z), update dz if necessary.

So far, we have done the first operation in O(n) using a linear
search. How can we improve on this?
Dijkstra’s Algorithm: Data Structures 100

The first operation isn’t a pure ‘find minimum’ because we have
to skip over vertices already in S.

Instead, when we add a vertex v to S, we could try deleting dv
from the data structure altogether.

We now have three operations to support: find minimum, delete
minimum, and update any.

The first two of these suggest the use of a min-heap, but the
standard heap doesn’t allow us to update arbitrary elements.
Augmented Heaps 101

We will use a heap represented by an array A[1..n]; the left child
of A[j] is stored in A[2j] and the right child in A[2j + 1].

Every element of A is of the form A[j] = (i, di) for some vertex i.
The min-heap property is maintained with respect to the d-values
only.

We will also maintain another array P[1..n] which stores the
position of elements in the heap.

Whenever A[j] refers to vertex i, we record P[i] = j, so that we
can look up vertex i using the property A[P[i]] = (i, di).
Augmented Heaps 102

Changing the d-value of vertex i is now an O(log n) operation.

First, look up vertex i in the position array P. This gives us P[i],
the index in A where the pair (i, di) is stored.

Next, we update di by changing the second entry of A[P[i]].

Finally, it may be necessary to bubble up or down to restore the
min-heap property. In this algorithm, d-values are only ever
reduced, so only bubbling up is applicable.

Accessing the top of the heap still takes O(1), and popping the
heap still takes O(log n).
Augmented Heaps 103

[Figure: an augmented heap and its arrays. The root is
A[1] = (7, 1); its children are A[2] = (6, 2) and A[3] = (5, 6); the
bottom level holds A[4] = (2, 3), A[5] = (4, 8), A[6] = (1, 7) and
A[7] = (3, 9).]

j       1      2      3      4      5      6      7
A[j]    (7,1)  (6,2)  (5,6)  (2,3)  (4,8)  (1,7)  (3,9)

i       1  2  3  4  5  6  7
P[i]    6  4  7  5  3  2  1
Dijkstra’s Algorithm: Augmented Heap 104

Algorithm
Store the di values in an augmented heap of size n.

At each stage:

Access the top of the heap to obtain the vertex v with smallest
key and add it to set S.

Pop the corresponding element dv from the heap.

For each outgoing edge from v to some z ∈ V \ S, update dz if
necessary.
Dijkstra’s Algorithm: Augmented Heap 105

Question
What is the time complexity of our algorithm?

Answer

Each of n stages requires a deletion from the heap (when a vertex
is added to S), which takes O(log n) many steps.

Each edge causes at most one update of a key in the heap, also
taking O(log n) many steps.

Thus, in total, the algorithm runs in time O((n + m) log n). But
since there is a path from s to every other vertex, we know
m ≥ n − 1, so we can simplify to O(m log n).
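
Python's standard heapq module has no update operation, so
instead of the augmented heap described above, the sketch below
uses a common alternative: push a fresh entry on every update
and skip stale entries when they are popped. This also runs in
O(m log n). The example graph reproduces the worked figures,
with the edges taken as directed.

    import heapq

    def dijkstra(adj, s):
        # adj[u]: list of (v, w) pairs; returns shortest distances from s.
        d = {s: 0}
        done = set()   # the set S of finalised vertices
        pq = [(0, s)]  # min-heap of (d-value, vertex)
        while pq:
            dv, v = heapq.heappop(pq)
            if v in done:
                continue  # stale entry: v was already finalised
            done.add(v)
            for z, w in adj.get(v, []):
                if z not in done and dv + w < d.get(z, float("inf")):
                    d[z] = dv + w
                    heapq.heappush(pq, (d[z], z))  # instead of update
        return d

    adj = {'s': [('u', 3), ('v', 5), ('w', 7)], 'u': [('v', 1), ('w', 3)],
           'v': [('w', 1), ('z', 2)], 'w': [], 'z': []}
    print(dijkstra(adj, 's'))  # {'s': 0, 'u': 3, 'v': 4, 'w': 5, 'z': 6}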
Dijkstra’s Shortest Paths Algorithm 106

Note
In COMP2521/9024, you may have seen that the time complexity
of Dijkstra’s algorithm can be improved to O(m + n log n). This is
true, but it relies on an advanced data structure called the
Fibonacci heap, which has not been taught in this course or any
prior course.
Table of Contents 107

1. Introduction

2. Assorted problems

3. Applications to graphs
3.1 Directed graph structure
3.2 Single source shortest paths
3.3 Minimum spanning trees

4. Puzzle
Minimum Spanning Trees 108

Definition
A minimum spanning tree T of a connected graph G is a subgraph
of G (with the same set of vertices) which is a tree, and among all
such trees it minimises the total length of all edges in T.

Lemma
Let G be a connected graph in which all edge lengths are distinct,
and let S be a non-empty proper subset of the set of all vertices V
of G. Assume that e = (u, v) is an edge such that u ∈ S and
v ∉ S, and that e is of minimal length among all the edges having
this property. Then e must belong to every minimum spanning
tree T of G.
Minimum Spanning Trees 109

Proof
Assume that there exists a minimum spanning tree T which does
not contain the edge e = (u, v).

[Figure: the edge (u, v) crossing from S to S̄, and the path in T
from u to v crossing at some edge (p, q).]
Minimum Spanning Trees 110

Proof (continued)

Since T is a spanning tree, there exists a path from u to v within
T, and this path must leave S by some edge, say (p, q), where
p ∈ S and q ∉ S.

However, (u, v) is shorter than any other edge with one end in S
and one end outside S, including (p, q).

Replacing the edge (p, q) with the edge (u, v) produces a new
tree T′ with smaller total edge weight.

This contradicts our assumption that T is a minimum spanning
tree, completing the proof.
Minimum Spanning Trees 111

There are two famous greedy algorithms for the minimum
spanning tree problem.

Both algorithms build up a forest, beginning with all n isolated
vertices and adding edges one by one.

Prim’s algorithm uses one large component, adding one of the
isolated vertices to it at each stage. This algorithm is very similar
to Dijkstra’s algorithm, but adds the vertex closest to S rather
than the one closest to the starting vertex v.

We will instead focus on Kruskal’s algorithm.
Kruskal’s Algorithm 112

We sort the edges E in increasing order by weight.

An edge e is added if its inclusion does not introduce a cycle in
the graph constructed thus far, or discarded otherwise.

The process terminates when the forest is connected, i.e. when
n − 1 edges have been added.
Kruskal’s Algorithm 113

Claim
Kruskal’s algorithm produces a minimum spanning tree, and if all
weights are distinct, then the minimum spanning tree is unique.

Proof
We consider the case when all weights are distinct. Consider an
edge e = (u, v) added in the course of Kruskal’s algorithm, and let
F be the forest in its state before adding e.
Kruskal’s Algorithm 114

Proof (continued)

Let S be the set of vertices reachable from u in F. Then clearly
u ∈ S but v ∉ S.

The original graph does not contain any edges shorter than e with
one end in S and the other outside S. If such an edge existed, it
would have been considered before e and included in F, but then
both its endpoints would be in S, contradicting the definition of S.

Consequently, edge e is the shortest edge between a vertex of S
and a vertex of S̄, and by the previous lemma it must belong to
every minimum spanning tree.
Kruskal’s Algorithm 115

Proof (continued)

Thus, the set of edges produced by Kruskal’s algorithm is a subset
of the set of edges of every minimum spanning tree.

But the graph produced by Kruskal’s algorithm by definition has
no cycles and is connected, so it is a tree.

Therefore, in the case where all edge weights are distinct,
Kruskal’s algorithm produces the unique minimum spanning tree.
Efficient Implementation of Kruskal’s Algorithm 116

To efficiently implement Kruskal’s algorithm, we need to quickly
determine whether a certain new edge will introduce a cycle.

An edge e = (u, v) will introduce a cycle in the forest F if and
only if there is already a path between u and v, i.e., u and v are in
the same connected component.
Union-Find 117

In our implementation of Kruskal’s algorithm, we store the vertices


using the Union-Find data structure. It handles disjoint sets,
supporting three operations:

1 MakeUnionFind(S), which returns a structure in which all


elements (vertices) are placed into distinct singleton sets.
This operation runs in time O(n) where n = |S|.

2 Find(a), which returns the (label of the) set to which a


belongs. This operation runs in time O(1).

3 Union(a, b), which changes the data structure by merging


the sets A and B (whose labels are a and b respectively) into
a single set A ∪ B. The first k Union operations run in total
time O(k log k).
Union-Find 118

Note that we do not give the run time of a single Union
operation but of a sequence of k Union operations.

This is called amortised analysis; it effectively estimates the
average cost of each operation in a sequence.

Any one Union operation might be Θ(n), but the total time
taken by the first k is O(k log k), i.e. each takes ‘on average’
O(log k).

This is different to average case analysis, because it’s a statement
about an aggregate, rather than a probability.
Union-Find 119

We will label each set by one of its elements, called the


representative of the set.

The simplest implementation of the Union-Find data structure


consists of:

an array A, where A[i] = j means that i belongs to the set


with representative j;

an array B, where B[i] contains the number of elements in the


set with representative i;

an array L, where L[i] contains pointers to the head and tail of


a linked list containing the elements of the set with
representative i.
Union-Find 120

Note
If i is not the representative of any set, then B[i] is zero and the
list L[i] is empty.

Note
The list array L allows us to iterate through the members of one of
the disjoint sets, which is used in the Union operation.
Union-Find 121

Given two sets I and J with representatives i and j, Union(i, j) is


defined as follows:

assume B[i] ≥ B[j] (i.e. |I | ≥ |J|); otherwise perform


Union(j, i) instead;

for each m ∈ J, use linked list L[j] to update A[m] from j to i;

update B[i] to B[i] + B[j] and B[j] to zero;

append the list L[j] to the list L[i] and replace L[j] with an
empty list.
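
A Python sketch of exactly this array-based structure, with
elements labelled 0 to n − 1 and ordinary Python lists standing in
for the head/tail linked lists (which changes constants, not the
analysis):

    def make_union_find(n):
        A = list(range(n))           # A[i]: representative of i's set
        B = [1] * n                  # B[i]: size, if i is a representative
        L = [[i] for i in range(n)]  # L[i]: members of i's set
        return A, B, L

    def find(A, a):
        return A[a]                  # O(1) lookup of the representative

    def union(A, B, L, i, j):
        # i and j must be representatives of their sets.
        if B[i] < B[j]:
            i, j = j, i              # relabel the smaller set into the larger
        for m in L[j]:
            A[m] = i
        B[i] += B[j]
        B[j] = 0
        L[i].extend(L[j])
        L[j] = []

    A, B, L = make_union_find(5)
    union(A, B, L, 0, 1); union(A, B, L, 2, 3); union(A, B, L, 0, 2)
    print(find(A, 3) == find(A, 1))  # True: {0, 1, 2, 3} is one set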
Union-Find 122

Observation
The new value of B[i] is at least twice the old value of B[j].

Observation
Suppose m is an element of the smaller set J, so its label A[m]
changed from j to i.

Then the observation above tells us that B[A[m]] (formerly the
old B[j], now the new B[i]) at least doubled.

What’s the significance of B[A[m]]? It’s the number of elements
in the set containing m.
Union-Find 123

The first k Union operations can touch at most 2k elements of S
(with equality if they each merge a different pair of singleton
sets).

Thus, the set containing an element m after the first k Union
operations must have at most 2k elements.

Since every Union operation which changes A[m] at least
doubles B[A[m]], we deduce that A[m] has changed at most
log 2k times.

Thus, since at most 2k elements have their label changed at all,
we can conclude that the first k Union operations will cause at
most 2k log 2k label changes in A.
Union-Find 124

Each Union operation requires only constant time to update the
size array B and the list array L.

Thus, the first k Union operations take O(k log k) time in total.

This Union-Find data structure is good enough to get the
sharpest possible bound on the run time of Kruskal’s algorithm.

See the textbook for a Union-Find data structure based on
pointers and path compression, which further reduces the
amortised complexity of the Union operation at the cost of
increasing the complexity of the Find operation from O(1) to
O(log n).
Efficient Implementation of Kruskal’s Algorithm 125

We now use the previously described Union-Find data structure
to efficiently implement Kruskal’s algorithm on a graph
G = (V, E) with n vertices and m edges.

We first have to sort the m edges of graph G, which takes time
O(m log m). Since m < n², we can rewrite this as
O(m log n²) = O(m log n).

As we progress through the execution of Kruskal’s algorithm, we
start with n isolated vertices, which are merged into connected
components until all vertices belong to a single connected
component. We use the Union-Find data structure to keep track
of the connected components constructed at any stage.
Efficient Implementation of Kruskal’s Algorithm 126

For each edge e = (u, v) on the sorted list of edges, we use two
Find operations to determine whether vertices u and v belong to
the same component.

If they do not belong to the same component, i.e., if Find(u) = i
and Find(v) = j where i ≠ j, we add edge e to the spanning tree
being constructed and perform Union(i, j) to merge the connected
components containing u and v.

If instead Find(u) = Find(v), there is already a path between u
and v, so adding this edge would create a cycle. Therefore, we
simply discard the edge.
Efficient Implementation of Kruskal’s Algorithm 127

We perform 2m Find operations, each costing O(1).

We also perform n − 1 Union operations, which in total cost
O(n log n).

The overall time complexity is therefore
O(m log n + m + n log n).

The first term (from sorting) dominates, so we can simplify the
time complexity to O(m log n).
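
Putting this together, a Python sketch of Kruskal's algorithm,
reusing make_union_find, find and union from the Union-Find
sketch above; edges are (weight, u, v) triples.

    def kruskal(n, edges):
        A, B, L = make_union_find(n)   # from the earlier sketch
        mst = []
        for w, u, v in sorted(edges):  # O(m log n) sorting
            i, j = find(A, u), find(A, v)
            if i != j:  # different components: no cycle is created
                union(A, B, L, i, j)
                mst.append((u, v, w))
                if len(mst) == n - 1:
                    break  # the forest is now a spanning tree
        return mst

    print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))
    # [(0, 1, 1), (1, 2, 2), (2, 3, 4)]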
k-clustering of maximum spacing 128

Problem
Instance: A complete graph G with weighted edges representing
the distances between pairs of vertices.

Task: Partition the vertices of G into k disjoint subsets so that
the minimal distance between two points belonging to different
sets of the partition is as large as possible. Thus, we want a
partition into k disjoint sets which are as far apart as possible.
k-clustering of maximum spacing 129

Solution
Sort the edges in increasing order and start performing the usual
Kruskal’s algorithm for building a minimum spanning tree, but
stop when you obtain k connected components, rather than a
single spanning tree.
k-clustering of maximum spacing 130

Proof of optimality

Let δ be the distance associated with the first edge of the
minimum spanning tree which was not added to our k connected
components.

It is clear that δ is the minimal distance between two vertices
belonging to different connected components.

All the edges included in the connected components produced by
our algorithm are of length at most δ.
k-clustering of maximum spacing 131

Proof of optimality (continued)

Consider any partition S into k subsets different from the one
produced by our algorithm.

This means that there is a connected component produced by our
algorithm which contains vertices x and y such that x ∈ Si and
y ∉ Si for some Si ∈ S.
k-clustering of maximum spacing 132

Proof of optimality (continued)

[Figure: a connected component containing x, z ∈ Si and w, y
outside Si, with w ∈ Sj; z and w are consecutive on the path
from x to y.]
k-clustering of maximum spacing 133

Proof of optimality (continued)

Since x and y belong to the same connected component, there is
a path in that component connecting x and y.

Let z and w be two consecutive vertices on that path such that z
belongs to Si and w ∉ Si.

Thus, w ∈ Sj for some j ≠ i.
k-clustering of maximum spacing 134

Proof of optimality (continued)

Since (z, w) was an edge chosen by our proposed algorithm, we
know that d(z, w) ≤ δ.

It follows that the distance between the two clusters Si, Sj ∈ S is
at most δ.

Thus, such a partition cannot be a better clustering than the one
produced by our algorithm.
k-clustering of maximum spacing 135

What is the time complexity of this algorithm?

We have Θ(n²) edges; thus sorting them by weight will take
O(n² log n²), which we can simplify to O(n² log n).

Running the (partial) Kruskal algorithm requires O(m log n)
steps, making use of the Union-Find data structure. Since
m = Θ(n²), this step also takes O(n² log n).

So the algorithm has time complexity O(n² log n) in total.
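
A sketch of the partial Kruskal run, again reusing the Union-Find
functions sketched earlier; the short edge list in the example is
illustrative rather than a full complete graph.

    def k_clustering(n, edges, k):
        A, B, L = make_union_find(n)  # from the earlier sketch
        components = n
        for w, u, v in sorted(edges):
            if components == k:
                break  # stop once only k components remain
            i, j = find(A, u), find(A, v)
            if i != j:
                union(A, B, L, i, j)
                components -= 1
        clusters = {}  # group the points by representative
        for p in range(n):
            clusters.setdefault(find(A, p), []).append(p)
        return list(clusters.values())

    edges = [(1, 0, 1), (2, 3, 4), (9, 1, 2), (10, 2, 3)]
    print(k_clustering(5, edges, k=2))  # [[0, 1, 2], [3, 4]]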


Table of Contents 136

1. Introduction

2. Assorted problems

3. Applications to graphs
3.1 Directed graph structure
3.2 Single source shortest paths
3.3 Minimum spanning trees

4. Puzzle
Puzzle 137

Problem

Bob is visiting Elbonia and wishes to send his teddy bear to Alice,
who is staying at a different hotel. Both Bob and Alice have boxes
like the one illustrated above, as well as padlocks which can be
used to lock the boxes.
Puzzle 138

Problem (continued)
However, there is a problem. The Elbonian postal service
mandates that when a nonempty box is sent, it must be locked.
Also, they do not allow keys to be sent, so the key must remain
with the sender. Finally, you can send padlocks only if they are
locked. How can Bob safely send his teddy bear to Alice?
Puzzle 139

Hint
The way in which the boxes are locked (via a padlock) is
important. It is also crucial that both Bob and Alice have padlocks
and boxes. They can also communicate over the phone to agree on
the strategy.

There are two possible solutions; one can be called the “AND”
solution, the other the “OR” solution. The “AND” solution
requires 4 one-way mail services, while the “OR” solution requires
only 2.
That’s All, Folks!!
