Combinatorial Algorithms
Alexander Souza
Contents
1 Introduction
1.1 Examples
1.2 Combinatorial Optimization Problems
1.3 Algorithms and Approximation

I Optimization Algorithms

2 Network Flows
2.1 Maximum Flows and Minimum Cuts
2.2 Edmonds-Karp Algorithm
2.3 Minimum Cost Flows
2.4 Assignment Problem

3 Matroids
3.1 Independence Systems and Matroids
3.2 Greedy Algorithm
3.3 Matroid Intersection

4 Linear Programming
4.1 Introduction
4.2 Polyhedra
4.3 Duality

II Approximation Algorithms

5 Knapsack
5.1 Fractional Knapsack and Greedy
5.2 Pseudo-Polynomial Time Algorithm
5.3 Fully Polynomial-Time Approximation Scheme

6 Set Cover
6.1 Greedy Algorithm
6.2 Primal-Dual Algorithm
6.3 LP-Rounding Algorithms

7 Satisfiability
7.1 Randomized Algorithm
7.2 Derandomization

8 Facility Location
8.1 Complementary Slackness
8.2 Primal-Dual Algorithm

9 Makespan Scheduling
9.1 Identical Machines
9.2 Unrelated Machines

10 Bin Packing
10.1 Hardness of Approximation
10.2 Heuristics
10.3 Asymptotic Polynomial Time Approximation Scheme

11 Traveling Salesman
11.1 Hardness of Approximation
11.2 Metric Traveling Salesman
Chapter 1
Introduction
1.1 Examples
We start with some examples of combinatorial optimization problems.
Example 1.1. The following problem is called the Knapsack problem. We are given an
amount of C Euro and wish to invest it among a set of n options. Each such option i has
cost ci and profit pi . The goal is to maximize the total profit.
Consider C = 100 and the following cost-profit table:
Option Cost Profit
1 100 150
2 1 2
3 50 55
4 50 100
Our choice of purchased options must not exceed our capital C. Thus the feasible solutions
are {1}, {2}, {3}, {4}, {2, 3}, {2, 4}, {3, 4}. Which is the best solution? We evaluate all
possibilities and find that {3, 4} gives 155 altogether, which maximizes our profit.
We can formulate our problem with the following mathematical program. We use the
variables xi,j ∈ {0, 1} that indicate if employee i is assigned to job j. We want to minimize
the time until all jobs are finished.
minimize  max_{i=1,...,m} Σ_{j=1}^{n} p_j x_{i,j}      “minimize finishing time”

subject to  Σ_{i∈S_j} x_{i,j} = 1,   j = 1, . . . , n      “each job gets done”
Example 1.3. Many combinatorial optimization problems, like the ones above, can be for-
mulated in terms of an Integer Linear Program (ILP). Let A = (a_{i,j})_{i=1,...,m, j=1,...,n} ∈
R^{m×n} be a matrix and let b = (b_i)_{i=1,...,m} ∈ R^m and c = (c_j)_{j=1,...,n} ∈ R^n be vectors.
Further let x = (x_j)_{j=1,...,n} ∈ Z^n be variables that are allowed to take integral values only.
Our objective is to minimize cᵀx subject to Ax ≤ b. More explicitly:

minimize  Σ_{j=1}^{n} c_j x_j            “objective function”

subject to  Σ_{j=1}^{n} a_{i,j} x_j ≤ b_i,   i = 1, . . . , m      “constraints”

            x_j ∈ Z,   j = 1, . . . , n      “integrality”
Solving an ILP is in general NP-hard. However, we will often replace the constraints
x_j ∈ Z with x_j ∈ R. The resulting problem is called a Linear Programming (LP)
relaxation and can be solved in polynomial time. Of course, such a solution is in general
not feasible for the ILP, but we can sometimes “turn” it into a feasible solution which is
not “too bad”.
Our goal is to find a feasible solution where the desired extremum of val is attained. Any
such solution is called an optimum solution, or simply an optimum. U and S are usually
not given explicitly, but implicitly.
Let us investigate the problem in Example 1.1 with this formalism:

U = 2^{{1,2,3,4}},
S = {X ∈ U : Σ_{i∈X} c_i ≤ C}   (“total cost is at most C”)
  = {{1}, {2}, {3}, {4}, {2, 3}, {2, 4}, {3, 4}},
val : U → R, X ↦ Σ_{i∈X} p_i,
extr = max.
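For an instance as small as Example 1.1 the formalism can be checked by brute force. The following sketch (our own illustration, with the cost-profit table of the example hard-coded) enumerates U, filters out the feasible sets S, and maximizes val:

```python
from itertools import combinations

# Cost-profit table of Example 1.1 with capital C = 100.
cost = {1: 100, 2: 1, 3: 50, 4: 50}
profit = {1: 150, 2: 2, 3: 55, 4: 100}
C = 100

best_set, best_profit = frozenset(), 0
for k in range(1, 5):
    for X in combinations(cost, k):
        if sum(cost[i] for i in X) <= C:       # X ∈ S: total cost at most C
            p = sum(profit[i] for i in X)      # val(X)
            if p > best_profit:
                best_set, best_profit = frozenset(X), p

# Optimum solution {3, 4} with profit 155, as computed in the example.
```

For n options this enumeration takes 2^n steps; the point of the later chapters is to do better than exhaustive search.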
Part I
Optimization Algorithms
Chapter 2
Network Flows
balf (v) = 0 for v ∈ V − {s, t}, balf (s) ≥ 0, and balf (t) ≤ 0.
(Figure: a network with source s and sink t.)

fe ≤ c(e),   e ∈ A,
fe ≥ 0.
Since the flow f = 0 is feasible for this LP, and the LP is obviously bounded (by
Σ_{e∈δ+(s)} c(e)), we have that the Maximum Flow problem always has an optimum so-
lution. Of course, we can solve the problem by using any algorithm for solving LPs, but we
are not satisfied with this – we want a combinatorial algorithm (one that does not solve an
LP) with guaranteed polynomial running time.
Let S be a subset of the vertices, called a cut. The induced cut-edges are the sets of
outgoing edges δ+(S) = {uv ∈ A : u ∈ S, v ∈ V − S} and incoming edges δ−(S) = {vu ∈
A : u ∈ S, v ∈ V − S}. Define its capacity by cap(S) = Σ_{e∈δ+(S)} c(e). An s − t-cut is a
cut S with s ∈ S and t ∈ V − S. A minimum cut refers to one with minimum capacity
among all s − t-cuts. We extend the definition of balance to any cut S:

bal_f(S) = Σ_{e∈δ+(S)} f(e) − Σ_{e∈δ−(S)} f(e).
The following result tells us that the value of a flow can be expressed through the
incoming and outcoming flow of an arbitrary cut. Furthermore, the value of any flow
(including the maximum one) is bounded from above by the capacity of any cut. We will
see soon that the value of a maximum flow equals the capacity of a minimum cut.
Lemma 2.1. For any s − t-cut S and any s − t-flow f we have that

(1) val(f) = bal_f(S), and

(2) val(f) ≤ cap(S).
Proof. We use the flow conservation property, i.e., bal_f(v) = 0 for all v ∈ S − {s}, to find

val(f) = bal_f(s) = Σ_{v∈S} bal_f(v)
       = Σ_{v∈S} ( Σ_{e∈δ+(v)} f(e) − Σ_{e∈δ−(v)} f(e) )
       = Σ_{e∈δ+(S)} f(e) − Σ_{e∈δ−(S)} f(e)
       = bal_f(S).

Furthermore, we have val(f) ≤ Σ_{e∈δ+(S)} c(e) = cap(S) since 0 ≤ f(e) ≤ c(e).
The following definitions and structural result are the basis for an algorithm. A path
P = e1, . . . , eℓ is a sequence of pairwise distinct edges joined at common vertices, that is,
edges ei on vertices vi, vi+1 with vi vi+1 ∈ A or vi+1 vi ∈ A for i = 1, . . . , ℓ − 1, and
ei ≠ ej for 1 ≤ i < j ≤ ℓ. The number ℓ of edges in P is called its length. A v-w-path P
has the form e1 = v· and eℓ = ·w, i.e., it starts at v and ends at w. An edge e = vw in a
path is called a forward edge if vw ∈ A and a backward edge if wv ∈ A. (A v-v-path is
called a cycle.)
An s-v-path P is called f -augmenting with respect to a flow f if
By how much can we increase the current flow value using a particular augmenting path
P ? Define the quantity
Proof. By definition of the quantity α and because each edge occurs at most once in P , we
have that 0 ≤ f 0 (e) ≤ c(e) for all e ∈ A. It remains to show that f 0 is flow conserving. It
is clear that balf 0 (s) ≥ balf (s) ≥ 0 and consequently balf 0 (t) ≤ balf (t) ≤ 0. Consider an
augmentation along edges ei ei+1 with ei = vi vi+1 and ei+1 = vi+1 vi+2 for i = 1, . . . , ` − 1.
Call v = vi+1 and distinguish four cases:
Algorithm 2.1 Ford-Fulkerson
Input. Network N = (G, c, s, t) with c : A → R+ .
Step 3. Compute
Theorem 2.3. In a network N , the maximum value of an s − t-flow equals the minimum
capacity of an s − t-cut.
Proof. We show that an s − t-flow f has maximum value if and only if there is no f -
augmenting path from s to t. In that case we will be able to find a minimum cut R with
equal capacity.
Let there be an f -augmenting path P from s to t, let α be as above and obtain f ′ by
augmenting f by α along P . Observe that val(f ′) > val(f ), i.e., that f is not maximal.
Now let there be no f -augmenting path from s to t. Consider the set S of vertices with
augmenting paths from s, i.e., S = {v ∈ V : there is an f -augmenting path from s to v}
and t ∉ S. Thus S is an s − t-cut. By definition of augmenting paths, we must have
f(e) = c(e) for all e ∈ δ+(S) and f(e) = 0 for all e ∈ δ−(S). Hence, using Lemma 2.1 (1),
we have val(f) = Σ_{e∈δ+(S)} c(e) = cap(S). By Lemma 2.1 (2), f must be a maximum flow
and S a minimum cut.
If all capacities are integers then α is an integer and the algorithm terminates after a
finite number of iterations. Thus we obtain the following important consequence:
Corollary 2.4. If the capacities of a network N are integers, then there is an integral
maximum flow.
If the capacities are not integers, then Ford-Fulkerson might not even terminate.
In particular, we have not yet specified how we actually choose the augmenting paths men-
tioned in Step 2 of the algorithm. This must be done carefully in order to obtain a
polynomial time algorithm, as the following instance illustrates. It turns out that choosing
shortest augmenting paths guarantees termination after a polynomial number of augmen-
tations; see the Edmonds-Karp algorithm.
Example 2.5. To show that Ford-Fulkerson is not a polynomial time algorithm con-
sider the following network. Here M is a large number.
(Figure: vertices s, a, b, t; edges sa, sb, at, bt with capacity M each, and edge ab with
capacity 1.)
Alternatingly augmenting one unit of flow along the paths s-a-b-t and s-b-a-t requires 2M
augmentations. This is already exponential because the (binary) input size of the graph
is O (log M ). In contrast the augmenting paths s-a-t and s-b-t already give a maximum
flow after two augmentations.
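The shortest-augmenting-path rule can be sketched as follows (a minimal Python illustration, not the book's pseudocode; the adjacency-matrix representation and names are our own choices). On the network of Example 2.5 it finds the maximum flow 2M after only a few augmentations:

```python
from collections import deque

def max_flow(n, edges, s, t):
    # Residual capacities in an adjacency matrix.
    cap = [[0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
    flow = 0
    while True:
        # Breadth-first search: a SHORTEST augmenting path (Edmonds-Karp rule).
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow                # no augmenting path: flow is maximum
        # Bottleneck alpha along the path found.
        alpha, v = float('inf'), t
        while v != s:
            alpha = min(alpha, cap[parent[v]][v])
            v = parent[v]
        # Augment: decrease forward, increase backward residual capacities.
        v = t
        while v != s:
            cap[parent[v]][v] -= alpha
            cap[v][parent[v]] += alpha
            v = parent[v]
        flow += alpha

# Example 2.5 with s=0, a=1, b=2, t=3 and M = 1000:
M = 1000
f = max_flow(4, [(0, 1, M), (0, 2, M), (1, 2, 1), (1, 3, M), (2, 3, M)], 0, 3)
# f == 2 * M
```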
It is an exercise to show the following flow decomposition result, which provides another
structural insight into flows.
Theorem 2.6. Given a network N = (G, c, s, t) and an s − t-flow f, there is a family
P of simple paths, a family C of simple cycles and positive numbers h : P ∪ C → R+ such
that

(1) f(e) = Σ_{T∈P∪C : e∈T} h(T) for all e ∈ A,

(2) val(f) = Σ_{T∈P} h(T), and
Step 2. Find a shortest f -augmenting path P w.r.t. the number of edges. If none exists
then return f .
The following lemma is crucial for the proof of the worst-case running time. Let
f0 , f1 , f2 , . . . be the flows constructed by the algorithm. Denote the shortest length of an
augmenting path from s to a vertex v with respect to fk by xv (k) and respectively from v
to t by yv (k).
Lemma 2.8. We have that (1) xv(k + 1) ≥ xv(k) and (2) yv(k + 1) ≥ yv(k) for all
vertices v and all iterations k.
Proof. Suppose for the sake of contradiction that (1) is violated for some pair (v, k). We
may assume that xv (k + 1) is minimal among the xw (k + 1) for which (1) does not hold.
Let e be the last edge in a shortest augmenting path from s to v with respect to fk+1 .
Suppose e = uv is a forward edge. Hence fk+1(e) < c(e), xv(k + 1) = xu(k + 1) + 1, and
xu(k + 1) ≥ xu(k) by our choice of xv(k + 1). Thus xv(k + 1) ≥ xu(k) + 1. Suppose that
fk(e) < c(e), which yields xv(k) ≤ xu(k) + 1 and thus xv(k + 1) ≥ xv(k), a contradiction.
Hence we must have fk (e) = c(e) which implies that e was a backward edge when
fk was changed to fk+1 . As we used an augmenting path of shortest length we have
xu (k) = xv (k) + 1 and thus xv (k + 1) − 1 = xu (k + 1) ≥ xu (k) ≥ xv (k) + 1. Hence
xv (k + 1) ≥ xv (k) + 2 yields a contradiction.
Similarly when e is a backward edge. The proof of (2) is analogous to (1).
Proof of Theorem 2.7. When we increase the flow, the augmenting path always contains
a critical edge, i.e., an edge where the flow is either increased to meet the capacity or
reduced to zero.
Let e = uv be critical in the augmenting path w.r.t. fk . This path has xv (k) + yv (k) =
xu (k) + yu (k) edges. If e is used the next time in an augmenting path w.r.t. fh , say, then
it must be used in the opposite direction as w.r.t. fk .
Suppose that e = uv was a forward edge w.r.t. fk . Then xv (k) = xu (k) + 1 and
xu (h) = xv (h)+1. By Lemma 2.8 xv (h) ≥ xv (k) and yu (h) ≥ yu (k). Hence xu (h)+yu (h) =
xv (h) + 1 + yu (h) ≥ xv (k) + 1 + yu (k) ≥ xu (k) + yu (k) + 2. Thus the augmenting path
w.r.t. fh is at least two edges longer than the augmenting path w.r.t. fk . Similarly if e is
a backward edge.
No shortest augmenting path can contain more than n − 1 edges and hence each edge
can be critical at most (n − 1)/2 times. As each augmenting path contains at least one
critical edge, there can be at most O (nm) augmentations and each one takes time O (m).
This yields the running time of O(nm2 ).
There are further algorithms that solve the Maximum Flow problem in less time. For
example, the Goldberg-Tarjan algorithm runs in time O(n²√m); with sophisticated
implementations, O(nm log(n²/m)) and O(min{m^{1/2}, n^{2/3}} m log(n²/m) log c_max) can be
reached.
(2) b(v) = bal_f(v) = Σ_{e∈δ+(v)} f(e) − Σ_{e∈δ−(v)} f(e).
The second part of our task is easy. Given a network N = (G, c, w, b) with balance
vector b, we can decide if a b-flow exists by solving a Maximum Flow problem: Add
two vertices s and t and edges sv, vt with capacities c(sv) = max{0, b(v)} and c(vt) =
max{0, −b(v)} for all v ∈ V to N. Then any s − t-flow with value Σ_{v∈V} c(sv) in the
resulting network corresponds to a b-flow in the original network N.
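This reduction is easy to code. The sketch below (our own helper names; a plain DFS-based Ford-Fulkerson rather than any specific algorithm from the text) builds the auxiliary network and tests whether a maximum s-t-flow saturates all edges leaving the super source; the names 's' and 't' are reserved for the two added vertices:

```python
def max_flow(cap, s, t):
    # cap: dict of dicts with residual capacities; modified in place.
    def dfs(u, bottleneck, seen):
        if u == t:
            return bottleneck
        seen.add(u)
        for v, c in list(cap[u].items()):
            if c > 0 and v not in seen:
                pushed = dfs(v, min(bottleneck, c), seen)
                if pushed:
                    cap[u][v] -= pushed
                    cap.setdefault(v, {})[u] = cap[v].get(u, 0) + pushed
                    return pushed
        return 0
    total = 0
    while True:
        pushed = dfs(s, float('inf'), set())
        if not pushed:
            return total
        total += pushed

def b_flow_exists(vertices, edges, b):
    # edges: dict (u, v) -> capacity; b: balance vector as a dict.
    cap = {v: {} for v in vertices}
    cap['s'], cap['t'] = {}, {}
    for (u, v), c in edges.items():
        cap[u][v] = cap[u].get(v, 0) + c
    for v in vertices:
        if b[v] > 0:
            cap['s'][v] = b[v]       # c(sv) = max{0, b(v)}
        elif b[v] < 0:
            cap[v]['t'] = -b[v]      # c(vt) = max{0, -b(v)}
    demand = sum(bv for bv in b.values() if bv > 0)
    return max_flow(cap, 's', 't') == demand
```

For example, on the triangle a, b, c with capacities c(ab) = 2, c(ac) = 1, c(bc) = 1 and b = (2, −1, −1), a b-flow exists; a single edge of capacity 2 cannot carry b = (3, −3).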
For the remainder of the section we give an optimality criterion which leads directly
to an algorithm similar to the Ford-Fulkerson method. But here we augment along
cycles instead of paths. Again, the choice of the augmenting cycles must be done carefully.
But we omit this here and state the following theorem which refers to Orlin’s algorithm
without proof.
Theorem 2.9. There is an algorithm which solves the Minimum Cost Flow problem
on any network with n vertices and m edges in time O (m log m(m + n log n)).
We begin our discussion of an optimality criterion with a definition. Given a digraph
G = (V, A) with capacities c, weights w, and a flow f in G, construct the graph R = (V, A+
AR ) with AR = {wv : vw ∈ A}, where r ∈ AR is called a reverse edge. (The notation
“+” here means that we actually allow parallel edges in R). The residual capacities
cR : A + AR → R+ are cR (vw) = c(vw) − f (vw) for vw ∈ A and cR (wv) = f (vw)
for wv ∈ AR. The residual weight wR : A + AR → R is wR(vw) = w(vw) for vw ∈ A and
wR(wv) = −w(vw) for wv ∈ AR. Finally define the residual graph Gf = (V, Af) with
Af = {e ∈ A + AR : cR (e) > 0}.
Now, given a digraph G with capacities c and a b-flow f , an f -augmenting cycle is
a simple cycle in Gf . The following theorem is an optimality criterion for the Minimum
Cost Flow problem.
Theorem 2.10. Let N = (G, c, w, b) be an instance of the Minimum Cost Flow problem.
A b-flow f is of minimum cost if and only if there is no f -augmenting cycle with negative
total weight.
We prove the theorem in two steps. First we show that the difference between any two
b-flows gives rise to a circulation and second that this circulation can be decomposed into
circulations on simple cycles.
Lemma 2.11. Let G be a digraph with capacities c and let f and f′ be b-flows in (G, c).
Construct R and Gf as above and define g : A + AR → R+ by g(e) = max{0, f′(e) − f(e)}
for e ∈ A and g(e) = max{0, f(e′) − f′(e′)} for all e ∈ AR with corresponding e′ ∈ A.
Then g is a circulation in R, g(e) = 0 for all e ∉ Af, and val(g) = val(f′) − val(f).
Proof. At each vertex v ∈ R we have

Σ_{e∈δ_R^+(v)} g(e) − Σ_{e∈δ_R^−(v)} g(e) = Σ_{e∈δ_G^+(v)} (f′(e) − f(e)) − Σ_{e∈δ_G^−(v)} (f′(e) − f(e))
  = ( Σ_{e∈δ_G^+(v)} f′(e) − Σ_{e∈δ_G^−(v)} f′(e) ) − ( Σ_{e∈δ_G^+(v)} f(e) − Σ_{e∈δ_G^−(v)} f(e) )
  = b(v) − b(v) = 0,

so g is a circulation in R.
For e ∉ Af consider two cases: If e ∈ A, then f(e) = c(e) and hence f′(e) ≤ f(e), which
gives g(e) = 0. If e = wv ∈ AR, then e′ = vw ∈ A and f(e′) = 0, which yields g(e) = 0.
We verify the last statement:

val(g) = Σ_{e∈A+AR} wR(e) g(e) = Σ_{e∈A} w(e) f′(e) − Σ_{e∈A} w(e) f(e) = val(f′) − val(f).
Proof of Theorem 2.10. If there is an f -augmenting cycle C with weight γ < 0, we can
augment f along C by some α > 0 and get a b-flow f′ with cost decreased by −γα. So f
is not a minimum cost flow.
If f is not a minimum cost b-flow, there is another b-flow f′ with smaller cost. Consider
g as defined in Lemma 2.11 and observe that g is a circulation with val(g) < 0. By
Lemma 2.12, g can be decomposed into flows on simple cycles. Since g(e) = 0 for all
e ∉ Af, all these cycles are f -augmenting and one of them must have negative total
weight.
Problem 2.3 Assignment
Instance. Bipartite graph G = (L ∪ R, E) and a weight function w : E → R.
Task. Find a perfect matching M with minimum weight val(M) = Σ_{e∈M} w(e) or con-
clude that no such matching exists.
Proof. Let G = (V, E) be a bipartite graph with V = L ∪ R and |L| = |R| = n. Now
we construct a network N for the Minimum Cost Flow problem. We start with the
vertices V, add a vertex s and connect it with every vertex ℓ ∈ L by a directed edge sℓ.
Further add a vertex t and introduce the directed edge rt for every r ∈ R. Further add
directed versions of all edges e ∈ E, i.e., a directed edge ℓr is added for every undirected
edge ℓr. The capacities of all these edges are one. The weights of the sℓ edges and the rt
edges are zero – the weights of the ℓr edges are equal to their weights in G. Now every
integral b-flow f in N with b = (b(s), b(v1), . . . , b(vn), b(t)) = (n, 0, . . . , 0, −n) corresponds
to a perfect matching in G with the same weight, and vice versa.
Internet Dating
An internet dating website has ` females and r males in its pool. Furthermore, there is
a preference system, where each person describes her/himself and her/his ideal partner.
This system produces for each female i and each male j a value wij > 0. We seek to find
an assignment of females to males with maximum total value. By adding dummy vertices
with zero-weight edges to the appropriate side, and defining weights −wij we arrive at an
Assignment problem as defined above.
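For small n the Assignment problem can also be solved by checking all n! perfect matchings, which is a useful reference point for testing; the 3 × 3 weight matrix below is invented for illustration:

```python
from itertools import permutations

def min_weight_assignment(w):
    # w[i][j] = weight of matching left vertex i with right vertex j.
    n = len(w)
    best, best_pi = float('inf'), None
    for pi in permutations(range(n)):          # pi[i] = partner of left vertex i
        weight = sum(w[i][pi[i]] for i in range(n))
        if weight < best:
            best, best_pi = weight, pi
    return best, best_pi

best, pi = min_weight_assignment([[4, 1, 3],
                                  [2, 0, 5],
                                  [3, 2, 2]])
# optimum: match 0-1, 1-0, 2-2 with total weight 1 + 2 + 2 = 5
```

The min-cost-flow formulation of the proof above replaces this n!-time enumeration by a polynomial-time computation.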
a permutation π of 1, 2, . . . , n, where π(j) gives the position of job j, and observe that we
can write

Σ_{j=1}^{n} c_j = Σ_{j=1}^{n} (n − π(j) + 1) · p_{1j},

because the contribution p_{1j} of job j in position π(j) is counted n − π(j) + 1 many times
in Σ_j c_j.
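The counting identity above is easy to verify numerically; the processing times and the permutation below are invented data:

```python
# Processing times p_{1j} for jobs j = 1..n on one machine (invented data).
p = [3, 1, 2]
n = len(p)
# pi[j] = position of job j in the schedule (1-based); schedule order: job 2, job 3, job 1.
pi = [3, 1, 2]

# Completion time of job j = total processing time of jobs scheduled no later than j.
c = [sum(p[k] for k in range(n) if pi[k] <= pi[j]) for j in range(n)]

lhs = sum(c)                                        # Σ_j c_j
rhs = sum((n - pi[j] + 1) * p[j] for j in range(n))  # Σ_j (n − π(j) + 1) p_j
assert lhs == rhs  # both sides count p_j exactly n − π(j) + 1 times
```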
For multiple machines, the crucial observation is that the contribution of any job j
can be described by pij times one of the multipliers n, n − 1, . . . , 1. Hence we create
the following graph: A source s, a sink t, for each job a vertex vj for j = 1, . . . , n, and
for each machine i exactly n slots, i.e., vertices sik for k = 1, . . . , n. We add edges svj
with zero weight and unit capacity. Furthermore, we add the edges vj sik with weight
(n − k + 1) · pij . Finally we add edges sik t with zero weight and unit capacity. Any b-flow
with b = (b(s), b(v1 ), . . . , b(vn ), b(s11 ), . . . , b(smn ), b(t)) = (n, 0, . . . , 0, −n) with minimum
cost corresponds to an optimal job-machine-assignment and vice versa.
Chapter 3
Matroids
(1) ∅ ∈ S and
(2) If X ⊆ Y ∈ S, then X ∈ S.
Maximal independent sets (with respect to inclusion) are called bases. In particular, for
X ⊆ U, any maximal independent subset of X is called a basis of X. Furthermore, there
is a cost function c : 2^U → R, which is modular, i.e., for any X ⊆ U we have

c(X) = Σ_{x∈X} c(x).
Problem 3.2 Minimum Independence System Basis
Instance. Independence system (U, S), cost function c : U → R.
Task. Find a forest F ⊆ E, i.e., a cycle-free subgraph, having maximal cost c(F) =
Σ_{e∈F} c(e).
All the above problems can be solved in polynomial time, but there are also many NP-
hard combinatorial optimization problems that can be formulated in terms of independence
systems. We will see that Maximum Cost Forest and Minimum Spanning Tree can
be expressed as a special independence system, called a matroid, while Maximum Cost
Matching (for bipartite graphs) can be solved in terms of the intersection of two matroids.
An independence system (U, S) is called a matroid if for all X, Y ∈ S with |X| > |Y|
there is an x ∈ X − Y such that Y ∪ {x} ∈ S.
Observation 3.1. Let (U, S) be an independence system. Then the following statements
are equivalent:
Proof. By definition (1) and (2) are equivalent. Statements (2) and (3) are equivalent
and (2) implies (4). To prove that (4) implies (2) let X, Y ∈ S and |X| > |Y |. By (4) Y
can not be a basis of X ∪ Y . So there must be an x ∈ (X ∪ Y ) − Y = X − Y such that
Y ∪ {x} ∈ S.
Problem 3.5 Maximum Cost Matching
Instance. Undirected graph G = (V, E), cost function c : E → R.
q(U, S) = min_{X⊆U} lrank(X) / rank(X).
Output. Set X ∈ S
Step 1. X = ∅.
Step 4. Return X.
The following theorem states that the approximation ratio of Greedy is at least the
rank quotient (and of course at most one). Notice that Greedy is a polynomial time
algorithm if the independence oracle runs in polynomial time (which is often the case in
applications).
I = (U, S, c). Let opt(I) denote the optimum cost for Maximum Independence System
Member and greedy(I) the cost of the solution returned by Greedy. Then we have

q(U, S) ≤ greedy(I)/opt(I) ≤ 1

for all c : U → R+. There is a cost function where the lower bound is attained.
Proof. Let U = {u1 , . . . , un } be ordered such that c(u1 ) ≥ · · · ≥ c(un ). Let Gn be the
solution found by Greedy while On is an optimum solution. Let Uj = {u1 , . . . , uj },
Gj = Gn ∩ Uj , and Oj = On ∩ Uj for j = 0, . . . , n. Set dn = c(un ) and dj = c(uj ) − c(uj+1 )
for j = 1, . . . , n − 1.
Since Oj ∈ S we have |Oj | ≤ rank(Uj ). Since Gj is a basis of Uj we have |Gj | ≥
lrank(Uj ). We conclude that
c(G_n) = Σ_{j=1}^{n} (|G_j| − |G_{j−1}|) c(u_j)
       = Σ_{j=1}^{n} |G_j| d_j
       ≥ Σ_{j=1}^{n} lrank(U_j) d_j
       ≥ q(U, S) Σ_{j=1}^{n} rank(U_j) d_j
       ≥ q(U, S) Σ_{j=1}^{n} |O_j| d_j
       = q(U, S) Σ_{j=1}^{n} (|O_j| − |O_{j−1}|) c(u_j)
       = q(U, S) c(O_n).
To show that the lower bound is sharp choose V ⊆ U and bases B1 , B2 of V such that
|B1 |/|B2 | = q(U, S). Define c(v) = 1 if v ∈ V and c(v) = 0 if v ∈ U − V . Sort u1 , . . . , un
such that c(u1 ) ≥ · · · ≥ c(un ) and B1 = {u1 , . . . , u|B1 | }. Then c(Gn ) = |B1 | and c(On ) =
|B2 |. Thus the lower bound is attained.
Specifically, if (U, S) is a matroid, then Greedy always determines an optimum solu-
tion (and vice versa).
Theorem 3.4. An independence system (U, S) is a matroid if and only if Greedy finds
an optimum solution for Maximum Independence System Member for all cost func-
tions c : U → R+ .
Proof. By Theorem 3.3, we have q(U, S) < 1 if and only if there is a cost function c :
U → R+ , for which the Greedy algorithm does not find an optimum solution. By
Observation 3.2 we have that q(U, S) < 1 if and only if (U, S) is not a matroid.
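For the graphic matroid, whose independent sets are exactly the forests of a graph, Greedy specializes to a Kruskal-style procedure: scan edges in order of decreasing cost and keep an edge if and only if it joins two different components. By Theorem 3.4 this returns a maximum cost forest; the union-find helper is our own implementation choice:

```python
def greedy_max_forest(n, edges):
    # edges: list of (cost, u, v); independence = cycle-freeness,
    # tested with union-find on the n vertices.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    forest, total = [], 0
    for cost, u, v in sorted(edges, reverse=True):   # decreasing cost
        ru, rv = find(u), find(v)
        if ru != rv:                                  # edge keeps X independent
            parent[ru] = rv
            forest.append((u, v))
            total += cost
    return total, forest

total, forest = greedy_max_forest(3, [(3, 0, 1), (2, 1, 2), (1, 0, 2)])
# On the triangle, Greedy picks the edges of cost 3 and 2 and skips the
# cycle-closing edge of cost 1.
```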
Proposition 3.5. Any independence system is the intersection of a finite number of ma-
troids.
Thus, the intersection of matroids is not a matroid in general. Hence we can not
expect that a Greedy algorithm finds an optimum common independent set. However,
the following result implies a lower bound on the approximation ratio of Greedy.
As we have seen in Theorem 3.3, the lower bound is sharp, so Greedy can be
arbitrarily bad when used for optimizing over arbitrary independence systems, i.e., over
the intersection of p matroids. However, for optimizing over the intersection of two matroids
there is a polynomial time algorithm, called Edmonds, provided that the independence
oracle is polynomial (see below). The important property of the algorithm is that it
generalizes the concept of an augmenting path. The intersection problem becomes NP-
hard for more than two matroids.
An important special case is that Maximum Cost Matching in bipartite graphs can
be formulated as matroid intersection of two matroids. Specifically let G = (L ∪ R, E) be
bipartite and let S = {M ⊆ E : M is a matching in G}. Then the independence system
(E, S) is the intersection of the two matroids (E, S_L) and (E, S_R) with

S_L = {M ⊆ E : |δ_M(v)| ≤ 1 for all v ∈ L},
S_R = {M ⊆ E : |δ_M(v)| ≤ 1 for all v ∈ R},

where δ_M(v) = {e ∈ M : v ∈ e}. It is an exercise to show that (E, S_L) and (E, S_R) are
actually matroids.
The remainder of this section is devoted to the development of Edmonds. We start
our discussion with basic facts on the submodularity of the rank function of a matroid.
The proof is left as an exercise.
Theorem 3.7. Let U be a finite set and r : 2^U → N. Then r is the rank function of a
matroid (U, S) if and only if the following conditions hold for all X ⊆ U and x, y ∈ U:
(a) r(∅) = 0;
(b) r(X) ≤ r(X ∪ {y}) ≤ r(X) + 1;
(c) if r(X ∪ {x}) = r(X ∪ {y}) = r(X), then r(X ∪ {x, y}) = r(X).
For any independence system (U, S) the sets 2U − S are called dependent. A minimal
dependent set is called a circuit.
Lemma 3.8. Let U be a finite set and C ⊆ 2^U. Then C is the set of circuits of an independence
system (U, S), where S = {X ⊆ U : there is no Y ∈ C with Y ⊆ X}, if and only if the
following conditions hold:
(1) ∅ ∉ C;
(2) X, Y ∈ C and X ⊆ Y imply X = Y.
Proof. By definition, the family of circuits of any independence system satisfies (1) and (2).
If C satisfies (1), then (U, S) is an independence system. If C also satisfies (2), it is the
set of circuits of this independence system.
Theorem 3.9. Let (U, S) be an independence system. If for any X ∈ S and x ∈ U the
set X ∪ {x} contains at most p circuits, then q(U, S) ≥ 1/p.
Proof. Let V ⊆ U and let X and Y be two bases of V . We show |X|/|Y | ≥ 1/p.
Let X − Y = {u1, . . . , ut}. We construct a sequence Y = Y0, . . . , Yt of independent
subsets of X ∪ Y such that X ∩ Y ⊆ Yi, Yi ∩ {u1, . . . , ut} = {u1, . . . , ui}, and
|Yi−1 − Yi| ≤ p for i = 1, . . . , t.
Since Yi ∪ {ui+1} contains at most p circuits and each such circuit must meet Yi − X
(because X is independent), there is a Z ⊆ Yi − X such that |Z| ≤ p and (Yi − Z) ∪ {ui+1} ∈
S. We set Yi+1 = (Yi − Z) ∪ {ui+1}.
Now X ⊆ Yt ∈ S. Since X is a basis of V, X = Yt. We conclude that

|Y − X| = Σ_{i=1}^{t} |Yi−1 − Yi| ≤ pt = p|X − Y|,

proving |Y| ≤ p|X|.
Theorem 3.10. If C is the set of circuits of an independence system (U, S), then the
following statements are equivalent:

(1) (U, S) is a matroid;
(2) for any X ∈ S and x ∈ U, the set X ∪ {x} contains at most one circuit;
(3) for any two distinct X, Y ∈ C and x ∈ X ∩ Y, we have (X ∪ Y) − {x} ∉ S;
(4) for any X, Y ∈ C, x ∈ X ∩ Y and y ∈ X − Y, there is a circuit Z with
y ∈ Z ⊆ (X ∪ Y) − {x}.

Proof. Firstly (1) implies (4): Let C be the family of circuits of a matroid, and let
X, Y ∈ C, x ∈ X ∩ Y, and y ∈ X − Y. By Theorem 3.7 we find
So r((X ∪ Y ) − {x, y}) = r(X ∪ Y ). Let B be a basis of (X ∪ Y ) − {x, y}. Then B ∪ {y}
contains a circuit Z with y ∈ Z ⊆ (X ∪ Y ) − {x} as required.
Secondly, (4) trivially implies (3). Thirdly, (3) implies (2): If X ∈ S and X ∪ {x}
contains two distinct circuits Y and Z, then (3) yields (Y ∪ Z) − {x} ∉ S. However,
(Y ∪ Z) − {x} is a subset of X, a contradiction. Finally, (2) implies (1) by Theorem 3.9
and Observation 3.2.
The idea behind the algorithm Edmonds is the following: Starting with X = ∅, we
augment X by one element in each iteration. Since in general we cannot hope for an
element x such that x ∈ S1 ∩ S2 , we shall look for “alternating paths”. To make this
convenient, we define an auxiliary graph. We apply the notion C(X, x) to (U, Si ) and
write Ci (X, x) for i ∈ {1, 2}.
Given a set X ∈ S1 ∩ S2 we define a directed auxiliary graph GX by

A_X^(1) = {(x, y) : y ∈ U − X, x ∈ C1(X, y) − {y}},
A_X^(2) = {(y, x) : y ∈ U − X, x ∈ C2(X, y) − {y}},
GX = (U, A_X^(1) ∪ A_X^(2)).
We set

SX = {y ∈ U − X : X ∪ {y} ∈ S1},
TX = {y ∈ U − X : X ∪ {y} ∈ S2},

and look for a shortest path from SX to TX. Such a path will enable us to augment the
set X. (If SX ∩ TX 6= ∅, we have a path of length zero and we can augment X by any
element in SX ∩ TX .)
X 0 = (X ∪ {y0 , . . . , ys }) − {x1 , . . . , xs } ∈ S1 ∩ S2 .
Algorithm 3.2 Edmonds
Input. Matroids (U, S1 ) and (U, S2 ) by independence oracles.
Step 1. X = ∅.
Step 5. Augment X by P :
X = (X ∪ {y0 , . . . , ys }) − {x1 , . . . , xs },
We shall now prove that if there is no SX −TX -path in GX , then X is already maximum.
We need the following fact:
Proposition 3.13. Let (U, S1 ) and (U, S2 ) be two matroids with rank functions r1 and r2 .
Then for any X ∈ S1 ∩ S2 and Y ⊆ U we have
|X| ≤ r1 (Y ) + r2 (U − Y ).
Next we prove that r1(U − R) = |X − R|. If not, there would be a y ∈ (U − R) − X
with (X − R) ∪ {y} ∈ S1. Since X ∪ {y} ∉ S1 (because y ∉ SX), the circuit C1(X, y) must
contain an element x ∈ X ∩ R. But then (x, y) ∈ A_X^(1) means that there is an edge leaving
R. This contradicts the definition of R.
Altogether we have |X| = r2(R) + r1(U − R). By Proposition 3.13, this implies optimality.
Theorem 3.15. Let (U, S1 ) and (U, S2 ) be two matroids with rank functions r1 and r2 .
Then
max{|X| : X ∈ S1 ∩ S2 } = min{r1 (Y ) + r2 (U − Y ) : Y ⊆ U }.
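On tiny instances the min-max formula can be verified by exhaustive enumeration. The sketch below uses the two partition matroids from the bipartite-matching formulation earlier in this section; the graph and all names are invented for illustration, and the enumeration is exponential, meant only as a sanity check:

```python
from itertools import combinations

# Edges of a small bipartite graph, as (left vertex, right vertex) pairs.
E = [(0, 0), (0, 1), (1, 0)]

def r1(Y):  # rank in (E, S_L): number of distinct left endpoints
    return len({l for (l, r) in Y})

def r2(Y):  # rank in (E, S_R): number of distinct right endpoints
    return len({r for (l, r) in Y})

def independent(X):  # X ∈ S_1 ∩ S_2 iff X is a matching
    return r1(X) == len(X) and r2(X) == len(X)

subsets = [set(S) for k in range(len(E) + 1) for S in combinations(E, k)]
max_common = max(len(X) for X in subsets if independent(X))
min_rank = min(r1(Y) + r2(set(E) - Y) for Y in subsets)
assert max_common == min_rank  # the min-max equality of Theorem 3.15
```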
Theorem 3.16. Edmonds correctly solves the Matroid Intersection problem in time
O(|U|³ θ), where θ is the maximum complexity of the two independence oracles.
Proof. Correctness follows from Lemmas 3.12 and 3.14. Steps 2 and 3 can be done in
O(|U|² θ) time and Step 4 in O(|U|) time. Since there are at most |U| augmentations, the
result follows.
Chapter 4
Linear Programming
Linear programs (LP) play an important role in the theory and practice of optimization
problems. Many COPs can directly be formulated as LPs. Furthermore, LPs are invaluable
for the design and analysis of approximation algorithms. Generally speaking, LPs are
COPs with a linear objective function and linear constraints, where the variables are defined
on a continuous domain. We will be more specific below.
4.1 Introduction
We begin our treatment of linear programming with an example of a transportation prob-
lem to illustrate how LPs can be used to formulate optimization problems.
Example 4.1. There are two brickworks w1 , w2 and three construction sites s1 , s2 , s3 .
The works produce b1 = 60 and b2 = 30 tons of bricks per day. The sites require c1 = 30,
c2 = 20 and c3 = 40 tons of bricks per day. The transportation costs tij per ton from work
wi to site sj are given in the following table:
tij s1 s2 s3
w1 40 75 50
w2 20 50 40
Which work delivers which site in order to minimize the total transportation cost? Let us
write the problem as a mathematical program. We use variables xij that tell us how much
we deliver from work wi to site sj .
minimize 40x11 + 75x12 + 50x13 + 20x21 + 50x22 + 40x23
subject to x11 + x12 + x13 ≤ 60
           x21 + x22 + x23 ≤ 30
           x11 + x21 = 30
           x12 + x22 = 20
           x13 + x23 = 40
           xij ≥ 0,  i = 1, 2, j = 1, 2, 3.
How do we find the best xij ?
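One standard answer is to hand the LP to an off-the-shelf solver. The sketch below uses SciPy's linprog (assuming SciPy is available in the environment); the variable order is x11, x12, x13, x21, x22, x23:

```python
from scipy.optimize import linprog

# Objective: transportation cost per ton, in variable order x11..x23.
c = [40, 75, 50, 20, 50, 40]
# Supply constraints (<=): each work ships at most its daily production.
A_ub = [[1, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 1]]
b_ub = [60, 30]
# Demand constraints (=): each site receives exactly its requirement.
A_eq = [[1, 0, 0, 1, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 1, 0, 0, 1]]
b_eq = [30, 20, 40]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 6)
# res.fun is the minimum total transportation cost (4000 for this data:
# w2 serves s2 fully and 10 tons of s1, w1 covers the rest).
```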
The general Linear Programming task is given in Problem 4.1.
As a shorthand we shall frequently write max{cᵀx : Ax ≤ b}. We can assume that
we deal with a maximization problem without loss of generality, because we can treat a
minimization problem by replacing c with −c.
Problem 4.1 Linear Programming
Instance. Matrix A ∈ Rm×n , vectors b ∈ Rm and c ∈ Rn .
maximize cᵀx,
subject to Ax ≤ b,
x ∈ Rn.
Any x ∈ Rn that satisfies Ax ≤ b is called feasible. The function val(x) = cᵀx is the
objective function. A feasible x∗ which maximizes val is an optimum solution and the
value z∗ = val(x∗) is called the optimum value. The set P = {x ∈ Rn : Ax ≤ b} is called
the feasible region, i.e., the set of feasible solutions. If P is empty, then the problem is
infeasible. If for every α ∈ R there is a feasible x such that cᵀx > α, then the problem is
unbounded. This simply means that the maximum of the objective function does not exist.
4.2 Polyhedra
Consider the vector space Rn. A (linear) subspace S of Rn is a subset of Rn closed under
vector addition and scalar multiplication. Equivalently, S is the set of all points in Rn
that satisfy a set of homogeneous linear equations:

S = {x ∈ Rn : Ax = 0},

for some matrix A ∈ Rm×n. The dimension dim(S) is equal to the maximum number of
linearly independent vectors in S, i.e., dim(S) = n − rank(A). Here rank(A) denotes the
number of linearly independent rows of A. An affine subspace Sb of Rn is the set of all points
that satisfy a set of inhomogeneous linear equations:
Sb = {x ∈ Rn : Ax = b}.
We have dim(Sb ) = dim(S). The dimension dim(X) of any subset X ⊆ Rn is the smallest
dimension of any affine subspace which contains it.
An affine subspace of Rⁿ of dimension n − 1 is called a hyperplane, i.e., alternatively

H = {x ∈ Rⁿ : aᵀx = b},

for some vector a ∈ Rⁿ, a ≠ 0, and scalar b. A hyperplane defines two (closed) halfspaces

H⁺ = {x ∈ Rⁿ : aᵀx ≥ b},
H⁻ = {x ∈ Rⁿ : aᵀx ≤ b}.
As a halfspace is a convex set, the intersection of halfspaces is also convex.
A polyhedron in Rn is a set
P = {x ∈ Rn : Ax ≤ b}
for some matrix A ∈ Rm×n and some vector b ∈ Rm . A bounded polyhedron is called
polytope.
Let P = {x : Ax ≤ b} be a non-empty polyhedron with dimension d. Let c be a vector for which δ := max{cᵀx : x ∈ P} < ∞. Then the hyperplane

Hc = {x : cᵀx = δ}

is called a supporting hyperplane of P. A subset F of P is called a face of P if F = P or F = P ∩ H for some supporting hyperplane H of P.
[Figure 4.1: A three-dimensional polyhedron with supporting hyperplanes H1, H2, H3; the picture shows a facet, an edge, and a vertex.]
The following statements are equivalent:

(1) F is a face of P.

(2) There is a vector c with δ := max{cᵀx : x ∈ P} < ∞ and F = {x ∈ P : cᵀx = δ}.

(3) F = {x ∈ P : A′x = b′} ≠ ∅ for some subsystem A′x ≤ b′ of Ax ≤ b.

An important class of faces are the minimal faces, i.e., faces that do not contain any other face. For these we have:
Corollary 4.3 and Lemma 4.5 already imply that Linear Programming can be solved by solving the linear equation system A′x = b′ for each subsystem A′x ≤ b′ of Ax ≤ b. This approach obviously yields an exponential-time algorithm. An algorithm which is more practical (although also exponential in the worst case) is the Simplex algorithm. The algorithm is based on the following important consequence of Lemma 4.5.
Thus, it suffices to search for an optimum solution among the vertices of the polyhedron. This is what the Simplex algorithm does. We do not explain the algorithm in detail here, but it works as follows. Provided that the polyhedron is not empty, it finds an initial vertex. If the current vertex is not optimal, it finds another vertex with strictly larger objective value (pivot rule). This iterates until an optimal vertex is found or the problem can be shown to be unbounded. See Figure 4.2.
The algorithm terminates after at most (m choose n) iterations (which is not polynomial). It was conjectured that Simplex is polynomial until Klee and Minty gave an example where the algorithm (with Bland's pivot rule) uses 2ⁿ iterations on an LP with n variables and 2n constraints. It is not known if there is a pivot rule that leads to polynomial running time. Nonetheless, Simplex with Bland's pivot rule is frequently observed to terminate after few iterations when run on "practical instances".
However, there are algorithms, e.g., the Ellipsoid method and Karmarkar's algorithm, that solve Linear Programming in polynomial time. But these algorithms are mainly of interest from a theoretical point of view. We conclude with the statement that one can solve Linear Programming in polynomial time with "black box" algorithms.
[Figure 4.2: The Simplex algorithm moves from vertex to vertex of the feasible region until an optimal vertex is found.]
4.3 Duality
Intuition behind Duality
Consider the following LP, which is illustrated in Figure 4.3:
maximize x1 + x2 (4.1)
subject to 4x1 − x2 ≤ 8 (4.2)
2x1 + x2 ≤ 10 (4.3)
− 5x1 + 2x2 ≤ 2 (4.4)
− x1 ≤ 0 (4.5)
− x2 ≤ 0 (4.6)
In matrix notation, this LP reads max{cᵀx : Ax ≤ b}.
Because we are dealing with a maximization problem, every feasible solution x provides the lower bound cᵀx on the value cᵀx∗ of the optimum solution x∗, i.e., we know cᵀx ≤ cᵀx∗.
Can we also obtain upper bounds on cᵀx∗? For any feasible solution x, the constraints (4.2)–(4.6) are satisfied. Now compare the objective function (4.1) with the constraint (4.3) coefficient by coefficient (where we remember that x1, x2 ≥ 0 in this example):

1·x1 + 1·x2 ≤ 2·x1 + 1·x2 ≤ 10.
[Figure 4.3: The feasible region defined by the constraints (4.2)–(4.6), together with the objective function (4.1).]
Thus for every feasible solution x we have the upper bound x1 + x2 ≤ 10, i.e., the optimum value can be at most 10. Can we improve on this? We could try 7/9 · (4.3) + 1/9 · (4.4):

1·x1 + 1·x2 ≤ (7/9 · 2 + 1/9 · (−5))x1 + (7/9 · 1 + 1/9 · 2)x2 ≤ 7/9 · 10 + 1/9 · 2 = 72/9 = 8.

Hence we have x1 + x2 ≤ 8 for every feasible x and thus an upper bound of 8 on the optimum value. If we look closely, our choices 7/9 and 1/9 give 7/9 · 2 + 1/9 · (−5) = 1 and 7/9 · 1 + 1/9 · 2 = 1, i.e., we have combined the coefficients of the objective function cᵀx with equality. This is also the best bound this approach can give here.
This suggests the following general approach for obtaining upper bounds on the optimal value. Combine the constraints with non-negative multipliers y = (y1, y2, y3, y4, y5) such that each coefficient in the result equals the corresponding coefficient in the objective function, i.e., we want yᵀA = cᵀ. We associate y1 with (4.2), y2 with (4.3), y3 with (4.4), y4 with (4.5), and y5 with (4.6). Notice that the yi must be non-negative because we are multiplying inequalities of the system Ax ≤ b, i.e., if a multiplier yi were negative it would change the corresponding inequality from "≤" to "≥". Now y1(4.2) + y2(4.3) + y3(4.4) + y4(4.5) + y5(4.6) evaluates to

(4y1 + 2y2 − 5y3 − y4)x1 + (−y1 + y2 + 2y3 − y5)x2 ≤ 8y1 + 10y2 + 2y3 + 0y4 + 0y5
and we want to find values y1, y2, y3, y4, y5 ≥ 0 that satisfy

1·x1 + 1·x2 = (4y1 + 2y2 − 5y3 − y4)x1 + (−y1 + y2 + 2y3 − y5)x2 ≤ 8y1 + 10y2 + 2y3 + 0y4 + 0y5.

Of course, we are interested in the best choice of y = (y1, y2, y3, y4, y5) ≥ 0 the approach can give. This means that we want to minimize the upper bound 8y1 + 10y2 + 2y3 + 0y4 + 0y5. We simply write down this task as a mathematical program, which turns out to be an LP:

minimize 8y1 + 10y2 + 2y3 + 0y4 + 0y5
subject to 4y1 + 2y2 − 5y3 − y4 = 1
−y1 + y2 + 2y3 − y5 = 1
y1, . . . , y5 ≥ 0.

Further note that the new objective function is the right-hand side (8, 10, 2, 0, 0)ᵀ of the original LP and that the new right-hand side is the objective function (1, 1)ᵀ of the original LP. Thus the above LP is of the form min{yᵀb : yᵀA = cᵀ, y ≥ 0}, which is called the dual of the original LP.
Notice that there is a feasible solution x = (2, 6)ᵀ for the original LP that gives cᵀx = 8. Further note that the multipliers y = (0, 7/9, 1/9, 0, 0)ᵀ yield yᵀb = 8, i.e., cᵀx = yᵀb.

Hence we have a certificate that the solution x = (2, 6)ᵀ is indeed optimal (because we have a matching upper bound). Not surprisingly, this is no exception but the principal statement of the strong duality theorem.
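This optimality certificate can be verified mechanically. The following standalone sketch (exact arithmetic via fractions; not part of the text) confirms that x = (2, 6) is feasible and that the multipliers y = (0, 7/9, 1/9, 0, 0) satisfy y ≥ 0, yᵀA = cᵀ, and cᵀx = yᵀb = 8:

```python
from fractions import Fraction as F

# Constraint system (4.2)-(4.6): rows of A and entries of b.
A = [(4, -1), (2, 1), (-5, 2), (-1, 0), (0, -1)]
b = [8, 10, 2, 0, 0]
c = (1, 1)

x = (2, 6)                                   # candidate primal solution
y = [F(0), F(7, 9), F(1, 9), F(0), F(0)]     # candidate dual multipliers

# Primal feasibility: Ax <= b.
assert all(a1 * x[0] + a2 * x[1] <= bi for (a1, a2), bi in zip(A, b))
# Dual feasibility: y >= 0 and y^T A = c^T.
assert all(yi >= 0 for yi in y)
assert tuple(sum(yi * ai[j] for yi, ai in zip(y, A)) for j in range(2)) == c
# Matching objective values certify optimality (weak duality).
assert c[0] * x[0] + c[1] * x[1] == sum(yi * bi for yi, bi in zip(y, b)) == 8
print("x = (2, 6) is optimal with value 8")
```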
Lemma 4.7. The dual of the dual of an LP is (equivalent to) the original LP.
Now we can say that the LPs P and D are dual to each other, or a primal-dual pair. The following forms of primal-dual pairs are standard:

P = max{cᵀx : Ax ≤ b}  and  D = min{yᵀb : yᵀA = cᵀ, y ≥ 0},
P = max{cᵀx : Ax ≤ b, x ≥ 0}  and  D = min{yᵀb : yᵀA ≥ cᵀ, y ≥ 0}.
Lemma 4.8 (Weak Duality). Let x and y be respective feasible solutions of the primal-dual pair P = max{cᵀx : Ax ≤ b} and D = min{yᵀb : yᵀA = cᵀ, y ≥ 0}. Then cᵀx ≤ yᵀb.

Proof. cᵀx = (yᵀA)x = yᵀ(Ax) ≤ yᵀb.
The following strong duality theorem is the most important result in LP theory and
the basis for a lot of algorithms for COPs.
Theorem 4.9 (Strong Duality). For any primal-dual pair P = max{cᵀx : Ax ≤ b} and D = min{yᵀb : yᵀA = cᵀ, y ≥ 0} we have:

(1) If P and D have respective optimum solutions x and y, say, then cᵀx = yᵀb.

(2) If P is unbounded, then D is infeasible.

(3) If P is infeasible, then D is infeasible or unbounded.
Before we prove the theorem, we establish the fundamental theorem of linear inequali-
ties. The heart of the proof actually gives a basic version of the Simplex algorithm. The
result also implies Farkas’ Lemma.
Theorem 4.10. Let a1, . . . , am and b be vectors in n-dimensional space. Then

either (I): b = λ1 a1 + · · · + λm am with λi ≥ 0 for i = 1, . . . , m,

or (II): there is a vector c with cᵀb < 0 and cᵀai ≥ 0 for i = 1, . . . , m.
Proof. We may assume that a1, . . . , am span the n-dimensional space. Clearly, (I) and (II) exclude each other, as we would otherwise have the contradiction

0 > cᵀb = λ1 cᵀa1 + · · · + λm cᵀam ≥ 0.

To see that at least one of (I) and (II) holds, choose linearly independent ai1, . . . , ain from a1, . . . , am and set B = {ai1, . . . , ain}. Next apply the following iteration:
(i) Write b = λi1 ai1 + · · · + λin ain. If λi1, . . . , λin ≥ 0 we are in case (I).

(ii) Otherwise, choose the smallest h among i1, . . . , in with λh < 0. Let {x : cᵀx = 0} be the hyperplane spanned by B − {ah}. We normalize c so that cᵀah = 1. (Hence cᵀb = λh < 0.)

(iii) If cᵀa1, . . . , cᵀam ≥ 0 we are in case (II).

(iv) Otherwise, choose the smallest s such that cᵀas < 0. Then replace B by (B − {ah}) ∪ {as}. Restart the iteration anew.
We are finished if we have shown that this process terminates. Let Bk denote the set
B as it is in the k-th iteration. If the process does not terminate, then Bk = B` for some
k < ` (as there are only finitely many choices for B). Let r be the highest index for which
ar has been removed from B at the end of one of the iterations k, k + 1, . . . , ` − 1, say in
iteration p. As Bk = B` , we know that ar also has been added to B in some iteration q
with k ≤ q ≤ `. So
Bp ∩ {ar+1 , . . . , am } = Bq ∩ {ar+1 , . . . , am }.
Let Bp = {ai1, . . . , ain}, b = λi1 ai1 + · · · + λin ain, and let d be the vector c found in iteration q. Then we have the contradiction

0 > dᵀb = dᵀ(λi1 ai1 + · · · + λin ain) = λi1 dᵀai1 + · · · + λin dᵀain > 0,

where the second inequality follows from: if ij < r then λij ≥ 0 and dᵀaij ≥ 0; if ij = r then λij < 0 and dᵀaij < 0; and if ij > r then dᵀaij = 0.
Lemma 4.11 (Farkas' Lemma). The system Ax = b has a solution x ≥ 0 if and only if yᵀb ≥ 0 for each y with yᵀA ≥ 0. Analogously, Ax ≤ b has a solution x if and only if yᵀb ≥ 0 for each y ≥ 0 with yᵀA = 0, and Ax ≤ b has a solution x ≥ 0 if and only if yᵀb ≥ 0 for each y ≥ 0 with yᵀA ≥ 0.

Proof. We first show the case: x ≥ 0 with Ax = b exists if and only if yᵀb ≥ 0 for each y with yᵀA ≥ 0. Necessity is clear since yᵀb = yᵀ(Ax) ≥ 0 for all x and y with x ≥ 0, yᵀA ≥ 0, and Ax = b. For sufficiency, assume that there is no x ≥ 0 with Ax = b. Then, by Theorem 4.10 with a1, . . . , an denoting the columns of A, there is a hyperplane {x : yᵀx = 0} with yᵀb < 0 for some y with yᵀA ≥ 0.

For the case that Ax ≤ b has a solution x if and only if yᵀb ≥ 0 for each y ≥ 0 with yᵀA = 0, consider A′ = [I, A, −A]. Observe that Ax ≤ b has a solution x if and only if A′x′ = b has a solution x′ ≥ 0. Now apply what we have just proved.

For the case that Ax ≤ b has a solution x ≥ 0 if and only if yᵀb ≥ 0 for each y ≥ 0 with yᵀA ≥ 0, consider A′ = [I, A]. Observe that Ax ≤ b has a solution x ≥ 0 if and only if A′x′ = b has a solution x′ ≥ 0. Now apply what we have just proved.
Proof of Theorem 4.9. For (1) both optima exist. Thus, if Ax ≤ b and y ≥ 0, yᵀA = cᵀ, then cᵀx = yᵀAx ≤ yᵀb. Now it suffices to show that there are x, y such that Ax ≤ b, y ≥ 0, yᵀA = cᵀ, and cᵀx ≥ yᵀb, i.e., that there are x and y ≥ 0 satisfying the system of inequalities

Ax ≤ b,    Aᵀy ≤ c,    −Aᵀy ≤ −c,    −cᵀx + bᵀy ≤ 0.
By Lemma 4.11 this is equivalent to: if u, λ, v, w ≥ 0 with uA − λcᵀ = 0 and λbᵀ + vAᵀ − wAᵀ ≥ 0, then ub + vc − wc ≥ 0.

Let u, λ, v, w satisfy this premise. If λ > 0 then ub = λ⁻¹λbᵀuᵀ ≥ λ⁻¹(w − v)Aᵀuᵀ = λ⁻¹λ(w − v)c = (w − v)c. If λ = 0, let Ax0 ≤ b and y0 ≥ 0, y0ᵀA = cᵀ. (Such x0, y0 exist since P and D are not empty.) Then ub ≥ uAx0 = 0 ≥ (w − v)Aᵀy0 = (w − v)c.
The claim (2) directly follows from Lemma 4.8. For (3), if D is infeasible there is nothing to show. Thus let D be feasible. From Lemma 4.11 we get: since Ax ≤ b is infeasible, there is a vector y ≥ 0 with yᵀA = 0 and yᵀb < 0. Let z ≥ 0 be such that zᵀA = cᵀ (D is feasible) and let α > 0. Then αy + z is feasible for D with objective value αyᵀb + zᵀb, which can be made arbitrarily small since yᵀb < 0 and α > 0.
The theorem has a lot of implications but we only list two of them. The first one is
called complementary slackness (and gives another way of proving optimality).
Corollary 4.12. Let max{cᵀx : Ax ≤ b} and min{yᵀb : yᵀA = cᵀ, y ≥ 0} be a primal-dual pair and let x and y be respective feasible solutions. Then the following statements are equivalent:

(1) x and y are both optimum solutions.

(2) cᵀx = yᵀb.

(3) yᵀ(b − Ax) = 0.

Secondly, the fact that a system Ax ≤ b is infeasible can be proved by giving a vector y ≥ 0 with yᵀA = 0 and yᵀb < 0 (Farkas' Lemma).
Part II
Approximation Algorithms
Chapter 5
Knapsack
This chapter is concerned with the Knapsack problem. This problem is of interest in
its own right because it formalizes the natural problem of selecting items so that a given
budget is not exceeded but profit is as large as possible. Questions like that often also
arise as subproblems of other problems. Typical applications include: option-selection in
finance, cutting, and packing problems.
In the Knapsack problem we are given a budget W and n items. Each item j comes along with a profit cj and a weight wj. We are asked to choose a subset of the items so as to maximize the total profit while the total weight does not exceed W.
Example 5.1. We are given an amount of W and we wish to buy a subset of n items
and sell those later on. Each such item j has cost wj but yields profit cj . The goal is to
maximize the total profit. Consider W = 100 and the following profit-weight table:
j cj wj
1 150 100
2 2 1
3 55 50
4 100 50
Our choice of purchased items must not exceed our capital W . Thus the feasible solu-
tions are {1}, {2}, {3}, {4}, {2, 3}, {2, 4}, {3, 4}. Which is the best solution? Evaluating all
possibilities yields that {3, 4} gives 155 altogether which maximizes our profit.
As an integer linear program, Knapsack reads

maximize c1 x1 + · · · + cn xn,
subject to w1 x1 + · · · + wn xn ≤ W,
xj ∈ {0, 1}  j = 1, . . . , n.
For an item j the quantity cj is called its profit. The profit of a vector x ∈ {0, 1}ⁿ is val(x) = Σ_{j=1}^n cj xj. The number wj is called the weight of item j. The weight of a vector x ∈ {0, 1}ⁿ is given by weight(x) = Σ_{j=1}^n wj xj. In order to obtain a non-trivial problem we assume wj ≤ W for all j = 1, . . . , n and Σ_{j=1}^n wj > W throughout.
Knapsack is NP-hard, which means that "most probably" there is no polynomial-time optimization algorithm for it. However, in Section 5.1 we derive a simple 1/2-approximation algorithm. In Section 5.3 we even improve on this by giving a polynomial-time (1 − ε)-approximation algorithm (for every fixed ε > 0).
5.1 Fractional Knapsack and Greedy

Relaxing the integrality constraints yields the Fractional Knapsack problem:

maximize val(x) = Σ_{j=1}^n cj xj,
subject to Σ_{j=1}^n wj xj ≤ W,
0 ≤ xj ≤ 1  j = 1, . . . , n.
This problem is solvable in polynomial time quite easily. The proof of the observation
below is left as an exercise.
Observation 5.2. Let c, w ∈ Nⁿ be non-negative integral vectors with

c1/w1 ≥ c2/w2 ≥ · · · ≥ cn/wn

and let

k = min{ j ∈ {1, . . . , n} : Σ_{i=1}^j wi > W }.

Then an optimum solution for the Fractional Knapsack problem is given by

xj = 1 for j = 1, . . . , k − 1,
xk = (W − Σ_{i=1}^{k−1} wi) / wk, and
xj = 0 for j = k + 1, . . . , n.
The ratio cj /wj is called the efficiency of item j. The item number k, as defined above,
is called the break item.
Now we turn our attention back to the original Knapsack problem. We may assume
that the items are given in non-increasing order of efficiency. Observation 5.2 suggests the
following simple algorithm: xj = 1 for j = 1, . . . , k − 1, xj = 0 for j = k, . . . , n.
Unfortunately, the approximation ratio of this algorithm can be arbitrarily bad as the
example below shows. The problem is that more efficient items can “block” more profitable
ones.
Example 5.3. Consider the following instance, where W is a sufficiently large integer.
j cj wj cj /wj
1 1 1 1
2 W −1 W 1 − 1/W
The algorithm chooses item 1, i.e., the solution x = (1, 0) and hence val(x) = 1. The
optimum solution is x∗ = (0, 1) and thus val(x∗ ) = W − 1. The approximation ratio of
the algorithm is 1/(W − 1), i.e., arbitrarily bad. However, this natural algorithm can be
turned into a 1/2-approximation.
Theorem 5.4. The Greedy algorithm, which returns the better of the solution x consisting of all items before the break item and the solution y consisting of the break item alone, is a 1/2-approximation algorithm for Knapsack.

Proof. The value obtained by the Greedy algorithm is equal to max{val(x), val(y)}.
Let x∗ be an optimum solution for the Knapsack instance. Since every solution
that is feasible for the Knapsack instance is also feasible for the respective Fractional
Knapsack instance we have that
val(x∗ ) ≤ val(z ∗ ),
where z ∗ is the respective optimum solution for Fractional Knapsack. Observe that it
has the structure z ∗ = (1, . . . , 1, α, 0, . . . , 0), where α ∈ [0, 1) is at the break item k. The
solutions x and y are x = (1, . . . , 1, 0, 0, . . . , 0) and y = (0, . . . , 0, 1, 0, . . . , 0).
In total we have

2·max{val(x), val(y)} ≥ val(x) + val(y) ≥ val(z∗) ≥ val(x∗),

where val(x) + val(y) ≥ val(z∗) holds because val(z∗) = val(x) + α·ck ≤ val(x) + ck = val(x) + val(y). Hence the value obtained by Greedy is at least val(x∗)/2.
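A minimal sketch of this Greedy 1/2-approximation (function and variable names are mine, not the text's pseudocode; positive weights are assumed): sort by efficiency, take the prefix before the break item as x, the break item alone as y, and return the better of the two.

```python
def greedy_knapsack(W, items):
    """1/2-approximation for Knapsack.

    items: list of (profit, weight).  Returns (value, chosen indices):
    the better of the prefix before the break item and the break item alone.
    """
    order = sorted(range(len(items)), key=lambda j: -items[j][0] / items[j][1])
    prefix, prefix_val, used = [], 0, 0
    break_item = None
    for j in order:
        c, w = items[j]
        if used + w <= W:
            prefix.append(j)
            prefix_val += c
            used += w
        else:
            break_item = j  # first item that no longer fits
            break
    if break_item is not None and items[break_item][0] > prefix_val:
        return items[break_item][0], [break_item]
    return prefix_val, prefix

# Example 5.1: the better of the prefix {2, 4} (value 102) and the break
# item {1} alone (value 150) is 150, within a factor 1/2 of the optimum 155.
print(greedy_knapsack(100, [(150, 100), (2, 1), (55, 50), (100, 50)]))
```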
5.2 Pseudo-Polynomial Time Algorithm

For a solution x that chooses items only among the first j, we write in shorthand notation x ∈ {0, 1}^j 0^{n−j}. Now the variable mj,k equals the minimum total weight of such a solution x with weight(x) ≤ W and val(x) = k. That is, after defining the set Wj,k = {weight(x) : weight(x) ≤ W, val(x) = k, x ∈ {0, 1}^j 0^{n−j}}, we require

mj,k = inf Wj,k.

(Recall that for any finite set S of integers, inf S = min S if S ≠ ∅ and inf S = ∞ otherwise.)
Let C be any upper bound on the optimum profit, for example C = Σi ci. Clearly, the value of an optimum solution for Knapsack is the largest value k ∈ {0, . . . , C} such that mn,k < ∞. The algorithm Dynamic Programming Knapsack recursively computes the values mj,k and then returns the optimum value for the given Knapsack instance. In the algorithm below, the variables x(j, k) are n-dimensional vectors that store the solutions corresponding to mj,k, i.e., with weight equal to mj,k and value k.
Step 1. Set m0,0 = 0 and m0,k = ∞ for k = 1, . . . , C; set x(0, k) = (0, . . . , 0) for all k.

Step 2. For j = 1, . . . , n and k = 0, . . . , C set

mj,k = wj + mj−1,k−cj  if cj ≤ k and wj + mj−1,k−cj ≤ min{W, mj−1,k},
mj,k = mj−1,k  otherwise.

If the first case applied, set x(j, k)i = x(j − 1, k − cj)i for i ≠ j and x(j, k)j = 1. Otherwise set x(j, k) = x(j − 1, k).

Step 3. Determine the largest k ∈ {0, . . . , C} such that mn,k < ∞. Return x(n, k).
Theorem 5.5. The Dynamic Programming Knapsack algorithm computes the op-
timum value of the Knapsack instance W , w, c ∈ Nn in time O (nC), where C is an
arbitrary upper bound on this optimum value.
Proof. The running time is obvious. For the correctness we prove that the values mj,k computed by the algorithm satisfy mj,k = inf Wj,k for all j and k.
By construction of the algorithm and induction hypothesis we have weight(x) ≤
inf Wj−1,k and weight(x) = wj + inf Wj−1,k−cj . That is, the weight of x is at most the
weight of any solution without the j-th item and at most the weight of any solution
including the j-th item. Hence mj,k = inf Wj,k .
In the other situation, when the algorithm sets

mj,k = mj−1,k,

then either cj > k, and hence no solution with value equal to k can contain the j-th item, or mj−1,k−cj + wj > W, i.e., adding the j-th item is infeasible, or mj−1,k−cj + wj > inf Wj−1,k, i.e., there is a solution with less weight and still value equal to k.
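A compact sketch of the dynamic program over profit values (value-only, without the solution vectors x(j, k); names are mine):

```python
import math

def dp_knapsack(W, c, w):
    """Exact Knapsack via dynamic programming over profit values.

    m[j][k] = minimum weight of a solution x in {0,1}^j 0^(n-j)
    with weight(x) <= W and val(x) = k; runs in O(n * C), C = sum(c).
    """
    n, C = len(c), sum(c)
    m = [[math.inf] * (C + 1) for _ in range(n + 1)]
    m[0][0] = 0  # the empty solution has value 0 and weight 0
    for j in range(1, n + 1):
        for k in range(C + 1):
            m[j][k] = m[j - 1][k]  # case: skip item j
            if c[j - 1] <= k:
                take = m[j - 1][k - c[j - 1]] + w[j - 1]  # case: add item j
                if take <= min(W, m[j][k]):
                    m[j][k] = take
    # Optimum value: the largest k whose minimum weight is feasible.
    return max(k for k in range(C + 1) if m[n][k] <= W)

# Example 5.1: the optimum is 155, attained by items 3 and 4.
print(dp_knapsack(100, [150, 2, 55, 100], [100, 1, 50, 50]))  # -> 155
```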
5.3 Fully Polynomial-Time Approximation Scheme

The algorithm Knapsack FPTAS scales the profits down, rounds them to integers, and then applies the dynamic program of the previous section.

Step 1. Run Greedy on the instance W, w, c and let x be the solution. If val(x) = 0 then return x.

Step 2. Set t = max{1, ε·val(x)/n} and c′j = ⌊cj/t⌋ for j = 1, . . . , n.

Step 3. Set C = 2val(x)/t and apply the Dynamic Programming Knapsack algorithm on the instance W, C, w, c′ and let y be the solution obtained.

Step 4. Return the better of the two solutions x and y.
Theorem 5.6. For every fixed ε > 0, the Knapsack FPTAS algorithm is a (1 − ε)-approximation algorithm with running time O(n²/ε).
Proof. The value of the solution returned by the algorithm is equal to max{val(x), val(y)}. Let x∗ be an optimum solution for the instance W, w, c. By Theorem 5.4 we have 2val(x) ≥ val(x∗) and hence the choice C = 2val(x)/t is a legal upper bound for the optimum value of the rounded instance W, w, c′. By Theorem 5.5, y is an optimum solution for this instance and we have

val(y) = Σ_{j=1}^n cj yj ≥ Σ_{j=1}^n t·c′j yj = t·Σ_{j=1}^n c′j yj ≥ t·Σ_{j=1}^n c′j x∗j = Σ_{j=1}^n t·c′j x∗j > Σ_{j=1}^n (cj − t)x∗j ≥ val(x∗) − nt.

If t = 1 then y is optimal by Theorem 5.5. Otherwise nt = ε·val(x) ≤ ε·val(x∗), so the above inequality yields

val(y) ≥ val(x∗) − ε·val(x∗) = (1 − ε)·val(x∗).

The running time of the dynamic program is O(nC) = O(n·val(x)/t) = O(n²/ε), where we have used the definition of t: if t = 1 then val(x) ≤ n/ε and otherwise t = ε·val(x)/n. This running time dominates the time needed for the other steps.
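Putting the pieces together, a sketch of the FPTAS under the same scheme (greedy lower bound, scaling by t = max{1, ε·val(x)/n}, exact profit DP on the rounded profits, return the better value; helper names are mine):

```python
import math

def fptas_knapsack(W, c, w, eps):
    """Sketch of the Knapsack FPTAS: greedy bound, profit scaling, exact DP."""
    n = len(c)
    # Step 1: Greedy (as in Theorem 5.4) gives lb with 2*lb >= OPT.
    order = sorted(range(n), key=lambda j: -c[j] / w[j])
    pref = used = 0
    lb = None
    for j in order:
        if used + w[j] <= W:
            pref += c[j]; used += w[j]
        else:
            lb = max(pref, c[j]); break
    if lb is None:          # every item fits: the greedy prefix is optimal
        return pref
    if lb == 0:
        return 0
    # Step 2: scale profits down by t and round.
    t = max(1.0, eps * lb / n)
    cr = [int(cj // t) for cj in c]
    # Step 3: exact profit DP on the rounded instance, O(n * sum(cr)).
    C = sum(cr)
    m = [[math.inf] * (C + 1) for _ in range(n + 1)]
    m[0][0] = 0
    for j in range(1, n + 1):
        for k in range(C + 1):
            m[j][k] = m[j - 1][k]
            take = m[j - 1][k - cr[j - 1]] + w[j - 1] if cr[j - 1] <= k else math.inf
            if take <= min(W, m[j][k]):
                m[j][k] = take
    k = max(kk for kk in range(C + 1) if m[n][kk] <= W)
    y_val = 0
    for j in range(n, 0, -1):   # backtrack, summing original profits
        if m[j][k] != m[j - 1][k]:
            y_val += c[j - 1]; k -= cr[j - 1]
    # Step 4: return the better of the greedy value and the DP solution.
    return max(lb, y_val)

# Example 5.1 with eps = 0.5: guaranteed at least (1 - eps) * 155 = 77.5.
print(fptas_knapsack(100, [150, 2, 55, 100], [100, 1, 50, 50], 0.5))
```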
Chapter 6
Set Cover
The Set Cover problem this chapter deals with is again a simple to state – yet quite general – NP-hard combinatorial problem. It is widely applicable, sometimes in unexpected ways. The problem is the following: We are given a set U (called the universe) of n elements,
a collection of sets S = {S1 , . . . , Sk } where Si ⊆ U , and a cost function c : S → R+ .
The task is to find a minimum cost subcollection S 0 ⊆ S that covers U , i.e., such that
∪S∈S 0 S = U .
Example 6.1. Consider this instance: U = {1, 2, 3}, S = {S1 , S2 , S3 } with S1 = {1, 2},
S2 = {2, 3}, S3 = {1, 2, 3} and cost c(S1 ) = 10, c(S2 ) = 50, and c(S3 ) = 100. These
collections cover U : {S1 , S2 }, {S3 }, {S1 , S3 }, {S2 , S3 }, {S1 , S2 , S3 }. The cheapest one is
{S1 , S2 } with cost equal to 60.
For each set S, we associate a variable xS ∈ {0, 1} that indicates whether we choose S or not. We may thus write solutions for Set Cover as vectors x ∈ {0, 1}^k. With this, we write Set Cover as a mathematical program.
Define the frequency of an element to be the number of sets it is contained in. Let
f denote the frequency of the most frequent element. In this chapter we present several algorithms that achieve approximation ratio O(log n) or f. Why are we interested in a variety of algorithms? Is one algorithm not sufficient? It would be, but here the focus is on the techniques that yield these algorithms.
6.1 Greedy Algorithm
The Greedy algorithm follows the natural approach of iteratively choosing the most
cost-effective set and remove all the covered elements until all elements are covered. Let
C be the set of elements already covered at the beginning of an iteration. During this
iteration define the cost-effectiveness of a set S as c(S)/|S − C|, i.e., the average cost at
which it covers new elements. For later reference, the algorithm sets the price at which it
covered an element equal to the cost-effectiveness of the covering set. Further recall that Hn = Σ_{i=1}^n 1/i is called the n-th Harmonic number and that log n ≤ Hn ≤ log n + 1.
Step 1. C = ∅, x = 0.

Step 2. While C ≠ U:

(a) Find the most cost-effective set in the current iteration, say S.
(b) Set xS = 1 and for each e ∈ S − C set price(e) = c(S)/|S − C|.
(c) C = C ∪ S.

Step 3. Return x.
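A sketch of these Greedy steps, with the universe and sets as Python sets (names are mine; the sets are assumed to cover the universe):

```python
def greedy_set_cover(universe, sets, cost):
    """Pick the most cost-effective set until everything is covered.

    Returns the chosen set indices and the price of each element.
    Assumes the union of all sets equals the universe.
    """
    covered, chosen, price = set(), [], {}
    while covered != universe:
        # Most cost-effective set: minimum cost per newly covered element.
        best = min((i for i in range(len(sets)) if sets[i] - covered),
                   key=lambda i: cost[i] / len(sets[i] - covered))
        new = sets[best] - covered
        for e in new:
            price[e] = cost[best] / len(new)
        chosen.append(best)
        covered |= sets[best]
    return chosen, price

# Example 6.1: S1 has cost-effectiveness 5, so it is picked first,
# then S2 covers the remaining element; total cost 60.
chosen, price = greedy_set_cover({1, 2, 3},
                                 [{1, 2}, {2, 3}, {1, 2, 3}],
                                 [10, 50, 100])
print(chosen, price)
```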
Theorem 6.2. The Greedy algorithm is an Hn -approximation algorithm for the Set
Cover problem.
Direct Analysis
The following lemma is crucial for the proof of the approximation-guarantee. Number the
elements of U in the order in which they were covered by the algorithm, say e1 , . . . , en .
Let x∗ be an optimum solution.
Lemma 6.3. For each i ∈ {1, . . . , n}, price(ei ) ≤ val(x∗ )/(n − i + 1).
Proof. In any iteration, the leftover sets of the optimal solution x∗ can cover the remaining
elements at a cost of at most val(x∗ ). Therefore, among these, there must be one set
having cost-effectiveness of at most val(x∗ )/|U − C|. In the iteration in which element ei
was covered, U − C contained at least n − i + 1 elements. Since ei was covered by the most
cost-effective set in this iteration, we have that
price(ei) ≤ val(x∗)/|U − C| ≤ val(x∗)/(n − i + 1).
Proof of Theorem 6.2. Since the cost of each set is distributed evenly among the new
elements covered, the total cost of the set cover picked is
val(x) = Σ_{i=1}^n price(ei) ≤ val(x∗)·Hn,

where the inequality follows from Lemma 6.3 and Σ_{i=1}^n 1/(n − i + 1) = Hn.
Dual-Fitting Analysis
Here we will give an alternative analysis of the Greedy algorithm for Set Cover. We
will use the dual fitting method, which is quite general and helps to analyze a broad variety
of combinatorial algorithms.
For the sake of exposition we consider a minimization problem, but the technique works similarly for maximization. Consider an algorithm Alg which does the following:
(1) Let (P ) be an integer programming formulation of the problem of interest. We are
interested in its optimal solution x∗ , respectively its objective value val(x∗ ). Let (D)
be the dual of a linear programming relaxation of (P ).
(2) The algorithm Alg computes a feasible solution x for (P) and a "solution" y for (D), where we allow that y is infeasible for (D). But the algorithm has to ensure that

val(x) ≤ val′(y),

where val is the objective function of (P) and val′ is the objective function of (D).

(3) Now divide the entries of y by a certain quantity α until y′ = y/α is feasible for (D). (The method of dual fitting is applicable only if this property can be ensured.) Then val′(y′) is a lower bound for val(x∗) by weak duality, i.e.,

val′(y′) ≤ val(x∗)

by Lemma 4.8.

(4) Putting these things together, we obtain the approximation guarantee of α by

val(x) ≤ val′(y) = val′(αy′) = α·val′(y′) ≤ α·val(x∗).
Now we apply this recipe to Set Cover and consider the Greedy algorithm. For
property (1) we use our usual formulation
minimize Σ_{S∈S} c(S)·xS, (P)

subject to Σ_{S: e∈S} xS ≥ 1,  e ∈ U,

xS ∈ {0, 1},  S ∈ S.
When we relax the constraints xS ∈ {0, 1} to 0 ≤ xS ≤ 1 and dualize the corresponding
linear program we find
maximize Σ_{e∈U} ye, (D)

subject to Σ_{e∈S} ye ≤ c(S),  S ∈ S,

ye ≥ 0,  e ∈ U.
This dual can be derived purely mechanically (by applying the primal-dual-definition and
rewriting constraints if needed), but this program also has an intuitive interpretation. The
constraints of (D) state that we want to “pack stuff” into each set S such that the cost
c(S) of each set is not exceeded, i.e., the sets are not overpacked. We seek to maximize
the total amount packed.
How about property (2)? The algorithm Greedy computes a certain feasible solution
x for (P ), i.e., a solution xS = 1 if the algorithm picks set S and xS = 0 otherwise. What
about the vector y? Define the following vector: For each e ∈ U set ye = price(e), where
price(e) is the value computed during the execution of the algorithm.
By construction of the algorithm we have

val(x) = Σ_{S∈S} c(S)·xS = Σ_{e∈U} price(e) = Σ_{e∈U} ye = val′(y),
i.e., Greedy satisfies property (2) of the dual fitting method (even with equality).
For property (3) the following result is useful.

Lemma 6.4. For each set S ∈ S we have Σ_{e∈S} ye ≤ c(S)·Hn.

Proof. Let S ∈ S with, say, m elements. Consider these in the order in which the algorithm covered them, say, e1, . . . , em. At the iteration when ei gets covered, S contains at least m − i + 1 uncovered elements. Since Greedy chooses the most cost-effective set we have that
price(ei) ≤ c(S)/(m − i + 1),
i.e., the cost-effectiveness of the set the algorithm chooses can only be smaller than the
cost-effectiveness of S. (Be aware that “smaller” is “better” here.)
Summing over all elements gives

Σ_{e∈S} ye = Σ_{i=1}^m price(ei) ≤ Σ_{i=1}^m c(S)/(m − i + 1) = c(S)·Hm ≤ c(S)·Hn,

as claimed.
Now we are in position to finalize the dual-fitting analysis using property (4).
Proof of Theorem 6.2. Define the vector y′ = y/Hn, where y is as defined above. Observe that for each set S ∈ S we have

Σ_{e∈S} y′e = Σ_{e∈S} ye/Hn = (1/Hn)·Σ_{e∈S} ye ≤ c(S)

using Lemma 6.4. That means y′ is feasible for (D). Using property (4) of the dual fitting method proves the approximation guarantee of at most Hn.
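On Example 6.1 the dual-fitting argument can be checked numerically: the Greedy prices (5, 5, 50) form a dual vector y with Σ ye = 60 that overpacks S2, while y/H3 is feasible for (D). A small standalone check in exact arithmetic:

```python
from fractions import Fraction as F

sets = [{1, 2}, {2, 3}, {1, 2, 3}]
cost = [10, 50, 100]
y = {1: F(5), 2: F(5), 3: F(50)}       # prices from the Greedy run
H3 = F(1) + F(1, 2) + F(1, 3)          # Harmonic number H_3 = 11/6

# y itself overpacks S2: 5 + 50 = 55 > 50, so y is infeasible for (D).
assert sum(y[e] for e in sets[1]) > cost[1]
# After scaling by H_3, no set is overpacked, so y/H3 is feasible for (D).
for S, c in zip(sets, cost):
    assert sum(y[e] / H3 for e in S) <= c
# By weak duality, val'(y/H3) lower-bounds the optimum (which is 60 here).
print(sum(y.values()) / H3)  # 60 / (11/6) = 360/11
```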
6.2 Primal-Dual Algorithm
The primal-dual schema introduced here is the method of choice for designing approxi-
mation algorithms because it often gives algorithms with good approximation guarantees
and good running times. After introducing the ideas behind the method, we will use it to
design a simple factor f algorithm, where f is the frequency of the most frequent element.
The general idea is to work with an LP-relaxation of an NP-hard problem and its dual.
Then the algorithm iteratively changes a primal and a dual solution until the relaxed
primal-dual complementary slackness conditions are satisfied.
Primal-Dual Schema
Consider the following primal program:
minimize val(x) = Σ_{j=1}^n cj xj,

subject to Σ_{j=1}^n aij xj ≥ bi,  i = 1, . . . , m,

xj ≥ 0,  j = 1, . . . , n.
Most known approximation algorithms using the primal-dual schema run by ensuring one set of conditions and suitably relaxing the other. We will capture both situations by relaxing both conditions. If the primal conditions are to be ensured, we set α = 1 below, and if the dual conditions are to be ensured, we set β = 1.

Primal Complementary Slackness Conditions (relaxed by α ≥ 1). For each j = 1, . . . , n:

either xj = 0 or cj/α ≤ Σ_{i=1}^m aij yi ≤ cj.

Dual Complementary Slackness Conditions (relaxed by β ≥ 1). For each i = 1, . . . , m:

either yi = 0 or bi ≤ Σ_{j=1}^n aij xj ≤ β·bi.
Lemma 6.5. If x and y are primal and dual feasible solutions respectively satisfying the
complementary slackness conditions stated above, then
val(x) ≤ αβval(y).
Proof. We calculate directly using the slackness conditions and obtain

val(x) = Σ_{j=1}^n cj xj ≤ α·Σ_{j=1}^n (Σ_{i=1}^m aij yi)·xj = α·Σ_{i=1}^m (Σ_{j=1}^n aij xj)·yi ≤ αβ·Σ_{i=1}^m bi yi = αβ·val(y).
The algorithm starts with a primal infeasible solution and a dual feasible solution;
usually these are x = 0 and y = 0 initially. It iteratively improves the feasibility of the
primal solution and the optimality of the dual solution ensuring that in the end a primal
feasible solution is obtained and all conditions stated above, with a suitable choice for α
and β, are satisfied. The primal solution is always extended integrally, thus ensuring that
the final solution is integral. The improvements to the primal and the dual go hand-in-
hand: the current primal solution is used to determine the improvement to the dual, and
vice versa. Finally, the cost of the dual solution is used as a lower bound on the optimum
value, and by Lemma 6.5, the approximation guarantee of the algorithm is αβ.
Primal-Dual Algorithm
Here we derive a factor f approximation algorithm for Set Cover using the primal-dual
schema. For this algorithm we will choose α = 1 and β = f . We will work with the
following primal LP for Set Cover
minimize val(x) = Σ_{S∈S} c(S)·xS,

subject to Σ_{S: e∈S} xS ≥ 1,  e ∈ U,

xS ≥ 0,  S ∈ S.
For these LPs the primal and dual complementary slackness conditions are:

Primal Complementary Slackness Conditions. For each S ∈ S:

either xS = 0 or Σ_{e∈S} ye = c(S).

In other words, every set that is picked must be tight, where we call a set S tight if Σ_{e∈S} ye = c(S).

Dual Complementary Slackness Conditions. For each e ∈ U:

either ye = 0 or Σ_{S: e∈S} xS ≤ f.

Since we will find a 0/1 solution for x, these conditions are equivalent to: "each element having non-zero dual value can be covered at most f times." Since each element is in at most f sets, this condition is trivially satisfied for all elements.
Step 1. Set x = 0 and y = 0. Declare all elements uncovered.

Step 2. While some element is uncovered:

(a) Pick an uncovered element, say e, and raise ye until some set goes tight.
(b) Pick all tight sets in the cover, i.e., set xS = 1 for every tight set S.
(c) Declare all the elements occurring in these sets as covered.

Step 3. Return x.
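A sketch of this primal-dual algorithm (dual variables are raised one uncovered element at a time, in a fixed order; names and iteration order are my own):

```python
def primal_dual_set_cover(universe, sets, cost):
    """f-approximation: raise y_e of an uncovered element until a set
    containing e goes tight, then pick all tight sets containing e."""
    y = {e: 0 for e in universe}
    slack = list(cost)          # remaining slack c(S) - sum of y_e for e in S
    covered, chosen = set(), []
    for e in sorted(universe):  # visit uncovered elements in a fixed order
        if e in covered:
            continue
        containing = [i for i in range(len(sets)) if e in sets[i]]
        raise_by = min(slack[i] for i in containing)
        y[e] = raise_by
        for i in containing:
            slack[i] -= raise_by
        for i in containing:    # pick every set that went tight
            if slack[i] == 0 and i not in chosen:
                chosen.append(i)
                covered |= sets[i]
    return chosen, y

# Example 6.1: y_1 is raised to 10 (S1 goes tight), then y_3 to 50
# (S2 goes tight); the cover {S1, S2} costs 60.
chosen, y = primal_dual_set_cover({1, 2, 3},
                                  [{1, 2}, {2, 3}, {1, 2, 3}],
                                  [10, 50, 100])
print(chosen, y)  # -> [0, 1] {1: 10, 2: 0, 3: 50}
```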
Example 6.7. A tight example for this algorithm is provided by the following set system.
The universe is U = {e1 , . . . , en+1 } and S consists of n − 1 sets {e1 , en }, . . . , {en−1 , en } of
cost 1 and one set {e1 , . . . , en+1 } of cost 1 + ε for some small ε > 0. Since en appears in
all n sets, this system has f = n.
Suppose the algorithm raises yen in the first iteration. When yen is raised to 1, all
sets {ei , en }, i = 1, . . . , n − 1 go tight. They are all picked in the cover, thus covering the
elements e1 , . . . , en . In the second iteration yen+1 is raised to ε and the set {e1 , . . . , en+1 }
goes tight. The resulting set cover has cost n + ε, whereas the optimum cover has cost
1 + ε.
6.3 LP-Rounding Algorithms

Here we derive a factor f approximation algorithm for Set Cover, but this time by rounding the fractional solution of an LP to an integral solution (instead of using the primal-dual schema). We consider our usual LP relaxation for Set Cover:
minimize val(x) = Σ_{S∈S} c(S)·xS,

subject to Σ_{S: e∈S} xS ≥ 1,  e ∈ U,

xS ≥ 0,  S ∈ S.
Step 1. Set x = 0, solve the LP relaxation above, and call the optimal solution z.

Step 2. Set xS = 1 for each set S ∈ S with zS ≥ 1/f.

Step 3. Return x.
Proof. Let x be the solution returned by the algorithm and z be the optimal solution of the LP. Consider an arbitrary element e ∈ U. Since e is in at most f sets, one of these sets must be picked to the extent of at least 1/f in the fractional solution z. If this were not the case, then Σ_{S: e∈S} zS < Σ_{S: e∈S} 1/f ≤ f·(1/f) = 1 would yield a contradiction to the feasibility of z. Thus e is covered due to the definition of the algorithm, and x is hence a feasible cover. We further have xS ≤ f·zS and thus

val(x) = Σ_{S∈S} c(S)·xS ≤ f·Σ_{S∈S} c(S)·zS = f·val(z) ≤ f·val(x∗),

where x∗ is an optimum solution for Set Cover.
Randomized Rounding
Another natural idea for rounding fractional solutions is to use randomization: For exam-
ple, for the above relaxation, observe that the values zS are between zero and one. We
may thus interpret these values as probabilities for choosing a certain set S.
Here is the idea of the following algorithm: Solve the LP-relaxation optimally and call
the solution z. With probability zS include the set S into the cover.
This basic procedure yields a vector x with expected value equal to the optimal frac-
tional solution value but might not cover all the elements. We thus repeat the procedure
“sufficiently many” times and include a set into our cover if it was included in any of
the iterations. We will show that O (log n) many iterations suffice yielding an O (log n)-
approximation algorithm.
Step 1. Set x = 0, solve the LP relaxation

minimize val(x) = Σ_{S∈S} c(S)·xS,
subject to Σ_{S: e∈S} xS ≥ 1,  e ∈ U,
xS ≥ 0,  S ∈ S,

and call the optimal solution z.
Step 2. Repeat ⌈3 log n⌉ times: For each set S, set xS = 1 with probability zS.
Step 3. Return x.
Theorem 6.9. With probability at least 1 − 1/n², the algorithm Randomized Rounding Set Cover returns a feasible solution whose expected value is ⌈3 log n⌉-approximate for Set Cover.
Proof. Let z be an optimal solution for the LP. We estimate the probability that an
element e ∈ U is covered in one iteration in Step 2. Let e be contained in m sets and
let z1 , . . . , zm be the probabilities given in the solution z. Since e is fractionally covered
we have z1 + · · · + zm ≥ 1. With easy but tedious calculus we see that – under this
condition – the probability for e being covered is minimized when the zi are all equal, i.e.,
z1 = · · · = zm = 1/m:
Pr [e is covered] = 1 − (1 − z_1 ) · · · (1 − z_m ) ≥ 1 − (1 − 1/m)^m ≥ 1 − 1/e.
Each element is covered with probability at least 1 − 1/e. But maybe we have not covered
all elements after ⌈3 log n⌉ iterations. The probability that the element e is not covered at the end of the algorithm, i.e., after ⌈3 log n⌉ iterations, is

Pr [e is not covered] ≤ (1/e)^{⌈3 log n⌉} ≤ 1/n³ .
Thus the probability that there is an uncovered element is at most

Σ_{e∈U} Pr [e is not covered] ≤ n · 1/n³ = 1/n² .
Moreover, each of the ⌈3 log n⌉ iterations adds expected cost at most val(z), so that E [val(x)] ≤ ⌈3 log n⌉ · val(z) ≤ ⌈3 log n⌉ · val(x∗ ), where x∗ is an optimal solution for Set Cover. So, the algorithm returns a feasible solution, with probability at least 1 − 1/n² , whose expected value is ⌈3 log n⌉-approximate.
The proof above shows that the algorithm is a ⌈3 log n⌉-approximation in expectation. But we can actually state that the approximation ratio is 4 · ⌈3 log n⌉ with probability around 3/4. Use Markov's inequality Pr [X > t] ≤ E [X] /t to show

Pr [val(x) > 4 · ⌈3 log n⌉ · val(z)] ≤ E [val(x)] / (4 · ⌈3 log n⌉ · val(z)) ≤ 1/4.

The probability that either not all elements are covered or the obtained solution has value larger than 4 · ⌈3 log n⌉ times the optimal value is at most 1/n² + 1/4 ≤ 1/2 for all n ≥ 2. Thus we have to run the whole algorithm at most two times in expectation to actually get a 4 · ⌈3 log n⌉-approximate solution.
Chapter 7
Satisfiability
The Satisfiability problem asks if a certain given Boolean formula has a satisfying
assignment, i.e., one that makes the whole formula evaluate to true. There is a related
optimization problem called Maximum Satisfiability. The goal of this chapter is to
develop a deterministic 3/4-approximation algorithm. We first give a corresponding ran-
domized algorithm which will then be derandomized.
We are given the Boolean variables X = {x1 , . . . , xn }, where each xi ∈ {0, 1}. A literal ℓi of the variable xi is either xi itself, called a positive literal, or its negation x̄i with truth value 1 − xi , called a negative literal. A clause is a disjunction C = (ℓ1 ∨ · · · ∨ ℓk ) of literals ℓj of X; their number k is called the size of C, denoted size(C). For a clause C let S_C^+ denote the set of its positive literals; similarly S_C^− the set of its negative literals. Let C denote the set of clauses. A Boolean formula in conjunctive form is a conjunction of clauses F = C1 ∧ · · · ∧ Cm . Each vector x ∈ {0, 1}^n is called a truth assignment. For any clause C and any such assignment x we say that x satisfies C if at least one of the literals of C evaluates to 1.
The problem Maximum Satisfiability is the following: We are given a formula F
in conjunctive form and for each clause C a weight wC , i.e., a weight function w : C → N.
The objective is to find a truth assignment x ∈ {0, 1}n that maximizes the total weight of
the satisfied clauses. As an important special case: If we set all weights wC equal to one,
then we seek to maximize the number of satisfied clauses.
Now we introduce for each clause C a variable z_C ∈ {0, 1} which takes the value one if and only if C is satisfied under a certain truth assignment x. Now we can formulate this problem as a mathematical program as follows:

maximize val(x, z) = Σ_{C∈C} w_C z_C ,
subject to Σ_{i: x_i ∈ S_C^+} x_i + Σ_{i: x̄_i ∈ S_C^−} (1 − x_i ) ≥ z_C   C ∈ C,
z_C ∈ {0, 1}   C ∈ C,
x_i ∈ {0, 1}   i = 1, . . . , n.
The algorithm we aim for is a combination of two algorithms. One works better
for small clauses, the other for large clauses. Both are initially randomized but can be
derandomized using the method of conditional expectation, i.e., the final algorithm is
deterministic.
Consider the algorithm Randomized Large: Set each variable X_i = 1 independently with probability 1/2 and output X = (X1 , . . . , Xn ). For a clause C let the random variable Z_C indicate whether C is satisfied, and define α_k = 1 − 2^{−k} .

Lemma 7.1. Let C be a clause. If size(C) = k, then E [Z_C ] = α_k .

Proof. A clause C is not satisfied, i.e., Z_C = 0, if and only if all its k literals are set to zero. By independence, the probability of this event is exactly 2^{−k} and thus

E [Z_C ] = 1 · Pr [Z_C = 1] + 0 · Pr [Z_C = 0] = 1 − 2^{−k} = α_k ,

which was claimed.
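The formula E [Z_C ] = 1 − 2^{−k} from Lemma 7.1 can be checked by exhaustive enumeration on a small instance. The clause encoding, the instance, and the helper names below are ours, chosen for illustration only:

```python
import itertools

def clause_satisfied(clause, x):
    # clause: list of (variable index, is_positive) literal pairs
    return any((x[i] == 1) == pos for i, pos in clause)

def expected_weight_uniform(clauses, weights, n):
    # Exact expectation of the satisfied weight under a uniformly
    # random truth assignment, by enumerating all 2^n assignments.
    total = 0.0
    for x in itertools.product([0, 1], repeat=n):
        total += sum(w for c, w in zip(clauses, weights) if clause_satisfied(c, x))
    return total / 2 ** n

# Hypothetical instance: (x1 v x2) ^ (~x1 v x3) with unit weights, n = 3.
clauses = [[(0, True), (1, True)], [(0, False), (2, True)]]
weights = [1.0, 1.0]
exact = expected_weight_uniform(clauses, weights, 3)
# Lemma 7.1 predicts sum over clauses of w_C * (1 - 2^{-k}):
predicted = sum(w * (1 - 2 ** -len(c)) for c, w in zip(clauses, weights))
```

Both quantities equal 1.5 here: each size-2 clause is satisfied with probability 3/4.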
The LP-relaxation of this program replaces the integrality constraints by

0 ≤ z_C ≤ 1   C ∈ C,
0 ≤ x_i ≤ 1   i = 1, . . . , n.
In the sequel let (x̄, z̄) denote an optimum solution for this LP.
Consider this algorithm Randomized Small: Determine (x̄, z̄). For each variable xi
with i = 1, . . . , n, set Xi = 1 independently with probability x̄i and Xi = 0 otherwise.
Output X = (X1 , . . . , Xn ).
Define the quantity

β_k = 1 − (1 − 1/k)^k .
Lemma 7.3. Let C be a clause. If size(C) = k, then

E [Z_C ] ≥ β_k z̄_C .
Proof. We may assume that the clause C has the form C = (x1 ∨ · · · ∨ xk ); otherwise
rename the variables and rewrite the LP.
The clause C is satisfied if x1 , . . . , xk are not all set to zero. The probability of this
event is
1 − ∏_{i=1}^{k} (1 − x̄_i ) ≥ 1 − ( (1/k) Σ_{i=1}^{k} (1 − x̄_i ) )^k
= 1 − ( 1 − (1/k) Σ_{i=1}^{k} x̄_i )^k
≥ 1 − (1 − z̄_C /k)^k .
Above we firstly have used the arithmetic-geometric mean inequality, which states that
for non-negative numbers a1 , . . . , ak we have
(a_1 + · · · + a_k )/k ≥ (a_1 · · · a_k )^{1/k} .
Secondly the LP guarantees the inequality x̄1 + · · · + x̄k ≥ z̄C .
Now define the function g(t) = 1 − (1 − t/k)k . This function is concave with g(0) = 0
and g(1) = 1 − (1 − 1/k)k which yields that we can bound
g(t) ≥ t(1 − (1 − 1/k)k ) = tβk
for all t ∈ [0, 1].
Therefore

Pr [Z_C = 1] ≥ 1 − (1 − z̄_C /k)^k = g(z̄_C ) ≥ β_k z̄_C
and the claim follows.
Summing over all clauses and using linearity of expectation yields E [val(Z)] ≥ (1 − 1/e) · val(x̄, z̄) ≥ (1 − 1/e) · val(x∗ , z ∗ ), where (x∗ , z ∗ ) is an optimal solution for Maximum Satisfiability. The claim follows since (1 − 1/k)^k < 1/e for all k ∈ N, hence β_k > 1 − 1/e.
3/4-Approximation Algorithm
Consider the algorithm Randomized Combine: With probability 1/2 run Randomized Large, otherwise run Randomized Small.
Proof. Let the random variable B take the value zero if the first algorithm is run, one
otherwise. For a clause C let size(C) = k. By Lemma 7.1 and z̄C ≤ 1
E [ ZC | B = 0] = αk ≥ αk z̄C .
7.2 Derandomization
The notion of derandomization refers to “turning” a randomized algorithm into a deter-
ministic one (possibly at the cost of additional running time or deterioration of approxi-
mation guarantee). One of the several available techniques is the method of conditional
expectation.
We are given a Boolean formula F = C1 ∧· · ·∧Cm in conjunctive form over the variables
X = {x1 , . . . , xn }. Suppose we set x1 = 0; then, after simplification, we get a formula F0 over the variables x2 , . . . , xn ; if we set x1 = 1, we get a formula F1 . For example, for F = (x1 ∨ x2 ) ∧ (x1 ∨ x4 ) ∧ (x̄1 ∨ x3 ) we obtain

x1 = 0 : F0 = (x2 ) ∧ (x4 )
x1 = 1 : F1 = (x3 )
Applying this recursively, we obtain the tree T (F ) depicted in Figure 7.1. The tree T (F ) is a complete binary tree with n + 1 levels and 2^{n+1} − 1 vertices. Each vertex at level i
corresponds to a setting for the Boolean variables x1 , . . . , xi . We label the vertices of T (F )
with their respective conditional expectations as follows. Let X1 = a1 , . . . , Xi = ai ∈ {0, 1}
be the outcome of a truth assignment for the variables x1 , . . . , xi . The vertex corresponding
to this assignment will be labeled
E [ val(Z) | X1 = a1 , . . . , Xi = ai ] .
Figure 7.1: The tree T (F ): the root F at level 0 branches on x1 = 0 and x1 = 1 into F0 and F1 at level 1, which are the roots of the subtrees T (F0 ) and T (F1 ).
If i = n, then this conditional expectation is simply the total weight of clauses satisfied by
the truth assignment x1 = a1 , . . . , xn = an .
The goal of the remainder of the section is to show that we can find deterministically
in polynomial time a path from the root of T (F ) to a leaf such that the conditional
expectations of the vertices on that path are at least as large as E [val(Z)]. Obviously, this
property yields the desired result: We can construct deterministically a solution which is at least
as good as the one of the randomized algorithm in expectation.
Lemma 7.8. The conditional expectation
E [ val(Z) | X1 = a1 , . . . , Xi = ai ]
of any vertex in T (F ) can be computed in polynomial time.
Proof. Consider a vertex X1 = a1 , . . . , Xi = ai . Let F ′ be the Boolean formula obtained from F by setting x1 , . . . , xi accordingly. F ′ is a formula in the variables xi+1 , . . . , xn .
Clearly, by linearity of expectation, the expected weight of any clause of F 0 under any
random truth assignment to the variables xi+1 , . . . , xn can be computed in polynomial
time. Adding to this the total weight of clauses satisfied by x1 , . . . , xi gives the answer.
Theorem 7.9. We can compute in polynomial time a path from the root to a leaf in T (F )
such that the conditional expectation of each vertex on this path is at least E [val(Z)].
Proof. Consider the conditional expectation at a certain vertex X1 = a1 , . . . , Xi = ai for
setting the next variable Xi+1 . We have that
E [ val(Z) | X1 = a1 , . . . , Xi = ai ]
= E [ val(Z) | X1 = a1 , . . . , Xi = ai , Xi+1 = 0] Pr [Xi+1 = 0]
+ E [ val(Z) | X1 = a1 , . . . , Xi = ai , Xi+1 = 1] Pr [Xi+1 = 1] .
We show that the two conditional expectations with Xi+1 can not be both strictly smaller
than E [ val(Z) | X1 = a1 , . . . , Xi = ai ]. Assume the contrary, then we have
E [ val(Z) | X1 = a1 , . . . , Xi = ai ]
< E [ val(Z) | X1 = a1 , . . . , Xi = ai ] (Pr [Xi+1 = 0] + Pr [Xi+1 = 1])
which is a contradiction since Pr [Xi+1 = 0] + Pr [Xi+1 = 1] = 1.
This yields the existence of such a path, and by Lemma 7.8 it can be computed in polynomial time.
The derandomized version of a randomized algorithm now simply executes these proofs
with the probability distribution as given by the randomized algorithm.
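As a sketch (clause encoding and helper names are ours), the conditional expectations E [ val(Z) | X1 = a1 , . . . , Xi = ai ] can be computed clause by clause under the uniform distribution of Randomized Large, and the path of Theorem 7.9 can then be walked greedily:

```python
def cond_expectation(clauses, weights, assignment):
    """E[val(Z) | fixed variables], remaining variables uniform on {0, 1}.

    assignment maps variable index -> 0/1 for the fixed variables;
    clauses are lists of (variable index, is_positive) literals.
    """
    total = 0.0
    for clause, w in zip(clauses, weights):
        free = 0
        satisfied = False
        for i, pos in clause:
            if i in assignment:
                satisfied = satisfied or (assignment[i] == 1) == pos
            else:
                free += 1
        # a fully fixed, unsatisfied clause contributes 0 (0.5**0 == 1)
        total += w if satisfied else w * (1 - 0.5 ** free)
    return total

def derandomized_assignment(clauses, weights, n):
    # Walk from the root of T(F) to a leaf, always branching to a child
    # whose conditional expectation is at least as large.
    a = {}
    for i in range(n):
        a0 = dict(a); a0[i] = 0
        a1 = dict(a); a1[i] = 1
        a = a0 if (cond_expectation(clauses, weights, a0)
                   >= cond_expectation(clauses, weights, a1)) else a1
    return a

# Hypothetical example: (x1 v x2) ^ (~x1 v x3) with unit weights.
clauses = [[(0, True), (1, True)], [(0, False), (2, True)]]
weights = [1.0, 1.0]
a = derandomized_assignment(clauses, weights, 3)
value = cond_expectation(clauses, weights, a)  # all variables fixed: exact weight
```

The final weight is at least the unconditioned expectation E [val(Z)], exactly as guaranteed by Theorem 7.9.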
Chapter 8
Facility Location
The Metric Facility Location problem was popular in operations research in the 1960s
but no constant factor approximation algorithms were known until 1997. The discovery
of these is due to LP-rounding techniques and the primal-dual schema. In this section we
present a 3-approximate primal-dual algorithm.
Metric Facility Location is the following problem: We are given a complete bipar-
tite graph G = (V, E) with bipartition V = F ∪ C, where F refers to the set of (potential)
facilities and C to the set of cities. Establishing a facility i causes opening cost fi . At-
taching city j to an (opened) facility i yields connection cost c_ij . We assume that the c_ij satisfy the triangle inequality c_ij ≤ c_{ij′} + c_{i′j′} + c_{i′j} for all i, i′ ∈ F and j, j′ ∈ C. So, now, the problem is to find a subset I ⊆ F of facilities to open and a mapping a : C → I assigning each city to an open facility, so as to minimize the total opening and connection cost. We write this task as a
mathematical program, where yi indicates if facility i is open and xij if city j is assigned
to facility i.
The problem Metric Facility Location is NP-hard. Here we will show the following
main result.
An obvious way of relaxing this problem is to replace the constraints xij ∈ {0, 1} and
yi ∈ {0, 1} by 0 ≤ xij ≤ 1 and 0 ≤ yi ≤ 1 respectively. For sake of completeness:
minimize val(x, y) = Σ_{i∈F} Σ_{j∈C} c_ij x_ij + Σ_{i∈F} f_i y_i   (P)
subject to Σ_{i∈F} x_ij ≥ 1   j ∈ C
y_i − x_ij ≥ 0   i ∈ F, j ∈ C
0 ≤ x_ij ≤ 1
0 ≤ y_i ≤ 1.
The LP-dual of (P) is:

maximize val(α, β) = Σ_{j∈C} α_j   (D)
subject to α_j − β_ij ≤ c_ij   i ∈ F, j ∈ C
Σ_{j∈C} β_ij ≤ f_i   i ∈ F
α_j ≥ 0   j ∈ C
β_ij ≥ 0   i ∈ F, j ∈ C.
For us, the third and the fourth condition are particularly interesting as they relate the primal and dual variables at optimal points. The third is called primal complementary slackness, the fourth dual complementary slackness. One way of looking at this dual condition is to say that "either y_i = 0 or b_i − (Ax)_i = 0", i.e., "if the dual variable y_i is not zero, then the corresponding primal constraint is satisfied with equality". Similarly for the primal condition.
Now we return to Metric Facility Location. Assume for the moment that there
is an integral solution, say (x, y), which is optimal for (P). This solution corresponds to a
set I ⊆ F and a mapping a : C → I. Thus, under this solution, yi = 1 if and only if i ∈ I
and xij = 1 if and only if a(j) = i. Let (α, β) be an optimal solution for (D).
Now, for (P) and (D) the primal-dual complementary slackness conditions are:

(1) x_ij > 0 ⇒ α_j − β_ij = c_ij ,
(2) y_i > 0 ⇒ Σ_{j∈C} β_ij = f_i ,
(3) α_j > 0 ⇒ Σ_{i∈F} x_ij = 1,
(4) β_ij > 0 ⇒ y_i = x_ij .

By (2) each open facility i must be "paid" by the dual variables β_ij , i.e.,

Σ_{j∈C} β_ij = f_i .
By condition (4), if facility i is open, but city j is not assigned to it, i.e., a(j) ≠ i, then we must have y_i ≠ x_ij and thus β_ij = 0. This means that no city contributes to a facility it is not connected to.
By condition (1), if for some city j and facility i we have a(j) = i, then we must have α_j − β_ij = c_ij . Thus we can think of α_j = β_ij + c_ij as the total price paid by city j, where β_ij is its opening cost share and c_ij its connection cost (paid exclusively). In particular, for each city j,

α_j − β_{a(j)j} = c_{a(j)j} .
Algorithm
The algorithm consists of two phases. In the first phase, the algorithm operates in a
primal-dual fashion. It finds a dual feasible solution and also determines a set of tight
edges and temporarily open facilities Ft . In the second phase the algorithm chooses a
subset I ⊆ Ft of facilities to open permanently, and a mapping a : C → I.
Phase 1. We would like to find as large a dual solution as possible. This motivates the
following underlying process: Each city j raises its dual variable αj until it gets
connected to an open facility, i.e., until αj = cij for some open facility i. All other
primal and dual variables simply respond to this change, trying to maintain feasibility
or satisfying complementary slackness conditions.
A notion of time is defined in this phase, so that each event can be associated with
the time at which it happened; the phase starts at time zero. Initially each city is
defined to be unconnected. Throughout this phase, the algorithm raises the dual
variable αj for each unconnected city at unit rate, i.e., αj will grow by one in unit
time. When αj = cij for some edge ij, the algorithm will declare this edge to be
tight. Henceforth, the dual variable βij will also be raised uniformly, thus ensuring
that the constraint α_j − β_ij ≤ c_ij in (D) is never violated. At this point in time the connection cost c_ij is paid and the variable β_ij goes towards paying for the opening cost of facility i. Each edge ij such that β_ij > 0 is called special.
Facility i is said to be paid for if Σ_{j} β_ij = f_i . If so, the algorithm declares the facility
temporarily open, i.e., i ∈ Ft . Furthermore, all unconnected cities having tight edges
to this facility are declared connected and facility i is declared the connecting witness
for each of these cities. (Notice that the dual variables αj of these cities are not raised
any more.) In the future, as soon as an unconnected city j gets a tight edge to i,
j will also be declared connected and i the connecting witness for j. (Notice that
βij = 0 and the edge ij is not special.) When all cities are connected, the first phase
terminates. If several events happen simultaneously, the algorithm executes them in
arbitrary order.
As a side remark, at the end of this phase, a city may have paid towards temporarily
opening several facilities. However, we want to ensure that a city pays only for the
facility that it is eventually connected to. This is ensured in the second phase, which
chooses a set of temporarily open facilities for opening permanently.
Phase 2. Let Ft denote the set of temporarily open facilities and T denote the subgraph
of G induced by all special edges. Let T 2 denote the graph that has an edge uv if
and only if there is a path of length at most two between u and v in T , and let H
be the subgraph of T 2 induced by Ft . Find any maximal independent set in H, say
I. All facilities in the set I are declared open.
For city j, define Fj = {i ∈ Ft : ij is special}. Since I is an independent set, at most
one of the facilities in Fj is opened. If there is a facility i ∈ Fj that is opened, then
set a(j) = i and declare the city j directly connected. Otherwise, consider the tight
edge i0 j such that i0 was the connecting witness for j. If i0 ∈ I, again set a(j) = i0
and declare the city j directly connected (notice that in this case βi0 j = 0). In the
remaining case that i0 6∈ I, let i be any neighbor of i0 in the graph H such that i ∈ I.
Set a(j) = i and declare city j indirectly connected.
The set I and the mapping a : C → I define a primal integral solution: xij = 1 if and
only if a(j) = i and yi = 1 if and only if i ∈ I. The values for αj and βij obtained at the
end of the first phase form a dual feasible solution.
Analysis
The crucial result for the analysis, which directly gives the approximation guarantee, is
the following.
Theorem 8.3. The primal and dual solutions constructed by the algorithm satisfy
Σ_{i∈F} Σ_{j∈C} c_ij x_ij + 3 Σ_{i∈F} f_i y_i ≤ 3 Σ_{j∈C} α_j .
We will show how the dual variables α_j pay for the primal costs of opening facilities and connecting cities to facilities. Denote by α_j^f and α_j^c the contributions of city j to these two costs respectively; α_j = α_j^f + α_j^c . If j is indirectly connected then α_j^f = 0 and α_j^c = α_j .
If j is directly connected then the following must hold:

α_j = c_ij + β_ij ,   where i = a(j).

Lemma 8.4. For each open facility i ∈ I, we have Σ_{j:a(j)=i} α_j^f = f_i .
Proof. Since i is temporarily open at the end of phase one, it is completely paid for, i.e.,
Σ_{j: ij is special} β_ij = f_i .
The critical observation is that each city j that has contributed to f_i must be directly connected to i. For each such city, α_j^f = β_ij . Any other city j′ that is connected to facility i must satisfy α_{j′}^f = 0. The claim follows.
Corollary 8.5. Σ_{i∈I} f_i = Σ_{j∈C} α_j^f .
Recall that α_j^f was defined to be 0 for indirectly connected cities. Thus, only the directly connected cities pay for the cost of opening facilities.
Lemma 8.6. For an indirectly connected city j, c_ij ≤ 3 α_j^c , where i = a(j).
Proof. Let i′ be the connecting witness for city j. Since j is indirectly connected to i, the edge ii′ must be an edge in H. In turn, there must be a city, say j′ , such that ij′ and i′j′ are both special edges. Let t1 and t2 be the times at which i and i′ were declared temporarily open during phase 1.
Since edge i′j is tight, α_j ≥ c_{i′j} . We will show that α_j ≥ c_{ij′} and α_j ≥ c_{i′j′} . Then the lemma will follow by using the triangle inequality.
Since edges ij′ and i′j′ are tight, α_{j′} ≥ c_{ij′} and α_{j′} ≥ c_{i′j′} . Since both these edges are special, they must both have gone tight before either i or i′ is declared temporarily open. Consider the time min{t1 , t2 }. Clearly, α_{j′} cannot be growing beyond this point in time and we have α_{j′} ≤ min{t1 , t2 }. Finally, since i′ is the connecting witness for j, α_j ≥ t2 . Therefore α_j ≥ α_{j′} , and the required inequalities follow.
Proof of Theorem 8.3. For a directly connected city j, c_ij = α_j^c ≤ 3 α_j^c , where i = a(j). With Lemma 8.6 we get

Σ_{i∈F} Σ_{j∈C} c_ij x_ij ≤ 3 Σ_{j∈C} α_j^c .

Adding to this the equality given in Corollary 8.5 multiplied by 3 yields the claim.
Running Time
Clearly, the total number of edges of the complete bipartite graph G = (F ∪ C, E) is
m = |F ||C|. For the implementation of the algorithm sort all the edges by increasing cost.
This is the ordering in which they go tight. For each facility i, we maintain the number of
cities that are currently contributing to it, and the anticipated time ti , at which it would
be completely paid for if no other event happens on the way. Initially all ti ’s are infinite
and each facility has 0 cities contributing to it. The ti ’s are maintained in a binary heap,
so we can update each one and find the current minimum in O (log |F |) time. Two types
of events happen and they lead to the following updates:
An edge ij goes tight. If facility i is not temporarily open, then it gets one more city
contributing towards its cost. The corresponding amount can easily be computed.
Thus, the anticipated time for facility i to be paid for can be computed in constant
time. The heap can be updated in O (log |F |) time.
If facility i is already temporarily open, city j is declared connected, and αj is not
raised anymore. For each facility i0 that was counting j as a contributor, we need
to decrease the number of contributors by 1 and recompute the anticipated time at
which it gets paid for.
Facility i is completely paid for. In this event, i will be declared temporarily open,
and all cities contributing to i will be declared connected. For each of these cities,
we will execute the second case of the previous event, i.e., update facilities that they
were contributing towards.
Observe that each edge ij will be considered at most twice. First, when it goes tight,
and second when city j is declared connected. For each consideration of this edge, we will
do O (log |F |) work. This discussion of the running time together with Theorem 8.3 yields
Theorem 8.1.
Tight Example
The following family of examples shows that the analysis of the algorithm is tight.
Example 8.7. There are two facilities with opening cost f1 = ε and f2 = (n + 1)ε. There
are n cities where city one is at distance one from facility one, but the remaining cities are
at distance three from it, and each city is at distance one from facility two. The optimal
solution is to open facility two at total cost (n + 1)ε + n. The algorithm will open facility
one and connect all cities to it at total cost ε + 1 + 3(n − 1).
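A quick numeric check of this tight example, with the cost expressions taken directly from Example 8.7 (the helper name is ours):

```python
def example_costs(n, eps):
    # Example 8.7: facility one has opening cost eps, facility two (n+1)*eps.
    opt = (n + 1) * eps + n        # open facility two: all n cities at distance 1
    alg = eps + 1 + 3 * (n - 1)    # open facility one: city 1 at distance 1, rest at 3
    return alg, opt

alg, opt = example_costs(1000, 1e-6)
ratio = alg / opt  # tends to 3 as n grows and eps -> 0
```

For n = 1000 and ε = 10⁻⁶ the ratio is already close to 3, matching the approximation factor of Theorem 8.3.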
Chapter 9
Makespan Scheduling
In this chapter, we consider the classical Makespan Scheduling problem. We are given
m machines for scheduling, indexed by the set M = {1, . . . , m}. Furthermore, there are n jobs, indexed by the set J = {1, . . . , n}, where job j takes p_{i,j} units of time if scheduled on machine i. Let J_i be the set of jobs scheduled on machine i. Then ℓ_i = Σ_{j∈J_i} p_{i,j} is the load of machine i. The maximum load ℓ_max = c_max = max_{i∈M} ℓ_i is called the makespan of the schedule.
The problem is NP-hard, even if there are only two identical machines. However, we
will derive several constant factor approximations and a PTAS for identical machines and
a 2-approximation for the general case.
List Scheduling
As a warm-up we consider the following two heuristics for Makespan Scheduling. The
List Scheduling algorithm works as follows: Determine any ordering of the job set J,
stored in a list L. Starting with all machines empty, determine the machine i with the
currently least load and schedule the respective next job j in L on i. The load of i before
the assignment of j is called the starting time sj of job j and the load of i after the
assignment is called the completion time cj of job j. In the Sorted List Scheduling
algorithm we execute List Scheduling, where the list L consists of the jobs in decreasing
order of length.
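The two heuristics can be sketched with a binary heap; identical machines are assumed, so the length of job j is simply p_j, and the function names are ours:

```python
import heapq

def list_scheduling(jobs, m, sorted_order=False):
    """List Scheduling on m identical machines; with sorted_order=True
    this is Sorted List Scheduling (jobs in decreasing order of length)."""
    order = sorted(jobs, reverse=True) if sorted_order else list(jobs)
    loads = [(0, i) for i in range(m)]   # (current load, machine index)
    heapq.heapify(loads)
    for p in order:
        load, i = heapq.heappop(loads)   # machine with currently least load
        heapq.heappush(loads, (load + p, i))
    return max(load for load, _ in loads)  # the makespan
```

On the instance [1, 1, 1, 1, 4] with two machines, List Scheduling in the given order reaches makespan 6, while the sorted variant achieves the optimal makespan 4.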
Proof. Let T ∗ be the optimal makespan of the given instance. We show that sj ≤ T ∗ for
all j ∈ J. This implies cj = sj + pj ≤ T ∗ + pj ≤ 2 · T ∗ for all j ∈ J, since we clearly must
have T ∗ ≥ pj for all j ∈ J.
Assume that s_j > T∗ for some j ∈ J. Then the load before the assignment of j is ℓ_i > T∗ for all i ∈ M . Thus the jobs J′ ⊆ J scheduled before j by the algorithm have total length Σ_{j′∈J′} p_{j′} > m · T∗ . On the other hand, since the optimum solution schedules all jobs J until time T∗ , we have Σ_{j∈J} p_j ≤ m · T∗ . This is a contradiction, and hence the List Scheduling algorithm must start all jobs not later than time T∗ .
Here we show that Sorted List Scheduling is a 3/2-approximation, but one can
actually prove that the algorithm is a 4/3-approximation.
Proof. Let T ∗ be the optimal makespan of the given instance. Partition the jobs JL =
{j ∈ J : pj > T ∗ /2} and JS = J − JL , called large and small jobs. Notice that there can
be at most m large jobs: Assume that there are more than m such jobs. Then, in any
schedule, including the optimal one, there must be at least two such jobs scheduled on
some machine. Since the length of a large job is more than T ∗ /2, this contradicts that T ∗
is the optimal makespan.
Since there are at most m large jobs and the algorithm schedules those first and hence
on individual machines, we have that each large job completes not later than T ∗ , i.e.,
cj ≤ T ∗ for all j ∈ JL . Thus, if a job completes later than T ∗ it must be a small job having
length at most T ∗ /2. Since each job starts not later than T ∗ we have cj ≤ T ∗ +pj ≤ 3/2·T ∗
for every small job j ∈ JS .
(1) Firstly, assume that we are given the optimal makespan T ∗ at the outset. Then we
can try to construct a schedule with makespan at most (1 + ε) · T ∗ . But how do
we determine the number T ∗ ? It turns out that we can perform binary search in an
interval [α, β], where α is any lower bound on T ∗ and β any upper bound on T ∗ .
This binary search will enable us to eventually find a number B, which is within
(1 + ε) times T ∗ and where the number of binary search iterations depends on the
error parameter ε.
(2) Secondly, assume that the number of distinct values of job lengths is a constant k,
say. Then we can determine all configurations of jobs that do not violate a load bound
of t if scheduled on a single machine. This is the basis of a dynamic programming
scheme to determine a schedule on m machines. Of course, this approach involves
rounding the original job lengths to constantly many values, which introduces some
error. The error can be controlled by adjusting the constant k of distinct job lengths
at the expense of running time and space requirement for the dynamic programming
table.
Dynamic Programming. Assume for now that |{p1 , . . . , pn }| = k, i.e., there are k distinct job lengths. Fix an ordering of the job lengths. Then a k-tuple (i1 , . . . , ik ) describes for any ℓ ∈ {1, . . . , k} the number i_ℓ of jobs having the respective length. For any k-tuple (i1 , . . . , ik ) let m(i1 , . . . , ik , t) be the smallest number of machines needed to schedule these jobs having makespan at most t. For a given parameter t and an instance (n1 , . . . , nk ) with Σ_{ℓ=1}^{k} n_ℓ = n, we first compute the set Q of all k-tuples (q1 , . . . , qk ) such that m(q1 , . . . , qk , t) = 1 and 0 ≤ q_ℓ ≤ n_ℓ for ℓ = 1, . . . , k, i.e., all sets of jobs that can be scheduled on a single machine with makespan at most t. Clearly, Q contains at most O(n^k ) elements. Having these numbers computed, we determine the entries m(i1 , . . . , ik , t) for every (i1 , . . . , ik ) ∈ {0, . . . , n1 } × · · · × {0, . . . , nk } of a k-dimensional table as follows: The table is initialized by setting m(q, t) = 1 for every q ∈ Q. Then we use the recurrence

m(i1 , . . . , ik , t) = 1 + min_{(q1 ,...,qk )∈Q} m(i1 − q1 , . . . , ik − qk , t)

to compute the remaining entries. The table has O(n^k ) entries and each entry takes time O(n^k ) to compute, so the total running time is O(n^{2k} ), which is polynomial since k is a constant.
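The scheme can be sketched with memoized recursion instead of an explicit table (names are ours; Q is enumerated by brute force, which is acceptable for constant k):

```python
from functools import lru_cache
from itertools import product

def min_machines(counts, lengths, t):
    """m(i1,...,ik,t): fewest machines for counts[l] jobs of each of the
    k distinct lengths, such that every machine has load at most t."""
    # Q: nonzero single-machine configurations with load at most t
    Q = [q for q in product(*(range(c + 1) for c in counts))
         if any(q) and sum(ql * pl for ql, pl in zip(q, lengths)) <= t]

    @lru_cache(maxsize=None)
    def m(state):
        if not any(state):
            return 0  # no jobs left to schedule
        best = float("inf")
        for q in Q:
            if all(ql <= sl for ql, sl in zip(q, state)):
                best = min(best, 1 + m(tuple(s - ql for s, ql in zip(state, q))))
        return best

    return m(tuple(counts))
```

For instance, two jobs of length 3 and three jobs of length 2 need two machines when t = 6 (loads 3+3 and 2+2+2) but three machines when t = 5.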
Rounding. Let ε > 0 be an error parameter and let t ∈ [α, β] as defined above. We say that a job j is small if p_j < ε · t. Small jobs are removed from the instance for now. The rest of the job lengths are rounded down as follows: If a job j has length p_j ∈ [t · ε · (1 + ε)^i , t · ε · (1 + ε)^{i+1} ) for i ≥ 0, it is replaced by p′_j = t · ε · (1 + ε)^i . Thus there can be at most k = ⌈log_{1+ε} 1/ε⌉ many distinct job lengths. Now we invoke the above dynamic programming scheme and determine the optimal number of machines for scheduling these jobs if the makespan is at most t. Since the rounding reduces the length of each job by a factor of at most (1 + ε), the computed schedule has makespan at most (1 + ε) · t when considering the original job lengths. Now we schedule the small jobs greedily in leftover space and open new machines if needed. Clearly, whenever a new machine is opened, all previous machines must be loaded to an extent of at least t. Denote by a(J, t, ε) the number of machines used by this algorithm. Recall that the makespan is at most (1 + ε) · t.
Proof. If the algorithm does not open any new machines for small jobs, then the assertion
clearly holds since the rounded down jobs have been scheduled optimally with makespan
t. In the other case, all but the last machine are loaded to the extent of t. Hence, the
optimal schedule of J having makespan t must also use at least a(J, t, ε) machines.
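The rounding step above can be sketched as follows (the helper name is ours):

```python
import math

def split_and_round(jobs, t, eps):
    """Remove small jobs (p < eps*t) and round each remaining length
    p in [t*eps*(1+eps)^i, t*eps*(1+eps)^(i+1)) down to t*eps*(1+eps)^i."""
    small, rounded = [], []
    for p in jobs:
        if p < eps * t:
            small.append(p)
        else:
            i = math.floor(math.log(p / (t * eps), 1 + eps))
            rounded.append(t * eps * (1 + eps) ** i)
    return small, rounded

small, rounded = split_and_round([0.05, 0.3, 0.7, 1.0], t=1.0, eps=0.1)
```

Each rounded length r satisfies r ≤ p < (1 + ε) · r, and since the large lengths lie in [εt, t], at most ⌈log_{1+ε} 1/ε⌉ distinct rounded values remain.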
Binary Search. If T∗ could be determined with no additional error during the binary search, then clearly we could use the above algorithm to obtain a schedule with makespan at most (1 + ε) · T∗ . Next, we will specify the details of the binary search and show how to control the error it introduces. The binary search is performed in the interval [α, β] as defined above. Thus, the length of the available interval is β − α = α at the start of the search and it reduces by a factor of two in each iteration. We continue the search until it drops to a length of at most ε · α. This will require ⌈log₂ 1/ε⌉ many iterations. Let B be the right endpoint of the interval [A, B] we terminate with.
Lemma 9.5. We have that B ≤ (1 + ε) · T∗ .

Proof. The search interval always contains T∗ , its final length is at most ε · α, and α is a lower bound on T∗ ; hence

B ≤ T∗ + ε · α ≤ (1 + ε) · T∗ ,

as claimed.

Theorem 9.6. For any 0 < ε ≤ 1 the algorithm produces a schedule with makespan at most (1 + ε)² · T∗ ≤ (1 + 3ε) · T∗ within running time O(n^{2k} · ⌈log₂ 1/ε⌉), where k = ⌈log_{1+ε} 1/ε⌉.
minimize t
subject to Σ_{i∈M} x_{i,j} = 1   j ∈ J,
Σ_{j∈J} p_{i,j} x_{i,j} ≤ t   i ∈ M,
x_{i,j} ∈ {0, 1}   i ∈ M, j ∈ J.
If we relax the constraints xi,j ∈ {0, 1} to xi,j ∈ [0, 1], it turns out that this formulation
has unbounded integrality gap. (It is left as an exercise to show this.) The main cause of
the problem is an “unfair” advantage of the LP-relaxation: If pi,j > t, then we must have
xi,j = 0 in any feasible integer solution, but we might have xi,j > 0 in feasible fractional
solutions. However, we can not formulate the statement “if pi,j > t then xi,j = 0” in terms
of linear constraints.
To rule this advantage out, fix a parameter t, let S_t = {(i, j) : p_{i,j} ≤ t} be the set of allowed assignments, and test by the following feasibility LP whether there is a fractional solution using the restricted assignment possibilities, only.
minimize 0   (lp(t))
subject to Σ_{i:(i,j)∈S_t} x_{i,j} = 1   j ∈ J,
Σ_{j:(i,j)∈S_t} p_{i,j} x_{i,j} ≤ t   i ∈ M,
x_{i,j} ≥ 0   (i, j) ∈ S_t .
Extreme Point Solutions. With a binary search, we find the smallest value for t such
that lp(t) has a feasible solution. Let T be this value and observe that T ∗ ≥ T , i.e., the
actual makespan is bounded from below by T . Our algorithm will “round” an extreme
point solution of lp(T ) to yield a schedule with makespan at most 2 · T ∗ . Extreme point
solutions to lp(T ) have several useful properties.
Lemma 9.7. Any extreme point solution to lp(T ) has at most n + m many non-zero
variables.
Proof. Let r = |ST | represent the number of variables on which lp(T ) is defined. Recall
that a feasible solution is an extreme point solution to lp(T ) if and only if it sets r many
linearly independent constraints to equality. Of these r linearly independent constraints,
at least r − (n + m) must be chosen from the third set of constraints, i.e., of the form
“xi,j ≥ 0”. The corresponding variables are set to zero. So, any extreme point solution
has at most n + m many non-zero variables.
Let x be an extreme point solution to lp(T ). We will say that job j is integrally set
if xi,j ∈ {0, 1} for all machines i. Otherwise, i.e., xi,j ∈ (0, 1) for some machine i, job j is
said to be fractionally set.
Corollary 9.8. Any extreme point solution to lp(T ) must set at least n − m many jobs
integrally.
Proof. Let x be an extreme point solution to lp(T ) and let α and β be the number of
jobs that are integrally and fractionally set by x, respectively. Each job of the latter kind
is assigned to at least 2 machines and therefore results in at least 2 non-zero entries in x.
Hence we get α + β = n and α + 2β ≤ n + m. Therefore β ≤ m and α ≥ n − m.
Algorithm. The algorithm starts by computing the range in which it finds the right
value for T . For this it constructs a greedy schedule, in which each job is assigned to
the machine on which it has the smallest length. Let α be the makespan of this schedule.
Then the range is [α/m, α] (and it is an exercise to show that α/m is indeed a lower bound
on T ∗ ).
The LP-rounding algorithm is based on several interesting properties of extreme point
solutions of lp(T ), which we establish now. For any extreme point solution x for lp(T )
define a bipartite graph G = (M ∪ J, E) such that (i, j) ∈ E if and only if xi,j > 0. Let
F ⊆ J be the fractionally set jobs in x and let H be the subgraph of G induced by the
vertex set M ∪ F . Clearly (i, j) ∈ E(H) if 0 < xi,j < 1. A matching in H is called perfect
if it matches every job j ∈ F . We will show and use that the graph H admits perfect
matchings.
We say that a connected graph on a vertex set V is a pseudo tree if it has at most |V |
many edges. Since the graph is connected, it must have at least |V | − 1 many edges. So,
Algorithm 9.1 Schedule Unrelated
Input. J, M , pi,j for all i ∈ M and j ∈ J
Step 1. By binary search in [α/m, α] compute smallest value T of the parameter t such
that lp(t) has a feasible solution.
it is either a tree or a tree with an additional single edge (closing exactly one cycle). A
graph is a pseudo forrest if each of its connected components is a pseudo tree.
Lemma 9.9. We have that G is a pseudo forrest.
Proof. We will show that the number of edges in each connected component of G is
bounded by the number of vertices in it. Hence, each connected component is a pseudo
tree.
Consider a connected component Gc . Restrict lp(T ) and the extreme point solution x
to the jobs and machines of Gc only, to obtain lpc (T ) and xc . Let xc̄ denote the rest
of x. The important observation is that xc must be an extreme point solution for lpc (T ).
Suppose that this is not the case. Then xc is a convex combination of two feasible solutions
to lpc (T ). Each of these, together with xc̄ , forms a feasible solution for lp(T ). Therefore x
is a convex combination of two feasible solutions to lp(T ). But this contradicts the fact
that x is an extreme point solution. With Lemma 9.7, Gc is a pseudo tree.
Lemma 9.10. Graph H has a perfect matching P .
Proof. Each job that is integrally set in x has exactly one edge incident at it in G. Remove
these jobs together with their incident edges from G. The resulting graph is clearly H.
Since an equal number of edges and vertices have been removed from the pseudo forest
G, H is also a pseudo forest.
In H, each job has a degree of at least two. So, all leaves in H must be machines.
Keep matching a leaf with the job it is incident to and remove them both from the graph.
(At each stage all leaves must be machines.) In the end we will be left with even cycles
(since we started with a bipartite pseudo forest). Match alternating edges of each cycle.
This gives a perfect matching P .
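The two-phase procedure from this proof can be sketched in Python. This is a minimal illustration, not part of the lecture notes: the function name and the representation of H as a list of (machine, job) edges are our own.

```python
from collections import defaultdict

def match_fractional_jobs(edges):
    """Match every fractionally set job to a machine, following the proof
    of Lemma 9.10. `edges` are the (machine, job) pairs of the
    pseudo-forest H; returns a dict job -> machine covering all jobs."""
    adj = defaultdict(set)
    for m, j in edges:
        adj[("M", m)].add(("J", j))
        adj[("J", j)].add(("M", m))
    adj = dict(adj)
    matching = {}

    def remove(v):
        for u in adj.pop(v, set()):
            if u in adj:
                adj[u].discard(v)

    # Phase 1: repeatedly match a machine-leaf with its unique job.
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v in adj and v[0] == "M" and len(adj[v]) == 1:
                (job,) = adj[v]
                matching[job[1]] = v[1]
                remove(v)
                remove(job)
                changed = True
    # Drop machines that became isolated during the peeling.
    for v in [v for v in adj if not adj[v]]:
        del adj[v]

    # Phase 2: the remainder consists of even cycles;
    # match alternating edges of each cycle.
    while adj:
        start = next(iter(adj))
        prev, cur, cycle = None, start, [start]
        while True:
            nxt = next(u for u in adj[cur] if u != prev)
            if nxt == start:
                break
            cycle.append(nxt)
            prev, cur = cur, nxt
        for v in cycle:
            del adj[v]
        for idx in range(0, len(cycle), 2):
            a, b = cycle[idx], cycle[(idx + 1) % len(cycle)]
            m, j = (a, b) if a[0] == "M" else (b, a)
            matching[j[1]] = m[1]
    return matching
```

On the path m1–j1–m2–j2–m3 (a pseudo tree whose leaves are machines), the peeling phase already matches both jobs; on a 4-cycle only the alternating-edges phase acts.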
Theorem 9.11. Algorithm Schedule Unrelated is a 2-approximation for Makespan
Scheduling on unrelated machines.
Proof. Clearly T ≤ T ∗ since lp(T ∗ ) has a feasible solution. The extreme point solution
x to lp(T ) has a fractional makespan of at most T . Therefore, the restriction of x to
integrally set jobs has an integral makespan of at most T . Each edge (i, j) of H satisfies
pi,j ≤ T . The perfect matching found in H schedules at most one extra job on each
machine. Hence, the total makespan is at most 2 · T ≤ 2 · T ∗ as claimed. The algorithm
clearly runs in polynomial time.
It is an exercise to show that the analysis is tight for the algorithm.
Chapter 10
Bin Packing
Here we consider the classical Bin Packing problem: We are given a set I = {1, . . . , n} of
items, where item i ∈ I has size si ∈ (0, 1], and a set B = {1, . . . , n} of bins with capacity
one. Find an assignment a : I → B such that the number of non-empty bins is minimal.
As a shorthand, we write s(J) = Σj∈J sj for any J ⊆ I.
10.2 Heuristics
We will show that there are constant-factor approximations for Bin Packing. Firstly,
we consider the probably simplest algorithm, Next Fit, which can be shown to be
2-approximate. Secondly, we give the First Fit Decreasing algorithm and show that
it is 3/2-approximate. Thus, by the above hardness result, this is best possible unless
P = NP.
Next Fit
The Next Fit algorithm works as follows: Initially all bins are empty and we start with
bin j = 1 and item i = 1. If bin j has residual capacity for item i, assign item i to bin j,
i.e., a(i) = j, and consider item i + 1. Otherwise consider bin j + 1 and item i. Repeat
until item n is assigned.
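The assignment rule just described can be sketched in a few lines of Python; a minimal sketch in which item sizes are plain floats and bins are returned as lists of item sizes:

```python
def next_fit(sizes):
    """Next Fit: keep a single open bin and open a new one whenever the
    current item does not fit. Returns the list of non-empty bins."""
    bins, residual = [], 0.0
    for s in sizes:
        if bins and s <= residual:
            bins[-1].append(s)  # item fits into the open bin
            residual -= s
        else:
            bins.append([s])    # open a new bin for the item
            residual = 1.0 - s
    return bins
```

On an instance whose sizes alternate between 2ε and 1 − ε, Next Fit opens a new bin for every single item.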
Theorem 10.3. Next Fit is a 2-approximation for Bin Packing. The algorithm runs
in O(n) time.
Proof. Let k be the number of non-empty bins in the assignment a found by Next Fit.
Let k ∗ be the optimal number of bins. We show the slightly stronger statement that
k ≤ 2 · k ∗ − 1.
Firstly, we observe the lower bound k∗ ≥ ⌈s(I)⌉. Secondly, for j = 1, . . . , ⌊k/2⌋, the
bins 2j − 1 and 2j together satisfy
Σi : a(i)∈{2j−1,2j} si > 1,
since otherwise Next Fit would not have opened bin 2j. Summing these inequalities over
j yields s(I) > ⌊k/2⌋ and hence ⌈s(I)⌉ ≥ ⌊k/2⌋ + 1. Therefore
(k − 1)/2 ≤ ⌊k/2⌋ ≤ ⌈s(I)⌉ − 1 ≤ k∗ − 1,
which gives k ≤ 2 · k∗ − 1, as claimed.
The analysis is tight for the algorithm, as the following instance with 2n items shows:
for some small ε > 0, let s2i−1 = 2ε and s2i = 1 − ε for i = 1, . . . , n. Next Fit opens a
new bin for every item and thus uses 2n bins, whereas for ε ≤ 1/(2n) already n + 1 bins
suffice.
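First Fit Decreasing, analyzed next, considers the items in non-increasing order of size and places each item into the first bin with enough residual capacity. A minimal Python sketch (bins are returned as lists of item sizes):

```python
def first_fit_decreasing(sizes):
    """First Fit Decreasing: consider items in non-increasing order of
    size and assign each to the first bin where it still fits."""
    residual, packing = [], []
    for s in sorted(sizes, reverse=True):
        for i, r in enumerate(residual):
            if s <= r:            # first bin with enough room
                residual[i] -= s
                packing[i].append(s)
                break
        else:                      # no bin fits: open a new one
            residual.append(1.0 - s)
            packing.append([s])
    return packing
```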
Theorem 10.4. First Fit Decreasing is a 3/2-approximation for Bin Packing. The
algorithm runs in O(n²) time.
Proof. Let k be the number of non-empty bins of the assignment a found by First Fit
Decreasing and let k ∗ be the optimal number.
Consider bin number j = ⌈2k/3⌉. If it contains an item i with si > 1/2, then each bin
j 0 < j did not have space for item i. Thus j 0 was assigned an item i0 with i0 < i. As the
items are considered in non-increasing order of size we have si0 ≥ si > 1/2. That is, there
are at least j items of size larger than 1/2. These items need to be placed in individual
bins. This implies
k∗ ≥ j ≥ (2/3) · k,
i.e., k ≤ (3/2) · k∗.
Otherwise, bin j and every bin j′ > j contains no item with size larger than 1/2.
Hence the bins j, j + 1, . . . , k contain at least 2(k − j) + 1 items, none of which fits into
the bins 1, . . . , j − 1. Since 2(k − j) + 1 ≥ j − 1, we can pair each bin b ∈ {1, . . . , j − 1}
with a distinct one of these items; as the item does not fit into the bin, each such pair has
total size larger than one. Thus s(I) > j − 1 and hence k∗ ≥ ⌈s(I)⌉ ≥ j ≥ (2/3) · k, which
again gives k ≤ (3/2) · k∗.
Lemma 10.7. Let ε > 0 be a constant. For any instance of Bin Packing where si ≥ ε,
there is a (1 + ε)-approximation algorithm.
Proof. Let I be the given instance. Sort the n items by increasing size and partition them
into g = d1/ε2 e many groups each having at most q = bnε2 c many items. Notice that two
groups may contain items of the same size.
Construct an instance J by rounding up the size of each item to the size of the largest
item in its group. Instance J has at most g many different item sizes. Therefore, we
can find an optimal assignment for J by invoking Lemma 10.6. This is clearly a feasible
assignment for the original item sizes.
Now we show that k ∗ (J) ≤ (1 + ε)k ∗ (I): We construct another instance J 0 by rounding
down the size of each item to the smallest item size in its group. Clearly k ∗ (J 0 ) ≤ k ∗ (J).
The crucial observation is that an assignment for instance J 0 yields an assignment for all
but the largest q items of the instance J. Therefore
k ∗ (J) ≤ k ∗ (J 0 ) + q ≤ k ∗ (I) + q.
To finalize the proof, since each item has size at least ε, we have k ∗ (I) ≥ n · ε and
q = bnε2 c ≤ ε · k ∗ (I). Hence
k ∗ (J) ≤ (1 + ε) · k ∗ (I)
and the claim is established.
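The grouping-and-rounding step from this proof can be sketched as follows; a minimal sketch that, following the text, uses groups of q = ⌊nε²⌋ items (with a guard for the degenerate case q = 0):

```python
import math

def round_up_by_groups(sizes, eps):
    """Construct instance J of Lemma 10.7: sort the items by increasing
    size, partition them into groups of q = floor(n * eps^2) items, and
    round every size up to the largest size in its group."""
    s = sorted(sizes)
    n = len(s)
    q = max(1, math.floor(n * eps * eps))    # items per group
    rounded = []
    for start in range(0, n, q):
        group = s[start:start + q]
        rounded += [group[-1]] * len(group)  # round up to group maximum
    return rounded
```

The rounded sizes dominate the original sorted sizes item by item and take at most ⌈n/q⌉ distinct values, which makes Lemma 10.6 applicable.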
Proof of Theorem 10.5. Let I denote the given instance and I 0 the instance after discarding
the items with size less than ε from I. We can invoke Lemma 10.7 and find an assignment
which uses at most k(I 0 ) ≤ (1 + ε) · k ∗ (I 0 ) many bins. By using First Fit, we assign the
items with sizes less than ε into the solution found for instance I 0 . We use additional bins
if an item does not fit into any of the bins used so far.
If no additional bins are needed, then our assignment uses k(I) ≤ (1 + ε) · k ∗ (I 0 ) ≤
(1 + ε) · k ∗ (I) many bins. Otherwise, all but the last bin have residual capacity less than
ε. Thus s(I) ≥ (k(I) − 1)(1 − ε), and since k∗(I) ≥ s(I), this is also a lower bound on
k∗(I). Thus we have
k(I) ≤ k∗(I)/(1 − ε) + 1 ≤ (1 + 2ε) · k∗(I) + 1,
where we have used 1/(1 − ε) ≤ 1 + 2ε for 0 < ε ≤ 1/2.
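The final step of this proof, filling the small items by First Fit into the packing obtained for I′, can be sketched as follows (a minimal sketch; the function name is ours):

```python
def add_small_items(bins, small):
    """Assign the items of size < eps by First Fit into an existing
    packing (lists of item sizes), opening a new bin only when an item
    fits into none of the bins used so far."""
    bins = [list(b) for b in bins]
    residual = [1.0 - sum(b) for b in bins]
    for s in small:
        for i, r in enumerate(residual):
            if s <= r:
                bins[i].append(s)
                residual[i] -= s
                break
        else:
            bins.append([s])
            residual.append(1.0 - s)
    return bins
```

If no new bin is opened, the bound for I′ carries over; otherwise every bin but the last is filled to more than 1 − ε, which is exactly the case analyzed above.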
Chapter 11
Traveling Salesman
Here we study the classical Traveling Salesman problem: Given a complete graph
G = (V, E) on n vertices with non-negative edge costs c : E → R+ , find a tour T , i.e., a
cycle in G which visits each vertex v ∈ V exactly once, having minimum cost(T ) =
Σe∈T c(e).
Spanning Tree Heuristic
Observe that the cost of any minimum spanning tree S of G is a lower bound on the
cost of an optimal tour T ∗ , i.e., cost(T ∗ ) ≥ cost(S). This is because the removal of any
edge from any tour T , including T ∗ , yields a spanning tree of G.
A connected graph G is called Eulerian if all its vertex degrees are even. In this case
it has an Euler tour, i.e., it is possible to traverse the edges of G in a closed walk that
visits each edge exactly once. A respective algorithm can be implemented to run in
O(n + m) time.
Output. Tour T in G
Step 4. Compute tour T in G that traverses the vertices V in the order of their first
appearance in Q.
Step 5. Return T .
cost(T ) ≤ 2 · cost(T ∗ ).
It remains to show that T is indeed a tour in G. Since T visits each vertex in the order of
first appearance in Q, i.e., at most once, and since Q visits each vertex at least once as S
is a spanning tree, T visits each vertex exactly once.
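The whole heuristic can be sketched as follows; a minimal Python sketch, assuming the instance is given as a symmetric distance matrix (Prim's algorithm builds the MST, and a preorder traversal of the tree realizes the order of first appearance in the Euler tour Q):

```python
from collections import defaultdict

def double_tree_tour(dist):
    """Spanning tree heuristic: build a minimum spanning tree with
    Prim's algorithm, then visit the vertices in DFS preorder of the
    tree (shortcutting the Euler tour of the doubled MST)."""
    n = len(dist)
    in_tree = [False] * n
    best = [(float("inf"), -1)] * n   # (cheapest edge into v, parent)
    best[0] = (0.0, -1)
    children = defaultdict(list)
    for _ in range(n):
        v = min((u for u in range(n) if not in_tree[u]),
                key=lambda u: best[u][0])
        in_tree[v] = True
        if best[v][1] >= 0:
            children[best[v][1]].append(v)
        for u in range(n):
            if not in_tree[u] and dist[v][u] < best[u][0]:
                best[u] = (dist[v][u], v)
    # DFS preorder = order of first appearance in the Euler tour Q
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour

def tour_cost(dist, tour):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))
```

By the triangle inequality, shortcutting the doubled tree can only decrease the cost, so on metric instances the returned tour costs at most 2 · cost(S) ≤ 2 · cost(T ∗ ).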
Christofides Algorithm
In the above heuristic, we doubled all the edges of the spanning tree S in order to obtain
an Eulerian graph D. Maybe there is a smarter way of finding such a graph. Recall that a
graph is Eulerian if all its degrees are even. Thus we do not have to be concerned about the
vertices with even degree in the spanning tree S. Also recall that the number of vertices
with odd degree in any graph is even, say k. Our goal will be to start with the spanning
tree S and obtain a graph D by adding a collection of edges (a matching) e1 , . . . , ek/2
between the vertices of odd degree in S. Observe that the even degrees in S remain even
in D and that the odd degrees in S become also even in D. Thus D is an Eulerian graph.
We want to find the cheapest possible matching of such kind.
Output. Tour T in G
Step 3. Compute minimum cost perfect matching M in H (using the cost function c).
Step 5. Compute tour T in G that traverses the vertices V in the order of their first
appearance in Q.
Step 6. Return T .
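The two ingredients specific to Christofides' algorithm, the odd-degree vertices of S and a minimum cost perfect matching on them, can be sketched as follows. For small instances a brute-force matching suffices; the actual algorithm computes the matching in polynomial time.

```python
def odd_degree_vertices(n, tree_edges):
    """Vertices of odd degree in the spanning tree; by the handshake
    lemma there is an even number of them."""
    deg = [0] * n
    for u, v in tree_edges:
        deg[u] += 1
        deg[v] += 1
    return [v for v in range(n) if deg[v] % 2 == 1]

def min_cost_perfect_matching(vertices, dist):
    """Minimum cost perfect matching on `vertices`, by brute force:
    match the first vertex with every candidate partner and recurse."""
    if not vertices:
        return 0.0, []
    v, rest = vertices[0], vertices[1:]
    best_cost, best_match = float("inf"), None
    for i, w in enumerate(rest):
        cost, match = min_cost_perfect_matching(rest[:i] + rest[i + 1:], dist)
        cost += dist[v][w]
        if cost < best_cost:
            best_cost, best_match = cost, match + [(v, w)]
    return best_cost, best_match
```

Adding the matching edges to S makes every degree even, so the resulting graph D is Eulerian as described above.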
Lemma 11.3. Let W ⊆ V be such that |W | is even, let H = (W, F ), where F = {vw :
v, w ∈ W, v ≠ w}, and let M be a minimum cost perfect matching in H. Then
cost(T ∗ ) ≥ 2 · cost(M ).
Proof. First observe that H has a perfect matching since the graph is complete and has an
even number of vertices. Let T ∗ be an optimal tour in G and let T be the tour in H which
visits the vertices W in the same order as in T ∗ . For every edge uv ∈ T there is a path
u, w1 , . . . , wl , v in T ∗ , and by the triangle inequality we have c(uv) ≤ c(uw1 ) + · · · + c(wl v).
Therefore cost(T ∗ ) ≥ cost(T ). On the other hand, T is a cycle with an even number of
edges. Thus, by considering the edges alternatingly, T can be decomposed into two perfect
matchings M1 and M2 . Clearly cost(M1 ) ≥ cost(M ) and cost(M2 ) ≥ cost(M ), which yields
cost(T ∗ ) ≥ cost(T ) = cost(M1 ) + cost(M2 ) ≥ 2 · cost(M )
as claimed.
Proof. We have already argued that the graph D constructed is an Eulerian graph and,
by the triangle inequality, the constructed tour T has cost(T ) ≤ cost(D). Then we have
cost(T ) ≤ cost(D) = cost(S) + cost(M ) ≤ cost(T ∗ ) + (1/2) · cost(T ∗ ) = (3/2) · cost(T ∗ )
as claimed.