
Combinatorial Algorithms

Lecture Notes, Winter Term 10/11


Humboldt University Berlin

Alexander Souza
Contents

1 Introduction 4
1.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Combinatorial Optimization Problems . . . . . . . . . . . . . . . . . . . . . 5
1.3 Algorithms and Approximation . . . . . . . . . . . . . . . . . . . . . . . . . 6

I Optimization Algorithms 7

2 Network Flows 8
2.1 Maximum Flows and Minimum Cuts . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Edmonds-Karp Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Minimum Cost Flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4 Assignment Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3 Matroids 18
3.1 Independence Systems and Matroids . . . . . . . . . . . . . . . . . . . . . . 18
3.2 Greedy Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Matroid Intersection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4 Linear Programming 27
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2 Polyhedra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

II Approximation Algorithms 37

5 Knapsack 38
5.1 Fractional Knapsack and Greedy . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2 Pseudo-Polynomial Time Algorithm . . . . . . . . . . . . . . . . . . . . . . 40
5.3 Fully Polynomial-Time Approximation Scheme . . . . . . . . . . . . . . . . 42

6 Set Cover 44
6.1 Greedy Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
6.2 Primal-Dual Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.3 LP-Rounding Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

7 Satisfiability 54
7.1 Randomized Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
7.2 Derandomization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

8 Facility Location 59
8.1 Complementary Slackness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
8.2 Primal-Dual Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

9 Makespan Scheduling 65
9.1 Identical Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
9.2 Unrelated Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

10 Bin Packing 71
10.1 Hardness of Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
10.2 Heuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
10.3 Asymptotic Polynomial Time Approximation Scheme . . . . . . . . . . . . . 73

11 Traveling Salesman 75
11.1 Hardness of Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
11.2 Metric Traveling Salesman . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

Chapter 1

Introduction

1.1 Examples
We start with some examples of combinatorial optimization problems.

Example 1.1. The following problem is called the Knapsack problem. We are given an
amount of C Euro and wish to invest it among a set of n options. Each such option i has
cost ci and profit pi . The goal is to maximize the total profit.
Consider C = 100 and the following cost-profit table:
Option Cost Profit
1 100 150
2 1 2
3 50 55
4 50 100
Our choice of purchased options must not exceed our capital C. Thus the feasible solutions
are {1}, {2}, {3}, {4}, {2, 3}, {2, 4}, {3, 4}. Which is the best solution? We evaluate all
possibilities and find that {3, 4} gives 155 altogether, which maximizes our profit.
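Since the instance is tiny, the exhaustive evaluation above can be carried out directly. A minimal brute-force sketch (the function name is illustrative, not from the notes; options are indexed from 0):

```python
from itertools import combinations

def best_subset(capacity, cost, profit):
    """Enumerate all subsets of options and keep the most profitable
    one whose total cost stays within the capital."""
    n = len(cost)
    best, best_profit = set(), 0
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            if sum(cost[i] for i in subset) <= capacity:
                p = sum(profit[i] for i in subset)
                if p > best_profit:
                    best, best_profit = set(subset), p
    return best, best_profit

# The instance from Example 1.1 (option k of the table is index k-1 here).
chosen, value = best_subset(100, [100, 1, 50, 50], [150, 2, 55, 100])
print(chosen, value)  # {2, 3}, i.e. options 3 and 4, with profit 155
```

Enumerating all 2^n subsets is of course only viable for very small n; this is exactly the combinatorial explosion discussed in Section 1.2.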

Example 1.2. Another example is a Load Balancing problem: We have m employees,
where each one has certain qualifications. Furthermore, we have a set of n jobs that need
to be done and each job j has a processing time pj . However, maybe not every employee is
qualified to work on a certain job j. For each job j we introduce a set Sj of the employees
that are eligible to work on that particular job. The following diagram visualizes the
sets Sj : for each job j (on the left hand side) we see which employees (on the right hand
side) are able to work on that job.
(Diagram: jobs j on the left, employees i on the right, with an edge from each job to its eligible employees.)
We can formulate our problem with the following mathematical program. We use the
variables xi,j ∈ {0, 1} that indicate if employee i is assigned to job j. We want to minimize
the time until all jobs are finished.

\[
\begin{aligned}
\text{minimize} \quad & \max_{i=1,\dots,m} \sum_{j=1}^{n} p_j x_{i,j} && \text{``minimize finishing time''}\\
\text{subject to} \quad & \sum_{i \in S_j} x_{i,j} = 1, \quad j = 1,\dots,n && \text{``each job gets done''}\\
& x_{i,j} \in \{0,1\}, \quad i = 1,\dots,m,\ j = 1,\dots,n && \text{``assignment''}
\end{aligned}
\]

Example 1.3. Many combinatorial optimization problems, like the ones above, can be
formulated in terms of an Integer Linear Program (ILP). Let A = (a_{i,j})_{i=1,\dots,m,\,j=1,\dots,n} \in
\mathbb{R}^{m \times n} be a matrix and let b = (b_i)_{i=1,\dots,m} \in \mathbb{R}^m and c = (c_j)_{j=1,\dots,n} \in \mathbb{R}^n be vectors.
Further let x = (x_j)_{j=1,\dots,n} \in \mathbb{Z}^n be variables that are allowed to take integral values only.
Our objective is to minimize c^\top x subject to Ax \le b. More explicitly:
\[
\begin{aligned}
\text{minimize} \quad & \sum_{j=1}^{n} c_j x_j && \text{``objective function''}\\
\text{subject to} \quad & \sum_{j=1}^{n} a_{i,j} x_j \le b_i, \quad i = 1,\dots,m && \text{``constraints''}\\
& x_j \in \mathbb{Z}, \quad j = 1,\dots,n && \text{``integrality''}
\end{aligned}
\]

Solving an ILP is in general NP-hard. However, we will often replace the constraints
xj ∈ Z with xj ∈ R. This is then called a relaxation as a Linear Program (LP) and
can be solved in polynomial time. Of course, such a solution is in general not feasible for
the ILP, but we can sometimes “turn” it into a feasible solution, which is not “too bad”.

1.2 Combinatorial Optimization Problems


An instance of a combinatorial optimization problem (COP) can formally be defined as a
tuple I = (U, P, val, extr) with the following meaning:

U the solution space (on which val and S are defined),


P the feasibility predicate,
val the objective function val : U → R,
extr the extremum (usually max or min).
The feasibility predicate P induces a set:

S the set of feasible solutions: S = {X ∈ U : X satisfies P }.

Our goal is to find a feasible solution where the desired extremum of val is attained. Any
such solution is called an optimum solution, or simply an optimum. U and S are usually
not given explicitly, but implicitly.

Let us investigate the problem in Example 1.1 with this formalism.

\[
\begin{aligned}
U &= 2^{\{1,2,3,4\}},\\
P &= \text{``total cost is at most } C\text{'', i.e., } X \in S \text{ if } \sum_{i \in X} c_i \le C,\\
S &= \{\{1\}, \{2\}, \{3\}, \{4\}, \{2,3\}, \{2,4\}, \{3,4\}\},\\
val &: U \to \mathbb{R},\ X \mapsto \sum_{i \in X} p_i,\\
extr &= \max.
\end{aligned}
\]

The optimum solution here is {3, 4} with value 155.


A central difficulty in combinatorial optimization is that it is often in principle
possible to find an optimum solution by enumerating the set of feasible solutions, but
this set mostly contains “too many” elements. This phenomenon is called combinatorial
explosion.

1.3 Algorithms and Approximation


Many problems in combinatorial optimization can be solved by using an appropriate algo-
rithm. Informally, an algorithm is given a (valid) input, i.e., a description of an instance
of a problem and computes a solution after a finite number of “elementary steps”. The
number of bits used to describe an input I is called the (binary) length or size of the input
and denoted size(I).
Let t : N → R be a function. We say that an algorithm runs in time O(t) if there is
a constant α such that the algorithm uses at most αt(size(I)) many elementary steps to
compute a solution given any input I. An algorithm is called polynomial time if t : n \mapsto n^c
for some constant c. This contrasts with exponential time algorithms, where t : n \mapsto c^n for
some constant c > 1.
Because the running times of exponential time algorithms grow rather rapidly as the
input size grows, we are mostly interested in polynomial time algorithms. Of course, we
desire to find an optimum solution for any given COP in polynomial time. Unfortunately
this is not always possible as many COPs are NP-hard. (It is widely believed that no poly-
nomial time algorithm exists that solves some NP-hard COP optimally on every instance.)
Thus our goal is to find “good” solutions in polynomial time.
Let Π = {I1 , I2 , . . . } be a set of instances of a COP, where each I ∈ Π is of the form
I = (U, P, val, extr) with induced feasible set S(I). For any I ∈ Π, let opt(I) = extr_{X \in S(I)} val(X) denote the respective
optimum value. An approximation algorithm alg for Π is a polynomial time algorithm
that computes some solution X ∈ S(I) for every instance I ∈ Π. The respective value
obtained is denoted alg(I) = val(X). The approximation ratio of alg on an instance I
is defined by
\[
\rho_{alg}(I) = \frac{alg(I)}{opt(I)}.
\]
The algorithm alg is a ρ-approximation algorithm if

ρalg (I) ≤ ρ for all I ∈ Π and extr = min,


ρalg (I) ≥ ρ for all I ∈ Π and extr = max.

Part I

Optimization Algorithms

Chapter 2

Network Flows

Flow problems are among the best-understood problems in combinatorial optimization.


They are rather important because of their numerous applications.

2.1 Maximum Flows and Minimum Cuts


A network is a (simple) digraph G = (V, A) where each edge has a capacity c : A → R+
and we have two distinguished vertices, the source s and the sink t. We often write
N = (G, c, s, t).
For any vertex v, let δ − (v) be the set of incoming edges of v, i.e., δ − (v) = {uv ∈ A :
u ∈ V } and δ + (v) the set of outgoing edges of v, i.e., δ + (v) = {vu ∈ A : u ∈ V }. Let
f : A → R+ be any function on the edges. Define the balance bal_f(v) of vertex v with
respect to f by
\[
\mathrm{bal}_f(v) := \sum_{e \in \delta^+(v)} f(e) - \sum_{e \in \delta^-(v)} f(e).
\]

The function f is called conserving at a vertex v if balf (v) = 0.


The Maximum Flow problem asks to transport as many units from the source to the
sink without violating the edge capacities. More precisely, a function f : A → R+ is called
an s-t-flow if:

(1) edge capacities are respected, i.e.,

0 ≤ f (e) ≤ c(e) for all e ∈ A, and

(2) f is conserving, i.e.,

balf (v) = 0 for v ∈ V − {s, t}, balf (s) ≥ 0, and balf (t) ≤ 0.

Its value is defined by val(f ) = balf (s). See Figure 2.1.
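The two defining conditions can be checked mechanically for a given candidate f. A minimal validity-check sketch (the function name and the dictionary edge representation are illustrative assumptions, not from the notes):

```python
def is_flow(edges, f, s, t):
    """Check that f is an s-t-flow: capacities respected and
    conservation at every v != s, t, with bal(s) >= 0 >= bal(t).
    edges maps (u, v) -> capacity, f maps (u, v) -> flow value."""
    if any(not (0 <= f[e] <= c) for e, c in edges.items()):
        return False
    vertices = {u for e in edges for u in e}

    def bal(v):  # outgoing minus incoming flow at v
        out = sum(f[(u, w)] for (u, w) in edges if u == v)
        inn = sum(f[(u, w)] for (u, w) in edges if w == v)
        return out - inn

    return (all(bal(v) == 0 for v in vertices - {s, t})
            and bal(s) >= 0 and bal(t) <= 0)

edges = {('s', 'a'): 2, ('a', 't'): 2, ('s', 't'): 1}
print(is_flow(edges, {('s', 'a'): 1, ('a', 't'): 1, ('s', 't'): 1}, 's', 't'))  # True
```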

Problem 2.1 Maximum Flow


Instance. A network N = (G, c, s, t).

Task. Find an s − t-flow of maximum value in N .

Figure 2.1: A network with source s and sink t.

We can formulate the maximum flow problem as an LP in the variables fe for e ∈ A.


\[
\begin{aligned}
\text{maximize} \quad & \sum_{e \in \delta^+(s)} f_e - \sum_{e \in \delta^-(s)} f_e,\\
\text{subject to} \quad & \sum_{e \in \delta^+(v)} f_e - \sum_{e \in \delta^-(v)} f_e = 0, \quad v \in V - \{s, t\},\\
& f_e \le c(e), \quad e \in A,\\
& f_e \ge 0.
\end{aligned}
\]

Since the flow f = 0 is feasible for this LP, and the LP is obviously bounded (by
\sum_{e \in \delta^+(s)} c(e)), we have that the Maximum Flow problem always has an optimum
solution. Of course, we can solve the problem by using any algorithm for solving LPs, but we
are not satisfied with this – we want a combinatorial algorithm (without solving an LP)
with guaranteed polynomial running time.
Let S be a subset of the vertices, called a cut. The induced cut-edges are the set of
outgoing edges δ^+(S) = {uv ∈ A : u ∈ S, v ∈ V − S} and incoming edges δ^−(S) = {vu ∈
A : u ∈ S, v ∈ V − S}. Define its capacity by cap(S) = \sum_{e \in \delta^+(S)} c(e). An s − t-cut is a
cut with s ∈ S and t ∈ V − S. A minimum cut is one with minimal capacity
among all s − t-cuts. We extend the definition of balance to any cut S:
\[
\mathrm{bal}_f(S) = \sum_{e \in \delta^+(S)} f(e) - \sum_{e \in \delta^-(S)} f(e).
\]

The following result tells us that the value of a flow can be expressed through the
incoming and outcoming flow of an arbitrary cut. Furthermore, the value of any flow
(including the maximum one) is bounded from above by the capacity of any cut. We will
see soon that the value of a maximum flow equals the capacity of a minimum cut.

Lemma 2.1. For any s − t-cut S and any s − t-flow f we have that

(1) val(f ) = balf (S),

(2) val(f ) ≤ cap(S).

Proof. We use the flow conservation property, i.e., bal_f(v) = 0 for all v ∈ S − {s}, to find
\[
\begin{aligned}
val(f) = \mathrm{bal}_f(s) &= \sum_{v \in S} \mathrm{bal}_f(v)\\
&= \sum_{v \in S} \Bigl( \sum_{e \in \delta^+(v)} f(e) - \sum_{e \in \delta^-(v)} f(e) \Bigr)\\
&= \sum_{e \in \delta^+(S)} f(e) - \sum_{e \in \delta^-(S)} f(e)\\
&= \mathrm{bal}_f(S).
\end{aligned}
\]
Furthermore we have val(f) \le \sum_{e \in \delta^+(S)} c(e) = cap(S) since 0 \le f(e) \le c(e).

The following definitions and structural result are the basis for an algorithm. A path
P = e_1, . . . , e_ℓ is a sequence of pairwise distinct edges with common vertices, that is, e_i ∈ A
such that v_i v_{i+1} ∈ A or v_{i+1} v_i ∈ A for i = 1, . . . , ℓ − 1, and e_i ≠ e_j for 1 ≤ i < j ≤ ℓ. The
number ℓ of edges in P is called its length. A v-w-path P has the form e_1 = v· and e_ℓ = ·w,
i.e., it starts at v and ends at w. An edge e = vw in a path is called a forward edge if
vw ∈ A, and a backward edge if wv ∈ A. (A v-v-path is called a cycle.)
An s-v-path P is called f -augmenting with respect to a flow f if

(1) f (e) < c(e) for every forward edge e ∈ P ,

(2) f (e) > 0 for every backward edge e ∈ P .

(Illustration: an augmenting path containing both forward and backward edges.)
By how much can we increase the current flow value using a particular augmenting path
P ? Define the quantity

α = min({c(e) − f(e) : e forward edge in P} ∪ {f(e) : e backward edge in P}).

The following construction of a new flow f′ is called augmenting f by α along P. Set
f′(e) = f(e) + α if e is a forward edge in P, f′(e) = f(e) − α if e is a backward edge in P,
and f′(e) = f(e) otherwise.

Observation 2.2. The function f′ defines a flow.

Proof. By definition of the quantity α and because each edge occurs at most once in P, we
have that 0 ≤ f′(e) ≤ c(e) for all e ∈ A. It remains to show that f′ is flow conserving. It
is clear that bal_{f′}(s) ≥ bal_f(s) ≥ 0 and consequently bal_{f′}(t) ≤ bal_f(t) ≤ 0. Consider an
augmentation along edges e_i, e_{i+1} with e_i = v_i v_{i+1} and e_{i+1} = v_{i+1} v_{i+2} for i = 1, . . . , ℓ − 1.
Set v = v_{i+1} and distinguish four cases:
(a) forward/forward, (b) forward/backward, (c) backward/forward, (d) backward/backward.
In each of the four cases the changes of ±α on e_i and e_{i+1} cancel at v, so
bal_{f′}(v) = bal_f(v). This yields the claim.

Algorithm 2.1 Ford-Fulkerson

Input. Network N = (G, c, s, t) with c : A → R+.

Output. s − t-flow f of maximum value.

Step 1. Set f(e) = 0 for all e ∈ A.

Step 2. Find an f-augmenting path P. If none exists then return f.

Step 3. Compute

α = min({c(e) − f(e) : e forward edge in P} ∪ {f(e) : e backward edge in P})

and augment f by α along P. Go to Step 2.

Theorem 2.3. In a network N , the maximum value of an s − t-flow equals the minimum
capacity of an s − t-cut.

Proof. We show that an s − t-flow f has maximum value if and only if there is no f-
augmenting path from s to t. In that case we will be able to find a minimum cut S with
equal capacity.
Let there be an f-augmenting path P from s to t, let α be as above and obtain f′ by
augmenting f by α along P. Observe that val(f′) > val(f), i.e., that f is not maximal.
Now let there be no f -augmenting path from s to t. Consider the set S of vertices with
augmenting paths from s, i.e., S = {v ∈ V : there is an f -augmenting path from s to v}
and t ∉ S. Thus S is an s − t-cut. By definition of augmenting paths, we must have
f(e) = c(e) for all e ∈ δ^+(S) and f(e) = 0 for all e ∈ δ^−(S). Hence, using Lemma 2.1 (1),
we have val(f) = \sum_{e \in \delta^+(S)} c(e) = cap(S). By Lemma 2.1 (2), f must be a maximum flow
and S a minimum cut.

If all capacities are integers then α is an integer and the algorithm terminates after a
finite number of iterations. Thus we obtain the following important consequence:

Corollary 2.4. If the capacities of a network N are integers, then there is an integral
maximum flow.

If the capacities are not integers, then Ford-Fulkerson might not even terminate.
Especially, we have not yet specified how we actually choose the augmenting paths men-
tioned in Step 2 of the algorithm. This must be done carefully in order to obtain a
polynomial time algorithm, as the following instance illustrates. It turns out that choosing
shortest augmenting paths guarantees termination after a polynomial number of augmen-
tations; see the Edmonds-Karp algorithm.

Example 2.5. To show that Ford-Fulkerson is not a polynomial time algorithm con-
sider the following network. Here M is a large number.

(Network for Example 2.5: edges sa, sb, at, bt, each with capacity M, and the edge ab with capacity 1.)

Alternately augmenting one unit of flow along the paths s-a-b-t and s-b-a-t requires 2M
augmentations. This is already exponential because the (binary) input size of the graph
is O (log M ). In contrast the augmenting paths s-a-t and s-b-t already give a maximum
flow after two augmentations.

It is an exercise to show the following flow decomposition result, which provides another
structural insight into flows.

Theorem 2.6. Given a network N = (G, c, s, t) and an s − t-flow f, there is a family
P of simple paths, a family C of simple cycles, and positive numbers h : P ∪ C → R+ such
that

(1) f(e) = \sum_{T \in P \cup C : e \in T} h(T) for all e ∈ A,

(2) val(f) = \sum_{T \in P} h(T), and

(3) |P| + |C| ≤ |A|.

2.2 Edmonds-Karp Algorithm


Example 2.5 suggests that it may be a good idea to always choose shortest augmenting
paths, i.e., paths with a minimum number of edges. Indeed, the algorithm Edmonds-Karp below
uses this strategy and yields polynomial running time.

Algorithm 2.2 Edmonds-Karp


Input. Network N = (G, c, s, t) with c : A → R+ .

Output. s − t-flow f of maximum value.

Step 1. Set f (e) = 0 for all e ∈ A.

Step 2. Find a shortest f -augmenting path P w.r.t. the number of edges. If none exists
then return f .

Step 3. Compute α as above and augment f by α along P . Go to Step 2.

Theorem 2.7. The algorithm Edmonds-Karp computes a maximum s − t-flow f in any
network N with n vertices and m edges in time O(nm^2).
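A compact sketch of Edmonds-Karp: a BFS always finds an augmenting path with the minimum number of edges, and the residual matrix implicitly handles forward and backward edges (the adjacency-matrix representation and names are illustrative assumptions, not taken from the notes):

```python
from collections import deque

def edmonds_karp(n, capacity, s, t):
    """Maximum s-t-flow via shortest augmenting paths.
    capacity: dict (u, v) -> c(u, v); res holds residual capacities."""
    res = [[0] * n for _ in range(n)]
    for (u, v), c in capacity.items():
        res[u][v] += c
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if res[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow  # no augmenting path left: f is maximum
        # Bottleneck alpha along the path found, then augment.
        alpha, v = float('inf'), t
        while v != s:
            alpha = min(alpha, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            res[parent[v]][v] -= alpha
            res[v][parent[v]] += alpha
            v = parent[v]
        flow += alpha

# The network of Example 2.5 with M = 1000: s=0, a=1, b=2, t=3.
M = 1000
cap = {(0, 1): M, (0, 2): M, (1, 2): 1, (1, 3): M, (2, 3): M}
print(edmonds_karp(4, cap, 0, 3))  # 2000, after only two augmentations
```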

The following lemma is crucial for the proof of the worst-case running time. Let
f0 , f1 , f2 , . . . be the flows constructed by the algorithm. Denote the shortest length of an
augmenting path from s to a vertex v with respect to fk by xv (k) and respectively from v
to t by yv (k).

Lemma 2.8. We have that

(1) xv (k + 1) ≥ xv (k) for all k and v,

(2) yv (k + 1) ≥ yv (k) for all k and v.

Proof. Suppose for the sake of contradiction that (1) is violated for some pair (v, k). We
may assume that xv (k + 1) is minimal among the xw (k + 1) for which (1) does not hold.
Let e be the last edge in a shortest augmenting path from s to v with respect to fk+1 .
Suppose e = uv is a forward edge. Hence fk+1 (e) < c(e), xv (k + 1) = xu (k + 1) + 1, and
xu (k + 1) ≥ xu (k) by our choice of xv (k + 1). Thus xv (k + 1) ≥ xu (k) + 1. Suppose that
fk (e) < c(e) which yields xv (k) ≤ xu (k) + 1 and thus xv (k + 1) ≥ xv (k), a contradiction.
Hence we must have fk (e) = c(e) which implies that e was a backward edge when
fk was changed to fk+1 . As we used an augmenting path of shortest length we have
xu (k) = xv (k) + 1 and thus xv (k + 1) − 1 = xu (k + 1) ≥ xu (k) ≥ xv (k) + 1. Hence
xv (k + 1) ≥ xv (k) + 2 yields a contradiction.
Similarly when e is a backward edge. The proof of (2) is analogous to (1).

Proof of Theorem 2.7. When we increase the flow, the augmenting path always contains
a critical edge, i.e., an edge where the flow is either increased to meet the capacity or
reduced to zero.
Let e = uv be critical in the augmenting path w.r.t. fk . This path has xv (k) + yv (k) =
xu (k) + yu (k) edges. If e is used the next time in an augmenting path w.r.t. fh , say, then
it must be used in the opposite direction as w.r.t. fk .
Suppose that e = uv was a forward edge w.r.t. fk . Then xv (k) = xu (k) + 1 and
xu (h) = xv (h)+1. By Lemma 2.8 xv (h) ≥ xv (k) and yu (h) ≥ yu (k). Hence xu (h)+yu (h) =
xv (h) + 1 + yu (h) ≥ xv (k) + 1 + yu (k) ≥ xu (k) + yu (k) + 2. Thus the augmenting path
w.r.t. fh is at least two edges longer than the augmenting path w.r.t. fk . Similarly if e is
a backward edge.
No shortest augmenting path can contain more than n − 1 edges and hence each edge
can be critical at most (n − 1)/2 times. As each augmenting path contains at least one
critical edge, there can be at most O (nm) augmentations and each one takes time O (m).
This yields the running time of O(nm^2).

There are further algorithms that solve the Maximum Flow problem in less time. For
example, the Goldberg-Tarjan algorithm runs in time O(n^2 \sqrt{m}); with sophisticated
implementations, O(nm \log(n^2/m)) and O(\min\{m^{1/2}, n^{2/3}\}\, m \log(n^2/m) \log c_{\max}) can be
reached.

2.3 Minimum Cost Flows


In this section we treat a more general problem than the Maximum Flow problem,
namely the Minimum Cost Flow problem. We are again given a digraph G = (V, A)
with edge capacities c : A → R+ and in addition to that a weight function w : A → R+
indicating the cost of an edge. Thus a network is denoted N = (G, c, w, b).
Now we define a modified notion of a flow. For any mapping b : V → R with
\sum_{v \in V} b(v) = 0, the value b(v) is called the balance of a vertex v. If b(v) > 0 then v
is called a source, if b(v) < 0 a sink. A b-flow in N is a function f : A → R such that

(1) 0 ≤ f (e) ≤ c(e) for all e ∈ A and

(2) b(v) = bal_f(v) = \sum_{e \in \delta^+(v)} f(e) - \sum_{e \in \delta^-(v)} f(e) for all v ∈ V.

A 0-flow is called a circulation.


The cost of any flow f is
\[
val(f) = \sum_{e \in A} w(e) f(e).
\]

Now the problem is to find a b-flow with minimum cost.

Problem 2.2 Minimum Cost Flow


Instance. A network N = (G, c, w, b).

Task. Find a b-flow of minimum cost in N or decide that none exists.

The second part of our task is easy. Given a network N = (G, c, w, b) with balance
vector b, we can decide if a b-flow exists by solving a Maximum Flow problem: add
two vertices s and t and edges sv, vt with capacities c(sv) = max{0, b(v)} and c(vt) =
max{0, −b(v)} for all v ∈ V to N. Then any s − t-flow with value \sum_{v \in V} c(sv) in the
resulting network corresponds to a b-flow in the original network N.
For the remainder of the section we give an optimality criterion which leads directly
to an algorithm similar to the Ford-Fulkerson method. But here we augment along
cycles instead of paths. Again, the choice of the augmenting cycles must be done carefully.
But we omit this here and state the following theorem which refers to Orlin’s algorithm
without proof.
Theorem 2.9. There is an algorithm which solves the Minimum Cost Flow problem
on any network with n vertices and m edges in time O (m log m(m + n log n)).
We begin our discussion of an optimality criterion with a definition. Given a digraph
G = (V, A) with capacities c, weights w, and a flow f in G, construct the graph R = (V, A+
AR ) with AR = {wv : vw ∈ A}, where r ∈ AR is called a reverse edge. (The notation
“+” here means that we actually allow parallel edges in R). The residual capacities
cR : A + AR → R+ are cR (vw) = c(vw) − f (vw) for vw ∈ A and cR (wv) = f (vw)
for wv ∈ AR . The residual weight wR : A + AR → R is wR (vw) = w(vw) for vw ∈ A and
wR (wv) = −w(vw) for wv ∈ AR . Finally define the residual graph Gf = (V, Af ) with
Af = {e ∈ A + AR : cR (e) > 0}.
Now, given a digraph G with capacities c and a b-flow f , an f -augmenting cycle is
a simple cycle in Gf . The following theorem is an optimality criterion for the Minimum
Cost Flow problem.
Theorem 2.10. Let N = (G, c, w, b) be an instance of the Minimum Cost Flow problem.
A b-flow f is of minimum cost if and only if there is no f -augmenting cycle with negative
total weight.
We prove the theorem in two steps. First we show that the difference between any two
b-flows gives rise to a circulation and second that this circulation can be decomposed into
circulations on simple cycles.
Lemma 2.11. Let G be a digraph with capacities c and let f and f′ be b-flows in (G, c).
Construct R and Gf as above and define g : A + AR → R+ by g(e) = max{0, f′(e) − f(e)}
for e ∈ A and g(e) = max{0, f(e′) − f′(e′)} for all e ∈ AR with corresponding e′ ∈ A.
Then g is a circulation in R, g(e) = 0 for all e ∉ Af, and val(g) = val(f′) − val(f).

Proof. At each vertex v of R we have
\[
\begin{aligned}
\sum_{e \in \delta_R^+(v)} g(e) - \sum_{e \in \delta_R^-(v)} g(e)
&= \sum_{e \in \delta_G^+(v)} (f'(e) - f(e)) - \sum_{e \in \delta_G^-(v)} (f'(e) - f(e))\\
&= \Bigl( \sum_{e \in \delta_G^+(v)} f'(e) - \sum_{e \in \delta_G^-(v)} f'(e) \Bigr)
 - \Bigl( \sum_{e \in \delta_G^+(v)} f(e) - \sum_{e \in \delta_G^-(v)} f(e) \Bigr)\\
&= b(v) - b(v) = 0,
\end{aligned}
\]
so g is a circulation in R.
For e ∉ Af consider two cases: if e ∈ A then f(e) = c(e) and hence f′(e) ≤ f(e), which
gives g(e) = 0. If e = wv ∈ AR then e′ = vw ∈ A and f(e′) = 0, which yields g(e) = 0.
We verify the last statement:
\[
val(g) = \sum_{e \in A + A_R} w_R(e) g(e) = \sum_{e \in A} w(e) f'(e) - \sum_{e \in A} w(e) f(e) = val(f') - val(f),
\]

and the proof is complete.

Lemma 2.12. For any circulation f in a digraph G = (V, A) there is a family C of
at most |A| simple cycles in G and for each C ∈ C a positive number h(C) such that
f(e) = \sum_{C \in C : e \in C} h(C).

Proof. Follows from Theorem 2.6.

Proof of Theorem 2.10. If there is an f-augmenting cycle C with weight γ < 0, we can
augment f along C by some α > 0 and get a b-flow f′ with cost decreased by −γα. So f
is not a minimum cost flow.
If f is not a minimum cost b-flow, there is another b-flow f′ with smaller cost. Consider
g as defined in Lemma 2.11 and observe that g is a circulation with val(g) < 0. By
Lemma 2.12, g can be decomposed into flows on simple cycles. Since g(e) = 0 for all
e ∉ Af, all these cycles are f-augmenting and one of them must have negative total
weight.

2.4 Assignment Problem


A graph G = (V, E) with vertex set V = L ∪ R (“left” and “right”) is called bipartite if
the edge set satisfies E ⊆ {`r : ` ∈ L, r ∈ R}. An assignment (also called a matching) is a
subset M ⊆ E such that for every v ∈ V in the graph H = (V, M ) we have degH (v) ≤ 1. A
matching is called perfect if degH (v) = 1 for every v ∈ V . Of course, a necessary condition
for the existence of a perfect matching in a bipartite graph is |L| = |R|.
The Assignment Problem has numerous applications and refers to the following. We
are given a bipartite graph G = (L ∪ R, E) and a weight function w : E → R. We are
asked to find a subset M ⊆ E with minimum total weight, i.e.,
\[
val(M) = \sum_{e \in M} w(e),
\]

such that M is a perfect matching or to conclude that no such matching exists.

Theorem 2.13. The Assignment problem is a Minimum Cost Flow problem.

Problem 2.3 Assignment

Instance. Bipartite graph G = (L ∪ R, E) and a weight function w : E → R.

Task. Find a perfect matching M with minimum weight val(M) = \sum_{e \in M} w(e) or
conclude that no such matching exists.

Proof. Let G = (V, E) be a bipartite graph with V = L ∪ R and |L| = |R| = n. Now
we construct a network N for the Minimum Cost Flow problem. We start with the
vertices V , add a vertex s and connect it with every vertex ` ∈ L with directed edges s`.
Further add a vertex t and introduce the directed edges rt for every r ∈ R. Further add
directed versions of all edges e ∈ E, i.e., a directed edge `r is added for every undirected
edge `r. The capacities of all these edges is one. The weights of the s` edges and the rt
edges are zero – the weights of the `r edges are equal to their weights in G. Now every
integral b-flow f in N with b = (b(s), b(v1 ), . . . , b(vn ), b(t)) = (n, 0, . . . , 0, −n) corresponds
to a perfect matching in G with the same weight, and vice versa.

Below we give several applications of the Assignment problem. In most applications
the requirement |L| = |R| is disturbing, but can usually be handled by adding artificial
vertices and edges.

Bipartite Cardinality Matching


In the Bipartite Cardinality Matching problem we are given a bipartite graph G =
(V, E) with V = L ∪ R, where |L| ≤ |R|. Our task is to find a matching with maximum
number of edges. We construct a network similarly as before: we add vertices s and t and
the directed edges s` and rt for all ` ∈ L and r ∈ R. All these edges have capacity equal
to one. Any integral s − t-flow of value k corresponds to a matching with k edges. Thus
we have to solve a Maximum Flow problem.
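Instead of building the flow network explicitly, the same maximum flow can be computed by searching augmenting paths directly in the bipartite graph. A sketch of this classical augmenting-path matching algorithm (equivalent to unit-capacity max flow; names are illustrative, not from the notes):

```python
def max_matching(adj, n_right):
    """Maximum-cardinality bipartite matching via augmenting paths.
    adj[l] lists the right-side vertices adjacent to left vertex l."""
    match_r = [-1] * n_right  # match_r[r] = left partner of r, or -1

    def augment(l, seen):
        # Try to match l, recursively re-matching partners along the way.
        for r in adj[l]:
            if r not in seen:
                seen.add(r)
                if match_r[r] == -1 or augment(match_r[r], seen):
                    match_r[r] = l
                    return True
        return False

    return sum(augment(l, set()) for l in range(len(adj)))

# Jobs {0,1,2} and employees {0,1}; jobs 1 and 2 compete for employee 1.
print(max_matching([[0], [1], [1]], 2))  # 2
```

Each call to `augment` corresponds to one flow augmentation of value one in the network construction above.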

Internet Dating
An internet dating website has ` females and r males in its pool. Furthermore, there is
a preference system, where each person describes her/himself and her/his ideal partner.
This system produces for each female i and each male j a value wij > 0. We seek to find
an assignment of females to males with maximum total value. By adding dummy vertices
with zero-weight edges to the appropriate side, and defining weights −wij we arrive at an
Assignment problem as defined above.

Scheduling on Parallel Machines


In the Scheduling on Parallel Machines problem we are given m machines and n
jobs, where job j takes time pij if assigned to machine i. The jobs assigned to any machine
are scheduled in a certain order. The completion time of job j is denoted cj and refers
to the following: if job j is assigned to machine i, then the times p_{ik} of the jobs k also
assigned to machine i but scheduled before job j contribute to the completion time of j,
i.e., c_j = \sum_{k \text{ on } i,\ k \le j} p_{ik} (where “k ≤ j” means that job k is scheduled before job j). The
objective is to minimize the total completion time \sum_j c_j.
This problem can be formulated as an Assignment problem as follows: First consider
the case that we have exactly one machine, i.e., all jobs have to be assigned to it. Consider

a permutation π of 1, 2, . . . , n, where π(j) gives the position of job j, and observe that we
can write
\[
\sum_{j=1}^{n} c_j = \sum_{j=1}^{n} (n - \pi(j) + 1) \cdot p_{1j},
\]
because the contribution p_{1j} of job j in position π(j) is counted n − π(j) + 1 many times
in \sum_j c_j.
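The multiplier identity can be checked numerically on a small single-machine instance (names are hypothetical):

```python
from itertools import permutations

def total_completion(p, pi):
    """Sum of completion times when job j runs in position pi[j]."""
    order = sorted(range(len(p)), key=lambda j: pi[j])
    total, elapsed = 0, 0
    for j in order:
        elapsed += p[j]   # job j finishes after all earlier jobs plus itself
        total += elapsed
    return total

p = [3, 1, 4]
n = len(p)
for pi in permutations(range(1, n + 1)):  # pi[j] = position of job j
    direct = total_completion(p, pi)
    via_multipliers = sum((n - pi[j] + 1) * p[j] for j in range(n))
    assert direct == via_multipliers
print(f"identity holds for all {n}! permutations")
```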
For multiple machines, the crucial observation is that the contribution of any job j
can be described by pij times one of the multipliers n, n − 1, . . . , 1. Hence we create
the following graph: A source s, a sink t, for each job a vertex vj for j = 1, . . . , n, and
for each machine i exactly n slots, i.e., vertices sik for k = 1, . . . , n. We add edges svj
with zero weight and unit capacity. Furthermore, we add the edges vj sik with weight
(n − k + 1) · pij . Finally we add edges sik t with zero weight and unit capacity. Any b-flow
with b = (b(s), b(v1 ), . . . , b(vn ), b(s11 ), . . . , b(smn ), b(t)) = (n, 0, . . . , 0, −n) with minimum
cost corresponds to an optimal job-machine-assignment and vice versa.

Chapter 3

Matroids

Several combinatorial optimization problems can be formulated in terms of independence
systems, more specifically in terms of matroids. If this is the case, we are in a very
comfortable position, because optimizing over matroids can be done by a simple greedy
algorithm. The main result of this chapter is that an independence system is a matroid
if and only if the greedy algorithm determines an optimal solution. The most prominent
example is the Minimum Spanning Tree problem.

3.1 Independence Systems and Matroids


An independence system is a pair (U, S), where U is an arbitrary finite set, called the universe,
and S ⊆ 2^U is a family of subsets, whose members are called the independent sets of U and
satisfy the following properties:

(1) ∅ ∈ S and

(2) If X ⊆ Y ∈ S, then X ∈ S.

Maximal independent sets (with respect to inclusion) are called bases. In particular, for
X ⊆ U, any maximal independent subset of X is called a basis of X. Furthermore, there
is a cost function c : 2^U → R, which is modular, i.e., for any X ⊆ U we have
\[
c(X) = \sum_{x \in X} c(x).
\]
Here we have used c(x) = c({x}) as a shorthand. By modularity we may assume c : U → R.


For any X ⊆ U define the rank of X by

rank(X) = max{|Y | : Y ⊆ X, Y ∈ S},

i.e., the maximum cardinality of a basis of X.


There are two optimization problems associated with independence systems, a mini-
mization and a maximization problem.

Problem 3.1 Maximum Independence System Member


Instance. Independence system (U, S), cost function c : U → R.

Task. Find a member X ∈ S such that c(X) is maximum.

Problem 3.2 Minimum Independence System Basis
Instance. Independence system (U, S), cost function c : U → R.

Task. Find a basis X ∈ S such that c(X) is minimum.

In general, S is not given explicitly. Instead, we assume that we have an independence


oracle that decides if a given subset X ⊆ U is a member of the independence system, i.e.,
if X ∈ S. Also notice that the maximization problem asks for any member of the system
(U, S), while the minimization problem asks for a basis.
Three examples of combinatorial optimization problems that can be expressed in terms
of independence systems are:

Problem 3.3 Maximum Cost Forest


Instance. Undirected graph G = (V, E), cost function c : E → R.

Task. Find a forest F ⊆ E, i.e., a cycle-free subgraph, having maximal cost c(F ) = ∑e∈F c(e).

Problem 3.4 Minimum Spanning Tree


Instance. Undirected connected graph G = (V, E), cost function c : E → R.

Task. Find a spanning tree T ⊆ E, i.e., a connected forest with vertex set V , having minimal cost c(T ) = ∑e∈T c(e).

All the above problems can be solved in polynomial time, but there are also a lot of NP-
hard combinatorial optimization problems that can be formulated in terms of independence
systems. We will see that Maximum Cost Forest and Minimum Spanning Tree can
be expressed as a special independence system, called matroid, while Maximum Cost
Matching (for bipartite graphs) can be solved in terms of intersection of two matroids.
An independence system (U, S) is called a matroid if for all X, Y ∈ S with |X| > |Y | there is an x ∈ X − Y such that Y ∪ {x} ∈ S.

Observation 3.1. Let (U, S) be an independence system. Then the following statements
are equivalent:

(1) (U, S) is a matroid.

(2) If X, Y ∈ S and |X| > |Y |, then there is x ∈ X − Y such that Y ∪ {x} ∈ S.

(3) If X, Y ∈ S and |X| = |Y | + 1, then there is x ∈ X − Y such that Y ∪ {x} ∈ S.

(4) For each X ⊆ U , all bases of X have the same cardinality.

Proof. By definition (1) and (2) are equivalent. Clearly (2) implies (3); conversely, if X, Y ∈ S with |X| > |Y |, we may apply (3) to Y and any X ′ ⊆ X with |X ′ | = |Y | + 1 (note X ′ ∈ S by downward closure), so (3) implies (2). Moreover (2) implies (4). To prove that (4) implies (2) let X, Y ∈ S and |X| > |Y |. By (4) Y cannot be a basis of X ∪ Y . So there must be an x ∈ (X ∪ Y ) − Y = X − Y such that Y ∪ {x} ∈ S.

Problem 3.5 Maximum Cost Matching
Instance. Undirected graph G = (V, E), cost function c : E → R.

Task. Find a matching M ⊆ E, i.e., distinct e, e′ ∈ M imply e ∩ e′ = ∅, having maximal cost c(M ) = ∑e∈M c(e).

Let (U, S) be an independence system. For X ⊆ U we define the lower rank of X by

lrank(X) = min{|Y | : Y ⊆ X, Y ∈ S, and Y ∪ {x} ∉ S for all x ∈ X − Y },

i.e., the minimum cardinality of a basis of X.


The rank quotient of (U, S) is defined by

q(U, S) = min{ lrank(X)/rank(X) : X ⊆ U }.

Observation 3.2. Let (U, S) be an independence system. Then q(U, S) ≤ 1. Furthermore,


(U, S) is a matroid if and only if q(U, S) = 1.

Proof. By definition we have q(U, S) ≤ 1. Statement q(U, S) = 1 is equivalent to Obser-


vation 3.1 (4).
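For tiny systems, q(U, S) can be computed by brute force directly from the definitions. The sketch below (purely illustrative, exponential in |U |, with hypothetical helper names) enumerates all bases of every X ⊆ U ; for the matchings of a path on four vertices it returns 1/2, so by Observation 3.2 this independence system is not a matroid.

```python
from itertools import combinations

def powerset(xs):
    return [set(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def rank_quotient(U, is_independent):
    """Brute-force q(U,S) = min over X of lrank(X)/rank(X)."""
    q = 1.0
    for X in powerset(U):
        # bases of X: maximal independent subsets of X
        bases = [Y for Y in powerset(X)
                 if is_independent(Y)
                 and all(not is_independent(Y | {x}) for x in X - Y)]
        sizes = [len(B) for B in bases]
        if max(sizes) > 0:  # skip X with rank(X) = 0
            q = min(q, min(sizes) / max(sizes))
    return q

# Matchings of the path a-b-c-d (edges e1, e2, e3) form an independence system.
e1, e2, e3 = ("a", "b"), ("b", "c"), ("c", "d")
def is_matching(M):
    ends = [v for e in M for v in e]
    return len(ends) == len(set(ends))

q = rank_quotient({e1, e2, e3}, is_matching)
```

For X = {e1 , e2 , e3 } the bases {e2 } and {e1 , e3 } have sizes 1 and 2, so q = 1/2.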

3.2 Greedy Algorithm


Let (U, S) be an independence system with cost function c : U → R+ and consider Maximum Independence System Member. Notice that the restriction to non-negative costs is without loss of generality, because elements of negative cost never appear in an optimum solution.
Again, we assume that we have access to an independence oracle. This allows us to for-
mulate an algorithm called Greedy.

Algorithm 3.1 Greedy


Input. Independence system (U, S) with independence oracle, cost function c : U → R+ .

Output. Set X ∈ S

Step 1. X = ∅.

Step 2. Sort U = {u1 , . . . , un } such that c(u1 ) ≥ · · · ≥ c(un ).

Step 3. For i = 1, . . . , n, if X ∪ {ui } ∈ S let X = X ∪ {ui }.

Step 4. Return X.
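The listing above translates directly into code. The sketch below uses hypothetical helper names; the oracle shown is for the graphic matroid, where a set of edges is independent iff it is cycle-free, so the run computes a maximum-cost forest (essentially Kruskal's algorithm).

```python
def greedy(universe, cost, independent):
    """Greedy for an independence system given by oracle `independent`."""
    X = []
    # Step 2: consider elements in order of non-increasing cost.
    for u in sorted(universe, key=cost, reverse=True):
        # Step 3: keep u if X plus u stays independent.
        if independent(X + [u]):
            X.append(u)
    return X

def is_forest(edges):
    """Independence oracle of the graphic matroid: edge set is cycle-free."""
    parent = {}
    def find(v):  # union-find with path halving
        while parent.setdefault(v, v) != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # this edge would close a cycle
        parent[ra] = rb
    return True

# Maximum-cost forest of a triangle a-b-c with a pendant edge c-d.
cost = {("a", "b"): 4, ("b", "c"): 3, ("a", "c"): 2, ("c", "d"): 1}
F = greedy(list(cost), cost.get, is_forest)
```

The run keeps the edges of cost 4, 3, and 1 and rejects the cost-2 edge closing the triangle.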

The following theorem states that the approximation ratio of Greedy is at least the
rank quotient (and of course at most one). Notice that Greedy is a polynomial time algorithm if the independence oracle runs in polynomial time (which is often the case in applications).

Theorem 3.3. Let (U, S) be an independence system and c : U → R+ be a cost function.


Let greedy(I) denote the cost c(X) of the solution returned by Greedy on an instance

I = (U, S, c). Let opt(I) denote the optimum cost for Maximum Independence System
Member. Then we have
q(U, S) ≤ greedy(I)/opt(I) ≤ 1
for all c : U → R+ . There is a cost function where the lower bound is attained.
Proof. Let U = {u1 , . . . , un } be ordered such that c(u1 ) ≥ · · · ≥ c(un ). Let Gn be the
solution found by Greedy while On is an optimum solution. Let Uj = {u1 , . . . , uj },
Gj = Gn ∩ Uj , and Oj = On ∩ Uj for j = 0, . . . , n. Set dn = c(un ) and dj = c(uj ) − c(uj+1 )
for j = 1, . . . , n − 1.
Since Oj ∈ S we have |Oj | ≤ rank(Uj ). Since Gj is a basis of Uj we have |Gj | ≥ lrank(Uj ). We conclude that (all sums running over j = 1, . . . , n)

c(Gn ) = ∑j (|Gj | − |Gj−1 |)c(uj )
= ∑j |Gj |dj
≥ ∑j lrank(Uj )dj
≥ q(U, S) ∑j rank(Uj )dj
≥ q(U, S) ∑j |Oj |dj
= q(U, S) ∑j (|Oj | − |Oj−1 |)c(uj )
= q(U, S)c(On ).
To show that the lower bound is sharp choose V ⊆ U and bases B1 , B2 of V such that
|B1 |/|B2 | = q(U, S). Define c(v) = 1 if v ∈ V and c(v) = 0 if v ∈ U − V . Sort u1 , . . . , un
such that c(u1 ) ≥ · · · ≥ c(un ) and B1 = {u1 , . . . , u|B1 | }. Then c(Gn ) = |B1 | and c(On ) =
|B2 |. Thus the lower bound is attained.
Specifically, if (U, S) is a matroid, then Greedy always determines an optimum solu-
tion (and vice versa).
Theorem 3.4. An independence system (U, S) is a matroid if and only if Greedy finds
an optimum solution for Maximum Independence System Member for all cost func-
tions c : U → R+ .
Proof. By Theorem 3.3, we have q(U, S) < 1 if and only if there is a cost function c :
U → R+ , for which the Greedy algorithm does not find an optimum solution. By
Observation 3.2 we have that q(U, S) < 1 if and only if (U, S) is not a matroid.

3.3 Matroid Intersection


Given two independence systems (U, S1 ) and (U, S2 ) define their intersection by (U, S1 ∩
S2 ). The intersection of a finite number of independence systems is defined analogously.
We state without proof:

Proposition 3.5. Any independence system is the intersection of a finite number of ma-
troids.

Thus, the intersection of matroids is not a matroid in general. Hence we can not
expect that a Greedy algorithm finds an optimum common independent set. However,
the following result implies a lower bound on the approximation ratio of Greedy.

Proposition 3.6. If (U, S) is the intersection of p matroids, then q(U, S) ≥ 1/p.

As we have seen in Theorem 3.3, the lower bound is sharp, so Greedy can be arbitrarily bad when used for optimizing over arbitrary independence systems, i.e., over the intersection of p matroids for large p. However, for optimizing over the intersection of two matroids,
there is a polynomial time algorithm, called Edmonds, provided that the independence
oracle is polynomial (see below). The important property of the algorithm is that it
generalizes the concept of an augmenting path. The intersection problem becomes NP-
hard for more than two matroids.

Problem 3.6 Maximum Matroid Intersection


Instance. Matroids (U, S1 ) and (U, S2 ) with independence oracles.

Task. Find X ∈ S1 ∩ S2 such that |X| is maximum.

An important special case is that Maximum Cost Matching in bipartite graphs can
be formulated as matroid intersection of two matroids. Specifically let G = (L ∪ R, E) be
bipartite and let S = {M ⊆ E : M is a matching in G}. Then the independence system
(E, S) is the intersection of the two matroids (E, SL ) and (E, SR ) with

SL = {M ⊆ E : |δM (v)| ≤ 1 for all v ∈ L},


SR = {M ⊆ E : |δM (v)| ≤ 1 for all v ∈ R},

where δM (v) = {e ∈ M : v ∈ e}. It is an exercise to show that (E, SL ) and (E, SR ) are
actually matroids.
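The exercise can be verified mechanically on small instances. The following brute-force sketch (illustrative only, exponential in |E|, with hypothetical helper names) checks downward closure and the exchange property; on the complete bipartite graph K2,2 it confirms that (E, SL ) is a matroid, while the matching system SL ∩ SR is not.

```python
from itertools import combinations

def is_matroid(U, indep):
    """Check downward closure and the exchange axiom by enumeration."""
    subsets = [frozenset(c) for r in range(len(U) + 1)
               for c in combinations(U, r)]
    S = [X for X in subsets if indep(X)]
    for X in S:
        # downward closure (single deletions suffice by induction)
        if not all(indep(X - {x}) for x in X):
            return False
    for X in S:
        for Y in S:
            if len(X) > len(Y) and not any(indep(Y | {x}) for x in X - Y):
                return False  # exchange axiom fails
    return True

# K2,2: L = {1, 2}, R = {'a', 'b'}, edges as (left, right) pairs.
E = [(1, "a"), (1, "b"), (2, "a"), (2, "b")]
def indep_L(M):  # S_L: every left vertex covered at most once
    lefts = [l for (l, r) in M]
    return len(lefts) == len(set(lefts))
def is_matching(M):  # S_L intersected with S_R
    verts = [v for e in M for v in e]
    return len(verts) == len(set(verts))

ok_L = is_matroid(E, indep_L)          # partition matroid
ok_match = is_matroid(E, is_matching)  # matchings: exchange axiom fails
```

For the matchings, X = {(1, "a"), (2, "b")} and Y = {(1, "b")} witness the failure of the exchange axiom.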
The remainder of this section is devoted to the development of Edmonds. We start
our discussion with basic facts on the submodularity of the rank function of a matroid. The proof is left as an exercise.

Theorem 3.7. Let U be a finite set and r : 2U → N. Then the following are equivalent:

(1) r is the rank function of a matroid (U, S) (and S = {X ⊆ U : r(X) = |X|}).

(2) For all X, Y ⊆ U :

(a) r(X) ≤ |X|;


(b) If X ⊆ Y then r(X) ≤ r(Y );
(c) r(X ∪ Y ) + r(X ∩ Y ) ≤ r(X) + r(Y ).

(3) For all X ⊆ U and x, y ∈ U :

(a) r(∅) = 0;
(b) r(X) ≤ r(X ∪ {y}) ≤ r(X) + 1;
(c) If r(X ∪ {x}) = r(X ∪ {y}) = r(X) then r(X ∪ {x, y}) = r(X).

For any independence system (U, S) the sets in 2U − S are called dependent. A minimal
dependent set is called a circuit.

Lemma 3.8. Let U be a finite set and C ⊆ 2U . C is the set of circuits of an independence
system (U, S), where S = {X ⊂ U : there is no Y ∈ C with Y ⊆ X}, if and only if the
following conditions hold:

(1) ∅ ∉ C;

(2) For any X, Y ∈ C, X ⊆ Y implies X = Y .

Proof. By definition, the family of circuits of any independence system satisfies (1) and (2).
If C satisfies (1), then (U, S) is an independence system. If C also satisfies (2), it is the set of circuits of this independence system.

Theorem 3.9. Let (U, S) be an independence system. If for any X ∈ S and x ∈ U the
set X ∪ {x} contains at most p circuits, then q(U, S) ≥ 1/p.

Proof. Let V ⊆ U and let X and Y be two bases of V . We show |X|/|Y | ≥ 1/p.
Let X − Y = {u1 , . . . , ut }. We construct a sequence Y = Y0 , . . . , Yt of independent subsets of X ∪ Y such that X ∩ Y ⊆ Yi , Yi ∩ {u1 , . . . , ut } = {u1 , . . . , ui }, and |Yi−1 − Yi | ≤ p for i = 1, . . . , t.
Since Yi ∪ {ui+1 } contains at most p circuits and each such circuit must meet Yi − X (because X is independent), there is a Z ⊆ Yi − X such that |Z| ≤ p and (Yi − Z) ∪ {ui+1 } ∈ S. We set Yi+1 = (Yi − Z) ∪ {ui+1 }.
Now X ⊆ Yt ∈ S. Since X is a basis of V , X = Yt . We conclude that
|Y − X| = |Y0 − Y1 | + · · · + |Yt−1 − Yt | ≤ pt = p|X − Y |,

proving |Y | ≤ p|X|.

Theorem 3.10. If C is the set of circuits of an independence system (U, S), then the
following statements are equivalent:

(1) (U, S) is a matroid.

(2) For any X ∈ S and x ∈ U , X ∪ {x} contains at most one circuit.

(3) For any X, Y ∈ C with X ≠ Y and x ∈ X ∩ Y , there exists a Z ∈ C with Z ⊆ (X ∪ Y ) − {x}.

(4) For any X, Y ∈ C, x ∈ X ∩ Y , and y ∈ X − Y , there exists a Z ∈ C with y ∈ Z ⊆


(X ∪ Y ) − {x}.

Proof. Firstly (1) implies (4): Let C be the family of circuits of a matroid, and let X, Y ∈ C, x ∈ X ∩ Y , and y ∈ X − Y . By Theorem 3.7 we find

|X| − 1 + r((X ∪ Y ) − {x, y}) + |Y | − 1 = r(X) + r((X ∪ Y ) − {x, y}) + r(Y )
≥ r(X) + r((X ∪ Y ) − {y}) + r(Y − {x})
≥ r(X − {y}) + r(X ∪ Y ) + r(Y − {x})
= |X| − 1 + r(X ∪ Y ) + |Y | − 1.

So r((X ∪ Y ) − {x, y}) = r(X ∪ Y ). Let B be a basis of (X ∪ Y ) − {x, y}. Then B ∪ {y}
contains a circuit Z with y ∈ Z ⊆ (X ∪ Y ) − {x} as required.
Secondly, (4) trivially implies (3). Thirdly, (3) implies (2): If X ∈ S and X ∪ {x} contains two distinct circuits Y and Z, then x ∈ Y ∩ Z (otherwise one of them would be a subset of X), and (3) yields a circuit contained in (Y ∪ Z) − {x}, so (Y ∪ Z) − {x} ∉ S. However, (Y ∪ Z) − {x} is a subset of X ∈ S, a contradiction. Finally, (2) implies (1) by Theorem 3.9 and Observation 3.2.

For X ∈ S and x ∈ U , C(X, x) denotes the unique circuit in X ∪ {x} if X ∪ {x} ∉ S, and C(X, x) = ∅ otherwise.

Lemma 3.11. Let (U, S) be a matroid and X ∈ S. Let x1 , . . . , xs ∈ X and y1 , . . . , ys ∉ X


with

(1) xk ∈ C(X, yk ) for 1 ≤ k ≤ s and

(2) xj ∉ C(X, yk ) for 1 ≤ j < k ≤ s.

Then (X − {x1 , . . . , xs }) ∪ {y1 , . . . , ys } ∈ S.

Proof. Let Xr = (X − {x1 , . . . , xr }) ∪ {y1 , . . . , yr }. We show that Xr ∈ S for all r by


induction. For r = 0 this is clearly true. Let us assume that Xr−1 ∈ S for some r ∈
{1, . . . , s}. If Xr−1 ∪ {yr } ∈ S then we immediately have Xr ∈ S. Otherwise Xr−1 ∪ {yr }
contains a unique circuit Z by Theorem 3.10. Since C(X, yr ) ⊆ Xr−1 ∪ {yr }, we must have
Z = C(X, yr ). But then, by assumption xr ∈ C(X, yr ) = Z, so Xr = (Xr−1 ∪{yr })−{xr } ∈
S.

The idea behind the algorithm Edmonds is the following: Starting with X = ∅, we
augment X by one element in each iteration. Since in general we cannot hope for an
element x such that x ∈ S1 ∩ S2 , we shall look for “alternating paths”. To make this
convenient, we define an auxiliary graph. We apply the notion C(X, x) to (U, Si ) and
write Ci (X, x) for i ∈ {1, 2}.
Given a set X ∈ S1 ∩ S2 we define a directed auxiliary graph GX by

AX(1) = {(x, y) : y ∈ U − X, x ∈ C1 (X, y) − {y}},
AX(2) = {(y, x) : y ∈ U − X, x ∈ C2 (X, y) − {y}},
GX = (U, AX(1) ∪ AX(2) ).

We set

SX = {y ∈ U − X : X ∪ {y} ∈ S1 },
TX = {y ∈ U − X : X ∪ {y} ∈ S2 }.

and look for a shortest path from SX to TX . Such a path will enable us to augment the
set X. (If SX ∩ TX ≠ ∅, we have a path of length zero and we can augment X by any
element in SX ∩ TX .)

Lemma 3.12. Let X ∈ S1 ∩ S2 . Let y0 , x1 , y1 , . . . , xs , ys be the vertices of a shortest


y0 − ys -path in GX (in this order), with y0 ∈ SX and ys ∈ TX . Then

X ′ = (X ∪ {y0 , . . . , ys }) − {x1 , . . . , xs } ∈ S1 ∩ S2 .

Algorithm 3.2 Edmonds
Input. Matroids (U, S1 ) and (U, S2 ) by independence oracles.

Output. Set X ∈ S1 ∩ S2 with maximum cardinality.

Step 1. X = ∅.

Step 2. For each y ∈ U − X and i ∈ {1, 2} let

Ci (X, y) = {x ∈ X ∪ {y} : X ∪ {y} ∉ Si , (X ∪ {y}) − {x} ∈ Si }.

Step 3. Compute SX and TX , and GX as defined above.

Step 4. Find shortest SX − TX -path P in GX . If none exists then return X.

Step 5. Augment X by P :

X = (X ∪ {y0 , . . . , ys }) − {x1 , . . . , xs },

where {y0 , . . . , ys } = (U − X) ∩ P and {x1 , . . . , xs } = X ∩ P . Go to Step 2.

Proof. Firstly, we show that X ∪ {y0 }, x1 , . . . , xs and y1 , . . . , ys satisfy the requirements of Lemma 3.11 with respect to S1 . Observe that X ∪ {y0 } ∈ S1 because y0 ∈ SX . (1) is satisfied because (xj , yj ) ∈ AX(1) for all j and (2) is satisfied because otherwise the path could be shortcut. We conclude that X ′ ∈ S1 .
Secondly, we show that X ∪ {ys }, xs , . . . , x1 and ys−1 , . . . , y0 satisfy the requirements of Lemma 3.11 with respect to S2 . Observe that X ∪ {ys } ∈ S2 because ys ∈ TX . (1) is satisfied because (yj−1 , xj ) ∈ AX(2) for all j and (2) is satisfied because otherwise the path could be shortcut. We conclude that X ′ ∈ S2 .

We shall now prove that if there is no SX −TX -path in GX , then X is already maximum.
We need the following fact:
Proposition 3.13. Let (U, S1 ) and (U, S2 ) be two matroids with rank functions r1 and r2 .
Then for any X ∈ S1 ∩ S2 and Y ⊆ U we have

|X| ≤ r1 (Y ) + r2 (U − Y ).

Proof. X ∩ Y ∈ S1 implies |X ∩ Y | ≤ r1 (Y ). Similarly X − Y ∈ S2 implies |X − Y | ≤


r2 (U − Y ). Adding the inequalities yields the proof.

Lemma 3.14. X ∈ S1 ∩ S2 is maximum if and only if there is no SX − TX -path in GX .


Proof. If there is an SX − TX -path, there is also a shortest one. We apply Lemma 3.12
and obtain a set X 0 ∈ S1 ∩ S2 of greater cardinality.
Otherwise let R be the set of vertices reachable from SX in GX . We have R ∩ TX = ∅.
Let r1 and r2 be the rank function of S1 and S2 , respectively.
We claim that r2 (R) = |X ∩ R|. If not, there would be a y ∈ R − X with (X ∩ R) ∪ {y} ∈ S2 . Since X ∪ {y} ∉ S2 (because y ∉ TX ), the circuit C2 (X, y) must contain an element x ∈ X − R. But then (y, x) ∈ AX(2) means that there is an edge leaving R. This contradicts the definition of R.

Next we prove that r1 (U − R) = |X − R|. If not, there would be a y ∈ (U − R) − X with (X − R) ∪ {y} ∈ S1 . Since X ∪ {y} ∉ S1 (because y ∉ SX ), the circuit C1 (X, y) must contain an element x ∈ X ∩ R. But then (x, y) ∈ AX(1) means that there is an edge leaving R. This contradicts the definition of R.
Altogether we have |X| = r2 (R) + r1 (U − R). By Proposition 3.13, this implies opti-
mality.

This directly implies a min-max-equality.

Theorem 3.15. Let (U, S1 ) and (U, S2 ) be two matroids with rank functions r1 and r2 .
Then
max{|X| : X ∈ S1 ∩ S2 } = min{r1 (Y ) + r2 (U − Y ) : Y ⊆ U }.

Theorem 3.16. Edmonds correctly solves the Matroid Intersection problem in time O(|U |³ θ), where θ is the maximum complexity of the two independence oracles.

Proof. Correctness follows from Lemma 3.12 and 3.14. Steps 2 and 3 can be done in O(|U |² θ) time and Step 4 in O(|U |). Since there are at most |U | augmentations, the result follows.
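The listing of Edmonds can be implemented directly on top of the two oracles. The sketch below (unoptimized, with hypothetical helper names) builds GX , finds a shortest SX –TX path by breadth-first search, and augments; the demo recovers a maximum bipartite matching from two partition matroids, as discussed at the start of this section.

```python
from collections import deque

def matroid_intersection(U, indep1, indep2):
    """Edmonds' algorithm: a maximum-cardinality member of S1 ∩ S2."""
    X = set()
    while True:
        S_X = {y for y in U if y not in X and indep1(X | {y})}
        T_X = {y for y in U if y not in X and indep2(X | {y})}
        # Steps 2/3: circuits and the auxiliary digraph G_X.
        arcs = {u: [] for u in U}
        for y in U:
            if y in X:
                continue
            if not indep1(X | {y}):          # circuit C1(X, y) exists
                for x in X:
                    if indep1((X | {y}) - {x}):
                        arcs[x].append(y)    # arc (x, y)
            if not indep2(X | {y}):          # circuit C2(X, y) exists
                for x in X:
                    if indep2((X | {y}) - {x}):
                        arcs[y].append(x)    # arc (y, x)
        # Step 4: BFS gives a shortest S_X - T_X path.
        end = next((y for y in S_X if y in T_X), None)
        pred = {y: None for y in S_X}
        queue = deque(S_X)
        while queue and end is None:
            u = queue.popleft()
            for v in arcs[u]:
                if v not in pred:
                    pred[v] = u
                    if v in T_X:
                        end = v
                        break
                    queue.append(v)
        if end is None:
            return X                         # no augmenting path: X is maximum
        path = set()                         # Step 5: augment along the path
        while end is not None:
            path.add(end)
            end = pred[end]
        X ^= path                            # remove the x's, add the y's

# Bipartite matching as the intersection of two partition matroids.
E = [(1, "a"), (1, "b"), (2, "b"), (3, "b"), (3, "c")]
left_ok = lambda M: len({l for l, r in M}) == len(M)
right_ok = lambda M: len({r for l, r in M}) == len(M)
M = matroid_intersection(E, left_ok, right_ok)
```

On this graph a maximum matching has size three, e.g., {(1, "a"), (2, "b"), (3, "c")}.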

Chapter 4

Linear Programming

Linear programs (LP) play an important role in the theory and practice of optimization
problems. Many COPs can directly be formulated as LPs. Furthermore, LPs are invaluable
for the design and analysis of approximation algorithms. Generally speaking, LPs are
COPs with linear objective function and linear constraints, where the variables are defined
on a continuous domain. We will be more specific below.

4.1 Introduction
We begin our treatment of linear programming with an example of a transportation prob-
lem to illustrate how LPs can be used to formulate optimization problems.
Example 4.1. There are two brickworks w1 , w2 and three construction sites s1 , s2 , s3 .
The works produce b1 = 60 and b2 = 30 tons of bricks per day. The sites require c1 = 30,
c2 = 20 and c3 = 40 tons of bricks per day. The transportation costs tij per ton from work
wi to site sj are given in the following table:
tij s1 s2 s3
w1 40 75 50
w2 20 50 40
Which work delivers which site in order to minimize the total transportation cost? Let us
write the problem as a mathematical program. We use variables xij that tell us how much
we deliver from work wi to site sj .
minimize 40x11 + 75x12 + 50x13 + 20x21 + 50x22 + 40x23
subject to x11 + x12 + x13 ≤ 60
x21 + x22 + x23 ≤ 30
x11 + x21 = 30
x12 + x22 = 20
x13 + x23 = 40
xij ≥ 0 i = 1, 2, j = 1, 2, 3.
How do we find the best xij ?
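Before turning to general solution methods, a candidate plan can at least be checked and priced mechanically. The sketch below (illustrative only) verifies feasibility of one hand-picked shipment plan, in which the scarce cheap capacity of w2 goes where it saves most, and computes its cost of 4000, which one can check is in fact optimal (e.g., via the duality theory of Section 4.3).

```python
t = {(1, 1): 40, (1, 2): 75, (1, 3): 50,
     (2, 1): 20, (2, 2): 50, (2, 3): 40}
supply = {1: 60, 2: 30}
demand = {1: 30, 2: 20, 3: 40}

# Hand-picked candidate plan: x[i, j] = tons shipped from work wi to site sj.
x = {(1, 1): 20, (1, 2): 0, (1, 3): 40,
     (2, 1): 10, (2, 2): 20, (2, 3): 0}

# Feasibility: supply limits (<=) and demand constraints (=).
assert all(sum(x[i, j] for j in demand) <= supply[i] for i in supply)
assert all(sum(x[i, j] for i in supply) == demand[j] for j in demand)
cost = sum(t[k] * x[k] for k in x)
```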
The general Linear Programming task is given in Problem 4.1.
As a shorthand we shall frequently write max{c> x : Ax ≤ b}. We can assume that
we deal with a maximization problem without loss of generality because we can treat a
minimization problem if we replace c with −c.

Problem 4.1 Linear Programming
Instance. Matrix A ∈ Rm×n , vectors b ∈ Rm and c ∈ Rn .

Task. Solve the problem

maximize c> x,
subject to Ax ≤ b,
x ∈ Rn .

That means answer one of the following questions.

(1) Find a vector x ∈ Rn such that Ax ≤ b and val(x) = c> x is maximum, or

(2) decide that the set P = {x ∈ Rn : Ax ≤ b} is empty, or


(3) decide that for all α ∈ R there is an x ∈ Rn with Ax ≤ b and c> x > α.

The function val(x) = c> x is the objective function. A feasible x∗ which maximizes val
is an optimum solution and the value z ∗ = val(x∗ ) is called optimum value. Any x ∈ Rn
that satisfies Ax ≤ b is called feasible. The set P = {x ∈ Rn : Ax ≤ b} is called the feasible
region, i.e., the set of feasible solutions. If P is empty, then the problem is infeasible. If
for every α ∈ R, there is a feasible x such that c> x > α then the problem is unbounded.
This simply means that the maximum of the objective function does not exist.

4.2 Polyhedra
Consider the vector space Rn . A (linear) subspace S of Rn is a subset of Rn closed under
vector addition and scalar multiplication. Equivalently, S is the set of all points in Rn
that satisfy a set of homogeneous linear equations:

S = {x ∈ Rn : Ax = 0},

for some matrix A ∈ Rm×n . The dimension dim(S) is equal to the maximum number of linearly independent vectors in S, i.e., dim(S) = n − rank(A). Here rank(A) denotes the number of linearly independent rows of A. An affine subspace Sb of Rn is the set of all points
that satisfy a set of inhomogeneous linear equations:

Sb = {x ∈ Rn : Ax = b}.

We have dim(Sb ) = dim(S). The dimension dim(X) of any subset X ⊆ Rn is the smallest
dimension of any affine subspace which contains it.
An affine subspace of Rn of dimension n − 1 is called hyperplane, i.e., alternatively

H = {x ∈ Rn : a> x = b},

for some vector a ∈ Rn , a 6= 0 and scalar b. A hyperplane defines two (closed) halfspaces

H + = {x ∈ Rn : a> x ≥ b},
H − = {x ∈ Rn : a> x ≤ b}.

As a halfspace is a convex set, the intersection of halfspaces is also convex.
A polyhedron in Rn is a set

P = {x ∈ Rn : Ax ≤ b}

for some matrix A ∈ Rm×n and some vector b ∈ Rm . A bounded polyhedron is called
polytope.
Let P = {x : Ax ≤ b} be a non-empty polyhedron with dimension d. If c is a vector for which δ := max{c> x : x ∈ P } < ∞, then

Hc = {x : c> x = δ}

is called supporting hyperplane of P . A face of P is the intersection of P with a supporting


hyperplane of P . Three types of faces are particularly important, see Figure 4.1:

(1) A facet is a face of dimension d − 1,

(2) a vertex is a face of dimension zero (a point), and

(3) an edge is a face of dimension one (a line segment).

Figure 4.1: Facet, vertex, and edge.

The following lemma essentially states that a set F ⊆ P is a face of a polyhedron P if


and only if some of the inequalities of Ax ≤ b are satisfied with equality for all elements
of F .

Lemma 4.2. Let P = {x : Ax ≤ b} be a polyhedron and F ⊆ P . Then the following


statements are equivalent:

(1) F is a face of P .

(2) There is a vector c with δ := max{c> x : x ∈ P } < ∞ and F = {x ∈ P : c> x = δ}.

(3) F = {x ∈ P : A′ x = b′ } ≠ ∅ for some subsystem A′ x ≤ b′ of Ax ≤ b.

As important corollaries we have:

Corollary 4.3. If max{c> x : x ∈ P } < ∞ for a non-empty polyhedron P and a vector c,


then the set of points where the maximum is attained is a face of P .

Corollary 4.4. Let P be a polyhedron and F a face of P . Then F is again a polyhedron.


Furthermore, a set F ′ ⊆ F is a face of P if and only if it is a face of F .

An important class of faces are minimal faces, i.e., faces that do not contain any other
face. For these we have:

Lemma 4.5. Let P = {x : Ax ≤ b} be a polyhedron. A non-empty set F ⊆ P is a minimal


face of P if and only if F = {x ∈ Rn : A′ x = b′ } for some subsystem A′ x ≤ b′ of Ax ≤ b.

Corollary 4.3 and Lemma 4.5 already imply that Linear Programming can be solved
by solving the linear equation system A′ x = b′ for each subsystem A′ x ≤ b′ . This approach
obviously yields an exponential time algorithm. An algorithm which is more practicable
(although also exponential in the worst case) is the Simplex algorithm. The algorithm is
based on the following important consequence of Lemma 4.5.

Corollary 4.6. Let P = {x ∈ Rn : Ax ≤ b} be a polyhedron. Then all minimal faces of


P have dimension n − rank(A). The minimal faces of polytopes are vertices.

Thus, it suffices to search for an optimum solution among the vertices of the polyhedron. This is what the Simplex algorithm does. We do not explain the algorithm in detail here, but it works as follows. Provided that the polyhedron is not empty, it finds an initial vertex. If the current vertex is not optimal, it moves to another vertex with strictly larger objective value (pivot rule). This is iterated until an optimal vertex is found or the LP can be shown to be unbounded. See Figure 4.2.
The algorithm terminates after at most (m choose n) iterations (which is not polynomial). It
was conjectured that Simplex is polynomial until Klee and Minty gave an example where
the algorithm (with Bland’s pivot rule) uses 2n iterations on an LP with n variables and
2n constraints. It is not known if there is a pivot rule that leads to polynomial running
time. Nonetheless, Simplex with Bland’s pivot rule is frequently observed to terminate
after few iterations when run on “practical instances”.
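For two variables, the exhaustive subsystem approach behind Lemma 4.5 and Corollary 4.6 is easy to carry out by hand or in code. The sketch below (illustrative only) applies it to the constraints (4.2)–(4.6) of the example LP in Section 4.3: it solves each 2×2 subsystem with equality by Cramer's rule, keeps the feasible intersection points (the vertices), and takes the one with the best objective value.

```python
from itertools import combinations

# Constraints (4.2)-(4.6) of the example in Section 4.3, as rows of Ax <= b.
A = [(4, -1), (2, 1), (-5, 2), (-1, 0), (0, -1)]
b = [8, 10, 2, 0, 0]
c = (1, 1)  # objective (4.1): maximize x1 + x2

EPS = 1e-9
best, best_x = None, None
for i, j in combinations(range(len(A)), 2):
    (a1, a2), (a3, a4) = A[i], A[j]
    det = a1 * a4 - a2 * a3
    if abs(det) < EPS:
        continue  # parallel constraints: no unique intersection point
    # Cramer's rule for the 2x2 system with both constraints tight.
    x1 = (b[i] * a4 - a2 * b[j]) / det
    x2 = (a1 * b[j] - b[i] * a3) / det
    # Keep the point only if it satisfies all constraints.
    if all(r1 * x1 + r2 * x2 <= bb + EPS for (r1, r2), bb in zip(A, b)):
        val = c[0] * x1 + c[1] * x2
        if best is None or val > best:
            best, best_x = val, (x1, x2)
```

It reports the optimum value 8 at the vertex (2, 6), the same solution certified by duality in Section 4.3.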
However, there are algorithms, e.g., the Ellipsoid method and Karmarkar’s algorithm, that solve Linear Programming in polynomial time. But these algorithms are
mainly of interest from a theoretical point of view. We conclude with the statement that
one can solve Linear Programming in polynomial time with “black box” algorithms.

Figure 4.2: A Simplex path.

4.3 Duality
Intuition behind Duality
Consider the following LP, which is illustrated in Figure 4.3

maximize x1 + x2 (4.1)
subject to 4x1 − x2 ≤ 8 (4.2)
2x1 + x2 ≤ 10 (4.3)
− 5x1 + 2x2 ≤ 2 (4.4)
− x1 ≤ 0 (4.5)
− x2 ≤ 0 (4.6)

and notice that this LP is in the maximization form

max{c> x : Ax ≤ b}.

Because we are dealing with a maximization problem, every feasible solution x provides
the lower bound c> x on the value c> x∗ of the optimum solution x∗ , i.e., we know c> x ≤
c> x∗ .
Can we also obtain upper bounds on c> x∗ ? For any feasible solution x, the constraints
(4.2)–(4.6) are satisfied. Now compare the objective function (4.1) with the constraint
(4.3) coefficient-by-coefficient (where we remember that x1 , x2 ≥ 0 in this example):

1 · x1 + 1 · x2

2 · x1 + 1 · x2 ≤ 10

Figure 4.3: An LP.

Thus for every feasible solution x we have the upper bound x1 + x2 ≤ 10, i.e., the optimum value can be at most 10. Can we improve on this? We could try 7/9 · (4.3) + 1/9 · (4.4):

1 · x1 + 1 · x2
(7/9 · 2 + 1/9 · (−5))x1 + (7/9 · 1 + 1/9 · 2)x2 ≤ 7/9 · 10 + 1/9 · 2 = 72/9 = 8

Hence we have x1 + x2 ≤ 8 for every feasible x and thus an upper bound of 8 on the optimum value. If we look closely, our choices 7/9 and 1/9 give 7/9 · 2 + 1/9 · (−5) = 1 and 7/9 · 1 + 1/9 · 2 = 1, i.e., we have combined the coefficients of the objective function c> x with equality. This is also the best bound this approach can give here.
This suggests the following general approach for obtaining upper bounds on the optimal
value. Combine the constraints with non-negative multipliers y = (y1 , y2 , y3 , y4 , y5 ) such
that each coefficient in the result equals the corresponding coefficient in the objective
function, i.e., we want y > A = c> . We associate y1 with (4.2), y2 with (4.3), y3 with (4.4),
y4 with (4.5), and y5 with (4.6). Notice that the yi must be non-negative because we are
multiplying an inequality of the system Ax ≤ b, i.e., if a multiplier yi were negative we
change the corresponding inequality from “≤” to “≥”. Now y1 (4.2) + y2 (4.3) + y3 (4.4) +
y4 (4.5) + y5 (4.6) evaluates to

y1 (4x1 − 1x2 ) + y2 (2x1 + x2 ) + y3 (−5x1 + 2x2 ) + y4 (−x1 ) + y5 (−x2 )


≤ y1 8 + y2 10 + y3 2 + y4 0 + y5 0,

where rearranging yields

(4y1 + 2y2 − 5y3 − y4 )x1 + (−y1 + y2 + 2y3 − y5 )x2 ≤ 8y1 + 10y2 + 2y3 + 0y4 + 0y5

and we want to find values for y1 , y2 , y3 , y4 , y5 ≥ 0 that satisfy:

1 · x1 + 1 · x2 = (4y1 + 2y2 − 5y3 − y4 )x1 + (−y1 + y2 + 2y3 − y5 )x2 ≤ 8y1 + 10y2 + 2y3 + 0y4 + 0y5

Of course, we are interested in the best choice for y = (y1 , y2 , y3 , y4 , y5 ) ≥ 0 the approach
can give. This means that we want to minimize the upper bound 8y1 +10y2 +2y3 +0y4 +0y5 .
We simply write down this task as a mathematical program, which turns out to be an LP.

minimize 8y1 + 10y2 + 2y3 + 0y4 + 0y5 (4.7)


subject to 4y1 + 2y2 − 5y3 − y4 = 1 (4.8)
− y1 + y2 + 2y3 − y5 = 1 (4.9)
y1 , y2 , y3 , y4 , y5 ≥ 0 (4.10)

Further note that the new objective function is the right hand side (8, 10, 2, 0, 0)> of the
original LP and that the new right hand side is the objective function (1, 1)> of the original
LP. Thus the above LP is of the form

min{y > b : y > A = c> , y ≥ 0}.

Notice that there is a feasible solution x = (2, 6)> for the original LP that gives
c> x = 8. Further note that the multipliers y = (0, 7/9, 1/9, 0, 0)> yield y > b = 8, i.e.,

c> x = y > b.

Hence we have a certificate that the solution x = (2, 6) is indeed optimal (because we
have a matching upper bound). Not surprisingly this is no exception but the principal
statement of the strong duality theorem.
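The certificate is easy to verify mechanically. The sketch below checks, with exact rational arithmetic, that x = (2, 6) is feasible, that y = (0, 7/9, 1/9, 0, 0) is a valid multiplier vector with y>A = c>, and that both objective values equal 8.

```python
from fractions import Fraction as Fr

A = [(4, -1), (2, 1), (-5, 2), (-1, 0), (0, -1)]  # rows of (4.2)-(4.6)
b = [8, 10, 2, 0, 0]
c = (1, 1)                                        # objective (4.1)

x = (2, 6)                                        # primal solution
y = (Fr(0), Fr(7, 9), Fr(1, 9), Fr(0), Fr(0))     # dual multipliers

assert all(r1 * x[0] + r2 * x[1] <= bb for (r1, r2), bb in zip(A, b))  # Ax <= b
assert all(yi >= 0 for yi in y)
# combined coefficients match the objective: y^T A = c^T
assert all(sum(yi * row[k] for yi, row in zip(y, A)) == c[k] for k in (0, 1))

primal = c[0] * x[0] + c[1] * x[1]           # c^T x
dual = sum(yi * bi for yi, bi in zip(y, b))  # y^T b
```

Since the primal and dual values coincide, both solutions are optimal.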

Weak and Strong Duality


Given an LP
P = max{c> x : Ax ≤ b}
called primal, we define the dual

D = min{y > b : y > A = c> , y ≥ 0}.

Lemma 4.7. The dual of the dual of an LP is (equivalent to) the original LP.

Now we can say that the LPs P and D are dual to each other or a primal-dual pair.
The following forms of primal-dual pairs are standard:

max{c> x : Ax ≤ b} ∼ min{y > b : y > A = c> , y ≥ 0}


max{c> x : Ax ≤ b, x ≥ 0} ∼ min{y > b : y > A ≥ c> , y ≥ 0}
max{c> x : Ax = b, x ≥ 0} ∼ min{y > b : y > A ≥ c> }

The following lemma is called weak duality.

Lemma 4.8 (Weak Duality). Let x and y be respective feasible solutions of the primal-dual
pair P = max{c> x : Ax ≤ b} and D = min{y > b : y > A = c> , y ≥ 0}. Then c> x ≤ y > b.

Proof. c> x = (y > A)x = y > (Ax) ≤ y > b.

The following strong duality theorem is the most important result in LP theory and
the basis for a lot of algorithms for COPs.
Theorem 4.9 (Strong Duality). For any primal-dual pair P = max{c> x : Ax ≤ b} and
D = min{y > b : y > A = c> , y ≥ 0} we have:
(1) If P and D have respective optimum solutions x and y, say, then

c> x = y > b.

(2) If P is unbounded, then D is infeasible.

(3) If P is infeasible, then D is infeasible or unbounded.

Before we prove the theorem, we establish the fundamental theorem of linear inequali-
ties. The heart of the proof actually gives a basic version of the Simplex algorithm. The
result also implies Farkas’ Lemma.
Theorem 4.10. Let a1 , . . . , am , b be vectors in n-dimensional space. Then

either (I): b = λ1 a1 + · · · + λm am with λi ≥ 0 for i = 1, . . . , m,

or (II): there is a hyperplane {x : c> x = 0}, containing t − 1 linearly independent vectors from a1 , . . . , am , such that c> b < 0 and c> a1 , . . . , c> am ≥ 0, where t = rank{a1 , . . . , am , b}.

Proof. We may assume that a1 , . . . , am span the n-dimensional space. Clearly, (I) and (II)
exclude each other as we would otherwise have the contradiction

0 > c> b = λ1 c> a1 + · · · + λm c> am ≥ 0.

To see that at least one of (I) and (II) holds, choose linearly independent ai1 , . . . , ain from
a1 , . . . , am and set B = {ai1 , . . . , ain }. Next apply the following iteration:
(i) Write b = λi1 ai1 + · · · + λin ain . If λi1 , . . . , λin ≥ 0 we are in case (I).

(ii) Otherwise, choose the smallest h among i1 , . . . , in with λh < 0. Let {x : c> x = 0}
be the hyperplane spanned by B − {ah }. We normalize c so that c> ah = 1. (Hence
c> b = λh < 0.)

(iii) If c> a1 , . . . , c> am ≥ 0 we are in case (II).

(iv) Otherwise, choose the smallest s such that c> as < 0. Then replace B by (B −{ah })∪
{as }. Restart the iteration anew.
We are finished if we have shown that this process terminates. Let Bk denote the set B as it is in the k-th iteration. If the process does not terminate, then Bk = Bℓ for some k < ℓ (as there are only finitely many choices for B). Let r be the highest index for which ar has been removed from B at the end of one of the iterations k, k + 1, . . . , ℓ − 1, say in iteration p. As Bk = Bℓ , we know that ar also has been added to B in some iteration q with k ≤ q ≤ ℓ. So

Bp ∩ {ar+1 , . . . , am } = Bq ∩ {ar+1 , . . . , am }.

Let Bp = {ai1 , . . . , ain }, b = λi1 ai1 + · · · + λin ain , and let d be the vector c found in iteration q. Then we have the contradiction

0 > d> b = d> (λi1 ai1 + · · · + λin ain ) = λi1 d> ai1 + · · · + λin d> ain > 0,

where the second inequality follows from: if ij < r then λij ≥ 0 and d> aij ≥ 0; if ij = r then λij < 0 and d> aij < 0; and if ij > r then d> aij = 0.

Lemma 4.11 (Farkas’ Lemma). There is a vector

x with Ax ≤ b if and only if y > b ≥ 0 for all y ≥ 0 with y > A = 0,
x ≥ 0 with Ax ≤ b if and only if y > b ≥ 0 for all y ≥ 0 with y > A ≥ 0,
x ≥ 0 with Ax = b if and only if y > b ≥ 0 for all y with y > A ≥ 0.
Proof. We first show the case x ≥ 0 with Ax = b if and only if y > b ≥ 0 for each y with y > A ≥ 0.
Necessity is clear since y > b = y > (Ax) ≥ 0 for all x and y with x ≥ 0, y > A ≥ 0, and Ax = b. For sufficiency, assume that there is no x ≥ 0 with Ax = b. Then, by Theorem 4.10 and denoting by a1 , . . . , am the columns of A, there is a hyperplane {x : y > x = 0} with y > b < 0 for some y with y > A ≥ 0.
For the case x with Ax ≤ b if and only if y > b ≥ 0 for each y ≥ 0 with y > A = 0 consider A′ = [I, A, −A]. Observe that Ax ≤ b has a solution x if and only if A′ x′ = b has a solution x′ ≥ 0. Now apply what we have just proved.
For the case x ≥ 0 with Ax ≤ b if and only if y > b ≥ 0 for each y ≥ 0 with y > A ≥ 0 consider A′ = [I, A]. Observe that Ax ≤ b has a solution x ≥ 0 if and only if A′ x′ = b has a solution x′ ≥ 0. Now apply what we have just proved.

Proof of Theorem 4.9. For (1) both optima exist. Thus, if Ax ≤ b and y ≥ 0, y > A = c> , then c> x = y > Ax ≤ y > b. Now it suffices to show that there are x, y such that Ax ≤ b, y ≥ 0, y > A = c> , c> x ≥ y > b, i.e., that there are x and y ≥ 0 satisfying the system

Ax ≤ b,   −c> x + b> y ≤ 0,   A> y ≤ c,   −A> y ≤ −c.

By Lemma 4.11 this is equivalent to: If u, λ, v, w ≥ 0 with uA − λc> = 0 and λb> + vA> − wA> ≥ 0 then ub + vc − wc ≥ 0.
Let u, λ, v, w satisfy this premise. If λ > 0 then ub = λ⁻¹ λb> u> ≥ λ⁻¹ (w − v)A> u> = λ⁻¹ λ(w − v)c = (w − v)c. If λ = 0, let Ax0 ≤ b and y0 ≥ 0, y0> A = c> . (x0 , y0 exist since P and D are not empty.) Then ub ≥ uAx0 = 0 ≥ (w − v)A> y0> = (w − v)c.
The claim (2) directly follows from Lemma 4.8. For (3), if D is infeasible there is
nothing to show. Thus let D be feasible. From Lemma 4.11 we get: Since Ax ≤ b is
infeasible, there is a vector y ≥ 0 with y > A = 0 and y > b < 0. Let z ≥ 0 be such that
z > A = c> and let α > 0. Then αy + z is feasible with objective value αy > b + z > b, which
can be made arbitrarily small since y > b < 0 and α > 0.

The theorem has a lot of implications but we only list two of them. The first one is
called complementary slackness (and gives another way of proving optimality).

Corollary 4.12. Let max{c> x : Ax ≤ b} and min{y > b : y > A = c> , y ≥ 0} be a
primal-dual pair and let x and y be respective feasible solutions. Then the following
statements are equivalent:

(1) x and y are both optimum solutions.

(2) c> x = y > b.

(3) y > (b − Ax) = 0.

Secondly, the fact that a system Ax ≤ b is infeasible can be proved by giving a vector
y ≥ 0 with y > A = 0 and y > b < 0 (Farkas’ Lemma).

Part II

Approximation Algorithms

Chapter 5

Knapsack

This chapter is concerned with the Knapsack problem. This problem is of interest in
its own right because it formalizes the natural problem of selecting items so that a given
budget is not exceeded but profit is as large as possible. Questions like that often also
arise as subproblems of other problems. Typical applications include: option-selection in
finance, cutting, and packing problems.
In the Knapsack problem we are given a budget W and n items. Each item j comes
along with a profit cj and a weight wj . We are asked to choose a subset of the items as to
maximize total profit but the total weight not exceeding W .

Example 5.1. We are given an amount of W and we wish to buy a subset of n items
and sell those later on. Each such item j has cost wj but yields profit cj . The goal is to
maximize the total profit. Consider W = 100 and the following profit-weight table:
j cj wj
1 150 100
2 2 1
3 55 50
4 100 50
Our choice of purchased items must not exceed our capital W . Thus the feasible solu-
tions are {1}, {2}, {3}, {4}, {2, 3}, {2, 4}, {3, 4}. Which is the best solution? Evaluating all
possibilities yields that {3, 4} gives 155 altogether which maximizes our profit.

Problem 5.1 Knapsack


Instance. Non-negative integral vectors c ∈ Nn , w ∈ Nn , and an integer W .

Task. Solve the problem


maximize      val(x) = ∑_{j=1}^n cj xj ,

subject to    ∑_{j=1}^n wj xj ≤ W,

              xj ∈ {0, 1},   j = 1, . . . , n.

For an item j the quantity cj is called its profit. The profit of a vector x ∈ {0, 1}n is
val(x) = ∑_{j=1}^n cj xj . The number wj is called the weight of item j. The weight of a
vector x ∈ {0, 1}n is given by weight(x) = ∑_{j=1}^n wj xj . In order to obtain a non-trivial
problem we assume wj ≤ W for all j = 1, . . . , n and ∑_{j=1}^n wj > W throughout.
Knapsack is NP-hard, which means that “most probably” there is no polynomial-time
optimization algorithm for it. However, in Section 5.1 we derive a simple 1/2-approximation
algorithm. In Section 5.3 we even improve on this by giving a polynomial-time (1 − ε)-
approximation algorithm (for every fixed ε > 0).

5.1 Fractional Knapsack and Greedy


A direct relaxation of Knapsack as an LP is often referred to as the Fractional Knap-
sack problem:

maximize      val(x) = ∑_{j=1}^n cj xj ,

subject to    ∑_{j=1}^n wj xj ≤ W,

              0 ≤ xj ≤ 1,   j = 1, . . . , n.

This problem is solvable in polynomial time quite easily. The proof of the observation
below is left as an exercise.
Observation 5.2. Let c, w ∈ Nn be non-negative integral vectors with

    c1 /w1 ≥ c2 /w2 ≥ · · · ≥ cn /wn

and let

    k = min{ j ∈ {1, . . . , n} : ∑_{i=1}^j wi > W }.

Then an optimum solution for the Fractional Knapsack problem is given by

    xj = 1                                  for j = 1, . . . , k − 1,
    xj = ( W − ∑_{i=1}^{k−1} wi ) / wk      for j = k, and
    xj = 0                                  for j = k + 1, . . . , n.
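The solution of Observation 5.2 is easy to compute directly. The following Python sketch (function name and data layout are our own; the items are assumed to be sorted by non-increasing efficiency already) takes items fully until the break item and splits that one:

```python
def fractional_knapsack(c, w, W):
    """Optimum of Fractional Knapsack per Observation 5.2; assumes the
    items are sorted so that c[0]/w[0] >= c[1]/w[1] >= ..."""
    n = len(c)
    x = [0.0] * n
    used = 0
    for j in range(n):
        if used + w[j] <= W:          # items before the break item
            x[j] = 1.0
            used += w[j]
        else:                         # break item: fill remaining capacity
            x[j] = (W - used) / w[j]
            break                     # all later items stay at 0
    return x
```

On Example 5.1, sorted by efficiency as c = (2, 100, 150, 55) and w = (1, 50, 100, 50), this yields (1, 1, 0.49, 0) with fractional value 175.5.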

The ratio cj /wj is called the efficiency of item j. The item number k, as defined above,
is called the break item.
Now we turn our attention back to the original Knapsack problem. We may assume
that the items are given in non-increasing order of efficiency. Observation 5.2 suggests the
following simple algorithm: xj = 1 for j = 1, . . . , k − 1, xj = 0 for j = k, . . . , n.
Unfortunately, the approximation ratio of this algorithm can be arbitrarily bad as the
example below shows. The problem is that more efficient items can “block” more profitable
ones.

Example 5.3. Consider the following instance, where W is a sufficiently large integer.
j cj wj cj /wj
1 1 1 1
2 W −1 W 1 − 1/W

The algorithm chooses item 1, i.e., the solution x = (1, 0) and hence val(x) = 1. The
optimum solution is x∗ = (0, 1) and thus val(x∗ ) = W − 1. The approximation ratio of
the algorithm is 1/(W − 1), i.e., arbitrarily bad. However, this natural algorithm can be
turned into a 1/2-approximation.

Algorithm 5.1 Greedy

Input. Integer W , vectors c, w ∈ Nn with wj ≤ W for all j, ∑_j wj > W , and
c1 /w1 ≥ · · · ≥ cn /wn .

Output. Vector x ∈ {0, 1}n such that weight(x) ≤ W .

Step 1. Define k = min{j ∈ {1, . . . , n} : ∑_{i=1}^j wi > W }.

Step 2. Let x and y be the following two vectors: xj = 1 for j = 1, . . . , k − 1, xj = 0 for
j = k, . . . , n, and yj = 1 for j = k, yj = 0 for j ≠ k.

Step 3. If val(x) ≥ val(y) return x, otherwise return y.
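The three steps translate directly into Python; the sketch below uses our own naming and assumes the input conventions of the algorithm (items sorted by efficiency, wj ≤ W for all j, ∑_j wj > W):

```python
def greedy_knapsack(c, w, W):
    """Algorithm 5.1: assumes c[j]/w[j] non-increasing, w[j] <= W for
    all j, and sum(w) > W (so the break item k always exists)."""
    n = len(c)
    total, k = 0, 0
    while total + w[k] <= W:        # Step 1: find the break item k
        total += w[k]
        k += 1
    x = [1] * k + [0] * (n - k)     # Step 2: prefix solution ...
    y = [0] * n
    y[k] = 1                        # ... and the break item alone
    # Step 3: return the better of the two solutions
    return x if sum(c[:k]) >= c[k] else y
```

On the bad instance of Example 5.3, c = (1, W − 1) and w = (1, W ), the function returns the break item alone and hence achieves value W − 1 instead of 1.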

Theorem 5.4. The algorithm Greedy is a 1/2-approximation for Knapsack.

Proof. The value obtained by the Greedy algorithm is equal to max{val(x), val(y)}.
Let x∗ be an optimum solution for the Knapsack instance. Since every solution
that is feasible for the Knapsack instance is also feasible for the respective Fractional
Knapsack instance we have that

val(x∗ ) ≤ val(z ∗ ),

where z ∗ is the respective optimum solution for Fractional Knapsack. Observe that it
has the structure z ∗ = (1, . . . , 1, α, 0, . . . , 0), where α ∈ [0, 1) is at the break item k. The
solutions x and y are x = (1, . . . , 1, 0, 0, . . . , 0) and y = (0, . . . , 0, 1, 0, . . . , 0).
In total we have

val(x∗ ) ≤ val(z ∗ ) = val(x) + αck ≤ val(x) + val(y) ≤ 2 max{val(x), val(y)}

which implies the approximation ratio of 1/2.

5.2 Pseudo-Polynomial Time Algorithm


Here we give a pseudo-polynomial time algorithm that solves Knapsack optimally by
using dynamic programming. The term pseudo-polynomial means polynomial if the input
is given in unary encoding (and thus exponential if the input is given in binary encoding).
The idea is the following: Suppose you restrict yourself to choose only among the
first j items, for some integer j ∈ {0, . . . , n}. So all the solutions x you consider have
the form xi ∈ {0, 1} for i = 1, . . . , j and xi = 0 for i = j + 1, . . . , n. With abuse of

notation write x ∈ {0, 1}j 0n−j . Now the variable mj,k equals the minimum total weight
of such a solution x with weight(x) ≤ W and val(x) = k. That is, after defining the set
Wj,k = {weight(x) : weight(x) ≤ W, val(x) = k, x ∈ {0, 1}j 0n−j } we require

mj,k = inf Wj,k .

(Recall that for any finite set S of integers inf S = min S if S ≠ ∅ and inf S = ∞,
otherwise.) Let C be any upper bound on the optimum profit, for example C = ∑_i ci .
Clearly, the
value of an optimum solution for Knapsack is the largest value k ∈ {0, . . . , C} such that
mn,k < ∞. The algorithm Dynamic Programming Knapsack recursively computes the
values for mj,k and then returns the optimum value for the given Knapsack instance. In
the algorithm below, the variables x(j, k) are n-dimensional vectors that store the solutions
corresponding to mj,k , i.e., with weight equal to mj,k and value k.

Algorithm 5.2 Dynamic Programming Knapsack


Input. Integers W, C, vectors w, c ∈ Nn .

Output. Vector x ∈ {0, 1}n such that weight(x) ≤ W .

Step 1. Set m0,0 = 0, m0,k = ∞ for k = 1, . . . , C, and x(0, 0) = 0.

Step 2. For j = 1, . . . , n and k = 0, . . . , C do

            mj,k = mj−1,k−cj + wj   if cj ≤ k and mj−1,k−cj + wj ≤ min{W, mj−1,k },
            mj,k = mj−1,k           otherwise.

        If the first case applied set x(j, k)i = x(j − 1, k − cj )i for i ≠ j and x(j, k)j = 1.
        Otherwise set x(j, k) = x(j − 1, k).

Step 3. Determine the largest k ∈ {0, . . . , C} such that mn,k < ∞. Return x(n, k).
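The table mj,k can be filled with a one-dimensional rolling array. In the Python sketch below (our own naming) the item sets stored in `sol` play the role of the vectors x(j, k):

```python
INF = float("inf")

def dp_knapsack(W, C, w, c):
    """Algorithm 5.2 with a rolling array: m[k] is the minimum weight of
    a solution of value exactly k over the items seen so far, restricted
    to weight at most W; sol[k] is the item set realizing m[k]."""
    n = len(w)
    m = [INF] * (C + 1)
    m[0] = 0
    sol = [[] for _ in range(C + 1)]
    for j in range(n):
        # iterate k downwards so that item j is used at most once
        for k in range(C, c[j] - 1, -1):
            if m[k - c[j]] + w[j] <= min(W, m[k]):
                m[k] = m[k - c[j]] + w[j]
                sol[k] = sol[k - c[j]] + [j]
    best = max(k for k in range(C + 1) if m[k] < INF)   # Step 3
    return best, sol[best]
```

On Example 5.1 (with C = ∑_j cj = 307) this returns the optimum value 155 together with the items 3 and 4 (0-based indices 2 and 3).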

Theorem 5.5. The Dynamic Programming Knapsack algorithm computes the op-
timum value of the Knapsack instance W , w, c ∈ Nn in time O (nC), where C is an
arbitrary upper bound on this optimum value.
Proof. The running time is obvious. For the correctness we prove that the values mj,k
computed by the algorithm satisfy

mj,k = inf Wj,k

by induction on j. Here Wj,k = {weight(x) : weight(x) ≤ W, val(x) = k, x ∈ {0, 1}j 0n−j }
by definition.
The base case j = 0 is clear. For the inductive case first consider a situation when the
algorithm sets
mj,k = mj−1,k−cj + wj ,
i.e. we “take” the j-th item. Let y = x(j − 1, k − cj ) be the solution that corresponds to
mj−1,k−cj . The solution x = x(j, k) that corresponds to mj,k is obtained from y by setting
xi = yi for i 6= j and xj = 1. The value of x is val(x) = k. By definition of the algorithm
we have weight(x) = weight(y) + wj = mj−1,k−cj + wj ≤ W and thus x ∈ Wj,k .

By construction of the algorithm and induction hypothesis we have weight(x) ≤
inf Wj−1,k and weight(x) = wj + inf Wj−1,k−cj . That is, the weight of x is at most the
weight of any solution without the j-th item and at most the weight of any solution
including the j-th item. Hence mj,k = inf Wj,k .
In the other situation, when the algorithm sets

mj,k = mj−1,k ,

then either cj > k and hence no solution with value equal to k can contain the j-th item,
or mj−1,k + wj > W , i.e., adding the j-th item is infeasible, or mj−1,k + wj > inf Wj−1,k ,
i.e., there is a solution with less weight and still value equal to k.

5.3 Fully Polynomial-Time Approximation Scheme


Here we give a fully polynomial time approximation scheme (FPTAS), i.e., we show that for
every fixed ε > 0 there is a (1 − ε)-approximation algorithm that runs in time polynomial
in the input size and 1/ε. From a complexity-theoretic point of view this is the best that
can be hoped for: Assuming P 6= NP there is no polynomial time algorithm that solves
Knapsack optimally on every instance, but the FPTAS delivers solutions with arbitrarily
good approximation guarantees in polynomial time. (Unfortunately not many problems
admit an FPTAS.)
A common theme in constructing FPTASs is the following: First find an algorithm
that solves the problem exactly (mostly using the dynamic programming paradigm). This
algorithm usually has pseudo-polynomial or even exponential running time. Second con-
struct an algorithm for “rounding” input-instances, i.e., reducing the input-size. This
modification reduces the running time but may lead to inaccurate solutions.
The running time of Dynamic Programming Knapsack is O (nC). If we divide the
profit cj of each item by a number t and round the result down, then this improves the
running time of Dynamic Programming Knapsack by a factor of t to O (nC/t) but
may yield suboptimal solutions.

Algorithm 5.3 Knapsack FPTAS


Input. Integer W , vectors w, c ∈ Nn , a number ε > 0.

Output. Vector x ∈ {0, 1}n such that weight(x) ≤ W .

Step 1. Run Greedy on the instance W, w, c and let x be the solution. If val(x) = 0 then
return x.

Step 2. Set t = max{1, εval(x)/n} and set

            c0j = ⌊cj /t⌋   for j = 1, . . . , n.

Step 3. Set C = 2val(x)/t and apply the Dynamic Programming Knapsack algorithm
on the instance W, C, w, c0 and let y be the solution obtained.

Step 4. If val(x) ≥ val(y) return x, otherwise return y.
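The whole scheme fits in a few lines of Python. The sketch below (our own naming) returns only the objective value; it inlines Greedy and the dynamic program, and it assumes the input conventions of Algorithm 5.1 (items sorted by non-increasing efficiency, wj ≤ W for all j, ∑_j wj > W):

```python
def knapsack_fptas(W, w, c, eps):
    """Sketch of Algorithm 5.3: returns the value of a solution that is
    (1 - eps)-approximate for the Knapsack instance W, w, c."""
    n = len(c)
    # Step 1: Greedy (Algorithm 5.1), value only.
    total, k = 0, 0
    while total + w[k] <= W:
        total += w[k]
        k += 1
    greedy_val = max(sum(c[:k]), c[k])
    if greedy_val == 0:
        return 0
    # Step 2: scale every profit down by t and round.
    t = max(1, eps * greedy_val / n)
    cs = [int(cj // t) for cj in c]
    # Step 3: Dynamic Programming Knapsack on the rounded instance,
    # with upper bound C = 2 val(x) / t (valid by Theorem 5.4).
    C = int(2 * greedy_val / t)
    INF = float("inf")
    m = [INF] * (C + 1)
    m[0] = 0
    sol = [[] for _ in range(C + 1)]
    for j in range(n):
        for kk in range(C, cs[j] - 1, -1):
            if m[kk - cs[j]] + w[j] <= min(W, m[kk]):
                m[kk] = m[kk - cs[j]] + w[j]
                sol[kk] = sol[kk - cs[j]] + [j]
    best = max(kk for kk in range(C + 1) if m[kk] < INF)
    y_val = sum(c[j] for j in sol[best])   # value of y in the original profits
    # Step 4: return the better of the two values.
    return max(greedy_val, y_val)
```

Note that the candidate found by the dynamic program must be evaluated with the original profits cj , not with t times its scaled value, which is why the item sets are tracked.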

Theorem 5.6. For every fixed ε > 0, the Knapsack FPTAS algorithm is a (1 − ε)-
approximation algorithm with running time O (n²/ε).

Proof. The value of the solution returned by the algorithm is equal to max{val(x), val(y)}.
Let x∗ be an optimum solution for the instance W, w, c. By Theorem 5.4 we have 2val(x) ≥
val(x∗ ) and hence the choice C = 2val(x)/t is a legal upper bound for the optimum value of
the rounded instance W, w, c0 . By Theorem 5.5 y is an optimum solution for this instance
and we have

    val(y) = ∑_{j=1}^n cj yj ≥ ∑_{j=1}^n t c0j yj = t ∑_{j=1}^n c0j yj ≥ t ∑_{j=1}^n c0j x∗j
           = ∑_{j=1}^n t c0j x∗j > ∑_{j=1}^n (cj − t)x∗j ≥ val(x∗ ) − nt.

If t = 1 then y is optimal by Theorem 5.5. Otherwise the above inequality and the choice
of t yields val(y) ≥ val(x∗ ) − εval(x) and hence

val(x∗ ) ≤ val(y) + εval(x) ≤ (1 + ε) max{val(x), val(y)}

which yields the approximation guarantee 1 − ε/(1 + ε) ≥ 1 − ε.


The running time of Dynamic Programming Knapsack on the rounded instance is

    O (nC) = O (nval(x)/t) = O (n²/ε),

where we have used the definition of t: If t = 1 then val(x) ≤ n/ε and otherwise t =
εval(x)/n. This running time dominates the time needed for the other steps.

Chapter 6

Set Cover

The Set Cover problem this chapter deals with is again a very simple to state – yet quite
general – NP-hard combinatorial problem. It is widely applicable in sometimes unexpected
ways. The problem is the following: We are given a set U (called universe) of n elements,
a collection of sets S = {S1 , . . . , Sk } where Si ⊆ U , and a cost function c : S → R+ .
The task is to find a minimum cost subcollection S 0 ⊆ S that covers U , i.e., such that
∪S∈S 0 S = U .

Example 6.1. Consider this instance: U = {1, 2, 3}, S = {S1 , S2 , S3 } with S1 = {1, 2},
S2 = {2, 3}, S3 = {1, 2, 3} and cost c(S1 ) = 10, c(S2 ) = 50, and c(S3 ) = 100. These
collections cover U : {S1 , S2 }, {S3 }, {S1 , S3 }, {S2 , S3 }, {S1 , S2 , S3 }. The cheapest one is
{S1 , S2 } with cost equal to 60.

For each set S, we associate a variable xS ∈ {0, 1} that indicates whether we want to choose
S or not. We may thus write solutions for Set Cover as a vector x ∈ {0, 1}k . With this,
we write Set Cover as a mathematical program.

Problem 6.1 Set Cover


Instance. Universe U with n elements, collection S = {S1 , . . . , Sk }, Si ⊆ U , a cost
function c : S → R.

Task. Solve the problem


minimize      val(x) = ∑_{S∈S} c(S)xS ,

subject to    ∑_{S:e∈S} xS ≥ 1,   e ∈ U,

              xS ∈ {0, 1},   S ∈ S.

Define the frequency of an element to be the number of sets it is contained in. Let
f denote the frequency of the most frequent element. In this chapter we present several
algorithms that either achieve approximation ratio O (log n) or f . Why are we interested
in a variety of algorithms? Would one algorithm not suffice? It would, but here the focus
is on the techniques that yield these algorithms.

6.1 Greedy Algorithm
The Greedy algorithm follows the natural approach of iteratively choosing the most
cost-effective set and remove all the covered elements until all elements are covered. Let
C be the set of elements already covered at the beginning of an iteration. During this
iteration define the cost-effectiveness of a set S as c(S)/|S − C|, i.e., the average cost at
which it covers new elements. For later reference, the algorithm sets the price at which it
covered an element equal to the cost-effectiveness of the covering set. Further recall that
Hn = ∑_{i=1}^n 1/i is called the n-th Harmonic number and that log n ≤ Hn ≤ log n + 1.

Algorithm 6.1 Greedy


Input. Universe U with n elements, collection S = {S1 , . . . , Sk }, Si ⊆ U , a cost function
c : S → R.

Output. Vector x ∈ {0, 1}k

Step 1. C = ∅, x = 0.

Step 2. While C 6= U do the following:

(a) Find the most cost-effective set in the current iteration, say S.
(b) Set xS = 1 and for each e ∈ S − C set price(e) = c(S)/|S − C|.
(c) C = C ∪ S.

Step 3. Return x.
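A Python sketch of the algorithm follows; the dictionary-based representation of the instance (sets and costs keyed by a set name) is our own choice, and the prices are recorded alongside the cover for the later analysis:

```python
def greedy_set_cover(universe, sets, cost):
    """Algorithm 6.1.  `universe` is a set of elements, `sets` maps a
    name to the set of elements it contains, `cost` maps a name to its
    cost.  Returns the chosen set names and the price of each element."""
    covered, chosen, price = set(), [], {}
    while covered != universe:
        # Step 2(a): most cost-effective set w.r.t. the uncovered elements
        best = min((S for S in sets if sets[S] - covered),
                   key=lambda S: cost[S] / len(sets[S] - covered))
        new = sets[best] - covered
        for e in new:                      # Step 2(b): record the prices
            price[e] = cost[best] / len(new)
        chosen.append(best)
        covered |= sets[best]              # Step 2(c)
    return chosen, price
```

On Example 6.1 the algorithm first picks S1 (cost-effectiveness 5), then S2 (cost-effectiveness 50), for total cost 60.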

Theorem 6.2. The Greedy algorithm is an Hn -approximation algorithm for the Set
Cover problem.

It is an exercise to show that this bound is tight.

Direct Analysis
The following lemma is crucial for the proof of the approximation-guarantee. Number the
elements of U in the order in which they were covered by the algorithm, say e1 , . . . , en .
Let x∗ be an optimum solution.

Lemma 6.3. For each i ∈ {1, . . . , n}, price(ei ) ≤ val(x∗ )/(n − i + 1).

Proof. In any iteration, the leftover sets of the optimal solution x∗ can cover the remaining
elements at a cost of at most val(x∗ ). Therefore, among these, there must be one set
having cost-effectiveness of at most val(x∗ )/|U − C|. In the iteration in which element ei
was covered, U − C contained at least n − i + 1 elements. Since ei was covered by the most
cost-effective set in this iteration, we have that

price(ei ) ≤ val(x∗ )/|U − C| ≤ val(x∗ )/(n − i + 1)

which was claimed.

Proof of Theorem 6.2. Since the cost of each set is distributed evenly among the new
elements covered, the total cost of the set cover picked is
val(x) = ∑_{i=1}^n price(ei ) ≤ val(x∗ )Hn ,

where we have used Lemma 6.3.

Dual-Fitting Analysis
Here we will give an alternative analysis of the Greedy algorithm for Set Cover. We
will use the dual fitting method, which is quite general and helps to analyze a broad variety
of combinatorial algorithms.
For sake of exposition we consider a minimization problem, but the technique works
similarly for maximization. Consider an algorithm Alg which does the following:
(1) Let (P ) be an integer programming formulation of the problem of interest. We are
interested in its optimal solution x∗ , respectively its objective value val(x∗ ). Let (D)
be the dual of a linear programming relaxation of (P ).
(2) The algorithm Alg computes a feasible solution x for (P ) and a “solution” y for (D),
where we allow that y is infeasible for (D). But the algorithm has to ensure that
val(x) ≤ val(y), where val denotes the objective function of (P ) on the left-hand side
and the objective function of (D) on the right-hand side.
(3) Now divide the entries of y by a certain quantity α until y 0 = y/α is feasible for (D).
(The method of dual fitting is applicable only if this property can be ensured.) Then
val(y 0 ) is a lower bound for val(x∗ ) by weak duality, i.e.,
val(y 0 ) ≤ val(x∗ )
by Lemma 4.8.
(4) Putting these things together, we obtain the approximation guarantee of α by
val(x) ≤ val(y) = val(αy 0 ) = αval(y 0 ) ≤ αval(x∗ ).

Now we apply this recipe to Set Cover and consider the Greedy algorithm. For
property (1) we use our usual formulation
minimize      ∑_{S∈S} c(S)xS ,                      (P)

subject to    ∑_{S:e∈S} xS ≥ 1,   e ∈ U,

              xS ∈ {0, 1},   S ∈ S.
When we relax the constraints xS ∈ {0, 1} to 0 ≤ xS ≤ 1 and dualize the corresponding
linear program we find
maximize      ∑_{e∈U} ye ,                          (D)

subject to    ∑_{e∈S} ye ≤ c(S),   S ∈ S,

              ye ≥ 0.

This dual can be derived purely mechanically (by applying the primal-dual-definition and
rewriting constraints if needed), but this program also has an intuitive interpretation. The
constraints of (D) state that we want to “pack stuff” into each set S such that the cost
c(S) of each set is not exceeded, i.e., the sets are not overpacked. We seek to maximize
the total amount packed.
How about property (2)? The algorithm Greedy computes a certain feasible solution
x for (P ), i.e., a solution xS = 1 if the algorithm picks set S and xS = 0 otherwise. What
about the vector y? Define the following vector: For each e ∈ U set ye = price(e), where
price(e) is the value computed during the execution of the algorithm.
By construction of the algorithm we have
val(x) = ∑_{S∈S} c(S)xS = ∑_{e∈U} price(e) = ∑_{e∈U} ye = val(y),

i.e., Greedy satisfies property (2) of the dual fitting method (even with equality).
For property (3) the following result is useful.

Lemma 6.4. For every S ∈ S we have that

    ∑_{e∈S} ye ≤ Hn c(S).

Proof. Let S ∈ S with, say, m elements. Consider these in the ordering the algorithm
covered them, say, e1 , . . . , em . At the iteration when ei gets covered S contains m − i + 1
uncovered elements. Since Greedy chooses the most cost-effective set we have that
price(ei ) ≤ c(S)/(m − i + 1),
i.e., the cost-effectiveness of the set the algorithm chooses can only be smaller than the
cost-effectiveness of S. (Be aware that “smaller” is “better” here.)
Summing over all elements gives
∑_{e∈S} ye = ∑_{i=1}^m price(ei ) ≤ c(S) ∑_{i=1}^m 1/(m − i + 1) = c(S)Hm ≤ c(S)Hn

as claimed.

Now we are in position to finalize the dual-fitting analysis using property (4).

Proof of Theorem 6.2. Define the vector y 0 = y/Hn , where y is defined above. Observe
that for each set S ∈ S we have
∑_{e∈S} ye0 = ∑_{e∈S} ye /Hn = (1/Hn ) ∑_{e∈S} ye ≤ c(S)

using Lemma 6.4. That means y 0 is feasible for (D). Using the property (4) of the dual
fitting method proves the approximation guarantee of at most Hn .

6.2 Primal-Dual Algorithm
The primal-dual schema introduced here is the method of choice for designing approxi-
mation algorithms because it often gives algorithms with good approximation guarantees
and good running times. After introducing the ideas behind the method, we will use it to
design a simple factor f algorithm, where f is the frequency of the most frequent element.
The general idea is to work with an LP-relaxation of an NP-hard problem and its dual.
Then the algorithm iteratively changes a primal and a dual solution until the relaxed
primal-dual complementary slackness conditions are satisfied.

Primal-Dual Schema
Consider the following primal program:
minimize      val(x) = ∑_{j=1}^n cj xj ,

subject to    ∑_{j=1}^n aij xj ≥ bi ,   i = 1, . . . , m,

              xj ≥ 0,   j = 1, . . . , n.

The dual program is:


maximize      val(y) = ∑_{i=1}^m bi yi ,

subject to    ∑_{i=1}^m aij yi ≤ cj ,   j = 1, . . . , n,

              yi ≥ 0,   i = 1, . . . , m.

Most known approximation algorithms using the primal-dual schema run by ensuring one
set of conditions and suitably relaxing the other. We will capture both situations by
relaxing both conditions. If primal conditions are to be ensured, we set α = 1 below, and
if dual conditions are to be ensured, we set β = 1.

Primal Complementary Slackness Conditions. Let α ≥ 1. For each 1 ≤ j ≤ n:


either xj = 0 or cj /α ≤ ∑_{i=1}^m aij yi ≤ cj .

Dual Complementary Slackness Conditions. Let β ≥ 1. For each 1 ≤ i ≤ m:


either yi = 0 or bi ≤ ∑_{j=1}^n aij xj ≤ βbi .

Lemma 6.5. If x and y are primal and dual feasible solutions respectively satisfying the
complementary slackness conditions stated above, then

val(x) ≤ αβval(y).

Proof. We calculate directly using the slackness conditions and obtain
val(x) = ∑_{j=1}^n cj xj ≤ ∑_{j=1}^n ( α ∑_{i=1}^m aij yi ) xj
       = α ∑_{i=1}^m ( ∑_{j=1}^n aij xj ) yi ≤ αβ ∑_{i=1}^m bi yi = αβ val(y)

which was claimed.

The algorithm starts with a primal infeasible solution and a dual feasible solution;
usually these are x = 0 and y = 0 initially. It iteratively improves the feasibility of the
primal solution and the optimality of the dual solution ensuring that in the end a primal
feasible solution is obtained and all conditions stated above, with a suitable choice for α
and β, are satisfied. The primal solution is always extended integrally, thus ensuring that
the final solution is integral. The improvements to the primal and the dual go hand-in-
hand: the current primal solution is used to determine the improvement to the dual, and
vice versa. Finally, the cost of the dual solution is used as a lower bound on the optimum
value, and by Lemma 6.5, the approximation guarantee of the algorithm is αβ.

Primal-Dual Algorithm
Here we derive a factor f approximation algorithm for Set Cover using the primal-dual
schema. For this algorithm we will choose α = 1 and β = f . We will work with the
following primal LP for Set Cover
minimize      val(x) = ∑_{S∈S} c(S)xS ,

subject to    ∑_{S:e∈S} xS ≥ 1,   e ∈ U,

              xS ≥ 0,   S ∈ S.

and its dual


maximize      val(y) = ∑_{e∈U} ye ,

subject to    ∑_{e∈S} ye ≤ c(S),   S ∈ S,

              ye ≥ 0,   e ∈ U.

For these LPs the primal and dual complementary slackness conditions are:

Primal Complementary Slackness Conditions. For each S ∈ S:


    either xS = 0 or ∑_{e∈S} ye = c(S).

A set S will be said to be tight if ∑_{e∈S} ye = c(S). So, this condition states that:
“Pick only tight sets into the cover.”

Dual Complementary Slackness Conditions. For each e ∈ U :
either ye = 0 or ∑_{S:e∈S} xS ≤ f.

Since we will find a 0/1 solution for x, these conditions are equivalent to: “Each
element having non-zero dual value can be covered at most f times.” Since each
element is in at most f sets, this condition is trivially satisfied for all elements.

These conditions suggest the following algorithm:

Algorithm 6.2 Primal-Dual Set Cover


Input. Universe U with n elements, collection S = {S1 , . . . , Sk }, Si ⊆ U , a cost function
c : S → R.

Output. Vector x ∈ {0, 1}k

Step 1. x = 0, y = 0. Declare all elements uncovered.

Step 2. Unless all elements are covered, do:

(a) Pick an uncovered element, say e, and raise ye until some set goes tight.
(b) Pick all tight sets S in the cover, i.e., set xS = 1.
(c) Declare all the elements occurring in these sets as covered.

Step 3. Return x.
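The two steps translate into a short Python sketch (naming and data layout are our own; with integral costs all dual values stay integral, so testing tightness with exact comparisons is safe):

```python
def primal_dual_set_cover(universe, sets, cost):
    """Algorithm 6.2 for integer costs.  Returns the picked set names
    and the dual solution y."""
    y = {e: 0 for e in universe}
    picked = set()
    covered = set()
    while covered != universe:
        # Step 2(a): pick an uncovered element (the smallest, for
        # determinism) and raise y_e by the minimum slack of the sets
        # containing it, so that some set goes tight
        e = min(universe - covered)
        y[e] += min(cost[S] - sum(y[f] for f in sets[S])
                    for S in sets if e in sets[S])
        # Steps 2(b) and 2(c): pick every set that is now tight
        for S in sets:
            if S not in picked and sum(y[f] for f in sets[S]) >= cost[S]:
                picked.add(S)
                covered |= sets[S]
    return picked, y
```

On Example 6.1 the variable y1 is raised to 10 (S1 goes tight and is picked), then y3 is raised to 50 (S2 goes tight), giving the cover {S1, S2} of cost 60.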

Theorem 6.6. The algorithm Primal-Dual Set Cover is an f -approximation algorithm
for Set Cover.
Proof. At the end of the algorithm, there will be no uncovered elements. Further no dual
constraint is violated since we pick only tight sets S into the cover and no element e ∈ S
will later on be a candidate for increasing ye . Thus, the primal and dual solutions will both
be feasible. Since they satisfy the primal and dual complementary slackness conditions
with α = 1 and β = f , by Lemma 6.5, the approximation guarantee is f .

Example 6.7. A tight example for this algorithm is provided by the following set system.
The universe is U = {e1 , . . . , en+1 } and S consists of n − 1 sets {e1 , en }, . . . , {en−1 , en } of
cost 1 and one set {e1 , . . . , en+1 } of cost 1 + ε for some small ε > 0. Since en appears in
all n sets, this system has f = n.
Suppose the algorithm raises yen in the first iteration. When yen is raised to 1, all
sets {ei , en }, i = 1, . . . , n − 1 go tight. They are all picked in the cover, thus covering the
elements e1 , . . . , en . In the second iteration yen+1 is raised to ε and the set {e1 , . . . , en+1 }
goes tight. The resulting set cover has cost n + ε, whereas the optimum cover has cost
1 + ε.

6.3 LP-Rounding Algorithms


The central idea behind algorithms that make use of the LP-rounding technique is as
follows: Suppose you have an LP-relaxation of a certain NP-hard problem. Then you can
solve this optimally and try to “round” the optimal fractional solution to an integral one.

Here we derive a factor f approximation algorithm for Set Cover but this time by
rounding the fractional solution of an LP to an integral solution (instead of the primal-dual
schema). We consider our usual LP relaxation for Set Cover
minimize      val(x) = ∑_{S∈S} c(S)xS ,

subject to    ∑_{S:e∈S} xS ≥ 1,   e ∈ U,

              xS ≥ 0,   S ∈ S.

Simple Rounding Algorithm


The idea of the algorithm below is to include those sets S into the cover for which the
corresponding value zS in the optimal solution z of the LP is “large enough”.

Algorithm 6.3 Simple Rounding Set Cover


Input. Universe U with n elements, collection S = {S1 , . . . , Sk }, Si ⊆ U , a cost function
c : S → R.

Output. Vector x ∈ {0, 1}k

Step 1. Set x = 0, solve the LP relaxation below, and call the optimal solution z.
minimize      val(x) = ∑_{S∈S} c(S)xS ,

subject to    ∑_{S:e∈S} xS ≥ 1,   e ∈ U,

              xS ≥ 0,   S ∈ S.

Step 2. For each set S set xS = 1 if zS ≥ 1/f .

Step 3. Return x.
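Assuming the fractional optimum z has already been obtained from some LP solver (which we do not reproduce here), Step 2 is a simple threshold test. The sketch below (our own naming) also computes the frequency f from the instance:

```python
def round_fractional_cover(z, sets, universe):
    """Step 2 of Algorithm 6.3: keep every set whose LP value reaches
    1/f, where f is the frequency of the most frequent element."""
    f = max(sum(1 for S in sets if e in sets[S]) for e in universe)
    return {S: 1 if z[S] >= 1 / f else 0 for S in sets}
```

For Example 6.1 the most frequent element (element 2) gives f = 3, so every set whose fractional value is at least 1/3 is taken into the cover.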

Theorem 6.8. The algorithm Simple Rounding Set Cover is an f -approximation


algorithm for Set Cover.

Proof. Let x be the solution returned by the algorithm and z be the optimal solution of
the LP. Consider an arbitrary element e ∈ U . Since e is in at most f sets, one of these
sets must be picked to the extent of at least 1/f in the fractional solution z. If this were
not the case then ∑_{S:e∈S} zS < ∑_{S:e∈S} 1/f ≤ f · 1/f = 1 yields a contradiction to the
feasibility of z. Thus e is covered due to the definition of the algorithm and x is hence a
feasible cover. We further have xS ≤ f zS and thus

val(x) ≤ f val(z) ≤ f val(x∗ )

where x∗ is an optimal solution for the Set Cover problem.

Randomized Rounding
Another natural idea for rounding fractional solutions is to use randomization: For exam-
ple, for the above relaxation, observe that the values zS are between zero and one. We
may thus interpret these values as probabilities for choosing a certain set S.
Here is the idea of the following algorithm: Solve the LP-relaxation optimally and call
the solution z. With probability zS include the set S into the cover.
This basic procedure yields a vector x with expected value equal to the optimal frac-
tional solution value but might not cover all the elements. We thus repeat the procedure
“sufficiently many” times and include a set into our cover if it was included in any of
the iterations. We will show that O (log n) many iterations suffice yielding an O (log n)-
approximation algorithm.

Algorithm 6.4 Randomized Rounding Set Cover


Input. Universe U with n elements, collection S = {S1 , . . . , Sk }, Si ⊆ U , a cost function
c : S → R.

Output. Vector x ∈ {0, 1}k

Step 1. Set x = 0, solve the LP relaxation below, and call the optimal solution z.
minimize      val(x) = ∑_{S∈S} c(S)xS ,

subject to    ∑_{S:e∈S} xS ≥ 1,   e ∈ U,

              xS ≥ 0,   S ∈ S.

Step 2. Repeat ⌈3 log n⌉ times: For each set S set xS = 1 with probability zS .

Step 3. Return x.
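Step 2 can be sketched in Python as follows (naming is our own; log denotes the natural logarithm here, and the fixed seed only serves to make the sketch reproducible):

```python
import math
import random

def randomized_rounding_cover(z, sets, universe, seed=0):
    """Step 2 of Algorithm 6.4: ceil(3 log n) independent rounds; in
    each round, set S is picked with probability z[S]."""
    rng = random.Random(seed)
    rounds = max(1, math.ceil(3 * math.log(len(universe))))
    x = {S: 0 for S in sets}
    for _ in range(rounds):
        for S in sets:
            if rng.random() < z[S]:   # include S with probability z[S]
                x[S] = 1
    return x
```

A set with zS = 1 is always included (random() is drawn from [0, 1)), and a set with zS = 0 never is; fractional values in between are included with exactly that probability per round.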

Theorem 6.9. With probability at least 1 − 1/n² the algorithm Randomized Rounding
Set Cover returns a feasible solution, which is expected ⌈3 log n⌉-approximate for Set
Cover.
Proof. Let z be an optimal solution for the LP. We estimate the probability that an
element e ∈ U is covered in one iteration in Step 2. Let e be contained in m sets and
let z1 , . . . , zm be the probabilities given in the solution z. Since e is fractionally covered
we have z1 + · · · + zm ≥ 1. With easy but tedious calculus we see that – under this
condition – the probability for e being covered is minimized when the zi are all equal, i.e.,
z1 = · · · = zm = 1/m:
Pr [e is covered] = 1 − (1 − z1 ) · · · (1 − zm ) ≥ 1 − (1 − 1/m)^m ≥ 1 − 1/e.
Each element is covered with probability at least 1 − 1/e. But maybe we have not covered
all elements after ⌈3 log n⌉ iterations. The probability that the element e is not covered at
the end of the algorithm, i.e., after ⌈3 log n⌉ iterations, is

    Pr [e is not covered] ≤ (1/e)^⌈3 log n⌉ ≤ 1/n³ .

Thus the probability that there is an uncovered element is at most
∑_{e∈U} Pr [e is not covered] ≤ n · (1/n³) ≤ 1/n² .

Hence the returned solution x is feasible with probability at least 1 − 1/n² .


Consider a single iteration in Step 2 and let y ∈ {0, 1}k be the vector that indicates
which sets are included in this particular iteration. For each set S let yS = 1 with
probability zS . Then we have
E [val(y)] = ∑_{S∈S} E [c(S)yS ] = ∑_{S∈S} c(S)Pr [yS = 1] = ∑_{S∈S} c(S)zS = val(z).

Now we consider all iterations in Step 2 and clearly have

E [val(x)] ≤ ⌈3 log n⌉ · E [val(y)] ≤ ⌈3 log n⌉ · val(z) ≤ ⌈3 log n⌉ · val(x∗ ),

where x∗ is an optimal solution for Set Cover. So, the algorithm returns a feasible
solution, with probability at least 1 − 1/n², whose expected value is ⌈3 log n⌉-approximate.

The proof above shows that the algorithm is a ⌈3 log n⌉-approximation in expectation.
But we can actually state that the approximation ratio is 4 · ⌈3 log n⌉ with probability
around 3/4. Use Markov’s inequality Pr [X > t] ≤ E [X] /t to show

    Pr [val(x) > 4 · ⌈3 log n⌉ · val(z)] ≤ E [val(x)] / (4 · ⌈3 log n⌉ · val(z)) ≤ 1/4.

The probability that either not all elements are covered or the obtained solution has value
larger than 4 · ⌈3 log n⌉ times the optimal value is at most 1/n² + 1/4 ≤ 1/2 for all n ≥ 2.
Thus we have to run the whole algorithm at most two times in expectation to actually get
a 4 · ⌈3 log n⌉-approximate solution.

Chapter 7

Satisfiability

The Satisfiability problem asks if a certain given Boolean formula has a satisfying
assignment, i.e., one that makes the whole formula evaluate to true. There is a related
optimization problem called Maximum Satisfiability. The goal of this chapter is to
develop a deterministic 3/4-approximation algorithm. We first give a corresponding ran-
domized algorithm which will then be derandomized.
We are given the Boolean variables X = {x_1, . . . , x_n}, where each x_i ∈ {0, 1}. A literal
ℓ_i of the variable x_i is either x_i itself, called a positive literal, or its negation x̄_i with truth
value 1 − x_i, called a negative literal. A clause is a disjunction C = (ℓ_1 ∨ · · · ∨ ℓ_k) of literals
ℓ_j of X; their number k is called the size of C. For a clause C let S_C^+ denote the set of its
positive literals; similarly S_C^- the set of its negative literals. Let C denote the set of clauses.
A Boolean formula in conjunctive form is a conjunction of clauses F = C_1 ∧ · · · ∧ C_m. Each
vector x ∈ {0, 1}^n is called a truth assignment. For any clause C and any such assignment
x we say that x satisfies C if at least one of the literals of C evaluates to 1.
The problem Maximum Satisfiability is the following: We are given a formula F
in conjunctive form and for each clause C a weight wC , i.e., a weight function w : C → N.
The objective is to find a truth assignment x ∈ {0, 1}n that maximizes the total weight of
the satisfied clauses. As an important special case: If we set all weights wC equal to one,
then we seek to maximize the number of satisfied clauses.
Now we introduce for each clause C a variable z_C ∈ {0, 1} which takes the value one if
and only if C is satisfied under a given truth assignment x. We can then formulate the
problem as a mathematical program as follows:

Problem 7.1 Maximum Satisfiability


Instance. Formula F = C1 ∧ · · · ∧ Cm with m clauses over the n Boolean variables
X = {x1 , . . . , xn }. A weight function w : C → N.

Task. Solve the problem


    maximize    val(z) = ∑_{C∈C} w_C z_C,

    subject to  ∑_{i∈S_C^+} x_i + ∑_{i∈S_C^-} (1 − x_i) ≥ z_C    C ∈ C,

                z_C ∈ {0, 1}    C ∈ C,
                x_i ∈ {0, 1}    i = 1, . . . , n.

The algorithm we aim for is a combination of two algorithms. One works better
for small clauses, the other for large clauses. Both are initially randomized but can be
derandomized using the method of conditional expectation, i.e., the final algorithm is
deterministic.

7.1 Randomized Algorithm


For each variable xi we define the random variable Xi that takes the value one with a
certain probability pi and zero otherwise. This induces, for each clause C, a random
variable ZC that takes the value one if C is satisfied under a (random) assignment and
zero otherwise.

Algorithm for Large Clauses


Consider this algorithm Randomized Large: For each variable xi with i = 1, . . . , n,
set Xi = 1 independently with probability 1/2 and Xi = 0 otherwise. Output X =
(X1 , . . . , Xn ).
Define the quantity

    α_k = 1 − 2^{−k}.

Lemma 7.1. Let C be a clause. If size(C) = k then E [Z_C] = α_k.

Proof. A clause C is not satisfied, i.e., Z_C = 0, if and only if all its literals are set to zero.
By independence, the probability of this event is exactly 2^{−k} and thus

    E [Z_C] = 1 · Pr [Z_C = 1] + 0 · Pr [Z_C = 0] = 1 − 2^{−k} = α_k,

which was claimed.

Theorem 7.2. In expectation, the algorithm Randomized Large is a 1/2-approximation
algorithm for Maximum Satisfiability.

Proof. By linearity of expectation, Lemma 7.1, and size(C) ≥ 1 we have

    E [val(Z)] = ∑_{C∈C} w_C E [Z_C] = ∑_{C∈C} w_C α_{size(C)} ≥ (1/2) ∑_{C∈C} w_C ≥ (1/2) val(z*),

where (x*, z*) is an optimal solution for Maximum Satisfiability. We have used the
obvious bound val(z*) ≤ ∑_{C∈C} w_C.
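The algorithm Randomized Large can be sketched in a few lines (a Python sketch; the clause encoding, a list of signed indices with +i for x_i and −i for x̄_i, is my own):

```python
import random

def randomized_large(n, clauses, weights, seed=0):
    """Set each X_i = 1 independently with probability 1/2; return the
    assignment and the total weight of satisfied clauses."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    def satisfied(clause):
        # literal +i is true iff x_i = 1; literal -i is true iff x_i = 0
        return any(x[abs(l) - 1] == (1 if l > 0 else 0) for l in clause)
    value = sum(w for c, w in zip(clauses, weights) if satisfied(c))
    return x, value
```

By Theorem 7.2 the returned value is, in expectation, at least half the optimal total weight.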

Algorithm for Small Clauses


Maybe the most natural linear programming relaxation of the problem is:

    maximize    val(z) = ∑_{C∈C} w_C z_C,

    subject to  ∑_{i∈S_C^+} x_i + ∑_{i∈S_C^-} (1 − x_i) ≥ z_C    C ∈ C,

                0 ≤ z_C ≤ 1    C ∈ C,
                0 ≤ x_i ≤ 1    i = 1, . . . , n.

In the sequel let (x̄, z̄) denote an optimum solution for this LP.
Consider this algorithm Randomized Small: Determine (x̄, z̄). For each variable xi
with i = 1, . . . , n, set Xi = 1 independently with probability x̄i and Xi = 0 otherwise.
Output X = (X1 , . . . , Xn ).
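Given the LP optimum (x̄, z̄) (assumed here to be precomputed by some LP solver), the rounding step of Randomized Small can be sketched as follows (same signed-index clause encoding as before):

```python
import random

def randomized_small(x_bar, clauses, weights, seed=0):
    """Round the fractional LP solution: set X_i = 1 independently with
    probability x_bar[i]; return the assignment and the satisfied weight."""
    rng = random.Random(seed)
    x = [1 if rng.random() < p else 0 for p in x_bar]
    def satisfied(clause):
        return any(x[abs(l) - 1] == (1 if l > 0 else 0) for l in clause)
    value = sum(w for c, w in zip(clauses, weights) if satisfied(c))
    return x, value
```

Only the sampling probabilities differ from Randomized Large; the analysis below shows this pays off for small clauses.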
Define the quantity

    β_k = 1 − (1 − 1/k)^k.

Lemma 7.3. Let C be a clause. If size(C) = k then E [Z_C] ≥ β_k z̄_C.

Proof. We may assume that the clause C has the form C = (x_1 ∨ · · · ∨ x_k); otherwise
rename the variables and rewrite the LP.
The clause C is satisfied if x_1, . . . , x_k are not all set to zero. The probability of this
event is

    1 − ∏_{i=1}^k (1 − x̄_i) ≥ 1 − ( (1/k) ∑_{i=1}^k (1 − x̄_i) )^k
                            = 1 − (1 − (1/k) ∑_{i=1}^k x̄_i )^k
                            ≥ 1 − (1 − z̄_C/k)^k.

Above we firstly have used the arithmetic-geometric mean inequality, which states that
for non-negative numbers a_1, . . . , a_k we have

    (a_1 + · · · + a_k)/k ≥ (a_1 · · · a_k)^{1/k}.

Secondly the LP guarantees the inequality x̄_1 + · · · + x̄_k ≥ z̄_C.
Now define the function g(t) = 1 − (1 − t/k)^k. This function is concave with g(0) = 0
and g(1) = 1 − (1 − 1/k)^k, which yields that we can bound

    g(t) ≥ t · (1 − (1 − 1/k)^k) = t β_k

for all t ∈ [0, 1].
Therefore

    Pr [Z_C = 1] ≥ 1 − (1 − z̄_C/k)^k ≥ β_k z̄_C

and the claim follows.

Theorem 7.4. In expectation, the algorithm Randomized Small is a (1 − 1/e)-approximation
algorithm for Maximum Satisfiability.

Proof. The quantity β_k is decreasing in k. Therefore if all clauses are of size at most k,
then by Lemma 7.3

    E [val(Z)] = ∑_{C∈C} w_C E [Z_C] ≥ β_k ∑_{C∈C} w_C z̄_C = β_k val(z̄) ≥ β_k val(z*),

where (x*, z*) is an optimal solution for Maximum Satisfiability. The claim follows
since (1 − 1/k)^k < 1/e for all k ∈ N.

3/4-Approximation Algorithm
Consider the algorithm Randomized Combine: With probability 1/2 run Randomized
Large otherwise run Randomized Small.

Lemma 7.5. Let C be a clause, then

    E [Z_C] ≥ (3/4) z̄_C.

Proof. Let the random variable B take the value zero if the first algorithm is run, one
otherwise. For a clause C let size(C) = k. By Lemma 7.1 and z̄_C ≤ 1

    E [ Z_C | B = 0] = α_k ≥ α_k z̄_C,

and by Lemma 7.3

    E [ Z_C | B = 1] ≥ β_k z̄_C.

Combining we have

    E [Z_C] = E [ Z_C | B = 0] Pr [B = 0] + E [ Z_C | B = 1] Pr [B = 1] ≥ (z̄_C/2)(α_k + β_k).

Inspection shows that α_k + β_k ≥ 3/2 for all k ∈ N.

Theorem 7.6. In expectation, the algorithm Randomized Combine is a 3/4-approximation
algorithm for Maximum Satisfiability.

Proof. This follows from Lemma 7.5 and linearity of expectation.

7.2 Derandomization
The notion of derandomization refers to “turning” a randomized algorithm into a deter-
ministic one (possibly at the cost of additional running time or deterioration of approxi-
mation guarantee). One of the several available techniques is the method of conditional
expectation.
We are given a Boolean formula F = C1 ∧· · ·∧Cm in conjunctive form over the variables
X = {x1 , . . . , xn }. Suppose we set x1 = 0, then we get a formula F0 over the variables
x2 , . . . , xn after simplification; if we set x1 = 1 then we get a formula F1 .

Example 7.7. Let F = (x_1 ∨ x_2) ∧ (x̄_1 ∨ x_3) ∧ (x_1 ∨ x̄_4) where X = {x_1, . . . , x_4}.

    x_1 = 0 :  F_0 = (x_2) ∧ (x̄_4)
    x_1 = 1 :  F_1 = (x_3)

Applying this recursively, we obtain the tree T(F) depicted in Figure 7.1. The tree
T(F) is a complete binary tree with n + 1 levels and 2^{n+1} − 1 vertices. Each vertex at level i
corresponds to a setting for the Boolean variables x_1, . . . , x_i. We label the vertices of T(F)
with their respective conditional expectations as follows. Let X1 = a1 , . . . , Xi = ai ∈ {0, 1}
be the outcome of a truth assignment for the variables x1 , . . . , xi . The vertex corresponding
to this assignment will be labeled

E [ val(Z) | X1 = a1 , . . . , Xi = ai ] .

T (F )
F level 0
x1 = 0 x1 = 1

F0 F1 level 1

T (F0) T (F1)

Figure 7.1: Derandomization tree for a formula F .

If i = n, then this conditional expectation is simply the total weight of clauses satisfied by
the truth assignment x1 = a1 , . . . , xn = an .
The goal of the remainder of the section is to show that we can find deterministically
in polynomial time a path from the root of T (F ) to a leaf such that the conditional
expectations of the vertices on that path are at least as large as E [val(Z)]. Obviously, this
property yields the desired result: We can deterministically construct a solution which is at least
as good as the one of the randomized algorithm in expectation.
Lemma 7.8. The conditional expectation

    E [ val(Z) | X_1 = a_1, . . . , X_i = a_i ]

of any vertex in T(F) can be computed in polynomial time.
Proof. Consider a vertex X1 = a1 , . . . , Xi = ai . Let F 0 be the Boolean formula obtained
from F by setting x1 , . . . , xi accordingly. F 0 is in the variables xi+1 , . . . , xn .
Clearly, by linearity of expectation, the expected weight of any clause of F 0 under any
random truth assignment to the variables xi+1 , . . . , xn can be computed in polynomial
time. Adding to this the total weight of clauses satisfied by x1 , . . . , xi gives the answer.
Theorem 7.9. We can compute in polynomial time a path from the root to a leaf in T (F )
such that the conditional expectation of each vertex on this path is at least E [val(Z)].
Proof. Consider the conditional expectation at a certain vertex X1 = a1 , . . . , Xi = ai for
setting the next variable Xi+1 . We have that

E [ val(Z) | X1 = a1 , . . . , Xi = ai ]
= E [ val(Z) | X1 = a1 , . . . , Xi = ai , Xi+1 = 0] Pr [Xi+1 = 0]
+ E [ val(Z) | X1 = a1 , . . . , Xi = ai , Xi+1 = 1] Pr [Xi+1 = 1] .
We show that the two conditional expectations with Xi+1 can not be both strictly smaller
than E [ val(Z) | X1 = a1 , . . . , Xi = ai ]. Assume the contrary, then we have

E [ val(Z) | X1 = a1 , . . . , Xi = ai ]
< E [ val(Z) | X1 = a1 , . . . , Xi = ai ] (Pr [Xi+1 = 0] + Pr [Xi+1 = 1])
which is a contradiction since Pr [Xi+1 = 0] + Pr [Xi+1 = 1] = 1.
This yields the existence of such a path, and by Lemma 7.8 it can be computed in
polynomial time.
The derandomized version of a randomized algorithm now simply executes these proofs
with the probability distribution as given by the randomized algorithm.
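The method of conditional expectation can be sketched as follows (Python; same signed-index clause encoding as before, and probs[i] is the probability with which the randomized algorithm would set x_i = 1, e.g. 1/2 for Randomized Large):

```python
def cond_expectation(assignment, probs, clauses, weights):
    """E[val(Z) | the fixed entries of assignment], where each unfixed x_j
    is set to 1 independently with probability probs[j]; assignment[j] is
    0/1 or None. A clause is a list of signed indices (+i / -i)."""
    total = 0.0
    for clause, w in zip(clauses, weights):
        p_all_false = 1.0
        for l in clause:
            i = abs(l) - 1
            if assignment[i] is not None:
                if assignment[i] == (1 if l > 0 else 0):  # literal fixed true
                    p_all_false = 0.0
                    break
                # literal fixed false: contributes factor 1
            else:
                p_true = probs[i] if l > 0 else 1.0 - probs[i]
                p_all_false *= 1.0 - p_true
        total += w * (1.0 - p_all_false)
    return total

def derandomize(n, probs, clauses, weights):
    """Descend T(F): fix each variable to the value with the larger
    conditional expectation; by Theorem 7.9 the label never drops
    below E[val(Z)]."""
    assignment = [None] * n
    for i in range(n):
        vals = []
        for b in (0, 1):
            assignment[i] = b
            vals.append(cond_expectation(assignment, probs, clauses, weights))
        assignment[i] = 0 if vals[0] >= vals[1] else 1
    return assignment
```

The 2n conditional expectations computed per level replace the 2^{n+1} − 1 vertices of the full tree.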

Chapter 8

Facility Location

The Metric Facility Location problem was popular in operations research in the 1960s
but no constant factor approximation algorithms were known until 1997. The discovery
of these is due to LP-rounding techniques and the primal-dual schema. In this section we
present a 3-approximate primal-dual algorithm.
Metric Facility Location is the following problem: We are given a complete bipar-
tite graph G = (V, E) with bipartition V = F ∪ C, where F refers to the set of (potential)
facilities and C to the set of cities. Establishing a facility i causes opening cost fi . At-
taching city j to an (opened) facility i yields connection cost cij . We assume that the cij
satisfy the triangle inequality cij ≤ cij 0 + ci0 j 0 + ci0 j for all i, i0 ∈ F and j, j 0 ∈ C. So,
now, the problem is to find a subset I ⊆ F of facilities to open and a mapping a : C → I
for assigning cities to open facilities in a way that each city is connected to at least one
facility as to minimize the total opening and connection cost. We write this task as a
mathematical program, where yi indicates if facility i is open and xij if city j is assigned
to facility i.

Problem 8.1 Metric Facility Location


Instance. Complete bipartite graph G = (F ∪ C, E), weight functions f : F → N, and
c : E → N that satisfies the triangle inequality.

Task. Solve the problem


    minimize    val(x, y) = ∑_{i∈F} ∑_{j∈C} c_{ij} x_{ij} + ∑_{i∈F} f_i y_i

    subject to  ∑_{i∈F} x_{ij} ≥ 1      j ∈ C            “each city connects”
                y_i − x_{ij} ≥ 0       i ∈ F, j ∈ C     “facility must be open”
                x_{ij} ∈ {0, 1}
                y_i ∈ {0, 1}.

The problem Metric Facility Location is NP-hard. Here we will show the following
main result.

Theorem 8.1. There is a 3-approximation algorithm for Metric Facility Location
that runs in O(m log m) time, where m = |F| |C|.

An obvious way of relaxing this problem is to replace the constraints xij ∈ {0, 1} and
yi ∈ {0, 1} by 0 ≤ xij ≤ 1 and 0 ≤ yi ≤ 1 respectively. For sake of completeness:
    minimize    val(x, y) = ∑_{i∈F} ∑_{j∈C} c_{ij} x_{ij} + ∑_{i∈F} f_i y_i    (P)

    subject to  ∑_{i∈F} x_{ij} ≥ 1      j ∈ C
                y_i − x_{ij} ≥ 0       i ∈ F, j ∈ C
                0 ≤ x_{ij} ≤ 1
                0 ≤ y_i ≤ 1.

The dual of this LP can be written as:

    maximize    val(α, β) = ∑_{j∈C} α_j    (D)

    subject to  α_j − β_{ij} ≤ c_{ij}      i ∈ F, j ∈ C
                ∑_{j∈C} β_{ij} ≤ f_i       i ∈ F
                α_j ≥ 0                   j ∈ C
                β_{ij} ≥ 0                i ∈ F, j ∈ C.

8.1 Complementary Slackness


One of the important steps in designing a primal-dual algorithm is to get an intuition what
the dual LP “is doing”. This usually induces a schema how dual variables can “pay” for
primal ones.
We begin with a recapitulation of the complementary slackness conditions, which refers
to Corollary 4.12 of the strong duality theorem and gives one way of proving optimality:

Corollary 8.2. Let max{c^T x : Ax ≤ b, x ≥ 0} and min{y^T b : y^T A ≥ c, y ≥ 0} be
a primal-dual pair and let x and y be respective feasible solutions. Then the following
statements are equivalent:

(1) x and y are both optimum solutions.

(2) c^T x = y^T b.

(3) (y^T A − c)^T x = 0.

(4) y^T (b − Ax) = 0.

For us, the third and the fourth condition are particularly interesting as they relate
the primal and dual variables at optimal points. The third is called primal complementary
slackness, the fourth dual complementary slackness. One way of looking at this dual
condition is to say that “either yi = 0 or bi − (Ax)i = 0”, i.e., “if the dual variable yi is
not zero, then the corresponding primal constraint is satisfied with equality”. Similarly
for the primal condition.
Now we return to Metric Facility Location. Assume for the moment that there
is an integral solution, say (x, y), which is optimal for (P). This solution corresponds to a

set I ⊆ F and a mapping a : C → I. Thus, under this solution, yi = 1 if and only if i ∈ I
and xij = 1 if and only if a(j) = i. Let (α, β) be an optimal solution for (D).
Now, for (P) and (D) the primal-dual complementary slackness conditions are:

(1) For all i ∈ F and j ∈ C: either x_{ij} = 0 or α_j − β_{ij} = c_{ij}.

(2) For all i ∈ F: either y_i = 0 or ∑_{j∈C} β_{ij} = f_i.

(3) For all j ∈ C: either α_j = 0 or ∑_{i∈F} x_{ij} = 1.

(4) For all i ∈ F and j ∈ C: either β_{ij} = 0 or y_i = x_{ij}.

By (2) each open facility i must be “paid” by the dual variables β_{ij}, i.e.,

    ∑_{j∈C} β_{ij} = f_i.

By condition (4), if facility i is open, but city j is not assigned to it, i.e., a(j) 6= i, then
we must have yi 6= xij and thus βij = 0. This means that no city contributes to a facility
it is not connecting to.
By condition (1), if for some city j and facility i we have a(j) = i then we must have
αj − βij = cij . Thus we can think of αj = βij + cij as the total price paid by city j, where
βij is its opening cost share and cij its connection cost (paid exclusively).

8.2 Primal-Dual Algorithm


Here we apply the primal-dual schema, i.e., we carry out the following steps: Relax the
primal slackness conditions and use these for an algorithm which ensures dual feasibility
and improves primal optimality.

Relaxing the Slackness Conditions


We will relax the primal slackness conditions as follows: The cities are partitioned into
two sets, directly connected and indirectly connected. Only directly connected cities will
pay for opening facilities, i.e., βij can be non-zero only if j is a directly connected city and
a(j) = i. For an indirectly connected city j, we relax the primal slackness condition to

    (1/3) · c_{a(j)j} ≤ α_j ≤ c_{a(j)j}.

So, with the above intuition this reads: The total price paid by an indirectly connected
city is at most its direct connection cost, but at least one third of this cost. All other
primal conditions are maintained, i.e., for a directly connected city j we have

    α_j − β_{a(j)j} = c_{a(j)j},

and for each open facility i,

    ∑_{j:a(j)=i} β_{ij} = f_i.

Algorithm
The algorithm consists of two phases. In the first phase, the algorithm operates in a
primal-dual fashion. It finds a dual feasible solution and also determines a set of tight
edges and temporarily open facilities Ft . In the second phase the algorithm chooses a
subset I ⊆ Ft of facilities to open permanently, and a mapping a : C → I.

Phase 1. We would like to find as large a dual solution as possible. This motivates the
following underlying process: Each city j raises its dual variable αj until it gets
connected to an open facility, i.e., until αj = cij for some open facility i. All other
primal and dual variables simply respond to this change, trying to maintain feasibility
or satisfying complementary slackness conditions.
A notion of time is defined in this phase, so that each event can be associated with
the time at which it happened; the phase starts at time zero. Initially each city is
defined to be unconnected. Throughout this phase, the algorithm raises the dual
variable αj for each unconnected city at unit rate, i.e., αj will grow by one in unit
time. When αj = cij for some edge ij, the algorithm will declare this edge to be
tight. Henceforth, the dual variable βij will also be raised uniformly, thus ensuring
that the constraint αj − βij ≤ cij in (D) is never violated. At this point in time the
connection cost is paid and the variable βij goes towards paying for the opening
cost of facility i. Each edge ij such that βij > 0 is called special.
Facility i is said to be paid for if ∑_j β_{ij} = f_i. If so, the algorithm declares the facility
temporarily open, i.e., i ∈ Ft . Furthermore, all unconnected cities having tight edges
to this facility are declared connected and facility i is declared the connecting witness
for each of these cities. (Notice that the dual variables αj of these cities are not raised
any more.) In the future, as soon as an unconnected city j gets a tight edge to i,
j will also be declared connected and i the connection witness for j. (Notice that
βij = 0 and the edge ij is not special.) When all cities are connected, the first phase
terminates. If several events happen simultaneously, the algorithm executes them in
arbitrary order.
As a side remark, at the end of this phase, a city may have paid towards temporarily
opening several facilities. However, we want to ensure that a city pays only for the
facility that it is eventually connected to. This is ensured in the second phase, which
chooses a set of temporarily open facilities for opening permanently.
Phase 2. Let Ft denote the set of temporarily open facilities and T denote the subgraph
of G induced by all special edges. Let T 2 denote the graph that has an edge uv if
and only if there is a path of length at most two between u and v in T , and let H
be the subgraph of T 2 induced by Ft . Find any maximal independent set in H, say
I. All facilities in the set I are declared open.
For city j, define Fj = {i ∈ Ft : ij is special}. Since I is an independent set, at most
one of the facilities in Fj is opened. If there is a facility i ∈ Fj that is opened, then
set a(j) = i and declare the city j directly connected. Otherwise, consider the tight
edge i0 j such that i0 was the connecting witness for j. If i0 ∈ I, again set a(j) = i0
and declare the city j directly connected (notice that in this case βi0 j = 0). In the
remaining case that i0 6∈ I, let i be any neighbor of i0 in the graph H such that i ∈ I.
Set a(j) = i and declare city j indirectly connected.

The set I and the mapping a : C → I define a primal integral solution: xij = 1 if and
only if a(j) = i and yi = 1 if and only if i ∈ I. The values for αj and βij obtained at the

end of the first phase form a dual feasible solution.

Analysis
The crucial result for the analysis, which directly gives the approximation guarantee, is
the following.

Theorem 8.3. The primal and dual solutions constructed by the algorithm satisfy

    ∑_{i∈F} ∑_{j∈C} c_{ij} x_{ij} + 3 ∑_{i∈F} f_i y_i ≤ 3 ∑_{j∈C} α_j.

We will show how the dual variables α_j pay for the primal costs of opening facilities
and connecting cities to facilities. Denote by α_j^f and α_j^c the contributions of city j to these
two costs respectively; α_j = α_j^f + α_j^c. If j is indirectly connected then α_j^f = 0 and α_j^c = α_j.
If j is directly connected then the following must hold:

    α_j = c_{ij} + β_{ij},

where i = a(j). Now let α_j^f = β_{ij} and α_j^c = c_{ij}.

Lemma 8.4. Let i ∈ I, then

    ∑_{j:a(j)=i} α_j^f = f_i.

Proof. Since i is temporarily open at the end of phase one, it is completely paid for, i.e.,

    ∑_{j:ij is special} β_{ij} = f_i.

The critical observation is that each city j that has contributed to f_i must be directly
connected to i. For each such city, α_j^f = β_{ij}. Any other city j′ that is connected to facility
i must satisfy α_{j′}^f = 0.

Corollary 8.5. ∑_{i∈I} f_i = ∑_{j∈C} α_j^f.

Recall that α_j^f was defined to be 0 for indirectly connected cities. Thus, only the
directly connected cities pay for the cost of opening facilities.

Lemma 8.6. For an indirectly connected city j, c_{ij} ≤ 3 α_j^c, where i = a(j).

Proof. Let i′ be the connecting witness for city j. Since j is indirectly connected to i, the
edge ii′ must be an edge in H. In turn, there must be a city, say j′, such that ij′ and
i′j′ are both special edges. Let t_1 and t_2 be the times at which i and i′ were declared
temporarily open during phase 1.
Since edge i′j is tight, α_j ≥ c_{i′j}. We will show that α_j ≥ c_{ij′} and α_j ≥ c_{i′j′}. Then, the
lemma will follow by using the triangle inequality.
Since edges ij′ and i′j′ are tight, α_{j′} ≥ c_{ij′} and α_{j′} ≥ c_{i′j′}. Since both these edges are
special, they must both have gone tight before either i or i′ is declared temporarily open.
Consider the time min{t_1, t_2}. Clearly, α_{j′} cannot be growing beyond this point in time
and we have α_{j′} ≤ min{t_1, t_2}. Finally, since i′ is the connecting witness for j, α_j ≥ t_2.
Therefore α_j ≥ α_{j′}, and the required inequalities follow.

Proof of Theorem 8.3. For a directly connected city j, c_{ij} = α_j^c ≤ 3 α_j^c, where i = a(j).
With Lemma 8.6 we get

    ∑_{i∈F} ∑_{j∈C} c_{ij} x_{ij} ≤ 3 ∑_{j∈C} α_j^c.

Adding to this the equality given in Corollary 8.5 multiplied by 3 yields the claim.

Running Time
Clearly, the total number of edges of the complete bipartite graph G = (F ∪ C, E) is
m = |F ||C|. For the implementation of the algorithm sort all the edges by increasing cost.
This is the ordering in which they go tight. For each facility i, we maintain the number of
cities that are currently contributing to it, and the anticipated time ti , at which it would
be completely paid for if no other event happens on the way. Initially all ti ’s are infinite
and each facility has 0 cities contributing to it. The ti ’s are maintained in a binary heap,
so we can update each one and find the current minimum in O (log |F |) time. Two types
of events happen and they lead to the following updates:

An edge ij goes tight. If facility i is not temporarily open, then it gets one more city
contributing towards its cost. The corresponding amount can easily be computed.
Thus, the anticipated time for facility i to be paid for can be computed in constant
time. The heap can be updated in O (log |F |) time.
If facility i is already temporarily open, city j is declared connected, and αj is not
raised anymore. For each facility i0 that was counting j as a contributor, we need
to decrease the number of contributors by 1 and recompute the anticipated time at
which it gets paid for.

Facility i is completely paid for. In this event, i will be declared temporarily open,
and all cities contributing to i will be declared connected. For each of these cities,
we will execute the second case of the previous event, i.e., update facilities that they
were contributing towards.

Observe that each edge ij will be considered at most twice. First, when it goes tight,
and second when city j is declared connected. For each consideration of this edge, we will
do O (log |F |) work. This discussion of the running time together with Theorem 8.3 yields
Theorem 8.1.

Tight Example
The following family of examples shows that the analysis of the algorithm is tight.

Example 8.7. There are two facilities with opening cost f1 = ε and f2 = (n + 1)ε. There
are n cities where city one is at distance one from facility one, but the remaining cities are
at distance three from it, and each city is at distance one from facility two. The optimal
solution is to open facility two at total cost (n + 1)ε + n. The algorithm will open facility
one and connect all cities to it at total cost ε + 1 + 3(n − 1).
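The two costs in Example 8.7 can be checked numerically (a small sketch; as ε → 0 and n → ∞ the ratio of the algorithm's cost to the optimum approaches 3):

```python
def tight_example_costs(n, eps):
    """Example 8.7: the algorithm opens facility one (opening cost eps,
    city one at distance 1, the other n-1 cities at distance 3); the
    optimum opens facility two (opening cost (n+1)*eps, all distances 1)."""
    alg = eps + 1 + 3 * (n - 1)
    opt = (n + 1) * eps + n
    return alg, opt
```

For n = 10000 and ε = 10⁻⁶ the ratio is already above 2.99, matching the factor-3 guarantee of Theorem 8.1.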

Chapter 9

Makespan Scheduling

In this chapter, we consider the classical Makespan Scheduling problem. We are given
m machines for scheduling, indexed by the set M = {1, . . . , m}. There are furthermore
given n jobs, indexed by the set J = {1, . . . , n}, where job j takes p_{i,j} units of time
if scheduled on machine i. Let J_i be the set of jobs scheduled on machine i. Then
ℓ_i = ∑_{j∈J_i} p_{i,j} is the load of machine i. The maximum load ℓ_max = c_max = max_{i∈M} ℓ_i is
called the makespan of the schedule.
The problem is NP-hard, even if there are only two identical machines. However, we
will derive several constant factor approximations and a PTAS for identical machines and
a 2-approximation for the general case.

9.1 Identical Machines


In the special case of identical machines, we have that pi,j = pj for all i ∈ M and all j ∈ J.
Here pj is called the length of job j.

List Scheduling
As a warm-up we consider the following two heuristics for Makespan Scheduling. The
List Scheduling algorithm works as follows: Determine any ordering of the job set J,
stored in a list L. Starting with all machines empty, determine the machine i with the
currently least load and schedule the respective next job j in L on i. The load of i before
the assignment of j is called the starting time sj of job j and the load of i after the
assignment is called the completion time cj of job j. In the Sorted List Scheduling
algorithm we execute List Scheduling, where the list L consists of the jobs in decreasing
order of length.
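Both heuristics can be sketched with a heap of machine loads (a Python sketch; with sorted_order=True this is Sorted List Scheduling):

```python
import heapq

def list_scheduling(lengths, m, sorted_order=False):
    """Assign each job in list order to the currently least loaded of the
    m identical machines; with sorted_order=True the list is processed in
    decreasing order of length. Returns the makespan of the schedule."""
    jobs = sorted(lengths, reverse=True) if sorted_order else list(lengths)
    heap = [(0, i) for i in range(m)]  # (current load, machine index)
    heapq.heapify(heap)
    for p in jobs:
        load, i = heapq.heappop(heap)  # least loaded machine
        heapq.heappush(heap, (load + p, i))
    return max(load for load, _ in heap)
```

On the instance with lengths 1, 1, 1, 3 and two machines, the arbitrary order yields makespan 4 while the sorted order yields the optimal makespan 3, illustrating the advantage of sorting.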

Theorem 9.1. The List Scheduling algorithm is a 2-approximation for Makespan
Scheduling on identical machines.

Proof. Let T* be the optimal makespan of the given instance. We show that s_j ≤ T* for
all j ∈ J. This implies c_j = s_j + p_j ≤ T* + p_j ≤ 2 · T* for all j ∈ J, since we clearly must
have T* ≥ p_j for all j ∈ J.
Assume that s_j > T* for some j ∈ J. Then the load before the assignment
of j is ℓ_i > T* for all i ∈ M. Thus the jobs J′ ⊆ J scheduled before j by the algorithm
have total length ∑_{j′∈J′} p_{j′} > m · T*. On the other hand, since the optimum solution
schedules all jobs J until time T* we have ∑_{j∈J} p_j ≤ m · T*. This is a contradiction,
and hence the List Scheduling algorithm must start all jobs not later than time T*.

Here we show that Sorted List Scheduling is a 3/2-approximation, but one can
actually prove that the algorithm is a 4/3-approximation.

Theorem 9.2. The Sorted List Scheduling algorithm is a 3/2-approximation for
Makespan Scheduling on identical machines.

Proof. Let T ∗ be the optimal makespan of the given instance. Partition the jobs JL =
{j ∈ J : pj > T ∗ /2} and JS = J − JL , called large and small jobs. Notice that there can
be at most m large jobs: Assume that there are more than m such jobs. Then, in any
schedule, including the optimal one, there must be at least two such jobs scheduled on
some machine. Since the length of a large job is more than T ∗ /2, this contradicts that T ∗
is the optimal makespan.
Since there are at most m large jobs and the algorithm schedules those first and hence
on individual machines, we have that each large job completes not later than T ∗ , i.e.,
cj ≤ T ∗ for all j ∈ JL . Thus, if a job completes later than T ∗ it must be a small job having
length at most T ∗ /2. Since each job starts not later than T ∗ we have cj ≤ T ∗ +pj ≤ 3/2·T ∗
for every small job j ∈ JS .

Polynomial Time Approximation Scheme


In this section we give a polynomial time approximation scheme (PTAS) for Makespan
Scheduling on identical machines. This means, for any error parameter ε > 0, there is
an algorithm which determines a (1 + ε)-approximate solution with running time poly-
nomial in the input size, but arbitrary in 1/ε. We will give a PTAS with running time
O(n^{2k} · ⌈log₂ 1/ε⌉), where k = ⌈log_{1+ε} 1/ε⌉.
The two main ingredients of the algorithm are these:

(1) Firstly, assume that we are given the optimal makespan T ∗ at the outset. Then we
can try to construct a schedule with makespan at most (1 + ε) · T ∗ . But how do
we determine the number T ∗ ? It turns out that we can perform binary search in an
interval [α, β], where α is any lower bound on T ∗ and β any upper bound on T ∗ .
This binary search will enable us to eventually find a number B, which is within
(1 + ε) times T ∗ and where the number of binary search iterations depends on the
error parameter ε.

(2) Secondly, assume that the number of distinct values of job lengths is a constant k,
say. Then we can determine all configurations of jobs that do not violate a load bound
of t if scheduled on a single machine. This is the basis of a dynamic programming
scheme to determine a schedule on m machines. Of course, this approach involves
rounding the original job lengths to constantly many values, which introduces some
error. The error can be controlled by adjusting the constant k of distinct job lengths
at the expense of running time and space requirement for the dynamic programming
table.

Consider the instance J with jobs of lengths p_1, . . . , p_n. Let t be a parameter and
let m(J, t) be the smallest number of machines required to schedule the jobs J having
makespan at most t. With this definition, the minimum makespan T* is given by T* =
min{t : m(J, t) ≤ m}. We will later perform binary search on the parameter t, in the
interval [α, β], where α = max{max_{j∈J} p_j, (1/m) ∑_{j∈J} p_j} and β = 2 · α. Notice that α is
a lower bound on T* and β an upper bound on T*.

Dynamic Programming. Assume for now that |{p_1, . . . , p_n}| = k, i.e., there are k
distinct job lengths. Fix an ordering of the job lengths. Then a k-tuple (i_1, . . . , i_k)
describes for any ℓ ∈ {1, . . . , k} the number i_ℓ of jobs having the respective length. For
any k-tuple (i_1, . . . , i_k) let m(i_1, . . . , i_k, t) be the smallest number of machines needed to
schedule these jobs having makespan at most t. For a given parameter t and an instance
(n_1, . . . , n_k) with ∑_{ℓ=1}^k n_ℓ = n, we first compute the set Q of all k-tuples (q_1, . . . , q_k) such
that m(q_1, . . . , q_k, t) = 1 and 0 ≤ q_ℓ ≤ n_ℓ for ℓ = 1, . . . , k, i.e., all sets of jobs that can be
scheduled on a single machine with makespan at most t. Clearly, Q contains at most O(n^k)
elements. Having these numbers computed, we determine the entries m(i_1, . . . , i_k, t) for
every (i_1, . . . , i_k) ∈ {0, . . . , n_1} × · · · × {0, . . . , n_k} of a k-dimensional table as follows: The
table is initialized by setting m(q, t) = 1 for every q ∈ Q. Then we use the following
recurrence to compute the remaining entries:

    m(i_1, . . . , i_k, t) = 1 + min_{q∈Q} m(i_1 − q_1, . . . , i_k − q_k, t).

Computing each entry takes O(n^k) time. Thus the entire table can be computed in
O(n^{2k}) time, thereby determining m(n_1, . . . , n_k, t) in polynomial time provided that k is
a constant.
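For constant k the table computation can be sketched with memoization (a Python sketch; Q is the set of one-machine configurations and the recursion is the recurrence above):

```python
from functools import lru_cache
from itertools import product

def min_machines(counts, lengths, t):
    """m(n_1,...,n_k, t): fewest machines to schedule counts[l] jobs of
    each distinct length lengths[l] with makespan at most t. Returns
    infinity if some single job is longer than t."""
    # Q: all non-empty sub-multisets that fit on one machine with load <= t
    Q = [q for q in product(*(range(c + 1) for c in counts))
         if any(q) and sum(ql * pl for ql, pl in zip(q, lengths)) <= t]

    @lru_cache(maxsize=None)
    def m(i):
        if not any(i):
            return 0  # no jobs left: zero machines
        best = float('inf')
        for q in Q:
            if all(ql <= il for ql, il in zip(q, i)):
                best = min(best, 1 + m(tuple(il - ql for il, ql in zip(i, q))))
        return best

    return m(tuple(counts))
```

For example, two jobs of length 3 and three of length 2 need two machines for makespan bound t = 6 ({3, 3} and {2, 2, 2}) but three machines for t = 5.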

Rounding. Let ε > 0 be an error parameter and let t ∈ [α, β] as defined above. We
say that a job j is small if p_j < ε · t. Small jobs are removed from the instance for
now. The rest of the job lengths are rounded down as follows: If a job j has length
p_j ∈ [t · ε · (1 + ε)^i, t · ε · (1 + ε)^{i+1}) for i ≥ 0, it is replaced by p′_j = t · ε · (1 + ε)^i. Thus
there can be at most k = ⌈log_{1+ε} 1/ε⌉ many distinct job lengths. Now we invoke the
above dynamic programming scheme and determine the optimal number of machines
for scheduling these jobs if the makespan is at most t. Since the rounding reduces the
length of each job by a factor of at most (1 + ε), the computed schedule has makespan
at most (1 + ε) · t when considering the original job lengths. Now we schedule the small
jobs greedily in leftover space and open new machines if needed. Clearly, whenever a new
machine is opened, all previous machines must be loaded to an extent of at least t. Denote
by a(J, t, ε) the number of machines used by this algorithm. Recall that the makespan is
at most (1 + ε) · t.

Lemma 9.3. We have that a(J, t, ε) ≤ m(J, t).

Proof. If the algorithm does not open any new machines for small jobs, then the assertion
clearly holds since the rounded down jobs have been scheduled optimally with makespan
t. In the other case, all but the last machine are loaded to the extent of t. Hence, the
optimal schedule of J having makespan t must also use at least a(J, t, ε) machines.

Corollary 9.4. We have that T = min{t : a(J, t, ε) ≤ m} ≤ min{t : m(J, t) ≤ m} = T ∗ .

Binary Search. If T could be determined with no additional error during the binary
search, then clearly we could use the above algorithm to obtain a schedule with makespan
at most (1 + ε) · T ∗ . Next, we will specify the details of the binary search and show how
to control the error it introduces. The binary search is performed in the interval [α, β] as
defined above. Thus, the length of the available interval is β − α = α at the start of the
search and it reduces by a factor of two in each iteration. We continue the search until it
drops to a length of at most ε · α. This will require dlog2 1/εe many iterations. Let B be
the right endpoint of the interval [A, B] we terminate with.

Lemma 9.5. We have that B ≤ (1 + ε) · T ∗ .

Proof. Clearly T = min{t : a(J, t, ε) ≤ m} must be in the interval [B − ε · α, B]. Hence

B ≤ T + ε · α ≤ (1 + ε) · T ∗ ,

where we have used T ∗ ≥ α and Corollary 9.4.

Running the algorithm with t = B directly gives the result we wanted to show:

Theorem 9.6. For any 0 < ε ≤ 1 the algorithm produces a schedule with makespan
at most (1 + ε)^2 · T ∗ ≤ (1 + 3ε) · T ∗ within running time O(n^{2k} · dlog2 1/εe),
where k = dlog1+ε 1/εe.

9.2 Unrelated Machines


Here we give a 2-approximation algorithm for Makespan Scheduling on unrelated ma-
chines, which means that job j takes time pi,j if scheduled on machine i. The algorithm
is based on a suitable LP-formulation and a procedure for rounding the LP.
An obvious integer program for this problem is the following: Let xi,j be a variable
indicating if job j is assigned to machine i. The objective is to minimize the makespan.
The first set of constraints ensures that each job is scheduled on one of the machines and
the second ensures that each machine has a load of at most t.

minimize t
subject to   Σ_{i∈M} xi,j = 1,              j ∈ J,
             Σ_{j∈J} pi,j · xi,j ≤ t,       i ∈ M,
             xi,j ∈ {0, 1}.

If we relax the constraints xi,j ∈ {0, 1} to xi,j ∈ [0, 1], it turns out that this formulation
has unbounded integrality gap. (It is left as an exercise to show this.) The main cause of
the problem is an “unfair” advantage of the LP-relaxation: If pi,j > t, then we must have
xi,j = 0 in any feasible integer solution, but we might have xi,j > 0 in feasible fractional
solutions. However, we cannot formulate the statement “if pi,j > t then xi,j = 0” in terms
of linear constraints.

Parametric Pruning. We will make use of a technique called parametric pruning to


overcome this difficulty. Let the parameter t be a “guess” of a lower bound for the actual
makespan T ∗ . Of course, we will do binary search on t in order to determine a suitable
value in an outside loop. However, having a value for t fixed, we are now able to enforce
constraints xi,j = 0 for all machine-job pairs i, j for which pi,j > t. Define St = {(i, j) :
pi,j ≤ t}. We now define a family lp(t) of linear programs, one for each value of the
parameter t. lp(t) uses the variables xi,j for which (i, j) ∈ St and asks if there is a feasible

solution using the restricted assignment possibilities, only.

minimize 0                                            (lp(t))
subject to   Σ_{i:(i,j)∈St} xi,j = 1,                 j ∈ J,
             Σ_{j:(i,j)∈St} pi,j · xi,j ≤ t,          i ∈ M,
             xi,j ≥ 0,                                (i, j) ∈ St .

Extreme Point Solutions. With a binary search, we find the smallest value for t such
that lp(t) has a feasible solution. Let T be this value and observe that T ∗ ≥ T , i.e., the
actual makespan is bounded from below by T . Our algorithm will “round” an extreme
point solution of lp(T ) to yield a schedule with makespan at most 2 · T ∗ . Extreme point
solutions to lp(T ) have several useful properties.
Lemma 9.7. Any extreme point solution to lp(T ) has at most n + m many non-zero
variables.
Proof. Let r = |ST | represent the number of variables on which lp(T ) is defined. Recall
that a feasible solution is an extreme point solution to lp(T ) if and only if it sets r many
linearly independent constraints to equality. Of these r linearly independent constraints,
at least r − (n + m) must be chosen from the third set of constraints, i.e., of the form
“xi,j ≥ 0”. The corresponding variables are set to zero. So, any extreme point solution
has at most n + m many non-zero variables.

Let x be an extreme point solution to lp(T ). We will say that job j is integrally set
if xi,j ∈ {0, 1} for all machines i. Otherwise, i.e., xi,j ∈ (0, 1) for some machine i, job j is
said to be fractionally set.
Corollary 9.8. Any extreme point solution to lp(T ) must set at least n − m many jobs
integrally.
Proof. Let x be an extreme point solution to lp(T ) and let α and β be the number of
jobs that are integrally and fractionally set by x, respectively. Each job of the latter kind
is assigned to at least 2 machines and therefore results in at least 2 non-zero entries in x.
Hence we get α + β = n and α + 2β ≤ n + m. Therefore β ≤ m and α ≥ n − m.

Algorithm. The algorithm starts by computing the range in which it finds the right
value for T . For this it constructs a greedy schedule, in which each job is assigned to
the machine on which it has the smallest length. Let α be the makespan of this schedule.
Then the range is [α/m, α] (and it is an exercise to show that α/m is indeed a lower bound
on T ∗ ).
The LP-rounding algorithm is based on several interesting properties of extreme point
solutions of lp(T ), which we establish now. For any extreme point solution x for lp(T )
define a bipartite graph G = (M ∪ J, E) such that (i, j) ∈ E if and only if xi,j > 0. Let
F ⊆ J be the fractionally set jobs in x and let H be the subgraph of G induced by the
vertex set M ∪ F . Clearly (i, j) ∈ E(H) if 0 < xi,j < 1. A matching in H is called perfect
if it matches every job j ∈ F . We will show and use that the graph H admits perfect
matchings.
We say that a connected graph on a vertex set V is a pseudo tree if it has at most |V |
many edges. Since the graph is connected, it must have at least |V | − 1 many edges. So,

Algorithm 9.1 Schedule Unrelated
Input. J, M , pi,j for all i ∈ M and j ∈ J

Output. xi,j ∈ {0, 1} for all i ∈ M and j ∈ J

Step 1. By binary search in [α/m, α] compute smallest value T of the parameter t such
that lp(t) has a feasible solution.

Step 2. Let x be an extreme point solution for lp(T ).

Step 3. Construct graph H and find perfect matching P .

Step 4. Round in x all fractionally set jobs according to the matching P .

it is either a tree or a tree with an additional single edge (closing exactly one cycle). A
graph is a pseudo forest if each of its connected components is a pseudo tree.
Lemma 9.9. We have that G is a pseudo forest.
Proof. We will show that the number of edges in each connected component of G is
bounded by the number of vertices in it. Hence, each connected component is a pseudo
tree.
Consider a connected component Gc . Restrict lp(T ) and the extreme point solution x
to the jobs and machines of Gc , only, to obtain lpc (T ) and xc . Let xc̄ represent the rest
of x. The important observation is that xc must be an extreme point solution for lpc (T ).
Suppose that this is not the case. Then, xc is a convex combination of two feasible solutions
to lpc (T ). Each of these, together with xc̄ form a feasible solution for lp(T ). Therefore x
is a convex combination of two feasible solutions to lp(T ). But this contradicts the fact
that x is an extreme point solution. By Lemma 9.7, applied to lpc (T ), the number of
non-zero variables of xc , and hence the number of edges of Gc , is bounded by the number
of its vertices, so Gc is a pseudo tree.
Lemma 9.10. Graph H has a perfect matching P .
Proof. Each job that is integrally set in x has exactly one edge incident at it in G. Remove
these jobs together with their incident edges from G. The resulting graph is clearly H.
Since an equal number of edges and vertices have been removed from the pseudo forest
G, H is also a pseudo forest.
In H, each job has a degree of at least two. So, all leaves in H must be machines.
Keep matching a leaf with the job it is incident to and remove them both from the graph.
(At each stage all leaves must be machines.) In the end we will be left with even cycles
(since we started with a bipartite pseudo forest.) Match alternating edges of each cycle.
This gives a perfect matching P .
Theorem 9.11. Algorithm Schedule Unrelated is a 2-approximation for Makespan
Scheduling on unrelated machines.
Proof. Clearly T ≤ T ∗ since lp(T ∗ ) has a feasible solution. The extreme point solution
x to lp(T ) has a fractional makespan of at most T . Therefore, the restriction of x to
integrally set jobs has an integral makespan of at most T . Each edge (i, j) of H satisfies
pi,j ≤ T . The perfect matching found in H schedules at most one extra job on each
machine. Hence, the total makespan is at most 2 · T ≤ 2 · T ∗ as claimed. The algorithm
clearly runs in polynomial time.
It is an exercise to show that the analysis is tight for the algorithm.

Chapter 10

Bin Packing

Here we consider the classical Bin Packing problem: We are given a set I = {1, . . . , n} of
items, where item i ∈ I has size si ∈ (0, 1] and a set B = {1, . . . , n} of bins with capacity
one. Find an assignment a : I → B such that the number of non-empty bins is minimal.
As a shorthand, we write s(J) = Σ_{j∈J} sj for any J ⊆ I.

10.1 Hardness of Approximation


The Bin Packing problem is NP-complete. More specifically:
Theorem 10.1. It is NP-complete to decide if an instance of Bin Packing admits a
solution with two bins.
Proof. We reduce from Partition, which we know is NP-complete. Recall that in the
Partition problem, we are given n numbers c1 , . . . , cn ∈ N and are asked to decide if
there is a set S ⊆ {1, . . . , n} such that Σ_{i∈S} ci = Σ_{i∉S} ci . Given a Partition
instance, we create an instance for Bin Packing by setting si = 2ci /(Σ_{j=1}^n cj ) ∈ (0, 1]
for i = 1, . . . , n. Obviously two bins suffice if and only if there is a set S ⊆ {1, . . . , n}
such that Σ_{i∈S} ci = Σ_{i∉S} ci .
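The size construction in the reduction can be sketched in a few lines (the function name is hypothetical; exact fractions are used to avoid floating-point issues):

```python
from fractions import Fraction

def partition_to_bin_packing(c):
    """Map a Partition instance c_1,...,c_n to Bin Packing item sizes
    s_i = 2*c_i / sum(c), so that the sizes total exactly 2 and two
    bins suffice iff c admits a perfect partition."""
    total = sum(c)
    return [Fraction(2 * ci, total) for ci in c]
```

For example, [3, 1, 1, 2, 2, 1] sums to 10 and admits the partition {3, 2} vs. {1, 1, 2, 1}; the corresponding sizes sum to exactly 2.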

This allows us to derive a lower bound on the approximability of Bin Packing.


Corollary 10.2. There is no ρ-approximation algorithm with ρ < 3/2 for Bin Packing
unless P = NP.

10.2 Heuristics
We will show that there are constant factor approximations for Bin Packing. Firstly
we consider the probably most simple Next Fit algorithm, which can be shown to be
2-approximate. Secondly, we give the First Fit Decreasing algorithm and show that
it is 3/2-approximate. Thus, with the above hardness result, this is best-possible, unless
P = NP.

Next Fit
The Next Fit algorithm works as follows: Initially all bins are empty and we start with
bin j = 1 and item i = 1. If bin j has residual capacity for item i, assign item i to bin j,
i.e., a(i) = j, and consider item i + 1. Otherwise consider bin j + 1 and item i. Repeat
until item n is assigned.
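A direct implementation of Next Fit might look as follows (a sketch; representing bins as lists of item indices is my own choice):

```python
def next_fit(sizes):
    """Next Fit: keep a single open bin; when the next item does not
    fit, close the bin for good and open a new one."""
    bins, current, load = [], [], 0.0
    for i, s in enumerate(sizes):
        if load + s > 1:
            bins.append(current)        # a closed bin is never revisited
            current, load = [], 0.0
        current.append(i)
        load += s
    if current:
        bins.append(current)
    return bins
```

Since each item is looked at once and only one bin is ever open, this runs in O(n) time.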

Theorem 10.3. Next Fit is a 2-approximation for Bin Packing. The algorithm runs
in O (n) time.

Proof. Let k be the number of non-empty bins in the assignment a found by Next Fit.
Let k ∗ be the optimal number of bins. We show the slightly stronger statement that

k ≤ 2 · k ∗ − 1.

Firstly we observe the lower bound k ∗ ≥ ds(I)e. Secondly, for bins j = 1, . . . , bk/2c we
have
                    Σ_{i : a(i)∈{2j−1, 2j}} si > 1.

Adding these inequalities we get

                    bk/2c < s(I).

Since the left hand side is an integer we have that

                    (k − 1)/2 ≤ bk/2c ≤ ds(I)e − 1.

This proves k ≤ 2 · ds(I)e − 1 ≤ 2 · k ∗ − 1 and hence the claim.

The analysis is tight for the algorithm, which can be seen with the following instance
with 2n items. For some ε > 0 let s2i−1 = 2 · ε, s2i = 1 − ε for i = 1, . . . , n.

First Fit Decreasing


The algorithm Next Fit never considers bins again that have been left behind. Thus
the wasted capacity therein leaves room for improvement. A natural way is First Fit:
Initially all bins are empty and we start with current number of bins k = 0 and item i = 1.
Consider all bins j = 1, . . . , k and place item i in the first bin that has sufficient residual
capacity, i.e., a(i) = j. If there is no such bin increment k and repeat until item n is
assigned. One can prove that First Fit uses at most k ≤ d17/10 · k ∗ e many bins, where
k ∗ is the optimal number.
There is a further natural heuristic improvement of First Fit, called First Fit
Decreasing: Reorder the items such that s1 ≥ · · · ≥ sn and apply First Fit. The
intuition behind considering large items first is the following: “Large” items do not fit into
the same bin anyway, so we already use unavoidable bins and try to place “small” items
into the residual space.
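A straightforward sketch of First Fit Decreasing (the function name and the load-list representation are my own):

```python
def first_fit_decreasing(sizes):
    """First Fit Decreasing: consider items in non-increasing order of
    size and put each into the first bin with enough residual capacity,
    opening a new bin only when none fits.  Returns the number of bins."""
    loads = []                               # current load of each open bin
    for s in sorted(sizes, reverse=True):
        for j, load in enumerate(loads):
            if load + s <= 1:
                loads[j] = load + s
                break
        else:                                # no existing bin fits
            loads.append(s)
    return len(loads)
```

On the Next Fit tight example above (alternating sizes 2ε and 1 − ε), sorting first lets the small items share bins, illustrating why the decreasing order helps.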

Theorem 10.4. First Fit Decreasing is a 3/2-approximation for Bin Packing. The
algorithm runs in O(n^2) time.

Proof. Let k be the number of non-empty bins of the assignment a found by First Fit
Decreasing and let k ∗ be the optimal number.
Consider bin number j = d2/3ke. If it contains an item i with si > 1/2, then each bin
j 0 < j did not have space for item i. Thus j 0 was assigned an item i0 with i0 < i. As the
items are considered in non-increasing order of size we have si0 ≥ si > 1/2. That is, there
are at least j items of size larger than 1/2. These items need to be placed in individual
bins. This implies
                    k ∗ ≥ j ≥ 2/3 · k.

Otherwise, bin j and any bin j 0 > j do not contain an item with size larger than 1/2.
Hence the bins j, j + 1, . . . , k contain at least 2(k − j) + 1 items, none of which fits into
the bins 1, . . . , j − 1. Thus we have

s(I) > min{j − 1, 2(k − j) + 1}
     ≥ min{d2/3 ke − 1, 2(k − (2/3 k + 2/3)) + 1}
     = d2/3 ke − 1

and k ∗ ≥ s(I) > d2/3 ke − 1. Since k ∗ is an integer, this even implies

                    k ∗ ≥ d2/3 ke ≥ 2/3 · k
and hence the claim.

10.3 Asymptotic Polynomial Time Approximation Scheme


With the hardness result that there is no approximation algorithm for Bin Packing with
guarantee better than 3/2, unless P = NP, we do not have to search for a PTAS (or even
an FPTAS). However, notice that the reduction used that the optimal number of bins is
“small”, such as 2 or 3. It is plausible that, in “practical” instances, the optimal number
k ∗ of bins grows as the number of items grows. Maybe we can do better for those instances.
This leads us to define: An asymptotic polynomial time approximation scheme (APTAS)
is a family of algorithms, such that for any ε > 0 there is a number k 0 and a
(1 + ε)-approximation algorithm, whenever k ∗ ≥ k 0 . For Bin Packing such a family
exists. However, the involved running times are rather high, even though polynomial in n.
Theorem 10.5. For any 0 < ε ≤ 1/2 there is an algorithm that runs in time polynomial
in n and finds an assignment having at most k ≤ (1 + ε) · k ∗ + 1 many bins.
Lemma 10.6. Let ε > 0 and d ∈ N be constants. For any instance of Bin Packing
where si ≥ ε and |{s1 , . . . , sn }| ≤ d, there is a polynomial time algorithm that solves it
optimally.
Proof. The number of items in a bin is bounded by m := b1/εc. Therefore, the number
of different assignments for one bin is bounded by r = (m + d choose m), which is a
(large) constant. There are at most n bins used and therefore, the number of feasible
assignments is bounded by p = (n + r choose r). This is a polynomial in n. Thus we can
enumerate all assignments and choose the best one to give an optimum solution.

Lemma 10.7. Let ε > 0 be a constant. For any instance of Bin Packing where si ≥ ε,
there is a (1 + ε)-approximation algorithm.
Proof. Let I be the given instance. Sort the n items by increasing size and partition them
into g = d1/ε2 e many groups each having at most q = bnε2 c many items. Notice that two
groups may contain items of the same size.
Construct an instance J by rounding up the size of each item to the size of the largest
item in its group. Instance J has at most g many different item sizes. Therefore, we
can find an optimal assignment for J by invoking Lemma 10.6. This is clearly a feasible
assignment for the original item sizes.
Now we show that k ∗ (J) ≤ (1 + ε)k ∗ (I): We construct another instance J 0 by rounding
down the size of each item to the smallest item size in its group. Clearly k ∗ (J 0 ) ≤ k ∗ (J).

The crucial observation is that an assignment for instance J 0 yields an assignment for all
but the largest q items of the instance J. Therefore

k ∗ (J) ≤ k ∗ (J 0 ) + q ≤ k ∗ (I) + q.

To finalize the proof, since each item has size at least ε, we have k ∗ (I) ≥ n · ε and
q = bnε2 c ≤ ε · k ∗ (I). Hence
k ∗ (J) ≤ (1 + ε) · k ∗ (I)
and the claim is established.
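The grouping-and-rounding step of the proof can be sketched as follows (a hypothetical helper; the guard q ≥ 1 is an assumption added for tiny instances, while the text simply takes q = bnε2 c):

```python
def round_up_groups(sizes, eps):
    """Linear grouping from Lemma 10.7 (sketch): sort the items by
    increasing size, cut the sorted list into groups of q = floor(n*eps^2)
    items (at least 1), and round every item up to its group maximum,
    leaving at most ~1/eps^2 distinct sizes."""
    n = len(sizes)
    q = max(1, int(n * eps * eps))
    srt = sorted(sizes)
    rounded = []
    for start in range(0, n, q):
        group = srt[start:start + q]
        rounded.extend([group[-1]] * len(group))   # round up to group max
    return rounded
```

The rounded instance then has few enough distinct sizes for the exact algorithm of Lemma 10.6 to apply.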

Proof of Theorem 10.5. Let I denote the given instance and I 0 the instance after discarding
the items with size less than ε from I. We can invoke Lemma 10.7 and find an assignment
which uses at most k(I 0 ) ≤ (1 + ε) · k ∗ (I 0 ) many bins. By using First Fit, we assign the
items with sizes less than ε into the solution found for instance I 0 . We use additional bins
if an item does not fit into any of the bins used so far.
If no additional bins are needed, then our assignment uses k(I) ≤ (1 + ε) · k ∗ (I 0 ) ≤
(1 + ε) · k ∗ (I) many bins. Otherwise, all but the last bin have residual capacity less than
ε. Thus s(I) ≥ (k(I) − 1)(1 − ε), which is a lower bound for k ∗ (I). Thus we have

k(I) ≤ k ∗ (I)/(1 − ε) + 1 ≤ (1 + 2ε) · k ∗ (I) + 1,
where we have used 0 < ε ≤ 1/2.

Chapter 11

Traveling Salesman

Here we study the classical Traveling Salesman problem: Given a complete graph
G = (V, E) on n vertices with non-negative edge cost c : E → R+ find a tour T , i.e., a
cycle in G which visits each vertex v ∈ V exactly once, having minimum cost(T ) =
Σ_{e∈T} c(e).

11.1 Hardness of Approximation


The disappointing first fact about Traveling Salesman is that, without assumptions
on the edge-cost, the problem can not be approximated, unless P = NP.
Theorem 11.1. Let α(n) be any polynomial time computable function. Then there is no
α(n)-approximation algorithm for Traveling Salesman, unless P = NP.
Proof. For sake of contradiction, assume that there is a polynomial time α(n)-approximation
algorithm Alg. We show that such an algorithm can be used to decide the NP-complete
Hamiltonian Cycle problem: Given a graph H = (V, E) on n
vertices, decide if H has a tour.
We transform any input H for the Hamiltonian Cycle problem into a graph G for
the Traveling Salesman problem as follows: V (G) = V (H), E(G) = {uv : u, v ∈ V },
and
                 c(e) = 1            if e ∈ E(H),
                 c(e) = α(n) · n     otherwise.
If H has no tour, then any tour T of G has cost cost(T ) > α(n) · n. This includes the tour
found by Alg.
If H has a tour, then G has a tour T ∗ with cost(T ∗ ) = n. Since Alg is a α(n)-
approximation algorithm, it produces a tour T with cost(T ) ≤ α(n) · n. Clearly T is also
a tour in H, since it can not traverse any edge with cost α(n) · n in G.
Therefore, Alg is a polynomial time algorithm which can be used to decide the Hamiltonian
Cycle problem, contradicting P ≠ NP.
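The cost construction of the reduction can be sketched as (hypothetical helper name; vertices are assumed to be 0, . . . , n − 1):

```python
def reduction_costs(n, edges_of_H, alpha):
    """Edge costs of the reduction: cost 1 on edges of H and a
    prohibitive alpha*n on every non-edge of the complete graph."""
    E = {frozenset(e) for e in edges_of_H}
    return {frozenset((u, v)): 1 if frozenset((u, v)) in E else alpha * n
            for u in range(n) for v in range(u + 1, n)}
```

Any tour that avoids all non-edges costs exactly n, while a single non-edge already pushes the cost above α(n) · n, which is what the case distinction in the proof exploits.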

11.2 Metric Traveling Salesman


As we have seen that the general Traveling Salesman problem cannot be approximated,
unless P = NP, we introduce assumptions on the edge-cost. A natural choice, called
Metric Traveling Salesman is that the cost satisfy the triangle inequality c(uv) ≤
c(uw) + c(wv) for all u, v, w ∈ V . The problem is still NP-hard but allows constant factor
approximations.

Spanning Tree Heuristic
Observe that the cost of any minimum spanning tree S of G is a lower bound for the
optimal tour T ∗ , i.e., cost(T ∗ ) ≥ cost(S). This is because the removal of any edge in any
tour T , including T ∗ , yields a spanning tree of G.
A connected graph G is called Eulerian if all its degrees are even. In this case it has an
Euler tour, i.e., it is possible to traverse the edges of G in a closed walk that visits each
edge exactly once. A respective algorithm can be implemented to run in O (n + m) time.

Algorithm 11.1 Spanning Tree Heuristic


Input. Complete graph G = (V, E), c : E → R+

Output. Tour T in G

Step 1. Compute minimum spanning tree S of G.

Step 2. Double the edges of S to obtain Eulerian graph D.

Step 3. Compute Euler tour Q in D.

Step 4. Compute tour T in G that traverses the vertices V in the order of their first
appearance in Q.

Step 5. Return T .

Theorem 11.2. The algorithm Spanning Tree Heuristic is a 2-approximation for


Metric Traveling Salesman.

Proof. Let T ∗ be an optimal tour in G. We clearly have cost(T ∗ ) ≥ cost(S) = cost(Q)/2.


In the construction of T , consider a situation, where T traverses an edge uv, while Q
traverses a path uw1 , w1 w2 , . . . , wk−1 wk , wk v. By the triangle inequality (applied multiple
times if necessary), we have

c(uv) ≤ c(uw1 ) + · · · + c(wk v).

Therefore cost(T ) ≤ cost(Q), which yields

cost(T ) ≤ 2 · cost(T ∗ ).

It remains to show that T is indeed a tour in G. Since T visits each vertex in the order of
first appearance in Q, i.e., at most once, and since Q visits each vertex at least once as S
is a spanning tree, T visits each vertex exactly once.

It is an exercise to give a tight example for this algorithm.

Christofides Algorithm
In the above heuristic, we doubled all the edges of the spanning tree S in order to obtain
an Eulerian graph D. Maybe there is a smarter way of finding such a graph. Recall that a
graph is Eulerian if all its degrees are even. Thus we do not have to be concerned about the
vertices with even degree in the spanning tree S. Also recall that the number of vertices
with odd degree in any graph is even, say k. Our goal will be to start with the spanning

tree S and obtain a graph D by adding a collection of edges (a matching) e1 , . . . , ek/2
between the vertices of odd degree in S. Observe that the even degrees in S remain even
in D and that the odd degrees in S become also even in D. Thus D is an Eulerian graph.
We want to find the cheapest possible matching of such kind.
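Such a minimum cost perfect matching can be computed in polynomial time (e.g. with Edmonds' blossom algorithm). As a small-scale stand-in, here is an exact but exponential brute-force sketch (function name mine):

```python
from math import inf

def min_cost_matching(vertices, cost):
    """Exact minimum-cost perfect matching on an even-sized vertex
    list by exhaustive recursion: match the first vertex with every
    candidate partner and recurse on the rest.  Exponential time --
    only a stand-in for the polynomial matching algorithm."""
    if not vertices:
        return 0, []
    u, rest = vertices[0], vertices[1:]
    best_val, best_match = inf, []
    for i, v in enumerate(rest):
        val, match = min_cost_matching(rest[:i] + rest[i + 1:], cost)
        if cost[u][v] + val < best_val:
            best_val = cost[u][v] + val
            best_match = [(u, v)] + match
    return best_val, best_match
```

In Christofides' algorithm this routine would be called on the odd-degree vertices W of the spanning tree, with c restricted to W.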

Algorithm 11.2 Christofides


Input. Complete graph G = (V, E), c : E → R+

Output. Tour T in G

Step 1. Compute minimum spanning tree S of G.

Step 2. Let W ⊆ V be the odd-degree vertices in S. Let H = (W, F ), where F = {vw :


v, w ∈ W }.

Step 3. Compute minimum cost perfect matching M in H (using the cost function c).

Step 4. Let D = S ∪ M and compute an Euler tour Q in D.

Step 5. Compute tour T in G that traverses the vertices V in the order of their first
appearance in Q.

Step 6. Return T .

Lemma 11.3. Let W ⊆ V such that |W | is even, let H = (W, F ), where F = {vw : v, w ∈
W }, and let M be a minimum cost perfect matching in H. Then

cost(T ∗ ) ≥ 2 · cost(M ).

Proof. First observe that H has a perfect matching since the graph is complete and has an
even number of vertices. Let T ∗ be an optimal tour in G and let T be the tour in H which
visits the vertices W in the same order as in T ∗ . For every edge uv ∈ T there is a path
uw1 , . . . , wk v ∈ T ∗ and by the triangle inequality we have c(uv) ≤ c(uw1 ) + · · · + c(wk v).
Therefore cost(T ∗ ) ≥ cost(T ). On the other hand, T is a cycle with an even number of edges.
Thus, by considering the edges alternatingly, T can be decomposed into two matchings
M1 and M2 . Clearly cost(M1 ) ≥ cost(M ) and cost(M2 ) ≥ cost(M ), which yields

cost(T ∗ ) ≥ cost(T ) = cost(M1 ) + cost(M2 ) ≥ 2 · cost(M )

as claimed.

Theorem 11.4. The algorithm Christofides is a 3/2-approximation for Metric Trav-


eling Salesman.

Proof. We have already argued that the graph D constructed is an Eulerian graph and,
by the triangle inequality, the constructed tour T has cost(T ) ≤ cost(D). Then we have
cost(T ) ≤ cost(D) = cost(S) + cost(M ) ≤ cost(T ∗ ) + 1/2 · cost(T ∗ ) = 3/2 · cost(T ∗ )
as claimed.

