Thesis A Komen Final June 6
Master Thesis
Author: Abel Komen
Utrecht University
First Supervisor: Dr. J. A. Hoogeveen
Second Supervisor: Dr. Ir. J. M. van den Akker
June 6, 2017
Abstract
In this thesis an attempt is made to find out how the type of uncertainty (discrete and finite, or polyhedral) influences the performance of Benders' decomposition [4] and Column & Constraint Generation [24] when solving the demand robust location-transportation problem. A generalization of Benders' decomposition is presented that makes it applicable to a large class of demand robust optimization problems. In addition, Column & Constraint Generation is adapted for use on discrete and finite uncertainty sets. In [24] it was shown that Column & Constraint Generation solves the problem considerably faster than a standard implementation of Benders' decomposition. The performance comparison for discrete and finite uncertainty sets made in this thesis is new. Furthermore, a number of techniques for speeding up Benders' decomposition are applied. Special attention is paid to the role of the MIP-solver that is used as a black box by both algorithms.
Contents
1 Introduction 3
3 Benders’ Decomposition 14
3.1 Benders’ Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Benders’ Decomposition for Demand Robust Optimization . . . . 17
3.2.1 Discrete Uncertainty Sets . . . . . . . . . . . . . . . . . . 17
3.2.2 Polyhedral Uncertainty Sets . . . . . . . . . . . . . . . . . 20
3.3 Demand Robust Location-Transportation . . . . . . . . . . . . . 21
3.3.1 Discrete Uncertainty Sets . . . . . . . . . . . . . . . . . . 21
3.3.2 Polyhedral Uncertainty Sets . . . . . . . . . . . . . . . . . 22
3.4 Algorithmic Enhancements . . . . . . . . . . . . . . . . . . . . . 23
3.4.1 2 Phase Method . . . . . . . . . . . . . . . . . . . . . . . 23
3.4.2 Using Incumbent Solutions . . . . . . . . . . . . . . . . . 24
5 Computational Results 29
5.1 Problem Generation . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.2.1 Discrete Uncertainty Sets . . . . . . . . . . . . . . . . . . 30
5.2.2 Polyhedral Uncertainty Sets . . . . . . . . . . . . . . . . . 33
6 Interpretation of Results 35
6.1 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.1.1 Problem Selection . . . . . . . . . . . . . . . . . . . . . . 36
6.1.2 Code and Algorithmic Choices . . . . . . . . . . . . . . . 37
6.1.3 Performance Variability . . . . . . . . . . . . . . . . . . . 37
6.2 Possible Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7 Conclusion 40
Chapter 1
Introduction
Real world optimization problems often include uncertainty in some form. The
easiest way to deal with this uncertainty is to use a deterministic model whose
solutions are close enough to the real outcome. Unfortunately, this is not always possible, and even when it is, explicitly capturing the uncertainty in the optimization problem may yield superior results. A number of ways to do this will be covered here.
There is ample literature regarding real world optimization problems that
deal with uncertainty. In the field of robustness, examples include railway-related optimization problems ([8], [16]), airline scheduling problems ([19]) and problems regarding electricity grid stability ([1], [14], [15], [23], [25]).
To illustrate the possibilities, consider the cutting-stock problem. The cutting-stock problem concerns the division of standard-sized pieces of supplied material into pieces of specific sizes. The objective is to minimize the waste
that is left over after the demand is fulfilled. This problem could for example be
encountered in a company that is supplied with pieces of sheet metal of a fixed
size that have to be cut into numerous pieces of specific sizes. The company uses m different sizes, and requires q_j pieces of the size with index j ∈ {1, . . . , m}. Each
standard-sized piece can be cut using a so-called pattern. The company has n
patterns and for each pattern i ∈ {1, . . . , n} parameter aij indicates the number
of pieces of size j it produces. Also, each pattern i has waste ci associated with
it. The variable xi indicates how often pattern i is utilized. Of course, xi is
integer. The deterministic version of cutting stock is given by

\[
\min_{x \in \mathbb{N}^n} \left\{ \sum_{i=1}^{n} c_i x_i \;\middle|\; \sum_{i=1}^{n} a_{ij} x_i \ge q_j \;\; \forall j \in \{1, \dots, m\} \right\}.
\]
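To make the formulation concrete, the deterministic model can be solved by brute force on a tiny made-up instance (all data below are hypothetical; a realistic instance would be handed to an ILP solver instead):

```python
from itertools import product

# Hypothetical toy instance: n = 3 patterns, m = 2 piece sizes.
waste = [2, 3, 1]                   # c_i: waste per use of pattern i
pieces = [[2, 0], [1, 2], [0, 1]]   # a_ij: pieces of size j cut by pattern i
demand = [4, 3]                     # q_j: required pieces of size j
MAX_USE = 6                         # crude upper bound on x_i for enumeration

best = None
for x in product(range(MAX_USE + 1), repeat=len(waste)):
    # demand constraints: sum_i a_ij * x_i >= q_j for every size j
    if all(sum(pieces[i][j] * x[i] for i in range(len(x))) >= demand[j]
           for j in range(len(demand))):
        cost = sum(waste[i] * x[i] for i in range(len(x)))
        if best is None or cost < best[0]:
            best = (cost, x)

print(best)   # -> (7, (2, 0, 3)): use pattern 0 twice and pattern 2 three times
```

Enumeration is only viable because the instance is tiny; it serves purely to illustrate what the model asks for.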
In the cutting-stock problem, uncertainty can arise in a number of ways.
It could be that demand qj for the cuts is unknown at the time of decision
making or that the number of cuts aij produced by a pattern is uncertain (due
to low quality of supplied material for example). Also, the amount of waste
produced by pattern i can be uncertain. For ease of exposition, assume that
the uncertainty can be adequately captured by some K scenarios {1, . . . , K}. This means that scenario k ∈ {1, . . . , K} is a set of parameters (c^k, q^k, a^k).
The strictest way to deal with this uncertainty is given by robust optimization. In robust optimization the chosen solution has to be feasible for all
possible scenarios and the objective is to minimize the costs of the most expen-
sive scenario. So the robust optimization version of cutting stock is given by

\[
\min_{x \in \mathbb{N}^n} \max_{k \in \{1, \dots, K\}} \left\{ \sum_{i=1}^{n} c_i^k x_i \;\middle|\; \sum_{i=1}^{n} a_{ij}^k x_i \ge q_j^k \;\; \forall j \in \{1, \dots, m\} \right\}.
\]
Robust optimization is a one stage approach to handling uncertainty, be-
cause all decisions have to be made at once before the outcome of the uncertain
processes is known and it is not possible to adjust the solution afterwards. On
the other hand, there are also two stage approaches. These allow for a decision
maker to observe the outcome of his plans and adjust where necessary. For cutting-stock this means that it is possible to execute a plan, check whether all demands are met, and if not, cut some more pieces of material.
A well known two stage approach is stochastic programming. The goal of two
stage stochastic programming is to minimize the cost of the first stage decisions
plus the average cost over all scenarios of the second stage decisions. Let the
first stage decisions be called yi . The second stage decisions for scenario k are
called xki . The cost for the first stage decisions are denoted by di and to avoid
a trivial situation where all decisions can be postponed to the second stage it
is necessary that di < cki . The two stage stochastic programming version of
cutting-stock is given by
\[
\begin{aligned}
\min_{y, x^k} \quad & \sum_{i=1}^{n} d_i y_i + \frac{1}{K} \sum_{k=1}^{K} \sum_{i=1}^{n} c_i^k x_i^k \\
\text{s.t.} \quad & \sum_{i=1}^{n} a_{ij}^k (y_i + x_i^k) \ge q_j^k \quad \forall j \in \{1, \dots, m\}, \; k \in \{1, \dots, K\} \\
& y, x^k \ge 0 \text{ and integer}
\end{aligned}
\]
\[
\begin{aligned}
\min_{y} \quad & \sum_{i=1}^{n} d_i y_i + \max_{k} \min_{x^k} \sum_{i=1}^{n} c_i^k x_i^k \\
\text{s.t.} \quad & \sum_{i=1}^{n} a_{ij}^k (y_i + x_i^k) \ge q_j^k \quad \forall j \in \{1, \dots, m\}, \; k \in \{1, \dots, K\} \\
& y, x^k \ge 0 \text{ and integer}
\end{aligned}
\]
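This min-max-min objective can be sketched on a made-up instance (two piece sizes, pattern matrix equal to the identity so pattern j yields one piece of size j, all numbers hypothetical): for every first-stage plan y, evaluate the worst-case recourse cost over the scenarios.

```python
from itertools import product

# Hypothetical toy: 2 piece sizes, 2 demand scenarios asking for opposite sizes.
d = [2, 2]              # first-stage cost d_i (d_i < c_i^k as the text requires)
c = [3, 3]              # second-stage cost c_i^k, the same in both scenarios
q = [(4, 0), (0, 4)]    # demand q_j^k per scenario k
B = 5                   # enumeration bound on every first-stage variable

def recourse(y, qk):
    # cheapest x^k with y_i + x_i^k >= q_i^k (the pattern matrix is the identity)
    return sum(c[i] * max(0, qk[i] - y[i]) for i in range(2))

best = min(
    (sum(d[i] * y[i] for i in range(2)) + max(recourse(y, qk) for qk in q), y)
    for y in product(range(B + 1), repeat=2)
)
print(best)   # first element is the optimal worst-case total cost
```

Because each scenario demands only one of the two sizes, committing to nothing in the first stage and patching per scenario beats covering both demands up front, which is exactly the flexibility the two-stage model buys.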
uncertainty in that paper. A polyhedral scenario set is used which could put
Benders’ at a disadvantage because the problem no longer can be solved by a
classical Benders’ but has to be solved by a variant of the algorithm.
Therefore, in this thesis Benders’ decomposition will be compared to column
and constraint generation for demand robust location-transportation with a
discrete scenario set and with a polyhedral uncertainty set. This will hopefully
shed some light on the strengths and weaknesses of both algorithms and could
help with future algorithm selection.
Also, the problem with discrete uncertainty sets is solved by a standard
Gurobi MIP-solver to compare the performance of the specialized algorithms to
an off-the-shelf standard solver. The results of these experiments can be found
in Chapter 5.
In Chapter 6 we will take a closer look at how to interpret the results of the
computational experiments.
Chapter 2
Demand Robust
Optimization
value of some decision variables can be chosen after the uncertainty is lifted.
From now on, variables that have to be chosen before the uncertainty is lifted
will be denoted by y and variables that can be chosen after the uncertainty
is lifted will be marked x, unless explicitly stated otherwise. The variables y
are called non-adjustable or first stage and the variables x are called adjustable
or second stage. It is possible that there are constraints that only affect the y
variables. The set Y will be the set of allowed values of y and is defined by
Y = {y|Dy ≥ e} ∩ N (throughout this report, the symbol N will be used to
denote the natural numbers including 0). The Adjustable RC (ARC, [3]) can
now be formulated as
The matrix A is called the recourse matrix. The problem is called a fixed
recourse problem when A is not uncertain. The vector u is called the demand
vector.
The concept of demand robustness (DR) is introduced in [11]. Without calling it demand robustness, [24] also gives a description of demand robustness.
The description provided here is modeled after the description in [24] but contin-
uing the notation used in the beginning of [3]. A DR problem aims to optimize
the total cost of the non-adjustable variables y plus the worst case costs of the
adjustable variables x while only the demand vector u is uncertain. To fit this
into the framework of 2.3 a variable η is added to the non-adjustable variables
accompanied by constraints of the form η ≥ cx. In this case, the costs of the
second stage decisions (captured by c) can also vary. The uncertainty set Z
is renamed U to emphasize this property and furthermore [11] only deals with
uncertainty sets that consist of scenarios, meaning that U is a discrete and finite set. So U = {u^1, . . . , u^K}. The discrete and finite nature of U allows a separate vector of variables x^k to be introduced for every scenario u^k.
A general DR problem in the style of 2.3 can be formulated as follows:
A different kind of uncertainty set for demand robust problems was intro-
duced in [5] where the uncertain values are not in a discrete and finite set but
in a special type of polyhedron. Every element u_j of u ∈ U has a base value u_j, to which some extra demand of at most ū_j can be added. So every u_j takes a value in u_j + g_j ū_j with g_j ∈ [0, 1]. On top of that, there is a limit Γ on the total deviation from the base values, which is enforced by the constraint \(\sum_j g_j \le \Gamma\). The uncertainty set can now be defined as

\[
U = \left\{ (u_1 + g_1 \bar{u}_1, \dots, u_m + g_m \bar{u}_m) \;\middle|\; \sum_j g_j \le \Gamma, \; g \in [0,1]^m \right\} \qquad (2.5)
\]
A DR problem with an uncertainty set as in 2.5 can be summarized as

\[
\min_{y \in Y} \left\{ dy + \max_{u \in U} \min_{x \ge 0} \{ cx \mid Ax \ge u - By \} \right\} \qquad (2.6)
\]
There is a large difference in solving 2.4 and 2.6. The discrete and finite
uncertainty set of 2.4 allows the problem to be formulated as one linear program.
This is not the case for 2.6. The way to solve the problem for some fixed ȳ can
be found in [13] and is repeated here. How to use this to solve the complete
problem is explained in the chapters on Benders’ decomposition (Chapter 3)
and Column & Constraint Generation (Chapter 4).
For fixed ȳ, problem 2.6 reduces to ([13])

\[
\begin{aligned}
\max_{u \in U} \min_{x} \quad & cx && (2.7) \\
\text{s.t.} \quad & Ax \ge u - B\bar{y} && (2.8) \\
& x \ge 0 && (2.9)
\end{aligned}
\]
Taking the dual of the inner minimization of 2.7 - 2.9 and combining the maximization problems yields

\[
\begin{aligned}
\max_{u, \pi} \quad & \pi (u - B\bar{y}) && (2.10) \\
\text{s.t.} \quad & \pi A \le c && (2.11) \\
& u \in U, \; \pi \ge 0 && (2.12)
\end{aligned}
\]

Writing every u_j as u_j + g_j ū_j according to 2.5 gives

\[
\begin{aligned}
\max_{g, \pi} \quad & \pi (u + g\bar{u} - B\bar{y}) && (2.13) \\
\text{s.t.} \quad & \pi A \le c && (2.14) \\
& \mathbf{1} g \le \Gamma && (2.15) \\
& g \in \{0, 1\}, \; \pi \ge 0 && (2.16)
\end{aligned}
\]

where gū denotes the componentwise product.
Note that g ∈ {0, 1} and not g ∈ [0, 1] because an optimal solution will
always be in an extreme point of the uncertainty set when Γ is integer ([13],
[24]).
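This extreme-point property can be checked numerically on a small made-up example: enumerating all g ∈ {0,1}^m with Σ_j g_j ≤ Γ gives the same worst case as greedily raising the Γ demands with the largest π_j ū_j (all data below are hypothetical).

```python
from itertools import product

# Hypothetical data for a budget uncertainty set: base demands u_j,
# deviations ubar_j, budget Gamma, and dual prices pi_j.
u     = [5.0, 3.0, 4.0]
ubar  = [2.0, 6.0, 1.0]
pi    = [1.0, 0.5, 2.0]
Gamma = 2

# Brute force over the extreme points g in {0,1}^m with sum(g) <= Gamma.
best = max(
    sum(pi[j] * (u[j] + g[j] * ubar[j]) for j in range(3))
    for g in product((0, 1), repeat=3) if sum(g) <= Gamma
)

# Equivalent greedy: raise the Gamma demands with the largest pi_j * ubar_j.
base = sum(pi[j] * u[j] for j in range(3))
greedy = base + sum(sorted((pi[j] * ubar[j] for j in range(3)), reverse=True)[:Gamma])

print(best, greedy)   # both equal 19.5 on this instance
```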
Problem 2.13 - 2.16 contains the quadratic terms π(gū). Due to these terms
the problem cannot be solved by a standard MIP-solver. These quadratic terms
can be eliminated by introducing a new variable vector ω such that ωj = πj
when gj = 1 and 0 otherwise. Because ū ≥ 0 this can be achieved by the
constraints ω ≤ π and ω ≤ M g for some sufficiently large real number M . This
gives the following problem
\[
\begin{aligned}
\max_{g, \pi, \omega} \quad & \pi u + \omega \bar{u} - \pi B \bar{y} && (2.17) \\
\text{s.t.} \quad & \pi A \le c && (2.18) \\
& \omega \le \pi && (2.19) \\
& \omega \le M g && (2.20) \\
& \mathbf{1} g \le \Gamma && (2.21) \\
& g_j \in \{0, 1\} \; \forall j, \quad \pi \ge 0, \quad \omega \ge 0 && (2.22)
\end{aligned}
\]
The solution to Problem 2.17 - 2.22 can be used for Column & Constraint
Generation or Benders’ decomposition.
\[
\min \; \sum_{(i,j) \in A} c_{(i,j)} x_{(i,j)} \qquad (2.23)
\]
Problem 2.23 - 2.27 can be expanded by the realization that the implicit constraints p_r − p_i ≥ 0 for all i ≠ t are left out. This makes sense for the deterministic version of the problem, but mentioning them explicitly makes it easier to see how this problem can be cast into the demand robust framework of 2.4. The
demand robust version of min-cut is such that the terminal vertex t is uncer-
tain and it is cheaper to select edges before the terminal is known. Due to the
nature of this problem (a vertex is the terminal or not) there is only a demand
robust version of this problem with a discrete and finite uncertainty set. For
every edge there is a variable y(i,j) that indicates if (i, j) is selected before the
terminal is known. For every edge in every scenario k there is a variable xk(i,j)
that indicates if edge (i, j) is selected after the terminal is revealed in scenario
k. Selecting (i, j) before the terminal is revealed costs c(i,j) and selecting (i, j)
after the terminal is known costs σ k c(i,j) in scenario k for some σ k ≥ 1. The
uncertain demand vector consists of the right-hand sides of the constraints on p_r − p_i^k: these right-hand sides are 0 if i ≠ t^k in scenario k and 1 otherwise. If t^k denotes the terminal in scenario k, the demand robust equivalent of 2.23 - 2.27 is
\[
\min \; \sum_{(i,j) \in E} c_{(i,j)} y_{(i,j)} + \eta \qquad (2.28)
\]
\[
p_i^k \ge 0 \quad \forall i \in V, \; \forall k \qquad (2.32)
\]
\[
x_{(i,j)}^k \ge 0 \quad \forall (i,j) \in E, \; \forall k \qquad (2.33)
\]
\[
\begin{aligned}
\min_{(y,z,x)} \quad & \sum_i (f_i y_i + a_i z_i) + \sum_{ij} c_{ij} x_{ij} && (2.34) \\
\text{s.t.} \quad & z_i \le K_i y_i && (2.35) \\
& \sum_j x_{ij} \le z_i && (2.36) \\
& \sum_i x_{ij} \ge u_j && (2.37) \\
& y_i \in \{0,1\}, \; z_i \ge 0, \; x_{ij} \ge 0 && (2.38)
\end{aligned}
\]
In the demand robust version of this problem the variables yi and zi are the
non-adjustable or first stage variables and the variables xij are the adjustable
or second stage variables. The uncertain demand vector u is drawn from a
polyhedral uncertainty set U defined as in 2.5. Let u_max be the largest possible total demand over all scenarios. The constraint \(\sum_i z_i \ge u_{max}\) is added to make
sure that the second stage problem is always feasible. The demand robust
location-transportation problem is given by
\[
\begin{aligned}
\min_{(y,z,x)} \quad & \sum_i (f_i y_i + a_i z_i) + \max_{u \in U} \min_x \sum_{ij} c_{ij} x_{ij} && (2.39) \\
\text{s.t.} \quad & z_i \le K_i y_i && \forall i \quad (2.40) \\
& \sum_i z_i \ge u_{max} && (2.41) \\
& \sum_j x_{ij} \le z_i && \forall i \quad (2.42) \\
& \sum_i x_{ij} \ge u_j && \forall j \quad (2.43) \\
& y_i \in \{0,1\}, \; z_i \ge 0, \; x_{ij} \ge 0 && (2.44)
\end{aligned}
\]
\[
\begin{aligned}
\min_{(y,z,\eta)} \quad & \sum_i (f_i y_i + a_i z_i) + \eta && (2.45) \\
\text{s.t.} \quad & z_i \le K_i y_i && \forall i \quad (2.46) \\
& \sum_i z_i \ge u_{max} && (2.47) \\
& \sum_j x_{ij}^k \le z_i && \forall i, k \quad (2.48) \\
& \sum_i x_{ij}^k \ge u_j^k && \forall j, k \quad (2.49) \\
& \eta - \sum_{ij} c_{ij} x_{ij}^k \ge 0 && \forall k \quad (2.50)
\end{aligned}
\]
Chapter 3
Benders’ Decomposition
min cx + dy (3.1)
s.t. Ax + By ≥ b (3.2)
Dy ≥ e (3.3)
x ≥ 0, y∈N (3.4)
Problems 3.5 and 3.6 are equivalent. The inner maximization problem of
3.6 is a linear program so it can be either bounded, unbounded or infeasible.
If it is infeasible the inner minimization problem of 3.5 is either unbounded
or infeasible. This means that the original problem 3.1 - 3.4 is unbounded or
infeasible in that case. Therefore, the inner maximization of 3.6 is assumed to be
feasible. Note that feasibility does not depend on y, so the inner maximization
problem is infeasible for all y or feasible for all y. If it is unbounded for some
ȳ ∈ Y then the inner minimization problem of 3.5 is infeasible and that means
there exists no solution for the original problem with y = ȳ. Also, an unbounded
dual problem means there exists some extreme ray ρ with ρ(b − By) > 0. This
situation can be avoided by adding a constraint
\[
\begin{aligned}
\min \quad & dy + \zeta && (3.9) \\
\text{s.t.} \quad & \zeta \ge \pi^p (b - By) && \forall p \in \{1, \dots, P\} \quad (3.10) \\
& \rho^q (b - By) \le 0 && \forall q \in \{1, \dots, Q\} \quad (3.11) \\
& y \in Y && (3.12)
\end{aligned}
\]

\[
\begin{aligned}
\min \quad & dy + \zeta && (3.13) \\
\text{s.t.} \quad & \zeta \ge \pi^p (b - By) && \forall p \in \{1, \dots, P'\} \quad (3.14) \\
& \rho^q (b - By) \le 0 && \forall q \in \{1, \dots, Q'\} \quad (3.15) \\
& y \in Y && (3.16)
\end{aligned}
\]
The master problem after t iterations is the same as problem 3.9 - 3.12 with
a number of constraints removed, and hence the objective value of an optimal
solution for the master problem is a lower bound (LB) for problem 3.9 - 3.12.
Therefore, the master problem after t iterations provides a LB for the original
problem 3.1 - 3.4.
The feasibility and optimality constraints are generated by solving the inner
maximization of 3.6. The y variables in the objective function get the values
of the y variables of the most recent optimal solution for the master problem.
This is called the subproblem. Optimization of the subproblem will yield an
extreme ray or extreme point that can be used to either build a feasibility or
an optimality constraint, respectively. When the subproblem has been solved to optimality (so it is not unbounded), its optimal objective value plus dȳ is an upper bound (UB) for the original problem. If this UB is smaller than the incumbent UB, then a new upper bound has been found.
So the algorithm starts by solving the master problem and storing the LB.
Then the solution to the master problem is used to build a new subproblem.
The subproblem is solved and either a feasibility or optimality constraint is
added to the master. If a new UB is found and it is better than the old one the
UB is updated. Now the master (including the new constraint) is solved, which
leads to a new LB and a new subproblem to be solved. The algorithm stops
when LB=UB, because at that point the solution to the original problem has
been found. Pseudocode of the algorithm can be found in Algorithm 3.1.
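The loop just described can be sketched in Python on a hypothetical single-variable toy problem (all data made up; the master is solved by enumeration here, whereas the implementations in this thesis use a MIP-solver):

```python
# Toy instance: min y + zeta with optimality cuts zeta >= pi_p * (3 - y).
# Second stage: min {2x | x >= 3 - y, x >= 0}; its dual optimum is
# pi = 2 when 3 - y > 0 and pi = 0 otherwise.
D_COST, C_COST, DEMAND = 1, 2, 3
cuts = []                           # dual points pi_p collected so far

def solve_master():
    # brute-force master: zeta is the tightest cut value (at least 0)
    best = None
    for y in range(10):
        zeta = max([0.0] + [pi * (DEMAND - y) for pi in cuts])
        val = D_COST * y + zeta
        if best is None or val < best[0]:
            best = (val, y)
    return best                     # (lower bound, y)

def solve_subproblem(y):
    shortfall = max(0, DEMAND - y)
    pi = C_COST if shortfall > 0 else 0
    return C_COST * shortfall, pi   # (second-stage cost, dual point)

ub = float("inf")
while True:
    lb, y = solve_master()
    sub_cost, pi = solve_subproblem(y)
    ub = min(ub, D_COST * y + sub_cost)
    if ub - lb < 1e-9:              # LB = UB: optimum found
        break
    cuts.append(pi)                 # add an optimality constraint

print(y, ub)   # optimal first stage and objective: 3 and 3
```

The instance has no feasibility cuts because the second stage is always feasible; only the optimality-cut branch of the algorithm is exercised.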
3.2 Benders’ Decomposition for Demand Robust
Optimization
Benders’ is also well suited to be applied to robust optimization problems. For
discrete uncertainty sets the general Benders’ decomposition as described in
Section 3.1 can be used in a straightforward way. For polyhedral uncertainty
sets Benders’ has to be adapted to the fact that the subproblem is no longer a
linear program. In the case described in this report the subproblem is a bilinear
program which calls for a different implementation of Benders’.
\[
\begin{aligned}
\min \quad & dy + \eta && (3.17) \\
\text{s.t.} \quad & A x^k + B y \ge b^k && \forall k \in \{1, \dots, K\} \quad (3.18) \\
& \eta - c^k x^k \ge 0 && \forall k \in \{1, \dots, K\} \quad (3.19) \\
& y \in Y, \; x^k \ge 0 && \forall k \in \{1, \dots, K\} \quad (3.20)
\end{aligned}
\]
where y are the first stage variables and xk are the second stage variables
for scenario k. The set Y is defined as Y = {y|y ∈ N, Dy ≥ e}. An expanded
representation of 3.17 - 3.20 is given by
\[
\min \; dy + \eta \qquad (3.21)
\]
\[
\text{s.t.} \quad
\begin{bmatrix}
0 & A & 0 & \cdots & 0 \\
0 & 0 & A & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & A \\
1 & -c^1 & 0 & \cdots & 0 \\
1 & 0 & -c^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 0 & \cdots & -c^K
\end{bmatrix}
\begin{bmatrix} \eta \\ x^1 \\ \vdots \\ x^K \end{bmatrix}
+
\begin{bmatrix} B \\ B \\ \vdots \\ B \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} y
\ge
\begin{bmatrix} b^1 \\ b^2 \\ \vdots \\ b^K \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\qquad (3.22)
\]
\[
\eta \ge 0, \quad x^1, \dots, x^K \ge 0, \quad y \in Y \qquad (3.23)
\]
\[
\min \; \eta \qquad (3.24)
\]
\[
\text{s.t.} \quad
\begin{bmatrix}
0 & A & 0 & \cdots & 0 \\
0 & 0 & A & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & A \\
1 & -c^1 & 0 & \cdots & 0 \\
1 & 0 & -c^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 0 & \cdots & -c^K
\end{bmatrix}
\begin{bmatrix} \eta \\ x^1 \\ \vdots \\ x^K \end{bmatrix}
\ge
\begin{bmatrix} b^1 \\ \vdots \\ b^K \\ 0 \\ \vdots \\ 0 \end{bmatrix}
-
\begin{bmatrix} B \\ \vdots \\ B \\ 0 \\ \vdots \\ 0 \end{bmatrix} \bar{y}
\qquad (3.25)
\]
\[
\eta \ge 0, \quad x^1, \dots, x^K \ge 0 \qquad (3.26)
\]
The dual of problem 3.24 - 3.26 is the Benders' subproblem. The dual variables will be

\[
\pi = (\pi^{u,1}, \dots, \pi^{u,K}, \pi^{\eta,1}, \dots, \pi^{\eta,K}) \qquad (3.27)
\]

In 3.27, π^{u,k} refers to the set of variables that correspond to the scenario constraints Ax^k ≥ b^k − Bȳ for every scenario k, and the π^{η,k} variables are the dual variables of the cost constraint η − c^k x^k ≥ 0 for every k. The Benders' subproblem for some master solution ȳ can now be expressed as
\[
\max_{\pi} \quad \pi \left(
\begin{bmatrix} b^1 \\ \vdots \\ b^K \\ 0 \\ \vdots \\ 0 \end{bmatrix}
-
\begin{bmatrix} B \\ \vdots \\ B \\ 0 \\ \vdots \\ 0 \end{bmatrix} \bar{y}
\right) \qquad (3.28)
\]
\[
\text{s.t.} \quad \pi
\begin{bmatrix}
0 & A & 0 & \cdots & 0 \\
0 & 0 & A & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & A \\
1 & -c^1 & 0 & \cdots & 0 \\
1 & 0 & -c^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 0 & 0 & \cdots & -c^K
\end{bmatrix}
\le \begin{bmatrix} 1 & 0 & \cdots & 0 \end{bmatrix} \qquad (3.29)
\]
\[
\pi \ge 0 \qquad (3.30)
\]
The objective 3.28 can be reduced: for all dual variables π^{η,k} of the cost constraints the objective parameter is 0, so 3.28 can also be expressed as

\[
\begin{bmatrix} \pi^{u,1} & \cdots & \pi^{u,K} \end{bmatrix}
\left(
\begin{bmatrix} b^1 \\ \vdots \\ b^K \end{bmatrix}
-
\begin{bmatrix} B \\ \vdots \\ B \end{bmatrix} \bar{y}
\right) \qquad (3.31)
\]
The problem can be reduced further in the following way. Suppose k* is the most expensive scenario for some optimal solution to 3.24 - 3.26 for fixed ȳ. In other words, η ≥ c^{k*}x^{k*} > c^k x^k for all k ≠ k* (if multiple scenarios are tied for most expensive then the first one that is encountered is chosen). This means that the cost constraints η − c^k x^k ≥ 0 are not binding for k ≠ k*, and therefore the dual variables π^{η,k} are 0 for k ≠ k* due to complementary slackness. The constraint that linked the dual problem together is now no longer necessary, so the problem can be split into K independent problems.
For all k ≠ k* the problem is
\[
\begin{aligned}
\max \quad & \pi^{u,k^*} (b^{k^*} - B\bar{y}) && (3.35) \\
\text{s.t.} \quad & \pi^{u,k^*} A - \pi^{\eta,k^*} c^{k^*} \le 0 && (3.36) \\
& \pi^{u,k^*} \ge 0, \quad 0 \le \pi^{\eta,k^*} \le 1 && (3.37)
\end{aligned}
\]

Assuming c^{k*} ≥ 0 for all k, choosing π^{η,k*} = 1 gives the largest feasible region, so problem 3.35 - 3.37 reduces to

\[
\begin{aligned}
\max \quad & \pi^{u,k^*} (b^{k^*} - B\bar{y}) && (3.38) \\
\text{s.t.} \quad & \pi^{u,k^*} A \le c^{k^*} && (3.39) \\
& \pi^{u,k^*} \ge 0 && (3.40)
\end{aligned}
\]

Problem 3.38 - 3.40 is precisely the dual of the second-stage problem for scenario k*:

\[
\begin{aligned}
\min \quad & c^{k^*} x^{k^*} && (3.41) \\
\text{s.t.} \quad & A x^{k^*} \ge b^{k^*} - B\bar{y} && (3.42) \\
& x^{k^*} \ge 0 && (3.43)
\end{aligned}
\]
This observation can be used to devise a solution strategy for 3.28 - 3.30. It starts by solving \(\min_{x^k \ge 0} \{ c^k x^k \mid A x^k \ge b^k - B\bar{y} \}\) for every scenario k. The optimal objective values are then compared, and the most expensive scenario is called k*. For all scenarios k ≠ k* the values π^{η,k} = 0 and π^{u,k} = 0 are assigned. For k*: π^{η,k*} = 1 and π^{u,k*} equals the vector of shadow prices of the constraints 3.42.
This yields the following Benders’ reformulation
\[
\begin{aligned}
\min \quad & dy + \zeta && (3.44) \\
\text{s.t.} \quad & \zeta \ge \pi^{u,k^*_p} (b^{k^*_p} - By) && \forall p \in \{1, \dots, P\} \quad (3.45) \\
& y \in Y && (3.46)
\end{aligned}
\]

where k^*_p denotes the most expensive scenario found in iteration p.
scenario set are explicitly generated, the solution method of the previous section could be applied, because the set of extreme points of the scenario set is discrete and finite. Unfortunately this is impractical. Therefore the solution to problem 2.17 - 2.22 has to be used. An optimal solution π*, g* suffices to generate an optimality constraint ζ ≥ π*(u* − By) for the Benders' master problem.
\[
\begin{aligned}
\min_{(y,z,\zeta)} \quad & \sum_i (f_i y_i + a_i z_i) + \zeta && (3.47) \\
\text{s.t.} \quad & z_i \le K_i y_i && \forall i \quad (3.48) \\
& \sum_i z_i \ge u_{max} && (3.49) \\
& y_i \in \{0,1\}, \; z_i \ge 0 && (3.50)
\end{aligned}
\]
In the starting master problem variable ζ is unconstrained, which leads to
an unbounded problem. This can be prevented by choosing ζ = 0 or removing
ζ from the master and introducing it the first time an optimality constraint is
added. Be careful to ignore the LB found by optimizing the master problem
with ζ = 0 because it could be too high.
The K subproblems (one for every scenario k) based on an (optimal) solution
(y ∗ , z ∗ ) of the master problem are
\[
\begin{aligned}
\min \quad & \sum_{ij} c_{ij} x_{ij}^k && (3.51) \\
\text{s.t.} \quad & -\sum_j x_{ij}^k \ge -z_i^* && \forall i \quad (3.52) \\
& \sum_i x_{ij}^k \ge u_j^k && \forall j \quad (3.53) \\
& x_{ij}^k \ge 0 && (3.54)
\end{aligned}
\]
Optimizing the subproblem for every scenario k yields a dual solution π^{sup,k*}, π^{dem,k*} for the most expensive scenario k*, where π^{sup,k*} are the dual values of the supply constraints 3.52 and π^{dem,k*} are the dual values of the demand constraints 3.53. This gives Benders' extreme point constraint

\[
\zeta - \sum_j \pi_j^{dem,k^*} u_j^{k^*} + \sum_i \pi_i^{sup,k^*} z_i \ge 0
\]
that can be added to the master. So after T iterations the master problem is
given by
\[
\begin{aligned}
\min_{(y,z,\zeta)} \quad & \sum_i (f_i y_i + a_i z_i) + \zeta && (3.55) \\
\text{s.t.} \quad & z_i \le K_i y_i && \forall i \quad (3.56) \\
& \sum_i z_i \ge u_{max} && (3.57) \\
& \zeta - \sum_j \pi_j^{(dem,k^*),t} u_j^{k^*} + \sum_i \pi_i^{(sup,k^*),t} z_i \ge 0 && \forall t \in \{1, \dots, T\} \quad (3.58)
\end{aligned}
\]
\[
\begin{aligned}
\max \quad & \sum_j u_j \pi_j^{dem} + \sum_j \bar{u}_j \omega_j - \sum_i z_i^* \pi_i^{sup} && (3.60) \\
& \omega_j \le \pi_j^{dem} && \forall j \quad (3.63) \\
& \omega_j \le M g_j && \forall j \quad (3.64) \\
& \pi_j^{dem}, \; \pi_i^{sup}, \; \omega_j \ge 0, \quad g_j \in \{0, 1\} && (3.65)
\end{aligned}
\]
M is some sufficiently large number. According to [13] it can be set to the value π_j^{dem,*}, where π_j^{dem,*} is the value of π_j^{dem} in the optimal solution of the subproblem with Γ equal to the number of customers, so that all demands are at their maximum value. In that case g_j = 1 for all j and ω_j = π_j^{dem}, so these values can be found by solving

\[
\max \; \sum_j (u_j + \bar{u}_j) \pi_j^{dem} - \sum_i z_i^* \pi_i^{sup} \qquad (3.66)
\]
3.4 Algorithmic Enhancements
Benders’ decomposition can work well for some problems but solving numerous
(increasingly large) MIPs can hurt performance. A number of ways of overcom-
ing this problem exist. Below are two ways of improving Benders’ decomposi-
tion that can be used separately or combined. The 2-phase method described
in 3.4.1 and the method of using incumbent solutions (3.4.2) are ways to utilize
the observation that optimal subproblem solutions based on non-optimal master
solutions also provide valid constraints. This can be used to reduce the number
of times the master MIP has to be solved.
the second phase the integer constraints are reinstated and the master problem
is solved.
The time spent in the first phase can be chosen in a number of ways:
Method 1 leads to the longest possible first phase: once the first phase has converged, no new constraints can be found using this method. Method 2 is straightforward: a number of first-phase iterations is chosen in advance, after which the algorithm reapplies the integer constraints to the master variables that were relaxed in the first phase. Method 3 compares the value of the ζ-variable in the master problem with the objective value of the optimized subproblem; once these are deemed close enough, the algorithm continues to the second phase.
It is hard to say in advance which method of stopping the first phase gives
the biggest performance boost (if any). The best implementation of this 2-phase
algorithm can only be determined experimentally, also because it is likely to be
problem dependent.
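The three stopping rules could be sketched as simple predicates (function names and thresholds below are hypothetical, chosen only for illustration):

```python
def stop_method_1(prev_lb, lb, tol=1e-9):
    # Method 1: stop once the relaxed master has converged (the LB stalls).
    return abs(lb - prev_lb) < tol

def stop_method_2(iteration, max_first_phase_iters=5):
    # Method 2: stop after a fixed, pre-chosen number of first-phase iterations.
    return iteration >= max_first_phase_iters

def stop_method_3(zeta, sub_objective, rel_tol=1e-2):
    # Method 3: stop once the master's zeta and the subproblem objective
    # are close enough in relative terms.
    return abs(sub_objective - zeta) <= rel_tol * max(1.0, abs(sub_objective))

print(stop_method_2(5), stop_method_3(99.5, 100.0))   # -> True True
```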
Chapter 4
A rather novel way of solving robust optimization problems, called Column & Constraint Generation (C&CG), was first presented in [24]. C&CG is a decomposition strategy that uses a master and subproblem framework similar to Benders' decomposition. The main difference between Benders' decomposition and
ders’ decomposition. The main difference between Benders’ decomposition and
C&CG is that Benders’ is a general solving procedure that can be applied to a
wide range of mixed integer programming problems while C&CG is specifically
suited to solving robust optimization problems.
Despite being quite new there are already a number of applications for
C&CG. It has been mostly used in robust optimization problems concerning
power grids ([1], [14], [15], [23], [25]) but it also seems to work well for facility
location problems ([2], [24]).
The name Column & Constraint Generation is based on the fact that the
algorithm iteratively adds variables (columns) and constraints to the problem
until the optimal solution is found. The variables that are added correspond to
second stage decision variables and the constraints are from scenarios that are
added to the problem. The scenarios are selected on basis of being the worst
case at some point in the optimization.
\[
\min_{y, \eta} \; dy + \eta \qquad (4.1)
\]
In the problem formulation above the vector y consists of the first stage
variables and the vector xk is made up of the second stage variables for scenario
k. ck is the cost vector for the second stage variables in scenario k.
The main idea of C&CG is iteratively adding scenarios and the corresponding
variables until an optimal solution is found. The optimal solution is found when
the lower bound (LB) and upper bound (UB) maintained by the algorithm are
equal, so when LB = U B.
The LB is based on the idea that the objective value of the optimal solution
for a restricted set of scenarios is never worse than the objective value of the
optimal solution for the complete set of scenarios. Let U be the complete set
of scenarios for problem (4.1)-(4.4) and let U 0 ⊆ U be some restricted scenario
set, then the objective value of the optimal solution for the problem with a
restricted scenario set is a lower bound for the original problem if the subproblem
is bounded and has a feasible solution. Suppose that y ∗ , x1,∗ , . . . , xK,∗ is an
optimal solution for the problem with scenario set U , then y ∗ can be used as a
partial solution for the problem with scenario set U 0 that has an objective value
that is never larger than the objective value for this first stage solution with
scenario set U . Therefore, the optimal solution of the restricted problem leads
to a lower bound for the complete problem.
The UB is found by solving the restricted problem and using the resulting optimal first stage vector y′* to solve the second stage problem for all scenarios. This leads to a solution (y′*, x′^1, . . . , x′^K) that is feasible for the problem with the complete scenario set U, and the objective value of this solution is therefore an upper bound on the objective value of the optimal solution with scenario set U.
The above leads to the C&CG algorithm where the master problem is the
original problem with a restricted set of scenarios and the subproblem is opti-
mizing the second stage variables xk for all scenarios separately given the first
stage decision y found by optimizing the master problem. The scenario k′ that has the highest second stage costs c^{k′}x^{k′} is then added to the master problem and the master problem is solved again. This continues until UB = LB.
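The C&CG loop can be sketched on a hypothetical toy problem of the same flavor as before (made-up data; enumeration replaces the MIP-solver):

```python
# Toy: min_y y + eta, second-stage cost 2 * max(0, b_k - y) for demands b_k.
SCENARIOS = [3, 5]          # the full scenario set U
C2 = 2                      # second-stage unit cost

def second_stage(y, b):
    return C2 * max(0, b - y)

def solve_master(active):
    # brute-force master over the restricted scenario set `active`
    best = None
    for y in range(10):
        eta = max([0] + [second_stage(y, b) for b in active])
        if best is None or y + eta < best[0]:
            best = (y + eta, y)
    return best             # (lower bound, y)

active, ub = [], float("inf")
while True:
    lb, y = solve_master(active)
    # subproblem: evaluate every scenario for this first-stage decision
    worst = max(SCENARIOS, key=lambda b: second_stage(y, b))
    ub = min(ub, y + second_stage(y, worst))
    if ub - lb < 1e-9:
        break
    active.append(worst)    # add the worst scenario: new columns + constraints

print(y, ub)   # -> 5 5
```

Note the structural contrast with the Benders' sketch: instead of adding a dual cut, each iteration enlarges the master with a whole scenario and its second-stage variables.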
of time. Therefore, scenarios will be generated by solving problem 2.17 - 2.22.
Solving this problem also results in finding the most expensive scenario.
\[
\begin{aligned}
\min_{(y,z,\eta)} \quad & \sum_i (f_i y_i + a_i z_i) + \eta && (4.5) \\
\text{s.t.} \quad & z_i \le K_i y_i && \forall i \quad (4.6) \\
& \sum_i z_i \ge u_{max} && (4.7) \\
& \sum_j x_{ij}^t \le z_i && \forall i, t \quad (4.8) \\
& \sum_i x_{ij}^t \ge u_j^t && \forall j, t \quad (4.9) \\
& \eta - \sum_{ij} c_{ij} x_{ij}^t \ge 0 && \forall t \quad (4.10)
\end{aligned}
\]
\[
\begin{aligned}
\min \quad & \sum_{ij} c_{ij} x_{ij}^k && (4.12) \\
\text{s.t.} \quad & -\sum_j x_{ij}^k \ge -z_i^* && \forall i \quad (4.13) \\
& \sum_i x_{ij}^k \ge u_j^k && \forall j \quad (4.14) \\
& x_{ij}^k \ge 0 && (4.15)
\end{aligned}
\]
Chapter 5
Computational Results
\[
\begin{aligned}
\min_{(y,z,x)} \quad & \sum_i (f_i y_i + a_i z_i) + \max_{u \in U} \min_x \sum_{ij} c_{ij} x_{ij} && (5.1) \\
\text{s.t.} \quad & z_i \le K_i y_i && (5.2) \\
& \sum_j x_{ij} \le z_i && (5.3) \\
& \sum_i x_{ij} \ge u_j && (5.4) \\
& y_i \in \{0,1\}, \; z_i \ge 0, \; x_{ij} \ge 0 && (5.5)
\end{aligned}
\]
drawn from [100, 1000], variable facility costs per unit ai from [10, 100] and
maximal allowable capacity Ki from [200, 700]. Transportation costs cij are in
the interval [1, 1000].
To ensure feasibility, the inequality \(\sum_i K_i \ge \max_{u \in U} \sum_j u_j\) has to be respected. Neither [24] nor [13] makes clear how this is taken care of during generation, so instances that violate the feasibility constraint are simply discarded: an instance is generated, its feasibility is checked, and only if it is feasible is it entered into the set of test instances.
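A sketch of such a generator with rejection sampling follows. The cost and capacity intervals match the text; the demand intervals `base`/`dev` are assumptions for illustration only, since the text does not state how demands are drawn.

```python
import random

def generate_instance(n_fac, n_cust, gamma, rng):
    # keep sampling until sum_i K_i >= the worst-case total demand
    while True:
        f = [rng.uniform(100, 1000) for _ in range(n_fac)]   # fixed facility costs
        a = [rng.uniform(10, 100) for _ in range(n_fac)]     # variable facility costs
        K = [rng.uniform(200, 700) for _ in range(n_fac)]    # maximal capacities
        c = [[rng.uniform(1, 1000) for _ in range(n_cust)]
             for _ in range(n_fac)]                          # transportation costs
        base = [rng.uniform(10, 50) for _ in range(n_cust)]  # assumed base demands
        dev = [rng.uniform(10, 50) for _ in range(n_cust)]   # assumed deviations
        # worst-case total demand: all base plus the gamma largest deviations
        umax = sum(base) + sum(sorted(dev, reverse=True)[:gamma])
        if sum(K) >= umax:                                   # rejection sampling
            return f, a, K, c, base, dev, umax

rng = random.Random(0)
f, a, K, c, base, dev, umax = generate_instance(5, 8, 3, rng)
print(sum(K) >= umax)   # -> True by construction
```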
The scenarios for the version of the problem with discrete uncertainty sets were generated based on the corresponding polyhedral uncertainty sets. To generate a scenario, Γ customers are chosen to have their maximum demand in that scenario; all other customers have base demand. All discrete problems have 100 scenarios.
All algorithms are implemented in C# and executed on a computer with an
Intel Core2Duo 2.20 GHz processor.
5.2 Results
To evaluate the performance of the various optimization algorithms, 10 instances of size 30x30 were generated. Each of these problems was extended with 100 random scenarios for every value of Γ. The values of Γ are 3, 6, 9, 12, 15, 18, 21, 24 and 27.
Benders’ Decomposition
Three variations of Benders decomposition are compared. A classic implementa-
tion (denoted by BenClass) that alternatingly solves the master and subproblem
to optimality until the algorithm converges, an implementation where the sub-
problem is solved every time the solver of the master problem encounters a
new incumbent solution (BenCB) and an implementation that implements the
2 Phase method (Ben2Phase). The 2 Phase method was implemented in such
a way that the relaxed master problem was solved for 5 iterations before the
standard Benders' algorithm took over. As can be seen in Table 5.2.1, the classic implementation outperforms BenCB and is comparable to Ben2Phase. This can
be explained by the fact that both enhancement methods are aimed at reduc-
ing the burden caused by repeatedly solving the master problem to optimality.
Apparently, the master problem is not such a huge bottleneck for this problem
that these methods provide a significant boost in performance.
Γ                       3      6      9      12     15     18     21     24     27
BenClass    time(s)     33.5   69.3   63.5   58.5   57.6   51.7   48.0   45.0   17.6
            iterations  67.4   58.7   55.1   53     52.5   48.8   47.7   45.7   45.4
Ben2Phase   time(s)     36.4   69.5   24.0   48.2   31.5   19.1   42.1   39.1   38.7
            iterations  64.2   58     52     46     43.2   44.3   42     40     40.2
BenCB       time(s)     133.9  105.4  106.5  123.4  68.2   89.1   102.9  87.7   84.0
            iterations  653.3  517.3  506.7  464.4  427.6  433.7  383.3  339.7  370.5
Gurobi MIP-solver
The Gurobi MIP-solver provides a wide range of settings that can be used to
tune its performance. All were left in their default setting, except for one: the
optimality gap. The optimality gap is used to determine when the algorithm
can stop optimizing. When the relative gap between the lower bound and the
upper bound is smaller than the optimality gap, the algorithm terminates. It
would be more elegant if the algorithm terminated when the upper and lower
bound are equal, but in practice this is not feasible, mainly due to rounding
errors: a rounding error could leave the upper bound slightly above the lower
bound even though the optimal solution has been found, causing a failure to
terminate. So the optimality gap is a necessary evil.
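The termination rule can be written down in a few lines. The gap formula (UB − LB)/UB used below is the simplified definition used later in this chapter, not necessarily the solver’s exact internal one.

```python
def relative_gap(lower, upper):
    """Relative optimality gap: (UB - LB) / UB."""
    return (upper - lower) / abs(upper)

def should_terminate(lower, upper, tol=1e-4):
    """Stop optimizing once the relative gap is within the tolerance."""
    return relative_gap(lower, upper) <= tol

# Bounds that are equal up to a rounding error still trigger termination
# under the default tolerance of 1e-4:
print(should_terminate(99.999999, 100.0))  # True
```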
The standard optimality gap in Gurobi is 10−4. There is no fundamental
reason for the gap to have this value. A value of 10−4 still allows for
termination before an optimal solution is certain to be found, as can be seen
in the results. Therefore it is justified to see how this parameter affects
algorithmic performance. A comparison is made with the solution times obtained
when setting the optimality gap to 10−2.
The results are quite surprising. Solution times are much better for the larger
optimality gap. The concern is that the solutions obtained with the larger gap
could be considerably worse, since 10−2 is a hundred times larger than 10−4.
This seems to be the case when looking at the results in Table 5.2.1: when the
allowed gap is 10−4 the solver terminates with a provably optimal solution for
a considerable number of instances, and the average final optimality gap is
roughly 100 times worse when the allowed gap is 10−2, as would be expected.
Γ                                   3      6      9     12     15     18     21     24     27
MIP (gap = 10−4)  time (s)        37.8   68.1   70.0   71.8   58.4   49.2   48.5   44.2   52.8
                  exact           5/10   3/10   4/10   6/10   5/10   7/10   3/10   2/10   2/10
                  avg gap (∗10−5)  2.29   4.40   3.68   1.77   3.00   1.09   4.14   4.89   4.83
MIP (gap = 10−2)  time (s)         5.0    9.1    9.2    9.3    9.2    9.0    9.3    9.4    7.3
                  exact           0/10   0/10   0/10   0/10   0/10   0/10   0/10   0/10   0/10
                  avg gap (∗10−5)   379    362    310    298    266    259    247    220    214
The actual difference between the solutions found by the two settings is much
smaller. The average relative difference (see Table 5.2.1) is less than 1 in
1000, whereas, based on the difference between the allowed optimality gaps, it
could have been almost 1 percent. The relatively small difference can be
explained by the way the optimality gap is calculated: it is simply (UB −
LB)/UB, so a larger optimality gap does not necessarily mean that the current
solution is bad. A larger gap can also be caused by a lower bound that is less
tight, which seems to be the case for this problem. This shows that performance
can be dramatically increased while the obtained solution is, on average, less
than 1 in 1000 worse.
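The point that a weaker lower bound, rather than a worse incumbent, can account for the larger reported gap is easy to illustrate with hypothetical numbers:

```python
def relative_gap(lower, upper):
    # Gap definition used in this chapter: (UB - LB) / UB.
    return (upper - lower) / upper

# Hypothetical: two runs end with the same incumbent of 1000 but with
# lower bounds of different quality.
print(relative_gap(999.9, 1000.0))  # about 1e-4: reads as (near) optimal
print(relative_gap(990.0, 1000.0))  # 0.01: a 100x larger gap, same solution
```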
Γ                     3     6     9    12    15    18    21    24    27
avg diff. (∗10−5)   44.5  69.6  37.1  39.5  37.9  31.1  22.5   2.0   1.9

Table 5.3: Relative difference between the solutions obtained by the Gurobi
MIP-solver with optimality gaps 10−2 and 10−4. The number for each value of
Γ is the average relative difference over ten instances. The relative difference
is calculated as (UB(10−2) − UB(10−4)) / UB(10−4).
Table 5.2.1 makes it clear that a large part of the difference in optimality gap
can be explained by a weaker lower bound. The Gurobi MIP-solver could therefore
potentially be sped up by instructing it to spend more effort on improving the
lower bound. Fortunately, it provides this option: the parameter MIPFocus can
make the solver spend more resources on improving the lower bound. According to
Gurobi’s documentation: ”If you believe the solver is having no trouble finding
good quality solutions, and wish to focus more attention on proving optimality,
select MIPFocus=2.” The result can be found in Table 5.2.1. For this comparison
the allowed optimality gap was left at its default value of 10−4. The results
show that changing the focus of the solver has a positive influence on its
performance: the average time it takes to solve a problem decreases.
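In the gurobipy API, both settings used in these experiments are plain parameter assignments. The snippet below is a configuration sketch rather than the thesis code: it assumes a Gurobi installation, and the construction of the model itself is omitted.

```python
import gurobipy as gp

model = gp.Model("master")
# ... build variables, constraints and the objective here ...

model.Params.MIPGap = 1e-2   # widen the default 1e-4 optimality gap
model.Params.MIPFocus = 2    # spend more effort on proving optimality
model.optimize()
```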
Γ                                      3      6      9     12     15     18     21     24     27
MIP (default)        time (s)        37.8   68.1   70.0   71.8   58.4   49.2   48.5   44.2   52.8
                     exact           5/10   3/10   4/10   6/10   5/10   7/10   3/10   2/10   2/10
                     avg gap (∗10−5)  2.29   4.40   3.68   1.77   3.00   1.09   4.14   4.89   4.83
MIP (MIPFocus = 2)   time (s)        40.0   13.2   20.9   18.3   19.1   20.9   14.4   20.2   16.0
                     exact           5/10   4/10   4/10   5/10   1/10   4/10   3/10   1/10   3/10
                     avg gap (∗10−5)  3.44   3.47   2.44   4.03   4.18   4.23   2.82   4.80   1.88

Table 5.4: Results for different focus settings of the Gurobi MIP-solver, discrete
uncertainty sets, 100 scenarios.
The experiments with a different optimality gap and a different focus show
that the performance of a solver can be influenced by changes in its settings.
These experiments were not aimed at finding the optimal settings for the Gurobi
MIP-solver, but they are an indication that this solver can be tuned and that
such tuning can have a large effect on its performance.
Both Benders’ decomposition and Column & Constraint Generation use a
MIP-solver and an LP-solver as subroutines. When comparing the performance
of these algorithms it is important to keep in mind that solver performance can
be greatly affected by changes in settings.
Γ                              3     6     9    12    15    18    21    24    27
BenClass          time (s)   33.5  69.3  63.5  58.5  57.6  51.7  48.0  45.0  17.6
                  iterations 67.4  58.7  55.1  53    52.5  48.8  47.7  45.7  45.4
C&CG              time (s)    3.2   3.0   5.2   4.1   3.4   3.7   4.6   3.3   1.6
                  iterations  4.4   3.2   4.2   3.8   3.5   3.6   4.1   3.3   3.2
MIP (gap = 10−2)  time (s)    5.0   9.1   9.2   9.3   9.2   9.0   9.3   9.4   7.3
advantage over Benders’ decomposition. It should be noted that adding a sce-
nario to the master problem when performing Column & Constraint Generation
involves adding 900 continuous variables and 60 constraints to the master.
During Benders’ decomposition no variables are added to the master problem. The
advantage of C&CG stems from its much smaller number of iterations, while its
master problem does not become slow enough for the added size to be a
disadvantage. That the master problem can be expanded with so many variables
and constraints without its performance being crippled is an interesting
observation.
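The growth figures quoted above are easy to reproduce: for a 30x30 instance, each added scenario contributes one 30 × 30 block of transport variables plus one supply and one demand constraint per node. A small helper (names are illustrative, not from the thesis code):

```python
def ccg_master_growth(m, n, scenarios):
    """Variables and constraints added to the C&CG master problem after a
    number of scenario additions: m*n continuous transport variables and
    m + n supply/demand constraints per scenario."""
    return scenarios * m * n, scenarios * (m + n)

print(ccg_master_growth(30, 30, 1))  # (900, 60), the figures quoted above
print(ccg_master_growth(30, 30, 5))  # (4500, 300) after five iterations
```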
Γ                        3        6        9        12        15        18        21        24      27
BenClass  time (s)    249.44  5536.17  2620.42  3718.39∗  3698.02∗  3706.67∗  1083.17∗  1867.93∗  20.45
          iterations    65      59.5     34.5     34.5      22.5      18.5      33.5      5770     27
C&CG      time (s)     10.42   101.43   509.25   611.17    477.70    230.56     97.75     14.40    3.17
          iterations     4.7      4.8      6.5      6.5       5.9       5.6       5.4       4       3.4
Chapter 6
Interpretation of Results
Generation performs better than Benders’ decomposition for the instances tested,
using the code and computer system of the author. To claim a general result,
one would have to prove the following:
In the next three sections, the challenges accompanying these three points
will be explained. After that, some possible ways to mitigate these effects will
be discussed.
6.1 Challenges
6.1.1 Problem Selection
It is unclear how the performance of a solver on a number of problem instances
should be elevated to a general result for some problem. The generation of
problems for this report was done using the description in [24], which is based
on [13]. The parameters used to generate problem instances seem to be picked at
random, or at least they are not thoroughly justified. Such a justification could
for example be that these problems are a very good representation of a general
problem or that they are close to a problem that is relevant in the real world. For
example, when solver performance for the recourse problem is analysed in section
4.1 of [13], the number of customers is larger than the number of warehouses ‘to
be closer to reality’. However, when the full problem is solved, the number of
customers equals the number of warehouses. This way of generating problems
is not wrong in any way but it will not lead to a general result, simply because
it is unclear how representative the sample is for what population.
A question that arises when thinking about the problem selection is the
required number of tested instances. No attempt is made to link the sample
size to the reliability of the result. So something that could be given more
attention in papers like [24] and [13] is which set of problems was tested and
how the algorithmic performance figures translate to a larger group of problems.
Another possible issue arising from the lack of a transparent way of selecting
test problems is the possibility of cherry-picking results. If authors are free
to choose the problem instances on which they test their algorithms, there is
the possibility of trying an algorithm on a set of instances and only publishing
the results for the subset of instances on which it performed well. It is hard
to say how often this is done, but the major problem is that it is impossible to
check, because everyone can select their problem instances without having to
justify their choices.
6.1.2 Code and Algorithmic Choices
An algorithm like Benders’ decomposition is not so much one algorithm as it is a
family of algorithms. This can be seen in this report, for example, in the fact
that Benders’ can be implemented with the 2 Phase method or with a technique
that uses incumbent solutions found during the solving of the master problem to
generate constraints. The Gurobi MIP-solver that was used for the problem with
a discrete uncertainty set has a wide range of settings that can be changed,
one of which, the optimality gap, affected performance considerably. Moreover,
both Benders’ decomposition and C&CG employ MIP- and LP-solvers to solve the
master problem and subproblem respectively. These solvers can be configured in
many ways that affect performance.
Besides algorithmic choices that influence performance, there is also the way
in which an algorithm is implemented that affects how well it does. On top of
that, the programming language in which the algorithm was written can have an
effect on its speed.
Two specific implementations of an algorithm that are given the same name
by their respective authors could have differences in performance that are caused
by different algorithmic choices and differences in implementation that are not
immediately clear to the reader of an academic paper.
A fundamental problem with the way results are usually presented is that it is
unclear how much an algorithm was tuned to perform optimally on the problem
instances it is tested on. This could lead to a problem equivalent to the
statistical notion of overfitting: an algorithm that was tuned to perform
optimally on the test instances might not perform very well on instances it was
not initially tested on.
6.2 Possible Solution
The best way to deal with these issues is transparency. Transparency comes
in many different shades and it is not the objective of this paper to present a
ready-made solution, but it could be beneficial to future scientific endeavors to
think about how results are made public.
To be able to judge the results published in an academic paper it is important
that they can be reproduced. This is by definition impossible if the problem
instances are not available in some form. If the problem instances are generated
in some random and reproducible way the experiment can be repeated, but there
is still no way to repeat the experiments of the original publication. This could
be dealt with by demanding that the set of problem instances is published in
an easily accessible way along with the article. Some kind of public repository
is an option for this. An initiative like miplib.zib.de is a start, but as of
now it only contains 361 problem instances spread out over a large number of
different problems.
However, due to the proprietary nature of some data, not all problem instances
can be made public. Depositing such a problem at some kind of neutral third
party that is able to run an algorithm for an interested researcher could be a
solution in these cases.
Some kind of database that shows what kind of algorithms have been applied
to some problem would speed up the way in which research is done. This
would provide an overview over the large amount of papers that are produced.
An example from medical science shows what this could look like. The
International Committee of Medical Journal Editors requires that all medical
trials are registered with clinicaltrials.gov as a prerequisite for publication
of trial results.
Such a database does not completely prevent the possibility of cherry picking
but it will at least result in a nice overview of what has been tried and also allows
for negative results to be made public without being published in a journal
article.
The greatest opportunity lies in the sharing of code. If code is required to be
made public along with an article presenting results, doubts about code qual-
ity and algorithmic choices can be easily addressed. The availability of code
together with the publication of problem instances leads to simple reproduc-
tion and verification of results. Testing how well the results translate to other
instances is made easier.
A comprehensive open source optimization library would be even better.
Open source projects can be very successful, see for example the Python pro-
gramming language, the R project for statistical computing or the Linux operat-
ing system. If such a library existed for optimization, it would make applying
different algorithms to problems a lot easier. Scientists would no longer have
to write their own implementations of algorithms to be able to test them.
Comparison of performance would also be made easier if an algorithm is available
in an open source library, so there are no doubts about the way it was
implemented.
Of course, computer code can also be deemed proprietary in some cases.
Here, a neutral third party could also serve the purpose of making experiments
reproducible without making everything public.
Overfitting of an algorithm on a set of problem instances can be dealt with
in two ways. The simplest one is applying the algorithm to problem instances
that it was not initially tested on. This could be done after a result is published,
but it can also be incorporated in the development of the algorithm by splitting
the available problem instances into a training set and a test set. Similarly to
the way in which this scheme is applied in machine learning, the algorithm can
be tuned on the training set and its performance judged on the test set.
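Such a split can be organized in a few lines; the instance names below are placeholders.

```python
import random

def split_instances(instances, test_fraction=0.3, seed=0):
    """Partition problem instances into a tuning (training) set and a
    held-out test set, mirroring the machine-learning scheme."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(instances)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]

train, test = split_instances([f"instance_{i}" for i in range(10)])
# Tune algorithm settings on `train` only; judge performance on `test`.
print(len(train), len(test))  # 7 3
```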
Addressing the issue of performance variability is the toughest nut to crack.
This issue will always exist in some form unless all computational experiments
are run within the same (virtual) computing system. However, such a stan-
dardized system would be one among many and it is hard to see how it could
be designed to allow maximal crossover between the results of experiments and
applications in the real world. That does not mean performance variability
is something that can just be ignored when publishing results. More research
could shed light on what algorithms are particularly affected by it and how the
effects could be alleviated.
Chapter 7
Conclusion
Bibliography
[11] K. Dhamdhere, V. Goyal, R. Ravi, and M. Singh. “How to pay, come what
may: Approximation algorithms for demand-robust covering problems”.
In: 46th Annual IEEE Symposium on Foundations of Computer Science
(FOCS’05). IEEE. 2005, pp. 367–376.
[12] M. Fischetti, A. Lodi, M. Monaci, D. Salvagnin, and A. Tramontani. “Im-
proving branch-and-cut performance by random sampling”. In: Mathe-
matical Programming Computation 8.1 (2016), pp. 113–132.
[13] V. Gabrel, M. Lacroix, C. Murat, and N. Remli. “Robust location trans-
portation problems under uncertain demands”. In: Discrete Applied Math-
ematics 164 (2014), pp. 100–111.
[14] R. A. Jabr, I. Džafić, and B. C. Pal. “Robust optimization of storage
investment on transmission networks”. In: IEEE Transactions on Power
Systems 30.1 (2015), pp. 531–539.
[15] C. Lee, C. Liu, S. Mehrotra, and M. Shahidehpour. “Modeling transmis-
sion line constraints in two-stage robust unit commitment problem”. In:
IEEE Transactions on Power Systems 29.3 (2014), pp. 1221–1231.
[16] C. Liebchen, M. Lübbecke, R. Möhring, and S. Stiller. “The concept of
recoverable robustness, linear programming recovery, and railway appli-
cations”. In: Robust and online large-scale optimization. Springer, 2009,
pp. 1–27.
[17] T. L. Magnanti and R. T. Wong. “Accelerating Benders decomposition:
Algorithmic enhancement and model selection criteria”. In: Operations
Research 29.3 (1981), pp. 464–484.
[18] D. McDaniel and M. Devine. “A modified Benders’ partitioning algorithm
for mixed integer programming”. In: Management Science 24.3 (1977),
pp. 312–319.
[19] A. Mercier, J. F. Cordeau, and F. Soumis. “A computational study of Ben-
ders decomposition for the integrated aircraft routing and crew scheduling
problem”. In: Computers & Operations Research 32.6 (2005), pp. 1451–
1476.
[20] R. H. Pearce and M. Forbes. “Disaggregated Benders Decomposition for
solving a Network Maintenance Scheduling Problem”. In: arXiv preprint
arXiv:1603.02378 (2016).
[21] T. Santoso, S. Ahmed, M. Goetschalckx, and A. Shapiro. “A stochas-
tic programming approach for supply chain network design under un-
certainty”. In: European Journal of Operational Research 167.1 (2005),
pp. 96–115.
[22] L. Tang, W. Jiang, and G. K. Saharidis. “An improved Benders decom-
position algorithm for the logistics facility location problem with capacity
expansions”. In: Annals of Operations Research 210.1 (2013), pp. 165–190.
[23] W. Wei, F. Liu, S. Mei, and Y. Hou. “Robust energy and reserve
dispatch under variable renewable generation”. In: IEEE Transactions on
Smart Grid 6.1 (2015), pp. 369–380.
[24] B. Zeng and L. Zhao. “Solving two-stage robust optimization problems us-
ing a column-and-constraint generation method”. In: Operations Research
Letters 41.5 (2013), pp. 457–461.
[25] M. Zugno and A. J. Conejo. “A robust optimization approach to energy
and reserve dispatch in electricity markets”. In: European Journal of Op-
erational Research 247.2 (2015), pp. 659–671.