Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
89 views

Ahuja2001 ReferenceWorkEntry MaximumFlowProblemMaximumFlowP

Uploaded by

nada abdelrahman
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
89 views

Ahuja2001 ReferenceWorkEntry MaximumFlowProblemMaximumFlowP

Uploaded by

nada abdelrahman
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 360

M

MATRIX COMPLETION PROBLEMS some matrix B. A matrix A is said to be completely


Matrix completion problems are concerned with positive if A - B B T for some nonnegative matrix
determining whether partially specified matrices B. An n x n real symmetric matrix D - (dij) is
can be completed to fully specified matrices sat- a Euclidean distance matrix (abbreviated as dis-
isfying certain prescribed properties. In this ar- tance matrix) if there exist vectors vl,...', vn C R k
ticle we survey some results and provide refer- (for some k > 1) such that, for all i, j - 1 , . . . , n,
ences about these problems for the following ma- dij is equal to the square of the Euclidean distance
trix properties: positive semidefinite matrices, Eu- between vi and vj. Finally, a (rectangular) matrix
clidean distance matrices, completely positive ma- A is a contraction matrix if all its singular values
trices, contraction matrices, and matrices of given (that is, the eigenvalues of A ' A ) are less than or
rank. We treat mainly optimization and combina- equal to 1.
torial aspects. The set of positions corresponding to the spec-
ified entries of a partial matrix A is known as the
I n t r o d u c t i o n . A partial matrix is a matrix whose pattern of A. If A is an n x m partial matrix, its
entries are specified only on a subset of its posi- pattern can be represented by a bipartite graph
tions; a completion of a partial matrix is simply with node bipartition [1, n] U [1, m] having an edge
a specification of the unspecified entries. Matrix between nodes i e [1, n] and j e [1, m] if and only
completion problems are concerned with determin- if entry aij is specified.
ing whether or not a completion of a partial matrix When asking about existence of a psd comple-
exists which satisfies some prescribed property. We tion of a partial n x n matrix A, it is commonly
consider here the following matrix properties: pos- assumed that all diagonal entries of A are speci-
itive (semi) definite matrices, distance matrices, fied (which is no loss of generality if we ask for a
completely positive matrices, contraction matri- pd completion); moreover, it can obviously be as-
ces, and matrices of given rank; definitions are re- sumed that A is partial Hermitian, which means
called below. that entry aji is specified and equal to hi*j when-
In what follows, x*, A* denote the conjugate ever aij is specified. Hence, in this case, complete
transpose (in the complex case) or transpose (in information about the pattern of A is given by
the real case) of vector x and matrix A. A square the graph G - ([1, n ] , E ) w i t h node set [1, n]
real symmetric or complex Hermitian matrix A is and whose edge set E consists of the pairs ij
positive semidefinite (psd) if x*Ax > 0 for all vec- (1 _ i < j _ n) for which aij is a specified entry
tors x and positive definite (pd) if x*Ax > 0 for of A. The same holds when dealing with distance
all vectors x ~ 0; then we write" X ~ 0 (X ~- 0). matrix completions (in which case diagonal entries
Equivalently, A is psd (respectively, pd) if and only can obviously be assumed to be equal to zero).
if all its eigenvalues are nonnegative (respectively, An important common feature of the above ma-
positive) and A is psd if and only if A - B B T for trix properties is that they possess an 'inheritance
Matrix completion problems

structure'. Indeed, if a partial matrix A has a psd (where A . X "- ~-~in,j=l ai~xi j for two Hermitian
(pd, completely positive, distance matrix) comple- (n × n)-matrices A and X).
tion, then every principal specified submatrix of A The exact complexity status of problems (PSD)
is psd (pd, completely positive, a distance matrix); and (P) is not known; in particular, it is not known
similarly, if a partial matrix A admits a comple- whether they belong to the complexity class NP.
tion of rank _ k, then every specified submatrix However, it is shown in [60] that (P) is neither
of A has rank < k. Hence, having a completion of NP-complete nor co-NP-complete if NP~co-NP.
a certain kind imposes certain 'obvious' necessary However, the semidefinite programming problem
conditions. This leads to asking which are the pat- and, thus, problem (PSD) can be solved with an
terns for the specified entries that insure that if the arbitrary precision in polynomial time. This can
obvious necessary conditions are met, then there be done using the ellipsoid method (since one can
will be a completion of the desired type; therefore, test in polynomial time whether a rational matrix
this introduces a combinatorial aspect into matrix A is positive semidefinite and, if not, find a vec-
completion problems, as opposed to their analyti- tor x such that x*Ax < 0; cf. [24]), or interior
cal nature. point methods (cf. [56], [3], [27]). There has been
In this article we survey some results and pro- a growing interest in semidefinite programming in
vide references for the various matrix completion the recent years (1994), which is due, in particular,
problems mentioned above, concerning optimiza- to its successful application to the approximation
tion and combinatorial aspects of the problems. of hard combinatorial optimization problems (cf.
See [32], [47] for more detailed surveys on some of the survey [20]). This has prompted active research
the topics treated here. on developing interior point algorithms for solving
semidefinite programming problems; the literature
Positive Semidefinite Completion Problem. is quite large, see [65], [64] for extensive informa-
We consider here the following positive (semi) def- tion. Numerical tests are reported in [34] where
inite completion problem (PSD)" Given a partial an interior point algorithm is proposed for the ap-
Hermitian matrix A - (aij)ijES whose entries are proximate psd completion problem; it permits to
specified on a subset S of the positions, determine find exact completions for random instances up to
whether A has a psd (or pd) completion; if, yes, size 110.
find such a completion. (Here, S is generally as- Moreover, it is shown in [59] that problem (P)
sumed to contain all diagonal positions.) can be solved in polynomial time (for rational in-
This problem belongs to the most studied ma- put data Aj, bj) if either the number m of con-
trix completion problems. This is due, in partic- straints, or the order n of the matrices X, Aj in
ular, to its many applications, e.g., in probabil- (2) is fixed (cf. also [9]). Moreover, under the same
ity and statistics, systems engineering, geophysics, assumption, one can test in polynomial time the
etc., and also to the fact that positive semidefinite- existence of an integer solution and find one if it
ness is a basic property which is closely related to exists [39].
other matrix properties like being a contraction or Call a partial Hermitian matrix A partial psd
distance matrix. Equivalently, (PSD) is the prob- (respectively, partial pd) if every principal specified
lem of testing feasibility of the following system submatrix of A is psd (respectively, pd). As men-
(in variable X - ( x i j ) ) " tioned in the Introduction, being partial psd (pd)
is an obvious necessary condition for A to have a
X ~ O, xij - aij (ij E S). (1)
psd (pd) completion. In general, this condition is
Therefore, (PSD) is an instance of the following not sufficient; for instance, the partial matrix:
semidefinite programming problem (P)" Given Her-
mitian matrices A 1 , . . . , Am and scalars b l , . . . , bin,
decide whether the following system is feasible" A ~
!11!
1 1

1
?

1
0

X ~- O, Aj . X - bj (j - 1 , . . . , m ) (2) ? 1

222
Matrix completion problems

('?' indicates an unspecified entry) is partial psd, existence of psd completions. Namely, it is shown
yet no psd completion exists; note that the pat- in [8] that if a partial matrix A - (aij) with pat-
tern of A is a circuit of length 4. Call a graph tern G and diagonal entries equal to 1 is com-
chordal if it does not contain any circuit of length pletable to a psd matrix, then the associated vector
>_ 4 as an induced subgraph; chordal graphs oc- x "-(arccos(aij)/Tr)ijEE satisfies the inequalities"
cur in particular in connection with the Gaussian
Exe- ~ xe <_ lF] - I (3)
elimination process for sparse pd matrices (el. [61],
eEF eEC\F
[21]). (An induced subgraph of a graph G - (V, E)
for all F C C, C circuit in G, IF[ odd.
being of the form H - (U, F) where U C_ V and
F "- {ij E E" i , j E U}.) It is shown in [23] that Moreover, any partial matrix with pattern G sat-
every partial psd matrix with pattern G has a psd isfying (3) is completable to a psd matrix if and
completion if and only if G is a chordal graph; only if G does not contain a homeomorph of/(4
the same holds for pd completions. This extends as an induced subgraph (then, G is also known
an earlier result from [16] which dealt with 'block- as series-parallel graph) [44]. (Here, /(4 denotes
banded' partial matrices; in the Toeplitz case (all the complete graph on 4 nodes and a homeomorph
entries equal along a band), one finds the classical of/(4 is obtained by replacing the edges of/(4
Carath6odory-Fej6r theorem from function theory. by paths of arbitrary length.) The patterns G for
The proof from [23] is constructive and can be which every partial psd matrix satisfying (3) has
turned into an algorithm with a polynomial run- a psd completion are characterized in [6]; they are
ning time [48]. Moreover, it is shown in [48] that the graphs G which can be made chordal by adding
(PSD) can be solved in polynomial time when re- a set of edges in such a way that no new clique of
stricted to partial rational matrices whose pattern size 4 is created. Although (3) can be checked in
is a graph having a fixed minimum fill-in; the mini- polynomial time for rational x [5], the complexity
mum fill-in of a graph being the minimum number of problem (PSD) for series-parallel graphs (or for
of edges needed to be added in order to obtain a the subclass of circuits) is not known. A strength-
chordal graph. This result is based on the above ening of condition (3) (involving cuts in graphs) is
mentioned results from [59], [39] concerning the formulated in [44].
polynomial time solvability of (integer) semidefi- Another approach to problem (PSD) is consid-
nite programming with a fixed number m of linear ered in [I], [28], which is based on the study of the
constraints in (2). cone
The result from [23] on psd completions of par-
tial matrices with a chordal l~attern has been gen-
~ G "-- {
X -- (Xij)i,jEV" Vi ¢ j, ij ~_ E }
eralized in various directions; for instance, consid- associated to graph G = (V, E). Indeed, it is shown
ering general inertia possibilities for the comple- there that a partial matrix A with pattern G has
tions ([35], [17]), or considering completions with a psd completion if and only if
entries in a function ring [37].
Z + a x,j >_ o, vx (4)
If A is a partial matrix having a pd completion, iEV i¢j,
then A has a unique pd completion with maximum ijEE

determinant (this unique completion being charac- Obviously, it suffices to check (4) for all X extremal
terized by the fact that its inverse has zero entries in P c (i.e., X lying on an extremal ray of the cone
at all unspecified positions of A) [23]. In the case Pc).
when the pattern of A is chordal, explicit formu- Define the order of G as the maximum rank of
las for this maximum determinant are given in [7]. an extremal matrix in Pc. The graphs of order 1
The paper [52] considers the more general problem are precisely the chordal graphs [1], [58] and the
of finding a maximum determinant psd completion graphs of order 2 have been characterized in [46].
satisfying some additional linear constraints. One might reasonably expect that problem (PSD)
Further necessary conditions are known for the is easier for graphs having a small order. This is

223
Matrix completion problems

indeed the case for graphs of order 1; the complex- instance of the latter problem is the molecular
ity of (PSD) remains however open for the graphs conformation problem in chemistry; indeed, nu-
of order 2 (partial results are given in [48]). clear magnetic resonance spectroscopy permits to
determine some pairwise interatomic distances,
Euclidean Distance Matrix Completion the question being then to reconstruct the global
P r o b l e m . We consider here the Euclidean dis- shape of the molecule from this partial information
tance matrix completion problem (abbreviated (cf. [13], [41]).
as distance matrix completion problem) (EDM): In view of relation (6), problem (EDM) can be
Given a graph G = (Y = [1, n], E) and a real par- formulated as an instance of the semidefinite pro-
tial symmetric matrix A - ( a i j ) with pattern G gramming problem (P) and, therefore, it can be
and with zero diagonal entries, determine whether solved with an arbitrary precision in polynomial
A can be completed to a distance matrix; that is, time. Exploiting this fact, some specific algorithms
whether there exist vectors v l , . . . , v n E R k for based on interior point methods are presented in
some k > 1 such that [2] together with numerical tests. Moreover, prob-
aij - IIvi - vj [I2 for all ij E E. (5) lem (EDM) can be solved in polynomial time when
restricted to partial rational matrices whose pat-
(here, Ilvll - i~-~'kh_l v 2 denotes the Euclidean tern is a chordal graph or, more generally, a graph
norm of v E Rk.) The vectors v l , . . . , V n are with fixed minimum fill-in [48]; as in the psd case,
then said to form a realization of A. A variant of this follows from the fact (mentioned below) that
problem (EDM) is the graph realization problem partial matrices that are completable to a distance
(EDMk), obtained by letting the dimension k of matrix admit a good characterization when their
the space where one searches for a realization of A pattern is a chordal graph.
be part of the input data. While the exact complexity of problem (EDM)
Distance matrices are a central notion in the is not known, it has been shown in [62] that prob-
area of distance geometry; their study was ini- lem (EDMk) is NP-complete if k - 1 and NP-hard
tiated by A. Cayley in the 18th century and it if k > 2 (even when restricted to partial matrices
was continued in particular by K. Menger and I.J. with entries in {1, 2}). Finding e-optimal solutions
Schoenberg in the 1930s. They are, in fact, closely to the graph realization problem is also NP-hard
related to psd matrices. The following basic con- for small e ([53]). The graph realization problem
nection was established in [63]. Given a symmetric (EDMk) has been much studied, in particular in
(n x n)-matrix D -- (dij) ni,j--1 with zero diagonal dimension k < 3, which is the case most relevant
entries, consider the symmetric ((n - 1) x (n - 1))- to applications. The problem can be formulated as
matrix X - (xij )i,j=l
n-1 defined by a nonlinear global optimization problem: min f ( v )
such that v - ( v l , . . . , vn) E R kn, where the cost
xij - 1 (din + din - d i j )
-~ (6) function f(.) can, for instance, be chosen as

for a l l i , j = 1 , . . . , n - 1. f(v) - (llv - viii -


Then, D is a distance matrix if and only if X is psd; ijCE

moreover, D has a realization in the k-space if and Hence, f(.) is zero precisely when the vi's provide
only if X has rank < k. Other characterizations are a realization of the partial matrix A. This opti-
known for distance matrices. As the literature on mization problem is hard to solve (as it may have
this topic is quite large, see the monographs [11], many local optimum solutions). Several algorithms
[13], [14], where further references can be found. have been proposed in the literature; see, in par-
Problems (EDM) and (EDMk) have many im- ticular, [13], [19], [26], [29], [31], [41], [54], [57].
portant applications; for instance, to multidimen- They are based on general techniques for global
sional scaling problems in statistics (cf. [49]) and to optimization like tabu and pattern search [57], the
position-location problems, i.e., problem (EDMk) continuation approach (which consists of trans-
mostly in dimension k < 3. A much studied forming the original function f (.) into a smoother

224
Matrix completion problems

function having fewer local optimizers, [53], [54]), C o m p l e t i o n to C o m p l e t e l y P o s i t i v e a n d


or divide-and-conquer strategies aiming to break C o n t r a c t i o n M a t r i c e s . Call a matrix doubly
the problem into a sequence of smaller or eas- nonnegative if it is psd and entrywise nonnegative.
ier subproblems [13], [29], [31]. In [29], [31], the Every completely positive (cp, for short) matrix is
basic step consist of finding principal submatri- obviously doubly nonnegative. The converse impli-
ces having a unique realization, treating each of cation holds for matrices of order n < 4 (cf. [22])
them separately and then trying to combine the and for certain patterns of the nonzero entries in
solutions. Thus arises the problem of identifying A (cf. [40]). The cp property is obviously inherited
principal submatrices having a unique realization, by principal submatrices; call a partial matrix A
which turns out to be NP-hard [62]. However, sev- a partial cp matrix if every fully specified princi-
eral necessary conditions for unicity of realization pal submatrix of A is cp. It is shown in [15] that
are known, related with connectivity and generic every partial cp matrix with graph pattern G is
rigidity properties of the graph pattern [67], [30]. completable to a cp matrix if and only if G is a
Generic rigidity of graphs can be characterized and so-called block-clique graph. A block-clique graph
recognized in polynomial time only in dimension being a chordal graph in which any two distinct
k < 2 ([42], [51]) (cf. the survey [43] for more ref- maximal cliques overlap in at most one node or,
erences). equivalently, a chordal graph that does not con-
Call a partial matrix A a partial distance ma- tain an induced subgraph of the form"
trix if every specified principal submatrix of A is a
distance matrix. Being a partial distance matrix is
obviously a necessary condition for A to be com-
pletable to a distance matrix. It is shown in [4]
that every partial distance matrix with pattern G
is completable to a distance matrix if and only if G
is a chordal graph; moreover, if all specified prin-
Recall that an (n × m)-matrix A is a contrac-
cipal submatrices of the partial matrix A have a
tion matrix if all eigenvalues of A*A are less than
realization in the k-space, then A admits a com-
or equal to 1 or, equivalently, if the matrix
pletion having a realization in the k-space.
As noted in [33], if a partial matrix A with pat-
tern G is completable to a distance matrix, then
the associated vector x "= ( ~ ) i j e E must satisfy is positive semidefinite. Call a partial matrix A a
the inequalities: partial contraction if all specified submatrices of A
are contractions. As every submatrix of a contrac-
• (7) tion is again a contraction, an obvious necessary
f~C\{e}
condition for a partial matrix A to be completable
for all e E C, C circuit in G. to a contraction matrix is that A be a partial con-
The graphs G for which every partial matrix (re- traction. Thus arises the question of characteriz-
spectively, partial distance matrix) A with pattern ing the graph patterns G for which every partial
G for which (7) holds is completable to a distance contraction with pattern G can be completed to a
matrix, are the graphs containing no homeomorph contraction matrix.
of K4 as an induced subgraph [45] (respectively, As we now deal with rectangular n x m partial
the graphs that can be made chordal by adding matrices A, their pattern is the bipartite graph G
edges in such a way that no new clique of size 4 with node set U U V, where U, V index the rows
is created [33]). Note the analogy with the cor- and columns of A and edges of G correspond to
responding results for the psd completion prob- the specified entries of A. We may clearly assume
lem; some connections between the two problems to be dealing with partial matrices whose pattern
(EDM) and (PSD) are exposed in [38], [45]. is a connected graph (as the partial matrices asso-
ciated with the connected components can be han-

225
Matrix completion problems

died separately). Below is an example of a partial equal to 0). On the other hand, determining mr(A)
matrix A which is a partial contraction, but which seems to be a much more difficult task.
is not completable to a contraction matrix: We first deal with the problem of finding maxi-
mum rank completions. Let A be an n × m partial
• ,/~
A- 1__ 1__ • matrix with graph pattern G, i.e., G is the bipar-
tite graph (U U V, E) where U, V index respec-
In fact, the graph pattern displayed in this exam- tively the rows and columns of A, and the edges
ple is in a sense present in every partial contraction of G correspond to the specified entries of A, and
which is not completable to a contraction. Namely, let G denote the complementary bipartite graph
it is shown in [36] that the following assertions (i- whose edges correspond to unspecified entries of
iii) are equivalent for a connected bipartite graph A. Note that computing MR(A) amounts to com-
G with node bipartition U t2 V: puting the generic rank of A when viewing the
i) Every partial contraction with pattern G can unspecified entries of A as independent variables
be completed to a contraction; over the field containing the specified entries. For
a subset X C U U V, let Ax denote the submatrix
ii) G does not contain an induced matching of
of A with respective row and column index sets
size 2 (i.e., if e := uv, e ~ := u~v ~ are edges in
{i e [1, n]" ui ¢ X} and {j e [1, m]" vj ~ Z } .
G with u ~ u ~ C U, v ~: v ~ E V, then at least
Call X a cover of G if every edge of G has at
one of the pairs uv ~, u~v is an edge in G; that
least one end node in X; that is, if Ax is a
is, G is nonseparable in the terminology of
fully specified submatrix of A. Clearly, we have:
[21]);
MR(A) __ r a n k ( A x ) + IX[. In fact, the following
iii) The graph G obtained from G by adding all equality holds"
edges uu' (u ~ u' E V) and vv' (v ~ v' E V)
is chordal. MR(A) - rain _ r a n k ( A x ) + IX[ (9)
X cover of G
(Note that the implication iii)~i) is a consequence
as shown in [12]. A determinantal version of the
of the result on psd completions from [23] men-
result was given in [25]. In the special case when
tioned in the Section on the positive semidefinite
all specified entries of A are equal to 0, then
completion problem above, as G is the graph pat-
MR(A) coincides with the maximum cardinality of
tern of the matrix A defined in (8).)
a matching in G and, therefore, the minimax rela-
tion (9) reduces to the Frobenius-KSnig theorem
R a n k C o m p l e t i o n s . In this section, we consider
(cf. [50] for details on the latter result). Moreover,
the problem of determining the possible ranks for
one can determine MR(A) and construct a max-
the completions of a given partial matrix. For a
imum rank completion of A in polynomial time.
partial matrix A, let mr(A) and MR(A) denote,
This was shown in [55] by a reduction to matroid
respectively, the minimum and maximum possible
intersection and, more recently, in [18] where a
ranks for a completion of A. If B, C are comple-
simple greedy procedure is presented that solves
tions of A of respective ranks mr(A), MR(A), then
the problem by perturbing an arbitrary comple-
changing B into C by changing one entry of B
tion.
into the corresponding entry of C at a time per-
We now consider m i n i m u m rank completions.
mits to construct completions realizing all ranks
To start with, note that mr(A) may depend, in
in the range [mr(A), MR(A)]. Hence, the question
general, on the actual values of the specified en-
is to determine the two extreme values mr(A) and
tries of A (and not only on the ranks of the speci-
MR(A). As we see below, the value MR(A) can, in
fied submatrices of A). Indeed, consider the partial
fact, be expressed in terms of ranks of fully spec-
ified submatrices of A and it can be computed in matrix A - d~ where a b, c, d, e J' ¢ 0 Then
, , • ,

polynomial time; this constitutes a generalization mr(A) - 1 if ace - bdf and mr(A) - 2 otherwise,
of the celebrated Frobenius-Khnig theorem (cor- while all specified submatrices have rank 1 in both
responding to the case when specified entries are cases. Thus arises the question of identifying the

226
Matrix completion problems

bipartite graphs G for which mr(A) depends only given sparsity pattern', Linear Alg. ~ Its Appl. 107
on the ranks of the specified submatrices of A for (1988), 101-149.
every partial matrix A with p a t t e r n G; such graphs
[2] ALFAKIH, A.Y., KHANDANI,A., AND WOLKOWICZ,H.:
'Solving Euclidean distance matrix completion prob-
are called rank determined. The graph p a t t e r n of lems via semidefinite programming', Comput. Optim.
the above instance A is the circuit C6. Hence, C6 Appl. 12 (1998), 13-30.
is not rank determined. Call a bipartite graph G [3] ALIZADEH, F.: 'Interior point methods in semidefinite
bipartite chordal if it does not contain a circuit of programming with applications in combinatorial opti-
length _ 6 as an induced subgraph. Then, if a bi- mization', SIAM J. Optim. 5 (1995), 13-51.
[4] BAKONYI, M., AND JOHNSON, C.R.: 'The Euclidian
partite graph is rank determined, it is necessarily
distance matrix completion problem', SIAM J. Matrix
bipartite chordal [12]. It is conjectured there that, Anal. Appl. 16 (1995), 646-654.
conversely, every bipartite chordal graph is rank [5] BARAHONA, F., AND MAHJOUB, A.R.: 'On the cut
determined. The conjecture was shown to be true polytope', Math. Program. 36 (1986), 157-173.
in [66] for the nonseparable bipartite graphs (i.e., [6] BARRETT, W.W., JOHNSON, C.R., AND LOEWY, R.:
'The real positive definite completion problem: cy-
the bipartite graphs containing no induced match-
cle completability', Memoirs Amer. Math. Soc. 584
ing of size 2; they are obviously bipartite chordal). (1996).
Note that a partial matrix A has a nonseparable [7] BARRETT, W.W., JOHNSON, C.R., AND LUNDQUIST,
p a t t e r n if and only if it has (up to row/column M.: 'Determinantal formulas for matrix completions as-
permutation) the following 'triangular' form: sociated with chordal graphs', Linear Alg. ~ Its Appl.
121 (1989), 265-289.
[8] BARRETT, W., JOHNSON, C.R., AND TARAZAGA, P.:
'The real positive definite completion problem for a
simple cycle', Linear Alg. ~ Its Appl. 192 (1993), 3-
31.
[9] BARVINOK, A.I.: 'Feasibility testing for systems of
real quadratic equations', Discrete Comput. Geom. 10
(1993), 1-13.
[10] BARVINOK, A.I.: 'Problems of distance geometry and
convex properties of quadratic maps', Discrete Corn-
put. Geom. 13 (1995), 189-202.
[11] BLUMENTHAL, L.M.: Theory and applications of dis-
tance geometry, Oxford Univ. Press, 1953.
Then, mr(A) can be explicitly formulated in [12] COHEN, N., JOHNSON, C.R., RODMAN, L., AND WOE-
DERMAN, H.J.: 'Ranks of completions of partial ma-
terms of the ranks of the specified submatrices
trices', in H. DYM ET AL. (eds.): The Gohberg Anniv.
of A; in the simplest case, the formula for mr(A) Coll., Vol. I, Birkh~iuser, 1989, p. 165-185.
reads" [13] CRIPPEN, G.M., AND HAVEL, T.F.: Distance geometry
and molecular conformation, Res. Studies Press, 1988.
mr(~ D) [14] DEZA, M., AND LAURENT, M.: Geometry of cuts and
metrics, Vol. 15 of Algorithms and Combinatorics,
Springer, 1997.
-rank(c ) + rank (C D) - rank(C). [15] DREW, J.H., AND JOHNSON, C.R.: 'The completely
positive and doubly nonnegative completion problems',
It is shown in [12] t h a t the above conjecture holds Linear Alg. 8J Its Appl. 44 (1998), 85-92.
when the p a t t e r n G is a path, or when G is ob- [16] DYM, H., AND GOHBERG, I.: 'Extensions of band ma-
tained by 'gluing' a collection of circuits of length trices with band inverses', Linear Alg. ~ Its Appl. 36
4 along a common edge. ( 1981), 1-24.
[17] ELLIS, R.L., LAY, D.C., AND GOHBERG, I.: 'On neg-
See also: I n t e r i o r p o i n t m e t h o d s for s e m i -
ative eigenvalues of selfadjoint extensions of band ma-
definite programming; Semidefinite pro- trices', Linear Alg. ~¢ Its Appl. 24 (1988), 15-25.
gramming and determinant maximization. [18] GEELEN, J.: 'Maximum rank matrix completion', Lin-
ear Alg. ~ Its Appl. 288 (1999), 211-217.
References [19] GLUNT, W., HAYDEN, T.L., AND RAYDAN, M.:
[1] AGLER, J., HELTON, J.W., MCCULLOUGH, S., AND 'Molecular conformations from distance matrices', J.
RODMAN, L." 'Positive semidefinite matrices with a

227
Matrix completion problems

Comput. Chem. 14 (1998), 175-190. [36] JOHNSON, C.R., AND RODMAN, L.: 'Completion of ma-
[2o] GOEMANS, M.X.: 'Semidefinite programming in combi- trices to contractions', J. Funct. Anal. 69 (1986), 260-
natorial optimization', Math. Program. 79 (1997), 143- 267.
161. [37] JOHNSON, C.R., AND RODMAN, L.: 'Chordal inheri-
[21] GOLUMBIC, M.C.: Algorithmic theory and perfect tance principles and positive definite completions of
graphs, Acad. Press, 1980. partial matrices over function rings', in I. GOHBERG
[22] GRAY, L.J., AND WILSON, D.G.: 'Nonnegative factor- ET AL. (eds.): Contributions to Operator Theory and
ization of positive semidefinite nonnegative matrices', its Applications, Birkh~iuser, 1988, p. 107-127.
Linear Alg. ~ Its Appl. 31 (1980), 119-127. [3s] JOHNSON, C.R., AND TARAZAGA, P.: 'Connections be-
[23] GRONE, R., JOHNSON, C.R., S/~, E.M., AND tween the real positive semidefinite and distance matrix
WOLKOWICZ, H.: 'Positive definite completions of par- completion problems', Linear Alg. FJ Its Appl. 223//4
tial hermitian matrices', Linear Alg. ~ Its Appl. 58 (1995), 375-391.
(1984), 109-124. [39] KHACHIYAN, L., AND PORKOLAB, L.: 'Computing inte-
[24] GROTSCHEL, M., Lovksz, L., AND SCHRIJVER, A.. gral points in convex semi-algebraic sets': 38th Annual
Geometric algorithms and combinatorial optimization, Symp. Foundations Computer Sci., 1997, p. 162-171.
Springer, 1988. [4o] KOGAN, N., AND BERMAN, A.: 'Characterization
HARTFIEL, D.J., AND LOEWY, R.: 'A determinantal of completely positive graphs', Discrete Math. 114
version of the Frobenius-K5nig theorem', Linear Mul- (1993), 297-304.
tilinear Algebra 16 (1984), 155-165. [41] KUNTZ, I.D., THOMASON, J.F., AND OSHIRO, C.M.:
[26] HAVEL, T.F.: 'An evaluation of computational strate- 'Distance geometry', Methods in Enzymologie 177
gies for use in the determination of protein structure (1993), 159-204.
from distance constraints obtained by nuclear mag- [42] LAMAN, G.: 'On graphs and rigidity of plane skeletal
netic resonance', Program. Biophys. Biophys. Chem. structures', J. Engin. Math. 4 (1970), 331-340.
56 (1991), 43-78. [43] LAURENT, M.: 'Cuts, matrix completions and graph
[27] HELMBERG, C., RENDL, F., VANDERBEI, R.J., AND rigidity', Math. Program. 79 (1997), 255-283.
WOLKOWICZ, H.: 'An interior-point method for semi- [44] LAURENT, M.: 'The real positive semidefinite comple-
definite programming', SIAM J. Optim. 6 (1996), 342- tion problem for series-parallel graphs', Linear Alg.
361. Its Appl. 252 (1997), 347-366.
[2s] HELTON, J.W., PIERCE, S., AND RODMAN, L.: 'The [45] LAURENT, M.: 'A connection between positive semidef-
ranks of extremal positive semidefinite matrices with inite and Euclidean distance matrix completion prob-
given sparsity pattern', SIAM J. Matrix Anal. Appl. lems', Linear Alg. ~ Its Appl. 273 (1998), 9-22.
10 (1989), 407-423. [46] LAURENT, M.: 'On the order of a graph and its
[29] HENDRICKSON, B.: 'The molecule problem: Deter- deficiency in chordality', CWI Report P N A - R 9 8 0 1
mining conformation from pairwise distances', Techn. (1998).
Report Dept. Computer Sci. Cornell Univ. 9 0 - 1 1 5 9 [47] LAURENT, M.: 'A tour d'horizon on positive semidef-
(1990), PhD Thesis. inite and Euclidean distance matrix completion prob-
[3o] HENDRICKSON, B.: 'Conditions for unique graph real- lems', in P.M. PARDALOS AND H. WOLKOWICZ(eds.):
izations', SIAM J. Comput. 21 (1992), 65-84. Topics in Semidefinite and Interior-Point Methods,
[31] HENDRICKSON, B.: 'The molecule problem: exploiting Vol. 18 of Fields Inst. Res. Math. Sci. Commun., Amer.
structure in global optimization', SIAM J. Optim. 5 Math. Soc., 1998, p. 51-76.
(1995), 835-857. [4s] LAURENT, M.: 'Polynomial instances of the positive
[32] JOHNSON, C.R.: 'Matrix completion problems: A sur- semidefinite and Euclidean distance matrix completion
vey', in C.R. JOHNSON (ed.): Matrix Theory and Appl., problems', SIAM J. Matrix Anal. Appl. (to appear).
Vol. 40 of Proc. Symp. Appl. Math., Amer. Math. Soc., [49] LEEUW, J. DE, AND HEISER, W.: 'Theory of mul-
1990, p. 171-198. tidimensional scaling', in P.R. KRISHNAIAH AND
[33] JOHNSON, C.R.., JONES, C., AND KROSCHEL, B.: L.N. KANAL (eds.): Handbook Statist., Vol. 2, North-
'The distance matrix completion problem: cycle com- Holland, 1982, p. 285-316.
pletability', Linear Multilinear Algebra 39 (1995), 195- [~0] Lov~.sz, L., AND PLUMMER, M.D." Matching theory,
207. Akad. KiadS, 1986.
[34] JOHNSON, C.R., KROScHEL, B., AND WOLKOWICZ, [51] LOVASZ, L., AND YEMINI, Y." 'On generic rigidity in
H.: 'An interior-point method for approximate posi- the plane', SIAM J. Alg. Discrete Meth. 3 (1982), 91-
tive semidefinite completions', Comput. Optim. Appl. 98.
9 (1998), 175-190. [52] LUNDQUIST, M.E., AND JOHNSON, C.R.: 'Linearly con-
[35] JOHNSON, C.R., AND RODMAN, L.: 'Inertia possibili- strained positive definite completions', Linear Alg.
ties for completions of partial Hermitian matrices', Lin- Its Appl. 150 (1991), 195-207.
ear multilinear algebra 16 (1984), 179-195. [53] MOR~, J.J., AND W v , Z" 'e-optimal solutions to dis-

228
Matroids

tance geometry problems via global continuation', in MATROIDS


P.M. PARDALOS AND D. SHALLOWAY(eds.): DIMACS,
Matroids have been defined in 1935 as general-
Vol. 23, Amer. Math. Soc., 1996, p. 151-168.
[54] Mon~, J.J., AND Wv, Z." 'Global continuation for dis- ization of graphs and matrices. Starting from the
tance geometry problems', SIAM J. Optim. 7 (1997), 1950s they have had increasing interest and the
814-836. theoretical results obtained have been used for
[55] MUROTA, K.: 'Mixed matrices: Irreducibility and de- solving several difficult problems in various fields
composition', in R.A. BRUALDI ET AL. (eds.): Combi-
such as civil, electrical, and mechanical engineer-
natorial and graph-theoretical problems in linear alge-
bra, Vol. 50 of IMA, Springer, 1993, p. 39-71.
ing, computer science, and mathematics. A com-
[56] NESTEROV, Y.E., AND NEMIROVSKY, A.S.: Interior prehensive treatment of matroids can not be con-
point polynomial algorithms in convex programming: tained in few pages or even in only one book. Thus,
Theory and algorithms, SIAM, 1994. the scope of this article is to introduce the reader
[57] PARDALOS, P.M., AND LIN, X.: 'A tabu based pattern to this theory, providing the definitions of some
search method for the distance geometry problem', in
different types of matroids and their main proper-
F. GIANNESSIS ET AL. (eds.): Math. Program., Kluwer
Acad. Publ., 1997. ties.
[bs] PAULSEN, V.I., POWER, S.C., AND SMITH, R.R.:
'Schur products and matrix completions', J. Funct.
Anal. 85 (1989), 151-178. H i s t o r i c a l O v e r v i e w . In 1935, H. Whitney in
[59] PORKOLAB, L., AND KHACHIYAN, L.: 'On the com- [38] studied linear dependence and its important
plexity of semidefinite programs', J. Global Optim. 10 application in mathematics. A number of equiva-
(1997), 351-365.
lent axiomatic systems for matroids is contained
[6o] RAMANA, M.V.: 'An exact duality theory for semi-
definite programming and its complexity implications',
in his pioneering paper, that is considered the first
Math. Program. 7'7 (1997), 129-162. scientific work about matroid theory.
[61] ROSE, D.J.: 'Triangulated graphs and the elimination In the 1950s and 1960s, starting from the Whit-
process', J. Math. Anal. Appl. 32 (1970), 597-609. ney's ideas, W. Tutte in [25], [26], [27], [28], [29],
[62] SAXE, J.B.: 'Embeddability of weighted graphs in [30], [31], [32], [33] built a considerable body of
k-space is strongly NP-hard': Proc. 17th Allerton
Conf. Communications, Control and Computing, 1979,
theory about the structural properties of matroids,
p. 480-489. which became popular in the 1960s, when J. Ed-
[63] SCHOENBERG, I.J.: 'Remarks to M. Fr@chet's article monds in [7], [5], [6], [8], [9], [10], [11] intro-
'Sur la d@finition axiomatique d'une classe d'espaces duced matroid theory in combinatorial optimi-
vectoriels distanci@s applicables vectoriellement sur zation. From 1965 on, a growing number of re-
l'espace de Hilbert", Ann. of Math. 36 (1935), 724-
searchers became interested in matroids. In 1976,
732.
[64] WEB: 'htt p://orion.math, uwat erloo, ca: 80 / ~ hwolkowi/ D.J.A. Welsh ([34]) published the first book on
henry/software/readme.html'. matroid theory. In the 1970s, 1980s, and 1990s se-
[65] WEB: 'htt p: / / www. zib. de/h elmb erg / semi def. ht ml'. lected topics have been covered by a huge number
[66] WOERDEMAN, H.J.: 'The lower order of lower trian- of scientific publications, among them [13], [17],
gular operators and minimal rank extensions', Integral
Eq. Operator Theory 10 (1987), 859-879.
[1], [lS], [35], [36], [37], [15], [24], [20], [23],
[67] YEMINI, Y.: 'Some theoretical aspects of position-
[21], [2], [3]. [16] provides an excellent historical
location problems': Proc. 20th Annual Syrup. Founda- survey, while [21] is a good book for students.
tions Computer Sci., 1979, p. 1-8.

Monique Laurent D e f i n i t i o n of a M a t r o i d . Matroids are combina-


CWI torial structures often treated in together with the
Kruislaan 413
1098 SJ Amsterdam, The Netherlands
greedy technique, which yields optimal solutions
E-mail address: monique©cwi.nl when applied for solving simple problems defined
on matroids.
MSC2000: 05C50, 15A48, 15A57, 90C25 In order to provide the definition of a general
Key words and phrases: partial matrix, completion of ma-
matroid, some notation and further definitions are
trices.
needed.

229
Matroids

DEFINITION 1 An ordered pair S - ( E , I ) , where such that


E - { e l , . . . , en} and I C_ 2E, is an independent u {y} \ e B.
system (SI) if and only if
VA, B C_ E " B c A c I ~ B c I. (1)
THEOREM 8 A set C of subsets of E is the set of
E is also called ground set. [3 circuits for a matroid M - (E, I) if and only if the
Note that the empty set is necessarily a member following two properties hold:
of I. 1) for a l l X ~ Y e C , X~Y;
DEFINITION 2 The members of I are called inde- 2) for all X 7~ Y C C and z E X M Y , there
pendent sets. [3 exists Z ¢ C such that Z C X U Y \ {z}.
Cl
DEFINITION 3 The members of D - 2E \ I are
called dependent sets. [3 Other alternative axiomatic characterizations of a
DEFINITION 4 The members of the set matroid need some further definitions.
Let M - (E, I) be a matroid.
B-{AC_E'AEI, VfcE\A" BU{f}~I}
DEFINITION 9 For all A C E, let p" 2 E ---+ N be a
are called maximal independent sets or bases. [:3
function such that
In other words, a basis is an independent set which
p(A)- max{IX]" X C A , X e I}.
is maximal with respect to set inclusion operation.
p is called rank of M. [3
DEFINITION 5 The members of the set
Note that the rank of M is equal to the rank of
C-{CC_E'C¢D, VI¢C" C\{f}eI}
E, which is given by the cardinality of the maxi-
are called minimal dependent sets or circuits. A mal independent subset of E. The rank is always
1-element circuit is a loop. 53 well-defined, due to the following proposition.
DEFINITION 6 A matroid M is an independent PROPOSITION 10 If A is a subset of E and X and
system ( E , I ) such that if A , B 6 I, [A[ < IB[, Y are maximal independent subsets of A, then
then there is some element x E B \ A such that I X l - IYI. a
AU{x}6I.
Proposition 10 claims that the maximal indepen-
We say that M satisfies the exchange property. dent subsets contained in A C E of a given matroid
E:] M - ( E , I ) have the same cardinality. Choosing
Most combinatorial problems can be viewed as the A - E, the following corollary holds.
problem of finding an element in one of the above COROLLARY 11 The bases of any matroid have
defined sets corresponding to the optimal objective the same cardinality. Cl
function value.
DEFINITION 12 A subset A of E is called a closed
The word matroid is due to Whitney. He stud-
of M if
led matric matroids, in which the elements of E
are the rows of a given matrix and a set of rows p(A U {x}) - p(A) + l, VxeE\A,
is independent if they are linearly independent in i.e. if it is not possible to add to A any element
the usual sense. without increasing its rank. [3
The following theorems express two equivalent
DEFINITION 13 The closure operator for M is a
axiomatic definitions of matroids in terms of bases
function a " 2 E --+ 2 E such that for all A C_ E a(A)
and circuits.
is the closed of minimum cardinality that contains
THEOREM 7 A nonempty set B of subsets of E A, i.e.
is the set of bases for a matroid M - ( E , I ) if
a(A) - A U {x e E \ A" p(A U {x}) - p(A)}.
and only if for all B1,B2 E B, B1 ~ B2, and
x E B1 \ B2, there exists an element y E B2 \ B1 M

230
Matroids

DEFINITION 14 A subset A of E covers M if and The weight function w extends to subsets A of


only if it contains a basis of M, i.e. E by summation:

p(A) - p(E).
xEA
[2] V]
W i t h these further definitions at hand, the follow-
ing theorems express three other equivalent ax-
M i n o r of M a t r o i d s : R e s t r i c t i o n and Con-
iomatic characterizations of a matroid in terms of
traction. A minor of a matroid M - ( E , I ) is
its rank.
a 'submatroid' obtained from deleting or contract-
THEOREM 15 A function p" 2 E -+ N is a rank ing from the ground set E one or more elements.
function of a matroid M - (E, I) if and only if for A loop is an element y of a matroid such that
all X C E and for all y, z E E the following three {y} is not independent. EquivMently, {y} does not
properties hold: lie in any independent set, nor in maximal inde-
1) p(O) - O; pendent sets.
2) p(X) <_ p(X U {y}) _< p ( X ) + 1; DEFINITION 19 Let M - ( E , I ) be a matroid. If
3) p(x) - p(X u {y}) - p ( x u {z)) p(x u an element {x} is not a loop, the matroid M/x,
{y, z}) - p(X). called a contraction of M, is defined as follows:

Vq 1) the ground set of M / x is E \ {x};


2) a set A is independent in M / x if and only if
THEOREM 16 A function p" 2 E -+ N is a rank
A U {x} is independent in M.
function of a matroid M - (E,I) if and only if
for all X ¢ Y C E the following three properties 77
hold" The concept of matroid contraction can be dual-
1) 0 _< p(X) <_ IXf; ized. In fact, an element y is a coloop if it is con-
2) X c_ Y =~ p(X) <_ p(Y); tained in every basis of M.

3) p(X u Y) + p(X N Y) <_ p(X) + p(Y). DEFINITION 20 Let M = (E, I) be a matroid. If


an element {x} is not a coloop, the matroid M \ x ,
[-7
called a restriction of M, is defined as follows:
Note that the second property of theorem 16 im- 1) the ground set of M \ x is E \ {x};
plies that p is a monotonic function, while the third
2) a set A is independent in M \ x if and only if
property expresses its submodularity.
it is independent in M.
THEOREM 17 A function or" 2 E -+ 2 E is a closure
O
operator of a matroid M - (E, I) if and only if for
all X ~ Y C E and for all x, y E E the following The above definitions have been given in terms of
four properties hold" restriction and contraction of only one element,
but they can be easily extended to the restriction
1) x ¢_
and contraction of a set X. The minors obtained
2) Y C_ X ~ a(Y) C_ or(X); will be denoted M \ X and M / X , respectively.
3) a(X) - a(o(X));
4) y ¢ o ( x ) , y e x e o(xu{y}). R e p r e s e n t a b i l i t y of M a t r o i d s . One among the
most common canonical examples of matroids is
O
the vectorial matroid, whose ground set E is a fi-
DEFINITION 18 A matroid M = (E, I) is weighted nite set of vectors from a vector space, while the
if there is an associated weight function w that as- independent sets are the linearly independent sub-
signs a strictly positive weight w(x) to each ele- sets of vectors of E. A matroid M = (E, I) is rep-
ment x E E. resentable on a field F if there exists some vector

231
Matroids

space V over F, with some finite set E of vectors Uniform Matroid. Let E be a set of n elements
of V, so that M is isomorphic to the vectorial ma- and let I be the family of subsets A of E such
troid of the set E. A binary matroid is a matroid that IAI _< k < n. T h e n M = ( E , I ) is called the
representable over GF(2), while a ternary matroid uniform matroid of rank k and is denoted by Uk,n.
is representable over GF(3). The sets of the bases and the circuits of Uk,n are
In recent literature (as of 1999) the problem of
B-{X E" IXl-k}
classifying all the fields over which a given matroid
is representable and the inverse problem of char- and
acterizing all the matroids that are representable C-{XCE Ixl-k+l},
on a given field have had growing interest. An im-
respectively.
portant result for matroid representability is the
following theorem. Moreover, for all A C_ E,

THEOREM 21 A matroid M - ( E , I ) is repre- - ~ IAI if IAI _ K,


p(A)
sentable over any field if and only if it is repre- [K otherwise,
sentable over GF(2) and over some field of charac-
if IAI _< K,
teristic other t h a n two. [:] o(A)- E
otherwise.
A matroid as in the previous theorem is called reg-
ular. Graphic Matroid. If F is the set of forests of a
graph G - (V, E), M - (E, F) is called a graphic
C o n n e c t i v i t y of M a t r o i d s . Connectivity is an matroid. The circuits of M are the graph-theoretic
important concept in matroid theory. circuits of G, while the rank of a subset E1 of E is
given by
DEFINITION 22 A matroid M - ( E , I ) admits a
k-separation if there exists a partition (X, Y) of p(E1) - IYl -
the ground set E such that where c(E1) is the number of connected compo-
1) IXl > k, IYI > k; nents of G1 - (V, E1 ).

2) p ( x ) + p ( y ) - p(E) < k - 1. Transversal Matroid. Let E be a finite set, C =


{ S 1 , . . . , Sm} a collection of subsets of E, and let
[:]
T - { e l , . . . , e t } C_ E.
DEFINITION 23 The smallest k such that a ma- T is called a transversal of C if there exist dis-
troid M - (E, I) admits a k-separation is called tinct integers j ( 1 ) , . . . ,j(t) such that ei C Sj(i),
the connectivity of M. [3 i - 1 , . . . , t. Let I be the set of all transversals of
E, then M - (E, I) is a transversal matroid.
If k > 2, M is n-connected for any n < k; if k - 1,
M is disconnected; if M admits any k-separations Partition Matroid. Let E be a finite set, II -
for all integers k, M has infinite connectivity. { E l , . . . ,Ep} a partition of E, t h a t is a collection
An i m p o r t a n t result for matroid connectivity is of disjoint subsets of E covering E, and d l , . . . , dp
the following theorem. p nonnegative integers. A subset A of E is inde-
THEOREM 24 A matroid M - ( E , I ) is discon- pendent , i.e. A E I, if and only if IA A Ejl < dj,
nected if and only if there exists a partition (X, Y) j - 1 , . . . , p. The system M - (E, I) is a matroid,
of the ground set E such that every circuit C of called a partition matroid.
M is either a subset of X or a subset of Y. [2] An example of a partition matroid can be ob-
tained by considering any digraph G - (V, E) and
partitioning the edges of the set E according to
E x a m p l e s of M a t r o i d s . In this section some of which node is the head (or, equivalently, the tail)
the most popular types of matroids involved in of each. Suppose t h a t dj - 1, j - 1 , . . . , p ; then
combinatorial optimization will be described. a set A of edges is independent if no two edges of

232
Matroids

A have the same head (or, equivalently, the same A and B TA - O. B T is the dual matroid M of the
tail). vectorial matroid M.
Dual Matroids. Let M - (E, I) be a matroid, and
let B be its set of bases. Greedy Algorithms on Weighted Matroids.
Many combinatorial problems for which the greedy
The dual matroid M is the matroid on the
technique gives an optimal solution can be formu-
ground set E, whose bases are the complements
lated in terms of finding a maximum-weight inde-
of the bases of M. Thus, a set A is independent in
pendent subset in a weighted matroid. In more de-
M if and only if A is disjoint from some basis of
tail, there is given a weighted matroid M = (E, I)
M. Note that M - M.
and the objective is to find an independent set
For a pair of matroids (M, M) and their rank
A C I such that w(A) is maximized (also called
functions, the following propositions hold.
an optimal subset of M). Since the weight w(x) of
PROPOSITION 25 Let M - ( E , I ) be a matroid, any element x 6 E is positive, a maximum-weight
and let p be its rank function. Let M - (E, 7) be independent subset is always a maximal indepen-
the dual matroid of M; then dent subset.
~(A) - ]A I + p(E \ A) - p(E), In the minimum spanning tree problem, for ex-
ample, there are given a connected undirected
for each A C E. [-1 graph G = (V, E) and a length function w such
PROPOSITION 26 Let M be the dual of the ma- that w(e) is the positive length of the edge e. The
troid M - (E, I), let A be a subset of E and let objective is to find an acyclic subset T of E that
A - E \ A. If p and ~ are the rank functions of M connects all of the vertices of G and whose total
and M respectively, then length

1) IAI - -fi(A) - p(E) - p(A); -

2) ~(E) - ~(A) - [ A I - p(A). eET

D is minimized. This is a classical combinatorial


problem and can be formulated as a problem of
PROPOSITION 27 Let M - (E, I) be a matroid,
finding an optimal subset of a matroid. In fact,
then
consider the graphic weighted matroid M c with
1) x is a loop in M if and only if x is a coloop weight function w' such that w'(e) = w o - w(e),
in M and vice versa; where w0 is larger than the maximum length of
2) If x is not a loop in M, t.hen the dual of M / x any edge. It can be easily seen that for each e C E,
is the matroid M \ x ; w'(e) >_ 0 and that an optimal subset of M c is a
3) If x is not a coloop in M, then the dual of spanning tree of minimum total length in the orig-
M \ x is the matroid -Mix. inal graph G. In more detail, each maximal inde-
pendent subset A corresponds to a spanning tree
and since
As example of the dual of a matroid, let us con-
w'(A) = ( [ V [ - 1 ) . w 0 - w(A)
sider the vectorial matroid. Suppose that the vec-
tors representing M are the columns of an m x n for any maximal independent subset A, the inde-
matrix A and that these vectors span Fm. Thus, A pendent subset that maximizes w'(A) must mini-
has rank m and is the matrix of a linear transfor- mize w (A).
mation T from F n onto Fm. Let K be the kernel J.S. Kruskal in [14] and R.C. Prim in [22] pro-
of T, and B the matrix of a linear embedding of U posed two greedy strategies for solving efficiently
into F n. Note that B is a n × ( n - m ) matrix (whose the minimum spanning tree, but in the following
columns are the basis for U) and has rank n - m. is reported the pseudocode of a greedy algorithm
Moreover, the columns of the (n - m) × n matrix that works for any weighted matroid. The algo-
B T are indexed by the same set as the columns of rithm GREEDY takes as input a matroid M =

233
Matroids

(E, I) and a weight function w and returns an op- this property guarantee the applicability of greedy
timal subset A. strategies as well as dynamic programming algo-
set A = 0 rithms.
sort E[M] = { x l , . . . ,xt} into nonincreasing or- See also: O r i e n t e d m a t r o i d s .
der by weight w
FOR i = 1 to t
IF A U {xi} E I[M] References
set A = AU {xi} [1] AIGNER, M.: Combinatorial theory, Springer, 1979.
return(A) [2] BACHEM, A., AND KERN, W.: Linear programming du-
ality: An introduction to oriented matroids, Springer,
GREEDY(M, w).
1992.
Like any other greedy algorithm, GREEDY always [3] BJ(~RNER, A., VERGNAS, M. LAS, STURMFELS, B.,
WHITE, N., AND ZIEGLER, G.M.: Oriented matroids,
makes the choice that looks best at the moment.
Vol. 46 of Encycl. Math. Appl., Cambridge Univ. Press,
In fact, it considers in turn each element xi be- 1993.
longing to E[M], whose element are sorted into [4] CORMEN, T.H., LEISERSON, C.E., AND RIVEST, R.L.:
nonincreasing order by weight w and immediately Introduction to algorithms, MIT, 1990.
adds x to the building set A if A U {xi} is still [5] EDMONDS, J.: 'Lehman's switching game and a theo-
rem of Tutte and Nash-Williams', J. Res. Nat. Bureau
independent. Note that the returned set A is al-
Standards (B) 69 (1965), 73-77.
ways independent, because it is initialized to the [6] EDMONDS, J.: 'Maximum matching and a polyhedron
empty set, which is independent by definition of with {0, 1} vertices', J. Res. Nat. Bureau Standards (B)
a matroid, and then at each iteration an element 69 (1965), 125-130.
xi is added to A while preserving the A's inde- [7] EDMONDS, J.: 'Minimum partition of a matroid into
pendence. A is also an optimal subset of the ma- independent subsets', J. Res. Nat. Bureau Standards
(B) 69 (1965), 67-72.
troid M and therefore, a minimum spanning tree
[8] EDMONDS, J.: 'Paths, trees, and flowers', Canad. J.
for the original graph G. To prove its optimality, Math. 17 (1965), 449-467.
it is enough to show that weighted matroids ex- [9] EDMONDS, J.: 'Optimum branchings', J. Res. Nat. Bu-
hibit the two ingredients whose existence guaran- reau Standards (B) 71 (1967), 233-240.
tee that a greedy strategy will solve optimally the [10] EDMONDS, J.: 'Systems of distinct representatives and
linear algebra', J. Res. Nat. Bureau Standards (B) 71
given problem: the greedy-choice property and the
(1967), 241-245.
optimal substructure property. The proof that ma- [11] EDMONDS, J.: 'Submodular functions, matroids, and
troids exhibit both these properties can be found certain polyhedra', in R. GuY, H. HANANI, N. SAVER,
in [4]. Generally speaking, the proof of the ex- AND J. SCH(~NHEIM (eds.): Combinatorial Structures
hibition of the greedy-choice property consists of and Their Applications, Gordon and Breach, 1970.
showing that a globally optimal solution can be ob- [12] FUJISHIGE, S.: 'Submodular functions and optimiza-
tion', Ann. Discrete Math. 47 (1991).
tained by making a locally optimal (greedy) choice.
[13] H.H. CRAPO, G.C. ROTA: On the ]oundations of com-
The proof examines a global optimal solution. It binatorial theory: Combinatorial geometries, prelimi-
shows that the solution can be modified so that a nary ed., MIT, 1970.
greedy choice is made at the first step and that this [14] KRUSKAL, J.B.: 'On the shortest spanning subtree
choice reduces the original problem into an equiv- of a graph and the traveling salesman problem',
Proc. Amer. Math. Soc. 7 (1956), 48-50.
alent problem having smaller size. By induction,
[15] KUNG, J.P.S.: 'Numerically regular hereditary classes
it is proved that a greedy choice can be made at of combinatorial geometries', Geometriae Dedicata 21
each step. To show that making a greedy choice (1986), 85-105.
reduces the original problem into a similar but [16] KUNG, J.P.S.: A source book in matroid theory,
smaller problem reduces the proof of correctness to Birkh~iuser, 1986.
demonstrating that an optimal solution must ex- [17] LAWLER, E.L.: Combinatorial optimization: Networks
and matroids, Holt, Rinehart and Winston, 1976.
hibit optimal substructure. The optimal substruc-
[18] LOV~.SZ, L., AND PLUMMER, M.D." Matching theory,
ture property is exhibited by a given problem, if an Akad. Kiad6, 1986.
optimal solution to the problem contains within it [19] MAFFIOLI, F.: Elementi di programmazione matemat-
optimal solutions to subproblems. The validity of ica, Masson, 1990.

234
Maximum constraint satisfaction: Relaxations and upper bounds

[2o] MUROTA, K., IRI, M., AND NAKAMURA, M.: 'Com- MAXIMUM C O N S T R A I N T SATISFACTION:
binatorial canonical form of layered mixed matrices
RELAXATIONS AND UPPER B O U N D S
and its application to block-triangularization of sys-
tems of linear/nonlinear equations', SIAM J. Alg. Dis- M a x i m u m constraint satisfaction problems (MAX-
crete Meth. 8 (1987), 123-149. CSPs) generalize m a x i m u m satisfiability (MAX-
[21] OXLEY, J.G.: Matroid theory, Oxford Univ. Press, SAT) to include cases where the variables are no
1992. longer restricted to binary (or Boolean) values.
[22] PRIM, R.C.: 'Shortest connection networks and some
MAX-CSP is NP-complete even in the special
generalizations', Bell System Techn. J. 36 (1957),
1389-1401. case of binary CSPs. Therefore designing proce-
[23] RECSKI, A.: Matroid theory and its application in elec- dures to compute upper bounds to the exact (un-
trical networks and statics, Springer, 1989. known) optimum value (maximum number of sat-
[24] SCHRIJVER, A.: Theory of linear and integer program- isfied constraints) is a relevant issue. Such bounds
ming, Wiley, 1986.
may be useful, in particular, to provide estimates
[25] TUTTE, W.W.: 'A homotopy theorem for matroids I-
II', Trans. Amer. Math. Soc. 88 (1958), 144-160; 161- of the quality of solutions obtained from various
174. heuristic approaches.
[26] TUTTE, W.W.: 'Matroids and graphs', Trans. Amer. This article describes a systematic way of com-
Math. Soc. 90 (1959), 527-552. puting upper bounds for large scale MAX-CSP in-
[27] TUTTE, W.W.: 'An algorithm for determining wheter
stances such as those arising from the so-t:alled ra-
a given binary matroid is graphic', Proc. Amer. Math.
Soc. 11 (1960), 905-917. dio link frequency a s s i g n m e n t problem (RLFAP).
[2s] TUTTE, W.W.: 'On the problem of decomposing a After discussing the general relaxation principle
graph into n connected factors', J. London Math. Soc. and the basic procedure from which the bounds
36 (1961), 221-230. are derived, we present results of extensive com-
[29] TUTTE, W.W.: 'From matrices to graphs', Canad. J. putational experiments on series of 90 instances
Math. 16 (1964), 108-127.
[30] of RLFAP including both real test problems and
TUTTE, W.T.: 'Lectures on matroids', J. Res. Nat. Bu-
reau Standards (B) 69 (1965), 1-47. randomly generated 'realistic' test problems (for
[31] TUTTE, W.W.: Connectivity in graphs, Univ. Toronto sizes ranging from 396 variables and about 1700
Press, 1966. constraints to 831 variables and about 4800 con-
[32] TUTTE, W.T.: 'Connectivity in matroids', Canad. J. straints).
Math. 18 (1966), 1301-1324.
These results clearly indicate that the proposed
[33] TUTTE, W.T.: Introduction to the theory of matroids,
Amer. Elsevier, 1971. approach is practically useful to produce fairly
[34] WELSH, D.J.A.: Matroid theory, Acad. Press, 1976. accurate upper bounds for such large MAX-CSP
[35] WHITE, N.: Theory of matroids, Cambridge Univ. problems.
Press, 1986.
[36] WHITE, N.: Combinatorial geometries, Cam-
bridge Univ. Press, 1987. I n t r o d u c t i o n . Constraint satisfaction problems
[37] WHITE, N.: Matroid applications, Cambridge Univ. (CSPs) may be viewed as a generalization of sat-
Press, 1991.
[38] isfiability (SAT) to include cases where, instead of
WHITNEY, H.: 'On the abstract properties of linear de-
pendence', Amer. J. Math. 57 (1935), 509-533. taking binary values only (0-1 or true-false) the
variables may take on a finite number (> 2) of
given possible values.
Paola Festa
Dip. Mat. e Inform. Univ. Salerno For an infeasible CSP, a relevant question, both
Via S. Allende theoretically and practically, is to determine an as-
84081 Baronissi (SA), Italy signment of values to variables such that the num-
E-mail address: paofes~unisa, it ber of satisfied constraints is the largest possible.
This is the so-called m a x i m u m constraint satisfac-
MSC2000: 90C09, 90C10 tion problem (MAX-CSP), which generalizes in a
Key words and phrases: combinatorial optimization, greedy natural way m a x i m u m satisfiability (MAX-SAT).
technique, graph optimization. Since MAX-2SAT is NP-complete (see e.g. [12,
pp. 259-260]) even the subclass of MAX-CSP cor-

235
Maximum constraint satisfaction: Relaxations and upper bounds

responding to binary CSPs (those problems with if the combination is allowed, FALSE oth-
constraints involving pairs of variables only) is erwise. (For any S C { 1 , . . . , n ) and x C
NP-complete. Therefore, for very large instances D1 × . . . × Dn, x[s] denotes the vector x re-
such as those arising from practical applications stricted to components in S.)
(e.g. the RLFAP discussed below) one can only
Given a CSP specified as above, we define a free
hope for approximate solutions using some of the
assignment as any n-tuple x E D - D1 × . . . × Dn.
currently available heuristic approaches such as:
A feasible assignment (or solution) is a free as-
simulated annealing, tabu search, genetic algo-
signment such that ~k(X[sk]) -- TRUE for all
rithms, or local search of various kinds.
k-1,...,K.
However, for many applications, getting an ap-
For simplicity, we restrict here to the case where
proximate solution without any information about
each variable takes scalar values only (i.e. real or
the quality of this solution (e.g. measured by the
integer values), but we note that more general
difference between the cost of this solution and the
CSPs may be defined with variables taking, for
optimal cost) may be of little value.
instance, vector values.
We address in this paper the problem of com-
The arity of a constraint ~k is the cardinality
puting upper bounds to the optimum cost of MAX-
of its support set" ISkl -- Isupp(~k)[. A binary
CSP problems from which estimates on the quality
CSP is a constraint satisfaction problem in which
of heuristic solutions can be derived.
Isupp(~k)l _< 2 for all k - 1 , . . . , g .
The article is organized as follows. Basic defi-
The constraint hypergraph associated with a
nitions about CSPs and MAX-CSPs are recalled
given CSP is the hypergraph having vertex set
in the second section. Modeling the so-called ra-
I - { 1 , . . . , n } and edge set { $ 1 , . . . , SK}. In case
dio link frequency assignment problem (RLFAP)
of a binary CSP this is a graph.
in terms of CSP and MAX-CSP is addressed in the
The two examples below are interesting spe-
third section. Then we present a general class of re-
cial cases of the general definition and show
laxations for MAX-CSP problems and its special-
NPcompleteness of arbitrary CSPs.
ization to the computation of MAX-CSP bounds
for RLFAP. Finally results of extensive computa- EXAMPLE 1 (Satisfiability) SAT is easily recog-
tional experiments carried out on series of both nized as a special case of CSP where Vi" Di =
real test problems and realistic randomly gener- {TRUE, FALSE} and where there is a constraint
ated test problems are presented. To our knowl- ~k corresponding to each clause Ck with ~k(X) --
edge, this is the first time extensive computational TRUE ¢:~ clause Ck is satisfied under truth assign-
results of this kind are reported for such large scale ment x. U
MAX-CSP problems.
EXAMPLE 2 (Hypergraph q-coloring; see [2, Chap.
19]) Let q > 1 be a given integer and H - I V , E]
C S P a n d M A X - C S P . A constraint satisfaction
an hypergraph with vertex set V and edge set E.
problem (CSP) is defined by specifying:
The problem is to assign one out of q colors to each
• a set of n variables x l , . . . , x n ; vertex of H so that each edge of H has vertices of
• for each variable xi, i E I = { 1 , . . . , n } the different colors. Clearly this may be formulated as
domain of i, i.e. the (finite) set Di of possible a CSP problem where there is one variable xi for
values for xi; each vi E V, with domain Di - { 1 , . . . , q}, and one
constraint ~k for each edge ek -- { i l , . . . , ip} E E
• a set of K constraints ~k, k = 1 , . . . , K . For
such that ~ ( 5 i ~ , . . . , xip) - TRUE ¢~ no two values
each k C [1, K], constraint ~k is defined by
in {~i~,..., xip } are equal. Note that when H is a
its support set (i.e. the subset Sk = supp(~k)
graph (i.e. lekl -- 2 for all ek E E), the resulting
of indices of the variables involved in the
CSP is a binary CSP. [-1
constraint) and an oracle which, given any
combination x[sk] of values for variables in For an infeasible CSP, one basic question is to de-
Sk, answers TRUE if ~k (x[sk]) -- TRUE, i.e. termine a 'best possible' or 'least infeasible' as-

236
Maximum constraint satisfaction: Relaxations and upper bounds

signment. If the criterion for quality (or degree of of noninterference constraints, (most constraints
'feasibility') of an assignment x is taken to be the usually involving pairs of links). A CSP formula-
number a(x) of constraints satisfied under that as- tion of RLFAP is as follows: With n denoting the
signment, we are led to the so-called MAX-CSP number of links, for each link i - 1 , . . . , n, there
problem: is an associated variable xi representing the fre-
• Given: a CSP defined by its variables quency to be assigned to link i. The domain Di of
x l , . . . , x n , domains D 1 , . . . , D n , and con- xi is the (finite) set of allowed frequencies for link
straints ~ 1 , . . . , ~g. i (frequencies are expressed in Hz, KHz, MHz or
any other specified unit).
• Find: x E D1 × . ' . × Dn such that
Any assignment x E S - D1 × " " × Dn is not al-
a(x)- I{k E [1, K]" ~k(X[sk])- TRUE} I lowed because a number of constraints, called non-
is maximized. interference constraints have to be satisfied.
We will only consider here the case of binary
EXAMPLE 3 ( M A X - S A T , MAX-2SAT) Clearly,
noninterference constraints (i.e. involving only
MAX-SAT is a special case of MAX-CSP when
pairs of links), which is relevant to many appli-
the given CSP is a satisfiability problem. The as-
cations of interest (see e.g. [16], [15]). For a given
sociated decision problem is NP-complete even for
pair of links i and j, two (exclusive) types of con-
the special case of MAX-2SAT ([13]), showing that
straints are possible:
MAX-CSP is NP-complete even for binary CSPs.
[-1 • equality constraints of the form

Heuristics for approximately solving the MAX- (E) ] x i - x j l = wij;


SAT problem have been proposed by [19], [23], [27], • inequality constraints of the form
[17]. A branch and bound algorithm for MAX-SAT
(I) Ix - xjl >
based on probabilistic bounds is described in [3]
with computational results up to 100 binary vari- The real number wij which represents the re-
ables and 1000 clauses. The branch and cut al- quested slack or minimum requested slack between
gorithm described in [20] presents computational the two assigned frequencies will be called the
results for general Max-3SAT problems up to 100 weight of the constraint.
binary variables and 575 clauses. For a recent sur- An instance of RLFAP is therefore specified by
vey on SAT and MAX-SAT, see [9]. n (number of links), a list of domains D 1 , . . . , Dn
For more general MAX-CSP problems, many and a list of constraints i.e. a list of quadruples
heuristic approaches have been investigated such of the form (i,j, wij,Tij) where: i, j are the in-
as tabu search ([7], [4]), simulated annealing [5], dices of the two links involved, wij is the weight
genetic algorithms [18]. Exact Algorithms for ran- of the constraint, and Tij its type ((n) or (I)).
dom MAX-CSP problems were proposed in [11]. The constraint graph associated with an instance
However in the computational experiments re- of RLFAP is defined as the undirected graph G
ported, the sizes of the problems for which exact with node set { 1 , . . . , n} and with an edge (i, j) for
optimal solutions were found are rather small (144 each constraint (i, j, wij, Tij). We denote K the to-
variables with domains of cardinality 4 and 646 tal number of constraints in an instance of RLFAP.
constraints for the largest problems solved in [11]). Benchmarks of the RLFAP involving real instances
up to 916 variables and 5744 constraints have been
M A X - C S P and the R a d i o Link Frequency made publicly available in the context of the Euro-
A s s i g n m e n t P r o b l e m . Operating large radio pean Project CALMA (see [15], [8]). In those prac-
link telecommunication networks gives rise to the tical instances, the number of equality constraints
so-called radio link frequency assignment problem (of type (E)) is never more than n/2, and assign-
(RLFAP), which is to choose, for each transmis- ments satisfying all of them can easily be found.
sion link, a specific operating frequency (among a We will denote S' C S = D1 × ' " × Dn the set
given list of allowed values) while satisfying a list of all such assignments. All assignments x E S \ S '

237
Maximum constraint satisfaction: Relaxations and upper bounds

must be disregarded because they are physically We note here that in the case where an upper
meaningless, therefore, from now on, we will only bound 3 is found such that ~ < K, then we can
consider assignments in S ~ as possible solution to deduce that the given RLFAP has no feasible solu-
RLFAP. tion. Thus, an interesting by-product of computing
An assignment in S ~ which satisfies all con- bounds will be to produce proofs of infeasibility of
straints of type (I) will be called feasible. The fea- a given instance of RLFAP. Clearly, such an infor-
sibility version of RLFAP may therefore be stated mation may be of considerable importance to the
as the following CSP: practitioner.
• Given" an instance of RLFAP.
A G e n e r a l Class of R e l a x a t i o n s for C o m -
• Question" does there exist a feasible fre-
p u t i n g M A X - C S P B o u n d s . MAX-CSP may be
quency assignment ?
reformulated as the discrete optimization problem
• Answer" yes or no and, if yes, output a feasi-
K
ble assignment x.
max z -- ~ Yk
Efficient solution methods for RLFAP are of ma- k=l
jor interest to numerous practical applications in s.t. gk(x) >_ Yk, V k - 1,... ,K, (1)
the context of civilian mobile communication net- Yk -- 0 or 1, Vk,
works as well as of military networks. Since the x- ( X l , . . . ,x~) T ~ S'.
available spectrum is severely limited and the com-
munication needs (traffic requirements) are con- In the above, for all k - 1 , . . . , K , gk(x) _> 1 if
tinuously increasing, a high proportion of the in- ~k(X[sk]) -- TRUE, and gk(x) < 1 if ~k(X[sk] ) --
stances of the RLFAP encountered in applications FALSE. Note that in the case of RLFAP, this spe-
turn out to be infeasible. cializes to: gk(x) - - I x i - xj[/Wk, where xi and xj
When faced with an instance which is either are the two variables involved in constraint k, and
infeasible or which is presumably infeasible (e.g. Wk the weight of constraint k.
because running a heuristic solution method just A relaxation of an optimization problem such as
failed to produce a feasible solution) a key ques- (1) is obtained by replacing its solution set by a
tion for the practitioner becomes to determine a larger solution set. Clearly if the relaxed problem
'best possible' or 'least infeasible' assignment. can be solved exactly (i.e. to guaranteed optimal-
This leads to the 'optimization version' of the ity) then its optimal objective function value is
RLFAP in the form of the following M A X - C S P : an upper bound (in case of maximization) to the
optimum objective function value of the original
• Given: an instance of RLFAP with n vari-
problem.
ables (links) and K constraints.
There exists a number of standard ways of re-
• Question: determine x* E S ~ such that a(x*) laxing an optimization problem such as (1), e.g.
(number of satisfied constraints) is maxi- using Lagrangian relaxation (e.g. [10]) or consid-
mized: ering the so-called continuous relaxation of some of
a(x*) - max{a(x)}. the variables (e.g. relaxing the constraints on the
xES' Yk variables in (1) to 0 < Yk ~_ 1). However, in our
In view of the NP-completeness of MAX-CSP treatment of RLFAP, those standard relaxations
for binary CSPs, guaranteed optimal solutions to have not been considered because they do not give
the above for large scale instances (such as those of rise to easily solvable relaxed problems. We there-
the CELAR benchmarks) cannot be reasonably ex- fore investigated a different approach according to
pected from currently available techniques in com- the following general principle.
binatorial optimization. A less ambitious, though The relaxations we consider are based on the
practically relevant objective, addressed in the fol- identification of those parts of the constraint graph
lowing section, is to try and obtain good upper or hypergraph which are responsible for the infea-
bounds to an optimal solution value. sibility of the whole problem. Preliminary compu-

238
Maximum constraint satisfaction: Relaxations and upper bounds

tational results obtained in [25] have shown that, the procedure SOLVE.RELAX below). Clearly,
at least for MAX-CSP problems deriving from RL- any such upper bound still provides a valid upper
FAP, it is most often possible to identify in a given bound to the original problem. Of course, in the
instance an infeasible induced subproblem of suf- above approach, the quality of the bound derived
ficiently reduced size to make the corresponding from R[~'] essentially depends on how to select
MAX-CSP bound computable in reasonable time. the subset )~'. We now describe the selection pro-
This suggests to consider relaxations of (1) cedure which has been used in our computational
formed by subproblems induced by properly cho- experiments.
sen subsets of constraints. Thus, if K:' C ~ =
{ 1 , . . . , K} is the subset of constraints chosen, the
induced relaxation considered is: Building Relaxations for RLFAP Using
K M a x i m u m Cliques. We now specialize the gen-
max z -- E Yk eral relaxation scheme described above to derive
k=l bounds for RLFAP. The presentation below im-
s.t. gk(x) >_ Yk, Vk E E', (2) proves and extends our preliminary work in [25].
Yk = 0 or 1, Vk = 1,... ,K, The basic idea of our selection procedure for
xES'. choosing K:' C K: is that, for RLFAP, infeasibility is
more likely to occur on subsets of links which are
Note that, in an optimal solution to (2) all mutually constrained, i.e. on subsets of links
which induce a clique (complete subgraph) in the
k E )~\]C' ~ Yk = 1.
constraint graph. Since for RLFAP the constraint
Therefore ~, the optimum objective function value graphs arising from applications are always very
of (2), may be rewritten as: sparse (less than 1% density for the CELAR in-
stances), it is known that finding a clique of maxi-
mum cardinality can be efficiently done even using
where ~' is the optimum value of the problem: simple approaches such as implicit enumeration.
max z'- E Yk In [6] an efficient implicit enumeration based al-
kEK7 gorithm with good computational results for large
s.t. gk(x) >_ Yk, Vk E ~', sparse graphs up to 3000 vertices is described;
yk--Oorl, Vk E K:', however, it assumes very small maximum clique
sizes (in the computational results presented in
xES'.
[6], maximum clique sizes do not exceed 11, and
Clearly, the constraint graph or hypergraph G' the running times seem to increase extremely fast
corresponding to a relaxation R[~'] is deduced with this parameter). Unfortunately, in view of the
from the constraint graph or hypergraph G by fact that, for our large RLFAP instances, the max-
deleting all edges associated with the constraints imum clique sizes turned out to be commonly in
in K~\~'. Also observe that if G' has several distinct the range [12, 25], the above algorithm could not
connected components, then the solution of R[K'] be used.
decomposes into independent subproblems, one for We therefore worked out a different implementa-
each connected component. tion of the implicit enumeration technique which
If the constraint graph or hypergraph G' is of allowed us to find guaranteed maximum cliques
sufficiently small size, then it is possible to solve for all the test problems treated within acceptable
R[~'] exactly, and the optimum solution value ob- computing times (see results at the end of the pa-
tained clearly leads to an upper bound to the op- per). Using this maximum clique algorithm, the
timum value of the original problem. When G' is procedure for building a relaxation to MAX-CSP
too large to get the exact optimal solution value of for R F L A P is as follows.
R[~ ~] then we will content ourselves with getting The heuristic solution method used in our ex-
an upper bound to this exact optimal value (see periments to implement step b l) is a variant of

239
Maximum constraint satisfaction: Relaxations and upper bounds

local search consisting in iteratively improving an T h e procedure SOLVE.RELAX(R[K:']) deter-


initial starting solution; at each iteration an ex- mines a decreasing sequence of u p p e r bounds to
act tree search is carried out to find an opti- the optimal value of R[K: ~] until either t e r m i n a t i o n
mal solution to a s u b p r o b l e m involving only a is obtained (at step c)) or the m a x i m u m compu-
few variables. In our c o m p u t a t i o n a l experiments tation time has been reached.
we observed t h a t the impact of the quality of In the former case, the exact o p t i m u m solution
the heuristic solutions p r o d u c e d at step b l) on value to R[E ~] is obtained; in the latter case, only
the quality of the relaxation obtained at the end an upper b o u n d to this optimal value is produced.
of B U I L D . R E L A X was practically negligible (the
a Initialization: Set 0 +--IK~'I.
main reason for this is t h a t ~ is only used as a
b Current step:
stopping criterion in the process of successive ex- Apply FIND.SOLUTION(R[K:'], 0)
traction of m a x i m u m cliques). T h e c o m p u t a t i o n a l IF the answer is NO,
results shown below confirm t h a t b o u n d s of good THEN set 0 +- 0 - 1 and return to b).
average quality indeed result from the above con- ELSE perform step c).
struction. c A YES answer has been obtained at step b):
0 is the optimal solution value to R[)~']. Ter-
Set: G = IX, U] +-- G (the initial constraint minate.
graph), i +-- 0
Current step" Procedure SOLVE.RELAX(R[K:']).
Apply a heuristic solution algorithm to get a
good approximate solution to MAX-CSP on G. W h e n G ~, the constraint g r a p h of R[E ~] has sev-
Let y denote the number of constraints satisfied eral distinct connected c o m p o n e n t s corresponding
in this solution. to subsets of constraints, K:~,... ,K:p, then solv-
IF Y - IUI go to c) (end of the construction),
ing R[)~ ~] decomposes into the solution of several
ELSE set' i +-- i + 1.
smaller subproblems R [ K ~ ] , . . . , R[~p]. In the pro-
Look for a maximum clique on G. Let Ci be the
clique obtained, with node set N(Ci) and edge cedure S O L V E . R E L A X , this decomposability may
set E(Ci). be exploited in various possible ways. In our im-
Let G' denote the subgraph of G induced by plementation, this is done by organizing the com-
X\N(Ci) (obtained from G by deleting all edges p u t a t i o n into phases n u m b e r e d t - 0, 1, .... The
having at least one endpoint in N(Ci)).
current u p p e r b o u n d value UB is initialized by:
Set G +--G' and return to b).
IF i = 0, the problem is feasible and step b l) UB +-- IK~I. T h e current phase t consists in run-
produces an assignment satisfying all the con- ning the procedure F I N D . S O L U T I O N on each of
straints. Terminate. the subproblems R[)~], j - 1 , . . . , p , with the pa-
ELSE the relaxation R[~'] obtained corre- l
r a m e t e r 0 - I)i~jl- t. Each time a NO answer is
sponds to the set K:' of all constraints in
obtained, UB is u p d a t e d by UB +- U B - 1. Clearly
U}=IE(Cj).
with the above process, when a YES answer has
Procedure BUILD.RELAX. been obtained for some s u b p r o b l e m R [ ~ ] during
phase t, this s u b p r o b l e m should not be considered
S o l v i n g t h e R e l a x e d P r o b l e m R[K:~]. In order any more at later phases t ~ > t. T h e c o m p u t a t i o n
to solve the relaxed problem R[)~ ~] we use a ba- stops either at the end of a phase during which
sic procedure called F I N D . S O L U T I O N ( R [ ~ ' ] , 0 ) a YES answer has been obtained for all subprob-
which, for any integer value 0 e [1, IK'I], answers lems; or when a user-specified time limit has been
YES or NO depending on w h e t h e r there exists reached.
a solution to R [ ~ ~] with objective function value T h e basic procedure F I N D . S O L U T I O N has
z >_ 0 or not. In case of a YES answer, the pro- been i m p l e m e n t e d as a classical d e p t h first tree
cedure also exhibits the corresponding solution. search process of the implicit e n u m e r a t i o n type,
We assume t h a t this procedure is exact i.e. always (achieved by means of a recursive C function).
finds the right answer. Clearly, any value of 0 lead- Since getting the exact answer (YES or NO) is
ing to a NO answer produces an upper bound to essential to the derivation of our bounds, the pro-
the optimal solution value of R[K~]. cedure F I N D . S O L U T I O N is r u n until full comple-

240
Maximum constraint satisfaction: Relaxations and upper bounds

tion of the tree search (i.e. when all the nodes of 5 minutes to 35 minutes with an average of about
the tree have been explored implicitly or explic- 12 minutes.
itly). Prob. n K NF Relaxation
# # var. # const.
1 680 2389 8 44 257
2 680 3367 16 38 339
Computational R e s u l t s . In order to validate
3 680 4103 24 84 671
the above described approach, systematic compu- 4 680 2725 8 74 490
tational experiments have been carried out on two 5 680 2576 8 46 311
series of test problems. 6 680 2470 8 44 284
The first set was composed of 15 infeasible real 7 831 3451 16 16 113
8 831 4802 24 33 248
problems which arose from actual network engi-
9 396 1792 12 70 375
neering studies carried out on three distinct large 10 396 1792 12 70 375
radio link networks (one in the 2GHz frequency 11 396 1792 12 70 375
range, one in the 2, 5GHz frequency range and one 12 396 1792 12 70 375
in the 4GHz frequency range). 13 396 1792 12 70 375
14 396 1792 12 70 37~
The second series concerned a set of 5 x 15 -
15 396 1792 12 70 375
75 'realistic' test problems generated by applying
some r a n d o m p e r t u r b a t i o n to the above 15 real Table I.
problems. More precisely, each problem of the sec- Prob. HS Best upper bound
ond series is generated from one problem of the # obtained within
15 s 5' 1h
first series by changing the weight wij of each in-
1 2376 2387 2385 2383
equality constraint of the form: Ixi - xj] >_ wij to:
2 3358 3367 3366 3365
wij - wij × (a + ~ ) where (I, is a pseudoran- 3 4090 4102 4098 4098
dom number drawn from a uniform distribution 4 2700 2720 2713 2708
on [0, 1] and a,/3 are chosen parameters (of course 5 2559 2571 2569 2564
the p s e u d o r a n d o m drawing is assumed to be inde- 6 2457 2467 2464 2459
7 3440 3450 3450 3450
pendent from one constraint to the next).
8 4781 4800 4800 4799
Table I presents the characteristics of the 15 real 9 1762 1786 1780 1777
test problems treated, numbered 1 to 15 and pro- 10 1759 1786 1780 1776
vides for each problem: number of variables (n), 11 1761 1786 1780 1778
number of constraints (K), number of distinct fre- 12 1764 1786 1780 1776
13 1761 1786 1780 1775
quencies used ( N F ) and the main characteristics
14 1757 1786 1780 1775
of the relaxed subproblem obtained from the pro- 15 1764 1786 1783 1777
cedure BUILD.RELAX" number of variables # v a r ,
Table II.
and number of constraints # c o n s t .
Table III presents in a similar way the charac- Table II shows the results obtained on the 15
teristics of the 5 x 15 - 75 test problems deduced real test problems of Table I and Table IV shows
from the previous ones by r a n d o m perturbation. the results for t h e 5 x 15 problems of Table III.
The 5 instances corresponding to each basic prob- The computer used was a P C P e n t i u m 166 work-
lem i are numbered i 1 , . . . , i5. For each instance the station with 32Mb RAM. For each problem we pro-
values of the parameters a a n d / 3 used to generate vide: HS, the best heuristic solution value obtained
the instance are displayed together with the char- (number of satisfied constraints); the best upper
acteristics (number of variables, number of con- bounds obtained after 15 seconds, 5 minutes and
straints) of the relaxed subproblem produced by 1 hour. The results in Table II confirm that our
BUILD.RELAX. approach is practical to consistently produce good
The c o m p u t a t i o n times taken to construct the bounds for real R L F A P instances within accept-
relaxed subproblems (using B U I L D . R E L A X ) on able solution times.
the problems of Tables I and III, are all between

241
Maximum constraint satisfaction: Relaxations and upper bounds

Prob. a /3 Relaxation
# # var. # const.
11 0,5 1 42 257
Prob. C~ Relaxation
12 0,5 1 42 261 # var. # const.
13 0,8 0,4 42 261
91 0,2 1,6 12 375
14 0,8 0,4 42 261
92 0,2 1,6 12 66
15 0,8 0,4 42 261
93 0,2 1,6 48 66
21 0,5 1 38 339
94 0,2 1,6 24 264
22 0,5 1 38 339
95 0,2 1,6 36 132
23 0,8 0,4 38 339
101 0,2 1,6 36 375
24 0,8 0,4 38 339
102 0,2 1,6 12 198
25 0,8 0,4 38 339
103 0,2 1,6 48 66
31 0,5 1 54 671
104 0,2 1,6 24 264
32 0,5 1 70 460
105 0,2 1,6 36 132
33 0,8 0,4 84 480
111 0,2 1,6 36 375
34 0,8 0,4 84 671
112 0,2 1,6 12 198
35 0,8 0,4 54 671
113 0,2 1,6 48 66
41 0,5 1 74 490
114 0,2 1,6 24 264
42 0,5 1 74 490
115 0,2 1,6 36 132
43 0,8 0,4 74 490
121 0,2 1,6 24 375
44 0,8 0,4 74 490
122 0,2 1,6 12 132
45 0,8 0,4 74 490
123 0,2 1,6 48 66
51 0,5 1 46 311
124 0,2 1,6 24 264
52 0,5 1 46 311
125 0,2 1,6 36 162
53 0,8 0,4 46 311
131 0,2 1,6 36 375
54 0,8 0,4 46 311
132 0,2 1,6 12 198
55 0,8 0,4 46 311
133 0,2 1,6 48 66
61 0,5 1 44 284
134 0,2 1,6 24 264
62 0,5 1 44 284
135 0,2 1,6 36 132
63 0,8 0,4 44 284
141 0,2 1,6 36 375
64 0,8 0,4 44 284
142 0,2 1,6 12 198
65 0,8 0,4 44 284
143 0,2 1,6 48 66
71 0,8 0,4 16 113
144 0,2 1,6 24 264
72 0,8 0,4 16 113
145 0,2 1,6 36 132
73 0,8 0,4 16 113
151 0,2 1,6 24 375
74 0,8 0,4 16 113
152 0,2 1,6 12 132
75 0,8 0,4 16 113
153 0,2 1,6 48 66
81 0,5 1 33 248
154 0,2 1,6 24 264
82 0,5 1 33 248
155 0,2 1,6 36 132
83 0,5 1 33 248
84 0,5 1 33 248
85 0,5 1 33 248

Table III.

242
M a x i m u m constraint satisfaction: Relaxations and upper bounds

Prob. HS Best upper bound


# obtained within
15 s 5' 1h
Prob. HS Best upper bound
11 2376 2386 2383 2378
# obtained within
12 2376 2386 2383 2378
15 s 5' 1h
13 2376 2386 2383 2378
91 1779 1 7 9 1 1 7 9 1 1791
14 2376 2386 2383 2378
92 1777 1 7 9 1 1790 1789
15 2376 2386 2383 2378
93 1774 1788 1788 1785
21 3358 3366 3365 3365
94 1777 1790 1789 1789
22 3358 3367 3365 3365
95 1779 1789 1787 1787
23 3358 3367 3366 3365
101 1780 1789 1788 1787
24 3358 3367 3365 3365
102 1780 1 7 9 1 1790 1788
25 3358 3367 3366 3365 103 1776 1788 1787 1785
31 4081 4103 4101 4101 104 1778 1790 1789 1789
32 4081 4102 4101 4101 105 1777 1789 1788 1788
33 4086 4102 4098 4098 111 1783 1789 1789 1789
34 4086 4102 4098 4098
112 1780 1791 1789 1788
35 4088 4102 4101 4101
113 1777 1788 1788 1786
41 2700 2720 2713 2708
114 1780 1790 1789 1789
42 2700 2720 2713 2708
115 1779 1789 1788 1787
43 2700 2720 2713 2708
121 1779 1790 1790 1789
44 2700 2720 2713 2708
122 1780 1791 1790 1789
45 2700 2720 2713 2708
123 1777 1788 1788 1786
51 2559 2571 2569 2564
124 1780 1790 1789 1787
52 2559 2572 2569 2564
125 1778 1789 1788 1787
53 2559 2572 2569 2564
131 1782 1789 1789 1788
54 2559 2573 2569 2564
132 1777 1790 1789 1789
55 2559 2573 2569 2564
133 1776 1788 1787 1786
61 2457 2467 2464 2459
134 1779 1790 1789 1789
62 2457 2467 2464 2459
135 1777 1789 1788 1788
63 2457 2467 2464 2459
141 1782 1789 1789 1788
64 2457 2467 2464 2459
142 1775 1791 1789 1789
65 2457 2467 2464 2459
143 1775 1788 1787 1786
71 3438 3450 3450 3450
144 1779 1791 1789 1789
72 3437 3450 3450 3450
145 1776 1789 1788 1788
73 3421 3430 3430 3430
151 1780 1790 1790 1789
74 3414 3424 3424 3424 152 1779 1791 1789 1788
75 3436 3450 3450 3450 153 1777 1788 1788 1788
81 4780 4800 4800 4799
154 1781 1790 1789 1788
82 4783 4800 4800 4799
155 1780 1789 1788 1788
83 4778 4800 4800 4799
84 4781 4800 4800 4799
85 4781 4800 4800 4799

Table IV.

243
Maximum constraint satisfaction: Relaxations and upper bounds

From Tables II and IV, it is seen t h a t for all Left. 9 (1990), 375-382.
the instances treated, the difference between the [7] CASTELINO, D.J., HURLEY, S., AND STEPHENS, N.M.:
'A tabu search algorithm for frequency assignment',
heuristic solution values HS and the best u p p e r
Ann. Oper. Res. 63 (1996), 301-319.
bounds obtained are always quite small. More
[8] CELAR: Radio link frequency assign-
precisely for all the examples treated, the ratio ment problem benchmark, CELAR, 1994,
R = ( U P - H S ) / U B is most often well below 1% ft p. cs. unh. edu /p ub / csp / archive / code/benchmarks.
(Problem 14 in Table II is the only one for which [9] Du, D.-Z., Gu, J., AND PARDALOS, P.M. (eds.): Sat-
R > 1%). We note t h a t since HS is only a lower is fiability problem: Theory and applications, Vol. 35 of
DIMACS, Amer. Math. Soc., 1997.
bound, R is a pessimistic estimate of the relative
[10] FISHER, M.L.: 'The Lagrangian relaxation method for
difference between the best u p p e r b o u n d obtained solving integer programming problems', Managem. Sci.
and the optimal, unknown, solution value. 2r (1981), 11-18.
Also, from Table IV, it is seen t h a t the results [11] FREUDER, E.G., AND WALLACE, R.J.: 'Partial con-
obtained a p p e a r to be fairly stable, in spite of the straint satisfaction', Artif. Intell. 58 (1992), 21-70.
[12] GAREY, M.R., AND JOHNSON, D.S.: Computers
i m p o r t a n c e of the p e r t u r b a t i o n s applied to gener-
and intractability. A guide to the theory of NP-
ate the corresponding 75 instances. In addition to completeness, Freeman, 1979.
practical applicability, and efficiency, this clearly [13] GAREY, M.R., JOHNSON, D.S., AND STOKMEYER, L.:
shows good stability and robustness in the behav- 'Some simplified NP-complete graph problems', Theo-
ior of our algorithms. To our knowledge, this is ret. Computer Sci. 1 (1976), 237-267.
the first time a systematic way of deriving u p p e r
[14] GONDRAN, M., AND MINOUX, M.: Graphes et algo-
rithmes, third ed., Eyrolles, 1995.
b o u n d s to such large scale M A X - C S P problems [15] HAJEMA, W., MINOUX, M., AND WEST, C.: 'CALMA
has been i m p l e m e n t e d and fully tested. project specification': Statement of the Radio Link Fre-
To conclude, let us mention that, in view of quency Assignment Problem. Appendix 3 to EUCLID
the results obtained, the techniques described here RTP, 6-~ Implementing Arrangement, June 25, 1992.
HALE, W.K.: 'Frequency assignment theory and appli-
have been included in an industrial software tool [16]
cations', Proc. IEEE 68, no. 12 (1980), 1497-1514.
for radio network engineering developed by the
[17] HANSEN, P., AND JAUMARD, B.: 'Algorithms for the
French M O b ( D G A / C E L A R ) . maximum satisfiability problem', R UTCOR Res. Re-
See also" F r e q u e n c y assignment problem; port Rutgers Univ., New Jersey (USA) R R R ~ 4 3 - 8 7
Graph coloring. (1987).
[i8] HURLEY, S., THmL, S.U., AND SMITH, D.H.: 'A com-
parison of local search algorithms for radio link fre-
References quency assignment problems': Proc. A CM Symposium
[1] BABEL, L., AND TINHOFER, G.: 'A branch and bound on Applied Computing, 1996, pp. 251-257.
algorithm for the maximum clique problem', ZOR: [19] JOHNSON, D.S.: 'Approximation algorithms for com-
Methods and Models of Oper. Res. 34 (1990), 207-217. binatorial problems', J. Comput. Syst. Sci. 9 (1974),
[2] BERGE, C.: Graphes et hypergraphes, second ed., 256-278.
Dunod, 1973. [20] Joy, S., MITCHELL, J., AND BORCHER, S.B.: 'A
[3] BOROS, E., AND PRI~KOPA, A." 'Probabilistic bounds branch and cut algorithm for MAX-SAT and weighted
and algorithms for the maximum satisfiability prob- MAX-SAT': Satisfiability Problem: Theory and Appl.,
lem', RUTCOR Res. Report Purgers Univ., New Jersey DIMACS 35, Amer. Math. Soc., 1997, pp. 519-536.
(USA) RRR#1r-88 (1988). [21] KUMAR, V.: 'Algorithms for constraint satisfaction
[4] SovJu, A., BOYCE, J.F., DIMITROPOULOS, C.H.D., problems: A survey', AI Magazine Spring (1992), 32-
SCHEIDT, G. VON, AND TAYLOR, J.G.: 'Tabu search 44.
for the radio link frequency assignment problem': [22] LANFEAR, T.A.: 'Graph theory and radio frequency
Proc. Conf. on Applied Decision Technologies: Mod- assignment': NATO EMC Analysis Project, Vol. 5,
ern Heuristic methods, Brumel Univ., Uxbridge (UK), NATO, 1989.
1995, pp. 233-250, work carried out in the CALMA [23] LIEBERHERR, K.J.: 'Algorithmic extremal problems in
PROJECT. combinatorial optimization', J. Algorithms 3 (1982),
[5] BOURRET, P.: 'Simulated annealing': Deliverable 2.3 225-244.
of the CALMA Project. Report 3/3507.00/DERI- [24] LIEBERHERR, K.J., AND SPECKER, E.: 'Complexity of
ONERA-CERT, CALMA, 1995. partial satisfaction', J. ACM 28, no. 2 (1981), 411-421.
[6] CARRAGHAN,a., AND PARDALOS, P.M.: 'An exact al- [25] MAVROCoRDATOS, P., AND MINOUX, M.: 'Allocation
gorithm for the maximum clique problem', Oper. Res.

244
Maximum entropy principle: Image reconstruction

de ressources dans les r4seaux (frequency allocation Image reconstruction is a procedure for process-
in networks)', Final Techn. Report CELAR con- ing the measurement data to construct an image
tract ~ 0 1 1 4 1 9 3 (1995), R4solution de probl~mes
of the object. This section introduces the basic
d'optimisation combinatoire pour application
l'allocation optimis4e de fr4quences dans les grands concept of image reconstruction from projection
r4seaux. data. Two types of entropy optimization models,
[26] PARDALOS, P.M., AND RODGERS, G.P.: 'A branch and namely, the finite-dimensional model and vector-
bound algorithm for the maximum clique problem', space model, and three classes of entropy optimi-
Comput. Oper. Res. 19, no. 5 (1992), 363-375.
zation methodologies, namely, the discretization
[27] POLJAK, S., AND TURZIK, D.: 'A polynomial algorithm
for constructing a large bipartite subgraph, with an ap- methods, Banach-space methods (e.g., MENT)
plication to a satisfiability problem', Canad. J. Math. and Hilbert-space methods (e.g., finite element
X X X I V , no. 3 (1982), 519-524. method) are included. For more details about im-
[28] RESENDE, M.G.C., PITSOULIS, L.S., AND PARDALOS, age reconstruction, the reader is referred to [7], [2],
P.M.: 'Approximate solution of weighted MAX-SAT [13] and the references therein.
problems using GRASP': Satisfiability Problem: The-
ory and Appl., DIMACS 35, Amer. Math. Sou., 1997,
A very important scientific application of im-
pp. 393-405. age reconstruction is in computerized tomography
[29] ROBERTS, F.S.: 'T-colorings of graphs; recent results (CT) for medical diagnosis. Physicians need to
and open problems', Discrete Math. 93 (1991), 229- know, for example, the location, shape, and size
245. of a suspected tumor inside a patient's brain in
[30] SMITH, D.H., AND HURLEY, S.: 'Bounds for the fre-
order to plan a suitable course of treatment. With
quency assignment problem', Discrete Math. 167/168
(1997), 571-582. computerized tomography, images of cross-sections
[31] SMITH, D.H., HURLEY, S., AND THIEL, S.U.: 'Improv- of a human body can be constructed from data
ing heuristics for the frequency assignment problem', obtained by measuring the attenuation of X-rays
Europ. J. Oper. Res. 107 (1998), 76-86. along a large number of straight lines (or strips)
[32] WERRA, D. DE, AND GAY, Y.: 'Chromatic scheduling
through each cross-section. For ease of introduc-
and frequency assignment', Discrete Appl. Math. 49
(1994), 165-174. tion, we illustrate the basic ideas about image re-
construction with the example of two-dimensional
M. Minoux
X-ray CT, with the understanding that the dis-
Univ. Paris 6
cussion can be generalized to higher-dimensional
4 place Jussieu
75005 Paris, France settings.
E-mail address: Michel. Minoux@lip6. fr In this example, the distribution to be deter-
P. Mavrocordatos mined is that of the X-ray linear attenuation co-
Algotheque and Univ. Paris 6 efficient of human body tissues. The total attenu-
4 place Jussieu ation of the X-ray beam between a source and a
75005 Paris, France
detector is approximately the integral of the linear
E-mail address: p 2 m ~ a l g o t h e q u e , com
attenuation coefficient along the line between the
MSC 2000:90C10 source and the detector. The unknown distribution
Key words and phrases: constraint satisfaction, relaxation. of the X-ray linear attenuation coefficient is rep-
resented by a density function f of two variables,
which assumes zero-value outside a squared-shape
MAXIMUM ENTROPY PRINCIPLE: IM- region. The squared region is usually referred to as
AGE R E C O N S T R U C T I O N , entropy optimiza- the support of the image.
tion for image reconstruction Two basic types of entropy optimization models,
Images can be used to characterize the underlying namely, finite-dimensional model and vector-space
distribution of certain physical properties, such as model, are commonly used to decide the density
density, shape, and brightness, of an object un- function f. The finite-dimensional models approx-
der investigation. In many applications where an imate the density values over the support of the
image is required, only a finite number of observa- image at a finite number of grid points, while the
tions and/or indirect measurements can be made. density is approximated by a real-value function

245
M a x i m u m entropy principle: Image reconstruction

for the entire scanning region in the vector-space zation approach' to find a solution that is not
models. The latter models were motivated to re- only feasible in the above sense but also optimal
construct the image with only a small number of with respect to a certain criterion. In the liter-
available projections. ature, at least three different types of optimiza-
In the finite-dimensional models, the support tion problems have been proposed, namely, the en-
of the density f is represented by n (given by tropy maximization problem, the quadratic min-
the users) regularly spaced grid points, and the imization problem, and the maximum likelihood
values of the density function f at these points problem.
are denoted by f - ( f l , . . . , f n ) . Assume that m The entropy optimization problem seeks to op-
projections are made and the measurement data timize an entropic objective function subject to (2)
d - ( d l , . . . , din) are obtained. and (3) as follows.
The relationship between the unknown density Model 1"
values f and the observed measurement d can be n

approximated by a linear relation - ZSjlnSj

d ~ Af, (1)
s.t. d-e _ A f _< d + e ,
where A = [aij] is a projection matrix. f>_O.
Note that the approximation sign in (1) reflects
possible errors in modeling and measurement. Also Some researchers proposed models in which the
f j ' s are normalized in such a way that Ejn__l f j -
note that, in the classical square pixel model, the
1, and the projection matrix and the measurement
image is discretized by partitioning its support into
data differ from those of Model 1. See, e.g., [4]. In
a finite number of equi-sized square regions (called
this way, a solution that is consistent with the mea-
pixels or cells) whose centers are those n sample
surement data but remains maximally noncommit-
points. By assuming that the density function f
tal can be found. Note that an optimal solution to
is constant in each of the equi-sized pixels, i.e.,
such models can also be interpreted as the most
f = f j throughout pixel j, the value of aij in the
probable solution that is consistent with the mea-
projection matrix is simply the length of the in-
tersection of the line corresponding to the ith pro- surement data [3].
jection with the pixel surrounding the j t h sample Other variations of Model 1 exist. Despite pos-
point. sible modeling and measurement errors, one com-
mort practice is to replace (1) and inequalities (2),
Once the projection matrix A is defined and
and (5) by a system of equations" A f - d.
the measurement d is known, the problem is to
find an f satisfying (1). To cope with the errors A different version of the finite-dimensional en-
mentioned above, G.T. Herman [6] suggested that tropy optimization model begins with the defini-
(1) be replaced by an 'interval constraint' and a tion of an error vector e - (el, ... , era) T, where
nonnegativity constraint be added: n

ei - di - E aij f j, i = l, . . . , m .
d-e < Af < d +e, (2) j=l
f_~O, (3)
Assume that errors e l , . . . , em exist due to impre-
where e - ( e l , . . . , e m ) is an m-vector of user- cise measurement and are independent noise terms
chosen tolerance levels. Note that (2) can be re- with zero mean and known variance a i2 . S.F . Burch
placed by an equivalent system of inequalities et al. [1] observed that the strong law of large num-
bers implies that
A ' f <_ d', (4)
with twice as many one-sided inequalities [2], [6].
For such an image reconstruction model, we can
m i- 1 °i
adopt either the 'feasibility a p p r o a c h ' to find a
as m ---+ (x).
solution to (2) and (3) directly, or the ' o p t i m i -

246
Maximum entropy principle: Image reconstruction

Thus, if m is sufficiently large, the following en- the desirability of employing iterative methods in
tropy optimization problem with quadratic con- CT systems.
straints can be useful: In many situations, e.g., conducting diagnostic
Model 2: experiments on plasma in magnetic confinement
devices or laser target impositions with measure-
n
ments on fusion reactor cores, only few projec-
max -- ~ f j ln f j
j=l
tions are available, e.g., less than 10. When the
1 finite-dimensional entropy optimization model is
s.t. - - ( A f - d ) T S 2 ( A f - d) - 1,
m applied, it tends to produce 'streaking' artifacts.
fj>_O, j- 1,...,n, This motivated the use of the vector-space model.
Take the two-dimensional X-ray CT problem as
where S is a diagonal matrix with 1/ai being its an example. By assuming that the unknown den-
ith diagonal element. sity function f(x, y) is continuous over a compact
Concerns such as the smoothing effect, nonuni- support D such that
formity, peakness, and exactness [14] of a con-
structed image can also be addressed in this model I(x, Y) > 0 and [ f f(x, y)dx dy - 1, (6)
t]
with proper modification of the objective functions , 2

D
and constraints. So far, we have used the square
pixel model to illustrate the idea of entropy opti- G. Minerbo [9] defined the entropy of ](x, y) as
mization for image reconstruction. Other models
exist [2].
i(I) - - [ [ f(x, y)ln[f(x, y)A] dx dy,
i] i]

For an introduction to the concept of Shan- D

non's entropy and related entropy optimization where A is the area of D. Denote the set of contin-
principles, i.e., principle of maximum entropy and uous, nonnegative functions with compact support
principle of minimum cross-entropy, see E n t r o p y in D by C+ (D).
o p t i m i z a t i o n : S h a n n o n m e a s u r e of e n t r o p y The scanning area is partitioned into parallel
a n d its p r o p e r t i e s . A large amount of literature strips, each of which is penetrated by an X-ray
has been devoted to developing iterative methods beam. Let Oj, j = 1,..., J, be the J distinct pro-
for solving finite-dimensional entropy optimization jection angles with respect to the X-axis of the
problems with linear and/or quadratic constraints. scanning area. Also let M(j) be the number of
For details and a unification of such methods, see parallel beams associated with the j t h projection
[3]. or view, and Sjl < " " < S j / ( j ) be a set of ab-
The method currently employed in most CT scissas for the j t h view. The projection data are
systems is the 'filtered back-projection' method, assumed to be in the form of the following 'strip
which is based on a finite-dimensional model. (See integrals':
[5], [10] for details.) Compared to the iterative
Sj(m+1) oo
methods for solving entropy optimization prob-
lems, this method provides speed, which enables
reconstruction of the image while X-ray transmis-
sion data are being collected. Hence the time be- f (s cos Oj - t sin Oj, s sin Oj + t cos Oj) dt ds,
tween scanning and obtaining reconstructed im-
ages is reduced. However, there are situations where m = 1 , . . . , M ( j ) and j = 1 , . . . , J. It is as-
where iterative methods produce comparable or sumed that, for j = 1 , . . . , J,
better reconstructed images than the filtered back- O0

projection method, e.g., in image reconstruction


with few projections or in high-contrast image re-
/
--O0
f (s cos Oj - t sin Oj, s sin Oj + t cos Oj) dt = O,

construction. The ever increasing computer speed


and its companion reduction in cost may increase for 8 < ~jl or 8 > SjM(j ).

247
Maximum entropy principle: Image reconstruction

Let Gjm denotethe observed values of Pjm(f), for sup G(f) -- ~(f) -- "/~-~.[Gjm -- Pjm(f)] 2,
m - 1,... ,M(j), and j - 1 , . . . , J. Note that (6) left j,m
implies aim >_ 0 and )--~M(j) Gjm - 1. where ~ > 0 is an adjustable penalty parameter
Then the vector-space model results in the fol- and ft is a convex and weakly (sequentially) com-
lowing optimization problem" pact set of nonnegative functions in L2(D), with
Model 3" a compact support in D and containing physical
information known a priori about the object to be
sup ¢(:) scanned, e.g., upper and lower bounds on the den-
C+(D) sity function. (A set gt of nonnegative functions
s.t. Pjm(f) - Gjm, (7) in L2+(D)is weakly (sequentially) compact if and
m - 1,...,M(j); only if every sequence in f~ has a weakly convergent
subsequence whose weak limit lies in ft; a sequence
j-1,...,J.
{ fn (x, y) } converges weakly to f (x, y) if and only
if the sequence {(fn(x, y), g(x, y)} } converges to
A finite-dimensional unconstrained dual prob- (f(x,y),g(x,y)} for every g(x,y) e L2+(D), where
lem can be derived by using the technique of La- (hi, h2) - f f hi (x, y)h2(x, y)dxdy denotes the in-
grange multipliers. An algorithm known as MENT ner product of hi and h2 in the space of L2+(D).)
[9] was proposed. It was shown that the solu- With the aid of the theory of Hilbert space, it
tions produced by MENT converge to a density can be shown [8] that G has a unique maximizer
function f* which satisfies the constraint (7) with in f~, for any given data Gjm, m = 1,... , M ( j ) ,
~(f*) - SuPc+(D)~(f). However, the limiting den- j-1,...,J.
sity function f* is not continuous. Actually, as
Based on this alternative formulation, the den-
pointed out in [8], f* is piecewise constant and
sity function f(x, y) can be approximated by using
f* ~ C+(D). When few projections are available the finite element method [11]. For simplicity, as-
and the object being scanned has a simple struc-
sume that D - [-1, 1] x [-1, 1]. First, we superim-
ture (or close to circular symmetry in density),
pose a fixed rectangular mesh on D, with uniform
some preliminary computational results indicated
mesh size h - 1In in both the x and y directions.
the potential of this approach.
We also use the product of piecewise linear func-
Recognizing the fact that the supremum of tions in x and y as the finite element space S h. In
Model 3 is not attained by any function f C this way, a basis for S h has the form
C+(D), M. Klaus and R.T. Smith [8] defined an al-
ternative formulation in a richer class of functions
than C+(D). More precisely, they replaced C+(D) for k- 1 , . . . , ( 2 n + 1) 2 ,
by L~_(D), the set of all nonnegative square inte-
where
grable functions on D, as the setting. Note that
(k-1)-(k-1) (mod2n+l)~_
all piecewise-constant functions over D are con- n,
2n+1 J
tained in L2+(D). Also recognizing that measure-
i-k-(l+n)(2n+l)-n-1,
ments may not be consistent and even be flawed,
they considered an optimization problem where and
the objective function is the original entropy func- if t<(j-1)h
tional ~(f) minus a penalty term corresponding or t_>(j+l)h,
to the residual error in meeting the measurement t-(j- 1)h
Cj(t) -
constraints, and the constraint is that the maxi- if (j-1)h<t<jh,
mizer lies in a weakly compact set that is deter-
if jh <_t <_ (j + l)h.
mined by known physical information about the
density function of the object to be scanned. A It is reasonable to expect that, in practice, one
corresponding formulation becomes should know a priori the minimum and maximum
Model 4: densities of the object being examined. Hence we

248
Maximum flow problem

focus on a simple constraint set [4] FRIEDEN, B.R.: 'Restoring with maximum likelihood
and maximum entropy', J. Optical Soc. Amer. 62
~ _ { f E L2+(D). O < a < f < b < oc a.e., } (1972), 511-518.
f - 0 a.e., i n R 2 \ D " [5] HENDEE, W.R.: The physical principles of computed
tomography, Little, Brown and Company, 1983.
The density function f ( x , y ) is then approxi-
[6] HERMAN, G.T.: 'A relaxation method for reconstruct-
m a t e d in S h by ing objects from noisy X-rays', Math. Program. 8
N (1975), 1-19.
[7] HERMAN, G.T. (ed.): Image reconstruction from pro-
](x, y) - Z y),
k-1 jections: implementation and applications, Springer,
1979.
where N - (2n + 1) 2 and Ck'S are chosen as the [8] KLAUS, M., AND SMITH, R.T.: 'A Hilbert space ap-
optimal solution of the following finite-dimensional proach to maximum entropy reconstruction', Math.
optimization problem: Meth. Appl. Sci. 10 (1988), 397-406.
[9] MINERBO, G.: 'MENT: A maximum entropy algorithm
for reconstructing a source from projection data', Com-
puter Graphics and Image Processing 10 (1979), 48-68.
eERN k=l
N [10] NATTERER, F.: Mathematics of computerized tomogra-
phy, Wiley, 1986.
s.t. 0 < a <_ ~ CkCk(X, y) <_ b. [11] SMITH, R.T., AND ZOLTANI, C.K.: 'An application of
k=l
the finite element method to maximum entropy tomo-
This problem can be further reduced to graphic image reconstruction', J. Sci. Comput. 2, no. 3
(1987), 283-295.
[12] SMITH,R.T., ZOLTANI,C.K., KLEM,G.J., ANDCOLE-
sup ~ CkCk(x,y) MAN, M.W.: 'Reconstruction of tomographic images
cER g k--1
from sparse data sets by a new finite element maximum
entropy approach', Applied Optics 30, no. 5 (1991),
- ~ ~_~ aim - ~ ckPjm(¢k(x, y)) 573-582.
j,m k=l
[13] STARK, H. (ed.): Image recovery: theory and applica-
,s.t. O < a < ck < b, k-1,...,N. tion, Acad. Press, 1987.
[14] WANG, Y., AND LU, W.: 'Multi-criterion maximum
Preliminary computational results reported in entropy image reconstruction from projections', IEEE
[11], [12] indicate some improvements of this alter- Trans. Medical Imaging 11 (1992), 70-75.
native approach over the M E N T algorithm when
Shu-Cherng Fang
the object under investigation does not have cir- North Carolina State Univ.
cular s y m m e t r y in density and has a high density North Carolina, USA
area near the edge of the scanning region. E-mail address: fang(Deos .ncsu. edu
See also" Entropy optimization: Shannon H.-S. Jacob Tsao
measure of entropy and its properties; San Jose State Univ.
San Jose, California, USA
Jaynes' maximum entropy principle; En-
E-mail address: jtsao©email, sjsu. e d u
tropy optimization: Parameter estimation;
Entropy optimization: Interior point meth- MSC 2000: 94A17, 94A08
Key words and phrases: entropy optimization, image recon-
ods; Optimization in medical imaging.
struction, maximum entropy principle, principle of maxi-
mum entropy.
References
[I] BURCH,S.F., GULL,S.F., AND SKILLING, J.K.: 'Image
restoration by a powerful maximum entropy method',
Computer Vision, Graphics, and Image Processing 23 MAXIMUM FLOW PROBLEM
(1983), 113-128. The maximum flow problem seeks the m a x i m u m
[2] CENSOR,Y., ANDHERMAN, G.T.: 'On some optimiza- possible flow in a capacitated network from a spec-
tion techniques in image reconstruction', Applied Nu-
ified source node s to a specified sink node t with-
mer. Math. 3 (1987), 365-391.
[3] FANG,S.-C., RAJAsEKERA, J.R., AND TSAO, H.-S.J.: out exceeding the capacity of any arc. A closely re-
Entropy optimization and mathematical programming, lated problem is the minimum cut problem, which
Kluwer Acad. Publ., 1997. is to find a set of arcs with the smallest total ca-

249
Maximum flow problem

pacity whose removal separates node s and node In examining the maximum flow problem, we
t. The m a x i m u m flow and minimum cut problems impose two assumptions"
arise in a variety of application settings as diverse
i) all arc capacities are integer; and
as manufacturing, communication systems, distri-
bution planning, matrix rounding, and schedul- ii) whenever the network contains arc (i, j),
ing. These problems also arise as subproblems in then it also contains arc (j, i).
the solution of more difficult network optimization The second assumption is nonrestrictive since we
problems. In this article, we study the m a x i m u m allow arcs with zero capacity.
flow and minimum cut problems, briefly introduc- Sometimes the flow vector x might be required
ing the underlying theory and algorithms, and pre- to satisfy lower bound constraints imposed upon
senting some applications. See [2] for a wealth of the arc flows; that is, if lij ~_ 0 specifies the lower
additional material that amplifies on this discus- bound on the flow on arc (i, j) E A, we impose the
sion. condition xij ~_ lij. We refer to this problem as
Let G = (N, A) be a directed network defined the maximum y~ow problem with nonnegative lower
by a set N of n nodes and a set A of m directed bounds. It is possible to transform a maximum
arcs. We refer to nodes i and j as endpoints of arc flow problem with nonnegative lower bounds into
(i, j). A directed path il - i 2 - . . . - - i k is a set of arcs a maximum flow problem with zero lower bounds.
( i 1 , i 2 ) , . . . , ( i k - l , i k ) . Each arc (i,j) has an associ- The minimum cut problem is a close relative of
ated capacity uij denoting the m a x i m u m amount the maximum flow problem. A cut [S, S] partitions
of flow on this arc. We assume that each arc capac- the node set N into two subsets S and S - N- S
ity uij is an integer, and let U = max{uij: (i, j) It consists of all arcs with one endpoint in S and
A}. The network has two distinguished nodes, a the other in S. We refer to the arcs directed from
source node s and a sink node t. To help in rep- S to S, denoted by (S, S), as .forward arcs in the
resenting a network, we use the arc adjacency list cut and the arcs directed from S to S, denoted by
A(i) of node i, which is the set of arcs emanating (s, s), back a the cut The cut IS, S]
from it, that is, A(i) = {(i,j) E A: j E N } . is called an s- t-cut if s E S and t E S. We define
The m a x i m u m flow problem is to find the maxi- the apacity of IS, S], denoted
m u m flow from the source node s to the sink node as ~(i,j)~(s,~) uij. A minimum cut in G is an s-t-
t that satisfies the arc capacities and mass balance cut of minimum capacity. We will show that any
constraints at all nodes. We can state the problem algorithm that determines a maximum flow in the
formally as follows. network also determines a minimum cut in the net-
work.
max v (1)
The remainder of this article is organized as fol-
subject to lows. To help in understanding the importance of
the maximum flow problem, we begin by describ-
Xij -- E XJi (2)
{j. (i,j)EA} { j (j,i)EA} ing several applications. In the next section we
present some preliminary results concerning flows
v f o r / - - s, and cuts. We next discuss two important classes of
-- 0 f o r i ~ {s,t}, algorithms for solving the maximum flow problem:
--v for i -- t, augmenting path algorithms, and preflow-push al-
gorithms. As described in the next section, aug-
0 <_ xij < uij for all (i, j) E A. (3)
menting path algorithms augment flow along di-
We refer to a vector x - {xij} satisfying (2) rected paths from the source node to the sink node.
and (3) as a flow and the corresponding value of The proof of the validity of the augmenting path
the scalar variable v as the value of the flow. We algorithm yields the well-known max-flow min-cut
refer to the constraints (2) as the mass balance theorem, which implies that the value of a max-
constraints, and refer to the constraints (3) as the imum flow in a network equals the capacity of a
flow bound constraints. minimum cut in the network. In the next section,

250
Maximum flow problem

we study preflow-push algorithms that 'flood' the bution network. In this problem context, the re-
network so that some nodes have excesses and then finery corresponds to a particular node s in the
incrementally 'relieve' the flow from nodes with distribution network and the storage facility cor-
excesses by sending flow from excess nodes for- responds to another node t. The capacity of each
ward toward the sink node or backward toward arc is the maximum amount of oil per unit time
the source node. In the final section, we study im- that can flow along it. The value of a maximum
plications of the max-flow min-cut theorem and s - t flow determines the maximum flow rate from
prove some max-min results in combinatorics. the source node s to the sink node t. Similar ap-
We would like to design maximum flow algo- plications arise in other settings, for example, de-
rithms that are guaranteed to be efficient in the termining the transmission capacity between two
sense that their worst-case running times, that nodes of a telecommunications network.
is, the total number of multiplications, divisions,
additions, subtractions, and comparisons in the F e a s i b l e F l o w P r o b l e m . The feasible flow prob-
worst-case grow slowly in some measure of the lem consists of finding a feasible flow satisfying the
problem's size. We say that a maximum flow al- following constraints:
gorithm is an O(n 3) algorithm, or has a worst-
case complexity of O(n3), if it is possible to solve xij - ~ xji - b(i) (4)
any maximum flow problem using a number of (j: (i,j)EA) (j: (j,i)EA)
computations that is asymptotically bounded by for all i E N,
some constant times the term n 3. We say that 0 ~ xij ~_ uij for all (i, j) E A. (5)
an algorithm is a polynomial time algorithm if it's
worst-case running time is bounded by a polyno- We assume that ~-~iENb(i) -- O. The following
mial function of the input size parameters. For a distribution scenario illustrates how the feasible
maximum flow problem, the input size parameters flow problem arises in practice. Suppose that mer-
are n, m, and log U (the number of bits needed chandise available at several seaports is desired
to specify the largest arc capacity). We refer to by other ports. We know the stock of merchan-
a maximum flow algorithm as a pseudopolynomial dise available at the 'supply' ports, the amount
time algorithm if its worst-case running time is required at the other ports, and the maximum
bounded by a polynomial function of n, m, and U. quantity of merchandise that can be shipped on a
For example, an algorithm with worst-case com- particular sea route. We wish to know whether we
plexity of O(nm log U) is a polynomial time algo- can satisfy all of the demands by using the avail-
rithm, but an algorithm with worst-case complex- able supplies.
ity of O(nmU) is a pseudopolynomial time algo- We can solve the feasible flow problem by solv-
rithm. ing a maximum flow problem defined on an aug-
mented network as follows. We introduce two new
nodes, a source node s and a sink node t. For each
A p p l i c a t i o n s . The maximum flow problem arises
node i with supply (that is, with b(i) > 0), we
in a variety of situations and in several forms.
add an arc (s, i) with capacity b(i), and for each
Sometimes, it arises directly in combinatorial ap-
node i with demand (that is, with b(i) < 0), we
plications that on the surface might not appear to
add an arc (i, t) with capacity -b(i). We refer to
be maximum flow problems at all; at other times, it
the new network as the transformed network. We
occurs as a subproblem in the solution of more dif-
then solve a maximum flow problem from node s
ficult network optimization problems. In this sec-
to node t in the transformed network. It is easy
tion, we describe three applications of the maxi-
to show that the model (4)-(5) has a feasible so-
mum flow problem.
lution if and only if the maximum flow saturates
Capacity o] Physical Networks. An oil company all the arcs emanating from the source node, that
needs to ship oil from a refinery to a storage fa- is, xsj - usj for all arcs (s,j) E A(s). Moreover,
cility using the pipelines of its underlying distri- if each b(i) and uij is integer, then model (4)-(5)

251
Maximum flow problem

Row
Sum

3.1 6.8 7.3 17.2

9.6 2.4 0.7 12.7

3.6 1.2 6.6 11.3


(11,

Column 16.3 10.4 14.5 (6, 7)


Sum

(a) (b)

Fig. 1: Network for the matrix rounding problem.

always has an integer feasible solution whenever it Using a numerical example, we will show how to
has a feasible solution (see Theorem 3). transform a matrix rounding problem into a max-
imum flow problem. Fig. la) shows an instance of
Sometimes in a feasible flow problem arcs have
the matrix rounding problem and Fig. lb) gives
nonnegative lower bounds, that is, the flow bound
the m a x i m u m flow network G for this problem.
constraints are lij ~_ Xij ~ Uij instead of 0 < xij ~_
The network G contains a node i corresponding to
uij, for some constants lij > 0 for each (i, j) E A.
each row i of the matrix D, a node j corresponding
By substituting Yij -- x i j - lij for xij, we can trans-
to each column j of D, a source node s, and a sink
form this problem to the formulation (4)-(5). Then
node t. The network contains an arc (i, j) corre-
(5) reduces to 0 < Yij <_ ( u i j - lij) and (4) reduces
sponding to the ijth element in the matrix, an arc
to the same set of equations, but with a different
(s, i) for each row i (this arc represents the sum
right-hand side vector b~.
of row i), an arc (j, t) for each column j (this arc
represents the sum of column j). For any arc (i, j),
Matrix Rounding Problem. This application is con-
we define its upper b o u n d uij = Idij] and lower
cerned with consistent rounding of the elements,
bound lij -- [dij]. Notice that the flow xij = dij is
the row sums, and the column sums of a ma-
a real-valued feasible flow x in the network. Since
trix. We are given a p × q matrix of real num-
there is a one-to-one correspondence between the
bers D = {dij}, with row sums ai and column
consistent roundings of the matrix and feasible in-
sums/3j. We can round any real number d to the
teger flows in the corresponding network, we can
next smaller integer [d] or to the next larger in-
find a consistent rounding by solving a feasible flow
teger [d], and the decision to round up or round
problem on the corresponding network. The feasi-
down is entirely up to us. The matrix-rounding
ble flow algorithm will produce an integer feasible
problem requires that we round the matrix ele-
flow (because of Theorem 3), which corresponds to
ments, and the row and column sums of the matrix
a consistent rounding.
so that the sum of the rounded elements in each
row equals the rounded row sum, and the sum of
the rounded elements in each column equals the P r e l i m i n a r i e s . In this section, we discuss some el-
rounded column sum. We refer to such a round- ementary properties of flows and cuts. We will use
ing as a consistent rounding. The matrix-rounding these properties to prove the celebrated max-flow
problem arises is several application contexts, for min-cut theorem and to establish the correctness
example, the rounding of census data to disguise of the augmenting p a t h algorithm described in the
data on individuals. next section.

252
Maximum flow problem

Residual Network. The concept of residual network amount of flow from the nodes in S to nodes in
plays a central role in the development of maxi- S, and the second expression denotes the amount
mum flow algorithms. Given a flow x, the residual of flow returning from the nodes in S to the nodes
capacity r i j of any arc (i, j) C A is the maximum in S. Therefore, the right-hand side denotes the
additional flow that can be sent from node i to total (net) flow across the cut, and (6) implies
node j using the arcs (i,j) and (j, i). (Recall the that the flow across any s - t-cut IS, S] equals v.
assumption from the first Section that whenever Substituting xij < uij in the first expression of
the network contains arc (i, j), it also contains the (6) and xij >_ 0 in the second expression yields"
arc (j, i).) The residual capacity rij has t w o com- v _ ~(i,j)e(s,~)uij - u[S,S] implying that the
ponents: value of any flow can never exceed the capacity
i) u i j - xij, the unused capacity of arc (i, j); of any cut in the network. We record this result
ii) the current flow xji on arc (j, i), which we formally for future reference.
can cancel to increase the flow from node i LEMMA 1 The value of any flow can never exceed
to node j. the capacity of any cut in the network. Conse-
Consequently, r i j -- u i j - x i j nt- x j i . W e refer to the quently, if the value of some flow x equals the ca-
network G(x) consisting of the arcs with positive pacity of some cut [S, S], then x is a maximum
residual capacities as the residual network (with flow and the cut IS, S] is a minimum cut-. 71
respect to the flow x). Fig. 2 gives an example of The max-flow rain-cut theorem, to be proved in
a residual network. the next section, states that the value of some flow
always equals the capacity of some cut.

Generic Augmenting Path A l g o r i t h m . In this


section, we describe one of the simplest and most
intuitive algorithms for solving the maximum flow
(2,2)~ /(OLD
problem, an algorithm known as the augmenting
path algorithm.
(a) (b) Let x be a feasible flow in the network G, and
let G(x) denote the residual network correspond-
Fig. 2" Illustrating the construction of a residual network;
ing to the flow x. We refer to a directed path
a) the original network, with arc capacities and a flow x;
b) the residual network. from the source to the sink in the residual network
G(x) as an augmenting path. We define the resid-
Flow across an s - t-Cut. Let x be a flow in the ual capacity 5(P) of an augmenting path P as the
network. Adding the mass balance constraint (2) maximum amount of flow that can be sent along
for the nodes in S, we obtain the equation it, that is, 5(P) = m i n s e t r i j ( i , j ) C P. Since the
residual capacity of each arc in the residual net-

• {j" (i,j)EA} {j" j , i ) E A }


1 '0'
work is strictly positive, the residual capacity of
an augmenting path is strictly positive. Therefore,
we can always send a positive flow of 5 units along
-- E X ij - E X ij. it. Consequently, whenever the network contains
(~,j)e(s,s) (i,j)~(s,s) an augmenting path, we can send additional flow
The second equality uses the fact that when- from the source to the sink. (Sending an additional
ever both the nodes p and q belong to the node 5 units of flow along an augmenting path decreases
set S and (p, q) E A, the variable Xpq in the first the residual capacity of each arc (i, j) in the path
term within the bracket (for node i - p) can- by 5 units.) The generic augmenting path algo-
cels the variable --Xpq in the second term within rithm is essentially based upon this simple obser-
the bracket (for node j - q). The first expres- vation. The algorithm identifies augmenting paths
sion in the right-hand side of (6) denotes the in G(x) and augments flow on these paths until

253
Maximum flow problem

the network contains no such path. The algorithm capacity of arc (4, 3) from 0 to 4. Fig. 3b) shows the
below describes the generic augmenting path algo- residual network at this stage. In the second iter-
rithm. ation, the algorithm selects the path 1 - 2 - 3 - 4
We can identify an augmenting path P in G ( x ) and augments 1 unit of flow; Fig. 3c) shows the
by using a graph search algorithm. A graph search residual network after the augmentation. In the
algorithm starts at node s and progressively finds third iteration, the algorithm augments one unit
all nodes that are reachable from the source node of flow along the path 1 - 2 - 4. Fig. 3d) shows the
using directed paths. Most search algorithms run corresponding residual network. Now the residual
in time proportional to the number of arcs in the network contains no augmenting path and so the
network, that is, O ( m ) time, and either identify an algorithm terminates.
augmenting path or conclude that G ( x ) contains
no augmenting path; the latter happens when the
sink node is not reachable from the source node.
BEGIN
x : - 0;
WHILE G(x) contains a directed path
from node s to node t DO
BEGIN
identify an augmenting path P from s to t; (a) (b)
set ~ : - m i n { r i j : (i, j) E P};
augment 5 units of flow along P;
update G(x);
END;
END;

Generic augmenting path algorithm.

For each arc (i, j) C P, augmenting (f units (c) (d)


of flow along P decreases rij by (f units and in-
c r e a s e s rji by 5 units. The final residual capac- Fig. 3: Illustrating the augmented path algorithm: a) the
ities rij when the algorithm terminates specifies residual network G(x) for x = 0; b) the residual network
after augmenting four units along the path (1 - 3 - 4); c)
a maximum (arc) flow in the following manner.
the residual network after augmenting one unit along the
S i n c e rij - u i j - x i j Jr x j i , the arc flows satisfy the path (1 - 2 - 3 - 4); d) the residual network after
equality Xij - Xji -- Uij - - r i j . I f ~tij > r i j , we c a n augmenting one unit along the path (1 - 2 - 4).
set xij = uij - r i j and xji = 0; otherwise, we set
xij = 0 and xji = rij - u i j . Does the augmenting path algorithm always
We use the maximum flow problem given in find a maximum flow? The algorithm terminates
Fig. 3 to illustrate the algorithm. Fig. 3a) shows when the search algorithm fails to identify a di-
the residual network corresponding to the start- rected path in G ( x ) from node s to node t, in-
ing flow x - 0, which is identical to the original dicating that no such path exists (we prove later
network. The residual network contains three aug- that the algorithm would terminate finitely). At
menting paths: 1 - 3 - 4, 1 - 2 - 4, and 1 - 2 - 3 - 4. this stage, let S denote the set of nodes in N that
Suppose the algorithm selects the path 1 - 3 - 4 for are reachable in G ( x ) from the source node using
augmentation. The residual capacity of this path is directed paths, and S - N - S. Clearly, s E S and
= min{r13, r34} -- min{4, 5} -- 4. This augmen- t ~ S. Since the search algorithm cannot reach
tation reduces the residual capacity of arc (1, 3) to any node in S and it can reach each node in S,
zero (thus we delete it from the residual network) we know that rij - 0 for each (i, j) C (S, S). Re-
and increases the residual capacity of arc (3, 1) to call that rij - (uij - xij) + xji, xij <_ uij, and
4 (so we add this arc to the residual network). The xji >_ O. If rij - O, then xij - uij and xji - O.
augmentation also decreases the residual capacity Since rij - 0 for each (i, j ) E (S,S), by substi-
of arc (3, 4) from 5 to 1, and increases the residual tuting these flow values in expression (6), we find

254
Maximum flow problem

that v - u[S, S]. Therefore, the value of the cur- source node to the sink node. To bound the num-
rent flow x equals the capacity of the cut. Lemma ber of iterations, we will determine a bound on the
1 implies that x is a maximum flow and IS, S] is a maximum flow value. By definition, U denotes the
minimum cut. This conclusion establishes the cor- largest arc capacity, and so the capacity of the cut
rectness of the generic augmenting path algorithm ({s}, S - { s } ) is at most nU. Since the value of any
and, as a byproduct, proves the following max-flow flow can never exceed the capacity of any cut in
min-cut theorem. the network, we obtain a bound of nU on the max-
imum flow value and also on the number of itera-
THEOREM 2 The maximum value of the flow from
tions performed by the algorithm. Consequently,
a source node s to a sink node t in a capacitated
the running time of the algorithm is O(nmU),
network equals the minimum capacity among all
which is a pseudopolynomial time bound. We sum-
s - t-cuts. [-1
marize the preceding discussion with the following
The proof of the max-flow min-cut theorem theorem.
shows that when the augmenting path algorithm
THEOREM 4 The generic augmenting path al-
terminates, it also discovers a minimum cut IS, S],
gorithm solves the maximum flow problem in
with S defined as the set of all nodes reachable
O(nmU) time. [-7
from the source node in the residual network cor-
responding to the maximum flow. For our previ- The augmenting path algorithm is possibly the
ous numerical example, the algorithm finds the simplest algorithm for solving the maximum flow
minimum cut in the network, which is IS, S] with problem. Empirically, the algorithm performs rea-
sonably well. However, the worst-case bound on
The augmenting path algorithm also establishes the number of iterations is poor for large values of
another important result, the integrality theorem: U. For example, if U - 2n, the bound is exponen-
tial in the number of nodes. Moreover, as shown
THEOREM 3 If all arc capacities are integer, then
by known examples, the algorithm can indeed per-
the maximum flow problem always has an integer
form these many iterations. A second drawback
maximum flow. V-]
of the augmenting path algorithm is that if the
This result follows from the facts that the initial capacities are irrational, the algorithm might not
(zero) flow is integer and all arc capacities are in- terminate. For some pathological instances of the
teger; consequently, all initial residual capacities maximum flow problem, the augmenting path al-
will be integer. Since subsequently all arc flows gorithm does not terminate in a finite number of
change by integer amounts (because residual ca- iterations and although the successive flow values
pacities are integer), the residual capacities remain converge to some value, they might converge to a
integer throughout the algorithm. Further, the fi- value strictly less than the maximum flow value.
nal integer residual capacities determine an integer (Note, however, that the max-flow min-cut theo-
maximum flow. The integrality theorem does not rem is valid even if arc capacities are irrational.)
imply that every optimal solution of the maximum Therefore, if the augmenting path algorithm is to
flow problem is integer. The maximum flow prob- be guaranteed to be effective in all situations, it
lem might have noninteger solutions and, most of- must select augmenting paths carefully.
ten, it has such solutions. The integrality theorem Researchers have developed specific implemen-
shows that the problem always has at least one tations of the generic augmenting path algorithms
integer optimal solution. that overcome these drawbacks. Of these, the
What is the worst-case running time of the al- following three implementations are particularly
gorithm? An augmenting path is a directed path noteworthy:
in G(x) from node s to node t. We have seen ear-
lier that each iteration of the algorithm requires i) the maximum capacity augmenting path al-
O(m) time. In each iteration, the algorithm aug- gorithm which always augments flow along a
ments a positive integer amount of flow from the path in the residual network with the max-

255
Maximum flow problem

imum residual capacity and can be imple-


e(i) - ~ xji - ~ xij.
mented to run in O(m 2 log U) time; ( j : (j,i)CA} (j : (j,i)EA}
ii) the capacity scaling algorithm which uses a
We refer to a node with positive excess as an ac-
scaling technique on arc capacities and can
tive node. We adopt the convention that the source
be implemented to run in O(nm log U) time;
and sink nodes are never active. In a preflow-push
iii) the shortest augmenting path algorithm algorithm, the presence of an active node indicates
which augments flow along a shortest path that the solution is infeasible. Consequently, the
(as measured by the number of arcs) in the basic operation in this algorithm is to select an ac-
residual network and runs in O(n2m) time. tive node i and try to remove the excess by pushing
These algorithms are due to J. Edmonds and flow out of it. When we push flow out of an active
R.M. Karp [6], H.N. Gabow [9], and E.A. Dinic node, we need to do it carefully. If we just push flow
[5], respectively. L.R. Ford and D.R. Fulkerson [8] to an adjacent node in an arbitrary manner and
and P. Elias, A. Fenstein and C.E. Shannon [7] in- the other nodes do the same, then it is conceivable
dependently developed the basic augmenting path that some nodes keep pushing flow among them-
algorithm. selves resulting in an infinite loop, which is not
a desirable situation. Since ultimately we want to
G e n e r i c P r e f l o w - P u s h A l g o r i t h m . Another send the flow to the sink node, it seems reasonable
class of algorithms for solving the maximum flow for an active node to push flow to another node
problem, known as preflow-push algorithms, is that is 'closer' to the sink. If all nodes maintain
more decentralized than augmenting path algo- this rule, then the algorithm could never encounter
rithms. Augmenting path algorithms send flow by an infinite loop. The concept of distance labels de-
augmenting along a path. This basic operation fur- fined next allows us to implement this algorithmic
ther decomposes into the more elementary opera- strategy.
tion of sending flow along individual arcs. Sending The preflow-push algorithms maintain a dis-
a flow of (f units along a path of k arcs decomposes tance label d(i) with each node in the network.
into k basic operations of sending a flow of (f units The distance labels are nonnegative (finite) inte-
along each of the arcs of the path. We shall refer gers defined with respect to the residual network
to each of these basic operations as a push. The G(x). We say that distance labels are valid with
preflow-push algorithms push flows on individual respect to a flow x if they satisfy the following two
arcs instead of on augmenting paths. conditions:
A path augmentation has one advantage over a d(t) = 0, (8)
single push: it maintains conservation of flow at all
d(i) ~_ d ( j ) + 1 for every arc (i,j) (9)
nodes. The preflow-push algorithms violate con-
servation of flow at all steps except at the very in the residual network G(x).
end, and instead maintain a 'preflow' at each iter- We refer to the conditions (8) and (9) as the
ation. A preflow is a vector x satisfying the flow validity conditions. It is easy to demonstrate that
bound constraints and the following relaxation of d(i) is a lower bound on the length of any directed
the mass balance constraints (2): path (as measured by number of arcs) from node
• (7) i to node t in the residual network, and thus is
(j : (i,j)CA} (j : (j,i)EA} a lower bound on the length of the shortest path
for all i E N - (s, t}. between nodes i and j. Let i -- il . . . . . ik- t
be any path of length k in the residual network
Each element of a preflow vector is either a from node i to node t. The validity conditions (8),
real number or equals ÷c~. The preflow-push al- (9) imply that d(i) = d(il) ~_ d ( i 2 ) + 1, d(i2) _
gorithms maintain a preflow at each intermediate d(i3) + 1 , . . . , d ( i k ) ~ d(t)+ 1 = 1. Adding these
stage. For a given preflow x, we define the excess inequalities shows that d(i) ~ k for any path of
for each node i E N - (s, t} as length k in the residual network, and therefore

256
Maximum flow problem

any (shortest) p a t h from node i to node t con- shortest p a t h from that node to node t, the resid-
tains at least d(i) arcs. We say that an arc (i, j) ual network contains no directed path from s to
in the residual network is admissible if it satisfies t. The subsequent pushes maintain this property
the condition d(i) = d(j) + 1; we refer to all other and drive the solution toward feasibility. Conse-
arcs as inadmissible. quently, when there are no active nodes, the flow
The basic operation in the preflow-push algo- is a m a x i m u m flow.
r i t h m is to select an active node i and try to re- A push of 5 units from node i to node j decreases
move the excess by pushing flow to a node with both the excess e(i) of node i and the residual rij
smaller distance label. (We will use the distance of arc (i, j) by 5 units and increases both e(j) and
labels as estimates of the length of the shortest rji by 5 units. We say t h a t a push of 5 units of
p a t h to the sink node.) If node i has an admissible flow on an arc (i, j) is saturating if d = rij and is
arc (i, j), then d(j) = d ( i ) - 1 and the algorithm nonsaturating otherwise. A nonsaturating push at
sends flow on admissible arcs to relieve the node's node i reduces e(i) to zero. We refer to the pro-
excess. If node i has no admissible arc, then the cess of increasing the distance label of a node as a
algorithm increases the distance label of node i so relabel operation. The purpose of the re label op-
t h a t node i has an admissible arc. The algorithm eration is to create at least one admissible arc on
terminates when the network contains no active which the algorithm can perform further pushes.
nodes, that is, excess resides only at the source
It is instructive to visualize the generic preflow-
and sink nodes. The next algorithm describes the
push algorithm in terms of a physical network:
generic preflow-push algorithm.
arcs represent flexible water pipes, nodes represent
BEGIN joints, and the distance function measures how far
set x := 0 and d(j) := 0 for all j E N;
nodes are above the ground. In this network, we
set x sj = usj for each arc (s, j) E A(s);
d(s) := n; wish to send water from the source to the sink. We
WHILE residual network G(x) contains visualize flow in an admissible arc as water flowing
an active node downhill. Initially, we move the source node up-
DO ward, and water flows to its neighbors. Although
BEGIN we would like water to flow downhill toward the
select an active node I;
sink, occasionally flow becomes t r a p p e d locally at
push/r elab el (i);
END; a node that has no downhill neighbors. At this
END; point, we move the node upward, and again water
p r o c e d u r e push/relabel(i); flows downhill toward the sink.
BEGIN
IF network contains an admissible arc (i, j)
Eventually, no more flow can reach the sink. As
THEN push 5 : - min{e(i), r,j } units of flow we continue to move nodes upward, the remain-
from node i to node j ing excess flow eventually flows back toward the
ELSE replace d(i) by source. The algorithm terminates when all the wa-
min{d(j) + 1: (i,j) E A(i),r~j > 0}; ter flows either into the sink or flows back to the
END;
source.
The generic preflow-push algorithm.
To illustrate the generic preflow-push algorithm,
The algorithm first saturates all arcs emanating we use the example given in Fig. 4. Fig. 4a) spec-
from the source node; then each node adjacent to ifies the initial residual network. We first saturate
node s has a positive excess, so that the algorithm the arcs e m a n a t i n g from the source node, node 1,
can begin pushing flow from active nodes. Since and set d(1) = n = 4. Fig. 4b) shows the residual
the preprocessing operation saturates all the arcs graph at this stage. At this point, the network has
incident to node s, none of these arcs is admissi- two active nodes, nodes 2 and 3. Suppose that the
ble and setting d(s) - n will satisfy the validity algorithm selects node 2 for the push/relabel op-
condition (8), (9). But then, since d(s) = n, and a eration. Arc (2, 4) is the only admissible arc and
distance label is a lower bound on the length of the the algorithm performs a saturating push of value

257
Maximum flow problem

5 - m i n { e ( 2 ) , r 2 4 } - min{2, 1} 1. Fig. 4c) gives - - The preflow-push algorithm has several attrac-
the residual network at this stage. Suppose the al- tive features, particularly its flexibility and its po-
gorithm again selects node 2. Since no admissi- tential for further improvements. Different rules
ble arc emanates from node 2, the algorithm per- for selecting active nodes for the push/relabel
forms a relabel operation and gives node 2 a new operations create many different versions of the
distance label d(2) - m i n { d ( 3 ) + 1, d ( 1 ) + 1} - generic algorithm, each with different worst-case
rain{2,5} = 2. The new residual network is the complexity. As we have noted, the bottleneck op-
same as the one shown in Fig. 4c) except that eration in the generic preflow-push algorithm is the
d(2) = 2 instead of 1. Suppose this time the al- number of nonsaturating pushes and many specific
gorithm selects node 3. Arc (3, 4) is the only ad- rules for examining active nodes can produce sub-
missible arc emanating from node 3, and so the stantial reductions in the number of nonsaturating
algorithm performs a nonsaturating push of value pushes. The following specific implementations of
5 - min{e(3), r34} - min{4, 5} - 4. Fig. 4d) spec- the generic preflow-push algorithms are notewor-
ifies the residual network at the end of this itera- thy:
tion. Using this process for a few more iterations,
the FIFO preflow-push algorithm examines
the algorithm will determine a maximum flow.
the active nodes in the first-in, first-out
(e(i),d(i) (e(j), d(j))
(FIFO) order and runs in O(n 3) time;

/ ii) the highest label preflow-push algorithm


pushes flow from an active node with the
(o, o) (o, 4 (o,
highest value of a distance label and runs in
%/l O(n2m1/2) time; and
(o, l) (2, I)
(b)
iii) the excess-scaling algorithm uses the scaling
(4 1) (0,1) of arc capacities to attain a time bound of
' 5
O(nm + n 2 log U).
f4 / 1

(o, 4 ) ~ (l, o) (o, 4 ) ~ (5, o) These algorithms are due to A.V. Goldberg and
R.J. Tarjan [10], J. Cheriyan and S.N. Maheshwari
(1,1) (1, 2) [4], and R.K. Ahuja and J.B. Orlin [3], respectively.
(C) (d) These preflow-push algorithms are more general,
Fig. 4: Illustrating the preflow-push algorithm: a) the
more powerful, and more flexible than augment-
residual network G(x)
for x = 0; b) the residual network ing path algorithms. The best preflow-push algo-
after saturating arcs emanating from the source; c) the rithms currently outperform the best augmenting
residual network after pushing flow on arc (2, 4); d) the path algorithms in theory as well as in practice
residual network after pushing flow on arc (3, 4). (see, for example, [1]).

The analysis of the computational (worst-case)


C o m b i n a t o r i a l I m p l i c a t i o n s of the M a x -
complexity of the generic preflow-push algorithm
Flow M i n - C u t T h e o r e m . The max-flow min-
is somewhat complicated. Without examining the
cut theorem has far reaching consequences. It can
details, we might summarize the analysis as fol-
be used to prove several important results in com-
lows. It is possible to show that the preflow-push
binatorics that appear to be difficult to prove using
algorithm maintains valid distance labels at all
other means. We will illustrate the use of the max-
steps of the algorithm and increases the distance
flow min-cut theorem to prove two such important
label of any node at most 2n times. The algorithm
results.
performs O(nm) saturating pushes and O(n2m)
nonsaturating pushes. The nonsaturating pushes Network Connectivity. Given a directed network
are the limiting computational operation of the al- G - (N, A) and two specified nodes s and t, we
gorithm and so it runs in O(n2m) time. are interested in the following two questions:

258
Maximum flow problem

i) what is the maximum number of arc-disjoint a corresponding node cover with v nodes. Conse-
(directed) paths from node s to node t; and quently, the max-flow min-cut theorem establishes
ii) what is the minimum number of arcs that the following result:
we should remove from the network so that COROLLARY 6 In a bipartite network G = (N1 t2
it contains no directed paths from node s to N2, A), the maximum cardinality of any matching
node t. equals the minimum cardinality of any node cover
We will show that these two questions are of G. El
closely related. The second question shows how ro-
These two examples illustrate important rela-
bust a network, for example, a telecommunications
tionships between maximum flows, minimum cuts,
network, is to the failure of its arcs. and many other problems in the field of combi-
In the network G, let us define the capacity of natorics. The maximum flow problem is of inter-
each arc as equal to one. Consider any feasible flow est because it provides a unifying tool for view-
x of value v in the resulting unit capacity network. ing many such results, because it arises directly
We can decompose the flow x into flows along v di- in many applications, and because it has been a
rected paths from node s to node t, each path car- rich arena for developing new results concerning
rying a unit flow. Now consider any s - t-cut IS, S] the design and analysis of algorithms.
in the network. The capacity of this cut is I(S, S) I
See also: M i n i m u m cost flow p r o b l e m ; N o n -
that is, equals the number of forward arcs in the
c o n v e x n e t w o r k flow p r o b l e m s ; Traffic net-
cut. Since each path joining nodes s and t contains
work equilibrium; N e t w o r k location: Cover-
at least one arc in the set (S, S), the removal of all
ing p r o b l e m s ; S h o r t e s t p a t h t r e e a l g o r i t h m s ;
the arcs in (S, S) disconnects all paths from node
Steiner tree problems; E q u i l i b r i u m net-
s to node t. Consequently, the network contains
works; S u r v i v a b l e n e t w o r k s ; D i r e c t e d t r e e
a disconnecting set of arcs of cardinality equal to
n e t w o r k s ; D y n a m i c traffic n e t w o r k s ; Auc-
the capacity of any s - t-cut [S, S]. The max-flow
tion algorithms; Piecewise linear network
min-cut theorem immediately implies the follow-
flow p r o b l e m s ; C o m m u n i c a t i o n n e t w o r k as-
ing result:
signment problem; Generalized networks;
COROLLARY 5 The maximum number of arc- Evacuation networks; Network design prob-
disjoint paths from s to t in a directed network lems; S t o c h a s t i c n e t w o r k p r o b l e m s : M a s -
equals the minimum number of arcs whose removal sively p a r a l l e l s o l u t i o n ; N o n o r i e n t e d m u l t i -
will disconnect all paths from node s to node t. [D c o m m o d i t y flow p r o b l e m s .

M a t c h i n g s a n d Covers. The max-flow min-cut References


[1] AHUJA, R.K., KODIALAM, M., MISHRA, A.K., AND
theorem also implies a max-min result concern-
ORLIN, J.B.: 'Computational investigations of maxi-
ing matchings and node covers in a directed bi- mum flow algorithms', Europ. J. Oper. Res. 97 (1997),
partite network G - (N1 U N2, A), with arc set 509-542.
A C_ N1 x N2. In the network G, a subset M C_ A [2] AHUJA, R.K., MAGNANTI,T.L., AND ORLIN, j.B."
is a matching if no two arcs in M have an endpoint Network flows: Theory, algorithms, and applications,
Prentice-Hall, 1993.
in common. A subset C C N1N2 is a node cover
[3] AHUJA, R.K., AND ORLIN, J.B.: 'A fast and simple
of G if every arc in A has at least one endpoint algorithm for the maximum flow problem', Oper. Res.
in the node set C. Suppose we create the network 37 (1989), 748-759.
G' from G by adding two new nodes s and t, as [4] CHERIYAN, J., AND MAHESHWARI, S.N.: 'Analysis of
well as arcs (s, i) of capacity 1 for each i E N1 and preflow-push algorithms for maximum network flow',
arcs (j, t) of capacity 1 for each j E N2. All other SIAM J. Comput. 18 (1989), 1057-1086.
[5] DINIC, E.A.: 'Algorithm for solution of a problem of
arcs in G' correspond to the arcs in G and have
maximum flow in networks with power estimation', So-
infinite capacity. It is possible to show that each viet Math. Dokl. 11 (1970), 1277-1280.
matching of cardinality v defines a flow of value [6] EDMONDS, J., AND KARP, R.M.: 'Theoretical improve-
v in G', and each s - t cut of capacity v induces ments in algorithmic efficiency for network flow prob-

259
Maximum flow problem

lems', J. ACM 19 (1972), 248-264. ements of U (the subsets L and R may not be
[7] ELIAS, P., FEINSTEIN, A, AND SHANNON, C.E.: 'Note
disjoint), together with a sequence of m distinct
on maximum flow through a network', IRE Trans. In-
form. Theory IT-2 (1956), 117-119.
partitions of S: (A1, B ~) , . . . , (Am, B m ) such that
[8] FORD, L.R., AND FULKERSON, D.R.: 'Maximal flow for all i = 1 , . . . , rn, the partition (Ai, Bi) pairs the
through a network', Canad. J. Math. 8 (1956), 399- elements ai and bi. The maximum partition match-
404. ing problem is to construct a partition matching of
[9] GhBOW, H.N.: 'Scaling algorithms for network prob- order rn for a given collection S with m maximized.
lems', J. Comput. Syst. Sci. 31 (1985), 148-168.
[10] GOLDBERG, A.V., AND TARJAN, R.E.: 'A new ap- The maximum partition matching problem
proach to the maximum flow problem', J. A CM 35 arises in connection with the parallel routing prob-
(1988), 921-940, also: Proc. 19th ACM Symp. Theory lem in interconnection networks. In particular, in
of Computing, pp.136-146.
the study of the star networks [1], which are attrac-
Ravindra K. Ahuja tive alternatives to the popular hypercubes net-
Dept. Industrial and Systems Engin. Univ. Florida works. It can be shown that constructing an opti-
Gainesville, FL 32611, USA
mal parallel routing scheme in the star networks
E-mail address: ahuja©ufl, e d u
can be effectively reduced to the maximum parti-
Thomas L. Magnanti
tion matching problem. Readers interested in this
Sloan School of Management
and connection are referred to [2] for a detailed discus-
Dept. Electrical Engin. and Computer Sci. sion.
Massachusetts Inst. Technol.
The maximum partition matching problem can
Cambridge, MA 02139, USA
be formulated in terms of the 3-dimensional
E-mail address: magna.nti@mit, edu
James B. Orlin
matching problem as follows: given an instance S =
Sloan School of Management {C1,..., Ck} of the maximum partition matching
Massachusetts Inst. Technol. problem, we construct an instance M for the 3-
Cambridge, MA 02139, USA dimensional matching problem such that a triple
E-mail address: jorl±nOrait, edu (a, b, P) is contained in M if and only if the parti-
MSC 2000:90C35 tion P of S pairs the elements a and b. However,
Key words and phrases: network, maximum flow problem, since the number of partitions of the collection S
minimum cut problem, augmenting path algorithm, preflow- can be as large as 2n and the 3-dimensional match-
push algorithm, max-flow min-cut theorem.
ing problem is NP-hard [4], this reduction does not
hint a polynomial time algorithm for the maximum
partition matching problem.
M A X I M U M PARTITION MATCHING, MPM
The maximum partition matching problem was in- In the rest of this article, we study the ba-
troduced recently in the study of routing schemes sic properties for the maximum partition match-
on interconnection networks [2]. In this article, we ing problem, and present an algorithm of running
study the basic properties of the problem. An effi- time O(n 2 log n) for the problem. We first intro-
cient algorithm for the maximum partition match- duce necessary terminologies that will be used in
ing problem is presented. our discussion.
Let 7r = ( L , R , ( A 1 , B 1 ) , . . . , ( A m , B m ) ) be a
Definitions and Motivation. Let S = partition matching of the collection S, where L =
{ C 1 , . . . , Ck} be a collection of subsets of the uni- {al,...,am} and R = {bl,...,bm}. We will say
versal set U - { 1 , . . . , n} such that uik=lCi -- U, that the partition (Ai, Bi) left-pairs the element
and Ci fq Cj - 0 for all i ¢- j. A partition ai and right-pairs the element bi. An element a
( A , B ) of S pairs two elements a and b in U if is said to be left-paired if it is in the set L.
a is contained in a subset in A and b is con- Otherwise, the element a is left-unpaired. Simi-
tained in a subset in B. A partition matching larly we define right-paired and right-unpaired ele-
(of order m) of S consists of two ordered subsets ments. The collections A/ and Bi are called the
L - { a l , . . . , a m } and R - {bl,...,bm} of m el- left-collection and right-collection of the parti-

260
Maximum partition matching

tion (Ai, Bi). The partition matching 7r may also Suppose that the collection S consists of k sub-
be written as ~[(al, b t ) , . . . , (am, bin)] if the corre- sets C1, .... , Ck and 2 k >_ 4n. The pre-matching a
sponding partitions are implied. contains at most n pairs. Let (a, b) be a pair in a
For the rest of this paper, we assume that U - and let C and C ~be two arbitrary subsets in S such
{ 1 , . . . ,n} and that S - { C 1 , . . . , C k } is a collec- that C contains a and C' contains b. Note that the
tion of pairwise disjoint subsets of U such that number of partitions (A, B) of S such t h a t C is in
k A and C ~ is in B is equal to 2 k - 2 _> n. Therefore,
u~= 1C~ - U.
at least one such partition can be used to left-pair
C a s e I. V i a P r e - M a t c h i n g w h e n IISI{ is a and right-pair b. This observation results in the
L a r g e . A necessary condition for two ordered sub- following theorem.
sets L - { a l , . . . , a m } and R - { b t , . . . , b m } of U THEOREM 1 Let S - { C 1 , . . . , Ck} be a collec-
to form a partition matching for the collection S
tion of nonempty subsets of the universal set U =
is that ai and bi belong to different subsets in the k
{ 1 , . . . , n} such t h a t Ui= 1Ci - U and Ci A Cj - 0,
collection S, for all i - 1 , , . . . ,m. We say that for i ¢ j. If 2 k > 4n, then a m a x i m u m partition
the two ordered subsets L and R of U form a pre- matching in S can be constructed in time O(n2).
matching a - {(hi, bi)" 1 <_ i <_ rn} if ai and bi do [:]
not belong to the same subset in the collection S,
for all i - 1 , . . . , m. The pre-matching a is maxi- PROOF. Consider the following algorithm
mum if m is the largest among all pre-matchings partition- matching-I.
of S. Input: the collection S = { C 1 , . . . , Ck } of subsets
A m a x i m u m pre-matching can be constructed of U
efficiently by the algorithm pre-matching given be- Output: a partition matching r in S
1. construct a m a x i m u m pre-matching a of
low, where we say that a set is singular if it con-
s;
sists of a single element. See [3] for a proof for the F O R each pair (a, b) in a DO
correctness of the algorithm. use an unused partition of S to pair a and
Input: the collection S = { C t , . . . , Ck } of subsets b.
of U
Algorithm partition-matching-I.
Output: a m a x i m u m pre-matching a in S
1. T = S; a--- 0; Suppose the pre-matching a constructed in step
2. W H I L E T contains more than one set but 1 is a - { ( a t , b t ) , . . . , (am, bin)}. According to
does not consist of exactly three singular
the above discussion, for each pair (ai, bi) in a,
sets
DO there is always an unused partition of S that left-
2.1. pick two sets C and C' of largest cardi- pairs a and right-pairs b. Therefore, step 2 of the
nality in T; algorithm partition-matching-I is valid and con-
2.2. pick an element a in C and an element b structs a partition matching 7r for the collection S.
in C'; Since each partition matching for S induces a pre-
2.3. a=aU{(a,b),(b,a)};
matching in S and a is a m a x i m u m pre-matching,
2.4. c = c- {~}; c ' = c ' - {b};
2.5. if C or C' is empty now, delete it from T; we conclude that the partition matching 7r is a
3. IF T consists of exactly three singular sets m a x i m u m partition matching for the collection S.
C1 = {at}, C2 = {a2}, and C3 = {a3} By carefully organizing the elements in U and
THEN
the partitions of S, we can show that the algorithm
o = o u {(~, ~ ) , (~, ~ ) , (~, ~ ) ) .
partition-matching-I runs in time O(n2). See [3].
Algorithm pre-matching. [~
In the following, we show that when the cardi-
nality of the collection S is large enough, a maxi- C a s e II. V i a G r e e d y M e t h o d w h e n I]Sll is
m u m partition matching of S can be constructed S m a l l . Now we consider the case 2 k < 4n. Since
from the m a x i m u m pre-matching a produced by the number 2 k of partitions of the collection S is
the algorithm pre-matching. small, we can apply a greedy strategy that expands

261
Maximum partition matching

a current partition matching by trying to add each tion P1 can be either used or unused). The par-
of the unused partitions to the partition matching. tition P is directly right-reachable from a parti-
We show in this section that a careful use of this tion P2 = (A2, B2) if the right-paired set of P is
greedy method constructs a maximum partition contained in B2. A partition Ps is left-reachable
matching for the given collection. (resp. right-reachable) from a partition P1 if there
Suppose we have a partition matching ~ - are partitions P 2 , . . . , Ps-1 such that Pi is directly
7r[(al, bl),..., (ah, bh)] and want to expand it. The left-reachable (resp. directly right-reachable) from
partitions of the collection S then can be classified Pi-1, for all i = 2 , . . . , s. [::]
into two classes: h of the partitions are used to
The left-reachability and the right-reachability are
pair the h pairs (a/, bi), i - 1 , . . . , h, and the rest
transitive relations.
2 k - h partitions are unused. Now if there is an
Let P1 = (AI, B1) be an unused partition such
unused partition P - (A, B) such that there is a
that there are no left-unpaired elements in A1,
left-unpaired element a in A and a right-unpaired
and let Ps = (As, Bs) be a partition left-reachable
element b in B, then we simply pair the element
from P1 and there is a left-unpaired element as in
a with the element b using the partition P, thus
As. We show how we can use a chain justification
expanding the partition matching ~.
to make a left-unpaired element for the collection
Now suppose that there is no such unused parti-
A1.
tion, i.e., for all unused partitions (A, B), either A
By the definition, there are used partitions
contains no left-unpaired elements or B contains
P 2 , . . . , Ps-I such that Pi is directly left-reachable
no right-unpaired elements. This case may not nec-
from Pi-1, for i = 2 , . . . , s. We can further assume
essarily imply that the current partition match-
that Pi is not directly left-reachable from Pi-2 for
ing is the maximum. For example, suppose that
i - 3 , . . . , s (otherwise we simply delete the par-
(A, B) is an unused partition such that there is a
tition Pi-I from the sequence). Thus, these parti-
left-unpaired element a in A but no right-unpaired
tions can be written as
elements in B. Assume further that there is a used
partition (A', B') that pairs elements (a', b'), such P1 - ( { e l } U A I , B 1 ),
that the element b' is in B and there is a right-
P2 - ({C__A1,C2} U A[, B2),
unpaired element b in B'. Then we can let the par-
tition ( A ' , B ' ) pair the elements (a', b), and then P3 - ({C2, C3 } U A~, B3),
let the partition ( A , B ) pair the elements (a,b'), °

thus expanding the partition matching r. An ex-


planation of this process is that the used partitions Ps-I - ({C,-2. Cs-1} U A's_l. Bs_I).
have been incorrectly used to pair elements, thus in Ps-({Cs-I,Cs}UA's,Bs),
order to construct a maximum partition matching,
where A ~ , . . . , A's are subcollections of S without
we must re-pair some of the elements. To further
an underlined set.
investigate this relation, we need to introduce a
few notations. We can assume that the left-unpaired element
For a used partition P of S, we put an under-
as in As - {Cs-1, Cs} [-JA's is in a nonunderlined
set Cs in As (otherwise we consider the sequence
line on a set in the left-collection (resp. the right-
P I , , . . . , Ps-I instead). We modify the partition se-
collection) of P to indicate that an element in the
quence into
set is left-paired (resp. right-paired) by the parti-
tion P. The sets will be called the left-paired set P1 - ({C1} U A I , B 1 ),
and the right-paired set of the partition P, respec-
P2 - ({C1, C2,} U A[, B2),
tively.
P3 - ({C2, C3} U A'3, B3),
DEFINITION 2 A used partition P is directly left-
reachable from a partition P1 - ( A I , B 1 ) if the
I
left-paired set of P is contained in AI (the parti- P s - 1 - ( { C s - 2 , Cs-1}UAs-I,Bs-1),

262
Maximum partition matching

P, - ({C,-1, C,} U A's, B,). In case 2 k ,~ 4n, a careful organization of the


elements and the partitions can make the running
The interpretation is as follows" we use the par- time of the algorithm greedy-expanding bounded
tition P8 to left-pair the left-unpaired element as by O(n21ogn). Briefly speaking, we construct a
(the right-paired element in the right-collection Bs graph G of 2 k vertices in which each vertex repre-
is unchanged). Thus, the element as-1 in the set sents a partition of S. The direct left- and right-
Cs-1 of the partition Ps used to left-pair becomes reachabilities of partitions are given by the edges
left-unpaired. We then use the partition Ps-1 to in the graph G, so that checking left- and right-
left-pair the element as-1 and leave an element reachabilities and performing left- and right- chain
as-2 in the set Cs-2 left-unpaired, then we use the justifications can be done efficiently. Interested
partition P,-2 to left-pair as-2, etc. At the end, readers are referred to [3] for a detailed descrip-
we use the partition P2 to left-pair an element a2 tion.
in the set C2 and leave an element a l in the set After execution of the algorithm greedy-
C1 left-unpaired. Therefore, this process makes an expanding, we obtain a partition matching 7I'exp.
element in the left-collection A1 - {C1} t2 A~ of For each partition P - ( A , B ) not included in
the partition P1 left-unpaired. 71"exp, either A has no left-unpaired elements and
The above process will be called a left-chain jus- no used partition left-reachable from P has a left-
tification. Thus, given an unused partition P1 = unpaired element in its left-collection, or B has
(A1,B1) in which the left-collection A1 has no no right-unpaired elements and no used partition
left-unpaired elements and given a used partition right-reachable from P has a right-unpaired ele-
Ps - (As,Bs) left-reachable from P1 such that ment in its right-collection.
the left-collection As of Ps has a left-unpaired ele-
DEFINITION 3 Define Lfree to be the set of par-
ment, we can apply the left-chain justification that
titions P not used by 7rex p such that the left-
keeps all used partitions in the partition matching
collection of P has no left-unpaired elements and
and makes a left-unpaired element for the par-
no used partition left-reachable from P has a left-
tition P1. A process called right-chain justification
unpaired element in its left-collection, and define
for right-collections of the partitions can be de-
Rf~ee to be the set of partitions P ' not used by 71"exp
scribed similarly.
such that the right-collection of P ' has no right-
A greedy method based on the left-chain and
unpaired elements and no used partition right-
right-chain justifications is presented in the follow-
reachable from P ' has a right-unpaired element in
ing algorithm greedy-expanding.
its right-collection. [--]
Input" the collection S = { C 1 , . . . , Ck } of subsets
of U According to the algorithm greedy-matching, each
Output: a partition matching 71"expin S partition not used by 71"exp is either in the set Lfree
1. ?Fexp = O; or in the set Rfree. The sets Lfree and Rfree may
2. repeat until no more changes not be disjoint.
IF there is an unused partition
P = ( A , B ) that has a left-unpaired ele- DEFINITION 4 Define Lreac to be the set of par-
ment a in A and a right-unpaired element titions in 71"exp that are left-reachable from a par-
binB
tition in Lfree, and define Rreac t o be the set of
THEN pair the elements (a, b) by the par-
partitions in 71"exp that are right-reachable from a
tition P and add P to the matching ~exp
ELSE IF a left-chain justification or a partition in Rreac. [--]
right-chain justification (or both) is ap-
According to the definitions, if a used partition P
plicable to make an unused partition P =
( A , B ) to have a left-unpaired element in is in the set Lreac, then all elements in its left-
A and a right-unpaired element in B collection are left-paired, and if a used partition P
THEN apply the left-chain justification is in the set Rreac, then all elements in its right-
and/or the right-chain justification collection are right-paired.
Algorithm greedy-expanding. We first show that if Lreac and Rreac a r e not dis-

263
Maximum partition matching

joint, then we can construct a maximum partition flipping, we show that a maximum partition
matching from the partition matching ~exp con- matching of n pairs can be constructed by flipping
structed by the algorithm greedy-expanding. For d partitions in the partitions P i , . . . , Pt.
this, we need the following technical lemma.
Input: a partition matching { P i , . . . , P t } that
LEMMA 5 If the sets Lreac and Rreac contain a left-pairs all elements in Uk=2Ci, t =
common partition and the partition matching ?rex p E i=2
' ICil, and the set Ci is contained in
the right-collection of each partition Pi,
has less than n pairs, then there is a set Co in S, i=l,...,t,d=lC11 <_t
ICol <_ n/2, such that either all elements in each Output: a maximum partition matching in S with
set C ¢ Co are left-paired and every used parti- n pairs.
tion whose left-paired set is not Co is contained in if not all elements in the set C1 are right-
Lreac, or all elements in each set C ~ Co are right- paired by P i , . . . , P t , replace a proper
number of right-paired elements in Ui=2Ci k
paired and every used partition whose right-paired
by the right-unpaired elements in Ci so
set is not Co is contained in Rreac. [--] that all elements in Ci are right-paired
For a proof, see [3]. by P ~ , . . . , P t ;
suppose that the partitions P I , . . . , P t - d
T H E O R E M 6 If Lreac and Rreac have a common right-pair t - d elements b~,... ,bt-d in
k
t-J,=2Ci, and that P t - d + i , . . . , P t right-
partition, then the collection S has a maximum
pair the d elements in C1;
partition matching of n pairs, which can be con-
suppose that P ~ , . . . , P t - d are the t - d
structed in linear time from the partition matching partitions in { P i , . . . , P t } that left-pair
7rexp . [-] the elements b~ , . . . , bt-d;
flip each of the d partitions in
PROOF. If 7rexp has n pairs, then /rexp is al-
{Pi,...,Pt} - {P~,...,Pt-d} to get
ready a maximum partition matching. Thus we d partitions P ~ , . . . , PJ to left-pair the d
assume that 7rexp has less than n pairs. Accord- elements in Ci. The right-paired element
ing to the above lemma, we can assume, without of each P[ is the left-paired element
loss of generality, that all elements in each set Ci, before the flipping;
{P~,...,Pt,P~,...,P~} is a partition
i = 2 , . . . , k, are left-paired, and that every used
matching of n pairs.
partition whose left-paired set is not Ci is in Lreac.
Moreover, ICll _< E k=2 Ic l. Algorithm partition-flipping.
k
Let t - ~-~i=2 ICil and d - ICll. Then we can Step 1 of the algorithm is always possible" since
assume that the partition matching ~exp consists Ci is contained in the right-collection of each par-
of the partitions tition Pi, i - 1 , . . . , t , and t >__ d, for each right-
unpaired element b in Ci, we can always pick a
, . . . , Pt , Pt + , . . . , Pt + h
k
partition Pi that right-pairs an element in Ui=2Ci,
where Pi,..., Pt are used by ~exp to left-pair the and let Pi right-pair the element b. We keep do-
elements in uik=2ci, and Pt+i,..., Pt+h are used by ing this replacement until all d elements in Ci get
7rexp to left-pair the elements in Ci, h < d. More- right-paired. At this point, the number of parti-
over, all partitions P i , . . . , Pt are in the set Lreac. tions in { P 1 , . . . , P t } that right-pair elements in
Thus, the set Ci must be contained in the right- k
Ui=2Ci is exactly t - d. Step 3 is always possible
collection in each of the partitions Pi,..., Pt. since the partitions P 1 , . . . , Pt left-pair all elements
We ignore the partitions Pt+i,...,Pt+h and in Uik=2Ci.
use the partitions Pi,..., Pt to construct a max- Now we verify that the constructed sequence
imum partition matching of n pairs. Note that {Pi,..., Pt, P~,..., P~} is a partition matching in
{Pi,..., Pt } also forms a partition matching in the S. No two partitions Pi and Pj can be identical
collection S. since { P i , . . . , P t } is supposed to be a partition
For a partition ( A , B ) of S, we say that the matching in S. No two partitions P/' and Pj can
partition (B, A) is obtained by flipping the parti- be identical since they are obtained by flipping
tion (A, B). In the following algorithm partition- two different partitions in {P1,... ,P t}. No par-

264
Maximum partition matching

tition Pi is identical to a partition P~ because Pi conclude that the partitions in W L - - Lfree U Lreac
has C1 in its right-collection while P~ has C1 in its can be used to left-pair at most l U L l - ILreacl ele-
left-collection. Therefore, the partitions P 1 , . . . , Pt, ments in any partition matching in S.
P ~ , . . . , PJ are all distinct. Similarly, the partitions in the set WR - Rfree U
Each of the partitions P 1 , . . . , P t left-pairs an Rreac can be used to right-pair at most IRreacl el-
element in Ui=2Ci k , and each of the partitions ements in any partition matching in S.
P ~ , . . . , PJ .left-pairs an element in C1. Thus, all Therefore, any partition matching in the col-
elements in the universal set U get left-paired in lection S can include at most I'Lreacl partitions in
{P1,... , P,, P{, . . . , P~ }. the set WL, at most IRreacl partitions in the set
Finally, the partitions P 1 , . . . , P t right-pair all WR, and at most all partitions in the set Wother.
elements in C1 and the elements b l , . . . , b t - d Consequently, a maximum partition matching in
k
in Ui=2Ci. Now by our selection of the parti- S consists of at most ILreacl + IRreac]-+-[Wotherl
tions, the partitions P~,... ,P~ precisely right- partitions. Since the partition matching 7rexp con-
pair all the elements in Ui=2C k i -{bl,...,bt-d}. s t r u c t e d by the algorithm greedy-expanding con-
Thus, all elements in U also get right-paired in tains just this many partitions, ~exp is a maximum
{P1,... , Pt, P~, . . . , P[~}. partition matching in the collection S. K]
This concludes that the constructed sequence Now it is clear how the maximum partition
{ P 1 , . . . , P t , P ~ , . . . , P ~ } is a maximum partition matching problem is solved.
matching in the collection S. The running time of
THEOREM 8 The maximum partition matching
the algorithm partition-flipping is obviously linear.
problem is solvable in time O ( n 2 log n). K]
D
Now we consider the case when the sets Lreac PROOF. Suppose that we are given a collection
and Rreac have no common partitions. S - { C 1 , . . . , Ck} of pairwise disjoint subsets of
U- (1,...,n}.
THEOREM 7 If Lreac and Rreac have no common
In case 2 k > 4n, we can call the algorithm
partitions, then the partition matching 7rexp is a
partition-matching-I to construct a maximum par-
maximum partition matching. K]
tition matching in time O(n2).
PROOF. Let Wother be the set of used parti- In case 2 k < 4n, we first call the algorithm
tions in ~'exp that belong to neither Lreac nor greedy-expanding to construct a partition match-
Rreac. Then Lfree U Rfree U Lreac U Rreac U Wother ing 7rexp and compute the sets Lreac and Rreac. If
is the set of all partitions of the collection S, Lreac and Rreac have no common partition, then
and Lreac U Rreac U Wother is the set of partitions according to the previous theorem, 7rexp is already
contained in the partition matching 7rexp. Since a maximum partition matching. Otherwise, we
all sets nreac, Rreac, and Wother are pairwise dis- call the algorithm partition-flipping to construct
joint, the number of partitions in 7rexp is precisely a maximum partition matching. All these can be
[Lreac[ + IRreacl + [Wotherl. done in time O ( n 2 1 o g n ) . A detailed analysis of
Now consider the set WL - Lfree U Lreac. Let this algorithm c a n be found in [3]. [3
UL be the set of elements that appears in the left- See also: F r e q u e n c y assignment problem;
collection of a partition in WL. We have Bi-objective assignment problem; Assign-
• Every P C Lreac left-pairs an element in UL; ment and matching; Assignment methods in
• Every element in UL is left-paired; clustering; Quadratic assignment problem;
Communication network assignment prob-
• If an element a in UL is left-paired by a par-
lem.
tition P, then P E Lreac.
Therefore, the partitions in Lreac precisely left-
References
pair the elements in UL. This gives ILreac[ - l U L l . [I] AKERS, S.B., AND KRISHNAMuRTHY, B." 'A group-
Since there are only lULl elements that appear in theoretic model for symmetric interconnection net-
the left-collections in partitions in Lfree U Lreac, we works', IEEE Trans. Computers 38 (1989), 555-565.

265
Maximum partition matching

[2] CHEN, C-C., AND CHEN, J.: 'Optimal parallel routing MAX-SAT is of considerable interest not only
in star networks', IEEE Trans. Computers 48 (1997), from the theoretical side but also from the prac-
1293-1303. tical one. On one hand, the decision version SAT
[3] CHEN, C-C., AND CHEN, J.: 'The maximum partition
was the first example of an NP-complete problem
matching problem with applications', SIAM J. Corn-
put. 28 (1999), 935-954. [16], moreover MAX-SAT and related variants play
[4] GAREY, M.R., AND JOHNSON, D.S.: Computers an important role in the characterization of differ-
and intractability: A guide to the theory of NP- ent approximation classes like APX and PTAS [5].
completeness, Freeman, 1979. On the other hand, many issues in mathematical
Jianer Chen logic and artificial intelligence can be expressed in
Texas A&M Univ. the form of satisfiability or some of its variants,
College Station like constraint satisfaction. Some exemplary prob-
Texas, USA
lems are consistency in expert system knowledge
E-mail address: chen@cs, tamu. edu
bases [46], integrity constraints in databases [4],
MSC2000: 05A18, 05D15, 68M07, 68M10, 68Q25, 68R05 [23], approaches to inductive inference [35], [40],
Key words and phrases: maximum matching, greedy algo-
asynchronous circuit synthesis [32]. An extensive
rithm, star network, parallel routing algorithm.
review of algorithms for MAX-SAT appeared in
[9].
MAXIMUM SATISFIABILITY PROBLEM, M. Davis and H. P u t n a m [19] started in 1960
MAX-SAT the investigation of useful strategies for handling
resolution in the satisfiability problem. Davis, G.
In the maximum satisfiability (MAX-SAT) prob-
Logemann and D. Loveland [18] avoid the memory
lem one is given a Boolean formula in conjunctive
explosion of the original DP algorithm by replacing
normal form, i.e., as a conjunction of clauses, each
the resolution rule with the splitting rule. A recent
clause being a disjunction. The task is to find an
review of advanced techniques for resolution and
assignment of truth values to the variables that
splitting is presented in [31].
satisfies the maximum number of clauses.
The MAX W-SAT problem has a natural inte-
Let n be the number of variables and m the
ger linear programming formulation. Let yj = 1 if
number of clauses, so that a formula has the fol-
Boolean variable uj is 'true', yj - 0 if it is 'false',
lowing form:
and let the Boolean variable zi = 1 if clause 6'/
is satisfied, zi = 0 otherwise. The integer linear
program is:
l<i<m l<k<lC~l m
where 1(7/I is the number of literals in clause (7/ max E wizi
and lik is a literal, i.e., a propositional variable i=1
uj or its negation u---~,for 1 < j < n. The set of subject to the constraints:
clauses in the formula is denoted by C. If one asso-
ciates a weight wi to each clause (7/one obtains the ' ~ yj+ ~ ( 1 - y j ) > _ z i ,
weighted MAX-SAT problem, denoted as MAX W- jeu,+ jeu~
SAT" one is to determine the assignment of truth i= 1,...,m,
values to the n variables that maximizes the sum yj E {0,1}, j-1,...,n,
of the weights of the satisfied clauses. In the liter- zi E {0, 1}, i= 1,...,m,
ature one often considers problems with different
numbers k of literals per clause, defined as MAX- where U+ and Ui- denote the set of indices of vari-
k-SAT, or MAX W-k-SAT in the weighted case. ables that appear unnegated and negated in clause
In some papers MAX-k-SAT instances contain up Ci, respectively. If one neglects the objective func-
to k literals per clause, while in other papers they tion and sets all zi variables to 1, one obtains an in-
contain exactly k literals per clause. We consider teger programming feasibility problem associated
the second option unless otherwise stated. to the SAT problem [11].

266
Maximum satisfiability problem

The integer linear programming formulation of gorithms that achieve a performance ratio of 3/4
MAX-SAT suggests that this problem could be have been proposed in [27] and [55]. Moreover, it is
solved by a branch and bound method (cf. also possible to derandomize these algorithms, that is,
Integer programming: Branch and bound to obtain deterministic algorithms that preserve
m e t h o d s ) . A usable method uses Chv£tal cuts. In the same bound 3/4 for every instance. The ap-
[35] it is shown that the resolvents in the proposi- proximation ratio 3/4 can be slightly improved
tional calculus correspond to certain cutting planes [28]. T. Asano [2] (following [3]) has improved the
in the integer programming model of inference bound to 0.77. For the restricted case of MAX-2-
problems. SAT, one can obtain a more substantial improve-
Linear programming relaxations of integer lin- ment (performance ratio 0.931) with the technique
ear programming formulations of MAX-SAT have in [21]. If one considers only satisfiable MAX W-
been used to obtained upper bounds in [33], [55], SAT instances, L. Trevisan [54] obtains a 0.8 ap-
[27]. A linear programming and rounding approach proximation factor, while H. Karloff and U. Zwick
for MAX-2-SAT is presented in [13]. A method for [41] claim a 0.875 performance ratio for satisfi-
strengthening the generalized set covering formu- able instances of MAX W-3-SAT. A strong nega-
lation is presented in [47], where Lagrangian mul- tive result about the approximability can be found
tipliers guide the generation of cutting planes. in [36]: Unless P = NP MAX W-SAT cannot be
The first approximation algorithms with a approximated in polynomial time within a perfor-
'guaranteed' quality of approximation [5] were mance ratio greater than 7/8.
proposed by D.S. Johnson [38] and use greedy MAX-SAT is among the problems for which lo-
construction strategies. The original paper [38] cal search has been very successful: in practice,
demonstrated for both of them a performance ra- local search and its variations are the only effi-
tio 1/2. In detail, let k be the minimum number cient and effective method to address large and
of variables occurring in any clause of the formula, complex real-world instances. Different variations
re(x, y) the number of clauses satisfied by the fea- of local search with randomness techniques have
sible solution y on instance x, and m*(x) the max- been proposed for SAT and MAX-SAT starting
imum number of clauses that can be satisfied. from the late 1980s, see for example [30], [52], mo-
For any integer k >_ 1, the first algorithm tivated by previous applications of 'rain-conflicts'
achieves a feasible solution y of an instance x such heuristics in the area of artificial intelligence [44].
that The general scheme is based on generating a
starting point in the set of admissible solution and
y) > 1 1
m*(x) - k+l' trying to improve it through the application of ba-
sic moves. The search space is given by all possi-
while the second algorithm obtains
ble truth assignments. Let us consider the elemen-
m(z, y) > 1 1 tary changes to the current assignment obtained
m*(x) - 2k" by changing a single truth value. The definitions
Recently (1997) it has been proved [12] that the are as follows.
second algorithm reaches a performance ratio 2/3. Let U be the discrete search space: U = {0, 1}n,
There are formulas for which the second algorithm and let f be the number of satisfied clauses. In
finds a truth assignment such that the ratio is 2/3. addition, let U (t) E U be the current configura-
Therefore this bound cannot be improved [12]. tion along the search trajectory at iteration t, and
One of the most interesting approaches in the N(U (t)) the neighborhood of point U (t), obtained
design of new algorithms is the use of random- by applying a set of basic moves #i (1 < i __ n),
ization. During the computation, random bits are where #i complements the ith bit ui of the string:
generated and used to influence the algorithm pro- #i ( u l , . . . , u i , . . . , u n ) = ( U l , . . . , 1 - ui,...,un)"
cess. In many cases randomization allows to obtain
better (expected) performance or to simplify the
construction of the algorithm. Two randomized al- N(U (t))

267
Maximum satisfiability problem

phase of the search when the next points are gen-


= {U E U" U - #i, u(t) i - 1 ~ ' ' ' ~ n} •

erated. The term 'memory-less' denotes this lack


The version of local search that we consider of feedback from the search history.
starts from a random initial configuration U (°) C In addition to the cited multiple-run local
U and generates a search trajectory as follows: search, these techniques are based on Markov
V- BESTNEIGHBOR(N(U('))), (1) processes (simulated annealing; cf. also Simu-
l a t e d a n n e a l i n g m e t h o d s in p r o t e i n folding),
u(t+l) = [ Y i f f ( Y ) > f(u(t)),
(2) 'plateau' search and 'random noise' strategies, or
[ U (t) i f f ( V ) _< f(U (t)) combinations of randomized constructions and lo-
where BESTNEIGHBOR selects V E N(U (t)) cal search. The use of a Markov process to gener-
with the best f value and ties are broken randomly. ate a stochastic search trajectory is adopted, for
V in turn becomes the new current configuration example in [53].
if f improves. Other versions are satisfied with an The Gsat algorithm was proposed in [52] as a
improving (or nonworsening) neighbor, not neces- model-finding procedure, i.e., to find an interpre-
sarily the best one. Clearly, local search stops as tation of the variables under which the formula
soon as the first local optimum point is encoun- comes out 'true'. Gsat consists of multiple runs
tered, when no improving moves are available, see of LS +, each run consisting of a number of itera-
(2). Let us define as LS + a modification of LS tions that is typically proportional to the problem
where a specified number of iterations are exe- dimension n. An empirical analysis of Gsat is pre-
cuted and the candidate move obtained by BEST- sented in [25], [24]. Different 'noise' strategies to
NEIGHBOR is always accepted even if the f value escape from attraction basins are added to Gsat
remains equal or worsens. in [50], [51].
Properties about the number of clauses satisfied A hybrid algorithm that combines a random-
at a local optimum have been demonstrated. Let ized greedy construction phase to generate initial
m* be the best value and k the minimum num- candidate solutions, followed be a local improve-
ber of literals contained in the problem clauses. ment phase is the GRASP scheme proposed in [48]
Let mloc be the number of satisfied clauses at a for the SAT and generalized for the MAX W-SAT
local optimum of any instance of MAX-SAT with problem in [49]. GRASP is an iterative process,
at least k literals per clause, mloc satisfies the fol- with each iteration consisting of two phases, a con-
lowing bound [34]: struction phase and a local search phase.
k Different history-sensitive heuristics have been
mloc_> ~ : + l m
proposed to continue local search schemes beyond
and the bound is sharp. Therefore, if mloc is the local optimality. These schemes aim at intensifying
number of satisfied clauses at a local optimum, the search in promising regions and at diversifying
then: the search into uncharted territories by using the
k information collected from the previous phase (the
mloc _> k + 1 m*" (3)
history) of the search. Because of the internal feed-
State-of-the-art heuristics for MAX-SAT are back mechanism, some algorithm parameters can
obtained by complementing local search with be modified and tuned in an on-line manner, to re-
schemes that are capable of producing better ap- flect the characteristics of the task to be solved and
proximations beyond the locally optimal points. In the local properties of the configuration space in
some cases, these schemes generate a sequence of the neighborhood of the current point. This tuning
points in the set of admissible solutions in a way has to be contrasted with the off-line tuning of an
that is fixed before the search starts. An example algorithm, where some parameters or choices are
is given by multiple runs of local search starting determined for a given problem in a preliminary
from different random points. The algorithm does phase and they remain fixed when the algorithm
not take into account the history of the previous runs on a specific instance.

268
Maximum satisfiability problem

Tabu search is a history-sensitive heuristic pro- is added to the dynamical system:


posed by F. Glover [26] and, independently, by T(t) _ react(T(t-1), U ( ° ) , . . . , u(t)).
P. Hansen and B. Jaumard, that used the term
'SAMD' (steepest ascent mildest descent) and ap- An algorithm that combines local search and
plied it to the MAX-SAT problem in [34]. The nonoblivious local search [8], the use of prohibi-
main mechanism by which the history influences tions, and a reactive scheme to determine the pro-
the search in tabu search is that, at a given it- hibition parameter is the Hamming-reactive tabu
eration, some neighbors are prohibited, only a search algorithm proposed in [7], which contains
nonempty subset NA(U (t)) C N ( U (t)) of them is also a detailed experimental analysis.
allowed. The general way of generating the search Given the hardness of the problem and the rel-
trajectory that we consider is given by: evancy for applications in different fields, the em-
NA(U (t)) - allow(N(U(t)), , . . . , u(t)), (4) phasis on the experimental analysis of algorithms
for the MAX-SAT problem has been growing in
U (t+l) - B E S T N E I G H B O R ( N A ( U ( t ) ) ) . (5) recent years (as of 2000).
The set-valued function allow selects a nonempty In some cases the experimental comparisons
subset of N ( U (t)) in a manner that depends on the have been executed in the framework of 'chal-
entire previous history of the search U ( ° ) , . . . , U (t). lenges,' with support of electronic collection and
A specialized tabu search heuristic is used in [37] distribution of software, problem generators and
to speed up the search for a solution (if the prob- test instances. An example is the the Second
lem is satisfiable) as part of a branch and bound DIMACS algorithm implementation challenge on
algorithm for SAT, that adopts both a relaxation cliques, coloring and satisfiability, whose results
and a decomposition scheme by using polynomial have been published in [39]. Practical and indus-
instances, i.e., 2-SAT and Horn-SAT. trial MAX-SAT problems and benchmarks, with
Different methods to generate prohibitions pro- significant case studies are also presented in [20].
duce discrete dynamical systems with qualitatively Some basic problem models that are considered
different search trajectories. In particular, prohi- both in theoretical and in experimental studies of
bitions based on a list of moves lead to a faster MAX-SAT algorithms are described in [31].
escape from a locally optimal point than prohibi- Different algorithms demonstrate a different de-
tions based on a list of visited configurations [6]. gree of effort, measured by number of elementary
In detail, the function allow can be specified by steps or CPU time, when solving different kinds
introducing a prohibition parameter T (also called of instances. For example, in [45] it is found that
list size) that determines how long a move will re- some distributions used in past experiments are
main prohibited after its execution. The fixed tabu of little interest because the generated formulas
search algorithm is obtained by fixing T through- are almost always very easy to satisfy. It also re-
out the search [26]. A neighbor is allowed if and ports that one can generate very hard instances of
only if it is obtained from the current point by k-SAT, for k _> 3. In addition, it reports the fol-
applying a move that has not been used during lowing observed behavior for random fixed length
the last T iterations. In detail, if LU(#) is the last 3-SAT formulas" if r is the ratio r of clauses to
usage time of move # (LU(#) = - o c at the begin- variables (r - m / n ) , almost all formulas are saris-
ning)" fiable if r < 4, almost all formulas are unsatisfiable
if r > 4.5. A rapid transition seems to appear for
NA(U (t)) - { U - #U (t)" L U ( # ) < ( t - T ) } .
r ~ 4.2, the same point where the computational
The reactive tabu search algorithm of [10], de- complexity for solving the generated instances is
fines simple rules to determine the prohibition pa- maximized, see [42], [17] for reviews of experimen-
rameter by reacting to the repetition of previously- tal results.
visited configurations. One has a repetition if Let ~ be the least real number such that, if r
U (t+R) - U (t) for R _> 1. The prohibition period T is larger than ~, then the probability of g being
depends on the iteration t and a reaction equation satisfiable converges to 0 as n tends to infinity. A

269
Maximum satisfiability problem

n o t a b l e result f o u n d i n d e p e n d e n t l y by m a n y peo- Annual IEEE Conf. Computational Complexity (Ulm,


ple, including [22] a n d [14] is t h a t Germany), 1997, pp. 274-281.
[13] CHERIYAN, J., CUNNINGHAM,W.H., TUNCEL, T., AND
_< logs/7 2 - 5.191. WANG, Y.: 'A linear programming and rounding ap-
proach to MAX 2-SAT', in M. TRICK AND D.S. JOH-
A series of t h e o r e t i c a l analyses aim at approxi- SON (eds.): Proc. Second DIMACS Algorithm Imple-
m a t i n g the unsatisfiability threshold of r a n d o m for- mentation Challenge on Cliques, Coloring and Satisfi-
mulas [43], [1], [15], [29]. ability, DIMACS 26, 1996, pp. 395-414.
[14] CHV/~TAL, V., AND SZEMERI~DI,E." 'Many hard exam-
See also" Greedy randomized adaptive
ples for resolution', J. ACM 35 (1988), 759-768.
search procedures; Integer programming. [15] CHV/~TAL, V., AND REED, B." 'Mick gets some (the
odds are on his side)': Proc. 33th Ann. IEEE Syrup. on
References Foundations of Computer Sci., IEEE Computer Soc.,
[1] ACHLIOPTAS,D., KIROUSIS, L.M., KRANAKIS, E., AND 1992, pp. 620-627.
KRINZAC, D.: 'Rigorous results for random (2 + p)- [16] COOK, S.A.: 'The complexity of theorem-proving pro-
SAT': Proc. Work. on Randomized Algorithms in Se- cedures': Proc. Third Annual A CM Syrup. Theory of
quential, Parallel and Distributed Computing (RAL- Computing, 1971, pp. 151-158.
COM 97), Santorini, Greece, 1997, pp. 1-10. [17] COOK, S.A., AND MITCHELL, D.G.: 'Finding hard in-
[2] ASANO, T.: 'Approximation algorithms for MAX-SAT: stances of the satisfiability problem: A survey', in D.-Z.
Yannakakis vs. Goemans-Williamson': Proc. 3rd Israel Du, J. Gu, AND P.M. PARDALOS (eds.): Satisfiability
Symp. on the Theory of Computing and Systems, Ra- Problem: Theory and Applications, DIMACS 35, Amer.
mat Gan, Israel, 1997, pp. 24-37. Math. Soc. and ACM, 1997, pp. 1-17.
[3] ASANO, T., ONO, T., AND HIRATA, T.: 'Approxima- [lS] DAVIS, M., LOGEMANN, G., AND LOVELAND, D.: 'A
tion algorithms for the maximum satisfiability prob- machine program for theorem proving', Comm. A CM
lem': Proc. 5th Scandinavian Work. Algorithms The- 5 (1962), 394-397.
ory, 1996, pp. 110-111. [10] DAVIS, M., AND PUTNAM, H.: 'A computing procedure
[4] ASIRELLI, P., SANTIS, M. DE, AND MARTELLI, A.: 'In- for quantification theory', J. A CM 7 (1960), 201-215.
tegrity constraints in logic databases', J. Logic Pro- [20] Du, D.-Z., Gu, J., AND PARDALOS, P.M. (eds.): Sat-
gramming 3 (1985), 221-232. isfiability problem: Theory and applications, Vol. 35 of
[5] AUSIELLO,G., CRESCENZI, P., AND PROTASI, M.: 'Ap- DIMACS, Amer. Math. Soc. and ACM, 1997.
proximate solution of NP optimization problems', The- [21] FEIGE, U., AND GOEMANS, M.X.: 'Approximating the
oret. Computer Sci. 150 (1995), 1-55. value of two proper proof systems, with applications
[6] BATTITI, R.: 'Reactive search: Toward self-tuning to MAX-2SAT and MAX-DICUT': Proc. Third Is-
heuristics', in V.J. RAYWARD-SMITH, I.H. OSMAN, rael Syrup. Theory of Computing and Systems, 1995,
c . a . REEVES, AND G.D. SMITH (eds.): Modern pp. 182-189.
Heuristic Search Methods, Wiley, 1996, pp. 61-83. [22] FRANCO, J., AND PAULL, M.: 'Probabilistic analysis
[7] BATTITI, R., AND PROTASI, M.: 'Reactive search, a of the Davis-Putnam procedure for solving the satisfi-
history-sensitive heuristic for MAX-SAT', A CM J. Ex- ability problem', Discrete Appl. Math. 5 (1983), 77-87.
perimental Algorithmics 2, no. 2 (1997). [23] GALLAIRE, H., MINKER, J., AND NICOLAS, J.M.:
[8] BATTITI, R., AND PROTASI, M.: 'Solving MAX-SAT 'Logic and databases: A deductive approach', Comput-
with non-oblivious functions and history-based heuris- ing Surveys 16, no. 2 (1984), 153-185.
tics', in D.-Z. Du, J. Gu, AND P.M. PARDALOS [24] GENT, I.P., AND WALSH, T.: 'An empirical analysis of
(eds.): Satisfiability Problem: Theory and Applications, search in GSAT', J. Artif. Intell. Res. 1 (1993), 47-59.
no. 35 in DIMACS, Amer. Math. Soc. and ACM, 1997, [25] GENT, I.P., AND WALSH, T.: 'Towards an understand-
pp. 649-667. ing of hill-climbing procedures for SAT': Proc. Eleventh
[9] BATTITI, R., AND PROTASI, M.: 'Approximate algo- Nat. Conf. Artificial Intelligence, AAAI Press/MIT,
rithms and heuristics for MAX-SAT', in D.-Z. Du AND 1993, pp. 28-33.
P.M. PARDALOS (eds.): Handbook Combinatorial Op- [26] GLOVER, F.: 'Tabu search: Part I', ORSA J. Comput.
tim., Kluwer Acad. Publ., 1998, pp. 77-148. 1, no. 3 (1989), 190-260.
[10] BATTITI, R., AND TECCHIOLLI, G.: 'The reactive tabu [27] GOEMANS, M.X., AND WILLIAMSON, D.P.: 'New 3/4-
search', ORSA J. Comput. 6, no. 2 (1994), 126-140. approximation algorithms for the maximum satisfiabil-
[11] BLAIR, C.E., JEROSLOW, R.G., AND LOWE, J.K.: ity problem', SIAM J. Discrete Math. 7, no. 4 (1994),
'Some results and experiments in programming for 656-666.
propositional logic', Computers Oper. Res. 13, no. 5 [2s] GOEMANS, M.X., AND WILLIAMSON, D.P.: 'Improved
(1986), 633-645. approximation algorithms for maximum cut and satis-
[12] CHEN~ J., FRIESEN, D., AND ZHENG~ H.: 'Tight bound fiability problems using semidefinite programming', J.
on Johnson's algorithm for MAX-SAT': Proc. 12th

270
Maximum satis fiability problem

ACM 42, no. 6 (1995), 1115-1145. LAIRD, P.: 'Solving large-scale constraint satisfac-
[29] GOERDT, A.: 'A threshold for unsatisfiability', J. Corn- tion and scheduling problems using a heuristic repair
put. Syst. Sci. 53 (1996), 469-486. method': Proc. 8th Nat. Conf. Artificial Intelligence
[30] Gu, J.: 'Efficient local search for very large-scale satis- (AAAI-90), 1990, pp. 17-24.
fiability problem', ACM SIGART Bull. 3, no. 1 (1992), [45] MITCHELL, D., SELMAN, S., AND LEVESQUE, H.:
8-12. 'Hard and easy distributions of SAT problems': Proc.
[31] Gu, J., PURDOM, P.W., FRANCO, J., AND WAH, lOth Nat. Conf. Artificial Intelligence (AAAI-92), July
B.W.: 'Algorithms for the satisfiability (SAT) problem: 1992, pp. 459-465.
A survey', in D.-Z. Du, J. Gu, AND P.M. PARDALOS [46] NGUYEN, T.A., PERKINS, W.A,, LAFFREY, T.J., AND
(eds.): Satisfiability Problem: Theory and Applications, PECORA, D.: 'Checking an expert system knowledge
Vol. 35 of DIMA CS, Amer. Math. Soc. and ACM, 1997. base for consistency and completeness': Proc. Internat.
[32] GU, J., AND PuRI, R.: 'Asynchronous circuit synthesis Joint Conf. on Artificial Intelligence, 1985, pp. 375-
with Boolean satisfiability', IEEE Trans. Computer- 378.
Aided Design Integr. Circuits 14, no. 8 (1995), 961- [47] NOBILI, P., AND SASSANO, A.: 'Strengthening La-
973. grangian bounds for the MAX-SAT problem', Techn.
[33] HAMMER, P.L., HANSEN, P., AND SIMEONE, B.: 'Roof Report Inst. Informatik KSln Univ., Germany, no. 96-
duality, complementation and persistency in quadratic 230 (1996), Proc. Work Satisfiability Problem, Siena,
0-1 optimization', Math. Program. 28 (1984), 121-155. Italy (J. Franco and G. Gallo and H. Kleine Buening,
[34] HANSEN, P., AND JAUMARD, B.: 'Algorithms for Eds.).
the maximum satisfiability problem', Computing 44 [4s] RESENDE, M.G.C., AND FEO, T.A.: 'A GRASP
(1990), 279-303. for satisfiability', in M. TRICK AND D.S. JOHSON
[35] HOOKER, J.N.: 'Resolution vs. cutting plane solution (eds.): Proc. Second DIMACS Algorithm Implementa-
of inference problems: some computational experience', tion Challenge on Cliques, Coloring and Satisfiability,
Oper. Res. Left. 7, no. 1 (1988), 1-7. DIMACS 26, Amer. Math. Soc., 1996, pp. 499-520.
[36] H~.STAD, J.: ~Someoptima] inapproximability results': [49] RESENDE, M.G.C., PITSOULIS, L.S., AND PARDALOS,
Proc. 28th Annual A CM Symp. on Theory of Comput- P.M.: 'Approximate solution of weighted MAX-SAT
ing, El Paso, Texas, 1997, pp. 1-10. problems using GRASP', in D.-Z. Du, J. Gu, AND
[37] JAUMARD, B., STAN, M., AND DESROSIERS, J.: 'Tabu P.M. PARDALOS (eds.): Satis.fiability Problem: The-
search and a quadratic relaxation for the satisfiability ory and Applications, DIMACS 35, Amer. Math. Soc.,
problem', in M. TRICK AND D.S. JOHSON (eds.): Proc. 1997.
Second DIMA CS Algorithm Implementation Challenge [~01 SELMAN, B., AND KAUTZ, H.: 'Domain-independent
on Cliques, Coloring and Satisfiability, DIMACS 26, extensions to GSAT: Solving large structured satisfia-
1996, pp. 457-477. bility problems': Proc. Internat. Joint Conf. Artificial
[3s] JOHNSON, D.S.: 'Approximation algorithms for com- Intelligence, 1993, pp. 290-295.
binatorial problems', J. Comput. Syst. Sci. 9 (1974), [51] SELMAN, B., KAUTZ, H.A., AND COHEN, B.: 'Local
256-278. search strategies for satisfiability testing', in M. TRICK
[39] JOHNSON, D.S., AND TRICK, M. (eds.): Cliques, col- AND D.S. JOHSON (eds.): Proc. Second DIMACS Algo-
oring, and satisfiability: Second DIMA CS implementa- rithm Implementation Challenge on Cliques, Coloring
tion challenge, Vol. 26 of DIMA CS, Amer. Math. Soc., and Satisfiability, DIMACS 26, 1996, pp. 521-531.
1996. SELMAN, B., LEVESQUE, H., AND MITCHELL, D.: 'A
[401 KAMATH, A.P., KARMARKAR, N.K., RAMAKRISHNAN, new method for solving hard satisfiability problems':
K.G., AND RESENDE, M.G.: 'Computational exprience Proc. l Oth Nat. Conf. Artificial Intelligence (AAAI-
with an interior point algorithm on the satisfiability 92), July 1992, pp. 440-446.
problem', Ann. Oper. Res. 25 (1990), 43-58. [53] SPEARS, W.M.: 'Simulated annealing for hard satis-
[41] KARLOFF, H., AND ZWICK, U.: 'A 7/8-approximation fiability problems', in M. TRICK AND D.S. JOHNSON
algorithm for MAX 3SAT?': Proc. 38th Annual IEEE (eds.): Proc. Second DIMACS Algorithm Implementa-
Symp. Foundations of Computer Sci., IEEE Computer tion Challenge on Cliques, Coloring and Satis fiability,
Soc., 1997. no. 26 in DIMACS 26, 1996, pp. 533-555.
[42] KIRKPATRICK, S., AND SELMAN, B.: 'Critical behav- [54] TREVISAN, L.: 'Approximating satisfiable satisfiabil-
ior in the satisfiability of random Boolean expressions', ity problems': Proc. 5th Annual European Symp. Al-
Science 264 (1994), 1297-1301. gorithms, Graz, Springer, 1997, pp. 472-485.
[43] KIROUSlS, L.M., KRANAKIS, E., AND KRIZANC, D.: YANNAKAKIS, M.: 'On the approximation of maximum
'Approximating the unsatisfiability threshold of ran- satisfiability', J. Algorithms 17 (1994), 475-502.
dom formulas': Proc. Fourth Annual European Symp.
Algorithms, Springer, Sept. 1996, pp. 27-38.
[44] MINTON, S., JOHNSTON, M.D., PHILIPS, A.B., AND Roberto Battiti
Dip. Mat. Univ. Trento

271
Maximum satisfiability problem

Via Sommarive, 14, 38050 Povo (Trento), Italy Annealing refers to a process of cooling material
E-mail address: battiti%science, unitn, i t slowly until it reaches a stable state.
Metropolis also made several early contributions
MSC 2000: 03B05, 68Q25, 90C09, 90C27, 68P10, 68R05,
68T15, 68T20, 94C10 to the use of computers in the exploration of non-
Key words and phrases: maximum satisfiability, local linear dynamics. In the Sixties and Seventies he
search, approximation algorithms, history-sensitive heuris- collaborated with G.-C. Rota and others on sig-
tics. nificance arithmetic. Another contribution of Me-
tropolis to numerical analysis is an early paper on
the use of Chebyshev's iterative method for solving
large scale linear systems [1].
METROPOLIS~ NICHOLAS CONSTANTINE
Nicholas Constantine Metropolis was born in References
[1] BLAIR, A., METROPOLIS, N., NEUMANN, J. VON,
Chicago on June 11, 1915 and died on October 17,
TAUB, A.H., AND TSINGOU, M.: 'A study of a nu-
1999 in Los Alamos. At Los Alamos, Metropolis merical solution to a two-dimensional hydrodynamical
was the main driving force behind the development problem', Math. Tables and Other Aids to Computation
of the MANIAC series of electronic computers. He 13, no. 67 (July 1959), 145-184.
was the first to code a problem for the ENIAC in [2] HARLOT, F., AND METROPOLIS, N.: 'Computing and
computers: Weapons simulation leads to the computer
1945-1946 (together with S. Frankel), a task which
era', Los Alamos Sci. (1983), 132-141.
consumed approximately 1,000,000 IBM punched [3] KIRKPATRICK, S., GELATT, C.D., AND VECCHI JR.,
cards. M.P.: 'Optimization by simulated annealing', Science
Metropolis received his PhD in physics from the 220, no. 4598 (1983), 671-680.
University of Chicago in 1941. He went to Los [4] METROPOLIS, N.: 'The beginning of the Monte Carlo
method', Los Alamos Sci. 15 (1987).
Alamos in 1943 as a member of the initial staff of
[5] METROPOLIS, N.: The age of computing: A personal
fifty scientists of the Manhattan Project. He spent memoir, Daedalus, 1992.
his entire career at Los Alamos, except for two [6] METROPOLIS, N., HOWLETT, J., AND ROTA, G.-C.
periods (1946-1948 and 1957-1965), during which (eds.): A history of computing in the twentieth century,
he was professor of Physics at the University of Acad. Press, 1980.
Chicago. [7] METROPOLIS, N., AND NELSON, E.C.: 'Early comput-
ing at Los Alamos', Ann. Hist. Comput. 4, no. 4 (Oct.
Metropolis is best known for the development 1982), 348-357.
(joint with S. Ulam and J. von Neumann) of the [8] METROPOLIS, N., ROSENBLUTH, i . , TELLER, A., AND
Monte-Carlo method. The Monte-Carlo method TELLER, S.: 'Equation of state calculation by fast com-
provides approximate solutions to a variety of puting machines', J. Chem. Phys. 21 (1953).
mathematical problems by performing statistical Panos M. Pardalos
sampling experiments on a computer. However, Center for Applied Optim.
the real use of Monte-Carlo methods as a research Dept. Industrial and Systems Engin. Univ. Florida
Gainesville, FL 32611, USA
tool stems from work on the atomic bomb during
E-mail address: pardalosCufl, edu
the second world war. This work involved a direct
MSC2000: 90C05, 90C25
simulation of the probabilistic problems concerned
Key words and phrases: Metropolis, simulated annealing,
with random neutron diffusion in fissile material. Monte-Carlo method.
Metropolis and his collaborators, obtained Monte-
Carlo estimates for the eigenvalues of Schrodinger
equation. MINIMAX: DIRECTIONAL DIFFERENTIA-
In 1953, Metropolis co-authored the first paper BILITY
on the technique that came to be known as sim- Minimax is a principle of optimal choice (of some
ulated annealing [3], [8]. Simulated annealing is parameters or functions). If applied, this princi-
a method for solving optimization problems. The ple requires to find extremal values of some max-
name of the algorithm derives from an analogy be- type function. Since the operation of taking the
tween the simulation of the annealing of solids. pointwise maximum (of a finite or infinite number

272
Minimax: Directional differentiability

of functions) generates, in general, a n o n s m o o t h ~ ( x , g) is the gradient of ~ with respect to x for


function, it is i m p o r t a n t to study properties of a fixed y, (a, b) is the scalar product of vectors a
such a function. F o r t u n a t e l y enough, though a and b,
max-function is not differentiable, in m a n y cases
it is still directionally differentiable. The direc- o f (x) - co { :" (~, g) . y e R(x) } c R ~.
tional differentiability provides a tool for formulat- [:]
ing necessary (and sometimes sufficient) conditions
for a m i n i m u m or m a x i m u m and for constructing The set Of(x) is called the subdifferential of f at
numerical algorithms. x. It is convex and compact. The m a p p i n g Of is,
Recall t h a t a function f : R n --+ R is called in general, discontinuous.
Hadamard directionally differentiable (H.d.d.) at
REMARK 2 It turns out t h a t a convex function can
a point x E R n if for any g C R n there exists the
also be represented in the form (1) with ~ being
finite limit
affine in x. For this special (convex) case the set
f (x + ~g') - f (x) . Of(x) is
lira
[~,g']-~[+0,g] a
A function f : R n ~ t t is called Dini direction- Of(x)
ally differentiable (D.d.d.) at a point x E R n if for = {v ~ R ~ f ( z ) - f ( x ) >__ (~,z - ~), W e S } .
any g E R n there exists the finite limit
[:]
f (x + ~g) - f (x) .
lim
a$O c~ The discovery of the directional differentiability of
max-functions ([6], [1], [2]) and convex functions
If f is H.d.d., then it is D.d.d. as well and
[10] was a b r e a k t h r o u g h and led to the develop-
f~(x, g) =f~ (x, g).
ment of minimax theory and convex analysis ([10],
Let ~ C R n be a convex compact set, x E ~t.
[4], [9]).
The cone

N ~ ( ~ ) = {v e R ' : (~,~) = p~(~)} A Maximum Function with Dependent


is called normal to ~ at x. Here Constraints. Let x C R n, Y C R m be open sets
and let
yEgt '
f(~)- m~x :(~,g), (3)
is the support function of ~t at x.

where a(x) is a multivalued m a p p i n g with compact


A max-function. Let
images, ~" X × Y --~ R is H a d a m a r d differentiable
f(x) - m a x ~(x y) (1) as a function of two variables, i.e. there exists the
yEG '
limit
where ~: S × G --+ R is continuous jointly in x, y
on S × G and continuously differentiable in x there, ~ ( [ x , y], [g, v]) - lim
S C R n is an open set, G is a compact set of some
[~,9',¢]-~[+0,g,~]
space. Under the conditions stated, the function f 1 [~(~ + ~g,, y + ~v') - :(~, y)].
c~
is continuous on S.
PROPOSITION 1 The function f is H.d.d. at any T h e n ~ is continuous and ~ is continuous as a
point x E S and function of direction [g, v].
The function f is called a maximum function
:~I(~,g)- m~x ( : ~'(x, y ) , g ) - m a x (v,g), (2) with dependent constraints. Such functions are of
great i m p o r t a n c e and have widely been studied
where
(see [3], [8], [7], [5]). To illustrate the results let
R(x) = {y e V: f (x) = : ( ~ , y ) } , us formulate one of t h e m [5, T h m . 1,6.3].

273
Minimax: Directional differentiability

PROPOSITION 3 Let a mapping a be closed and V[~,y,z]eD~(x), v e R m.


bounded, its images be convex and compact, the
Assume also that G1 is convex. Let y E G1. Put
support function a(x,l) - maXvca(z)(v,l ) be uni-
formly differentiable with respect to parameter 1. ~(y) - {v - ~ ( y ' - y). ~ > 0, y' ~ a~ },
Let, further, x E X and a function q~ be concave in r(y) = cl ~(y).
some convex neighborhood of the set {[x, y]: y C
PROPOSITION 4 [3, Thm. 5.2] Under the above as-
R(x)} (where n ( x ) = {y e a(x): q~(x,y)= f(x)}).
sumptions the function f (see (5)) is Hadamard
Then f (see (3)) is H.d.d. and
directionally differentiable and
f ' ( x , g) = sup min [(/1, g) + a'(x,12;g)], ftt(x, g) --- sup sup min
u~n(~) [l~,12]~V(~,y)
~n(~) ~r(v) ~Q(~,~)
(4)
where ~y ,v + Ox 'g "
v(~, y) - {t - [t~, t:] e -~:(x, y). t~ e N~,,}, [-q

OqD(x, y) is the superdifferential of qD at the point REMARK 5 More sophisticated results on the di-
[x, y], and Nz,u is the cone normal to a(x) at y. [3 rectional differentiability of max- and maxmin
functions can be found, e.g., in [8]. [-1
Recall that if a function F : R s -+ R is concave,
Z C R s is open, z E Z, then the set
H i g h e r - O r d e r D i r e c t i o n a l D e r i v a t i v e s . The
OF(z)
results above are related to the first order direc-
_ lz, l- lzl lv, z,_zl, tional derivatives. Using these derivatives, it is
possible to construct the following first order ex-
is called the superdifferential of F at z C Z. It is pansion:
convex and compact. f(x + ag) -- f(x) + af'(x,g) + ox,g(a), (6)
where f ' is either f~ or f~.
A m a x m i n f u n c t i o n . Let ~ ( x , y , z ) : S × G1 ×
In some cases it is possible to get 'higher-order'
G2 -4 R be continuous jointly in all variables,
S C R n be an open set, G1 C R m, G2 C R p expansions.
be compact. Put Let

f (x) -- max min : ( x , y, z). (5) f ( x ) -- max f~(x), (7)


iEI
yEG1 zEG2
where I = 1 : N , x = ( x l , . . . , X n ) E R n, the f~s
The function f is continuous on S.
are continuous and continuously differentiable up
Let
t h e / t h order on an open set S C R r. Fix x C S.
(I)(x, y) -- min q~(x, y, z), Then for sufficiently small a > 0
zEG2
R(~) = {y e a~: ~(~, y) = 1(~)}, fi(x + ag) (8)
Q(~, y) = {z e a2: ~(~, y, z) = ~(~, y)).
I °~k f}k)(x, g) -+"oi(g, a l
: fi(x)+ E g )'
Fix x C S, let De(e > 0) be an e-neighborhood k-1
of the set {x} × R(x) × t_Juen(z)Q(x,y ). Assume where
that the derivatives n

(9~ 099 02~ 02qp (:92~


jl, " -1 - Oxjk gjl ' " " " ' gjk '
Ox' Oy' Ox2' OxOy ' Oy 2
(9)
exist and are continuous jointly in all variables on
D~(x) and that k E 1,...,1,

( 02 q°(-2' y' z) ) o~(g,. ~)


-+0
-O-y2 v, v <_0, ~l a$O

274
Minimax: Directional differentiability

uniformly with respect to g, Ilgll = 1.


Let us use the following notation : l 1 f(k)(x , A)] + o(llAllk),
iEI
k=l
f°(x, g) - fi(x), Let us use the notation (see (9))
Vi E I, Ro(x,g) = I,
f~k) (X, A) -- Aik Ak.
R k ( z , g) = {i e g):
The function f[k) (x, A) is a kth order :form of co-
(x,g)- max , , ordinates A 1 , . . . , An; Aik being the set of coef-
jERk-l(x,g)
ficients of this form. Then (12) can be rewritten
kE1,...,l. &S

Clearly f ( x + A) (13)
Ro(x, g) D Rl (x, g) D R2(x, g) D . . . . [
= max fi(x) + E
iEI
,1 ]
~. Aik/kk + O(llAIIk)
Note that R0(x, g) does not depend on x and g, k=l
and R1 (x, g) does not depend on g.
= f(x) + max Ak Ak k
PROPOSITION 6 [3, Thm. 9.1] The following ex-
pansion holds:
where
l C~k
f ( x + ag) -- f ( x ) + E -~. f(k)(x' g) + o(g, cJ), d ' f ( x ) = co {A (i) - ( A i o , . . . , A i l ) " i E I } ,
k=l
Aio - f i ( x ) - f(x), A - ( A o , . . . , Al),
(10)
Ao E R, A 1 E R ~ ,
Vg ~ R ~,
k times
where A2 E Rn×n, . . . , Ak E ~t n×''×2.
f(k)(x,g)-- max f[k)(x,g), (11) k times

Here, R nx''xr~ is the space of kth order real


o(g,
-+0 forms, e.g. R nxn is the space of real (n x n)-
o~l a.l.O
matrices.
uniformly with respect to g, [[g[[ - 1. [:3 The set dlf(x) is called the kth order hypodif-
The value Ok f (x)log k - f(k)(x, g) is called the kth ferential of f at x. It is an element of the space
l
derivative of f at x in a direction g.
R x R ~ x ... x RnX...x~. The mapping dlf is con-
REMARK 7 The mapping R1 (x, g) is not continu-
tinuous in x.
ous in x, while the mappings Rk(x, g) (k > 2) are
not continuous in x as well as in g. Therefore the REMARK 8 Expansion (13) can be extended to the
functions f(k)(x,g) in (11) are not continuous in x case where f is given by (1) and ~ is l times con-
and (if k > 2) in g and, as a result, expansion (6) tinuously differentiable in x. [::]
is also not 'stable' in x. Ill Max functions represent a special case of the class
To overcome this difficulty we shall employ an- of quasidifferentiable functions (see [5]).
other tool. See also" Bilevel l i n e a r p r o g r a m m i n g : C o m -
plexity, e q u i v a l e n c e t o m i n m a x , concave
H y p o d i f f e r e n t i a b i l i t y of a M a x F u n c t i o n . p r o g r a m s ; Bilevel o p t i m i z a t i o n : F e a s i b i l i t y
Let us again consider the case where f is de- t e s t a n d flexibility index; N o n d i f f e r e n t i a b l e
fined by (7). It follows from (8) that, for A = optimization: Minimax problems; Stochas-
e R , tic p r o g r a m m i n g : M i n i m a x a p p r o a c h ; Sto-
c h a s t i c q u a s i g r a d i e n t m e t h o d s in m i n i m a x
f ( x + A) (12) problems; Minimax theorems.

275
Minimax: Directional differentiability

References and first published in [27]. On the other hand,


[1] DANSKIN, J.M.: The theory of max-min and its appli- G.C. Stockman [29] introduced the SSS, algo-
cation to weapons allocation problems, Springer, 1967. rithm. Both methods try to minimize the number
[2] DEMYANOV, V.F.: 'On minimizing the maximal devi-
of nodes explored in the game tree using special
ation', Vestn. Leningrad. Univ. 7 (1966), 21-28.
[3] DEMYANOV, V.F.: Minimax: Directional differentiabil- traversal strategies and cut conditions.
ity, Leningrad Univ. Press, 1974. (In Russian.)
[4] DEMYANOV, V.F., AND MALOZEMOV, V.N.: Introduc-
M i n i m a x Trees. A two-player zero-sum per/ect-
tion to minimax, Wiley, 1974, Second edition: Dover,
1990. information game, also called minimax game, is a
[5] DEMYANOV, V.F., AND RUBINOV, A.M.: Constructive game which involves exactly two players who al-
nonsmooth analysis, P. Lang, 1995. ternatively make moves. No information is hidden
[6] GmSANOV, I.V.: 'Differentiability of solutions of the from the adversary. No coins are tossed, that is, the
mathematical programming problems': Abstracts Conf.
game is completely deterministic, and there is per-
Applications of Functional Analysis Methods to Solving
Nonlinear Problems, 1965, pp. 43-45. fect symmetry in the quality of the moves allowed.
[7] LEVITIN, E.S.: Perturbation theory in mathematical Go, checker and chess are such minimax games
programming and its applications, Wiley, 1994. whereas backgammon (the outcome of a die deter-
[8] MINCHENKO, L.I., AND BORISENKO, O.F.: Differential mines the moves available) or card games (cards
properties of marginal functions and their applications
are hidden from the adversary) are not.
to optimization problems, Nauka i Techn. (Minsk),
1992. (In Russian.) A minimax tree or game tree is a tree where
[9] PSCHENICHNY, B.N.: Convex analysis and extremal each node represents a state of the game and each
problems, Nauka, 1980. (In Russian.) edge a possible move. Nodes are alternatively la-
[10] ROCKAFELLAR, R.T.: Convex analysis, Prince- beled 'max' and 'rain' representing either player's
ton Univ. Press, 1970. turn. A node having no descendants represents a
Vladimir F. Demyanov final outcome of the game. The goal of a game is to
St. Petersburg State Univ. find a winning sequence of moves, given that the
St. Petersburg, Russia opponent always plays his best move.
E-mail address: vladimir, demyanov@pobox,spbu. ru
The quality of a node t in the minimax game
MSC2000: 90C30, 65K05 tree, representing a configuration, is given by its
Key words and phrases: minimax problem, max-function, value e(t). The value e(t), also called minimax
maxmin function, directional derivative, higher-order
value, is defined recursively as
derivatives, hypodifferentiability, support function.

f(t) if t is a leave node,

MINIMAX GAME TREE SEARCHING m a x e(s) if t is labeled 'max',


e(t)-- sEsons(t)
With the introduction of computers, also started rain e(s) ift is labeled 'min'.
the interest in having machines play games. Pro- sEsons(t)
gramming a computer such that it could play, for
If the considered minimax tree represents a com-
example chess, was seen as giving it some kind of
plete game, that is, all possible board configura-
intelligence. Starting in the mid fifties, a theory
tions, the function f may be defined as follows:
on how to play two player zero sum perfect in-
formation games, like chess or go, was developed.
÷1 if t leads to a winning position,
This theory is essentially based on traversing a tree
called minimax or game tree. An edge in the tree f (t) -- 0 ift leads to a tie position,
represents a move by either of the players and a -1 if t leads to a losing position;
node a configuration of the game.
Two major algorithms have emerged to com- otherwise f (t) represents an evaluation of the qual-
pute the best sequence of moves in such a mini- ity of a board position.
max tree. On one hand, there is the alpha-beta The relation between minimax trees and games
algorithm suggested around 1956 by I. McCarthy is detailed in the following table.

276
Minimax game tree searching

Minimax tree notion Minimax game notion of a node are passed to its sons and tightened dur-
Minimax tree All board configurations ing the execution of the algorithm. It is easy to see
Node in the tree Board configuration that if the lower b o u n d of a node t of type 'max'
Edge from 'max' to 'min' Move by player 'max' is larger t h a n its upper b o u n d then all not visited
node
sons of node t can be pruned, and similar for nodes
Edge from 'min' to 'max' Move by player 'min'
node of type 'min'.
Node value Quality of a board position FUNCTION AlphaBeta(n, a,/3) IS
Leave node Outcome of a game BEGIN
Solution path Sequence of moves leading IF is_leave(n) THEN RETURN f(n)
to the best outcome s +-- first_son(n)
IF node_type(n) = max THEN
LOOP
Sequential Minimax Game Tree A l g o - a +-- max{a, AlphaBeta(s, a, 13)}
r i t h m s . Let t be a node of a minimax tree. T h e n IF a >__/3THEN RETURN
the function first_son(t) returns the first son node EXIT LOOP WHEN no_more_sons(s, n)
sl of t and n e x t _ s o n ( s i , t) returns the i + l t h son of s +-- next_son(s, n)
node t. The function n o _ m o r e _ s o n s ( s , t) returns END LOOP
RETURN a
true of s is the last son of t. Otherwise it returns
ELSE
false. The ordering of the sons introduced by these LOOP
functions is arbitrary. In practice it is given by j3 +--max{a, AlphaBeta(s, a,~)}
some heuristic function. The function father(t) re- IF ~ < a THEN RETURN a
turns the father node of t, is_leave(t) whether or EXIT LOOP WHEN no_more_sons(s, n)
s +-- next_son(s, n)
not t is a leave node and n o d e _ t y p e ( t ) the type of
END LOOP
node t. RETURN ~3
END IF
Minimax Algorithm. The most basic minimax al-
END AlphaBeta
gorithm is called the minimax algorithm. It sys-
tematically traverses, in a depth first, left to right Pseudocode for the alpha-beta algorithm.
fashion, the complete minimax tree. All nodes are
It has been proved in [18] t h a t the alpha-beta al-
visited exactly once.
gorithm correctly calculates the minimax value of
Alpha-Beta Algorithm. The first nontrivial algo- a tree. The above pseudocode describes the alpha-
r i t h m introduced to compute the minimax value b e t a algorithm.
of a game tree was the alpha-beta algorithm. Ac- The minimax value of a tree T is computed as
cording to D. K n u t h and R. Moore, McCarthy's follows.
comments at the D a r t m o u t h summer research con-
ference on artificial intelligence led to the use e (root(T)) +-- A l p h a B e t a ( r o o t ( T ) , - c ~ , + ~ ) .
of alpha-beta pruning in game playing programs
since the late 1950s. The first published discussion Optimal State Space Search Algorithm SSS,. It has
of an algorithm for minimax tree pruning appeared been introduced by Stockman in 1979, [29]. It orig-
in 1958 (see [11, p. 56]). Two early extensive stud- inates not in game playing but in systematic pat-
ies of the algorithm may be found in [18] and [27]. tern recognition. The algorithm was first analyzed
The idea behind the alpha-beta algorithm is to and criticized in [26].
traverse the minimax tree in a depth first, left to The idea behind the SSS. algorithm is to use
right fashion. It tries to prune sub-trees that can a tree traversal strategy t h a t is, better t h a n the
not influence the minimax value of the tree. The depth first and left to right strategy found in the
conditions used to prune sub'trees are called cut alpha-beta algorithm. The criteria used to order
conditions. The idea behind the suggested cut con- the nodes yet to visit is an upper bound of their
ditions is to associate to each node a lower and an value. Nodes are stored in non increasing order of
upper bound, called a and ~ bounds. The bounds their upper b o u n d in a list called 'open'.

277
Minimax game tree searching

T h e SSS. algorithm first traverses the mini- (Apply the F operator to node s) --
m a x tree from top to b o t t o m . Nodes whose sons IF t -" live AND n o d e t y p e = max
have not yet been visited and which cannot yet be AND NOT is_leave(t) THEN
s +-- first _ son (t)
p r u n e d are m a r k e d 'live'. Nodes m a r k e d 'solved'
LOOP
have already been visited once and have therefore insert(s, live, m, open)
their best u p p e r b o u n d associated. EXIT LOOP WHEN no more_sons(s,t)
T h e operation purge(t, open) removes all nodes s +- next_son(s, t)
from the open list for which the node t is an an- END LOOP
END IF
cestor. Due to the fact t h a t the nodes in the open
IF t = live AND node_type = min
list are sorted in nonincreasing order of their as- AND NOT is_leave(t) THEN
sociated u p p e r bound, the p r u n i n g operation only insert(firstson(t), live, m, open)
eliminates nodes t h a t need no further considera- END IF
tion. IF t = live AND is_leave(t) THEN
insert(t, solved, min {f (t), m }, open)
The SSS, algorithm is described by the follow- END IF
ing pseudocode. IF t = solved AND node_type -- max
AND NOT no_more_sons(t, father(t)) THEN
insert(next _son(t, father(t)), live, m, open)
END IF
IF t -- solved AND node_type -- max
FUNCTION SSS • IS AND no_more_sons(t, father(t)) THEN
BEGIN
insert(father(t), solved, m, open)
open +-- q}
END IF
insert(root, live, +c~, open)
IF t = solved AND node_type - min THEN
LOOP
insert(father(t), solved, m, open)
(s, t, m) +-- remove(open)
purge(father(t), open)
IF s -- root AND t = solved THEN RETURN m
END IF
(Apply the F operator to node s)
END LOOP
END SSS • SCOUT: Minimax Algorithm of Theoretical Inter-
est. In the previous sections, we have described
Pseudocode for the SSS. algorithm. the most c o m m o n m i n i m a x algorithms. While try-
ing to show the optimality of the a l p h a - b e t a al-
gorithm, J. Pearl [23] introduced the S C O U T al-
gorithm. His idea was to show t h a t the S C O U T
T h e operator F(s) is applied to each node s ex- algorithm is d o m i n a t e d by the a l p h a - b e t a algo-
t r a c t e d from the 'open' list. r i t h m and to prove t h a t S C O U T achieves an op-
It is possible to define a dual version of the timal performance. But counterexamples showed
SSS,, which may be called SSS.-dual, in which t h a t the a l p h a - b e t a a l g o r i t h m does not dominate
the c o m p u t a t i o n of u p p e r b o u n d s is replaced by the S C O U T algorithm because the conservative
the c o m p u t a t i o n of lower bounds. T h e S S S , - d u a l testing approach of the S C O U T algorithm may
algorithm has been suggested in [21]. sometimes cut off nodes t h a t would have been ex-
S t o c k m a n has shown t h a t if the SSS, algorithm plored by the a l p h a - b e t a algorithm.
explores a node, t h e n this node is also explored by The S C O U T a l g o r i t h m itself recursively com-
the a l p h a - b e t a algorithm. In fact, the a l p h a - b e t a putes the value of the first of its sons. T h e n it tests
algorithm loses efficiency (in the n u m b e r of nodes to see if the value of the first son is b e t t e r t h a t the
visited) against the SSS. algorithm when the value value of the other sons. In case of a negative result,
of the m i n i m a x tree is found towards the right of the son t h a t failed the test is completely evaluated
the tree. If the SSS. algorithm is applied to win- by recursively calling S C O U T .
lose trees then it visits exactly the same nodes in A l t h o u g h the S C O U T algorithm is more of
the same order as would the a l p h a - b e t a algorithm. theoretical interest, there are some problem in-

278
Minimax game tree searching

stances where it outperforms all other minimax return that value. If the minimax value does not
algorithms. A last advantage of the SCOUT al- belong to the set ]a,b[, then the value returned
gorithm versus one of its major competitors, the will be either a or b, depending on whether the
SSS, algorithm, is that its storage requirements minimax value belongs to ] - c o , a] or [b, +cc[. We
are similar to those of the alpha-beta algorithm. then say that the alpha-beta algorithm ]ailed low,
respectively high. In the case where the algorithm
GSEARCH: Generalized Game Tree Search Algo-
failed low, the call
rithm. In 1986, T. Ibaraki [16] proposed a gen-
eralization of the previously known algorithms to e +-- AlphaBeta ( r o o t ( T ) , - c o , a + 1)
compute the minimax value of a game tree. His
idea was to use a branch and bound like approach. will return the correct value. But it would also
Nodes of the considered tree which have not yet be possible to reiterate this procedure on a subset
been evaluated are stored in a list which is or- a + 1[.
dered according to a given criteria. Different or- The technique of limiting the interval in which
derings give different traversal strategies. A lower
the solution may be found is called aspiration
and upper bound is associated to each node. These
search. If the minimax value belongs to the spec-
bounds generalize the a and 13 values found in the
ified interval, then a much larger number of cut
alpha-beta algorithm.
conditions are verified and the tree actually tra-
Finally Ibaraki showed how the algorithm versed is much smaller than the one traversed by
GSEARCH is related to other minimax algorithms the alpha-beta algorithm without initial alpha and
like alpha-beta or SSS,, and proved that his algo- beta bounds.
rithm always surpasses the alpha-beta algorithm. Furthermore it is interesting to note that aspi-
SSS-2: Recursive State Space Search Algorithm. ration search is at the bases of a technique called
The SSS-2 algorithm has been proposed by W. Pi- iterative deepening which is used in many game
jls and A. de Bruin [24]. It is based on the idea of playing programs.
computing an upper bound for the root node and I. Alth5fer [5] suggested an incremental nega-
then repeatedly transforming this upper bound max algorithm which uses estimates of all nodes
into a tighter one. They have shown that the SSS- in the minimax tree, rather than only those of the
2 algorithm exactly expands the same nodes as leave nodes, to determine the value of the root
those to which the SSS, algorithm applies the F node. This algorithm is useful when dealing with
operator. erroneous leave evaluation functions. Under the
assumption of independently occurring and suf-
Some Variations On The Subject. Computing the ficiently small errors, the proposed algorithm is
minimax value of a game tree may be seen as aspir- shown to have exponentially reduced error prob-
ing the solution value from a leave node through abilities with respect to the depth of the tree.
the whole tree up to the root node. While moving R.L. Rivest [25] proposed an algorithm for
closer to the root node, more and more useless sub-
searching minimax trees based on the idea of ap-
trees will be eliminated, as we have already stated
proximating the min and the max operators by
for the alpha-beta algorithm. The better the a and
generalized mean value operators. The approxima-
/3 bounds, the more subtrees may be pruned. If, for tion is used to guide the selection of the next leave
instance, one knows that the minimax value will,
node to expand, since the approximation allows to
with high probability, be found in the subset ]a, b[,
select efficiently that leave node upon whose value
then it may be worth calling the alpha-beta algo-
the minimax value most highly depends. B.W. Bal-
rithm as lard [6] proposed a similar algorithm where the
e <---AlphaBeta (root(T), a, b) value of some nodes (the chance node as he calls
them) is a, possibly weighted, average of the val-
If, indeed, the minimax value e(root(T)) belongs ues of its sons. In fact he considers one additional
to the set ]a, b[, then the algorithm will correctly type of nodes called chance nodes.

279
Minimax game tree searching

Conspiracy numbers have been introduced by bors. The probability to find the optimum in the
D.A. McAllester in [22] as a measurement of the subtree rooted at a given son then always decreases
accuracy of the minimax value of an incomplete when traversing the sons in a left to right order.
tree. They measure the number of leave nodes Such ordering information is generally available in
whose value must change in order to change the game-playing programs, the ordering function be-
minimax value of the root node by a given amount. ing a heuristic function based on the knowledge of
the game to be played.

Parallel Minimax Tree Algorithms. Paral- A Mandatory Work First Algorithm. R. Hewett
lelizing the minimax algorithm is trivial over uni- and G. Krishnamurthy [15] proposed an algorithm
form trees. Even on irregular trees, the paralleliza- that achieves an efficiency of roughly 50% for an
tion remains easy. The only additional problem number of processors in the range of 2 to 25. All
arises from the fact that the size of the subtrees the nodes that still need to be explored are main-
to explore may now vary. Different processors will tained in a list called 'open' list. This list is ordered
be attributed problems of varying computational with respect to how the nodes have been reached.
volume. All what is needed then to achieve excel- More precisely, the algorithm maintains two lists
lent speedups, is a load-balancing scheme, that is, called 'open' and 'closed', and a tree called 'cut'.
a mechanism by means of which processors may, The 'open' list contains all the nodes yet to be
during run-time, exchange problems so as to keep explored, the 'closed' list contains the expanded
all processors busy all the time. nodes not yet pruned and the 'cut' tree contains
The parallelization of the alpha-beta and the the pruned nodes. The 'open' list initially contains
SSS. algorithms are much more interesting than only the root node. All processors fetch nodes from
the more theoretical minimax algorithm. There ex- the 'open' list and process them if they cannot be
ist basically two approaches or techniques to par- discarded, that is, they do not have any of their
allelize the alpha-beta algorithm. In the first ap- ancestors in the 'cut' tree. Leave nodes are eval-
proach, which has been one of the first techniques uated and their result is returned to the parent
used, all processors explore the entire tree but us- which may update its value and check for possi-
ing different search-intervals. This approach is at ble pruning by traversing the 'cut' tree up to the
the basic of the algorithm called parallel aspiration root node applying the usual alpha and beta cut-
search by G. Baudet [7]. The second one consists offs. If the node selected is not a leave node, it is
in exploring simultaneously different parts of the expanded and its sons are inserted into the 'open'
minimax tree. list and itself into the 'closed' list.
S.G. Akl et al. [1], [2] proposed an algorithm
A Simple Way to Parallelize the Exploration o]
that uses the same approach for exploring the
Minimax Trees. Exploring a minimax tree in par-
minimax tree. Their priority function is computed
allel can very simply be obtained by generating
as
the sons of the root node, and their sons and so
on up to the point where one has as many son p(ni) - p(father(ni)) - (bn, + 1 - i) . 10 (h-/-l>,
nodes waiting to be explored as there are proces-
sors. At this point, each processor explores the sub- where ni is the ith son of node father(hi), bn~ the
tree rooted at one of these nodes, using any given branching of node father(hi), h the search depth
sequential minimax algorithm. When all proces- (the maximal depth of the minimax tree) and ]
sors have completed their exploration, the solution the depth of node father(ni) in the minimax tree.
for the entire tree is computed by using the partial K. Almquist et al. [3] also developed an algo-
results obtained from each of the processors. rithm based on the idea of having two categories
In practice the sons of a node may be ordered in of unexplored nodes which are ordered according
such a way that any son has a probability of yield- to a given priority function. Furthermore they add
ing the locally optimal path that is no smaller than to this concept parallel aspiration search as well as
the corresponding probabilities for its right neigh- a novel scheduling algorithm.

280
Minimax game tree searching

In the same direction, V.-D. Cung and C. Rou- sons si of n to its Pb slaves. As soon as one slave
cairol [9] have proposed a shared memory parallel returns the next unexplored son sj is spawned to
minimax algorithm which distinguishes between that slave or the current value is returned to the
critical and non critical nodes. In their algorithm father processor if the cut condition is satisfied. If
one processor is assigned to each node. all the sons of a node have been spawned to its
slaves, the father processor waits for the results of
In the algorithm by I.R. Steinberg and M.
all its slaves. Leave processors simply compute the
Solomon [28], which is also a mandatory work first
value of their associated node using the sequential
type algorithm, the list containing the speculative
work or non critical nodes is dynamically ordered. alpha-beta algorithm.
An important advantage of the tree-splitting al-
Aspiration Search. The parallel algorithm called gorithm over other more elaborated algorithms is
aspiration search has been introduced by Baudet that it may be simply implemented as well on
in 1978 [7]. In this algorithm the search interval a shared memory parallel machine as on a dis-
] - c ~ , +co[ used by the sequential alpha-beta algo- tributed memories parallel machine.
rithm is divided into a certain number of subinter- The tree-splitting algorithm has been imple-
vals that cover the entire range ] - c ~ , +c~[. Now, mented and its execution has been simulated. On
every processor explores the entire minimax tree a 27 processor simulated machine, in which each
using one subinterval, different processors being processor has tree slave sons associated, the aver-
assigned different intervals. Any processor search- age speedup was 5.31 for trees of depth eight and
ing an interval ]ai, ai+l] may either fail low or high. a branching of three.
The principle is the same as in the sequential ver-
sion of the algorithm. Exactly one processor will P VSPLIT: Principal Variation Splitting Algo-
neither fail low, nor fail high. The value computed rithm. It has been proposed by T.A. Marsland and
by this processor is the value of the minimax tree M.S. Campbell [19] and is by far the most often
to explore. implemented algorithm, especially in chess playing
programs. The algorithm is based on the structure
The implementation of the aspiration search al-
of the sequential alpha-beta algorithm. The idea
gorithm is really simple. Furthermore, there is no
is to first explore in a sequential fashion a path
information exchange needed between processors.
from the root node to its leftmost leave. This path
If the nodes in the to explore minimax tree are or-
is called the principal variation path. The traver-
dered in such a way that the alpha-beta algorithm
sal is done to obtain alpha and beta bounds. If
has to explore the whole tree, then the speedup
the minimax tree to explore is of type best first,
obtained by using the aspiration search algorithm
then the explored principal variation path repre-
is maximal. But, when the aspiration search algo-
sents the solution path. In a second phase, for each
rithm is applied to randomly generated trees then
level of the minimax tree all the yet to be visited
Baudet has shown that the speedup is limited to
sons are explored in parallel by using the bounds
about six and is independent of the number of pro-
computed during the principal variation path com-
cessors used.
putation and the traversal of the lower levels of the
Tree-Splitting Algorithm. Among the early parallel minimax tree.
minimax algorithms is the tree-splitting algorithm The P V S P L I T algorithm is completely de-
by R.A. Finkel and J.P. Fishburn [14]. This algo- scribed by the following pseudocode using the
rithm is based on the idea to look at the available negamax notation.
processors as a tree of processors. Each processor, The P V S P L I T algorithm has been implemented
except for the ones representing leaves in the pro- in [20] on a network of Sun workstations. An ac-
cessor tree, have a fixed number Pb of s o n or slave celeration of 3.06 has been measured on 4 proces-
processors. During the execution of the algorithm sors when traversing minimax trees representing
a non leave processor associated with a node n in real chess games. The main problem of the PVS-
the minimax tree spawns the exploration of the PLIT algorithm is that, during the second phase,

281
Minimax game tree searching

the subtrees explored in parallel are not necessar- distributed among all the processors. This opera-
ily of the same size. tion concludes the synchronization phase.
The PVSPLIT algorithm is most efficient when The computation phase of the SDSSS algorithm
the iterative deepening technique is used, because may be described by the following pseudocode.
with each iteration is is increasingly likely that the
(Computation phase) -
first move tried, that is, the one on the principal
W H I L E (there exists a node in the open list
variation path, is the best one. having an upper bound of m*> L O O P
F U N C T I O N PVSplit(b, a, 13) IS (s, t,m* ) +- remove(open)
BEGIN IF s = root A N D t = solved T H E N
IF is_leave(n) T H E N R E T U R N f ( n ) BROADCAST 'the solution has been found'
s +-- first_son(n) RETURN m*
a +-- - P V S p l i t ( s , - f ~ , - a ) E N D IF
IF a >_ ~ T H E N R E T U R N a <Apply the F operator to node s>
F O R s' E s o n s ( n ) - {s} L O O P IN PARALLEL END LOOP
(wait until a slave node is idle)
vi +-- - T r e e S p l i t ( s ' , - ~ , - a ) Pseudocode for the computation phase of the SDSSS
IFvi>aTHEN algorithm.
ol +--- vi
(Update the bounds according to a Experiments executing the SDSSS algorithm on
on all slaves> an Intel iPSC/2 parallel machine have been con-
END IF ducted. Speedups of up to 11.4 have been mea-
IF c~ > / 3 T H E N
sured for 32 processors.
(Terminate all slave processors>
R E T U R N c, Distributed Game Tree Search Algorithm. R. Feld-
END IF
man [12] parallelized the alpha-beta algorithm for
END L O O P
R E T U R N c~ massively parallel distributed memory machines.
END PVSplit Different subtrees are searched in parallel by dif-
ferent processors. The allocation of processors to
Pseudocode for the P V S P L I T algorithm. trees is done by imposing certain conditions on the
nodes which are be selectable. They introduce the
Synchronized Distributed State Space Search. A concept of younger brother waits. This concept es-
completely different approach to parallelizing the sentially says that in the case of a subtree rooted
SSS, algorithm has been taken by C.G. Diderich at s l, where Sl is the first son node of a node n, is
and M. Gengler [10]. The algorithm proposed is not yet evaluated, then the other sons s 2 , . . . , Sb of
called synchronized distributed state space search node n are not selectable. Younger brothers may
(SDSSS). It is an alternation of computation and only be considered after their elder brothers, which
synchronization phases. The algorithm has been has as a consequence that the value of the elder
designed for a distributed memory multiproces- brothers may be used to give a tight search win-
sor machine. Each processor manages its own local dow to the younger brothers.
'open' list of unvisited nodes. This concept is nevertheless not sufficient to
The synchronization phase may be subdivided achieve the same good search window as the alpha-
in three major parts. First, the processors ex- beta algorithm achieves. Indeed when node Sl is
change information about which nodes can be re- computed, then the younger brothers may all be
moved from the local 'open' lists. This corresponds explored in parallel using the value of node Sl.
to each processor sending the nodes for which the Thus the node s2 has the same search window as it
'purge' operation may be applied by all the other would have in the sequential alpha-beta algorithm,
processors. Next, all the processors agree on the but this is not true anymore for si, where i >_ 3.
globally lowest upper bound m* for which nodes Indeed if nodes s2 and s3 are processed in paral-
exist in some of the 'open' lists. Finally all the lel, they only know the value of node Sl, while in
nodes having the same upper bound m* are evenly the sequential alpha-beta algorithm, the node s3

282
Minimax game tree searching

would have known the value of b o t h Sl and s2. [2] AKL, S.G., BARNARD, D.T., AND DORAN, R.J.: 'De-
This fact forces the parallel algorithm to provide sign, analysis, and implementation of a parallel tree
an information dissemination protocol. search algorithm', IEEE Trans. Pattern Anal. Machine
Intell. PAMI-4, no. 2 (1982), 192-203.
In case the nodes s2 and 83 are evaluated on [3] ALMQUIST, K., MCKENZIE, N., AND SLOAN, K.: 'An
processors P and p , , and processor P finishes its inquiry into parallel algorithms for searching game
work before P ' , producing a better value t h a n node trees', Techn. Report Univ. Washington, Seattle, WA
81 did, then processor P will inform processor P ' 12, no. 3 (1988).
of this value, allowing it to continue with better [4] ALTHOFER, I.: 'On the complexity of searching game
trees and other recursion trees', J. Algorithms 9 (1988),
information on the rest of its subtree or to termi-
538-567.
nate its work if the new value allows P ' to con- [5] ALTH(~FER, I." 'An incremental negamax algorithm',
clude that its c o m p u t a t i o n becomes useless. The Artif. Intell. 43 (1990), 57-65.
load distribution is realized by means of a dynamic [6] BALLARD, B.W.: 'The *-minimax search procedure for
load balancing scheme, where idle processors ask trees containing chance nodes', Artif. Intell. 21 (1983),
327-350.
other processors for work.
[7] BAUDET, G.M.: 'The design and analysis of algo-
Speedups as high as 100 have been obtained on rithms for asynchronous multiprocessors', PhD Thesis
a 256 processor machines. In [13], a speedup of Carnegie-Mellon Univ. Pittsburgh, PA, no. CMU-CS-
344 on a 1024 t r a n s p u t e r network interconnected 78-116 (1978).
as a grid and a speedup of 142 on a 256 processor Is] BOHM, M., AND SPECKENMEYER, E.: 'A dynamic pro-
cessor tree for solving game trees in parallel': Proc.
t r a n s p u t e r de Bruijn interconnected network have
SOR '89, 1989.
been shown.
[0] CUNG, V.-D., AND ROUCAIROL, C.: 'Parallel minimax
tree searching', Res. Report INRIA 1549 (1991). (In
Parallel M i n i m a x Algorithm with Linear Speedup.
French.)
In 1988, Alth5fer [4] proved t h a t it is possible, [10] DIDERICH, C.G.: 'Evaluation des performances de
to develop a parallel minimax algorithm which l'algorithme SSS* avec phases de synchronisation sur
achieves linear speedup in the average case. W i t h une machine parall~le ~. m~moires distributes', Techn.
the assumption that all minimax trees are binary Report Computer Sci. Dept. Swiss Federal Inst. Techn.
win-loss trees, he exhibited such a parallel mini- Lausanne, Switzerland, no. LiTH-99 (July 1992). (In
French.)
max algorithm.
[11] FEIGENBAUM, E.A., AND FELDMAN, J.: Computers
M. B5hm and E. Speckenmeyer [8] also sug- and thought, McGraw-Hill, 1963.
gested an algorithm which uses the same basic [12] FELDMANN, a., MONIEN, B., MYSLIWIETZ, P., AND
ideas as Alth5ffer. Their algorithm is more gen- VORNBERGER, O.: 'Distributed game tree search',
ICCA J. 12, no. 2 (1989), 65-73.
eral in the sense t h a t it needs only to know the
[13] FELDMANN, R., MYSLIWIETZ, P., AND MONIEN, B.:
distribution of the leave values and is independent
'Game tree search on a massively parallel system',
of the branching of the tree explored. in H.J. VAN DEN HERIK, I.S. HERSCHB~.RG, AND
In 1989, R.M. Karp and Y. Zhang [17] proved J.W.H.M. UITERWIJK (eds.): Advances in Computer
that it is possible to obtain linear speedup on ev- Chess, Vol. 7, Univ. Limburg, 1994, pp. 203-218.
ery instance of a r a n d o m uniform minimax tree if
[14] FINKEL, R.A., AND FISHBURN, J.P.: 'Parallelism in
alpha-beta search', Artif. Intell. 19 (1982), 89-106.
the number of processors is close to the height of [15] HEWETT, a., AND KRISHNAMURTHY, G.: 'Consistent
the tree. linear speedup in parallel alpha-beta search': Proc.
See also" S h o r t e s t p a t h t r e e a l g o r i t h m s ; ICCI'92, Computing and Information, IEEE Computer
Directed tree networks; Bottleneck Steiner Soc. Press, 1992, pp. 237-240.
tree problems. [10] IBARAKI, T.: 'Generalization of alpha-beta and SSS*
search procedures', Artif. Intell. 29 (1986), 73-117.
[17] KARP, R.M., AND ZHANG, Y.: 'On parallel evaluation
of game trees': A CM Annual Syrup. Parallel Algorithms
References and Architectures (SPAA'89), ACM, 1989, pp. 409-
[1] AKL, S.G., BARNARD, D.T., AND DORAN, R.J.: 420.
'Searching game trees in parallel': Proc. 3rd Biennial [is] KNUTH, D.E., AND MOORE, R.W.: 'An analysis of
Conf. Canad. Soc. Computation Studies of Intelligence, alpha-beta pruning', Artif. Intell. 6, no. 4 (1975), 293-
1979, pp. 224-231.

283
Minimax game tree searching

326. inf sup f (x, y) - sup inf ] (x, y).


[19] MARSLAND, T.A., AND CAMPBELL, M.S.: 'Parallel yEY xEX xEX yEY
search of strongly ordered game trees', A CM Comput-
The purpose of this article is to give the reader
ing Surveys 14, no. 4 (1982), 533-551.
the flavor of the different kind of minimax theo-
[2o] MARSLAND, T.A., AND POPOWICH, F.: 'Parallel game-
tree search', IEEE Trans. Pattern Anal. Machine In- rems, a n d of the techniques that have been used
tell. PAMI-7, no. 4 (July 1985), 442-452. to prove them. This is a very large area, and it
[21] MARSLAND, T.A., REINEFELD, A., AND SCHAEFFER, would be impossible to touch on all the work that
J.: 'Low overhead alternatives to SSS*', Artif. Intell. has been done in it in the space that we have at
al (1987), 185-199.
our disposal. The choice t h a t we have made is to
[22] MCALLESTER, D.A.: 'Conspiracy numbers for min-
max searching', Artif. Intell. 35 (1988), 287-310. give the historical roots of the subject, and then go
[23] PEARL, J.: 'Asymptotical properties of minimax trees directly to the most recent results. The reader who
and game searching procedures', Artif. Intell. 14, no. 2 is interested in a more complete narrative can refer
(1980), 113-138. to the 1974 survey article [35] by E.B. Yanovskaya,
[24] PIJLS, W., A N D DE BRUIN, A.: 'Another view of the
the 1981 survey article [8] by A. Irle and the 1995
SSS* algorithm': Proc. Internat. Symp. (SIGAL'90),
Aug. 1990.
survey article [31] by S. Simons.
[25] RIVEST, R.L.: 'Game tree searching by rain/max ap-
proximation', Artif. Intell. 34, no. 1 (1987), 77-96. V o n N e u m a n n ' s R e s u l t s . In his investigation of
[26] ROIZEN, I., A N D PEARL, J.: 'A minimax algorithm games of strategy, J. von N e u m a n n realized that,
better than alpha-beta? Yes and no', Artif. Intell. 21
even though a two-person zero-sum game did not
(1983), 199-230.
[27] SLAGLE, J.H., AND DIXON, J.K.: 'Experiments with
necessarily have a solution in pure strategies, it
some programs that search game trees', J. A CM 16, did have to have one in mixed strategies. Here is a
no. 2 (Apr. 1969), 189-207. statement of that seminal result ([19], translated
[2s] STEINBERC, I.R., AND SOLOMON, M.: 'Searching game into English in [21]):
trees in parallel': Proc. IEEE Internat. Conf. Parallel
Processing, Vol. III, 1990, pp. I I I - 9 - III-17. THEOREM 1 (1928) Let A be an m × n matrix,
[29] STOCKMAN, G.C.: 'A minimax algorithm better than and X and Y be the sets of nonnegative row and
alpha-beta?', Artif. Intell. 12, no. 2 (1979), 179-196. column vectors with unit sum. T h e n
Claude G. Diderich
min max x A y - max min xAy.
Computer Sci. Dept. yEY xEX xEX yEY
Swiss Federal Inst. Technology-Lausanne
[3
CH-1015 Lausanne, Switzerland
E-mail address: diderich@acm, org Despite the fact that the statement of this result
Marc Gentler is quite elementary, the proof was quite sophisti-
Ecole Sup. d'Ing6nieurs de Luminy cated, and depended on an extremely ingenious
Univ. M6diterrann6e
induction argument. Nine years later, in [20], von
F-13288 Marseille, France
E-mail address: Marc. Gengler©esil. univ-mrs, f r
N e u m a n n showed t h a t the bilinear character of
Theorem 1 was not needed when he extended it
MSC2000: 49J35, 49K35, 62C20, 91A05, 91A40
as follows, using Brouwer's fixed point theorem:
Key words and phrases: algorithms, games, minimax,
searching. THEOREM 2 (1937) Let X and Y be nonempty
compact, convex subsets of Euclidean spaces, and
f" X × Y --+ R be jointly continuous. Suppose that
MINIMAX THEOREMS ] is quasiconcave on X and quasiconvex on Y (see
We suppose that X and Y are nonempty sets and below). Then
f : X x Y --+ R. A minimax theorem is a theorem
min max f - max min f.
that asserts that, under certain conditions, Y X X Y
[3
inf sup f -- sup i~f f,
Y X X W h e n we say t h a t f is quasiconcave on X , we
that is to say, mean that

284
Minimax theorems

• for all y E Y and A E R, GT(A, y) is convex, (or the related Knaster-Kuratowski-Mazurkiewicz


and when we say that f is quasiconvex on Y, we lemma (KKM lemma) on closed subsets of a finite-
mean that dimensional simplex).

• for all x E X and A E R, LE(x, A) is convex.


Functional-Analytic Minimax Theorems.
Here, G T ( A , y ) a n d L E ( x , A ) a r e 'level sets' asso- The first person to take minimax theorems out
ciated with the function f. Specifically, of the context of convex subsets of vector spaces,
GT(A,y) "- {x E X" f ( x , y ) > A} and their proofs (other t h a n that of the matrix
case discussed in Theorem 1) out of the context
and
of fixed point theorems was Fan in 1953 ([2]). We
LE(x, A) "- {y E Y" f (x, y) < A}. present here a generalization of Fan's result due
In 1941, S. K a k u t a n i [10] analyzed von Neu- to H. Khnig ([15]). Khnig's proof depended on the
mann's proof and, as a result, discovered the fixed Mazur-Orlicz version of the H a h n - B a n a c h theo-
point theorem that bears his name. rein (see T h e o r e m 5 below).
THEOREM 4 (1968) Let X be a nonempty set and
Infinite-Dimensional R e s u l t s for C o n v e x Y be a nonempty compact topological space. Let
Sets. The first infinite-dimensional minimax the- f : X × Y -+ R be lower semicontinuous on Y.
orem was proved in 1952 by K. Fan ([1]), who gen- Suppose that:
eralized Theorem 2 to the case when X and Y are
• for all Xl,X2 E X, there exists x3 E X such
compact, convex subsets of infinite-dimensional lo- that
cally convex spaces, and the quasiconcave and qua-
siconvex conditions are somewhat relaxed. The re- f (x3, ") > o n Y;
- 2
sult in this general line that has the simplest state-
• for all yl, Y2 E Y, t h e r e e x i s t s Y3 E Y s u c h
ment is that of M. Sion, who proved the following
that
([33]):
THEOREM 3 (1958) Let X be a convex subset /(y3)
•, </(''
-
+2 f(' y2) on X.
of a linear topological space, Y be a compact Then
convex subset of a linear topological space, and
min sup f - sup m~n f.
f : X × Y ~ R be upper semicontinuous on X Y X X
and lower semicontinuous on Y. Suppose that f is D
quasiconcave on X and quasiconvex on Y. T h e n
We give here the statement of the Mazur-Orlicz
min sup f - sup m~n f. version of the Hahn-Banach theorem, since it is a
Y X X
very useful result and it not as well-known as it
deserves to be.
W h e n we say t h a t f is 'upper semicontinuous on
THEOREM 5 (Mazur-Orlicz theorem) Let S be a
X ' and 'lower semicontinuous on Y' we mean that,
sublinear functional on a real vector space E, and
for all y E Y, the map x ~ f(x, y) is upper semi-
C be a n o n e m p t y convex subset of E. Then there
continuous and, for all x E X, the map y ~ f(x, y)
exists a linear functional L on E such that L < S
is lower semicontinuous. The importance of Sion's
on E and infc L = infc S. [:5
weakening of continuity to semicontinuity was that
it indicated that many kinds of minimax problems See [16], [22] and [23] for applications of the
have equivalent formulations in terms of subsets Mazur-Orlicz theorem and the related 'sandwich
of X and Y, and led to Fan's 1972 work ([4]) on theorem' to measure theory, Hardy algebra theory
sets with convex sections and minimax inequali- and the theory of flows in infinite networks.
ties, which has since found many applications in The kind of m i n i m a x theorem discussed in this
economic theory. Like Theorem 2, all these result section (where X is not topologized) has turned
relied ultimately on Brouwer's fixed point theorem out to be extremely useful in functional analysis,

285
Minimax theorems

in particular in convex analysis and also in the THEOREM 7 (1972) Let X be a nonempty set and
theory of monotone operators on a Banach space. Y be a nonempty compact topological space. Let
(See [32] for more details of these kinds of appli- f : X x Y -+ R be lower semicontinuous on Y.
cations.) Suppose that,
• for all Xl,X2 C X there exists x3 C X such
Minimax Theorems that Depend on Con-
that
n e c t e d n e s s . It was believed for some time that
.) )
proofs of minimax theorems required either the f (X3, ") > on Y.
- 2
fixed point machinery of algebraic topology, or the
functional-analytic machinery of convexity. How- Suppose also that, for all /k C R and, for all
ever, in 1959, W.-T. Wu proved the first mini- nonempty finite subsets W of X,
max theorem in which the conditions of convex- LE(W, i~) is connected in Y.
ity were totally replaced by conditions related to
connectedness. This line of research was continued Then
by H. Tuy, L.L. Stach6, M.A. Geraghty with B.-L. min sup f - sup n~n f.
Lin, and J. Kindler with R. Trost, whose results Y X X

were all subsumed by a family of general topo- K]


logical minimax theorem established by K6nig in
[17]. Here is a typical result from [17]. In order to
A M e t a m i n i m a x T h e o r e m . It was believed for
simplify the statements of this and some of our
some time that Brouwer's fixed point theorem
later results, we shall write ], := supx infy f. / ,
or the Knaster-Kuratowski-Mazurkiewicz lemma
is the 'lower value' of f. If A C R, V C Y and
was required to order to prove Sion's theorem,
W C X, we write GT(~, V) "- ~y~y GT()~, y) and
Theorem 3. However, in 1966, M.A. Ghouila-Houri
LE(W, i~) "- ~x~w LE(x, A).
([7]) proved Theorem 3 using a simple combinato-
THEOREM 6 (1992) Let X be a connected topo- rial property of convex sets in finite-dimensional
logical space, Y be a compact topological space, space. This was probably the first indication of
and f : X × Y ~ R be upper semicontinuous on the breakdown of the classification of minimax
X and lower semicontinuous on Y. Let A be a theorems as either of 'topological' or 'functional-
nonempty subset of (f,, ce) such that inf A = f, analytic' type. Further indication of this break-
and suppose that, for all A E A, for all nonempty down was provided by Terkelsen's result, Theorem
subsets V of Y, and for all nonempty finite subsets 7, and the subsequent 1982 results of I. Jo6 and
W of X, Stach6 ([9]), the 1985 and 1986 results of Geraghty
GT(A, V) is connected inX, and Lin ([5] and [6]), and the 1989 results of H.
Komiya ([18]).
and
Kindler ([11]) was the first to realize (in 1990)
LE(W, )~) is connected in Y. that some abstract concept akin to connected-
ness might be involved in minimax theorems, even
Then when the topological condition of connectedness
min sup f -- sup m~n f. was not explicitly assumed. This idea was pursued
Y X X by Simons with the introduction in 1992 of the
K] concept of pseudoconnectedness, which we will now
describe. We say that sets H0 and H1 are joined
by a set H if
Mixed Minimax Theorems. In [34], F.
Terkelsen proved the first mixed minimax theorem. HCHoUH1, HMHo#O
We describe Terkelsen's result as 'mixed' since one
and
of the conditions in it is taken from Theorem 4, and
the other from Theorem 6: HMH1 ~:0.

286
Minimax theorems

We say that a family 7t of sets is pseudoconnected More recent work by Kindler ([12], [13] and [14])
if on abstract intersection theorems has been at the
interface between minimax theory and abstract set
H0, HI, H E 7t and H0 and HI joined by H
theory.

HoNH1 ~ 0. Minimax Theorems and Weak Compact-


Any family of closed connected subsets of a topo- ness. There are close connections between mini-
logical space is pseudoconnected. So also is any max theorems and weak compactness. The follow-
family of open connected subsets. However, pseu- ing 'converse minimax theorem' was proved by Si-
doconnectedness can be defined in the absence of mons in [25]; this result also shows that there are
any topological structure and, as we shall see in limitations on the extent to which one can totally
Theorem 8, is closely related to minimax theorems. remove the assumption of compactness from mini-
Theorem 8 is the improvement of the result of [29] max theorems.
due to Kbnig (see [30]). We shall say that a subset THEOREM 9 (1971) Suppose that X is a
W of X is good if nonempty bounded, convex, complete subset of
• W is finite; and a locally convex space E with dual space E*, and
• for all x E X, LE(x, f . ) n LE(W, f.) ¢ 0. inf sup <x, y> - sup inf <x, y>
yEY xEX xEX yEY
THEOREM 8 (1995) Let Y be a topological space,
and A be a nonempty subset of R such that whenever Y is a nonempty convex, equicontinuous,
infA - f.. Suppose that, for all A E A and for subset of E*. Then X is weakly compact. [2]
all good subsets W of X,
No compactness is assumed in the following, much
• for all x E X, LE(x, A) is closed and com- harder, result (see [26]):
pact; { L E ( x , A ) n LE(W,A)}xEx is pseudo-
THEOREM 10 (1972) If X is a nonempty bounded,
connected; and
convex subset of a locally convex space E such
• for all xo, x] E X, there exists x E X such that every element of the dual space E* attains its
that LE(xo, A) and LE(xI,A) are joined by supremum on X, and Y is any nonempty convex
LE(x, A) N LE(W, A). equicontinuous subset of E*, then
Then
inf sup < x , y ) - sup inf <x y>.
min sup f - sup m~n f. yEY xEX xEX yEY
Y X X,
[2]
O
If one now combines the results of Theorems 9 and
Theorem 8 is proved by induction on the cardi-
10, one can obtain a proof of the 'sup theorem' of
nality of the good subsets of W. Given the obvi-
R.C. James, one of the most beautiful results in
ous topological motivation behind the concept of
functional analysis:
pseudoconnectedness, it is hardly surprising that
Theorem 8 implies Theorem 6. W h a t is more un- THEOREM 11 (James sup theorem) If C is a
expected is that Theorem 8 implies Theorems 4 nonempty bounded closed convex subset of E,
and 7 also. We prefer to describe Theorem 8 as a then C is w(E, E*)-compact if and only if, for all
metaminimax theorem rather than a minimax the- x* E E*, there exists x E C such that (x,x*> =
orem, since it is frequently harder to prove that m a x c x*. [2]
the conditions of Theorem 8 are satisfied in any James's theorem is not easy ~ the standard proof
particular case that it is to prove Theorem 8 itself. can be found in the paper [24] by J.D. Pryce.
So Theorem 8 is really a device for obtaining mini-
See [31] for more details of the connections be-
max theorems rather than a minimax theorem in
tween minimax theorems and weak compactness.
its own right.

287
Minimax theorems

Minimax Inequalities for Two or More tures'. This question is discussed in [27] and
F u n c t i o n s . Motivated by Nash equilibrium and [28]. The relationship between Theorem 12 and
the theory of noncooperative games, Fan general- Brouwer's fixed point theorem is quite interest-
ized Theorem 2 to the case of more than one func- ing. As we have already pointed out, Sion's the-
tion. In particular, he proved in [3] the following orem, Theorem 3, can be proved in an elementary
two-function minimax inequality (since the com- fashion without recourse to fixed point related con-
pactness of X is not needed, this result can in fact cepts. On the other hand, Theorem 12 can, in fact,
be strengthened to include Sion's theorem, Theo- be used to prove Tychonoff's fixed point theorem,
rem 3, by taking g = f): which is itself a generalization of Brouwer's fixed
THEOREM 12 (1964) Let X and Y be nonempty point theorem. (See [3] for more details of this.)
compact, convex subsets of topological vector A number of authors have proved minimax in-
spaces and f,g: X × Y --+ R. Suppose that f is equalities for more than two functions. See [31] for
lower semicontinuous on Y and quasiconcave on more details of these results.
X, g is upper semicontinuous on X and quasicon-
vex on Y, and Coincidence Theorems. A coincidence theorem
is a theorem that asserts that if S : X --+ 2Y and
f<g o n X x Y.
T: Y --+ 2 x have nonempty values and satisfy cer-
Then tain other conditions, then there exist x0 E X and
min sup f < sup i~f g. Y0 E Y such that y0 E Sxo and x0 E Tyo. The con-
Y X -- X nection with minimax theorems is as follows: Sup-
D pose that infy supx f ~ supz infy f. Then there
exists A E R such that
Fan (unpublished)and Simons (see [27]) general-
ized K6nig's theorem, Theorem 4, with the follow- sup inf f < A < inf sup f.
X Y Y X
ing two-function minimax inequality:
Hence,
THEOREM 13 (1981) Let X be a nonempty set, Y
• for all x E X there exists y E Y such that
be a compact topological space and f, g: X × Y -~
f(x, y) < )~; and
R. Suppose that f is lower semicontinuous on Y,
and • for all y E Y there exists x E X such that

• for all Yl,y2 E Y there exists y3 E Y such


f(x, y)>
that Define S: X --+ 2 Y and T: Y --+ 2 x by

f(',Y3) < f(',Yl)+ f(',Y2) Sx := {y E Y: f ( x , y ) < A} ~ O


- 2 onX;
and
• for all xl,x2 E X there exists x3 E X such
that := {x E X: f(z, V) > # 0.
g(x3, .) -> ) +2 g(x , .) o n Y;
If S and T were to satisfy a coincidence theorem,
then we would have x0 E X and Y0 E Y such that
and
f(xo, Yo) < ~ and f(xo, Yo) > )~,
• f<_gonX×Y.
which is clearly impossible. Thus this coincidence
Then
theorem would imply that
min sup f < sup incfg.
Y X -- X inf sup f - sup inf f.
Y X X Y
D
The coincidence theorems known in algebraic
Theorems 12 and 13 both unify the theory of mini- topology consequently give rise to corresponding
max theorems and the theory of variational in- minimax theorems. There is a very extensive lit-
equalities. The curious feature about these two re- erature about coincidence theorems. See [31] for
sults is that they have 'opposite geometric pic- more details about this.

288
Minimax theorems

See also: Stochastic quasigradient meth- [is] KOMIYA, H.: 'On minimax theorems', Bull. Inst. Math.
ods in minimax problems; Stochastic pro- Acad. Sinica 17 (1989), 171-178.
gramming: Minimax approach; Minimax: [19] NEUMANN, J. VON: 'Zur Theorie der Gesellschaft-
spiele', Math. Ann. 100 (1928), 295-320.
Directional differentiability; Bilevel linear [20] NEUMANN, J. VON: 'Ueber ein 5konomisches Gle-
programming: Complexity, equivalence to ichungssystem und eine Verallgemeinerung des Brouw-
minmax, concave programs; Bilevel optimi- erschen Fixpunktsatzes', Ergebn. Math. Kolloq. Wien
zation: Feasibility test and flexibility in- 8 (1937), 73-83.
dex; Nondifferentiable optimization: Mini- [21] NEUMANN, J. VON: 'On the theory of games of strat-
egy', in A.W. TUCKER AND R.D. LUCE (eds.): Contri-
max problems. butions to the Theory of Games, Vol. 4, Princeton Univ.
Press, 1959, pp. 13-42.
References [22] NEUMANN, M.: 'Some unexpected applicatons of the
[1] FAN, K.: 'Fixed-point and minimax theorems in locally sandwich theorem': Proc. Conf. Optimization and Con-
convex topological linear spaces', Proc. Nat. Acad. Sci. vex Analysis, Univ. Mississippi, 1989.
USA 38 (1952), 121-126. [23] NEUMANN, M.: 'Generalized convexity and the Mazur-
[2] FAN, K.: 'Minimax theorems', Proc. Nat. Acad. Sci. Orlicz theorem': Proc. Orlicz Memorial Conf., Univ.
USA 39 (1953), 42-47. Mississippi, 1991.
[3] FAN, K.: 'Sur un th~or~me minimax', C.R. Acad. Sci. [24] PRYCE, J.D.: 'Weak compactness in locally convex
Paris 259 (1964), 3925-3928. spaces', Proc. Amer. Math. Soc. 17 (1966), 148-155.
[4] FAN, K.: 'A minimax inequality and its applications', [25] SIMONS, S.: 'Crit~res de faible compacit6 en termes du
in O. SHISHA (ed.): Inequalities, Vol. III, Acad. Press, th6or~me de minimax', Sdm. Choquet, no. 23 (1970/1),
1972, pp. 103-113. 8.
[5] GERAGHTY, M.A., AND LIN, B.-L.: 'Minimax theo- [26] SIMONS, S.: 'Maximinimax: minimax, and antiminimax
rems without linear structure', Linear Multilinear Al- theorems and a result of R.C. James', Pacific J. Math.
gebra 17 (1985), 171-180. 40 (1972), 709-718.
[6] GERAGHTY, M.A., AND LIN, B.-L.: 'Minimax theo- [27] SIMONS, S.: 'Minimax and variational inequalities:
rems without convexity', Contemp. Math. 52 (1986), Are they or fixed point or Hahn-Banach type?', in
102-108. 0. MOESCHLIN AND m. PALLASCHKE (eds.): Game
[7] GHOUILA-HOURI, M.A.: 'Le th~or~me minimax de Theory and Mathematical Economics, North-Holland,
Sion': Theory of games, English Univ. Press, 1966, 1981, pp. 379-388.
pp. 123-129. [2s] SIMONS, S.: 'Two-function minimax theorems and vari-
[8] IRLE, A.: 'Minimax theorems in convex situations', ational inequalities for functions on compact and non-
in O. MOESCHLIN AND D. PALLASCHKE (eds.): Game compact sets with some comments on fixed-points the-
Theory and Mathematical Economics, North-Holland, orems', Proc. Syrup. Pure Math. 45 (1986), 377-392.
1981, pp. 321-331. [29] SIMONS, S.: 'A flexible minimax theorem', Acta Math.
[9] JOO, I., AND STACHO, L.L." 'A note on Ky Fan's mini- Hungarica 63 (1994), 119-132.
max theorem', Acta Math. Acad. Sci. Hung. 39 (1982), [30] SIMONS, S.: 'Addendum to: A flexible minimax theo-
401-407. rem', Acta Math. Hungarica 69 (1995), 359-360.
[10] KAKUTANI, S.: 'A generalization of Brouwer's fixed- [31] SIMONS, S.: 'Minimax theorems and their proofs',
point theorem', Duke Math. J. 8 (1941), 457-459. in DING-ZHU DU AND PANOS M. PARDALOS (eds.):
[11] KINDLER, J.: 'On a minimax theorem of Terkelsen's', Minimax and Applications, Kluwer Acad. Publ., 1995,
Arch. Math. 55 (1990), 573-583. pp. 1-23.
[12] KINDLER, J.: 'Intersection theorems and minimax the- [32] SIMONS, S.: Minimax and monotonicity, Vol. 1693 of
orems based on connectedness', J. Math. Anal. Appl. Lecture Notes Math., Springer, 1998.
178 (1993), 529-546. [33] SION, M.: 'On general minimax theorems', Pacific J.
[13] KINDLER, J.: 'Intersecting sets in midset spaces. I', Math. 8 (1958), 171-176.
Arch. Math. 62 (1994), 49-57. [34] TERKELSEN, F.: 'Some minimax theorems', Math.
[14] KINDLER, J.: 'Intersecting sets in midset spaces. II', Scand. 31 (1972), 405-413.
Arch. Math. 62 (1994), 168-176. [35] YANOVSKAYA, E.B.: 'Infinite zero-sum two-person
[15] K(SNIG, H." 'Uber daN Von Neumannsche Minimax- games', J. Soviet Math. 2 (1974), 520-541.
Theorem', Arch. Math. 19 (1968), 482-487.
[16] KONIG, H.: 'On certain applications of the Hahn-
Banach and minimax theorems', Arch. Math. 21 Stephen Simons
(1970), 583-591. Dept. Math. Univ. California
[17] K~SNIG,H.: 'A general minimax theorem based on con- Santa Barbara, California 93106-3080, USA
nectedness', Arch. Math. 59 (1992), 55-64. E-mail address: simons@math, ucsb. edu

289
Minimax theorems

MSC 2000: 46A22, 49J35, 49J40, 54D05, 54H25, 55M20, Arc flow capacities can be removed by adding ad-
91A05 ditional source nodes, one for each capacitated arc
Key words and phrases: minimax theorem, fixed point the-
[19], [23].
orem, Hahn-Banach theorem, connectedness.
The fixed charge transportation problem
(FCTP) is a type of M C T P in which the cost
MINIMUM CONCAVE TRANSPORTATION function ¢ij (xij) for each arc (i, j) E A is of the
PROBLEMS, M C T P form
The m i n i m u m concave transportation problem _ ~0 ifxij -- O, (5)
¢ij(xij)
(MCTP) concerns the least cost method of carry- [ f ij -+-gij " xij if xij > 0,
ing flow on a bipartite network in which the mar-
where fij and gij are coefficients with fij >_ O.
ginal cost for an arc is a nonincreasing function of
F C T P s are commonly used to model network flow
the flow on that arc. A bipartite network contains
problems involving setup costs [9]. Furthermore,
source nodes and sink nodes, but no transshipment
a variety of combinatorial problems can be con-
(i.e., intermediate) nodes. The M C T P can be for-
verted to FCTPs. For instance, consider the 0-1
mulated as
knapsack problem (KP). The KP is formulated as
rain E ¢ij(xij) (1) n

(i,j)EA max E Ck" Yk (6)


k=l
subject to:
subject to"
E xij - si' Vi E M, (2) n

jEN ak " Yk <_ b, (7)


E xij - dj' Vj e N, (3) k=l
iEM Yk e {O, 1}, fork-l,...,n, (8)
xij >_ O, V ( i , j ) c A, (4) with ak >_ 0 and Ck >_ 0 for k - 1 , . . . , n . The
where M is the set of source nodes; N is the set KP can be converted to a F C T P with two source
of sink nodes; si is the supply at source node i, nodes and n + 1 sink nodes. Define an+l - b
dj is the demand at sink node j; A = {(i,j): i C and cn+l - O. Then, the network is specified
M, j E N} is the (directed) arc set; xij is the flow as M - {1,2}, N - { 1 , . . . , n + 1}, sl - b,
n
carried on arc (i,j); and ¢ij(Xij) is the concave s2 - ~ k = l ak, and dj - aj for j = 1 , . . . , n + 1;
cost function for arc (i,j). Objective function (1) and the cost function is of the form of (5) where,
minimizes total costs; constraints (2) balance flow for each arc (i, j) E A, the coefficients fij and gij
at the source nodes; and constraints (3) balance are given by
flow at the sink nodes. If EiEM 8i is less (greater) n

t h a n ~-~jEgdj, then a dummy source (sink) node ECk if j -- 1 , . . . , n ,


fij- k-1 (9)
can be added to set M (N).
0 ifj - - n + 1,
M C T P s arise naturally in distribution prob-
lems involving shipments sent directly from supply _cj ifi--1,
points to demand points in which the transporta- gij -- aj (10)
0 i f / - - 2.
tion costs exhibit economies of scale [21]. However,
the M C T P is not limited to this class of problems. For j - 1 , . . . , n sink node j has two incoming arcs,
Specifically, any network flow problem with arc exactly one of which will have nonzero flow in the
cost functions that are not concave can be con- optimal solution to the FCTP. If x~j > 0 in the
verted to a network flow problem on an expanded FCTP, then y~ - 1 in the KP. If x~j > 0 in the
network whose arc cost functions are all concave FCTP, then y~ - 0 in the KP.
[16]. Then, the expanded network can be converted One consequence of this result is that any inte-
to a bipartite network by replacing each transship- ger programming problem with integer coefficients
ment node with a source node and a sink node. can (in principle) be formulated and solved as a

290
Minimum concave transportation problems

FCTP by first converting the integer program to [7] FLOUDAS, C.A., AND PARDALOS, P.M.: A collection
a KP [10]. of test problems for constrained global optimization Al-
gorithms, Vol. 455 of Lecture Notes Computer Sci.,
Exact solution methods for the MCTP are pre-
Springer, 1990.
dominately branch and bound enumeration pro-
Is] GRAY, P.: 'Exact solution of the fixed-charge trans-
cedures [2], [3], [4], [6], [8], [11], [12], [15]. Binary portation problem', (]per. Res. 19 (1971), 1529-1538.
partitioning is used for the FCTP; and interval [9] GUISEWITE, G.M., AND PARDALOS, P.M.: 'Mini-
partitioning is used for the MCTP with arbitrary mum concave-cost network flow problems: Applica-
tions, complexity, and algorithms', Ann. Oper. Res. 25
concave arc cost functions. Finite convergence of
(1990), 75-100.
the method was shown by R.M. Soland [22]. The
[10] KENDALL, K.E., AND ZOINTS, S.: 'Solving integer pro-
convex envelope of the cost function ¢ij(Xij) is gramming problems by aggregating constraints', Oper.
an affine function. Hence, a subproblem in the Res. 25 (1977), 346-351.
branch and bound procedure can be solved effi- [11] KENNINGTON, J.: 'The fixed-charge transportation
ciently as a linear transportation problem (LTP) problem: A computational study with a branch-and-
bound code', AIIE Trans. 8 (1976), 241-247.
[1]. Fathoming techniques (such as 'up and down
[12] KENNINGTON, J., AND UNGER, V.E.: 'A new branch-
penalties' and 'capacity improvement') based on and-bound algorithm for the fixed charge transporta-
post-optimality analysis of the LTP facilitate the tion problem', Managem. Sci. 22 (1976), 1116-1126.
branch and bound procedure for the MCTP [2], [13] KHANG, D.B., AND FUJIWARA, O.: 'Approximate so-
[3], [18], [20]. The LTP is also used in approximate lutions of capacitated fixed-charge minimum cost net-
solution methods for the MCTP which rely on suc- work flow problems', Networks 21 (1991), 689-704.
[14] KIM, D., AND PARDALOS, P.M.: 'A solution approach
cessive linearizations of the concave cost function,
to the fixed charge network flow problem using a dy-
[5], [13], [14] namic slope scaling procedure', Oper. Res. Lett. 24
Test problems for the MCTP are given in [7], (1999), 195-203.
[8], [,2], [17], [20] [15] LAMAR, B.W.: 'An improved branch and bound algo-
rithm for minimum concave cost network flow prob-
See also: Concave programming; Bilevel
lems', J. Global Optim. 3 (1993), 261-287.
linear programming: Complexity, equiv- [16] LAMAR, B.W.: 'A method for solving network flow
alence to minmax, concave programs; problems with general nonlinear arc costs', in D.-Z.
Motzkin transposition theorem; Multi- Du AND P.M. PARDALOS (eds.): Network Optimization
index transportation problems; Stochastic Problems: Algorithms, Applications, and Complexity,
transportation and location problems. World Sci., 1993, pp. 147-167.
[17] LAMAR, B.W., AND WALLACE, C.A.: 'A comparison
of conditional penalties for the fixed charge transporta-
tion problem', Techn. Report Dept. Management Univ.
References Canterbury (1996).
[1] BALINSKI, M.L.: 'Fixed-cost transportation problems', [18] LAMAR, B.W., AND WALLACE, C.A.: 'Revised-
Naval Res. Logist. 8 (1961), 41-54. modified penalties for fixed charge transportation prob-
[2] BARR, R.S., GLOVER, F., AND KLINGMAN,D.: 'A new lems', Managem. Sci. 43 (1997), 1431-1436.
optimization method for large scale fixed charge trans- [19] LAWLER, E.L.: Combinatorial optimization: Networks
portation problems', Oper. Res. 29 (1981), 448-463. and matroids, Holt, Rinehart and Winston, 1976.
[3] BELL, G.B., AND LAMAR, S.W.: 'Solution methods [20] PALEKAR, U.S., KARWAN, M.H., AND ZIONTS, S.: 'A
for nonconvex network problems', in P.M. PARDALOS, branch-and-bound method for the fixed charge trans-
D.W. HEARS, AND W.W. HAGER (eds.): Network Op- portation problem', Managem. Sci. 36 (1990), 1092-
timization, Vol. 450 of Lecture Notes Economics and 1105.
Math. Systems, Springer, 1997, pp. 32-50. [21] RECH, P., AND BARTON, L.G.: 'A non-convex trans-
[4] CABOT, A.V., AND ERENGUC, S.S.: 'Some branch- portation algorithm', in E.M.L. BEALE (ed.): Applica-
and-bound procedures for fixed-cost transportation tions of Mathematical Programming Techniques, Eng-
problems', Naval Res. Logist. 31 (1984), 145-154. lish Univ. Press, 1970.
[5] DIABY, M.: 'Successive linear approximation procedure [22] SOLAND, R.M.: 'Optimal facility location with concave
for generalized fixed-charge transportation problems', costs', Oper. Res. 22 (1974), 373-382.
J. Oper. Res. Soc. 42 (1991), 991-1001. [23] WAGNER, H.M.: 'On a class of capacitated transporta-
Bruce W. Lamar
Economic and Decision Analysis Center, The MITRE Corp., Bedford, MA 01730, USA
E-mail address: bwlamar@mitre.org
MSC2000: 90C26, 90C35, 90B06, 90B10
Key words and phrases: flows in networks, global optimization, nonconvex programming, fixed charge transportation problem.

MINIMUM COST FLOW PROBLEM

The minimum cost flow problem seeks a least cost shipment of a commodity through a network to satisfy demands at certain nodes by available supplies at other nodes. This problem has many, varied applications: the distribution of a product from manufacturing plants to warehouses, or from warehouses to retailers; the flow of raw material and intermediate goods through various machining stations in a production line; the routing of automobiles through an urban street network; and the routing of calls through the telephone system. The minimum cost flow problem also has many less direct applications. In this article, we briefly introduce the theory, algorithms and applications of the minimum cost flow problem. [1] contains much additional material on this topic.

Let G = (N, A) be a directed network defined by a set N of n nodes and a set A of m directed arcs. Each arc (i, j) ∈ A has an associated cost cij that denotes the cost per unit flow on that arc. We assume that the flow cost varies linearly with the amount of flow. Each arc (i, j) ∈ A has an associated capacity uij denoting the maximum amount that can flow on this arc, and a lower bound lij that denotes the minimum amount that must flow on the arc. We assume that the capacity and flow lower bound for each arc (i, j) are integers. We associate with each node i ∈ N an integer b(i) representing its supply/demand. If b(i) > 0, node i is a supply node; if b(i) < 0, then node i is a demand node with a demand of -b(i); and if b(i) = 0, then node i is a transshipment node. We assume that \sum_{i \in N} b(i) = 0. The decision variables xij are arc flows defined for each arc (i, j) ∈ A.

The minimum cost flow problem is an optimization model formulated as follows:

    Minimize  \sum_{(i,j) \in A} c_{ij} x_{ij}                                    (1)

subject to

    \sum_{\{j:(i,j) \in A\}} x_{ij} - \sum_{\{j:(j,i) \in A\}} x_{ji} = b(i),  for all i \in N,   (2)

    l_{ij} \le x_{ij} \le u_{ij},  for all (i, j) \in A.                           (3)

We refer to the constraints (2) as the mass balance constraints. For a fixed node i, the first term in the constraint (2) represents the total outflow of node i and the second term represents the total inflow of node i. The mass balance constraints state that outflow minus inflow must equal the supply/demand of each node. The flow must also satisfy the lower bound and capacity constraints (3), which we refer to as flow bound constraints.
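The following short Python sketch (not part of the original article; the instance data and names are illustrative) stores a tiny network as dictionaries and checks a candidate flow against the mass balance constraints (2) and flow bound constraints (3), evaluating the objective (1).

    # nodes: i -> b(i); arcs: (i, j) -> (cost, lower bound, capacity)
    nodes = {1: 4, 2: 0, 3: -4}
    arcs = {(1, 2): (2, 0, 3), (1, 3): (2, 0, 3), (2, 3): (1, 0, 3)}

    def is_feasible(flow):
        # flow bound constraints (3)
        if any(not (arcs[a][1] <= flow[a] <= arcs[a][2]) for a in arcs):
            return False
        # mass balance constraints (2): outflow minus inflow equals b(i)
        for i, b in nodes.items():
            outflow = sum(flow[a] for a in arcs if a[0] == i)
            inflow = sum(flow[a] for a in arcs if a[1] == i)
            if outflow - inflow != b:
                return False
        return True

    def cost(flow):
        # objective (1)
        return sum(arcs[a][0] * flow[a] for a in arcs)

    x = {(1, 2): 1, (1, 3): 3, (2, 3): 1}
    print(is_feasible(x), cost(x))      # True, 9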

This article is organized as follows. To help in understanding the applicability of the minimum cost flow problem, we begin in Section 2 by describing several applications. In Section 3, we present preliminary material needed in the subsequent sections. We next discuss algorithms for the minimum cost flow problem, describing the cycle-canceling algorithm in Section 4 and the successive shortest path algorithm in Section 5. The cycle-canceling algorithm identifies negative cost cycles in the network and augments flows along them. The successive shortest path algorithm augments flow along shortest cost augmenting paths from the supply nodes to the demand nodes. In Section 6, we describe the network simplex algorithm.

Applications. Minimum cost flow problems arise in almost all industries, including agriculture, communications, defense, education, energy, health care, manufacturing, medicine, retailing, and transportation. Indeed, minimum cost flow problems are pervasive in practice. In this section, by considering a few selected applications that arise in distribution systems planning, capacity planning, and vehicle routing, we give a passing glimpse of these applications.

Distribution Problems. A large class of network flow problems center around distribution applications. One core model is often described in terms of shipments from plants to warehouses (or, alternatively, from warehouses to retailers). Suppose a firm has p plants with known supplies and q warehouses with known demands. It wishes to identify a flow that satisfies the demands at the warehouses from the available supplies at the plants and that minimizes its shipping costs. This problem is a well-known special case of the minimum cost flow problem, known as the transportation problem. We next describe in more detail a slight generalization of this model that also incorporates manufacturing costs at the plants.

A car manufacturer has several manufacturing plants and produces several car models at each plant that it then ships to geographically dispersed retail centers throughout the country. Each retail center requests a specific number of cars of each model. The firm must determine the production plan of each model at each plant and a shipping pattern that satisfies the demand of each retail center while minimizing the overall cost of production and transportation.

We describe this formulation through an example. Fig. 1 illustrates a situation with two manufacturing plants, two retailers, and three car models. This model has four types of nodes:
i) plant nodes, representing various plants;
ii) plant/model nodes, corresponding to each model made at a plant;
iii) retailer/model nodes, corresponding to the models required by each retailer; and
iv) retailer nodes corresponding to each retailer.
The network contains three types of arcs:
i) production arcs;
ii) transportation arcs; and
iii) demand arcs.

The production arcs connect a plant node to a plant/model node; the cost of this arc is the cost of producing the model at that plant. We might place lower and upper bounds on production arcs to control for the minimum and maximum production of each particular car model at the plants. Transportation arcs connect plant/model nodes to retailer/model nodes; the cost of any such arc is the total cost of shipping one car from the manufacturing plant to the retail center. The transportation arcs might have lower or upper bounds imposed upon their flows to model contractual agreements with shippers or capacities imposed upon any distribution channel. Finally, demand arcs connect retailer/model nodes to the retailer nodes. These arcs have zero costs and positive lower bounds that equal the demand of that model at that retail center.

Fig. 1: Formulating the production-distribution problem.

The production and shipping schedules for the automobile company correspond in a one-to-one fashion with the feasible flows in this network model. Consequently, a minimum cost flow provides an optimal production and shipping schedule.

Airplane Hopping Problem. A small commuter airline uses a plane, with a capacity to carry at most p passengers, on a 'hopping flight' as shown in Fig. 2a). The hopping flight visits the cities 1, ..., n, in a fixed sequence. The plane can pick up passengers at any node and drop them off at any other node. Let bij denote the number of passengers available at node i who want to go to node j, and let fij denote the fare per passenger from node i to node j. The airline would like to determine the number of passengers that the plane should carry between the various origins and destinations in order to maximize the total fare per trip while never exceeding the plane's capacity.

Fig. 2b) shows a minimum cost flow formulation of this hopping plane flight problem. The network contains data for only those arcs with nonzero costs and with finite capacities: any arc listed without an associated cost has a zero cost; any arc listed without an associated capacity has an infinite capacity. Consider, for example, node 1. Three types of passengers are available at node 1: those whose destination is node 2, node 3 or node 4. We represent these three types of passengers in a new derived network by the nodes 1-2, 1-3 and 1-4 with supplies b12, b13 and b14. A passenger available at any such node, say 1-3, could board the plane at its origin node, represented by flowing through the arc (1-3, 1) and incurring a cost of -f13 units (or profit of f13 units). Or, the passenger might never board the plane, which we represent by the flow through the arc (1-3, 3). It is easy to establish a one-to-one correspondence between feasible flows in Fig. 2b) and feasible loadings of the plane with passengers. Consequently, a minimum cost flow in Fig. 2b) will prescribe a most profitable loading of the plane.

Fig. 2: Formulation of the hopping plane flight problem as a minimum cost flow problem.

Directed Chinese Postman Problem. The directed Chinese postman problem is a generic routing problem that can be stated as follows. In a directed network G = (N, A) in which each arc (i, j) has an associated cost cij, we wish to identify a walk of minimum cost that starts at some node (the post office), visits each arc of the network at least once, and returns to the starting point (see the next Section for the definition of a walk). This problem has become known as the Chinese postman problem because a Chinese mathematician, K. Mei-Ko, first discussed it. The Chinese postman problem arises in other settings as well; for instance, patrolling streets by police, routing street sweepers and household refuse collection vehicles, fuel oil delivery to households, and spraying roads with sand during snowstorms. The directed Chinese postman problem assumes that all arcs are directed, that is, the postal carrier can traverse an arc in only one direction (like one-way streets).

In the directed Chinese postman problem, we are interested in a closed (directed) walk that traverses each arc of the network at least once. The network might not contain any such walk. It is easy to show that a network contains a desired walk if and only if the network is strongly connected, that is, every node in the network is reachable from every other node via a directed path. Simple graph search algorithms are able to determine whether the network is strongly connected, and we shall therefore assume that the network is strongly connected.
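As a small illustration of the graph search test mentioned above (a sketch, not from the article; the helper names are ours), a directed graph is strongly connected exactly when every node is reachable from one node s both in the graph and in its reverse:

    def reachable(adj, s):
        # depth-first search from s over an adjacency-list dictionary
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    def strongly_connected(nodes, arcs):
        adj, radj = {}, {}
        for (u, v) in arcs:
            adj.setdefault(u, []).append(v)
            radj.setdefault(v, []).append(u)
        s = next(iter(nodes))
        return reachable(adj, s) >= set(nodes) and reachable(radj, s) >= set(nodes)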

In an optimal walk, a postal carrier might traverse arcs more than once. The minimum length walk minimizes the sum of lengths of the repeated arcs. Let xij denote the number of times the postal carrier traverses arc (i, j) in a walk. Any carrier walk must satisfy the following conditions:

    \sum_{\{j:(i,j) \in A\}} x_{ij} - \sum_{\{j:(j,i) \in A\}} x_{ji} = 0,  for all i \in N,   (4)

    x_{ij} \ge 1,  for all (i, j) \in A.                                     (5)

The constraints (4) state that the carrier enters a node the same number of times that he or she leaves it. The constraints (5) state that the carrier must visit each arc at least once. Any solution x satisfying the system (4)-(5) defines a carrier's walk. We can construct a walk in the following manner. Given a flow xij, we replace each arc (i, j) with xij copies of the arc, each arc carrying a unit flow. In the resulting network, say G' = (N, A'), each node has the same number of outgoing arcs as it has incoming arcs. It is possible to decompose this network into at most m/2 arc-disjoint directed cycles (by walking along an arc (i, j) from some node i with xij > 0, leaving a node each time we enter it until we repeat a node). We can connect these cycles together to form a closed walk of the carrier.

The preceding discussion shows that the solution x defined by a feasible walk for the carrier satisfies conditions (4)-(5), and, conversely, every feasible solution of system (4)-(5) defines a walk of the postman. The length of a walk defined by the solution x equals \sum_{(i,j) \in A} c_{ij} x_{ij}. This problem is an instance of the minimum cost flow problem.

Preliminaries. In this Section, we discuss some preliminary material required in the following sections.

Assumptions. We consider the minimum cost flow problem subject to the following six assumptions:
1) lij = 0 for each (i, j) ∈ A;
2) all data (cost, supply/demand, and capacity) are integral;
3) all arc costs are nonnegative;
4) for any pair of nodes i and j, the network does not contain both the arcs (i, j) and (j, i);
5) the minimum cost flow problem has a feasible solution; and
6) the network contains a directed path of sufficiently large capacity between every pair of nodes.
It is possible to show that none of these assumptions, except 2), restricts the generality of our development. We impose them just to simplify our discussion.

Graph Notation. We use standard graph notation. A directed graph G = (N, A) consists of a set N of nodes and a set A of arcs. A directed arc (i, j) has two endpoints, i and j. An arc (i, j) is incident to nodes i and j. The arc (i, j) is an outgoing arc of node i and an incoming arc of node j. A walk in a directed graph G = (N, A) is a sequence of nodes and arcs i1, a1, i2, a2, ..., ir satisfying the property that for all 1 ≤ k ≤ r-1, either ak = (ik, ik+1) ∈ A or ak = (ik+1, ik) ∈ A. We sometimes refer to a walk as a sequence of arcs (or nodes) without any explicit mention of the nodes (or arcs). A directed walk is an oriented version of the walk in the sense that for any two consecutive nodes ik and ik+1 on the walk, ak = (ik, ik+1) ∈ A. A path is a walk without any repetition of nodes, and a directed path is a directed walk without any repetition of nodes. A cycle is a path i1, i2, ..., ir together with the arc (ir, i1) or (i1, ir). A directed cycle is a directed path i1, i2, ..., ir together with the arc (ir, i1). A spanning tree of a directed graph G is a subgraph G' = (N, A') with A' ⊆ A that is connected (that is, contains a path between every pair of nodes) and contains no cycle.

Residual network. The algorithms described in this article rely on the concept of a residual network G(x) corresponding to a flow x. For each arc (i, j) ∈ A, the residual network contains two arcs (i, j) and (j, i). The arc (i, j) has cost cij and residual capacity rij = uij - xij, and the arc (j, i) has cost cji = -cij and residual capacity rji = xij. The residual network consists of arcs with positive residual capacity. If (i, j) ∈ A, then sending flow on arc (j, i) in G(x) corresponds to decreasing flow on arc (i, j); for this reason, the cost of arc (j, i) is the negative of the cost of arc (i, j). These conventions show how to determine the residual network G(x) corresponding to any flow x. We can also determine a flow x from the residual network G(x) as follows. If rij > 0, then using the definition of residual capacities and Assumption 4), we set xij = uij - rij if (i, j) ∈ A, and xji = rij otherwise. We define the cost of a directed cycle W in the residual network G(x) as \sum_{(i,j) \in W} c_{ij}.
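A minimal Python sketch of the residual network construction just described (assumed dictionary-based data; not taken from the article): the costs, capacities and current flow are indexed by the original arcs, and only residual arcs with positive capacity are kept.

    def residual_network(cost, cap, x):
        r, rcost = {}, {}
        for (i, j) in cap:
            if cap[i, j] - x[i, j] > 0:          # forward arc with capacity u_ij - x_ij
                r[i, j] = cap[i, j] - x[i, j]
                rcost[i, j] = cost[i, j]
            if x[i, j] > 0:                      # backward arc with capacity x_ij and cost -c_ij
                r[j, i] = x[i, j]
                rcost[j, i] = -cost[i, j]
        return r, rcost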

Order notation. In our discussion, we will use some well-known notation from the field of complexity theory. We say that an algorithm for a problem P is an O(n^3) algorithm, or has a worst-case complexity of O(n^3), if it is possible to solve any instance of P using a number of computations that is asymptotically bounded by some constant times the term n^3. We refer to an algorithm as a polynomial time algorithm if its worst-case running time is bounded by a polynomial function of the input size parameters, which for a minimum cost flow problem, are n, m, log C (the number of bits needed to specify the largest arc cost), and log U (the number of bits needed to specify the largest arc capacity). A polynomial time algorithm is either a strongly polynomial time algorithm (when the complexity terms involve only n and m, but not log C or log U), or is a weakly polynomial time algorithm (when the complexity terms include log C or log U or both). We say that an algorithm is a pseudopolynomial time algorithm if its worst-case running time is bounded by a polynomial function of n, m and U. For example, an algorithm with worst-case complexity of O(nm^2 log n) is a strongly polynomial time algorithm, an algorithm with worst-case complexity O(nm^2 log U) is a weakly polynomial time algorithm, and an algorithm with worst-case complexity of O(n^2 mU) is a pseudopolynomial time algorithm.

Cycle-Canceling Algorithm. In this Section, we describe the cycle-canceling algorithm, one of the more popular algorithms for solving the minimum cost flow problem. The algorithm sends flows (called augmenting flows) along directed cycles with negative cost (called negative cycles). The algorithm rests upon the following negative cycle optimality condition stated as follows.

THEOREM 1 (Negative cycle optimality condition) A feasible solution x* is an optimal solution of the minimum cost flow problem if and only if the residual network G(x*) contains no negative cost (directed) cycle.

It is easy to see the necessity of these conditions. If the residual network G(x*) contains a negative cycle (that is, a negative cost directed cycle), then by augmenting positive flow along this cycle, we can decrease the cost of the flow. Conversely, it is possible to show that if the residual network G(x*) does not contain any negative cost cycle, then x* must be an optimal flow.

The negative cycle optimality condition suggests one simple algorithmic approach for solving the minimum cost flow problem, which we call the cycle-canceling algorithm. This algorithm maintains a feasible solution and at every iteration improves the objective function value. The algorithm first establishes a feasible flow x in the network by solving a related (and easily solved) problem known as the maximum flow problem. Then it iteratively finds negative cycles in the residual network and augments flows on these cycles. The algorithm terminates when the residual network contains no negative cost directed cycle. Theorem 1 implies that when the algorithm terminates, it has found a minimum cost flow. Fig. 3 specifies this generic version of the cycle-canceling algorithm.

    BEGIN
      establish a feasible flow x in the network;
      WHILE G(x) contains a negative cycle DO
      BEGIN
        identify a negative cycle W;
        δ := min{rij : (i, j) ∈ W};
        augment δ units of flow in the cycle W and update G(x);
      END;
    END

Fig. 3: Cycle-canceling algorithm.
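The following Python sketch mirrors the generic loop of Fig. 3 under stated assumptions (a starting feasible flow x is already available, dictionary-based data as in the earlier sketches, and Assumption 4) so that at most one of (i, j) and (j, i) is an original arc). Negative cycles in G(x) are located with a standard Bellman-Ford style label-correcting pass; this is one common choice, not necessarily the one the authors have in mind.

    def find_negative_cycle(nodes, rcost):
        # label-correcting relaxation from a virtual source (all labels start at 0)
        d = {v: 0 for v in nodes}
        pred = {v: None for v in nodes}
        last = None
        for _ in range(len(nodes)):
            last = None
            for (u, v), c in rcost.items():
                if d[u] + c < d[v]:
                    d[v], pred[v], last = d[u] + c, u, v
        if last is None:
            return None                      # no negative cycle: x is optimal (Theorem 1)
        for _ in range(len(nodes)):          # step back until we are inside the cycle
            last = pred[last]
        cycle, v = [last], pred[last]
        while v != last:
            cycle.append(v)
            v = pred[v]
        cycle.reverse()
        return cycle                         # consecutive nodes form residual arcs

    def cycle_cancel(nodes, cost, cap, x):
        while True:
            r, rcost = {}, {}                # build the residual network G(x)
            for (i, j) in cap:
                if cap[i, j] - x[i, j] > 0:
                    r[i, j], rcost[i, j] = cap[i, j] - x[i, j], cost[i, j]
                if x[i, j] > 0:
                    r[j, i], rcost[j, i] = x[i, j], -cost[i, j]
            W = find_negative_cycle(nodes, rcost)
            if W is None:
                return x
            arcs = list(zip(W, W[1:] + W[:1]))
            delta = min(r[a] for a in arcs)
            for (u, v) in arcs:
                if (u, v) in cap:            # forward residual arc: increase x_uv
                    x[u, v] += delta
                else:                        # backward residual arc: decrease x_vu
                    x[v, u] -= delta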

Fig. 4: Illustration of the cycle-canceling algorithm: a) the original network with flow x and arc costs; b) the residual network G(x); c) the residual network after augmenting a unit of flow along the cycle 2-1-3-2; d) the residual network after augmenting a unit of flow along the cycle 4-5-6-4.

The numerical example shown in Fig. 4a) illustrates the cycle-canceling algorithm. This figure shows the arc costs and the starting feasible flow in the network. Each arc in the network has a capacity of 2 units. Fig. 4b) shows the residual network corresponding to the initial flow. We do not show the residual capacities of the arcs in Fig. 4b) since they are implicit in the network structure. If the residual network contains both arcs (i, j) and (j, i) for any pair i and j of nodes, then both have residual capacity equal to 1; and if the residual network contains only one arc, then its capacity is 2 (this observation uses the fact that each arc capacity equals 2). The residual network shown in Fig. 4b) contains a negative cycle 1-3-2-1 with cost -3. By augmenting a unit flow along this cycle, we obtain the residual network shown in Fig. 4c). The residual network shown in Fig. 4c) contains a negative cycle 6-4-5-6 with cost -4. We augment unit flow along this cycle, producing the residual network shown in Fig. 4d), which contains no negative cycle. Given the optimal residual network, we can determine the optimal flow using the method described in the previous Section.

A byproduct of the cycle-canceling algorithm is the following important result.

THEOREM 2 (Integrality property) If all arc capacities and supply/demands of nodes are integer, then the minimum cost flow problem always has an integer minimum cost flow.

This result follows from the fact that for problems with integer arc capacities and integer node supplies/demands, the cycle-canceling algorithm starts with an integer solution (which is provided by the maximum flow algorithm used to obtain the initial feasible flow) and at each iteration augments flow by an integral amount.

What is the worst-case computational requirement (complexity) of the cycle-canceling algorithm? The algorithm must repeatedly identify negative cycles in the residual network. We can identify a negative cycle in the residual network in O(nm) time using a shortest path label-correcting algorithm [1]. How many times must the generic cycle-canceling algorithm perform this computation? For the minimum cost flow problem, mCU is an upper bound on the initial flow cost (since cij ≤ C and xij ≤ U for all (i, j) ∈ A) and -mCU is a lower bound on the optimal flow cost (since cij ≥ -C and xij ≤ U for all (i, j) ∈ A). Any iteration of the cycle-canceling algorithm changes the objective function value by an amount (\sum_{(i,j) \in W} c_{ij}) δ, which is strictly negative. Since we have assumed that the problem has integral data, the algorithm terminates within O(mCU) iterations and runs in O(nm^2 CU) time, which is a pseudopolynomial running time.

The generic version of the cycle-canceling algorithm does not specify the order for selecting negative cycles from the network. Different rules for selecting negative cycles produce different versions of the algorithm, each with different worst-case and theoretical behavior. Two versions of the cycle-canceling algorithm are polynomial time implementations:
i) a version that augments flow in arc-disjoint negative cycles with the maximum improvement [2]; and
ii) a version that augments flow along a negative cycle with minimum mean cost, that is, the average cost per arc in the cycle [4].

Successive Shortest Path Algorithm. The cycle-canceling algorithm maintains feasibility of the solution at every step and attempts to achieve optimality. In contrast, the successive shortest path algorithm maintains optimality of the solution at every step (that is, the condition that the residual network G(x) contains no negative cost cycle) and strives to attain feasibility. It maintains a solution x, called a pseudoflow (see below), that is nonnegative and satisfies the arcs' flow capacity restrictions, but violates the mass balance constraints of the nodes. At each step, the algorithm selects a node k with excess supply (i.e., supply not yet sent to some demand node), a node l with unfulfilled demand, and sends flow from node k to node l along a shortest path in the residual network. The algorithm terminates when the current solution satisfies all the mass balance constraints.

To be more precise, a pseudoflow is a vector x satisfying only the capacity and nonnegativity constraints; it need not satisfy the mass balance constraints. For any pseudoflow x, we define the imbalance of node i as

    e(i) = b(i) + \sum_{\{j:(j,i) \in A\}} x_{ji} - \sum_{\{j:(i,j) \in A\}} x_{ij},  for all i \in N.   (6)

If e(i) > 0 for some node i, then we refer to e(i) as the excess of node i; if e(i) < 0, then we refer to -e(i) as the node's deficit. We refer to a node i with e(i) = 0 as balanced. Let E and D denote the sets of excess and deficit nodes in the network. Notice that \sum_{i \in N} e(i) = \sum_{i \in N} b(i) = 0, which implies that \sum_{i \in E} e(i) = - \sum_{i \in D} e(i). Consequently, if the network contains an excess node, then it must also contain a deficit node. The residual network corresponding to a pseudoflow is defined in the same way that we define the residual network for a flow. The successive shortest path algorithm uses the following result.
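A small sketch (illustrative names, dictionary-based data as before) of the imbalance computation in (6) and of the excess and deficit sets E and D:

    def imbalance(i, b, x):
        inflow = sum(v for (j, k), v in x.items() if k == i)
        outflow = sum(v for (j, k), v in x.items() if j == i)
        return b[i] + inflow - outflow

    def excess_and_deficit(nodes, b, x):
        e = {i: imbalance(i, b, x) for i in nodes}
        E = {i for i in nodes if e[i] > 0}
        D = {i for i in nodes if e[i] < 0}
        return e, E, D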

THEOREM 3 (Shortest augmenting path theorem) Suppose a pseudoflow (or a flow) x satisfies the optimality conditions and we obtain x' from x by sending flow along a shortest path from node k to some other node l in the residual network; then x' also satisfies the optimality conditions.

To prove this Theorem, we would show that if the residual network G(x) contains no negative cycle, then augmenting flow along any shortest path does not introduce any negative cycle (we will not establish this result in this discussion). Fig. 5 gives a formal description of the successive shortest path algorithm.

    BEGIN
      x := 0;
      e(i) := b(i) for all i ∈ N;
      initialize the sets E and D;
      WHILE E ≠ ∅ DO
      BEGIN
        select a node k ∈ E and a node l ∈ D;
        identify a shortest path P in G(x) from node k to node l;
        δ := min[e(k), -e(l), min{rij : (i, j) ∈ P}];
        augment δ units of flow along the path P and update x and G(x);
      END
    END

Fig. 5: Successive shortest path algorithm.

Fig. 6: Illustration of the successive shortest path algorithm: a) the residual network corresponding to x = 0; b) the residual network after augmenting 2 units of flow along the path 1-2-4-6; c) the residual network after augmenting 2 units of flow along the path 1-3-5-6.

The numerical example shown in Fig. 6a) illustrates the successive shortest path algorithm. The algorithm starts with x = 0, and at this value of flow, the residual network is identical to the starting network. Just as we observed in Fig. 4, whenever the residual network contains both the arcs (i, j) and (j, i), the residual capacity of each arc is 1. If the residual network contains only one arc, (i, j) or (j, i), then its residual capacity is 2 units. For this problem, E = {1} and D = {6}. In the residual network shown in Fig. 6a), the shortest path from node 1 to node 6 is 1-2-4-6 with cost equal to 9. The residual capacity of this path equals 2. Augmenting two units of flow along this path produces the residual network shown in Fig. 6b), and the next shortest path from node 1 to node 6 is 1-3-5-6 with cost equal to 10. The residual capacity of this path is 2 and we augment two units of flow on it. At this point, the sets E = D = ∅, and the current solution solves the minimum cost flow problem.

To show that the algorithm correctly solves the minimum cost flow problem, we argue as follows. The algorithm starts with a flow x = 0 and the residual network G(x) is identical to the original network. Assumption 3) implies that all arc costs are nonnegative. Consequently, the residual network G(x) contains no negative cycle and so the flow vector x satisfies the negative cycle optimality conditions. Since the algorithm augments flow along a shortest path from excess nodes to deficit nodes, Theorem 3 implies that the pseudoflow maintained by the algorithm always satisfies the optimality conditions. Eventually, node excesses and deficits become zero; at this point, the solution maintained by the algorithm is an optimal flow.
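The Python sketch below follows the loop of Fig. 5 under stated assumptions (dictionary-based data, Assumption 4), and Assumption 6) so that a deficit node is always reachable from an excess node in G(x)); it is an illustration, not the authors' implementation. Shortest paths in G(x) are computed with a simple label-correcting (Bellman-Ford) pass, which tolerates the negative costs on backward residual arcs.

    INF = float("inf")

    def shortest_path(nodes, rcost, s, t):
        d = {v: INF for v in nodes}; d[s] = 0
        pred = {v: None for v in nodes}
        for _ in range(len(nodes) - 1):
            for (u, v), c in rcost.items():
                if d[u] + c < d[v]:
                    d[v], pred[v] = d[u] + c, u
        path, v = [], t                       # assumes t is reachable from s
        while v != s:
            path.append((pred[v], v))
            v = pred[v]
        return list(reversed(path))

    def successive_shortest_paths(nodes, cost, cap, b):
        x = {a: 0 for a in cap}
        e = dict(b)                           # imbalances; x = 0 initially
        while any(v > 0 for v in e.values()):
            k = next(i for i in nodes if e[i] > 0)    # an excess node
            l = next(i for i in nodes if e[i] < 0)    # a deficit node
            r, rcost = {}, {}                 # residual network G(x)
            for (i, j) in cap:
                if cap[i, j] - x[i, j] > 0:
                    r[i, j], rcost[i, j] = cap[i, j] - x[i, j], cost[i, j]
                if x[i, j] > 0:
                    r[j, i], rcost[j, i] = x[i, j], -cost[i, j]
            P = shortest_path(nodes, rcost, k, l)
            delta = min([e[k], -e[l]] + [r[a] for a in P])
            for (u, v) in P:
                if (u, v) in cap:
                    x[u, v] += delta
                else:
                    x[v, u] -= delta
            e[k] -= delta
            e[l] += delta
        return x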

What is the worst-case complexity of this algorithm? In each iteration, the algorithm reduces the excess of some node. Consequently, if U is an upper bound on the largest supply of any node, then the algorithm would terminate in at most nU iterations. We can determine a shortest path in G(x) in O(nm) time using a label-correcting shortest path algorithm [1]. Consequently, the running time of the successive shortest path algorithm is O(n^2 mU). The successive shortest path algorithm requires pseudopolynomial time to solve the minimum cost flow problem since it is polynomial in n, m and the largest supply U. This algorithm is, however, polynomial time for some special cases of the minimum cost flow problem (such as the assignment problem for which U = 1). Researchers have developed weakly polynomial time and strongly polynomial time versions of the successive shortest path algorithm; some notable implementations are due to [3] and [5].

Network Simplex Algorithm. The network simplex algorithm for solving the minimum cost flow problem is an adaptation of the well-known simplex method for general linear programs. Because the minimum cost flow problem is a highly structured linear programming problem, when applied to it, the computations of the simplex method become considerably streamlined. In fact, we need not explicitly maintain the matrix representation (known as the simplex tableau) of the linear program and can perform all of the computations directly on the network. Rather than presenting the network simplex algorithm as a special case of the linear programming simplex method, we will develop it as a special case of the cycle-canceling algorithm described above. The primary advantage of our approach is that it permits the network simplex algorithm to be understood without relying on linear programming theory.

The network simplex algorithm maintains solutions called spanning tree solutions. A spanning tree solution partitions the arc set A into three subsets:
i) T, the arcs in the spanning tree;
ii) L, the nontree arcs whose flows are restricted to value zero;
iii) U, the nontree arcs whose flow values are restricted in value to the arcs' flow capacities.

We refer to the triple (T, L, U) as a spanning tree structure. Each spanning tree structure (T, L, U) has a unique solution that satisfies the mass balance constraints (2). To determine this solution, we set xij = 0 for all arcs (i, j) ∈ L, xij = uij for all arcs (i, j) ∈ U, and then solve the mass balance equations (2) to determine the flow values for arcs in T.

To show that the flows on spanning tree arcs are unique, we use a numerical example. Consider the spanning tree T shown in Fig. 7a). Assume that U = ∅, that is, all nontree arcs are at their lower bounds. Consider the leaf node 4 (a leaf node is a node with exactly one arc incident to it). Node 4 has a supply of 5 units and has only one arc (4, 2) incident to it. Consequently, arc (4, 2) must carry 5 units of flow. So we set x42 = 5, add 5 units to b(2) (because it receives 5 units of flow sent from node 4), and delete arc (4, 2) from the tree. We now have a tree with one fewer node and next select another leaf node, node 5 with the supply of 5 units and the single arc (5, 2) incident to it. We set x52 = 5, again add 5 units to b(2), and delete the arc (5, 2) from the tree. Now node 2 becomes a leaf node with modified supply/demand of b(2) = -10, implying that node 2 has an unfulfilled demand of 10 units. Node 2 has exactly one incoming arc (1, 2) and to meet the demand of 10 units of node 2, we must send 10 units of flow on this arc. We set x12 = 10, subtract 10 units from b(1) (since node 1 sends 10 units), and delete the arc (1, 2) from the tree. We repeat this process until we have identified the flow on all arcs in the tree. Fig. 7b) shows the corresponding flow. Our discussion assumed that U is empty. If U were nonempty, we would first set xij := uij, add uij to b(j), and subtract uij from b(i) for each arc (i, j) ∈ U, and then apply the preceding method.

Fig. 7: Computing flows for a spanning tree.

We say a spanning tree structure is feasible if its associated spanning tree solution satisfies all of the arcs' flow bounds. We refer to a spanning tree structure as optimal if its associated spanning tree solution is an optimal solution of the minimum cost flow problem. We will now derive the optimality conditions for a spanning tree structure (T, L, U).
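The leaf-stripping computation just described can be sketched in a few lines of Python (for the case U = ∅; data structures and names are illustrative, not from the article):

    def tree_flows(tree_arcs, b):
        b = dict(b)                              # working copy of the supplies/demands
        arcs, x = set(tree_arcs), {}
        while arcs:
            incident = {}
            for (i, j) in arcs:
                incident.setdefault(i, []).append((i, j))
                incident.setdefault(j, []).append((i, j))
            # a leaf node has exactly one remaining tree arc incident to it
            leaf, (a,) = next((n, tuple(l)) for n, l in incident.items() if len(l) == 1)
            i, j = a
            if leaf == i:                        # leaf is the tail: the arc carries b(leaf)
                x[a] = b[i]
                b[j] += b[i]
            else:                                # leaf is the head: the arc delivers -b(leaf)
                x[a] = -b[j]
                b[i] += b[j]
            b[leaf] = 0
            arcs.remove(a)
        return x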

The network simplex algorithm augments flow along negative cycles. To identify negative cycles quickly, we use the concept of node potentials. We define node potentials π(i) so that the reduced cost for any arc in the spanning tree T is zero. That is, c^π_ij = cij - π(i) + π(j) = 0 for each (i, j) ∈ T.

With the help of an example, we show how to compute the vector π of node potentials. Consider the spanning tree shown in Fig. 8a) with arc costs as shown. The vector π has n variables and must satisfy n-1 equations, one for each arc in the spanning tree. Therefore, we can assign one potential value arbitrarily. We assume that π(1) = 0. Consider arc (1, 2) incident to node 1. The condition c^π_12 = c12 - π(1) + π(2) = 0 yields π(2) = -5. We next consider arcs incident to node 2. Using the condition c^π_52 = c52 - π(5) + π(2) = 0, we see that π(5) = -3, and the condition c^π_32 = c32 - π(3) + π(2) = 0 shows that π(3) = -2. We repeat this process until we have identified the potentials of all nodes in the tree T. Fig. 8b) shows the corresponding node potentials.

Fig. 8: Computing node potentials for a spanning tree.

Consider any nontree arc (k, l). Adding this arc to the tree T creates a unique cycle, which we denote as Wkl. We refer to Wkl as the fundamental cycle induced by the nontree arc (k, l). If (k, l) ∈ L, then we define the orientation of the fundamental cycle as in the direction of (k, l), and if (k, l) ∈ U, then we define the orientation opposite to that of (k, l). In other words, we define the orientation of the cycle in the direction of flow change permitted by the arc (k, l). We let c(Wkl) denote the change in the cost if we send one unit of flow on the cycle Wkl along its orientation. (Notice that because of flow bounds, we might not always be able to send flow along the cycle Wkl.) Let \overline{W}_{kl} denote the set of forward arcs in Wkl (that is, those with the same orientation as (k, l)), and let \underline{W}_{kl} denote the set of backward arcs in Wkl (that is, those with an orientation opposite to arc (k, l)). Then, if we send one unit of flow along Wkl, the flow on arcs in \overline{W}_{kl} increases by one unit and the flow on arcs in \underline{W}_{kl} decreases by one unit. Therefore,

    c(W_{kl}) = \sum_{(i,j) \in \overline{W}_{kl}} c_{ij} - \sum_{(i,j) \in \underline{W}_{kl}} c_{ij}.

Let c^π(Wkl) denote the change in the reduced costs if we send one unit of flow in the cycle Wkl along its orientation, that is,

    c^{\pi}(W_{kl}) = \sum_{(i,j) \in \overline{W}_{kl}} c^{\pi}_{ij} - \sum_{(i,j) \in \underline{W}_{kl}} c^{\pi}_{ij}.

It is easy to show that c^π(Wkl) = c(Wkl). This result follows from the fact that when we substitute c^π_ij = cij - π(i) + π(j) and add the reduced costs around any cycle, the node potentials π(i) cancel one another. Next notice that the manner in which we defined node potentials ensures that each arc in the fundamental cycle Wkl except the arc (k, l) has zero reduced cost. Consequently, if arc (k, l) ∈ L, then

    c(W_{kl}) = c^{\pi}(W_{kl}) = c^{\pi}_{kl},

and if arc (k, l) ∈ U, then

    c(W_{kl}) = c^{\pi}(W_{kl}) = -c^{\pi}_{kl}.

This observation and the negative cycle optimality condition (Theorem 1) imply that for a spanning tree solution to be optimal, it must satisfy the following necessary conditions:

    c^{\pi}_{ij} \ge 0  for every arc (i, j) \in L,   (7)

    c^{\pi}_{ij} \le 0  for every arc (i, j) \in U.   (8)

It is possible to show that these conditions are also sufficient for optimality; that is, if any spanning tree solution satisfies the conditions (7)-(8), then it solves the minimum cost flow problem.
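A short sketch of the node-potential computation described above (a traversal of the tree from node 1 with π(1) = 0, propagating cij - π(i) + π(j) = 0 along tree arcs; names are illustrative):

    def tree_potentials(tree_arcs, cost, root=1):
        adj = {}
        for (i, j) in tree_arcs:
            adj.setdefault(i, []).append((j, (i, j)))
            adj.setdefault(j, []).append((i, (i, j)))
        pi, stack = {root: 0}, [root]
        while stack:
            u = stack.pop()
            for v, (i, j) in adj.get(u, []):
                if v in pi:
                    continue
                # c_ij - pi(i) + pi(j) = 0  =>  pi(j) = pi(i) - c_ij, pi(i) = pi(j) + c_ij
                pi[v] = pi[u] - cost[i, j] if u == i else pi[u] + cost[i, j]
                stack.append(v)
        return pi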

We now have all the necessary ingredients to describe the network simplex algorithm. The algorithm maintains a feasible spanning tree structure at each iteration, which it successively transforms into an improved spanning tree structure until the solution becomes optimal. The algorithm first obtains an initial spanning tree structure. If an initial spanning tree structure is not easily available, then we could use the following method to construct one: for each node i with b(i) ≥ 0, we connect node i to node 1 with an (artificial) arc of sufficiently large cost and large capacity; and for each node i with b(i) < 0, we connect node 1 to node i with an (artificial) arc of sufficiently large cost and capacity. These arcs define the initial tree T, all arcs in A define the set L, and U = ∅. Since these artificial arcs have large costs, subsequent iterations will drive the flow on these arcs to zero.

    BEGIN
      determine an initial feasible tree structure (T, L, U);
      let x be the flow and let π be the corresponding node potentials;
      WHILE (some nontree arc violates its optimality condition) DO
      BEGIN
        select an entering arc (k, l) violating the optimality conditions;
        add arc (k, l) to the spanning tree T, thus forming a unique cycle Wkl;
        augment the maximum possible flow δ in the cycle Wkl and
          identify a leaving arc (p, q) that reaches its lower or upper flow bound;
        update the flow x, the spanning tree structure (T, L, U) and the potentials π;
      END;
    END

Fig. 9: The network simplex algorithm.

Given a spanning tree structure (T, L, U), we first check whether it satisfies the optimality conditions (7) and (8). If yes, we stop; otherwise, we select an arc (k, l) ∈ L or (k, l) ∈ U violating its optimality condition as an entering arc to be added to the tree T, obtain the fundamental cycle Wkl induced by this arc, and augment the maximum possible flow in the cycle Wkl without violating the flow bounds of the tree arcs. At this value of augmentation, the flow on some tree arc, say arc (p, q), reaches its lower or upper bound; we select this arc as an arc to leave the spanning tree T, adding it to L or U depending upon its flow value. We next add arc (k, l) to T, giving us a new spanning tree structure. We repeat this process until the spanning tree structure satisfies the optimality conditions. Fig. 9 specifies the essential steps of the algorithm.

To illustrate the network simplex algorithm, we use the numerical example shown in Fig. 10a). Fig. 10b) shows a feasible spanning tree solution for the problem. For this solution, T = {(1, 2), (1, 3), (2, 4), (2, 5), (5, 6)}, L = {(2, 3), (5, 4)} and U = {(3, 5), (4, 6)}. We next compute c^π_35 = 1. We introduce the arc (3, 5) into the tree, creating a cycle. Since (3, 5) is at its upper bound, the orientation of the cycle is opposite to that of (3, 5). The arcs (1, 2) and (2, 5) are forward arcs in the cycle and arcs (3, 5) and (1, 3) are backward arcs. The maximum increase in flow permitted by the arcs (3, 5), (1, 3), (1, 2), and (2, 5) without violating their upper and lower bounds is, respectively, 3, 3, 2, and 1 units. Thus, we augment 1 unit of flow along the cycle. The augmentation increases the flow on arcs (1, 2) and (2, 5) by one unit and decreases the flow on arcs (1, 3) and (3, 5) by one unit. Arc (2, 5) reaches its upper bound and we select it as the leaving arc. We update the spanning tree structure; Fig. 10c) shows the new spanning tree T and the new node potentials. The sets L and U become L = {(2, 3), (5, 4)} and U = {(2, 5), (4, 6)}. In the next iteration, we select arc (4, 6) since this arc violates the arc optimality condition. We augment one unit of flow along the cycle 6-4-2-1-3-5-6 and arc (3, 5) leaves the spanning tree. Fig. 10d) shows the next spanning tree and the updated node potentials. All nontree arcs satisfy the optimality conditions and the algorithm terminates with an optimal solution of the minimum cost flow problem.

Fig. 10: Numerical example for the network simplex algorithm.

The network simplex algorithm can select any nontree arc that violates its optimality condition as an entering arc. Many different rules, called pivot rules, are possible for choosing the entering arc, and these rules have different empirical and theoretical behavior. [1] describes some popular pivot rules. We call the process of moving from one spanning tree structure to another a pivot operation. By choosing the right data structures for representing the tree T, it is possible to perform a pivot operation in O(m) time.

To determine the number of iterations performed by the network simplex algorithm, we distinguish two cases. We refer to a pivot operation as nondegenerate if it augments a positive amount of flow in the cycle Wkl (that is, δ > 0), and degenerate otherwise (that is, δ = 0). During a nondegenerate pivot, the cost of the spanning tree solution decreases by |c^π_kl| δ. When combined with the integrality of data assumption (Assumption 2) above), this result yields a pseudopolynomial bound on the number of nondegenerate iterations. However, degenerate pivots do not decrease the cost of flow and so are difficult to bound. There are methods to bound the number of degenerate pivots. Obtaining a polynomial bound on the number of iterations remained an open problem for quite some time; [6] suggested an implementation of the network simplex algorithm that runs in polynomial time. In any event, the empirical performance of the network simplex algorithm is very attractive. Empirically, it is one of the fastest known algorithms for solving the minimum cost flow problem.

See also: Nonconvex network flow problems; Traffic network equilibrium; Network location: Covering problems; Maximum flow problem; Shortest path tree algorithms; Steiner tree problems; Equilibrium networks; Survivable networks; Directed tree networks; Dynamic traffic networks; Auction algorithms; Piecewise linear network flow problems; Communication network assignment problem; Generalized networks; Evacuation networks; Network design problems; Stochastic network problems: Massively parallel solution; Multicommodity flow problems; Nonoriented multicommodity flow problems.

References
[1] AHUJA, R.K., MAGNANTI, T.L., AND ORLIN, J.B.: Network flows: Theory, algorithms, and applications, Prentice-Hall, 1993.
[2] BARAHONA, F., AND TARDOS, E.: 'Note of Weintraub's minimum cost circulation algorithm', SIAM J. Comput. 18 (1989), 579-583.
[3] EDMONDS, J., AND KARP, R.M.: 'Theoretical improvements in algorithmic efficiency for network flow problems', J. ACM 19 (1972), 248-264.
[4] GOLDBERG, A.V., AND TARJAN, R.E.: 'Finding minimum-cost circulations by canceling negative cycles': Proc. 20th ACM Symposium on the Theory of Computing, 1988, pp. 388-397; Full paper: J. ACM 36 (1989), 873-886.
[5] ORLIN, J.B.: 'A faster strongly polynomial minimum cost flow algorithm': Proc. 20th ACM Symp. Theory of Computing, 1988, pp. 377-387; Full paper: Oper. Res. 41 (1989), 338-350.
[6] ORLIN, J.B.: 'A polynomial time primal network simplex algorithm for minimum cost flows', Math. Program. 78B (1997), 109-129.

Ravindra K. Ahuja
Dept. Industrial and Systems Engin., Univ. Florida, Gainesville, FL 32611, USA
E-mail address: ahuja@ufl.edu
Thomas L. Magnanti
Sloan School of Management and Dept. Electrical Engin. and Computer Sci., Massachusetts Inst. Technol., Cambridge, MA 02139, USA
E-mail address: magnanti@mit.edu
James B. Orlin
Sloan School of Management, Massachusetts Inst. Technol., Cambridge, MA 02139, USA
E-mail address: jorlin@mit.edu
MSC 2000: 90C35
Key words and phrases: network, minimum cost flow problem, cycle-canceling algorithm, successive shortest path algorithm, network simplex algorithm.

MINLP: APPLICATION IN FACILITY LOCATION-ALLOCATION

The location-allocation problem may be stated in the following general way: Given the location or distribution of a set of customers, which could be probabilistic, and their associated demands for a given product or service, determine the optimal locations for a number of service facilities and the allocation of their products or services to the customers, so as to minimize total (expected) location and transportation costs. This problem finds a variety of applications involving the location of warehouses, distribution centers, service and production facilities and emergency service facilities. In the last section we are going to consider the de-

velopment of an offshore oil field as a real-world ap- n p

plication of the location-allocation problem. This min C - ~ ~ Oi)~ijcij


i=1 j=l
problem involves the location of the oil platforms p
and the allocation of the oil wells to platforms. s.t. ~ )~ij - - 1 , i - - 1 , . . . ,n,
It was shown in [25] that the joint location- j=l

allocation problem is NP-hard even with all the )~ij - O, 1, i-1,...,n,j-1,...,p,


demand points located along a straight line. In the where Oi is the quantity demanded at location i
next section alternative location-allocation models whose coordinates are (xi, yi); and Aij is the binary
will be presented based on different objectives and variables that is assigned the value of 1 if demand
the incorporation of consumer behavior, price elas- point i is located to center j and zero otherwise.
ticity and system dynamics within the location- The above formulation allocate the consumers to
allocation decision framework. their nearest center while ensuring that only one
center will serve each customer. This however, can
lead to disproportionally sized facilities. In the
more realistic situation where the capacities of the
facilities are limited to supplies of s l , . . . , sn for
i - 1 , . . . , n facilities then the location-allocation
problem takes the following form [24]:
L o c a t i o n - A l l o c a t i o n M o d e l s . In developing
n p
location-allocation models different objectives al- min C -- ~ y~ WijCij
ternatives are examined. One possibility is to fol- i=1 j=l
low the approach in [5], to minimize the number P

of centers required to serve the population. This s.t ~ wij - si, i -1, . . . , n,
objective is appropriate when the demand is ex- j=l II

ogenously fixed. A more general objective is to ~ wij - d i, j - 1 , . . . ,p,


maximize demand by optimally locating the cen- i=1
ters as proposed in [10]. The demand maximiza- £~j=0,1, i=l,...,n,j=l,...,p,
tion requires the incorporation of price elasticity where wij is the amount shipped from facility i lo-
representing the dependence of the costumer pref- cated at (xi, yi) to destination j. In the above for-
erence to the distance from the center. The cost of mulations the distance (or the generalized trans-
establishing the centers can also be incorporated port cost, which is assumed to be proportional to
in the model as proposed in [13]. An alternative distance) between the demand point i and the sup-
objective towards the implementation of costumer ply point j is represented by cij. The Euclidean
preference towards the nearest center is the mini- metric:
mization of an aggregated weighted distance which
)2
is called the median location-allocation problem.
The simplest type of location-allocation prob- or the rectilinear metric"
lem is the Weber problem, as posed in [9], which
involves locating a production center so as to min-
imize aggregate weighted distance from the dif- The rectilinear metric is appropriate when the
ferent raw material sources. The extension of the transportation is occurring along a grid of city
Weber problem is the p - m e d i a n location-allocation streets (Manhattan norm) or along the aisles of
problem, which involves the optimal location of a a floor shop [8].
set of p uncapacitated centers to minimize the to- The aforementioned location-allocation models
tal weighted distance between them and n demand are based on the assumption that the consumers
locations. Here, each source is assumed to have in- always prefer the nearest center to obtain service.
finite capacity. In continuous space, the p-median In reality however, as reported in the literature
problem can be formulated as follows: from several empirical studies [11] there exist sev-


eral services for which consumers choose their ser- from i to any other facility and zero otherwise.
vice facility center. The travel patterns of the con- Therefore, the Sij tends to OiXij and this model
sumers for example can produce a variety of al- allocates the demand to the nearest facility as the
locations that differ from the nearest center rule. original p-median problem.
In order to accommodate such behavior a spatial- All the models mentioned above consider the
interaction model is incorporated within the un- static location-allocation problem where all the ac-
capacitated p-median location-allocation model in tivities take place at one instance. These formula-
the following manner: tions are sufficient if neither the level nor the lo-
1 cation of demand alters over time. An important
min ~ ~ ~ Sij log(Sij - 1) factor however, in any location-allocation problem
j i
is the dynamics of the system involving demand
-~ ~ ~ Yj Sijcij
changes over time. Particularly, in the competi-
j i
tive environment, an optimal center location could
s.t. ~-~YjSij - Oi, i - 1,...,n,
become undesirable as new competing centers de-
J
velop. Potential directions include the literature
on decision making under uncertainty, [12]. A.J.
J
i=l,...,n, j=l,...,p, Scott [18] proposed a general framework for the
integration of the spatial and discrete temporal
Yj = O, 1, j=l,...,p,
dimensions in the location-allocation models. He
where the decision variables include Yj which takes proposed a modification of the location-allocation
the value of one if the facility is located at J mod- so as to minimize an aggregate weighted transport
els. and zero otherwise; cost over T time periods, during which time the
&j - AiOi]~ exp(-13cij ) number nt, level Oit and the location (xit, yit) of
the demand points change. If the locations were
that defines the interaction of facility i and con- greatly different the center would be likely to re-
sumer j. locate at some time and costs of relocation are
1 included in the model. It was assumed that when
Ai = ~--~lYl exp(-/3cil)' i - 1 , . . . , m, a center relocates it incurs a fixed cost, a. Based
that ensures that the sum of all outflows from the on these ideas the formulation proposed for the
origin i add up to the amount of demand at that lo- uncapacitated location-allocation problem has the
cation;/3 is either calibrated to match some known following form:
interaction data or is defined exogenously. The fol- nl

lowing relationship holds between the original p- min al + ~ Oilcijl


median model and the spatial-interaction model as i
shown in [17]. The value of the optimal objective
function at the solution of the p-median problem +S, t i--1
is given by:
s.t. ~t=0,1, t=2,...,T,
~ ~ OiXij cij ,
i j where the subscript t refers to different time pe-
riods, al is the cost of establishing the center in
where Xij allocates demand to the nearest of p
the first time period. The problem as formulated
available centers. Turning to spatial-interaction
above is to locate in the first period one center
model, as the impedance parameter/3 increases the
that takes into account future variations. Extend-
term:
ing the aspects of this model allows the replace-
Yj exp(-13cij ) ment of a truly dynamic model by a series of static
)-~l Yl exp(-/3ca) problems as proposed in [3], thus outlining a mul-
of the Sij tends to Xij, where Xij - 1 if the travel tilayer approach, where the objective is to sequen-
time from i to j is smaller that the travel time tially locate each period's facility given the previ-


ous period's facility locations in order to minimize ear distance location problem always has an
the present period cost. This strategy is appropri- optimal solution with the sources located at
ate whenever the period durations are sufficiently the grid points of the vertical and horizontal
long or under uncertainty regarding future data lines drawn through the existing customer lo-
or decisions. An alternative approach proposed in cations; and
[24] is a discounted present worth strategy which b) the optimal source locations lie in the convex
is appropriate whenever the foregoing conditions hull of the existing facility locations.
do not hold. In this case the facilities are being
Based on these ideas and by denoting k = 1 , . . . , K
located one per period and the decisions are made
the intersection grid points that also belong to the
in a rolling horizon framework.
convex hull of the existing facility locations, [21],
introduced the decision binary variables zik that
S o l u t i o n A p p r o a c h e s . For the uncapacitated take the value of 1 if source i is located at point
location-allocation problem using Euclidean met- k and zero otherwise. This leads to the following
ric for the distances between each facility and the discrete location-allocation problem:
different demand points, R.F. Love and H. Juel [15] n p K
showed that this problem is equivalent to a concave min Z ~ ~ CijkWijZik
minimization problem for which they used several i--1 j = l k = l
K
heuristic procedures. For the capacitated problems
s.t. ~ Zik - - 1 , i --1,... ,n,
assuming that the costs are proportional to Iq us-
k=l
ing lp distances where p >_ 1 and q > 1 are integers, P
M. Avriel [1] developed a geometric programming Z Wij -- Si,
j=l
i -- 1 , . . . , n,
approach. H.D. Sherali and C.M. Shetty [22] pro- n

posed a polar cutting plane algorithm for the case w~j-dj, j-1,...,p,
p - q - 1. For the case p - q - 2, Sherali and C.H. i=1
Tuncbilek [23] proposed a branch and bound algo- w~j>_O, i-1,...,n, j-1,...,p,
rithm (cf. M I N L P : B r a n c h a n d b o u n d m e t h - Zik -- O, 1, i--1,...,n,
ods; M I N L P : B r a n c h a n d b o u n d global op-
t i m i z a t i o n a l g o r i t h m ) that utilizes a specialized where Cijk -- cij[lak - a j [ + I~k --/3jl]" The above
model corresponds to a mixed integer bilinear pro-
tight, linear programming representation to calcu-
gramming problem. See [19] for a related version
late strong upper bounds via a Lagrangian relax-
of this discrete-site location-allocation problem
ation scheme. They exploit the special structure
involving one-to-one assignment restriction and
of the transportation constraints to derive a parti-
fixed charges. See [20] for the solution of the prob-
tioning scheme. Additional cut-set inequalities are
lem as a bilinear programming problem, since the
also incorporated to preserve partial solution.
binary variables z can be treated as positive vari-
For the uncapacitated location-allocation model
ables because of the problem structure that pre-
using rectilinear distance metric Love and J.G.
serves the binariness of z at optimality. However,
Morris [16] have developed an exact two-stage al-
in [21] it is proved that it is more useful to exploit
gorithm. R.E. Kuenne and R.M. Soland [14], have
the binary nature of z variables for the efficient so-
developed a branch and bound algorithm based on
lution of the above model. Before giving more de-
a constructive assignment of customers to sources.
tails of this proposed branch and bound based ap-
The capacitated problem has been addressed in
proach we should mention the heuristic approach
[19], [21] and utilize the discrete equivalence of the
proposed in [4], which is very widely used. This so-
capacitated location-allocation problem. In partic-
called alternating procedure exploited the funda-
ular, [8], and [26] showed that
mental concepts of the location-allocation problem
a) the optimal values of xi and Yi for each i and simply involves allocating demand to centers
must satisfy xi - a j for some j and yi = 3j and relocating centers until some convergence cri-
for some j, which means that the rectilin- terion is achieved. For the uncapacitated p-median


problem, the alternating procedure involves iterat- lower bounds via a suitable Lagrangian dual for-
ing through the following equations: mulation.
Briefly, for the location-allocation problems
~in=l O i ~ i j X i / C i j
= '
that have embedded spatial-interaction equations
dual-based exact methods, [17], and heuristic ap-
~in=l Oi)~ijYi/Cij
proaches, [2], have been developed.
YJ -- ~in=l Oi,'~ij/aij '

which are derived from differentiating the objec-


tive function with respect to xj and yj and setting
the partial derivatives to zero. The major draw-
back of this procedure is that it does not guarantee A p p l i c a t i o n : D e v e l o p m e n t of Offshore Oil
global optimality. This is in fact a concern because Fields. In this section a real world application of
the spatial configuration of the local and the global the location-allocation problem is presented con-
optimum may be very different. As a rule, repeated sidering the minimum-cost development of offshore
runs using numerous starting values should be un- oil fields, [6]. The facilities to be located are the
dertaken, although there is no guarantee that the platforms and the demands to be allocated are
repeatedly found solution would be the global opti- the oil wells. For the initial information about an
mum. Note however that the procedure is general oil field, locations are decided upon the produc-
to all different models of the location-allocation tion wells which are specified by two map coor-
problem. dinates and a depth coordinate. The drilling is
Returning to the approach proposed in [21] performed directionally from fixed platforms. The
for the case of rectilinear capacitated location- cost of drilling depends on the length and angle
allocation problem, the following linear reformu- of the well from the platform. The platform cost
lation of the problem is used: depends on the water depth and on the number of
n p K wells to be drilled from the platform. Consequently
min ~ Z ~ CijkXijk for a large number of wells (25 to 300) an optimi-
i=1 j = l k = l zation problem that arises is to find the number,
K
size and location of the platforms and the alloca-
s.t. ~ Xijk - wij -- O, V(i,j), tion of wells to platforms so as to minimize the
k=l
P sum of platform and drilling costs.
X i j k - 8iZik -- O, V(i, k), In order to formulate this problem the following
j=l indices, parameters and variables are introduced.
-- X i j k -Jr-Uij Zik ~_ O, V(i,j), Let m denote the number of wells and i the in-
K
dex of well, n the number of platforms and j the
~ Zik = 1, Vi,
index for platform, zij are then the binary vari-
k=l
P ables that represent the allocation of the well i
~ Wij -" 8i, Vi, to platform j if it takes the value of 1, otherwise
j=l it becomes 0, Sj the capacity of the platform j
n

wij - dj , Vj, representing the number of wells drilled from this


i=1 platform, (ai, bi) denote the location coordinates
wij > O, V(i,j), of well i and ( x j , y j ) the location of platform j,
..

- - + (yj - is horizontal
zik -- 0,1, Vi,
Euclidean distance between well i and platform j,
x jk > 0, v(i,j, k),
g(dij) denotes the drilling cost function that de-
where uij -- min{si, dj }. The above model cor- pends on distance dij, P ( S j , xj, yj) is the platform
responds to a mixed integer linear programming cost which is a function of platform size Sj and
problem for which a special branch and bound al- its location. Based on this notation the location-
gorithm is applied based on the derivation of tight allocation problem can be formulated as follows:


m n n m n n

rain E E E p(sj, xj, yj)


i=1 j=l j=l i=1 j=i j=i
n n

s.t. Z 1, v(i), s.t. Zz J = 1, v(i),


j=l j=l
m m

Z - sj, vj, Z - sj, vj,


i=1 i=1
zij - O, 1, V(i,j), zij - 0,1, V(i,j),

note that the platform cost now depends only on


Sj since the location of the platforms are known.
where the first set of constraints guarantee that The solution procedure for this problem depends
each well is assigned to exactly one platform and on the form of the platform cost P ( S j ) . Five dif-
the second set guarantee that exactly Sj wells are ferent forms are discussed in [6]"
assigned to each platform. Note that n is fixed in 1) Single fixed cost with no capacity constraints"
the problem and is usually small in the size of 3 P(Sj) - aj In this case the total cost for
to 5. The nature of the problem depends upon platforms is fixed and the optimal allocation
the form of the cost of the drilling function and corresponds to the assignment of the wells to
the platform cost function. The approach taken the closest platform.
in [6] is the alternating location-allocation method
2) Single fixed cost with capacity constraints"
presented in the previous section. For the specific
P ( S j ) = aj and capacity constraints are in-
problem the approach involves the following steps:
troduced as inequalities ~im__l zij ~_ Sj, Vj. In
this case the problem corresponds to a linear
a) given fixed platform locations find a mini- programming model.
mum cost allocation of wells to platforms; 3) Linear platform cost" P ( S j ) - aj + bjSj
By considering the following transformation
c~j - cij + bj the problem takes the form of
b) given fixed allocation of wells to platforms
case 1).
find the minimum total cost location for each
platform. 4) Piecewise linear function. In this case the
problem has the structure of 'transshipment
problem' which can be solved network flow
The procedure alternates between steps a) and b) techniques.
until convergence is achieved. The convergence cri-
terion is the following: From the solution of step a) 5) Step function • P ( S j ) - ~ k gj
= l rjk z ikj ~ where
a set of n subproblems are generated for each one K j are the number of different size platforms
of the platforms, the solution of these problems available and r k is the cost of kth size of plat-
result in the relocation of the platforms. The iter- form j. The problem in this case is a mixed
ations continue until no changes are possible. As integer linear programming problem.
mentioned above, the solution obtained from this The mathematical formulation for problem b),
algorithmic procedure is locally optimum in the the location problem, is the following. Assuming
sense that for a given assignment of wells to plat- that Aj is the set of indices for the wells assigned
forms the solution cannot be improved by chang- to platform j, then Zij 1, for i E Aj, zij - 0 oth-
- -

ing locations and for given locations, the so!ution erwise and the problem for platform j takes the
cannot be improved by altering the assignment of form:
wells to platforms. The mathematical formulation m

of problem a), the allocation subproblem is the fol- rain E E g(dij ) + P ( x j , yj ).


lowing: i= 1 iEAj

307
MINLP: Application in facility location-allocation

Note that the platform cost is a function of plat- n e t w o r k synthesis; M I N L P : Reactive dis-
form location only since the size is assumed known. tillation column synthesis; M I N L P : Design
Since the drilling cost function is convex, if the and scheduling of b a t c h processes; M I N L P :
platform cost is also convex then the problem cor- Applications in t h e i n t e r a c t i o n of design
responds to the minimization of a convex function and control; Generalized B e n d e r s decom-
that can be achieved through a local minimiza- position; M I N L P : Applications in blending
tion algorithm. Of course if the platform cost is and pooling problems.
nonconvex then global optimality cannot be guar-
anteed and global optimization techniques should References
be considered, [7]. [1] AVRIEL, M.: 'A geometric programming approach to
Finally, M.D. Devine and W.G. Lesso, [6], ap- the solution of locational problems', J. Reg. Sci. 20
plied the aforementioned procedure to two test (1980), 239-246.
[2] BEAUMONT, J.R.: 'Spatial interaction models and the
problems one involving 60 wells and 7 platforms
location-allocation problem', J. Reg. Sci. 20 (1980),
and a second one involving 102 wells and 3 plat- 37-50.
forms. In both cases they reported large economic [3] CAVALIER, T.M., AND SHERALI, H.D.: 'Sequential
savings in the field development. location-allocation problems on chains and trees with
See also: C o m b i n a t o r i a l o p t i m i z a t i o n al- probabilistic link demands', Math. Program. 32 (1985),
249-277.
g o r i t h m s in resource allocation problems;
[4] COOPER, L.: 'Heuristic methods for location-allocation
Optimizing facility location with rectilin- problems', SIAM Rev. 6 (1964), 37-53.
ear distances; Single facility location: Multi- [5] CRISTALLER, W.: Central places in southern Germany,
objective Euclidean distance location; Sin- Prentice-Hall, 1966.
gle facility location: M u l t i - o b j e c t i v e recti- [6] DEVINE, M.D., AND LESSO, W.G.: 'Models for the
minimum cost development of offshore oil fields', Man-
linear distance location; Single facility lo-
agem. Sci. 18 (1972), 378-387.
cation: Circle covering problem; Multifacil- [7] FLOUDAS, C.A.: 'Deterministic global optimization in
ity and r e s t r i c t e d location problems; Net- design, control, and computational chemistry', IMA
work location: Covering problems; Ware- Proc.: Large Scale Optimization with Applications.
house location problem; Facility location Part H: Optimal Design and Control 93 (1997), 129-
with externalities; P r o d u c t i o n - d i s t r i b u t i o n 184.
[8] FRANCIS, R.L., AND WHITE, J.A.: Facility layout and
s y s t e m design problem; Global optimiza-
location: An analytical approach, Prentice-Hall, 1974.
tion in W e b e r ' s p r o b l e m with a t t r a c t i o n [9] FRIEDRICH, C.J.: Alfred Weber's theory of the location
and repulsion; Facility location with stair- of industries, Univ. Chicago Press, 1929.
case costs; Stochastic t r a n s p o r t a t i o n and [1o] GETIS, A., AND GETIS, J.: 'Cristaller's central place
location problems; Facility location prob- theory', J. Geography 65 (1966), 200-226.
lems with spatial interaction; Voronoi di- [11] HUBBARD, M.J.: 'A review of selected factors condi-
tioning consumer travel behavior', J. Consumer Res. 5
a g r a m s in facility location; Resource allo- (1978), 1-21.
cation for epidemic control; C o m p e t i t i v e [12] IERAPETRITOU, M.G., ACEVEDO, J., AND PIS-
facility location; Chemical process plan- TIKOPOULOS, E.N.: 'An optimization approach for pro-
ning; Mixed integer linear p r o g r a m m i n g : cess engineering problems under uncertainty', Comput-
Mass and heat exchanger networks; Mixed ers Chem. Engin. 20 (1996), 703-709.
integer nonlinear p r o g r a m m i n g ; M I N L P "
[13] KOSHAKA, R.E.: 'A central-place model as a two-
level location-allocation system', Environm. Plan. 15
O u t e r a p p r o x i m a t i o n algorithm; General- (1983), 5-14.
ized o u t e r approximation; M I N L P : Gener- [14] KUENNE, R.E., AND SOLAND, R.M.: 'Exact and ap-
alized cross decomposition; E x t e n d e d cut- proximate solutions to the multisource Weber prob-
ting plane algorithm; M I N L P : Logic-based lem', Math. Program. 3 (1972), 193-209.
methods; M I N L P : B r a n c h and b o u n d m e t h - [15] LOVE, R.F., AND JUEL, H.: 'Properties and solu-
tion mathods for large location-allocation problems',
ods; M I N L P : B r a n c h and b o u n d global opti- J. Oper. Res. Soc. 33 (1982), 443-452.
mization algorithm; M I N L P : Global optimi- [16] LOVE, R.F., AND MORRIS, J.G.: 'A computational
zation with aBB; M I N L P : H e a t exchanger procedure for the exact solution of location-allocation

308
MINLP: Applications in blending and pooling problems

problems with rectangular distances', Naval Res. Lo- store the intermediate streams produced by vari-
gist. Quart. 22 (1975), 441-453. ous processes. Also, chemical products often need
[17] O'KELLY, M.: 'Spatial interaction based location- to be t r a n s p o r t e d as a mixture, either in a pipeline,
allocation models', in A. GHOSH AND G. RUSHTON
a t a n k car or a tanker. In each case, blended or
(eds.): Spatial Analysis and Location Allocation Mod-
els, v. Nostrand, 1987, pp. 302-326. pooled streams are t h e n used in further down-
[ls] SCOTT, A.J.: 'Dynamic location-allocation systems: s t r e a m processing. In modeling these processes, it
Some basic planning strategies', Environm. Plan. 3 is necessary to model not only p r o d u c t flows but
(1971), 73-82. the properties of intermediate streams as well. The
[19] SHERALI, A.D., AND ADAMS, W.P.: 'A decomposition
presence of these pools can introduce nonlineari-
algorithm for a discrete location-allocation problem',
Oper. Res. 32 (1984), 878-900. ties and nonconvexities in the model of the process,
[20] SHERALI, A.D., AND ALAMEDDINE, A.R.: 'A new resulting in difficult problems with multiple local
reformulation-linearization technique for the bilinear optima.
programming problems', J. Global Optim. 2 (1992),
379-410.
[21] SHERALI, A.D., RAMACHANDRAN,S., AND KIM, S.: 'A
localization and reformulation discrete programming

[22]
approach for the rectilinear discrete location-allocation
problem', Discrete Appl. Math. 49 (1994), 357-378.
SHERALI, A.D., AND SHETTY, C.M.: 'The rectilinear
A, ......... ...... PoOtS~j~~i:;;:~
s D,
"'-. 1 .-'""" .'""
distance location-allocation problem', AIIE Trans. 9
(1977), 136-143. Co ts
[23] SHERALI, A.D., AND TUNCBILEK, C.H.: 'A squared-
Euclidean distance location-allocation problem', Naval A~ O~
Res. Logist. 39 (1992), 447-469.
[24] SHERALI, H.D.: 'Capacitated, balanced, sequential
location-allocation problems on chain and trees', Math.
Program. 49 (1991), 381-396.
[25] SHERALI, H.D., AND NORDAI, F.L.: 'NP-hard, capac-
itated, balanced p-median problems on a chain graph
~ D~
with a continuum of link demands', Math. Oper. Res.
13 (1988), 32-49.
[26] WENDELL, R.E., AND HURTER, A.P.: 'Location the- Fig. 1: General pooling and blending problem.
ory, dominance and convexity', Oper. Res. 21 (1973),
314-320.
Marianthi Ierapetritou
Dept. Chemical and Biochemical Engin. Rutgers Univ.
98 Brett Road
Piscataway, NJ 08854, USA
E-mail address: mariemth@sol.rutgers.edu Given a set of c o m p o n e n t s i, a set of products
Christodoulos A. Floudas j, a set of pools k and a set of qualities l, let xil
Dept. Chemical Engin. Princeton Univ. be the a m o u n t of c o m p o n e n t i allocated to pool l,
Princeton, NJ 08544-5263, USA Ytj be the a m o u n t going from pool 1 to p r o d u c t j,
E-mail address: floudasOtitan, princeton, edu zij be the a m o u n t of c o m p o n e n t i going directly
MSC 2000:90C26 to p r o d u c t j and Ptk be the level of quality k in
Key words and phrases: MINLP, facility location- pool l. F u r t h e r m o r e , let Ai, D j and St be u p p e r
allocation. b o u n d s for c o m p o n e n t availabilities, p r o d u c t de-
m a n d s and pool sizes respectively, let Cik be the
level of quality k in c o m p o n e n t i, Pjk be upper
MINLP: A P P L I C A T I O N S IN B L E N D I N G b o u n d s on p r o d u c t qualities, ci be the unit price
A N D POOLING P R O B L E M S of c o m p o n e n t i and dj be the unit price of prod-
Pooling and blending is inherent in m a n y manu- uct j. T h e general pooling and blending model can
facturing plants with limited tankage available to then be w r i t t e n as [1]:

309
MINLP: Applications in blending and pooling problems

have values of 9 and 15, respectively. The mathe-


max - ~ ' cixil + E djy,j + ~-'~(dj - ci)zij
i ,l l ,j ~ ,j
matical model for the problem consists of writing
mass and sulfur balances for the various streams,
s.t. Exi, + Ezij ~_ ai
l j and can be formulated as

E ylJ + E z i J ~_ Dj
l i m&x 9. (Yll + z31) + 15. (Y12 + z32)
- 6 X l l - 13x21 - 10. (z31 + z32)
i j
s.t. x11+x21-Yll-Y12 =0
xil <_ Sl
i P ' Y l l + 2z31 - 2.5(yll + z31) _ 0
-- E
i
CikXil + Plk EJ YlJ - 0 P'Yl2 + 2z32 - 1.5(Y12 + z32) _ 0
P" (Yll + Y12) - 3Xll - x21 = 0
E ( P , k - Pjk )Y,j Yll + Z31 ~ 100
l
+ - Pjk)z j < o Y12 + Z32 ~ 200.
i
xil , YIj , zij , Plk ~_ O. The variable p represents the sulfur content of the
pool (and of yll and y12) and is determined as an
The first two sets of constraints ensure that the average of the sulfur contents of Xll and x21.
amount of components used and products made do
not exceed the respective availabilities or demands.
The third and fourth set of constraints are mate-
rial balance constraints around each pool, which
ensure that there is no accumulation or overflow of
material in the pools. The fifth set of constraints
relates the quality of each pool to the quality of
the components going into the pool (in this case,
the qualities are assumed to blend linearly, that ° ° . . . . . . -*-

is, the pool quality is an average of the qualities


of the components). Finally, the sixth set of equa- Fig. 2: Haverly pooling problem.
tions ensures that any upper bound specifications
on product qualities are met. These last two sets
of equations are bilinear, and can cause significant Characteristics of P o o l i n g and Blending
problems in solving these models. Problems.
The general blending problem has a similar for-
mulation as above, except that the pools need not Multiple Solutions. The presence of nonconvex
be present; the components can be blended di- constraints needed to define pool and product
rectly to make various products. It should also be qualities often results in multiple local solutions
noted that there are various other formulations in these models. For example, consider the opti-
possible, involving multiple time periods, tanks mal solution of the Haverly pooling problem as a
and inventories for components and products, and function of the pool qualityp, as shown in Fig. 3.
costs for pooling. Moreover, not all the compo- It can be seen that the problem has three solu-
nents need go through all pools. One example of a tions:
simplified pooling model, due to C.A. Haverly [8],
1) A local maximum of 125 at p - 2.5 with
[9], is given in Fig. 2, where three components with
Xll -- 75, X21 -- 25, Yll - 100 and all other
varying sulfur contents are to be blended to form
variables zero;
two products. There is a maximum sulfur restric-
tion on each product. The components have values 2) a saddle point region with 1 < p < 2, all flows
of 6, 13 and 10, respectively, while the products zero and profit of zero; and

310
MINLP: Applications in blending and pooling problems

3) a global maximum of 750 at p - 1.5 with Then, all specifications on the blend RVP can be
Xll - 50, x21 - 150, y12 - 200 and all other converted using the same index. For example, if
variables zero. there is a lower bound R L on the blend RVP, then
using the blending index results in the constraints
It is not uncommon for a large pooling problem
as:
to have many dozen local optima, with the objec-
tive function varying by small amounts but with
all the flow and quality variables taking on vastly
different values.
In some cases, the properties (such as octane num-
800.0
ber or pour point) can require complex blending
J rules which cannot be simplified using the blend-
ing index, and the full nonlinear blending equation
600.0 must be included in the model as is.

0 Single versus Multiperiod Models. Since compo-


~> nents are pooled or blended in the plants on a
-~ 400.0
E regular basis, it is often advantageous to model
0 these processes using multiple periods. With mul-
tiperiod models, it is possible to accumulate mate-
200.0
rial in the pools or blend tanks, thereby facilitat-
ing the allocation of stocks ahead of time in an-
ticipation of a future lifting of a valuable product.
0"01.0 ' 1.5 ' 210 2.5 3.0
This requires the model to incorporate inventories
P (carry-over stock) in each tank or pool, resulting in
more complex models. It is important to note that
each period does not need to be of the same dura-
tion. Often, the results of the multiperiod models
Fig. 3: Optimal solution to Haverly pooling problem.
will only be implemented for the first period, with
results for future periods being used for planning
Nonlinear Blending. For the sake of simplicity, it
purposes. Therefore, initial periods are typically of
is often assumed in formulating these models that
shorter duration (say a day each) while later pe-
the qualities to be tracked blend linearly by volume
riods might be as long as a month. This way, the
or weight of each component. In practice, however,
same multiperiod model can be used as an oper-
this is rarely the case. For example, one of the
ating tool for the present and a planning tool for
properties commonly tracked in refinery blends is
the future.
the Reid vapor pressure (RVP), which measures
the volatility of a blend. The most commonly used Another important consideration in multiperiod
blending rule for RVP is the Chevron method: models is the disposition of stocks at the end of
the final period. If the final inventories/stocks are
included simply as variables, the optimal solution
will almost always set them to zero. In practice,
• i
however, this is unrealistic since it is not desired
where ri is the RVP of component i, xi is its vol- to run down stocks. This can be dealt with in sev-
ume, and R is the RVP of the blend. Including eral ways:
such a nonlinear equation in the model can cause
a) set the final inventory levels to reasonable
difficulty in its solution. Fortunately, this can be
values (say the same as inventory levels at
avoided by introducing a blendin 9 index, defined
the beginning of the first period);
as
b) assign a value to final inventory; this way the
~ - - r i1.25, -

R-
- R1.25. model can decide if it is worthwhile to pro-

311
MINLP: Applications in blending and pooling problems

duce stock to sell at the end of the final pe- used extensively in the practical solution of these
riod. problems in industry.

Logical Constraints and M I N L P Formulations. It Complexity of Models. With the various options of
is often necessary to impose additional logical con- single versus multiperiod and linear versus nonlin-
straints that dictate how various components are ear blending, the models for pooling and blending
to be blended in relation to each other. Modeling can vary significantly in complexity. This is shown
such constraints often requires the addition of in- pictorially in Fig. 4.
teger variables, as discussed below.
~- g
a) If a component is to be used in a particu-
lar blend, then it must be present in at least
a certain amount in the blend. This arises
from the fact that it is usually not practical o
._N
to blend in infinitesimally small quantities.
If x represents the volume of such a compo- Simple Complex
Complexity of Process
nent, then introducing a new binary variable
5 (i.e. 5 is either 0 or 1) and the constraints Fig. 4: Types of pooling problems.
x- M 6 < 0,
x-mS>O S o l u t i o n M e t h o d s . Pooling problems can be
are sufficient to ensure this condition is sat- solved using a variety of solution algorithms. These
isfied. Here, M is a sufficiently large number, can be broadly classified as local and global solu-
while m represents the threshold value below tion methods.
which a component should not be blended in. Local Optimization Approaches. Traditionally,
b) Each product can have at most k compo- pooling and blending problems have been solved
nents in its blend. This is typically imposed using various recursion and successive linear pro-
by limitations on how many streams can be gramming (SLP) techniques. The first published
physically blended in a reasonable amount of approach for solving the pooling problem was due
time. Again, introducing the new variables to Haverly [8], who proposed the following recur-
and constraints as below" sion approach for solving the problem given in
xl - m61 > 0, Fig. 2:
° . o 1 Start with a guess for the pool quality p.
2 Solve the remaining linear problem for all other
Xn - - m~n > O, variables.
61 -{- " " " -~- 6n ~ k, 3 Calculate a new value for p from the solution in
2).
(~1,. • • ,(~n E { 0 - 1}n,
Unfortunately, this rather simple recursion will
ensures this condition is met. converge to a suboptimal solution regardless of
c) If component ,4 is to be present in the blend, the starting value for p. This can be partially
then component B must also be present: addressed by using a 'distributed recursion' ap-
xA -- m6A >_ O, proach, where an additional recursion coefficient
f and two additional 'correction vectors' are in-
x s -- m6B >_ 0,
troduced, modifying the inequalities in the model
(~S ~_ (~A. as follows:
Each of these logical constraints results in a mixed
P'Y11 + 2z31 - 2.5(yll + z31)
integer nonlinear programming (MINLP) model
(cf. also M i x e d i n t e g e r n o n l i n e a r p r o g r a m - + f (over - under) < 0,
ming). To date (2000), such models have not been P'y12 + 2z32 - 1.5(y12 + z32)

312
MINLP: Applications in blending and pooling problems

+(1 - f ) ( o v e r - under) < 0. (such as the pooling/blending problem). Surveys


of these algorithms can be found in [10], [12]. These
This formulation serves to distribute the error
approaches can generally be classified as either
made in estimating the pool quality to the two
pool destinations. Recursing on both p and f has decomposition-based or branch and bound algo-
a better likelihood of identifying the optimal solu- rithms.
tion. One of the common approaches to dealing with
SLP algorithms solve nonlinear models through the nonconvexities in the pooling problem is to
a sequence of linear programs (LPs), each of which reduce the bilinear terms to linear terms over a
is a linearized version of the model around some convex envelope [2]. Noting that for any bilinear
base point. These methods consist of replacing term p • y,
nonlinear constraints of the form (p_ pL). (y_ yL) > 0,
g(x) < O, h(x) = O, (p_ pu). (y_ yu) > 0,
with the linearizations (p_ pL). (y_ yu) ___0,
g(~k) + vg(~k), (x -~k) < 0, (p_ p~). (~ _ yL) < o,
h(-2k) + Vh(sk) • (x - 5k) _ 0 where ~gL,pv] and [yL, yV] define the ranges for
the variables p and y. This allows the term p . y
around a base point 5k at the kth iteration. The
to be replaced by a set of linear inequalities in
linearized problems can be solved using standard
the model, resulting in a linearized problem which
LP methods. The solution to the problem is used
provides an upper bound on the global solution to
to provide a value for ~k+l. As long as there is
the original problem. After solving this problem,
an improvement in the objective function value as
the rectangle defined by the bounds on p and y
well as the feasibility of the original constraints,
can be subdivided into smaller rectangles, and a
these methods can be shown to converge to a local
new linearized problem can be solved over each of
optimum. They work well for largely linear prob-
these subrectangles. By continuously subdividing
lems and have therefore found widespread use in
these rectangles, the upper bound can be made to
the refining industry for solving pooling, blending
asymptotically approach the global solution. See
and general refinery planning problems [4], [11].
[7] for the solution of several pooling problems us-
However, when there are nonlinear blending con-
ing this approach.
straints, the linearization in the SLP is often a bad
Note that the pooling problem is a partially lin-
approximation of the original problem, leading to
ear problem. That is, it can be formulated as
poor convergence rates and large solution times.
Pooling and blending problems can also be min c-rx
solved using other nonlinear programming (NLP) • ,~ (1)
s.t. A(p)x < b,
methods such as generalized reduced gradient, suc-
cessive quadratic programming or penalty function where p represents the pool quality and x repre-
methods. In general, these methods have not found sents all component flow rates. For such problems,
large acceptance in solving these problems, mainly decomposition approaches provide a natural solu-
due to difficulties with convergence and stability. tion mechanism. For a fixed value of p, this prob-
lem is linear, and provides an upper bound on the
Global Optimization Approaches. The recursive, global solution. The solution to this linear prob-
SLP and conventional NLP techniques all suf- lem (called the 'primal' problem) can be used to
fer from the drawback that the solution found is generate a Lagrange function of the form
highly dependent on the starting point, and in gen-
eral cannot guarantee convergence to the global
L(x,p) - c-rx + ~. ( A ( p ) x - b)
solution. In the last dozen years, numerous ap- where ~ represents the multipliers or marginal val-
proaches have been proposed for the solution of ues for the constraints from the primal problem.
quadratically constrained optimization problems Then, the 'dual' problem

313
MINLP: Applications in blending and pooling problems

rain cracking) are usually sent to common pools from


x,p,# (2) which finished products such as gasoline and diesel
s.t. # > L(x,p) oil are made. In both cases, it is important to know
various qualities of the stream coming out of the
provides an upper bound on the global solution.
pool (such as chemical compositions like sulfur or
Problem (2) contains bilinear terms of the form
physical properties such as vapor pressure).
A(p)x, which can be underestimated in a variety
In addition to refinery processes, blending is a
of ways. C.A. Floudas and V. Visweswaran [5], [6]
feature of various other manufacturing processes.
have developed the GOP algorithm based on this
These include
approach. By alternating between the primal prob-
lem and a series of relaxed dual problems (devel- • agriculture, where blending livestock feeds or
oped by successively partitioning the feasible re- fertilizers at minimum cost is very important;
gion), the GOP algorithm guarantees convergence • mining, where different ores are often mixed
to the global solution. In [13], [14], they show that to achieve a desired quality;
it is possible to develop properties that reduce • various aspects of food manufacturing; and
the number of relaxed dual problems that need to
• pulp and paper, involving blending of raw
be solved, thus speeding up the overall algorithm.
materials used to produce paper.
They also report the solution of numerous pooling
and blending problems using this approach. See also: C h e m i c a l process planning;
Instead of fixing p for the primal problem, it is M i x e d i n t e g e r linear p r o g r a m m i n g : M a s s
possible to solve (1) directly using local optimiza- a n d h e a t e x c h a n g e r n e t w o r k s ; M i x e d integer
tion techniques. For example, nonsmooth optimi- n o n l i n e a r p r o g r a m m i n g ; M I N L P : O u t e r ap-
zation techniques can be effective in finding local proximation algorithm; Generalized outer
solutions to these problems [1]. The dual problem a p p r o x i m a t i o n ; M I N L P : G e n e r a l i z e d cross
d e c o m p o s i t i o n ; E x t e n d e d c u t t i n g plane al-
can also be solved this way, with the region for p
gorithm; MINLP: Logic-based methods;
being refined by partitioning. See [1] for the so-
lution of several pooling problems using this ap- MINLP: Branch and bound methods;
proach. M I N L P : B r a n c h a n d b o u n d global o p t i m i -
z a t i o n a l g o r i t h m ; M I N L P : Global optimi-
It is important to note that these global opti-
z a t i o n w i t h ~BB; M I N L P : H e a t e x c h a n g e r
mization approaches (and others) for solving the
n e t w o r k synthesis; M I N L P : R e a c t i v e dis-
pooling problem can be computationally intensive.
tillation c o l u m n synthesis; M I N L P : Design
Invariably, a large number of subproblems need to
a n d s c h e d u l i n g of b a t c h processes; M I N L P :
be solved before convergence to a global solution
A p p l i c a t i o n s in t h e i n t e r a c t i o n of design
can be guaranteed. Because the subproblems are
a n d control; M I N L P : A p p l i c a t i o n in facil-
usually of the same structure, varying only slightly
ity l o c a t i o n - a l l o c a t i o n ; G e n e r a l i z e d B e n d e r s
in the data for the problems, they can be solved
decomposition.
in parallel. See [3] for an implementation of a dis-
tributed parallel version of the GOP algorithm and References
a successful application to solve pooling problems [1] A. BEN-TAL AND, G. EIGER, AND GERSHOVITZ, V.:
of medium size. 'Global minimization by reducing the duality gap',
Math. Program. 63 (1994), 193.
[2] AL-KHAYYAL, F.A., AND FALK, J.E.: 'Jointly con-
Applications. The most common application of strained biconvex programming', Math. Oper. Res. 8,
pooling and blending models is in the refining no. 2 (1983), 273.
and petrochemical industries. Crude oil from vari- [3] ANDROuLAKIS, I.P., VISWESWARAN, V., AND
FLOUDAS, C.A.: 'Distributed decomposition-based
ous sources is often brought into the refinery and
approaches in global optimization', in C.A. FLOUDAS
stored in common tanks before being processed AND P.M. PARDALOS (eds.): Proc. State of the Art
downstream. Similarly, intermediate streams from in Global Optimization: Computational Methods and
various refinery processes (alkylation, reforming, Applications, Kluwer Acad. Publ., 1996, pp. 285-301.

314
MINLP: Applications in the interaction o] design and control

[4] BAKER, T.E., AND LASDON, L.S.: 'Successive linear gineers develop and synthesize the structure of the
programming at Exxon', Managem. Sci. 31, no. 3 flowsheet and determine the operating parameters
(1994), 264. and steady-state operating conditions. Then, the
[5] FLOUDAS, C.A., AND VISWESWARAN,V.: 'A global op-
control engineer takes the fixed design and devel-
timization algorithm (GOP) for certain classes of non-
convex NLPs: I. Theory', Computers Chem. Engin. 14 ops a control system to maintain the system at
(1990), 1397. the desired specifications. During the first step, the
[6] FLOUDAS, C.A., AND VISWESWARAN, V.: 'A primal- dynamic operation of the process is generally not
relaxed dual global optimization approach', J. Optim. considered, and in the second step, changes to the
Th. Appl. 78, no. 2 (1993), 187.
flowsheet and operating conditions generally can
[7] FOULDS, L.R., HAUGLAND, D., AND JSRNSTEN, K.:
'A bilinear approach to the pooling problem', Chr. not be made.
Michelsen Inst. Working Paper 90, no. 3 (1990). Process design seeks to determine the arrange-
[8] HAVERLY, C.A.: 'Studies of the behaviour of recur- ment of processing units that will convert the given
sion for the pooling problem', A CM SIGMAP Bull. 25 raw materials into the desired products. The idea
(1978), 19.
is to develop a process flowsheet from the large
[9] HAVERLY, C.A.: 'Behaviour of recursion model-more
studies', ACM SIGMAP Bull. 26 (1979), 22. number of possible design alternatives. Numerous
[10] HORST, R., AND TUY, H.: Global optimization: Deter- process design methods and techniques exist for
ministic approaches, second ed., Springer, 1993. determining the best process flowsheet and oper-
[11] LASDON, L.S., WAREN, A.D., SARKAR, S., AND ating conditions. This best design is determined by
PALACIOS-GOMEZ, F.: 'Solving the pooling problem
optimizing some economic criteria and the quality
using generalized reduced gradient and successive lin-
ear programming algorithms', ACM SIGMAP Bull. 27 of the design is based on its economic value. Hence,
(1979), 9. the process is designed to operate at steady state
[12] PARDALOS, P.M., AND ROSEN, J.B.: Constrained and issues relating to the process dynamics, oper-
global optimization: Algorithms and applications, ability, and controllability are usually not consid-
Vol. 268 of Lecture Notes Computer Sci., Springer, ered.
1987.
[13] VISWESWARAN, V., AND FLOUDAS, C.A.: 'Computa- Once the process has been designed, the plans
tional results for an efficient implementation of the are handed over to the process control engineer
GOP algorithm and its variants', in I.E. GRoss- whose task is to ensure the stable dynamic per-
MANN (ed.): Global Optimization in Engineering De- formance of the process. The control engineer is
sign, Nonconvex Optim. Appl., Kluwer Acad. Publ., concerned with developing a control system which
1996, pp. 111-154.
maintains the operation of the process at the de-
[14] VISWESWARAN, V., AND FLOUDAS, C.A.: 'New for-
mulations and branching strategies for the GOP al- sired steady state in the presence ever-changing ex-
gorithm', in I.E. GROSSMANN(ed.): Global Optimiza- ternal influences. Issues such as disturbances, un-
tion in Engineering Design, Kluwer Acad. Publ., 1996, certainty, and changes in production rates must be
pp. 75-110. addressed so as to maintain product quality and
Viswanathan Visweswaran safe operation. By addressing the design and con-
SCA Technologies LLC trol sequentially, the inherent connection between
Pittsburgh, PA, USA the two is neglected. For instance, the steady-state
E-mail address: vishy, visweswaran@sca-tech, corn design of a process may appear to produce great
MSC2000: 90C90, 90C30 economic profits. However, unfavorable dynamic
Key words and phrases: pooling, blending, multiperiod op- operation may lead to a product which does not
timization. meet the required specifications. This may result
in an economic loss due to disposal or reworking
costs. Thus, a process design with good control-
M I N L P : APPLICATIONS IN THE INTER- lability aspects may have better economic value
ACTION OF DESIGN AND CONTROL that an economically optimal steady state design
In the development of a process, the steady state when the dynamic operation is considered. This
design aspects and dynamic operability issues are trade-off between the steady state design and the
usually handled sequentially. First, the design en- dynamic controllability motivates the treatment of

315
MINLP: Applications in the interaction of design and control

the issues simultaneously. is not clear.


There are additional incentives for employing a
simultaneous approach. Due to economic and en-
P r e v i o u s W o r k . In comparison to the amount of
vironmental reasons, the recent trend in process
research on the controllability measures, relatively
design has been towards more highly integrated
little work has been placed on methods for system-
process in terms of both material and energy flows.
atically determining the trade-offs between steady-
Processes are also required to operate under much
state economics and dynamic controllability. Al-
tighter operating conditions due to environmental
though economics continues to be the driving force
and safety issues. Both of these lead to designs
in the design of a process, there is no straightfor-
with increased dynamic interactions and processes
ward method for evaluating the economics of the
which are generally more difficult to control. Thus,
dynamic operation of the process. Several meth-
the dynamic operation of the process must be con-
ods have been proposed to address these issues. M.
sidered at the early stages of the design.
Morari and J.D. Perkins [14] discuss the concept
A systematic method for analyzing the inter-
of controllability and emphasize that the design of
action of design and control requires quantitative
a control system for a process is part of the overall
controllability measures of the process. Such mea-
design of the process. Noting that a great amount
sures have been derived to quantify certain quali-
of effort has been placed on the assessment of con-
tative concepts about the controllability of the pro-
trollability, particularly for linear dynamic models,
cess such as inversion, interaction effects, and di-
they indicate that very little has been published on
rectionality problems. A common measure for con-
algorithmic approaches for determination of pro-
trollability is the integral squared error (ISE) be-
cess designs where economics and controllability
tween outputs and their desired levels. Although
are traded off systematically.
it is easy to measure, it is not of direct inter- ObjectiveContours
est in practice. Other performance criteria such as
maximum deviation of output variables, maximum h1 ~ ~ ~ ~ ~ FunctionObjectiveImpr°ving
magnitude of control variables, or time to return
to steady state can also be used.
Most of the work in the development of control- Z2
lability measures has focused on linear dynamic
models. The control objective is the robust perfor-
mance of the process without any restrictions on
the controller structure [15]. One such measure is
the structured singular value, a, which indicates
the performance in the presence of uncertainty.
The condition number, -y, has been developed as
an indicator of closed-loop sensitivity to model er-
Fig. 1: Illustration of the back-off approach.
ror while the disturbance conditions number, 3'4,
indicates the sensitivity of the process to distur- In order to deal with the controllability issues
bances. The relative gain array (RGA), A, is used on a economic level, a back-off method was pre-
as an indicator of the relationship between control sented in [18] to determine the economic impact of
error and set point changes while the closed-loop disturbances on the system. The basic idea is to de-
disturbance gain (CLDG) is used to measure the termine the optimal steady-state operating point
relation between control error and disturbances. such that the feasible operation is maintained with
These measures have been used extensively in ap- respect to all constraints in the presence of un-
plications for controllability assessment; however, certainties and disturbances. This operating point
they can be misleading. While these indicators give is compared to the optimal steady-state operating
ideas as to the closed loop performance of the pro- point determined in the absence of disturbances.
cess, their impact on the economics of the process The economic penalty incurred by backing away

316
MINLP: Applications in the interaction of design and control

from the disturbances-free operating point to the The dynamic controllability is measured econom-
feasible operating point can be determined and ically by calculating the amount of material pro-
thus the cost of the disturbance can be evaluated. duced that is off-specification and on-specification.
This concept is illustrated in Fig. 1. Point A indi- The on-specification material leads to profits while
cates the nominal steady-state design, and point the off-spec material results in costs for reworking
B is the back-off point which corresponds to the or disposal.
design which will not violate the constraints hi A back-off technique was also developed in [1]
and h2 in the presence of uncertainties and distur- for the design of steady-state and open-loop dy-
bances. namic processes. Both uncertainties and distur-
The method is further developed in [17], where bances are considered for determining the amount
the control structure selection problem is analyzed. of back-off. In order to address the fact that back-
Perfect control assumptions are used along with a off approaches address the feasible operation and
linearized model to formulate a mixed integer lin- do not address controllability aspects, [5] intro-
ear program (MILP) where the integer variables duces a recovery factor which is defined as the ratio
indicate the pairings between the manipulated and of the amount of penalty recovered with control to
controlled variables. The back-off approach incor- the penalty with no control. This ratio is then used
porated the dynamic operation of the process into to rank different control strategies.
the design, but it only ensures the feasible opera- The advantage of the back-off approaches is that
tion of the process and does not directly address they determine the cost increase associated with
controllability aspects. moving to the back-off position which is attrib-
An approach for determining process designs uted to the uncertainties and disturbances. A lim-
which are both steady-state and operationally itation of this approach is that it can lead to rather
optimal was presented in [2]. The controllabil- conservative designs since the worst-case uncer-
ity of potential designs is evaluated along with tainty scenario is considered. Although the proba-
their economic performance by incorporating a bility of the worst-case uncertainty occurring may
model predictive control algorithm into the pro- not be high, this is the basis for the "final design.
cess design optimization algorithm. This coordi- Also, the method has not been applied to the de-
nated approach uses an objective function which sign/synthesis problem. A fixed design is consid-
is a weighted sum of economic and controllability ered and then the back-off is considered as a mod-
measures. ification of this design.
A multi-objective approach was proposed in [9], The optimal design of dynamic systems under
[10] to simultaneously consider both controllabil- uncertainty was addressed in [13]. Flexibility as-
ity and economic aspects of the design. This ap- pects as well as the control design were considered
proach incorporates both design and control as- simultaneously with the process design. The algo-
pects into a process synthesis framework where rithm is used to find the economic optimum which
the trade-offs between various open-loop control- satisfies all of the constraints for a given set of
lability measures and the economics of the process uncertainties and disturbances when the control
can be observed. The problem is formulated as a system is included.
mixed integer nonlinear program (MINLP), where S. Walsh and Perkins [23] outline the use of op-
integer variables are utilized for structural alter- timization as a tool for the design/control prob-
natives in the process flowsheet. Through the ap- lem. They note that the advances in computa-
plication of multi-objective techniques, a process tional hardware and optimization tools have made
design which is both economic and controllable is it possible to solve the complex problems that arise
determined. in design/control. Their assessment focuses on the
A screening approach was proposed in [4], where control structure selection problem where the eco-
the variability in the product quality is used nomic cost of a disturbance is balanced against the
to compare different steady-state process designs. performance of the controller.

317
MINLP: Applications in the interaction o/ design and control

The increasing importance of design and control are used to represent structural alternatives such
issues had lead to more and more discussion on the as the existence of process units. The modeling of
topic. One contribution to the area has been [11]. steady-state processes leads to algebraic equations
The fundamental design and control concepts are and constraints and results in an MINLP. When
described and several quantitative examples are dynamic models are to be used, the continuous
given which illustrate the interaction of design and variables are partitioned into dynamic state vari-
control. ables, control variables, and time invariant vari-
Most of the previous work does not address ables, and the resulting formulation is classified as
synthesis issues and does not treat the problem a mixed integer optimal control problem (MIOCP).
quantitatively. Two methods employ the optimi-
zation approach in process synthesis to arrive S t e a d y - S t a t e M o d e l i n g A p p r o a c h . This ap-
at mathematical programming formulations which proach was outlined in [9], [10] and follows the op-
are solved to determine the trade-offs between the timization approach for process synthesis. A sys-
steady-state design and dynamic controllability. tematic procedure is presented for incorporating
The first method [9], [10] uses steady state linear open-loop steady-state controllability measures
controllability measures while the second method into the process synthesis problem. The problem
[20] uses full nonlinear dynamic models of the pro- is formulated mathematically as a MINLP and a
cess. multi-objective optimization problem is solved to
quantitatively determine the best-compromise so-
P r o c e s s Synthesis. Mathematical programming lution among the economic and control objectives.
has been found to be a very useful tool for process The c-constraint method is used to determine the
synthesis. Its application in analyzing the inter- nonin/erior solution set where one objective can
action of design and control has followed directly be improved only at the expense of another, and
along the process synthesis methodology. the best-compromise solution is determined using
The goal in process synthesis to determine the a cutting plane algorithm.
structure and operating conditions of the process In order to apply the process synthesis ap-
flowsheet. The optimization approach to the syn- proach, the controllability measure must be ex-
thesis problem involves three steps: pressed as a function of the unknown design pa-
1) The representation of process design alterna- rameters. Steady-state controllability measures are
tives of interest through a process superstruc- used to simplify the problem and reduce imple-
ture. mentation difficulties that arise when considering
2) The mathematical modeling of the super- controllability measures as functions of frequency.
structure. The steady-state gains of the process can be writ-
ten in an analytical form thus allowing for an al-
3) The algorithmic development of solution pro-
gebraic representation.
cedure to extract the optimal process flow-
The starting point for the controllability anal-
sheet from the superstructure and solution
ysis is the linear multiple input/multiple output
of the optimization problem.
system written in the Laplace domain as
The key aspect is the postulation of a superstruc-
ture which contains all possible design alternatives z(s) = G ( s ) u ( s ) + Gd(s)d(s),
of interest. The superstructure must be sufficiently
where z are the output variables, u are the control
rich so as to include the numerous design possibili-
variables, G(s) is the process transfer function ma-
ties yet succinct enough to eliminate redundancies
trix, and Gd(s) is the disturbance transfer function
and reduce complexities.
matrix.
The mathematical model is characterized by the
Closed-loop control can be considered by ex-
variables and equations used in the model. Con-
pressing the control variable u(s) as
tinuous variables are used to represent flowrates,
compositions, temperatures, etc. Binary variables u(s) = G~(s)(~.*(s) - ~.(s)),

318
MINLP: Applications in the interaction of design and control

where Gc(s) is the controller transfer function and


z* is the desired set-point. This requires that the
form of controller transfer function be known as 6
well as the method for calculating the parame-
ters. Since this causes problems in the formulation
of the optimization problem, the controllability is
viewed as a property inherent to the process and
independent of the particular control system de-
sign. The analysis thus considers only the open-
loop controllability measures which depend only
on the process itself.
Since both the process design and controllability Noninferior Solution Set
measures can be expressed as functions of the un-
known design parameters, the synthesis problem
can be expressed as a multi-objective MINLP:
fl* 6
min J(x, y)
Fig. 2: Noninferior solution set for a problem with two
s.t. h(x, y) = 0
objectives.
g(x, y) = 0
By reducing the problem to a single objec-
•/ -- h ( x , y)
tive problem, MINLP optimization techniques can
xEXCR p be applied to solve the problem. These MINLP
y e {0, 1}q. techniques include generalized Benders decompo-
sition (GBD)[7], [19], outer approximation (OA)
In this formulation, J is a vector of objectives
[3], outer approximation with equality relaxation
which includes both the economic objectives and
(OA/ER) [8], and outer approximation with equal-
controllability objectives. The expressions h and
ity relaxation and augmented penalty [22]. These
g represent material and energy balances, ther-
are discussed in detail in [6].
modynamic relations, and other constraints. The
Once the noninferior solution set is determined,
controllability measures are included in the for-
the best compromise solution is determined by ap-
mulation as r/. The variables in this problem are
plying a cutting plane algorithm. The trade-offs
partitioned as continuous x and binary y.
among the objectives are quantitatively assessed
The problem is posed with multiple objectives
using weight factors which come from the slope of
representing the competing economic and open-
the noninferior solution set.
loop controllability measures. Different techniques
have been developed in order to assess the trade-
offs among the objectives quantitatively. In this D y n a m i c M o d e l i n g A p p r o a c h . The major lim-
approach, the noninferior solution set is generated itation of the above approach is that is does not
to determine the set of solutions in which one ob- consider the dynamic behavior of the process. This
jective can be improved only at the expense of the approach considers the full dynamic model of the
other(s). The noninferior solution set for a two ob- process and a dynamic controllability measure. An
jective problem is visually depicted in Fig. 2. optimization approach is applied which involves a
This noninferior solution set is generated using dynamic optimization problem.
an e-constraint method where one objective is op- One of the initial difficulties with this method
timized and the others are included as constraints is defining a controllability measure for nonlinear
less than a parameter e. The problem is reduced dynamic systems. As in the previous method, the
to a single objective optimization problem which controllability measure must be capable of being
is iteratively solved for varying values of e to gen- expressed as a function of the unknown design pa-
erate the noninferior solution set. rameters. One possible choice for the controllabil-

319
MINLP: Applications in the interaction of design and control

ity measure is the integral square error (ISE). The point constraints where ti represents the time in-
benefit of this measure is that it is easy to calcu- stance at which the constraint is enforced and h"
late and and does reflect the dynamics of the pro- and g" are general constraints. The objective func-
cess albeit only in the outputs of the process. One tions for the economic and controllability measures
downside of this measure is that there is no one to are represented by the vector J.
one correspondence between the the control struc- The initial condition for the above system is de-
ture and the ISE measure. Thus, different dynamic termined by specifying n of the 2n + m variables
characteristics of the process may not be reflected zl(t0), il(t0), z2(t0). For DAE systems with index
in the ISE. 0 or 1, the remaining n + m values can be deter-
The superstructure is the same as in the previ- mined. In this work, DAE systems of index 0 or 1
ous approach, but a dynamic model is used instead are considered and the initial conditions for zl(t)
of a steady-state model. The dynamic modeling and z2(t) are z ° and z ° respectively.
of the superstructure leads to a problem that in- Note that in this general formulation, the y vari-
cludes differential and algebraic equations (DAEs) ables appear in the DAE system as well as in the
and the formulation is a multi-objective MIOCP. point constraints and general constraints. This has
New algorithmic techniques must be developed for implications on the solution strategy.
the solution of the formulation. A similar approach to that of the previous ap-
The general formulation for the multi-objective proach is applied to address the multi-objective
MIOCP is as follows: nature of the problem. An e-constraint method is
min J(zl(ti),zl(ti),z2(ti),u(ti),x,y) applied to reduce to problem to an iterative solu-
tion of single objective MIOCPs.
s.t. fl (zl (t), Zl (t), z2 (t), u(t), x, y, t) = 0
f2(zl (t), z2(t), u(t), x, y, t) = 0 M I O C P S o l u t i o n A l g o r i t h m . The strategy for
z (t0) - solving the MIOCP is to apply iterative decom-
(t0) - position strategies similar to existing MINLP al-
gorithms with extensions for handling the DAE
h'(zl(ti),zl(ti),z2(ti), u ( t i ) , x , y ) = 0
system. The algorithm developed for the solu-
g'(zl(ti),zl(ti),z2(ti), u ( t i ) , x , y ) _< 0 tion of the MIOCP closely parallels existing al-
h"(x, y) -- 0 gorithms for MINLP optimization (GBD, OA,
g"(x, y) < 0 OA/ER, O A / E R / A P ) . The presence of the y vari-
xEXCR p ables in DAE system for the general case prohibits
the use of Outer Approximation and its variants.
y E {0, 1} q
For the special cases where the y variables do not
e It0, tN]
appear in the DAEs and do participate in a linear
i=O,...,N. and separable fashion, outer approximation and its
(1) variants can be applied to the problem. The GBD
Here, zl(t) is a vector of n dynamic variables algorithm can be applied to the solution of the
whose time derivatives, zl(t), appear explicitly, general problem, and the algorithmic development
and z2(t) is a vector of m dynamic variables whose closely follows those of GBD.
time derivatives do not appear explicitly, x is a The GBD algorithm is an iterative procedure
vector of p time invariant continuous variables, y which generates upper and lower bounds on the
is a vector of q binary variables, and u(t) is a vector solution of the MINLP formulation. The upper
of r control variables. Time t is the independent bound results from the solution of an NLP pri-
variable for the DAE system where to is the fixed mal problem and the lower bound from an MILP
initial time, ti are time instances, and tN is the master problem. The bounds on the solution con-
final time. The DAE system is represented by fl, verge in a finite number of iterations to yield the
the n differential equations, and f2, the m dynamic solution to the MINLP model. A similar method-
algebraic equations. The constraints h ~ and g~ are ology is applied to the MIOCP problem, but the

320
MINLP: Applications in the interaction of design and control

A similar methodology is applied to the MIOCP problem, but the forms of the primal and master problems have to be altered.

Primal Problem. The primal problem is obtained by fixing the y variables, which leads to an optimal control problem. For fixed values of y = y^k, the MIOCP has the following form:

   min  J(ż1(ti), z1(ti), z2(ti), u(ti), x, y^k)
   s.t. f1(ż1(t), z1(t), z2(t), u(t), x, y^k, t) = 0
        f2(z1(t), z2(t), u(t), x, y^k, t) = 0
        z1(t0) = z1^0
        ż1(t0) = ż1^0
        h'(ż1(ti), z1(ti), z2(ti), u(ti), x, y^k) = 0
        g'(ż1(ti), z1(ti), z2(ti), u(ti), x, y^k) ≤ 0
        h''(x, y^k) = 0
        g''(x, y^k) ≤ 0
        x ∈ X ⊂ R^p
        ti ∈ [t0, tN]
        i = 0, ..., N.                                        (2)

The solution of this optimal control problem can be handled in several ways: complete discretization, solution of the necessary conditions, dynamic programming, and control parameterization. This work focuses on the control parameterization techniques, which parameterize only the control variables u(t) in terms of time invariant parameters. At each step of the optimization procedure, the DAEs are solved for given values of the decision variables and a feasible path for z(t) is obtained. This solution is used to evaluate the objective function and remaining constraints. The control parameterization can either be open loop as described in [21] or closed-loop such as that described in [17] and [16], which also includes the control structure selection.

The basic idea behind the control parameterization is to express the control variables u(t) as functions of time invariant parameters. This parameterization can be done in terms of the independent variable t (open loop):

   u(t) = φ(w, t).

Alternatively, the parameterization can be done in terms of the state variables z(t) (closed-loop):

   u(t) = φ(w, z1(t), z2(t)).

In both cases, w are the time invariant control parameters. The set of time invariant parameters, x, is now expanded to include the control parameters:

   x̃ = {x, w}.

The set of DAEs (f) is expanded to include parameterization functions

   f̃(·) = {f(·), φ(·)}

and the control variables are converted to dynamic state variables:

   z̃ = {z, u}.

Through the application of the control parameterization, the control variables are effectively removed from the problem and the following problem results:

   min  J(ż1(ti), z1(ti), z2(ti), x, y^k)
   s.t. f1(ż1(t), z1(t), z2(t), x, y^k, t) = 0
        f2(z1(t), z2(t), x, y^k, t) = 0
        z1(t0) = z1^0
        ż1(t0) = ż1^0
        h'(ż1(ti), z1(ti), z2(ti), x, y^k) = 0
        g'(ż1(ti), z1(ti), z2(ti), x, y^k) ≤ 0
        h''(x, y^k) = 0
        g''(x, y^k) ≤ 0
        x ∈ X ⊂ R^p
        ti ∈ [t0, tN]
        i = 0, ..., N.                                        (3)

This problem is a nonlinear program with differential and algebraic constraints (NLP/DAE). This problem is solved using a parametric method where the DAE system is solved as a function of the x variables. The solution of the DAE system is achieved through an integration routine which returns the values of the z variables at the time instances, z(ti), along with their sensitivities with respect to the parameters, dz/dx(ti).
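A minimal Python sketch of the open-loop idea follows. It assumes a piecewise-constant parameterization u(t) = φ(w, t) and uses a toy first-order ODE in place of the full DAE system (f1, f2); the names phi and simulate are illustrative, not part of the original formulation.

import numpy as np
from scipy.integrate import solve_ivp

def phi(w, t, t_grid):
    # Piecewise-constant control: w[k] is the control level held on [t_k, t_{k+1}).
    k = min(np.searchsorted(t_grid, t, side="right") - 1, len(w) - 1)
    return w[max(k, 0)]

def simulate(w, t_grid, z0, t_eval):
    # Integrate the (toy) dynamic model for the parameterized control and
    # return the state values at the requested time instances t_i.
    def rhs(t, z):
        u = phi(w, t, t_grid)
        return -z + u            # toy first-order model standing in for f1, f2
    sol = solve_ivp(rhs, (t_eval[0], t_eval[-1]), z0, t_eval=t_eval, rtol=1e-8)
    return sol.y

# Example: three control levels on [0, 3]; the state is sampled at the instances t_i.
t_grid = np.array([0.0, 1.0, 2.0, 3.0])
w = np.array([1.0, 0.5, 0.0])
z_at_ti = simulate(w, t_grid, z0=[0.0], t_eval=np.linspace(0.0, 3.0, 7))

Once u(t) is fixed by w, the dynamic problem reduces to a finite-dimensional problem in the parameters, which is the step exploited below.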


The resulting problem is an NLP optimization over the space of x variables which has the form:

   min  J(ż1(ti), z1(ti), z2(ti), x, y^k)
   s.t. h'(ż1(ti), z1(ti), z2(ti), x, y^k) = 0
        g'(ż1(ti), z1(ti), z2(ti), x, y^k) ≤ 0
        h''(x, y^k) = 0
        g''(x, y^k) ≤ 0
        x ∈ X
        ti ∈ [t0, tN]
        i = 0, ..., N,                                        (4)

where the variables ż1(ti), z1(ti), and z2(ti) are determined through the solution of the DAE system by integration:

   f1(ż1(t), z1(t), z2(t), x, y^k, t) = 0,
   f2(z1(t), z2(t), x, y^k, t) = 0,
   z1(t0) = z1^0,                                             (5)
   ż1(t0) = ż1^0.

The functions J(·), g'(·), and h'(·) are functions of z(ti), which are implicit functions of the x variables through the integration of the DAE system. For the solution of the NLP the objective and constraint evaluations, along with their gradients with respect to x, are required. These are evaluated directly for the constraints g''(x) and h''(x). However, for the functions J(·), g'(·), and h'(·), the values z(ti), and the gradients dz/dx(ti), as returned from the integration, are used. The functions J(·), g'(·), and h'(·) are evaluated directly and the gradients dJ/dx, dg'/dx, and dh'/dx are evaluated by using the chain rule:

   dJ/dx  = ∂J/∂x  + (∂J/∂z)(dz/dx),
   dg'/dx = ∂g'/∂x + (∂g'/∂z)(dz/dx),                         (6)
   dh'/dx = ∂h'/∂x + (∂h'/∂z)(dz/dx).

Standard gradient based optimization techniques can be applied to solve this problem as an NLP. The solution of this problem provides values of the x variables and trajectories for z(t).

The master problem is formulated using dual information and the solution of the primal problem. Provided that the y variables participate linearly, the problem is an MILP whose solution provides a lower bound and y variables for the next primal problem. Dual information is required from all of the constraints including the DAEs, whose dual variables, or adjoint variables, are dynamic. The constraints and their corresponding dual variables are listed in Table 1.

   constraint   dual variable
   f1           ν1(t)
   f2           ν2(t)
   h'           λ'
   g'           μ'
   h''          λ''
   g''          μ''

Table 1: Constraints and their corresponding dual variables.

The dual variables μ', λ', μ'', and λ'' are generally obtained from the solution technique for the primal problem. Dual information from the DAE system is obtained by solving the adjoint problem for the DAE system, which has the following formulation:

   d/dt [ ν1ᵀ (df1/dż1) ] − ν1ᵀ (df1/dz1) − ν2ᵀ (df2/dz1) = 0,
   ν1ᵀ (df1/dz2) + ν2ᵀ (df2/dz2) = 0.                         (7)

This is a set of DAEs where the solutions for df1/dż1, df1/dz1, df2/dz1, df1/dz2, and df2/dz2 are known functions of time obtained from the solution of the primal problem. The variables ν1(t) and ν2(t) are the adjoint variables and the solution of this problem is a backward integration in time with final time conditions built from the end-point functions,

   ν1ᵀ (df1/dż1) |_{tN} = [ −dJ/dz1 + λ'ᵀ (dh'/dz1) + μ'ᵀ (dg'/dz1) ] |_{tN}.

Thus, the Lagrange multipliers for the end-time constraints are used as the final time conditions for the adjoint problem and are not included in the master problem formulation. The master problem is formulated using the solution of the primal problem, x^k and z^k(t), along with the dual information, μ''^k, λ''^k, and ν^k(t).
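The chain-rule assembly in (6) amounts to a small linear-algebra step once the integrator has returned z(ti) and dz/dx(ti). The sketch below is illustrative only; the array shapes and variable names are assumptions.

import numpy as np

def total_gradient(dF_dx, dF_dz, dz_dx):
    # dF/dx = (explicit part) + (dF/dz) @ (dz/dx) at one time instance t_i.
    return dF_dx + dF_dz @ dz_dx

# Example with 3 parameters x and 2 states z at some t_i:
dJ_dx = np.array([0.1, 0.0, -0.2])        # explicit part of the derivative
dJ_dz = np.array([1.0, 0.5])              # derivative with respect to the states
dz_dx = np.array([[0.3, 0.0, 0.1],        # sensitivities dz/dx(t_i) from the integrator
                  [0.0, 0.2, 0.4]])
grad_J = total_gradient(dJ_dx, dJ_dz, dz_dx)   # length-3 gradient for the NLP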


The master problem has the following form:

   min   μb
   y, μb

   s.t.  μb ≥ J(x^k, y)
               + ∫_{t0}^{tN} ν1^k(t) f1(ż1^k(t), z1^k(t), z2^k(t), x^k, y, t) dt
               + ∫_{t0}^{tN} ν2^k(t) f2(z1^k(t), z2^k(t), x^k, y, t) dt
               + μ''^k g''(x^k, y) + λ''^k h''(x^k, y),        k ∈ K_feas,

         0 ≥ ∫_{t0}^{tN} ν1^k(t) f1(ż1^k(t), z1^k(t), z2^k(t), x^k, y, t) dt
               + ∫_{t0}^{tN} ν2^k(t) f2(z1^k(t), z2^k(t), x^k, y, t) dt
               + μ''^k g''(x^k, y) + λ''^k h''(x^k, y),        k ∈ K_infeas,

         y ∈ {0, 1}^q.                                         (8)

The integral term can be evaluated since the profiles for z^k(t) and u^k(t) both are fixed and known. Note that this formulation has no restrictions on whether or not the y variables participate in the DAE system.

Example: Reactor-Separator-Recycle System. The example problem considered here is the design of a process involving a reaction step, a separation step, and a recycle loop. Fresh feed containing A and B flows into an isothermal reactor where the first order irreversible reaction A → B takes place. The product from the reactor is sent to a distillation column where the unreacted A is separated from the product B and sent back to the reactor. The superstructure is shown in Fig. 3.

Fig. 3: Superstructure for reactor-separator-recycle system.

The model equations for the reactor (CSTR) and the separator (ideal binary distillation column) can be found in [12]. The specific problem design follows the work in [10].

For this problem, the single output is the product composition. The bottoms (product) composition is controlled by the vapor boil-up and the distillate composition is controlled by the reflux rate. Since only the product composition is specified, the distillate composition set-point is free and left to be determined through the optimization.

Fig. 4: Noninferior solution set for the reactor-separator-recycle system.

The cost function includes column and reactor capital and utility costs:

   cost_reactor = 17639 D_r^1.066 (2 D_r)^0.802,
   cost_column = 6802 D_c^1.066 (2.4 N_t)^0.802 + 548.8 D_c^1.55 N_t,
   cost_exchangers = 193023 V_s^0.65,
   cost_utilities = 72420 V_s,
   cost_total = (cost_reactor + cost_column + cost_exchangers) / β_pay + β_tax · cost_utilities.
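For reference, the cost model above can be evaluated as a single function. The Python sketch below simply transcribes the correlations as printed; the numerical coefficients and exponents are taken from the scanned text and should be treated as indicative, and beta_pay and beta_tax denote the payback and tax factors appearing in the total-cost expression.

def total_cost(Dr, Dc, Nt, Vs, beta_pay, beta_tax):
    # Capital and utility cost correlations for the reactor-separator-recycle
    # example, transcribed from the expressions above (coefficients indicative).
    cost_reactor = 17639.0 * Dr**1.066 * (2.0 * Dr)**0.802
    cost_column = 6802.0 * Dc**1.066 * (2.4 * Nt)**0.802 + 548.8 * Dc**1.55 * Nt
    cost_exchangers = 193023.0 * Vs**0.65
    cost_utilities = 72420.0 * Vs
    capital = cost_reactor + cost_column + cost_exchangers
    return capital / beta_pay + beta_tax * cost_utilities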


The controllability measure is the time weighted ISE for the product composition:

   d(ISE)/dt = t (x_B − x_B^set)^2.

The noninferior solution set is shown in Fig. 4, and Table 2 lists the solution information for three of the designs in the noninferior solution set. The dynamic profiles for these three designs are shown in Fig. 5.

   Solution         A         B         C
   Cost ($)     489,000   534,000   736,000
   Capital ($)  321,000   364,000   726,000
   Utility ($)  168,000   170,000    10,000
   ISE           0.0160   0.00379    0.0011
   Trays             19         8         1
   Feed              19         8         1
   Vr (kmol)     2057.9    3601.2     15000
   V (kmol/hr)   138.94    141.25    85.473
   Kv             90.94     80.68     87.40
   τv (hr)        0.295    0.0898    0.0156

Table 2: Solution information for three designs.

All of the designs in the noninferior solution set are strippers. Since the feed enters at the top of the column, there is no reflux and thus no control loop for the distillate composition. The controllability of the process is increased by increasing the size of the reactor and decreasing the size of the column. The most controllable design has a large reactor and a single flash unit.

Fig. 5: Dynamic responses of product compositions for three designs.

See also: Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane algorithm; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with aBB; MINLP: Heat exchanger network synthesis; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; Generalized Benders decomposition; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems; Optimal control of a flexible arm; Dynamic programming: Continuous-time optimal control; Hamilton-Jacobi-Bellman equation; Dynamic programming: Optimal control applications; Multi-objective optimization: Interaction of design and control; Sequential quadratic programming: Interior point methods for distributed optimal control problems; Robust control; Robust control: Schur stability of polytopes of polynomials; Semi-infinite programming and control problems; Dynamic programming and Newton's method in unconstrained optimal control; Duality in optimal control with first order differential equations; Infinite horizon control and dynamic games; Control vector iteration; Suboptimal control.

References
[1] BAHRI, P.A., BANDONI, J.A., AND ROMAGNOLI, J.A.: 'Effect of disturbances in optimizing control: Steady-state open-loop backoff problem', AIChE J. 42, no. 4 (1996), 983-994.
[2] BRENGEL, D.D., AND SEIDER, W.D.: 'Coordinated design and control optimization of nonlinear processes', Computers Chem. Engin. 16, no. 9 (1992), 861-886.
[3] DURAN, M.A., AND GROSSMANN, I.E.: 'An outer-approximation algorithm for a class of mixed-integer nonlinear programs', Math. Program. 36 (1986), 307-339.
[4] ELLIOTT, T.R., AND LUYBEN, W.L.: 'Capacity-based approach for the quantitative assessment of process controllability during the conceptual design stage', Industr. Engin. Chem. Res. 34 (1995), 3907-3915.
[5] FIGUEROA, J.L., BAHRI, P.A., BANDONI, J.A., AND ROMAGNOLI, J.A.: 'Economic impact of disturbances and uncertain parameters in chemical processes - A dynamic back-off analysis', Computers Chem. Engin. 20, no. 4 (1996), 453-461.


[6] FLOUDAS, C.A.: Nonlinear and mixed integer optimization: Fundamentals and applications, Oxford Univ. Press, 1995.
[7] GEOFFRION, A.M.: 'Generalized Benders decomposition', J. Optim. Th. Appl. 10, no. 4 (1972), 237-260.
[8] KOCIS, G.R., AND GROSSMANN, I.E.: 'Relaxation strategy for the structural optimization of process flow sheets', Industr. Engin. Chem. Res. 26, no. 9 (1987), 1869.
[9] LUYBEN, M.L., AND FLOUDAS, C.A.: 'Analyzing the interaction of design and control-1. A multiobjective framework and application to binary distillation synthesis', Computers Chem. Engin. 18, no. 10 (1994), 933-969.
[10] LUYBEN, M.L., AND FLOUDAS, C.A.: 'Analyzing the interaction of design and control-2. Reactor-separator-recycle system', Computers Chem. Engin. 18, no. 10 (1994), 971-994.
[11] LUYBEN, M.L., AND LUYBEN, W.L.: Essentials of process control, McGraw-Hill, 1997.
[12] LUYBEN, W.L.: Process modeling, simulation, and control for chemical engineers, second ed., McGraw-Hill, 1990.
[13] MOHIDEEN, M.J., PERKINS, J.D., AND PISTIKOPOULOS, E.N.: 'Optimal design of dynamic systems under uncertainty', AIChE J. 42, no. 8 (1996), 2251-2272.
[14] MORARI, M., AND PERKINS, J.: 'Design for operations': FOCAPD Conf. Proc., 1994.
[15] MORARI, M., AND ZAFIRIOU, E.: Robust process control, Prentice-Hall, 1989.
[16] NARRAWAY, L.T., AND PERKINS, J.D.: 'Selection of control structure based on economics', Computers Chem. Engin. 18 (1993), S511-515.
[17] NARRAWAY, L.T., AND PERKINS, J.D.: 'Selection of process control structure based on linear dynamic economics', Industr. Engin. Chem. Res. 32 (1993), 2681-2692.
[18] NARRAWAY, L.T., PERKINS, J.D., AND BARTON, G.W.: 'Interaction between process design and process control: Economic analysis of process dynamics', J. Process Control 1 (1991), 243-250.
[19] PAULES IV, G.E., AND FLOUDAS, C.A.: 'APROS: Algorithmic development methodology for discrete-continuous optimization problems', Oper. Res. 37, no. 6 (1989), 902-915.
[20] SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'Interaction of design and control: Optimization with dynamic models', in W.W. HAGER AND P.M. PARDALOS (eds.): Optimal Control: Theory, Algorithms, and Applications, Kluwer Acad. Publ., 1997, pp. 388-435.
[21] VASSILIADIS, V.S., SARGENT, R.W.H., AND PANTELIDES, C.C.: 'Solution of a class of multistage dynamic optimization problems 1. Problems without path constraints', Industr. Engin. Chem. Res. 33 (1994), 2111-2122.
[22] VISWANATHAN, J., AND GROSSMANN, I.E.: 'A combined penalty function and outer approximation method for MINLP optimization', Computers Chem. Engin. 14, no. 7 (1990), 769-782.
[23] WALSH, S., AND PERKINS, J.D.: 'Operability and control in process synthesis and design', in J.L. ANDERSON (ed.): Adv. Chem. Engin., Vol. 23, Acad. Press, 1996, pp. 301-402.

Carl A. Schweiger
Dept. Chemical Engin., Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: carl@titan.princeton.edu

Christodoulos A. Floudas
Dept. Chemical Engin., Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: floudas@titan.princeton.edu

MSC2000: 90C11, 49M37
Key words and phrases: mixed integer nonlinear optimization, parametric optimal control, interaction of design and control.

MINLP: BRANCH AND BOUND GLOBAL OPTIMIZATION ALGORITHM

A wide range of nonlinear optimization problems involve integer or discrete variables in addition to continuous ones. These problems are denoted as mixed integer nonlinear programming (MINLP) problems. Integer variables correspond to logical decisions describing whether certain actions do or do not take place, or modeling the sequence according to which those decisions take place. The nonlinear nature of the MINLP models may arise from:
• nonlinear relations in the integer domain only,
• nonlinear relations in the continuous domain only,
• nonlinear relations in the joint domain, i.e., products of continuous and binary/integer variables.

The general mathematical formulation of the MINLP problems can be stated as follows:

   min_{x,y}  f(x, y)
   s.t.  h(x, y) = 0
         g(x, y) ≤ 0
         x ∈ X ⊂ R^n
         y ∈ Y (integer).


Here, x represents a vector of n continuous vari- First, a reasonable effort is made in solving the
ables, y is a vector of integer variables, ] ( x , y ) , original problem, by considering for instance the
h(x,y), g(x,y) represent the objective function, continuous relaxation of it. If the relaxation does
equality and inequality constraints, respectively. It not result in an integer-feasible solution, i.e., one
should be noted, that every problem of the form in which the binary variables achieve 0-1 at the
just presented, can be transformed into one where optimal point, them the root node is separated
all integer variables have been transformed into into two candidate subproblems which are subse-
binary, i.e., 0-1, variables, by realizing that every quently solved. The separation aims at creating
integer yL <_ y <_ yU can be expressed through 0-1 simpler instances of the original problem. Until the
variables, z = (Zl,... ,ZN), a s : problem is successfully solved this process of gener-
ating candidate subproblems is repeated. Branch
y _ yL _+_Zl + 2z2 -+- 4z3 + . . . + 2Y-lzN,
and bound algorithms are also known as divide-
1, and-conquer for that very reason. A basic princi-
log 2 " ple common to all branch and bound algorithms is .
Therefore, any MINLP problem can be written as: that the solution of the subproblems aims at gen-
erating valid lower bounds on the original MINLP
min f (x, y)
x,y through its relaxation to a continuous problem.
s.t. h(x,y) = 0 The relaxation, in the case of MINLP, results in
g(x, y) < o a nonlinear programming problem (NLP) which,
in the general case, is nonconvex and needs to be
xEXcR n
solved to global optimality so as to provide a valid
y e Y = { O , 1} m.
lower bound. If the NLP relaxation renders an in-
In the analysis of MINLP problems two issues teger solution, then this solution is referred to as
are of paramount importance: valid upper bound. The generation of the sequence
of valid upper and lower bounds is called bound-
• combinatorial explosion of computational re-
ing step. The way subproblems are created is by
quirements as the number of binary variables
forcing some of the binary variables to take on
increases
a value of 0 or 1. This is known as the branch-
• NP-hard nature of the problem of determin- ing step. Nodes in the tree are pruned when the
ing the global minimum solution of general corresponding valid lower bound exceeds the valid
nonconvex MINLP problems. upper bound, this stage is know as the fathom-
A complexity analysis of the former is presented ing step. The selection of the branching node, the
in [16], while the complexity of determining global branching variable and the generation of the lower
minimum solutions of MINLPs is discussed in [15]. bound are very crucial steps whose importance
Various methods exist for identifying a locally becomes even more pronounced when addressing
optimum solution of MINLP problems. These are nonconvex MINLP problems. Two basic strate-
discussed in great detail in [9] and in a recent thor- gies exists regarding the selection of the branching
ough review paper, [6], which presents a compre- node depending on whether one designs a branch
hensive account of the various approaches for ad- and bound based on a depth-first or a breadth-first
dressing issues related to the solution of mixed in- approach. In the former, the last node created is
teger nonlinear optimization problems. selected for branching, in the latter the node that
The main objective in a general branch and generated the best lower bound is selected. It is
bound algorithm is to perform an enumeration of not clear which strategy is the best and it is often
the alternatives without examining all 0-1 com- that the one that minimizes the computational re-
binations of the binary variables. A key element quirement is selected, [13]. Another alternative is
in such an enumeration if the representation of to select nodes based on the deviation of the solu-
alternatives via a binary tree. The basic ideas in tion from integrality, [12]. The most common strat-
a branch and bound algorithm are the following. egy for selecting a branching variable is to select


the variable whose value at the solution of some such t h a t ai > x L, then x i - ai.
relaxed problem is the farthest from integer, i.e., b) If x i - x L - 0 at the solution of the con-
the most fractional variable, [17]. In [12] a method vex NLP and '~i - xL + (U - L ) / ) ~ is
based on the concept of p s e u d o c o s t s which quanti- such t h a t gi < x U, then x U - ai.
fies the effect of binary variables is also proposed, If neither b o u n d constraint is active at the
which assigns essentially priorities on the order of solution of the convex NLP for some vari-
branching variables. Finally, one of the most im- able x j , the problem can be solved by setting
portant computational step is the generation of the x j - x v or x j - x ji. Tests similar to those
lower bound, in other words the solution of the re- presented above are then used to update the
laxed problem. The effectiveness of a branch and bounds on x j .
bound depends of the quality of the lower bound
2) Feasibility based range reduction tests" In
that is generated. At every node of the branch and
addition to ensuring that tight bounds are
bound tree a nonlinear-nonconvex NLP is solved. available for the variables, the constraint un-
Two issues are important: the lower bound must
derestimators are used to generate new con-
be valid, in other words the relaxation at a par-
straints for the problem. Consider the con-
ticular node must underestimate the solution of straint g i ( x , y ) ~_ O. If its underestimating
the original problem for this node, and the lower
function g _ i ( x , y ) - 0 at the solution of the
bounds must be tight so as to enhance the fath-
convex NLP and its multiplier is #~ > 0, the
oming step. The key complexity when dealing with
constraint
nonconvex MINLPs is that the relaxation solved at
U-L
each node is, of course, a nonconvex NLP that has y) > - - - 7 -#i
-
to be solved to global optimality. W i t h the excep-
tion of problems which are convex in the x and can be included in subsequent problems.
relaxed y-space for which variants of the branch A global optimization algorithm branch and bound
and bound algorithms will lead the correct solu- algorithm has been proposed in [20]. It can be ap-
tion, [18], in all other cases g l o b a l o p t i m i z a t i o n al- plied to problems in which the objective and con-
gorithms have to be employed for the generation straints are functions involving any combination
of valid lower bounds. of binary arithmetic operations (addition, subtrac-
In [19] the scope of branch and bound algo- tion, multiplication and division) and functions
rithms was extended to problems for which valid that are either concave over the entire solution
convex underestimating NLPs can be constructed space (such as ln) or convex over this domain (such
for the convex relaxations. The problems included as exp).
bilinear and separable problems for which convex The algorithm starts with an automatic refor-
underestimators can be build [14]. A number of mulation of the original nonlinear problem into a
very useful tests were proposed to accelerate the problem that involves only linear, bilinear, linear
reduction of solution space. Namely: fractional, simple exponentiation, univariate con-
cave and univariate convex terms. This is achieved
1) Optimality based range reduction tests: For through the introduction of new constraints and
the first set of tests, an upper bound U on the variables. The reformulated problem is then solved
nonconvex MINLP must be computed and a to global optimality using a branch and bound ap-
convex lower bounding NLP must be solved proach. Its special structure allows the construc-
to obtain a lower bound L. If a bound con- tion of a convex relaxation at each node of the tree.
straint for variable x i , with x L < x i ~ x U, is The integer variables can be handled in two ways
active at the solution of the convex NLP and during the generation of the convex lower bound-
has multiplier A~ > 0, the bounds on x i can ing problem. The integrality condition on the vari-
be u p d a t e d as follows: ables can be relaxed to yield a convex NLP which
a) If x i - x v - 0 at the solution of the con- can then be solved globally. Alternatively, the inte-
vex NLP and '~i - x U - ( U - L ) / A * is ger variables can be treated directly and the con-


vex lower bounding MINLP can be solved using a local solution. This bound generation strategy is
a branch and bound algorithm as described ear- incorporated within a branch and bound scheme: a
lier. This second approach is more computation- lower and upper bound on the global solution are
ally intensive but is likely to result in tighter lower first obtained for the entire solution space. Sub-
bounds on the global optimum solution. In order sequently, the domain is subdivided by branching
to obtain an upper bound for the optimum solu- on a binary or a continuous variable, thus creating
tion, several methods have been suggested. The new nodes for which upper and lower bounds can
MINLP can be transformed to an equivalent non- be computed. At each iteration, the node with the
convex NLP by relaxing the integer variables. For lowest lower bound is selected for branching. If the
example, a variable y E {0, 1 } can be replaced by a lower bounding MINLP for a node is infeasible or
continuous variable z E [0, i] by including the con- if its lower bound is greater than the best upper
straint z- z. z = 0. The nonconvex NLP is then bound, this node is fathomed. The algorithm is
solved locally to provide an upper bound. Finally, terminated when the best lower and upper bound
the discrete variables could be fixed to some arbi- are within a pre-specified tolerance of each other.
trary value and the nonconvex NLP solved locally. Before presenting the algorithmic procedure, an
In [i] SMIN was proposed which is designed to overview of the underestimation and convexifica-
address the following class of problems to global tion strategy is given, and some of the options
optimality" available within the algorithm are discussed.
In order to transform the MINLP problem of
min f (x) + x TAoy + cToy
the form just described into a convex problem
s.t. h(x) + x T Aly + c~y - 0 which can be solved to global optimality with the
g(x) + x TA2y + cT2y < 0 OA or GBD algorithm, the functions f(x), h(x)
xEXCR n and g(x) must be convexified. The underestima-
tion and convexification strategy used in the c~BB
y EY (integer),
algorithm has previously been described in detail
where c0-V, c~ and c~ are constant vectors, A0, A1 [3], [5], [4]. Its main features are exposed here.
and A2 are constant matrices and f(x), h(x) and In order to construct as tight an underestimator
g(x) are functions with continuous second order as possible, the nonconvex functions are decom-
derivatives. The solution strategy is an extension posed into a sum of convex, bilinear, univariate
of the aBB algorithm for twice-differentiable NLPs concave and general nonconvex terms. The overall
[7], [5], [4]. It is based on the generation of two function underestimator can then be built by sum-
converging sequences of upper and lower bounds ming up the convex underestimators for all terms,
on the global optimum solution. A rigorous under- according to their type. In particular, a new vari-
estimation and convexification strategy for func- able is introduced to replace each bilinear term,
tions with continuous second order derivatives al- and is bounded by its convex envelope. The uni-
lows the construction of a lower bounding MINLP variate concave terms are linearized. For each non-
problem with convex functions in the continuous convex term nt(x) with Hessian matrix Hnt(x), a
variables. If no mixed-bilinear terms are present convex underestimator L(x) is defined as
(Ai = 0, Vi), the resulting MINLP can be solved
L(x) - nt(x) - ~ ai(x v - xi)(xi - xL), (1)
to global optimality using the outer approxima-
i
tion algorithm (OA), [8]. Otherwise, the general-
ized Benders decomposition (GBD) can be used, where x v and x L are the upper and lower bounds
[10], or the Glover transformations [11] can be ap- on variable xi, respectively, and the a parame-
plied to remove these bilinearities and permit the ters are nonnegative scalars such that H n t ( x ) +
use of the OA algorithm. This convex MINLP pro- 2 diag(ai) is positive semidefinite over the domain
vides a valid lower bound on the original MINLP. [xL,xg]. The rigorous computation of the a pa-
An upper bound on the problem can be obtained rameters using interval Hessian matrices is de-
by applying the OA algorithm or the GBD to find scribed in [3], [5], [4].

328
MINLP: Branch and bound global optimization algorithm

The underestimators are updated at each node for the largest separation distances between the
of the branch and bound tree as their quality convex underestimating functions and the original
strongly depends on the bounds on the variables. nonconvex functions. These efficient rules are ex-
An unusual feature of the SMIN-c~BB algorithm posed in [2]. Variable bound updates performed
is the strategy used to select branching variables. before the generation of the convex MINLP have
It follows a hybrid approach where branching may been found to greatly enhance the speed of conver-
occur both on the integer and the continuous vari- gence of the c~BB algorithm for continuous prob-
ables in order to fully exploit the structure of the lems [2]. For continuous variables, the variable
problem being solved. After the node with the low- bounds are updated by minimizing or maximiz-
est lower bound has been identified for branching, ing the chosen variable subject to the convexified
the type of branching variable must be determined constraints being satisfied. In spite of its compu-
according to one of the following two criteria: tational cost, this procedure often leads to signif-
icant improvements in the quality of the underes-
1) Branch on the binary variables first.
timators and hence a noticeable reduction in the
2) Solve a continuous relaxation of the noncon- number of iterations required.
vex MINLP locally. Branch on a binary vari- In addition to the update of continuous vari-
able with a low degree of fractionality at the able bounds, the SMIN-c~BB algorithm also relies
solution. If there is no such variable, branch on binary variable bound updates. Through simple
on a continuous variable. computations, an entire branch of the branch and
The first criterion results in the creation of an in- bound tree may be eliminated when a binary vari-
teger tree for the first q levels of the branch and able is found to be restricted to 0 or 1. The bound
bound tree, where q is the number of binary vari- update procedure for a given binary variable is as
ables. At the lowest level of this integer tree, each follows:
node corresponds to a nonconvex NLP and the 1) Set the variable to be updated to one of its
lower and upper bounding problems at subsequent bounds y = YB.
levels of the tree are NLP problems. The efficiency
2) Perform interval evaluations of all the con-
of this strategy lies in the minimization of the num-
straints in the nonconvex MINLP, using the
ber of MINLPs that need to be solved. The combi-
bounds on the solution space for the current
natorial nature of the problem and its nonconvex-
node.
ities are handled sequentially. If branching occurs
on a binary variable, the selection of that variable 3) If any of the constraints are found infeasible,
can be done randomly or by solving a relaxation fix the variable to y = 1 - ys.
of the nonconvex MINLP an~i choosing the most 4) If both bounds have been tested, repeat this
fractional variable at the solution. procedure for the next variable to be up-
The second criterion selects a binary variable dated. Otherwise, try the second bound.
for branching only if it appears that the two newly In [1] GMIN, which operates within a classical
created nodes will have significantly different lower branch and bound framework, was proposed. The
bounds.Thus, if a variable is close to integrality at main difference with similar branch and bound
the solution of the relaxed problem, forcing it to algorithms [12], [17] is its ability to identify the
take on a fixed value may lead to the infeasibility of global optimum solution of a much larger class of
one of the nodes or the generation of a high value problems of the form
for a lower bound, and therefore the fathoming of
a branch of the tree. If no binary variable is close min /(x,y)
x,y
to integrality, a continuous variable is selected for s.t. h(x,y) = 0
branching.
g(x, y) < 0
A number of rules have been developed for the
xeXCR n
selection of a continuous branching variable. Their
aim is to determine which variable is responsible y C N q,


where N is the set of nonnegative integers and the nearest integer to provide an updated bound for
only condition imposed on the functions f(x,y), y*.
g(x, y) and h(x, y) is that their continuous relax- See also" G l o b a l o p t i m i z a t i o n in b a t c h de-
ations possess continuous second order derivatives. sign u n d e r u n c e r t a i n t y ; S m o o t h n o n l i n e a r
This increased applicability results from the use of n o n c o n v e x o p t i m i z a t i o n ; I n t e r v a l global op-
the aBB global optimization algorithm for contin- t i m i z a t i o n ; a B B a l g o r i t h m ; Global o p t i m i -
uous twice-differentiable NLPs [7], [5], [4]. z a t i o n in g e n e r a l i z e d g e o m e t r i c p r o g r a m -
At each node of the branch and bound tree, the ming; G l o b a l o p t i m i z a t i o n in p h a s e a n d
nonconvex MINLP is relaxed to give a noncon- c h e m i c a l r e a c t i o n e q u i l i b r i u m ; Global op-
vex NLP, which is then solved with the aBB algo- t i m i z a t i o n m e t h o d s for s y s t e m s of nonlin-
rithm. This allows the identification of rigorously ear e q u a t i o n s ; C o n t i n u o u s global o p t i m i -
valid lower bounds and therefore ensures conver- zation: M o d e l s , a l g o r i t h m s a n d software;
gence to the global optimum. In general, it is not Disjunctive programming; Reformulation-
necessary to let the aBB algorithm run to com- l i n e a r i z a t i o n m e t h o d s for global opti-
pletion as each one of its iterations generates a mization; M I N L P : L o g i c - b a s e d m e t h o d s ;
lower bound on global solution of the NLP being MINLP: Branch and bound methods;
solved. A strategy of early termination leads to M I N L P : Global o p t i m i z a t i o n w i t h a B B ;
a reduction in the computational requirements of C h e m i c a l process p l a n n i n g ; M i x e d i n t e g e r
each node of the binary branch and bound tree linear p r o g r a m m i n g : M a s s a n d h e a t ex-
and faster overall convergence. c h a n g e r n e t w o r k s ; M i x e d i n t e g e r nonlin-
The GMIN-c~BB algorithm selects the node ear p r o g r a m m i n g ; M I N L P : O u t e r a p p r o x -
with the lowest lower bound for branching at every i m a t i o n a l g o r i t h m ; G e n e r a l i z e d o u t e r ap-
iteration. The branching variable selection strat- p r o x i m a t i o n ; M I N L P : G e n e r a l i z e d cross de-
egy combines several approaches: branching prior- c o m p o s i t i o n ; E x t e n d e d c u t t i n g plane algo-
ities can be specified for some of the integer vari- rithm; Generalized Benders decomposition;
ables. When no variable has a priority greater than MINLP: Heat exchanger network synthe-
all other variables, the solution of the continuous sis; M I N L P : R e a c t i v e d i s t i l l a t i o n c o l u m n
relaxation is used to identify either the most frac- synthesis; M I N L P : D e s i g n a n d scheduling
tional variable or the least fractional variable for of b a t c h processes; M I N L P : A p p l i c a t i o n s
branching. in t h e i n t e r a c t i o n of design a n d control;
Other strategies have been implemented to en- M I N L P : A p p l i c a t i o n in facility location-
sure a satisfactory convergence rate. In particular, allocation; M I N L P : A p p l i c a t i o n s in blend-
bound updates on the integer variables can be per- ing a n d p o o l i n g p r o b l e m s .
formed at each level of the branch and bound tree.
References
These can be carried out through the use of inter- [i] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS,
val analysis. An integer variable, y*, is fixed at its C.A.: 'Global optimization of MINLP probelms in pro-
lower (or upper) bound and the range of the con- cess synthesis and design', Computers Chem. Engin. 21
straints is evaluated with interval arithmetic, using (1997), $445-$450.
[2] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS,
the bounds on all other variables. If the range of
C.A.: 'A global optimization method, aBB for twice-
any constraint is such that this constraint is vio- differentiable NLP's- If. Implementation and compu-
lated, the lower (or upper) bound on variable y* tational results', Computers Chem. Engin. 22 (1998),
can be increased (or decreased) by one. Another 1137-1158.
strategy for bound updates is to relax the integer [3] ADJIMAN,C.S., ANDROuLAKIS, I.P, MARANAS,C.D.,
variables, to convexify and underestimate the non- AND FLOUDAS, C.A.: 'A global optimization method,
aBB, for process design', Computers Chem. Engin. 20
convex constraints and to minimize (or maximize)
(1996), $419-$424.
a variable y* in this convexified feasible region. The [4] ADJIMAN, C.S., DALLWIG, S., FLOUDAS, C.A., AND
resulting lower (or upper) bound on relaxed vari- NEUMAIER, A.: 'A global optimization method, aBB
able y* can then be rounded up (or down) to the for twice-differentiable N L P ' s - I. Theoretical Ad-


vances', Computers Chem. Engin. 22 (1998), 1159-1179.
[5] ADJIMAN, C.S., AND FLOUDAS, C.A.: 'Rigorous convex underestimators for general twice-differentiable problems', J. Global Optim. 9 (1996), 23-40.
[6] ADJIMAN, C.S., SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'Mixed-integer nonlinear optimization in process synthesis', in D.-Z. DU AND P.M. PARDALOS (eds.): Handbook Combinatorial Optim., Kluwer Acad. Publ., 1998.
[7] ANDROULAKIS, I.P., MARANAS, C.D., AND FLOUDAS, C.A.: 'aBB, a global optimization method for general constrained nonconvex problems', J. Global Optim. 7 (1995), 337-363.
[8] DURAN, M.A., AND GROSSMANN, I.E.: 'An outer-approximation algorithm for a class of mixed-integer nonlinear programs', Math. Program. 36 (1986), 307-339.
[9] FLOUDAS, C.A.: Nonlinear and mixed-integer optimization: Fundamentals and applications, Oxford Univ. Press, 1995.
[10] GEOFFRION, A.M.: 'Generalized Benders decomposition', J. Optim. Th. Appl. 10 (1972), 237-260.
[11] GLOVER, F.: 'Improved linear integer programming formulations of nonlinear integer problems', Managem. Sci. 22 (1975), 445-452.
[12] GUPTA, O.K., AND RAVINDRAN, R.: 'Branch and bound experiments in convex nonlinear integer programming', Managem. Sci. 31 (1985), 1533-1546.
[13] LAWLER, E.L., AND WOOD, D.E.: 'Branching and bound methods: A survey', Oper. Res. (1966), 699-719.
[14] MCCORMICK, G.P.: 'Computability of global solutions to factorable nonconvex programs: Part I - Convex underestimating problems', Math. Program. 10 (1976), 147-175.
[15] MURTY, K.G., AND KABADI, S.N.: 'Some NP-complete problems in quadratic and nonlinear programming', Math. Program. 39 (1987), 117-123.
[16] NEMHAUSER, G.L., AND WOLSEY, L.A.: Integer and combinatorial optimization, Wiley, 1988.
[17] OSTROVSKY, G.M., AND MIKHAILOV, G.W.: 'Discrete optimization of chemical processes', Computers Chem. Engin. 14 (1990), 111-124.
[18] QUESADA, I., AND GROSSMANN, I.E.: 'An LP/NLP based branch and bound algorithm for convex MINLP optimization problems', Computers Chem. Engin. 16 (1992), 937-947.
[19] RYOO, H.S., AND SAHINIDIS, N.V.: 'Global optimization of nonconvex NLPs and MINLPs with applications in process design', Computers Chem. Engin. 19 (1995), 551-566.
[20] SMITH, E.M.B., AND PANTELIDES, C.C.: 'Global optimization of nonconvex MINLPs', Computers Chem. Engin. 21 (1997), S333-S338.

Ioannis P. Androulakis
Corp. Strategic Res.
ExxonMobil Res. & Engin.
Annandale, New Jersey 08801, USA
E-mail address: ipandro@erenj.com

MSC2000: 90C10, 90C26
Key words and phrases: mixed integer nonlinear programming, global optimization, branch and bound algorithms.

MINLP: BRANCH AND BOUND METHODS

A general mixed integer nonlinear programming problem (MINLP) can be written as

   min  f(x, y)
   s.t. h(x, y) = 0
        g(x, y) ≤ 0                          (MINLP)
        x ∈ R^n
        y ∈ Z^m.

Here x is a vector of n continuous variables and y is a vector of m integer variables. In many cases, the integer variables y are restricted to the values 0 and 1. Such variables are called binary variables. The function f is a scalar valued objective function, while the vector functions h and g express linear or nonlinear constraints. Problems of this form have a wide variety of applications, in areas as diverse as IR spectroscopy [6], finance [3], chemical process synthesis [9], topological design of transportation networks [12], and marketing [10].

The earliest work on branch and bound algorithms for mixed integer linear programming dates back to the early 1960s [7], [13], [15]. Although the possibility of applying branch and bound methods to mixed integer nonlinear programming problems was apparent from the beginning, actual work on such problems did not begin until later. Early papers on branch and bound algorithms for mixed integer nonlinear programming include [11], [14].

A branch and bound algorithm for solving (MINLP) requires the following data structures. The algorithm maintains a list L of unsolved subproblems. The algorithm also maintains a record of the best integer solution that has been found. This solution, (x*, y*), is called the incumbent solution. The incumbent solution provides an upper bound, ub, on the objective value of an optimal solution to (MINLP).
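The bookkeeping just described (the list L of unsolved subproblems, the incumbent, and ub) can be pictured with the small Python sketch below; the class and attribute names are illustrative, not any specific solver's data structures, and the procedure that uses them is listed next.

import math

class Subproblem:
    def __init__(self, y_lower, y_upper):
        # A subproblem is represented only by the extra variable bounds
        # that branching has imposed on the integer variables.
        self.y_lower = dict(y_lower)
        self.y_upper = dict(y_upper)

class BranchAndBoundState:
    def __init__(self):
        self.L = [Subproblem({}, {})]  # start from the original problem (MINLP)
        self.incumbent = None          # best integer-feasible (x*, y*) found so far
        self.ub = math.inf             # objective value of the incumbent

    def update_incumbent(self, x, y, value):
        if value < self.ub:
            self.incumbent, self.ub = (x, y), value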


The basic branch and bound procedure is as follows.

1) Initialize: Create the list L with (MINLP) as the initial subproblem. If a good integer solution is known, then initialize x*, y*, and ub to this solution. If there is no incumbent solution, then initialize ub to +∞.

2) Select: Select an unsolved subproblem, S, from the list L. If L is empty, then stop: if there is an incumbent solution, then that solution is optimal; if there is no incumbent solution, then (MINLP) is infeasible.

3) Solve: Relax the integrality constraints in S and solve the resulting nonlinear programming relaxation. Obtain a solution x̂, ŷ, and a lower bound, lb, on the optimal value of the subproblem.

4) Fathom: If the relaxed subproblem was infeasible, then S will clearly not yield a better solution to (MINLP) than the incumbent solution. Similarly, if lb ≥ ub, then the current subproblem cannot yield a better solution to (MINLP) than the incumbent solution. Remove S from L, and return to step 2.

5) Integer Solution: If ŷ is integer, then a new incumbent integer solution has been obtained. Update x*, y*, and ub. Remove S from L and return to step 2.

6) Branch: At least one of the integer variables y_k takes on a fractional value in the solution to the current subproblem. Create a new subproblem, S1, by adding the constraint y_k ≤ ⌊ŷ_k⌋. Create a second new subproblem, S2, by adding the constraint y_k ≥ ⌈ŷ_k⌉. Remove S from L, add S1 and S2 to L, and return to step 2.

The following example demonstrates how the branch and bound algorithm solves a simple (MINLP):

   min  (y1 − 1/4)^2 + (y2 − 1/4)^2 + y3^2
   s.t. −2y1 + 2y2 ≤ 1
        y binary.

The optimal solution to the initial nonlinear programming relaxation is y = (1/4, 1/4, 0), with an objective value of z = 0. Both y1 and y2 take on fractional values in this solution, so it is necessary to select a branching variable. The algorithm arbitrarily selects y1 as the branching variable, and creates two new subproblems in which y1 is fixed at 0 or 1. In the subproblem with y1 fixed at 0, the optimal solution is y = (0, 1/4, 0), with z = 1/16. Since the optimal value of y2 is fractional, the algorithm again creates two new subproblems, with y2 fixed at 0 and 1. The optimal solution to the subproblem with y1 = 0 and y2 = 0 is y = (0, 0, 0), with z = 1/8. This establishes an incumbent integer solution. The subproblem with y1 = 0 and y2 = 1 is infeasible and can be eliminated from consideration. The subproblem with y1 = 1 has an optimal solution with y = (1, 1/4, 0) and objective value z = 9/16. Since 9/16 is larger than the objective value of the incumbent solution, this subproblem can be eliminated from consideration. Thus the optimal solution to the example problem is y* = (0, 0, 0) with objective value z* = 1/8.

Fig.: Branch and bound tree for a sample problem. Root node: y = (1/4, 1/4, 0), z = 0. Branching on y1: the node y1 = 0 gives y = (0, 1/4, 0), z = 1/16; the node y1 = 1 gives y = (1, 1/4, 0), z = 9/16 (bound > ub). Branching on y2 from the y1 = 0 node: y2 = 0 gives the integer solution y = (0, 0, 0), z = 1/8; y2 = 1 is infeasible.

Since each subproblem S creates at most two new subproblems, the set of subproblems considered by the branch and bound algorithm can be represented as a binary tree. The above figure
332
MINLP: Branch and bound methods

shows the branch and bound tree for the exam- and bound algorithms for MILP is the 'best bound
ple problem. rule', in which the subproblem with the smallest
There are a number of important issues in the lower bound is selected. The best bound rule is
implementation of a branch and bound algorithm widely used within branch and bound algorithms
for (MINLP). for (MINLP)[4], [11], [18]

The first important issue is how to solve the In step 6, there may be a choice of several vari-
nonlinear programming relaxations of the sub- ables with fractional values to be the branching
problems in step 3. If the objective function f and variable. A simple approach is to select the vari-
the constraint functions g are convex, while the able whose value Y'k is furthest from being an inte-
constraint functions h are linear, then the nonlin- ger [4], [11]. In mixed integer linear programming,
ear programming subproblems in step 3 are convex estimates of the increase in the objective function
and thus relatively easy to solve. A variety of meth- that will result from forcing a variable to an inte-
ods have been used to solve these subproblems in- ger value are often made. These estimates, called
cluding generalized reduced gradient (GRG) meth- 'pseudocosts' or 'penalties', are used to select the
ods [11], sequential quadratic programming (SQP) branching variable. Penalties have also been used
[4], active set methods for quadratic programming in branch and bound algorithms for mixed integer
[8], and interior point methods [16]. nonlinear programming problems [11], [18].

However, if the nonlinear programming sub- The performance of the branch and bound algo-
problems are nonconvex, then it can be ex- rithm can be improved by computing lower bounds
tremely difficult to solve the nonlinear program- on the optimal value of a subproblem without ac-
ming relaxation of S or even obtain a lower tually solving the subproblem. In [8], lower bounds
bound on the optimal objective function value. For on the optimal objective value of a subproblem are
some specialized classes of nonconvex optimization derived from an optimal dual solution to the sub-
problems, including indefinite quadratic program- problem's parent problem. If this lower bound is
ming, bilinear programming, and fractional linear larger than the objective value of the incumbent
programming, convex functions which underesti- solution, then the subproblem can be eliminated
mate the nonconvex objective function are known. from consideration. In [4], Lagrangian duality is
These convex underestimators are widely used in used to compute lower bounds during the solution
branch and bound algorithms for nonconvex non- of a subproblem. When the lower bound exceeds
linear programming problems. Branch and bound the value of the incumbent solution, the current
techniques for nonconvex continuous optimization subproblem can be discarded.
problems can also been used within a branch and Another way to improve the performance of
bound algorithm for nonconvex mixed integer non- a branch and bound algorithm for (MINLP) is
linear programming problems. For instance, the to tighten the formulation of the nonlinear pro-
B A R O N system uses this approach to solve a va- gramming subproblems before solving them. In the
riety of nonconvex mixed integer nonlinear pro- BARON package, dual information from the so-
gramming problems [17], [18]. This approach is lution to a nonlinear programming subproblem is
also used in the GMIN-c~BB algorithm to solve used to restrict the ranges of variables and con-
nonconvex 0 - 1 mixed integer nonlinear program- straints in the children of the subproblem [17], [18].
ming problems with twice differentiable objective
In branch and cut approaches, constraints called
and constraint functions [1].
cutting planes are added to the nonlinear program-
The choice of the next subproblem to be solved ming subproblems [3], [19]. These additional con-
in step 2 can have a significant influence on the straints are selected so that they reduce the size
performance of the branch and bound algorithm. of the feasible region of nonlinear programming
In mixed integer linear programming, a variety subproblems without eliminating any integer solu-
of heuristics are employed to select the next sub- tions from consideration. This tightens the formu-
problem [2]. One popular heuristic used in branch lations of the subproblems and thus increases the


probability that a subproblem can be fathomed Design a n d scheduling of b a t c h processes;


by bound. Furthermore, the use of cutting planes M I N L P : A p p l i c a t i o n s in t h e i n t e r a c t i o n of
can make it more likely that an integer solution design a n d control; M I N L P : A p p l i c a t i o n in
will be obtained early in the branch and bound facility location-allocation; M I N L P : Appli-
process. A variety of cutting planes developed for cations in b l e n d i n g a n d pooling problems.
use in branch and cut algorithms for integer linear
programming have been adapted for use in branch References
[i] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS,
and cut algorithms for nonlinear integer program-
C.A.: 'Global optimization of MINLP problems in pro-
ming. These include mixed integer rounding cuts cess synthesis and design', Computers Chem. Engin.
[3], knapsack cuts [3], intersection cuts [3], and lift- 21, no. 1001 (1997), S445-$450.
and-project cuts [19]. [2] BEALE, E.M.L.: ~Integer programming', in D. JACOBS
(ed.): The State of the Art in Numerical Analysis,
To date, little work has been done to compare
Acad. Press, 1977, p. 409-448.
the performance of branch and bound methods for [3] BIENSTOCK, D.: 'Computational study of a family
(MINLP) with other approaches such as outer ap- of mixed-integer quadratic programming problems',
proximation and generalized Benders decomposi- Math. Program. 74, no. 2 (1996), 121-140.
tion. B. Borchers and J.E. Mitchell (1997) com- [4] BORCHERS, B., AND MITCHELL, J.E.: 'An improved
pared an experimental branch and bound code branch and bound algorithm for mixed integer nonlin-
ear programs', Comput. Oper. Res. 21, no. 4 (1994),
with a commercially available outer approxima-
359-367.
tion code on a number of test problems [5]. This [5] BORCHERS, B., AND MITCHELL, J.E.: 'A computa-
study found that the branch and bound code and tional comparison of branch and bound and outer ap-
outer approximation code were roughly compara- proximation algorithms for 0-1 mixed integer nonlinear
ble in speed and robustness. R. Fletcher and S. programs', Comput. Oper. Res. 24, no. 8 (1997), 699-
Leyffer (1998) compared the performance of their 701.
[6] BRINK, A., AND WESTERLUND, T.: 'The joint problem
branch and bound code for mixed integer con- of model structure determination and parameter esti-
vex quadratic programming problems with their mation in quantitative IR spectroscopy', Chemometrics
implementations of outer approximation, gener- and Intelligent Laboratory Systems 29 (1995), 29-36.
alized Benders decomposition, and an algorithm [7] DAKIN, R.J.: 'A tree-search algorithm for mixed in-
that combines branch and bound and outer ap- teger programming problems', Computer J. 8 (1965),
250-255.
proximation approaches [8]. Fletcher and Leyffer
[8] FLETCHER, R., AND LEYFFER, S.: 'Numerical experi-
found that their branch and bound solver was con- ence with lower bounds for MIQP branch-and-bound',
sistently faster than the other codes by about an SIAM J. Optim. 8 (1998), 604-616.
order of magnitude. [9] FLOUDAS, C.A.: Nonlinear and mixed-integer optimi-
See a l s o " Disjunctive programming; zation: fundamentals and applications, Oxford Univ.
Press, 1995.
Reformulation-linearization m e t h o d s for
[10] GAVISH, B., HORSKY, D., AND SRIKANTH, K.: 'An ap-
global o p t i m i z a t i o n ; M I N L P : Logic-base d proach to the optimal positioning of a new product',
m e t h o d s ; M I N L P : B r a n c h and b o u n d global Managem. Sci. 29, no. 11 (1983), 1277-1297.
o p t i m i z a t i o n algorithm; M I N L P : Global [11] GUPTA, O.K., AND RAVINDRAN, A.: 'Branch and
o p t i m i z a t i o n w i t h aBB; C h e m i c a l process bound experiments in convex nonlinear integer pro-
gramming', Managem. Sci. 31, no. 12 (1985), 1533-
planning; M i x e d integer linear p r o g r a m -
1546.
ming: Mass a n d heat e x c h a n g e r networks; [12] HOANG, HA! HOC: 'Topological optimization of net-
M i x e d integer n o n l i n e a r p r o g r a m m i n g ; works: A nonlinear mixed integer model employing gen-
M I N L P : O u t e r a p p r o x i m a t i o n algorithm; eralized Benders decomposition', IEEE Trans. A utom.
Generalized o u t e r a p p r o x i m a t i o n ; M I N L P : Control 27 (1982), 164-169.
G e n e r a l i z e d cross decomposition; E x t e n d e d [13] LAND, A., AND DOIG, A.: 'An automatic method of
solving discrete programming problems', Econometrika
c u t t i n g plane algorithm; G e n e r a l i z e d Ben-
28, no. 3 (1960), 497-520.
ders decomposition; M I N L P : H e a t ex- [14] LAUGHHUNN, D.J.: 'Quadratic binary programming
changer n e t w o r k synthesis; M I N L P : Reac- with applicationsto capital-budgeting problems', Oper.
tive distillation c o l u m n synthesis; M I N L P : Res. 18, no. 3 (1970), 454-461.

334
MINLP: Design and scheduling o~ batch processes

[15] LAWLER, E.L., AND WOOD, D.E.: 'Branch and bound this problem does not give the actual schedule,
methods: A survey', Oper. Res. 14, no. 4 (1966), 699- but does guarantee that a feasible schedule exists.
719. A separate problem, typically a MILP, must be
[16] LEE, E.K., AND MITCHELL, J.E.: 'Computa-
tional experience of an interior point algorithm
solved to find the actual schedule.
in a parallel branch-and-cut framework': Proc. The second method for formulating the batch
Eighth SIAM Conf. Parallel Processing for Sci. process design and scheduling problem is based on
Computing, Minneapolis, March 1 9 9 7 , 1997, a state-task-network (STN) representation. In this
www.siam.org/catalog/mcc07/heath97.htm. approach, the planning horizon is discretized into
[lZ] RYOO, H.S., AND SAHINIDIS, N.V.: 'A branch-and-
time steps. Each task must be assigned to both a
reduce approach to global optimization', J. Global Op-
tim. 8, no. 2 (1996), 107-139. unit and a time slot. The formulation results in
[ls] SAmNIDIS, N.V.: 'BARON: A general purpose global a large MINLP whose solution provides both the
optimization software package', J. Global Optim. 8, plant design and the actual schedule.
no. 2 (1996), 201-205.
[lO] STUBBS, R.A., AND MEHROTRA, S.: 'A branch-and-
Continuous-Time F o r m u l a t i o n s . The early
cut method for 0-1 mixed convex programming', Math.
Program. 80 (1999), 515-532. work of [10] was based on the single product cam-
paign (SPC) scheduling policy. In a single product
Brian Botchers
Dept. Math. New Mexico Tech.
campaign, all batches of one product are processed
Socorro, NM 87801, USA one after the other, followed by all of the batches
E-mail address: b o r c h e r s C n m t , edu of the next product, and so on.
MSC 2000: 90Cll In this approach, the scheduling information is
Key words and phrases: mixed integer programming, incorporated by way of a planning horizon con-
branch and bound, MINLP. straint. This constraint requires that all products
must be completed before the planning horizon,
H, is reached. In a single product campaign, the
MINLP: DESIGN AND SCHEDULING OF time between batches of product i is based on the
BATCH PROCESSES maximum processing time over all of the stages,
The design of batch processes has been a major
tLi = ma. x(tij ),
area of research for the past several decades. In )
conjunction with the design of batch plants, many where tLi is the 'limiting' time for product i. The
different approaches have been proposed for the planning horizon constraint can be written as the
determination of an optimal schedule for the plant. sum over all of the products of the limiting time
It has been recognized for some time that in order multiplied by the number of batches of each prod-
to increase the efficiency of batch processes, the
uct
two tasks of design and scheduling should be con-
Qi
sidered simultaneously.
The problem is to design a batch process con- i
sisting of M processing steps, in which N products where Qi is the total production of i and Bi is the
are made, where all materials follow the same path batch size for i. Because Qi and Bi are variables,
through the process. This is commonly known as this results in a NLP.
a multiproduct batch plant, or a flow-shop. In [4] the authors formulated the batch process
There are two predominant methods for for- design and scheduling problem as a MINLP. Their
mulating the batch process design and schedul- model was based on the SPC model of [10]. In this
ing problem. The first is a continuous-time ]ormu- problem, more than one piece of equipment per
lation in which the scheduling information is in- stage is available for use in parallel. Rather than
corporated through a planning horizon constraint. solve the MINLP rigorously, they relaxed the num-
This problem can be formulated as a NLP or ber of units per stage to be continuous and solved
MINLP depending on whether the number of par- the resulting NLP. [5] formulated the MINLP us-
allel units is fixed or variable. The solution of ing binary 0-1 variables and solved it with an outer

335
MINLP: Design and scheduling of batch processes

approximation method. In addition to the combi- 1 if unit j exists


natorial nature of the problem due to integer vari- YEXj - {
0 otherwise,
ables, the solution of the problem is complicated
by the nonconvex form of the planning horizon 1 if unit j contains
constraint. YCcj -- c parallel units
[2] developed extensions of the SPC formula-
0 otherwise,
tion to allow more efficient utilization of the batch
process equipment. They considered two mixed- 1 if task t is assigned
product campaign (MPC) scheduling policies, Ytj- to unit j

i) the unlimited intermediate storage (UIS) pol- 0 otherwise,


icy; and 1 if t is the first task
ii) the zero-wait (ZW) policy. YFtj - processed in unit j
0 otherwise.
As its name implies, a mixed product campaign
allows batches of different products to be pro- 2) Design constraints
cessed sequentially. For example, a SPC schedule - Task volume requirement, VtT, depends
for three batches each of two products A and B on batch size, Bi, of each product and
would be, AAABBB, while a MPC schedule could size factor, Sit, for each product in each
be ABABAB. In the zero-wait policy, when a prod- task.
uct has completed processing in one stage, it must
immediately begin processing in the next stage. > B S t.
Conversely, the UIS policy allows a product to be - The volume of a processing unit j must
stored for a period of time before beginning the be large enough to accomodate task t if
next processing step. [7] showed that for the case task t is assigned to unit j, (Ytj = 1).
of zero cleanup times, the UIS policy is the most
efficient mixed-product campaign policy, while the >_ VtT - V f ( 1 - Ytj).
ZW policy is the most conservative. [2] incorpo-
- The processing time, ptij, for each prod-
rated the new scheduling policies into the batch
uct in each unit is given by the corre-
process design problem by considering the charac-
sponding time factor, tit, for each prod-
teristic cycle time for each policy. The cycle time
uct in task t if task t is assigned to unit
becomes the basis upon which the planning hori-
j, (Ytj = 1).
zon constraint is imposed.
[3] used the batch design formulation with ptij >_ ~ titYtj.
mixed-product campaign schedules to formulate t
the batch synthesis, design and scheduling prob- - The number of batches, ni, multir 'ied by
lem. In this formulation the number of stages, M, the batch size must satisfy the ~'r, duc-
in the batch process is not fixed. Instead, each tion requirement, Qi, for each p~ ~'_-duct.
product is required to undergo the same sequence,
T, of processing tasks. Units that each can per- niBi >_ Qi.
form one of the tasks are given, and in addition,
3) Parallel equipment constraints
'superunits' are postulated that can combine two -The number of parallel units in each
or more tasks. The problem is to assign tasks to
stage j is determined by the binary vari-
units, size the units, and determine the number of able YCcj multiplied by the number c,
parallel units in the batch process.

Problem formulation. Nj - Z YC j.
(2

1) Binary variables 4) Scheduling constraint

336
MINLP: Design and scheduling of batch processes

For the UIS policy with zero cleanup The objective is to minimize the cost of
times, the planning horizon constraint the plant. [3] used a fixed-charge cost for
derived by [2] is used, each unit, ~j, plus a nonlinear cost func-
tion on the size of the unit,
niptij <_H. Nj.
i
Cost -
5) Logical constraints J
- If a stage j exists, then at least one pro-
cessing task must be assigned to it, This formulation is a MINLP where all binary
variables participate linearly and separably. How-
E Ytj >_YEXj. ever, it is a nonconvex problem due to the cost
t function, and the bilinear terms in the batch size
If a stage j does not exist, there can be constraints and the planning horizon constraints.
no tasks assigned to it, [3] used the outer approximation method imple-
mented in DICOPT ([11]) to solve a number of
Ytj <_YEXj. example problems. Due to the nonconvexities in
If a stage j exists, then one of the tasks the formulation, there is no guarantee of global
assigned to it must be the first task as- optimality with the outer approximation method,
signed to stage j, but they report good results for the examples pre-
sented in the paper.
YFtj = YEXj. Two examples are briefly discussed to illustrate
t the proposed approach for multiproduct batch
There cannot be more than one first task plants with a variety of scheduling policies. The
assigned to each stage, first example consists of three products with four
processing tasks and five potential units and su-:
EYFtj < 1. perunits. The MINLP formulation with the SPC
t
policy contains 33 binary variables and 54 contin-
A task can be the first task assigned to uous variables. With the ZW policy, the number
a stage only if the task is among those of binary variables drops to 8, with 98 continuous
assigned to the stage, variables. For the UIS policy, the formulation has
33 binary variables with 51 continuous variables.
YFtj <__Ytj.
The second example is larger and contains
No tasks that occur before the first task 6 products with 7 potential units and supe-
assigned to stage j can be among those runits. The SPC policy formulation contains 46
assigned to the stage, binary variables and 101 continuous variables. The
MINLP formulation for the ZW policy has 11 bi-
Yt,j <_ 1 - Y F t j fort'<t. nary and 374 continuous variables. The UIS policy
If multiple tasks are assigned to a unit, formulation has 46 binary and 95 continuous vari-
they must be consecutive tasks, ables. In all cases the examples were solved in less
than 50 minutes using G A M S / D I C O P T + + on Mi-
Ytj <_ YFtj + Yt-lj. crovax II.

One and only one binary variable that


determines the number of parallel units Discrete-Time Formulations. A.P.F.D.
in stage j must be active, Barbosa-Pbvoa and S. Macchietto, [1], proposed a
MILP formulation to address the problem of op-
E YCcj = 1 . timal batch design by simultaneously considering
c
optimizing production schedule. They based their
6) Objective function formulation on

337
MINLP: Design and scheduling of batch processes

a) an extended state-task-network (mSTN) rep- e) amount of material delivered and received at


resentation of the batch plant; and each time period;
b) the discrete time representation using uni- f) the amount of material transfered at each
form time discretization. time period; and
In the STN representation, proposed in [6], all g) the amount of material stored at each time
the materials are represented as states processed period.
through a set of processing steps ('tasks'). In or- The proposed formulation correspond to a mixed
der to incorporate connectivity constraints the ex- integer linear programming (MILP) problem since
tended state-task-network (mSTN) is proposed in- they used linear cost functions to express the cap-
volving the alternative design configurations con- ital cost of equipments and time discretization to
sidering all permitted equipment and connections represent time. Three examples were solved illus-
allocations. Single campaign is assumed with a trating:
cyclic schedule of cycle time T repeated over a
a) the effect of limited connectivity and connec-
planning horizon H. A cycle represents a sequence tion cost in the optimal design;
of operations involving the production of all prod-
b) the advantages of considering simultane-
ucts and the utilization of all resources. The op-
ously the plant design and plant connectivity
erational characteristics such as the allocation of
rather than optimizing first the equipment
equipments to tasks, batch sizes, task timings,
sizes and then optimizing plant connectivity.
transport of material and storage profiles are iden-
tical in each cycle. The mathematical formulation In later work, Barbosa-Pdvoa and C.C. Pan-
they proposed involves: telides, [1], proposed a new mathematical for-
mulation for the optimization of batch plant de-
allocation constraints for the assignment of
sign considering detailed operation characteris-
the tasks to the units
tics (i.e., short term scheduling). This formula-
• capacity constraints expressing the limiting tion also considers a uniform time discretization,
equipment capability the only difference lies in the plant representation.
• connectivity constraints for determining the The resource-state-task (RTN) plant representa-
connection of different units tion, [9], was used which corresponds to a more
• dedicated storage constraints general and uniform description of all available
production resources. However, the new formula-
• mass balances
tion shares the main characteristics of the previous
• production requirement constraints presented one with the same basic variables, and
• an objective function, which is chosen to be constraints.
either the minimization of the capital cost or Both formulations share the limitations of the
the maximization of plant profit. discrete time formulations, which are that:

The main variables of the formulation are: i) they correspond to an approximation of the
time horizon; and
a) binary structural variables representing the
existence of an equipment; ii) they result in an unnecessary increase of the
number of binary variables in particular, and
b) binary allocation variables for the assignment in the overall size of the mathematical model.
of a task to a unit at the beginning of a time
period; A continuous-time formulation was proposed
in [12], based on the STN representation and
c) continuous variables representing the capac- the scheduling formulation proposed in [13]. It
ity of a unit;
gives rise to a mixed integer nonlinear program-
d) continuous variables corresponding to the ming problem which is solved using a stochastic
batch size of a task to a unit at each time MINLP optimizer based on an evolutionary algo-
period; rithm (EA) with simulated annealing (SA) pre-

338
MINLP: Design and scheduling of batch processes

sented in [12]. The method is based on a guided which establish the relationship between pro-
stochastic generation of alternative vectors of deci- cessing time, Tijt, and time of event (l), Tl.
sion variables, which explore promising areas of the
O <_ T1 <_ T2 < . . . < Tlmax <_ H ,
search space through selection, crossover, and mu-
tation operations applied to individuals in a pop- expressing the monotonic increase in event
ulation of solution candidates. It can be used to times.
deal with nonconvex, nondifferentiable functions
4) Allocation constraints:
although it has no guarantee of convergence to
even a local optimal solution. The proposed for-
mulation involves the following basic variables:
o_ E E -E E E
iEb v,<v iEb l<V' V'<V
• Main design variables representing the dis- <Ej,
crete decisions of selecting a unit (j), Ej, or
a storage (s), Es, or continuous decisions cor- E /,,<_/max
iEIj
E iEIj
E E
t ( t t! ttt<~t max
responding to the capacity of unit storage or
utility, Vj, Vs, and Uu, respectively. W i j l -- E Xijll' ,
l'>l
• Main operation variables corresponding to
the discrete decision of allocation of task (i) expressing the relationship between Wijl and
in unit (j) at time Tt, Wijl, and the decision Xijw operation variables, [13].
of assigning task (i)in unit (j) between start- 5) Material balances written for state s at event
ing time Tt and end time Tl,, and continuous time Tt"
variables, the time of event (/), Tt, the batch
size, the processing time and utility require- Csl I - C s l ' - i
ment of task (i) allocated to unit (j) starting +)-]~E in E B i j l X i j l l '
Psij
at TI, Bijl, Tijl, Ui~ l, respectively, iEIs jEJi l<l'
Based on these variables the proposed formulation
--
ZZooPsij Bijl' ,
involves" iEIs jEJi
1) Processing task models" o <_ C,~, <_ V,o + tl,.

U~jl - a ~ l + ~ " i j l , 6) Utility constraints written for utility (u) at


expressing the consumption-generation of event time TI"
utilities as a function of batch size; Uul I -- V u l ' - 1
f~. . RTiJ
7"ijl - ~ijl + t-'U~'ijl + E It~Ui~ l
u
iEIu jEdi I<I'
+~ #~Aj~t,
ot - Z Z
iEIu jEdi
expressing the dependence of processing
o < u~,, < u . ,
time, Tiff, of batch size, Bijl, utilities, Ui~t,
and unit availabilities, Aj~.
2) Batch size constraints: iEIu jEdi l

~m 7) Availability constraints written for unit (j)


ij i n v .3w i j I <
-- Bijl < a~maxtr txr
-- 9'ij vj vv ijl
at event time Tl"
imposing the maximum and minimum capa-
bility of unit (j) when task (i) is performed. Ajat,+l = ~ Ajat'a~Wij' ' - ~ 3i3W'J' ' ,
3) Timing constraints" iEIj iEIj

Aj~, <_ ~ 7~W,j,,.


iEIj
l'>l

339
MINLP: Design and scheduling of batch processes

8) Existence constraints: annealing (SA), [12], to solve this problem. They


utilized simulated annealing to improve the poor
< Ej,
local search ability of EA. A suitable encoding
procedure is proposed which results in reduction
vjmm E j < Vj ~ v'jm ax E j ,
in the number of constraints and variables by up
Vsmin ~s
its, _< V, < Vsmax Es, to 50%. In particular, they explored the mathe-
that correspond to logical restrictions on pro- matical structure of the problem in the following
duction unit and storage tank size if this unit- sense. If Wijt - 1 and Xijw - 1, unit j exists, it
storage tank is present at the optimal design. executes operation k which starts at S T j - 1
finishes at FTJk - l' involving task TSJk - i
9) Production constraints:
with batch size B SJk - Bijt and utility usage
Cslmax ~__Rs , UJk -- U~t. So they proposed to replace Wijt,
expressing the requirement of producing at Xiju,, Bijl and Ui~ l by the operation sequence of
least as much as the market demands for tasks in units: task sequence T S j - ( i l , . . . ,iN~),
state (s). task batch size B s J - (B1,... ,BNj), task utility
10) Objective function" usage UJu - ( U ~ , . . . , U N j , ) , start time STJ =
(ll,...,1gj), finish time FTJ - (l~,..., gj)'l'
Profit = y ~ psCslma~
In this way the decision variables become
s6Sp
(E j, Vj, E s, V,, Uu, Tt, T SJ, B S j, uJu, STJ, FTJ).
-'1- E ps(Cslmax - Cso) The algorithm starts with an initial guess and
s6Si evolves a number of candidate instances for these
variables. The allocation and the capacity con-
s6S f u straints are automatically satisfied by each candi-
the first two terms represent the revenue due date solution and Tl are chosen so that the timing
to product and intermediate state produc- constraints are also satisfied. Two examples are
tion, respectively, whereas the last two terms presented to illustrate the applicability of the pro-
express the cost of raw materials and utili- posed approach to solve batch design problem
ties, respectively, involving detailed scheduling constraints. Linear
and nonlinear task processing times and unit cost
Cost - - ~-~(Ej~j+ Z j V ; j ) models are considered for both the examples. For
J the first example considering linear functions for
+ + Lv>); processing times and unit cost models the results
8
obtained are compared with a discrete time for-
the first term represent the cost of installing mulation, [8], and found to outperform it in terms
production unit (j), whereas the second term of number of variables which is expected since the
correspond to the cost of storage tank (s). formulation is based on the continuous time de-
Objective - Cost - Profit. scription and the computational requirement for
the solution of their model. Considering nonlin-
This above formulation correspond to a MINLP ear models for processing times and unit costs,
problem with decision variables" Wijt, Xijll,, Bijl, the resulting model for a problem with 4 produc-
Ui~t, ~ that correspond to plant operation and tion units, 4 storage tanks, 5 tasks and 4 states,
Ej , Es , Vj , Vs , Uu that represent design deci- involves 62 integer and 34 continuous variables
sions. Nonconvexities appear in the timing con- and 122 constraints. This example was the largest
straints, material balances, utility constraints as presented in this work, and required considerable
bilinear products of binary and continuous vari- computational effort, 7849.23 CPU seconds on a
ables and in the objective function in power form SUN ULTRAstation- 1.
of the type V ; j and Vs%. The authors proposed
an evolutionary algorithm (EA) with simulated See also: Chemical process planning;

340
MINLP: Generalized cross decomposition

Mixed integer linear programming: Mass ond Conf. Foundations of Computer Aided Operations
and heat exchanger networks; Mixed integer (1994), 253-274.
nonlinear programming; MINLP: Outer ap- [1o] SPARROW, R.E., FORDER, G.J., AND RIPPIN,
D.W.T.: 'The choice of equipment sizes for multiprod-
proximation algorithm; Generalized outer uct batch plants. Heuristics vs. branch and bound',
approximation; MINLP: Generalized cross Industr. Engin. Chem. Process Des. Developm. 14
decomposition; Extended cutting plane al- (1975), 197-203.
gorithm; MINLP: Logic-based methods; [ii] VISWANATHAN, J., AND GROSSMANN, I.E.: 'A com-
bined penalty function and outer-approximation
MINLP: Branch and bound methods;
method for MINLP optimization', Computers Chem.
MINLP: Branch and bound global optimi- Engin. 14 (1990), 769-782.
zation algorithm; MINLP: Global optimi- [12] XIA, Q., AND MACCHIETTO, S.: 'Design and synthesis
zation with ~BB; MINLP: Heat exchanger of batch plants- MINLP solution based on a stochastic
network synthesis; MINLP: Reactive distil- method', Computers Chem. Engin. 21 (1997), $697-
lation column synthesis; Generalized Ben- $702.
ders decomposition; MINLP: Applications [13] ZHANG, X., AND SARGENT, R.W.H.: 'The optimal op-
eration of mixed production facilities - general formu-
in the interaction of design and control; lation and some solution approaches for the solution',
MINLP: Application in facility location- Proc. 5th Internat. Syrup. Process Systems Engin. (Ky-
allocation; MINLP: Applications in blend- ongju, Korea) (1994), 171-177.
ing and pooling problems; Job-shop sched- Christodoulos A. Floudas
uling problem; Stochastic scheduling; Vehi- Dept. Chemical Engin. Princeton Univ.
cle scheduling. Princeton, NJ 08544-5263, USA
E-mail address: f l o u d a s ~ t i t a n , princeton, edu
S. T. Harding
References Dept. Chemical Engin. Princeton Univ.
[1] BARBOSA-P6VOA, A.P.F.D., AND MACCHIETTO, S.: Princeton, NJ 08544-5263, USA
'Detailed design of multipurpose batch plants', Com- Marianthi Ierapetritou
puters Chem. Engin. 18 (1994), 1014-1042. Dept. Chemical and Biochemical Engin. Rutgers Univ.
[2] BIREWAR, D.B., AND GROSSMANN, I.E.: 'Incorporat- 98 Brett Road
ing scheduling in the optimal design of multiproduct Piscataway, NJ 08854, USA
batch plants', Computers Chem. Engin. 13 (1989), E-mail address: marianth~sol.rutgers.edu
141-161.
MSC 2000:90C26
[3] BIREWAR, D.B., AND GROSSMANN, I.E.: 'Simultane-
ous synthesis, sizing and scheduling of multiproduct Key words and phrases: batch process, design, scheduling,
batch plants', Industr. Engin. Chem. Res. 29 (1990), continuous and discrete time models.
2242-2251.
[4] GROSSMANN, I.E., AND SARGENT, R.W.H.: 'Optimum
design of multipurpose batch plants', Industr. Engin. M I N L P : GENERALIZED CROSS DECOM-
Chem. Process Des. Developm. 18 (1979), 343-348.
POSITION
[5] Kocls, G.R., AND GROSSMANN, I.E.: 'Computational
experience with DICOPT solving MINLP problems in Decomposition methods, such as the classical Ben-
process synthesis engineering', Computers Chem. En- ders decomposition (cf. G e n e r a l i z e d B e n d e r s
gin. 13 (1989), 307-315. decomposition), [1], and Dantzig-Wolfe decom-
[6] KONDILI, E., PANTELIDES, C.C., AND SARGENT, position, [3], have been used to solve many dif-
R.W.H.: 'A general algorithm for short-term schedul-
ferent large structured optimization problems, by
ing of batch operations- I. MILP formulation', Com-
puters Chem. Engin. 17 (1993), 211-227. decomposing them with the help of relaxation of
[7] KU, H., AND KARIMI, I.: 'Scheduling in multistage se- constraints or fixation of variables. The success of
rial batch processes with finite intermediate storage - such an approach depends very much on the struc-
Part I. MILP formulation; Part II. Approximate algo- ture of the problem. In some cases these methods
rithms', AIChE Annual Meeting, Miami (1986). are very efficient, but in other cases they are not
[8] MANUAL: gBSS, general batch scheduling system - User
competitive with other techniques.
manual and language reference, Imperial College, 1996.
[9] PANTELIDES, C.C.: 'Unified frameworks for the op- However, the simple elegance of these basic prin-
timal proces planning and scheduling', Proc. Sec- ciples has inspired many researchers to propose

341
MINLP: Generalized cross decomposition

modifications of the basic methods, mostly aimed for example nonlinear mixed integer programming
at improving the efficiency of the methods, but problems, see for example [4].
also aimed at extending the applicability of the
approaches. T h e P r o b l e m . Consider the following general op-
Dantzig-Wolfe decomposition, originally for lin- timization problem.
ear programming problems, [3], has been extended
to convex nonlinear programming problems, [2],
v* - min f (x, y)
under several names, for example generalized lin- s.t. Gi(x, y) < 0
ear programming. We will here simply use the term (P) a2(x, y) <_ 0
'nonlinear Dantzig-Wolfe decomposition'. xEX
Benders decomposition, originally for linear yEY
mixed integer programming problems, [1], has
where X and Y are compact, nonempty sets. As-
been extended to partly convex nonlinear pro-
sume that X is convex and f, Gi and G2 are
gramming problems, [5], under the name 'gener-
proper convex functions in x for any fixed y E Y,
alized Benders decomposition'.
i.e. that the problem is convex in x. Also assume
On the other hand, among the numerous sug- that that f, Gi and G2 are bounded and Lips-
gestions for modifications to increase the effi- chitzian on (X, Y). Note that we do not assume
ciency, there is one which in a way shares the any convexity in the y-variables. An important
simplicity and clear principle of the basic meth- case is when Y is a (finite) set of integers.
ods, namely cross decomposition, [11]. Usually de-
Furthermore we assume the following (as
scribed as a combination of Benders decomposition
was done in [5] for generalized Benders de-
and Dantzig-Wolfe decomposition, simultaneously
composition). The optimization with respect to
using the two methods in an iterative manner,
x of the Lagrangian functions must be possi-
the method borrows its basic convergence prop-
ble to do 'essentially independent' of y (called
erties from these two methods. However, one can
property P by A.M. Geoffrion). We there-
also view cross decomposition as the more general
fore assume that the functions ql, q2, q3 and
method, and Benders and Dantzig-Wolfe decom-
q4 exist, such that f ( x , y ) + u~Gi(x,y)+
position as modifications of cross decomposition,
uT2 G2(x, y) -- qi(q3(x, u), y, u), Vx, y, u, and
obtained by excluding one of the subproblems and
u~Gi(x, y) + ~T2G2(x, y) -- q2(q4(x, u), y, u),
one of the master problems.
Vx, y, ~, where q3 and q4 are scalar functions, qi
Cross decomposition was originally developed and q2 are increasing in their first argument, and
for linear mixed integer programming problems, is assumed to belong to the set of all possi-
[11], but the approach is more general and not ble nonnegative, normalized directions C - {~ >_
restricted to such problems. The first application 0" e T ~ - 1}, where e is a vector of ones. Since ],
of cross decomposition was to the capacitated fa- Gi and G2 are convex in x and bounded and Lip-
cility location problem, [12], and produced a so- schitzian on (X, Y), the same applies to qi for any
lution method which is recognized as one of the fixed u _> 0, and to q2 for any fixed ~ E C.
most efficient existing methods for that problem. The optimal solution of P is denoted by (x*, y*).
However, another early application was to the sto- We will also mention the case when P is convex,
chastic transportation problem (a convex problem i.e. where f, Gi and G2 are convex functions (in
with linear parts), [10]. y too) and Y is a convex set. Lagrangian duality
Here we will describe 'generalized cross decom- can be used to get a dual solution (the optimal
position', which was first proposed in [6], and more Lagrange multipliers), denoted by u* - ( u ~ , ul).
thoroughly treated in [7]. The generalization of the Let us for convenience introduce the following
procedure, parallel to that in [5] for generalized notation.
Benders decomposition, enables the solving of non-
linear programming problems with convex parts, L(x,y,u) - f(x, y ) + u~Gi(x,y) + uT2 G2(x,Y),

342
MINLP: Generalized cross decomposition

y, - y) + Ia:(x, y), the optima in x (for fixed u and ~) will be attained.


qi and q2 are increasing in their first argument, so
Ll (x, y, ui ) - f (x, y) + uTI GI (x, y),
the minimization in x can be made in q3 and q4
instead, and the value of y will thus not influence
the result of this minimization. The minimization
T h e P r i m a l M a s t e r P r o b l e m . Using the primal over x can be made once (for any y) and the result
structure of (P) we can rewrite it as will then be true for all y E Y.
The relaxed primal master problem only con-
v* - min h(y),
yEV tains a finite number of cuts (with index sets Pu
where Vy--E V, and Ru) which gives an approximate description
of h(y) and V, and an optimal objective function
h(y) - m i n f ( x , y)
value, VPM ~ V*. Since the part of the problem
s.t. Gl(x, y) <_ 0 that is described by the constraints is convex in
G2(x, y) < 0 X, VpM will converge asymptotically towards v* as
xEX the sets of constraints grow.

and The constraints can now be expressed as

V _ I y E Y. 3x E X . Gl (x,y) <_ O, } /
G2(x, y) < 0 " q> ql | m i n q 3 ( x u (k)) y,u (k)) Vk E Pu,
\zEX ' ' - '
The problem is convex in x, so we can use La- %,

grangian duality to get, Vy E V, 0 >_ q2 (minq4(x, ~(k)) y, ~(k)~ Vk E Ru.


k.zEx ' ] '
h(y) = max minL(x, y, u).
u>0 xEX
The minimization in x can now be made inde-
A similar expression can be obtained for V: pendently in each constraint, since the other ar-

V= { yEY" ( -
maxminL(x,y,~)
~EC x EX
) <0 } . guments in q3 and q4, namely u and ~, are fixed.
Since the minima are attained, we use the nota-
The full primal master problem is given below: tion x (k), Vk E Pu, and ~(k), Vk E Ru, for the
minimizers of q3 and q4.
v* - min q
Inserting this, we obtain the final form of the
s.t. q>minL(x,y,u), Vu>_0, relaxed primal master problem.
xEX

0>minL(x,y,~), V~EC,
xEX VPM -- min q
yEY. (PM) s.t. q > L(x (k), y, u(k)), Vk E Pu,
This problem has an infinite number of con- 0 _> L(~(~), y, ~(k)), Vk E Ru,
straints, one for each nonnegative dual point and yEY.
one for each nonnegative dual direction. Each con-
straint contains an optimization problem (mini- The constraints in the first set are called value
mization with respect to x), which should in theory cuts, and those in the second set are called feasi-
be solved for all y E Y before the main problem, bility cuts.
minuey h(y), can be solved. However, we have

T h e D u a l M a s t e r P r o b l e m . Using Lagrangian
zexminL(x' y' u) - ql \zEx(minq3(x'u)' y, u)
duality on (P) yields a relaxation and a lower
and bound, VL, on v*"

minL(x y , ~ ) - q 2
xEX '
(
\xEX
minq4(x ~) y,
' '
) " VL -- maxg(ul)
Ul~O
Since ql and q2 are proper, convex, bounded and
Lipschitzian on X, and X is compact and convex, where, VUl _> 0,

343
MINLP: Generalized cross decomposition

g(Ul) -- min L1 (x, y, Ul) To handle unbounded dual solutions, ul, we can
use the following subproblem:
s.t. G2(x, y) <_ 0
xEX ~(ul) - min L1 (x, y, ul)
yEY. (UDS) s.t. G2(x, y) <_ 0
xEX
This leads to a dual master problem, which is a
convexification of the problem. If (P) is not convex yEY.
a duality gap might occur. We denote the subset (UDS) does not produce a bound on v*, but if
of the solutions that are included by (x(k),y(k)), ~(ul) _ 0 it yields a dual cut that will eliminate
Vk E Px, and obtain the restricted dual master it 1 •
problem as
VDM - maxq The Cross Decomposition Algorithm. In
(DM) s.t.
q < L1 (x (k) , y(k), ul), the subproblem phase of the cross decomposition
Vk E Px, method we iterate between the primal subproblem
(PS) and the dual subproblem (DS) (or (UDS)).
Ul~O.
The primal subproblem, (PS), supplies an upper
bound, h(~), on v*, and ul for the dual subprob-
The S u b p r o b l e m s . The primal subproblem is a
lem. The dual subproblem, (DS), supplies a lower
convex problem in x, obtained by fixing y to ~.
bound, g(ul), on v*, and ~ for the primal subprob-
h(~) - min f(x, ~) lem. If (PS) has an unbounded solution, ul, we use
(PS) s.t. Gl(x,~) <_ 0 (UDS) (instead of (DS)) to get ~.
G2(x, < 0 Unfortunately, the lack of controllability for the
important parts of the solutions, y and ul, which
xEX.
occurs unless the problem is strictly convex, im-
A solution to (PS) is assumed to consist of plies that this procedure alone cannot be expected
both a primal solution, x (k), and a dual solution, to converge to the optimal solution.
(k) u~k)
(u 1 , ). Due to the convexity we can use La- We therefore need to use the master problems to
grangian duality without creating a duality gap. ensure convergence. (PM) or (DM) can be solved
(PSL) h(~) - s u p m i n n ( x ~,u). with all the constraints generated by the subprob-
u>OXEX
lem solutions. We have all the known results for
If (PS) is infeasible, (PSL) will be unbounded in u, generalized Benders or nonlinear Dantzig-Wolfe
and a solution is represented by a direction, ~(k). decomposition to fall back on, so this technique
A valid cut for the primal master problem also is well known. After the solution of one master
requires a corresponding primal solution, ~(k), ob- problem, the subproblem phase is reentered. (We
tained by solving do not switch to Benders or Dantzig-Wolfe decom-
min L(z, ~, ~(k)). position completely.)
xEX We will later describe convergence tests that tell
(Note that ~(k) is not feasible in (PS).) us exactly when to use a master problem. The exis-
The dual subproblem is the following (noncon- tence of such convergence tests is a very important
vex) problem, obtained by relaxing the first set of aspect of cross decomposition. Let us, before get-
constraints in (P) and fixing the Lagrange multi- ting any further, give below a short algorithm for
pliers ul to ~1" cross decomposition algorithm.
g(ul) - - m i n n l ( z , y, ul) Let us denote the convergence test in step 3
(before (PS)) by CTP and the convergence test
(DS) s.t. G2(x, y) <_ 0 in step 6 (before (DS)) by CTD. The optimality
xEX tests (step 2 and step 5) are included in the con-
yEY vergence tests, and the decision about where to go

344
MINLP: Generalized cross decomposition

is based on the results of both tests. The algorithm tar problems and that the description of the
is pictured in Fig. 1. functions h(y) or g(ui) or the set Y is re-
. . . .

0 Get a starting ~. fined. By 'improvement' we will, in the rest of


1 Solve (DS) (or (UDS)). this paper mean bound-improvement a n d / o r cut-
2 IF optimal go to 8. improvement. W h e n using unbounded solutions as
3 IF not convergence, go to 7A (or 7B). input no finite bounds are obtained, so bound-
4 Solve (PS).
improvement can not appear. Also, a cut giving
5 IF optimal go to 8.
6 IF not convergence go to 7B (or 7A). ELSE go a cut-improvement can be a value cut or a feasi-
to 1. bility cut, i.e. generated by o u t p u t in the form of
7A Solve (PM). Go to 4. unbounded as well as bounded solutions.
7B Solve (DM). Go to 1. Let us by primal cut-improvement denote gen-
8 Stop. The solution from (PS) is optimal.
eration of a primal cut (for (PM)) and by dual cut-
Dual subproblem
improvement denote generation of a dual cut (for
(DM)). We also use the notation 'primal' or 'dual
t bound-improvement' to indicate which of the two
Master problem
subproblems that gave the improvement, i.e. pri-
I , mal bound-improvement means that h(~) < ~ and
Primal subproblem I"
dual bound-improvement means that g(ui) > v. (~
Fig. 1. is the least upper b o u n d known and v_ the largest
lower bound known.)
We can start with either one of the subprob-
The convergence tests are originally formulated
lems, so a good primal starting solution can also
be utilized. to give the answers to the following questions.

If C T P indicates that (PS) will not give fur- • Can ~ give a bound-improvement in ( P S ) ?
ther convergence, we use (PM). If CTD indicates • Can ui give a bound-improvement in (DS)?
failure of convergence for (DS), we can use (DM)
Testing extreme rays, ui, for convergence, we
(which however gives certain convergence only if
note that the subproblem (UDS) can not give
(P) is convex). After (PM) we go to (PS) and af-
bound-improvement. We call the test of un-
ter (DM) we go to (DS), in order to make use of
bounded solutions CTDU.
the output of the master problems. In the general
We now give the convergence tests, CT, with
nonconvex case, it is not necessary to use (DM).
strict inequalities, following [11]"
It is even possible to omit the convergence tests
C T D if only (DM) is used. CTP If L(x (k), ~, u(k)) < ~, Vk E Pu,
and ~(~(k), ~, ~(k)) < 0, Vk E Ru,
then y will give primal improvement. If
T h e C o n v e r g e n c e T e s t s . Returning to the ques- not, use a master problem.
tion of convergence in the subproblem phase, we CTD If L1 (x (k), y(k), ~ ) > v__,Vk E Px,
then ~i will give dual improvement. If not,
make the following definitions of ~-improvements.
use a master problem.
'~-bound-improvement' is an improvement of at CTDU If L~(x(k),y(k),~) > O, Vk E Px,
least e of the upper or lower bound. then ui will give dual cut-improvement. If
'e-cut-improvement' is a generation of a new, so . not, use a master problem.
far unknown cut, that is at least ~ better (i.e. has a We call C T D and the first part of C T P value
value of at least e higher or lower) than all known convergence tests and C T D U and the second part
cuts at some point. of C T P feasibility convergence tests. This conforms
Discussing linear mixed integer problems, as in to the notation of value and feasibility cuts in the
[11], one can let ~ = 0. In such a case we simply master problems.
omit e from the above notation. One can show t h a t the convergence tests C T P
Cut-improvement thus means that a new cut and C T D are necessary for bound-improvement
will be included in one of the restricted mas- and sufficient for cut- or bound-improvement, see

345
MINLP: Generalized cross decomposition

[7]. The convergence tests C T D U are sufficient for When the bounded set Y is completely de-
cut-improvement. scribed with an accuracy better t h a n e by either
However, there can be an infinite number of pri- value cuts or feasibility cuts, the e-convergence
mal a n d / o r dual improvements, so one can not be tests will fail (if not earlier). Each time the e-
certain that CT will fail within a finite number of convergence tests do not fail, we will get improve-
steps. For this reason it is necessary to consider ment according to one of the three cases mentioned
e-improvements. above.
We need the following e-convergence tests, CTe: A finite number of e-bound-improvements is ob-
CTPe If L(x (k), ~, u (k)) < ~ - e, Vk E Pu, viously sufficient to decrease the finite distance
and ~(~(k),~, ~(k)) < - e , Vk E Ru, between ~ and v* to less t h a n e. After an e-cut-
then ~ will give primal e-improvement. If improvement, the new cut describes h(y) with an
not, use a master problem. accuracy better t h a n e in the area around ~ where
CTDe If L~ (x (k), y(k), ~ ) >_ v__+ e, Vk E Px ,
h(y) < L ( x (1), y, u (1)) + e. Due to the Lipschitzian
then ~1 will give dual e-improvement. If
not, use a master problem. property of the functions f, G1 and G2, there is
CTDUe If L~(x (k), y(k), ~ ) >_ e, Vk E Px, a least distance, ~, proportional to ~, from ~ to
then ~1 will give dual e-cut-improvement. any point y violating this inequality, and the e-
If not, use a master problem. convergence tests will fail for any point with a dis-
The e-value convergence tests correspond to the tance to ~ less then ~. The bounded set V can
value cuts of the master problems, and the ~ used be completely covered by a finite number of such
corresponds directly to a change of e of the bounds areas.
(e-bound-improvement). The e-feasibility conver- In the third case, an ~l-bound-improvement to-
gence tests, on the other hand, correspond to fea- gether with an e2-cut-improvement, where el +
sibility cuts of the master problems, and the e used e2 = e, we can ignore the least of el and ~2, leaving
corresponds to the 'infeasibility' it gives some pre- us with the other one greater or equal to ~/2. This
viously feasible points, which is what we call e- yields one of the two cases above, so exchanging e
cut-improvement for feasibility cuts. W h i l e t h e s e for e/2 finiteness is still assured.
e-tests are sufficient for e-improvement, they are
For unbounded solutions to (PS), any y sat-
not necessary. To prove necessity would require
isfying ~(~(l),y,~(0) > _~ will make the ~-
an inverse Lipschitz assumption, namely that for
convergence tests fail, and because of the Lips-
points a certain distance apart, the value of a func-
chitzian property of G1 and G2 there is a least
tion (the feasibility cut) should differ by at least a
distance, ~ (proportional to ~), from ~ to any y
certain amount. The following result is proved in
not making the e-convergence tests fail. Thus an
[7].
area of a certain least size is made 'infeasible', and
The e-value convergence tests of CTPe, the
the bounded set Y \ V can be covered by a finite set
feasibility convergence tests of C T P and the e-
of such areas. Thus C T P e will fail within a finite
convergence tests CTDe are necessary for e-bound-
number of steps.
improvement. The e-convergence tests CTe are
sufficient for e-bound- or e-cut-improvement, in Note that it is enough that C T P e fails. To ob-
the sense that they are sufficient for one of the tain finiteness we do not need to use CTDe, even
following. if it might be useful in practice. We cannot show
that CTDe will fail within a finite number of steps.
I) e-bound-improvement.
Dual e-bound-improvement can only occur a finite
II) e-cut-improvement. number of times, but dual e-cut-improvement can
III) el-bound-improvement and ~2-cut- occur an infinite number of times, since the area to
improvement, where el + ~2 - - £. be covered by the cuts is the nonnegative orthant
Now it is possible to verify finiteness of the con- of Ul.
vergence tests. A formal proof for this can be found We therefore require t h a t (PM) is used regu-
in [7]. The following reasoning is used. larly. (One could even skip (DM) completely.) The

346
MINLP: Generalized cross decomposition

following is our main result. sible extent. Therefore the theoretical result that
generalized cross decomposition equipped with ~-
THEOREM 1 The generalized cross decomposition
algorithm equipped with c-convergence tests CT¢ convergence tests does not have asymptotically
finds an e-optimal solution to (P) in a finite num- weaker convergence than generalized Benders de-
ber of steps, if the generalized Benders decompo- composition, is quite satisfactory.
sition algorithm does. [i] Finally one might mention that these ap-
proaches also has been applied to pure (not mixed)
All the results for generalized Benders decompo- integer programming problems in [8] (nonlinear)
sition can be directly used for generalized cross and [9] (linear). In such cases, various duality
decomposition, especially the following two. gaps appear, and exact solution is not possible.
In [5] it is shown that generalized Benders de- However, the approach may be useful for obtain-
composition has finite exact convergence if Y is a ing good bounds on the objective function value,
finite discrete set. The worst case is solving the pri- which are to be used in branch and bound meth-
mal subproblem with each possible y E Y, which ods.
will give a perfect description of h(y) and V on Y. See also: D e c o m p o s i t i o n p r i n c i p l e of lin-
Therefore we know that if Y is a finite discrete ear p r o g r a m m i n g ; G e n e r a l i z e d B e n d e r s de-
set, the generalized cross decomposition algorithm composition; M I N L P : Logic-based meth-
will solve P exactly in a finite number of steps. ods; Simplicial d e c o m p o s i t i o n a l g o r i t h m s ;
It is also shown in [5] that generalized Ben- S t o c h a s t i c linear p r o g r a m m i n g : D e c o m -
ders decomposition terminates in a finite num- p o s i t i o n a n d c u t t i n g planes; Simplicial
ber of steps to an e-optimal solution, i.e. where d e c o m p o s i t i o n ; Successive q u a d r a t i c pro-
- v < ~ for any given ~ > 0, if the set of in- gramming: Decomposition methods; Chem-
teresting (Ul, u2)-solutions (possible optimal solu- ical process p l a n n i n g ; M i x e d i n t e g e r lin-
tions to the primal subproblem) is bounded and ear p r o g r a m m i n g : M a s s a n d h e a t ex-
Y C_ V. This makes the primal feasibility cuts (and c h a n g e r n e t w o r k s ; M i x e d i n t e g e r nonlin-
the corresponding convergence tests) unnecessary. ear p r o g r a m m i n g ; M I N L P : O u t e r a p p r o x -
So for generalized cross decomposition, we know i m a t i o n a l g o r i t h m ; G e n e r a l i z e d o u t e r ap-
the following. p r o x i m a t i o n ; E x t e n d e d c u t t i n g p l a n e algo-
If h(y) is bounded from above for all y E Y, i.e. rithm; MINLP: Branch and bound meth-
(PS) has a feasible solution for every y E Y, then ods; M I N L P : B r a n c h a n d b o u n d global opti-
the cross decomposition algorithm (without UDS m i z a t i o n a l g o r i t h m ; M I N L P : Global o p t i m i -
and the e-feasibility convergence tests of CT¢) will zation with aBB; M I N L P : Heat exchanger
yield finite ~-convergence, i.e. yield ~ - v < ~ in a n e t w o r k synthesis; M I N L P : R e a c t i v e dis-
finite number of steps, for any given ¢ > 0. t i l l a t i o n c o l u m n synthesis; M I N L P : Design
If Y ~ V one might get asymptotic conver- a n d s c h e d u l i n g of b a t c h processes; M I N L P :
gence of the feasibility cuts, i.e. solutions getting A p p l i c a t i o n s in t h e i n t e r a c t i o n of design
closer and closer to the feasible set, but never actu- a n d control; M I N L P : A p p l i c a t i o n in facility
ally becomes feasible. If one is reluctant to base a l o c a t i o n - a l l o c a t i o n ; M I N L P : A p p l i c a t i o n s in
stopping criterion on e-feasible solutions, one could blending and pooling problems.
use penalty functions, which transforms feasibility
cuts to value cuts and gives better possibilities of References
handling cases where Y ~ V. One could also use [1] BENDERS, J.F.: 'Partitioning procedures for solving
artificial variables for this purpose. As for nonlin- mixed-variables programming problems', Numerische
ear penalty function techniques, one should not Math. 4 (1962), 238-252.
forget the Lipschitzian assumption made. [2] DANTZIG, G.B.: Linear programming and extensions,
Princeton Univ. Press, 1963.
The practical motivation behind cross decompo- [3] DANTZIG, G.B., AND WOLFE, P.: 'Decomposition
sition is to replace the hard primal master problem principle for linear programs', Oper. Res. 8 (1960),
with the easier dual subproblem to the largest pos- 101-111.

347
MINLP: Generalized cross decomposition

[4] FLOUDAS, C.A.: Nonlinear and mixed-integer optimi- guarantees convergence to the global optimum of a
zation: Fundamentals and applications, Oxford Univ. much broader class of problems. The integer vari-
Press, 1995.
ables may participate in the problem in a very gen-
[5] GEOFFRION, A.M.: 'Generalized Benders decomposi-
tion', J. Optim. Th. Appl. 10 (1972), 237-260. eral way, provided that the continuous relaxation
[6] HOLMBERG, K.: 'Decomposition in large scale math- of the MINLP is C 2 continuous. This article de-
ematical programming', PhD Thesis Dept. Math. scribes both algorithms.
L inkbping Univ. (1985).
[7] HOLMBERG, K.: 'On the convergence of cross decom-
position', Math. Program. 47 (1990), 269-296. The SMIN-aBB A l g o r i t h m . The SMIN-
[8] HOLMBERG, K.: 'Generalized cross decomposition ap- a B B algorithm [1], [3], [7] guarantees finite e-
plied to nonlinear integer programming problems: Du- convergence to the global solution of MINLPs be-
ality gaps and convexification in parts', Optim. 23
longing to the class
(1992), 341-356.
[9] HOLMBERG, K.: 'Cross decomposition applied to inte-
ger programming problems: Duality gaps and convexi-
min f(x) + x T A I y + c~y
x,y
fication in parts', Oper. Res. 42, no. 4 (1994), 657-668. T
s.t. gi(x) + x TAg,iy + Cg,iy ___O,
[10] HOLMBERG, K., AND J(3RNSTEN, K.: 'Cross decompo-
sition applied to the stochastic transportation prob- i = 1,...,m,
lem', Europ. J. Oper. Res. 17 (1984), 361-368. T (1)
h(x) + x TAh,iy + Ch,iY -- O,
[11] RoY, T.J. VAN: 'Cross decomposition for mixed inte-
ger programming', Math. Program. 25 (1983), 46-63. i = 1,... ,p,
[12] RoY, T.J. VAN: 'A cross decomposition algorithm for X E [ x L , x U]
capacitated facility location', Oper. Res. 34 (1986),
y e {0, 1} q
145-163.
Kaj Holmberg where f (x), g(x), and h(x), are continuous, twice-
Dept. Math. Linkbping Inst. Technol. differentiable functions, m is the number of in-
SE-581 83 Linkbping, Sweden
equality constraints, p is the number of equality
E-mail address: kahol@mai, l i u . se
constraints, q is the dimension of the binary vari-
MSC2000: 90Cll, 90C30, 49M27 able vector, AI, Ag,i and Ah,i are n x q matrices,
Key words and phrases: decomposition, primal-dual, non-
and c f, Cg,i and Ch,i a r e q-dimensional vectors.
linear, mixed integer.
The main features of any branch and bound al-
gorithm are the strategy used to generate valid
lower and upper bounds for the problem and
MINLP: GLOBAL OPTIMIZATION WITH
the selection criteria for the branching node and
o BB
the branching variable. Optionally, a procedure to
The aBB global optimization algorithm for con-
tighten the variable bounds may be considered.
tinuous twice-differentiable NLPs (cf. a B B algo-
Each one of these issues is examined in the context
r i t h m ) [2], [4], [5], [6], [8], [18] can be used to de-
of the SMIN-aBB algorithm.
sign global optimization algorithms for mixed in-
teger nonconvex problems [1], [3], [7]. One such al- Generation of Valid Upper and Lower Bounds. A
gorithm, the special structure mixed integer a B B local solution of the nonconvex MINLP (1) using
algorithm (SMIN-aBB) is designed to address the one of the algorithms described in [13] constitutes
class of MINLPs in which all the integer variables a valid upper bound on the global optimum solu-
are binary variables that participate in linear or tion of that problem. The generalized Benders de-
mixed-bilinear terms and in which the nonconvex composition (GBD) [10], [14] or a standard MINLP
functions in the continuous variables have continu- branch and bound algorithm (B&B) [9], [11], [15],
ous second order derivatives. This algorithm is an [19], [20] may be used to obtain such a solution.
extension of the aBB algorithm and branching is When there are no mixed-bilinear terms, the outer
performed on both the continuous and the binary approximation with equality relaxation (OA/ER)
variables. A second algorithm, the general struc- [12], [16] may also be used. Alternatively, the bi-
ture mixed integer a B B algorithm (GMIN-aBB), nary variables may be fixed to a combination of

348
MINLP: Global optimization with c~BB

0 and 1 values and the resulting nonconvex NLP branched on. If a continuous variable is judiciously
may be solved locally. chosen, the partition results in an improvement of
A relaxed problem which can be solved to global the lower bound on the problem through a tight-
optimality must be constructed from problem (1) ening of the convex relaxation of the nonconvex
in order to obtain a valid lower bound. The class of continuous functions. Binary variables have an in-
MINLPs in which the continuous functions ] ( x ) , direct effect on the quality of the convex underes-
9i(x), and hi(x), are convex can be solved to timators as they influence the range of values that
global optimality using the GBD or B&B algo- the continuous variables can take on.
rithms, and, when there are no mixed-bilinear A first branching variable selection scheme ex-
terms, the O A / E R algorithm. To identify a guar- ploits the direct relationship between the range
anteed lower bound on the solution of the problem, of the continuous variables and the quality of the
it therefore suffices to construct convex underesti- lower bounds and therefore branches only on these
mators for the nonconvex functions f ( x ) , gi(x), variables. One of the rules available for the c~BB
and hi(x), and to solve the resulting problem with algorithm [2] is used for the selection. These are
one of these algorithms. The rigorous convexifica- based on the size of the variable ranges, or on a
tion/relaxation strategy used in the c~BB algorithm measure of the quality of the underestimator for
for nonconvex continuous problem [2], [4], [5], [6] each term, or on a measure of each variabJe's over-
allows the construction of the desired lower bound- all contribution to the quality of the underestima-
ing MINLP. This scheme is based on a decomposi- tors.
tion of the functions into a sum of terms with spe- A second approach aims to first tackle the com-
cial mathematical structure, such as linear, con- binatorial aspects of the problem by branching
vex, bilinear, trilinear, fractional, fractional tri- only on binary variables for the first q levels of the
linear, univariate concave and general nonconvex branch and bound tree, where q is the number of
terms. A different convex relaxation technique is binary variables. The nonconvexities are dealt with
then applied for each class of term. The fact that on subsequent levels of the tree, by branching on
a summation of convex functions is itself a con- the continuous variables. The specific binary vari-
vex function is then used to construct overall func- able used for branching is chosen randomly or from
tion underestimators and arrive at a convex lower a priority assigned on the basis of its effect on the
bounding MINLP. structure of the problem. In particular, the binary
variables that influence the bounds on the greatest
Selection o] Branching Node. A list of the lower
number of variables are given the highest priori-
bounds on all the nodes that have not yet been
ties. Once all the binary variables have been fixed,
explored during the branch and bound procedure
the problems that must be considered are continu-
is maintained. A number of approaches can be used
ous nonconvex and convex problems for the upper
to select the next branching node, such as depth-
and lower bound respectively. The bounding of the
first, breadth-first or smallest lower bound first.
nodes below level q is therefore less computation-
Since the purpose of the algorithm is to identify
ally intensive than above that level.
the global solution of the problem, all promising
regions, that is, all regions for which the lower A third approach also involves branching on
bound is less than or equal to the best upper bound the continuous and binary variables although the
on the solution, must be explored. The strategy choice is no longer based on the level in the tree.
that usually minimizes the number of nodes to be To increase the impact of binary variable branch-
examined and therefore the CPU requirements of ing on the quality of the lower bound, such a vari-
the algorithm is used to choose the next branching able is selected when a continuous relaxation of
node in the SMIN-c~BB algorithm. Thus, the node the problem indicates that the two children node
with the smallest lower bound is selected. will have significantly different lower bounds, and
that one of them may even be infeasible. Thus,
Selection o] Branching Variable. Several strate- if one of the binary variables is close to 0 or 1
gies can be used to select the next variable to be at a local solution of the continuous relaxation, it

349
MINLP: Global optimization with aBB

is branched on. The degree of closeness is an ar- Yi E {0, 1} whose bounds are being updated. The
bitrary parameter which can typically be set to procedure above is used.
0.1 or 0.2. If no 'almost-integer' binary variable is
found, a continuous variable is selected for branch- Algorithmic Procedure. The algorithmic procedure
ing. In general, this hybrid strategy results in a for the SMIN-aBB algorithm is as follows:
faster improvement in the lower bounds than the
second approach, but it is more computationally
intensive because a continuous relaxation must be
solved before selecting a branching variable and a
larger number of MINLP nodes may be encoun- PROCEDURE SMIN-aBB algorithm()
tered during the branch and bound search. Decompose functions in problem;
Set tolerance e;
Variable Bound Updates. The tightening of vari- Set f" = f0 = - c ~ and f* = ~o = +c~;
able bounds is a very important step because of Initialize list of lower bounds {f_0};
m

its impact on the quality of the underestimators. DO y * - f * > e


For continuous variables, the strategies developed Select node k with smallest lower bound,
fk, from list of lower bounds;
for the aBB algorithm may be used [2]. For the
Set f* = fk;
SMIN-aBB algorithm, they rely on the solution of
(Optional) Update binary and continuous
several convex MINLPs in the optimization-based variable bounds;
approach, or the iterative interval evaluation of the Select binary or continuous branching
constraints in the interval-based approach. In this variable;
latter case, the binary variables are relaxed during Partition to create new nodes;
DO for each new node i
the interval computation.
Generate convex lower bounding MINLP;
PROCEDURE binary variable bound update() Find solution fi of convex lower
Consider R = {(x, y ) e F: y, = 0}; bounding MINLP;
Test interval feasibility of R; IF infeasible or fi > -f. + c
IF infeasible, set yL = 1; Fathom node;
Consider R = {(x, y) E F : yi-- 1}; ELSE
Test interval feasibility of R; Add fi to list of lower bounds;
IF infeasible, Find a solution ~i of nonconvex
IF yL = 1, RETURN(infeasible node); MINLP;
ELSE, set yU = 0; IF f ' < f* THEN Set ]* = ~i;
RETURN(new bounds yL and yV); OD;
END binary variable bound update; OD; D

RETURN(f* and variables values at


Procedure for binary variable bound updates.
corresponding node);
In the case of binary variables, successful bound END SMIN-aBB algorithm;

updates are beneficial in two ways. First, they in- Pseudocode for the SMIN-aBB algorithm.
directly lead to the construction of tighter under-
estimators as they affect the continuous variable
bounds. Second, they allow a binary variable to be
fixed and therefore decrease the number of combi-
nations that potentially need to be explored. An
interval-based strategy can be used to carry out In order to illustrate the algorithmic procedure,
binary variable bound updates. Given the current a small example proposed in [17] is used. It is a
upper bound f* on the global optimum solution, simple design problem where one of two reactors
the feasible region F is defined by the constraints must be chosen to produce a given product at the
appearing in the nonconvex problem, a new con- lowest possible cost. It involves two binary vari-
straint f ( x ) + x T A f y + c}-y < ]*, and the box ables, one for each reactor, and seven continuous
(x,y) E [xL,x U] x [yn,yV]. Consider a variable variables. The formulation is:

350
MINLP: Global optimization with a B B

  min  7.5 y1 + 5.5 y2 + 7 v1 + 6 v2 + 5 x
  s.t. z1 − 0.9 (1 − e^(−0.5 v1)) x1 = 0
       z2 − 0.8 (1 − e^(−0.5 v2)) x2 = 0
       x1 + x2 − x = 0
       z1 + z2 = 10
       v1 − 10 y1 ≤ 0
       v2 − 10 y2 ≤ 0
       x1 − 20 y1 ≤ 0
       x2 − 20 y2 ≤ 0
       y1 + y2 = 1
       0 ≤ x1, x2 ≤ 20;   0 ≤ z1, z2 ≤ 30
       0 ≤ v1, v2 ≤ 10;   0 ≤ x ≤ 20
       (y1, y2) ∈ {0, 1}^2

Because of the linear participation of the binary variables, the SMIN-αBB algorithm is well-suited to solve this nonconvex MINLP. It identifies the global solution of 99.2 after nine iterations, when bound updates are performed at every iteration and branching takes place on the binary variables first. Branching variable selection takes place randomly for the binary variables and according to the term measures for the continuous variables. At the global solution, the binary variable values are y1 = 1 and y2 = 0. The steps of the algorithm are shown in Fig. 1. The boldface numbers next to the nodes indicate the order in which the nodes were explored. The lower bound, computed by solving a convex relaxation of the nonconvex problem, is indicated inside each node, and the branching variable selected for the node is also specified. The domain to which this branching variable is restricted is displayed along each branch. A black node indicates the lower bounding problem was found infeasible and a shaded node is fathomed because its lower bound is greater than the current upper bound on the solution.
At the first node, the initial lower bound is 11.4 and an upper bound of 99.2 is found. The binary variable y1 is selected as a branching variable. The region y1 = 0 is infeasible and can therefore be fathomed (black node), while an improved lower bound is found for y1 = 1. This latter region is therefore chosen for exploration at the second iteration. Variable bound updates reveal that y2 = 1 is infeasible so that y2 can be fixed to zero. Branching on the continuous variables may now begin. The first selected variable is x1 and regions 0 ≤ x1 ≤ 10 and 10 < x1 ≤ 20 are created. Since the left region has the lowest lower bound (36.4), it is examined at iteration 3. Variable bound updates show that this region is in fact infeasible and it is therefore eliminated without further processing. The algorithm proceeds to node 4, for which v1 is selected as a branching variable. The right region, 5 < v1 ≤ 10, is fathomed since it has a lower bound greater than 99.2. The algorithm progresses along the branch and bound tree until, at iteration 9, two nodes are left open with lower bounds of 99.2. This is within the accuracy required for this run so the procedure is terminated. One more iteration would reveal that the only global optimum lies in the right child of node 9.

Fig. 1: SMIN-αBB branch and bound tree.
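The binary bound update reported at iteration 2 can be reproduced with a few lines of interval arithmetic. The sketch below is illustrative only: the Interval class and the subset of constraints it checks are assumptions made here for the demonstration, not the authors' implementation.

class Interval:
    """Closed interval [lo, hi] with the operations needed below."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        other = _as_interval(other)
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        other = _as_interval(other)
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        other = _as_interval(other)
        p = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(min(p), max(p))


def _as_interval(v):
    return v if isinstance(v, Interval) else Interval(v, v)


def possibly_feasible(box):
    """Interval test: returns False only when a constraint is provably violated."""
    y1, y2, x1, v1 = box["y1"], box["y2"], box["x1"], box["v1"]
    # y1 + y2 = 1: infeasible if the interval enclosure excludes 1.
    lhs = y1 + y2
    if lhs.hi < 1.0 or lhs.lo > 1.0:
        return False
    # x1 - 20*y1 <= 0: infeasible if the enclosure is strictly positive.
    if (x1 - Interval(20.0, 20.0) * y1).lo > 0.0:
        return False
    # v1 - 10*y1 <= 0: same test.
    if (v1 - Interval(10.0, 10.0) * y1).lo > 0.0:
        return False
    return True  # only a sufficient condition for infeasibility is available


# Node of the walkthrough where y1 has already been fixed to 1.
base = {"y1": Interval(1.0, 1.0), "x1": Interval(0.0, 20.0), "v1": Interval(0.0, 10.0)}

for value in (0.0, 1.0):
    box = dict(base, y2=Interval(value, value))
    status = "possibly feasible" if possibly_feasible(box) else "provably infeasible"
    print(f"y2 = {value:.0f}: {status}")
# y2 = 1 is provably infeasible (y1 + y2 = 1 cannot hold once y1 = 1), so the
# bound update fixes y2 = 0, exactly as reported at iteration 2 of the example.

Because the test is only a sufficient condition for infeasibility, a "possibly feasible" answer never fixes a variable; only a proven violation does.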


The SMIN-αBB algorithm is especially effective for chemical process synthesis problems such as distillation network or heat exchanger network synthesis [1], [3].

The GMIN-αBB Algorithm. The GMIN-αBB algorithm is designed to address the broad class of problems represented by

  min_{x,y}  f(x, y)
  s.t.  g(x, y) ≤ 0
        h(x, y) = 0                                   (2)
        x ∈ [x^L, x^U]
        y ∈ [y^L, y^U] ∩ ℕ^q

where f(x, y), g(x, y), and h(x, y) are functions whose continuous relaxation is twice continuously differentiable.
The GMIN-αBB algorithm [2], [3], [7] extends the applicability of the standard branch and bound approaches for MINLPs [9], [11], [13], [15], [19], [20] by making use of the αBB algorithm. The most crucial characteristics of the algorithm are the branching strategy, the derivation of a valid lower bound on problem (2), and the variable bound update strategies.

Branching Variable Selection. Branching in the GMIN-αBB algorithm is carried out on the integer variables only. When it is a bisection, the partition takes place either at the midpoint of the range of the selected variable, or at the value of that variable at the solution of the lower bounding problem. It is also possible to branch on more than one variable at a given node, or to perform k-section on one of the variables. More than two child nodes may be created from a parent node when the structure of the problem is such that the bounds on a small fraction of the integer variables affect the bounds on many of the other variables in the problem. As in the SMIN-αBB algorithm, an integer variable is chosen randomly or according to branching priorities. An additional rule consists of selecting the most or least fractional variable at the solution of a continuous relaxation of the problem.

Generation of a Valid Lower Bound. A guaranteed lower bound on the global solution of the current node of the branch and bound tree is obtained by solving a continuous relaxation of the nonconvex MINLP at that node. When the integer variables that have not yet been fixed are allowed to vary continuously between their bounds, the problem becomes a nonconvex NLP. The validity of the lower bound can only be ensured if the global solution of this nonconvex NLP is identified or if a lower bound on this solution is found. On the other hand, when all integer variables have been fixed to integer values at a node, no additional partitioning of this node can take place and the global optimum solution of the nonconvex NLP is required to guarantee convergence of the GMIN-αBB algorithm. Based on these conditions, the αBB algorithm can be used as a subroutine to generate valid lower bounds:
• If at least one integer variable can be relaxed at the current node, run the αBB algorithm for a few iterations to obtain a valid lower bound on the global solution of the continuous relaxation, or run the αBB algorithm to completion to obtain the global solution of the continuous relaxation.
• Otherwise, run the αBB algorithm to completion to obtain the global solution for the current node.
This strategy makes use of the convergence characteristics of the αBB algorithm to improve the performance of the GMIN-αBB algorithm. The rate of improvement of the lower bound on the global solution of a nonconvex NLP is usually very high at early iterations and then gradually tapers off. At later stages of an αBB run, the computationally expensive reduction of the gap between the bounds on the solution of the continuous relaxation does not result in a sufficiently significant increase in the lower bound to affect the performance of the GMIN-αBB algorithm and can therefore be bypassed.

Generation of a Valid Upper Bound. Because of the finite size of the branch and bound tree, it is not necessary to generate an upper bound on the nonconvex MINLP at each node in order to guarantee convergence of the GMIN-αBB algorithm. In the worst case, the integer variables are fixed at every node of the last level of the tree, and the


solutions of the corresponding NLPs provide the upper bounds needed to identify the global optimum solution. However, upper bounds play a significant role in improving the convergence rate of the algorithm by allowing the fathoming of nodes whose lower bound is greater than the smallest upper bound, and therefore reducing the final size of the branch and bound tree. An upper bound on the solution of a given node can be obtained in several ways. For example, if the solution of the continuous relaxation is integer-feasible, that is, all the relaxed integer variables have integer values at the solution, this solution is both a lower and an upper bound on the current node. If the αBB algorithm was run for only a few iterations and the relaxed integer variables are integer at the lower bound, they can be fixed to these integer values and the resulting nonconvex NLP can be solved locally to yield an upper bound on the solution of the node. Finally, a set of integer values satisfying the integer constraints can be used to construct a nonconvex NLP whose local solutions are upper bounds on the current node solution.

Variable Bound Updates. If the bounds on the integer variables at any given node can be tightened, the solution space can be significantly reduced due to the combinatorial nature of the problem. The allocation of computational resources for this purpose is therefore a potentially worthwhile investment. An optimization-based approach or an interval-based approach may be used to update the variable bounds. These approaches are similar to those developed for the αBB algorithm but they take advantage of the integrality of the variables. Thus, in the optimization approach, the lower or upper bound on variable y_i is improved by first relaxing the integer variables, and then solving the convex NLP

  min or max_{x,y,w}  y_i
  s.t.  f̄(x, y, w) ≤ f̂*
        C(x, y, w) ≤ 0                                (3)
        x ∈ [x^L, x^U]
        y ∈ [y^L, y^U]
        w ∈ [w^L, w^U]

where f̄(x, y, w) denotes the convex underestimator of the objective function, f̂* denotes the current best upper bound on the global optimum solution, C(x, y, w) denotes the set of convexified constraints, and w is the set of new variables introduced during the convexification/relaxation procedure. Finally, the improved lower or upper bound is obtained by setting y_i^L = ⌈y_i^*⌉ or y_i^U = ⌊y_i^*⌋.

In the interval-based approach, an iterative procedure is followed, based on an interval test which provides sufficient conditions for the infeasibility of the original constraints and the 'bound improvement constraint' f(x, y) ≤ f̂*, given the relaxed region (x, y) ∈ [x^L, x^U] × [y^L, y^U]. This set of constraints defines a region denoted by F. The procedure to improve the lower (upper) bound on variable y_i is as follows:

PROCEDURE interval-based bound update()
  Set initial bounds L = y_i^L and U = y_i^U;
  Set iteration counter k = 0;
  Set maximum number of iterations K;
  DO k < K and L ≠ U
    Compute 'midpoint' M = ⌊(U + L)/2⌋;
    Set left region {(x, y) ∈ F: y_i ∈ [L, M]};
    Set right region {(x, y) ∈ F: y_i ∈ [M + 1, U]};
    Test interval feasibility of left (right) region;
    IF feasible,
      Set U = M (L = M);
    ELSE
      Test interval feasibility of right (left) region;
      IF feasible,
        Set L = M (U = M);
      ELSE
        IF k = 0,
          RETURN(infeasible node);
        ELSE
          Set L = U (U = L);
          Set U = y_i^U (L = y_i^L);
    Set k = k + 1;
  OD;
  RETURN(y_i^L = L (y_i^U = U));
END interval-based bound update;

Interval-based bound update procedure.

The variable bound tightening is performed before calling the αBB algorithm to obtain a lower bound on the solution of the current node. In many cases, during an αBB run, variable bound updates are also used to improve the quality of the


generated lower bounds. Although the αBB algorithm treats the y variables as continuous, the bound update strategy within the αBB algorithm may be modified to account for the true nature of these variables. A larger reduction in the solution space can be achieved by adopting one of the integer bound update strategies described here for the relaxed y variables. This more stringent approach leads to a lower bound which is not necessarily a valid lower bound on the continuous relaxation, but which is always a lower bound on the global solution of the nonconvex MINLP.
The overall algorithmic procedure for the GMIN-αBB algorithm is shown below:

PROCEDURE GMIN-αBB algorithm()
  Set tolerance ε;
  Set f* = f^0 = −∞ and f̂* = f̂^0 = +∞;
  Initialize list of lower bounds {f^0};
  DO f̂* − f* > ε
    Select node k with smallest lower bound, f^k, from list of lower bounds;
    Set f* = f^k;
    (Optional) Update y variable bounds;
    Select integer branching variable(s);
    Create new nodes by branching;
    DO for each new node i
      Obtain lower bound f^i on node i:
        IF all integer variables are fixed,
          Find global solution f^i of nonconvex NLP with αBB algorithm;
        ELSE
          Relax integer variables;
          Run αBB algorithm to completion or for a few iterations to get f^i;
          (Optional) Use integer bound updates on y variables;
      IF f^i > f̂* + ε, THEN Fathom node;
      ELSE
        Add f^i to list of lower bounds;
        (Optional) Obtain upper bound f̂^i on nonconvex MINLP;
        IF f̂^i < f̂* THEN Set f̂* = f̂^i;
    OD;
  OD;
  RETURN(f̂* and variable values at corresponding node);
END GMIN-αBB algorithm;

Pseudocode for the GMIN-αBB algorithm.

The algorithmic procedure for the GMIN-αBB algorithm is illustrated using the same example as for the SMIN-αBB algorithm. The branch and bound tree is shown in Fig. 2, using the same notation as previously.

Fig. 2: GMIN-αBB branch and bound tree.

At the first node, the continuous relaxation of the nonconvex MINLP is solved for 10 αBB iterations to yield a lower bound of 60. No upper bound is found. Next, the binary variable y2 is chosen for branching and the continuous relaxation of the problem with y2 = 0 is solved. A lower bound of 99.2 is found as the global solution to this nonconvex NLP. In addition, this solution is integer feasible and therefore provides an upper bound on the global optimum solution of the nonconvex MINLP. The region y2 = 1 is then examined and the global solution of the NLP is found to be 101.7 after 10 αBB iterations. This node can therefore be fathomed and the procedure terminated.
The GMIN-αBB algorithm has been used to solve nonconvex MINLPs involving nonconvex terms in the integer variables and some mixed nonconvex terms. Branching priorities combined with variable bound updates and a small number of αBB iterations for relaxed nodes allow the identification of the global optimum solution after the exploration of a small fraction of the maximum number of nodes and with small CPU requirements. In particular, the algorithm has been used on a pump network synthesis problem [2], [3]. Some nonconvex integer problems have also been tackled by the same approach. For instance, the minimization of trim loss, a problem taken from the paper cutting industry, has also been addressed for medium order sizes [3].

Conclusions. The αBB algorithm for nonconvex NLPs can be incorporated within more general frameworks to address broad classes of nonconvex MINLPs. One extension of the algorithm is the SMIN-αBB algorithm, which identifies the global optimum solution of problems in which binary variables participate in linear or mixed-bilinear


terms and continuous variables appear in twice continuously differentiable functions. The partitioning of the solution space takes place in both the continuous and binary domains. The GMIN-αBB algorithm is designed to locate the global optimum solution of problems involving integer and continuous variables in functions whose continuous relaxation is twice continuously differentiable. The algorithm is similar to traditional branch and bound algorithms for mixed integer problems in that branching occurs on the integer variables only and a continuous relaxation of the problem is constructed during the bounding step. It uses the αBB algorithm for the efficient and rigorous generation of lower bounds. Both algorithms are widely applicable and have been successfully tested on a variety of medium-size nonconvex MINLPs.

See also: Convex envelopes in optimization problems; Global optimization in generalized geometric programming; Global optimization of heat exchanger networks; Mixed integer linear programming: Heat exchanger network synthesis; MINLP: Mass and heat exchanger networks; Global optimization in batch design under uncertainty; Smooth nonlinear nonconvex optimization; Interval global optimization; αBB algorithm; Global optimization in phase and chemical reaction equilibrium; Global optimization methods for systems of nonlinear equations; Continuous global optimization: Models, algorithms and software; Disjunctive programming; Reformulation-linearization methods for global optimization; Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane algorithm; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; Generalized Benders decomposition; MINLP: Heat exchanger network synthesis; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems.

References
[1] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS, C.A.: 'Global optimization of MINLP problems in process synthesis', Computers Chem. Engin. 21 (1997), S445-S450.
[2] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS, C.A.: 'A global optimization method, αBB, for general twice-differentiable constrained NLPs - II. Implementation and computational results', Computers Chem. Engin. 22 (1998), 1159.
[3] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS, C.A.: 'Global optimization of mixed-integer nonlinear problems', Computers Chem. Engin. 46 (2000), 1769-1797.
[4] ADJIMAN, C.S., ANDROULAKIS, I.P., MARANAS, C.D., AND FLOUDAS, C.A.: 'A global optimization method, αBB, for process design', Computers Chem. Engin. 20 (1996), S419-S424.
[5] ADJIMAN, C.S., DALLWIG, S., FLOUDAS, C.A., AND NEUMAIER, A.: 'A global optimization method, αBB, for general twice-differentiable constrained NLPs - I. Theoretical advances', Computers Chem. Engin. 22 (1998), 1137.
[6] ADJIMAN, C.S., AND FLOUDAS, C.A.: 'Rigorous convex underestimators for twice-differentiable problems', J. Global Optim. 9 (1996), 23-40.
[7] ADJIMAN, C.S., SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'Mixed-integer nonlinear optimization in process synthesis', in D.-Z. DU AND P.M. PARDALOS (eds.): Handbook Combinatorial Optim., Kluwer Acad. Publ., 1998, pp. 429-452.
[8] ANDROULAKIS, I.P., MARANAS, C.D., AND FLOUDAS, C.A.: 'αBB: A global optimization method for general constrained nonconvex problems', J. Global Optim. 7 (1995), 337-363.
[9] BEALE, E.M.L.: 'Integer programming', in: The State of the Art in Numerical Analysis, Acad. Press, 1977, pp. 409-448.
[10] BENDERS, J.F.: 'Partitioning procedures for solving mixed-variables programming problems', Numer. Math. 4 (1962), 238.
[11] BORCHERS, B., AND MITCHELL, J.E.: 'An improved branch and bound algorithm for mixed integer nonlinear programs', Techn. Report Renssellaer Polytechnic Inst. 200 (1991).
[12] DURAN, M.A., AND GROSSMANN, I.E.: 'An outer-approximation algorithm for a class of mixed-integer nonlinear programs', Math. Program. 36 (1986), 307-339.


[13] FLOUDAS, C.A.: Nonlinear and mixed integer optimization: Fundamentals and applications, Oxford Univ. Press, 1995.
[14] GEOFFRION, A.M.: 'Generalized Benders decomposition', J. Optim. Th. Appl. 10 (1972), 237-260.
[15] GUPTA, O.K., AND RAVINDRAN, R.: 'Branch and bound experiments in convex nonlinear integer programming', Managem. Sci. 31 (1985), 1533-1546.
[16] KOCIS, G.R., AND GROSSMANN, I.E.: 'Relaxation strategy for the structural optimization of process flowsheets', Industr. Engin. Chem. Res. 26 (1987), 1869.
[17] KOCIS, G.R., AND GROSSMANN, I.E.: 'A modelling and decomposition strategy for the MINLP optimization of process flowsheets', Computers Chem. Engin. 13 (1989), 797-819.
[18] MARANAS, C.D., AND FLOUDAS, C.A.: 'Global minimum potential energy conformations of small molecules', J. Global Optim. 4 (1994), 135-170.
[19] OSTROVSKY, G.M., OSTROVSKY, M.G., AND MIKHAILOW, G.W.: 'Discrete optimization of chemical processes', Computers Chem. Engin. 14 (1990), 111.
[20] QUESADA, I., AND GROSSMANN, I.E.: 'An LP/NLP based branch and bound algorithm for convex MINLP optimization problems', Computers Chem. Engin. 16 (1992), 937-947.

Claire S. Adjiman
Dept. Chemical Engin., Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: claire@titan.princeton.edu

Christodoulos A. Floudas
Dept. Chemical Engin., Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: floudas@titan.princeton.edu

MSC2000: 65K05, 90C11, 90C26
Key words and phrases: global optimization, twice-differentiable MINLPs, branch and bound, αBB algorithm.


MINLP: HEAT EXCHANGER NETWORK SYNTHESIS

Heat exchanger network synthesis problems arise in chemical process design when the heat released by hot process streams is used to satisfy the demands of cold process streams. These problems have been the subject of an intensive research effort, and over 400 publications have been written in the area. See [7], [8], [9] for reviews of the area, and [1], [3] for detailed analysis of HEN synthesis.
The discovery by T. Umeda et al. [11] of a thermodynamic pinch point that limits heat integration in a heat exchanger network led to much of this research effort. They showed that setting the minimum temperature approach, ΔTmin, places a lower bound on the utility consumption in a heat exchanger network and decomposed a heat exchanger network into independent subnetworks. This enables the heat exchanger network synthesis problem to be decomposed into four subproblems. The first subproblem finds the appropriate minimum temperature approach, the second subproblem minimizes the utility consumption, the third subproblem finds the minimum number of matches and identifies the matches and their heat duty, and the fourth finds and optimizes the actual network structure.
See [5] for a systematic scheme for solving these problems sequentially. First, the utility consumption is minimized using the linear programming (LP) transshipment model approach of [10]. Second, a set of process matches and their heat duties that minimize the total number of units is found with the mixed integer linear programming (MILP) strategy of [10]. Then, the network structure is found [5] by optimizing a superstructure that contains all possible network configurations embedded within it using a nonlinear programming (NLP) problem. When there is more than one combination of matches and heat duties that satisfies the minimum unit criterion, the best combination is found by exhaustive enumeration. The minimum temperature approach is optimized with a golden section search that solves all three of these optimization problems at each iteration.
In the late 1980s it was found, [4], [12], that better network designs could be obtained by solving some of the heat exchanger network design subproblems simultaneously. C.A. Floudas and A.R. Ciric [4] combined the MILP stream matching problem with the NLP superstructure optimization problem formulated in [5], creating a mixed integer nonlinear programming problem (MINLP) that avoided the exhaustive search through all combinations of matches that minimize the number of units. In 1990, they [2] formulated the entire heat exchanger network design problem as a MINLP. The solution of this problem yields the optimal temperature approach, utility level, process matches, heat duties, and network structure, eliminating the need for a golden section search for the optimum minimum temperature approach.
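The sequential scheme described above wraps its subproblems in a one-dimensional golden section search over the minimum approach temperature. A generic sketch of that outer loop is given below; the cost function is a hypothetical smooth stand-in for the LP/MILP/NLP subproblem sequence, and the function names and bounds are illustrative assumptions, not part of the original method.

from math import sqrt

INV_PHI = (sqrt(5.0) - 1.0) / 2.0   # 1/phi, about 0.618


def golden_section_minimize(f, a, b, tol=1e-3):
    """Return an approximate minimizer of a unimodal function f on [a, b]."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:                 # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:                       # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def total_annual_cost(hrat):
    # Hypothetical trade-off: utility cost grows with the approach temperature,
    # while capital (area) cost grows as it shrinks. In the sequential HEN scheme
    # this evaluation would instead solve the three subproblems for the given hrat.
    return 140.0 * 30.0 * hrat + 60000.0 / hrat


best_hrat = golden_section_minimize(total_annual_cost, 1.0, 30.0)
print(f"approximate optimal approach temperature: {best_hrat:.2f} C")

Each golden-section iteration reuses one of the two previous function evaluations, so only one (expensive) subproblem sequence has to be solved per iteration.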


T.F. Yee and I.E. Grossmann [12] used a smaller superstructure proposed in [6] that embodies a sequential-parallel network structure to formulate an alternative MINLP for heat exchanger network synthesis. The solution of this MINLP yielded the utility consumption, matches and network structure and heat exchanger areas.

Problem Statement. This article will explore two mixed integer nonlinear programming problems in heat exchanger network synthesis: combined match-network optimization and heat exchanger network synthesis without decomposition. The synthesis without decomposition problem can be stated as follows.
Given:
1) A set of hot process streams and hot utilities i ∈ H, their inlet and outlet temperatures T^{I,i}, T^{O,i}, and heat capacity flow rates F^i;
2) A set of cold process streams and cold utilities j ∈ C, their inlet and outlet temperatures T^{I,j}, T^{O,j}, and heat capacity flow rates F^j; and
3) Overall heat transfer coefficients U_ij.
Determine:
A) The stream matches (ij), the heat duty Q_ij of match (ij), and the heat exchanger area A_ij of match (ij);
B) the piping structure for each stream in the network; and
C) the temperature and flowrate within each pipe of the network.
In the match-network problem, one is also given
• the level of each utility; and
• a minimum temperature approach ΔTmin.
These problems can be solved using mixed integer nonlinear programming. The development and application of these approaches is described in more detail below.

Heat Exchanger Network Superstructures. Mixed integer nonlinear programming approaches to these problems begin with a superstructure that contains many alternative designs embedded within it. Two superstructures are particularly interesting.

Fig. 1: A superstructure for one hot stream exchanging heat with two cold streams.

Fig. 1 shows a superstructure of a hot stream, above the thermodynamic pinch point, that may exchange heat with two cold streams [5]. Notice that the stream can be piped in series, in parallel, and in split-mix-bypass configurations, as shown in Fig. 2. As we shall see, this richness leads to nonconvex constraints in the MINLP. The first network superstructure is created by constructing similar structures for every other stream above the pinch point.

Fig. 2: Stream piping configurations embedded in the superstructure shown in Fig. 1.

Notice that in this subnetwork, streams H1 and C1, and all other pairs of hot and cold streams, can exchange heat no more than once. H1 and C1 may exchange heat again in the subnetwork below the thermodynamic pinch point. The thermodynamic


pinch point has partitioned the temperature range into two intervals, and in each interval, individual process streams can only exchange heat once.
One could increase the number of times two streams can exchange heat by partitioning the temperature range further. This is the basic strategy behind the second superstructure [6], [12], shown in Fig. 3. Here, the temperature range has been partitioned into many intervals, or stages. Within any particular stage, each hot stream may exchange heat with each cold stream; multiple intervals allow any particular match to take place many times in the network. Unlike the first superstructure, each stream in each stage is piped in a parallel configuration, and the inlet and outlet temperature of each parallel line is fixed by the temperature interval. Series piping structures arise when a stream exchanges heat only once per interval. The superstructure does not contain split-mix-bypass or series-parallel structures, but, as we shall see, in exchange the nonconvex constraints that arise from the first superstructure have been eliminated.

Fig. 3: Two-stage superstructure (stages 1 and 2 between temperature locations k = 1, 2, 3).

Mathematical Models for HEN Synthesis using MINLPs. MINLP models of heat exchanger network synthesis arise when the process stream matches are selected while simultaneously optimizing the heat exchanger network; the former is a discrete decision modeled with integer variables, the latter, a nonlinear optimization problem. In this paper, we refer to this as the match-network problem. MINLPs may also be used to formulate an optimization problem that simultaneously minimizes the utility consumption, selects the stream matches, and optimizes the network layout, in heat exchanger network synthesis without decomposition.

Match-Network Problem. The MINLP model of the match-network problem has three components: a transshipment model [10] that identifies feasible process stream matches and their heat duties, a superstructure model of all possible network structures, and an objective function.
The transshipment model partitions the temperature range into t = 1, ..., T temperature intervals, using the inlet and outlet temperatures of the streams and the temperature interval approach temperature (TIAT). Hot streams release heat into the temperature intervals, where it either flows to the cold streams in the same interval or cascades down to the next colder interval. The binary variable Y_ij denotes the existence of a match between hot stream i and cold stream j, the heat loads are q_ijt and Q_ij, and the heat residuals are R_it. The model is composed of the following constraints:

  R_{it} − R_{i,t−1} + Σ_{j ∈ C_t} q_{ijt} = Q^H_{it},   i ∈ H_t,  t = 1, ..., T,
  Σ_{i ∈ H_t} q_{ijt} = Q^C_{jt},   j ∈ C_t,  t = 1, ..., T,
  Σ_{t=1}^{T} q_{ijt} = Q_{ij},   i ∈ H,  j ∈ C_i,
  Q_{ij} − U Y_{ij} ≤ 0,   i ∈ H,  j ∈ C_i,
  Σ_{i ∈ H} Σ_{j ∈ C_i} Y_{ij} ≤ U_max,

where Q^H_{it} and Q^C_{jt} are the heat contents of hot stream i and cold stream j in interval t, H_t and C_t are the streams present in interval t, and U_max is the maximum number of units.
The first two constraints in the transshipment model are the energy balances for each temperature interval. The total heat load in a match is given by the third constraint. The fourth constraint bounds the heat load using the binary variable Y_ij and a large fixed constant U. The last constraint in the above model puts an upper bound on the number of existing matches, which is the maximum number of units.
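The interval energy balances of the transshipment model encode the familiar temperature-interval heat cascade: heat released by the hot streams in an interval either serves the cold streams in that interval or is passed down as a residual, and the deepest cumulative deficit fixes the minimum hot utility. The sketch below is a plain illustration of that balance with made-up stream data and an assumed ΔTmin; it is not the LP/MILP formulation of [10].

def interval_balances(hot, cold, dt_min):
    """hot/cold: lists of (T_in, T_out, FCp); returns per-interval net heat, hot to cold."""
    # Shift hot temperatures down by dt_min so both stream types share one scale.
    boundaries = sorted({t - dt_min for t_in, t_out, _ in hot for t in (t_in, t_out)} |
                        {t for t_in, t_out, _ in cold for t in (t_in, t_out)},
                        reverse=True)
    balances = []
    for hi, lo in zip(boundaries, boundaries[1:]):
        q = 0.0
        for t_in, t_out, fcp in hot:           # heat released in [lo, hi]
            top, bottom = t_in - dt_min, t_out - dt_min
            q += fcp * max(0.0, min(top, hi) - max(bottom, lo))
        for t_in, t_out, fcp in cold:          # heat demanded in [lo, hi]
            q -= fcp * max(0.0, min(t_out, hi) - max(t_in, lo))
        balances.append(q)
    return balances


def minimum_utilities(hot, cold, dt_min):
    cascade, deepest_deficit = 0.0, 0.0
    for q in interval_balances(hot, cold, dt_min):
        cascade += q
        deepest_deficit = min(deepest_deficit, cascade)
    q_hot = -deepest_deficit                   # minimum hot utility
    q_cold = cascade + q_hot                   # minimum cold utility from the overall balance
    return q_hot, q_cold


# Illustrative two-hot / two-cold instance (not the article's example data).
hot_streams = [(400.0, 300.0, 2.0), (370.0, 320.0, 4.0)]
cold_streams = [(290.0, 420.0, 3.0), (300.0, 350.0, 2.0)]
q_hot, q_cold = minimum_utilities(hot_streams, cold_streams, dt_min=10.0)
print(f"minimum hot utility: {q_hot:.1f} kW, minimum cold utility: {q_cold:.1f} kW")

The match-selection MILP then distributes these interval duties among the candidate hot-cold pairs subject to the Q_ij ≤ U·Y_ij constraints, which is the part that requires an integer programming solver rather than a simple cascade.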


The second part of the match-network synthesis model is the hyperstructure topology model, which consists of mass and energy balances for the mixers and splitters, feasibility constraints, the utility load constraint and bounds on the flow rate heat-capacities.
Mass balances for the splitters at the inlet of the superstructure:

  Σ_{k'} f^{I,k}_{k'} = F^k,   k ∈ HCT.

Here, HCT is the set of all process streams and utilities. Mass balances for the mixers at the inlets of the exchangers:

  f^{I,k}_{k'} + Σ_{k''} f^{B,k}_{k'',k'} − f^{E,k}_{k'} = 0,   k', k ∈ HCT.

Mass balances for the splitters at the outlets of the exchangers:

  f^{O,k}_{k'} + Σ_{k''} f^{B,k}_{k',k''} − f^{E,k}_{k'} = 0,   k', k ∈ HCT.

Energy balances for the mixers at the inlets of the exchangers:

  T^k f^{I,k}_{k'} + Σ_{k''} f^{B,k}_{k'',k'} t^{O,k}_{k''} − f^{E,k}_{k'} t^{I,k}_{k'} = 0,   k', k ∈ HCT.

Energy balances over the heat exchangers:

  Q_{ij} − f^{E,i}_j (t^{I,i}_j − t^{O,i}_j) = 0,   i ∈ H, j ∈ C,
  Q_{ij} − f^{E,j}_i (t^{O,j}_i − t^{I,j}_i) = 0.

The minimum temperature approach between a hot stream and a cold stream:

  t^{I,i}_j − t^{O,j}_i ≥ ΔTmin,
  t^{O,i}_j − t^{I,j}_i ≥ ΔTmin.

Logical relations between the heat-capacity flow rates and the existence of a match:

  f^{E,i}_j − F^i Y_{ij} ≤ 0,
  f^{E,j}_i − F^j Y_{ij} ≤ 0.

Lower bounds on the heat-capacity flow rates through the exchanger:

  f^{E,i}_j − Q_{ij}/ΔT_{ij,max} ≥ 0,
  f^{E,j}_i − Q_{ij}/ΔT_{ij,max} ≥ 0,

where ΔT_{ij,max} equals T^{I,i} − T^{I,j}. Lastly, the objective function minimizes the total investment cost:

  min Σ_{i∈H} Σ_{j∈C} [ a ( Q_{ij} / (U_{ij} ΔT^{LM}_{ij}) )^β + c_{ij} Y_{ij} ],

where a and β are cost coefficients and ΔT^{LM}_{ij} is the log-mean of the two approach temperature differences (t^{I,i}_j − t^{O,j}_i) and (t^{O,i}_j − t^{I,j}_i).
The model is a mixed integer nonlinear programming (MINLP) problem, as the objective function and the energy balances are nonlinear, and the decision variables Y_ij are binary. Notice that the energy balances are bilinear, creating a nonconvex feasible region.

Heat Exchanger Network Synthesis without Decomposition. MINLP models that optimize utility consumption as well as process matches, heat duties, and network configurations can also be formulated. See [2] and [12] for pseudopinch approaches that set the TIAT to a small value and let heat flow across the pinch. A strict decomposition at the pinch can also be maintained by letting TIAT vary, and using integer variables to model the changing structure of the temperature cascade.

EXAMPLE 1. These techniques are demonstrated with a problem given in both [12] and [2]. The problem consists of two hot streams, two cold streams, one hot utility (steam), and one cold utility (cooling water). The stream data is given in Table 1.

  Stream   T_in (C)   T_out (C)   FCp (kW/C)
  H1       500        320         6
  H2       480        380         4
  H3       460        360         6
  H4       380        360         20
  H5       380        320         12
  C1       290        660         18
  S        700        700         -
  CW       300        320         -

  U = 1.0 kW/(m^2 C) for all exchangers
  Annual cost = 1200 A^0.6 for all exchangers
  C_S = 140 $/kW
  C_W = 10 $/kW

Table 1: Stream data for example problem.

Using the pseudopinch method with TIAT = 1 C and ΔTmin = 0.5 C, and allowing HRAT to vary between 1 C and 30 C, Ciric and Floudas [2] formulated the problem as a MINLP problem and


solved it using the generalized Benders decomposition algorithm. The optimal network configuration is pictured in Fig. 4. The network consumes 3592.4 kW of steam and 1312.4 kW of cooling water, and the HRAT is 8.42 C. The annual cost of the network is $571,080. The match data of this solution is given in Table 2.

  Match    Q (kW)      A (m^2)
  H1-C1    948.454     79.391
  H1-CW    131.546     6.280
  H2-C1    400.000     29.057
  H3-C1    600.000     57.488
  H4-C1    400.000     14.880
  H5-C1    720.000     25.509
  S-C1     3591.546    32.112

Table 2: Match data for example problem; pseudopinch method [2].

Yee and Grossmann [12] used the same problem to demonstrate the simultaneous optimization approach. The problem is again formulated as a MINLP problem. The optimal network configuration is given in Fig. 5. The annual cost of this network is $576,640. HRAT is 13.1 C. The match data of this network is given in Table 3.

  Match    Q (kW)    A (m^2)
  S-C1     3676.4    32.6
  H1-C1    863.6     64.1
  H2-C1    400.0     17.1
  H3-C1    600.0     47.0
  H4-C1    400.0     13.8
  H1-CW    216.4     7.9
  H5-C1    720.0     18.4

Table 3: Match data for example problem; simultaneous approach [12].

Fig. 4: Optimal network configuration for example problem; pseudopinch [2].

Fig. 5: Optimal network configuration for example problem; simultaneous approach [12].

Conclusions. Mixed integer nonlinear programming offers a powerful approach to heat exchanger network synthesis. Using these techniques, stream matching, the combinatorial component of heat exchanger network synthesis, can be performed while simultaneously minimizing the utility consumption and selecting the cost-optimal heat exchanger network configuration. Merging these tasks leads to more cost-effective stream matches and lower exchanger costs.

See also: Global optimization of heat exchanger networks; Mixed integer linear programming: Heat exchanger network synthesis; MINLP: Mass and heat exchanger networks; Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane algorithm; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with αBB; Generalized Benders decomposition; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems.
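The area entries in Tables 2 and 3 follow from each match's duty, overall coefficient and log-mean temperature difference, and the cost law of Table 1 then gives the annualized capital cost. The sketch below shows that calculation for a single match; the duty and approach temperatures are hypothetical inputs chosen for illustration, not values taken from the reported solutions.

from math import log


def lmtd(dt_hot_end, dt_cold_end):
    """Log-mean temperature difference of a counter-current exchanger."""
    if abs(dt_hot_end - dt_cold_end) < 1e-9:
        return dt_hot_end
    return (dt_hot_end - dt_cold_end) / log(dt_hot_end / dt_cold_end)


def exchanger_area(q_kw, u, dt1, dt2):
    """Area in m^2 from the design relation Q = U * A * LMTD."""
    return q_kw / (u * lmtd(dt1, dt2))


def annual_capital_cost(area):
    # Cost law quoted in Table 1: annual cost = 1200 * A^0.6.
    return 1200.0 * area ** 0.6


# Hypothetical match: 600 kW duty, U = 1 kW/(m^2 C), approach temperatures 40 C and 25 C.
area = exchanger_area(600.0, 1.0, 40.0, 25.0)
print(f"area = {area:.1f} m^2, annual cost = {annual_capital_cost(area):.0f} $/yr")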


References MINLP: LOGIC-BASED METHODS


[1] BIEGLER, L.T., GROSSMANN, I.E., AND WESTER- There has been an increasing trend to representing
BERG, A.W.: Systematic methods of chemical process
linear and nonlinear discrete optimization prob-
design, Prentice-Hall, 1997.
[2] Cmxc, A.R., AND FLOUDAS, C.A.: 'Heat exchanger lems by models consisting of algebraic constraints,
network synthesis without decomposition', Computers logic disjunctions and logic relations ([1], [7], [8]).
Chem. Engin. 15 (1990), 385-396. For instance, a mixed integer program can be for-
[3] FLOUDAS, C.A.: Nonlinear and mixed-integer optimi- mulated as a generalized disjunctive program as
zation, Oxford Univ. Press, 1995.
has been shown in [5]:
[4] FLOUDAS, C.A., AND CIRIC, A.R.: 'Strategies for over-
coming uncertainties in heat exchanger network syn- rain Z-- Eck ÷ f (x)
thesis', Computers Chem. Engin. 13, no. 10 (1989), k
1133. s.t. g(x) < 0
[5] FLOUDAS, C.A., CIRIC, A.R., AND GROSSMANN, I.E.:
'Automatic synthesis of optimum heat exchanger net-
work configurations', AIChE J. 32 (1986), 276.
V < o ,
[6] GROSSMANN, I.E., AND SARGENT, R.W.H.: 'Optimum
(DP1) < ieDk m Ck = ~-~'k
design of heat exchanger networks', Computers Chem.
Engin. 2, no. 1 (1978). kESD,
[7] GUNDERSEN, W., AND NAESS, L.: 'The synthesis of cost
f~(Y) = true
optimal heat exchanger networks: An industrial review
of the state-of-the-art', Computers Chem. Engin. 12, x C R n, c E R m,
no. 6 (1988), 503. Y e {true, false}m,
[8] JEZOWSKI, J.: 'Heat exchanger network grassroot and
retrofit design: The review of the state-of-the-art: Part in which Y/k are the boolean variables that estab-
I', Hungarian J. Industr. Chem. 22 (1994), 279-294. lish whether a given term in a disjunction is true
[9] JEZOWSKI, J.: 'Heat exchanger network grassroot and (hik(x) <__ 0), while a ( Y ) are logical relations as-
retrofit design: The review of the state-of-the-art: Part
sumed to be in the form of propositional logic in-
II', Hungarian J. Industr. Chem. 22 (1994), 295-308.
[10] PAPOULIAS, S.A., AND GROSSMANN, I.E.: 'A struc-
volving only the boolean variables. Y/k are auxil-
tural optimization approach in process synthesis- II: iary variables that control the part of the feasible
Heat recovery networks', Computers Chem. Engin. 7' space in which the continuous variables, x, lie, and
(1983), 707. the variables elk represent fixed charges which are
[11] UMEDA, T., HARADA, T., AND SHIROKO, K.: 'A ther- set to a value ~/ik if the corresponding term of the
modynamic approach to the structure in chemical pro-
disjunction is true. Finally, the logical conditions,
cesses', Computers Chem. Engin. 3 (1979), 373.
[12] YEE, T.F., AND GROSSMANN, I.E.: 'Simultaneous op- f~(Y), express relationships between the disjunc-
timization models for heat irltegration - II: Heat ex- tive sets. In the context of optimal synthesis of
changer network synthesis', Computers Chem. Engin. process networks, the disjunctions in (DP1) typi-
14, no. 10 (1990), 1165. cally arise for each unit i in the following form:

Kemal Sahin
Dept. Chemical Engin. Univ. Cincinnati
Cincinnati OH 45221, USA
I " I r: lO|
h,
ci - ~/i
< 0 V / 8'x =
L ci = 0 J
, i e I, (1)

Korhan Gursoy in which the inequalities hi apply and a fixed cost


Dept. Chemical Engin. Univ. Cincinnati
~'i is incurred if the unit is selected (Y/); otherwise
Cincinnati OH 45221, USA
(--Y/) there is no fixed cost and a subset of the
Amy Ciric
Dept. Chemical Engin. Univ. Cincinnati
x variables is set to zero with the matrix B i. An
Cincinnati OH 45221, USA important advantage of the above modeling frame-
work is that there is no need to introduce artifi-
MSC 2000:90C90 cial parameters for the 'big-M' constraints that are
Key words and phrases: MINLP, HEN synthesis, network normally used in MINLP to model disjunctions.
synthesis. M. Turkay and I.E. Grossmann [9] proposed a
logic version of the outer approximation algorithm


for MINLP [3] for solving problem (DP1), and in one linear approximation of each of the terms in
which the disjunctions are given as in equation the disjunctions. Selecting the smallest number of
(1), and all the functions are assumed to be con- subproblems amounts to the solution of a set cov-
vex. The algorithm consists of solving a sequence ering problem, which is of small size and easy to
of NLP subproblems and master problems, which solve [9].
are as follows. The above problem (MDP1) can be solved by
For fixed values of the boolean variables, Y~k = the methods described in [1] and [7]. It is also in-
A

true and Y/k -- false for i # i, the corresponding teresting to note that for the case of process net-
NLP subproblem is as follows" works Turkay and Grossmann [9] have shown that
if the convex hull representation of the disjunctions
'min Z-- Eck + f(x)
in (1) is used in (MDP1), then assuming B i - I
k
and converting the logic relations gt(Y) into the
s.t. g(x) < 0
inequalities A y < a, leads to the MILP problem,
hik(X) < 0
for YTk - - true" (MIPDF) minZ- ECk + f(x)
(NLPD) Ck -- ~Yik k
for Yik -- false • ~ B i x -- 0 such that
LCk -- 0 >_ f ( x l) + V f ( x l ) T ( x -- xl),
kESD,
g(x l) + V g ( x l ) T ( x -- x l) < O,
xER n, ci E R 1
l -- 1 , . . . , L ,
Note that for every disjunction k E S D only con-
hi(x l)T Xz, + VxN, hi(x t) T XN
1i
straints corresponding to the boolean variable Yik
that is true are imposed. Also, fixed charges 7ik _< y,,
are only applied to these terms. Assuming that K
subproblems (NLPD) are solved in which sets of ~.EK~, iEI,
linearizations l - 1 , . . . , K are generated for sub- 2
XN~ -- x ~ -t- XN~ ,
sets of disjunction terms L(ik) - {l" YiZk -- true}, 1 ~ U
0 ~ XNi Xgiyi,
one can define the following disjunctive OA master
problem: 2 i __
O ~ XN <xV(l_yi)

(MDP1) minZ - ~Ck + f(x) A y <_ a,


k xER n, 1 ~ O, XNi
2 >
XNi __0 ,
such that
y{O, 1} m ,
c~ ~ f (x l) + V f (xl)T (x - xl),
where the vector x is partitioned into the variables
+ vg(z ) T < 0, for each disjunction i according to the definition
l = 1,...,L, of the matrix B i. The linearization set is given by
K~ - {g" Y~i - true, t ~ - 1,... ,L} that denotes
the fact that only a subset of inequalities were en-
V + Vh k(J)T( - _ 0 ,
forced for a given subproblem e. It is interesting
iE Dk Ck -- "~ik
to note that the logic-based outer approximation
kESD, algorithm represents a generalization of the mod-
~(Y) = true, eling/decomposition strategy [5] for the synthesis
c~ E R , x E R n, c E R m, of process flowsheets.
Turkay and Grossmann [9] have also shown that
Y E {true, false}re.
while a logic-based generalized Benders method [4]
It should be noted that before applying the cannot be derived as in the case of the OA algo-
above master problem it is necessary to solve vari- rithm, one can exploit the property for MINLP
ous subproblems (NLPD) so as to produce at least problems that performing one Benders iteration


[2] on the MILP master problem of the OA al- [3] DURAN, M.A., AND GROSSMANN, I.E.: 'An outer-
gorithm, is equivalent to generating a generalized approximation algorithm for a class of mixed-integer
nonlinear programs', Math. Program. 36 (1986), 307.
Benders cut. Therefore, a logic-based version of the
[4] GEOFFRION, A.M.: 'Generalized Benders decomposi-
generalized Benders method consists of performing tion', J. Optim. Th. Appl. 10, no. 4 (1972), 237-260.
one Benders iteration on the MILP master prob- [5] Kocm, G.R., AND GROSSMANN, I.E.: 'A modeling and
lem (MIPDF). It should also be noted that slacks decomposition strategy for the MINLP optimization
can be introduced to (MDP1) and to (MIPDF) of process flowsheets', Computers Chem. Engin. 13
to reduce the effect of nonconvexities as in the (1989), 797.
[6] LEE, S., AND GROSSMANN, I.E.: 'New algorithms for
augmented-penalty MILP master problem [10]. nonlinear generalized disjunctive programming', Com-
Finally, it should be noted that S. Lee and puters Chem. Engin. 24 (2000), 2125.
Grossmann [6] have developed a new branch and [7] RAMAN, R., AND GROSSMANN, I.E.: 'Symbolic integra-
bound method and a MINLP reformulation that is tion of logic in mixed integer linear programming tech-
niques for process synthesis', Computers Chem. Engin.
based on the convex hull of each of the disjunctions
17' (1993), 909.
in (DP 1) with nonlinear inequalities. [8] RAMAN, R., AND GROSSMANN, I.E.: 'Modelling and
See a l s o " Disjunctive programming; computational techniques for logic based integer pro-
Reformulation-linearization m e t h o d s for gramming', Computers Chem. Engin. 18 (1994), 563.
global o p t i m i z a t i o n ; M I N L P : B r a n c h and [9] TURKAY, M., AND GROSSMANN, I.E.: 'A logic based
outer-approximation algorithm for MINLP ~optimiza-
bound methods; MINLP: Branch and
tion of process flowsheets', Computers Chem. Engin.
bound global optimization algorithm; 20 (1996), 959-978.
M I N L P : Global o p t i m i z a t i o n w i t h (~BB; [10] VISWANATHAN, J., AND GROSSMANN, I.E.: 'A com-
M I N L P : G e n e r a l i z e d cross decomposition; bined penalty function and outer-approximation
D e c o m p o s i t i o n principle of linear p r o g r a m - method for MINLP optimization', Computers Chem.
ming; G e n e r a l i z e d B e n d e r s decomposition; Engin. 14 (1990), 769.
Simplicial d e c o m p o s i t i o n algorithms; Sto- Ignacio E. Grossmann
chastic linear p r o g r a m m i n g : D e c o m p o s i t i o n Carnegie Mellon Univ.
Pittsburgh, PA, USA
and c u t t i n g planes; Simplicial decomposi-
E-mail address: grossmann©cmu, edu
tion; Successive q u a d r a t i c p r o g r a m m i n g :
MSC2000: 90C10, 90C09, 90Cll
D e c o m p o s i t i o n m e t h o d s ; C h e m i c a l process
Key words and phrases: generalized disjunctive pro-
planning; M i x e d integer linear p r o g r a m - gramming, disjunctive programming, outer approximation
ming: Mass a n d heat e x c h a n g e r networks; method, generalized Benders decomposition, mixed integer
M i x e d integer n o n l i n e a r p r o g r a m m i n g ; programming.
M I N L P : O u t e r a p p r o x i m a t i o n algorithm;
Generalized outer approximation; Extended
c u t t i n g plane algorithm; M I N L P : H e a t ex- MINLP: MASS A N D HEAT E X C H A N G E R
c h a n g e r n e t w o r k synthesis; M I N L P : Reac- NETWORKS, M E N , M H E N
tive distillation c o l u m n synthesis; M I N L P :
Design a n d scheduling of b a t c h processes; Mass integration in the form of mass exchanger
M I N L P : A p p l i c a t i o n s in t h e i n t e r a c t i o n of networks, MEN, appears in the chemical indus-
design and control; M I N L P : A p p l i c a t i o n in tries as an economic alternative in waste treat-
facility location-allocation; M I N L P : Appli- ment, feed preparation, product separation, recov-
cations in b l e n d i n g a n d pooling problems. ery of valuable materials, etc. The MEN involves a
set of rich streams, wherefrom one or more compo-
nents are removed by means of lean streams (mass
References separating agents) in mass transfer operations that
[1] BEAUMONT, N.: 'An algorithm for disjunctive pro- do not require energy (constant pressure and tem-
grams', Europ. J. Oper. Res. 48 (1991), 362-371.
[2] BENDERS, J.F.: 'Partitioning procedures for solv-
perature).
ing mixed-variables programming problems', Numer. The MEN synthesis/design problem is posed as
Math. 4 (1982), 238-252. a combinatorial problem, involving discrete and


continuous decisions (e.g. the mass exchange op- increasing thus the considered MEN struc-
erations/matches and the unit sizes, respectively), tures and the combinatorial complexity of the
that both affect the overall mass integration cost. synthesis problem. Note that, this is not sim-
Rich streams ilar to an a priori decomposition of the net-
R={il i=l..N R} work into separable subnetworks.
Gi yS
I,C
Lean • Each stream entering the network is split to-
streams wards all its potential mass exchanger units.
S ={jl j=I..Ns }
After each mass exchanger, a splitter is con-
Mass
Exchange
sidered for each stream, where the stream is
Network split towards its final mixer and all the other
l-
U
x! < xt < xu potential stream exchangers.
Lj _<Lj j,c-- j,c-- j,c
xs Prior to each potential mass exchanger, a
J,C
1 < y! < yU mixer is considered for each participating
Yi,c- ~,c- i,c stream, where the flow from the initial split-
Fig. 1. ter and connecting (bypass) flows from all the
other exchangers of the stream are merged
When the mass transfer operations can take
into the flow towards the exchanger.
place at different temperatures, heat integration
of the rich and lean streams is also considered • A mixer is considered at the network out-
within a combined mass and heat exchanger net- let of each stream, where flows from all the
work, MHEN, synthesis problem. potential stream exchangers are merged into
In isothermal MEN synthesis, the simultaneous the outlet flow.
optimization of the mass exchange operations, the For example, for a rich stream i and its mth and
mass separating agent flows and the network con- m~th possible exchangers with lean streams j and
figuration has been formulated by K.P. Papalexan- j' respectively, we have Fig. 2"
dri, E.N. Pistikopoulos and C.A. Floudas [9] as an
MINLP problem based on:
yOoc gO
g..I m ~ ~ , g l ~ J E ..............
~[,:,B ......... ~ j m
a) the MEN superstructure of synthesis/design q ijm ijj'mm' t o
alternatives; G i _ [ s. ,[-. R\ I other ~ _
s -\ omer g~..,.,.. ]------'~\ exchangers( t
Y ic \excnangers "JJ...... I \ ! Yic
b) modeling of mass exchange in each mass ex-
changer; and, gij'
'm'~ .........."...........J i ~ =@ "~ /
c) minimization of a total annualized network
Fig. 2: Rich stream superstructure.
cost.
xZ.. x.°
Details are given below qmc~ - qmc ,n

MEN Superstructure. The MEN superstructure for


1;°/ "'"............~
Lj /from \ ljm /~iymm' to
a given set of rich and lean streams includes all =I other 1Bk'-----~ [
x s. \....changers l i , i j m , m ~ \\ exchangers
o t h e r ,/ t
Xjc
possible mass exchange operations (mass exchange
jc /
matches) between the network streams in all pos- 1iJm'\ .......... ..."
sible network configurations. Its main features are:
Fig. 3. Lean stream superstructure.
• Each potential match between a rich and a
lean stream corresponds to a potential mass In Fig. 2, c -- 1 , . . . , C are the transferable
exchanger (one-to-one correspondence). components. All possible configurations for the
Multiple mass exchange matches between two exchangers ((ijm) and ( i j ' m ' ) i n series, or
two streams may be considered (i.e. streams in parallel), result by 'deleting' appropriate con-
integrated at different points in the network), necting streams. Stream deletion corresponds to

364
MINLP: Mass and heat exchanger networks

,
zero stream flows (e.g. gij'm' o
-- gijm - 0 and intermediate compositions of components (molar
gij'jm'mB _-- 0 results in the exchangers in series). fractions x I, x 0, yI, yO) are illustrated in the cor-
For a lean stream j and its mth and m'th ex- responding superstructure figures.
changers with rich streams i and i', we have Fig. 3.
Modeling Mass Exchange. The existence of each
The MEN superstructure is described by mass potential mass exchanger in the network is denoted
balances for the overall streams and each transfer- by a binary variable:
able component at the exchangers, splitters and
mixers of the superstructure: 1, when t h e m t h exchanger
Eijm - between streams i and j exists,
E I 0
gijm (Yijmc - Yijmc) - Mijmc, 0, otherwise,
iER, c-1,...,C,
liEm (Xijmc
0 I
(1) and defined by
-- Xijmc ) -- Mijmc,
i e S, c - 1 , . . . , C , Em-Eli m U <_0

0
gijm -~- ~
E
j6S,m
E
iER,m
g I m -- Gi - O,

IIijm-Lj -0'

B
gijj'mm' -- gijm = 0,
E
i E R,

iES,
(2)
ly lEm- E i j m U < O,
M i j m c - EijmU < O,
g,%, z,%, >_ O,
where Mijmc is the mass exchange load of com-
ponent c in mass exchanger (ijm), and U a large
(7)

fES,m I positive number.


0 B E (3) In each potential ma.ss exchanger a component
lij m + E lii,jm m, -- lijm = O,
i'6S,m' c is transferred from the rich to the lean stream
iER, jES, m=l,...,M, when the rich composition is greater than the
equilibrium composition with respect to the lean
9~'m + E B
gij'jm'm E -- O,
-- gijm stream:
fES,m I
liSm + ~ li,Bijm, m -- lijE m -- O, (4) yc>_f(Xc),
i'6S,m'
where f(xc) is the mass transfer equilibrium rela-
i6R, jES, m=l,...,M, tion, that may account for reactive mass transfer
I
gijmYic +
8
E B 0 also.
gij'jm'mYij'm'c
j'6 S,m' Feasibility of mass transfer is ensured imposing
E I the above constraint at the inlet and outlet of the
--gijmYijmc O, - -

I s streams, i.e. (for counter-current flows):


lijmXj c q- E B
li'ijm, 0 c
mXi,jm,
ilES,m I (5)
--yljm c q" f ( x O m c ) + 6ijc --(1 - E i j m ) U ~_ O,
E I
--lijmXijmc -- O, o I
--Yijmc + f(Xijmc) -+- eijc -- (1 -- Eijm)U <_ O,
iER, jES, (s)
c - 1 , . . . , C, m - 1 , . .. , M ,
w h e r e eijc is a m i n i m u m
composition difference
E 0 0
gijmYijmc - GiYict - O, that is required for feasible mass exchange in a
j6S,m unit of finite size (e.g. imposed from mechani-
iER, c-1,...,C, cal constraints). When f(Xc) is not convex the
E 0 0 constraints in (8) cannot guarantee feasible mass
lijmXijmc -- L j x j tc -- O,
i6R,m transfer throughout the exchanger. In this case
i E S, c = 1 , . . . , C , f(Xc) can be approximated by a set of convex func-
tions and feasible mass transfer be ensured con-
where the inlet, outlet, exchanger and exchanger- sidering the constraints in (8) also for intermedi-
connecting flows of the rich and lean s t r e a m s (gI, ate exchanger points, that define the convex parts.
gO, gE, gB and I x, l °, l E, l B, respectively) and the Note that, the mass-transfer feasibility or driving-


force constraints in (8) are activated only when the The main advantage of the simultaneous MEN
corresponding exchanger exists (Eijm - 1). synthesis model (P1), as opposed to the sequen-
The size of each potential mass exchanger (num- tial MEN synthesis method, is that the trade-off
ber of mass transfer stages, N st, etc.) is calculated between the capital and operating costs is system-
as a function of the variable mass transfer, through atically considered. Also,
appropriate design equations (e.g. for perforated- • (P1) derives the optimal network with re-
plate columns the Kremser equation): spect to all the transferable components, con-
_ Nst (gijm
E E
, lijm, XIjmc , Xijmc,
0 I y.O. sidering the mass transfer of each compo-
Yijmc, ~3mc)"
nent separately within the calculated mass-
(9)
transfer stages of each exchanger.
Minimizing Network Cost. The total network cost • Forbidden mass exchange matches, limited
comprises mass exchange and/or forbidden exchanger
connections can be explicitly considered in
• the annualized capital cost of the mass ex-
(P1).
changers, that may be discontinuous (involve
a fixed charge cost factor), and • Variable target compositions are straightfor-
wardly handled.
• the annualized operating cost, i.e. the cost of
the mass separating agents. When the mass exchange matches and mass ex-
change loads are fixed (e.g. when these are deter-
Consequently, the MEN MINLP synthesis mined within a sequential MEN synthesis frame-
model is formulated as follows: work), (P1) reduces to an NLP and can be solved
(P1) min to derive a network configuration and unit sizes
with minimum capital cost.
E (AC~jmEijm+ AC2m(N~jtm))+ ~ AC~Lj Extending the concept of cost optimality of the
ijrn j mass exchanger network, two special cases have
such that been studied:
• MEN and regeneration networks.
(2)-(9)
When regenerating agents are available for
, gijm, g i j j ' m m ~ ~ O, some (or all) lean streams, the total mass
I 0
Yijmc, Yijmc -- ,
>0 integration cost involves also the regenera-
tion cost. The regeneration network can be
icR, j,j'cS,
considered simultaneously within the MINLP
m, m I = 1, . . . , M, MEN synthesis model [9], accounting for
c= 1,...,C, all the regeneration alternatives of the lean
liIm , lijE m, lii,
B jmm, , lij0 m > O, streams and employing binary variables to
I 0
denote the existence of the regenerating ex-
Xijmc , Xijmc ~ O, changers. In this case, the mass separating
icR, j,j'cS, agents behave as lean streams in the mass
m,m I = 1,...,M, exchangers of the main MEN and as rich
streams in the regenerating mass exchang-
c= 1,...,C,
ers. The regeneration network is not neces-
E i j m = 0, 1, sarily separable from the main MEN, as a
iER, j,j' ES, lean stream may be partly regenerated be-
m, m l = 1 , . . . , M . fore being used as a separating agent in an-
other mass exchanger. Thus, the lean stream
(P1) is a nonconvex MINLP problem and global superstructures involve all the possible inter-
optimization methods are required to guarantee connections between the exchangers of the
global optimal solutions. main MEN and the regenerating exchangers.


For example, for a lean stream j and its mth peratures, heat integration between the network
and m~th exchangers with rich stream i and streams can be simultaneously considered within
regenerant k we have Fig. 4. a combined MEN and HEN synthesis problem [7].
x1.. xo The available rich and lean streams define hot, cold
=---. ..... -...
or hot-and-cold streams in the heat integration
1E BR ..............
L~ I ...: \ tjm pijkmm, to other \ problem, depending on whether their supply and
J I ." Rn\ I exchangers.
1;;..7_,_';L----P x (MENan.d ,~ target compositions are above or below the mass
X2s ~
'"
"
-jKtm m J \ regenerating)[ X(
=
J ".. E, exchange temperatures. Thus, their heat exchange
".. "........ / ~jkm'~ / g
toother "~ fro,,,O:t,;:r"..~,~ ~ ' ~ X I alternatives include both hot- and cold-side match-
MEN exchangers Jexch"an's'ers . . ~ ' ~ ) . . o-R .......
(MEN and. ^ikm
regenerating) " / ^ikm'
" "......" " , . . ~ ing. Inlet and outlet temperatures and composi-
tions in mass and heat exchangers are variables.
Fig. 4: Regenerable lean stream superstructure. The combined mass and heat exchanger super-
The overall superstructure of mass ex- structure involves all the possible mass and heat
change and regeneration alternatives involves exchangers of a stream and all the possible inter-
also the superstructures of the regenerating connections between them, Fig. 6.
agents, that have variable flows, while the I m~s [__
overall network cost includes the main MEN ...." Tm ...
/" oth........ I I other ""..~
and the regeneration cost (capital and oper- Rich stream
[ ' . . . . . . ha,tgersl I massexchangers "'~
I"".:"'~a:e~cthSaidn;ersl.4 ......[he%Ui~cl2ng~eers.......
.. l
ating cost). ~...- :: .. ..
zI zO
"j.'krn ~ ~km hot,i~ ~
~,,t,, ]exchanger]. 'ide I']exchanger
coldside ~. . -

T in > T °ut Tin ,: T °ut

Hk
/ ~~~]/ ............
= from other ,
...
...."
hE
jkm
/"
......~
'r~
to othe
exchangers ~._.__.._.~
Fig. 6: Combined MEN and HEN superstructure.
Z s ~ exchangers ~,J 4 ~ Zk
k The combined MEN and HEN superstructure is
'".........
described by
• mass balances at the superstructure splitters
Fig. 5: Regenerating stream superstructure. (i.e. the initial stream splitters and the split-
ters after each side of the possible mass and
• Flexible mass exchange networks.
heat exchangers), similar to (2) and (3), and
The ability of MEN to accommodate varia-
considering all the connecting flows;
tions in the rich stream flows and inlet com-
positions in an efficient manner affects cost • mass balances for overall flows and transfer-
optimality. A multiperio'd M I N L P M E N syn- able components at the superstructure mix-
thesis model has been suggested in [7], to de- ers (i.e. the final stream mixers and the mix-
rive mass exchange networks, flexible to ac- ers prior to each side of the potential mass
commodate in an optimal manner different and heat exchangers), similar to (4), (5) and
mass integration requirements. In the mul- (6), and considering all the connecting flows;
tiperiod MINLP model a weighted operat- • energy balances at the superstructure mixers;
ing cost is optimized simultaneously with the • mass balances at the mass exchangers, simi-
capital cost for mass exchangers that can op- lar to (1), and
erate feasibly under the different conditions.
• energy balances at the heat exchangers.
The MEN superstructure is extended to in-
clude control variables that enhance flexi- The MHEN synthesis model also involves
bility (as exchanger-bypassing streams and • binary variables, to denote the existence of
overall bypass streams that are accordingly mass and heat exchangers, and their defini-
penalized). tion (mixed integer constraints),
When the alternative mass transfer opera- • driving force constraints for mass exchange
tions take place at different and/or variable tem- (8) at the potential mass exchangers, and for

367
MINLP: Mass and heat exchanger networks

heat exchange at the potential heat exchang-


E Gi(yiSc Y~c) - ~ Lj(x}c - x~c) (10)
ers (based on ATtain), iCR j6s
• design equations for the potential mass and and
heat exchangers, and
• feasibility of mass exchange above (and be-
• a total annualized network cost. low) each candidate mass exchange pinch:
and is formulated as a (nonconvex) MINLP.
The simultaneous MHEN synthesis model ad-
dresses systematically the trade-off between capi-
Mass lost by all the rich
streams below each pinch
point candidate
/
tal and operating cost of mass and heat integra-
tion. The MEN and HEN are not assumed sep-
arable. Thus, better integration can be achieved,
as it is allowed for a stream to be partly heated
Mass gained by all the lean
streams below each pinch
point candidate
/_<°
for a particular mass exchange operation and then i.e.
heated further for final purification. EGi (11)
In the simple case when the temperatures of the i6R
mass exchange operations are given or can be pre- x [max(0, yp - Y~c)- max(0, yp - y,Sc)]
postulated, the rich and lean streams define hot
(or cold) streams before participating to mass ex- -ELj
j6S
changers and cold (or hot) streams afterwards [11].
× [m x(O, - xj )- m x(O, -
The final mass and heat exchanger network
structure results from the flows of the superstruc- <0
ture substreams. Alternatively, the use of binary Note that the thermodynamic feasibility re-
variables has been suggested in [7] to denote the quirements in (11) involve nondifferentiable terms
existence of exchanger connections. This, although if inlet and outlet compositions are variables (po-
increasing the combinatorial complexity of the sition of streams with respect to candidate pinch
MINLP synthesis model, allows for: points). These can be handled either employing
i) explicit piping cost considerations, differentiable approximation functions [6], or in-
ii) structural constraints to be easily modeled, troducing binary variables [2], [3], [5].
and The main assumption in MEN is that mass
transfer operations are isothermal. In the gen-
iii) the solution of simple NLP subproblems
eral case these can be followed (or caused) by
within a decomposition-based MINLP solu-
tion method. heat transfer, as in distillation. Assuming constant
counter-current molar flows, M.J. Bagajewicz and
Mass exchange networks have been introduced V. Manousiouthakis showed in [1] that distillation
as an end-of-pipe treatment alternative. However, columns can be handled as pure mass transfer op-
the extent of mass recovery and the corresponding erations and derived targets for energy consump-
cost are closely related to the reactive and mix- tion and separation of a 'key' component, employ-
ing operations in a process. A. Lakshmanan and ing the first and second thermodynamic laws in
L.T. Biegler [6] have suggested a MINLP model for (10) and (11), within an MINLP-based MHEN
the synthesis of optimal reactor networks, where sequential synthesis framework. The problem of
the thermodynamic feasibility of mass integration energy-induced separations has been addressed by
and its implications are taken simultaneously into M.M. El-Halwagi, B.K. Srinivas and R.F. Dunn in
account, applying the first and second thermody- [4], translating the energy-based separation tasks
namic laws for mass exchange, i.e. into simple energy-requiring operations (heating
• Total mass balance for the mass integrated and cooling tasks) and deriving targets for energy
streams (resulting process and available rich consumption and the corresponding mass recovery,
and lean streams); based on thermodynamic feasibility constraints.

368
MINLP: Outer approximation algorithm

Extending the concept of mass exchange to non- ous mass integration', Industr. Engin. Chem. Res. 35
isothermal mass transfer operations Papalexandri (1996), 4523-4536.
and Pistikopoulos introduced a mass/heat transfer
[7] PAPALEXANDRI, K.P., AND PISTIKOPOULOS, E.N.: 'A
multiperiod MINLP model for the synthesis of flexible
module [8], where mass is transferred between dif- heat and mass exchange networks', Computers Chem.
ferent phases or reacting species if that is thermo- Engin. 18 (1994), 1125-1139.
dynamically feasible, i.e. if that decreases the total IS] PAPALEXANDRI, K.P., AND PISTIKOPOULOS, E.N.: 'A
Gibbs free energy of the system. Mass and energy generalized modular representation framework for pro-
cess synthesis', AIChE J. 42 (1996), 1010-1032.
balances, taking into account possible reactions,
[9] PAPALEXANDRI, K.P., PISTIKOPOULOS, E.N., AND
and mass-transfer driving-force constraints based FLOUDAS, C.A.: 'Mass exchange networks for waste
on total Gibbs free energy are employed to model minimization: A simultaneous approach', Chem. En-
the mass/heat transfer module as an aggregate gin. Res. Des. 72 (1994), 279-294.
of differential mass and energy transfer phenom- [10] SEBASTIAN, P., NADEAU, J.P., AND PUIGGALI, J.R.:
ena. Considering a superstructure of mass/heat 'Designing dryers using heat and mass exchange net-
works: An application to conveyor belt dryers', Chem.
and heat exchange modules in a process and all
Engin. Res. Des. 74 (1996), 934-943.
possible interconnections between them, process [11] SRINIVAS, B.K., AND EL-HALWAGI, M.M.: 'Synthe-
synthesis tasks can be formulated as mass/heat sis of combined heat and reactive mass exchange net-
and heat exchange superstructure MINLP prob- works', Chem. Engin. Sci. 49 (1994), 2059-2074.
lems, where binary variables are employed to de- Katerina P. Papalexandri
note the existence of mass/heat and heat exchang- bp Upstream Technol.
ers. Then, process operations (conventional and/or U.K.
hybrid) and networks are derived as combinations E-mail address: papaloxk©bp, tom
of mass/heat and heat exchange phenomena [8],
MSC2000: 93A30, 93B50
Key words and phrases: MINLP, mass and heat exchange,
See also: M I N L P : H e a t e x c h a n g e r n e t - separation.
w o r k s y n t h e s i s ; G l o b a l o p t i m i z a t i o n of h e a t
e x c h a n g e r n e t w o r k s ; M i x e d i n t e g e r lin-
ear programming: Heat exchanger network
MINLP: O U T E R APPROXIMATION AL-
synthesis; M i x e d i n t e g e r l i n e a r p r o g r a m -
GORITHM
ming: Mass and heat exchanger networks;
The outer approximation algorithm (OA algo-
MINLP: Global optimization with aBB.
rithm) ([1], [2], [9]) addresses mixed integer non-
linear programs of the form:
References rain Z- f(x,y)
[1] BAGAJEWICZ, M.J., AND MANOUSIOUTHAKIS, V.:
'Mass/heat exchange network representation of distil- (P) s.t. gj(x,y) <_ 0, j C J,
lation networks', AIChE J. 38 (1992), 1769-1800. x6X, y~Y,
[2] EL-HALWAGI, M.M., AND MANOUSIOUTHAKIS, V.: 'Si-
multaneous synthesis of mass-exchange and regenera- where f(.), g(.) are convex, differentiable func-
tion networks', AIChE J. 36 (1990), 1209-1219. tions, J is the index set of inequalities, and x
[3] EL-HALWAGI, M.M., AND SmNIVAS, B.K.: 'Synthesis and y are the continuous and discrete variables,
of reactive mass exchange networks', Chem. Engin. Sci.
respectively. The set X is commonly assumed to
47 (1992), 2113-2119.
[4] EL-HALWAGI, M.M., SRINIVAS, B.K., AND DUNN, be a convex compact set, e.g. X = {x: x C
R.F.: 'Synthesis of optimal heat-induced separation R n, Dx <_ d, x L <_ x <_ xU}; the discrete set Y
networks', Chem. Engin. Sci. 50 (1995), 81-97. corresponds to a polyhedral set of integer points,
[5] GUPTA, A., AND MANOUSIOUTHAKIS,V.: 'Minimum Y = {y: y E Z m, Ay <_ a}, and in most cases is
utility cost of mass exchange networks with variable restricted to 0-1 values, y E {0, 1} m. In most ap-
single component supplies and targets', lndustr. En-
gin. Chem. Res. 32 (1993), 1937-1950. plications of interest the objective and constraint
[6] LAKSHMANAN, A., AND BIEGLER, L.T.: 'Synthesis functions f(.), g(-) are linear in y (e.g. fixed cost
of optimal chemical reactor networks with simultane- charges and logic constraints).

369
MINLP: Outer approximation algorithm

The OA algorithm is based on the following the- PROPERTY 2 The solution of problem (RM-OA),
orem [1]: corresponds to a lower bound to the solution of
THEOREM 1 Problem (P) and the following problem (P). [:]
mixed-integer linear program (MILP) master Note that since function linearizations are ac-
problem (M-OA) have the same optimal solution cumulated as iterations proceed, the master prob-
(x*, y*), lems (RM-OA) yield a nondecreasing sequence of
(M-OA) min ZL -- lower bounds, Z~ < . . . < Z K, since linearizations
such that are accumulated as iterations k proceed.
The OA algorithm as proposed by M.A. Duran
a >_ f ( x k yk) + V f ( x k yk) _ yk , and I.E. Grossmann [1] consists of performing a
cycle of major iterations, k = 1 , . . . , K, in which
gj(x k yk) + Vgj(x k yk) _ yk ~_ 0, (NLP1) is solved for the corresponding yk and
the relaxed MILP master problem (RM-OA) is up-
jEJ, kEK*, dated and solved with the corresponding function
xEX, yEY, linearizations at the point (x k, yk). The (NLP1)
subproblems yield an upper bound that is used to
where
define the best current solution, UB g - min(Zkv).
(x k, yk) is the optimal The cycle of iterations is continued until this upper
K*- / k: solution to (NLP1) / ' bound and the lower bound of the relaxed master
for all feasible yk E Y problem, are within a specified tolerance.
min Z~: - f(x, yk) It should be noted that for the case when the
problem (NLP1) has no feasible solution, there are
(NLP1) s.t. gj(x,y k) ~ O, j E J,
two major ways to handle this problem. The more
xEX, general option is to consider the solution of the
where Zkv is an upper bound to the optimum of feasibility problem,
problem (P). [::]
min u
Note that since the functions f(x, y) and g(x, y) (NLFP) s.t. gj(x, yk) ~_ u, j E J,
are convex, the linearizations in (M-OA) corre- x E X, u E R 1.
spond to outer approximations of the nonlinear
feasible region in problem (P). Also, since the mas- R. Fletcher and S. Leyffer [2] have shown that
ter problem (M-OA) requires the solution of all for infeasible NLP subproblems, if the linearization
feasible discrete variables yk, the following MILP at the solution of problem (NLFP) is included, this
relaxation is considered, assuming that the solu- will guarantee convergence to the optimal solution.
tion of K NLP subproblems is available: For the case when the discrete set Y is given
(RM-OA) m i n Z g -- a by 0-1 values in problem (P), the other option to
ensure convergence of the OA algorithm without
such that
solving the feasibility subproblems (NLFP), is to
o~ > f (x k yk) + V f (xk yk) ( x - x k ) introduce the following integer cut whose objective
-- , , y_ yk ,
is to make infeasible the choice of the previous 0-1
values generated at the K previous iterations [1]:
gj(x k yk) + Vgj(x k yk) _ yk ~_ 0,

jEJ, k = 1,...,K, (ICUT) iEBk iEN k

xEX, yEY. k- 1,...,K,

Given the assumption on convexity of the func- where B k - {i" yki -- 1}, g k - {i" yki -- 0},
tions ](x,y) and g(x,y), the following property k = 1 , , . . . , K. This cut becomes very weak as the
can be easily be established, dimensionality of the 0-1 variables increases. How-

370
MINLP: Outer approximation algorithm

ever, it has the useful feature of ensuring that new (RM-GBD) m i n Z K - c~


0-1 values are generated at each major iteration.
such that
In this way the algorithm will not return to a pre-
vious integer point when convergence is achieved. c~ > f (x k, yk) + Vu f (x k, yk) 7-(y _ yk)
Using the above integer cut the termination takes T + Vg(x _
place as soon as ZK _> UBK.
The OA method generally requires relatively k c KFS,
few cycles or major iterations. One reason for this Vg(x
behavior is given by the following property:
k E KIS,
PROPERTY 3 The OA algorithm trivially con-
x c X, c~ c R 1,
verges in one iteration if f(x, y) and g(x, y) are
linear. [--] where K F S is the set of feasible subproblems
(NLP1) and K I S the set of infeasible subprob-
The proof simply follows from the fact that if
lems whose solution is given by (NLFP). Also
f(x, y) and g(x, y) are linear in x and y the MILP
IKFS C K I S I = K. The following property, holds
master problem (RM-OA) is identical to the orig-
between the two methods [1]:
inal problem (P).
It is also important to note that the MILP mas- PROPERTY 4 Given the same set of K subprob-
ter problem need not be solved to optimality. In lems, the lower bounds predicted by the relaxed
fact given the upper bound UB g and a tolerance master problem (RM-OA) are greater or equal to
it is sufficient to generate the n e w (yg XK) by the ones predicted by the relaxed master problem
solving, (RM-GBD). D

(M-OAF) m i n Z g -- 0c~ The above proof follows from the fact that the
Lagrangian and feasibility cuts in (RM-GBD) are
such that
surrogates of the outer approximations in the mas-
~>_ U B k - e , ter problem (M-OA). Given the fact that the
lower bounds of GBD are generally weaker, this
, , (;_ , method commonly requires a larger number of cy-
cles or major iterations. As the number of 0-1
gj(x k yk) + Vgj(x k yk) (;__ yk <_0, variables increases this difference becomes more
pronounced. This is to be expected since only
jEJ, k=l,...,K, one new cut is generated per iteration. Therefore
xCX, yEY. user-supplied constraints must often be added to
the master problem to strengthen the bounds. As
While in (M-OA) the interpretation of the new for the OA algorithm, the trade-off is that while
point yK is that it represents the best integer so- it generally predicts stronger lower bounds than
lution to the approximating master problem, in GBD, the computational cost for solving the mas-
(M-OAF) it represents an integer solution whose ter problem (M-OA) is greater since the number
lower bounding objective does not exceed the cur- of constraints added per iteration is equal to the
rent upper bound UBK; in other words it is a fea- number of nonlinear constraints plus the nonlinear
sible solution to (M-OA) with an objective below objective.
the current estimate. Note that in this case the The OA algorithm is also closely related to the
OA iterations are terminated when (M-OAF) is extended cutting plane (ECP) method by T. West-
infeasible. erlund and F. Peterssen [8]. The main difference
Another interesting point about the OA algo- lies that in the ECP method no NLP subprob-
rithm is the relationship of its master problem with lem is solved, and that linerization simply takes
the one of the generalized Benders decomposition place over the predicted continuous points from
method [3], which is given by: the MILP master problem, which in turn will nor-

371
MINLP: Outer approximation algorithm

mally only include linearizations of the most vio- EXAMPLE 5 In order to illustrate the performance
lated constraints. of the OA algorithm, a simple numerical MINLP
Extension of the OA algorithm [4] include the example is considered.
L P / N L P based branch and bound [6], which
min Z- Yl + 1.5y2 + 0.5y3
avoids t h e complete solution of the MILP mas-
+
ter problem (M-•A) at each major iteration. The
method starts by solving an initial NLP sub- s.t. ( x l - 2) 2 - x 2 _< 0
problem which is linearized as in (M-•A). The x l - 2yl _> 0
basic idea consists then of performing an LP-
xl - x2 - 4 ( 1 - Y2) _< 0
based branch and bound method for (M-•A) in
(MIP-EX) Xl-(1-yi) ~0
which NLP subproblems (NLP1) are solved at
those nodes in which feasible integer solutions are x 2 - - Y 2 ~_0
found. By updating the representation of the mas- Xl + x2 __ 3y3
ter problem in the current open nodes of the tree Yl + Y2 + Y3 _~ 1
with the addition of the corresponding lineariza- 0 < _ x i ~_4, 0__x2__4
tions, the need of restarting the tree search is
Yl, Y2, Y3 -- 0, 1.
avoided. Another important extension has been
the method by Fletcher and Leyffer [2] who in-
cluded a quadratic approximation based on the Objective function
Hessian of the Lagrangian to the master problem . m

(M-OAF) in order to capture nonlinearities in the 10


0-1 variables. Note that in this case the optimal
solution of the mixed integer quadratic program 5
(MIQP), Z K, does not predict valid lower bounds
in this case, and hence the constraint c~ _< U B K - 0 $•@ • •
is added, with which the search is terminated when i

m
,~..........~"
no feasible solution can be found in the MIQP mas- -5 m

ter. m

-10 m

Finally, in order to handle equations in prob- ,.,,,

lem (P), G.R. Kocis and Grossmann [5] proposed


-15 m

the equality relaxation strategy, in which lineariza- m

tions of equations are converted into inequalities


-20
m

m
l
for the MIP master problem according to the sign m

of the Lagrange multipliers of the corresponding -~ I I I


NLP subproblem. J. Viswanathan and Grossmann 1 2 3 Iterations 4
[7], further proposed to add slack variables to this
MILP master problem, and an augmented penalty Fig. 1: Progress of iterations of OA and GBD for MINLP
function. Since in this generally nonconvex case in MIP-EX.
the bounding properties do not apply, the algo- The optimum solution to this problem corresponds
rithm was modified so as to start with the NLP to Yl = 0, Y2 = 1, Y3 ---- 0, X l = 1, X 2 - - - 1, Z = 3.5.
relaxation of problem (P). If no integer solution is Fig. 1 shows the progress of the iterations of the
found, iterations between the MILP and NLP sub- OA and GBD algorithm with the starting point
problems take place until there is no improvement Yl = Y2 = Y3 = 1. As can be seen the lower bounds
in the objective function. This idea was precisely predicted by the OA algorithm are considerably
implemented in the commercial code DICOPT, stronger than the ones predicted by GBD. In par-
which can also be modified to the original OA algo- ticular at iteration 1, the lower bound of OA is
rithm, if the user knows that the functions f(x, y) 1.0 while the one of GBD is -23.5. Nevertheless,
and g(x, y) are convex. since this is a very small problem GBD requires

372
MINLP: Reactive distillation column synthesis

only one more iteration t h a n OA (4 versus 3). It is Control, Vol. 93 of IMA Vol. Math. Appl., Springer,
interesting to note that the NLP relaxation of this 1997, pp. 73-100.
[5] KocIs, G.R., AND GROSSMANN, I.E.: 'Relaxation
problem is 2.53, which is significantly lower than
strategy for the structural optimization of process
the optimal mixed integer solution. Also, as can be flow sheets', Industr. Engin. Chem. Res. 26, no. 1869
seen in Table 1, an NLP-based branch and bound (1987).
method requires the solution of 5 NLP subprob- [6] QUESADA, I., AND GROSSMANN, I.E.: 'An LP/NLP
lems, while the E C P method requires 5 successive based branch and bound algorithm for convex MINLP
MILP problems. optimization problems', Computers Chem. Engin. 16
(1992), 937-947.
Method Subproblems Master LPs [7] VISWANATHAN, J., AND GROSSMANN, I.E.: 'A com-
problems solved bined penalty function and outer-approximation
BB 5 (NLP1) method for MINLP optimization', Computers Chem.
OA 3 (NLP2) 3 (M-MIP) 19 LPs Engin. 14 (1990), 769.
GBD 4 (NLP2) 4 (M-GBD) 10 LPs [8] WESTERLUND, T., AND PETTERSSON, F.: 'A cutting
ECP -- 5 (M-MIP) 18 LPs plane method for solving convex MINLP problems',
Table 1: Summary of computational results. Computers Chem. Engin. 19 (1995), $131-$136.
[9] YUAN, X., ZHANG, S., PIBOLEAU, L., AND
D DOMENECH, S.: 'Une methode d'optimisation non-
lineare en variables pour la conception de procedes',
See also" C h e m i c a l process planning;
Oper. Res. 22, no. 331 (1988).
Mixed integer linear programming: Mass
a n d h e a t e x c h a n g e r n e t w o r k s ; M i x e d in- Ignacio E. Grossmann
teger nonlinear programming; Generalized Carnegie Mellon Univ.
Pittsburgh, PA, USA
Benders decomposition; Generalized outer
E-mail address: grossmann@cmu, edu
a p p r o x i m a t i o n ; M I N L P : G e n e r a l i z e d cross
d e c o m p o s i t i o n ; E x t e n d e d c u t t i n g p l a n e al- MSC 2000: 90Cll
gorithm; MINLP: Logic-based methods; Key words and phrases: mixed integer nonlinear program-
ming, outer approximation method, generalized Benders de-
MINLP: Branch and bound methods;
composition, extended cutting plane method.
MINLP: B r a n c h a n d b o u n d g l o b a l o p t i m i -
zation algorithm; MINLP: Global optimi-
zation with aBB; MINLP: Heat exchanger
n e t w o r k s y n t h e s i s ; M I N L P : R e a c t i v e dis- MINLP: REACTIVE DISTILLATION COL-
tillation column synthesis; MINLP: Design UMN SYNTHESIS
•a n d s c h e d u l i n g o f b a t c h p r o c e s s e s ; M I N L P : Reactive distillation (RD) occurs when a reaction
A p p l i c a t i o n s in t h e i n t e r a c t i o n of d e s i g n takes place in the liquid holdup on the trays, in
a n d c o n t r o l ; M I N L P : A p p l i c a t i o n in f a c i l i t y the reboiler, or in the condenser of a distillation
l o c a t i o n - a l l o c a t i o n ; M I N L P : A p p l i c a t i o n s in column. Reactive distillation can increase the con-
blending and pooling problems. version of equilibrium limited reactions by con-
tinuously separating products and reactants, im-
References prove the selectivity in some kinetically limited
[1] DURAN, M.A., AND GROSSMANN, I.E.: 'An outer- reaction systems, and separate azeotropic and iso-
approximation algorithm for a class of mixed-integer
meric mixtures by converting one species into an-
nonlinear programs', Math. Program. 36 (1986), 307.
[2] FLETCHER, R., AND LEYFFER, S.: 'Solving mixed inte- other that is easy to remove. It can also create a
ger nonlinear programs by outer approximation', Math. natural heat integration that uses an exothermic
Program. 66 (1994), 327. heat of reaction to create vapor boilup in a dis-
[3] GEOFFRION, A.M.: 'Generalized Benders decomposi- tillation column, and reduce capital costs by com-
tion', J. Optim. Th. Appl. 10, no. 4 (1972), 237-260. pleting several processing steps in a single vessel.
[4] GROSsMANN, I.E., AND KRAVANJA, Z.: 'Mixed-integer
Reactive distillation is used commercially to pro-
nonlinear programming: A survey of algorithms and
applications', in L.T. BIEGLER, T.F. COLEMAN, A.R. duce methyl tert-butyl ether [13], esters including
CONN, AND F.N. SANTOSA(eds.): Large-Scale Optimi- methyl acetate [1], and nylon 6,6 [9]. It has also
zation with Applications. Part II: Optimal Design and been proposed for hydrolysis reactions [7], ethy-

373
MINLP: Reactive distillation column synthesis

lene glycol synthesis [11], and cumene production Given:


[12]. See [7] for a review of the area. • the chemical species, i -- 1 , . . . , I, involved in
As a result of increasing interest in the reac- the distillation; desired products, i C P, and
tive distillation technique, systematic reactive dis- their production rates PI;
tillation design methods have gained much im- • the set of chemical reactions, j = 1 , . . . , J;
portance. See [2], [3], [4], [5] for residue curve
• rate expressions rj o r an equilibrium constant
maps, a powerful tool for visualizing distillation
Kj for each reaction j;
problems, to reactive distillation. In [7] this work
was extended by including kinetic effects when the • heat of vaporization and vapor-liquid equilib-
Damkohler number is fixed. In [14] synthesis of re- rium data;
active distillation with multiple reactions is stud- • cost of downstream separations;
ied. • cost c, and composition xi, of all feedstocks,
Reactive distillation poses a challenging prob- s=l...S;
lem for optimization based design techniques. Un- • the cost of the column as a function of the
like in conventional distillation, holdup volume is number of trays and the internal vapor flow
an important design variable in reactive distilla- rate, C(V, N);
tion, since the reaction generally takes place in the • the form of the catalyst.
liquid body on the tray. The constant molar over-
Determine:
flow assumption of conventional distillation design
is not valid unless the reaction has thermal neu- • the optimum number of trays;
trality and is stoichiometrically balanced. For an • the trays where reactions take place;
optimal solution one should take into account that • the holdup on each tray where a kinetically
the feed to the column may be distributed. This, limited reaction takes place;
in addition to the holdup volume, liquid and va-
• the reflux ratio;
por flows, composition and temperature profiles,
• the condenser and reboiler duties; and
number of trays and feed location(s) become ma-
jor variables of an optimization problem which . the feed location(s).
searches for a minimum of a cost function. The Such that the total cost is minimized while pro-
constraints of this optimization problem are ma- ducing the correct amount of product.
terial and energy balances, vapor-liquid equilib-
ria, mole fraction summations, kinetic and ther- Distillation Based Superstructure Ap-
modynamic relationships, and logical relationships proaches. One approach to MINLP based re-
between the variables. The resulting optimization active distillation column design uses a super-
model is a mixed integer nonlinear programming structure that contains many different alternative
problem since it involves the optimum number of designs embedded within it. Two different super-
trays and feed tray locations which are integer structures have been proposed; they differ in their
variables. The cost function and the material and treatment of the liquid reflux and vapor boilup,
energy balances cause the nonlinearity of the prob- and in their heat management. See [6] for a struc-
lem. ture that varies the number of trays and always
There are two approaches to RD design via recycles the liquid reflux to the top tray and the
MINLP methods. One addresses reactive distilla- vapor boilup to the bottom tray (Fig. 1). More
tion through heat and mass exchanger networks recently (1997), Z.H. Gumus and A.R. Ciric [8]
[10], and the other addresses it through distilla- modified the superstructure presented in [15] re-
tion column superstructures [6], [8]. cycling vapor boilup and liquid reflux to each tray
by adding a decanter to the distillate stream and
Problem Statement. The general problem of the side heaters and coolers to each tray (Fig. 2). In
reactive distillation column synthesis problem can both of these superstructures, the number of trays
be stated formally as follows. may vary between 1 and some upper bound K.

374
MINLP: Reactive distillation column synthesis

Each feed stream is split, and a portion is sent 0.802


to each tray in the superstructure. In kinetically
limited reactions, the hold-up volume may vary, k kl<k
and, in reactions systems with a solid catalyst, X(Yk - Yk+l)
some trays will have reaction while others do not.
subject to

E xisFsl - L l x i l (1 - fl) (1)


r
8

+L2xi,2 - V1Kil xil + E uij~lj - O,


J

[ ~ s xisFsk+Vk-lKi,k-lXik-1 (2)

+L~+lXi,k+l - L~xik - VkKi~xik


"1

Fig. 1: Superstructure for optimum feed location(s) and + ~j~k I Yk - o,


number of trays [6]. 3 J
k = 2,...,K,

(3)

D ~ - ~ ( Y ~ - ik+~)(Yk - Y~+l), (4)


ff]
k
w

Bi = (1 - f l ) L l X i l , (5)
B~ =P~, i ~ P, (6)

1] o, (7)
Fig. 2. Tray-by-tray superstructure of [8].

The structure shown in Fig. 1 is appropriate


for reactive distillation processes with a single liq-
11 o, (8)
uid phase and kinetically limited reactions that xdi - KikXik -- 1 + Yk -- Yk+~ <_ O, (9)
are catalyzed with a solid catalyst. Representing
Xi,k+l -- xdi - 1 + yk - yk+l ___O, (10)
the existence of each tray with an integer variable
Yk leads to a mixed integer nonlinear program- E xdi - 1, (11)
ming problem whose solution extracts a design i
with the number of trays, feed tray locations, reac- ~jk - Wklj(~k,Tk), (12)
tive trays, holdup volumes, reflux ratio and boilup Kik = Kik (Xik, Tk), (13)
ratio that minimize the total cost. Assumed vapor (14)
gk - Fm~xYk _<0,
liquid equilibrium on each tray, no reaction in the
vapor phase, homogeneous liquid phase, negligible ~Gk -Fm~xYk <_0, (15)
8
enthalpy of liquid streams, constant heat of vapor-
ization leads to the MINLP shown below [6]: L~+I - Fm~xYk <_ O,
W~ - Wm~xYk <_ 0, (17)
m i n Z - Co + ~ csFsk + CRQB + c c Q c
sk QB = flAL1, (18)

+ c T D 1"55 x E 2Yk + Qc - ~ ,~Vk(Yk -- Yt:+I), (19)


k

375
MINLP: Reactive distillation column synthesis

D 4 >_ CD~2L~, (20)


©
D _> Dmin, (21) 26 3 H20. X 05 I-~ l
26.3lifo
0.2 EO

Yk+l <_Yk. (22) 4.')7~ 0


4.69 EO
~ R~_-acfi o n
70nc
Reaction
zone
4.76 EO
4.89 EO

In this model, constraints (1) and (2) are the Distillatioa


zone
component balances of species i over the bottom
~a[iofl 0.04 t!:(.)
tray and the remaining trays k; constraint (3) is
the energy balance around tray k. The distillate 2.~EO 25EO
[.27DEG 1.27DEG
flow is found with constraint (4). Distillate flow
is calculated as the difference between the vapor Fig. 3" Optimal distributed and two-feed columns for
flow leaving the top tray and the liquid flow en- ethylene glycol production.
tering it. Note that the term Y k - Yk+l will be Reaction Rate AH
nonzero only for top tray, and zero for all others. (mol/cmS.s) (kJ/mol)
Constraint (5) calculates the bottoms flow rate 1 3.15 x 109 exp r]_v,5471
T(K) J XEOXH20 --80
L.
and constraint (6) specifies the production rate. 2 6.3× 109exp[ -°'547]
T(K) XEOXEG -13.1
Summation equations for the mole fractions are Component K for P - latm
given in constraints (7) and (8). Constraints (9)-
(11) identify the top tray and set the distillate and H20 221.2 exp {6.31 [ TT--5~.~]}
liquid reflux composition equal to the composition
EG 77 exp {9.94 [TT~,~4.~]}
of the vapor leaving the top tray. Reaction rates
are given in constraint (12), and the vapor liquid
47 exp 10 42 TT-s63~ { [ ]}
EO feedstock: $43.7/kmol
equilibrium constant is found by constraint (13).
Water feedstock: $21.9/kmol
Constraints (14)-(17) ensure that when Yk equals Downstream separation: $0.15/kmol H20
zero and tray k does not exist, the flows onto and in effluent
off of the tray are zero. Constraints (18) and (19) Csh = $222/yr
calculate the reboiler and condenser duties, while CT = $15.7/yr
constraints (20) and (21) find the column diam- Cn = $146.8/kW.yr
Cc = $24.5/kW.yr
eter. The last constraint ensures that tray k + 1
Co = $10,000/yr
does not exist if tray k does not exist.
In [6] this technique is demonstrated with the Table 1" Ethylene glycol system; reaction, physical
synthesis of a reactive distillation column that property and cost data.
makes ethylene glycol from ethylene oxide and wa- The problem is solved using the reaction, phys-
ter. The main reaction is ical property and cost data given in Table 1. The
production rate is taken as 25 kg.mol/h of ethylene
C2H40 -{- H 2 0 ---4 C2H602. glycol. When the problem is solved without spec-
ifying the number of feed trays or their locations
Further reaction of ethylene glycol gives the un- the solution obtained using GAMS is a 10-tray dis-
desired byproduct diethylene glycol: tillation column with a total annualized cost of
15.69 × 106/yr. The reaction zone is above tray
C 2 H 4 0 + C2H602 --4 C4H1003 4 and the feed is distributed to each tray in the
reaction zone. When the problem is slightly modi-
Ethylene glycol is produced using reactive dis- fied by adding constraints on the feed tray number,
tillation because the large volatility difference be- the solution changes to a 10-tray column with a to-
tween the product and the reactants allows the tal annualized cost of 15.73 × 106/yr. The reaction
continuous removal of EG from the reaction zone zone is between trays 4 and 10 and water is fed to
and absorption of the heat of reaction by the sep- tray 10 while ethylene glycol enters the column at
aration results in cost cuts. tray 4. The selectivity reached by both columns is

376
MINLP: Reactive distillation column synthesis

Feed type Diam. (m) Height (m) Boilup ratio Reboiler duty (MW) Condenser duty (MW)
Distr. 1.3 12 0.958 6.7 7.31
Two-feed 1.3 12 0.96 6.9 7.5

Table 2. Column specifications for ethylene glycol production.

the same. Fig. 3 shows the solutions. The column temperature approach is the driving force for heat
specifications are given in Table 2. transfer. Concentration and temperature approach
constraints are considered at each end of the ex-
Heat and Mass E x c h a n g e Networks. In this changer. Equilibrium can be represented by a zero
approach, process units are defined as combina- concentration approach, which means no driving
tions of heat and mass exchanger blocks, and the force for mass transfer.
alternatives for the synthesis are explored simul- Product
f V(ABC)D- Water1
taneously in a superstructure. A reactive distilla-
$ I
tion column can be described as a combination of I L(ABC)D - V(ABC)D ]
mass/heat exchanger units with a condenser and Feed
a reboiler [10]. Heat and mass transfer takes place
between the contacting vapor and liquid phases
[ 1
and from reactants to products. Multiple feeds and
products and side heating and cooling tasks can be I LAsc o,-v c , ]
included in the description in the form of multiple I t, I Steam-VABC(D)1
mass and heat exchanger blocks between liquid
and vapor streams. Its phase and quality define
J Producl

each stream. The quality indicator describes the


Fig. 4" Mass/heat exchange network representation of a
leanness or richness of a stream in different com-
multifeed reactive distillation column.
ponents. Heat and mass transfer occurs between
vapor and liquid streams of the same quality or
In the synthesis framework for an optimal pro-
between liquid and liquid (reactant and product)
cess network, one should start with the construc-
streams. For example, consider the reaction
tion of the stream sets containing all the initial,
A+B-+C+D. intermediate, and final process streams. The key
is the availability of the physical and chemical
Then there are liquid and vapor streams property information on the streams. When the
LABCD and VABCD in general notation. The information is not enough to identify the individ-
streams lean in a component, for example in A, ual streams, especially the intermediate streams,
have that letter in parentheses, e.g. L(A)BCD a general set of one vapor and one liquid stream
or V(A)BCD. All possibilities of such streams, is constructed, which contain all components in-
i.e. LAB(CD), VAB(CD), L(ABC)D, V(ABC)D, volved in the process. The second step is to list all
L(AB)C(D), V(AB)C(D), etc., and all the possible the possible stream matches. Engineering knowl-
matches between them are considered within the edge plays an important role in this step. One
structure. The possible matches are liquid-vapor should be careful about not listing redundant
matches of the same stream and all liquid-liquid or meaningless stream matches since these will
matches. only make the problem more complex. Knowledge
This model describes exchangers with simple about the system is the key in this screening stage.
mass and energy balances and constraints defin- Developing the mass/heat exchange network su-
ing phase and feasibility. Mass and heat generated perstructure is the next step in the framework. All
or consumed by chemical reactions are included possible interconnections between the stream split-
in the balances. Mass transfer is driven by a min- ters and mixers should be taken into consideration.
imum concentration approach while a minimum The last step is the optimization of the superstruc-

377
MINLP: Reactive distillation column synthesis

ture. Usually, the objective function of the optimi- timal reactive distillation column obtained is pic-
zation problem is a cost function. If the cost func- tured in Figure 6. The column has two reaction
tion includes only operating cost, which depends zones and multiple feeds, and the operating cost is
on the raw material and utility consumption, the 1.17 x 106 S/yr.
objective function can be easily formulated from
the superstructure. If, however, capital investment Pr°ductI [ V(ABC)D
- Water1
costs are involved in the objective cost function,
the formulation is not straightforward from the su- [ L(ABC)D-V(ABC)D I
A
perstructure, since process unit specifications are
not considered in the superstructure. In this case, r 1
capital cost is to be approximated using cost func-
tions that take operating conditions into account.
Separation difficulty can be used in evaluating the
capital cost of a distillation tray. t
J
V(ABC)D-,Watc~

I
I I
Fig. 6. Optimal reactive distillation column for ethylene
glycol production.
Feed
I L!c,o-vAB,OD 1
Conclusions. This paper discussed the MINLP
applications in reactive distillation design prob-
I t lems. Two main approaches are studied: distil-
f Steam-VABCCD)
]
lation based superstructure approach that uses
J rigorous tray-by-tray method to model reactive
distillation, and heat and mass exchanger net-
Fig. 5. Steps of the synthesis framework. work superstructure approach that realizes reac-
tive distillation processes as combinations of sev-
K.P. Papalexandri and E.N. Pistikopoulos [10] eral mass/heat exchangers with a condenser and
used the production of ethylene glycol from ethy- a reboiler. Examples are included to demonstrate
lene oxide and water to demonstrate this approach. the approaches.
The reactions involved in this production were See also: C h e m i c a l p r o c e s s planning;
given before. Physical properties, cost and reac- M i x e d i n t e g e r linear p r o g r a m m i n g : M a s s
tion data are the same as given earlier in Table a n d h e a t e x c h a n g e r n e t w o r k s ; M i x e d integer
1. The difference from the example problem stud- n o n l i n e a r p r o g r a m m i n g ; M I N L P : O u t e r ap-
ied in [6] is the objective, which is the minimiza- proximation algorithm; Generalized outer
tion of operating cost only. The set of streams in- a p p r o x i m a t i o n ; M I N L P : G e n e r a l i z e d cross
clude the intermediate streams L{EO, H20, EG, d e c o m p o s i t i o n ; E x t e n d e d c u t t i n g plane al-
DEG} and V{EO, H20, EG, DEG} and the prod- gorithm; MINLP: Logic-based methods;
uct streams L(EG) and L(DEG). Five liquid-liquid MINLP: Branch and bound methods;
mass/heat exchange matches and 15 liquid-vapor M I N L P : B r a n c h a n d b o u n d global o p t i m i -
mass/heat exchange matches are considered. Rep- z a t i o n a l g o r i t h m ; M I N L P : Global o p t i m i -
resenting each match with a binary variable, and zation with aBB; M I N L P : Heat exchanger
considering all possible interactions between units, n e t w o r k synthesis; G e n e r a l i z e d B e n d e r s de-
the problem is formulated as a mixed integer non- c o m p o s i t i o n ; M I N L P : D e s i g n a n d sched-
linear programming problem with the objective of uling of b a t c h processes; M I N L P : Appli-
minimizing operating cost, which includes raw ma- cations in t h e i n t e r a c t i o n of design a n d
terial cost, purification, and utility cost. The op- control; M I N L P : A p p l i c a t i o n in facility

378
MINLP: Trim-loss problem

location-allocation; MINLP: Applications in Cincinnati OH 45221, USA


blending and pooling problems. Zeynep Gumus
Dept. Chemical Engin. Univ. Cincinnati
Cincinnati OH 45221, USA
References
[1] AGREDA, V.H., PARTIN, L.R., AND HEISE, W.H.: MSC 2000:90C90
'High purity methyl acetate via reactive distillation', Key words and phrases: reactive distillation, bilevel pro-
Chem. Engin. Prog. 86, no. 2 (1990), 40-46. gramming, MINLP.
[2] BARBOSA, D., AND DOHERTY, M.F.: 'Design and min-
imum reflux calculations for double-feed multicompo-
nent reactive distillation columns', Chem. Engin. Sci.
43 (1988), 2377. MINLP: TRIM-LOSS PROBLEM
[3] BARBOSA, D., AND DOHERTY, M.F.: 'Design and min- T h e trim-loss p r o b l e m is one of the most demand-
imum reflux calculations for single-feed multicompo-
ing optimization problems in the paper-converting
nent reactive distillation columns', Chem. Engin. Sci.
industry. It a p p e a r s when an order specified by a
43 (1988), 1523.
[4] BARBOSA, D., AND DOHERTY, M.F.: 'The influence of customer is to be satisfied by cutting out a set of
equilibrium chemical reactions on vapor-liquid phase p r o d u c t reels from a wider raw paper reel.
diagrams', Chem. Engin. Sci. 43 (1988), 529. T h e p r o d u c t s in the order are characterized by
[5] BARBOSA, D., AND DOHERTY, M.F.: 'The simple dis-
w i d t h and quality. In a paper-converting mill the
tillation of homogeneous reactive mixtures', Chem. En-
raw p a p e r can be printed, coated and cut. In a typ-
gin. Sci. 43 (1988), 541.
[6] CmIc, A.R., AND Gu, D.: 'A mixed integer nonlin- ical paper-converting mill, there may be hundreds
ear programming approach to nonequilibrium reactive of different p r o d u c t s to be produced. W h e n consid-
distillation synthesis', AIChE J. 40, no. 9 (1994), 1479. ering the trim-loss problem, w i d t h is the most im-
[7] DOHERTY, M.F., AND BUZAD, G.: 'Reactive distilla- p o r t a n t p r o p e r t y while the main problem is to de-
tion by design', Trans. Inst. Chem. Engin. 70 (1992),
termine such cutting p a t t e r n s t h a t minimize waste
448-458.
[8] GUMUS, Z.H., AND CIRIC, A.R.: 'Reactive distilla- production, the trim loss.
tion column design with vapor/liquid/liquid equilib-
ria', Computers Chem. Engin. 21, no. Suppl. (1997),
$983-988.
[9] JACOBS, D.B., AND ZIMMERMANN, J.: 'Chap. 12', in
C.E. SCHmDKNECHT (ed.): Polymerization Processes, , -,,, m Loss
Wiley, 1977.
[10] PAPALEXANDRI,K.P., AND PmTIKOPOULOS, E.N.: 'A
generalized modular representation framework for pro-
cess synthesis based on mass/heat transfer principles',
]
AIChE J. 42, no. 4 (1996), 1010.
[11] PARKER, A.S.: 'Preparation of alkylene glycol', U.S.
Patent 2,839,588 (1958). Fig. 1: The cutting procedure.
[12] SHOEMAKER, J.D., AND JONES, E.M.: 'Cumene by
In the optimization problem, beyond the num-
catalytic distillation', Hydrocarbon Proc June (1987),
57-58. ber of cutting p a t t e r n s needed, the appearance of
[13] SMITH, L.A.: 'Method for the preparation of methyl each cutting p a t t e r n needs to be determined at the
tertiary butyl ether', U.S. Patent 4,978,807 (1984). same time as having to decide how m a n y times the
[14] UNG, S, AND DOHERTY, M.F.: 'Synthesis of reactive cutting p a t t e r n s ought to be repeated.
distillation systems with multiple equilibrium chemi-
The customer widths and the raw paper widths
cal reactions', Industr. Engin. Chem. Res. 34 (1995),
2555-2565. are often more or less i n d e p e n d e n t of each other.
[15] VISWANATHAN, J., AND GROSsMANN, I.E.: 'Optimal This makes it combinatorially very d e m a n d i n g to
feed locations and number of trays for distillation produce a cutting plan t h a t minimizes the trim
columns with multiple feeds', I-EC Res. 32 (1993), loss. Even if the trim-loss p r o b l e m is in its basic
2942-2949. form an integer problem, it has often been solved
Amy Ciric by linear p r o g r a m m i n g (LP) m e t h o d s [3] or some
Dept. Chemical Engin. Univ. Cincinnati heuristic algorithms [4]. A good survey of widely

379
MINLP: Trim-loss problem

used solution methods for trim-loss and assort- reel the problem can be simplified by omitting the
ment problems is given in [7]. raw paper length and assuming that the pattern
When using an LP-approach to solve an inte- lengths are equal.
ger problem the biggest difficulty is to convert the Besides the demand constraint, certain con-
continuous solution such that the integer variables straints are needed to keep the problem feasible.
obtain integer values. The rounding methods are Let the width of a product i be expressed by bi
heuristic [8] and often fail to give the optimal inte- and the width of the raw paper used for cutting
ger solution even though the solution may be fairly pattern j by ~j,max. The trim-loss width cannot
good. exceed, for instance, 200mm owing to the machin-
ery. This limit is represented by Aj. Furthermore,
P r o b l e m F o r m u l a t i o n . The trim-loss problem the maximum number of products that can be cut
is a bilinear nonconvex integer nonlinear program- out from a pattern often has a physical restric-
ming (INLP) problem. The appearance of a cut- tion. The outcoming product reels have to form
ting pattern needs to be determined by integer an angle big enough so that the reels do not at-
variables and the bilinearity comes from the de- tach together, yet with too big an angle between
mand constraints. the outermost reels the paper may be torn off. Let
A cutting pattern tells how many times a cer- this upper limit be Nj,max.
tain product is cut out from the raw paper. Let Besides the total number of patterns, the pat-
a cutting pattern have the index j and a prod- tern changes are also of interest when doing the
uct the index i. Assume a customer demand with optimization. This is due to the fact that the ma-
I different products and further assume that the chinery normally needs to be stopped for a knife
maximum allowed number of different cutting pat- change which causes a production stop. Let there-
terns is J. Further let m j be the number of times fore the variable yj be 1 if the cutting pattern j
a certain cutting pattern is repeated and nij be exists and 0 if not. The sum of yj variables then
the number of times a product i appears in cut- indicates how many different cutting patterns are
ting pattern j. If the demand of a product i is needed to satisfy the production and the sum of mj
expressed by ni,order, the demand constraints can indicates the total number of all patterns which are
be written as related to the running metres of the raw material.
J Now the basic formulation can be written in
ni,order -- E mj • nij ~_ 0, (1) mathematical form. The objective is to minimize
j=l the total number of patterns and the number of
i= 1,...,I, pattern changes.

mj, nij E Z +.

The negative bilinear terms make the problem


nonconvex. Both of the variables in the term are
min
mj 'no 'yJ
/J
E cj " m j + Cj . yj
j--1
/ (2)

integer variables and consequently the problem is subject to


a bilinear integer optimization problem. It is not
i
possible to replace one of the variables nij with
E bi" nij -- Bj,max ~ O, (3)
a continuous variable because this would violate i=1
the product specification. In theory it is possible I
to replace the m j with a continuous variable but -- E bi " nij + Bj,max - A j ~_ O, (4)
this may easily dissatisfy the desired product reel i=1
length and diameter requirements. Therefore, in I
the following study it is preferable to keep both E n i j - Nj,max ~_ O, (5)
m j and nij as integers. i=1
While raw paper reels of the same width are of-
yj - m j < o, (6)
ten glued together to form a continuous raw paper mj - Mj.yj < 0, (7)

380
MINLP: Trim-loss problem

j- 1,...,J, (2)-(8), all constraints but the last demand con-


J straint are linear. This means that the problem
ni,order -- E m j • nij ~ O~ (8) should be fairly well bounded already by the linear
j=l part of the problem and thus a linear formulation
i - 1,...,I, strategy seems to be fully possible.
However, this linear transformation requires
m j , nij E Z, yj E {0, 1}.
new variables and constraints that may compli-
The M j gives the upper bound for corresponding cate the problem. Using a standard approach, by
mj variables. When using an objective as in (2) rewriting one of the integer variables in the bilinear
the constraint (6) becomes irrelevant. The width term by binary variables, the following is obtained.
constraints are given in (3)-(4) and the constraint K
(5) restricts the number of cuts in a pattern. The mj - Z (9)
binary variables, yj, are defined in (6)-(7). k=l
The functionality of the variables are demon- m j e rt, e {0,1}.
strated in the following figure where the raw-paper K is the number of binary variables needed. By
width is Bj,max. Note that the pattern length may defining Lid to be the upper bound for respective
typically be e.g. 6500m. nij variables and introducing a new slack-variable
m, = 1 m2= 1 m~= 2
8ijk the following constraints will create a neces-
~'~ 1 3 3 3
sary link between the nij and 8ij k variables:
1 3 3 3

2 3 3
Sijk - nij <_ 0, (10)
4
--Sijk + n i j -- L i j " (1 - ~ j k ) ~_ O, (11)
n. = 2 n3~= 2 n3, = 3
n2, = 1 n,2= 1
Sijk -- Lij " ~jk <_ O. (12)
Fig. 2: The integer variables. Using the above constraints the bilinear demand
The last constraint, the demand constraint (8), constraint can be written in linear form
J K
is an integer bilinear constraint where both vari-
ables in bilinear terms are pure integers. This
ni,°rder- E E 2k-l'sijk ~ O. (13)
j = l k=l
makes the problem a nonconvex MINLP prob-
The mj could also be represented by special or-
lem where the nonconvexity appears in the integer
dered sets (SOS) where at most, one of the binary
variables.
variables are allowed to be nonzero.
There are very few methods available that are
K
capable of solving similar nonconvex MINLP prob-
mj - Z k . (14)
lems. Some heuristic methods such as simulated
k=l
annealing [9] may find the global optimal solu- K
tion within infinite time but algorithmic methods
have not been proven to converge with such types k=l
of problems. Only recently (1999) some advance- It should be noted that the usage of this kind of
ments have been reported in [1] and [11]. transformation may enlarge the integrality gap un-
However, it is fully possible to transform the less for instance the nij variables in equations (3)-
trim-loss problem into convex or linear form and (5) are replaced with corresponding variables Sijk.
use some established MINLP or MILP solver to The same transformation can be modified such
solve the resulting problem to global optimality. that nij is replaced by a binary representation and
Some linear transformations are presented in [6] mj is defined through the slack-variables 8ijk.
and methods to transform the nonconvex problem
into a convex form can be found in [10] and [5]. P a r a m e t e r i z a t i o n M e t h o d s . Beyond the linear
transformation, the problem can be written in lin-
L i n e a r T r a n s f o r m a t i o n s . As can be seen from ear form by simply parameterizing one of the vari-

381
MINLP: Trim-loss problem

ables in the bilinear term. This m e t h o d though ders. This creates an interesting problem, where
may lead to global optimality only in such cases the integer search space is reduced at the expense
where all the possible combinations have been con- of more complex nonlinear functions, which could,
sidered. This strategy may be good for smaller in principle, be used as benchmarks for the perfor-
problems but it may also generate far too many mance of MINLP algorithms.
integer variables in solving larger trim-loss prob- The basic principle for the convex transforma-
lems. tion is to first expand the bilinearity in the demand
It is quite easy to generate all the possible com- constraint
binations of nij variables satisfying the constraints mj. -- + + (16)
(3)-(5). This strategy results in a problem where
-- T " ( rnj q- nij ) -- T 2.
these constraints can be removed and where the
nij variables in the resulting linear d e m a n d con- In the following text, the translation constant T =
straint are parameters: 1 is used for simplicity. The second step is to sub-
I stitute the bilinear t e r m in the original demand
hi,order -- m j • nij < O, (15)
constraint
mjEZ.
J
The same type of parameterization strategy may ni,orde r -- ~ ( m j nt-- 1)(nij + 1) (17)
also be applied to the other variable mj but in j=l
this case it may be more difficult to define the ex- J
act values of the parameters. One strategy is to use + (mj + w ) + J < 0.
the upper bounds M j or define all the mj variables j=l

to be equal to one and make sure that a sufficient It should be noted t h a t the transformations that
amount of the variables m j a r e considered. follow need to consider the whole problem not only
Another alternative is to combine the param- individual functions, which makes the transforma-
eterization and transformation methods so that a tion techniques more demanding. A transforma-
proper amount of parameterized variables are com- tion of a single function may cause linear con-
bined with original variables. This strategy may be straints to become nonlinear if one is unaware of
very efficient but often requires such information this fact.
that may be difficult to obtain from a larger prob-
Exponential T r a n s f o r m a t i o n . The demand con-
lem without any knowledge of the solution.
straint is originally a negative bilinear constraint.
The exponential transformation can only be ap-
C o n v e x T r a n s f o r m a t i o n s . In the previous sec-
plied to a positive bilinear constraint. Therefore,
tions a number of methods were presented where
one of the variables in the bilinear term needs to
the nonconvex problem can be transformed or pa-
be substituted with its reversed value.
rameterized into linear form. The main drawback
for this linear transformation strategy is the large rij -- N j , m a x - nij (18)
number of extra constraints and continuous vari- and the d e m a n d constraint is modified to
ables. The parameterization strategy results in a J J
formulation with a few constraints but many extra ni,orde r -- ~ mj. N j , m a x -[- ~ mj.rij. (19)
integer variables. j=l j=l
In the following a number of convexification Now the exponential transformation can be ap-
methods are presented. Generally, the convex for- plied. The transformation is of the form
mulations need fewer extra constraints and contin-
uous variables as the linear strategies and no extra m j + 1 - e Mj , rij + 1 = e R~ (20)
integer variables as is the case with the parameter- and the variables are defined as
ization methods. Thus, the convex transformation Lj
could be expected to result in formulations which mj - ~~jl . l, (21)
are easier to solve especially for larger-scale or- /=1

382
MINLP: Trim-loss problem

Exponential Transformation. The demand constraint is originally a negative bilinear constraint. The exponential transformation can only be applied to a positive bilinear constraint. Therefore, one of the variables in the bilinear term needs to be substituted with its reversed value,

$r_{ij} = N_{j,\max} - n_{ij},$  (18)

and the demand constraint is modified to

$n_{i,\mathrm{order}} - \sum_{j=1}^{J} m_j \, N_{j,\max} + \sum_{j=1}^{J} m_j \, r_{ij} \le 0.$  (19)

Now the exponential transformation can be applied. The transformation is of the form

$m_j + 1 = e^{M_j}, \qquad r_{ij} + 1 = e^{R_{ij}},$  (20)

and the variables are defined as

$m_j = \sum_{l=1}^{L_j} \beta_{jl} \cdot l,$  (21)

$M_j = \sum_{l=1}^{L_j} \beta_{jl} \ln(l+1),$  (22)

$r_{ij} = \sum_{k=1}^{K_i} \beta_{ijk} \cdot k,$  (23)

$R_{ij} = \sum_{k=1}^{K_i} \beta_{ijk} \ln(k+1),$  (24)

$\sum_{l=1}^{L_j} \beta_{jl} \le 1, \qquad \sum_{k=1}^{K_i} \beta_{ijk} \le 1,$  (25)

$\beta_{jl}, \beta_{ijk} \in \{0,1\}, \qquad M_j, R_{ij} \in \mathbb{R}.$

When combining these definitions, the demand constraint can be written in convex form,

$n_{i,\mathrm{order}} - J + \sum_{j=1}^{J} e^{M_j + R_{ij}} - \sum_{j=1}^{J} \Bigl[ (N_{j,\max} + 1) \sum_{l=1}^{L_j} \beta_{jl} \cdot l + \sum_{k=1}^{K_i} \beta_{ijk} \cdot k \Bigr] \le 0.$  (26)

This transformation can also be achieved in a slightly different way, but using that strategy also requires updating some of the constraints in (3)-(7).

Square-Root Transformation. This transformation is almost equivalent to the previous one. A main difference is that it can be applied straight to the negative bilinear constraint and thus no r_ij variables need to be defined. The constraint (21) is valid, but the constraint (23) needs to be modified to

$n_{ij} = \sum_{k=1}^{K_i} \beta_{ijk} \cdot k.$  (27)

Note that the equations in (25) remain valid. The transformation is of the form

$m_j + 1 = \sqrt{M_j}, \qquad n_{ij} + 1 = \sqrt{N_{ij}}.$  (28)

The transformation variables M_j and N_ij are defined as

$M_j = 1 + \sum_{l=1}^{L_j} \beta_{jl} \, l(l+2),$  (29)

$N_{ij} = 1 + \sum_{k=1}^{K_i} \beta_{ijk} \, k(k+2),$  (30)

$\beta_{jl}, \beta_{ijk} \in \{0,1\}, \qquad M_j, N_{ij} \in \mathbb{R},$

and the resulting convex demand constraint is

$n_{i,\mathrm{order}} + J - \sum_{j=1}^{J} \sqrt{M_j \cdot N_{ij}} + \sum_{j=1}^{J} \Bigl( \sum_{l=1}^{L_j} \beta_{jl} \cdot l + \sum_{k=1}^{K_i} \beta_{ijk} \cdot k \Bigr) \le 0.$  (31)

Logarithmic and Square-Root Transformation. The square-root and the logarithmic functions can be combined, resulting in a third convex transformation. It is directly applicable to the negative bilinear function and the transformation can be written as

$m_j + 1 = \sqrt{M_j}, \qquad n_{ij} + 1 = \ln N_{ij}.$  (32)

The m_j, n_ij and M_j variables are defined as in the square-root transformation and N_ij is defined as

$N_{ij} = e + \sum_{k=1}^{K_i} \beta_{ijk} \, (e^{k+1} - e),$  (33)

and the following convex demand constraint is obtained:

$n_{i,\mathrm{order}} + J - \sum_{j=1}^{J} \sqrt{M_j} \cdot \ln N_{ij} + \sum_{j=1}^{J} \Bigl( \sum_{l=1}^{L_j} \beta_{jl} \cdot l + \sum_{k=1}^{K_i} \beta_{ijk} \cdot k \Bigr) \le 0.$  (34)

It can be noted in equation (34) that the only difference to the former transformation is in the third term of the demand constraint.
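The correctness of these transformations rests on a single property: with the one-hot binary definitions (21)-(25), the transformation variables in (22), (24), (29) and (30) reproduce the product (m_j + 1)(n_ij + 1) exactly at every integer point. The Python sketch below is an illustrative numerical check of this property (variable names and bounds are chosen for the illustration only and are not taken from the article).

```python
import math

def one_hot(value, length):
    """beta_l = 1 exactly for l = value (all zeros when value == 0)."""
    return [1 if l == value else 0 for l in range(1, length + 1)]

L, K = 14, 5                      # numbers of binary variables (assumed bounds)
for m in range(0, L + 1):
    for n in range(0, K + 1):
        beta_m = one_hot(m, L)    # beta_jl as in (21)
        beta_n = one_hot(n, K)    # beta_ijk as in (23)/(27)

        # Exponential transformation, (22) and (24):
        M_exp = sum(b * math.log(l + 1) for l, b in enumerate(beta_m, start=1))
        R_exp = sum(b * math.log(k + 1) for k, b in enumerate(beta_n, start=1))
        assert math.isclose(math.exp(M_exp + R_exp), (m + 1) * (n + 1))

        # Square-root transformation, (29) and (30):
        M_sq = 1 + sum(b * l * (l + 2) for l, b in enumerate(beta_m, start=1))
        N_sq = 1 + sum(b * k * (k + 2) for k, b in enumerate(beta_n, start=1))
        assert math.isclose(math.sqrt(M_sq * N_sq), (m + 1) * (n + 1))

print("exponential and square-root transformations recover (m+1)(n+1)")
```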

Inverted Transformation. The following transformation can be applied to a positive bilinear constraint. Thus the same definition of r_ij has to be made as for the exponential transformation. The transformation has the form

$m_j + 1 = \frac{1}{M_j}, \qquad r_{ij} + 1 = \frac{1}{R_{ij}}.$  (35)

The definitions of the transformation variables follow:

$M_j = 1 + \sum_{l=1}^{L_j} \beta_{jl} \Bigl( \frac{1}{l+1} - 1 \Bigr),$  (36)

$R_{ij} = 1 + \sum_{k=1}^{K_i} \beta_{ijk} \Bigl( \frac{1}{k+1} - 1 \Bigr).$  (37)

The demand constraint is obtained in exactly the same way as before,

$n_{i,\mathrm{order}} - J + \sum_{j=1}^{J} \frac{1}{M_j \, R_{ij}} - \sum_{j=1}^{J} \Bigl[ (N_{j,\max} + 1) \sum_{l=1}^{L_j} \beta_{jl} \cdot l + \sum_{k=1}^{K_i} \beta_{ijk} \cdot k \Bigr] \le 0.$  (38)

Modified Square-Root Transformation. As the last transformation, a modification to the previously presented square-root transformation is introduced. In such cases where the variable m_j may take large values, it may be more efficient to use another type of binary representation,

$m_j = \sum_{l=1}^{L_j'} 2^{l-1} \beta_{jl},$  (39)

where $L_j' = \lfloor \log_2(m_{j,\max}) \rfloor + 1$ if m_{j,max} is the upper bound for the respective m_j variable. This modification reduces the required number of binary variables, but the transformation variable M_j needs to be redefined. The definition also requires additional slack variables and constraints. In the following, the square-root transformation is used:

$M_j = 1 + \sum_{l=1}^{L_j'} \bigl( 2^{2l-2} + 2^{l} \bigr) \beta_{jl} + \sum_{l,m=1;\, m<l}^{L_j'} 2^{l+m-1} s_{jlm},$  (40)

$-s_{jlm} - 1 + \beta_{jl} + \beta_{jm} \le 0,$  (41)

$2 s_{jlm} - \beta_{jl} - \beta_{jm} \le 0,$  (42)

$l, m = 1, \ldots, L_j'; \quad m < l.$

By adding the extra constraints and defining N_ij as in the square-root strategy, the demand constraint can be written in convex form as follows:

$n_{i,\mathrm{order}} + J - \sum_{j=1}^{J} \sqrt{M_j \cdot N_{ij}} + \sum_{j=1}^{J} \Bigl( \sum_{l=1}^{L_j'} 2^{l-1} \beta_{jl} + \sum_{k=1}^{K_i} \beta_{ijk} \cdot k \Bigr) \le 0.$  (43)
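The point of (39)-(42) is that a weighted base-2 encoding needs only ⌊log₂ m_{j,max}⌋ + 1 binaries, at the price of linearizing the products β_jl β_jm through the slack variables s_jlm. The following sketch is an illustrative check under these definitions (names are mine, not the article's): with s_jlm = β_jl β_jm, the redefined M_j of (40) still equals (m_j + 1)², so the square-root demand constraint (43) stays exact.

```python
import math

m_max = 14
L = math.floor(math.log2(m_max)) + 1          # L_j' in (39): 4 binaries instead of 14

for m in range(0, m_max + 1):
    # base-2 encoding of m_j, eq. (39): m = sum_l 2^(l-1) beta_l
    beta = [(m >> (l - 1)) & 1 for l in range(1, L + 1)]
    assert sum(2 ** (l - 1) * b for l, b in enumerate(beta, start=1)) == m

    # slack variables s_lm = beta_l * beta_m (the value enforced by (41)-(42))
    s = {(l, k): beta[l - 1] * beta[k - 1]
         for l in range(1, L + 1) for k in range(1, l)}

    # redefined transformation variable, eq. (40)
    M = (1
         + sum((2 ** (2 * l - 2) + 2 ** l) * b for l, b in enumerate(beta, start=1))
         + sum(2 ** (l + k - 1) * s[(l, k)] for (l, k) in s))

    assert M == (m + 1) ** 2                  # so sqrt(M_j) = m_j + 1 as in (28)

print(f"{L} binaries per m_j suffice; M_j = (m_j + 1)^2 holds for all m_j <= {m_max}")
```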
l--1 combinatorial space.
where L~ - ~log2(mj,max)J + 1 if mj,max is the i bi (mm)hi,order hi,max _

upper bound for the respective mj variable. This 1 330 8 10


modification reduces the required number of bi- 2 360 16 18
nary variables and the transformation variable Mj 3 380 12 14
4 430 7 9
needs to be redefined. The definition also requires
5 490 14 16
additional slack-variables and constraints. In the 6 530 16 18
following, the square-root transformation is used:
Example order.
Lj !

Mj - 1 + E ( s 21-2 + 21) • fljl (40) The example d e m a n d is a mid-size customer order


l=l with a total weight of 27.5tons. Some important
Lj !
parameters need to be defined before optimization.
E 2/+m-1 "8jlm, The raw paper width of 2200mm is chosen and a
l,m=l;m<l maximal trim loss of 100mm is tolerated. At most
5 products may be cut out from a cutting pattern.
-Sjlm - 1 +/3jl + 13jm <_ O, (41)
Among the following parameters, the parameter
2 " sjtm - ~jl - ~jm <_ O, (42)
Mj refers to the upper b o u n d of the respective rnj
1, m - - I , . . . , ; m < l. variable and the p a r a m e t e r Ni to the nij variables.
By adding the extra constraints and defining Nij Note, that since the raw paper width is equal for
as in the square-root strategy, the d e m a n d con- every p a t t e r n the latter upper bound is indepen-
straint can be written in convex form as follows dent of the index j.
J J=I=6 Nj,max = 5
ni'°rder-~- J - E v/MJ " Nij (43) cj=l Mj = {14, 12, 8, 7, 4, 2}
j=l Cj = 0.1 Ni = {2,3,3,5,3,4}
Bj,max = 2200mm M m i n - - 15
Aj = 100mm
+Z 2 -1 + k < o.
j=l k=l The problem parameters.

The parameter M_min is the lower bound for the sum of the variables m_j. This sum can easily be calculated in advance and significantly enhances the optimization performance.

Before doing the actual optimization it should be pointed out that the results are not comparable. The main purpose of showing the numerical results is to demonstrate that the above presented strategies are fully usable and result in quite efficiently solvable formulations. The transformation strategies can be directly applied to any problem where the bilinear terms contain integer variables.

The methods are divided into three groups, of which the linear transformation and the parameterization strategies result in MILP formulations. The third group, the convex transformation strategies, produces MINLP formulations that have in this case been solved using the extended cutting plane (ECP) algorithm of T. Westerlund and F. Pettersson [12].

In the parameterization strategies the problem is redefined by parameterizing certain variables, which means that the resulting problem has already been partly solved. This may, however, not always be a benefit, especially in such problems where a huge number of parameters increases the integer search space for other variables.

The strategies are numbered as follows:
1. binary representation of m_j
2. binary representation of n_ij
3. parameterization of n_ij
4. parameterization of m_j
5. exponential transformation
6. square-root transformation
7. logarithmic and square-root transformation
8. inverted transformation
9. modified square-root transformation

The strategies enlarge the problem both in terms of variables and constraints. In the following, the numbers of variables and constraints are given. All the constraints are linear except in the convex transformation strategies, where six of the constraints are nonlinear.

The strategies 1-4 are linear formulations, of which 3-4 use the parameterization strategy to overcome the bilinearity. Strategies 5-9 are convex transformations. The field with combinations gives simply the number of unconstrained discrete variable combinations as a function of the number of binary variables. This information is more informative than just the number of variables.

  Strategy   Constraints   Variables (I/B/C)   Comb. (2^n)
  1.             408          36/23/120          2^98
  2.             366           6/88/144          2^105
  3.              59           51/51/-           2^140
  4.             201          282/47/-           2^634
  5.             199           -/169/84          2^96
  6.             199           -/169/84          2^96
  7.             185           -/169/84          2^96
  8.             185           -/169/84          2^96
  9.             225           -/208/84          2^118

The MILP problems 1-4 were solved with CPLEX-5.0 using default settings and the MINLP problems 5-9 were solved by 'mittlp', an ECP application written by H. Skrifvars. The optimization was done on a Pentium Pro 200 MHz running the Linux operating system.

The optimization results can be seen in the following table.

  Strategy   Nodes (MILP)   ECP-iter. (MINLP)   CPU-time (s)
  1.              265               -               716
  2.               51               -                 0.51
  3.             2174               -                 3.2
  4.              265               -                 7.7
  5.                -               4                 8.6
  6.                -               7                66.6
  7.                -               9               138.6
  8.                -              10               736.4
  9.                -               6                49.9

The optimal result has two cutting patterns with the widths B_1 = 2110 mm and B_2 = 2170 mm and multiples m_1 = 8, m_2 = 7. The appearances of the patterns are given by the following variables: n_{1,1} = 1, n_{2,1} = 2, n_{6,1} = 2, n_{3,2} = 2, n_{4,2} = 1, n_{5,2} = 2.
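The reported optimum is easy to verify directly against the order data: each product's demand must be met without exceeding the stated overproduction limit of 2, and both patterns must respect the width window. The check below is only a sketch (the same width window as in the enumeration sketch above is assumed).

```python
b = [330, 360, 380, 430, 490, 530]
n_order = [8, 16, 12, 7, 14, 16]
n_max = [10, 18, 14, 9, 16, 18]

patterns = {1: [1, 2, 0, 0, 0, 2],    # n_{i,1}
            2: [0, 0, 2, 1, 2, 0]}    # n_{i,2}
multiples = {1: 8, 2: 7}              # m_1, m_2

for j, col in patterns.items():
    width = sum(bi * ni for bi, ni in zip(b, col))
    assert 2100 <= width <= 2200 and sum(col) <= 5
    print(f"pattern {j}: width {width} mm, used {multiples[j]} times")

produced = [sum(multiples[j] * patterns[j][i] for j in patterns) for i in range(6)]
for i, (p, lo, hi) in enumerate(zip(produced, n_order, n_max), start=1):
    assert lo <= p <= hi
    print(f"product {i}: ordered {lo}, produced {p}")
```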

In the present study the trim-loss problem was used as an example case, but the transformation methods are general and can be applied to any problem with a similar type of bilinear constraints.

Notation.
  i            product index
  j            cutting pattern index
  I            number of products in the order
  J            number of possible cutting patterns
  m_j          number of times the pattern j is used
  n_ij         number of product i in pattern j
  r_ij         reversed value of n_ij
  n_{i,order}  number of product i ordered
  b_i          width of product i
  B_{j,max}    width of raw paper of pattern j
  Δ_j          max. trim-loss width
  N_{j,max}    max. number of products in pattern j
  y_j          binary variable that is one if m_j > 0
  c_j, C_j     cost coefficients
  M_j          upper bound / transformation variable
  β_jl         binary variables for defining m_j
  N_i          upper bound
  s_ijk        slack variable for the linear transformations
  β_ijk        binary variables for defining n_ij or r_ij
  n'_ij        fixed n_ij values
  T            translation constant
  N_ij         transformation variable
  R_ij         transformation variable
  l, k, m      indices of binary variables
  L_j, K_i     number of binary variables needed

See also: Decomposition techniques for MILP: Lagrangian relaxation; LCP: Pardalos-Rosen mixed integer formulation; Integer linear complementary problem; Integer programming: Cutting plane algorithms; Integer programming: Branch and cut algorithms; Integer programming: Branch and bound methods; Integer programming: Algebraic methods; Integer programming: Lagrangian relaxation; Integer programming duality; Time-dependent traveling salesman problem; Set covering, packing and partitioning problems; Simplicial pivoting algorithms for integer programming; Multi-objective mixed integer programming; Mixed integer classification problems; Integer programming; Multi-objective integer linear programming; Multiparametric mixed integer linear programming; Parametric mixed integer nonlinear optimization; Stochastic integer programming: Continuity, stability, rates of convergence; Stochastic integer programs; Branch and price: Integer programming with column generation.

References
[1] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS, C.A.: 'Global optimization of MINLP problems in process synthesis and design', Computers Chem. Engin. 21 (1997), S445-S450.
[2] DAKIN, R.J.: 'A tree search algorithm for mixed integer programming problems', Computer J. 8 (1965), 250-255.
[3] GILMORE, P.C., AND GOMORY, R.E.: 'A linear programming approach to the cutting-stock problem', Oper. Res. 9 (1961), 849-859.
[4] HAESSLER, R.W.: 'A heuristic programming solution to a non-linear cutting stock problem', Managem. Sci. 17 (1971), S793-S802.
[5] HARJUNKOSKI, I., PÖRN, R., WESTERLUND, T., AND SKRIFVARS, H.: 'Different strategies for solving bilinear integer problems with convex transformations', Computers Chem. Engin. 21 (1997), S487-S492.
[6] HARJUNKOSKI, I., WESTERLUND, T., ISAKSSON, J., AND SKRIFVARS, H.: 'Different formulations for solving trim-loss problems in a paper converting mill with ILP', Computers Chem. Engin. 20 (1996), S121-S126.
[7] HINXMAN, A.I.: 'The trim-loss and assortment problems: A survey', Europ. J. Oper. Res. 5 (1980), 8-18.
[8] JOHNSTON, R.E.: 'Rounding algorithms for cutting stock problems', Asia-Pacific J. Oper. Res. 3 (1986), 166-171.
[9] KIRKPATRICK, S., GELATT, C.D., AND VECCHI, M.P.: 'Optimization by simulated annealing', Science 220 (1983), 671-680.
[10] SKRIFVARS, H., HARJUNKOSKI, I., WESTERLUND, T., KRAVANJA, Z., AND PÖRN, R.: 'Comparison of different MINLP methods applied on certain chemical engineering problems', Computers Chem. Engin. 20 (1996), S333-S338.
[11] SMITH, E.M.B., AND PANTELIDES, C.C.: 'Global optimization of nonconvex MINLPs', Computers Chem. Engin. 21 (1997), S791-S796.
[12] WESTERLUND, T., AND PETTERSSON, F.: 'An extended cutting plane method for solving convex MINLP problems', Computers Chem. Engin. 19 (1995), S131-S136.

Iiro Harjunkoski
Process Design Lab., Åbo Akad. Univ.
Biskopsgatan 8
FIN-20500 Turku, Finland
E-mail address: iharjunk@abo.fi

Ray Pörn
Dept. Math., Åbo Akad. Univ.

Fänriksgatan 3
FIN-20500 Turku, Finland
E-mail address: rporn@abo.fi

Tapio Westerlund
Process Design Lab., Åbo Akad. Univ.
Biskopsgatan 8
FIN-20500 Åbo, Finland
E-mail address: twesterl@abo.fi

MSC2000: 90C11, 90C90
Key words and phrases: scheduling, paper converting, trim-loss problem, bilinear, convex transformation.

MIXED INTEGER CLASSIFICATION PROBLEMS
The G-group classification problem, also known as the G-group discriminant problem, involves a population partitioned into G distinct (and predefined) groups. The object is to construct a scalar- or vector-valued scoring function f(·) of p given attributes so that the group to which a population member with attributes x ∈ R^p belongs can be determined, with best possible accuracy, from its score f(x). By a wide margin, the majority of studies have focused on the two-group case. Construction of f(·) is based on training samples from the various groups. The most plausible criterion for choosing f(·) is expected misclassification cost, but many studies make the simplifying assumption that all misclassifications are equally expensive and that groups are represented in the training samples in proportion to their prior probability of being encountered, in which case the criterion reduces to minimizing the number of misclassifications in the combined training samples.

Classical discriminant analysis relies on distributional assumptions. In the two-group case with normally distributed attributes, the scalar-valued discriminant function that minimizes expected misclassification cost is known to be linear if the two groups have identical covariance structures and quadratic if not. In both cases, direct estimation of f(·) is straightforward. Nonparametric approaches, making no distributional assumptions, have utilized an eclectic assortment of techniques including neural networks, genetic search and mathematical programming. Although some consideration has been given to nonlinear programming methods, the bulk of the work involving mathematical programming has utilized either linear or mixed integer programming models. See [4] and Linear programming models for classification for an overview of the subject.

The Two-Group Problem. The following is a typical mixed integer programming model for the two-group case, using a scalar linear discriminant function:

$\min \ \sum_{g=1}^{2} \frac{\pi_g C_g}{N_g} \sum_{n=1}^{N_g} z_{gn}$
$\text{s.t.} \quad X_1 w + w_0 \cdot \mathbf{1} - M \cdot z_1 \le \mathbf{0},$  (1)
$\qquad\;\; X_2 w + w_0 \cdot \mathbf{1} + M \cdot z_2 \ge \mathbf{0},$
$\qquad\;\; w, w_0 \ \text{free}; \quad z_g \in \{0,1\}^{N_g} \ \forall g.$

Matrix X_g is an N_g × p training sample from group g, while π_g and C_g are respectively the prior probability of group g and the cost of misclassifying a member of that group. M is an arbitrarily large positive constant, and 0 and 1 denote vectors, all of whose entries are respectively 0 or 1. The discriminant function f(x) = w'x + w_0 is intended to produce negative scores for members of the first group and positive scores for members of the second group. (The discriminant function is linear as written, but polynomial functions are easily accommodated by expanding the sample matrices to include powers and products of attributes.) The bivalent indicator variable z_gn takes value 1 if the nth training observation from group g is classified incorrectly and 0 if it is classified correctly.
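As a concrete illustration, model (1) can be written almost verbatim in an algebraic modeling layer. The sketch below is not from the article: it uses the open-source PuLP library and its bundled CBC solver (both assumed installed), a tiny synthetic one-dimensional data set, equal priors and unit costs, and the ε-modified score constraints discussed just below, so that the all-zero score function is excluded.

```python
# Two-group MIP discriminant model in the spirit of (1), with epsilon-modified
# score constraints; requires PuLP (pip install pulp).
import pulp

X1 = [[1.0], [1.5], [2.0], [2.2]]        # group-1 training sample (N1 x p, p = 1)
X2 = [[3.0], [3.5], [2.1], [4.0]]        # group-2 training sample
M, eps = 100.0, 0.1                      # big-M and separation constants
p = 1

prob = pulp.LpProblem("two_group_classification", pulp.LpMinimize)
w = [pulp.LpVariable(f"w_{k}") for k in range(p)]          # free coefficients
w0 = pulp.LpVariable("w0")
z1 = [pulp.LpVariable(f"z1_{n}", cat="Binary") for n in range(len(X1))]
z2 = [pulp.LpVariable(f"z2_{n}", cat="Binary") for n in range(len(X2))]

# equal priors and misclassification costs: minimize the number of errors
prob += pulp.lpSum(z1) + pulp.lpSum(z2)

for x, z in zip(X1, z1):                 # group 1 should score <= -eps
    prob += pulp.lpSum(wk * xk for wk, xk in zip(w, x)) + w0 - M * z <= -eps
for x, z in zip(X2, z2):                 # group 2 should score >= +eps
    prob += pulp.lpSum(wk * xk for wk, xk in zip(w, x)) + w0 + M * z >= eps

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("misclassifications:", int(pulp.value(prob.objective)))
print("f(x) =", pulp.value(w[0]), "* x +", pulp.value(w0))
```

With this overlapping data set a perfect separation is impossible, so the reported minimum should be one misclassification; the returned coefficients are one of infinitely many optimal choices, which is exactly the non-uniqueness issue taken up next.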
A score of 0 results in an ambiguous classification. Some authors deal with this by changing the first two constraints of (1) to

$X_1 w + w_0 \cdot \mathbf{1} - M \cdot z_1 \le -\varepsilon \cdot \mathbf{1},$
$X_2 w + w_0 \cdot \mathbf{1} + M \cdot z_2 \ge +\varepsilon \cdot \mathbf{1},$

where ε is a small positive constant. This formulation is nearly as general, although it is mathematically possible that infelicitous choices of ε and M could rule out an otherwise desirable solution.

While there will often be a unique best choice of training observations to misclassify (i.e., unique optimal values of z_1 and z_2), there commonly will be infinitely many choices for the coefficients w, w_0 of a discriminant function that misclassifies those observations only. To select from among those co-
gramming methods, the bulk of the work involving observations only. To select from among those co-

efficient solutions, authors often introduce addi- yielding a variation of (1) in which the objective
tional terms in the objective function. As an ex- function is replaced with
ample, S.M. Bajgier and A.V. Hill [2] used a for- 2 Kg
mulation similar to the following: min~-~ 7rgCg E N g k Z g k .
r" 2 Ng 9-1 Yg k--1
7r9
min E ~ E [Cgzgn +eldgn -e2d;n ] In this formulation [1], Kg is the number of dis-
9=1 n=l tinct attribute vectors x in the training sample
s.t. Xlw+Wo'l+d +-d~-<0 from group g, Ngk is the number of repetitions of
X2w+wo.l-d ++d 2 >0 the kth distinct observation from group g, and the
dg - Mzg <_0 matrices Xg contain only one copy of each such
observation.
, w, w0 free; d +, dg >_ 0; zg E {0, 1}gg.
The deviation variables d + and dg measure the H e u r i s t i c s . Advances in computer hardware, op-
amount by which each score fails on the correct timization software and algorithms for the mixed
and incorrect side of the zero cutoff, respectively. integer classification problem have allowed pro-
The objective functions rewards the former and gressively larger training samples to be employed:
penalizes the latter, using small positive objective where G.J. Koehler and S.S. Erenguc [5] were re-
coefficients el and e2 to prevent improvements in stricted to combined training samples of 100 in
these terms from inducing unnecessary misclassi- 1990 (on a mainframe), P.A. Rubin [8] was able
fications. to handle over 600 observations in 1997 (on a per-
The motivation for formulation (1) is simple: if sonal computer). Nonetheless, a variety of heuris-
the training samples are representative of the over- tics have been developed to find near optimal solu-
all population, the discriminant function that min- tions to the problem. Several revolve around this
imizes misclassification costs on the training sam- property of the problem: if the training samples
ples should come close to minimizing expected mis- can be classified with perfect accuracy by a lin-
classification cost on the overall population. Mod- ear function, then problem (1) can be solved as a
els like (1) tend to be computationally expensive, linear program, with the zgr, deleted, to obtain a
however. The constant M must be chosen large discriminant function. Deletion of the zgn reduces
enough that the best choice of w and wo is not the objective function to a constant 0. Although
rendered infeasible by a misclassified observation this is perfectly acceptable, heuristics may substi-
with score larger than M in magnitude; but the tute an objective function from one of the linear
larger M is, the weaker the bounds in a branch and programming classification models, to encourage
bound solution of the problem, and thus the longer the chosen discriminant function to separate scores
the solution time. As is typical with mixed integer of the two groups as much as possible. This often
programming models, computation time increases also necessitates inclusion of a normalization con-
modestly with the number of attributes (p) but straint, to keep the resulting linear program from
more dramatically with the number of zero-one being unbounded. Alternatively, (1) may be solved
variables (N1 + N2, the combined sample size). heuristically to determine which training observa-
Unfortunately, the reliability of the discriminant tions to misclassify, and then a linear programming
functions improves as the training samples grow, model using the remaining observations may be
creating a tension between validity and tractabil- employed to select the final discriminant function.
ity. The BPMM heuristic of [5] solves the linear
In the special case where all attribute variables program dual to a relaxation of the mixed inte-
are discrete, it is likely that some observation vec- ger problem, notes which observations would be
tors will appear more than once in the training misclassified by the resulting discriminant func-
samples. When that occurs, the number of zero- tion, and then solves the dual of each linear re-
one variables can be reduced from one per ob- laxation obtainable by deleting one of those ob-
servation to one per distinguishable observation, servations. Solving the dual problem tends to be

more efficient than solving the primal, since there G Ng


will typically be more observations than attributes min ETrgCgEzgn
(N1 + N2 > > p). The heuristics presented in [7] g=l Ng n=l

also operate on the dual of the linear relaxation of s.t. Xgw + w0" 1 - M . z g - Ug. 1 _< 0
the mixed integer problem, restricting basis entry Xgw+W0.1+M.zg-Lg.1 >_0
to force certain dual variables to take value zero
U g - L g >O
(equivalent to relaxing the corresponding primal
constraints, thus allowing the associated observa-
Lh - Ug + Myhg > e
tions to be misclassified). Ygh + Yhg -- 1
w, w0, L, U free;
z9 E {0, 1}Yg; y E {0, 1} G(G-1).

M u l t i p l e G r o u p s . When G > 2 groups are in- The first three constraints are repeated for g =
volved, the problem becomes considerably more 1 , . . . , G while the next two are repeated for all
complicated. In a practical application with mul- pairs g,h = 1 , . . . , G such that g # h. Observa-
tiple groups, it is plausible that misclassification tions are classified into group g if their scores fall
costs would depend not only on the group to which in the interval [Lg, Ug]. Variable Ygh = 1 if the scor-
a misclassified point belonged but also the one into ing interval for group g precedes that for group h.
which it was classified. Thus an appropriate objec- Parameter c > 0 dictates a minimum separation
tive function might look like between intervals.
Using a single scalar-valued discriminant func-
tion with G > 2 groups is restrictive; it assumes
that the groups project onto some line in an or-
derly manner. In [3], Gehrlein also suggested a
G G Ng model using a vector-valued discriminant function
7[9
f() of dimension G. Observation x would be clas-
g=l h=l n=l sifted into the group corresponding to the largest
h#g
component of f(x). The model increases the num-
ber of coefficient variables and the number of con-
straints but not the number of 0-1 variables, the
primary determinant of execution time. The model
where Cgh is the cost of classifying a point from is"
group g into group h and Zghn is 1 if the nth ob- G Ng
servation of group g is classified into group h and min ~ ~ g C g ~ z g , ~
0 otherwise. This represents a substantial escala- g=l Ng n=l

tion of the number of indicator variables. As a s.t. X g w g -Jr Wgo " 1


consequence, most research on the multiple group -Xgw h - whO" 1 @ M . zg > c. 1
problem assumes that misclassification costs de- wg, Wgo free; Zg E {0, 1} Ng.
pend only on the correct group.
Here, w~x + wg0 is the gth component of f(x)
and e > 0 is the minimum acceptable differ-
Few models, and fewer computational results, ence between the correct component of the scor-
have been published for the multiple group prob- ing function and the largest incorrect component.
lem. W.V. Gehrlein [3] presented one of the earli- The sole constraint is repeated once for each pair
est scalar-valued mixed integer models for the case g,h = 1 , . . . , G such that g ~ h.
G > 2. The range of his discriminant function is More recently, R. Pavur proposed a sequential
partitioned into separate intervals corresponding mixed integer method to handle multiple groups
to the groups. His model, adapted to the preced- [6], constructing a vector-valued scoring function
ing notation, is from a sequence of scalar functions. An initial

mixed integer model similar to Gehrlein's is solved [3] GEHRLEIN, W.V.: 'General mathematical program-
to obtain the first scalar function. Thereafter, a se- ming formulations for the statistical classification prob-
lem', Oper. Res. Left. 5, no. 6 (1986), 299-304.
quence of similar mixed integer models is solved,
[4] HAND, D.J.: Construction and assessment of classifi-
with each model bearing additional constraints cation rules, Wiley, 1997.
compelling the scores produced by the next scoring [5] KOEHLER, G.J., AND ERENGUC, S.S.: 'Minimizing
function to have sample covariance zero with the misclassifications in linear discriminant analysis', De-
scores of each of the preceding functions. The co- cision Sci. 21, no. 1 (1990), 63-85.
variance constraints impose a sort of probabilistic [6] PAVUR, R.: 'Dimensionality representation of linear
discriminant function space for the multiple-group
'orthogonality' on the dimensions of the composite problem: An MIP approach', Ann. Oper. Res. 74
(vector-valued) scoring function. (1997), 37-50.
See also" I n t e g e r p r o g r a m m i n g ; M u l t i - [7] RUBIN, P.A.: 'Heuristic solution procedures for a
o b j e c t i v e m i x e d i n t e g e r p r o g r a m m i n g ; Sim- mixed-integer programming discriminant mode]', Man-
agerial and Decision Economics 11, no. 4 (1990), 255-
plicial p i v o t i n g a l g o r i t h m s for i n t e g e r pro-
266.
g r a m m i n g ; Set covering, p a c k i n g a n d par-
[8] RUBIN, P.A" 'Solving mixed integer classification
t i t i o n i n g p r o b l e m s ; T i m e - d e p e n d e n t travel- problems by decomposition', Ann. Oper. Res. 74
ing s a l e s m a n p r o b l e m ; G r a p h coloring; In- (1997), 51-64.
t e g e r p r o g r a m m i n g duality; I n t e g e r pro- Paul A. Rubin
gramming: Lagrangian relaxation; Integer Michigan State Univ.
programming: Algebraic methods; Integer East Lansing, MI, USA
programming: Branch and bound meth- E-mail address: rubin%msu, edu
ods; I n t e g e r p r o g r a m m i n g : B r a n c h a n d c u t MSC 2000: 62H30, 65Cxx, 65C30, 65C40, 65C50, 65C60,
algorithms; Integer programming: Cutting 90Cll
p l a n e a l g o r i t h m s ; I n t e g e r linear c o m p l e - Key words and phrases: classification, discriminant analy-
mentary problem; LCP: Pardalos-Rosen sis, integer programming.
mixed integer formulation; Decomposition
t e c h n i q u e s for M I L P : L a g r a n g i a n relax-
ation; M u l t i - o b j e c t i v e i n t e g e r linear pro- M I X E D INTEGER LINEAR PROGRAM-
gramming; Multiparametric mixed integer MING: H E A T E X C H A N G E R NETWORK
linear p r o g r a m m i n g ; P a r a m e t r i c m i x e d in- SYNTHESIS
t e g e r n o n l i n e a r o p t i m i z a t i o n ; S t o c h a s t i c in- Heat exchanger networks use the waste heat re-
t e g e r p r o g r a m m i n g : C o n t i n u i t y , stability, leased by hot process streams to heat the cold pro-
r a t e s of convergence; S t o c h a s t i c i n t e g e r cess streams of a chemical manufacturing plant,
p r o g r a m s ; B r a n c h a n d price: I n t e g e r pro- reducing utility costs by as much as 80%. Heat ex-
gramming with column generation; Statisti- changer network synthesis has been an active area
cal classification: O p t i m i z a t i o n a p p r o a c h e s ; of process research ever since the energy crisis of
L i n e a r p r o g r a m m i n g m o d e l s for classifica- the 1970s, and over 400 research papers have been
tion; O p t i m i z a t i o n in B o o l e a n classification published in the area. See [1], [2], [4], [5], [6], for
p r o b l e m s ; O p t i m i z a t i o n in classifying t e x t recent reviews.
documents. In 1979, T. Umeda et al. [8] discovered a thermo-
dynamic pinch point that limits the energy savings
References of a heat exchanger network, establishes minimum
[1] ASPAROUKHOV,O.K., AND STAM, A.: 'Mathematical utility levels, and partitions the heat exchanger
programming formulations for two-group classification network into two independent subnetworks. This
with binary variables', Ann. Oper. Res. 74 (1997), 89- discovery revolutionized heat exchanger network
112.
synthesis: with it, designers could compute util-
[2] BAJGIER, S.M., AND HILL, A.V.: 'An experimental
comparison of statistical and linear programming ap- ity levels a priori, then seek the heat exchanger
proaches to the discriminant problem', Decision Sci. network structure that uses the minimum utility
13, no. 4 (1982), 604-618. consumption while also minimizing the total in-

vestment cost. This remaining problem requires while cold process streams, the heat sinks, are akin
matching the hot utilities and process streams that to stores and shopping malls, the sinks of manu-
release heat with the cold process streams and util- factured goods.
ities that require heat, choosing the network struc- The analogy is not perfect, as heat only flows
ture of each stream, and designing the individual from a high temperature to a lower one, in obedi-
heat exchanger networks. In general, this is a mixed ence to the second law of thermodynamics. Par-
integer nonlinear programming problem (MINLP), titioning the temperature range of the heat ex-
but can be decomposed into two smaller problems changer network into intervals can capture this
by first selecting the matches between hot and cold heat flow pattern. Each interval sends excess, or
process streams and utilities by minimizing the to- residual, heat to the interval below it, just as ex-
tal number of units, then optimizing the network cess manufactured goods are sent to a discount
structure. The first problem is a mixed integer lin- warehouse.
ear programming problem that will be discussed The hot side of this temperature cascade is cre-
in detail here. ated by ordering T~ and TjI + ATmin from the
highest to the lowest value, creating t = 1 , . . . , T I
U s i n g M I L P M o d e l s to F i n d t h e M i n i - temperature intervals. Temperatures on the cold
m u m N u m b e r of U n i t s . Stated formally, the side of the cascade equal the temperature on the
minimum-units problem is: hot side minus ATmin. Hot stream i releases QiH
Given units of heat to temperature interval t. QiH is equal
1) A set of hot process streams and utilities to
i E H, and for each hot stream i: FCPi(Tt-1 - Tt)
a) the inlet and outlet temperatures T~ and if T[ >_ Tt -1 and T ° <_ Tt ,
T°;
FCP (T _ - T °)
b) either the heat capacity flow rate FCpi QiH -
or the heat duty Qi. if T I >__Tt -1 and T ° >_ Tt ,
Q
2) A set of cold process streams j C C, and for
each cold stream j:
a) the inlet and outlet temperatures TjI and
Cold stream j absorbs QjCt units of heat from tem-
T°;
perature interval t. Q~t equals
b) either the heat capacity flow rate FCpj
or the heat duty Qj,
F C P j ( T t - 1 - - Tt)
3) The minimum temperature difference be- if T I < Tt - ATtain and
tween hot and cold streams exchanging heat,
T ? >_ Tt - 1 - A Tmin,
ATtain.
Q~ _ F C P j ( T ° - Tt-~)
Identify a set of stream matches (i j) and their
if T] <_ Tt - ATmin and
heat duties Qij that
T ? <_ Tt -1 - ATmin,
a) meets the heating and cooling needs of each
stream; and Qj
i f T / - T° and T] - T t - i - ATmin.
b) minimizes the total number of matches.
S.A. Papoulias and I.E. Grossmann [7] formu- Any excess heat sent to interval t from hot
lated this as a mixed integer programming problem stream i cascades down to interval t + 1 through the
using a transshipment model, by making an anal- residual flow Rit. Process utilities may be treated
ogy between heat exchanger networks and trans- as process streams, or may be placed at the top or
portation networks. In the transshipment analogy, bottom of the cascade.
hot process streams, the sources of heat, are simi- This transshipment model of heat flow leads to
lar to manufacturing plants, the sources of goods, the following mixed integer linear programming

problem:

$\min \ \sum_{i,j} y_{ij}$

subject to

$R_{i,t} - R_{i,t-1} + \sum_{j} q_{ijt} = Q^{H}_{it}, \quad i = 1, \ldots, H, \; t = 1, \ldots, TI,$  (1)

$\sum_{i} q_{ijt} = Q^{C}_{jt}, \quad j = 1, \ldots, C, \; t = 1, \ldots, TI,$  (2)

$Q_{ij} = \sum_{t \in TI} q_{ijt}, \quad i = 1, \ldots, H, \; j = 1, \ldots, C,$  (3)

$Q_{ij} \le U_{ij} \, y_{ij}, \quad i = 1, \ldots, H, \; j = 1, \ldots, C,$  (4)

$q_{ijt} \ge 0, \quad R_{it} \ge 0, \quad i = 1, \ldots, H, \; j = 1, \ldots, C, \; t = 1, \ldots, TI,$  (5)

$R_{i,0} = 0,$  (6)

$y_{ij} \in \{0,1\}, \quad i = 1, \ldots, H, \; j = 1, \ldots, C.$  (7)

In this formulation, y_ij is a binary variable which is one if a match between hot process stream i and cold process stream j occurs, and zero otherwise; q_ijt is the amount of heat exchanged between hot stream i and cold stream j in temperature interval t; R_it is the residual heat flow associated with hot stream i that cascades down from temperature interval t to temperature interval t+1; and Q_ij is the heat duty of match (i,j). The overall objective function minimizes the total number of units. Constraint (1) is the energy balance for hot stream i around temperature interval t and constraint (2) is the energy balance for cold stream j around temperature interval t. Constraint (3) finds the overall heat duty of match (i,j). Constraint (4) sets this heat duty to zero when match (i,j) does not exist. The nonnegativity constraints prevent heat flow from a low temperature to a higher one. Note that the residual heat flows into the first temperature interval and out of the last temperature interval are zero when there are no utilities above or below the cascade. The objective function and the constraints are linear, and the formulation involves both continuous and integer variables, making this a mixed integer linear programming problem.

Lower bounds on the solution of this problem are given by linear programming problems where some integer variables are fixed to either zero or one and the remainder are treated as continuous variables. The accuracy of these bounds depends upon the parameters U_ij in the fourth constraint. When these parameters are very large, the lower bounds will be quite far from the solution of the MILP.

The smallest acceptable value of U_ij is the minimum of the cooling requirements of stream i and the heating requirements of stream j:

$U_{ij} = \min \Bigl\{ \sum_{t \in TI} Q^{H}_{it}, \; \sum_{t \in TI} Q^{C}_{jt} \Bigr\}.$

EXAMPLE 1. This example is from [3] and features three hot streams, two cold streams, and a cold utility. Table 1 gives the inlet and outlet stream temperatures and the flowrate heat capacities of each process stream and the cooling water duty.

  Stream   T^in (°C)   T^out (°C)   FCp (kW/K)
  H1          159          77          228.5
  H2          159          88           20.4
  H3          159          90           53.8
  C1           26         127           93.3
  C2          118         149          196.1

  Table 1: Stream data. Q_CW = 8395.2 kW, ΔT_min = 10 °C.

Temperatures on the hot side of the cascade are 159 °C, 128 °C, and 36 °C, while temperatures on the cold side are 149 °C, 118 °C and 26 °C. There are two temperature intervals. Table 2 gives the heat released from hot streams to the temperature intervals, while Table 3 gives the heat absorbed by the cold streams from the temperature intervals.

  Stream    TI-1      TI-2
  H1       7083.5   11653.5
  H2        632.4     816.0
  H3       1667.8    2044.4

  Table 2: Q^H_it, heat released from hot stream i to temperature interval t.

  Stream    TI-1      TI-2
  C1        839.7    8583.6
  C2       6079.1       -
  CW           -     8395.2

  Table 3: Q^C_jt, heat absorbed by cold stream j from temperature interval t.

  Solution 1            Solution 2            Solution 3            Solution 4
  Match   Duty (Q_ij)   Match   Duty (Q_ij)   Match   Duty (Q_ij)   Match   Duty (Q_ij)
  H1-C1     9423.3      H1-C1     7974.9      H1-C1     5711.1      H1-C1     4262.7
  H1-C2     6079.1      H1-C2     6079.1      H1-C2     6079.1      H1-C2     6079.1
  H1-CW     3234.6      H1-CW     4683.0      H1-CW     6946.8      H1-CW     8395.2
  H2-CW     1448.4      H2-C1     1448.4      H2-CW     1448.4      H2-C1     1448.4
  H3-CW     3712.2      H3-CW     3712.2      H3-C1     3712.2      H3-C1     3712.2

  Table 4: Four solutions which satisfy the minimum number of matches.

In this example, the minimum number of units is 5, and there are four solutions to this MILP that meet this minimum (cf. Table 4).

Conclusions. Mixed integer linear programs are used in heat exchanger network synthesis to identify the minimum number of units, and a set of matches and their heat loads meeting the minimum. These MILPs are based upon a transshipment model of heat flow.

See also: Global optimization of heat exchanger networks; Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; Generalized Benders decomposition; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane algorithm; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with αBB; MINLP: Heat exchanger network synthesis; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems.

References
[1] BIEGLER, L.T., GROSSMANN, I.E., AND WESTERBERG, A.W.: Systematic methods of chemical process design, Prentice-Hall, 1997.
[2] FLOUDAS, C.A.: Nonlinear and mixed-integer optimization, Oxford Univ. Press, 1995.
[3] GUNDERSEN, T., AND GROSSMANN, I.E.: 'Improved optimization strategies for automated heat exchanger synthesis through physical insights', Computers Chem. Engin. 14, no. 9 (1990), 925.
[4] GUNDERSEN, T., AND NAESS, L.: 'The synthesis of cost optimal heat exchanger networks: An industrial review of the state-of-the-art', Computers Chem. Engin. 12, no. 6 (1988), 503.
[5] JEZOWSKI, J.: 'Heat exchanger network grassroot and retrofit design: The review of the state-of-the-art: Part I', Hungarian J. Industr. Chem. 22 (1994), 279-294.
[6] JEZOWSKI, J.: 'Heat exchanger network grassroot and retrofit design: The review of the state-of-the-art: Part II', Hungarian J. Industr. Chem. 22 (1994), 295-308.
[7] PAPOULIAS, S.A., AND GROSSMANN, I.E.: 'A structural optimization approach in process synthesis - II: Heat recovery networks', Computers Chem. Engin. 7 (1983), 707.
[8] UMEDA, T., HARADA, T., AND SHIROKO, K.: 'A thermodynamic approach to the structure in chemical processes', Computers Chem. Engin. 3 (1979), 373.

Kemal Sahin
Dept. Chemical Engin., Univ. Cincinnati
Cincinnati, OH 45221, USA

Korhan Gursoy
Dept. Chemical Engin., Univ. Cincinnati
Cincinnati, OH 45221, USA

Amy Ciric
Dept. Chemical Engin., Univ. Cincinnati
Cincinnati, OH 45221, USA

MSC 2000: 90C90
Key words and phrases: MILP, HEN synthesis, transshipment model.

MIXED INTEGER LINEAR PROGRAM- crating agents, MSAs), S = {j: j = 1 , . . . , N s } ,


MING: MASS AND HEAT EXCHANGER with known cost, inlet and outlet compositions
N E T W O R K S , MEN, MHEN for the same components, Xjc , x}c (exact values
Separation networks involving mass transfer oper- or bounds), as shown in Fig. 1.
ations that do not require energy (e.g. absorption, The synthesis problem refers to the selection of
liquid-liquid extraction, ion-exchange etc.) are the appropriate lean streams and their flowrates,
characterized as mass exchange networks (MEN). Lj, the mass exchange operations (mass exchange
These appear in the chemical industries mostly matches), the mass transfer load for each separa-
in waste treatment, but also, in feed preparation, tor and its required size, and the configuration of
product separation, recovery of valuable materi- the overall network.
als, etc. A mass exchanger, in this context, is any Mass transfer in each mass exchanger is gov-
counter-current, direct-contact mass transfer unit, erned by the first and second thermodynamic laws,
where one or more components are transferred at as is heat transfer in heat exchangers. Mass trans-
constant temperature and pressure from one pro- fer of a component c from a rich to a lean stream
cess stream, which is characterized as rich stream, is feasible if the composition of c in the rich phase
to another process or utility stream, characterized is greater than the equilibrium composition with
as lean stream. Mass integration aims to the purifi- respect to the lean phase:
cation of the rich streams and the recovery of valu-
w_> (1)
able or hazardous materials at the minimum total
cost (investment and operating cost of auxiliary where f(Xc) is the equilibrium relation and e
streams). In the specific case, when the mass trans- is a minimum composition difference that en-
fer operations take place at the same temperature, sures feasible mass transfer in a separator of fi-
or heating/cooling requirements are negligible, the nite size, in analogy to ATmin in heat exchangers.
integration problem is limited to the synthesis of a This analogy led to the development of synthe-
mass exchanger network (MEN) only. When mass sis methods for mass exchanger networks employ-
exchange operations at different temperature lev- ing mixed integer optimization techniques, simi-
els are encountered, mass and heat exchanger net- lar to heat exchanger networks (cf. M i x e d inte-
works (MHEN) may be considered simultaneously. ger linear p r o g r a m m i n g : M a s s a n d h e a t ex-
Rich streams changer networks; MINLP: Mass and heat
R={il i=l..N R} e x c h a n g e r n e t w o r k s ) , that are categorized into
Gi y.S the sequential synthesis and the simultaneous syn-
1,C
Lean
thesis methods.
streams
S ={jl j=I..Ns }
The sequential MEN synthesis method, intro-
duced in [3] and [4] involves the following steps:
Mass
Exchange 1) Minimum cost of mass separating agents
Network (minimum utility problem), to determine the
U
Lj _ L j x!j,c--
< Xl,c< x.j,c
u optimal flows of the mass separating agents.
xs 2) Minimum number of mass exchanger units,
j,c
for fixed MSA flows, to determine the mass
y~c
,
< y~c ~< yUI , C
exchange matches.
Fig. 1. 3) Network configuration and separator sizes for
MEN synthesis involves a set of rich streams, in fixed mass exchange operations.
terms of one or more components, R = { i : i = The first two synthesis steps involve the solution
1 , . . . , A m } , with known flowrates, Gi, inlet and of linear and mixed integer linear problems.
outlet compositions for the components of interest, A useful tool of the sequential MEN synthesis
YiSc, Y~c (exact values or bounds)respectively, and a method is the composition interval diagram, CID,
set of process or auxiliary lean streams (mass sep- where thermodynamic feasibility of mass transfer

is explored mapping the rich and the lean streams • Rk is the set of rich streams, present in in-
on equivalent composition scales, that are derived terval k;
from the mass transfer feasibility requirements in
• Sk is the set of lean streams, present in in-
(1). In general, the composition equivalent scales terval k;
and the minimum composition difference, e, are de-
fined for each component of interest and each pair • Nint is the number of composition intervals;
of rich and lean streams. In the simple case of a • WRik is the mass exchange load of rich
single component, where mass transfer is indepen- stream i in interval k,
dent of the presence of other components in the
WR~ - Gi(Yk - max(yk+l, y~));
rich streams, the CID is constructed as illustrated
in Fig. 2. • W SJk is the mass exchange load of lean
Ri, stream j in interval k,

WSak - Lj(min(x}, xjk) - Xjk+l);


Yk-1 intervalk-1 Xj'k-1 ..... xj"kl il
Yk Xj,k xj,,k • • 5k is the residual mass exchange load in in-
k terval k.
interval
Y k+ 1 Xj,k+ 1 xj,,k+ 1 ..........
Problem (TP1) results in the optimal flows of
Lj the mass separating agents and the identification
of the pinch points, i.e. the thermodynamic bot-
Fig. 2: Composition interval diagram.
tlenecks in mass transfer. The pinch points are de-
Feasible rich-to-lean mass transfer is guaranteed fined by zero residual flows and divide the mass
within a composition interval when the equilib- exchange network into subnetworks. Mass trans-
rium relation f(Xc) is convex within the interval. fer between different subnetworks (i.e. across the
When f(Xc) is convex in the whole composition pinch) increases the cost of mass separating agents.
range, only inlet compositions are required to con- An assumption in (TP1) is that molar flows
struct the CID [8]. of the rich and the lean streams are constant. If
The minimum cost of mass separating agents significant flowrate variations take place, composi-
is found employing a transshipment model, where tions and mass exchange loads are calculated based
the components of interest are the transferred on nontransferable components.
commodities, the rich and the lean streams are The following cases are distinguished:
considered as sources and sinks respectively, and
• Fixed inlet and outlet compositions.
the composition intervals define the intermediate
Then, (TP1) is an LP problem.
nodes [4]. The model involves energy balances
When multiple components are consid-
around the temperature intervals (intermediate
ered, the CID is defined for all the compo-
nodes):
nents of interest and (TP1) corresponds to
the multicommodity transshipment model.
rain
EJ cjLj The pinch points are then determined by the
component that requires the greater MSA
s.t. 5k-1 + E W R~ flows.
i~nk
Variable outlet compositions.
(TP1)
j6sk Then, the mass exchange loads of the rich
up and lean streams in their final intervals (de-
0 <_ Lj < Lj , j E S,
fined by the upper and lower bounds on their
50 --- 5Nint : 0; outlet compositions) are variables. Problem
5k _> 0, k = 1 , . . . , Nint - 1, (TP1) can still be solved as an LP [9], con-
sidering the variable mass exchange loads ex-
where plicitly in the model.

Variable inlet compositions usually require flexible minimum number of mass exchangers is found em-
mass exchange networks to accommodate the vari- ploying the expanded transshipment model, where
ations and define a different problem. For a single the existence of a mass exchange match-separator
component it has been shown that the minimum in a subnetwork is denoted by a binary variable:
MSA cost corresponds to the lower bounds of the
inlet compositions [8]. 1, when streamsi, j
For nonconvex equilibrium relations, (TP 1) can- Eijm- exchange mass
not guarantee feasible mass transfer throughout in subnetwork m
the composition range, while the predicted MSA 0, otherwise.
cost is a lower bound to the actual minimum one.
B.K. Srinivas and M.M. E1-Halwagi suggested in For a single component, the minimum number of
[14] an iterative procedure to calculate the mini- mass exchanger units is given by the following
mum required MSA cost, that involves two major MILP problem [4]:
steps:
i) a 'feasibility problem', where 'critical' com-
min
m iERm jESm
E Jm
position levels are identified and included in
s.t. (~ik -- (~ik-1 + E Mijk
the CID (nonconvex NLP step, that requires jESmk
global optimization methods), and - WRik,
ii) (TP1) with updated intervals, which calcu- k E Ira, i E Rmk, m E M,
lates increasing lower bounds to the mini-
mum MSA cost.
iE Rrnk
Instead of target outlet compositions for the rich (TP2) k E Ira, j E Sink, m E M
streams, it may be of interest to remove a cer-
E Mijk -- EijmUijm ~_ 0
tain total mass load of pollutants. Then, (TP1) is
kEIm
solved with variable rich outlets and a fixed total
5/k ~ 0 , k E I m , i E R m ,
mass exchange load [10]:
M~jk >_O, k E Im,
- Z - i E Rkm, j E Skm
i
E~jk =O, 1, k E Im,
The minimum-utility-cost problem has been al-
ternatively formulated as an LP or MINLP prob- i E Rkm, j E Skin,
lem, based on total mass balances and the follow- where
ing property:
• Rm is the set of rich streams, present in sub-
/ Mass l°st by allthe rich /
network m,
streams below each (2)
pinch point candidate • Sm is the set of lean streams, present in sub-
network m,
Mass gained by all the lean /
- streams below each _ 0 • Im is the set of intervals in subnetwork m,
pinch point candidate • Rkm is the set of rich streams, present in in-
and employing binary variables to denote the terval k of subnetwork m, or above,
relative position of variable outlet compositions • Skin is the set of lean streams, present in in-
with respect to each pinch point candidate in the terval k of subnetwork m,
CID [5], [6], [8], [9].
• WR~ is the mass exchange load of rich
The minimum number of mass exchange opera-
stream i in interval k,
tions (units) for fixed MSA cost is determined in
each subnetwork in a second step, in an attempt • (~ik is the residual mass exchange load of rich
to minimize the fixed cost of the separators. The stream i in interval k,

• W SJk is the mass exchange load of lean matches, featuring the minimum MSA cost, may
stream j in interval k, as determined by be generated by solving (TP2) iteratively and in-
(TP1), cluding integer cuts. These do not necessarily cor-
respond to networks of the same overall cost.
• M i j k is the mass exchange load between i and
j in interval k, The expanded transshipment model can also
be employed to determine the minimum MSA
• Uijm is an upper bound to the possible mass
cost, considering variable mass loads for the lean
exchange load between i and j in subnetwork
streams. Then, forbidden or restricted mass ex-
m~

min/ change operations can be explicitly accounted for.


Although (TP2) does not determine the network
structure, stream splitting and exchanger connec-
tivity may be guided by the resulting mass ex-
Srinivas and E1-Halwagi have shown [14] that, change load distribution in each composition inter-
when the equilibrium relations around a pinch val [4]. The actual network configuration is found
point are not convex, a mass exchanger can strad- in a next step, employing heuristic methods [3], [5]
dle the pinch and still be thermodynamically fea- or superstructure methods (NLP models).
sible. To account for such cases, exchangers across Special cases of mass exchange networks have
the pinch points can be considered introducing ex- been studied:
tra binary variables:
• MEN and regeneration networks [5], [11].
Iijp ~_ Mijp, The regeneration of mass separating
Iijp+ l <_ Mijp+ l , agents by auxiliary streams can be consid-
ered simultaneously with the main MEN, in
Iijp + Iijp+l >_ 2Blip,
another mass exchanger network, where the
I jp, I jp+ 1 { 0,1}, MSAs behave as the rich streams. In this
Bijp C {0, 1}, case, the CID is extended to include the
where equivalent composition scales of the regen-
erating agents. The inlet and outlet compo-
• Iijp denotes that streams i and j exchange
sitions of the lean streams in the main MEN
mass at the interval directly above pinch
are in general variables.
point p,
• Reactive mass exchange networks [6], [11],
• Iijp+l denotes that streams i and j exchange
[14]
mass at the interval directly below pinch
Rich-to-lean mass transfer may involve in-
point p,
terphase mass transfer and chemical reaction
• B i j p denotes the existence of an exchanger in the lean phase, at constant temperature.
between streams i and j, across the pinch p. Mass exchange operations of this kind are
Then, the number of required units to minimize considered deriving the equilibrium relations
is given by" based on chemical equilibrium.

z/zm i6Rm j6Sm P


The main advantage of the sequential synthesis
method for mass exchange networks is that sim-
ple optimization models are solved. However, un-
Note, that Iijp-variables can be relaxed to continu- less the MSA cost is dominant, as synthesis de-
ous, due to total unimodularity of the model with cisions are fixed from one step to the next, im-
respect to these variables: portant trade-offs between operating and capital
cost are not exploited and overall cost optimality
0 ~ Zijp, Iijp+l ~_ 1
cannot be guaranteed. Furthermore, the minimum
Problem (TP2) may not have a unique solu- composition difference, e that defines the mass re-
tion. Alternative combinations of mass exchange covery levels in (TP1) and (TP2), is in general,

an optimization variable for each mass exchanger MHEN is independent of such a stream decompo-
separately. In the sequential synthesis method this sition, see Fig. 3.
is fixed arbitrarily to a possibly conservative value
for the construction of the CID. E1-Halwagi and V. (Ri I , ySi ,T si ) ( R i l ' Y i t 'Tit )
Manousiouthakis [4] suggested a two-level optimi-
hot ~ rich / cold
zation procedure to select a unique e for all mass stream ~
stream t
stream
exchange operations, based on the impact of e on (Ri I ,ySi 'TI ) ~ ( R i l ' y i 'T1 )
the final MEN cost, still, not exploiting the overall
Fig. 4: Rich s u b s t r e a m with T2 _< Tl _< T[.
cost trade-offs.
When isothermal mass exchange operations Although the mass exchange temperatures
take place at different temperature levels, the op- ( T 1 , . . . , T N ) are variables, their relative position
erating and overall mass integration costs are af- with respect to inlet and outlet stream tempera-
fected by the heating and cooling requirements tures (greater or less) can be prepostulated. Thus,
of the system. Energy integration between the the rich and lean substreams define hot (or cold)
rich and lean streams can be considered within a streams before their mass exchange operations and
mass and heat exchanger network synthesis prob- cold (or hot) streams afterwards, cf. Fig. 4.
lem (MHEN) to reduce the total cost. The overall A CID is constructed, similarly to the simple
problem is addressed combining MEN and HEN MEN case, involving the several substreams with
synthesis tools. The optimal temperature of mass variable flows, and thus, variable mass loads in
exchange is defined for each pair of rich and lean each composition interval. Mass exchange is per-
streams by the equilibrium relations that limit mitted between substreams of the same tempera-
mass transfer ture. A temperature interval diagram, TID, is also
constructed, involving the hot and cold substreams
Yi >_Kij(T)xj, and the available heating and cooling utilities, with
variable heat loads per interval, due to the vari-
where Kij(T) is a known function of temperature. able substream flows. In order to avoid discrete
T s ,x s T1 ,x s decisions (i.e. presence or not of streams in tem-
_ ~ ~- Tk ,x s [ perature intervals with variable limits), the tem-
= TN,Xs ~ ""1 "'" 1 perature range for each mass transfer operation is
Heat Mass i1
discretized and a substream is associated with each
Exchange Exchange
Network Network candidate temperature [13].
Tt,x t TN'Xt ! [ The minimum utility cost is found from the so-
= = Tk x i.... lution of the combined LP transshipment model,
Tl ,x t which, for a single component is as follows:

Fig. 3.
min
In the sequential synthesis framework, the over-
all minimum operating cost for the network (cost
of mass separating agents and heating/cooling
(TP3) + E E chQHUhn
nETI hE HUn
utilities) may be calculated from a combined mass
and heat transshipment model. Each stream is
considered to consist of substreams, of the same in- nETI cECUn
let and outlet composition and temperature, each
such that
of which participates to isothermal mass exchange
operations at a different temperatures. Srinivas
and E1-Halwagi proved [13], that, for monotonic WRy',
dependence of the equilibrium constant on tem- jeSl}eSSjk

perature, the overall utility cost of the combined k c CI, i c R, l~ c RSik,

• T I is the set of temperature intervals n,


iE R l~ E R S i k • R S i k is the set of substreams of rich stream
k E C I , j E S, lj E S S j k , i, of variable flow, Gl,, such that

Otsn - Ot~n-1 E Gli -- Gi,


l
s'ERUS l'
8,6CSs'. present in interval k, or above,
• S S j k is the set of substreams of lean stream
+ ~ ec~.- es~,
c6CU. j, of variable flow, Ltd., such that
n E T I , s E R U S, Is E H S s n ,
E Ltj - Lj,
oL -OL_x l
present in interval k,
+ Z E QH,.o- Q v o,
sERUS lsECSsn • H S s n is the set of hot substreams of stream
n E T I , h E HUn, s, present in interval n, or above,
• CSsn is the set of cold substreams of stream
s, present in interval n,
s'6RUS l'
81 6 H S s,n
• H U n is the set of hot utilities, present in in-
+ y~ QHh~. - Q & ~ ,
terval n, or above,
h6HU.
• CUn is the set of cold utilities, present in in-
n E T I , s E R U S, ls E CSsn,
terval n,
E E Qc,. o- Qc o, • W Rtk~ is the mass exchange load of substream
s6RUS IsGHSsn
li, in interval k,
n E T I , c E CUn,
6lik ~ O, k E C I , i E R, li E RSik, WRtk ' = Gt, (Yk -- max(yk+l, yispt)),
lj
Ot, n >_ O, n E T I , s E R U S, ls E H S s n , • W S k is the mass exchange load of substream
Ohhn > O, n E T I , h E HUn, lj, in interval k,

M~,~;.k > O, W S klj - L t j ( m i n ( x j t, x j k ) - Xjk+l),


k E C I , i E R, j E S, • 5t, k is the residual mass load of substream li
I
1~ E RS~k, l j e SSjk, in interval k,
M l , l~k -- O, • Mt, t}k is the mass exchange load between li
and lj , i n k
k E C I , i E R, j E S,
I • F M is the set of mass exchanging substreams
li E R S i k , lj E S S j k ,
that are at different temperatures,
(lil~) E F M ,
• QSt~n is the heat load of substream ls in in-
51iO = 51iNc I = O, terval n,
i E R, li E R S i , • Qt~t',n is the heat exchange load between ls
Olso = OlsNT I -- O, and l~s, in interval n,
s E R U S, I~ E HS~, • Olsn is the residual heat load of hot substream
Oho -- OhNTI
h l s in interval n,
~ O~
h E HU, • Q H U h n is the heat load of hot utility h in
interval n,
where
* QCUcn is the heat load of cold utility c in
• C I is the set of composition intervals k, interval n,

• $QH_{h l_s n}$ is the heat exchange load between hot utility $h$ and $l_s$ in interval $n$;
• $O_{hn}$ is the residual heat load of hot utility $h$ in interval $n$;
• $QC_{l_s c n}$ is the heat exchange load between $l_s$ and cold utility $c$ in interval $n$.

Problem (TP3) results in the minimum utility cost and the corresponding flows of separating agents and heating/cooling utility streams, the optimal decomposition of each stream into substreams of fixed mass exchange temperature, and the mass and heat exchange pinch points and corresponding subnetworks.
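To see the transshipment mechanics in isolation, the following is a deliberately reduced sketch, not the combined (TP3) model: it assembles a single heat cascade with one hot and one cold utility and solves it with scipy.optimize.linprog. The interval surpluses, utility costs and variable layout are hypothetical.

```python
# Minimal heat-cascade (transshipment-type) LP solved with scipy.optimize.linprog.
# Interval heat surpluses and costs are made-up numbers, not data from the article.
import numpy as np
from scipy.optimize import linprog

surplus = np.array([30.0, -80.0, 40.0, -20.0])   # net heat surplus per temperature interval (kW)
c_hu, c_cu = 80.0, 10.0                          # unit costs of hot / cold utility

n = len(surplus)
num_res = n - 1
# Variables: [QHU, QCU, R_1, ..., R_{n-1}], where R_i is the residual cascaded
# out of interval i (R_0 = R_n = 0).
c = np.concatenate(([c_hu, c_cu], np.zeros(num_res)))

# Energy balance of interval i:  R_i - R_{i-1} - QHU*[i==0] + QCU*[i==n-1] = surplus[i]
A_eq = np.zeros((n, 2 + num_res))
b_eq = surplus.copy()
for i in range(n):
    if i == 0:
        A_eq[i, 0] = -1.0          # hot utility added to the top interval
    if i == n - 1:
        A_eq[i, 1] = 1.0           # cold utility removes heat from the bottom interval
    if i < num_res:
        A_eq[i, 2 + i] = 1.0       # residual leaving interval i
    if i > 0:
        A_eq[i, 2 + i - 1] = -1.0  # residual received from interval i-1

res = linprog(c, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (2 + num_res), method="highs")
print("minimum utility cost:", res.fun)
print("QHU, QCU:", res.x[0], res.x[1])
print("residuals:", res.x[2:])
```

In (TP3) the same cascade structure is repeated for every hot/cold substream and coupled to the composition-interval mass balances through the variable substream flows, which is what keeps the combined model linear.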
The minimum operating cost of the combined MHEN can alternatively be found by applying the first and second thermodynamic laws (property in (2)) on the composition and temperature interval diagrams [13].

The minimum number of mass and heat exchangers is determined in a second step through the expanded MILP transshipment model, separately in each mass and heat exchanger subnetwork. The final network configurations and unit sizes are determined in a final step, applying heuristic rules or superstructure models.

Additional disadvantages of the sequential MHEN synthesis method, compared to the synthesis of simple MEN, are that:

i) the mass and heat exchange networks are assumed separable; and

ii) the intermediate mass exchange temperatures are decided in the first step; this forbids full exploitation of the mass/heat integration trade-offs, as the capital cost implications of such decisions are not accounted for.

Modeling concepts from the sequential mass and heat exchanger network synthesis methods, employing LP and MILP optimization models, have been extended to explore distillation networks [1], pervaporation systems [12] and other energy-requiring separation networks [7], [2].

See also: Global optimization of heat exchanger networks; Mixed integer linear programming: Heat exchanger network synthesis; Chemical process planning; Mixed integer nonlinear programming; Generalized Benders decomposition; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane algorithm; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with aBB; MINLP: Heat exchanger network synthesis; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems.

References
[1] BAGAJEWICZ, M.J., AND MANOUSIOUTHAKIS, V.: 'Mass/heat exchange network representation of distillation networks', AIChE J. 38 (1992), 1769-1800.
[2] EL-HALWAGI, M.M., HAMAD, A.A., AND GARRISON, G.W.: 'Synthesis of waste interception and allocation networks', AIChE J. 42 (1996), 3087-3101.
[3] EL-HALWAGI, M.M., AND MANOUSIOUTHAKIS, V.: 'Synthesis of mass exchange networks', AIChE J. 35 (1989), 1233-1243.
[4] EL-HALWAGI, M.M., AND MANOUSIOUTHAKIS, V.: 'Automatic synthesis of mass exchange networks with single component targets', Chem. Engin. Sci. 45 (1990), 2813-2831.
[5] EL-HALWAGI, M.M., AND MANOUSIOUTHAKIS, V.: 'Simultaneous synthesis of mass-exchange and regeneration networks', AIChE J. 36 (1990), 1209-1219.
[6] EL-HALWAGI, M.M., AND SRINIVAS, B.K.: 'Synthesis of reactive mass exchange networks', Chem. Engin. Sci. 47 (1992), 2113-2119.
[7] EL-HALWAGI, M.M., SRINIVAS, B.K., AND DUNN, R.F.: 'Synthesis of optimal heat-induced separation networks', Chem. Engin. Sci. 50 (1995), 81-97.
[8] GUPTA, A., AND MANOUSIOUTHAKIS, V.: 'Minimum utility cost of mass exchange networks with variable single component supplies and targets', Industr. Engin. Chem. Res. 32 (1993), 1937-1950.
[9] GUPTA, A., AND MANOUSIOUTHAKIS, V.: 'Variable target mass-exchange network synthesis through linear programming', AIChE J. 42 (1996), 1326-1340.
[10] KIPERSTOK, A., AND SHARRAT, P.N.: 'On the optimization of mass exchange networks for removal of pollutants', Chem. Engin. Res. Des. 73 (1995), 271-277.
[11] PAPALEXANDRI, K.P., PISTIKOPOULOS, E.N., AND FLOUDAS, C.A.: 'Mass exchange networks for waste minimization: A simultaneous approach', Chem. Engin. Res. Des. 72 (1994), 279-294.
[12] SRINIVAS, B.K., AND EL-HALWAGI, M.M.: 'Optimal design of pervaporation systems for waste reduction', Computers Chem. Engin. 17 (1993), 957-970.
[13] SRINIVAS, B.K., AND EL-HALWAGI, M.M.: 'Synthesis of combined heat and reactive mass exchange networks', Chem. Engin. Sci. 49 (1994), 2059-2074.
[14] SRINIVAS, B.K., AND EL-HALWAGI, M.M.: 'Synthesis of reactive mass exchange networks with general nonlinear equilibrium relations', AIChE J. 40 (1994), 463-472.

Katerina P. Papalexandri
bp Upstream Technol.
U.K.
E-mail address: papalexk@bp.com
MSC2000: 93A30, 93B50
Key words and phrases: MILP, mass and heat exchange, separation.
MIXED INTEGER NONLINEAR PROGRAMMING, MINLP

A wide range of nonlinear optimization problems involve integer or discrete variables in addition to the continuous variables. These classes of optimization problems arise from a variety of applications and are denoted as mixed integer nonlinear programming (MINLP) problems.

The integer variables can be used to model, for instance, sequences of events, alternative candidates, existence or non-existence of units (in their zero-one representation), while discrete variables can model, for instance, different equipment sizes. The continuous variables are used to model the input-output and interaction relationships among individual units/operations and different interconnected systems.

The nonlinear nature of these mixed integer optimization problems may arise from:

i) nonlinear relations in the integer domain exclusively (e.g., products of binary variables in the quadratic assignment model);

ii) nonlinear relations in the continuous domain only (e.g., a complex nonlinear input-output model in a distillation column or reactor unit);

iii) nonlinear relations in the joint integer-continuous domain (e.g., products of continuous and binary variables in the scheduling/planning of batch processes and retrofit of heat recovery systems).

The book [88] studies mixed integer linear optimization and combinatorial optimization, while [40] studies mixed integer nonlinear optimization problems.

The coupling of the integer domain and the continuous domain with their associated nonlinearities makes the class of MINLP problems very challenging from the theoretical, algorithmic, and computational point of view. Mixed integer nonlinear optimization problems are encountered in a variety of applications in all branches of engineering and applied science, applied mathematics, and operations research. These represent very important and active research areas that include:

• process synthesis
  - heat exchanger networks
  - retrofit of heat recovery systems
  - distillation sequencing
  - mass exchange networks
  - reactor-based systems
  - reactor-separator-recycle systems
  - utility systems
  - total process systems
  - metabolic engineering
• process design
  - reactive distillation
  - design of dynamic systems
  - plant layout
  - environmental design
• process synthesis and design under uncertainty
  - uncertainty analysis
  - dynamic systems
  - batch plant design
• molecular design
  - solvent selection
  - design of polymers and refrigerants
  - property prediction under uncertainty
• interaction of design, synthesis and control
  - steady state operation
  - dynamic operation
• process operations
  - scheduling of multiproduct plants

-design and retrofit of multiproduct ii) design of dynamic systems under uncertainty
plants [31], [85]; and
- synthesis, design and scheduling of mul- iii) design of batch processes under uncertainty
tipurpose plants [109], [63], [57], [108].
- planning under uncertainty
In the area of molecular design, the MINLP ap-
• facility location and allocation
plications include"
• facility planning and scheduling
i) the computer-aided molecular design aspects
• topology of transportation networks of selecting the best solvents [91];
The applications in the area of process synthesis ii) design of polymers and refrigerants [80], [22],
in chemical engineering include: [23], [35], [111], [21], [126]; and
i) the synthesis of grassroot heat recovery net- iii) property prediction under uncertainty [81].
works [43], [25], [24], [139], [138], [140];
The MINLP applications in the area of interac-
ii) the retrofit of heat exchanger systems [25], tion of design, synthesis and control include:
[95];
i) studies under steady state operation of chem-
iii) the synthesis of distillation-based separation ical processes [78], [79], [96], [97]; and
systems [102], [131], [8], [9], [104], [90];
ii) studies under dynamic operation [85], [86],
iv) the synthesis of mass exchange networks [54], [11s].
[99];
Applications of MINLP approaches have also
v) the synthesis of complex reactor networks
emerged in the area of process operations and in-
[71], [73], [74], [119];
clude:
vi) the synthesis of reactor-separator-recycle sys-
i) short term scheduling of batch and semicon-
tems [72];
tinuous processes [143], [84];
vii) the synthesis of utility systems [65];
ii) the design of multiproduct plants [53], [17],
viii) the synthesis of total process systems [68], [lS];
[69], [75], [28], [29], [98], [76]; and
iii) the synthesis, design and scheduling of mul-
ix) the analysis and synthesis of metabolic path- tipurpose plants [127], [128], [36], [93], [94],
ways [30], [58], [59], [107]. [132], [133], [116], [37], [13], [137]; and
Reviews of the mixed integer nonlinear optimiza- iv) planning under uncertainty [64], [62], [63],
tion frameworks and applications in Process Syn- [77], [106].
thesis are provided in [49], [40], [50], and [7], while
algorithmic advances for logic and global optimi- Reviews of the advances in the design, scheduling
zation in Process Synthesis are reviewed in [44]. and planning of batch plants can be found in [113],
[52], while a collection of recent contributions can
The MINLP applications in the area of process
be found in the proceedings of the 1998 FOCAPO
design include:
meeting.
i) reactive distillation processes [26]; MINLP applications received significant atten-
ii) design of dynamic systems [14], [11], [117], tion in other engineering disciplines. These include
[118];
i) the facility location in a multi-attribute space
iii) plant layout systems [105], [47]; and [45];
iv) environmentally benign systems [27], [123]. ii) the optimal unit allocation in an electric
The MINLP applications in the area of process power system [16];
synthesis and design under uncertainty include: iii) the facility planning of an electric power gen-
i) deterministic and stochastic uncertainty eration [19], [114];
analysis [51], [1], [33]; iv) the chip layout and compaction [32];


v) the topology optimization of transportation networks [60]; and

vi) the optimal scheduling of thermal generating units [48].

Mathematical Description. The general algebraic MINLP formulation can be stated as:

$$
\begin{aligned}
\min_{x,y}\quad & f(x,y)\\
\text{s.t.}\quad & h(x,y) = 0\\
& g(x,y) \le 0\\
& x \in X \subseteq \mathbb{R}^n\\
& y \in Y \ \text{integer}.
\end{aligned}
\tag{1}
$$

Here $x$ represents a vector of $n$ continuous variables (e.g., flows, pressures, compositions, temperatures, sizes of units), and $y$ is a vector of integer variables (e.g., alternative solvents or materials); $h(x,y) = 0$ denote the $m$ equality constraints (e.g., mass, energy balances, equilibrium relationships); $g(x,y) \le 0$ are the $p$ inequality constraints (e.g., specifications on purity of distillation products, environmental regulations, feasibility constraints in heat recovery systems, logical constraints); and $f(x,y)$ is the objective function (e.g., annualized total cost, profit, thermodynamic criteria).

REMARK 1 The integer variables $y$ with given lower and upper bounds

$$y^L \le y \le y^U$$

can be expressed through 0-1 variables (i.e., binary), denoted as $z$, by the following formula:

$$y = y^L + z_1 + 2 z_2 + 4 z_3 + \cdots + 2^{N-1} z_N,$$

where $N$ is the minimum number of 0-1 variables needed. This minimum number is given by:

$$N = \mathrm{INT}\!\left(\frac{\log (y^U - y^L)}{\log 2}\right) + 1,$$

where the INT function truncates its real argument to an integer value.

Then, formulation (1) can be written in terms of 0-1 variables:

$$
\begin{aligned}
\min_{x,y}\quad & f(x,y)\\
\text{s.t.}\quad & h(x,y) = 0\\
& g(x,y) \le 0\\
& x \in X \subseteq \mathbb{R}^n\\
& y \in \{0,1\}^q,
\end{aligned}
\tag{2}
$$

where $y$ now is a vector of $q$ 0-1 variables (e.g., existence of a process unit ($y_i = 1$) or nonexistence ($y_i = 0$)).
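Remark 1 is easy to exercise numerically. The short sketch below encodes and decodes a bounded integer with the binary expansion above; the helper names are illustrative, and bit_length() is used in place of the logarithm formula only to avoid floating-point edge cases (it returns the same value).

```python
# Sketch of Remark 1: representing a bounded integer y in [yL, yU] with N binary
# variables z_k via y = yL + z_1 + 2 z_2 + ... + 2^(N-1) z_N.

def num_binaries(yL: int, yU: int) -> int:
    # N = INT(log(yU - yL)/log 2) + 1; (yU - yL).bit_length() gives the same N exactly.
    return max(1, (yU - yL).bit_length())

def expand(y: int, yL: int, yU: int) -> list:
    """Binary expansion z_1, ..., z_N of the offset y - yL."""
    N = num_binaries(yL, yU)
    offset = y - yL
    return [(offset >> k) & 1 for k in range(N)]

def recover(z: list, yL: int) -> int:
    return yL + sum(zk * 2 ** k for k, zk in enumerate(z))

yL, yU = 3, 17
for y in range(yL, yU + 1):
    assert recover(expand(y, yL, yU), yL) == y
print(num_binaries(yL, yU), "binaries encode any integer in [3, 17]")
```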
Challenges in MINLP. Dealing with mixed integer nonlinear optimization models of the form (1) or (2) presents two major challenges. These difficulties are associated with the nature of the problem, namely, the combinatorial domain (y-domain) and the continuous domain (x-domain).

As the number of binary variables $y$ in (2) increases, one is faced with a large combinatorial problem, and complexity analysis results characterize MINLP problems as NP-complete [88]. At the same time, due to the nonlinearities, MINLP problems are in general nonconvex, which implies the potential existence of multiple local solutions. The determination of a global solution of nonconvex MINLP problems is also NP-hard, since even the global optimization of constrained nonlinear programming problems can be NP-hard [100], and even quadratic problems with one negative eigenvalue are NP-hard [101]. An excellent book on complexity issues for nonlinear optimization is [129].

Despite the aforementioned discouraging results from complexity analysis, which are worst-case results, significant progress has been achieved in the MINLP area from the theoretical, algorithmic, and computational perspective. As a result, several algorithms have been proposed for convex and nonconvex MINLP models, their convergence properties have been investigated, and a large number of applications now exist that cross the boundaries of several disciplines. In the sequel, we will discuss these developments.

Overview of Local Optimization Approaches for Convex MINLP Models. A representative collection of local MINLP algorithms developed for solving convex MINLP models of


the form (1) or restricted classes of (2) includes The generalized cross decomposition, GCD, si-
the following: multaneously utilizes primal and dual information
by exploiting the advantages of Dantzig-Wolfe and
1) generalized Benders decomposition, GBD,
generalized Benders decomposition.
[46], [103], [42];
An overview of these local MINLP algorithms
2) outer approximation, OA, [34];
and extensive theoretical, algorithmic, and appli-
3) outer approximation with equality relax- cations of GBD, OA, OA/ER, OA/ER/AP, GOA,
ation, OA/ER, [67]; and GCD algorithms can be found in [40].
4) outer approximation with equality relaxation The branch and bound, B B, approaches start
and augmented penalty, OA/ER/AP, [131]; by solving the continuous relaxation of the MINLP
5) generalized outer approximation, GOA, [38]; and subsequently perform an implicit enumeration
where a subset of the 0-1 variables is fixed at each
6) generalized cross decomposition, GCD, [61];
node. The lower bound corresponds to the NLP
7) branch and bound, BB, [15], [55], [92], [20], solution at each node and it is used to expand on
[110], [39]; the node with the lowest lower bound or it is used
8) feasibility approach, FA, [82]; to eliminate nodes if the lower bound exceeds the
9) extended cutting plane, ECP, [135], [134]; current upper bound. If the continuous relaxation,
NLP in most cases with the exception of the algo-
10) logic-based methods, [124], [130].
In the pioneering work [46] on the generalized the MINLP has a 0-1 solution for the y variables,
benders decomposition, GBD, two sequences of then the BB algorithm will terminate at that node.
updated upper (nonincreasing) and lower (nonde- With a similar argument, if a tight NLP relaxation
creasing) bounds are created that converge within results in the first node of the tree, then the num-
e in a finite number of iterations. The upper ber of nodes that would need to be eliminated can
bounds correspond to solving subproblems in the x be low. However, loose NLP relaxations may result
variables by fixing the y variables, while the lower in having a large number of NLP subproblems to
bounds are based on duality theory. be solved. The algorithm terminates when the low-
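To make the branch and bound idea concrete (solve a continuous relaxation, fix binaries at the nodes, prune by bounds), the following is a schematic depth-first sketch on a two-binary toy problem using scipy.optimize.minimize. The model, tolerances and the most-fractional branching rule are chosen here for illustration and are not taken from the cited references.

```python
# Schematic NLP-based branch and bound for a tiny convex MINLP:
#     min (x - 2)^2 + 1.5*y1 + 0.5*y2
#     s.t. x <= 1 + y1 + 2*y2,  0 <= x <= 3,  y1, y2 in {0, 1}.
import numpy as np
from scipy.optimize import minimize

def objective(v):
    x, y1, y2 = v
    return (x - 2.0) ** 2 + 1.5 * y1 + 0.5 * y2

constraints = [{"type": "ineq", "fun": lambda v: 1.0 + v[1] + 2.0 * v[2] - v[0]}]

def solve_relaxation(ybounds):
    """Solve the NLP with the binaries relaxed/fixed to the given [lo, hi] boxes."""
    bounds = [(0.0, 3.0)] + list(ybounds)
    x0 = np.array([1.0] + [0.5 * (lo + hi) for lo, hi in ybounds])
    return minimize(objective, x0, bounds=bounds,
                    constraints=constraints, method="SLSQP")

best_val, best_sol = np.inf, None
stack = [((0.0, 1.0), (0.0, 1.0))]        # each node stores the boxes of (y1, y2)
TOL = 1e-5

while stack:
    node = stack.pop()
    res = solve_relaxation(node)
    if not res.success or res.fun >= best_val - TOL:
        continue                          # infeasible node, or lower bound >= incumbent
    y = res.x[1:]
    frac = np.abs(y - np.round(y))
    if frac.max() <= TOL:                 # integer feasible: update incumbent
        best_val, best_sol = res.fun, res.x.copy()
        continue
    i = int(np.argmax(frac))              # branch on the most fractional binary
    lo_node, hi_node = list(node), list(node)
    lo_node[i], hi_node[i] = (0.0, 0.0), (1.0, 1.0)
    stack.extend([tuple(lo_node), tuple(hi_node)])

print("best objective:", best_val)
print("best solution (x, y1, y2):", best_sol)
```

Replacing the depth-first stack by a best-bound priority queue, or solving an LP instead of an NLP at each node as in [110], changes only the node management in such a sketch.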
The outer approximation, OA, addresses prob- est lower bound is within a prespecified tolerance
lems with nonlinear inequalities, and creates se- of the best upper bound.
quences of upper and lower bounds as the GBD, The feasibility approach, FA, rounds the relaxed
but it has the distinct feature of using primal in- NLP solution to an integer solution with the least
formation, that is the solution of the upper bound local degradation by successively forcing the su-
problems, so as to linearize the objective and con- perbasic variables to become nonbasic based on
straints around that point. The lower bounds in the reduced cost information. The premise of this
OA are based upon the accumulation of the lin- approach is that the problems to be treated are
earized objective function and constraints, around sufficiently large so that techniques requiring the
the generated primal solution points. solution of several NLP relaxations, such as the
The OA/ER algorithm extends the OA to han- branch and bound approach, have prohibitively
dle nonlinear equality constraints by relaxing them large costs. They therefore wish to account for the
into inequalities according to the sign of their as- presence of the integer variables in the formulation
sociated multipliers. and solve the mixed integer problem directly. This
The O A / E R / A P algorithm introduces an aug- is achieved by fixing most of the integer variables
mented penalty function in the lower bound sub- to one of their bounds (the nonbasic variables)
problems of the O A / E R approach. and allowing the remaining small subset (the basic
The generalized outer approximation, GOA, ex- variables) to take discrete values in order to iden-
tends the OA to the MINLP problems that the tify feasible solutions. After each iteration, the re-
GBD addresses and introduces exact penalty func- duced costs of the variables in the nonbasic set are
tions. computed to measure their effect on the objective


function. If a change causes the objective function [130] introduced LOGMIP, a computer code for
to decrease, the appropriate variables are removed disjunctive programming and MINLP problems,
from the nonbasic set and allowed to vary for the and studied modeling alternatives and process syn-
next iteration. When no more improvement in the thesis applications.
objective function is possible, the algorithm is ter-
minated. This strategy leads to the identification O v e r v i e w of G l o b a l O p t i m i z a t i o n Ap-
of a local solution. p r o a c h e s for N o n c o n v e x M I N L P M o d e l s . In
The cutting plane algorithm proposed in [66] the previous Section we discussed local MINLP al-
for NLP problems has been extended to MINLPs gorithms which are applicable to convex MINLP
[135], [134]. The ECP algorithm relies on the lin- models. While identification of the global solution
earization of one of the nonlinear constraints at for convex problems can be guaranteed, a local so-
each iteration and the solution of the increasingly lution is often obtained for nonconvex problems.
tight MILP made up of these linearizations. The The recent book by [41] discusses the theoretical,
solution of the MILP problem provides a new point algorithmic and applications oriented advances in
on which to base the choice of the constraint to be the global optimization of mixed integer nonlin-
linearized for the next iteration of the algorithm. ear models. A number of global MINLP algorithms
The ECP does not require the solution of any NLP that have been developed to address different types
problems for the generation of an upper bound. of nonconvex MINLPs are presented in this sec-
As a result, a large number of linearizations are tion. These include:
required for the approximation of highly nonlinear
problems and the algorithm does not perform well 1) Branch and reduce approach, [115];
in such cases. Due to the use of linearizations, con- 2) interval analysis based approach, [125];
vergence to the global optimum solution is guar-
3) extended cutting plane approach, [135], [136];
anteed only for problems involving inequality con-
straints which are convex in the x and relaxed y- 4) re]ormulation/spatial branch and bound ap-
space. proach, [121], [122];
An alternative to the direct solution of the 5) hybrid branch and bound and outer approxi-
MINLP problem was proposed by [124]. Their ap- mation approach, [142], [141];
proach stems from the work of [70] on a model-
6) The SMIN-aBB approach, [2], [4];
ing/decomposition strategy which avoids the zero-
flows generated by the nonexistence of a unit in a 7) The GMIN-aBB approach, [2], [4].
process network. The first stage of the algorithm
In the sequel, we will briefly discuss the ap-
is the reformulation of the MINLP into a gener-
proaches 1)-7).
alized disjunctive program. A vector of Boolean
variables indicate the status of a disjunction (True Branch and Reduce algorithm. [115] extended the
or False) and are associated with the alternatives. scope of branch and bound algorithms to prob-
The set of disjunctions allows the representation lems for which valid convex underestimating NLPs
of several alternatives. A set of logical relation- can be constructed for the nonconvex relaxations.
ships between the Boolean variables is introduced. The range of application of the proposed algo-
Instead of resorting to binary variables within a rithm encompasses bilinear problems and separa-
single model, the disjunctions are used to gener- ble problems involving functions for which convex
ate a different model for each alternative. Since all underestimators can be built [83], [10]. Because the
continuous variables associated with the nonexist- nonconvex NLPs must be underestimated at each
ing alternatives are set to zero, this representa- node, convergence can only be achieved if the con-
tion helps to reduce the size of the problems to tinuous variables are branched on. A number of
be solved. Two algorithms are suggested by [124]. tests are suggested to accelerate the reduction of
They are logic-based variants of the outer approx- the solution space. They are summarized in the
imation and generalized Benders decomposition. following.


Optimality Based Range Reduction Tests. For the are not obtained through optimization. Instead,
first set of tests, an upper bound U on the non- they are based on the range of the objective func-
convex MINLP must be computed and a convex tion in the domain under consideration, as com-
lower bounding NLP must be solved to obtain a puted with interval arithmetic. As a consequence,
lower bound L. If a bound constraint for variable these bounds may be quite loose and efficient fath-
x/, with x L < x/ ~ x U, is active at the solution oming techniques are required in order to enhance
of the convex NLP and has multiplier )~ > 0, the convergence. [125] suggested node fathoming tests
bounds on x / c a n be updated as follows: and branching strategies which are outlined in the
sequel. Convergence is declared when best upper
and lower bounds are within a prespecified tolerance and when the width of the corresponding region is below a prespecified tolerance.

1) If $x_i - x_i^U = 0$ at the solution of the convex NLP and $\kappa_i = x_i^U - (U - L)/\lambda_i$ is such that $\kappa_i > x_i^L$, then $x_i^L = \kappa_i$.

2) If $x_i - x_i^L = 0$ at the solution of the convex NLP and $\kappa_i = x_i^L + (U - L)/\lambda_i$ is such that $\kappa_i < x_i^U$, then $x_i^U = \kappa_i$.

Node Fathoming Tests. The upper-bound test is a
schemes: If the lower bound for a node is greater
If neither bound constraint is active at the so-
than the best upper bound for the MINLP, the
lution of the convex NLP for some variable xj,
node can be fathomed.
the problem can be solved by setting xj - x v or
xj - x jL. Tests similar to those presented above The in feasibility test is also used by all branch
are then used to update the bounds on xj. and bound algorithms. However, the identifica-
tion of infeasibility using interval arithmetic differs
Feasibility Based Range Reduction Tests. In addi- from its identification using optimization schemes.
tion to ensuring that tight bounds are available An inequality constraint g / ( x , y ) < 0 is declared
for the variables, the underestimators of the con- infeasible if its interval inclusion over the current
straints are used to generate new constraints for domain, is positive. If a constraint is found to be
the problem. Consider the constraint gi(x, y) < 0. infeasible, the current node is fathomed.
If its underestimating function gi(x, y) - 0 at the The monotonicity test is used in interval-based
solution of the convex NLP and its multiplier is approaches. If a region is feasible, the monotonicity
#~ > 0, the constraint properties of the objective function can be tested.
U-L For this purpose, the inclusions of the gradients of
g (x, y ) >
- #; the objective with respect to each variable are eval-
uated. If all the gradients have a constant sign for
can be included in subsequent problems.
the current region, the objective function is mono-
The branch and reduce algorithm has been tonic and only one point needs to be retained from
tested on a set of small problems. the current node.
Interval Analysis Based Approach. An approach The nonconvexity test is used to test the exis-
based on interval analysis was proposed by [125] tence of a solution (local or global) within a region.
to solve to global optimality problems with a If such a point exists, the Hessian matrix of the
twice-differentiable objective function and once- objective function at this point must be positive
differentiable constraints. Interval arithmetic al- semidefinite. A sufficient condition is the nonneg-
lows the computation of guaranteed ranges for ativity of at least one of the diagonal elements of
these functions [87], [112], [89]. The approach re- its interval Hessian matrix.
lies on the same concepts of successive partition- [125] suggested two additional tests to acceler-
ing of the domain and bounding of the objective ate the fathoming process. The first is denoted
function, while the branching takes place on the as lower bound test. It requires the computation
discrete and continuous variables. The main differ- of a valid lower bound on the objective function
ence with the branch and bound algorithms is that through a method other t h a n interval arithmetic.
bounds on the problem solution in a given domain If the upper bound at a node is less t h a n this lower


bound, the region can be eliminated. The second 3) If constraint i is such that gi(xk,y k) > 0, add
test, the distrust region method, aims to help the its linearization around (x k, yk).
algorithm identify infeasible regions so that they
The convergence criterion is also modified. In
can be removed from consideration. Based on the
addition to the test used in Step 3, the following
knowledge of an infeasible point, interval arith-
two conditions must be met:
metic is used to identify an infeasible hypercube
centered on that point. 1) (x k - x k - 1 ) T ( x k _ x k - 1 ) _~ 5, & pre-specified
tolerance.
Branching Strategies. The variable with the widest
2) yk _ yk-1 _ 0.
range is selected for branching. It can be a contin-
uous or a discrete variable. In order to determine The ECP algorithm for pseudoconvex MINLPs
where to split the chosen variable, a relaxation of has been used to address a trim loss problem aris-
the MINLP is solved locally. ing in the paper industry [136]. A comparative
study between the outer approximation, the gen-
• Continuous Branching Variable: If the opti-
eralized Benders decomposition and the extended
mal value of the continuous branching vari-
cutting plane algorithm for convex MINLPs was
able, x*, is equal to one of the variable
presented in [120].
bounds, branch at the midpoint of the in-
terval. Otherwise, branch at x * - ~ , where Reformulation/Spatial Branch and Bound Algo-
is a very small scalar. rithm. A global optimization algorithm of the
• Discrete Branching Variable" If the optimal branch and bound type was proposed in [121]. It
value of the discrete branching variable, y*, can be applied to problems in which the objective
is equal to the upper bound on the variable, and constraints are functions involving any combi-
define a region with y - y* and one with nation of binary arithmetic operations (addition,
yL _< y < y . _ 1, where yL is the lower subtraction, multiplication and division) and func-
bound on y. Otherwise, create two regions tions that are either concave over the entire solu-
yL <_y < int(y*) and int(y*) + 1 < y < yU, tion space (such as ln) or convex over this domain
where yV is the upper bound on y. (such as exp).
The algorithm starts with an automatic refor-
This algorithm has been tested on a small example
mulation of the original nonlinear problem into a
problem and a molecular design problem [125].
problem that involves only linear, bilinear, linear
Extended Cutting Plane for Pseudoconvex fractional, simple exponentiation, univariate con-
MINLPs. The use of the ECP algorithm for non- cave and univariate convex terms. This is achieved
convex MINLP problems-was suggested in [135], through the introduction of new constraints and
using a modified algorithmic procedure as de- variables. The reformulated problem is then solved
scribed in [136]. The main changes occur in the to global optimality using a branch and bound ap-
generation of new constraints for the MILP at each proach. Its special structure allows the construc-
iteration (Step 4). In addition to the construction tion of a convex relaxation at each node of the tree.
of the linear function lk(x, y) at iteration k, the It should be noted that due to the introduction
following steps are taken: of many new constraints and variables the size of
the convex relaxation of the reformulated problem
1) Remove all constraints for which li(x k, yk) >
increases substantially even for modest size prob-
gji( xk,yk)" These correspond to lineariza-
lems. The integer variables can be handled in two
tions which did not underestimate the cor-
ways during the generation of the convex lower
responding nonlinear constraint at all points
bounding problem. The integrality condition on
due to the presence of nonconvexities.
the variables can be relaxed to yield a convex NLP
2) Replace all constraints for w h i c h li(x k, y k ) _ which can then be solved globally. Alternatively,
gji (xk, yk) _ 0 by their linearization around the integer variables can be treated directly and
(xk, yk). the convex lower bounding MINLP can be solved


using a branch and bound algorithm. This second approach is more computationally intensive but is likely to result in tighter lower bounds on the global optimum solution.

In order to obtain an upper bound on the optimum solution, a local MINLP algorithm can be used. Alternatively, the MINLP can be transformed to an equivalent nonconvex NLP by relaxing the integer variables. For example, a variable $y \in \{0,1\}$ can be replaced by a continuous variable $z \in [0,1]$ by including the constraint $z - z \cdot z = 0$.

This algorithm has been applied to reactor selection, distillation column design, nuclear waste blending, heat exchanger network design and multilevel pump configuration problems.

continuous variables and/or appear in at most bilinear terms, while nonlinear terms in the continuous variables appear separably from the binary/integer variables. These mathematical models become:

$$
\begin{aligned}
\min_{x,y}\quad & f(x) + x^T A_0\, y + c_0^T y\\
\text{s.t.}\quad & h(x) + x^T A_1\, y + c_1^T y = 0\\
& g(x) + x^T A_2\, y + c_2^T y \le 0\\
& x \in X \subseteq \mathbb{R}^n\\
& y \in Y \ \text{integer},
\end{aligned}
\tag{3}
$$

where $c_0$, $c_1$ and $c_2$ are constant vectors, $A_0$, $A_1$ and $A_2$ are constant matrices and $f(x)$, $h(x)$ and $g(x)$ are functions with continuous second order derivatives.
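The integer-relaxation device mentioned above, replacing a binary y by a continuous z in [0,1] together with z - z*z = 0, can be checked on a toy problem. The sketch below uses SLSQP purely for illustration; the objective and starting points are arbitrary, and the only claim is that any feasible point it returns has z at (approximately) a binary value.

```python
# Toy check: the equality z - z^2 = 0 restricts a relaxed binary to {0, 1}.
import numpy as np
from scipy.optimize import minimize

def solve(z0):
    res = minimize(lambda v: (v[0] - 0.3) ** 2,      # objective pulls z toward 0.3
                   x0=np.array([z0]),
                   bounds=[(0.0, 1.0)],
                   constraints=[{"type": "eq", "fun": lambda v: v[0] - v[0] ** 2}],
                   method="SLSQP")
    return res.x[0], res.fun, res.success

for z0 in (0.2, 0.8):
    z, obj, ok = solve(z0)
    print(f"start {z0}: z = {z:.6f}, objective = {obj:.4f}, converged = {ok}")
# Each run terminates at z close to 0 or 1, since those are the only feasible values.
```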
The theoretical, algorithmic and computational
Hybrid Branch and Bound and Outer Approxima- studies of the SMIN-aBB algorithm are presented
tion. [142] proposed a global optimization MINLP in detail in [41].
approach for the synthesis of heat exchanger net-
works without stream splitting. This approach is The GMIN-aBB Algorithm. The GMIN-aBB
a hybrid branch and bound with outer approxima- global optimization algorithm proposed in [2] op-
tion. It is based on two alternative convex underes- erates within a branch and bound framework. The
timators for the heat transfer area. The first type main difference with the algorithms of [56], [92]
of these convex underestimators along with the and [20] is its ability to identify the global opti-
variable bounds and techniques for the bound con- mum solution of a much larger class of problems
traction are based on a thermodynamic analysis. of the form
The second type is based on a relaxation and trans- rain f(x, y)
x~y
formation so as to employ specific underestimation
schemes. These convex underestimators result in s.t. h(x, y ) = 0
a convex MINLP that is solved using the Outer g(x, y) _~ 0
Approximation approach and which provides valid xEXCR n
lower bounds on the global solution. This approach y E N q,
has been applied to five heat exchanger network
examples that employ the MINLP model of [138] where N is the set of nonnegative integers and the
that contains linear constraints and nonconvex ob- only condition imposed on the functions f ( x , y ) ,
jective function. g(x, y) and h(x, y) is that their continuous relax-
ations possess continuous second order derivatives.
[141] introduced a deterministic branch and con-
This increased applicability results from the use of
tract approach for structured process systems that
the aBB global optimization algorithm for contin-
have univariate concave, bilinear and linear frac-
uous twice-differentiable NLPs [12], [6], [5], [3].
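Since the aBB machinery referenced here rests on convex underestimators of twice-differentiable functions, a small numerical illustration may help: on a box, f(x) + sum_i alpha_i (x_i^L - x_i)(x_i^U - x_i) lies below f and is convex when the alpha_i dominate the negative curvature of f. The univariate test function, the grid-based alpha and the checks below are illustrative only; a rigorous alpha would use an interval bound on the Hessian rather than a sampling grid.

```python
# Numerical illustration of an aBB-type convex underestimator on a box.
import numpy as np

def f(x):
    return np.sin(3.0 * x) + 0.1 * x ** 2          # nonconvex test function on [0, 3]

def f_hess(x):
    return -9.0 * np.sin(3.0 * x) + 0.2            # its second derivative

xL, xU = 0.0, 3.0
grid = np.linspace(xL, xU, 601)

# alpha >= max(0, -0.5 * min f''(x)) over the box makes the underestimator convex;
# here the minimum is estimated on a grid purely for illustration.
alpha = max(0.0, -0.5 * f_hess(grid).min())

def underestimator(x):
    return f(x) + alpha * (xL - x) * (xU - x)

assert np.all(underestimator(grid) <= f(grid) + 1e-12)   # valid underestimate on the box
assert np.all(f_hess(grid) + 2.0 * alpha >= -1e-12)      # L''(x) = f''(x) + 2*alpha >= 0
print("alpha =", alpha, " lower bound over the box =", underestimator(grid).min())
```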
tional terms. They proposed properties of the con-
The theoretical, algorithmic and computational
traction operation and studied their effect on sev-
studies of the GMIN-aBB Algorithm are presented
eral applications.
in detail in [41].
The SMIN-aBB Algorithm. The SMIN-aBB See also: C o m p l e x i t y t h e o r y : Q u a d r a t i c
global optimization algorithm, proposed by [2] is programming; Complexity of d e g e n e r -
designed to solve to global optimality mathemati- acy; C o m p l e x i t y classes in o p t i m i z a -
cal models where the binary/integer variables ap- tion; I n f o r m a t i o n - b a s e d c o m p l e x i t y a n d
pear linearly and hence separably from the con- information-based optimization; Fractional


combinatorial optimization; Complexity of [6] ADJIMAN, C.S., AND FLOUDAS, C.A.: 'Rigorous con-
gradients, Jacobians~ and Hessians; Com- vex underestimators for general twice-differentiable
problems', J. Global Optim. 9 (1996), 23-40.
plexity theory; Computational complex-
[7] ADJIMAN, C.S., SCHWEIGER, C.A., AND FLOUDAS,
ity theory; Parallel computing: Complexity C.A.: 'Mixed-integer nonlinear optimization in pro-
classes; K o l m o g o r o v c o m p l e x i t y ; G l o b a l o p - cess synthesis', in D.-Z. Du AND P.M. PARDA-
t i m i z a t i o n in t h e a n a l y s i s a n d m a n a g e m e n t LOS (eds.): Handbook Combinatorial Optim., Kluwer
of environmental systems; Interval global Acad. Publ., 1998.
[8] AGGARWAL, A., AND FLOUDA'S, C.A.: 'Synthesis of
optimization; Continuous global optimiza-
general distillation sequences- Nonsharp separa-
tion: Applications; Chemical process plan- tions', Computers Chem. Engin. 14, no. 6 (1990), 631.
ning; Mixed integer linear programming: [9] AGGARWAL, A., AND FLOUDAS, C.A.: 'Synthesis
Mass and heat exchanger networks; Gen- of heat integrated nonsharp distillation sequences',
eralized Benders decomposition; MINLP: Computers Chem. Engin. 16 (1992), 89.
Outer approximation algorithm; General- [10] AL-KHAYYAL,F.A.: 'Jointly constrained bilinear pro-
grams and related problems: An overview', Comput.
ized outer approximation; MINLP: Gener-
Math. Appl. 19 (1990), 53.
alized cross decomposition; Extended cut- [11] ALLGOR, R.I., AND BARTON, P.I.: 'Mixed integer
ting plane algorithm; MINLP: Logic-based dynamic optimization', Computers Chem. Engin. 21
methods; MINLP: Branch and bound meth- (1997), $451-$456.
ods; M I N L P : B r a n c h a n d b o u n d g l o b a l o p t i - [12] ANDROULAKIS,I.P., MARANAS, C.D., AND FLOUDAS,
C.A.: 'aBB: A global optimization method for general
mization algorithm; MINLP: Global optimi-
constrained nonconvex problems', J. Global Optim. 7
zation with aBB; MINLP: Heat exchanger
(1995), 337-363.
network synthesis; MINLP: R e a c t i v e dis- [13] BARBOSA-POVOA, A.P., AND MACCHIETTO, S.: 'De-
tillation column synthesis; MINLP: Design tailed design of multipurpose batch plants', Comput-
and scheduling of batch processes; MINLP: ers Chem. Engin. 18, no. 11-12 (1994), 1013-1042.
A p p l i c a t i o n s in t h e i n t e r a c t i o n o f d e s i g n [14] BARTON, P.I., ALLGOR, R.J., AND FEEHERY, W.F.:
'Dynamic optimization in a discontinuous world', I-
a n d c o n t r o l ; M I N L P : A p p l i c a t i o n in f a c i l i t y
EC Res. 37, no. 3 (1998), 966-981.
location-allocation; MINLP: Applications [15] BEALE, E.M.L.: 'Integer programming': The State of
in b l e n d i n g a n d p o o l i n g p r o b l e m s ; M I N L P : the Art in Numerical Analysis, Acad. Press, 1977,
Trim-loss problem. pp. 409-448.
[16] BERTSEKAS, D.L., LOWER, G.S., SANDELL, N.R.,
AND POSBERGH, T.A.: 'Optimal short term schedul-
References ing of large-scale power systems', IEEE Trans. A u-
[1] ACEVEDO, J., AND PISTIKOPOULOS, E.N.: 'A para- tom. Control AC-28 (1983), 1.
metric MINLP algorithm for process synthesis prob- [17] BIREWAR, D.B., AND GROSSMANN, I.E.: 'Incorporat-
lems under uncertainty', I-EC Res. 35, no. 1 (1996), ing scheduling in the optimal design of multiproduct
147-158. batch plants', Computers Chem. Engin. 13 (1989),
[2] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS, 141-161.
C.A.: 'Global optimization of MINLP problems in [18] BIREWAR, D.B., AND GROSSMANN, I.E.: 'Simultane-
process synthesis and design', Computers Chem. En- ous synthesis, sizing and scheduling of multiproduct
gin. Suppl. 21 (1997), $445-$450. batch plants', Computers Chem. Engin. 29, no. 11
[3] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS, (1990), 2242.
C.A.: 'A global optimization method, aBB, for gen- [19] BLOOM, J.A.: 'Solving an electricity generating ca-
eral twice-differentiable N L P s - II. Implementation pacity expansion planning problem by generalized
and computational results', Computers Chem. Engin. benders' decomposition', Oper. Res. 31, no. 5 (1983),
22, no. 9 (1998), 1159-1179. 84.
[4] ADJIMAN, C.S., ANDROuLAKIS, I.P., AND FLOUDAS, [20] BORCHERS, B., AND MITCHELL, J.E.: 'An improved
C.A.: 'Global optimization of mixed-integer nonlinear branch and bound algorithm for mixed integer nonlin-
problems', AIChE J. 46 (2000), 1769-1797. ear programs', Techn. Report Renssellaer Polytechnic
[5] ADJIMAN, C.S., DALLWIG, S., FLOUDAS, C.A., AND Inst., no. 200 (1991).
NEUMAIER, A.: 'A global optimization method, aBB, [21] CAMARDA, K.V., AND MARANAS, C.D.: 'Optimiza-
for general twice-differentiable NLPs- I. Theoretical tion in polymer design using connectivity indices', I-
advances', Computers Chem. Engin. 22, no. 9 (1998), EC Res. 38 (1999), 1884-1892.
1137-1158.


[22] CHURI, N., AND ACHENIE, L.E.K.: 'Novel mathemat- Computers Chem. Engin. 15, no. 12 (1991), 843.
ical programming model for computer aided molecular [38] FLETCHER, R., AND LEYFFER, S.: 'Solving mixed in-
design', Industr. Engin. Chem. Res. 35, no. 10 (1996), teger nonlinear programs by outer approximation',
3788-3794. Math. Program. 66, no. 3 (1994), 327-349.
[23] CHURI, N., AND ACHENIE, L.E.K.: 'The optimal [39] FLETCHER, R., AND LEYFFER, S.: 'Numerical experi-
design of refrigerant mixtures for a two-evaporator ence with lower bounds for MIQP branch and bound',
refrigeration system', Computers Chem. Engin. 21, SIAM J. Optim. 8, no. 2 (1998), 604-616.
no. 13 (1997), 349-354. [40] FLOUDAS, C.A.: Nonlinear and mixed integer optimi-
[24] CmIc, A.R., AND FLOUDAS, C.A.: 'A retrofit ap- zation: Fundamentals and applications, Oxford Univ.
proach of heat exchanger networks', Computers Press, 1995.
Chem. Engin. 13, no. 6 (1989), 703. [41] FLOUDAS, C.A.: Deterministic global optimization:
[25] CIRIC, A.R., AND FLOUDAS, C.A.: 'A mixed-integer Theory, methods and applications, Nonconvex Optim.
nonlinear programming model for retrofitting heat ex- Appl. Kluwer Acad. Publ., 2000.
changer networks', I-EC Res. 29 (1990), 239. [42] FLOUDAS, C.A., AGGARWAL, A., AND CIRIC, A.R.:
[26] CmIc, A.R., AND GU, D.Y.: 'Synthesis of nonequi- 'Global optimum search for nonconvex NLP and
librium reactive distillation processes by MINLP op- MINLP problems', Computers Chem. Engin. 13,
timization', AIChE J. 40, no. 9 (1994), 1479-1487. no. 10 (1989), 1117-1132.
[27] CIRIC, A.R., AND HUCHETTE, S.G.: 'Multiobjective [43] FLOUDAS, C.A., AND CIRIC, A.R.: 'Strategies for
optimization approach to sensitivity analysis waste overcoming uncertainties in heat exchanger network
treatment costs in discrete process synthesis and op- synthesis', Computers Chem. Engin. 13, no. 10
timization problems', I-EC Res. 32, no. 11 (1993), (1989), 1133.
2636-2646. [44] FLOUDAS, C.A., AND GROSSMANN, I.E.: 'Algorith-
[28] DAICHENDT, M.M., AND GROSSMANN, I.E.: 'Prelimi- mic approaches to process synthesis: logic and global
nary screening for the MINLP synthesis of process sys- optimization': FOCAPD'9,{, Vol. 91 of AIChE Syrup.
tems: I. Aggregation techniques', Computers Chem. Ser., 1995, pp. 198-221.
Engin. 18 (1994), 663. [45] GANISH, B., HORSKY, D., AND SRIKANTH, K.: ~An
[29] DAICHENDT,M.M., AND GROSSMANN, I.E.: 'Prelimi- approach to optimal positioning of a new product',
nary screening for the MINLP synthesis of process sys- Managem. Sci. 29 (1983), 1277.
tems: II. Heat exchanger networks', Computers Chem. [46] GEOFFRION, A.M.: 'Generalized Benders decomposi-
Engin. 18 (1994), 679. tion', J. Optim. Th. Appl. 10 (1972), 237-260.
[30] DEAN, J.P., AND DERVAKOS, G.A.: 'Design of [47] GEORGIADIS, M.C., ROTSTEIN, G.E., AND MACCHI-
process-compatible biological agents', Computers ETTO, S.: 'Optimal layout design in multipurpose
Chem. Engin. Suppl. A 20 (1996), $67-$72. batch plants', Industr. Engin. Chem. Res. 36, no. 11
[31] DIMITRIADIS, V.D., AND PISTIKOPOULOS, E.N.: (1997), 4852-4863.
'Flexibility analysis of dynamic systems', I-EC Res. [48] GEROMEL, J.C., AND BELLONI, M.R.: 'Nonlinear
34, no. 12 (1995), 4451-4462. programs with complicating variables: theoretical
[32] DORNEIGH, M.C., AND SAHINIDIS, N.Y.: 'Global analysis and numerical experience', IEEE Trans.
optimization algorithms for chip layout and com- Syst., Man Cybern. S M C - 1 6 (1986), 231.
paction', Engin. Optim. 25, no. 2 (1995), 131-154. [49] GROSSMANN, I.E.: 'MINLP optimization strategies
[33] DUA, V., AND PISTIKOPOULOS, E.N.: 'Optimization and algorithms for process synthesis': Proc. 3rd. Inter-
techniques for process synthesis and material de- nat. Conf. on Foundations of Computer-Aided Process
sign under uncertainty', Chem. Engin. Res. Des. 76, Design, 1990, p. 105.
no. A3 (1998), 408-416. [50] GROSSMANN, I.E.: 'Mixed integer optimization tech-
[34] DURAN, M.A., AND GROSSMANN, I.E.: 'An outer ap- niques for algorithmic process synthesis', in J.L. AN-
proximation algorithm for a class of mixed-integer DERSON (ed.): Advances In Chemical Engineering,
nonlinear programs', Math. Program. 36 (1986), 307. Process Synthesis, Vol. 23, Acad. Press, 1996, pp. 171-
[35] DUVEDI, A., AND ACHENIE,L.E.K.: 'On the design of 246.
environmentally benign refrigerant mixtures: A Math- [51] GROSSMANN, I.E., AND FLOUDAS, C.A.: 'Active con-
ematical Programming Approach', Computers Chem. straint strategy for flexibility analysis in chemical pro-
Engin. 21, no. 8 (1997), 915-923. cesses', Computers Chem. Engin. 11, no. 6 (1987),
[36] FAQIR, N.M., AND KARIMI, I.A.: 'Design of mul- 675.
tipurpose batch plants with multiple production [52] GROSsMANN, I.E., QUESADA, I., RAMON, a . , AND
routes': Proc. FOCAPD '89, Snowmass, Colorado, VOUDOURIS, V.T.: 'Mixed-integer optimization tech-
1990, p. 451. niques for the design and scheduling of batch pro-
[37] FLETCHER, R., HALL, J.A.J., AND JOHNS, W.R.: cesses': Proc. NATO Advanced Study Inst. Batch Pro-
'Flexible retrofit design of multiproduct batch plants', cess Systems Engineering, 1992.


[53] GROSSMANN, I.E., AND SARGENT, R.W.H.: 'Optimal experience with DICOPT solving MINLP problems in
design of multipurpose chemical plants', Industr. En- process systems engineering', Computers Chem. En-
gin. Chem. Process Des. Developm. 18 (1979), 343. gin. 13 (1989), 307.
[54] GUPTA, i . , AND MANOUSIOUTHAKIS, V.: 'Minimum [7o] KocIs, G.R., AND GROSSMANN, I.E.: 'A modelling
utility cost of mass exchange networks with variable and decomposition strategy for the MINLP optimiza-
single component supplies and targets', I-EC Res. 32, tion of process flowsheets', Computers Chem. Engin.
no. 9 (1993), 1937-1950. 13, no. 7 (1989), 797-819.
[55] GUPTA, O.K.: 'Branch and bound experiments in [71] KOKOSSIS, A.C., AND FLOUDAS, C.A.: 'Optimization
nonlinear integer programming', PAD Thesis Purdue of complex reactor networks-I, isothermal operation',
Univ. (1980). Chem. Engin. Sci. 45, no. 3 (1990), 595.
[56] GUPTA, O.K., AND RAVINDRAN, R.: 'Branch and [72] KOKOSSIS, A.C., AND FLOUDAS, C.A.: 'Optimal
bound experiments in convex nonlinear integer pro- synthesis of isothermal reactor-separator-recycle sys-
gramming', Managem. Sci. 31, no. 12 (1985), 1533- tems', Chem. Engin. Sci. 46 (1991), 1361.
1546. [73] KOKOSSIS, A.C., AND FLOUDAS, C.A.: 'Optimization
[57] HARDING, S.T., AND FLOUDAS, C.A.: 'Global opti- of complex reactor networks- II. Nonisothermal op-
mization in multiproduct and multipurpose batch de- eration', Chem. Engin. Sci. 49, no. 7 (1994), 1037.
sign under uncertainty', Industr. Engin. Chem. Res. [74] KOKOSSIS, A.C., AND FLOUDAS, C.A.: 'Stability in
36, no. 5 (1997), 1644-1664. optimal design: Synthesis of complex reactor net-
[ss] HATZIMANIKATIS, V., FLOUDAS, C.A., AND BAILEY, works', AIChE 40, no. 5 (1994), 849-861.
J.E.: 'Analysis and design of metabolic reaction net- [75] KRAVANJA, Z., AND GROSSMANN, I.E.: 'PROSYN-
works via mixed-integer linear optimization', AIChE An MINLP process synthesizer', Computers Chem.
J. 42, no. 5 (1996), 1277-1292. Engin. 14 (1990), 1363.
[59] HATZIMANIKATIS, V., FLOUDAS, C.A., AND BAI- [76] KRAVANJA, Z., AND GROSSMANN, I.E.: 'A compu-
LEY, J.E.: 'Optimization of regulatory architectures tational approach for the modeling/decomposition
in metabolic reaction networks', Biotechnol. and Bio- strategy in the MINLP optimization of process flow-
engin. 52 (1996), 485-500. sheets with implicit models', I-EC Res. 35, no. 6
[6o] HOANG, H.H.: 'Topological optimization of networks: (1996), 2065-2070.
A nonlinear mixed integer model employing gener- [77] LIU, M.L., AND SAHINIDIS, N.Y.: 'Process planning
alized Benders decomposition', IEEE Trans. A utom. in a fuzzy environment', Europ. J. Oper. Res. 100,
Control AC-27 (1982), 164. no. 1 (1997), 142-169.
[61] HOLMBERG, K.: 'On the convergence of the cross de- ITs] LUYBEN, M.L., AND FLOUDAS, C.A.: 'Analyzing the
composition', Math. Program. 47 (1990), 269. interaction of design and control, Part 1: A multiob-
[62] IERAPETRITOU, M.G., AND PISTIKOPOULOS, E.N.: jective framework and application to binary distilla-
'Simultaneous incorporation of flexibility and eco- tion synthesis', Computers Chem. Engin. 18, no. 10
nomic risk in operational planning under uncertainty', (1994), 933-969.
Computers Chem. Engin. 18, no. 3 (1994), 163-189. [79] LUYBEN, M.L., AND FLOUDAS, C.A.: 'Analyzing the
[63] IERAPETRITOU, M.G., AND PISTIKOPOULOS, E.N.: interaction of design and control, Part 2: Reactor-
'Batch plant design and operations under uncer- separator-recycle system', Computers Chem. Engin.
tainty', Industr. Engin. Chem. Res. 35, no. 2 (1996), 18, no. 10 (1994), 971-994.
772-787. [so] MARANAS, C.D.: 'Optimal computer-aided molecular
[64] IERAPETRITOU, M.G., PISTIKOPOULOS, E.N., AND design: A polymer design case study', Industr. Engin.
FLOUDAS, C.A.: 'Operational planning under uncer- Chem. Res. 35, no. 10 (1996), 3403-3414.
tainty', Computers Chem. Engin. 20, no. 12 (1996), [Sl] MARANAS, C.D.: 'Optimal molecular design under
1499-1516. property prediction uncertainty', AIChE J. 43, no. 5
[65] KALITVENTZEFF, B., AND MARECHAL, F.: 'The man- (1997), 1250-1264.
agement of a utility network': Process Systems Engi- Is2] MAWENGKANG, H., AND MURTAGH, B.A.: 'Solving
neering. PSE '88, Sydney, Australia, 1988, p. 223. nonlinear integer programs with large scale optimiza-
[66] KELLEY, J.E.: 'The cutting plane method for solving tion software', Ann. Oper. Res. 5 (1986), 425.
convex programs', J. SIAM 8, no. 4 (1960), 703-712. [s3] MCCORMICK, G.P.: 'Computability of global solu-
[67] KocIs, G.R., AND GROSsMANN, I.E.: 'Relaxation tions to factorable nonconvex programs: Part I - Con-
strategy for the structural optimization of process flow vex underestimating problems', Math. Program. 10
sheets', I-EC Res. 26, no. 9 (1987), 1869. (1976), 147-175.
[6s] KocIs, G.R., AND GROSsMANN, I.E.: 'Global opti- Is4] MOCKUS, L., AND REKLAITIS, G.V.: ' i new global
mization of nonconvex MINLP problems in process optimization algorithm for batch process scheduling',
synthesis', I-EC Res. 27, no. 8 (1988), 1407. in C.A. FLOUDAS AND P.M. PARDALOS (eds.): State
[69] KocIs, G.R., AND GROSsMANN, I.E.: 'Computational of the Art In Global Optimization, Kluwer Acad.


Publ., 1996, pp. 521-538. is NP-Hard', Oper. Res. Left. 7, no. 1 (1988), 33.
[85] MOHIDEEN, M.J., PERKINS, J.D., AND PISTIKOPOU- [101] PARDALOS, P.M., AND VAVASIS, S.A.: 'Quadratic
LOS, E.N.: 'Optimal design of dynamic systems under programming with one negative eigenvalue is NP-
uncertainty', AIChE J. 42, no. 8 (1996), 2251-2272. hard', J. Global Optim. 1 (1991), 15.
[86] MOHIDEEN, M.J., PERKINS, J.D., AND PISTIKOPOU- [102] PAULES IV, G.E., AND FLOUDAS, C.A.: 'Synthesis of
LOS, E.N.: 'Robust stability considerations in optimal flexible distillation sequences for multiperiod opera-
design of dynamic systems under uncertainty', J. Pro- tion', Computers Chem. Engin. 12, no. 4 (1988), 267.
cess Control 7, no. 5 (1997), 371-385. [103] PAULES IV, G.E., AND FLOUDAS, C.A.: 'APROS:
[87] MOORE, R.E.: Methods and applications of interval Algorithmic development methodology for discrete-
analysis, SIAM, 1979. continuous optimization problems', Oper. Res. 37,
[88] NEMHAUSER, G.L., AND WOLSEY, L.A.: Integer and no. 6 (1989), 902.
combinatorial optimization, Interscience Set. Discrete [104] PAULES IV, G.E., AND FLOUDAS, C.A.: 'Stochastic
Math. and Optim. Wiley, 1988. programming in process synthesis: A two-stage model
[89] NEUMAIER, A.: Interval methods .for systems of equa- with MINLP recourse for multiperiod heat-integrated
tions, Encycl. Math. Appl. Cambridge Univ. Press, distillation sequences', Computers Chem. Engin. 16,
1990. no. 3 (1992), 189.
[9o] NOVAK, Z., KRAVANJA, Z., AND GROSSMANN, I.E.: [105] PENTEADO, F.D., AND CIRIC, A.R.: 'An MINLP ap-
'Simultaneous synthesis of distillation sequences in proach for safe process plant layout', I-EC Res. 35,
overall process schemes using an improved MINLP ap- no. 4 (1996), 1354-1361.
proach', Computers Chem. Engin. 20, no. 12 (1996), [106] PETKOV, S.B., AND MARANAS, C.D.: 'Multiperiod
1425-1440. planning and scheduling of multiproduct batch plants
[91] ODELE, O., AND MACCHIETTO, S.: 'Computer aided under demand uncertainty', I-EC Res. 36, no. 11
molecular design: A novel method for optimal solvent (1997), 4864-4881.
selection', Fluid Phase Equilib. 82 (1993), 47. [107] PETKOV, S.B., AND MARANAS, C.D.: 'Quantitative
[92] OSTROVSKY, G.M., OSTROVSKY, M.G., AND assessment of uncertainty in the optimization of meta-
MIKHAILOW, G.W.: 'Discrete optimization of chem- bolic pathways', Biotechnol. and Bioengin. 56, no. 2
ical processes', Computers Chem. Engin. 14 (1990), (1997), 145-161.
111-117. [108] PETKOV, S.B., AND MARANAS, C.D.: 'Design of sin-
[93] PAPAGEORGAKI, S., AND REKLAITIS, G.V.: 'Optimal gle product campaign batch plants under demand un-
design of multipurpose batch plants: 1. Problem for- certainty', AIChE J. 44, no. 4 (1998), 896-911.
mulation', I-EC Res. 29, no. 10 (1990), 2054. [109] PISTIKOPOULOS, E.N., AND IERAPETRITOU, M.G.:
[94] PAPAGEORGAKI, S., AND REKLAITIS, G.V.: 'Optimal 'Novel approach for optimal process design under
design of multipurpose batch plants: 2. A decomposi- uncertainty', Computers Chem. Engin. 19, no. 10
tion solution strategy', I-EC Res. 29, no. 10 (1990), (1995), 1089-1110.
2062. [110] QUESADA, I., AND GROSSMANN, I.E.: 'An LP/NLP
[95] PAPALEXANDRI, K.P., AND PISTIKOPOULOS, E.N.: based branch and bound algorithm for convex MINLP
'An MINLP retrofit approach for improving the flex- optimization problems', Computers Chem. Engin. 16
ibility of heat exchanger networks', Ann. Oper. Res. (1992), 937-947.
42 (1993), 119. [111] RAMAN, V.S., AND MARANAS, C.D.: 'Optimization
[96] PAPALEXANDRI, K.P., AND PISTIKOPOULOS, E.N.: in product design with properties correlated with
'Synthesis and retrofit design of operable heat ex- topological indices', Computers Chem. Engin. 45
changer networks: 1. Flexibility and structural con- (1999), 997-1017.
trollability aspects', I-EC Res. 33 (1994), 1718. [112] RATSCHEK, H., AND ROKNE, J.: Computer methods
[97] PAPALEXANDRI, K.P., AND PISTIKOPOULOS, E.N.: for the range of functions, Ellis Horwood Set. Math.
'Synthesis and retrofit design of operable heat ex- Appl. Halsted Press, 1984.
changer networks: 2. Dynamics and control structure [113] REKLAITIS, G.V.: 'Perspectives on scheduling and
considerations', I-EC Res. 33 (1994), 1738. planning process operations': Proc. ~th. Internat.
[98] PAPALEXANDRI, K.P., AND PISTIKOPOULOS, E.N.: Syrup. on Process Systems Engineering, Montreal,
'Generalized modular representation framework for Canada, 1991.
process synthesis', AIChE J. 42 (1996), 1010. [114] ROUHANI, R., LASDON, L., LEBOW, W., AND WAR-
[99] PAPALExANDRI, K.P., PISTIKOPOULOS, E.N., AND REN, A.D.: 'A generalized Benders decomposition ap-
FLOUDAS, C.A.: 'Mass exchange networks for waste proach to reactive source planning in power systems',
minimization: A simultaneous approach', Chem. En- Math. Program. Stud. 25 (1985), 62.
gin. Res. Developm. 72 (1994), 279. [115] RYOO, H.S., AND SAHINIDIS, N.Y.: 'Global optimi-
[100] PARDALOS, P.M., AND SCHNITGER, G.: 'Checking lo- zation of nonconvex NLPs and MINLPs with applica-
cal optimality in constrained quadratic programming tions in process design', Computers Chem. Engin. 19,

412
Mixed integer nonlinear programming

no. 5 (1995), 551-566. [130] VECCHIETTI, A., AND GROSSMANN, I.E.: 'LOGMIP:
[116] SAHINIDIS, N.Y., AND GROSSMANN, I.E.: 'Conver- A disjunctive 0-1 nonlinear optimizer for process sys-
gence properties of generalized Benders decomposi- tem models', Computers Chem. Engin. 23 (1999),
tion', Computers Chem. Engin. 15, no. 7 (1991), 481. 555-565.
[117] SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'MINOPT: [131] VISWANATHAN, J., AND GROSSMANN, I.E.: 'A com-
A software package for mixed-integer nonlinear opti- bined penalty function and outer-approximation for
mization, user's guide', Manual Computer-Aided Sys- MINLP optimization', Computers Chem. Engin. 14,
tems Lab. Dept. Chemical Engin. Princeton Univ. no. 7 (1990), 769-782.
(1997). [132] WELLONS, M.C., AND REKLAITIS, G.V.: 'Scheduling
[118] SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'Interac- of multipurpose batch chemical plants: 1. Formula-
tion of design and control: optimization with dy- tion of single-product campaigns', I-EC Res. 30, no. 4
namic models', in W.W. HAGER AND P.M. PARDA- (1991), 671.
LOS (eds.): Optimal Control: Theory, Algorithms, and [133] WELLONS, M.C., AND REKLAITIS, G.V.: 'Scheduling
Applications, Kluwer Acad. Publ., 1998, pp. 388-435. of multipurpose batch chemical plants: 1. Multiple
[119] SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'Optimiza- product campaign formulation and production plan-
tion framework for the synthesis of complex reactor ning', I-EC Res. 30, no. 4 (1991), 688.
networks', I-EC Res. 38 (1999), 744-766. [134] WESTERLUND, W., AND PETTERSSON, F.: 'An ex-
[1201 SKRIFVARS, H., HARJUNKOSKI, I., WESTERLUND, T., tended cutting plane method for solving convex
KRAVANJA, Z., AND PORN, R.: 'Comparison of differ- MINLP problems', Computers Chem. Engin. 19
ent MINLP methods applied on certain chemical en- (1995), 131-136.
gineering problems', Computers Chem. Engin. Suppl. [~as] WESTERLUND, T., PETTERSSON, F., AND GROSS-
20 (1996), $333-$33S. MANN, I.E.: 'Optimization of pump configuration
[121] SMITH, E.M.B., AND PANTELIDES, C.C.: 'Global op- problems as a MINLP problem', Computers Chem.
timisation of general process models', in I.E. GROSS- Engin. 18, no. 9 (1994), 845-858.
MANN (ed.): Global Optimization in Engineering De- [136] WESTERLUND, T., SKRIFVARS, H., HARJUNKOSKI, I.,
sign, Kluwer Acad. Publ., 1996, pp. 355-386. AND P(3RN, R.: 'An extended cutting plane method
[122] SMITH, E.M.B., AND PANTELIDES, C.C.: 'A symbolic for a class of non-convex MINLP problems', Comput-
reformulation/spatial branch and bound algorithm for ers Chem. Engin. 22, no. 3 (1998), 357-365.
the global optimization of nonconvex MINLPs', Com- [137] XIA, Q., AND MACCHIETTO, S.: 'Design and synthesis
puters Chem. Engin. 23 (1999), 457-478. of batch plants: MINLP solution based on a stochastic
[123] STEFANIS, S.K., LIVINGSTON, A.G., AND PIS- method', Computers Chem. Engin. 21 (1997), $697-
TIKOPOULOS, E.N.: 'Environmental impact consid- $702.
erations in the optimal design and scheduling of [138] YEE, T.F., AND GROSSMANN, I.E.: 'Simultaneous op-
batch processes', Computers Chem. Engin. 21, no. 10 timization models for heat integration - II. Heat ex-
(1997), 1073-1094. changer network synthesis', Computers Chem. Engin.
[124] TURKAY, M., AND GROSSMANN, I.E.: 'Logic-based 14, no. 10 (1990), 1165-1184.
MINLP algorithms for the optimal synthesis of pro- [139] YEE, T.F., GROSSMANN, I.E., AND KRAVANJA, Z.:
cess networks', Computers Chem. Engin. 20, no. 8 'Simultaneous optimization models for heat integra-
(1996), 959-978. tion - I. Area and energy targeting and modeling of
[1251 VAIDYANATHAN, R., AND EL-HALWAGI, M.: 'Global multi-stream exchangers', Computers Chem. Engin.
optimization of nonconvex MINLP's by interval anal- 14, no. 10 (1990), 1151-1164.
ysis', in I.E. GROSSMANN (ed.): Global Optimization [140] YEE, T.F., GROSSMANN, I.E., AND KRAVANJA, Z.:
in Engineering Design, Kluwer Acad. Publ., 1996, 'Simultaneous optimization models for heat integra-
pp. 175-193. tion -III. Area and energy targeting and modeling of
[126] VAIDYARAMAN, S., AND MARANAS, C.D.: 'Optimal multi-stream exchangers', Computers Chem. Engin.
synthesis of refrigation cycles and selection of refrig- 14, no. 11 (1990), 1185-1200.
erants', AIChE J. 45 (1999), 997-1017. [141] ZAMORA, J.M., AND GROSSMANN, I.E.: 'Continu-
[127] VASELENAK, J., GROSSMANN, I.E., AND WESTER- ous global optimization of structured process systems
BERG, A.W.: 'An embedding formulation for the op- models', Computers Chem. Engin. 22, no. 12 (1998),
timal scheduling and design of multiproduct batch 1749-1770.
plants', I-EC Res. 26, no. 1 (1987), 139. [142] ZAMOHA, J.M., AND GROSsMANN, I.E.: 'A global
[~28] VASELENAK, J., GROSsMANN, I.E., AND WESTER- MINLP optimization algorithm for the synthesis of
BERG, A.W.: 'Optimal retrofit design of multipurpose heat exchanger networks with no stream splits', Com-
batch plants', I-EC Res. 26, no. 4 (1987), 718. puters Chem. Engin. 22, no. 3 (1998), 367-384.
[129] VAVASIS, S.: Nonlinear optimization: Complexity is- [14a] ZHANG, X., AND SARGENT, R.W.H.: 'The opti-
sues, Oxford Univ. Press, 1991. mal operation of mixed production facilities: Gen-

413
Mixed integer nonlinear programming

eral formulation and some solution approaches': Proc. DEFINITION 1 An algorithmic language describes
5th Int. Symp. Process Systems Engineering, 1994, (explicitly or implicitly) the computation of solv-
pp. 171-177.
ing a problem, that is, 'how' a problem can be pro-
Christodoulos A. Floudas cessed using a machine. The computation consists
Dept. Chemical Engin. Princeton Univ. of a sequence of well-defined instructions which
Princeton, NJ 08544-5263, USA can be executed in a finite time by a Turing ma-
E-mail address: floudas©titan, princeton, e d u chine. The information of a problem which is cap-
tured by an algorithmic language is called algo-
MSC2000: 90Cll, 49M37
rithmic knowledge of the problem. [:]
Key words and phrases: decomposition, outer approxima-
tion, branch and bound, global optimization.
Algorithmic knowledge to describe a problem is
very common in our everyday life ~ one only need
to look at cookery-books, or technical maintenance
MODELING LANGUAGES IN OPTIMIZA- manuals ~ that o n e m a y ask whether the human
TION: A NEW PARADIGM brain is 'predisposed' to preferably present a prob-
lem in describing its solution recipe.
In this paper, modeling languages are identified as
a new computer language paradigm and their ap- However, there exists at least one different way
plications for representing optimization problems to capture knowledge about a problem; it is the
is illustrated by examples. method which describes 'what' the problem is by
defining its properties, rather than saying 'how'
Programming languages can be classified into
to solve it. Mathematically, this can be expressed
three paradigms: imperative, functional, and logic
by a set {x e X : R(x)}, where X is a continu-
programming [14]. The imperative programming
ous or discrete state space and R(x) is a Boolean
paradigm is closely related to the physical way of
relation, defining the properties or the constraints
how (the von Neumann) computer works: Given a
of the problem; x is called the variable(s). A no-
set of memory locations, a program is a sequence
tational system that represents a problem in this
of well defined instructions on retrieving, storing
way is called a declarative language.
and transforming the content of these locations.
The functional paradigm of computation is based DEFINITION 2 A declarative language describes
on the evaluation of functions. Every program can the problem as a set using mathematical vari-
be viewed as a function which translates an input ables and constraints defined over a given state
into a unique output. Functions are first-class val- space. This space can be finite or infinite, count-
ues, that is, they must be viewed as values them- able or noncountable. The information of a prob-
selves. The computational model is based on the A- lem which is captured by a declarative language is
calculus invented by A. Church (1936) as a math- called declarative knowledge of the problem. [::]
ematical formalism for expressing the concept of a
computation. The paradigm of logic programming The declarative representation, in general, does
is based on the insight that a computation can be not give any indication on how to solve the prob-
viewed as a kind of (constructive) proof. Hence, lem. It only states what the problem is. Of course,
a program is a notation for writing logical state- there exists a trivial algorithm to solve a declara-
ments together with specified algorithms for im- tively stated problem, which is to enumerate the
plementing inference rules. state space and to check whether a given x E X vi-
All three programming paradigms concentrate olates the constraint R(x). The algorithm breaks
on problem representation as a computation, that down, however, whenever the state space is infi-
is, the problem is stated in a way that describes nite. But even if the state space is finite, it is
the process of solving it. The computation on how for most nontrivial problems ~ so large that a full
to solve a problem 'is' its representation. One may enumeration is practically impossible.
call such a notational system an algorithmic lan- Algorithmic and declarative representations are
guage. two fundamentally different kinds of modeling

414
Modeling languages in optimization: A new paradigm

and representing knowledge. Declarative knowl- which is clearly a declarative statement of the
edge answers the question 'what is?', whereas algo- problem. In Scheme, a functional language, this
rithmic knowledge asks 'how to?' [4]. An algorithm formula can be implemented directly as a function
gives an exact recipe of how to solve a problem. A in the following way:
mathematical model, i.e. its declarative represen- (define (gcd a b)
tation, on the other hand, (only) defines the prob- (if (= b 0) a
lem as a subspace of the state space. No algorithm (gcd b (remainder a b))))
is given to find all or a single element of the feasible Similar formulations can be given for any other
subspace. language which includes recursion as a basic con-
trol structure. This class of problems is surpris-
ingly rich. The whole paradigm of dynamic pro-
W h y D e c l a r a t i v e R e p r e s e n t a t i o n . The ques- gramming can be subsumed under this class.
tion arises, therefore, why to present a problem
A class of problems of a very different kind are
using a declarative way, since one must solve it
linear programs, which can be represented declar-
anyway and, hence, represent as an algorithm?
atively in the following way:
The reasons are, first of all, conciseness, insight,
and documentation. Many problems can be rep- {mincx" Ax > b}
resented declaratively in a very concise way, while
From this f o r m u l a t i o n - in contrast to the class
the representation of their computation is long and
of recursive definitions - - nothing can be deduced
complex. Concise writings favor also the insight
that would be useful in solving the problem. How-
of a problem. Furthermore, in many scientific pa-
ever, there exists well-known methods, for example
pers a problem is stated in a declarative way using
the simplex method, which solves almost all in-
mathematical equations and inequalities for docu-
stances in a very efficient way. Hence, to make the
mentational purposes. This gives a clear statement
declarative formulation of a linear program useful
of the problem and is an efficient way to commu-
for solving it, one only needs to translate it into
nicate it to other scientists. However, documenta-
a form, the simplex algorithm accepts as input.
tion is by no means limited to human beings. One
The translation from the declarative formulation
can imagine declarative languages implemented on
{min cx" Ax >_ b} to such an input-form can be
a computer like algorithmic languages, which are
automated. This concept can be extended to non-
parsed and interpreted by a compiler. In this way,
linear and discrete problems.
an interpretative system can analyse the structure
of a declarative program, can pretty-print it on a
A l g e b r a i c M o d e l i n g L a n g u a g e s . The idea to
printer or a screen, can classify it, or symbolically
state the mathematical problem in a declarative
transform it in order to view it as a diagram or in
way and to translate it into an 'algorithmic' form
another textual form.
by a standard procedure led to a new language
Of course, the most interesting question is
paradigm emerged basically in the community of
whether the declarative way of representing a
operations research at the end of the 1980s, the
problem could be of any help in solving the prob-
algebraic modeling languages (AIMMS [1], AMPL
lem. [7], GAMS [2], LINGO [18], and LPL [12] and oth-
Indeed, for certain classes of problems the com- ers). These languages are becoming increasingly
putation can be obtained directly from a declara- popular even outside the community of operations
tive formulation. This is true for all recursive def- research. Algebraic modeling languages represent
initions. A classical example is the algorithm of a problem in a purely declarative way, although
Euclid to find the greatest common divisor (gcd) most of them include computational facilities to
of two integers. One can proof that manipulate the data as well as certain control
structures.
_ ~gcd(b,a mod b), b> 0 One of their strength is the complete separation
gcd(a, b)
/ a, b - 0, of the problem formulation as a declarative model

415
Modeling languages in optimization: A new paradigm

from finding a solution, which is supposed to be to integrate symbolic model transformation rules
computed by an external program called a solver. into the declarative language in order to generate
This allows the modeler not only to separate the formulations which are more useful for a solver.
two main tasks of model formulation and model AMPL, for example, automatically detects par-
solution, but also to switch easily between several tially separable structure and computes second
solvers. This is an invaluable benefit for many dif- derivatives [8]. This information are also handed
ficult problems, since it is not uncommon that a over to a nonlinear solver. LPL, to cite a very dif-
model instance can be solved using one method, ferent undertaking, has integrated a set of rules to
and another instance is solvable only using another translate symbolically logical constraints into 0-1
method. Another advantage of such languages is to constraint [11]. To do this in an intelligent way is
separate dearly between model structure, which all but easy, because the resulting 0-1 formulation
only contains parameters (place-holder for data) should be as sharp as possible. This translation is
but no data, and model instance, in which the pa- useful for large mathematical models which must
rameters are replaced by a specific data set. This be extended by a few logical conditions. For many
leads to a natural separation between model for- applications the original model becomes straight-
mulation and data gathering stored in databases. forward while the transformed is complicated but
Hence, the main features of these algebraic mod- still relatively easy to solve (examples were given
eling languages are" in [11]). Even if the resulting formulation is not
solvable efficiently, the modeler can gain more in-
• purely declarative representation of the prob-
sights into the structure of the model from such
lem;
a symbolic translation procedure, and eventually
• clear separation between formulation and so-
modify the original formulation.
lution;
• clear separation between model structure and Second Generation Modeling Languages.
model data. Another research activity, actually under way, goes
It is, however, naive to think that one only needs in the direction of extending the algebraic mod-
to formulate a problem in a concise declarative eling languages in order to express also algorith-
form and to link it somehow to a solver in order mic knowledge. This is necessary, because even if
to solve it. First of all, the 'linking process' is not one could link an purely declarative language to
so straightforward as it seems initially. Second, a any solver, it remains doubtful of whether this can
solver may not exist which could solve the problem be done efficiently in all cases. Furthermore, for
at hand in an efficient way. One only needs to look many problems it is not useful to formulate them
at Fermat's last conjecture which can be stated in in a declarative way: the algorithmic way is more
a declarative way as {a,b,c,n E N +" a n + bn = straightforward and easier to understand. For still
c n, a , b , c >_ 1, n > 2} to convince oneself of this other problems a mixture of declarative and algo-
fact. Even worse, one can state a problem declar- rithmic knowledge leads to a superior formulation
atively for which no solver can exist. This is true in terms of understandability as well as in terms
already for the rather limited declarative language of efficiency, (examples are given below to confirm
of first order logic, for which no algorithm exists this findings).
which decides whether a formula is true or false in Therefore, AIMMS integrates control structures
general (see [5]). and procedure definitions. GAMS, AMPL and
In this sense, efforts are under way actually in LPL also allow the modeler to write algorithms
the design of such languages which focus on flex- powerful enough to solves models repeatedly.
ibly linking the declarative formulation to a spe- A theoretical effort was undertaken in [10] to
cific solver to make this paradigm of purely declar- specify a modeling language which allows the mod-
ative formulation more powerful. This language- eler (or the programmer) to combine algorithmic
solver-interface problem has different aspects and and declarative knowledge within the same lan-
research goes in many directions. A main effort is guage framework without intermingle them. The

416
Modeling languages in optimization: A new paradigm

overall syntax structure of a model (or a program) 1) In CLP the algorithmic part ~ normally a
in this framework is as follows: search mechanism ~ is behind the scene and
MODEL ModelName the computation is intrinsically coupled with
<declarative part of the model> the declarative language itself. This could be
BEGIN a strength because the programmer does not
<algorithmic part of the model>
END ModelName. have to be aware of how the computation is
taking place, he or she only writes the rules
Declarative and algorithmic knowledge are clearly
in a descriptive, that is declarative, way and
separated. Either part can be empty, meaning that
triggers the computation by a request. In re-
the problem is represented in a purely declara-
ality, however, it is an important drawback,
tive or in a purely algorithmic form. The declar-
because ~ for most nontrivial problem
ative part consists of the basic building blocks of
the programmer 'must' be aware on how the
declarative knowledge: variables, parameters, con-
computation is taking place. Therefore, to
straints, model checking facilities, and sets (that
guide the computation in CLP, the declar-
is a way to 'multiply' basic building blocks). This
ative program is interspersed with additional
part may also contain 'ordinary declarations' of
rules which have nothing to do with the de-
an algorithmic language (e.g., type and function
scription of the original problem. In a model-
declarations). Furthermore, one can declare whole
ing language, the user either links the declar-
models within this part, leading to nested model
ative part to an external solver or writes the
structures, which is very useful in decomposing
solver within the language. In either case,
a complex problem into smaller parts. The algo-
both parts are strictly separated. Why is this
rithmic part, on the other hand, consists of all
separation so important? Because it allows
control structures which make the language Tur-
the modeler to 'plug in' different solvers with-
ing complete. One may imagine his or her favorite
out touching the overall model formulation.
programming language being implemented in this
part. A language which combines declarative and 2) The second difference is that the model-
algorithmic knowledge in this way is called model- ing language paradigm lead automatically to
ing language. modular design. This is probably to hottest
DEFINITION 3 A modeling language is a nota- topic in software engineering: building com-
tional system which allows one to combine (not to ponents. Software engineering teaches us that
merge) declarative and algorithmic knowledge in a complex structure can be only managed
the same language framework. The content cap- efficiently by break it down into many rel-
tured by such a notation is called a model. [:] atively independent components. The CLP
approach leads more likely to programs that
Such a language framework is very flexible. Purely
are difficult to survey and hard to debug and
declarative models are linked to external solvers to maintain, because such considerations are
to be solved; purely algorithmic models are pro- entirely absent within the CLP paradigm.
grams, that is algorithms + data structures, in the
ordinary sense. 3) On the other hand, the community of
CLP has developed methods to solve spe-
Modeling Language and Constraint Logic cifiC classes of combinatorial problems which
Programming. Merging declarative and algorith- seems to be superior to other methods. This
mic knowledge is not new, although it is not very is because they rely on propagation, simplifi-
common in language design. The only existing lan- cation of constraints, and various consistency
guage paradigm doing it is constraint logic pro- techniques. In this sense, CLP solvers could
gramming (CLP), a refinement of logic program- be used and linked with modeling languages.
ming [13]. There are, however, important differ- Such a project is actually under way between
ences between the CLP paradigm and the para- the AMPL language and the ILOG solver [6],
digm of modeling language as defined above. [17].
417
Modeling languages in optimization: A new paradigm

Hence, while the representation of models is tic, problems with n < 50 were solvable within the
probably best done in the language framework of LPL framework. Using the constraint language OZ
modeling languages, the solution process can taken [19] problems of n _< 200 are efficiently solvable us-
place in a CLP solver for certain problems. ing techniques of propagation and variable domain
reductions. However, the success of all these meth-
M o d e l i n g E x a m p l e s . Five modeling examples ods seems to be limited compared to the best we
are chosen from very different problem domains to can attain. In [20], [21], Sosic Rok and Gu Jun
illustrate the highlights of the presented paradigm presented a polynomial time local heuristic that
of modeling language. The first two examples show can solve problems of n < 3 000 000 in less than
that certain problems are best formulated using al- one minute. The presented algorithm is very sim-
gorithmic knowledge, the next two examples show ple. The conclusion seems to be for the n-queens
the power of a declarative formulation, and a last problem that an algorithmic formulation is advan-
example indicates that mixing both paradigms is tageous.
sometimes more advantageous.
A Two-Person Game. Two players choose at ran-
Sorting. Sorting is a problem which is preferably dom a positive number and note it on a piece of
expressed in an algorithmic way. Declaratively, paper. They then compare them. If both numbers
the problem could be formulated as follows: Find are equal, then neither player gets a payoff. If the
a permutation 7r such that A~ < A~,+~ for all difference between the two numbers is one, then
i E { 1 , . . . , n - 1} where A1,...,n is an array of ob- the player who has chosen the higher number ob-
jects on which an order is defined. It is difficult tains the sum of both; otherwise the player who
to imagine a 'solver' that could solve this problem has chosen the smaller number obtains the sum
as efficiently as the best known sorting algorithms of both. What is the optimal strategy for a player,
such as Quicksort, of which the implementation is i.e. which numbers should be chosen with what fre-
straightforward. quencies to get the maximal payoff? This problem
The reason why the sorting problem is best was presented in [9] and is a typical two-person
formulated as an algorithm is probably that the zero-sum game. In LPL, it can be formulated as
state space is exponential in the number of items, follows"
however, the best algorithm only has complexity MODEL Game 'finite two-person zero-sum game';
SET i ALIAS j : - / 1 : 5 0 / ;
O(nlogn).
PARAMETER p{i, j} := IF(j > i, IF(j = i + 1,
The n-queens problem. The n-queens problem is - i - j , MIN(i,j)), IF(j < i,-p[j,i],O));
VARIABLE x{i};
to place n queens on a chessboard of dimension
CONSTRAINT R: SUM{i} x[i] = 1;
n x n in such a way, that they cannot beat each MAXIMIZE gain: MIN{j} (SUM{i} p[j, i]. x[i]);
other. This problem can be formulated declarative END Game.
as follows" {xi, xj E { 1 , . . . , n } ' x i ~ xj, xi + i This is an very compact way to declaratively for-
xj + j, x i - i ~ xj - j } , where xi is the column mulate the problem and it is difficult to imag-
position of the ith queen (i.e. the queen in row i). ine how this could be achieved using algorithmic
Using the LPL [12] formulation: knowledge alone. It is also an efficient way to state
MODEL nQueens; the problem, because large instances can be solved
PARAMETER n; SET i ALIAS j ::= { 1 , . . . , n } ; by an linear programming solver. LPL automati-
DISTINCT VARIABLE x{i}[1,..., n]; cally transforms it into an linear program. (By the
CONSTRAINT S { i , j : i < j}:
way, the problem has an interesting solution: Each
x[i] + i < > x[j] + j AND x[i] - i < > x[j] - j;
END player should only choose number smaller than
six.)
the author was able to solve problems for n < 8
using a general MIP solver. The problem is auto- Equal Circles in a Square. The problem is to find
matically translated into a 0-1 problem by LPL. the maximum diameter of n equal mutually dis-
Replacing the MIP-solver by a tabu search heuris- joint circles packed inside a unit square.

418
Modeling languages in optimization: A new paradigm

In LPL, this problem can be compactly formu- one could formulate the algorithmic knowledge as
lated as follows" follows"

MODEL circles 'pack equal circles in a square'; SOLVE the small cutting-stock problem
PARAMETER n 'number of circles'; SOLVE the knapsack problem
WHILE a rewarding pattern was found DO
SET i ALIAS j = 1 , . . . , n ;
add pattern to the cutting-stock problem
VARIABLE
SOLVE the cutting-stock problem again
t 'diameter of the circles';
x{i}[0, 1] 'x-position of the center'; SOLVE the knapsack problem again
y{i}[0, 1] 'y-position of the center'; ENDWHILE
CONSTRAINT The two models (the cutting-stock problem and
R{i,j: i < j} 'circles must be disjoint':
the knapsack problem) can be formulated declara-
(~[i] - ~[j])~ + (y[i] - y[j])~ > t;
MAXIMIZE obj 'maximize diameter': t; tively. In the proposed framework of modeling lan-
END guage, the complete problem can now be expressed
as in the program below.
C.D. Maranas et al. [15] obtained the best known
MODEL CuttingStock;
solutions for all n < 30 and, for n - 15, an
MODEL Knapsack(i, w, p, K, x,obj);
even better one using an equivalent formulation SET i;
in GAMS and linking it to MINOS [16], an well- PARAMETER w{i}; p{i}; K;
known nonlinear solver. INTEGER VARIABLE x{i};
CONSTRAINT R: SUM{i} w . x < K;
The (Fractional) Cutting-Stock Problem. Paper is MAXIMIZE obj" SUM{i} p.x;
manufactured in rolls of width B. A set of cus- END Knapsack.
tomers W orders dw rolls of width b~ (with w E SET
w 'rolls ordered'; p 'possible patterns';
W). Rolls can be cut in many ways, every subset
PARAMETER
P~ C_ W such that ~i~P' yibi <_B is a possible cut- a{w,p} 'pattern table';
pattern, where yi is a positive integer. The ques- d{w} 'demands';
tion is how the initial roll of width B should be cut, b{w} 'widths of ordered rolls';
that is, which patterns should be used, in order to B 'initial width';
INTEGER y{w} 'new added pattern';
minimize the overall paper waste. A straightfor-
C 'contribution of a cut';
ward formulation of this problem is to enumerate
VARIABLE
all patterns, each giving a variable, then to min- X{p} 'number of rolls cut according to p';
imize the number of used patterns while fulfilling CONSTRAINT
the demands. The resulting model is a very large Dem{w}" SUM{p} a,X >_d;
linear program which cannot be solved. MINIMIZE obj" SUM{p} X;
BEGIN
A well-known method in operations research to
SOLVE;
solve such kind of problems is to use a column SOLVE Knapsack(w, b,Dem.dual,B, y, C);
generation method (see [3] for details), that is, a WHILE (C > 1) DO
small instance with only a few patterns is solved p .= p + {'pattern_'+str(#p)};
and a rewarding column ~ a p a t t e r n - is added a{w, #p} := y[w];
SOLVE;
repeatedly to the problem. The new problem is
SOLVE Knapsack(w, b,Dem.dual,B, y, C);
then solved again. This process is repeated, un- END;
til no pattern can be added. To find a rewarding END CuttingStock.
pattern, another problem ~ named a knapsack
This formulation has several remarkable prop-
p r o b l e m - must be solved.
erties:
The problem can be formulated partially be
algorithmic partially by declarative knowledge. 1) It is short and readable. The declarative part
It consists of two declaratively formulated prob- consists of the (small) linear cutting-stock
lems (a linear program and an knapsack problem), problem, it also contains, as a submodel,
which are both repeatedly solved. In a pseudocode a knapsack problem. The algorithmic part

419
Modeling languages in optimization: A new paradigm

implements the column generation method. modeler to communicate the model easily and to
Both parts are entirely separated. build it in a readable and maintainable way.
2) It is a complete formulation, except from the See also: L a r g e scale u n c o n s t r a i n e d o p t i m i -
data. No other code is needed; both models zation; O p t i m i z a t i o n s o f t w a r e ; C o n t i n u o u s
can be solved using a standard MIP solver global o p t i m i z a t i o n : Models~ a l g o r i t h m s a n d
(since the knapsack problem is small in gen- software.
eral).
3) It has a modular structure. The knapsack References
problem is an independent component with [1] BISSCHOP, J.: AIMMS, the modeling system, Paragon
Decision Techn., 1998, www.paragon.nl.
its own name space; there is no interference
[2] BROOKE, A., KENDRICK, D., AND MEERAUS, A.:
with the surrounding model. It could even be GAMS. A user's guide, Sci. Press, 1988.
declared outside the cutting-stock problem. [3] CHVATAL, V." Linear programming, Freeman, 1973.
4) The cutting-stock problem is only one prob- [4] FEIGENBAUM, E.A.: 'How the 'what' becomes the
'how", Comm. A CM 39, no. 5 (1996), 97-104.
lem of a large class of relevant problems
[5] FLOYD, R.W., AND BEIGEL, R.: The language of ma-
which are solved using a column generation chines, an introduction to computability and formal
or, alternatively, a row-cut generation. languages, Computer Sci. Press, 1994.
[6] FOURER, R.: 'Extending a general-purpose algebraic
modeling language to combinatorial optimization: A
C o n c l u s i o n . It has been shown that certain
logic programming approach', in D.L. WOODRUFF
problems are best formulated as algorithms, oth- (ed.): Advances in Computational and Stochastic Op-
ers in a declarative way, still others need both timization, Logic Programming, and Heuristic Search:
paradigms to be stated concisely. Computer sci- Interfaces in Computer Sci. and Oper. Res., Kluwer
ence made available many algorithmic languages; Acad. Publ., 1998, pp. 31-74.
[7] FOURER, R., GAY, D.M., AND KERNIGHAN, B.W.:
they can be contrasted to the algebraic model-
AMPL, a modeling language for mathematical pro-
ing languages which are purely declarative. A lan- gramming, Sci. Press, 1993.
guage, called modeling language, which combines [8] GAY, D.M.: 'Automatically finding and exploiting
both paradigms was defined in this paper and ex- partially separable structure in nonlinear programming
amples were given showing clear advantages of do- problems', A T ~ T Bell Lab. Murray Hill, New Jersey
ing so. Its is more powerful than both paradigms (1996).
[9] HOFSTADTER, D.R.: Metamagicum, Fragen Bach der
separated.
Essenz yon Geist und Struktur, Klett-Cotta, Stuttgart,
However, the integration of algorithmic and 1988.
declarative knowledge cannot be done in an ar- [10] HTJRLIMANN,T.: 'Computer-based mathematical mod-
bitrary way. The language design must follow cer- eling', Habilitations Script Fac. Economic and Social
tain criteria well-known in computer science. The Sci. Inst. Informatics Univ. Fribourg Dec. (1997).
[11] HURLIMANN, W.: 'An efficient logic-to-IP trans-
main criteria are: reliability and transparency. Re-
lation procedure', Working Paper Inst. In/or-
liability can be achieved by a unique notation to matics Univ. Fribourg March (1998), ftp://ftp-
code models, that is, by a modeling language, and iiuf.unifr,ch/pub/lpl/doc / AP MOD 1.pdf.
by various checking mechanisms (type checking, [12] HURLIMANN, T.: Reference manual for the LPL mod-
unit checking, data integrity checking and oth- eling language, working paper version ~.30, June,
Inst. Informatics Univ. Fribourg, 1998, ftp://ftp-
ers). Transparency can be obtained by flexible de-
iiuf.unifr.ch/pub/lpl/doc/Manual.ps.
composition techniques, like modular structure as [13] JAFFAR, J., AND MAHER, M.J.: Constraint logic pro-
well as access and protection mechanisms of these gramming: A survey, Handbook Artificial Intelligence
structure, well-known techniques in language de- and Logic Programming. Oxford Univ. Press, 1995.
sign and software engineering. [14] LOUDEN, K.C.: Programming languages -Principles
and practice, PWS/Kent Publ., 1993.
Solving efficiently and relevant optimization
[15] MARANAS, C.D., FLOUDAS, C.A., AND PARDALOS,
problems using present desktop machine not only P.M.: New results in the packing of equal circles in a
asks for fast machines and sophisticated solvers, square, Dept. Chemical Engin. Princeton Univ., 1993.
but also for formulation techniques that allow the [16] MURTAGH, B.A., AND SAUNDERS, M.A.: MINOS 5.0,

420
Molecular structure determination: Convex global underestimation

user guide, Systems Optim. Lab. Dept. Oper. Res. the experimental structures were known (see [3],
Stanford Univ., 1987. [9]). While most of these have been made with a
[17] SA, ILOG: ILOG solver ,~.0 user's manual; ILOG
blend of a human expert's abilities and computer
solver ~.0 reference manual, ILOG, 1997.
[is] SCHRAGE, L.: Optimization modeling with LINGO, assistance, fully automated methods have shown
Lindo Systems, 1998, www.lindo.com. promise for producing previously unattainable ac-
[19] SMOLKA, G.: 'The Oz programming model', in J. VAN curacy [2].
LEEUWEN (ed.): Computer Sci. Today, Vol. 1000 of Lec- These machine based prediction strategies at-
ture Notes Computer Sci., Springer, 1995, pp. 324-343.
tempt to lessen the reliance on experts by develop-
[2o] SOSIC, R., AND GU, J.: 'A polynomial time algorithm
for the n-queens problem', SIGART Bull. 1, no. 3 ing a completely computational method. Such ap-
(1990), 7-11. proaches are generally based on two assumptions.
[21] SOSIC, R., AND GU, J.: '3,000,000 queens in less than First, that there exists a potential energy function
one minute', SIGART Bull. 2, no. 1 (1991), 22-24. for the protein; and second that the folded state
Tony Hiirlimann corresponds to the structure with the lowest poten-
Inst. Informatics Univ. Fribourg tial energy (minimum of the potential energy func-
Fribourg, Switzerland tion) and is thus in a state of thermodynamic equi-
E-mail address: tony.huerlimann~unifr, ch librium. This view is supported by in vitro obser-
MSC2000: 90C10, 90C30 vations that proteins can successfully refold from
Key words and phrases: algorithmic language, declarative a variety of denatured states. Evolutionary theory
language, modeling language, solver. also supports a folded state at a global energy min-
imum. Protein sequences have evolved under pres-
sure to perform certain functions, which for most
MOLECULAR STRUCTURE DETERMINA- known occurrences requires a stable, unique, and
TION: CONVEX GLOBAL UNDERESTIMA- compact structure. Unless specifically required for
TION a certain function, there was no biochemical need
An important class of difficult global minimization for proteins to hide their global minimum behind
problems arise as an essential feature of molecu- a large kinetic energy barrier. While kinetic blocks
lar structure calculations. The determination of a may occur, they should be limited to special pro-
stable molecular structure can often be formulated teins developed for certain functions (see [1]).
in terms of calculating the global (or approximate
global) minimum of a potential energy function M o l e c u l a r M o d e l . Unfortunately, finding the
(see [6]). Computing the global minimum of this 'true' energy function of a molecular structure,
function is very difficult because it typically has if one even exists, is virtually impossible. For ex-
a very large number of local minima which may ample, with proteins ranging in size up to 1,053
grow exponentially with molecule size. amino acids (a collagen found in tendons), ex-
One such application is the well known pro- haustive conformational searches will never be
tein folding problem. It is widely accepted that tractable. Practical search strategies for the pro-
the folded state of a protein is completely depen- tein folding problem currently require a simplified,
dent on the one-dimensional linear sequence (i.e., yet sufficiently realistic, molecular model with an
'primary' sequence) of amino acids from which the associated potential energy function representing
protein is constructed" external factors, such as en- the dominant forces involved in protein folding [4].
zymes, present at the time of folding have no effect In a one such simplified model, each residue in the
on the final, or native, state of the protein. This led primary sequence of a protein is characterized by
to the formulation of the protein/olding problem: its backbone components N H - C a l l - C'O and
given a known primary sequence of amino acids, one of 20 possible amino acid sidechains attached
what would be its native, or folded, state in three- to the central Ca atom. The three-dimensional
dimensional space. structure of the chain is determined by internal
Several successful predictions of folded protein molecular coordinates consisting of bond lengths l,
structures have been made and announced before bond angles ~, sidechain torsion angles X, and the

421
Molecular structure determination: Convex global underestimation

backbone dihedral angles ¢, ¢, and w. Fortunately, that allows only certain preset values for the back-
these 1 0 r - 6 parameters (for an r-residue struc- bone dihedral angle pairs (¢, ¢). Since the residues
ture) do not all vary independently. Some of these in this model come in only two forms, hydrophobic
( 7 r - 4 of them) are regarded as fixed since they and polar, where the hydrophobic monomers ex-
are found to vary within only a very small neigh- hibit a strong pairwise attraction, the lowest free
borhood of an experimentally determined value. energy state involves those conformations with the
Among these are the 3 r - 1 backbone bond lengths greatest number of hydrophobic 'contacts' [4] and
l, the 3 r - 2 backbone bond angles 0, and the intrastrand hydrogen bonds. Simplified potential
r - 1 peptide bond dihedral angles w (fixed in functions have been successful in [11], [10], and
the trans conformation). This leaves only the r [12]. Here we use a simple modification of the en-
sidechain torsion angles X, and the r - 1 backbone ergy function from [11].
dihedral angle pairs (¢, ¢). In the reduced repre-
sentation model presented here, the sidechain an-
T h e C o n v e x G l o b a l U n d e r e s t i m a t o r . One
gles X are also fixed since sidechains are treated
practical means for finding the global minimum of
as united atoms (see below) with their respective
the polypeptide's potential energy function is to
torsion angles X fixed at an 'average' value taken
use a convex global underestimator to localize the
from the Brookhaven Protein Databank. Remain-
search in the region of the global minimum. The
ing are the r - 1 backbone dihedral angles pairs.
idea is to fit all known local minima with a con-
These also are not completely independent; they
vex function which underestimates all of them, but
are severely constrained by known chemical data
which differs from them by the minimum possible
(the Ramachandran plot) for each of the 20 amino
amount in the discrete L1 norm. The minimum of
acid residues. Furthermore, since the atoms from
this underestimator is used to predict the global
one Ca to the next Ca along the backbone can
minimum for the function, allowing a more local-
be grouped into rigid planar peptide units, there
ized conformer search to be performed based on
are no extra parameters required to express the
the predicted minimum.
three-dimensional position of the attached O and
More precisely, given an r-residue structure with
H peptide atoms. Hence, these bond lengths and
n = 2 r - 2 backbone dihedral angles, denote
bond angles are also known and fixed.
a conformation of this simplified model by ¢ E
Another key element of this simplified polypep- R n, and the corresponding simplified potential en-
tide model is that each sidechain is classified as ergy function value by F(¢). Then, assuming that
either hydrophobic or polar, and is represented by k > 2n + 1 local minimum conformations ¢(J), for
only a single 'virtual' center of mass atom. Since j = 1 , . . . , k, have been computed, a convex qua-
each sidechain is represented by only the single dratic underestimating function U(¢) is fitted to
center of mass 'virtual atom' C8, no extra pa- these local minima so that it underestimates all the
rameters areneeded to define the position of each local minima, and normally interpolates F(¢(J)) at
sidechain with respect to the backbone mainchain. 2n + 1 points. This is accomplished by determining
The twenty amino acids are thus classified into two
the coefficients in the function U(¢) so that
groups, hydrophobic and polar, according to the
scale given by S. Miyazawa and R.L. Jernigan [7]. 5j - F ( ¢ (j)) - U ( ¢ (j)) >_ 0 (1)
Corresponding to this simplified polypeptide
model is a simple energy function. This function for j - 1 , . . . , k, and where ~--~jn=_1 5j is minimized.
includes four components: a contact energy term That is, the difference between F(¢) and U(¢) is
favoring pairwise hydrophobic residues, a second minimized in the discrete L1 norm over the set
contact term favoring hydrogen bond formation of k local minima ¢(J), j - 1,... ,k. Of course,
between donor NH and acceptor C ~ = O pairs, a this 'underestimator' only underestimates known
steric repulsive term which rejects any conforma- local minima. The specific underestimating func-
tion that would permit unreasonably small inter- tion U(¢) used in this convex global underestima-
atomic distances, and a main chain torsional term tot (CGU) method is given by

422
Molecular structure determination: Convex global underestimation

1 Then (1)-(3) can be restated as the linear program


U(¢)-c0+~ ci¢i + -~di¢i • (2) (with free variables c, d, and 5)"
i=1
• minimize e kT 5
Note that ci and di appear linearly in the con-
straints of (1) for each local minimum ¢(J). Con- • such that
vexity of this quadratic function is guaranteed by {T ~"-~T 0 f
requiring that di >_ 0 for i = 1 , . . . , n. Other lin-
ear combinations of convex functions could also be
(5)
used, but this quadratic function is the simplest. -s" -D
Additionally, in order to guarantee that U(¢)
attains its global minimum Umin in the hyperrect- where D = diag(¢l,...,¢n), D =
angle H e - {¢i" 0 < ¢i < ¢i < ¢i < 27r}, an d i a g ( ¢ l , . . . , Cn), Ik is the identity matrix of
u

additional set of constraints are imposed on the order k, and/In is the n × (n + 1) 'augmented'
coefficients of U(¢): matrix ( O ' I n ) where In is the identity ma-
trix of order n.
ci -+-¢_.idi <_ O,
i--1,...,n. (3) Since the matrix in (5) has more rows than
¢i -Jr-¢idi >_ O, columns (2(k + n) rows and k + 2n + 1 columns,
Note that the satisfaction of (3) implies that ci < 0 where k > 2n + 1), it is computationally more effi-
and di > 0 for i = 1 , . . . , n . cient to consider it as a dual problem, and to solve
The unknown coefficients ci, i = 0 , . . . , n, and the equivalent primal. After some simple transfor-
mations, this primal problem reduces to:
di, i = 1 , . . . , n, can be determined by a linear pro-
gram which may be considered to be in the dual min f T y 1 _ f T ek
form. For reasons of efficiency, the equivalent pri-
mal of this problem is actually solved, as described s.t. I~nT - -

below. The solution to this primal linear program


provides an optimal dual vector, which immedi- Y3
ately gives the underestimating function coeffi- Yl, Y2, Y3 _> 0
cients ci and di. Since the convex quadratic func-
tion U(¢) gives a global approximation to the local which has only 2n + 1 rows and k + 2n > 4n + 1
minima of F(¢), then its easily computed global columns, and the obvious initial feasible solution
minimum function value Umin is & good candidate Yl -- ek a n d Y2 -- Y3 - 0. F u r t h e r m o r e , since the

for an approximation to the global minimum of the first of the 2n + 1 constraints in (6) in fact requires
correct energy function F(¢). t h a t e kT Yl - 1, t h e n t h e function fTyl- f T e k is
also bounded below, and so this primal linear pro-
An efficient linear programming formulation
gram always has an optimal solution. This optimal
and solution satisfying (1)-(3) will now be sum-
marized. Let f(J) - F(¢(J)), for j - 1 , . . . , k , solution gives the values of c, d, and (f via the dual
vectors, and also determines which values of f(J)
and let f E R k be the vector with elements f(J).
are interpolated by the potential function U(¢).
Also let w(J) E R n be the vector with elements
1-(¢IJ))2 That is, the basic columns in the optimal solu-
2 ' i - 1 ' " " " ' n, and let ek C R k be the vec-
tion to (6) correspond to the conformations ¢(J)
tor of ones. Now define the following two matrices
(I) E R ( n + l ) x k and f~ E R n X k : for which F ( ¢ ( J ) ) - U(¢(J)).

~_
( T ) ek
Note that once an optimal solution to (6) has
been obtained, the addition of new local minima is
¢(11...¢(k) (4) very easy. It is done by simply adding new columns
--(co(1)...03(k)). to (I) and f~, and therefore to the constraint matrix
in (6). The number of primal rows remains fixed
Finally, let c E R n+l, d E R n, and 5 E R k be the at 2n + 1, independent of the number k of local
vectors with elements ci, di, and 5i, respectively. minima.

423
Molecular structure determination: Convex global underestimation

The convex quadratic underestimating function Hence, these bounds can be used to define the new
U(¢) determined by the values c E R n+l and d E hyperrectangle H ¢ in which to generate new con-
R n now provides a global approximation to the lo- figurations.
cal minima of F(¢), and its easily computed global Clearly, if Ec is reduced, the size of H ¢ is also
minimum point Cmin is given by (¢min)i : -ci/di, reduced. At every iteration the predicted global
i = 1 , . . . , n , with corresponding function value minimum value Umin satisfies Umin _~ F ( ¢ * ) , where
Umin given by Umin - c 0 - ~'~i=1 n ci2/di • The value ¢* is the smallest known local minimum confor-
Umin is a good candidate for an approximation to mation. Therefore, Ec = F(¢*) is often a good
the global minimum of the correct energy function choice. If at least one improved point ¢, with
F(¢), and so Cmin can be used as an initial start- F(¢) < F(¢*), is obtained in each iteration, then
ing point around which additional configurations the search domain H ¢ will strictly decrease at each
(i.e., local minima) should be generated. These lo- iteration, and may decrease substantially in some
cal minima are added to the constraint matrix in iterations.
(6) and the process is repeated. Before each iter-
ation of this process, it is necessary to reduce the T h e C G U A l g o r i t h m . Based on the preceding
volume of the hyperrectangle H ¢ over which the description, a general method for computing the
new configurations are produced so that a tighter global, or near global, energy minimum of the po-
fit of U(¢) to the local minima 'near' Cmin is con- tential energy function F ( ¢ ) can now be described.
structed.
1) Compute k > 2n 4-1 distinct local minima
The rate and method by which the hyperrectan-
¢(J), for j - 1 , . . . , k, of the function F(¢).
gle size is decreased, and the number of additional
local minima computed at each iteration must be 2) Compute the convex quadratic underestima-
determined by computational testing. But clearly tor function given in (2) by solving the linear
the method depends most heavily on computing program given in (6). The optimal solution
local minima quickly and on solving the resulting to this linear program gives the values of c
linear program efficiently to determine the approx- and d via the dual vectors.
imating function U(¢) over the current hyperrect- 3) Compute the predicted global minimum
angle. point Cmin given by (¢min)i -- -ci/di, i -
If Ec is a cutoff energy, then one means for de- 1,... ,n, with corresponding function value
creasing the size of the hyperrectangle H ¢ at any Umin given by Umin - co - ~--~in_-iC2/ (2di).
step is to let H ¢ = {¢: U(¢) <_ Ec}. To get the 4) If Cmin -- ¢*, where ¢* =
bounds of H¢, consider U(¢) < Ec where U(¢) argmin{F(¢(J))'j - 1 , 2 , . . . } is the best
satisfies (2). Then limiting ¢i requires that local minimum found so far, then stop and
report ¢* as the approximate global mini-
1
ci¢i +-~di¢i < Ec- (7) mum conformation.
5) Reduce the volume of the hyperrectangle H ¢
As before, the minimum value of U(¢) is attained over which the new configurations will be
when ¢i - - c i / d i , i - 1 , . . . , n . Assigning this produced, and remove all columns from cI,
minimum value to each ¢i, except ¢k, then results and f~ which correspond to the conformations
in which are excluded from H¢.
2 6) Use Cmin as an initial starting point around
Ck¢k + 2 dk¢2k <_ E c - 1 /#~k
co + -~ . ~ci -- ilk. (8) which additional local minima ¢(J) of F(¢)
(restricted to the hyperrectangle H¢) are
The lower and upper bounds on ¢k, k - 1 , . . . , n, generated. Add these new local minimum
are given by the roots of the quadratic equation conformations as columns to the matrices
and f~.
1 2
Ck¢k + -~dk¢k -- ilk. (9)
7) Return to step 2.

424
Monte-Carlo simulated annealing in protein folding

The number of new local minima to be gener- contact frequencies in protein structures', Protein En-
ated in step 6 is unspecified since there is currently gin. 6 (1993), 267-278.
no theory to guide this choice. In general, a value [8] PHILLIPS, A.T., ROSEN, J.B., AND WALKE, V.H.:
'Molecular structure determination by global optimi-
exceeding 2n + 1 would be required for the con- zation', in P.M. PARDALOS,G.L XUE, AND D. SHAL-
struction of another convex quadratic underesti- LOWAY (eds.): DIMACS, Amer. Math. Soc., 1995,
mator in the next iteration (step 2). In addition, pp. 181-198.
the means by which the volume of the hyperrect- [9] RICHARDS, F.M.: 'The protein folding problem', Sci-
angle H e is reduced in step 5 may vary. One could entif. A met. (1991), 54-63.
[10] SRINIVASAN, R., AND ROSE, G.D.: 'LINUS: A hierar-
use the two roots of (7) to define the new bounds
chic procedure to predict the fold of a protein', PRO-
of He. Another method would be simply to use TEINS: Struct. Funct. Genet. 22 (1995), 81-99.
He- {¢i" ((/)min)i- (~i _~ ¢i _~ (¢min)i-+-(~i} where [11] SUN, S., THOMAS, P.D., AND DILL, K.A.: 'A simple
( ~ i - I((~min)i- (¢*)il, i - - 1 , . . . , U. protein folding algorithm using binary code and sec-
For complete details of the CGU method and ondary structure constraints', Protein Engin. 8, no. 8
(1995), 769-778.
its computational results, see [5], [8].
[12] YUE, K., AND DILL, K.A.: 'Folding proteins with a
See also: Simulated annealing methods in simple energy function and extensive conformational
protein folding; Packet annealing; Phase searching', Protein Sci. 5 (1996), 254-261.
problem in X-ray crystallography: Shake A. T. Phillips
and bake approach; Global optimization in Computer Sci. Dept. Univ. Wisconsin-Eau Claire
protein folding; Multiple minima problem Eau Claire, WI 54701, USA
in protein folding: aBB global optimiza- E-mail address: phillipa©uwec, edu
tion approach; Adaptive simulated anneal- MSC2000: 65K05, 90C26
ing and its application to protein folding; Key words and phrases: protein folding, molecular structure
Genetic algorithms; Global optimization in determination, convex global underestimation.
Lennard-Jones and Morse clusters; Protein
folding: Generalized-ensemble algorithms;
Monte-Carlo simulated annealing in protein MONTE-CARLO SIMULATED ANNEALING
folding; Simulated annealing. IN PROTEIN FOLDING
We review uses of Monte-Carlo simulated anneal-
References ing in the protein folding problem. We will discuss
[1] ABAGYAN, R.A: 'Towards protein folding by global en- the strategy for tackling the protein folding prob-
ergy optimization', Federation of Europ. Biochemical lem based on all-atom models. Our approach con-
Soc.: Lett. 325 (1993), 17-22. sists of two elements: the inclusion of accurate sol-
[2] ANDROULAKIS, I.R., MARANIS, C.D., AND FLOUDAS, vent effects and the development of powerful sim-
C.A.: 'Prediction of oligopeptide conformations via de- ulation algorithms that can avoid getting trapped
terministic global optimization', J. Global Optim. 11
(1997), 1-34.
in states of energy local minima. For the former,
[3] BENNER, S.A., AND GERLOFF, D.L.: 'Predicting the we discuss several models varying in nature from
conformation of proteins: man versus machine', Fed- crude (distance-dependent dielectric function) to
eration of Europ. Biochemical Soc.: Lett. 325 (1993), rigorous (reference interaction site model). For the
29-33. latter, we show the effectiveness of Monte-Carlo
[4] DILL, K.A.: 'Dominant forces in protein folding', Bio-
simulated annealing.
chemistry 29, no. 31 (1990), 7133-7155.
[5] DILL, K.A., PHILLIPS, A.T., AND ROSEN, J.B.: 'Pro-
tein structure and energy landscape dependence on se- Introduction. Proteins under their native physi-
quence using a continuous energy function', J. Comput. ological conditions spontaneously fold into unique
Biol. 4, no. 3 (1997), 227-239. three-dimensional structures (tertiary structures)
[6] MERZ, K., AND GRAND, S. LE: The protein folding
in the time scale of milliseconds to minutes. Al-
problem and tertiary structure prediction, Birkh~iuser,
1994. though protein structures appear to be depen-
[7] MIYAZAWA, S., AND JERNIGAN, R.L.: 'A new substi- dent on various environmental factors within the
tution matrix for protein sequence searches based on cell where they are synthesized, it was inferred by

425
Monte-Carlo simulated annealing in protein folding

experiments 'in vitro' that the three-dimensional


S p -- E c -[- ELJ -~- EHB -~- Stor,
structure of a protein is determined solely by its
Ec - ~ 332qiqj,
amino-acid sequence information [12]. Hence, it
(i,j) erij
has been hoped that once the correct Hamiltonian
of the system is given, one can predict the native ,

protein tertiary structure from the first principles (i,j)


(1)
by computer simulations. However, this has yet to
be accomplished. There are two reasons for the dif-
ficulty. One reason is that the inclusion of accurate (i,j)
solvent effects is nontrivial, because the number Etor - ~ Ui (1 =t=cos(nixi)) .
of solvent molecules that have to be considered
is very large. The other reason for the difficulty
comes from the fact that the number of possible Here, rij is the distance (in ~) between atoms i
conformations for each protein is astronomically and j, e is the dielectric constant, and Xi is the
large [30], [60]. Simulations by conventional meth- torsion angle for the chemical bond i. Each atom
ods such as Monte-Carlo or molecular dynamics is expressed by a point at its center of mass, and
algorithms in canonical ensemble will necessarily the partial charge qi (in units of electronic charges)
be trapped in one of many local-minimum states is assumed to be concentrated at that point. The
in the energy function. In this article, I will discuss factor 332 in Ec is a constant to express energy
a possible strategy to alleviate these difficulties. in units of kcal/mol. These parameters in the en-
The outline of the article is as follows. In section 2 ergy function as well as the molecular geometry
we summarize the energy functions of protein sys- were adopted from E C E P P / 2 [37], [41], [57]. The
tems that we used in our simulations. In section 3 computer code KONF90 [23], [46] was used for all
we briefly review our simulation methods. In sec- the Monte-Carlo simulations. For gas phase sim-
tion 4 we present the results of our protein folding ulations, we set the dielectric constant e equal to
simulations. Section 5 is devoted to conclusions. 2. The peptide-bond dihedral angles w were fixed
at the value 180 ° for simplicity. So, the remaining
dihedral angles ¢ and ~b in the main chain and X in
the side chains constitute the variables to be up-
dated in the simulations. One Monte-Carlo (MC)
sweep consists of updating all these angles once
with Metropolis evaluation [36] for each update.
Solvation free energy of interactions between a
solute molecule and solvent molecules, in general,
can be divided into three contributions: hydropho-
E n e r g y F u n c t i o n s of P r o t e i n S y s t e m s . The bic term that corresponds to the work required to
energy function for the protein systems is given create a cavity of the shape of the solute molecule
by the sum of two terms: the conformational en- in solution (the term 'hydrophobic' used in this ar-
ergy Ep for the protein molecule itself and the sol- ticle is different from a more standard one; see [11]
vation free energy Es for the interaction of pro- for clarification on various definitions), the elec-
tein with the surrounding solvent. The conforma- trostatic term (including the hydrogen-bond en-
tional energy function Ep (in kcal/mol) for the ergy) between solute and solvent molecules, and
protein molecule that we used is one of the stan- the Lennard-Jones term between solute and sol-
dard ones. Namely, it is given by the sum of the vent molecules.
electrostatic term Ec, 12-6 Lennard-Jones term One of the simplest ways to represent solvent
EL j, and hydrogen-bond term EHB for all pairs of effects is by the sigmoidal, distance-dependent di-
atoms in the molecule together with the torsion electric function [20], [54]. The explicit form of the
term Etor for all torsion angles: function we used is given by [43]

426
Monte-Carlo simulated annealing in protein folding

e(r) - D D - 2 [(sr)2 + 2st + 2] e -~r (2) (4)


2
which is a slight modification of the one used in where h ps and h s~ are the matrices of the solute-
[9]. Here, we use s = 0.3 and D = 78. It ap- solvent and the solvent-solvent total correlation
proaches 2 (the value inside a protein) in the limit functions, respectively, ~P~ is the matrix of the
the distance r going to zero and 78 (the value solute-solvent direct correlation functions, "~PP
for bulk water) in the limit r going to infinity. and ~,ss are the intramolecular correlation matri-
The distance-dependent dielectric function is sim- ces for solute and solvent, respectively, and p is
ple and also computationally only slightly more the number density matrix of the solvent. The sol-
demanding than the gas-phase case. But it only vation free energy is given by
involves the electrostatic interactions. Other sol-
vent contributions are hydrophobic interactions Es - 47rpkBT / r2F(r) dr, (5)
t ]
and Lennard-Jones interactions between protein 0
and solvent. where F ( r ) is defined by
Another commonly used term that represents
F(r) (6)
solvent contributions is the term proportional to
the solvent-accessible surface area of protein mol- { 1 pS(r)2 ~ 1 ps ~ }
- Z 7hob - -
ecule. The solvation free energy Es in this approx- a,b
imation is given by Here, the summation indices a and b run over the
solute and the solvent sites, respectively. A robust
Es - E aiAi, (3)
i
and fast algorithm for solving RISM equations was
recently (as of 1999) developed [24], which made
where Ai is the solvent-accessible surface area of
folding simulations of peptides a feasible possi-
ith functional group, and ai is the proportion-
bility [25]. Although this method is computation-
ality constant. There are several versions of the
ally much more time-consuming than the first two
set of the proportionality constants and functional
methods (terms with distance-dependent dielectric
groups. Five parameter sets were compared for the
function and those proportional to surface area),
systems of peptides and a small protein, and we
it gives the most accurate representation of the
found that the parameter sets of [52], [59] are valid
solvation free energy.
ones [33]. The term in (3) includes all the contri-
butions from solvent (namely, hydrophobic, elec-
M e t h o d s . Once the appropriate energy function
trostatic, and Lennard-Jones interactions), and
of the protein system is given, we have to employ
it is therefore more accurate than the distance-
a simulation method that does not get trapped in
dependent dielectric function. It is, however, an
states of energy local minima. We have been advo-
empirical representation, and its validity has to be
cating the use of Monte-Carlo simulated annealing
eventually tested with a rigorous solvation theory.
[27].
The most widely-used and rigorous method of
In the regular canonical ensemble with a given
inclusion of solvent effects is probably the one that
inverse temperature /3 - 1/kBT, the probability
deals with the explicit solvent molecules with all-
weight of each state with energy E is given by the
atom representations. Many molecular dynamics
Boltzmann factor:
simulations of protein systems now directly in-
clude these explicit solvent molecules (for a review, WB(E) - exp(-/3E). (7)
see, for instance, [4]). Another rigorous method is The probability distribution in energy is then
based on the statistical mechanical theory of liq- given by
uid and solution and is called the reference inter-
PB(T, E) ex n(E)WB(E), (8)
action site model (RISM) [7], [21]. The RISM in-
tegral equation for solute-solvent (p-s) correlation where n(E) is the number of states with energy
functions in Fourier k-space is given by E. Since the number of states n(E) is an increas-

427
Monte-Carlo simulated annealing in protein folding

ing function of energy and the Boltzmann fac- K (the final temperature TF was sometimes set
tor WB(E) decreases exponentially with E, the equal to 100 K, 50 K, or 1 K)[23], [46]. The tem-
probability distribution PB(T,E) has a bell-like perature for the nth MC sweep is given by
shape in general. When the temperature is high,
is small, and WB(E) decreases slowly with E. T n --TI'~ n - I , (9)
So, PB(T, E) has a wide bell-shape. On the other
where ~ is a constant which is determined by TI,
hand, at low temperature fl is large, and WB(E)
TF, and the total number of MC sweeps of the
decreases rapidly with E. So, PB(T,E) has a
run. Each run consists of 104 ~ 106 MC sweeps,
narrow bell-shape (and in the limit T --+ 0 K,
and we usually made 10 to 20 runs from different
PB(T,E) oc 5 ( E - EGS), where EGS is the global-
initial conformations.
minimum energy). However, it is very difficult to
obtain canonical distributions at low temperatures
with conventional simulation methods. This is be- R e s u l t s . We now present the results of our simula-
cause the thermal fluctuations at low tempera- tions based on Monte-Carlo simulated annealing.
tures are small and the simulation will certainly All the simulations were started from randomly-
get trapped in states of energy local minima. Sim- generated conformations.
(a)
ulated annealing [27] is based on the process of 35

crystal making. Namely, by starting a simulation 30


25
at a sufficiently high temperature (much above the
20
melting temperature), one lowers the temperature 15
gradually during the simulation until it reaches the 10
5
global-minimum-energy state (crystal). If the rate
0
of temperature decrease is sufficiently slow so that -5
thermal equilibrium may be maintained through- -10
-15
out the simulation, only the state with the global 0 50000 100000 150000 200000
MC Sweep
energy minimum is obtained (when the final tem- (b)
35
perature is 0 K). However, if the temperature de-
30
crease is rapid (quenching), the simulation will get 25
trapped in a state of energy local minimum in the 20
15
vicinity of the initial state.
10
Simulated annealing was first successfully used 5
0
to predict the global-minimum-energy conforma-
-5
tions of polypeptides and proteins [63], [22], [61] -10
and to refine protein structures from X-ray and -15
0 50000 100000 150000 200000
NMR data [5], [42] almost a decade ago. Since then MC Sweep
(c)
this method has been extensively used in the pro- 35
30
tein folding and structure refinement problems (for
25
reviews, see [62], [45]). Our group has been testing 20
the effectiveness of the method mainly in oligopep- 15
10
tide systems. The procedure of our approach is as
5
follows. While the initial conformations in the pro- 0
tein simulations are usually taken from the struc- -5

tures inferred by the experiments, our initial con- -10


-15
formations are randomly generated. Each Monte- 0 50000 100000 150000 200000
MC Sweep
Carlo sweep updates every dihedral angle (in both
the main chain and side chains) once. Our anneal- Fig. 1" Series of energy Ep ( k c a l / m o l ) o f Met-enkephalin
ing schedule is as follows: The temperature is low- from conventional canonical Monte-Carlo runs at T = 1000
K (a), 300 K (b), and 50 K (c).
ered exponentially from TI - 1000 K to TF -- 250

428
Monte-Carlo simulated annealing in protein folding

The first example is Met-enkephalin. This brain ferred from NMR experiments ([13, Fig. 2]). The
neuro peptide consists of 5 amino acids with the figures were created with RasMol [55].
amino-acid sequence: T y r - G l y - G l y - P h e - M e t . Be- We see a striking similarity between simulation
cause it is one of the smallest peptides that have results in water (Fig. 3c)) and those ofNMR exper-
biological functions, it has served as a bench iments (Fig. 3d)). The simulation results in Fig. 3
mark for testing a new simulation method. The are from the same number of MC sweeps. It seems
global minimum conformation of this peptide for that the presence of water speeds up the conver-
E C E P P / 2 energy function in gas phase (~ - 2) gence of the backbone structures in the sense that
is known [31], [49]. For KONF90 realization of it requires less number of MC sweeps for conver-
E C E P P / 2 energy, the peptide is essentially in the gence [26].
ground state for Ep _< - 1 1 kcal/mol [49], [15] and ..( (a)
the lowest value is -12.2 kcal/mol [17], [16].
In Fig. 1, we show the 'time series' of the to-
tal conformational energy Ep (in (1)) obtained by
conventional canonical Monte-Carlo simulations at
T - 1000, 300, and 50 K. ., .,

The thermal fluctuations for the run at T - 50 \, -... (b)

~, ~ i-~~- .~,.~:~-'"-,~
.--.~ ~, i ....
K in Fig. lc) are very small and this run has appar- !:~" 7'--~,, . •" ~ )",,

ently gotten trapped in states of energy local min-


ima (because the average energy at 50 K is about
- 1 1 kcal/mol [15], [16]). In Fig. 2 we display the
time series of energy obtained by a Monte-Carlo
simulated annealing simulation.
This run reaches the global minimum region
-'~~.."'.L: "'r~\ .~.
: ~,: ":-~ ..'
(Ep _< - 1 1 kcal/mol) as the temperature is de-
.

-:~-=:.: , .,:................
~.,
.: ,.~: ..,,
....
creased during the simulation from 1000 K to 50 "-~... ~a )'-.,.,. ....
.

K.
35

3O

25

20

15
"

uJ 1050

-5
-10
-15
0 50000 100000 150000 200000 Fig. 3: Superposition of the eight conformations of
MC Sweep
Met-enkephalin obtained as the lowest-energy structures
Fig. 2" Time series of energy Ep (kcal/mol) of by Monte-Carlo simulated annealing in gas phase (a),
Met-enkephalin from a Monte-Carlo simulated annealing simple-repulsive solvent (b), and water (c) together with
run. superposition of five conformations deduced from the
NMR experiment (d).
We have up to now presented the results in gas
phase (e = 2). In Fig. 3 we compare the super- The solvation free energy based on the RISM
posed structures of lowest-energy conformations theory is very accurate, but it is also computa-
from 8 Monte-Carlo simulated annealing runs in tionally very demanding. We are currently try-
gas phase, simple-repulsive solvent, and water (the ing to solve this problem making the algorithm
latter two contributions were calculated by the more efficient and robust [24]. Hereafter, we dis-
RISM theory) [26] with those of 5 structures in- cuss how well other solvation theories can still de-

429
Monte-Carlo simulated annealing in protein folding

scribe the effects of solvent in the prediction of segment with ~-strand length m _> 3.
three-dimensional structures of oligopeptides and In Table 1 we summarize the c~-helix formation
small proteins. in the 20 Monte-Carlo simulated annealing runs
Next systems we discuss are those of homo- [44]. The results are for Definition II of the a-helix
oligomers with length of 10 amino acids. From the state.
structural data base of X-ray experiments of pro- We see that (Met)i0, (Ala)i0, and (Leu)i0 gave
tein structures [8] and CD experiments [6], it is many helical conformations: 15, 9, and 9 (out
known that certain amino acids have more ten- of 20), respectively. In particular, (Met)i0 and
dency of c~-helix formation than others. For in- (Ala)i0 produced long helices, some conformations
stance, alanine is a helix former and glycine is a being almost entirely helical (t~ _> 8). On the other
helix breaker, while phenylalanine has intermedi- hand, (Val)i0, (Ile)i0, and (Gly)i0 gave few helical
ate helix-forming tendency. We have performed 20 conformations: 2, 2, and I (out of 20), respectively.
Monte-Carlo simulated annealing runs of 10,000 We obtained not only a smaller number of helices
MC sweeps in gas phase (c = 2) with each of but also shorter helices for these homo-oligomers
(Ala)10, (Leu)10, (Met)10, (Phe)10, (Ile)10, (Val)10, than the above three homo-oligomers. Finally, the
and (Gly)i0 [44]. These amino acids are nonpolar results for (Phe)i0 indicate that Phe has interme-
and we can avoid the complications of electrostatic diate helix-forming tendency between these two
and hydrogen-bond interactions of side chains with groups. We thus have the following rank order of
each other, with main chain, and with the solvent. helix-forming tendency for the seven amino acids
[44]"
In order to analyze how much c~-helix formation
is obtained by simulations, we first define c~-helix Met > Ala > Leu > Phe > Val > Ile > Gly. (10)
state of a residue. We consider that a residue is in
This can be compared with the experimentally de-
the c~-helix state when the dihedral angles (¢, ¢)
termined helix propensities [8], [6]. Our rank order
fall in the range ( - 6 0 + 45 °, - 5 0 + 45 °) (Definition
(10) is in good agreement with the experimental
I) [23], [46]. The length g of a helical segment is
data.
then defined by the number of successive residues
that are in the c~-helix state. The number n of he- We then analyzed the relation between helix-
lical residues in a conformation is defined by the forming tendency and energy. We found that
sum of ~ over all helical segments in the conforma- the differences A E = E N H - EH between min-
tion. Note that t~ = 3 corresponds to roughly one imum energies for nonhelical (NH) and helical
turn of c~-helix. We therefore consider a conforma- (H) conformations is large for homo-oligomers
tion as helical if it has a segment with helix length with high helix-forming tendency (9.7, 10.2, 21.5
~>3. kcal/mol for (Met)10, (Ala)i0, (Leu)i0, respec-
tively) and small for those with low helix-forming
The average values of the dihedral angles ¢ and
tendency (0.5, 1.6, - 3 . 2 kcal/mol for (Val)i0,
¢ for the helical segments based on Definition I (Ile)i0, (Gly)10, respectively). Moreover, we found
(with helix length t~ _> 3) are - 7 0 ° and - 3 7 °, re- that the large A E for the former homo-oligomers
spectively, and the standard deviation is --~ 10 ° for are caused by the Lennard-Jones term AELj
E C E P P / 2 energy function [46], [44]. Hence, for de-
(13.3, 8.0, 17.5 kcal/mol for (Met)j0, (Ala)i0,
tailed analyses of the data we adopt a more strin- (Leu)i0, respectively). Hence, we conjecture that
gent criterion for a-helix state (Definition II)- The the differences in helix-forming tendencies are de-
range is (¢, ¢) - ( - 7 0 d: 20 °, - 3 7 + 20 °) [44]. termined by the following factors [44]. A helical
We likewise consider that a residue is in the/~- conformation is energetically favored in general be-
strand state when the dihedral angles (¢, ¢) fall in cause of the Lennard-Jones term ELj. For amino
the range ( - 1 3 0 + 5 0 °, 135:i:45 °) [44]. The ~-strand acids with low helix-forming tendency except for
length rn is then defined to be the number of suc- Gly, however, the steric hindrance of side chains
cessive residues that are in the f~-strand state. We raises ELj of helical conformations so that the
consider a conformation as f~-stranded if it has a difference AELj between nonhelical and helical

430
Monte-Carlo simulated annealing in protein ]olding

Peptide (Met)lo (Ala)10 (Leu)lo (Phe)lo (Val)10 (Ile)10 (Gly)10


e
3 1 0 4 1 0 2 1
4 2 0 2 2 2 0 0
5 0 1 1 1 0 0 0
6 2 3 2 1 0 0 0
7 2 1 0 0 0 0 0
8 7 4 0 0 0 0 0
9 1 0 0 0 0 0 0
10 0 0 0 0 0 0 0
Total 15/20 9/20 9/20 5/20 2/20 2/20 1/20
Table 1: a-Helix formation in homo-oligomers from 20 Monte-Carlo simulated annealing runs.

conformations are reduced significantly. The small gorithms [3] in [48], [47]. The obtained results gave
AELj for these amino acids can be easily overcome quantitative support to those by Monte-Carlo sim-
by the entropic effects and their helix-forming ten- ulated annealing described above [44].
dencies are small. Note that such amino acids (Val
and Ile here) have two large side-chain branches We have so far studied peptides with nonpolar
at C ~, while the helix forming amino acids such as amino acids each of which is electrically neutral
Met and Leu have only one branch at C ~ and Ala as a whole. We now discuss the helix-forming ten-
has a small side chain. dencies of peptides with polar amino acids where
We now study the ~-strand forming tendencies side chains are charged by protonation or depro-
of these seven homo-oligomers. In Table 2 we sum- tonation. One example is the C-peptide, residues
marize the ~-strand formation in 20 Monte-Carlo 1-13 of ribonuclease A. It is known from the X-
simulated annealing runs [44]. ray diffraction data of the whole enzyme that the
The implications of the results are not as obvi- segment from Ala-4 to G l n - l l exhibits a nearly 3-
ous as in the c~-helix case. This is presumably be- turn c~-helix [64], [58]. It was also found by CD
cause a short, isolated ¢/-strand is not very stable [56] and NMR [53] experiments that the isolated
by itself, since hydrogen bonds between ~-strands C-peptide also has significant c~-helix formation in
are needed to stabilize them. However, we can still aqueous solution at temperatures near 0°C.
give a rough estimate for the rank order of strand-
forming tendency for the seven amino acids [44]:
Furthermore, the CD experiment of the isolated
C-peptide showed that the side-chain charges of
Val > Ile > P he > Leu > Ala > Met > G ly. (11)
residues Glu-2- and His-12 + enhance the stabil-
Here, we considered Val as more strand-forming ity of the c~-helix, while the rest of the charges
than Ile, since the longer the strand segment is, of other side chains do not [56]. The NMR ex-
the harder it is to form by simulation. Our rank periment [53] of the isolated C-peptide further
order (11) is again in good agreement with the ex- observed the formation of the characteristic salt
perimental data [8]. bridge between Glu-2- and Arg-10 + that exists in
By comparing (11) with (10), we find that the the native structure determined by the X-ray ex-
helix-forming group is the strand-breaking group periments of the whole protein [64], [58].
and vice versa, except for Gly. Gly is both helix
and strand breaking. This reflects the fact that In order to test whether our simulations can re-
Gly, having no side chain, has a much larger (back- produce these experimental results, we made 20
bone) conformational space than other amino Monte-Carlo simulated annealing runs of 10,000
acids. MC sweeps with several C-peptide analogues [23],
The helix-coil transitions of homo-oligomer sys- [46]. The amino-acid sequences of four of the ana-
tems were further analyzed by multicanonical al- logues are listed in Table 3.

431
Monte-Carlo simulated annealing in protein folding

Peptide (Met)lo (Ala)lo (Leu)~o (Phe)~o (Val)~o (Ile)~o (Gly)lo


m
3 0 0 2 5 1 7 0
4 0 0 0 1 0 4 0
5 0 0 0 0 2 1 0
6 0 0 0 0 1 0 0
7 0 0 0 0 0 0 0
8 0 0 0 0 1 0 0
9 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0
Total 0/20 0/20 2/20 6/20 5/20 12/20 0/20

Table 2" ~-Strand formation in homo-oligomers from 20 Monte-Carlo simulated annealing runs.

Peptide I II III IV Peptide I II III IV


Sequence t~
1 Lys + 3 4 2 3 1
2 Glu- Glu 4 3 2 3 0
3 Thr 5 1 1 0 0
4 Ala 6 0 1 0 0
5 Ala 7 0 1 1 0
6 Ala Total 8/20 7/20 7/20 1/20
7 Lys +
8 Phe Table 4: c~-Helix formation in C-peptide analogues from 20
9 Glu- Glu Leu Monte-Carlo simulated annealing runs.
10 Arg +
Peptides II and III had conformations with
11 Gln
12 His + His the longest c~-helix (t~ -- 7). These conformations
13 Met turned out to have the lowest energy in 20 simu-
lation runs for each peptide. They both exhibit an
Table 3: Amino-acid sequences of the peptide analogues of
c~-helix from Ala-5 to Gln-ll, while the structure
C-peptide studied by Monte-Carlo simulated annealing.
from the X-ray data has an c~-helix from Ala-4 to
Gln-ll. These three conformations are compared
in Fig. 4.
As mentioned above, the agreement of the back-
bone structures is conspicuous, but the side-chain
The simulations were performed in gas p h a s e structures are not quite similar. In particular,
( c - 2). The temperature was decreased exponen- while the X-ray [64], [58] and NMR [53] experi-
tially.from 1000 K to 250 K for each run. As usual, ments imply the formation of the salt bridge be-
all the simulations were started from random con- tween the side chains of Glu-2- and Arg-10 +, the
formations. lowest-energy conformations of Peptides II and III
In Table 4 we summarize the helix formation of obtained from the simulations do not have this salt
all the runs [46]. Here, the number of conforma- bridge.
tions with segments of helix length t~ _ 3 are given The disagreement is presumably caused by the
with Definition I of the c~-helix state. From this lack of solvent in our simulations. We have there-
table one sees that c~-helix was hardly formed for fore made multicanonical Monte-Carlo simulations
Peptide IV where Glu-2 and His-12 are neutral, of Peptide II with the inclusion of solvent effects
while many helical conformations were obtained by the distance-dependent dielectric function (see
for the other peptides. This is in accord with the (2)) [18], [19]. It was found that the lowest-energy
experimental results that the charges of Glu-2- conformation obtained has an c~-helix from Ala-
and His-12 + are necessary for the c~-helix stability 4 to Gln-ll and does have the characteristic salt
bridge between Glu-2- and Arg-10 + [18], [19].

432
Monte-Carlo simulated annealing in protein folding

Similar dependence of a-helix stability on side-


~I (a) chain charges was observed in Monte-Carlo simu-
;\ lated annealing runs of a 17-residue synthetic pep-
tide [43]. The pH difference in the experimental
conditions was represented by the corresponding
difference in charge assignment of the side chains,
and the agreement with the experimental results
(stable c~-helix formation at low pH and low helix
content at high pH) was observed in the simula-
tions by Monte-Carlo simulated annealing with the
distance-dependent dielectric function [43].
Considering our simulation results on homo-
oligomers of nonpolar amino acids, C-peptide, and
b) the synthetic peptide, we conjecture that the helix-
forming tendencies of oligopeptide systems are
controlled by the following factors [43]. An c~-
helix structure is generally favored energetically
(especially, the Lennard-Jones term). When side
chains are uncharged, the steric hindrance of side

i
, ?
' ;]'

/
chains is the key factor for the difference in helix-
forming tendency. When some of the side chains
are charged, however, these charges play an im-
+ portant role in the helix stability in addition to
the above factor: Some charges enhance helix sta-
bility, while others reduce it.
We have up to now discussed a-helix for-
mations in our simulations of oligopeptide sys-
tems. We have also studied /~-sheet forma-
tions by Monte-Carlo simulated annealing [38],
[39], [51]. The peptide that we studied is the
1 (c) fragment corresponding to residues 16-36 of
bovine pancreatic trypsin inhibitor (BPTI) and
has the amino-acid sequence" Ala:6-Arg+-Ile-Ile -
!,, Arg +-Tyr-Phe-Tyr-Asn-Ala-Lys+-Ala-Gly-Leu-
Cys-Gln-Thr-Phe-Val-Tyr-Gly 36. An antiparallel
T 'i .
~-sheet structure in residues 18-35 is observed in
X-ray crystallographic data of the whole protein
[10].
We first performed 20 Monte-Carlo simulated
annealing runs of 10,000 MC sweeps in gas phase
(c = 2) with the same protocol as in the pre-
vious simulations [38]. Namely, the temperature
was decreased exponentially from 1000 K to 250 K
for each run, and all the simulations were started
Fig. 4: The lowest-energy conformations of Peptide II (a)
and Peptide III (b) of C-peptide analogues obtained from from random conformations. The difference of the
20 Monte-Carlo simulated annealing runs in gas phase, present simulation and the previous ones comes
and the corresponding X-ray structure (c). only from that of the amino-acid sequences.

433
Monte-Carlo simulated annealing in protein/olding

The most notable feature of the obtained results resentation of solvent by the sigmoidal dielectric
is that a-helices, which were the dominant motif in function (which gave c~-helices instead) is therefore
previous simulations of C-peptide and other pep- not sufficient. Hence, the same peptide fragment,
tides, are absent in the present simulation. Most BPTI(16-36), was further studied in aqueous solu-
of the conformations obtained consist of stretched tion that is represented by solvent-accessible sur-
strands and a 'turn' which connects them. The face area of (3) by Monte-Carlo simulated an-
lowest-energy structure indeed exhibits an antipar- healing [51]. Twenty simulation runs of 100,000
allel ~-sheet [38]. MC sweeps were made. It was indeed found that
the lowest-energy structure obtained has a ~-
We next made 10 Monte-Carlo simulated an-
sheet structure (actually, type II ~ ~-turn) at the
nealing runs of 100,000 MC sweeps for BPTI(16-
very location suggested by the NMR experiments
36) with two dielectric functions: c = 2 and the
[40]. This structure and that deduced from the
sigmoidal, distance-dependent dielectric function
X-ray experiments [10] are compared in Fig. 5.
of (2) [39]. The results with c = 2 reproduced our
The figures were created with Molscript [29] and
previous results: Most of the obtained conforma-
Raster3D [2], [35].
tions have ~-strand structures and no extended c~-
helix is observed. Those with the sigmoidal dielec- Although both conformations are ~-sheet struc-
tric function, on the other hand, indicated forma- tures, there are important differences between the
tion of a-helices. One of the low-energy conforma- two: The positions and types of the turns are differ-
tions, for instance, exhibited about a four-turn c~- ent. Since the X-ray structure is taken from the ex-
helix from Ala-16 to Gly-28 [39]. This presents an periments on the whole B P T I molecule, it does not
example in which a peptide with the same amino- have to agree with that of the isolated BPTI(16-
acid sequence can form both c~-helix and ~-sheet 36) fragment. It was found [51] that the simulated
structures, depending on its electrostatic environ- results in Fig. 5b) have remarkable agreement with
ment. those in the NMR experiments of the isolated frag-
ment [40].
We have so far dealt with peptides with small
number of amino acids (up to 21) with simple sec-
ondary structural elements: a single a-helix or ~-
sheet. The native proteins usually have more than
one secondary structural elements. We now discuss
our attempts on the first-principles tertiary struc-
ture predictions of larger and more complicated
systems.
The first example is the fragment correspond-
ing to residues 1-34 of human parathyroid hor-
".~ ~ ~:... ~ ~.,,
mone (PTH). An NMR experiment of PTH(1-34)
.,.,.,-,,.""~ suggested the existence of two c~-helices around
residues from Ser-3 to His-9 and from Set-17 to
Let-28 [28]. Another NMR experiment of a slightly
longer fragment, PTH(1-37), in aqueous solution
Fig. 5: The structure of BPTI(16-36) deduced from X-ray also suggested the existence of the two helices [32].
experiments (a) and the lowest-energy conformation of One of the determined structures, for instance, has
BPTI(16-36) obtained from 20 Monte-Carlo simulated c~-helices in residues from Gln-6 to His-9 and from
annealing runs in aqueous solution represented by
Ser-17 to Lys-27 [32].
solvent-accessible surface area (b).
For PTH(1-34) we performed 20 Monte-Carlo
NMR experiments suggest that this peptide ac- simulated annealing runs of 10,000 MC sweeps
tually forms a f~-sheet structure [40]. The rep- in gas phase (c = 2) with the same protocol as

434
Monte-Carlo simulated annealing in protein folding

in the previous simulations [50]. Many conforma- nealing [34] to compare with the results of the
tions among the 20 final conformations obtained recent NMR experiment in aqueous solution [32].
exhibited c~-helix structures (especially in the N- Ten simulation runs of 100,000 MC sweeps were
terminus area). In Fig. 6 we show the lowest-energy made in gas phase (e = 2) and in aqueous solu-
conformation of PTH(1-34) [50]. tion that is represented by the terms proportional
This conformation indeed has two c~-helices to the solvent-accessible surface area (see (3)). Al-
around residues from Val-2 to Asn-10 (Helix 1) though the results are preliminary, the simulations
and from Met-18 to Glu-22 (Helix 2), which are in gas phase did not produce two helices this time
precisely the same locations as suggested by exper- in contrast to the previous work [50], where a short
iment [28], although Helix 2 is somewhat shorter second helix was observed, as discussed in the pre-
(5 residues long) than the corresponding one (12 vious paragraph. The lowest-energy conformation
residues long) in the experimental data. has an c~-helix from Val-2 to Asn-10. The simula-
tions in aqueous solution, on the other hand, did
observe the two c~-helices. The lowest-energy con-
formation obtained has a-helices from Gln-6 to
His-9 and from Gly-12 to Glu-22. Note that the
elix 2 second helix is now more extended than the first
one in agreement with experiments. This structure
together with one of the NMR structure [32] is
shown in Fig. 7. The figures were again created
with Molscript [29] and Raster3D [2], [35].
Generalized-ensemble simulations of P T H ( 1 -
~ - Helix I
37) are now in progress in order to obtain more
quantitative information such as average helicity
as a function of residue number, etc.
Fig. 6: Lowest-energy conformation of PTH(1-34)
obtained from 20 Monte-Carlo simulated annealing runs in The second example of more complicated sys-
gas phase. tern is the immunoglobulin-binding domain of
streptococcal protein G. This protein is composed
..~-.~--~
"? c
, . ~ ' ~
., :..... "
of 56 amino acids and the structure determined
by an NMR experiment [14] and an X-ray diffrac-
J tion experiment [1] has an a-helix and a ~-sheet.
The c~-helix extends from residue Ala-23 to residue
Asp-36. The /~-sheet is made of four ~-strands:
(b)
from Met-1 to Gly-9, from Let-12 to Ala-20, from
L
Glu-42 to Asp-46, and from Lys-50 to Glu-56.
This structure is shown in Fig. 8a). The figures
in Fig. 8 were again created with Molscript [29]
. - ~ ~ - and Raster3D [2], [35].
We have performed eight Monte-Carlo simu-
lated annealing runs of 50,000 to 400,000 MC
Fig. 7: A structure of PTH(1-37) deduced from NMR sweeps with the sigmoidal, distance-dependent di-
experiments (a) and the lowest-energy conformation of electric function of (2). The lowest-energy confor-
PTH(1-37) obtained from 10 Monte-Carlo simulated mation so far obtained has four a-helices and no
annealing runs in aqueous solution represented by
~-sheet in disagreement with the X-ray structure.
solvent-accessible surface area (b).
This structure is shown in Fig. 8b).
A slightly larger peptide fragment, PTH(1-37), The disagreement of the lowest-energy structure
was also studied by Monte-Carlo simulated an- (Fig. 8b)) so far obtained with the X-ray structure

435
Monte-Carlo simulated annealing in protein folding

(Fig. 8a)) is presumably caused by the poor repre- of some solvent effects is very important for a suc-
sentation of the solvent effects. As can been seen cessful prediction of the tertiary structures of small
in Fig. 8 a), the X-ray structure has both interior peptides and proteins.
where a well-defined hydrophobic core is formed See also: S i m u l a t e d a n n e a l i n g m e t h o d s in
and exterior where it is exposed to the solvent. p r o t e i n folding; P a c k e t annealing; P h a s e
The distance-dependent dielectric function, which p r o b l e m in X - r a y c r y s t a l l o g r a p h y : Shake
mimics the solvent effects only in electrostatic in- and bake approach; Global o p t i m i z a t i o n in
teractions, is therefore not sufficient to represent p r o t e i n folding; M u l t i p l e m i n i m a p r o b l e m
the effects of the solvent here. in p r o t e i n folding: a B B global optimiza-
(a)
tion approach; A d a p t i v e s i m u l a t e d anneal-
ing and its a p p l i c a t i o n to p r o t e i n fold-
ing; Genetic algorithms; M o l e c u l a r struc-
t u r e d e t e r m i n a t i o n : Convex global underes-
timation; Global o p t i m i z a t i o n in L e n n a r d -
• "-- r "
J o n e s and M o r s e clusters; P r o t e i n fold-

t .Z "~'
~ °
(b) ing: G e n e r a l i z e d - e n s e m b l e algorithms; Sim-
u l a t e d annealing; Bayesian global optimi-
zation; R a n d o m search m e t h o d s ; Stochas-
tic global o p t i m i z a t i o n : T w o - p h a s e m e t h -
ods; Global o p t i m i z a t i o n based on statisti-
k, .M. cal models; Stochastic global optimization:
S t o p p i n g rules; G e n e t i c a l g o r i t h m s for pro-
Fig. 8: A structure of protein G deduced from an X-ray
tein s t r u c t u r e prediction; M o n t e - C a r l o sim-
experiment (a) and the lowest-energy conformation of ulations for stochastic optimization.
protein G obtained from Monte-Carlo simulated annealing
runs with the distance-dependent dielectric function (b).
References
[1] ACHARI, A., HALE, S.P., HOWARD, A.J., CLORE,
G.M., GRONENBORN, A.M., HARDMAN, K.D., AND
Conclusions. In this article we have reviewed the-
WHITLOW, M.: '1.67-/~ X-ray structure of the B2
oretical aspects of the protein folding problem. immunoglobulin-binding domain of streptococcal pro-
Our strategy in tackling this problem consists of tein G and comparison to the NMR structure of the
two elements: 1) inclusion of accurate solvent ef- B1 domain', Biochemistry 31 (1992), 10449-10457.
fects, and 2) development of powerful simulation [2] BACON, D., AND ANDERSON, W.F.: 'A fast algorithm
algorithms that can avoid getting trapped in states for rendering space-filling molecular pictures', J. Mol.
Graphics 6 (1988), 219-220.
of energy local minima.
[3] BERG, B.A., AND NEUHAUS, W.: 'Multicanonical al-
We have shown the effectiveness of Monte-Carlo gorithms for first order phase transitions', Phys. Leli.
simulated annealing by showing that direct folding B267 (1991), 249-253.
of (~-helix and ~-sheet structures from randomly- [4] BROOKS, III, C.L.: 'Simulations of protein folding and
generated initial conformations are possible. unfolding', Curt. Opin. Struct. Biol. 8 (1998), 222-226.
[5] BRUNGER, A.T.: 'Crystallographic refinement by simu-
As for the solvent effects, we considered sev- lated annealing: Application to a 2.8/~ resolution struc-
eral methods: a distance-dependent dielectric func- ture of aspartate aminotransferase', J. Mol. Biol. 203
tion, a term proportional to solvent-accessible (1988), 803-816.
surface area, and the reference interaction site [6] CHAKRABARTTY, A., KORTEMME, W., AND BALDWIN,
model (RISM). These methods vary in nature from R.L.: 'Helix propensities of the amino acids measured
in alanine-based peptides without helix-stabilizing
crude but computationally inexpensive (distance-
side-chain interactions', Protein Sci. 3 (1994), 843-852.
dependent dielectric function) to accurate but [7] CHANDLER, D., AND ANDERSEN, H.C.: 'Optimized
computationally demanding (RISM theory). In the cluster expansions for classical fluids. Theory of molec-
present article, we have shown that the inclusion ular liquids', J. Chem. Phys. 57 (1972), 1930-1937.

436
Monte-Carlo simulated annealing in protein folding

Is] CHOU, P.Y., AND FASMAN, G.D.: 'Prediction of pro- Carlo simulated annealing method', Protein Engin. 3
tein conformation', Biochemistry 13 (1974), 222-245. (1989), 85-94.
[9] DAGGETT, V., KOLLMAN, P.A., AND KUNTZ, I.D.: [23] KAWAI, H., OKAMOTO, Y., FUKUGITA, M.,
'Molecular dynamics simulations of small peptides: de- NAKAZAWA, T., AND KIKUCHI, T.: 'Prediction of
pendence on dielectric model and pH', Biopolymers 31 a-helix folding of isolated C-peptide of ribonuclease
(1991), 285-304. A by Monte Carlo simulated annealing', Chem. Lett.
[i0] DEISENHOFER, J., AND STEIGEMANN, W.: 'Crystallo- (1991), 213-216.
graphic refinement of the structure of bovine pancreatic [24] KINOSHITA, M., OKAMOTO, Y., AND HIRATA, F.: 'Cal-
trypsin inhibitor at 1.5/~ resolution', Acta Crystallogr. culation of hydration free energy for a solute with many
331 (1975), 238-250. atomic sites using the RISM theory: robust and effi-
[Ii] DILL, K.: 'The meaning of hydrophobicity', Science cient algorithm', J. Comput. Chem. 18 (1997), 1320-
250 (1990), 297-297. 1326.
[12] EPSTAIN, C.J., GOLDBERGER, R.F., AND ANFINSEN, [25] KINOSHITA, M., OKAMOTO, Y., AND HIRATA, F.: 'Sol-
C.B.: 'The genetic control of tertiary protein struc- vation structure and stability of peptides in aqueous so-
ture: studies with model systems', Cold Spring Harbor lutions analyzed by the reference interaction site model
Symp. Quant. Biol. 28 (1963), 439-449. theory', J. Chem. Phys. 107 (1997), 1586-1599.
[13] GRAHAM, W.H., II, E.S. CARTER, AND HICKS, R.P.: [26] KINOSHITA, M., OKAMOTO, Y., AND HIRATA, F.:
'Conformational analysis of Met-enkephalin in both 'First-principle determination of peptide conformations
aqueous solution and in the presence of sodium dode- in solvents: combination of Monte Carlo simulated an-
cyl sulfate micelles using multidimensional NMR and nealing and RISM theory', J. Amer. Chem. Soc. 120
molecular modeling', Biopolymers 32 (1992), 1755- (1998), 1855-1863.
1764. [27] KIRKPATRICK, S., C.D. GELATT, JR., AND VECCHI,
[14] GRONENBORN, A.M., FILPULA, D.R., Essm, N.Z., M.P.: 'Optimization by simulated annealing', Science
ACHARI, A., WHITLOW, M., WINGFIELD, P.T., AND 220 (1983), 671-680.
CLORE, G.M.: 'A novel, highly stable fold of the im- [2s] KLAUS, W., DIECKMANN, T., WRAY, V., SCHOM-
munoglobulin binding domain of streptococcal protein BURG, D., WINGENDER, E., AND MAYER, H.: 'In-
G', Science 253 (1991), 657-661. vestigation of the solution structure of the human
[15] HANSMANN, U.H.E., AND OKAMOTO, Y.: 'Prediction parathyroid hormone fragment (1-34) by IH NMR
of peptide conformation by multicanonical algorithm: spectroscopy, distance geometry, and molecular dy-
new approach to the multiple-minima problem', J. namics calculations', Biochemistry 30 (1991), 6936-
Comput. Chem. 14 (1993), 1333-1338. 6942.
[16] HANSMANN, U.H.E., AND OKAMOTO, Y.: 'Compara- [29] KRAULIS, P.J.: 'MOLSCRIPT: A program to produce
tive study of multicanonical and simulated annealing both detailed and schematic plots of protein struc-
algorithms in the protein folding problem', Phys. A tures', J. Appl. Crystallogr. 24 (1991), 946-950.
212 (1994), 415-437. [3o] LEVINTHAL, C.: 'Are there pathways for protein fold-
[17] HANSMANN, U.H.E., AND OKAMOTO, Y.: 'Sampling ing?', J. Chem. Phys. 65 (1968), 44-45.
ground-state configurations of a peptide by multi- [31] LI, Z., AND SCHERAGA, H.A.: 'Monte Carlo-
canonical annealing', J. Phys. Soc. Japan 63 (1994), minimzation approach to the multiple-minima prob-
3945-3949. lem in protein folding', Proc. Nat. Acad. Sci. USA 84
[is] HANSMANN, U.H.E., AND OKAMOTO, Y.: 'Tertiary (1987), 6611-6615.
structure prediction of C-peptide of ribonuclease A [32] MARX, U.T., AUSTERMANN, S., BAYER, P., ADER-
by multicanonical algorithm', J. Phys. Chem. B 102 MANN, K., EJCHART, A., STICHT, H., WALTER, S.,
(1998), 653-656. SCHMID, F.-X., JAENICKE, R., FORSSMANN, W.-G.,
[19] HANSMANN, U.H.E., AND OKAMOTO, Y.: 'Effects of AND ROSCH, P.: 'Structure of human parathyroid hor-
side-chain charges on a-helix stability in C-peptide of mone 1-37 in solution', J. Biol. Chem. 270 (1995),
ribonuclease A studied by multicanonical algorithm', 15194-15202.
J. Phys. Chem. B 103 (1999), 1595-1604. [33] MASUYA, M., AND OKAMOTO, Y., in preparation.
[2o] HINGERTY, B.E., RITCHIE, R.H., FERRELL, T., AND [34] MASUYA, M., AND OKAMOTO, V., in preparation.
TURNER, J.E.: 'Dielectric effects in biopolymers: the [35] MERRITT, E.A., AND MURPHY, M.E.P.: 'Raster3D
theory of ionic saturation revisited', Biopolymers 24 version 2.0. A program for photorealistic molecular
(1985), 427-439. graphics', Acta Crystallogr. DS0 (1994), 869-873.
[21] HIRATA, F., AND ROSSKY, P.J.: 'An extended RISM [36] METROPOLIS, N., ROSENBLUTH, A., ROSENBLUTH,
equation for molecular polar fluids', Chem. Phys. Left. M., TELLER, A., AND TELLER, E.: 'Equation of state
83 (1981), 329-334. calculations by fast computing machines', J. Chem.
[22] KAWAI, H., KIKUCHI, T., AND OKAMOTO, Y.: 'A pre- Phys. 21 (1953), 1087-1092.
diction of tertiary structures of peptide by the Monte [37] MOMANY, F.A., McGumE, R.F., BURGESS, A.W.,

437
Monte-Carlo simulated annealing in protein folding

AND SCHERAGA, H.A.: 'Energy parameters in polypep- fragment (1-34) predicted by Monte Carlo simulated
tides. VII. Geometric parameters, partial atomic annealing', Internat. J. Peptide Protein Res. 42 (1993),
charges, nonbonded interactions, hydrogen bond inter- 300-303.
actions, and intrinsic torsional potentials for the natu- [51] OKAMOTO, Y., MASUYA, M., NABESHIMA, M., AND
rally occurring amino acids', J. Phys. Chem. 79 (1975), NAKAZAWA, T.: '~-Sheet formation in BPTI(16-36) by
2361-2381. Monte Carlo simulated annealing', Chem. Phys. Lett.
[38] NAKAZAWA, T., KAWAI, H., OKAMOTO, Y., AND 299 (1999), 17-24.
FUKUGITA, M.: '~-sheet folding of bovine pancreatic [52] OOI, T., OOBATAKE, M., NI3METHY, C., AND SCHER-
trypsin inhibitor fragment (16-36) as predicted by AGA, H.A.: 'Accessible surface areas as a measure of
Monte Carlo simulated annealing', Protein Engin. 5 the thermodynamic parameters of hydration of pep-
(1992), 495-503. tides', Proc. Nat. Acad. Sci. USA 84 (1987), 3086-
[39] AKAZAWA, T., AND OKAMOTO, Y.: 'Electrostatic ef-
N 3090.
fects on the a-helix and 13-strand folding of BPTI(16- [~3] OSTERHOUT, J.J., BALDWIN, R.L., YORK, E.J.,
36) as predicted by Monte Carlo simulated annealing', STEWART, J.M., DYSON, H.J., AND WRIGHT, P.E.:
J. Peptide Res. 54 (1999), 230-236. 'IH NMR studies of the solution conformations of an
[40] NAKAZAWA, T., OKAMOTO,Y., KOBAYASHI, Y., KYO- analogue of the C-peptide of ribonuclea.se A', Biochem-
GOKU, Y., AND AIMOTO, S., in preparation. istry 28 (1989), 7059-7064.
[41] NI~METHY, G., POTTLE, M.S., AND SCHERAGA, H.A." [54] RAMSTEIN, J., AND LAVERY, R.: 'Energetic coupling
'Energy parameters in polypeptides. 9. Updating of ge- between DNA bending and base pair opening', Proc.
ometrical parameters, nonbonded interactions, and hy- Nat. Acad. Sci. USA 85 (1988), 7231-7235.
drogen bond interactions for the naturally occurring [55] SAYLE, R.A., AND MILNER-WHITE, E.J.: 'RasMol:
amino acids', J. Phys. Chem. 87 (1983), 1883-1887. biomolecular graphics for all', TIBS 20 (1995), 374-
[42] NILGES, M., CLORE, G.M., ANDGRONENBORN, A.M.: 376.
'Determination of three-dimensional structures of pro- [56] SHOEMAKER, K.R., KIM, P.S., BREMS, D.N., MAR-
teins from interproton distance data by hybrid distance QUSEE, S., YORK, E.J., CHAIKEN, I.M., STEWART,
geometry-dynamical simulated annealing calculations', J.M., AND BALDWIN, R.L.: 'Nature of the charged-
FEBS Lett. 229 (1988), 317-324. group effect on the stability of the C-peptide helix',
[43] OKAMOTO, Y.: 'Dependence on the dielectric model Proc. Nat. Acad. Sci. USA 82 (1985), 2349-2353.
and pH in a synthetic helical peptide studied by Monte [57] SIPPL, M.J., NI3METHY, G., AND SCHERAGA, H.A."
Carlo simulated annealing', Biopolymers 34 (1994), 'Intermolecular potentials from crystal data. 6. Deter-
529-539. mination of empirical potentials for O-H... O - C hy-
[44] OKAMOTO, Y.: 'Helix-forming tendencies of nonpolar drogen bonds from packing configurations', J. Phys.
amino acids predicted by Monte Carlo simulated an- Chem. 88 (1984), 6231-6233.
nealing', PROTEINS: Struct. Funct. Genet. 19 (1994), [58] WILTON, JR., R.F., DEWAN, J.C., AND PETSKO, G.A.:
14-23. 'Effects of temperature on protein structure and dy-
[45] OKAMOTO, Y.: 'Protein folding problem as studied namics: X-ray crystallographic studies of the protein
by new simulation algorithms', Recent Res. Developm. ribonuclease-A at nine different temperatures from 98
Pure Appl. Chem. 2 (1998), 1-23. to 320 K', Biochemistry 31 (1992), 2469-2481.
[46] OKAMOTO, Y., FUKUGITA, M., NAKAZAWA, T., AND [59] WESSON, L., AND EISENBERG, D.: 'Atomic solvation
KAWAI, H.: 'a-helix folding by Monte Carlo simulated parameters applied to molecular dynamics of proteins
annealing in isolated C-peptide of ribonuclease A', Pro- in solution', Protein Sci. 1 (1992), 227-235.
tein Engin. 4 (1991), 639-647. [6o] WETLAUFER, D.B.: 'Nucleation, rapid folding, and
[47] OKAMOTO, Y., AND HANSMANN, U.H.E.: 'Thermody- globular intrachain regions in proteins', Proc. Nat.
namics of helix-coil transitions studied by multicanon- Acad. Sci. USA 70 (1973), 697-701.
ical algorithms', J. Phys. Chem. 99 (1995), 11276- [61] WILSON, C., AND DONIACH, S.: 'A computer model
11287. to dynamically simulate protein folding: studies with
[48] OKAMOTO, Y., HANSMANN, U.H.E., AND NAKAZAWA, crambin', PROTEINS: Struct. Funct. Genet. 6 (1989),
T.: 'a-Helix propensities of amino acids studied by 193-209.
multicanonical algorithm', Chem. Lett. (1995), 391- [62] WILSON, S.R., AND CUI, W.: 'Conformation searching
392. using simulated annealing': The Protein Folding Prob-
[49] OKAMOTO, Y., KIKUCHI, T., AND KAWAI, H.: 'Pre- lem and Tertiary Structure Prediction, Lecture Notes,
diction of low-energy structures of Met-enkephalin by Birkh~.user, 1994, pp. 43-70.
Monte Carlo simulated annealing', Chem. Left. (1992), [63] WILSON, S.R., CuI, W., MOSKOWITZ, J.W., AND
1275-1278. SCHMIDT, K.E.: 'Conformational analysis of flexible
[50] OKAMOTO, Y., KIKUCHI, T., NAKAZAWA, T., AND molecules: location of the global minimum energy con-
KAWAI, H.: 'a-Helix structure of parathyroid hormone formation by the simulated annealing method', Tetra-

438
Monte-Carlo simulations for stochastic optimization

hedron Lett. 29 (1988), 4373-4376. Here, ~ is the vector of random elements from h, q,
[64] WYCHOFF, H.W., TSERNOGLOU, D., HANSON, A.W., T, and W. A prototypical problem of this nature
KNOX, J.R., LEE, B., AND RICHARDS, F.M.: 'The
is a capacity allocation model under uncertain de-
three-dimensional structure of ribonuclease-S', J. Biol.
Chem. 245 (1970), 305-328. mand and/or capacity availabilities, x is a strate-
gic decision allocating resources while y represents
Yuko Okamoto
an operational recourse decision that is made after
Dept. Theoret. Stud. Inst. Molecular Sci.
and observing the demand and availabilities. Example
Dept. Functional Molecular Sci. applications of this type include capacity expan-
Graduate Univ. Adv. Stud. sion planning in an electric power system [16] and
Okazaki, Aichi 444-8585, Japan in a telecommunications network [61]. The two-
E-mail address: okamotoy©ims, ac. jp
stage model generalizes to a more dynamic, multi-
MSC 2000:92C40 stage model (see, e.g., [10]) in which decisions are
Key words and phrases: simulated annealing, protein fold- made, and random events unfold, over time. For
ing, tertiary structure prediction, c~-helix, f~-sheet.
multistage applications in asset-liability manage-
ment see [13] and in hydro-electric scheduling see
[39].
MONTE-CARLO SIMULATIONS FOR STO-
In the context of a simulation model, ](x,~)
CHASTIC OPTIMIZATION
could represent a performance measure under a
Many important real-world problems contain sto-
design specified by x. For example, f ( x , ~) might
chastic elements and require optimization. Sto-
represent the number of hours in a workday that a
chastic programming and simulation-based optimi-
critical machine is blocked in a queueing network
zation are two approaches used to address this
model of a manufacturing system in which buffer
issue. We do not explicitly discuss other related
sizes are determined by x. In another application,
areas including stochastic control, stochastic dy-
E.L. Plambeck et al. [53] allocate constrained pro-
namic programming, and Markov decision pro-
cessing rates to unreliable machines with buffers
cesses. We consider a stochastic optimization prob-
in a fluid serial queueing network in order to max-
lem of the form
imize steady-state throughput. In nonterminating
(SP) z* = minEf(x,~), simulations, the expectation in El(x, ~) is typically
xEX
with respect to a steady-state distribution.
where x is a vector of decision variables with de-
terministic feasible region X C R d, ~ is a random Note that Ef(x,~) can capture objectives not
usually thought of as a 'mean'. For example, if c
vector, and ] is a real-valued function with finite
represents random rates of return and x invest-
expectation, Ef(x,~), for all x E X. We use x* to
ment amounts, we might want to maximize the
denote an optimal solution to (SP). Note that the
decision x must be made prior to observing the probability of exceeding a return threshold, T. We
realization of ~. can write P(cx > T) - EI(cx > T) where I(.)
is the indicator ]unction that takes value one if
A wide variety of types of problems can be ex-
its argument is true and zero otherwise. For more
pressed as (SP) depending on the definitions of
on probability maximization models (and general-
f and X. Two of the most commonly-used ap-
izations of (SP) in which X contains probabilistic
proaches are rooted in mathematical programming
constraints) see [54]. See [45] for a discussion of
and in discrete-event simulation modeling.
risk modeling in stochastic optimization.
In a two-stage stochastic linear program with
A more general model than (SP) allows the dis-
recourse [6], [14], X is a polyhedral set and f is
tribution of ~ to depend on x. Some simple types
defined as the optimal value of a linear program,
of dependencies can effectively be captured in (SP)
given x and ~, i.e.,
via modeling tricks, such as the x scaling random
min qy elements of T in (1). General dependencies, how-
f ( x , ~ ) - cx + v>_o (1) ever, are difficult to handle. For work on decision-
s.t. Wy- T x + h.
dependent distributions when there are a finite

439
Monte-Carlo simulations .for stochastic optimization

number of possibilities see [26], [40]. S o l u t i o n P r o c e d u r e s . Monte-Carlo methods


Regardless of whether it is defined as the ex- for approximately solving stochastic optimization
pected value of a mathematical program or as problems can typically be classified on the basis
a long-run average performance measure of a of whether the sampling is external to, or inter-
discrete-event simulation model, it is usually im- nal to, the optimization algorithm. Solution pro-
possible to calculate Ef(x,~) exactly even for a cedures of both types are driven by estimates of
fixed value of x. When the dimension of the ran- objective function values and/or gradients. Before
dom vector ~ is relatively low, one approach is turning to solution procedures we briefly discuss
to obtain deterministic approximations of El(x, ~) gradient estimation.
using numerical quadrature or related ideas. In In stochastic programming, gradient (or sub-
stochastic programming, this corresponds to gen- gradient) estimates of Ef(x,~) are typically avail-
erating and refining bounds on Ef(x,~) within able via duality. In simulation-based optimization,
a sequential approximation algorithm [20], [24], the primary methods for obtaining gradient es-
[43]. For problems in which ~ is of moderate-to- timates are finite differences, the likelihood ra-
high dimension and is continuous or has a large tio (LR) method (also called the score function
number of realizations, Monte-Carlo simulation is method) [29], [57], and infinitesimal perturbation
widely regarded as the method of choice for esti- analysis (IPA)[27], [35]. Finite-difference approx-
mating Ff(x,~). As a result, it is not surprising imations require minimal structure, needing only
that Monte-Carlo techniques play a fundamental estimates of Ef(x,~); however, they result in so-
role in solving (SP). lution procedures that can converge slowly. The
In recent years (1999), considerable progress has LR method is more widely applicable than IPA,
been made in solving realistically-sized problems but when both apply the IPA approach tends to
with a significant number of stochastic parameters produce estimators with lower variance. See, for
and decision variables. The telecommunications example, [28] for a discussion of these issues.
model considered in [61] has 86 random point-to- In the simplest form of 'external sampling' (also
point demand pairs and 89 links on which capacity called 'sample-path optimization' [55] and the 'sto-
may be installed. In [53] queueing networks with chastic counterpart' method [57]) we generate in-
up to 50 nodes are studied. Each node represents dependent and identically distributed ([.i.d.) repli-
a machine with random failures and has a deci- cates ~ 1 , . . . , ~n from the distribution of ~ and form
sion variable denoting its assigned cycle time. [53] the approximating problem
also solves a stochastic P E R T (program evaluation n

and review technique) problem with 70 nodes and (SPn) zn rain 1 f(x,~i).
xEX n
i--1
110 stochastic arcs. The arcs model the times re-
quired to complete activities and a decision vari- Even when it is possible to construct (SPn) us-
able associated with each arc influences (param- ing [.i.d. variates, it may be preferable to use an-
eterizes) the distribution of the random activity other sampling scheme in order to reduce the vari-
duration. These problems contain objectives with ance of the resulting estimators. Moreover, in non-
high-dimensional expectations and all were solved terminating simulation models, generating [.i.d.
using Monte-Carlo methods. replicates from a stationary distribution is often
In this article we discuss: impossible (for exceptions see recent work on ex-
act sampling, e.g., [3], [22]), but under appropriate
several types of Monte-Carlo-based solution
conditions we may run the simulation for a length
procedures that can be used for solving (SP);
n and replace the objective function in (SPn) with
ii) methods for testing the quality of a candidate a consistent estimate of the desired long-run aver-
solution ~ E X; age performance measure.
iii) variance reduction techniques used in sto- After constructing an instance of (SPn) we em-
chastic optimization; and ploy a (deterministic) optimization algorithm to
iv) theoretical justification for using sampling. obtain a solution x n.* In the case of stochastic lin-
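As a concrete illustration of the external-sampling scheme just described, the following minimal Python sketch forms and solves an instance of (SPn) for a small newsvendor-style example. The choice of f, the exponential distribution of ξ, the feasible interval X = [0, 100] and the use of scipy.optimize.minimize_scalar are illustrative assumptions and are not taken from this article.

# Minimal external-sampling (sample-average) sketch; f, the distribution
# of xi and X = [0, 100] are assumed for illustration only.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

def f(x, xi, c=1.0, p=4.0):
    # Order x units at cost c, sell min(x, xi) at price p: negative profit.
    return c * x - p * np.minimum(x, xi)

def solve_SPn(n):
    xi = rng.exponential(scale=20.0, size=n)       # i.i.d. replicates xi^1,...,xi^n
    obj = lambda x: np.mean(f(x, xi))              # (1/n) sum_i f(x, xi^i)
    res = minimize_scalar(obj, bounds=(0.0, 100.0), method="bounded")
    return res.x, res.fun                          # x_n*, z_n*

for n in (10, 100, 10000):
    x_star, z_star = solve_SPn(n)
    print(f"n={n:6d}  x_n*={x_star:6.2f}  z_n*={z_star:7.2f}")

Re-solving for several values of n shows how x*_n and z*_n depend on the sample; assessing the quality of such a candidate solution is the subject of a later section.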


ear programming, (SPn) is a large scale linear pro- the estimates need not be unbiased but the bias
gram. The cutting plane algorithm of R.M. Van must effectively shrink to zero as the algorithm
Slyke and R.J-B. Wets [64], its variant with a qua- proceeds. For convergence properties of SA meth-
dratic proximal term [58], and its multistage ver- ods see [49] and for SQG procedures see [23].
sion [7], [9] are powerful tools for solving such prob- Cutting plane methods are applicable when
lems. A cutting plane algorithm with a proximal Ef(x,~) is convex. The iterates {x e} are found by
term and IPA-based gradients is used in an exter- solving a sequence of optimization problems of the
nal sampling method for solving the queueing net- form
work problem in [53]. See [8] for a recent survey
min_{x ∈ X}  max_{ℓ=1,...,L}  Ef(x^ℓ, ξ) + ∇Ef(x^ℓ, ξ)(x − x^ℓ),

of computational methods for stochastic program-
ming instances of (SPn). where L grows as the algorithm proceeds. At
Intuitively, we might expect solutions of (SPn) each iteration a first order Taylor approximation
to more accurately approximate solutions of (SP) of Ef(x,~), i.e., a cutting plane, is computed at
as n increases. We discuss results supporting this the current iterate x g and is used to refine the
in the section 'Theoretical Justification for Sam- piecewise-linear outer approximation of Ef(x,~).
pling'. In addition, after having solved (SPn) to The key idea is that this approximation need
obtain x n* it would be desirable to know whether n only be accurate in the neighborhood of an opti-
was 'large enough'. More generally, we would like mal solution. For stochastic linear programs, G.B.
to be able to test the quality of a candidate so- Dantzig, P.W. Glynn [15], and G. Infanger [37],
lution (such as x*_n). This is discussed in the next [38] and J.L. Higle and S. Sen [32], [34] have de-
section. veloped Monte-Carlo-based cutting plane methods
We now turn to solution procedures based on by using statistical estimates for the cut intercepts
internal sampling. These algorithms adapt deter- and gradients. Dantzig, Glynn, and Infanger use
ministic optimization algorithms by replacing ex- separate streams of observations of ~ to estimate
act function and gradient evaluations with Monte- each cut. The stochastic decomposition algorithm
Carlo estimates. The sampling is internal because of Higle and Sen uses common random number
new observations of ~ are generated on an as- streams to calculate each cut and employs an up-
needed basis at each iteration of the algorithm. dating procedure to ensure that the statistical cuts
We briefly discuss stochastic adaptations of steep- are asymptotically valid (i.e., lie below Ef(x,~)).
est descent and cutting plane methods. Relative to SA and SQG methods, cutting plane
A deterministic steepest descent algorithm for procedures avoid potentially difficult projections
and, in practice, have a reputation for converging more quickly, particularly when X is high dimensional.
Grid search and optimization of metamodels are two common approaches to optimizing system per-
(SP) forms iterates {x^ℓ} using the recursion

x^{ℓ+1} = Π_X ( x^ℓ − ρ_ℓ ∇Ef(x^ℓ, ξ) ),

where Π_X performs a projection onto X and {ρ_ℓ} are
steplengths. It is usually impossible to calculate formance in discrete-event simulation models. In
VEf(x,~) exactly and it must be estimated. Sto- grid search, X is replaced by a 'grid' of points
chastic approximation (SA) and stochastic quasi- X m - - { x l , . . . , x m } and sample-mean estimates
f_n(x) = (1/n) ∑_{i=1}^{n} f(x, ξ^i)

gradient (SQG) algorithms are stochastic variants of a steepest descent search. The Kiefer-Wolfowitz SA method uses unbiased estimates of Ef(x, ξ) to
form finite-difference approximations of the gra- are formed at each x E Xm. (SP) is then approx-
dient. The Robbins-Monro SA procedure requires imately solved by z_n = min_{x ∈ X_m} f_n(x) with x_n
unbiased estimates of VEf(x, ~). SQG methods do being the associated minimizer. Grid search is at-
not require that Ef(x, ξ) be differentiable and work tractive because it requires minimal structure, but
under more general assumptions concerning the es- in implementing this procedure, we must exercise
timates of (sub)gradients of Ef (x, ~). In particular, care in selecting m and n. With independent sam-
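The following sketch illustrates the kind of projected stochastic approximation iteration discussed above, with Kiefer-Wolfowitz style central finite differences built from noisy function evaluations. The quadratic test function, the noise model, the box X and the step-size rules are all illustrative assumptions, not part of the article.

# Projected stochastic approximation with Kiefer-Wolfowitz style
# finite-difference gradient estimates (illustrative sketch only).
import numpy as np

rng = np.random.default_rng(1)

def noisy_f(x):
    # Unbiased estimate of Ef(x, xi) for the assumed test case Ef = ||x - 1||^2.
    return np.sum((x - 1.0) ** 2) + rng.normal(scale=0.5)

def project_box(x, lo=-5.0, hi=5.0):
    return np.clip(x, lo, hi)              # projection Pi_X onto X = [lo, hi]^d

def kiefer_wolfowitz(x0, iters=2000):
    x = np.array(x0, dtype=float)
    d = x.size
    for ell in range(1, iters + 1):
        rho = 1.0 / ell                    # step lengths rho_ell
        c = 1.0 / ell ** 0.25              # finite-difference width
        g = np.zeros(d)
        for j in range(d):                 # central differences, coordinate by coordinate
            e = np.zeros(d); e[j] = c
            g[j] = (noisy_f(x + e) - noisy_f(x - e)) / (2.0 * c)
        x = project_box(x - rho * g)
    return x

print(kiefer_wolfowitz(np.array([4.0, -3.0])))   # iterates approach (1, 1)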


piing at each grid point, K.B. Ensor and Glynn [21] almost surely optimal to (SP). Next, it is desirable
consider the rate at which n must grow relative to to have a statement regarding the rate of conver-
m in order to achieve consistency and they also gence and an associated asymptotic distribution.
discuss the method's limiting behavior when the These consistency and limiting distribution results
rate of growth is at (and slower than) the critical are aimed at justifying sampling-based methods
rate. and may be viewed as establishing solution qual-
A metamodel can be used to approximate a ity. However, the approach discussed in this sec-
more complex simulation model which, in turn, tion centers on the question: Given a candidate
is an approximation of the real system. In such solution ~ E X, what can be said regarding its
a metamodel, estimates of Ef(x,~) are formed at quality? Because candidate solutions may be ob-
each point in a set specified by an experimental tained by internal or external sampling schemes
design, and the parameters of the postulated re- or via another, heuristic, method, procedures that
sponse surface are fit to these observed values. The can directly test the quality of ~, regardless of its
resulting function is then optimized with respect origin, are very attractive.
to x. For more on metamodels see, e.g., [11], [47]. One natural way of defining solution quality is
The review in [25] includes optimization using re- by the optimality gap, Ef(~,~) - z*. An optimal
sponse surfaces, and metamodeling has also been solution has an optimality gap of zero, but in our
applied in stochastic programming [5]. setting we hope to make probabilistic statements
The grid-search and metamodel approaches are such as
classified as external sampling procedures if the
P{Ef(x̂, ξ) − z* ≤ ε} ≥ α, (2)
procedure is executed once. However, it may be
desirable to refine the grid (or the region covered where ε is a random confidence interval width and
by the experimental design) in the neighborhood α is a confidence level, e.g., α = 0.95. Unfortu-
of promising values of x and repeat the methodol- nately, exact confidence intervals such as (2) can
ogy. When it is adaptively repeated in this fashion be difficult to obtain even in relatively simple sta-
the procedure is classified as an internal sampling tistical settings so we attempt to construct approx-
method. imate confidence intervals
We have not explicitly discussed approaches for P{Ef(x̂, ξ) − z* ≤ ε} ≈ α. (3)
when X is discrete. These range from methods for To form a confidence interval (3) for Ef(x̂, ξ) −
selecting the best design in simulation to those To form a confidence interval (3) for Ef (~, ~) -
for solving stochastic integer programming mod- G_n = Ū_n − L_n that is expressed as the difference
els. Finally, sampling-based procedures for multi- Gn - Un - - L n that is expressed as the difference
stage stochastic programs have been proposed in satisfies EG_n ≥ Ef(x̂, ξ) − z*.
[17]. satisfies EGn >_ E J ' ( ~ , ~ ) - z*.
In many problems it is relatively straight-
Establishing Solution Quality. Establishing forward to estimate the performance of a sub-
solution quality is a key concept when using an optimal decision ~ via simulation. For exam-
ple, the standard sample mean estimator, Ū_n = (1/n) ∑_{i=1}^{n} f(x̂, ξ^i), provides an unbiased estimate of

approximation scheme to solve an optimization problem. When applying Monte-Carlo techniques
to (SP), the best we can expect are probabilistic the expected cost of using decision ~, i.e., Ef(~, ~).
quality statements. In the context of external sam- To construct a confidence interval for the opti-
pling, there has been significant work on studying mality gap we also want an estimate of z*. How-
the behavior of solutions to (SPn) for large sam- ever, unbiased estimates of z* are difficult to ob-
ple sizes (see the last section). There are analogous tain so an estimator Ln that satisfies ELn <_ z*
convergence results for algorithms based on inter- is used. In [51] it is shown that if the objective
nal sampling. Such results take a number of forms in (SPn) is an unbiased estimate of Ef(x,~) then
but perhaps the most fundamental is to show that Ez n _< z*, i.e., z n* is one possible lower bound esti-
limit points of the sequence of solutions are, say, mator Ln. Higle and Sen [33] perform a Lagrangian


relaxation of a reformulation of (SPn) which uses structures of f(x,~). Suppose that we have rx(~),
explicit 'nonanticipativity' constraints. The result- with known mean #r, which is believed to ap-
ing lower bound is weaker in expectation than z n proximate (be positively correlated with) f(x,~).
but has the computational advantage that the op- In CVs we attempt to 'subtract out' variation by
timization problem separates by scenario. generating observations of [ f ( x , ~ ) - Fx(~)] + #r,
Once observations of Gn can be formed, we can which has the same expectation as f(x,~). (It
appeal to the batch means method and use the is common to incorporate a multiplicative fac-
central limit theorem [51], or a nonparametric ap- tor with the control variate Fx(~) and also pos-
proach [31], [33], to construct approximate con- sible to use multiple controls.) In IS we attempt
fidence intervals (3). Another approach to exam- to reduce variance by generating observations of
ining solution quality is to test the null hypoth- #r [f (z, ~)/Fx(~)]. In CVs observations of ~ are
esis that the (generalized) Karush-Kuhn-Tucker generated from its original distribution. However,
(KKT) optimality conditions are satisfied; see [63]. in IS the expected value of the ratio is not the ratio
Higle and Sen [31] also consider the KKT condi- of expectations and, as a result, there is a change
tions but use them to derive bounds on the opti- of measure induced by Fx that is required to yield
mality gap. an unbiased estimate. Under the new IS distribu-
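The gap-estimation idea described in this section can be sketched as follows; the newsvendor-style f, the batching scheme and all parameter values are illustrative assumptions. Each batch here uses the same sample for the upper-bound estimate (the sample mean at x̂) and the lower-bound estimate (the optimal value of a small sample-average problem), so that the batch means estimate a quantity whose expectation is at least Ef(x̂, ξ) − z*.

# Monte-Carlo optimality-gap sketch for a candidate x_hat (illustrative only).
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import t

rng = np.random.default_rng(2)

def f(x, xi, c=1.0, p=4.0):
    return c * x - p * np.minimum(x, xi)

def solve_SPn(xi):
    obj = lambda x: np.mean(f(x, xi))
    return minimize_scalar(obj, bounds=(0.0, 100.0), method="bounded").fun

def gap_confidence_interval(x_hat, n_batch=20, m=500, alpha=0.95):
    gaps = []
    for _ in range(n_batch):                   # batch means over i.i.d. batches
        xi = rng.exponential(scale=20.0, size=m)
        upper = np.mean(f(x_hat, xi))          # unbiased estimate of Ef(x_hat, xi)
        lower = solve_SPn(xi)                  # sample z_m*, a lower-bound estimator
        gaps.append(upper - lower)
    gaps = np.array(gaps)
    half = t.ppf(alpha, n_batch - 1) * gaps.std(ddof=1) / np.sqrt(n_batch)
    return gaps.mean(), gaps.mean() + half     # point estimate and one-sided limit

print(gap_confidence_interval(x_hat=27.0))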
tion, we are more likely to sample ~ where Fx(~) is
V a r i a n c e R e d u c t i o n T e c h n i q u e s . When apply- large, i.e., scenarios that our approximation func-
ing the 'crude' Monte-Carlo method to estimate tion predicts have high cost. In an IS scheme for
Ef(x,~) for fixed x, we use the standard sample stochastic linear programs, [15], [37] use an ap-
mean estimator based on i.i.d, terms, proximation function that is separable in the com-
n
ponents of ~ while [48] utilizes a piecewise-linear
1 E f(x ~.i). approximation. See [12] for the solution of a sto-
n
i=1 chastic optimization problem to price American-
The error associated with this estimate is propor- style financial options using the simpler European
tional to option as a control variate. These papers report
significant variance reduction in computational re-
[var f (x, ~) ] 1/2 sults.
. (4)
Other VRTs exploit correlation structures in the
This error can be decreased by increasing the sam- solution methodology. Common random numbers
ple size. However, obtaining an additional digit of (CRNs) are often used in simulation when com-
accuracy requires increasing the sample size by paring the performance of two systems. The use of
a factor of 100. If f is defined as the optimal CRNs has been suggested in a stochastic approx-
value of a mathematical program or as the per- imation method with finite differences where the
formance measure of a simulation model, increas- same stream is used for the forward and backward
ing the number of evaluations of f in this fashion point estimates [50]. The upper and lower bounds
can be prohibitively expensive. Variance reduction used to determine solution quality (see the previ-
techniques (VRTs) effectively decrease the numer- ous section) may be viewed as two 'systems' and
ator in (4) instead of increasing the denomina- the use of CRNs in estimating their difference has
tor. Many problems for which crude Monte-Carlo been advocated in [34], [51]. In order to reduce
would yield useless results are instead made com- the error in the resulting response surface, various
putationally tractable via VRTs. As described in methods have been proposed for generating the
the section 'Solution Procedures', sampling is also streams of observations of ~ at each point in the ex-
used to estimate VEf(x,~), but for simplicity we perimental design. The Schruben-Margolin scheme
primarily restrict our attention to VRTs for esti- [59] uses a mixture of CRNs and antithetic variates
mating El(x, ~). and an extension [65] also incorporates CVs.
Some VRTs, including control variates (CVs) Another group of VRTs attempts to more reg-
and importance sampling (IS), exploit special ularly spread the sampled observations over the
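A minimal control-variate sketch in the spirit of the CV discussion above follows. The cost f(x, ξ), the control Γ_x(ξ) = min(x, ξ), its exact mean under the assumed exponential distribution, and the regression estimate of the multiplicative factor are all illustrative choices rather than part of the article.

# Control-variate sketch: estimate Ef(x, xi) using a control with known mean.
import numpy as np

rng = np.random.default_rng(3)
n, x = 10000, 25.0

xi = rng.exponential(scale=20.0, size=n)
fx = 1.0 * x - 4.0 * np.minimum(x, xi) - 0.5 * np.sqrt(xi)   # assumed f(x, xi)
gamma = np.minimum(x, xi)                                    # control Gamma_x(xi)
mu_gamma = 20.0 * (1.0 - np.exp(-x / 20.0))                  # exact E[min(x, xi)]

crude = fx.mean()
cov = np.cov(fx, gamma)
beta = cov[0, 1] / cov[1, 1]                # multiplicative factor for the control
cv = np.mean(fx - beta * (gamma - mu_gamma))   # same expectation, lower variance

print("crude estimate:          ", crude)
print("control-variate estimate:", cv)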


support of ~. Such techniques include stratified


said to epiconverge to ¢ (written Cn ~ ¢) if the
sampling and Latin hypercube sampling as well
epigraphs of Cn, {(x,/~): ~ >_ Cn(X)}, converge to
as quasi-Monte-Carlo techniques in which the se-
that of ¢. Epiconvergence is weaker than classi-
quence of observations is deterministic. Empiri-
cal uniform convergence. P. Kall [41].provides an
cal results in [30] for two-stage stochastic linear
excellent review of various types of convergence,
programming compare the variance reduction ob-
their relations, and their implications for approxi-
tained by stratified sampling, antithetic variates,
mations of optimization models. Epiconvergence is
IS, and CVs and suggest that a CV procedure per-
a valuable property because of the following result:
forms relatively well, particularly on high-variance
problems. THEOREM 1 Suppose Cn ~-~ ¢. If ~ is an accumu-
lation point of {x~}, where x n e argminCn(z),
T h e o r e t i c a l J u s t i f i c a t i o n for S a m p l i n g . In the then ~ E argmin ¢(x).
section 'Solution Procedures' we formed an ap-
Constrained optimization is captured in this re-
proximating problem for external sampling pro-
sult because Cn and ¢ are defined to be extended-
cedures by using the sample mean estimator of
real-valued functions that take value +c~ at infea-
Ef (x, ~). Here we redefine (SPn) as
sible points. While it is possible that the sequence
(SPn) Zn -- m i n E n f (X, ~), of optimizers {xn} has no accumulation points,
xEX
this potential difficulty is avoided if the feasible
with x n again denoting an optimal solution. In
region X is compact (i.e., closed and bounded).
(SP) the expected value operator E is with respect
to the 'true' probability measure P while in (SPn), Because of the implications of epiconvergence,
En is with respect to a measure Pn that is a sta- there is considerable interest in determining suf-
tistical estimate of P. If Monte-Carlo methods are ficient conditions on f, Pn, and P under which
used to generate i.i.d, replicates from P then Pn is E n f ( x , ~ ) ~ El(x, ~), a.s. Note that because {Pn}
the associated (random) empirical measure. are random measures, the epiconvergence of the
Since z n* is an estimator of z* and x n$ an esti- approximating functions is with probability one
mator of an optimal solution to (SP), it is natu- (also called epiconsistency). Under this hypothesis
ral to study the behavior of these estimators for the accumulation points of {x~} are almost surely
large sample sizes. For example, under what con- optimal to (SP); see [19].
ditions do we obtain consistency and what can be Sufficient conditions for achieving Enf(x,~)
said concerning rates of convergence? Positive an- Ef(x,~), a.s. are examined in [19], [42], [55], and
swers to such questions provide theoretical justi- [56]. Roughly speaking, we will obtain epiconsis-
fication for employing external Monte-Carlo sam- tency if f is sufficiently smooth, Pn converges
pling techniques to solve (SP). weakly to P with probability one, and the tails
In general, (SPn) and (SP) may have multi- of the distributions are well-behaved relative to f.
ple optimal solutions and so we cannot expect See [2], [60] for results when f is discontinuous.
{x n} to converge. Instead, establishing consistency For two-stage stochastic programming in which
of x n amounts to showing that the accumulation the recourse matrix W in (1) is deterministic and
points of the sequence are almost surely optimal Pn is the empirical measure, [46] contains consis-
to (SP). If, for example, the samples are i.i.d. tency results under modest assumptions. We note
then by the strong law of large numbers we have that is possible to develop consistency results using
En f (x, ~ ) --+ Ef (x, ~ ), a.s., for all x. Unfortunately, other (stronger) types of convergence of E ~ f ( x , ~)
this does not ensure that {x~} has accumulation to El(x,(); see, for example, [52].
points that are optimal to (SP) and that z n$ --+ z $ , There is a large literature on consistency, stabil-
[4]. ity, and rates of convergence for solutions of (SPn).
The notion of epiconvergence plays a fundamen- Much of this work may be viewed as generalizing
tal role in establishing consistency results for x~ earlier results on Constrained m a x i m u m likelihood
and z*; see [4]. A sequence of functions {¢n} is estimation in [1] and [36]. Under restrictive as-


sumptions, asymptotic normality for V~(z n - z*) ZIEMBA, W.T.: 'The Russell-Yasuda Kasia model: An
and x / ~ ( x ~ - x*) may be obtained, e.g., [19]. How- asset/liability model for a Japanese insurance company
using multistage stochastic programming', Interfaces
ever, when inequality constraints in X play a non-
24 (1994), 29-49.
trivial role we cannot, in general, expect to ob- [14] DANTZIG, G.B.: 'Linear programming under uncer-
tain limiting distributions that are normal [44], tainty', Managem. Sci. 1 (1955), 197-206.
[62], [18]. See [44] for a limiting distribution for [15] DANTZIG, G.B., AND GLYNN, P.W.: 'Parallel proces-
v ~ ( x ~ - x*) that is the solution of a (random) sors for planning under uncertainty', Ann. Oper. Res.
quadratic program. 22 (1990), 1-21.
[16] DANTZIG, G.B., GLYNN, P.W., AVRIEL, M., STONE,
See also: M o n t e - C a r l o s i m u l a t e d a n n e a l i n g J.C., ENTRIKEN, R., AND NAKAYAMA, M.: 'Decompo-
in protein folding. sition techniques for multi-area generation and trans-
mission planning under uncertainty', Report Electric
References Power Res. Inst., no. EPRI 2940-1 (1989).
[1] AtTCHISON, J., AND SILVEY, S.D.: 'Maximum- [17] DEMPSTER, M.A.H., AND THOMPSON, R.T.: 'EVPI-
likelihood estimation of parameters subject to re- based importance sampling solution procedures for
straints', Ann. Math. Statist. 29 (1958), 813-828. multistage stochastic linear programmes on parallel
[2] ARTSTEIN, Z., AND WETS, R.J.-B.: 'Stability results MIMD architectures', Ann. Oper. Res. 90 (1999), 161-
for stochastic programs and sensors, allowing for dis- 184.
continuous objective functions', SIAM J. Optim. 4 [18] DUPACOVA, J." 'On non-normal asymptotic behavior
(1994), 537-550. of optimal solutions for stochastic programining prob-
[3] ASMUSSEN~ S., GLYNN, P.W.~ AND THORISSON~ H.: lems and on related problems of mathematical statis-
'Stationary detection in the initial transient problem', tics', Kybernetika 27 (1991), 38-51.
A CM Trans. Modeling and Computer Simulation 2 [19] DUPA(~OVJ~, J., AND WETS, R.J.-B." 'Asymptotic be-
(1992), 130-157. havior of statistical estimators and of optimal solutions
[4] ATTOUCH, H., AND WETS, R.J.-B.: 'Approxima- of stochastic optimization problems', Ann. Statist. 16
tion and convergence in nonlinear optimization', in (1988), 1517-1549.
O. MANGASARIAN, R. MEYER, AND S. ROBINSON [20] EDIRISINGHE, C., AND ZIEMBA, W.W.: 'Implement-
(eds.): Nonlinear Programming, Vol. 4, Acad. Press, ing bounds-based approximations in convex-concave
1981, pp. 367-394. two-stage stochastic programming', Math. Program. 75
[5] BAILEY, T.G., JENSEN, P.A., AND MORTON, D.P.: (1996), 295-325.
'Response surface analysis of two-stage stochastic lin- [21] ENSOR, K.B., AND GLYNN, P.W.: 'Stochastic optimi-
ear programming with recourse', Naval Res. Logist. 46 zation via grid search', in G.G. YIN AND Q. ZHANG
(1999), 753-778. (eds.): Mathematics o] Stochastic Manufacturing Sys-
[6] BEALE, E.M.L.: 'On minimizing a convex function sub- tems, Vol. 33 of Lect. Applied Math., Amer. Math. Sou.,
ject to linear inequalities', J. Royal Statist. Soc. 17B 1997, pp. 89-100.
(1955), 173-184. [22] ENSOR, K.B., AND GLYNN, P.W.: 'Simulating the
[7] BINGE, J.R.: 'Decomposition and partitioning meth- maximum of a random walk', J. Statist. Planning In-
ods for multistage stochastic linear programs', Oper. ference 85 (2000), 127-135.
Res. 33 (1985), 989-1007. [23] ERMOLIEV, Y.: 'Stochastic quasigradient methods', in
[8] BIRGE, J.R.: 'Stochastic programming computation Y. ERMOLIEV AND R.J.-B. WETS (eds.): Numeri-
and applications', INFORMS J. Comput. 9 (1997), cal Techniques for Stochastic Optimization, Springer,
111-133. 1988, pp. 141-185.
[9] BINGE, J.R., DONOHUE, C.J., HOLMES, D.F., AND [24] FRAUENDORFER, K.: Stochastic two-stage program-
SVINTSITSKI, O.G.: 'n parallel implementation of the ming, Vol. 392 of Lecture Notes Economics and Math.
nested decomposition algorithm for multistage stochas- Systems, Springer, 1992.
tic linear programs', Math. Program. 75 (1996), 327- [25] Fu, M.C.: 'Optimization via simulation: A review',
352. Ann. Oper. Res. 53 (1994), 199-248.
[10] BIRGE, J.R., AND LOUVEAUX, F.: Introduction to sto- [26] FUTSCHIK, A., AND PFLUG, G.CH.: 'Optimal alloca-
chastic programming, Springer, 1997. tion of simulation experiments in discrete stochastic
[11] Box, G.E.P., AND DRAPER, N.R.: Empirical model- optimization and approximative algorithms', Europ. J.
building and response surfaces, Wiley, 1987. Oper. Res. 101 (1997), 245-260.
[12] BROADIE, M., AND GLASsERMAN, P.: 'Pricing [27] GLASsERMAN, P.: Gradient estimation via perturbation
American-style options using simulation', J. Econom. analysis, Kluwer Acad. Publ., 1991.
Dynam. Control 21 (1997), 1323-1352. [28] GLYNN, P.W.: 'Optimization of stochastic systems via
[13] CARINO, D.R., KENT, T., MEYERS, D.H., STACY, C., simulation': Proc. 1989 Winter Simulation Conf., 1989,
SYLVANUS, M., TURNER, A.L., WATANABE, K., AND


pp. 90-105. chastic programming', Math. Oper. Res. 18 (1993),


[29] GLYNN, P.W.: 'Likelihood ratio gradient estimation for 148-162.
stochastic systems', Comm. ACM 33, no. 10 (1990), [45] KING, A.J., TAKRITI, S., AND AHMED, S.: 'Issues in
75-84. risk modeling for multi-stage systems', IBM Res. Re-
[30] HIGLE, J.L.: 'Variance reduction and objective func- port R C 20993 (1997).
tion evaluation in stochastic linear programs', IN- [46] KING, A.J., AND WETS, R.J.-B.: 'Epi-consistency of
FORMS J. Comput. 10 (1998), 236-247. convex stochastic programs', Stochastics 34 (1991),
[31] HIGLE, J.L., AND SEN, S.: 'Statistical verification of 83-91.
optimality conditions for stochastic programs with re- [47] KLEIJNEN, J.P.C., AND GROENENDAAL, W. VAN: Sim-
course', Ann. Oper. Res. 30 (1991), 215-240. ulation: A statistical perspective, Wiley, 1992.
[32] HIGLE, J.L., AND SEN, S.: 'Stochastic decomposition: [4s] KRISHNA, A.S.: 'Enhanced algorithms for stochastic
An algorithm for two-stage linear programs with re- programming', SOL Report Dept. Oper. Res. Stanford
course', Math. Oper. Res. 16 (1991), 650-669. Univ. 93-8 (1993).
[33] HIGLE, J.L., AND SEN, S.: 'Duality and statistical tests [49] KUSHNER, H.J., AND YIN, G.G.: Stochastic approxi-
of optimality for two stage stochastic programs', Math. mation algorithms and applications, Springer, 1997.
Program. 75 (1996), 257-275. [5o] L'ECUYER, P., GIROUX, N., AND GLYNN, P.W.: 'Sto-
[34] HIGLE, J.L., AND SEN, S.: Stochastic decomposition: A chastic optimization by simulation: numerical experi-
statistical method for large scale stochastic linear pro- ments with the M/M/1 queue in steady-state', Man-
gramming, Kluwer Acad. Publ., 1996. agem. Sci. 40 (1994), 1245-1261.
[35] HO, Y.C., AND CAO, X.R.: Perturbation analysis of [51] MAK, W.K., MORTON, D.P., AND WOOD, R.K.:
discrete event dynamic systems, Kluwer Acad. Publ., 'Monte Carlo bounding techniques for determining so-
1991. lution quality in stochastic programs', Oper. Res. Left.
[36] HUBER, P.J.: 'The behavior of maximum likelihood 24 (1999), 47-56.
estimates under nonstandard conditions': Proc. Fifth [52] PFLUG, G.CH., RUSZCZYI~SKI, A., AND SCHULTZ, R"
Berkeley Symp. Math. Statistics and Probab., 1967, 'On the Glivenko-Cantelli problem in stochastic pro-
pp. 221-233. gramming: Linear recourse and extensions', Math.
[37] INFANGER, G.: 'Monte Carlo (importance) sampling Oper. Res. 23 (1998), 204-220.
within a Benders decomposition algorithm for stochas- [53] PLAMBECK, E.L., FU, B.-R., ROBINSON, S.M., AND
tic linear programs', Ann. Oper. Res. 39 (1992), 69-95. SURI, R.: 'Sample-path optimization of convex stochas-
[3s] INFANGER, G.: Planning under uncertainty: Solving tic performance functions', Math. Program. 75 (1996),
large-scale stochastic linear programs, Sci. Press Set. 137-176.
Boyd ~ Fraser, 1993. [54] PRI~KOPA, A." Stochastic programming, Kluwer Acad.
[39] JACOBS, J., FREEMAN, G., GRYGIER, J., MORTON, Publ., 1995.
D., SCHULTZ, G., STASCHUS, K., AND STEDINGER, [55] ROBINSON, S.M.: 'Analysis of sample-path optimiza-
J.: 'SOCRATES: A system for scheduling hydroelec- tion', Math. Oper. Res. 21 (1996), 513-528.
tric generation under uncertainty', Ann. Oper. Res. 59 [56] ROBINSON, S.M., AND WETS, R.J.-B.: 'Stability in
(1995), 99-133. two-stage stochastic programming', SIAM J. Control
[4o] JONSBR.~TEN, T.W., WETS, R.J.-B., AND Optim. 25 (1987), 1409-1416.
WOODRUFF, D.L.: 'A class of stochastic programs [57] RUBINSTEIN, R.Y., AND SHAPIRO, A.: Discrete event
with decision dependent random elements', Ann. systems: Sensitivity and stochastic optimization by the
Oper. Res. 82 (1998), 83-106. score function method, Wiley, 1993.
[41] KALL, P.: 'Approximation to optimization problems: [ss] RUSZCZYI<ISKI, A." 'A regularized decomposition
An elementary review', Math. (]per. Res. 11 (1986), method for minimizing a sum of polyhedral functions',
9-18. Math. Program. 35 (1986), 309-333.
[42] KALL, P.: 'On approximations and stability in sto- [59] SCHRUBEN, L.W., AND MARGOLIN, B.H.: 'Pseudo-
chastic programming', in J. GUDDAT, H.TH. JON- random number assignment in statistically designed
GEN, B. KUMMER, AND F. NO~I~KA (eds.): Parametric simulation and distribution sampling experiments', J.
Optimization and Related Topics, Akad. Verlag, 1987, Amer. Statist. Assoc. 73 (1978), 504-525.
pp. 387-407. [60] SCHULTZ, R.: 'On structure and stability in stochas-
[43] KALL, P., RUSZCZYI<ISKI, A., AND FRAUENDoRFER, tic programs with random technology matrix and com-
K.: 'Approximation techniques in stochastic program- plete integer recourse', Math. Program. 70 (1995), 73-
ming', in Y. ERMOLIEV AND R.J.-B. WETS (eds.): 89.
Numerical Techniques for Stochastic Optimization, [61] SEN, S., DOVERSPIKE, R.D., AND COSARES, S.: 'Net-
Springer, 1988, pp. 33-64. work planning with random demand', Telecommunica-
[44] KING, A.J., AND ROCKAFELLAR, R.T.: 'Asymptotic tion Systems 3 (1994), 11-30.
theory for solutions in statistical estimation and sto- [62] SHAPIRO, A.: 'Asymptotic properties of statistical es-


timators in stochastic programming', Ann. Statist. 17 (1989), 841-858.
[63] SHAPIRO, A., AND MELLO, T. HOMEM DE: 'A

(T2)   yᵀA + vᵀB = 0,   yᵀa + vᵀb ≥ 0,   y ≥ 0,   v ≥ 0,   v ≠ 0,
simulation-based approach to two-stage stochastic pro-
respectively.
gramming with recourse', Math. Program. 81 (1998),
301-325. In other words, when one has a solution of
[64] SLYKE, R.M. VAN, AND WETS, R.J.-B.: 'L-Shaped ((T1)) or of ((T2)) this solution is a certificate for
linear programs with applications to optimal control the fact that the given system ((S)) is infeasible,
and stochastic programming', SIAM J. Appl. Math. 17 i.e., has no solution.
(1969), 638-663.
It makes sense to formulate two most useful
[6s] TEW, J.D.: 'Simulation metamodel estimation using
a combined correlation-based variance reduction tech- principles following from the theorem.
nique for first and higher-order metamodels', Europ. J. THEOREM 1 (Principle A ) T h e system ((S))is in-
Oper. Res. 87 (1995), 349-367.
feasible if and only if one can combine the inequal-
David P. Morton
ities in ((S)) in a linear fashion (i.e., multiply each
Oper. Res. and Industrial Engin. Univ. Texas at Austin
inequality with a nonnegative number and add the
Austin, Texas, USA
E-mail address: morton©mail, utexas, edu
results) to get the contradictory inequality 0 > 0
Elmira Popova (or0>_ 1). [-1
Oper. Res. and Industrial Engin. Univ. Texas at Austin To see that this is exactly what the M T T says, let
Austin, Texas, USA
y and v denote nonnegative vectors of appropriate
E-mail address: elmira©mail, utexas, edu
sizes. Then the inequality
MSC2000: 90C15, 65C05, 65K05, 90C31, 62F12
Key words and phrases: stochastic programming, (yᵀA + vᵀB) x ≥ yᵀa + vᵀb (1)
simulation-based optimization, Monte-Carlo method.
is a consequence of the inequalities in ((S)), and if
the vector v is not the zero vector, then also the
MOTZKIN TRANSPOSITION THEOREM, stronger inequality
MTT
Motzkin's transposition theorem (MTT) [1] is a so- (yTA + vTB) x > yTa + vTb (2)
called theorem of the alternative (cf. L i n e a r op-
is a consequence of ((S)). The inequalities (1) and
t i m i z a t i o n : T h e o r e m s of t h e a l t e r n a t i v e ) . It
(2) have certainly solutions if yTA + vTB ~ O. But
deals with the question whether or not a given
if yTA + vTB -- 0 then (1) yields a contradiction
system of linear inequalities has a solution. In the
if yTa + vTb > 0 and (2) if yTa + vTb > O. The
most general case such a system has the form
first case occurs if ((Wl)) has a solution and the
(S)   Ax ≥ a,   Bx > b, second case if ((T2)) has a solution.
where A and B are matrices of size m × n and The second principle is:
p × n, respectively, and where Ax >_ a contains
THEOREM 2 (Principle B) If ((S)) is feasible, then
the 'larger than or equal' inequalities and Bx > b
a linear inequality is a consequence of the inequal-
the 'larger than' inequalities. Note that inequali-
ities in ((S)) if and only if it can be obtained by
ties of the opposite type ('smaller than or equal' or
combining, in a linear fashion, the inequalities in
'smaller than') can be turned into the appropriate
((S)) and the trivial inequality 0 _> - 1 . K]
form by multiplying them by - 1 .
The Motzkin transposition theorem states that This principle can be understood in a similar way"
the system ((S)) has no solution if and only if at If ((S)) is feasible, then c Tx _> z is an implied
least one of the systems ((T1)) and ((T2)) has a inequality if and only if
solution, where the latter systems are given by
Ax ≥ a,   Bx > b   ⟹   cᵀx ≥ z,
which is equivalent to the system

(T1)   yᵀA + vᵀB = 0,   yᵀa + vᵀb > 0,   y ≥ 0,   v ≥ 0,
and Ax ≥ a,   Bx > b,   −cᵀx ≥ −z


being infeasible. By Principle A this happens if has no solutions. By the M T T this is the case if
and only if there exist nonnegative vectors y and and only if at least one of the systems
(T1z)   yᵀA − y₀c = 0,   yᵀb − y₀z > 0,   y ≥ 0,   y₀ ≥ 0,

v and a nonnegative scalar λ such that

(yᵀA + vᵀB − λc) x ≥ yᵀa + vᵀb − λz

is a contradictory inequality. Hence yTA + v T B - and


(T2z)   yᵀA − y₀c = 0,   yᵀb − y₀z ≥ 0,   y ≥ 0,   y₀ > 0

λc = 0 and yᵀa + vᵀb − λz > 0. Since ((S)) is feasible, we must have λ > 0. Without loss of generality
we may assume λ = 1. Then c = yᵀA + vᵀB and has a solution. Note that the only difference be-
z > yTa + vTb. This proves the claim. tween these two systems is that ((Tlz)) requires
The above principles are highly nontrivial and Y0 >_ 0 whereas ((T2z)) requires Y0 > 0. Also, since
very deep. Consider, e.g., the following system of the system ((W2z)) is homogeneous, without loss of
4 inequalities with two variables u, v: generality we may take y0 = 1. Thus it follows that
-l_<u_<l, z is a lower bound on the optimal value of ((P)) if
and only if one of the following two systems
-l<v<l.
(Tltz) yTA -- O, yTb > O, y >_ O
From these inequalities it follows that
and
u 2 + v 2 _< 2,
(T2~z) y T A - - c, yTb > z, y >_O
which in turn implies, by the Cauchy inequality,
has a solution. Observe that z does not appear
the inequality u + v < 2"
in ((Tl~z)). Therefore, if this system has a solu-
u+v-l.u+l.v_< v / i 2 + 1 2 V / u 2 + v 2_<2. tion then each real z is a lower bound on the opti-
The concluding inequality is linear, and is a con- mal value of ((P)), but this occurs if and only the
sequence of the original system, but the above problem ((P)) is infeasible. Assuming that ((P))
derivation is 'highly nonlinear'. It is absolutely un- is feasible, it follows that z is a lower bound on
clear a priori why the same inequality can also be the optimal value of ((P)) if and only if the sys-
obtained from the given system in a linear man- tem ((T2~z)) has a solution. Given a solution y of
ner as well, as stated by Principle B. Of course, it ((W2~z)) any z satisfying yTb >_ z is a lower bound
c a n - it suffices to add the inequalities u < 1 and and the largest lower bound provided in this way
v<l. is yTb. Hence, the largest possible lower bound on
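To see Principle B at work on this small example, the following sketch (an illustrative use of Python and scipy.optimize.linprog, not part of the original text) searches for nonnegative multipliers y with yᵀA = c and yᵀa as large as possible, i.e., it solves the problem (D) discussed below, for the system −1 ≤ u ≤ 1, −1 ≤ v ≤ 1 and the implied inequality u + v ≤ 2 (written as cᵀx ≥ z with c = (−1, −1) and z = −2).

# Finding the linear-combination certificate of Principle B with a small LP.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0],      # u >= -1
              [-1.0, 0.0],     # -u >= -1  (u <= 1)
              [0.0, 1.0],      # v >= -1
              [0.0, -1.0]])    # -v >= -1  (v <= 1)
a = np.array([-1.0, -1.0, -1.0, -1.0])
c = np.array([-1.0, -1.0])     # c^T x >= z with z = -2  is  u + v <= 2

# Problem (D): maximize a^T y subject to y^T A = c^T, y >= 0.
res = linprog(-a, A_eq=A.T, b_eq=c, bounds=[(0, None)] * 4, method="highs")
y = res.x
print("multipliers y =", y)                  # here y = (0, 1, 0, 1)
print("y^T A =", y @ A, " y^T a =", y @ a)   # equals c, and y^T a >= z = -2

The computed multipliers y = (0, 1, 0, 1) correspond exactly to adding the inequalities u ≤ 1 and v ≤ 1, as observed above.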
the optimal value of ((P)) is the optimal value of
The M T T is one of the deepest result in the
part of mathematics dealing with linear inequal- the problem
ities and, in fact, is logically equivalent to other (D) m a x { b T y • yTA -- c, y >_ 0 } .
deep results in this discipline. For example, it is
If the problem ((P)) is unbounded, i.e., if there
equivalent to the duality theorem for linear opti-
does not exists a lower bound on the optimal value
mization (cf. L i n e a r p r o g r a m m i n g ) . To demon-
of ((P)), then the problem ((D)) must be infeasi-
strate this, consider the linear optimization prob-
ble. Otherwise the optimal value of ((D)) must
lem
coincide with the optimal value of ((P)).
(P) m i n { c T x • Ax >_ b} . The problem ((D)) is called the dual problem of
Let z* denote the optimal value of ((P)), where we the primal problem ((P)). The above findings can
take z* - - o c if ((P)) is unbounded and z* - oc be summarized as follows:
if ((P)) is infeasible. Now, a real z is a lower bound if one of the two problems ((P)) and
on the optimal value of ((P)) if and only if c T x >__z ((D)) is unbounded then the other is
is a consequence of Ax >_ b, or, which is the same, infeasible; if both problems are feasible
if and only if the system of linear inequalities then they have both an optimal solution
(Sz) Ax >_b, --cTx > --z and the optimal values are the same.


This is the duality theorem ]or linear optimization. To prove the MTT, one derives from Farkas'
Note that one other case may occur, namely that lemma that the 'weaker' system
both problems are infeasible. It became clear above (S1) A x >_ a, B x >_ b
that ( ( P ) ) i s infeasible if and only if ((Wl~z)) has a
is infeasible if and only if the system ((T1)) has a
solution, so
solution. If ((S1)) is feasible then one easily ver-
the primal problem ((P)) is infeasible if ifies that ((S)) has no solution if and only if the
and only if there exists a dual ray y, i.e., optimal value of the problem
a vector y such that
(P1) min{u" A x > a, B x + ue >_ b}
yTA--O, yTb>o, y>O. (3)
is a nonnegative real. Here e denotes the all-one
In fact, the latter statement is equivalent to the vector. Since ((P1)) is feasible and below bounded,
statement that (3) and A x > b are alternative sys- by the duality theorem this happens if and only if
tems, which is the special case of the MTT oc- the optimal value of the dual problem
curring when B is vacuous and which is known as
Farkas' lemma. (See L i n e a r o p t i m i z a t i o n : T h e - (D1) max aTy+bTv • eTv -- 1,
o r e m s of t h e a l t e r n a t i v e and F a r k a s l e m m a . ) y>0, v_>0
In just the same way it can be derived from a vari- is a nonnegative real and, finally, this occurs if and
ant of Farkas' lemma that: only if ((T2)) has a solution. Thus it has been
the dual problem ((D)) is infeasible if shown that the MTT is logically equivalent to the
and only if there exists a primal ray x, duality theorem for linear optimization.
i.e., a vector x such that So far the issue of how to prove the MTT has not
A x >_ O, cT x < O. (4) been touched. One possible approach is to prove
the duality theorem for linear optimization and
It has been shown above that the MTT implies then derive the M T T in the above described way.
the duality theorem for linear optimization. The This approach is now quite popular in text books.
converse is also true: Assuming the duality theo- For a recent example see, e.g., [2]. The easiest way
rem for linear optimization, the MTT easily can for a direct proof is to prove first the Farkas' lemma
be proved, showing that the two results are logi- and then derive the MTT from this lemma. The
cally equivalent. This goes in two steps. Assuming latter step uses the easy to verify statement that
the duality theorem for linear optimization, first ((S)) has no solution if and only if the system
one derives Farkas' lemma and then it is shown
that the MTT follows. To derive Farkas' lemma,
Ax- ta > 0,
consider the problem B x - t b - se > 0,
t-s>_O,
min{OTx'Ax>_b}.
-s<0
Clearly, the system A x >_ b has a solution if and
only if the optimal value of this problem is zero. has no solution. Application of a suitable variant
By the duality theorem this holds if and only if of Farkas' lemma to this system yields the MTT.
the optimal value of the dual problem Farkas' lemma and its proof have a rich history;
for a nice and detailed survey one might consult
max{bTy " y T A - 0 , y_>0} [3]
is also zero. This holds if and only See also: M u l t i - i n d e x t r a n s p o r t a t i o n p r o b -
lems; M i n i m u m concave transportation
y T A -- O, y > O ~ bTy <_ O,
p r o b l e m s ; S t o c h a s t i c t r a n s p o r t a t i o n a n d lo-
which is true if and only if the system cation problems; Linear optimization: The-
o r e m s of t h e a l t e r n a t i v e ; L i n e a r p r o g r a m -
y T A -- O, y >_ O, bT y > O
ruing; F a r k a s l e m m a ; T u c k e r h o m o g e n e o u s
has no solution, proving Farkas' lemma. s y s t e m s of l i n e a r r e l a t i o n s .
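In the same assumed Python setting, a dual ray certifying infeasibility of Ax ≥ b, i.e., a vector y as in (3), can be searched for with a small linear program; the two-inequality example below is invented purely for illustration.

# Farkas-type infeasibility certificate: for A x >= b, look for a dual ray
# y >= 0 with y^T A = 0 and y^T b > 0.  Assumed example system: x >= 1, -x >= 0.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0], [-1.0]])
b = np.array([1.0, 0.0])

# Maximize y^T b subject to y^T A = 0, 0 <= y <= 1 (upper bound keeps the LP bounded).
res = linprog(-b, A_eq=A.T, b_eq=np.zeros(A.shape[1]),
              bounds=[(0.0, 1.0)] * A.shape[0], method="highs")
y = res.x
if y @ b > 1e-9:
    print("dual ray certifying infeasibility:", y)   # here y = (1, 1)
else:
    print("no certificate found; A x >= b is feasible")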


References are axial MITPs, when m = k - 1 ; and planar


[1] MOTZKIN, T.S.: 'Beitriige zur Theorie der Linearen MITPs, when m = 1; see below for details.
Ungleichungen', PhD Thesis Azriel, Jerusalem (1936).
• Integer solutions may or may not be re-
[2] PADBERG, M.: Linear optimization and extensions,
Vol. 12 of Algorithms and Combinatorics, Springer, quired. Integrality requirements, which give
1995. rise to integer MITPs, may be necessary
[3] SCHRIJVER, A.: Theory of linear and integer program- since MITPs lack the integrality property en-
ming, Wiley, 1986. joyed by ordinary transportation problems
Arkadi Nemirovski (but see [22] for an exception).
Fac. Industrial Engin. and Management
• Unit right-hand sides, in conjunction with
Technion: Israel Inst. Technol.
Technion-City, Haifa 32000, Israel
integrality requirements, give rise to multi-
E-mail address: nemirovs(Die, technion, ac. i l index assignment problems (MIAPs). (Some
Kees Roos authors use this term for integer MITPs
Dept. ITS/TWI/SSOR Delft Univ. Technol. with integer right-hand sides; the present
P.O. Box 356 terminology, consistent with that for ordi-
2600 AJ Delft, The Netherlands nary assignment and transportation prob-
E-mail address: C.Roos©twi. t u d e l f t .nl
lems, seems preferable.) MIAPs are hard to
MSC2000: 15A39, 90C05 solve: the 3IAP is already NP-hard by reduc-
Key words and phrases" inequality systems, duality, certifi-
tion from the 3-dimensional matching prob-
cate, transposition theorem.
lem [17]. Even worse [6]" no polynomial time
algorithm for the 3IAP can achieve a con-
MULTI-INDEX TRANSPORTATION PROB- stant performance ratio, unless P = NP.
LEMS, MITP • The objective function is usually a simple lin-
An ordinary transportation problem has vari- ear combination of the variables, normally a
ables with two indices, typically corresponding to total cost to be minimized as in equation (1)
sources (or origins, or supply points) and desti- below. Alternatives, not considered in this ar-
nations (or demand points). A multi-index trans- ticle, may include bottleneck objectives ([36],
portation problem (MITP) has variables with three [11]), more general nonlinear objectives such
or more indices, corresponding to as many differ- as in [34], or multicriteria problems [38].
ent types of points or resources or other factors. • There may be additional constraints, such as
Multi-index transportation problems were consid- upper bounds on the variables, (capacitated
ered by T. Motzkin [22] in 1952; an application in- MITPs), variables fixed to the value zero
volving the distribution of different types of soap (MITPs with forbidden cells), or constraints
was presented by E. Schell [35] in 1955. MITPs on certain partial sums of variables (MITPs
are also known as multidimensional transportation with generalized capacity constraints).
problems [4]. There are several versions and special MITPs with linear objectives and without inte-
cases of MITPs: grality restrictions are linear programming prob-
• The number k of dimensions may be fixed to lems with a special structure. The most extensively
a small value; the resulting MITP is called a studied integer MITPs are three-index assignment
k-index transportation problem, kITP. Quite problems (3IAPs); see also T h r e e - i n d e x assign-
naturally, the best studied cases are the ment problem.
three-index transportation problems (3ITPs),
also known as three-dimensional, or 3D F o r m u l a t i o n s . The following compact notation
transportation problems. ([34], [31]) avoids multiple summations and multi-
• The type of constraints is determined by an ple layers of subindices. Let k _ 3 denote the num-
integer m with 0 < m < k, defining m-fold ber of dimensions or indices, and K - { 1 , . . . , k).
kITPs (called symmetric MITPs in [16]; see For i E K let Ai denote the set of values of the
also [41, Chapt. 8]). The most common cases ith index. Let A - ®iEgAi -- A1 × ' " × Ak de-


note the Cartesian product of these index sets, the same number k of axial and planar demand
that is, the set of all joint indices (k-tuples) a = constraints; however there are only ~-~icK ]AiI ax-
(a(1),...,a(k)) with a ( i ) C Ai for all i E K. ial constraints, versus ~-~i~K YIs~K\{i} ]As] planar
One variable Xa is associated with each joint index constraints. Of course, it is possible to combine
a C A. Thus, for example in a 3ITP with index sets demand constraints with different values of m, so
I, J and L, the variable xa stands for xije when as to formulate different types of restrictions (e.g.,
the joint index is a = (i, j, t~). see [5] and [16]).
Given unit costs Ca C R for all a C A, a linear Reductions between MITPs are presented in
objective function is [16], where it is shown in particular that an m-fold
kITP can be reduced to a l-fold kITP for any m
min ∑_{a ∈ A} c_a x_a   (1)

(with 0 < m < k), thereby generalizing a result in
[14]. Thus, an algorithm that solves planar kITPs
and the variables are usually restricted to be non-
is in principle capable of solving m-fold kITPs for
negative:
any m (with 0 < m < k).
x_a ≥ 0   for all a ∈ A.   (2)
Given the integer m with 0 < m < k, the de- sides can be transformed to a MITP with right
mand constraints of the m-fold kITP are defined hand sides 1. This is a (pseudopolynomial) trans-
as follows. Let (kKm) denote the set of all ( k - re)- formation and simply involves duplicating a re-
source with a supply of q units by q unit-supply
element subsets of K; an F C (kKm) is interpreted
resources. There seems to be little advantage in
as a set of k - m 'fixed indices'. Given such an F
doing so, except perhaps in converting an integer
and a ( k - m)-tuple g E AF -- ®I~FA I of 'fixed
M I T P into one with 0-1 variables.
values', let
Another issue is the existence of feasible solu-
A(F,g) = {a C A" a(f) - g(f), V I E F} tions. For an axial M I T P the requirement of equal
be the set of k-tuples which coincide with g on the total demands ~~g dig - ~-~g dig for all i,j C K
fixed indices. The m-fold demand constraints are is a necessary and sufficient condition for the ex-
istence of feasible solutions. Feasibility conditions
• ° (3)
-

are more complicated for nonaxial problems; see


aEA(F,g)
[40] for a review of results for planar problems.
for a l l F E k-m 'gEAR' See also [41, Chapt. 8] for properties of polytopes
associated with (integer) MITPs, including issues
where the right-hand sides dgg are given positive of degeneracy.
demands associated with the values g for fixed in-
dex subset F. These 'demands' may also denote
Applications.
supplies or capacities when the indices represent
sources or some other resource type. When some Transportation and Logistics. MITPs are used to
of these resources are in excess, the equality in model transportation problems that may involve
constraints (3) may be replaced with inequalities. different goods; such resources as vehicles, crews,
Problem (1)-(3) is a kITP. Adding the integrality specialized equipment; and other factors such as
restrictions alternative routes or transshipment points. Thus
index sets A1 and A2 may represent destina-
x~EN for a l l a E A , (4)
tions and sources, respectively, and the other sets
yields an integer MITP. A3,A4,... these additional factors. The type of
As mentioned above, the most common cases 'demand' constraints used will reflect the availabil-
are m - k - 1, defining axial MITPs; and m = 1, ity of these factors and their interactions. Thus,
defining planar MITPs. For the axial problems, the for example, an axial demand constraint (3) with
notation may simplified by letting dig - dFg when right-hand side d3i will be used for a vehicle type
F = {i}. Note that each variable xa appears in i E A3 of which d3i units are globally available


(at identical cost) to all sources and destinations, therein). To model this as a planar 3ITP, let A1
while a constraint with F = {2, 3} will be used if be the set of employees; A2 the set of tasks; A3 the
there a r e dFg vehicles of type g(3) available at the set of time periods;
different sources g(2).
rjk forF-{2,3}, Vg-(j,k);
Interesting cases arise when each resource or fac-
tor g E Ai corresponds to a point Pi,t in a metric dFg-- 1 forE-{I,3}, Vg-(i,k);
space, i.e., a set with a distance 5, and the unit rij forF-{1,2}, Vg-(i,j);
costs Ca are 'decomposable' as defined below. Each
joint index a E A may be interpreted as a cluster and require the decision variables to be in {0, 1}.
of points among which transportation and other A special case arises when rjk = 1 for all j, k and
activities are conducted. The unit cost ca reflects N = M. The polyhedral structure of the resulting
the within-cluster transportation costs associated planar 3ITP is investigated in [7]. Other references
with these activities; it is decomposable if it can dealing with timetabling problems formulated as
be expressed as a function of the distances be- M I T e s are [15], [10] and [12].
tween pairs of points in the cluster a. Examples Multitarget Tracking. Consider the following (ide-
include the diameter maxi,j 5(Pi,a(i) , Pj,a(j)), when alized) situation. N objects move along straight
all these activities are performed simultaneously; lines in the plane. At each of T time instants a scan
the sum costs ~i,j 5(Pi,a(i), Pj,a(j)) when all activi- has been made, and the approximate position of
ties are performed sequentially; and the Hamilton- each object is observed and recorded. From such a
ian path or path costs, when all points Pit in the scan it is not possible to deduce which object gen-
cluster have to be visited in a shortest sequence. erated which observation. Also, a small error may
Other interesting cases arise when one of the be associated with each observation. A track is de-
indices denotes time. A simple dynamic location fined as a T-tuple of observations, one from each
problem [27] may be modeled as an axial kITe, scan. For each possible track a cost is computed
where index set A1 may denote the set of facilities based on a least squares criterion associated with
(say, warehouses) to be located; A2 that of candi- the observations in the track. The problem is now
date locations; and A3 that of time periods. The to identify N tracks while minimizing the sum of
costs cijt may include discounted construction and the costs of these tracks. This problem is called the
operating costs of these facilities. See [38] and [33] data-association problem in [25]. It can be modeled
for other applications of this type. as an axial integer TIAP as follows: let Ai be the
set of observations in scan i, i = 1 , . . . , T, and let
Timetabling. Other problems involving time and di9 - 1, i - 1 , . . . , T , g - 1 , . . . , N . Not surpris-
which can be formulated as MITPs arise in ingly, this problem is NP-hard already for T = 3
timetabling or staffing applications. To illustrate, (see [37]; notice however that this does not follow
consider the following generic situation. Given are from the NP-hardness of 3IAP due to the struc-
N employees (index i), each of which can be as- ture present in the cost-coefficients in the objective
signed to one of M tasks (index j) during each of function of multitarget tracking problems). Other
T time periods (index k). Moreover, for each pair references dealing with target tracking problems
consisting of a task and a time period a number formulated as axial MIAPs are [23] and [24]; see
rjk is given denoting the number of employees re- also [20].
quired for task j in period k. Also, a number rij
is given denoting the number of periods that task Tables with Given Marginals. Other statistical
j requires employee i. An employee can only be applications of M I T e s require finding multidi-
assigned to one task during each time period. Fi- mensional tables with given sums across rows
nally, there is a cost-coefficient Cijk which gives the or higher-dimensional planes, as specified in con-
cost of employee i performing task j in period k. straints (3). The right-hand sides dFg of such con-
This problem is called the multiperiod assignment straints are often known as marginals. In a simple
problem in [21] (see also the references contained application [3] arising in the integration of surveys


and controlled selection, each index set represents some medium-sized planar integer 3ITPs. A tabu
a population from which a sample is to be drawn. search algorithm for this problem is described in
A (joint) sample is a k-tuple, one from each popu- [18]. Heuristic solution approaches based on La-
lation. The marginals are specified marginal prob- grangian relaxation are proposed in [26], [28] and
ability distributions over each population, giving [29] for multitarget tracking problems.
rise to axial demand constraints. Given sample One major difficulty with these exact or ap-
costs Ca, the problem is to find a joint probability proximate solution methods may be the sheer
distribution, defined by (Xa), of all the samples, size of MITP formulations; if, for example, all
consistent with these marginal distributions and IAiI - n then an m-fold kITP has n k variables
of minimum expected cost (1). and (km)nk-m constraints. In contrast, the two ap-
In contrast, problems of updating input-output proaches sketched below yield feasible solutions to
matrices (see [34] and references therein) typically axial MITPs much more quickly than simply writ-
have nonlinear objectives. In such problems, given ing down all the cost coefficients. In particular,
are a k-dimensional array B of data (for exam- these algorithms only produce the nonzero vari-
ple, past input-output coefficients) and arrays d ables xa and their values; all other variables are
of marginals (for example, forecast aggregate co- zero in the solution. In addition, this solution is
efficients) with appropriate dimensions. The prob- integral if all demands are integral. Of course, the
lem is to determine values Xa, the updated array effectiveness of these methods relies on some as-
entries, satisfying the demand constraints corre- sumptions on the cost coefficients Ca, assumptions
sponding to the given marginals, and such that the which are verified in several applications.
resulting updated array X - (Xa) differs as little
A Greedy Algorithm .for Axial M I T P s . The greedy
as possible from the given array B, as specified by
algorithm below (a multi-index extension of the
an appropriate (nonlinear) objective function. A
N o r t h - W e s t corner rule) finds a feasible solution
(nonlinear) MITP arises when the values xa are
to axial MITPs in O(k Y~'~i IAil) time, which is (for
constrained to be nonnegative, a natural require-
fixed k) linear in the size of the demand data dig.
ment in many contexts.
This solution is in fact optimal if the cost coeffi-
Other Applications. include an axial integer 3ITP cients are known to satisfy a 'Monge property' [3],
model for planning the launching of weather satel- [31], [32] defined below. (For k - 3, this greedy
lites [27], and an axial integer 5IAP arising in rout- algorithm is already described in [4] to obtain a
ing meshes in circuit design [9]. basic feasible solution).
Consider the axial kITP with equality con-
S o l u t i o n M e t h o d s . As noted above, MITPs are straints (3) and assume that each Ai =
linear programming problems with a special struc- { 1 , . . . , IAi]}. Recalling that the demands are de-
ture. There are several proposals for extensions noted dig, a s s u m e that EgEAi dig -- EgEA1 dig for
all i E K, a necessary and sufficient condition for
of LP (transportation) algorithms to MITPs (e.g.,
[13], [4] for 3ITPs and [1] for a 4ITP). the problem to be feasible.
PROCEDURE greedy MITP algorithm
  WHILE (∑_{g ∈ A_i} d_{ig} > 0 for all i ∈ K) DO
    let a(i) = min{g ∈ A_i : d_{ig} > 0} for each i ∈ K;
    let Δ = min{d_{i,a(i)} : i ∈ K};
    let x_a = Δ;
    FOR i ∈ K DO let d_{i,a(i)} = d_{i,a(i)} − Δ;
  RETURN x
END
A greedy algorithm for axial MITPs.

As also mentioned earlier, integer MITPs are hard to solve. Exact algorithms have been proposed for the axial integer 3IAP (see Three-index assignment problem) and for the planar integer 3IAP (see [39] and [19]). Other exact approaches for integer MITPs rely on structure that is present in the particular application considered (see, e.g., [12]).
[12]). A greedy algorithm for axial MITPs.
Several methods have been proposed to obtain
good approximate solutions to integer MITPs. In A Monge Property. The join a V b and meet a A b
[21] results are reported for a rounding heuristic on of a,b E A are


(a ∨ b)(i) = max{a(i), b(i)},   (a ∧ b)(i) = min{a(i), b(i)}   for all i ∈ K.
The cost coefficients (c_a) satisfy the Monge property if

c_{a∨b} + c_{a∧b} ≤ c_a + c_b   for all a, b ∈ A.

PROCEDURE Expand(h, y^(h))
  FOR g := 1 TO n_h DO
    q := 0;
    a(i) := 1 for i ∈ K \ h;
    WHILE (q < d_{h,g}) DO
      let ℓ be such that y^ℓ_{a(ℓ),g} = min{ y^r_{a(r),g} : r ≠ h };
      x^(h)_a := y^ℓ_{a(ℓ),g};
      y^r_{a(r),g} := y^r_{a(r),g} − x^(h)_a for all r ∈ K \ h;
      a(ℓ) := a(ℓ) + 1;
      q := q + x^(h)_a;
  RETURN x^(h)
END

Note that this is just the submodularity of the function c: A → R defined on the product lattice A, see [3], [31], [32]. These references show that the above greedy algorithm returns an optimal solu-
tion for all feasible demands if and only if the cost
The Expand procedure for axial MITPs.
function satisfies the Monge property. The latter
two references also extend the greedy algorithm In the hub heuristics for decomposable costs, the
ordinary transportation problems use as cost coef-
i) to the case of forbidden cells when the non-
ficients the distances 5(Pij,Phg) between the cor-
forbidden cells form a sublattice of A; and
responding points Pij and Phg in the metric space.
ii) so that it returns an optimal dual solution. The expanded MITP solution x h would be opti-
They also show that optimizing a linear function mum if the cost function was that of the star with
over a submodular polyhedron is special case of the center h, namely if Ca - ~-~iCh 5(Pi,a(i),Ph,a(h))"
dual problem. It is shown in [32] that the primal The triangle-inequality property of the distance 5
problems are equivalent to the 'submodular linear allows one to bound the cost penalty from using
programs on forests' of [8]. this h-star cost function instead of the actual de-
composable cost function.
Cost functions c with the Monge property
include typical decomposable costs (as defined In the single hub heuristic, one chooses a hub
above) when all the points are located on a same h E K; solves these k - 1 transportation problems;
inputs their s o l u t i o n s y(h) to Expand; and simply
line or on parallel lines (one line for each factor
outputs the resulting MITP solution x (h). If the
type Ai). For these problems, the greedy algorithm
distance 5 satisfies the triangle inequality, the cost
above amounts to a 'left to right sweep' across the
points. of this solution x (h) is no more than k - 1 times
the optimal cost, in the worst case, for many com-
Hub Heuristics for Axial MITPs. The basic idea mon decomposable cost functions. The multiple-
([30], extending earlier work on axial 3IAPS [6] and hub heuristic is an obvious extension whereby one
MIAPs [2] with decomposable costs) is to solve a performs the single-hub heuristic k times, once for
small number of ordinary transportation problems each h E K, and retains the best solution. This
and to expand their solutions into a feasible solu- amounts to solving (K) ordinary transportation
tion to the original MITP. For a large collection of problems. Under the same assumptions as above
decomposable costs arising from applications, the and for many common decomposable cost func-
objective value of this feasible solution is provably tions, the cost of the resulting solution is less than
within a constant factor of the optimum. twice the optimum cost in the worst case.
Given an index h, called the hub, determine, See also: M o t z k i n t r a n s p o s i t i o n t h e o r e m ;
for each index i ~ h, a feasible solution to the Minimum concave transportation prob-
ordinary transportation problem defined by sup- lems; S t o c h a s t i c t r a n s p o r t a t i o n a n d l o c a t i o n
plies ( d i j ) j E A ( i ) a n d (dhg)gEA(h). The Expand pro- problems.
cedure below then takes as inputs these solutions
References
y(h) _ (yi)i¢h and expands them into a feasible
[1] BAMMI, D." 'A generalized-indices transportation
solution x (h) to the axial MITP. Its running time problem', Naval Res. LogiEr. Quart. 25 (1978), 697-
is O(l&l IA I). 710.

454
Multi-index transportation problems

[2] BANDELT, H.-J., CRAMA,Y., AND SPIEKSMA, F.C.R.: the planar three-index assignment problem', Europ. J.
'Approximation algorithms for multidimensional as- Oper. Res. 77 (1994), 141-153.
signment problems with decomposable costs', Discrete [20] MAVRIDOU, T., PARDALOS, P.M., PITSOULIS, L., AND
Appl. Math. 49 (1994), 25-50. RESENDE, M.G.C.: 'A GRASP for the biquadratic
[3] BEIN, W.W., BRUCKER, P., PARK, J.K., AND assignment problem', Europ. J. Oper. Res. 105/3
PATHAK, P.K.: 'A Monge property for the d- (March 1998), 613-621.
dimensional transportation problem', Discrete Appl. [21] MILLER, J.L., AND FRANK, L.S.: 'A binary-rounding
Math. 58 (1995), 97-109. heuristic for multi-period variable-task duration assign-
[4] CORBAN, i . : ' i multidimensional transportation ment problems', Computers Oper. Res. 23 (1996), 819-
problem', Rev. Roumaine Math. Pures et Appl. IX 828.
(1964), 721-735. [22] MOTZKIN, T.: 'The multi-index transportation prob-
[5] CORBAN, i . : 'On a three-dimensional transportation lem', Bull. Amer. Math. Soc. 58 (1952), 494.
problem', Rev. Roumaine Math. Pures et Appl. XI [23] MURPHEY, R., PARDALOS, P.M., AND PITSOULIS, L.:
(1966), 57-75. 'A GRASP for the multitarget multisensor tracking
[6] CRAMA, Y., AND SPIEKSMA, F.C.R.: 'Approximation problem': DIMACS, Vol. 40, Amer. Math. Sou., 1998,
algorithms for three-dimensional assignment problems pp. 277-302.
with triangle inequalities', Europ. J. Oper. Res. 60 [24] MURPHEY, R., PARDALOS, P.M., AND PITSOULIS, L.:
(1992), 273-279. 'A parallel GRASP for the data association multidi-
[7] EULER, R., AND LE VERGE, H.: 'Time-tables, poly- mensional assignment problem': IMA Vol. Math. Appl.,
hedra and the greedy algorithm', Discrete Appl. Math. Vol. 106, Springer, 1998, pp. 159-180.
65 (1996), 207-222. [25] PATTIPATTI, K.R., DEB, S., BAR-SHALOM, Y., AND
[8] FAIGLE, U., AND KERN, W.: 'Submodular linear pro- WASHBURN JR., R.B.: 'Passive multisensor data asso-
grams on forests', Math. Program. 72 (1996), 195-206. ciation using a new relaxation algorithm', in Y. BAR-
[9] FORTIN, D., AND TUSERA, A.: 'Routing in meshes us- SHALOM (ed.): Multitarget-multisensor tracking: Ad-
ing linear assignment', in A. BACHEM, U. DERIGS, vances and applications, 1990, p. 111.
M. JONGER, AND R. SCHRADER (eds.): Oper. Res. '93, [26] PATTIPATTI, K.R., DES, S., BAR-SHALOM, Y., AND
1994, pp. 169-171. WASHBURN JR., R.B.: 'A new relaxation algorithm
[10] FRIEZE, A.M., AND YADEGAR, J.: 'An algorithm for passive sensor data association', IEEE Trans. A utom.
solving 3-dimensional assignment problems with appli- Control 3 7 (1992), 198-213.
cation to scheduling a teaching practice', J. Oper. Res. [27] PIERSKALLA, W.P.: 'The multidimensional assignment
Soc. 32 (1981), 989-995. problem', Oper. Res. 16 (1968), 422-431.
[11] GEETHA, S., AND VARTAK, M.N.: 'The three- [28] POORE, A.B.: 'Multidimensional assignment formula-
dimensional bottleneck assignment problem with ca- tion of data-association problems arising from multitar-
pacity constraints', Europ. J. Oper. Res. 73 (1994), get and multisensor tracking', Comput. Optim. Appl. 3
562-568. (I 994), 27-57.
[12] GILBERT, K.C., AND HOFSTRA, R.B.: 'An algorithm [29] POORE, A.B., AND RIJAVEC, N.: 'A Lagrangian relax-
for a class of three-dimensional assignment problems ation algorithm for multidimensional assignment prob-
arising in scheduling applications', IIE Trans. 8 (1987), lems arising fi'om multitarget tracking', SIAM J. Op-
29-33. tim. 3 (1993), 544-563.
[13] HALEY, K.B.: 'The solid transportation problem', [30] UEYRANNE, M., AND SPIEKSMA, F.C.R.: 'Approxi-
Q
Oper. Res. 10 (1962), 448-463. mation algorithms for multi-index transportation prob-
[14] HALEY, K.B.: 'The multi-index problem', Oper. Res. lems with decomposable costs', Discrete Appl. Math.
11 (1963), 368-379. 76 (1997), 239-253.
[15] JUNGINGER, W.: 'Zurfickfiihrung des Stundenplan- [31] QUEYRANNE, M., SPIEKSMA, F.C.R., ANDTARDELLA,
problems auf einen dreidimensionales Transportprob- F.: 'A general class of greedily solvable linear pro-
lem', Z. Oper. Res. 16 (1972), 11-25. grams', in G. RINALDI AND L. WOLSEY (eds.): Proc.
[16] JUNGINGER, W.: 'On representatives of multi-index Third IPCO Conf. (Integer Programming and Combi-
transportation problems', Europ. J. Oper. Res. 66 natorial Optimization), 1993, pp. 385-399.
(1993), 353-371. [32] QUEYRANNE, M., SPIEKSMA, F.C.R., AND TARDELLA,
[17] KARP, R.M.: 'Reducibility among combinatorial prob- F.: 'A general class of greedily solvable linear pro-
lems', in R.E. MILLER AND J.W. THATCHER (eds.): grams', Math. Oper. Res. (to appear).
Complexity of Computer Computations, Plenum, 1972, [33] RAUTMAN, C.A., REID, R.A., AND RYDER, E.E.:
pp. 85-103. 'Scheduling the disposal of nuclear waste material in
[18] MAGOS, D.: 'Tabu search for the planar three-index as- a geologic repository using the transportation model',
signment problem', J. Global Optim. 8 (1996), 35-48. Oper. Res. 41 (1993), 459-469.
[19] MAGOS, D., AND MILIOTIS, P.: 'An algorithm for [34] ROMERO, D.: 'Easy transportation-like problems on K-

455
Multi-index transportation problems

dimensional arrays', J. Optim. Th. Appl. 66 (1990), ficulties. The first two are the same as those exist-
137-147. ing for multi-objective integer linear programming
[35] SCHELL,E.: 'Distribution of a product by several prop-
(MOILP) problem (cf. M u l t i - o b j e c t i v e i n t e g e r
erties', in DIRECTORATE OF MANAGEMENT ANALYSIS
(ed.): Second Symposium in Linear Programming 2, linear p r o g r a m m i n g ) , i.e.
DCS/Comptroller HQ, US Air Force, Washington DC, • the number of efficient solutions may be very
1955, pp. 615-642.
[36] large;
SHARMA, J.K., AND SHARUP, K.: 'Time-minimizing
multidimensional transportation problem', J. Engin. • the nonconvex character of the feasible set re-
Production 1 (1977), 121-129. quires to device specific techniques to gener-
[37] SPIEKSMA, F.C.R., AND WOEGINGER, G.J.: 'Geomet-
ate the so-called 'nonsupported' efficient so-
ric three-dimensional assignment problems', Europ. J.
Oper. Res. 91 (1996), 611-618. lutions (cf. M u l t i - o b j e c t i v e i n t e g e r linear
[38] TZENG, G., TEODOROVIC, D., AND HWANG, M." programming).
'Fuzzy bicriteria multi-index transportation problems
A particular single CO problem is characterized by
for coal allocation planning of Taipower', Europ. J.
Oper. Res. 95 (1996), 62-72. some specificities of the problem, generally a spe-
[39] VLACH, M.: 'Branch and bound method for the three- cial form of the constraints; the existing methods
index assignment problem', Ekonomicko-Matematicky for such problem use these specificities to define
Obzor 3 (1967), 181-191. efficient ways to obtain an optimal solution. For
[4o] VLACH,M.: 'Conditions for the existence of solutions of MOCO problem, it appears interesting to do the
the three-dimensional planar transportation problem',
same to obtain the set of efficient solutions. Con-
Discrete Appl. Math. 13 (1986), 61-78.
[41] YEMELICHEV, V.A., KOVALEV, M.M., AND KRATSOV, sequently, and contrary to what is often done in
M.K.: Polytopes, graphs and optimization, Cam- MOLP and MOILP methods, a third difficulty is
bridge Univ. Press, 1984. to elaborate methods avoiding to introduce addi-
Maurice Queyranne tional constraints so that we preserve during all the
Univ. British Columbia procedure the particular form of the constraints.
Vancouver, B.C., Canada The general form of a MOCO problem is
E-mail address: blaurice. Queyranne©coramerce. ubc. ca
Frits Spieksma 'min' z k ( X ) = ckX,
XES
Maastricht Univ.
k= 1,...,K,
Maastricht, The Netherlands
E-mail address: spieksma©math, unimaas, nl
(P) where S=DNB ~
MSC 2000:90C35 with X(n × 1),
Key words and phrases: transportation problem, three- B = {0,1}
dimensional transportation problem, greedy algorithm,
Monge property, approximation algorithms. and D is a specific polytope characterizing the CO
problem: assignment problem, knapsack problem,
traveling salesman problem, etc.
MULTI-OBJECTIVE COMBINATORIAL OP- There exists several surveys on MOCO; some
TIMIZATION, MOCO are devoted to specific problems (i.e., the partic-
It is well known that, on the one hand, combina- ular form of D): the shortest path problem [8],
torial optimization (CO) provides a powerful tool transportation networks [2], and the scheduling
to formulate and model many optimization prob- problem [6], [7]; the survey [9] is more general
lems, on the other hand, a multi-objective (MO) examining successively the literature on MO as-
approach is often a realistic and efficient way to signment problems, knapsack problems, network
treat many real world applications. Nevertheless, flow problems, traveling salesman problems, loca-
until recently, Multi-objective combinatorial opti- tion problems, set covering problems.
mization (MOCO) did not receive much attention In the present article we put our attention on
in spite of its potential applications. One of the the existing methodologies for MOCO. First we
reason is probably due to specific difficulties of examine how to determine the set E(P) of all
MOCO models. We can distinguish three main dif- the efficient solutions and we distinguish three ap-

456
Multi-objective combinatorial optimization

proaches: direct methods, two-phase methods and At any node of the branch and bound tree, vari-
heuristic methods. Subsequently we analyse inter- ables are set to 0 or 1; let B0 and B1 denote the
active approaches to generate a 'good compromise' index sets of variables assigned to the values 0 and
satisfying the decision maker. 1, respectively. Let F be the index set of free vari-
ables which always follow, in the order O, those
Generation of E(P). belonging to B1 U B0. If i - 1 is the last index of
fixed variables, we have B1 [2 B0 - { 1 , . . . , i - 1};
Direct methods. The first idea is to use intensively
F-
classical methods for single objective problem (P)
Initially, i - 1. Let
existing in the literature to determine E(P). Of
course, each time a feasible solution is obtained the • W- W - EjEB1 wj ~ 0 be the leftover ca-

k values z k ( X ) are calculated and compared with pacity of the knapsack.


the list E(P) containing all the feasible solutions Z -- \( z k -- EjEB1 c}k))k_l,...,
• -- -K ,\ " be the crite-
already obtained and non dominated by another ria values vector obtained with already fixed
generated feasible solution. Clearly, E(P), called variables.
the set of potential efficient solutions, plays the E(P) contains nondominated feasible val-
role of the so-called 'incumbent solution' in single ues Z and is updated at each new step.
objective methods. At each step, E(P) is updated Initially, z k - 0, Vk, and E(P) - 0.
and at the end of the procedure E(P) - E(P). • Z - (Zk) be the vector whose components are
Such extension of single objective method is spe- upper bounds of feasible values respectively
cially designed for enumerative procedure based on for each objective at considered node. These
a branch and bound approach. Unfortunately, in a upper bounds are evaluated separately, for
MO framework, a node of the branch and bound instance as in the Martello-Toth method.
tree is less often fathomed than in the single ob- Initially, ~k -- c~, Vk.
jective case, so that logically such MO procedure
A node is fathomed in the following two situations"
is less efficient.
We describe below an example of such direct i) i f { j c F ' w j <W}-0;or
method, extending the well known Martello-Toth ii) ~ is dominated by z* E E(P) .
procedure, for the multi-objective knapsack prob- When the node is fathomed, the backtracking pro-
lem formulated as cedure is performed: a new node is build up by
¢ n
setting to zero the variable corresponding to the
t
max
I
zk(Z) v " (k)xj
last index in B1. Let t be this index:
'j=l
k- 1,...,K, B1 +-- Bl\{t},
n
B0 +-- (B0 A { 1 , . . . , t - 1})U {t},
EWjXj ~ W
j=l F +-- {t + 1 , . . . , n } .
zj - (0, 1). When the node is nonfathomed, a new node of the
The following typical definitions are used (k - branch and bound tree is build up for next itera-
1,... ,K)" tion, as follows"
• Ok" variables order according to decreasing • Define s to be the index variable such that
values of c kj / w j .
max I E F" wj < W .
r~k)" the rank of variable j in order Ok. j=i
• 0" variables order according to increasing
If wi > W , set s - i- 1.
of EZ:I
• Ks>i:
We assume that variables are indexed according to
ordinal preference O. B1 +-- B1 U ( i , . . . , s } ,

457
Multi-objective combinatorial optimization

BO +-- Bo~ ported efficient solutions giving the same op-


timal value as X r and X s for z:x(X); we put
F F\{i,...,
them in list S'.
If s - i - l ,
This first phase is continued until all pairs
U 1 +- B 1 U { r } , (X ~, X s) of S have been examined without exten-
B0 +-- B0 U { i , . . . , r - 1 ) , sion of S.
F F \ { i , . . . , r}, Finally, we obtain S E ( P ) - S U S ~ as illustrated
in Fig. 1.
with r - min{j E F" wj < W}. 12

The procedure stops when the initial node is fath-


omed and then E(P) - E(P). An illustration is
given in [10].

Two-Phase Method. Such an approach is particu-


larly well designed for bi-objective MOCO prob-
lems. The first phase consists to determine the set
S E ( P ) of supported efficient solutions (see M u l t i -
=-->m1
o b j e c t i v e i n t e g e r l i n e a r p r o g r a m m i n g ) . Let
S U S ~ be the list of supported efficient solutions • ~XIrtt;~l.o0w tkn | w

already generated; S is initialized with the two ef-


Fig. 1: S E ( P ) - S t.3 S'.
ficient optimal solutions respectively of objectives
Zl and z2. Solutions of S are ordered by increas- The purpose of the second phase is to gener-
ing value of criterion 1; let Xr and Xs be two ate the set N S E ( P ) = E(P) \ S E ( P ) of non-
consecutive solutions in S, thus with Zl~ < Zls supported efficient solutions. Each nonsupported
and z2~ > Z2s, where Zkl -- zk(X,). The following efficient solution has its image inside the trian-
single-criterion problem is considered: gle AZr Zs determined by two successive solutions
Z r and X s of S E ( P ) (see Fig. 1). So each of the
min z~(X) - )~lZl (X)-~- )~2z2(X)
I S E ( P ) I - 1 triangles A Z r Z s are successively anal-
(P~) X E S - D N B (n)
ysed. This phase is more difficult to manage and is
AI_>0, A2>0. dependent of the particular MOCO problem anal-
This problem is optimized with a classical sin- ysed; in general, this second phase is achieved us-
gle objective CO algorithm for the values A1 - ing partly a classical single objective CO method.
Z 2 r - z2s and A2 - z l s - zl~; with these values the An example of such second phase is given in B i-
search direction z x ( X ) corresponds in the objec- o b j e c t i v e a s s i g n m e n t p r o b l e m and in [14] for
tive space to the line defined by Zr and Zs. Let the bi-objective knapsack problem.
{ X t" t - 1 , . . . , T } be the set of optimal solutions
Heuristic Methods. As pointed out in [9], [10], [14],
obtained in this manner and { Z t ' t - 1 , . . . , T }
it is unrealistic to extend the exact methods de-
their images in the objective space. There are two
scribe above to MOCO problems with more than
possible cases:
two criteria or more than a few hundred variables;
• {Zr, Zs} N {Zt" t - 1 , . . . , T } - 0" Solutions the reason is that these methods are too consum-
X t are new supported efficient solutions. X 1 ing time. Because a metaheuristic, simulating an-
and X T, provided T > 1, are put in S and, nealing (SA), tabu search (TS), genetic algorithms
if T > 2, X 2 , . . . , X T-1 are put in S'. It will (GA), etc., provide, for the single objective prob-
be necessary at further steps to consider the lem, excellent solutions in a reasonable time, it ap-
pairs (X r, X 1) and ( X T, X s) peared logical to try to adapt these metaheuristics
• {Zr, Zs} C {Zt" t - 1 , . . . , T } " Solutions to a multi-objective framework.
{ X t" t - 1 , . . . , T } \ {X r , z s} are new sup- The seminal work in this direction is the 1993

458
Multi-objective combinatorial optimization

Ph.D. thesis of E.L. Ulungu, which gave rise to in a single-objective problem (P9) defined by the
the so-called M O S A method to approximate E(P) global weighted deviation function:
(see, in particular, [11]). After this pioneer study, K
this direction has been tackled by other research min ~-~.pad k
t e a m s : P. Czyzak and A. Jaszkiewicz ([3]) pro- k=l
posed another way to adapt simulating annealing (Pg) s.t. z k ( X ) + d~ - d k - gk, Vk,
to a MOCO problem; independently, [5], [4] and [1] X C S- D N B n.
did the same with tabu search, the later combining
When a solution is obtained, the decision maker
also tabu search and genetic algorithms; genetic
can possibly modify the values of the goals gk be-
algorithms are also used in [13].
fore a new iteration is performed. One drawback
The principle idea of MOSA method can be re-
is that the additional goal constraints induce the
sumed in short terms. One begins with an initial
loss of the particular structure of the initial CO
iterate X0 and initializes the set of potentially effi-
problem, so that a general ILP software must be
cient points P E to just contain X0. One then sam-
used to solve problem (Pg).
ples a point Y in the neighborhood of the current
iterate. But instead of accepting Y if it is better Interactive Two-Phase Methods and MOSA
than the current iterate on an objective: we now Method. The two-phase methodology described
accept it if it is not dominated by any of the points above can easily be adapted to build interac-
currently in the set P E . If it is not dominated, we tively a good compromise. At each step of the
make Y the current iterate, add it to P E , and first phase, the decision maker can indicate which
throw out any point in P E that are dominated pair (X~, Xs) he prefers so that only a small subset
by Y. On the other hand, if Y is dominated, we of S E ( P ) is generated in the direction given by the
still make it the current iterate with some proba- decision maker; at the second phase, only one (or
bility. In this way, as we move the iterate through a few number of) triangles /~Z~Z~ is (are) anal-
the space, we simultaneously build up a set P E of ysed to verify if there exists in it a more satisfying
potentially efficient points. The only complicated nonsupported efficient solution. In the same spirit,
aspect of this scheme is the method for computing an interactive MOSA method can be designed (see
the acceptance probability for Y when it is dom- also [12]): the decision maker gives some goals gk
inated by a point in P E . The MOSA method is and only the solutions satisfying z k ( X ) < gk are
described in details in [11] and in Bi-objective putting in the list of potential efficient solutions.
assignment problem. When this list contains a certain a priori fixed
number of solutions, the decision maker indicates
which one is preferred, modifies the goals gk in a
Interactive Determination of a G o o d C o m -
more restrictive sense before to continue the search
p r o m i s e . The general idea of interactive methods
with MOSA.
is described in Multi-objective integer linear
An example of such interactive procedure is
programming. Two types of methods can be dis-
given in [12] for a real case study.
tinguished, which we treat in the following subsec-
See also: Fractional combinatorial opti-
tions.
mization; Replicator dynamics in com-
Goal Programming. As pointed out in [9], this binatorial optimization; N e u r a l n e t w o r k s
methodology is often used by American re- for combinatorial optimization; Combina-
searchers to treat several case studies. The gen- torial matrix analysis; Combinatorial op-
eral idea of goal programming method is to in- timization algorithms in resource alloca-
troduce for each objective k deviation variables tion problems; Combinatorial optimization
d + and d-, respectively by excess and by default, games; Evolutionary algorithms in combi-
with respect to a certain a priori goal gk, so that natorial optimization; Multi-objective opti-
goal constraints are defined. If some priorities ex- mization: Pareto optimal solutions, proper-
pressed by some weights Pk are given, this results ties; Multi-objective optimization: Interac-

459
Multi-objective combinatorial optimization

tive methods for preference value functions; [9] ULUNGU, E.L., AND TEGHEM, J.: 'Multi-objective
Multi-objective optimization: Lagrange du- combinatorial optimization problems: A survey', J.
Multi-Criteria Decision Anal. 3 (1994), 83-104.
ality; Multi-objective optimization: Inter-
[10] ULUNGU, E.L., AND TEGHEM, J.: 'Solving multi-
action of design and control; Outranking objective knapsack problem by a branch and bound
methods; Preference disaggregation; Fuzzy procedure', in J. CLIMACO (ed.): Multicriteria Analy-
multi-objective linear programming; Multi- sis, Springer, 1997, pp. 269-278.
objective optimization and decision sup- [11] ULUNGU, E.L., TEGHEM, J., FORTEMPS, PH., AND
TUYTTENS, D.: 'MOSA method: A tool for solving
port systems; Preference disaggregation ap-
MOCO problems I', Multi-Criteria Decision Anal. 8
proach: Basic features, examples from fi- (1999), 221-236.
nancial decision making; Preference model- [12] ULUNGU, E.L., TEGHEM, J., AND OST, CH.: ~ElC[i-
ing; Multiple objective programming sup- ciency of interactive multi-objective simulated anneal-
port; Multi-objective integer linear pro- ing through a case study', J. Oper. Res. Soc. 49 (1998),
gramming; Bi-objective assignment prob- 1044-1050.
[13] VIENNET, R., AND FONTEX, M.: 'Multi-objective com-
lem; Estimating data for multicriteria deci- binatorial optimization using a genetic algorithm for
sion making problems: Optimization tech- determining a Pareto set', Internat. J. Syst. Sci. 27,
niques; Multicriteria sorting methods; Fi- no. 2 (1996), 255-260.
nancial applications of multicriteria anal- [14] VIS~E, M., TEGHEM, Z., PIRLOT, M., AND ULUNGU,
ysis; Portfolio selection and multicriteria E.L.: 'Two-phases method and branch and bound pro-
cedures to solve the bi-objective knapsack problem', J.
analysis; Decision support systems with
Global Optim. 12 (1998), 139-155.
multiple criteria.
Jacques Teghem
References Lab. Math. & Operational Research Fac. Polytechn. Mons
[1] BEN ABDELAZIZ, F., CHAOUACHI, J., AND KRICHEN, 9, rue de Houdain
S.: 'A hybrid heuristic for multiobjective knapsack B-7000 Mons, Belgium
problems', Techn. Report Inst. Sup. Gestion, Tunisie
E-mail address: teghem(Dmathro, fpms. ac. be
s u b m i t t e d (1997).
[2] CURRENT, J.R., AND MIN, H.: 'Multiobjective design
of transportation networks: taxonomy and annotation', MSC2000: 90C29, 90C27
Europ. J. Oper. Res. 26, no. 2 (1986), 187-201. Key words and phrases: multi-objective programming, com-
[3] CZYZAK, P., AND JASZKIEWICZ, A.: 'Pareto simulated binatorial optimization.
annealing - A metaheuristic technique for multiple ob-
jective combinatorial optimization', J. Multi-Criteria
Decision Anal. 7 (1998), 34-47.
[4] GANDIBLEUX, X., MEZDAOUI, N., AND FRI~VILLE, A."
'A tabu search procedure to solve multiobjective com- MULTI-OBJECTIVE INTEGER LINEAR
binatorial optimisation problems', in R. CABALLERO PROGRAMMING, MOILP
AND R. STEUER (eds.): Proc. Volume of MOPGP'96, From the 1970s onwards, multi-objective linear
Springer, 1997.
programming (MOLP) methods with continuous
[5] HANSEN, M.P.: 'Tabu search for multiobjective optimi-
zation: MOTS', Techn. Report Inst. Math. Modelling, solutions have been developed [8]. However, it is
Techn. Univ. Denmark (1996), Submitted for publica- well known that discrete variables are unavoidable
tion. in the linear programming modeling of many ap-
[6] HOOGEVEEN, H.: 'Single machine bicriteria schedul- plications, for instance, to represent an investment
ing', PhD Diss. Univ. Eindhoven (1992).
choice, a production level, etc.
[7] K(3KSALAN, M., AND KOKSALAN-KONDA, CKI.S."
'Multiple criteria scheduling on single machine: A re- The mathematical structure is then integer lin-
view and a general approach', in M. KARWAN ET AL. ear programming (ILP), associated with MOLP
(eds.): Essays in Decision Making, Springer, 1997. giving a MOILP problem. Unfortunately, MOILP
[8] ULUNGU, E.L., AND TEGHEM, J.: 'Multi-objective cannot be solved by simply combining ILP and
shortest problem path: A survey', in M. CERNY,
MOLP methods, because it has got its own spe-
D. GLACKAUFOVA, AND D. LOULA (eds.): Proc. In-
ternat. Workshop on MCDM, Liblice (Czechoslovakia), cific difficulties.
1991, pp. 176-188. The problem (P) considered is defined as

460
Multi-objective integer linear programming

n
K m 2~
!
max xj,
XED Zl ( X ) - 6xl + 3x2 + x3,
j=l
k- 1,...,K, z2(X) - Xl + 3x2 + 6x3,
T X < d, D - ( X " Xl + z2 + x3 <_ 1, zi e {0, 1}}.
X_>0, For this problem,
where D- XER n"
(P) xj integer,
E(P) - {(1, 0, 0); (0, 1, 0); (0, 0, 1)}
jEJ
with T ( m × n), while N S E ( P ) - {(0, 1, 0)}.
d(m × 1), Nevertheless, V.J. Bowman [1] has given a the-
oretical characterization of E(P)" Setting
X(n×l),
J C {1,...,n}. Mk -- max z k ( X ) ,
XED
If we denote L D - { X " T X <_ d, X >_ 0}, problem -2k -- Mk + ~k, with~k>0,
(LP) is the linear relaxation of problem (P)" p>0,
(LP) ' max ' zk (X) , k- 1 , . . . , K,
then E(P) is characterized by the optimal solu-
XELD tions of the problem (P~)"
A solution X* in D (or L D ) is said to be effi-
min max
cient for problem (P) (or (LP)) if there does not XED k
exist any other solution in D (or L D ) such that
z k ( X ) >>_zk(X*), k - 1 , . . , K , with at least one
strict inequality.
(5
Let E(.) denote the set of all efficient solutions consisting of minimizing the augmented weighted
of problem (.). It is well known (see [8])that (LP) Tchebychev distance between z k ( X ) and gk.
may be characterized by the optimal solutions of Let us note that another characterization of
the single objective and parametrized problem: E(P) is given in [2] for the particular case of binary
K variables.
max ~ AkZk(X) Two types of problems can be analysed:
k=l
XELD • Generate E(P) explicitly. Several methods
(LP~) have been proposed; they are reviewed in
with Ak>0, Vk,
K
[10]. below we will present two of them, which
appear general, characteristic and efficient.
E
k=l
Ak--1
• To determine interactively with the decision
This fundamental principle often called Ge- maker a 'best compromise' in E(P) accord-
offrion's theorem is no longer valid in pres- ing to the preferences of the decision maker.
ence of discrete variables because the set D is not Some of the existing approaches are reviewed
convex. The set of optimal solutions of problem in [11]; below we will describe three of these
(P~), defined as problem (LP~) in which L D is interactive methods.
replaced by D, is only a subset SE(P) of E(P);
the solutions in SE(P) are called supported effi-
G e n e r a t i o n of E ( P ) .
cient solutions, while the solutions belonging to
N S E ( P ) = E(P) \ SE(P) are called nonsupported K l e i n - H a n n a n method. See [5]. This is an iterative
efficient solutions. procedure for sequentially generating the complete
The breakdown of Geoffrion's theorem for prob- set of efficient solutions for problem (P) (we sup-
lem (P) can be illustrated by the following obvious pose that the coefficients c~k) are integers); it con-
example: sists in solving a sequence of progressively more

461
Multi-objective integer linear programming

constrained single objective ILP problems and can


'max' E cjxj + E cj
be implemented through use of any ILP algorithm. jCF r jCB r
s.t. ~ tjxj ~ d r
• (Initialization: step 0) An objective function
jCF
1 C { 1 , . . . , K} is chosen arbitrarily and the
xj - (0,1)
following single objective ILP problem is con-
sidered: where B r is the index set of variables
assigned the value one
( P 0 ) max zl(X). F r is the index of free variables
XED
d~-d - ~ tj
jCB r
Let E(P0) be the set of all optimal solutions
tj is the j t h column o f T
of (P0) and let E0(P) be the set of solutions
cj is the vector of components c}k)."
defined as E0(P) = E ( P 0 ) N E(P). Thus,
E0(P) is the subset of nondominated solu- The node S r is called feasible when d r _ 0 and
tions in E(P0). infeasible otherwise. The three basic rules of the
branch and bound algorithm are:
• (Step j, (j > 1)) The efficient solutions gen-
• (bounding rule) A lower and upper bound
erated at the previous steps are denoted by
, j-1 vector, Z r and Z r, respectively, are defined
Z r , r - 1 , . . . , R , i.e. Ui= 1Ei(P) - { X * ' r -
as
1 , . . . , R } . In this j t h step, the following
problem is solved Zr - E cj~
jCB r

max zl(X) Z ~ __ Z ~ + Y~,


XCD
where Y[ - ~jeF~ max(0, cjk }. The vector
(Pj) A

Z ~ is added to a list E of existing lower


fi 6 z k ( X ) ~-- z k ( X * ) + 1 •
r=l k-1 bounds if Z r is not dominated by any of the
A

existing vectors of E. At the same time, any


A

vector of E dominated by Z ~ is discarded.


The new set of constraints represents the re- • (fathoming rules) In the multi-objective case,
quirement that a solution to (P j) be bet- the feasibility of a node is no longer a suffi-
ter on some objective k ~ 1 for each effi- cient condition for fathoming it. The three
cient solution X r generated during the previ- general fathoming conditions are:
ous steps; an example of implementation of - Z r is dominated by some vector of E;
A

these constraints is given in [5]. The set of - the node S r is feasible and Z r - zr;
solutions Ej (P) is then defined as Ej (P) - the node Sr is unfeasible and
E ( P j ) M E(P), where E ( P j ) is the set of all ~ j e F ~ min(O, tij) > d r for some i =
optimal solutions of (P j).
1,...,m.
The usual backtracking rules are applied.
The procedure continues until, at some iteration J,
the problem (Pj) becomes infeasible; at this time • (branching rule) A variable al C F r is se-
E(P) - U j :g-1
oEj(P). lected to be the branching variable.
- I f the node S r is feasible, l E {j E
Kiziltan-Yucaoglu method. See [4]. This is a direct F 0}.
adaptation to a multi-objective framework of the - Otherwise, index 1 is selected by the min-
well-known Balas algorithm for the ILP problem imum unfeasibility criterion"
with binary variables. m

At node S r of the branch and bound scheme, the min E max (0,-d~ + tij).
jCF r
following problem is considered: i=1

462
Multi-objective integer linear programming

When the A
explicit enumeration is complete, in E by X* and a new iteration is per-
E(P) - E. formed;
- if Z* ¢ Z and X* is not preferred to any
solution in E: E is not modified and the
Interactive M e t h o d s . Such methods are partic-
second stage is initiated;
ularly important to solve multi-objective applica-
- if Z* C Z: Z defines a face of the efficient
tions. The general idea is to determine progres-
surface and the second stage is initiated.
sively a good compromise solution integrating the
preferences of the decision maker. • (Stage 2)" Introduction of the best non sup-
The dialog with the decision maker consist of a ported solutions. We will not give details
succession of 'calculation phase' managed by the about this second stage (see [3] or [10]); let us
model and 'information phase' managed by the de- just say that it is performed in the same spirit
cision maker. but considering the single objective problem
At each calculation phase, one or several new max G(X)
efficient solutions are determined taking into ac-
XED
count the information given by the decision maker
at the preceding information phase. At each in- G(X) < G - ~ withe>0
formation phase, a few number of easy questions
where G is the optimal value obtained for the
are asked to the decision maker to collect infor-
last function G(X) considered.
mation about its preferences in regard to the new
solutions.
Steuer-Choo Method. See [9]. Several interactive
Gonzalez-Reeves-Franz Algorithm. See [3]. In this approaches of MOLP problems can also be ap-
method a set E of K efficient solutions is selected plied to MOILP; among them, we mention only
and updated in each algorithm step according to the Steuer-Choo method, which is a very general
the decision maker's preferences. At the end of the procedure based on problem (pT) defined in the
procedure, E will contain the most preferred so- introduction.
lutions. The method is divided in two stages: in The first iteration uses a widely dispersed group
the first one, the supported efficient solutions are of A weighting vectors to sample the set of efficient
considered, while the second one deals with non- solutions. The sample is obtained by solving prob-
supported efficient solutions. lem (pT) for each of the ~ values in the set. Then
the decision maker is asked to identify the most
(Stage 1)" Determination of the best sup- preferred solution X (1) among the sample. At iter-
ported efficient solutions. E is initialized with ation j, a more refined grid of weighting vectors
K optimal solutions of the K single objective is used to sample the set of efficient solution in the
ILP problems. Let us denote by Z the K cor- neighborhood of the point zk(X(J)) (k - 1,... ,K)
responding points in the objective space of in the objective space. Again the sample is ob-
the solution of E. At each iteration, a linear tained by solving several problems (pT) and the
direction of search G(X) is build: G(X)is the most preferred solution X (j+l) is selected. The
inverse mapping of the hyperplane defined by procedure continues using increasingly finer sam-
the points of Z in the objective space into the pling until the solution is deemed to be acceptable.
decision space. A new supported efficient so-
lution X* is determined by solving the single The MOMIX Method. (See [6].) The main charac-
objective ILP problem maxxED G(X) and teristic of this method is the use of an interactive
Z* is the corresponding point in the objec- branch and bound concept initially introduced
tive space. Then: in [7] to design the interactive phase.
- if Z* ~ Z and the decision maker prefers • (First compromise): The following minimax
solution X* to at least one solution of optimization, with m = 1, is performed to
E" the least preferred solution is replaced determined the compromise )~(1):

463
Multi-objective integer linear programming

min 5
one of the following conditions is verified"
(pm) a) D (re+l) - 0;
Vk I-[~m)(M~ m) - zk(X)) <
b) ~/r(m+l) (re+l) < ek Vk;
"'k -- mk --
X E D (m)
c) the vector Z of the incumbent val-
where ues (values of the criteria for the best
- D (~) - D;
compromise already determined) is
preferred to the new ideal point (of
- [m~1), M~ 1)] are the variation intervals of
component ""k ~/r(m+l) ).
the criteria k, provided by the pay-off ta-
ble (see [8]); The first step of the procedure is stopped
if either more than q successive iterations
- II~1) are certain normalizing weights tak-
do not bring an improvement of the in-
ing into account these variation intervals
cumbent point Z or more than Q itera-
(see [8]).
tions have been performed.
REMARK 1 If the optimal solution is not Note that the parameters ek, q and Q are
unique, an augmented weighted Tchebychev fixed in the agreement with the decision
distance is required in order to obtain an ef- maker.
ficient first solution. V-]
• (Backtracking procedure)" It can be hoped
(Interactive phases)" There are integrated in that the appropriate choice of the criterion
an interactive branch and bound tree; a first zlm(1), at each level m of the depth-first pro-
step (a depth-first progression in the tree) gression, has been made so that at the end
leads to the determination of a first good of the first step, a good compromise has been
compromise; the second step (a backtracking found.
procedure) confirms the degree of satisfaction Nevertheless, it is worth examining some
achieved by the decision maker or it finds a other parts of the tree to confirm the satis-
better compromise if necessary. faction of the decision maker. The complete
- (Depth first progression)" For m > 1, let tree is generated in the following manner: at
at the ruth iteration each level, K subnodes are introduced by suc-
1) )~(m) be the ruth compromise; cessively adding the constraints"
2) z~m) be the corresponding values of
the criteria; Zlm(1 ) ( x ) > z /re(l)'
(m)
3) [ra~m) , "'~k
'/r(m) ] be the variation inter- (m) (m)
zl~(2)(X) > zl~(2); Z/m(1 ) ( X ) ~_ Zlm(1),
vals of the criteria; and
4) H~m) be the weight of the criteria.
The decision maker has to choose, at .(m) (m)
this ruth iteration, the criterion/re(l) E Zlm(K)(X) > ~lm(K); zlm(k)(X) <_Zl~(k),
{k" k - 1 , . . . , K } he is willing to
for all k - 1 , . . . , K - 1, where Im(k) E
improve in priority. Then a new con-
{k" k - 1 , . . . , K } is the kth objective that
straint is introduced so that the fea-
the decision maker wants to improve at the
sible set becomes ~ D (re+l) - D (m) N
mth level of the branch and bound tree.
{zlm(1)(X) > z Ira(l)} (m) Further, the vari-
At each level m, the criteria are thus or-
(re+l) ~/r(m+l)
ation intervals [mk , ""k ] and the dered according to the priorities of the de-
rr(m+l)
weights "'k are updated on the new cision maker in regard with the compromise
feasible set D (re+l). The new compromise ~(m).
~(m+l) is obtained by solving the prob- The usual backtracking procedure is ap-
lem (pro+l). plied; yet it seems unnecessary to explore the
Different tests allow to terminate this whole tree. Indeed, the subnode k > K of
first step. The node (m + 1) is fathomed if each branching correspond to a simultaneous

464
Multi-objective integer linear programming

relaxation of those criteria lm(k), k <_K, the problems: Optimization techniques; Multi-
decision maker wants to improve in priority! criteria sorting methods; Financial applica-
Therefore, the subnodes k > K - 2 or tions of multicriteria analysis; Portfolio se-
3, for instance, do almost certainly not bring lection and multicriteria analysis; Decision
any improved solutions. support systems with multiple criteria.
The fathoming tests and the stopping tests
are again applied in this second step. References
[1] BOWMAN JR., V.J.: 'On the relationship of the
Tchebycheff norm of the efficient frontier of multi-
See also: Decomposition techniques for criteria objectives', in H. THIRIEZ AND S. ZIONTS
MILP: Lagrangian relaxation; LCP: Parda- (eds.): Multiple Criteria Decision Making, Springer,
los-Rosen mixed integer formulation; In- 1976, pp. 76-85.
[2] BURKARD, R.E.: 'A relationship between optimality
teger linear complementary problem; In-
and efficiency in multiple criteria 0-1 programming
teger programming: Cutting plane al- problems', Comput. Oper. Res. 8 (1981), 241-247.
gorithms; Integer programming: Branch [3] GONZALEZ, J.J., REEVES, G.R., AND FRANZ, L.S.:
and cut algorithms; Integer programming: 'An interactive procedure for solving multiple objective
Branch and bound methods; Integer pro- integer Linear programming problems', in Y. HAIMES
gramming: Algebraic methods; Integer pro- AND V. CHANKONG (eds.): Decision Making with Mul-
tiple Objectives, Springer, 1985, pp. 250-260.
gramming: Lagrangian relaxation; Inte-
[4] KIZILTAN, G., AND YUCAOGLU, E.: 'An algorithm
ger p r o g r a m m i n g duality; Time-dependent for multiobjective zero-one linear programming', Man-
traveling salesman problem; Set cover- agem. Sci. 29, no. 12 (1983), 1444-1453.
ing, packing and partitioning problems; [5] KLEIN, D., AND HANNAN, E.: 'An algorithm for the
Simplicial pivoting algorithms for integer multiple objective integer linear programming', EJOR
9, no. 4 (1982), 378-385.
programming; Multi-objective mixed inte-
[6] L'Hom, H., AND TEGHEM, J.: 'Portfolio selection by
ger programming; Mixed integer classifica- MOLP using an interactive branch and bound', Found.
tion problems; Integer programming; Mul- Computing and Decision Sci. 20, no. 3 (1995), 175-185.
tiparametric mixed integer linear program- [7] MARCOTTE, 0., AND SOLAND, R.: 'An interactive
ming; P a r a m e t r i c mixed integer nonlinear branch and bound algorithm for multiple criteria opti-
optimization; Stochastic integer program- mization', Managem. Sci. 32, no. 1 (1986), 61-75.
[8] STEUER, R.E.: Multiple criteria optimization theory,
ming: Continuity, stability, rates of conver-
computation and applications, Wiley, 1986.
gence; Stochastic integer programs; Branch [9] STEUER, R.E., AND CHOO, E.-U.: 'An interactive
and price: Integer p r o g r a m m i n g with col- method weighted Tchebycheff procedure for multiple
umn generation; Multi-objective optimi- objective programming', Math. Program. 26 (1983),
zation: Pareto optimal solutions, proper- 326-344.
[10] TEGHEM, J., AND KUNSCH, P.: 'Interactive method
ties; Multi-objective optimization: Interac-
for multi-objective integer linear programming', in
tive methods for preference value functions; G. FANDEL ET AL. (eds.): Large Scale Modelling and
Multi-objective optimization: Lagrange du- Interactive decision analysis, Springer, 1986, pp. 75-
ality; Multi-objective optimization: Inter- 87.
action of design and control; Outranking [11] TEGHEM, J., AND KUNSCH, P.: 'A survey of techniques
methods; Preference disaggregation; Fuzzy for finding efficient solutions to multi-objective inte-
ger linear programming', Asia-Pacific J. Oper. Res. 3
multi-objective linear programming; Multi-
(1986), 1195-106.
objective optimization and decision sup-
Jacques Teghem
port systems; Preference disaggregation ap-
Lab. Math. & Operational Research Fac. Polytechn. Mons
proach: Basic features, examples from finan- 9, rue de Houdain
cial decision making; Preference modeling; B-7000 Mons, Belgium
Multiple objective p r o g r a m m i n g support; E-mail address: teghem~mathro. :fpms. a c . b e
Multi-objective combinatorial optimization; MSC2000: 90C29, 90C10
Bi-objective assignment problem; Estimat- Key words and phrases: multi-objective programming, in-
ing data for multicriteria decision making teger, linear programming.

465
Multi-objective integer linear programming

MULTI-OBJECTIVE MIXED INTEGER approaches designed for all-integer problems that


PROGRAMMING do not apply to the mixed integer case. There-
A multi-objective (multicriteria) mixed integer fore, even for the linear case, techniques for dealing
programming (MOMIP) problem is a mathemat- with multi-objective mixed integer programming
ical programming problem that considers more involve more than the combination of MOLP with
than one objective function and some but not all multi-objective integer programming techniques.
the variables are constrained to be integer valued.
The integer variables can either be binary or take Efficiency and Nondominance. The concept of
on general integer values. The problem may be efficiency (or nondominance) in MOMIP is de-
stated as follows: fined as usually for multi-objective mathemati-
cal programming: A solution 5 E X is efficient
max Zl - - f l ( X )
if and only if it does not exist another x E X
such that fi(x) >_ fi(-x) for a l l / C {1,... ,k} and
- Sk(x) fi(x) > fi(~) for at least one i. h solution 5 E X
is weakly efficient if and only if it does not ex-
s.t. xEX
ist another x C X such that fi(x) > fi(x) for all
where X C R n denotes the nonconvex set of fea- i E {1,...,k}.
sible solutions defined by a set of functional con- Let Z C R k be the image of the feasible region
straints, x >_ 0 and xj integer j E J C { 1 , . . . , n } . X in the criterion (objective function) space. A cri-
It is assumed that X is compact (closed and terion point ~ E Z corresponding to a (weakly) ef-
bounded) and nonempty. ficient solution 5 C X is called (weakly) nondomi-
Although a MOMIP problem may be nonlinear, nated. The designations 'efficient', 'nondominated'
models with linear constraints and linear objective and 'Pareto optimal' are often used as synonyms.
functions have been more often considered. In a Z
multi-objective mixed integer linear programming
(MOMILP) problem, the functional constraints
can be defined as Ax <_ b, and the objective func-
tions fi(x) - cix, i - 1 , . . . , k, where A is a m x n
matrix, b is a m-dimensional column vector and ci,
i - 1 , . . . , k, are n-dimensional row vectors. c B
Multi-objective mixed integer programming is z2 -- C

very useful for many areas of application such


as communication, transportation and location,
among others. Integer variables are required in a A
real-world model whenever it is sought to incor-
porate discrete phenomena; for instance, invest- >
ment choices, production levels, fixed charges, log- Zl
ical conditions or disjunctive constraints. However,
Fig. 1" N o n d o m i n a t e d criterion p o i n t s of a M O M I L P
research on MOMIP has been rather limited. Con-
problem.
cerning multi-objective mathematical program-
ming, most research efforts have been so far de-
voted to linear programming with continuous vari- Supported and U n s u p p o r t e d Nondom-
ables (MOLP). The introduction of discrete phe- inated Solutions. Since the feasible re-
nomena into multi-objective models leads to all- gion is nonconvex, unsupported nondominated
integer or mixed integer problems that are more points/solutions may exist in a MOMIP problem.
difficult to tackle. They can not be handled by A nondominated point ~ C Z is unsupported if it is
most MOLP approaches because the feasible set is dominated by a convex combination (which does
no longer convex. Also, there are multi-objective not belong to Z) of other nondominated criterion

466
Multi-objective mixed integer programming

points (belonging to Z). In Fig. 1 the line segment Interactive Versus N o n i n t e r a c t i v e M e t h o d s .


from A to B plus D is the set of supported non- Methods may be either noninteractive (in general,
dominated criterion points. The line segment from generating methods designed to find the whole or
C to D excluding C and D is the set of unsup- a subset of the nondominated solutions) or inter-
ported nondominated criterion points. Note that active (characterized by phases of human inter-
convex combinations of B and D dominate the vention alternated with phases of computation).
line segment from C to D, excluding D. C is a Generating methods for MOMIP problems usu-
weakly nondominated solution. ally require an excessive amount of computational
resources, both in processing time and storage
C h a r a c t e r i z a t i o n of the N o n d o m i n a t e d Set. capacity. Even specialized generating algorithms
Unlike MOLP, the nondominated (or efficient) set developed just for bi-objective problems, which
of MOMIP problems can not be fully determined profit from graphical representations on the cri-
by parameterizing on ~ the weighted-sums pro- terion space, tend to be inadequate to deal with
gram: large problems. Nevertheless, the distinction be-
tween interactive and generating methods is not al-
max )~if i (x ) " x E X } ways clear. Some approaches attempt to find a rep-
(P),) resentative subset of the nondominated set (gener-
i--1
where )~ C A. ating methods according to the above definition)
Here, and would be easily embodied in an interactive
[ framework. The bi-objective method of R. Solanki
A- ~ c R k" )~i > 0 Vi, [14] may be regarded as an example of such an
k j
( Y~i=l Ai - 1 approach.
The unsupported nondominated solutions cannot Taking into account the difficulties mentioned
be reached even if the complete parameterization above, and the large number of nondominated so-
on )~ is attempted. lutions in many problems, special attention to in-
teractive methods will be paid. First of all, a short
Researchers on multi-objective mathematical
remark is made about the major paradigms fol-
programming early recognized this fact and stated
lowed by the authors of interactive methods. Some
other characterizations for the nondominated set
authors admit that the decision maker's (DM)
that fit MOMIP and, in particular, MOMILP
preferences can be represented by an implicit util-
problems. Basically, two main characterizations
are defined. One consists of introducing additional
ity function. The interactive process consists in
building a protocol of interaction aiming to dis-
constraints into the weighted-sums program. Gen-
cover the optimum (or an approximation of it)
erally, these constraints impose bounds on the ob-
of that implicit utility function. The convergence
jective function values. This form of characteriza-
to this optimum requires no contradictions in the
tion may be regarded as a particularization of the
DM's responses given throughout the interactive
general characterization provided by R.M. Soland
process.
[13]. The other is based on the Tchebycheff the-
ory whose theoretical foundation originated from In contrast with implicit utility function ap-
V.J. Bowman [3]. More details about these charac- proaches, the open communication approaches are
terizations and on how they provide the computa- based on a progressive and selective learning of
tion of nondominated solutions will be given later. the nondominated set. The terminology of open
Although providing very important theoretical re- communication is inspired on the concept of open
sults, the characterizations of the nondominated exchange, defined by P. Feyerbend [6]. Such multi-
set do not offer an explicit means to provide deci- objective approaches are not intended to converge
sion support for MOMIP problems. However, some to any 'best' compromise solution but to help the
authors have developed decision support methods DM to avoid the search for nondominated solu-
for these problems. tions he/she is not at all interested in. There are no
irrevocable decisions during the whole process and

467
Multi-objective mixed integer programming

the DM is always allowed to go 'backwards' at a a particular nondominated solution. Other types of


later interaction. So, at each interaction, the DM is additional constraints can also be used.
only asked to give some indications on what direc- A scalarizing program which consists of the
tion the search for nondominated solutions must weighted-sums program combined with additional
follow, or occasionally to introduce additional con- constraints is used for computing nondominated
straints. The process only finishes when the DM solutions in the interactive branch and bound
considers to have gained sufficient insight into the method of B. Villarreal et al. [18]. The addi-
nondominated solution set. Using the terminology tional constraints are bounds imposed on integer
of B. Roy [12], 'convergence' must give place to variables by the branching process. This method,
'creation'. The interactive process is a construc- which is devoted to MOMILP problems, received
tive process, not the search for something 'pre- later improvements in [8] and [11]. Starting by ap-
existent'. plying the well-known (MOLP) Zionts-Wallenius
Although we personally prefer the open com- procedure to the linear relaxation of the MOMILP
munication methods, we will include in the next problem, the method then employs a branch and
section a tentative classification of both, draw- bound phase until an integer solution that satis-
ing out some differences and similarities between ties the DM is achieved. An implicit utility func-
them. We adopt this perspective because this ques- tion is assumed and the DM's preferences are as-
tion is not specific to mixed integer programming sessed using pairwise evaluations of decision alter-
and arguments pro or against each approach, be- natives and trade-off analysis. In light of the DM's
sides being subjective, are the same as in other underlying utility function, decisions on whether
multi-objective programming fields. Furthermore, to apply again the Zionts-Wallenius procedure to
since MOMIP is still in its early steps, no behav- the linear relaxation of a candidate multi-objective
ioral studies exist addressing the use of procedures subproblem, or to continue to branch by append-
within this context. ing a constraint on a variable, are successively
As we have mentioned before, research on made.
MOMIP has been rather scarce in comparison to Another method that uses particular forms of
other fields of the multi-objective mathematical (P~,9) to compute nondominated solutions is due
programming, namely in MOLP. We will men- to Y. Aksoy [1]. This is an interactive method for
tion herein some well-known methods specially de- bicriterion mixed integer programs that employs a
signed for MOMIP or far more generally applica- branch and bound scheme to divide the subset of
ble. nondominated solutions considered at each node
into two disjoint subsets. The branching process
seeks to bisect the range of nondominated values
C o m p u t i n g P r o c e s s e s a n d T h e i r U s e in In- for z2 at the node under consideration, checking
teractive Methods. whether a nondominated point exists whose value
Weighted-Sums Programs with Additional Con- for z2 is in the middle of the range. If no such
straints. The introduction of bounds on the ob- solution exists, that subset is divided using two
jective function values into the weighted-sums pro- nondominated points whose values for z2 are the
gram (P~) enables this program to also compute closest (one up and the other down) to the mid-
unsupported nondominated solutions: dle value. These nondominated solutions are ob-

(P~,g) max /k~ A i f i ( x ) " x e X, f ( x ) > g


j=l
/ ,
tained by solving (P~,g) optimizing one objective
function and bounding the other. The interactive
process requires the DM to make pairwise com-
parisons in order to determine the branching node
where f ( x ) : ( f l ( x ) , . . . ,fk(x)), A e A and g is a and to adjust the incumbent solution to the pre-
vector of objective bounds. Besides the fact that ferred nondominated solution. It is assumed that
every solution obtained by (P~,g) is nondominated, the DM's preferences are consistent, transitive and
there always exists a g E R k such that (P~,g) yields invariant over the process aiming to optimize the

468
Multi-objective mixed integer programming

DM's implicit utility function. may be portions of the nondominated set that the
C. Ferreira et al. [5] proposed a decision support program is unable to compute, even considering p
system for bicriterion mixed integer programs. The very small (for example, the line segment from C
interactive process follows an open communication to C' in Fig. 2, for a given p), this characteriza-
protocol asking the DM to specify bounds for the tion is still possible in practice. Note that p can be
objective function values. These bounds are in- set so small that the DM is unable to discriminate
put into (P~,g) defining subregions to carry on the between those solutions and a nearby weakly non-
search for nondominated solutions. Some objective dominated solution (this corresponds to C' getting
space regions are progressively eliminated either closer to C in Fig. 2).
by dominance or infeasibility.
In [16] and [15] a lexicographic weighted
Tchebycheff and Achievement Scalarizing Pro- Tchebycheff program is proposed for the nonlinear
grams. Bowman [3] proved that the parameter- and infinite-discrete feasible region cases to over-
ization on w of minx~x Ill - f(x)llw generates come this drawback of the augmented weighted
the nondominated set, where wi > 0 for all i, Tchebycheff program. The lexicographic approach
k can also be applied to the mixed integer (linear)
~-~i=lwi - 1, f is a criterion point such that
f > f ( x ) for all x E Z and I I f - f ( x ) l l w de- case. However, it is more difficult to implement
notes the w-weighted Tchebycheff metric, that since two stages of optimization are employed. At
is, m a x l < i < k { w i l f i - fi(x)l}. This scalarizing pro- the first stage only a is minimized. When the first
gram is equivalent to stage results in alternative optima, a second stage
is required. It consists of minimizing - ~ i k l fi (z)
min a
over the solutions that minimize a in order to elim-
(Tw) s.t. wi (-]i - fi(x)) <_ a, 1 <_ i <_ k, inate the weakly nondominated solutions.
xEX, a>_O. Besides (Tw) (either the augmented or the
(Tw) may yield weakly nondominated solutions lexicographic forms), there are other similar ap-
(for instance, point C in Fig. 1). Replacing the ob- proaches that also allow to characterize the non-
jective function in (Ww) by a - p ~-]~k_1 f i ( x ) w i t h dominated set of multi-objective mixed integer
p a small positive value, all the solutions returned programs. An approach of this type consists in
by this augmented weighted Tchebycheff program discarding the w-vector or fixing it and varying
are nondominated. R.E. Steuer and E.-U. Choo f, the criterion reference point that represents the
[16] proved that there are always p small enough DM's aspiration levels. This scalarizing program
that enable to reach all the nondominated set for can be denoted by (TT). There always exist refer-
the finite-discrete and polyhedral feasible region ence points satisfying f > f ( x ) for all x E X, such
cases. that (Ty) produces a particular nondominated so-
lution ~ - f (5). The variation of f can be done ac-
cording to a vector direction 0, leading to (T]+e).
i
l
_ l)/w,~,h,ed The reference points are thus projected onto the
p
i
nondominated set. Reference points that do not
i f this distance ~rchebycheffcont
i~Augmenledweighted our,,:
~ Tchebycheffconto~r
satisfy the condition f > f ( x ) for all x E X may
also be considered provided that the a variable is
4 =z~, C'
defined without sign restriction. This corresponds
to the minimization of a distance from Z to the
reference point if the latter is not attainable and
2, to the maximization of such a distance if the ref-
erence point is attainable. If reference or aspira-
Fig. 2: Illustration of the augmented weighted Tchebycheff tion levels are used as controlling parameters, the
metric. (weighted) Tchebycheff metric changes its form of
Concerning the MOMIP case, although there dependence on controlling parameters and should

469
Multi-objective mixed integer programming

be interpreted as an achievement function [9]. tives for the points forming the pair. The algorithm
finishes when the maximum 'error' is lower than a
Like (Tw), the simplest form of (TT) may pro-
predefined maximum allowable 'error'.
duce weakly nondominated solutions. The aug-
mented form is a good substitute in practice and Another interactive method capable of solving
the lexicographic approach guarantees that all MOMIP problems was developed by A. Durso [4].
nondominated solutions can be reached. In what This method employs a branching scheme consid-
follows, let (T.) denote either the simplest, the ering progressively smaller portions of the non-
augmented or the lexicographic form. dominated set by imposing lower bounds on the
criterion values. At each interaction, the k non-
Scalarizing programs (T~), (TT) and their ex-
dominated solutions that define the (quasi)ideal
tensions or slight different formulations are used
criterion point for each new node are calculated.
to generate nondominated solutions in several (in-
The DM is then asked to select the node for
teractive) methods proposed in literature, namely
branching by choosing the preferred ideal point.
in the following ones. The branching process begins by solving an equally
Steuer and Choo [16] proposed a general weighted augmented Tchebycheff program to de-
purpose multi-objective programming interactive termine a 'centralized' nondominated point for the
method that assumes an implicit DM's utility subset of the node under exploration. Once the
function without any special restriction on shape. DM chooses the most preferred of the k + 1 non-
The strategy of the interactive procedure is to dominated points already known for this node, say
sample series of progressively smaller subsets of ~', up to k new nodes (children) are created. Each
nondominated solutions. At each interaction, the child inherits its parent's bounding constraints and
DM selects his/her preferred solution from a sam- uses ~ to further restrict one of them. Thus, the
ple of nondominated solutions obtained from (T~) ith child restricts the ith criterion by imposing
with several w-vectors and the ideal criterion point fi(x) >_ ~i + 5 with 5 small positive. This approach
in the role of f. The solution preferred by the may be regarded as an open communication pro-
DM provides information to tighten the set of w- cedure that terminates when the DM is satisfied
vectors for the next interaction. The procedure ter- with the incumbent solution (the preferred non-
minates when a nondominated criterion point suf- dominated solution obtained so far).
ficiently close to the optimal criterion point of the M.J. Alves and J. Climaco [2] proposed a
underlying utility function is found. MOMILP open communication interactive ap-
Solanki's method [14], which is designed for proach. It combines the Tchebycheff theory with
bi-objective mixed integer linear programs, is the traditional branch and bound technique for
an adaptation of the noninferior set estimation solving single-objective mixed integer programs.
(NISE) method developed by J.L. Cohon for bi- At each interaction, the DM specifies either a ref-
objective linear programs. It seeks to generate erence point f , which is input in (TT) to compute
a representative subset of nondominated solu- a nondominated solution via branch and bound,
tions by combining the NISE's key features with or just selects an objective function, say fj, he/she
weighted Tchebycheff scalarizing programs. At wants to improve with respect to the previous non-
each iteration, a new nondominated solution, say dominated solution. In the latter case, the refer-
z 3, is computed by solving (Tw) for specific w and ence point is automatically adjusted by increas-
N

f, assuring that z 3 belongs to the region between ing the j t h component of f keeping the others
a pair of nondominated criterion points previously equal, in order to produce new nondominated solu-
determined, say (z 1, z2). This pair is then replaced tions (directional search) more suited to the DM's
by (z 1, z3) and (z 3, z2). The approximation of the preferences. This involves an iterative process of
nondominated surface is progressively improved, sensitivity analysis and operations to update the
thus decreasing the 'errors' associated with the ap- branch and bound tree. The sensitivity analysis
proximate representation of the pairs. This 'error' takes advantage of the special behavior of the para-
is measured by the largest range of the two objec- metric scalarizing program (T]+o). It returns a

470
Multi-objective mixed integer programming

value Oj > 0 such that the structure of the pre- proaches are continuous/integer ([7], [10]) working
vious branch and bound tree remains unchanged almost all the time with nondominated continuous
for variations in f j up to f j + Oj. Therefore, refer- solutions of the linear relaxation of the problem.
ence points f + 0 - ( f l, . . . , f j + Oj," " , f k) with Whenever the DM finds a satisfactory continuous
Oj _< Oj lead to nondominated solutions that may solution, an integer nondominated solution close
be obtained in a straightforward way. If the DM to it is then computed.
wishes to continue the search in the same direction,
a slight increase over Oj, say Oj + e, is first consid- C o n c l u s i o n s a n d F u t u r e D e v e l o p m e n t s . Most
ered. In this case, the previous sensitivity analysis methods developed so far for MOMIP problems re-
also returns the best candidate node, i.e., an ances- quire an excessive amount of computational effort,
tor of the node that will produce the next nondom- or require too much cognitive load from the DM,
inated solution. The previous branch and bound or only address bi-objective problems. In addition,
tree is thus used to proceed to the next computa- computational experience with real-world applica-
tions. Since further branching is usually required, tions is lacking. Although interesting or promising
an attempt is made to simplify the tree before en- approaches have been developed, further research
larging it. The underlying idea is to avoid an ever- efforts must be made in order to build effective in-
growing tree. This simplification means cutting off teractive methods able to handle real-sized prob-
parts of the tree linked by branching constraints lems.
no longer active. In sum, this approach brings to- See also" M i x e d i n t e g e r classification prob-
gether sensitivity analysis phases meant to adjust lems; I n t e g e r p r o g r a m m i n g ; Simplicial piv-
the reference point and simplification/branching o t i n g a l g o r i t h m s for i n t e g e r p r o g r a m m i n g ;
operations of the search tree to compute nondomi- Set covering, p a c k i n g a n d p a r t i t i o n i n g prob-
nated solutions. This process is repeated as long as lems; T i m e - d e p e n d e n t t r a v e l i n g s a l e s m a n
the DM wishes to continue the directional search p r o b l e m ; G r a p h coloring; I n t e g e r p r o g r a m -
or if the reference point has not been adjusted m i n g duality; I n t e g e r p r o g r a m m i n g : La-
enough to yield a nondominated solution differ- grangian relaxation; Integer programming:
ent from the previous one (a situation that occurs Algebraic methods; Integer programming:
more often in all-integer programs than in mixed B r a n c h a n d b o u n d m e t h o d s ; I n t e g e r pro-
integer models). Computational experiments have g r a m m i n g : B r a n c h a n d cut a l g o r i t h m s ; In-
shown that this multi-objective approach succeeds t e g e r p r o g r a m m i n g : C u t t i n g plane algo-
in performing directional searches. The times of r i t h m s ; I n t e g e r linear c o m p l e m e n t a r y prob-
computing phases using simplification/branching lem; L C P : P a r d a l o s - R o s e n m i x e d inte-
operations have been significantly reduced by this ger f o r m u l a t i o n ; D e c o m p o s i t i o n t e c h n i q u e s
strategy. for M I L P : L a g r a n g i a n relaxation; M u l t i -
o b j e c t i v e i n t e g e r linear p r o g r a m m i n g ; Mul-
Some researchers have developed other methods t i p a r a m e t r i c m i x e d i n t e g e r linear p r o g r a m -
for multi-objective integer programming that are ming; P a r a m e t r i c m i x e d i n t e g e r n o n l i n e a r
also applicable to the mixed integer case. Good optimization; Stochastic integer program-
examples of such approaches are those in [17], [10] ming: C o n t i n u i t y , stability, r a t e s of conver-
and [7]. In our opinion, they all are open commu- gence; S t o c h a s t i c i n t e g e r p r o g r a m s ; B r a n c h
nication procedures that share some key features, a n d price: I n t e g e r p r o g r a m m i n g w i t h col-
namely the concept of projecting a reference direc- umn generation.
tion onto the nondominated surface (although this
procedure is used in different ways) and the type of References
information required about the DM's preferences.
[1] AKSOY, Y.: 'An interactive branch-and-bound algo-
rithm for bicriterion nonconvex/mixed integer pro-
This information lies fundamentally in the speci- gramming', Naval Res. Logist. 37 (1990), 403-417.
fication of aspiration levels for the objective func- [2] ALVES, M.J., AND CLIMACO, J" 'An interactive
tion values (reference points). Some of these ap- reference point approach for multiobjective mixed-

471
Multi-objective mixed integer programming

integer programming using branch-and-bound', Europ. weighted Tchebycheff procedure for multiple objective
J. Oper. Res. 124, no. 3 (2000), 478-494. programming', Math. Program. 26 (1983), 326-344.
[3] BOWMAN, V.J.: 'On the relationship of the Tcheby- [17] VASSILEV, V., AND NARULA, S.C.: 'A reference direc-
cheff norm and the efficient frontier of multiple-criteria tion algorithm for solving multiple objective integer
objectives', in H. THIRIEZ AND S. ZIONTS (eds.): Mul- linear programming problems', J. Oper. Res. Soc. 44,
tiple Criteria Decision Making, Vol. 130 of Lecture no. 12 (1993), 1201-1209.
Notes Economics and Math. Systems, Springer, 1976, [ls] VILLARREAL, B., KARWAN, M.H., AND ZIONTS, S.:
pp. 76-86. 'An interactive branch and bound procedure for mul-
[4] DURSO, A.: 'An interactive combined branch-and- ticriterion integer linear programming', in G. FANDEL
bound/Tchebycheff algorithm for multiple criteria op- AND T. GAL (eds.): Multiple Criteria Decision Making:
timization', in A. GOICOECHEA, L. DUCKSTEIN, AND Theory and Application, Vol. 177 of Lecture Notes Eco-
S. ZIONTS (eds.): Multiple Criteria Decision Making, nomics and Math. Systems, Springer, 1980, pp. 448-
Proc. 9th Internat. Conf., Springer, 1992, pp. 107-122. 467.
[5] FERREIRA, C., SANTOS, B.S., CAPTIVO, M.E.,
CLIMACO, J., AND SILVA, C.C." 'Multiobjective loca- Maria JoSo Alves
tion of unwelcome or central facilities involving envi- Fac. Economics Univ. Coimbra and INESC
ronmental aspects: A prototype of a decision support Coimbra, Portugal
system', Belgian J. Oper. Res., Statist. and Computer E-mail address: mjoao~inescc.pt
Sci. 36, no. 2-3 (1996), 159-172. JoSo Climaco
[6] FEYERABEND,P.: Against method, Verso, 1975. Fac. Economics Univ. Coimbra and INESC
[7] KARAIVANOVA,J., KORHONEN, P., NARULA, S., WAL- Coimbra, Portugal
LENIUS, J., AND VASSILEV, V.: ' i reference direction
approach to multiple objective integer linear program- MSC2000: 90C29, 90Cll
ming', Europ. J. Oper. Res. 81 (1995), 176-187. Key words and phrases: multi-objective mathematical pro-
[8] KARWAN, M.H., ZIONTS, S., VILLARREAL, B., AND gramming, multicriteria analysis, interactive method.
RAMESH, R.: 'An improved interactive multicriteria
integer programming algorithm', in Y. HAIMES AND
V. CHANKONG (eds.): Decision Making with Multiple
Objectives, Vol. 242 of Lecture Notes Economics and
Math. Systems, Springer, 1985, pp. 261-271.
M U L T I - O B J E C T I V E OPTIMIZATION AND
[9] LEWANDOWSKI,i., AND WIERZBICKI, i . : 'Aspiration DECISION S U P P O R T SYSTEMS
based decision analysis and support. Part I: Theoreti- Multiple criteria decision making ( M C D M ) refers
cal and methodological backgrounds', WP-88-03, In- to the explicit i n c o r p o r a t i o n of more t h a n one eval-
ternat. Inst. Appl. Systems Anal. (IIASA), Austria
uation criteria into a decision problem. M C D M
(1988).
[10] NARULA, S.C., has been a very active field of research roughly
AND VASSILEV, V.: 'An interactive
algorithm for solving multiple objective integer lin- since the 1970s. A l t h o u g h b o u n d a r i e s might be
ear programming problems', Europ. J. Oper. Res. 79 fuzzy and overlapping, multicriteria decision anal-
(1994), 443-450. ysis (studying the problem of identifying the
[11] RAMESH, R., ZIONTS, S., AND KARWAN, M.H.: 'A 'most-preferred' a m o n g a finite discrete set of
class of practical interactive branch and bound algo-
alternatives), m u l t i - a t t r i b u t e utility theory (us-
rithms for multicriteria integer programming', Europ.
J. Oper. Res. 26 (1986), 161-172. ing utility functions explicitly to model a deci-
[12] RoY, B.: 'Meaning and validity of interactive proce- sion maker's preferences) and multi-objective op-
dures as tools for decision making', Europ. J. Oper. t i m i z a t i o n (modeling the decision p r o b l e m within
Res. 31 (1987), 297-303. a m a t h e m a t i c a l p r o g r a m m i n g framework) have
[13] SOLAND, R.M.: 'Multicriteria optimization: A general
emerged as m a j o r fields of interest under M C D M .
characterization of efficient solutions', Decision Sci. 10
(1979), 26-38. For more i n f o r m a t i o n on the general field of
[14] SOLANKI, R.: 'Generating the noninferior set in mixed M C D M , see [21].
integer biobjective linear programs: An application to Multi-objective mathematical programming
a location problem', Comput. Oper. Res. 18, no. 1 provides a flexible modeling framework t h a t al-
(1991), 1-15.
lows for simultaneous o p t i m i z a t i o n of more t h a n
[15] STEUER, R.: Multiple criteria optimization: Theory,
one objective function over a feasible set. Mathe-
computation and application, Wiley, 1986.
[16] STEUER, R., AND Cnoo, E.-U.: 'An interactive matically, the multi-objective o p t i m i z a t i o n prob-
lem can be expressed as:

472
Multi-objective optimization and decision support systems

scribes the decision maker's preferences [14], or as


(MOO) {max f(x),
in goal programming [7] and compromise program-
s.t. x E X,
ming [23], a standard model can be imposed upon
where X C R n is the set of feasible alternatives the decision maker. As these methods reduce the
and f = ( f l , . . . , fp): R n --+ R p, p > 2, is a vector- (MOO) problem to a single-objective optimization
valued function. Note that X can be any set, con- problem and they aspire to find a single solution
tinuous or discrete, expressed through constraints, to it, they have received considerable recognition
and the objective function f can be of any form. although their assumptions are usually restrictive.
The increased flexibility provided by (MOO) The interactive methods require the interaction
also raises the question of what constitutes a solu- of the decision maker with the computer while
tion to it. The definition of optimality is no longer solving a particular (MOO) problem. Usually, the
valid, as each objective function would possibly idea is to construct a model that proposes solutions
yield a different optimal solution. Therefore solv- to the (MOO) problem based on some initial in-
ing the (MOO) problem is about studying the in- put. The decision maker is then invited to reply to
herent trade-offs among conflicting objectives. Ef- the solution by providing additional preference in-
ficient solutions are the ones that possess the rele- formation. The interaction between the computer
vant trade-off information. An x ° C R n is called an program and the decision maker continues until a
efficient solution for the (MOO) problem if x ° C X satisfying solution is obtained.
and there exists no x C X such that f (x) >_ .f (x °) Interactive methods are important in more than
with strict inequality holding for at least one com- one way. First, they have introduced the means
ponent. The set of all efficient solutions of the for practically solving a (MOO) problem [12]. Sec-
(MOO) problem is usually denoted by XE. As per ond, they help a decision maker learn about the
the above definition, the most-preferred solution inherent trade-offs of a problem during the solu-
of the decision maker should belong to XE, as so- tion process [5]. Third, the idea underlying the
lutions that are not efficient, the dominated ones, interactive methods constitutes the major moti-
can be improved upon in at least one objective vation behind the contemporary decision support
without worsening the others. systems. Although interactive algorithms have en-
Since XE is usually a big set, confining the countered a certain level of acceptance from prac-
most-preferred solution to XE does not help iden- titioners [1], [20], they are not without disadvan-
tify the most-preferred solution immediately. In tages. They usually rely too much on the informa-
particular, the difficulty of defining and obtaining tion provided by the decision maker, are not able
the most-preferred solution, the one that the de- to provide a global look at XE, and thus at the
cision maker would identify as the solution to the trade-offs inherent in a problem, and they focus on
decision-making problem, and the need for the in- finding a single solution whereas a number of solu-
evitable involvement of the decision maker in the tions may be compatible with the decision maker's
solution procedure has resulted in very different preferences. Moreover, their information requests
solution approaches to the (MOO) problem. may be overwhelming for the decision maker. It
has been discussed that interactive methods need
T r a d i t i o n a l Classification. The timing of the to address behavioral aspects of decision making
involvement of the decision maker in the solution [16] and concentrate on interfacing the decision
procedure has been a crucial factor that distin- maker[15] as well as broadening their model base
guishes among various approaches to the (MOO) [10]. Although they do not encompass all the raised
problem [13]. A priori methods, methods that use issues, some of the interactive (MOO) algorithms
prior articulation of preferences, ask the decision have already evolved into decision support systems
maker to specify preference information prior to that provide a friendly environment for modeling
the application of an optimization routine. The as well as problem solving [17]. It can be expected
elicitation of preference information can be di- that more decision support systems to solve prob-
rected towards deriving a utility function that de- lem (MOO) will appear in the near future.

473
Multi-objective optimization and decision support systems

Perhaps the most straight-forward way of ap- methods that rely on simplex-like procedures or
proaching the (MOO) problem is as in vector op- parametric searches that incorporate book-keeping
timization methods. Also referred to as posterior mechanisms based on the fact that the set of ef-
methods, these methods are based on the sole as- ficient extreme points is connected. A well-known
sumption that the decision maker prefers more to procedure that solves (MOLP) for all of its ex-
less in each objective function in (MOO) hence treme points is ADBASE which was developed by
they propose identifying all of the efficient solu- R.E. Steuer [19].
tions of (MOO) and presenting them to the de-
EXAMPLE 1 Consider the MOLP problem [18]:
cision maker for the identification of the most-
preferred solution. Along with theoretical findings max xl~ X2~ X3
[11], [2], some vector optimization methods have s.t. 2Xl + 3x2 + 4x3 < 12
been proposed; however, the methods have not (1)
4xl + X2 + X3 _< 8
gained practical recognition in general. The fail-
Xl~X2~X3 >_ O.
ure in the implementation of the proposed meth-
ods can be explained by the heavy computational
requirements of these methods. Perhaps a more
important factor is the difficulty of presenting the x3
efficient set in a 'legible' way to the decision maker.
Furthermore, as the efficient set is usually contin- el
uous when the feasible region is, the task of iden- E1 / e2
tifying the most-preferred solution is a monstrous J
one attributed to the decision maker. 5
Xl

Multi-Objective Linear Programming. When E2


(MOO) has linear objective functions and a poly- e4
e3
hedral feasible set, the resulting problem is called
a multiple objective linear programming (MOLP)
problem. The MOLP problem has mathematical
features that make it easier to characterize and
obtain the efficient set compared to the more gen- The efficient set is the union of the two shaded
eral case. More specifically, it has been shown that efficient faces E1 and E2. There are 5 efficient ex-
the efficient set of the MOLP problem consists of treme points" el - (0, 0, 3), e2 - (10/7, 0, 16/7),
a collection of efficient faces of the feasible region. e3 -- (12/10, 32/10, 0), e4 -- (0, 4, 0), e5 -- (2, 0, 0).
As faces of a polyhedron can be characterized in If X denotes the feasible region, The face marked
a number of ways, for instance as the convex hull E1 can be characterized as the polyhedron that
of its extreme points if its compact, as the optimal forces the first constraint in (1) to equality in the
solution set to a particular optimization problem, definition of X. It can also be defined as the con-
or as a polyhedron itself, it becomes possible to vex hull of its four extreme points el, e2, e3, e4.
obtain and present the efficient set [9], [18], [22]. Finally, it is the optimal solution set to the opti-
Yet the computational effort increases with mization problem
problem size, and the (MOO) problem cannot be
considered truly solved at this stage without some max /klXl -+- /~2X2 -4- )~3X3
mechanism that helps the decision maker iden- s.t. xCX
tify the most-preferred solution in this huge and
for (~,/~2, ~3) = (2, 3, 4), and its positive multi-
hard-to-explore set. Most of the vector optimiza-
ples. K]
tion methods have concentrated on finding the set
of efficient extreme points of the multiple objective In large problems, the set of efficient extreme
linear programming problem. These are usually points may still contain too many points to be

474
Multi-objective optimization and decision support systems

studied by the decision maker. Moreover, extreme constraints that define the feasible region, but usu-
efficient points may not carry the trade-off infor- ally in a conservative way so as to retain some
mation well since some portions of the efficient computational tractability. Similarly, the multiple
set may end up being over-emphasized whereas objective integer programming problem is a very
some regions are highly missed. Indeed, there is difficult one to solve due to the additional compli-
no reason for a decision maker to be solely inter- cations related to integrality.
ested in extreme point efficient solutions. The at-
tractiveness of efficient extreme points mostly lies A p p l i c a t i o n s . Along with what one can call 'case
in their mathematical properties. With this mo- studies', certain applications that are more generic
tivation, a method that applies to a general set than a case study but more specific than prob-
of (MOO) problems has been suggested to find lem (MOO) itself have appeared. Typical exam-
globally-representative subsets of the efficient set ples include, but are not limited to, bicriteria net-
work optimization problems, bicriteria knapsack
problems, and multicriteria scheduling problems.
W o r k i n g in t h e O u t c o m e Space. The outcome Since usually these are problems that naturally in-
set Y = {y e R P : y = f ( x ) 3 x e X} helps redefine volve multiple criteria, the methods developed for
an equivalent problem to (MOO) in p-dimensional these problems have practical implications. Most
outcome space: of the methods developed can be categorized un-
der a priori methods. A typical approach is to form
(MOO0) {max y a weighted combination of the objective functions.
s.t. yCY.
Recently, interactive and vector optimization ap-
As the number of objectives p is usually much less proaches that deal with similar problems have also
than the number of variables n, the structure of appeared.
Y is simpler than that of Z [4], [8]. The ability to
work directly with (MOO0) thus has the poten- A R e l a t e d O p t i m i z a t i o n P r o b l e m . A related
tial of providing significant computational benefits problem is the problem of optimizing a function
that vector optimization algorithms have tried to g: R n --+ R p o v e r the efficient set XE. This can
realize [3]. be a difficult global optimization problem depend-
ing on the properties of the objective function g.
R e f l e c t i o n s on O p t i m i z a t i o n Trends. As a The problem is motivated in different ways. Some-
field within the general field of optimization, multi- times, in certain settings, a function that is to serve
objective optimization is naturally affected by the as a pseudo utility function is available. Then op-
trends that become dominant in optimization. timizing this pseudo utility function over the effi-
Consequently, interior point methods, genetic al- cient set in a sense corresponds to solving problem
gorithms, neural networks have been applied to the (MOO) itself. In addition, when g becomes one of
(MOO) problem in various ways. As there are dif- the objective functions, then solving this problem
ficult problems under (MOO) that cannot be yet provides the range of values the objective func-
practically solved, new developments in the gen- tion takes over the efficient set. This information
eral field of optimization constitute a potential to is valuable for a decision maker who is trying to
solve these problems. make assessments to solve a problem and is used
in some of the interactive algorithms. The diffi-
N o n l i n e a r and Integer P r o b l e m s . Most of the culty of the problem has also resulted in heuristic
algorithms proposed to solve problem (MOO) con- solution approaches.
centrate on the fully linear case. In general, when
nonlinearities are introduced, the efficient solu- Trends. The advances in information technology
tions and the efficient set become difficult to char- affect the field of multiple criteria decision mak-
acterize. There are some algorithms that allow for ing heavily. Faster computers and parallel process-
nonlinearities in the objective functions, and in the ing opportunities make it timewise feasible to solve

475
Multi-objective optimization and decision support systems

optimization problems that would be deemed im- objective linear programs in outcome space', J. Optim.
practical in the past. Improved graphical capabili- Th. Appl. 98 (1998), 17-35.
[4] BENSON, H.P., AND LEE, D.: 'Outcome-based algo-
ties make it feasible to accommodate sophisticated
rithm for optimizing over the efficient set of a bicriteria
user interfaces to invite the decision maker in the linear programming problem', J. Optim. Th. Appl. 88,
problem solving process more actively and reliably. no. 1 (1996), 77-105.
The developments in the World Wide Web present [5] BENSON, H.P., LEE, D., AND MCCLURE, J.P.: 'A
many opportunities to explore for individual and multiple-objective linear programming model for the
citrus rootstock selection problem in Florida', J. Multi-
group decision support. At this point in time, there
Criteria Decision Anal. 6 (1997), 1-13.
is still a need to solve the MOO problem in a rig- [6] BENSON, H.P., AND SAYIN, S.: 'Towards finding global
orous, user-friendly and creative way. The decision representations of the efficient set in multiple objec-
support systems that enable the involvement of the tive mathematical programming', Naval Res. Logist. 44
decision maker in modeling and problem solving (1997), 47-67.
practically seem to be the way of solving (MOO) [7] CHARNES, A., AND COOPER, W.W.: 'Goal program-
ming and multiple objective optimization-Part 1', Eu-
problems. The vector optimization approaches can
top. J. Oper. Res. 1 (1977), 39.
also benefit from a decision support framework in [8] DAUER, J.P., AND LIU, Y.-H.: 'Solving multiple ob-
their effort to help the decision maker identify a jective linear programs in objective space', Europ. J.
most-preferred solution. Oper. Res. 46 (1990), 350-357.
See also: Multi-objective optimization: [9] ECKER, J.G., HEGNER, N.S., AND KOUADA, I.A.:
'Generating all maximal efficient faces for multiple ob-
Pareto optimal solutions, properties; Multi- jective linear programs', J. Optim. Th. Appl. 30 (1980),
objective optimization: Interactive meth- 353-381.
ods for preference value functions; Multi- [10] GARDINER, L.R., AND STEUER, R.E.: 'Unified in-
objective optimization: Lagrange duality; teractive multiple-objective programming- An open-
Multi-objective optimization: Interaction architecture for accommodating new procedures', J.
Oper. Res. Soc. 45, no. 12 (1994), 1456-1466.
of design and control; Outranking meth-
[11] GEOFFRION, A.M.: 'Proper efficiency and the theory of
ods; Preference disaggregation; Fuzzy multi- vector maximization', J. Math. Anal. Appl. 22 (1968),
objective linear programming; Preference 618-630.
disaggregation approach: Basic features, ex- [12] GEOFFRION, A.M., DYER, J.S., AND FEINBERG, A.:
amples from financial decision making; Pref- 'An interactive approach for multi-criterion optimiza-
erence modeling; Multiple objective pro- tion with an application to the operations of an aca-
demic department', Managem. Sci. 19 (1972), 357-368.
gramming support; Multi-objective integer [13] HWANG, C.L., AND MASUD, A.S.M.: Multiple objec-
linear programming; Multi-objective com- tive decision making-methods and applications, A state
binatorial optimization; Bi-objective assign- of the art survey, Lecture Notes Economics and Math.
ment problem; Estimating data for multicri- Systems. Springer, 1979.
teria decision making problems: Optimiza- [14] KEENEY, R.L., AND RAIFFA, H.: Decisions with mul-
tiple objectives: Preferences and value tradeoffs, Wiley,
tion techniques; Multicriteria sorting meth-
1976.
ods; Financial applications of multicriteria [15] KORHONEN, P., AND LAAKSO, J.: 'A visual interac-
analysis; Portfolio selection and multicrite- tive method for solving the multiple criteria problem',
ria analysis; Decision support systems with Europ. J. Oper. Res. 24 (1986), 277-287.
multiple criteria. [16] KORHONEN, P., MOSKOWITZ, H., AND WALLENIUS, Z.:
'Choice behavior in interactive multiple criteria deci-
sion making', Ann. Oper. Res. 23 (1990), 161-179.
References [17] KORHoNEN, P., AND WALLENIUS, J.: 'A Pareto race',
[1] BENAYOUN, R., MONTGOLFIER, J. DE, TERGNY, J., Naval Res. Logist. 35 (1988), 615-623.
AND LARICHEV, O.: 'Linear programming with multi- [18] SAYIN, S.: 'An algorithm based on facial decomposition
ple objective functions: Step method (STEM)', Math. for finding the efficient set in multiple objective linear
Program. 1 (1971), 366-375. programming', Oper. Res. Lett. 19 (1996), 87-94.
[2] BENSON, H.P.: 'Existence of efficient solutions for vec- [19] STEUER, R.E.: 'Operating manual for the AD-
tor maximization problems', J. Optim. Th. Appl. 26 BASE multiple objective linear programming package',
(1978), 569-580. Techn. Report College Business Admin. Univ. Georgia,
[3] BENSON, H.P.: 'A hybrid approach for solving multiple

476
Multi-objective optimization: Interaction of design and control

Athens (1983). improve the controllability of the process. Other


[20] STEUER, R.E., AND CHOO, E.-U.: 'An interactive methods may examine the dynamic operation of
weighted Tchebycheff procedure for multiple objective
several designs to determine which has the best
programming', Math. Program. 26 (1983), 326-344.
[21] STEUER, R.E., GARDINER, L.R., AND GRAY, J.: 'A controllability aspects.
bibliographic survey of the activities and the interna- There are very few methods which address the
tional nature of multiple criteria decision making', J. interaction of design and control in a quantita-
Multi-Criteria Decision Anal. 5 (1996), 195-217. tive manner. The interaction of design and con-
[22] Yu, P.L., AND ZELENY, M.: 'The set of all nondomi-
trol can be addressed through a process synthesis
nated solutions in linear cases and a multicriteria sim-
plex method', J. Math. Anal. Appl. 49 (1975), 430-468. approach involving optimization. This approach
[23] ZELENY, M.: 'Compromise programming', in J.L. involves the representation of design alternatives
COCHRANE AND M. ZELENY (eds.): Multiple Criteria through a process superstructure, the mathemat-
Decision Making, Univ. South Carolina Press, 1973. ical modeling of the superstructure, and the de-
Serpil Sayin velopment of an algorithm to extract the optimal
Ko~ Univ. flowsheet from the superstructure. The simultane-
80860 istinye ous optimization of the design and control of the
istanbul, Turkey
process is handled through multiple objectives rep-
E-mail address: ssayin~ku, edu. t r
resenting the steady state economics and dynamic
MSC2000: 90B50, 90C29, 65K05, 90C05, 91B06
controllability. This naturally leads to a multi-
Key words and phrases: multiple criteria decision making,
vector optimization, efficient solution, decision support. objective framework.

M u l t i - O b j e c t i v e O p t i m i z a t i o n . In any decision
MULTI-OBJECTIVE OPTIMIZATION: IN- making process, the goal is to reach the best com-
TERACTION OF DESIGN AND CONTROL promise solution among a number of competing
Traditionally, process design and process control objectives. Many examples of competing objec-
are treated sequentially. Dynamics are not con- tives exist in the field of engineering. For exam-
sidered during the design phase, and flowsheet ple, in the design of a process, one may have to
changes can not be made during the control phase. consider safety and operational issues as well as
The problem with this approach is that the two are economic issues. A decision making process is nec-
inherently connected as the design of the process essary when the most economic design is not the
affects its controllability. Thus, the steady state safest or most operable.
design and the dynamic operability issues should The best compromise solution depends on the
be treated simultaneously. Analyzing the interac- relative importance of the conflicting objectives.
tion o] design and control addresses the issue of This relative importance is not easily determined
quantitatively determining the trade-offs between and is usually a subjective decision. The one re-
the steady state economics and the dynamic con- sponsible for making this decision is the deci-
trollability. sion maker (DM) whose choice can be based on
The interaction of design and control problem is a number of factors. Since subjective measures
to determine the process flowsheet which is both and decisions do not translate well into mathemat-
the economically optimal and controllable. There ics, a quantitative way of determining the trade-
are different methods for addressing this problem. offs and relative importance among the the objec-
One common approach is to use overdesign where, tives is necessary for a multi-objective optimiza-
once the economic steady state design is deter- tion framework.
mined, surge tanks are added or equipment sizes
are increased in order to handle any dynamic prob- M u l t i - O b j e c t i v e F r a m e w o r k for t h e I n t e r a c -
lems which may arise. This overdesign is usually t i o n of D e s i g n a n d C o n t r o l . In analyzing the
based on heuristic rules and will likely move the interaction of design and control, the objectives
design away from its economic optimum. There is that are considered measure the steady state eco-
no guarantee that the measures taken will even nomics and the dynamic controllability of the pro-

477
Multi-objective optimization: Interaction of design and control

cess. The optimization approach in process syn-


thesis serves as the basis for the multi-objective
I2
framework for the interaction of design and con-
trol. The procedure involves four steps:

1) Process representation;
2) Mathematical modeling;
3) Generation of noninferior solution set (deter-
mine trade-offs);
4) Best-compromise examination. Noninferior SolutionSet

The first step is the representation of all the pos-


sible design alternatives through a process super-
structure. In this step, all the units and possible
connections of interest are incorporated into the
Fig. 1: Noninferior solution set for a problem with two
superstructure such that all designs of interest are objectives.
included as a subset of the superstructure.
Using the information about the trade-offs
Next, a mathematical model of the superstruc-
among the competing objectives, a strategy for de-
ture is developed for the superstructure as well
termining the best compromise solution is devel-
as for for objective functions. The mathematical
oped. This strategy is based on information from
formulation is determined by the structure of the
the DM and depends on the relative weights given
process flowsheet and must include all information
to the objectives. These weights are varied sys-
needed to evaluate the objective functions. The ob-
tematically to locate the solution which the DM
jective functions must measure the economics of
prefers the most. How to determine these weights
the process as well as the controllability of the pro-
is one of the more interesting aspects of the prob-
cess. Since the objective related to the economic
lem.
performance is determined by steady state oper-
ation and the objective for the controllability is Note that the multi-objective problem can be
determined by its dynamic operation, the mathe- reduced if some of the objectives (presumably
matical model most contain both steady state and those with very low weights) need not be optimized
dynamic information. The mathematical formula- but simply brought to a satisfactory level. In this
tion involves both continuous and discrete vari- case, these objectives can be incorporated into the
ables where discrete variables are used to indicate problem as constraints.
the existence of units and connections within the
flowsheet. General M a t h e m a t i c a l Formulation. The
Once the model has been formulated, an al- mathematical model is a multi-objective mixed in-
gorithm is developed and used to determine the teger nonlinear programming problem which has
quantitative trade-offs among the competing ob- the following form:
jectives. Individually, each objective can be opti-
OPTIMIZE J(x, y)
mized, but together, they will be in conflict. This
means that there is a set of solutions where one s.t. h(x, y) - 0
objective can be improved only at the expense of g(x, y) _< 0 (1)
the other objectives. This set of solutions is called xER p
the noninferior solution set which is visually de- y E {0, 1} q.
picted for a two objective problem in Fig. 1. This
solution set is also referred to as nondominated In this formulation, J is a vector of objectives
and Pareto optimal and the surface of noninferior which includes the economic and controllability
solutions implicitly defines a function G(J). objectives. The expressions h and g represent ma-

478
Multi-objective optimization: Interaction of design and control

terial and energy balances, thermodynamic rela- the noninferior solution set, determining the util-
tions, and other constraints. The controllability ity function based on information from the DM,
measures are included in the formulation as ~7. The and determining the best-compromise solution.
variables in this problem are partitioned as contin- Different techniques have been developed in
uous x and binary y. order to assess the trade-offs among the objec-
tives quantitatively. See [7] for a tutorial in multi-
S o l u t i o n of t h e M O P . One way to address the objective optimization. A review is also available
solution of the MOP is to formulate it using a util- in [17]. Much of the fundamental aspects of multi-
ity function U which implicitly relates the multiple objective optimization can be found in [1].
objectives in terms of some common basis:
¢
rain U[J(x, y)] N o n i n f e r i o r S o l u t i o n Sets. The noninferior so-
lution set can be determined in a number of ways.
s.t. h(x,y)=O
One approach is the formulate the problem as
g ( x , y ) <_ O (2)
xER p min E wiJi(x, y)
icI
y e {0, 1} q.
s.t. h(x,y)=O
By introducing the utility function, the vector g(x, y ) < O (4)
optimization problem has been reduced to a
xER p
scalar optimization problem and MINLP tech-
niques can be applied to solve the problem. These y E {0, 1} q,
MINLP techniques include generalized Benders de- where the weights wi are selected such that wi >_0
composition (GBD)[4], [14], outer approximation for all i and Y~i~I wi - 1. Through a suitable
(OA) [2], outer approximation with equality re- choice of the weights, the noninferior solution set
laxation ( O A / E R ) [ 8 ] , and outer approximation can be found. This approach can miss some points
with equality relaxation and augmented penalty in the noninferior solution set if the solution region
( O A / E R / A P ) [16]. These methods are discussed is nonconvex. In order to address this problem, a
in detail in [3]. weighted norm can be used as follows:
With the definition of the noninferior solution
set, the optimization problem can be formulated
min ~ [wiJi(x, y)] p
aS
it1
min U[J(x, y)] s.t. h(xy)-O
(3) ' (5)
s.t. a(J) -0. g(x, y)<_ 0

The challenging aspect of the problem is de- xcR p


termining the explicit form of the utility func- y e {0, 1} q.
tion. One possible form of the utility function is
By increasing the size of p, the curvature of the
a weighted linear sum of the objectives:
supporting function is increased and more noninfe-
U[J(x, y)] - E wiJi, rior points can be found. In the extreme of p = oc,
iEI all the noninferior points can be located. Using the
where I is the set of objective functions and wi c~-norm, the problem becomes
are the weights for the objective functions whose
min max wiJi(x, y)
value is determined by the DM. The difficulty that iEI
arises is that the utility function is generally not s.t. h(x, y) = O
known. It is, however, assumed to be convex and g(x, y)~_ 0 (6)
continuously differentiable.
xcR p
The issues surrounding the solution of the multi-
y e {0, 1} q.
objective optimization problem are determining

479
Multi-objective optimization: Interaction of design and control

The advantage of this formulation is that the problems of the form (2) where U is unknown, con-
weights have a physical meaning for the DM. If vex, and continuously differentiable. Due to con-
the DM knows the desired values for each objec- vexity, the partial derivatives of U with respect to
tive for a given noninferior point, the weights can each of the arguments in the objective space are
be set to the reciprocal of these values. The non- positive. This is expressed mathematically as
inferior solution will be the one that is most like
0U(J) > 0.
the one with the values specified by the DM. The
OJi
disadvantage of this formulation is that it can be
difficult to solve. Thus, a decrease in Ji will lead to a decrease in U.
In the interactive scheme, the DM is asked for the
Another way to determine the noninferior so-
positive trade-off weights, w k, for a given solution
lution set is through the e-constraint method [6].
k. This weight is defined as the ratio of the change
In this approach, all but one of the objectives is
in the utility function with respect to one function
incorporated into the problem as a constraint less
divided by the change in the utility function with
than e. This results in the following formulation:
respect to another. This is expressed mathemati-
min J1 (x, y) cally as
s.t. J i ( x , y ) _< ei, i = 2,...,q, Ou(jk)/oJi
h(x, y) = O wki -
OU(Jk)/OJ1
(7)
g(x, y) _< 0 where jk _ [Jl(x k , y k ) , . . . , J l ( x k,yk)]. A line
xcR p search along a feasible direction of steepest descent
y e (0, 1} q. locates an improved solution for the next iteration.
By exploiting the fact that the utility function
By varying the values of ei, the points of the non-
is convex, cutting planes can be introduced to re-
inferior solution set can be found.
duce the search to improving directions [10]. Since
U is convex,
Choosing the Best-Compromise Solution. To
this point, the focus has been on determining the 0 _ U ( J * ) - U(J k) (8)
noninferior solution set. Only one of the points can ~_ V f u ( j k ) ( j , _ jk)
be chosen as the best solution for the problem,
and the task of the DM is to determine this point. min V f u ( j k ) ( J - J k)
Once the noninferior solution set is determined, it s.t. h(x, y) - O
is presented to the DM who will choose the solu- _~ g(x, y) _~ O
tion point he prefers. The selection of this point is
xER p
based on the relative importance of the objectives
in the eyes of the decision maker. y e {0, 1}q.
Instead of assigning arbitrary weights to the var- This involves the linearization in the objective
ious objectives, a systematic approach can applied space around the point jk. If the solution to the
which uses the trade-off information in the non- minimization is zero, then the optimal solution
inferior solution set. The slope of the noninferior J* has been found. If the solution has a negative
solution set at any point reveals how much one ob- value, then the direction leads to an improvement
jective will be improved at the expense of another in the objective space. This minimization can be
objective. This information is used in an interac- performed over a number of points k - 1 , . . . , K to
tive, iterative cutting plane algorithm to determine find a direction which improves all of them. Cut-
the best compromise solution. ting planes in the objective space are formed to
find new values of the objectives which improve the
Cutting Plane Algorithm. The cutting plane utility function according to the trade-off weights,
algorithm described in [11] is based on [5] and [10]. VU, which the DM provides. At each iteration
Marginal rates of substitution were used to solve of the algorithm, the following problem must be

480
Multi-objective optimization: Interaction of design and control

solved: designs were determined and used to screen de-


min z signs and determine the noninferior solution set.
p No method was provided for determining the best-
s.t. z ~ ~ wk(Ji(x, y) -- J i ( x k, yk), compromise solution.
i=l In the work of [13], singular value decomposi-
V k = 1, . . . , K . tion is used to determine dynamic operability mea-
(9)
h(x, y) - O sures. The controllability is formulated through
g(x, y) _ O the linearization of the model and is given in terms
xER p
of the singular values of the transfer function. This
modeling leads to an infinite-dimensional prob-
y e {0, 1}q.
lem as all frequencies must be considered for the
The steps of the cutting plane algorithm are the controllability measure. For the multi-objective
following: optimization, the e-constraint method was used
1 Determine the initial solution point k -- 1 and to determine the noninferior solution set. The
determine the values of all the objective func- scalar optimization was addressed by approximat-
tions. ing the infinite-dimensional problem and using an
Assign the values of the weights w~. gradient-based algorithm to solve the optimization
2 Solve (9) to find new values of x and y.
problem and determine the operating parameters
Determine the values of the objective functions
for the new values of x and y. for the process.
3 IF the solution to (9) is zero, T H E N go to Step The previous methods did not take into account
4 that the structure of the process flowsheet as well
ELSE set k - k + 1, u p d a t e the values x k, yk, as the design parameters determine its inherent
and j k , generate new weights, and go to Step 2.
controllability. In order to consider structural al-
4 Terminate with x k and yk as the best-
compromise solution. ternatives in the process flowsheet such as the
existence of units in the flowsheet, discrete vari-
Cutting plane algorithm.
ables are used in the process modeling. This as-
This algorithm requires the DM to provide only pect of the process design was considered by [11],
trade-off weights at each iteration. These weights [12] in the interaction of design and control by us-
can be estimated by knowledge of the relative im- ing the optimization approach to process synthe-
portance of the objectives or by information from sis. In this approach, the structure of the process
the noninferior solution set. flowsheet and the design parameters are consid-
ered simultaneously with the dynamic controlla-
Multi-Objective Optimization in the Inter- bility of the process. The controllability measures
action of D e s i g n and Control. The interaction employed were the open-loop linear controllability
of design and control has been recognized as a measures (singular value, condition number, rela-
multi-objective problem by many researchers as tive gain array). The noninferior solution set was
the objectives representing the steady-state eco- determined using the e-constraint method, and the
nomic design and dynamic controllability are re- best-compromise solution was found using the cut-
garded as noncommensurable. One of the first ting plane method described above.
challenges in this problem is determining a suitable Further development of the above technique was
controllability objective. The choice of the control- addressed by [15] where nonlinear dynamic models
lability objective will dictate the required elements were considered. The problem was formulated as a
of the mathematical formulation of the problem. multi-objective m i x e d i n t e g e r o p t i m a l c o n t r o l p r o b -
One of the early works which addressed the l e m . The multi-objective problem was again solved
multi-objective nature of the interaction of design using the e-constraint method. The mixed integer
and control was that of [9]. A given set of al- optimal control problem was solved by extending
ternative steady-state designs was assumed to be the methods for solving mixed integer nonlinear
known. Bounds on the dynamic measures of the optimization to handle dynamic systems.

481
Multi-objective optimization: Interaction of design and control

Conclusions. Analyzing the interaction of design m e t h o d s for d i s t r i b u t e d optimal control


and control leads to a multi-objective optimization problems; Robust control; Robust control:
problem. The key issue in solving this problem is Schur stability of polytopes of polynomi-
quantitatively determining the trade-offs between als; Semi-infinite p r o g r a m m i n g and control
the steady-state economics and the dynamic con- problems; D y n a m i c p r o g r a m m i n g and New-
trollability. By using multi-objective optimization ton~s m e t h o d in u n c o n s t r a i n e d optimal con-
techniques, these characteristics of the process can trol; Duality in optimal control with first
be traded off in a systematic manner. order differential equations; Infinite horizon
By following the optimization approach to pro- control and dynamic games; Control vector
cess synthesis, a mathematical framework can be iteration; S u b o p t i m a l control.
developed. This involves developing a superstruc-
ture of design alternatives and effective mathemat-
References
ical models for the different criteria. The algorith-
[1] CLARK, P.A., AND WESTERBERG, A.W.: 'Optimiza-
mic procedure for solving the multi-objective prob- tion for design problems having more than one objec-
lem involves the successive solution of scalar op- tive', Computers Chem. Engin. 7, no. 4 (1983), 259-
timization problems to determine the noninferior 278.
solution set. The final step in the approach is to de- [2] DURAN, M.A., AND GROSSMANN, I.E.: 'An outer-
approximation algorithm for a class of mixed-integer
termine the best-compromise solution from those
nonlinear programs', Math. Program. 36 (1986), 307-
in the noninferior solution set. 339.
See also" Multi-objective optimization: [3] FLOUDAS, C.A.: Nonlinear and mixed integer optimi-
P a r e t o optimal solutions~ properties; Multi- zation: Fundamentals and applications, Oxford Univ.
objective optimization: Interactive meth- Press, 1995.
[4] GEOFFRION, A.M.: 'Generalized Benders decomposi-
ods for preference value functions; Multi-
tion', J. Optim. Th. Appl. 10, no. 4 (1972), 237-260.
objective optimization: Lagrange dual- [5] GEOFFRION, A.M., DYER, J.S., AND FEINBERG, A.:
ity; O u t r a n k i n g methods; Preference dis- 'An interactive approach for multi-criterion optimiza-
aggregation; Fuzzy multi-objective linear tion with an application to the operation of an aca-
programming; Multi-objective optimization demic department', Managem. Sci. 19 (1972), 357-368.
[6] HAIMES, Y., HALL, W.A., AND FREEDMAN, H.T.:
and decision s u p p o r t systems; Preference
Multi-objective optimization in water resource systems:
disaggregation approach: Basic features~ ex- The surrogate worth trade-off method, Elsevier, 1975.
amples from financial decision making; Pref- [7] HWANG, C.L., PAIDY, S.R., YOON, K., AND MA-
erence modeling; Multiple objective pro- SUD, A.S.M.: 'Mathematical programming with mul-
g r a m m i n g support; Multi-objective integer tiple objectives: A tutorial', Comput. Oper. Res. 7
linear programming; Multi-objective com- (1980), 5-31.
[8] KocIs, G.R., AND GROSSMANN, I.E.: 'Relaxation
binatorial optimization; Bioobjective assign-
strategy for the structural optimization of process flow
ment problem; E s t i m a t i n g d a t a for mul- sheets', Industr. Engin. Chem. Res. 26, no. 9 (1987),
ticriteria decision making problems: Opti- 1869.
mization techniques; M u l t i c r i t e r i a sorting [9] LENHOFF, A.M., AND MORARI, M.: 'Design of resilient
methods; Financial applications of multicri- processing plants I: Process design under consideration
of dynamic effects', Chem. Engin. Sci. 37, no. 2 (1982),
teria analysis; Portfolio selection and mul-
245-258.
ticriteria analysis; Decision s u p p o r t sys- [10] LOGANATHAN, G.V., AND SHERALI, H.D.: 'A conver-
tems with multiple criteria; Optimal con- gent interactive cutting-plane algorithm for multiob-
trol of a flexible arm; Dynamic pro- jective optimization', Oper. Res. 35 (1987), 365-377.
gramming: Continuous-time optimal con- [11] LUYBEN, M.L., AND FLOUDAS, C.A.: 'Analyzing the
trol; H a m i l t o n - J a c o b i - B e l l m a n equation; interaction of design and control-1. A multiobjective
framework and application to binary distillation syn-
Dynamic p r o g r a m m i n g : Optimal control ap-
thesis', Computers Chem. Engin. 18, no. 10 (1994),
plications; M I N L P : Applications in the in- 933-969.
teraction of design and control; Sequen- [12] LUYBEN, M.L., AND FLOUDAS, C.A.: 'Analyzing the
tial quadratic p r o g r a m m i n g : Interior point interaction of design and control-2. Reactor-separator-


recycle system', Computers Chem. Engin. 18, no. 10 mous improvements in the speed and storage of
(1994), 971-994. computers make it practical to apply these algo-
[13] PALAZOGLU, A., AND ARKUN, Y.: 'A multiobjective
rithms to the solution of realistically-sized problem
approach to design chemical plants with robust dy-
namic operability characteristics', Computers Chem. applications.
Engin. 10, no. 6 (1986), 567-575. Formally, the statement of the multi-objective
[14] PAULES, IV, G.E., AND FLOUDAS, C.A.: 'APROS: optimization problem of interest here is
Algorithmic development methodology for discrete-
continuous optimization problems', Oper. Res. 37, SVMAX f(x) - [ f l ( x ) , . . . , fp(X)],
no. 6 (1989), 902-915.
(v)
s.t. xEX.
[15] SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'Interaction
of design and control: Optimization with dynamic mod- Here, p > 2, X is a nonempty subset of R n, each
els', in W.W. HAGER AND P.M. PARDALOS (eds.): Op- fj, j - 1 , . . . ,p, is a real-valued function defined
timal Control: Theory, Algorithms, and Applications, on X or on some suitable set containing X, and
Kluwer Acad. Publ., 1997, pp. 388-435. VMAX indicates that, in some unspecified sense,
[16] VISWANATHAN, J., AND GROSSMANN, I.E.: 'A com-
we are to 'vector maximize' the vector f(x) of ob-
bined penalty function and outer approximation
method for MINLP optimization', Computers Chem. jective ]unctions (criteria) over X. The set X is
Engin. 14, no. 7 (1990), 769-782. called the set of decision alternatives or the deci-
[17] ZIONTS, S.: 'Methods for solving management prob- sion set, and {f(x) E R P ' x e Z}, is called the
lems involving multiple objectives', Working Paper outcome set.
SUNY at Buffalo (1979).
There are a large number of diverse solution al-
Carl A. Schweiger gorithms for problem (V). All are intended to help
Dept. Chemical Engin. Princeton Univ. the decision maker (DM) find a most preferred
Princeton, NJ 08544-5263, USA
solution to the problem. In the majority of these
E-mail address: carl©titan.princeton, e d u
algorithms, the notion of efficiency plays an indis-
Christodoulos A. Floudas
Dept. Chemical Engin. Princeton Univ.
pensable role. An efficient (nondominated, nonin-
Princeton, NJ 08544-5263, USA ferior, Pareto optimal) solution for problem (V) is
E-mail address: floudas@titan, princeton, e d u a solution 5 E X such that there exists no other
solution x E X that satisfies ](x) > f(-~) and
MSC2000: 90C29, 90Cll, 90C90
Key words and phrases: interaction of design and control, f(x) ~ f(5). Let XE denote the set of efficient
multi-objective optimization, mixed integer nonlinear opti- solutions for problem (V). Notice that if 5 E XE,
mization, Pareto optimal solution. then there is no other feasible solution for prob-
lem (V) that achieves at least as large a value as
in each criterion of the problem and a strictly
MULTI-OBJECTIVE OPTIMIZATION: IN- larger value than ~ in at least one criterion of the
TERACTIVE METHODS FOR PREFERENCE problem.
VALUE FUNCTIONS In the great majority of instances of problem
The multi-objective optimization (multiple criteria (V), the preference value ]unction (value ]unc-
decision making) problem is the problem of choos- tion) v of the DM is unknown. This is a function
ing a most preferred solution when two or more in- v" R p ~ R that maps the outcomes of problem
commensurate, conflicting objective functions (cri- (V) to real numbers in such a way that for any
teria) are to be simultaneously maximized. Interest two outcomes yl and y2, the DM prefers yl to y2
in multi-objective optimization has risen sharply if and only if v(y 1) > v(y2). Although v is un-
during the past 30 years. There are at least three known, what is known is that for each objective
reasons for this. First, and most importantly, is the function fj, the DM prefers more of fj to less of
increasing recognition that most applied problems fj. Mathematically, this means that v is coordi-
in both the private and public sectors involve mul- natewise increasing, i.e., that whenever 7, z E R p
tiple objectives rather than one objective. Second, satisfy ~ > z and ~j > zj for some j - 1 , . . . , p ,
a variety of solution algorithms for multi-objective then v(~) > v(z). It is easy to show that when
optimization are now available. Finally, the enor- v is coordinatewise increasing, any maximizer x*


of v[f(x)] over X must satisfy x* E XE. In other during the iterations seems to be burdensome for
words, as long as the DM prefers more to less, the him in many cases. This may cause the DM to
search for a most preferred solution to problem (V) prematurely terminate the search so that a most
can be confined to XE. This is one of the key rea- preferred solution is not found.
sons that the concept of efficiency is so important There are literally hundreds of interactive algo-
to the majority of the algorithms for problem (V). rithms for problem (V). Many are limited to cases
The interactive methods constitute one of the where problem (V) is a multiple objective linear
most popular categories of algorithms for solving programming problem. Others apply when prob-
problem (V). An interactive method for problem lem (V) is a multiple objective convex, nonlinear
(V) consists of a sequence of DM-computer inter- programming problem, a multiple objective inte-
actions designed to create a sequence of decision ger programming problem, or some other type of
alternatives that terminates with a most preferred multiple objective optimization problem. Instead
solution to the problem. In a majority of cases, of examining these algorithms individually, we will
the generated alternatives are efficient. Each iter- describe them by groups according to the charac-
ation of the interactive process consists of three teristics that they possess.
steps. First, an initial solution is found with the One of the key characteristics of the interac-
aid of the computer. Typically, this solution is tive algorithms concerns the type of information
found by solving a single-objective optimization required of the DM at each iteration. For instance,
problem that generates either an efficient point or, at each iteration, the DM may be asked to intu-
at worst, a feasible point. Next, the DM is asked itively assign or re-assign weights to the criteria
to react to the generated point by answering one according to his current assessment of their rela-
or more questions involving his preferences for it. tive importance. R.E. Steuer [13] has shown some
Last, based upon the answers given, the computer important stumbling blocks to this approach, how-
generates a new point, typically by modifying pa- ever. Other algorithms may instead elicit relax-
rameters in the single-objective optimization prob- ation quantities from the DM. In these cases, the
lem. This process continues until either the com- DM is asked how much he would be willing to re-
puter or the DM identifies a most preferred so- lax the level of one objective function in order to
lution. The value function v of the DM is never obtain possible improvements in the levels of other
needed and, in fact, is assumed to be unavailable. objective functions. Some of the oldest interactive
algorithms use this approach [1], [9]. Still other
There are several advantages to using interac- types of algorithms ask the DM various types of
tive methods as compared to other categories of
trade-off questions. The trade-off questions are de-
methods for problem (V). For instance, the pref- signed to obtain an estimate of the gradient of the
erence information asked of the DM at each it- value function of the DM at the current solution.
eration is not difficult to supply. Furthermore, the This approach is also relatively old, but difficult
DM thereby learns about his value function, which for the DM to accomplish [5], [14]. Finally, a num-
is often initially vague or mostly unknown. As the ber of algorithms call for the DM to make paired
search continues, the DM also learns about the comparisons at each iteration. In a paired com-
decision or efficient decision alternatives available parison, the DM is given two solutions to compare
and the trade-offs in the objective functions across and must give his preference for one or the other.
these decision alternatives. The optimizations re- Usually, the DM can accomplish this. But when
quired of the computer are also usually not difficult the two solutions are quite similar, difficulties can
to perform. Finally, because the DM is highly in- arise [15]. In addition, algorithms that use paired
volved in the process, his confidence in the most comparisons can sometimes call for excessive num-
preferred solution that is eventually found is en- bers of these comparisons [12].
hanced.
A second dimension where the interactive algo-
A frequent criticism of the interactive methods rithms differ is in the approach used to explore
is that, in practice, the work required of the DM the feasible region X or the efficient set XE. Some


algorithms use .feasible direction methods [2]. In overall quality of the solution process and the an-
these algorithms, at each iteration, the direction to swers obtained. Although preliminary, these com-
move from a point that was last found and the dis- parisons seem to show the relative superiority of
tance to move along the direction are determined the weighting space reduction and other criterion
with the aid of the DM. By moving along the di- weight space search methods, and of the visual in-
rection by the specified amount, the next solution teractive methods. Readers should note, however,
point is found. In many algorithms, all such points that the rankings in the study are subjectively-
are efficient. In another group of algorithms, feasi- obtained by the authors [7].
ble region reduction is used to explore X or XE. As For further general reading on interactive meth-
points in X or in XE are examined in these meth- ods, see [2], [3], [4], [6], [11], [12], [13], [14].
ods, portions of X are removed, usually via lin- See also: M u l t i - o b j e c t i v e optimization:
ear cuts. Another set of algorithms uses weighting P a r e t o o p t i m a l solutions, p r o p e r t i e s ; M u l t i -
space reduction. In these algorithms, a weighted objective optimization: Lagrange dual-
sum of fj, j = 1 , . . . , p , is maximized at each iter- ity; M u l t i - o b j e c t i v e o p t i m i z a t i o n : I n t e r a c -
ation, thereby yielding a point in XE. Based upon tion of design a n d control; O u t r a n k i n g
the DM's responses to these maximizations, por- m e t h o d s ; P r e f e r e n c e d i s a g g r e g a t i o n ; Fuzzy
tions of the weighting space are removed. Eventu- m u l t i - o b j e c t i v e linear p r o g r a m m i n g ; M u l t i -
ally, the portion of the weighting space remaining o b j e c t i v e o p t i m i z a t i o n a n d decision sup-
is so small that the DM can pick out the set of p o r t s y s t e m s ; P r e f e r e n c e d i s a g g r e g a t i o n ap-
weights associated with a most preferred solution. proach: Basic f e a t u r e s , e x a m p l e s f r o m fi-
Other approaches used to explore X o r X E nancial decision m a k i n g ; P r e f e r e n c e model-
include the trade-off cutting plane method [10], ing; M u l t i p l e o b j e c t i v e p r o g r a m m i n g sup-
Lagrange multiplier methods, visual interactive port; M u l t i - o b j e c t i v e i n t e g e r linear pro-
methods (see, e.g. [7]), and the branch and bound gramming; Multi-objective combinatorial
method [8], among others. For further reading con- o p t i m i z a t i o n ; B i - o b j e c t i v e a s s i g n m e n t prob-
cerning these methods, see [3], [4], [6], [11], [12], lem; E s t i m a t i n g d a t a for m u l t i c r i t e r i a deci-
[13] sion m a k i n g p r o b l e m s : O p t i m i z a t i o n tech-
niques; M u l t i c r i t e r i a s o r t i n g m e t h o d s ; Fi-
Another way to group the interactive algorithms nancial a p p l i c a t i o n s of m u l t i c r i t e r i a anal-
for problem (V) is according to whether or not ysis; P o r t f o l i o selection a n d m u l t i c r i t e r i a
they handle inconsistencies in the DM's preference analysis; Decision s u p p o r t s y s t e m s w i t h
responses. As human beings, DM's are prone to m u l t i p l e criteria.
giving preference responses over the course of the
solution procedure that imply inconsistencies such
as asymmetries or intransitivities of preference. References
Some algorithms take no account of these possi- [1] BENAYOUN, a., MONTGOLFIER, J., TERGNY, J., AND
LARITCHEV, O.: 'Linear programming with multiple
ble inconsistencies and have been criticized for this
objective functions: Step method (STEM)', Math. Pro-
[12]. Others attempt to reduce inconsistency by ei- gram. 1 (1971), 366-375.
ther minimizing the DM's cognitive burden or by [2] BENSON, H.P., AND AKSOY, Y.: 'Using efficient fea-
incorporating tests for inconsistency that are used sible directions in interactive multiple objective linear
as the interactive solution process proceeds. programming', Oper. Res. Lett. 10 (1991), 203-209.
[3] BUCHANAN, J.T., AND DAELLENBACH, H.G.: 'A com-
W.S. Shin and A. Ravindran [12] have compared parative evaluation of interactive solution methods for
various of the classes of interactive algorithms ac- multiple objective decision models', Europ. J. Oper.
cording to four criteria that are important in prac- Res. 29 (1987), 353-359.
[4] EVANS, G.W.: 'An overview of techniques for solv-
tice. These criteria are the DM's cognitive bur-
ing multiobjective mathematical programs', Managem.
den, the ease with which the single-objective op- Sci. 30 (1984), 1268-1282.
timizations called for can be used, implemented [5] GEOFFRION, A.M., DYER, J.S., AND FEINBERG, A.:
and solved, the handling of inconsistency, and the 'An interactive approach for multicriterion optimiza-


tion, with an application to the operation of an aca- where


demic department', Managem. Sci. 19 (1972), 357-368.
[6] GOICOECHEA, A., HANSEN, D.R., AND DUCKSTEIN,
L.: Multiobjective decision analysis with engineering
and business applications, Wiley, 1982.
X-{xcX" gi(x)<O,}i-l,...,m, X ~C R n "

[7] KOHORNEN, P., AND LAAKSO, J.: 'A visual interac-


tive method for solving the multiple criteria problem', Note here that vector inequalities are commonly
Europ. J. Oper. Res. 24 (1986), 277-287. used: for any n-vectors a and b, a > b means
[8] MARCOTTE, O., AND SOLAND, R.: 'An interactive ai > bi ( i - 1, . . . , n ). Also, a > b means ai > bi
branch-and-bound algorithm for multiple criteria op-
(i - 1 , . . . , n ) . On the other hand, a > b means
timization', Managem. Sci. 32 (1985), 61-75.
[9] MONARCHI, D.E., KISIEL, C.C., AND DUCKSTEIN, L.:
a > b but a ~ b. Hereafter, vector inequalities
'Interactive multiobjective programming in water re- such as g ( x ) < 0 will be used instead of g i ( x ) < 0
sources: A case study', Water Resources Res. 9 (1973), (i- 1,...,m).
837-850. Defining a dual problem (D) in some appropri-
[lO] MUSSELMAN, K., AND TALAVAGE, J.: 'A tradeoff cut
ate way associated with the problem (P), our aim
approach to multiple objective optimization', Oper.
Res. 28 (1980), 1424-1435. is to show the property min(P) - max (D). Here
[11] ROSENTHAL, R.E.: 'Concepts, theory and techniques: min(P) denotes the set of efficient points of the
Principles of multiobjective optimization', Decision problem (P) in the objective function space R p,
Sci. 16 (1985), 133-152. and similarly max (D) the one of the dual problem
[12] SHIN, W.S., AND RAVINDRAN, A.: 'Interactive multi- (D).
ple objective optimization. Survey I: Continuous case',
Comput. Oper. Res. 18 (1991), 97-114.
Unlike the usual mathematical programming,
[13] STEUER, R.E.: Multiple criteria optimization: Theory, the optimal value of the primal problem (and
computation, and application, Wiley, 1986. the dual problem) are not necessarily determined
[14] WALLENIUS, J.: 'Comparative evaluation of some in- uniquely in multi-objective optimization. Hence,
teractive approaches to multicriterion optimization', there have been developed several kinds of formu-
Managem. Sci. 21 (1975), 1387-1396.
lation of dual problem in order to get the desir-
[15] ZIONTS, S., AND WALLENIUS, J.: 'An interactive multi-
ple objective linear programming method for a class of able property min(P) - max (D). Regarding La-
underlying nonlinear utility functions', Managem. Sci. grange duality, three typical dualizations can be
29 (1983), 519-529. seen in linear cases, nonlinear cases and geometric
Harold P. Benson approaches [6].
Dept. Decision and Information Sci. Univ. Florida
Gainesville, Florida 32611-7169, USA
E-mail address: benson~dale, cba. ufl. edu L i n e a r Cases. The first result on duality for
MSC 2000:90C29 multi-objective optimization seems the one given
Key words and phrases: multi-objective optimization, mul- in [1] for linear cases. This is formulated as a ma-
tiple criteria decision making, interactive method, prefer- trix optimization including the vector optimization
ence value function, value function. as a special case. Although there have been several
related works, the probably most attractive one is
given in [2] because it is formulated as a natural
MULTI-OBJECTIVE OPTIMIZATION: LA- extension of traditional linear programming: Let
GRANGE DUALITY A be an m × n matrix, C a p × n matrix, and b an
As is well known, duality in mathematical pro- m-vector. Then the p r i m a l p r o b l e m (P) in linear
gramming is based on the property that any closed cases is formulated as
convex set can be also represented by the inter-
section of closed half spaces including it. Let the min Cx

multi-objective optimization problem to be con- (PI) s.t. Ax>=b


sidered here be given by x>0.
(p) ~min f(x) "- (f~(x),. . . , fp(X)) Associated with (PI), H. Iserman [2] defined the
( over x E X, dual p r o b l e m as


max Ab L(x, A) - f ( x ) + Ag(x).


(DI) s.t. AA/~= C Associated with this definition, the dual map
A~O. can be defined as
Here, the multiplier A > 0 is a p x m matrix whose @(A) - min f~(A),
elements are all nonnegative.
where
Then Isermann's duality is given by
i) Ab ~ Cx for all feasible x and A. f~(A) - {L(x,A)" x E X ' } .
ii) Suppose that Ab - C~ for some feasible Under the terminology, the dual problem asso-
and some feasible A. Then A is an efficient ciated with the primal problem (P) can be given
solution to (DI) and ~ is an efficient solution by
to (PI)"
(DTs) max U ~I'(A).
iii) min(Pi) - max(Di). ACE
It can be shown that cI, is concave point-to-set
N o n l i n e a r Cases. The most natural dualization map on F, namely
in nonlinear multi-objective optimization seems to
• (aA 1 + (1 - a)A 2)
be the one given in [10].
Consider the problem (P), and assume the fol- c (1 +
lowing: and ~(A) + R~_ is a convex set in R p for each
i) X' is a nonempty compact convex set. A C £. Here £ is the set of all p × m matrices
ii) f is continuous, and .f(Z) + R ~ is convex in whose components are all positive.
R p . T. Tanino and Y. Sawaragi [10] presented the
following as duality in multi-objective optimiza-
iii) gi (i = 1 , . . . , m) are continuous and convex.
tion:
Under these assumptions, it can be readily
shown that for every u E R m, both sets X(u) - THEOREM i i) For any x E X and y C ~(A)
{x e Z " g(x) ~ u } and Y(u) - f i X ( u ) ] - {y e y
RP" y - f (x), x C X', g(x) ~ u} are compact and
convex. ii) Suppose that ~ C X, A C £ and f(~) E ~(A).
The primal problem (P) can be embedded as Then ~"- f(~) is an efficient point to the pri-
(P0) in a family of perturbed problems (Pu) given mal problem (P) and also to the dual problem
by (DTs).
(P~,) minY(u). iii) Suppose that any efficient solutions to (P) are
all proper and that Slater's constraint quali-
Defining F = {u C Rm: X(u) ~ 0}, the set fication is satisfied. Then
F is convex. Now in a similar fashion to the or-
dinary mathematical programming, the perturbed min (P) C max (DTs).
map can be defined by [-7
W(u) - min {/(x)" x e X', g(x) <_u} . REMARK 2 The above theorem is not complete in
It is known that for every u E F, W(u) + R~_ is the sense that the relation min(P) = max (D) does
convex and not hold. Regarding conjugate duality, there have
been reports presenting w- min(P) = w - m a x (D)
W(u) + - +
(see, e.g., [4] and [9]). Several studies based on geo-
In addition, the map W is monotone and convex metric consideration have been made for deriving
on F. the relation min(P) = max (D) using vector valued
Now, define the vector valued Lagrangian func- Lagrangian. This will be stated in the following [-7
tion with a p × m matrix multiplier A as


G e o m e t r i c D u a l i t y . Geometric considerations DEFINITION 5 The primal problem (P) is said to


are made in [3], based on the supporting hyper- be J-normal, if for every # > 0
planes for epi W, and in [5], based on the support-
cl(AG(~)) -- Ad G(t,)"
ing conical varieties for epi W, which is denoted by
G here. The primal problem (P) is said to be J-stable,
Define if it is J-normal and for an arbitrary # > 0 the

for some x C X ~
/ problem
sup inf (#,f(x))+ (A g(x))
A>0xEX
has at least one solution. K]
Ya={y: (0,y) e G , 0 e R m, y e R P } .

Associates with the primal problem (P), we con- On the other hand, J.W. Nieuwenhuis [7] sug-
sider the following two kinds of dual problems" gested another normality condition:
[_j DEFINITION 6 The primal problem (P) is said to
AE£ be N-normal, if
where
clYG - YclG.
YS(A) -- {Y e R p" f ( x ) + Ag(x) ~ y, Vx e X'}
O
and
LEMMA 7 Slater's constraint qualification (3~,
(Dj) Y H - ( ~ #, ) , g(~) > 0) yields J-stability and N-normality. E]
#>0
)~>0
where THEOREM 8 Suppose that Yc is closed,
minD (P) ~ 0, and the efficient solutions to (P)
YH- (~,,)
are all proper. Then, under the condition of J-
-- { y E R p.
+ ], stability,
Vx C X~ S
min (P) - max (DN) -- max (Dj).
THEOREM 3 i) For any feasible x in (P) and
[:]
for any feasible y in (DN) or (Dj),

D u a l i t y for W e a k Efficiency. Define


ii) Assume that G is closed, that there exists at
least an efficient solution to the primal prob- YS'(A) -- {Y e R p" f ( x ) + Ag(x) 5( y, Vx e Z ' } .
lem, and that these solutions are all proper. THEOREM 9 Suppose that Yc is a nonempty sub-
Then, under the condition of Slater's con- set in R p and Yc + R~_ is bounded. Then under
straint qualification, the following holds: the condition of N-normality
min (P) = max (DN) = max (Dj). w- min cl YG -- w- max cl U Ys' (h)
AE/::
E]
= w- max cl U YH- (x,p).
REMARK 4 In the above duality, we assumed that
the convex set G is closed and that Slater's con- A>0
straint qualification is satisfied, which seem rela- [:]
tively restrictive. Instead of these conditions, J.
Jahn [3] assumed that YG is closed and some nor- REMARK 10 As can be readily seen, by defin-
mality condition. ing infA, for a set A C R p, as essentially
Define min cl(A + R~_) and similarly sup A as essentially
min c l ( A - R~_), we can have inf (P) - sup (DTs) --
AG(t,)--{a" (0, a) E G ( # ) , 0 E R m, a E R 1} sup (DN) -- sup (Dj) under some appropriate sta-
YG--{Y" (0, y) c G , 0 E R m, y c R m } . [:] bility condition [9]. [::]

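Because the displayed formulas of this entry are difficult to read in this reproduction, the vector-valued Lagrangian construction of [10] used above is restated compactly below; the lines only rewrite the definitions already given in the text, with X', the multiplier set and R^p_+ as introduced there, and assert nothing beyond them.

\begin{align*}
 L(x,\Lambda) &= f(x) + \Lambda g(x), && \Lambda \in \mathcal{L}\ (\text{the } p\times m \text{ matrices with positive entries}),\\
 \Omega(\Lambda) &= \{\, L(x,\Lambda) : x \in X' \,\}, && \Phi(\Lambda) = \min \Omega(\Lambda)\ (\text{the efficient points of } \Omega(\Lambda)),\\
 (\mathrm{D_{TS}})\colon &\ \max \bigcup_{\Lambda \in \mathcal{L}} \Phi(\Lambda). &&
\end{align*}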

See also: M u l t i - o b j e c t i v e o p t i m i z a t i o n : [8] SAWARAGI,Y., NAKAYAMA,H., ANDTANINO,T.: The-


P a r e t o o p t i m a l solutions, p r o p e r t i e s ; M u l t i - ory of multiobjective optimization, Acad. Press, 1985.
objective optimization: Interactive meth- [9] TANINO, T.: 'Supremum of a set in a multi-dimensional
space', J. Math. Anal. Appl. 130 (1988), 386-397.
ods for p r e f e r e n c e value functions; M u l t i -
[10] TANINO, T., AND SAWARAGI,Y.: 'Duality theory in
o b j e c t i v e o p t i m i z a t i o n : I n t e r a c t i o n of de- multiobjective programming', J. Optim. Th. Appl. 27
sign a n d control; O u t r a n k i n g m e t h o d s ; (1979), 509-529.
Preference disaggregation; Fuzzy multi- Hirotaka Nakayama
objective linear programming; Multi- Dept. Applied Math. Konan Univ.
o b j e c t i v e o p t i m i z a t i o n a n d decision sup- 8-9-1 Okamoto, Higashinada, Kobe 658, Japan
p o r t s y s t e m s ; P r e f e r e n c e d i s a g g r e g a t i o n ap- E-mail address: nakayama©konan-u.ac,jp
proach: Basic f e a t u r e s , e x a m p l e s f r o m finan- MSC2000: 90C29, 90C30
cial decision m a k i n g ; P r e f e r e n c e modeling; Key words and phrases: vector inequality, efficient point,
Multiple objective programming support; vector valued Lagrangian.
M u l t i - o b j e c t i v e i n t e g e r linear p r o g r a m -
ming; M u l t i - o b j e c t i v e c o m b i n a t o r i a l opti-
mization; B i - o b j e c t i v e a s s i g n m e n t p r o b l e m ; M U LTI- O BJ E C TIVE O P TIMI ZATI O N:
E s t i m a t i n g d a t a for m u l t i c r i t e r i a decision P A R E T O OPTIMAL SOLUTIONS~ PROP-
m a k i n g p r o b l e m s : O p t i m i z a t i o n techniques; ERTIES
M u l t i c r i t e r i a s o r t i n g m e t h o d s ; F i n a n c i a l ap- The multi-objective optimization (multiple criteria
plications of m u l t i c r i t e r i a analysis; Portfolio decision making) problem is the problem of choos-
selection a n d m u l t i c r i t e r i a analysis; Deci- ing a most preferred solution when two or more in-
sion s u p p o r t s y s t e m s w i t h m u l t i p l e criteria; commensurate, conflicting objective functions (cri-
Lagrange, Joseph-Louis; Lagrangian multi- teria) are to be simultaneously maximized. A cen-
pliers m e t h o d s for convex p r o g r a m m i n g ; In- tral difficulty in such problems is that, unlike in
teger programming: Lagrangian relaxation; single objective maximization problems, there is
D e c o m p o s i t i o n t e c h n i q u e s for M I L P : La- no obvious or simple way to define the concept of
grangian relaxation. a most preferred solution. Nevertheless, because
the applications of multi-objective optimization
References abound, there has been great interest during the
[1] GALE, D., KUHN, H.W, AND TUCKER, A.W: 'Linear past 30 years in seeking appropriate definitions
programming and the theory of games', in T.C. KOOP- for a most preferred solution and in developing
MANS (ed.): Activity Analysis of Production and Allo-
algorithms that aid the decision maker (DM) to
cation, Wiley, 1951, pp. 317-329.
[2] ISERMANN, H." 'On some relations between a dual pair find such a solution. These applications are in a
of multiple objective linear programs', Z. Oper. Res. wide variety of areas, including, for example, pro-
22 (1978), 33-41. duction planning, finance, environmental conser-
[3] JAHN, J.: 'Duality in vector optimization', Math. Pro- vation, academic planning, nutrition planning, ad-
gram. 25 (1983), 343-353.
vertising, facility location, auditing, blending tech-
[4] KAWASAKI,H.: 'A duality theorem in multiobjective
nonlinear prgramming', Math. Oper. Res. 7 (1982), 95- niques, transportation planning, and scheduling,
110. to name just a few.
[5] NAKAYAMA,H.: 'Geometric consideration of duality in There are several alternate mathematical for-
vector optimization', J. Optim. Th. Appl. 44 (1984), mulations of the multi-objective optimization
625-655.
problem [13]. For purposes of modeling the deter-
[6] NAKAYAMA, H.: 'Duality in multi-objective optimi-
zation', in T. GAL, T.J. STEWART, AND T. HANNE ministic multiple objective optimization problems
(eds.): Multicriteria Decision Making: Advances in found in management science/operations research,
MCDM Models, Algorithms, Theory and Applications, however, the most popular form of the problem is
Kluwer Acad. Publ., 1999, pp. 3.1-3.29. denoted
[7] NIEUwENHUIS,J.W.: 'Supremal points and generalized
duality', Math. Operationsforsch. Statist. Ser. Optim. (V) {VMAXs.t. xeX.[fl(x)""'fP(X)]
11 (1980), 41-59.


Here, p > 2, X is a nonempty subset of R n, each for problem (V). Notice that XE is a subset of
fj, j = 1 , . . . , p, is a real-valued function defined on XWE. In some cases of problem (V), such as when
X or on a suitable set containing X, and VMAX the objective functions are ratios of linear func-
indicates that we are to, in some as-yet unspecified tions, it is easier to analyze and generate points in
sense, 'vector maximize' the vector XWE than points in XE.
Let U represent a utility function defined on the
f(x) - ,
space R p of the objective functions of problem (V).
of objective/unctions (criteria) over X. The set X Suppose that U is coordinatewise increasing, i.e.,
is called the set of alternatives or the decision set. that whenever ~, z E R p satisfy ~ > z and ~j > zj
Of all of the solution concepts proposed for help- for some j - 1,... ,p, then U(~) > g(z). Suppose
ing the DM find a most preferred solution for prob- that x* is an optimal solution to the single objec-
lem (V), the concept of efficiency has proven to tive problem
be of overriding importance. An efficient (Pareto
optimal, noninferior, nondominated) solution for (s)
problem (V) is a point 5 E X such that there exists xEX
no other point x E X that satisfies f(z) > f(~)
and f(x) ~ f(~). Letting X E denote the set of all Then x* must be an efficient solution for problem
efficient points for problem (V), we see that when- (V) (cf. [11]).
ever 5 E XE, there is no other feasible point that The property in the previous paragraph explains
does at least as well as 5 in all of the criteria for to a great extent why the concept of efficiency is
problem (V) and strictly better in at least one cri- of such fundamental value. The assumption that
terion. A point 5 E X is called dominated when the utility function U in the above paragraph is
for some other point x E X, f(x) >_ f(~) and, coordinatewise increasing implies that in problem
for at least one j - 1,... ,p, fj(x) > fj(5). Thus, (S), for each j - 1 , . . . , p, more of fj is preferred to
we have the alternate definition for efficiency that less of fj. Thus, if we imagine that U is the utility
states that a point 5 is an efficient solution for (or value) function of the DM over the objective
problem (V) when 5 E X and there are no other function space of problem (V), then the previous
points in X that dominate 5. paragraph implies that whenever the DM prefers
One of the reasons for the fundamental impor- more to less in each objective function of problem
tance of the efficiency concept is that it has proven (V), any point that maximizes the DM's utility
to be highly useful in a variety of algorithms for for f (x) over X must be an efficient point in prob-
problem (V). Among these algorithms are the sat- lem (V). In short, as long as we know that the
isficing methods, compromise programming, most DM prefers more to less, we can confine the search
interactive methods, and the vector maximization for a most preferred solution to XE. Although the
method. The latter method, for instance, seeks to utility function of the DM is generally not actu-
generate either all of XE or key parts of XE. The ally available, in virtually all applications the DM
generated set is shown to the DM. Then, based does, indeed, prefer more to less in each objective
upon the DM's internal utility (or value) function, function of problem (V). Thus, in essentially all
the DM chooses from the generated set a most pre- cases, any most preferred solution for problem (V)
ferred solution. For details concerning these meth- will be found in XE.
ods for problem (V), see [7], [10], [12], [13], [14]. Because of the central importance of efficiency,
In some cases, it is useful to consider a slightly a great deal of effort has been made by researchers
relaxed concept of efficiency called weak efficiency. to delineate the properties of the efficient points
A point ~ E X is called a weakly efficient (weakly and of the efficient set for problem (V). In what
Pareto optimal, weakly noninferior, weakly non- follows, we shall briefly highlight some of the most
dominated) solution for problem (V) when there is important of these properties.
no other point x E X such that f(x) > f(~). Let Consider the single-objective optimization prob-
XWE denote the set of all weakly efficient points lem


(W)   max  \sum_{j=1}^{p} w_j f_j(x),   s.t.  x \in X.

Here, wj, j = 1, ..., p, are parameters, which are often thought of as weights associated with the objective functions fj, j = 1, ..., p, of problem (V). A number of so-called scalarization properties for efficient points of problem (V) are expressed in terms of problem (W). To present some of these, another efficiency concept, called proper efficiency, is needed. A point x^0 is said to be a properly efficient solution for problem (V) when x^0 ∈ XE and, for some sufficiently large number M, whenever fi(x) > fi(x^0) for some i = 1, ..., p and some x ∈ X, there exists some j = 1, ..., p such that fj(x) < fj(x^0) and

   \frac{f_i(x) - f_i(x^0)}{f_j(x^0) - f_j(x)} \le M.

In words, for each properly efficient solution of problem (V), for each criterion, the possible marginal gains in that criterion relative to the losses in the criteria that have losses cannot all be unbounded from above. Let XPRE denote the set of properly efficient solutions for problem (V), and let w^T = (w1, ..., wp). Then some key scalarization properties are as follows.

1) If x̄ is the unique optimal solution to problem (W) for some w ≥ 0, w ≠ 0, then x̄ ∈ XE.
2) If x̄ is an optimal solution to problem (W) for some w ≥ 0, w ≠ 0, then x̄ ∈ XWE.
3) Assume that for each j = 1, ..., p, fj is a concave function on the convex set X. Then x̄ ∈ XPRE if and only if x̄ is an optimal solution to problem (W) for some w > 0.
4) Under the assumptions in property 3), x̄ ∈ XWE if and only if x̄ is an optimal solution to problem (W) for some w ≥ 0, w ≠ 0.
5) Under the assumptions of property 3), if x̄ ∈ XE but x̄ ∉ XPRE, then there exists a w ≥ 0, w ≠ 0 with wj = 0 for at least one j = 1, ..., p such that x̄ is an optimal solution to problem (W).
6) If each fj, j = 1, ..., p, is a linear function and X is a polyhedron, then XPRE = XE.

The scalarization properties can be used for various purposes, including the generation of points in XE, XWE and XPRE. For instance, when each fj, j = 1, ..., p, is a linear function and X is a polyhedron, from properties 3) and 6), points in XE, including, at least potentially, all of XE, can be generated by solving problem (W) as the parameter w > 0 is varied. Under the assumptions of property 3), the same process will generate points in XPRE, including, at least potentially, all of XPRE. However, from properties 3)-5), it is apparent that no such simple process for generating XE exists, even under the assumptions of property 3). This is another motivation for the proper efficiency concept.

Another important issue in efficiency concerns testing. One may want to test a given point for efficiency in problem (V), and one may want to test whether XE and XPRE are empty or not. We will present several of the properties of efficiency that provide some of the theory for these tests. These properties all utilize the single-objective problem

(T)   max  \sum_{j=1}^{p} f_j(x),   s.t.  f_j(x) \ge f_j(x^0),\ j = 1, \ldots, p,   x \in X.

Here, x^0 is an arbitrary element of R^n. The properties are as follows.

7) The point x^0 ∈ R^n belongs to XE if and only if x^0 is an optimal solution to problem (T).
8) Suppose that x^0 ∈ X in problem (T), and that problem (T) has no finite maximum value. Then XPRE = ∅ [1].
9) Suppose that the assumptions of property 3) hold, that x^0 ∈ X in problem (T), and that problem (T) has no finite maximum value. Then, if the set Z = {z ∈ R^p : z ≤ f(x) for some x ∈ X} is closed, XE = ∅.
10) Assume that each fj, j = 1, ..., p, is a linear function and that X is a polyhedron. Suppose that x^0 ∈ X in problem (T), and that problem (T) has no finite maximum value. Then XE = ∅.

(A small computational illustration of properties 3), 6) and 7) is sketched below; properties 11)-13) and the related discussion then follow.)
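As a concrete illustration of properties 3) and 6), the following sketch sweeps strictly positive weights in problem (W) for a small bi-objective LP with invented data and collects the efficient extreme points returned by the weighted-sum subproblems; SciPy's linprog is used for the scalar solves, and all names and numbers are assumptions made for the example.

# Weighted-sum generation of efficient points (properties 3) and 6)) for a
# small bi-objective LP with made-up data; linprog minimizes, so the
# maximization in (W) is handled by negating the objective.
import numpy as np
from scipy.optimize import linprog

# f_1(x) = 3*x1 + 1*x2,  f_2(x) = 1*x1 + 4*x2  (both to be maximized)
F = np.array([[3.0, 1.0],
              [1.0, 4.0]])
# X = {x >= 0 : x1 + 2*x2 <= 8, 3*x1 + x2 <= 9}
A_ub = np.array([[1.0, 2.0],
                 [3.0, 1.0]])
b_ub = np.array([8.0, 9.0])

efficient_points = []
for w1 in np.linspace(0.05, 0.95, 7):      # strictly positive weights, w > 0
    w = np.array([w1, 1.0 - w1])
    res = linprog(-(w @ F), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * 2, method="highs")
    if res.success:
        efficient_points.append((tuple(np.round(res.x, 4)),
                                 tuple(np.round(F @ res.x, 4))))

for x, fx in dict(efficient_points).items():   # drop duplicates, keep order
    print("x =", x, "  f(x) =", fx)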
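Continuing the same invented instance, property 7) yields a direct efficiency test based on problem (T): a candidate point x^0 is efficient exactly when the optimal value of (T) equals \sum_j f_j(x^0). The routine below is a minimal sketch of that test, not a general-purpose implementation.

# Efficiency test based on problem (T) (property 7)): x0 is efficient exactly
# when x0 itself solves max sum_j f_j(x) s.t. f_j(x) >= f_j(x0), x in X.
# Linear toy data as above; everything here is an illustrative assumption.
import numpy as np
from scipy.optimize import linprog

F = np.array([[3.0, 1.0],          # rows are the objective gradients
              [1.0, 4.0]])
A_ub = np.array([[1.0, 2.0],
                 [3.0, 1.0]])
b_ub = np.array([8.0, 9.0])

def is_efficient(x0, tol=1e-8):
    fx0 = F @ x0
    # Feasible set of (T): original constraints plus -F x <= -F x0.
    A = np.vstack([A_ub, -F])
    b = np.concatenate([b_ub, -fx0])
    res = linprog(-F.sum(axis=0), A_ub=A, b_ub=b,
                  bounds=[(0, None)] * len(x0), method="highs")
    # x0 is efficient iff the optimal value of (T) equals sum_j f_j(x0).
    return res.success and (-res.fun) <= fx0.sum() + tol

print(is_efficient(np.array([2.0, 3.0])))   # an efficient vertex -> True
print(is_efficient(np.array([3.0, 0.0])))   # dominated by (2, 3)  -> False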

11) Any optimal solution to problem (T) belongs following properties.


to XE.
12) Assume that for each j = 1 , . . . , p , fj is a
quasiconcave function on X, and that X is a
Notice from these properties that solving prob-
compact convex set. Then XWE is connected.
lem (T) is a useful tool for both testing a point
for efficiency and for investigating the issues of 13) Assume that for each j = 1 , . . . , p, fj is a con-
whether XE and X P R E a r e empty or not. In the cave function on R n, and that X is a compact
case of testing a point x ° for efficiency, property convex set. Then XE is connected.
7) shows that problem (T) can be used to obtain
Recall that a concave function on a convex set is
a definitive answer, i.e., using property 7), we will
also quasiconcave on the set. Therefore, from prop-
always detect whether or not x ° E XE. Further-
erty 12), it follows that XWE is connected when
more, when property 7) shows that x ° ¢_ XE, but
each objective function in problem (V) is a con-
problem (T) has an optimal solution x*, then, by
cave function on X, and X is a compact, convex
property 11), x* E XE. Notice also that in this
set.
case, x* dominates x °.
There are a variety of other properties of effi-
In the case of investigating whether or not XE cient points and of the efficient set for problem (V).
and X p R E a r e empty, however, definitive answers These include, for instance, density properties,
cannot usually be obtained by using these prop- stability-related properties, the domination prop-
erties. This is because none of the properties ad- erty [2], [3], [8], and complete efficiency-related
dresses the issue of whether or not XE and X p R E properties [4], [6]. For further reading, see [5], [7],
are empty when, instead of having an optimal solu- [9], [10], [12], [13], [14].
tion or having no finite maximum value, problem
See also: M u l t i - o b j e c t i v e o p t i m i z a t i o n : In-
(T) has a finite but unattained maximum value.
teractive methods for p r e f e r e n c e value
The one case where the properties can be used to
functions; M u l t i - o b j e c t i v e o p t i m i z a t i o n : La-
definitely detect whether or not XE and X p R E a r e
grange duality; Multi-objective optimiza-
empty is the case where the objective functions of
tion: I n t e r a c t i o n of d e s i g n a n d control; O u t -
problem (V) are all linear and X is a polyhedron.
ranking methods; Preference disaggrega-
In that case, problem (T) cannot have a finite but
tion; F u z z y m u l t i - o b j e c t i v e l i n e a r p r o g r a m -
unattained maximum value. Therefore, properties
m i n g ; M u l t i - o b j e c t i v e o p t i m i z a t i o n a n d de-
7), 10) and 11) can be used to detect whether or
cision s u p p o r t s y s t e m s ; P r e f e r e n c e disag-
not XE - X P R E is empty in such cases.
gregation approach: Basic features, exam-
One of the main challenges computationally to ples f r o m f i n a n c i a l d e c i s i o n m a k i n g ; P r e f -
generating all or parts of XE or XWE for the DM e r e n c e m o d e l i n g ; M u l t i p l e o b j e c t i v e pro-
to consider is that both XE and XWE are, except gramming support; Multi-objective integer
for trivial cases, nonconvex sets. Although some l i n e a r p r o g r a m m i n g ; M u l t i - o b j e c t i v e com-
researchers have suggested ways to mitigate this b i n a t o r i a l o p t i m i z a t i o n ; B i - o b j e c t i v e assign-
problem [5], it generally remains a major stum- m e n t p r o b l e m ; E s t i m a t i n g d a t a for m u l t i c r i -
bling block for algorithm development. In many teria decision m a k i n g problems: Optimiza-
common cases, however, XE or XWE possesses a tion techniques; Multicriteria sorting meth-
useful, although less valuable, property than con- ods; F i n a n c i a l a p p l i c a t i o n s of m u l t i c r i t e r i a
vexity upon which algorithms can be based. This analysis; P o r t f o l i o s e l e c t i o n a n d m u l t i c r i t e -
property is called connectedness. In particular, a ria analysis; D e c i s i o n s u p p o r t s y s t e m s w i t h
set Z C_ R n is connected if, whenever A and B multiple criteria.
are nonempty subsets of R n such that A has no
points in common with the closure of B, and B
References
has no points in common with the closure of A, [1] BENSON, H.P.: 'An improved definition of proper effi-
Z ~ A t2 B. Some common cases of problem (V) ciency for vector maximization with respect to cones',
where XE or XWE is connected are given in the J. Math. Anal. Appl. 71 (1979), 232-241.


[2] BENSON, H.P.: 'On a domination property for vector these commodities might be telephone calls in a
maximization with respect to cones', J. Optim. Th.
telecommunications network, packages in a distri-
Appl. 39 (1983), 125-132.
bution network, or airplanes in an airline flight net-
[3] BENSON, H.P.: 'Errata corrige', J. Optim. Th. Appl.
43 (1984), 477-479. work. Each commodity has a unique set of charac-
[4] BENSON, H.P.: 'Complete efficiency and the initializa- teristics and the commodities are not interchange-
tion of algorithms for multiple objective programming', able. That is, you cannot satisfy demand for one
Oper. Res. Left. 10 (1991), 481-487. commodity with another commodity. The objec-
[5] BENSON, H.P., AND SAYIN, S.: 'Towards finding global
tive of the MCF problem is to flow the commodi-
representations of efficient sets in multiple objective
mathematical programming', Naval Res. Logist. 44 ties through the network at minimum cost without
(1997), 47-67. exceeding arc capacities. A comprehensive survey
[6] BENVENISTE, M.: 'Testing for complete efficiency in of linear multicommodity flow models and solution
a vector maximization problem', Math. Program. 12 procedures are presented in [2].
(1977), 285-288.
Integer multicommodity flow (IMCF) problems,
[7] GOICOECHEA, A., HANSEN, D.R., AND DUCKSTEIN,
L.: Multiobjective decision analysis with engineering a constrained version of the linear multicommodity
and business applications, Wiley, 1982. flow problem in which flow of a commodity (speci-
[8] HENIG, M.I.: 'The domination property in multicri- fied in this case by an origin-destination pair) may
teria optimization', J. Math. Anal. Appl. 114 (1986), use only one path from origin to destination.
7-16.
MCF and IMCF problems are prevalent in a
[9] Luc, D.T.: Theory of vector optimization, Springer,
1989. number of application contexts, including trans-
[10] SAWARAGI,Y., NAKAYAMA,H., AND TANINO, T.: The- portation, communication and production.
ory of multiobjective optimization, Acad. Press, 1985.
[11] SOLAND, R.M.: 'Multicriteria optimization: A general MCF Example Applications.
characterization of efficient solutions', Decision Sci. 10
• Routing vehicles in traffic networks (dynamic
(1979), 26-38.
[12] STEUER, R.: Multiple criteria optimization: Theory, traffic assignment). This involves the deter-
computation and application, Wiley, 1986. mination of minimum delay routes for ve-
[13] Yu, P.L.: Multiple-criteria decision making, Plenum, hicles from their origins to their respective
1985. destinations over the traffic network. The al-
[14] ZELENY, M.: Multiple criteria decision making, Mc-
lowable congestion levels determine the arc
Graw-Hill, 1982.
capacities. Alternatively, there are no capac-
Harold P. Benson ities but the cost on an arc is a function of
Dept. Decision and Information Sci. Univ. Florida
the amount of flow on the arc. In the former
Gainesville, Florida 32611-7169, USA
case, the objective function is linear while in
E-mail address: bensonCdale, cba. ufl. edu
the latter it is nonlinear.
MSC 2000:90C29
Key words and phrases: multi-objective optimization, mul- • Distribution systems planning. In this prob-
tiple criteria decision making, efficient solution, Pareto opti- lem there are different products (or, com-
mal solution, noninferior solution, nondominated solution, modities) produced at several plants with
weakly efficient solution, weakly Pareto optimal solution,
known production capacities. Each commod-
weakly noninferior solution, weakly nondominated solution,
properly efficient solution. ity has a certain demand in each customer
zone. The demand is satisfied by shipping
via regional distribution centers with finite
MULTICOMMODITY FLOW PROBLEMS storage capacities. A.M. Geoffrion and G.W.
Graves [28] model this problem of routing the
Linear multicommodity flow problems (MCF) are
commodities from the manufacturing plants
linear programs (LPs) that can be characterized
to the customer zones through the distribu-
by a set of commodities and an underlying net-
tion centers as a MCF problem.
work. A commodity is a good that must be trans-
ported from one or more origin nodes to one or • Import and export models. One of the factors
more destination nodes in the network. In practice that may affect export is handling capacity at


ports. D. Barnett, J. Binkley and B. McCarl be modeled as a MCF problem with inflows
[8] use a MCF model to analyze the effect of given as probabilistic density functions.
US port capacities on the export of wheat, • Forest management. For each planning pe-
corn and soybean. riod, forest managers have to make decisions
• Optimization of freight operations. T. concerning the land areas to be harvested,
Crainic, J.A. Ferland and J.M. Rousseau [20] the volume of timber to be harvested from
develop a MCF-based routing and scheduling these areas, the land areas to be developed
optimization model that considers the plan- for recreation and the road network to be
ning issues for the railroad industry. More built and maintained in order to support
recently, H.N. Newton [48] and C. Barnhart, both the timber haulers and recreationists.
H. Jin and P.H. Vance [13] study the rail- This problem has been formulated as a MCF
road blocking problem using multicommod- problem in [33].
ity based formulations. • Street planning. L.R. Foulds [26] introduced
• Freight Assignment in the Less-than- this problem and modeled it as a MCF prob-
Truckload (LTL) industry. An LTL carrier lem. The objective is to identify a set of two-
has to consolidate many shipments to make way streets such that making these streets
economic use of the vehicles. This requires one-way minimizes the total congestion cost
the establishment of a large number of termi- in the network.
nals to sort freight. Trucking companies use • Spatial price equilibrium (SPE) problem.
forecasted demands to define routes for each This problem requires modeling consumer
vehicle to carry freight to and from the termi- flows within a general network. The SPE
nals. Once the routes are fixed, the problem problem determines the optimum levels of
is to deliver all the shipments with minimum production and consumption at each market
total service time or cost. This problem is and the optimal flows satisfy the equilibrium
formulated as a MCF problem in [17] and property. R.S. Segall [59] models and solves
[24]. the SPE problem as a MCF problem.
• Express Shipment Delivery. D. Kim [40] For a more comprehensive description of MCF
models the shipment delivery problem faced applications, see [57], [2], [37].
by express carriers like Federal Express,
United States Postal Service, United Parcel IMCF Example Applications.
Service, etc. as a MCF problem on a network • Airline fleet assignment. Given a time ta-
in space and time. ble of flight arrivals and departures, the ex-
• Routing messages in a telecommunications or pected demand on the flights and a set of air-
computer network. The network consists of craft, the objective is to arrive at a minimum
transmission lines. Each message request is cost assignment of aircraft to the flights. This
a commodity. The problem is to route the problem has been extensively studied in [1],
messages from origins to the respective des- [31].
tinations at a minimum cost. T.L. Magnanti • Airline crew scheduling. This problem deals
et al. [42] and others provide MCF-based for- with the minimum cost scheduling of crews.
mulations for this problem. Factors such as hours of work limitations and
• Long-term hydro-generation optimization. Federal Aviation Administration regulations
The task in this case is to determine the must be taken into account while solving the
amount of hydro-generation at a reservoir in problem. For an in-depth study see [5], [14].
an interval of time, that minimizes the ex- • Airline maintenance routing problems re-
pected cost of power generation over a pe- quire that single aircraft be routed such that
riod of time, divided into several intervals. N. maintenance requirements are satisfied and
Nabonna [47] showed that this problem can each flight is assigned to exactly one aircraft.


This problem has been studied in [19], [10], [25].

• Bandwidth packing problems require that bandwidth be allocated in telecommunications networks to maximize total revenue. The demands, or calls, on the networks are the commodities and the objective is to route the calls from their origin to their destination. In the case of video teleconferencing, since call splitting is not allowed, each call must be routed on exactly one network path. This IMCF problem is described in [49].

• Package flow problems, such as those arising in express package delivery operations, require that shipments, each with a specific origin and destination, be routed over a transportation network. Each set of packages with a common origin-destination pair can be considered as a commodity and often, to facilitate operations and ensure customer satisfaction, must be assigned to a single network path. These problems are cast as IMCF problems in [12].

Formulations. Multicommodity flow problems can be modeled in a number of ways depending on how one defines a commodity. There are three major options: a commodity may originate at a subset of nodes in the network and be destined for another subset of nodes, or it may originate at a single node and be destined for a subset of the nodes, or finally it may originate at a single node and be destined for a single node. K.L. Jones et al. [34] present models for each of these different cases. In the interest of space, we will only consider models for the last case. The other cases can also be modeled using variants of the models presented here.

We present two different formulations of the MCF problem: the node-arc or conventional formulation and the path or column generation formulation. The MCF is defined over the network G comprised of node set N and arc set A. The model contains decision variables x, where x_{ij}^k is the fraction of the total quantity (denoted q^k) of commodity k assigned to arc ij. In the IMCF problem these variables are restricted to be binary. The cost of assigning commodity k in its entirety to arc ij equals q^k times the unit flow cost for arc ij, denoted c_{ij}^k. Arc ij has capacity d_{ij}, for all ij ∈ A. Node i has supply of commodity k, denoted b_i^k, equal to 1 if i is the origin node for k, equal to -1 if i is the destination node for k, and equal to 0 otherwise. The node-arc MCF formulation is:

   minimize   \sum_{k \in K} \sum_{ij \in A} c_{ij}^k q^k x_{ij}^k                            (1)
   such that
      \sum_{ij \in A} x_{ij}^k - \sum_{ji \in A} x_{ji}^k = b_i^k,   \forall i \in N, \forall k \in K,   (2)
      \sum_{k \in K} q^k x_{ij}^k \le d_{ij},   \forall ij \in A,                             (3)
      x_{ij}^k \ge 0,   \forall ij \in A, \forall k \in K.                                    (4)

Note that, without restricting the generality of the problem, we model the arc flow variables x as having values between 0 and 1. To do this, we scale the demand for each commodity to 1 and accordingly adjust the coefficients in the objective function (1) and in constraints (3). Also note the block-angular structure of this model. The conservation of flow constraints (2) form nonoverlapping blocks, one for each commodity. Only the arc capacity constraints (3) link the values of the flow variables of different commodities.

To contrast, the path-based or column generation MCF formulation has fewer constraints, and far more variables. Again, the underlying network G is comprised of node set N and arc set A, with q^k representing the quantity of commodity k. P(k) represents the set of all origin-destination paths in G for k, for all k ∈ K. In the column generation model, the binary decision variables are denoted y_p^k, where y_p^k is the fraction of the total flow of commodity k assigned to path p ∈ P(k). The cost of assigning commodity k in its entirety to path p equals q^k times the unit flow cost for path p, denoted c_p^k; c_p^k represents the sum of the c_{ij}^k costs for all arcs ij contained in path p. As before, arc ij has capacity d_{ij}, for all ij ∈ A. Finally, \delta_{ij}^p is equal to 1 if arc ij is contained in path p ∈ P(k), for all k ∈ K, and is equal to 0 otherwise. The path or column generation IMCF formulation is then:

   minimize   \sum_{k \in K} \sum_{p \in P(k)} c_p^k q^k y_p^k                                (5)
   such that
      \sum_{k \in K} \sum_{p \in P(k)} \delta_{ij}^p q^k y_p^k \le d_{ij},   \forall ij \in A,   (6)
      \sum_{p \in P(k)} y_p^k = 1,   \forall k \in K,                                         (7)
      y_p^k \ge 0,   \forall p \in P(k), \forall k \in K.                                     (8)

LP Solution Methods. Comprehensive surveys of the available multicommodity network flow solution techniques are provided in [6], [37]. Descriptions of these approaches are also provided in [2], [38].

Price-directive decomposition techniques use the path-based MCF model. To limit the number of variables considered in finding an optimal solution, column generation techniques are used. Further details of price-directive decomposition and column generation are provided in [22], [41], [61], [18], [45].

Resource-directive decomposition techniques attempt to solve MCF problems by allocating arc capacity by commodity and solving the resulting decoupled minimum cost flow problems for each commodity. Additional descriptions of this technique can be found in [52], [61], [27], [41], [30], [37], [39], [35], [60].

Computational comparisons of the performance of price- and resource-directive decomposition methods can be found in [3], [4]. A. Ali, R.V. Helgason, J.L. Kennington, and H. Lall [4] report that specialized decomposition codes can be expected to run from three to ten times faster than a general linear programming package. Furthermore, A.A. Assad [7] reports that resource-directive algorithms converge quickly for small problems but are outperformed by the price-directive method for larger MCF problems.

G. Saviozzi [56] uses subgradient techniques on the Lagrangian relaxation of the bundle constraints and proposes a method of arriving at an advanced starting basis for the minimum cost multicommodity flow problem.

Partitioning methods specialize the simplex method by partitioning the current basis to exploit the underlying network structure. Experiences with primal partitioning techniques have been reported in [51], [53], [54], [55], [32], [43], [36], [24], among others. J.B. Rosen [53] develops a partitioning strategy for angular problems. J.K. Hartman and L.S. Lasdon [32] develop a generalized upper bounding algorithm for multicommodity network flow problems in which the special structure of the MCF problem is exploited. Their primal partitioning procedure, a specialization of the generalized upper bounding procedure developed by G.B. Dantzig and R.M. Van Slyke [21], involves the determination at each iteration of the inverse of a basis containing only one row for each saturated arc. Similarly, C.J. McCallum [44] developed a generalized upper bounding algorithm for a communications network planning problem. All of these procedures exploit the block-diagonal problem structure and perform all steps of the simplex method on a reduced working basis of dimension m, where m represents the size of set A.

Interior point methods and parallel computing techniques have also been applied to MCF problems. Interior point methods provide polynomial time algorithms for the MCF problems. The best time bound is due to P.M. Vaidya [62]. G.L. Schultz and R.R. Meyer [58] provide an interior point method with massive parallel computing to solve multicommodity flow problems.

Developments of new heuristic procedures for MCF problems include the primal and dual-ascent heuristics described in [17] and [9], respectively. A. Gersht and A. Shulman [29] use a barrier-penalty method to find nearly optimal solutions for multicommodity problems, while R. Schneur [62] describes a scaling algorithm to determine nearly feasible MCF solutions.

Recently, price-directive decomposition or column generation approaches, such as those presented in [2], [11], [23], [34], have been the most extensively used methods for solving large versions of the linear MCF problem. The general idea of column generation is that optimal solutions to large LPs can be obtained without explicitly including all columns (i.e., variables) in the constraint matrix (called the Master Problem or MP). In fact, only a very small subset of all columns will be in an optimal solution and all other (nonbasic) columns can be ignored.
can be ignored. In a minimization problem, this I P S o l u t i o n M e t h o d s . The ability to solve large


implies that all columns with positive reduced cost MCF LP's enables the solution of large IMCF
can be ignored. The multicommodity flow column problems. Successful approaches for solving large
generation strategy, then, is" IMCF problems use the path-based or column gen-
0) RMP Construction. Include a subset of eration formulation of the problem. Column gen-
columns in a restricted MP, called the Re- eration IP's can be solved to optimality using a
stricted Master Problem, or RMP; procedure known as branch and price, detailed in
[15], [64], [23]. Branch and price, a generalization
1) RMP Solution. Solve the RMP LP;
of branch and bound with LP relaxations, allows
2) Pricing Problem Solution. Use the dual vari- column generation to be applied at each node of
ables obtained in solving the RMP to solve the branch and bound tree. Branching occurs when
the pricing problem. The pricing problem ei- no columns price out to enter the basis and the LP
ther identifies one or more columns with neg- solution does not satisfy the integrality conditions.
ative reduced cost (i.e., columns that price Applying a standard branch and bound proce-
out) or determines that no such column ex- dure to the final restricted master problem with
ists. its existing columns will not guarantee an optimal
3) Optimality Test. If one or more columns price (or feasible) solution. After the branching decision
out, add the columns (or a subset of them) modifies RMP, it may be the case that there exists
to the RMP and return to Step 1; otherwise a column for MP that prices out favorably, but is
stop, the MP is solved. not present in RMP. Therefore, to find an opti-
mal solution we must maintain the ability to solve
For any RMP in Step 1, let -Trij represent the
the pricing problem after branching. The impor-
nonnegative dual variables associated with con-
tance of generating columns after the initial LP
straints (6) and a k represent the unrestricted dual
has been solved is demonstrated for airline crew
variables associated with constraints (7). Since Cp k
scheduling applications in [63]. Although they were
can be represented a s ~-~ijEA
¢ij(~ij'
k p the reduced
unable to find even feasible IP solutions using just
cost of column p for commodity k, d e n o t e d c-kpqk,
the columns generated to solve the initial LP re-
is"
laxation, they were able to find quality solutions
-k qk :
qk +
p _ ak , (9)
using a branch and price approach for crew sched-
ijEA uling problems in which they generated additional
Vp E P(k), Vk E g. columns whenever the LP bound at a node ex-
ceeded a preset IP target objective value.
For each RMP solution generated in Step 1, the The difficulty of performing column generation
pricing problem in Step 2 can be solved efficiently. with branch and bound is that conventional in-
Columns that price out can be identified by solv- teger programming branching on variables may
ing one shortest path problem for each commod- not be effective because fixing variables can de-
ity k E K over a network with arc costs equal to stroy the structure of the pricing problem. For
cijk + lrij, for each ij E A. Let p , represent a result- the multicommodity flow application, a branch-
ing shortest path p , for commodity k. Then, if for ing rule is needed that ensures that the pricing
all k E K, problem for the LP with the branching decisions
k k included can be solved efficiently with a shortest
cp, q >_0,
path procedure. To illustrate, consider branching
the MP is solved. Otherwise, the MP is not solved based on variable dichotomy in which one branch
and, for each k E K with forces commodity k to be assigned to path p, i.e.,
ypk _ 1, and the other branch does not allow com-
Ckp,qk ~ O,
modity k to use path p, i.e., yk _ 0. The first
path p, E P(k) is added to the RMP in Step 3. branch is easy to enforce since no additional paths
need to be generated once k is assigned to path

497
Multicommodity flow problems

p. The latter branch, however, cannot be enforced subproblems are formulated as constrained or un-
if the pricing problem is solved as a shortest path constrained shortest path problems.
problem. There is no guarantee that the solution to P. Raghavan and C.D. Thompson [50] illustrate
the shortest path problem is not path p. In fact, it the use of randomized algorithms to solve some
is likely that the shortest path for k is indeed path integer multicommodity flow problems. They use
p. As a result, to enforce a branching decision, the randomized rounding procedures that give prov-
pricing problem solution must be achieved using a ably good solutions in the sense that they have a
next shortest path procedure. In general, for a sub- very high probability of being close to optimality.
problem, involving a set of a branching decisions, Barnhart et al. [12] present a branch and price
the pricing problem solution must be achieved us- and cut algorithm for general IMCF problems
ing a kth shortest path procedure. where each commodity is represented by an origin-
The key to developing a branch and price pro- destination pair and flow volume. Branch and cut,
cedure is to identify a branching rule that elimi- another variant of branch and bound, allows valid
nates the current fractional solution without com- inequalities, or cuts, to be added throughout the
promising the tractability of the pricing problem. branch and bound tree. Branch and price and cut
In general, J. Desrosiers et al [23] argue this can combines column and row generation to yield very
be achieved by basing branching rules on variables strong LP relaxations at nodes of the branch and
in the original formulation, and not on variables bound tree.
in the column generation formulation. This means See also: M i n i m u m cost flow p r o b l e m ; N o n -
that branching rules should be based on the arc convex n e t w o r k flow p r o b l e m s ; Traffic net-
flow variables Xij
k from the node-arc formulation of w o r k e q u i l i b r i u m ; N e t w o r k location: Coy-
the problem. Barnhart et al. [15] develop branch- e r i n g p r o b l e m s ; M a x i m u m flow p r o b l e m ;
ing rules for a number of different master problem Shortest path tree algorithms; Steiner tree
structures. They also survey specialized algorithms problems; Equilibrium networks; Survivable
that have appeared in the literature for a broad networks; Directed tree networks; Dynamic
range of applications. traffic n e t w o r k s ; A u c t i o n a l g o r i t h m s ; Piece-
M. Parker and J. Ryan [49] present a branch and wise linear n e t w o r k flow p r o b l e m s ; N o n o r i -
price algorithm for the bandwidth packing prob- e n t e d m u l t i c o m m o d i t y flow p r o b l e m s ; C o m -
lem. in which the objective is to choose which of munication network assignment problem;
a set of commodities to send in order to maxi- G e n e r a l i z e d n e t w o r k s ; E v a c u a t i o n networks;
mize revenue. They use a path-based formulation. N e t w o r k d e s i g n p r o b l e m s ; S t o c h a s t i c net-
Their branching scheme selects a fractional path w o r k problems" M a s s i v e l y parallel solution.
and creates a number of new subproblems equal to
the length of the path (measured in the number of References
[1] ABARA, J.: 'Applying integer linear programming to
arcs it contains) plus one. On one branch, the path
the fleet assignment problem', Inter:faces 19 (1989),
is fixed into the solution and on each other branch, 20-28.
one of the arcs on the path is forbidden. To limit [2] AHUJA, R.K., MAGNANTI, T.L., AND ORLIN, J.B.:
time spent searching the tree they use a dynamic Network flows: Theory, algorithms, and applications,
optimality tolerance. They report the solution of Prentice-Hall, 1993.
[3] ALI, A.I., BARNETT, D., FAaHANGIAN, K., KENNING-
14 problems with as many as 93 commodities on
TON, J.L., PATTY, B., SHETTY, B., MCCARL, B., AND
networks with up to 29 nodes and 42 arcs. All but TONG, P.: 'Multicommodity network problems: Appli-
two of the instances are solved to within 95% of cations and computations', IIE Trans. 16 (1984), 127-
optimality. 134.
K. Ziarati et al. [16] consider the problem of as- [4] ALI, A., HELGASON, R., KENNINGTON, J., AND LALL,
H.: 'Computational comparison among three multi-
signing railway locomotives to trains. They model
commodity network flow algorithms', Oper. Res. 28
the problem as an integer multicommodity flow (1980), 995-1000.
problem with side constraints and solve using [5] ANBIL, R., GELMAN, E., PATTY, B., AND TANGA,
a Dantzig-Wolfe decomposition technique, where R.: 'Recent advances in crew-pairing optimization at

498
Multicommodity flow problems

American Airlines', Interfaces 21 (1991), 62-64. [21] DANTZIG, G.B., AND SLYKE, R.M. VAN: 'Generalized
[6] ASSAD, A.A.: 'Multicommodity network flows- A sur- upper bounding techniques', J. Comput. Syst. Sci. 1
vey', Networks 8 (1978), 37-91. (1967), 213-226.
[7] ASSAD, A.A.: 'Solving linear multicommodity flow [22] DANTZIG, G.B., AND WOLFE, P.: 'Decomposition
problems': Proc. IEEE Internat. Conf. Circuits and principle for linear programs', Oper. Res. 8 (1960),
Computers, Vol. 1, 1980, pp. 157-161. 108-111.
[s] BARNETT, D., BINKLEY, J., AND MCCARL, B.: 'The [23] DESROSIERS, J., DUMAS, Y., SOLOMON, M.M., AND
effects of US port capacity constraints on national and SOUMIS, F.: 'Time constrained routing and schedul-
world grain shipments', Techn. Paper Purdue Univ. ing', in M.E. BALL, T.L. MAGNANTI, C. MONMA, ,
(1982). AND G.L. NEMHAUSER (eds.): Handbook Oper. Res.
[9] BARNHART, C.: 'Dual-ascent methods for large-scale and Management Sci., Vol. 8, Elsevier, 1995, pp. 35-
multi-commodity flow problems', Naval Res. Logist. 40 139.
(1993), 305-324. [24] FARVOLDEN, J.M., POWELL, W.B., AND LUSTIG, I.J.:
[10] BARNHART, C., BOLAND, N.L., CLARKE, L.W., 'A primal partitioning solution for the arc-chain for-
JOHNSON, E.L., NEMHAUSER, G.L., AND SHENOI, mulation of a multicommodity network flow problem',
R.G.: 'Flight string models for aircraft fleeting and Oper. Res. 4, no. 4 (1993), 669-693.
routing', Transport. Sci. 32, no. 3 (1998), 208-220, Fo- [25] FEO, T.A., AND BARD, J.F.: 'Flight scheduling and
cused Issue on Airline Optimization. maintenence base planning', Managem. Sci. 35 (1989),
[11] BARNHART, C., HANK, C.A., JOHNSON, E.L., AND 1415-1432.
SIGISMONDI, G.: 'A column generation and parti- [26] FOULDS, L.R.: 'A multicommodity flow network design
tioning approach for multicommodity flow problems', problem', Transport. Res. B 15 (1981), 273-283.
Telecommunication Systems 3 (1995), 239-258. [27] GEOFFRION, A.M.: 'Primal resource-directive ap-
[12] BARNHART, C., HANK, C.A., AND VANCE, P.H.: proaches for optimizing non-linear decomposable sys-
'Using branch-and-price-and-cut to solve origin- tems', Oper. Res. 18 (1970), 375-403.
destination integer multicommodity flow problems', [2s] GEOFFRION, A.M., AND GRAVES, G.W.: 'Multicom-
Oper. Res. 48, no. 2 (2000), 318-326. modity distribution systems design by Bender's decom-
[13] BARNHART, C., JIN, H., AND VANCE, P.H.: 'Railroad position', Managem. Sci. 20 (1974), 822-844.
blocking: A network design application', Working Pa- [29] GERSHT, A., AND SHULMAN, A.: 'A new algorithm for
per Center Transport. Stud., MIT (1997). the solution of the minimum cost multicommodity flow
[14] BARNHART, C., JOHNSON, E.L., ANBIL, R., AND problem', Proc. 26th IEEE Conf. Decision and Control
HATAY, L.: 'A column generation technique for the (1987), 748-758.
long-haul crew-assignment problem', in T.A. CIRIANI [30] GRINOLD, R.C.: 'Steepest ascent for large scale linear
AND R.C. LEACHMAN (eds.): Optimization in Indus- program', SIAM Rev. 14 (1972), 447-464.
try: Math. Programming and Optimization Techniques, [31] HANK, C.A., BARNHART, C., JOHNSON, E.L.,
Vol. 2, Wiley, 1994, pp. 7-24. MARSTEN, R.E., NEMHAUSER, G.L., AND SIGISMONDI,
[15] BARNHART, C., JOHNSON, E.L., NEMHAUSER, G.L., G.: 'The fleet assignment problem: solving a large-scale
SAVELSBERGH, M.W.F., AND VANCE, P.H.: 'Branch- integer program', Math. Program. 70 (1995), 211-232.
and-price: Column generation for solving huge integer [32] HARTMAN, J.K., AND LASDON, L.S.: 'A generalized
programs', Oper. Res. 46, no. 3 (1998), 316-329. upper bounding algorithm for multicommodity net-
[16] BARNHART, C., JOHNSON, E.L., NEMHAUSER, G.L., work flow problems', Networks 1 (1972), 333-354.
SIGISMONDI, G., AND VANCE, P.: 'Formulating a mixed [33] HELGASON, R., KENNINGTON, J., AND WONG, P.: 'An
integer programming problem to improve solvability', application of network programming for national for-
Oper. Res. 41 (1993), 1013-1019. est planning', Techn. Report Dept. Oper. Res. Southern
[17] BARNHART, C., AND SHEFFI, Y.: 'A network-based Methodist Univ., Dallas O R 81006 (1981).
primal-dual heuristic for the solution of multi- [34] JONES, K.L., LUSTIG, I.J., FARVOLDEN, J.M., AND
commodity network flow problems', Transport. Sci. 27 POWELL, W.B.: 'Multicommodity network flows: The
(1993), 102-117. impact of formulation on decomposition', Math. Pro-
[ls] BAZARAA, M.S., AND JARVIS, J.J.: Linear program- gram. 62 (1993), 95-117.
ming and network flows, Wiley, 1977. [35] KARKAZIS, J., AND BOFFEY, T.B.: 'A subgradient
[19] CLARKE, L.W., JOHNSON, G.L., NEMHAUSER, G.L., based optimal solution method for the multicommodity
AND ZHU, Z.: 'The aircraft rotation problem', Ann. problem', in R.E. BURKARD AND T. ELLINGER (eds.):
Oper. Res.: Math. Industr. Systems II 69 (1997), 33- Methods Oper. Res., Vol. 40, Anton Hain Verlag, 1981,
46. pp. 339-344.
[20] CRAINIC, T., FERLAND, J.A., AND ROUSSEAU, J.M.: [36] KENNINGTON, J.L.: 'Solving multicommodity trans-
'A tactical planning model for Rail freight transporta- portation problems using a primal partitioning simplex
tion', Transport. Sci. 18 (1984), 165-184. technique', Naval Res. Logist. Quart. 24 (1977), 309-

499
Multicommodity flow problems

325. generalized upper bounding techniques for structured


[37] KENNINGTON, J.L.: 'A survey of linear cost network linear programs', SIAM J. Appl. Math. 15 (1967), 906-
flows', Oper. Res. 26 (1978), 209-236. 914.
[38] KENNINGTON, J.L., AND HELGASON, R.V.: Algorithms [56] SAvIOZZI, G.: 'Advanced start for the multicommod-
for network programming, Wiley, 1980. ity network flow problem', Math. Program. Stud. 26
[39] KENNINGTON, J.L., AND SHALABY, M.: 'An effective (1986), 221-224.
subgradient procedure for minimal cost multicommod- [57] SCHNEUR, R.: 'Scaling algorithms for multi-commodity
ity fl0w problems', Managem. Sci. 23 (1977), 994-1004. flow problems and network flow problems with side
[40] KIM, D.: 'Large scale transportation service network constraints', PhD Diss. Massachusetts Inst. Techn.
design: Models, algorithms and applications', PhD (1991).
Thesis Dept. Civil Engin. MIT (June 1997). [58] SCHULTZ, G.L., AND MEYER, R.R.: 'An interior point
[41] LASDON, L.: Optimization theory for large systems, method for block angular optimization', SIAM J. Op-
MacMillan, 1970. tim. 1 (1991), 583-602.
[42] MAGNANTI, T.L., MIRCHANDANI, P., AND VACHANI, [59] SEGALL, R.S.: 'Mathematical modeling of spatial price
R.: 'Modeling and solving the two-facility capacitated equilibrium for multicommodity consumer flows of
network loading problem', Oper. Res. 43, no. 1 (1995). large markets using variational inequalities', Appl.
[43] MAIER, S.F.: 'A compact inverse scheme applied to Math. Modeling 19, no. 2 (1995), 112-122.
a multicommodity network with resource constraints', [6o] SHETTY, B., AND MUTHUKRISHNAN, R.: 'A parallel
in R. COTTLE AND J. KRARUP (eds.): Optimization projection for the multicommodity network model', J.
Methods for Resource Allocation, English Univ. Press, Oper. Res. Soc. 41 (1990), 837-842.
1974, pp. 179-203. [61] SWOVELAND, C.: 'Decomposition algorithms for the
[44] MCCALLUM, C.J.: 'A generalized upper bounding ap- multicommodity distribution problem', Working Paper
proach to a commununications network planning prob- Western Management Sci. Inst. Univ. Calif., Los An-
lem', Networks 7 (1977), 1-23. geles 184 (1971).
[45] MINOUX, M.: Mathematical programming: Theory and [62] VAIDYA, P.M.: 'Speeding up linear programming using
algorithms, Wiley, 1986. fast matrix multiplication', Proc. 30th Annual Symp.
[46] MOORE, E.: 'The shortest path through a maze': Proc. Foundations of Computer Sci. (1989), 332-337.
Internat. Symposium on the Theory of Switching, Har- [63] VANCE, P.H., BARNHART, C., JOHNSON, E.L.,
vard Univ. Press, 1957, pp. 282-292. NEMHAUSER, G.L., MAHIDARA, D., KRISHNA, A.,
[47] NABONNA, N.: 'Multicommodity network flow model AND REBELLO, a.: 'Exceptions in crew planning',
for long term hydro-generation optimization', IEEE ORSA/TIMS Detroit, Michigan (1994).
Trans. Power Systems 8 (1993), 395-404. [64] VANDERBECK, F., AND WOLSEY, L.A.: 'An exact al-
[48] NEWTON, H.N.: 'Network design under budget con- gorithm for IP column generation', Oper. Res. Lett. 19
straints with application to the railroad blocking prob- (1996), 151-159.
lem', PhD Thesis Auburn Univ., Alabama (1996). [65] ZENIOS, S.A.: 'On the fine-grain decomposition of mul-
[49] PARKER, M., AND RYAN, J.: 'A column generation ticommodity transportation problems', SIAM J. Op-
algorithm for bandwidth packing', Telecommunication tim. 1 (1991), 643-669.
Systems 2 (1994), 185-195. [66] ZIARATI, K., SOUMIS, F., DESROSIERS, J., GELINAS,
[5o] RAGHAVAN, P., AND THOMPSON, C.D.: 'Randomized S., AND SAINTONGE, A.: 'Locomotive assignment with
rounding: A technique for provably good algorithms heterogeneous consists at CN North America', Working
and algorithmic proofs', Combinatorica 4 (1987), 365- Paper GERAD and l~cole Polytechnique de Montrdal
374. (1995).
[51] RITTER, K.: 'A decomposition method for linear pro-
gramming problems with coupling constraints and vari- Cynthia Barnhart
ables', Techn. Report Math. Res. Center Univ. Wiscon- Center for Transportation Studies
sin 739 (1967). Massachusetts Inst. Technol.
[52] ROBACKER, J.T.: 'Notes on linear programming: Cambridge, MA 02139, USA
Part XXXVII concerning multicommodity networks',
E-mail address: ¢barnhartDmit.edu
Techn. Report The Rand Corp. RM-1799 (1956).
Niranjan Krishnan
[53] ROSEN, J.B.: 'Primal partition programming for block
Center for Transportation Studies
diagonal matrices', Numerische Math. 6 (1964), 250-
Massachusetts Inst. Technol.
260.
Cambridge, MA 02139, USA
[54] SAIGAL, R.: 'Multicommodity flows in directed net-
works', Techn. Report Oper. Res. Center Univ. Calif. E- mail address: ninj ak¢mit, edu
ORC 67-38 (1967). Pamela H. Vance
[55] SAKAROWTCH, M., AND SAmAL, R.: 'An extension of Goizueta Business School
Emory Univ.

500
Multicriteria sorting methods

Atlanta, GA 30322, USA For instance, in medical diagnosis the classifica-


E-mail address: pvaaco©bus, omory, odu tion of patients according to their symptoms into
MSC 2000:90C35 several possible diseases is a discrimination (clas-
Key words and phrases: multicommodity network flows, sification) problem, since it is impossible to estab-
column generation, decomposition, transportation. lish a preference ordering between the diseases. On
the contrary, the evaluation of bankruptcy risk is
a sorting problem, since the non-bankrupt firms
MULTICRITERIA SORTING METHODS are preferred to the bankrupt ones. In this pa-
Decision making problems, according to their na- per the terms 'discrimination', 'classification', and
ture, the policy of the decision maker, and the 'sorting' will be used without distinction to refer
overall objective of the decision may require the to the general problem of assigning observations,
choice of an alternative solution, the ranking of the objects or alternatives into classes.
alternatives from the best to the worst ones or the The major practical interest of the sorting prob-
sorting of the alternatives in predefined homoge- lem, has motivated researchers in developing an
neous classes [30]. For instance, a decision regard- arsenal of methods for studying such problems,
ing the location of a new power plant can be con- with the aim being the development of quantita-
sidered as a choice problem, since the objective is tive models achieving the higher possible classifica-
to select the most appropriate location according tion accuracy and predicting ability. In 1936, R.A.
to environmental, social and investment criteria. Fisher [8] was the first to propose a framework
On the other hand, an evaluation of the efficiency for studying classification problems taking into ac-
of the different units of a firm can be considered count their multidimensional nature. The linear
as a ranking problem, since the objective is to es- discriminant analysis (LDA) that Fisher proposed
timate the relative performance of each unit com- has been used for decades as the main classifica-
pared to the others. Finally, a credit granting deci- tion technique and it is still being used at least as
sion is a sorting problem: a credit application can a reference point for comparing the performance
be accepted, rejected or submitted for further con- of new techniques that are developed. C. Smith
sideration, according to the business and personal in 1947 [34] extended Fisher's linear discriminant
profile of the applicant. Actually, a wide variety of analysis proposing quadratic discriminant analy-
decision problems, including financial and invest- sis (QDA) in order to overcome the restrictive as-
ment decisions, environmental decisions, medical sumption underlying LDA that groups have equal
decisions, etc., are better formulated and studied dispersion matrices. Later on, several other statis-
through the sorting approach. tical classification approaches have been proposed.
The sorting problem, generally stated, involves Among them logit and probit analysis are the most
the assignment of a set of observations (objects, widely used techniques overcoming the multivari-
alternatives) described over a set of attributes or ate normality assumption of discriminant analysis
criteria into predefined homogeneous classes. This (both linear and quadratic). Although these tech-
type of problem can also referred to as the 'dis- niques overcome most of the statistical restrictions
crimination' problem or the 'classification' prob- imposed in discriminant analysis, their parameters
lem. Although any of these three terms can be used are difficult to explain, especially in multigroup
to describe the general objective of the problem discriminant problems.
(i.e. the assignment of observations into groups), The continuous advances in other fields includ-
actually, they refer to two slightly different situ- ing operations research and artificial intelligence
ations: the discrimination or classification prob- led many scientists and researchers to exploit the
lem refers to the assignment of observations into new capabilities of these fields, in developing more
classes which are not necessarily ordered. On the efficient classification techniques. Among the at-
other hand, sorting refers to the problem in which tempts made one can mention neural networks,
the observations should be classified into classes machine learning, fuzzy sets as well as multicri-
which are ordered from the best to the worst ones. teria decision aid (MCDA). This article will focus

501
Multicriteria sorting methods

on MCDA and its application in the study of classi- velop a linear discriminant model so that the min-
fication problems with or without ordered classes. imum distance of the score of each alternative
MCDA provides an arsenal of powerful and effi- from a predefined cut-off point is maximized (max-
cient nonparametric classification methods and ap- imize the minimum distance-MMD). To develop
proaches, which are free of statistical assumptions this model, they proposed the following goal pro-
and restrictions, while furthermore they are able gramming formulation:
to incorporate the decision maker's preferences in
a flexible and realistic way. max d
The remainder of the article is organized as fol- s.t. ~ wixij ÷ d <_c, Vi E Group 1,
lows. Section 2 provides a review of MCDA sorting E wixij-d>c, ViEGroup2,
approaches and techniques, outlining their basic
characteristics, concepts and limitations. In sec- where wi is the weight of attribute i, xij is the eval-
tion 3, a new MCDA sorting method is described uation of alternative j on attribute i, and c is the
and its operation is depicted through a simple il- cut-off score (wi and d are unrestricted in sign).
lustrative example. Finally, section 4 concludes the Soon after proposing this model, the same au-
paper and outlines some possible future research thors proposed a variety of similar goal program-
directions concerning the application of MCDA in ming formulations incorporating several other dis-
sorting problems. crimination criteria, such as the sum of deviations
(optimize the sum of deviations-OSD), the sum
of interior deviations (minimize the sum of inte-
Multicriteria Sorting Methods. The MCDA
rior deviations-MSID) and the maximum devia-
methods which have been proposed for the study
tion [10].
of sorting problems can be distinguished either ac-
These two studies attracted the interest of sev-
cording to the approach from which they are orig-
eral operational researchers and management sci-
inated (multi-objective/goal programming, multi-
entists. S.M. Bajgier and A.V. Hill [2] proposed
attribute utility theory, outranking relations, pref-
a new goal programming approach in order to
erence disaggregation), or according to the type of
minimize the number of misclassifications using
problem that they address (ordered or non-ordered
a mixed integer programming formulation (MIP)
classes). The review presented in this section will
and conducted a first experimental study to com-
distinguish the methods according to their origi-
pare the MMD model, the OSD model, and their
nation, but in the same time the type of problems
MIP formulation with LDA. They concluded that
that they address will also be discussed.
the goal programming formulations are generally
Goal Programming Approaches. The work of A. superior to LDA, except for the case of moderate
Charnes and W.W. Cooper [4] set the foundations to low overlap between groups and equal disper-
on goal/multi-objective programming, but it can sion matrices, where LDA outperforms all the ex-
also be considered as one or the pioneering stud- amined goal programming formulations.
ies in the field of MCDA in general. Since then, The performance of goal programming ap-
both multi-objective and goal programming con- proaches compared to statistical techniques was
stitute two major fields of interest from the theo- an issue that several researchers tried to investi-
retical and practical points of view in the MCDA gate using mainly experimental data sets. Freed
and operations research communities. In partic- and Glover [11] compared MMD, MSID, OSD
ular, goal programming approaches, during the and LDA and they concluded that although the
1960s and the 1970s have been used to elicit at- presence of outliers pose a greater problem for
tribute weights in multiple criteria ranking deci- the two simpler goal programming formulations
sion problems ([15], [27], [36], [35]). N. Freed and (MMD and MSID) than for LDA, generally the
F. Glover [9] were among the first to investigate goal programming approaches outperform LDA.
the potentials of goal programming techniques in E.A. Joachimsthaler and A. Stam [18] compared
the discriminant problem. Their aim was to de- the LDA, QDA, logistic regression and OSD pro-

502
Multicriteria sorting methods

cedures and they concluded that these method- b) they produce improper solutions.
ologies produce similar results although the mis-
A solution is considered unbounded if the objec-
classification rates for LDA and QDA tended to
tive function can be increased or decreased without
increase with highly kurtosis data and increased
limit, in which case the discrimination rule (func-
dispersion heterogeneity. C.A. Markowski and E.P.
tion) may be meaningless, whereas a solution is
Markowski [22] examined the influence of qualita-
improper if all observations fall on the classifica-
tive attributes on the discriminating performance
tion hyperplane.
of MMD and LDA. Although the incorporation of
To overcome these problems new goal program-
qualitative attributes in LDA violates the normal-
ming formulations were proposed, including hybrid
ity assumption, the experimental study of the au-
models ([12], [13]), nonlinear programming formu-
thors showed that the incorporation of qualitative
lations [37], as well as several mixed integer pro-
variables improved the performance of LDA, while
gramming formulations ([1], [3], [5], [20], [33], [38],
on the other hand MMD did not appear to be par-
[39]).
ticularly well-suited for use with qualitative vari-
In the light of this review of goal programming
ables. In another experimental study conducted
approaches for discriminant problems it is possi-
by P.A. Rubin [32], QDA outperformed 15 goal
ble to identify the following three characteristics
programming approaches, leading the author to
of the research in this field:
indicate that 'if LP models are to be considered
seriously as an alternative to conventional proce- 1) The majority of the proposed models aim at
dures, they must be shown to outperform QDA developing a linear discrimination rule (func-
under plausible conditions, presumably involving tion). The extension of the models to develop
non-Gaussian data'. These experimental studies a nonlinear discriminant function leads to
clearly indicate the confusion concerning the dis- nonlinear programming formulations which
criminating performance of the goal programming are generally computationally intensive and
formulations as opposed to well known multivari- difficult to solve. Among the few alternative
ate statistical techniques. Except for this issue, approaches is the MSM method (multisur-
the research on the field of goal programming ap- face method) proposed by O.L. Mangasarian
proaches for discriminant problems, was also fo- [21] that leads to the construction of a piece-
cus on the theoretical drawbacks which were of- wise linear discrimination surface between
ten meet. Markowski and Markowski [23] were the two groups (see also [26] for a revision of the
first to identify two major drawbacks of the goal method using multi-objective programming
programming formulations (MMD and OSD) pro- and fuzzy mathematical programming tech-
posed by Freed and Glover ([9], [10]). More specif- niques).
ically, they proved that if each quadrant contains 2) Little research has been made on extend-
at least one case from the second group, unaccept- ing the existing framework on the multigroup
able solutions will result in MMD (all coefficients discriminant problem. E.-U. Choo and W.C.
in the discriminant function are zeros which leads Wedley [5], W. Gochet et al. [14], as well as
all the observations to be classified in the same J.M. Wilson [39] applied goal programming
group), while furthermore they showed that the so- approaches in multigroup discriminant prob-
lutions (discriminant functions) obtained through lems, but generally most of the studies in this
the MMD and the OSD models are not stable when field were focused on two-group discrimina-
the data are transformed (when there is a shift tion trying to extend the original goal pro-
from the origin). Except for these two problems, gramming models of Freed and Glover ([9],
many goal programming formulations were found [10]) in order to achieve higher classification
to suffer from two additional theoretical shortcom- accuracy and predicting ability.
ings [29]: 3) The models based on the goal programming
approach can be applied in any classification
a) they produce unbounded solutions, and problem with or without ordered classes.

503
Multicriteria sorting methods

Outranking Relations Approaches. In contrast to is based on the definition of a veto threshold vj(ri)
the goal programming approaches, outranking re- for criterion j and the profile ri. The veto threshold
lations procedures study the classification problem vj(ri) for criterion j defines the minimum accepted
on a completely different basis. The aim of such difference between the values of the profile ri and
procedures is not to develop a discriminant func- alternative a on the specific criterion so that we
tion (linear or nonlinear), but instead their aim is can say that they have totally different preference
to model the decision makers" preferences and de- according to criterion j.
velop a global preference model which can be used Let F(a, ri) be the set consisted of all criteria for
to assign the alternatives (observations) into the which the discordance index value is greater than
predefined classes. To achieve the classification of the value of global concordance index. For each af-
the alternatives some reference profiles are deter- firmation of the type: 'alternative a outranks pro-
mined which can be considered as representative file ri according to all criteria', the credibility in-
examples of each class. Through the comparison of dex as(a, ri) is calculated. If F(a, ri) is empty then
each alternative with these reference profiles the as(a, ri) = C(a, ri), otherwise the credibility index
classification of the alternatives is accomplished. is calculated as follows:
A representative example of MCDA sorting
method based on the outranking relations ap-
as(a r i ) - C(a ri)" H 1 - D ( ) _ j , ari,
,
proach is the E L E C T R E TRI method proposed jEF
by W. Yu [40]. The aim of E L E C T R E TRI is to If the value of the credibility index of the affirma-
provide a sorting of the alternatives under con- tion 'alternative a outranks profile ri according to
sideration into two or more ordered categories. In all criteria' exceeds a predefined cut-off value A,
order to define the categories E L E C T R E TRI uses then the proposition 'a outranks ri' can be consid-
some reference alternatives (reference profiles) ri, ered to be valid. Denoting the outranking relation
i = 1 , . . . , k - 1, which can be considered as fic- as S, the preference (P), indifference (I) and in-
titious alternatives different from the alternatives comparability (R) relations between alternative a
under consideration. The profile ri is the theoreti- and profile ri can be defined as follows:
cal limit between the categories Ci and Ci+l (C i+1
is preferred to Ci) and ri is strictly better than • aIri if and only if aSri and riSa;
ri-1 for each criterion. To provide a sorting of the • aPri if and only if aSri and no riSa;
alternatives in categories E L E C T R E TRI makes • riPa if and only if no aSri and riSa;
comparisons of each alternative with the profiles.
• aRri if and only if no aSri and no riSa.
For an alternative a and a profile ri the con-
cordance index cj(a, ri) is calculated. This index According to these relations two sorting proce-
expresses the strength of the affirmation 'alterna- dures are applied: the pessimistic and the opti-
tive a is at least as good as profile ri on criterion mistic one. The sorting procedure starts by com-
j'. In order to compare the alternative to a refer- paring alternative a to the worst profile rl and in
ence profile on the basis of more than one criteria, the case where aPrl, a is compared to the sec-
a global concordance index C(a, ri) is calculated. ond profile r2, etc., until one of the following two
This index expresses the strength of the affirma- situations appears:
tion 'a is at least as good as ri according to all i) aPri and ri+lPa or aIri+l;
criteria'. Setting wj as the weight of the criterion
ii) aPri and aRri+l, . . . , aRri+k, ri+k+lPa.
j, C(a, ri) is constructed as the weighted average
of all ci ( a, r i ). If situation i) appears, then alternative a is as-
In contrast to the concordance index, the dis- signed to category i + 1 by both pessimistic and
cordance index Dj(a, ri) expresses the strength of optimistic procedures. If situation ii) appears, then
the opposition to the affirmation 'alternative a is a is assigned to category i + 1 by the pessimistic
at least as good as profile ri according to crite- procedure and to category i + k + 1 by the opti-
rion gj'. The calculation of the discordance index mistic procedure.

504
Multicriteria sorting methods

It is clear that the E L E C T R E TRI method is is not constructed through a direct interrogation
a powerful tool for analyzing the decision maker's procedure between the decision analyst and the de-
preference in sorting problems involving multiple cision maker. Instead, decision instances (e.g. past
criteria where the classes are ordered. However, decisions) are used in order to analyze the deci-
the major drawback of the method is the signifi- sion policy of the decision maker, to specify his/her
cant amount of information that it requires by the preferences and construct the corresponding global
decision maker (weights of the criteria, preference preference model as consistently as possible.
and indifference thresholds, veto thresholds, etc.). A well known preference disaggregation method
This problem can be overcame using decision in- is the UTA method (UTilit@s Additives) proposed
stances (assignment examples) as proposed in [25]. in [17]. Given a predefined ranking of a reference
Other MCDA sorting methods based on the set of alternatives, the aim of the UTA method
outranking relations approach have been pro- is to construct a set of additive utility functions
posed in [24] (N-TOMIC method), [31] and the which are as consistent as possible with the pre-
P R O M E T H E E method as it has been modified in ordering of the alternatives (and consequently with
[19]. Furthermore, P. Perny [28] extended the ex- the decision maker's preferences). The form of the
isting framework of the sorting methods based on additive utility function is the following:
the outranking relations approach in the case in
which the groups are not ordered. More specifi- - Z (gJ),
J
cally, he proposed the construction of a fuzzy out-
ranking relation in order to estimate the member- where U(~) denotes the global utility of an alter-
ship of each alternative for each group, and sug- native described over a vector of criteria y, while
gested two assignment procedures: uj(gj) is the partial or marginal utility of an alter-
native on criterion gj.
a) filtering by strict preference (the assignment Except for the study of ranking problems, the
rule consists of testing whether an alterna- methodological framework of the preference dis-
tive is preferred or not to a reference profile aggregation approach using the UTA method is
reflecting the lower limit of a group), and also applicable in sorting problems. The UTADIS
b) filtering by indifference (the assignment rule method (VTilit@s Additives DIScriminantes) ([6],
consists of testing whether an alternative is [16], [17], [42]) is a representative example. In the
indifferent or not to a reference profile repre- UTADIS method, the sorting of the alternatives
senting a prototype of a group). is accomplished by comparing the global utility
(scores) of each alternative a, denoted as U(a),
Overall the main characteristics of sorting meth- with some thresholds ( u l , . . . , Uq-1) which distin-
ods based on the outranking relations approach guish the classes C 1 , . . . , C a (the classes are or-
of MCDA include their application to both sort- dered, so that C1 is the class of the best alterna-
ing (ordered classes) as well as discrimination tives and C a is the class of the worst alternatives).
(non ordered classes) problems, and the significant
amount of information that they require by the de- U(a) > a
cision maker. u2 <_ U(a) < ul a C2

Preference Disaggregation Approaches. The pref-


erence disaggregation approach refers to the analy- uk <_ U(a) < uk- a Ck
sis (disaggregation) of the global preferences of the
decision maker to deduce the relative importance
U(a) < a Cq.
of the evaluation criteria, using ordinal regression
techniques based mainly on linear programming The objective of the UTADIS method is to es-
formulations. timate an additive utility function and the utility
In contrast to the outranking relations approach thresholds in order to minimize the classification
the global preference model of the decision maker error. The classification error is measured through

505
Multicriteria sorting methods

two error functions denoted as a+(a) and a-(a), M.H.DIS (Multigroup Hierarchical DIScrimina-
representing the deviations of a misclassified alter- tion) and differs from most of the aforementioned
native from the utility threshold. The estimation MCDA approaches in two major aspects.
of both the additive utility model and the utility
thresholds is achieved through linear programming
1) It employs a hierarchical discrimination ap-
proach: the method does not aim on the
techniques ([6], [42]).
development of an overall global preference
See [7] and [41] for three variants of the UTADIS
model (discriminant function) which will
method to improve the classification accuracy of
characterize all the observations (alternatives
the obtained additive utility models as well as their
or objects). Instead the method is trying to
predicting ability. The first variant (UTADIS I) ex-
distinguish the groups progressively, starting
cept for the classification errors also incorporates
by discriminating the first group (best alter-
the distances of the correctly classified alternatives
natives) from all the others, and then pro-
from the utility thresholds which have to be max-
ceeding to the discrimination between the ob-
imized. The second variant (UTADIS II) is based
jects which belong to the other groups.
on a mixed integer programming formulation min-
imizing the number of misclassifications instead of 2) It accommodates three different discrimina-
their magnitude, while the third variant (UTADIS tion criteria in a very flexible and efficient
III) combines UTADIS I and II, and its aim is way. The most common discrimination crite-
to minimize the number of misclassifications and rion in the previous approaches is the min-
maximize the distances of the correctly classified imization of the classification error which is
alternatives from the utility thresholds. measured as the deviations of the scores of
Overall the main characteristics of the applica- the misclassified alternatives from some cut-
tion of the preference disaggregation approach in off points. However, such an objective does
the study of sorting problems, can be summarized not necessarily yield the optimal classifica-
in the following three aspects. tion rule. For instance, consider that in a dis-
1) The information that is required is minimal, crimination problem, three alternatives are
since, similarly to the goal programming ap- misclassified with the following deviations
proaches, only a predefined classification of a from the cut-off point: [0.25, 0.25, 0.25], with
the overall objective of minimizing the to-
reference set of alternatives is required.
tal classification error being 0.75. It is ob-
2) The preference disaggregation approach is fo-
vious, that this classification result is not op-
cused only on decision problems where the
timal, since a classification result [0, 0, 0.75]
classes are ordered, since it is assumed that
yields the same value for the overall classi-
there is a strict preference relation between
fication error (0.75), but there is only one
the classes.
misclassified alternative instead of three. Sev-
3) The classification/sorting models which are eral mixed integer programming formulations
developed have a nonlinear form, since the have been proposed to confront this issue,
marginal utilities of the evaluation criteria but their application in real world prob-
are piecewise linear and consequently the lems is prohibited by the significant amount
global utility model is also nonlinear, in con- of time required to solve such problems.
trast to the linear discriminant models used M.H.DIS employs an efficient mixed integer
in the goal programming approaches. programming (MIP) formulation for mini-
mizing the number of misclassifications, once
A Multigroup Hierarchical Discrimination the minimization of the classification error
M e t h o d . In this section a new method is pre- has been achieved. Furthermore, M.H.DIS
sented for the study of discrimination problems also considers a third criterion in order to
with two or more ordered groups (multigroup achieve the higher possible discrimination.
discrimination). The proposed method is called These three discrimination criteria have been

506
Multicriteria sorting methods

used in previous studies separately, or in hy- alternatives have been classified in the predefined
brid models ([12], [13]), but they have never classes.
been used through a sequential procedure. Throughout this hierarchical classification pro-
Instead, in M.H.DIS initially the classifica- cedure, it is assumed that the decision maker's
tion error is minimized. Then considering preferences are monotone functions (increasing or
only the misclassified alternatives M.H.DIS decreasing) on the criteria's scale. This assump-
tries to 're-arrange' their classification error tion implies that in the case of a criterion gi E G1,
in order to minimize the number of misclassi- as the evaluation of an alternative on this criterion
fications, and finally the maximum discrimi- increases, then the decision of classifying this al-
nation between the alternatives is attempted. ternative into a higher (better) class is more favor-
able to a decision of classifying the alternative into
a lower (worst) class. For instance, in the credit
Model Formulation. Let A = { a l , . . . , an} be a set
granting problem as the profitability of a firm in-
of n alternatives which should be classified into q
creases, the credit analyst will be more favorable in
ordered classes C 1 , . . . , C a. (C1 is preferred to C2,
classifying the firm as a healthy firm, rather than
C2 is preferred to C3, etc.) Each alternative is de-
classifying it as a risky one. A similar implication
scribed (evaluated) along a set G = {gl,... ,gin}
is also made for each criterion gi E G2.
of m evaluation criteria. The evaluation of each
This preference relation between the several
alternative a on criterion gi is denoted as gi(a).
possible decisions of classifying a specific alterna-
According to the set A of alternatives, Pi different
tive a into one of the predefined classes, imposes
values for each criterion gi can be distinguished.
the following general classification rule:
These Pi values are rank-ordered from the small-
est value g~ to the largest value g~/~. Furthermore, The decision concerning the classifica-
among the set of criteria it is possible to distin- tion of an alternative a into one of the
guish two subsets: a subset G1 consisting of ml cri- predefined classes should be made in
teria for which higher values indicate higher pref- such a way that the utility (value) of
erence, and a second subset G2 consisting of m2 such a decision for the decision maker
criteria for which the decision maker's preference is maximized.
is a decreasing function of the criterion's scale. For The utility of a decision concerning the classifi-
instance, in an investment decision problem G1 cation of an alternative a into group Cj can be
may include criteria related to the return of an in- expressed in the form of additive utility function:
vestment project (projects with higher return are m

preferred), while G2 may include criteria related U Cj ( a ) - ~ u Cj [gi(a)] E [0, 1],


to the risk of the investment (projects with lower i--1
risk are preferred).
where uC~[gi(a)] denotes the marginal (partial)
The Hierarchical Discrimination Process. The utility of the decision concerning the classification
method proceeds progressively in the classifica- of an alternative a into group Cj according to cri-
tion of the alternatives into the predefined classes, terion gi. If gi E G1, then u C~(gi) will be an in-
starting from class C1 (best alternatives). Initially, creasing function on the criterion's scale. On the
the aim is to identify which alternatives belong in contrary, the marginal utility of a criterion gi E G2
class C1. The alternatives which are found to be- regarding the classification of an alternative into a
long in class C1 (either correctly or incorrectly) lower (worse) class Ck (k > j) will be a decreasing
are excluded from further consideration. In a sec- function on the criterion's scale. For instance, con-
ond stage the objective is to identify which alter- sider once again the credit granting problem: since
natives belong in class C2. The alternatives which healthy firms are generally characterized by high
are found to belong in this class (either correctly profitability, the marginal utility for a profitability
or incorrectly) are excluded from further consider- criterion for the group of healthy firms will be an
ation, and the same procedure continues until all increasing function, indicating that as profitability

507
Multicriteria sorting methods

increases the preference of decision concerning the archical discrimination procedure, two linear pro-
classification of a firm in the group of healthy firms grams and one mixed integer program are solved
in also increasing. On the other hand, for the group to estimate 'optimally' the two utility functions.
of risky firms the marginal utility will be a decreas-
LPI: Minimizing the Overall Classification Error.
ing function of the criterion's (profitability) values,
According to the classification rule (1), to achieve
indicating that as profitability increases the pref-
the correct classification of an alternative a E Ck
erence of the decision concerning the classification
at stage k (cf. Fig. 1), the estimated utility func-
of a firm in the group of risky firms is decreasing.
tions should satisfy the following constraint:
Consequently, at each stage of the hierarchical
uC~ (~) > u-C~ (~).
classification procedure that was described above,
two utility functions are constructed. The first one Since, in linear programming it is not possible to
corresponds to the utility of a decision concerning use strict inequality constraints, a small positive
the classification of an alternative a into class Ck real number s may be used as follows:
(denoted as UCk(a)), while the second one corre- u c~ (~) - u-C~ (~) > ~.
sponds to the utility of a decision concerning the
nonclassification of an alternative a into class Ck If for an alternative a 6 Ok the classification rule at
(denoted as U-Ck(a)). Based on these two utility stage k yields UCk(a) < u-Ck(a), then this alter-
functions the aforementioned general classification native is misclassified, since it should be classified
rule can be expressed as follows: in one of the lower classes (the specific classifica-
tion of the alternative will be determined in the
if U Ck (a) > U -Ck (a), t h e n a 6 Ck, (1) next stages of the hierarchical discrimination pro-
if U ck (a) < U -Ck (a), then a ~ Ck. cess). The classification error in this case is:
Following this rule, the overall hierarchical dis- ~(~) = v-C~ (~) - uCk (~) + ~.
crimination procedure is presented in Fig. 1.
Similarly, to achieve the correct classification of
comider-~, fl an alternative b ¢_ Ck at stage k, the estimated
.w-
(U~(.)>U'¢~(.)) utility functions should satisfy the following con-

•. I
A ~,o
straint:
U -ck (b) - U ck (b) >_ s.
(u~c°)>~u'~c°)) If this constraint is not satisfied for an alterna-
¥,, /~ No
tive b ~_ Ck at stage k, then this fact implies that
•.~. ) C -.q ) this alternative should be classified in class Ck
( u~(o),u-~.(., ) and the classification error in this case is e(b) =
Yes .,1, No
u c~ (b) - u -c~ (b) + ~.
(o.C,)(o.C,)
t !
1
Moreover, to achieve the monotonicity of the
marginal utilities, the following constraints are im-
k f posed:
c ~ (g~) _ 0
Fig. 1" The hierarchical classification procedure.
-c~ ( ~ , ) = 0
ifg, 6 G1 ~uCk(g~+l) > uCk(g{) (2)
Estimation of Utility Functions. According to the
hierarchical discrimination procedure which was u~-Ck (gj+l
~ )<u=, c~(g~)
j
described above, to achieve the classification of the
alternatives in q classes, the number of utility func-
tions which must be estimated is 2 ( q - 1). The u:, c~ (g~ ) - o (3)
estimation of these utility functions in M.H.DIS if g, 6 G2 uCk (g~+l) < uCk (g~)
is accomplished through linear programming tech- U;Ck (g{+l) > u;Ck (g{)
niques. More specifically, at each stage of the hier-

508
Multicriteria sorting methods

where ~ and oi o j +1 are two consecutive values of of the classification errors which may lead to the
criterion gi (gj+l
i > gij for all gi 6 G) . These con- reduction of the number of misclassifications.
straints can be simplified by setting: In M.H.DIS this is achieved through a mixed in-
teger programming (MIP) formulation. However,
J
Wij,j+l = Ui i ) -- (gi ) since MIP formulations are difficult to solve, espe-
if gi 6 G1 ~ , j-ck
,~+~ = u~- c k ( g { ) - u ; c~ (9 j~+ 1)
cially in cases where the number or integer or bi-
(4) nary variables is large, the MIP formulation used
Ok Ck j
Wij,j+l -- Ui (9i) --
u/Ck(,.qj+l
i )
in M.H.DIS considers only the misclassifications
if gi 6 G2 -Ck --Ck ~j+l j occurred by solving (LP1), while retaining all the
~j,j+l = u~ (~ ) _ ~;c~(a) correct classifications. Let C be the set of alter-
(5) natives which have been correctly classified after
solving (LP1), and M be the set of misclassified
The marginal utility of criterion gi at point g{
alternatives for which e(a) > 0. The MIP formu-
can then be calculated through the following for-
lation used in M.H.DIS is the following (LP2):
mulas:
j-1 pi-1 min
uC"g j ' - z., c, j -Ck
• Wil,l+l ~ Iti (gi ) -- E Wil,l+l" aEA
t=~ l=j v c~ (~) - u - c k (a) > ~,
s.t.
(~)
Va e C~ n C,
Using these transformations, constraints (2) and U -Ck (b) - U ck (b) >_ s,
(3) can be rewritten as follows (a small positive
Vbf! C k , b 6 C,
number t is used to ensure the strict inequality)"
U ck (a) - U -ck (a) + I(a) >_ s,
Ck > t, --Ck
Wij,j+l _ Wij,j+l k t, Vgi.
Va e Ok n M,
Consequently, the initial linear program (LP1) u - c k (b) - u c, (b) + Z(~) > ~,
to be solved can be formulated as follows: Vb C_ Ck,b 6 M ,
min F - E e(a) Ck > t
Wij,j+l --
aEA -Ck
u c, (a) - u -c, (~) + ~(~) > ~, Wij,j+l k
s.t.
Wij,j+l ---1
V a e C~, i j
U -ck (b) - U Ck (b) + e(b) >_ s, Wij,j+l - 1
- -

Vb f~ Ck, i j
Ck >t s, t, I (a) integer.
Wij,j+l --
--Ck _
Wij,j+l > t The first set of constraints is used to ensure that
-1
Wij,j+l -- all the correct classifications achieved by solving
i j (LP1) are retained. The second set of constraints
_c, is used only for the alternatives which were mis-
Wij,j+l -- 1
i j classified by (LP 1). Their meaning is similar to the
e ( a ) , s , t > O. constraints in LP1, with the only difference be-
ing the transformation of the continuous variables
LP2: Minimizing the Number of Misclassifica- e(a) of LP1 (classification errors) into integer vari-
tions. If after the solution of (LP1), there exist ables I(a) which indicate whether an alternative is
some alternatives a 6 A for which e(a) > 0, then misclassified or not. The meaning of the final two
obviously these alternatives are misclassified. How- constraints has already been illustrated in the dis-
ever, as it has been already illustrated during the cussion of the LP1 formulation. The objective of
discussion of the main characteristics of M.H.DIS, LP2 is to minimize the number of misclassifica-
it may be possible to achieve a 're-arrangement' tions occurred through the solution of LP1.

509
Multicriteria sorting methods

LP3: Maximizing the M i n i m u m Distance. Solving evaluation criteria [25] for which higher values are
LP1 and LP2 the 'optimal' classification of the al- preferred. The alternatives must be classified in
ternatives has been achieved, where the term 'op- three ordered classes. Table 1, illustrates the eval-
timal' refers to the minimization of the number uation of the alternatives on the criteria as well as
of misclassified alternatives. However, the correct the predefined classification.
classification of some alternatives may have been gl g: ga Class
'marginal', that is although they are correctly clas- al 70 64.75 46.25 C1
sified, their global utilities according to the two a2 61 62 60 C1
utility functions developed may have been very a3 40 50 37 C:
a4 66 40 23.125 C:
close. The objective of LP3 is to maximize the min-
a5 20 20 20 (73
imum difference between the global utilities of the a8 15 15 30 (273
correctly classified alternatives achieved according
to the two utility functions. Table 1: Data of the illustrative example (Source: [25]).

Similarly to LP2, let C be the set of alternatives Distinguishing between C1 and C2-C3
which have been correctly classified after solving In the first stage of the hierarchical discrimina-
LP1 and LP2, and M be the set of misclassified tion procedure, the aim is to distinguish the alter-
alternatives. LP3 can be formulated as follows: natives belonging in class C1 from the alternatives
belonging in classes C2 and C3. To achieve this
max d classification two utility functions are developed,
s.t. U Ck (a) - U -Ck (a) - d >__s, denoted as U C~ (a) and U -C1 (a).
Va 6 Ck N C, The utility of the decision of classifying the al-
U -Ck (b) - U Ck (b) - d > s, ternative al in class C1 can be expressed as follows:

Vb f! Ck, b E C, u cl - u (70) (7)


U Ck (a) - U -C~ (a) >_ s, + u C1 (64.75) + u C1 (46.25).
VaECkNM, Since for all criteria higher values are preferred,
U -Ck (b) - U Ck (b) > s, it is possible to define the following rank-order on
Vb ~ Ck, b E M, each criterion's scale (Pl = p2 = P3 = 6).
ck >t
W i j , j + l -- g~) g 1 _ 1 5 < 2 0 < 4 0 < 6 1 < 6 6 < 7 0 _ ~ ;
-Ck > t
W i j , j + l -- 92) g1-15<20<40<50<62<64.75-9~22;
EEc W i j , j + l = 1 g3) g~ - 20 < 23.125 < 30 < 37 < 46.25 < 6 0 -
~3.
i j
EE-cWij,j+l-_ 1 According to relation (4), the following transfor-
i j
mations are then applied (criterion gl):
d , s , t > O.
c~ _ uC~ uC~ ,
w11,2 (20) - (15)
The first set of constraints involves only the cor-
12,3
Cl-uf (a0) -
uCl ( 2 0 ) ,
rectly classified alternatives. In these constraints
4C1 __ u C 1 (61) - u C~(40)
W13,
d represents the minimum absolute difference be-
tween the global utilities of each alternative in the we1 = uC1(66) - uGh(61),
14,5
two utility functions. The second set of constraints 15, - u,
Cl ( 7 0 ) - u c` .
involves the misclassified alternatives and it is used
to ensure that they will be retained as misclassi- The same transformations are also applied to
fled. criteria g2 and g3. Then, according to (6), relation
(7) can be re-written in the following way:
A n illustrative Example. To illustrate the appli-
u cl(a) Cl wc' wCl wCl)
cation of the method, consider a simple example (Wll,2 ~
-- 12,3 ~- 13,4 + 14,5 + 15,6

consisting of six alternatives evaluated along three C1 C1 C1 C1 C1 )


-+'(W21,2 -~- W22,3 -I- W23,4 -+- W24,5 27 W25,6

510
Multicriteria sorting methods

C1 we1 we1 we1


-[- (W31,2 -[- 32,3 -1- 33,4 -~- 34,5)" + u 2 c* (40) + u-~ c~ (23.125)

On the other hand, if al is classified in class C2


then the utility of the decision maker will be: u_C,(~4) _ (~-c, )
U -C1 (al) - Ul C1(70) -C1 w-C1 w-C1
+(w23,4 + 24,5 + 25,6)
+ u 2 C~ (64.75) + u ; C' (46.25) -C1
+(w32,3 +
w-C1 w-el w-C1
33,4 + 34,5 + 35,6).
9
U -C1 (el) - w-C1
35,6" • Alternative a5:
Following the same methodology, the utilities
concerning the classification of the rest of the al- u-C,(~) - ~ ( 2 0 ) + ~c~(20) + ~ ( 2 0 )
ternatives are also formulated.
• Alternative a2" u c' (a~) - (~c 5) + ( c~
W21,2)'
u-C~(a2) - uC~ (61) + uC'(62) + uC~ (60)
u -c~(~5) - u ~ c~ (20)
+u~ c~ (20) + u-;c~(20)
uC~(a2) _ (~c~ ~c~ wc~
~,2+ 12,3+ ~3,4)
+( CI Ci Ci C,
w21,2 + w22,3 + w23,4 + w24,5) u-C~(as)
C1 we1 we1 we1
-[- (W31,2 -+- 32,3 + 33,4 + 34,5 + 35,6),
we1
-C1 w-C1 w-C1 w-C1
--(W12,3 -4- 13,4 -5 14,5 -t- 15,6)
U -c* (a2) - u-[ c' (61) + u2 C* (62) + u3 C~ (60) w-el w-el w-el w-el
+( 22,a + 23,4 + 24,5 + 25,6)
-C1 w-el w-el w-C1 w-C1
-t-(W31,2 ~- 32,3 -+- 33,4 + 34,5 nt- 35,6)"
u-C~(a2) -- (W14,5
-c~ -t- w-c~ -c~
15,6) + (W25,6)"
• Alternative a3:
• Alternative a6:
u-C~(a3) - uC, (40) + uCx (50) + uC~ (37)

u - C , ( a 6 ) - uC,(15) + uGh(15) + uGh(30)


uCl(a3) _ (Wll,2 c1 -4- w~21,3)
"~
c1 c1 c1 uC,(a~)_ (@,~ + ~c,
-[- (W21,2 -+- W22,3 -~- W23,4) 32,3)'
C1 wC1 ' wCI
-+-(W31,2 "4- 32,3 -~- 33,4), U -c~ (a6) - u[-C~(15)
u-C~(a3) - ui-C~ (40) + u2C~(50) + u3C~ (37) +u~-C~ (15) + u~-C~(15)
¢
U-c1 (a3) --(W13,4
- c l -4- w 14,5
- e l -+- w 15,6)
-el u-C~(a6)
-C~ w-C~ -C~ w-C~ -el w-C1 w-C1 w-C1 w-C1
-~-(W24,5 -[- 25,6 ) -[- (W34,5 -[- 35,6 )" = (w11,2 + 12,3 + 13,4 + 14,5 + 15,6)
• Alternative a4" -c~ w-C~ + w-C1 + w-C1 + w-C1 )
-4-(W21,2 -+- 22,3 23,4 24,5 25,6
u-C*(a4) - u C* (66)
+( w-el w-C1 w-el
33,4 + 34,5 + 35,6).
+u C' (40) + u C' (23.125)
According to these expressions of the global util-
U cl (a4) - cl ity of the decision to classify an alternative into
(Wll,2 "4- wCl
12,3 -[- wC1
13,4 + wCl
14,5)
C1 C1 we1 class C1 or into one of the classes C2 and C3, the
+(W21,2 -+- W22,3 ) -t- (31,2), LP1 formulation is used to minimize the classifi-
u-C~(a4) - u { c' (66) cation error (s = 0.001, t = 0.0001).

511
Multicriteria sorting methods

min F = e ( a l ) --[- e(a2) -q--e(a3) q- e(a4) -ul-C~(40) -


w-el
13,4 -[-
w 14,5
--
- C 1 + w-C1
15,6

+~(~) + ~(~6) 0.18521,


- uC~(61) - w 11,2 C~
-{-w 12,3
C~ C~ - 0.09892,
~- w 13,4
s.t. U C~ ( a l ) - U -C1 ( a l ) -[- e ( a l ) _) 0.001
- Ul
-C1 ( 6 1 ) - w-C*~4,5 + w -1C5 , 16 = 0.11114,
uCx(a2) - u-C*(a2) + e(a2) > 0.001
- u C1(66)-w C* c, we, +wC, _
11,2 -[- W12,3 -[- 13,4 14,5 --
U-el(a3) -- uCl(a3) ~- ~(a3) > 0.001
0.09902,
U - c l (a4) -- U C~ (a4) -[- e(a4) ~ 0.001 - u I-c~ ( 6 6 ) - w -C~- 0.07406
15,6
u-C*(a5) - uC*(a5) + e(a5) >__0.001 -@(7o)-~ci ~c~ ~c~ ~c~
11,2 -~- 12,3 -~- 13,4 -~- 14,5 -~-
U -c* (a6) - U C~ (a6) + e(a6) > 0.001 w 15,6
c~ - 0
"1 9 7 7 3 '
-el
c~
Wij,j+l >
-- 0.0001, -c1 >
Wij,j+l -- 0.0001 , - uI (70) - 0;
3 5

Wij,j+l -- 1, • Criterion g2"


i=1 j = l - uGh(15) - 0,
3 5
- u 2 C~(15) - w -C~
Wij,j+l -- 1, 21,2 -~- w 22,3
-C~
-F- w 23,4
-C~
-[-
i=1 j = l w -C~ w - C ~ - 0.33333,
24,5 -~ 25,6
Vi=1,2,3, Vj=l,...,6, _ uC, (20) _ w21, 2C, _ 0.0001,
- u 2 C~(20) - w 22,3
-C~ if" W23,4
- c ~ -~- w -24,5
C , nu
~(~ ), ~(~2), ~(~) >__0,
-c~ _ 0.29625
W25,6
, e ( a 4 ) , e(a 5), e(a6) ~ 0.
-
~'(4o) - ~ c~
w21,2 + w22,3 _ o.ooo2,
u ~ C~(40) - w -C~ w -C~ w -C~ -
The obtained solution is presented in Table 2. -

23,4 -~" 24,5 "q- 25,6 --


0.25917,
0.00010 W15721 0.03708
-- ~ ' ( ~ 0 ) - ~ -[- W22,3
W21,2 ~ -1-W23,4
~-0.0989~
w': ~,3 0.00010 wz2,c31 0.03708
W': 31,4 0.09872 w,-~,c4~ 0.07406 - u 2 C~ ( 5 0 ) - w-C'24,5 + w-C~25,6 - 0.18511

w': 41,~ 0.00010 w14C51 0.03708 uC,(62)_ C~ wC, C, C,_


-- W21,2 -[- 22,3 -[- W23,4 + W24,5 --
I ;1
Wl 5,6 0.09872 w~.~~ 0.07406 0.23462,
0.00010 w 2-~,c~ 0.03708 - u ~ C~ (62) - w 25,6
- C ~ - 0.07406
0.00010 W22731 0.03708
U1 -- u2
c* (64.75) - c~
W21,2-1- ~w22,3
c, ~_o..c, cx
n--w23,4 -[-W24,5-[-
W23 4 0.09872 W23741 0.07406
W24 5 0.13570 w24C~ 0.11104 c~
W25,6 - 0.33333
W25,6 0.09872 w25,c~ 0.07406 - u 2 C~ ( 6 4 . 7 5 ) - O;
w~C~ 0.00010 w3~,c~ 0.03708
ci'
W32 3 0.09872 w3~,c3~ 0.07406
• Criterion g3"
W33,4 0.09872 w a-~,~4~ 0.07406
C1 u~(20) o,
W34 5 0.13570 w34,c~ 0.11104 - -

- u 3 C~(20) -c, w -C*


W35~6 0.13570 w35~1 0.11104 -
31,2 -q- W32,3 + w -33,4
C,
+
Table 2: Results obtained through the solution of LP1. - c , w -
W34,5 q- 35,6C * - 0.40730,
- u C*(23.125) - w 31,2
c* - 0.0001,
According to this solution, the marginal utilities - u3 C~(23.125) - w 32,3- C * -~- w 33,4
- C * -~- w 34,5
- C * -l-
are calculated.
- c , _ 0.37021
W35,6
• Criterion gl" - u C* (30) - we'31,2 -[- wC132,3 - 0 . 0 9 8 8 2
- uC*(15) - 0, u 3 C~(30) - w -C~ w -C~ w -C~ -
33,4 -I- 34,5 -}- 35,6
-

--
- U l C~(15) - w -C* w -C* w -C* 0.29615,
11,2 -q- 12,3 -q- 13,4 -l-
w -14,5
C, - c ~ _ 0.25937,
if- W15,6 - u C~(37) - w 31,2C~
~- w 32,3
C~ c,
~-w33,4 _ 0.19753

- u C~(20) - - w C*
11,2 - 0.0001 -- u 3 C~ ( 3 7 ) - W34,5 - C ' - 0.22209
- c ~ -[- w 35,6

- -el
u 1 (20) - w 12,3
-C~ -1- w-C1 w-C1 - u C* (46.25) - w 31,2 e* w e* w c~
13,4 -q- 14,5 -~- -[- 32,3 -q- 33,4 q-
w 15,6
-C~ - 0.22229 w 34,5
C~ - 0.33323
- ~(4o) = ~c~ c~ _ o.ooo~
11,2 -[- W12,3 - u3 C~(46.25) - w 35,6
-C~ - 0.11104,

512
Multicriteria sorting methods

31,2 -j- 32,3-~- 33,4-~- 34,5+ max


c~ - 0.46893,
w 35,6 s.t. U cl (al) - U - e l (al) - d > 0.001
- u 3 cl (60) -- 0; U C~ (a2) - U -c~ (a2) - d > 0.001
U -C~ (a3) - U C~ (a3) - d > 0.001
According to these marginal utilities, the global U -C~ (a4) - U C~ (a4) - d > 0.001
utilities are calculated based on the expressions
U -c~ (a5) - U C~ (a5) - d >_ 0.001
that have already been presented. Table 3, illus-
trates the obtained global utilities according to the U -C~ (a6) - U Cx (a6) - d > 0.001
two utility functions that were developed. W i c~
j,j+l >
_ 0.0001, W i -c~
j,j+l ~_ 0 . 0 0 0 1
3 5

u c'(a) u (a) Wij,j+l -- ,


i=1 j = l
al 0.8643 0.1110 3 5
a2 0.8025 0.1852 -- -1
Wij,j+l ,
a3 0.2967 0.5924
i j=l
a4 0.0993 0.7034
a5 0.0002 0.9258
Vi-1,2,3, Vj-1,...,6,
a6 0.0988 0.8889 d>O.
Table 3: Global utilities obtained through the solution of
LP1 (stage 1).
According to the obtained solution and follow-
u (a) u (a) ing the same procedure for calculating the mar-
al 0.9985 0.0001 ginal utilities, the global utilities of Table 4 are
a2 0.9987 0.0003 obtained. Obviously, this new solution provides a
a3 0.0008 0.9992
better discrimination of the alternatives, compared
a4 0.0009 0.9993
a5 0.0002 0.9998
to the initial solution obtained by LP1.
a6 0.0002 0.9998 Distinguishing between C2 and C3
After the solution of LP3, the first stage of the
Table 4: Global utilities obtained through the solution of
LP3 (stage 1). hierarchical discrimination process is completed,
with the correct classification of al and a2 in class
C1. Consequently, these two alternatives are ex-
It is clear that a l and a2 are classified in class C1,
cluded from further consideration (second stage).
since the global utility of a decision concerning the
In the second stage, the aim is to determine the
classification of these two alternatives in class C1
specific classification of the alternatives a3, a4, a5
is greater than the utility concerning their classi-
and a6. The following rank-order is defined on the
fication in classes 6'2 or C3. Similarly, alternatives
scale of the three evaluation criteria (Pl = P2 =
a3, a4, a5 and a6 are not classified in class C1, but
p3 = 4).
instead they belong in one of the classes C2 or C3
(their specific classification will be determined in
the next stage of the hierarchical discrimination gl) gl1 - 1 5 < 2 0 < 4 0 < 6 6 - g ~ 1 ~ ;
process).
Since the correct discrimination between the al- g2) g 2 1 - 1 5 < 2 0 < 4 0 < 5 0 - ~ 2 ;
ternatives belonging in class C1 and the alterna-
tive not belonging in this class has been achieved g3) - 20 < 2 3 . 1 2 5 < 30 < 3 7 -
through LP1, it is not necessary to proceed in LP2
(minimization of the number of misclassifications). Then, following the procedure illustrated in the
Hence, the procedure proceeds in the formulation previous stage, the variables Wij,j+lc~ and Wij,j+l
-c~ are
and solution of LP3 in order to achieve the higher formulated, and the new form of the LP1 problem
possible discrimination: is the following (s - 0 . 0 0 1 , t - 0.0001):

513
Multicriteria sorting methods

u c:(a) u -c:(a)
min F - e(a3) + e(a4) + e(a5) ~- e(a6)
a3 0.9999 0.0005
s.t. U C2 (a3) - U -c2 (a3) + e(a3) > 0.001
a4 0.9997 0.0003
uC2(a4) - u - C 2 ( a 4 ) + e(a4) _> 0.001 a5 0.0002 0.9996
.a6 0.0005 0.7949
u-C2(a5) - uC:(a5) + e(a5) > 0.001
u - C 2 ( a 6 ) - uC2(a6) + e(a6) _ 0.001 Table 6: Global utilities obtained through the solution of
LP3 (stage 2).
W ic2
j,j+l >
-- 0.0001, W i-j ,cj :+ l -- 0.0001
>
3 3 In this point the hierarchical discrimination pro-
~-~-~ C2 -1 cedure ends, since all the alternatives have been
W i j , j + l -- ,
i=1 j = l classified in the three predefined classes. Moreover,
3 3
-c2 -- 1, this classification is correct. In particular, in stage
~-~ ~-~ Wij,j+l
1 a l and a2 have been correctly classified in class
i j--1
C1, while in stage 2 a3 and a4 have been correctly
Vi-1,2,3, Vj=I,...,4,
classified in class C2, and a5 and a6 have been clas-
e(a3), e(a4), e(a5), e(a6) 2 0. sified into the final class C3 (cf. Table 6).
Table 5 presents the global utilities of the alter-
natives according to the solution obtained by LP1 C o n c l u d i n g R e m a r k s A n d Future P e r s p e c -
in this second stage. tives. The focal point of interest in this article
u c2(~) u-c2(~) was the application of MCDA in the study of
sorting or more generally discrimination (classi-
a3 0.8944 0.1000
a4 0.7333 0.2501 fication) problems. Such types of problems have
a5 0.2111 0.8000 major practical interest in several fields includ-
a6 0.1612 0.7500 ing finance, environmental and energy policy and
Table 5" Global utilities obtained through the solution of planning, marketing, medical diagnosis, robotics
LP1 (stage 2). (pattern recognition), etc. The multivariate statis-
tical classification techniques have been used for
The alternatives are correctly classified in their
decades to study such problems. However, their in-
original classes, and therefore, it is not necessary
ability to provide a realistic and flexible approach
to proceed with LP2 (similarly to the first stage).
to support real world decision making problems in
Instead, the method proceeds in solving LP3 to
situations where classification is required, led oper-
achieve better discrimination of the alternatives.
ational researchers, management scientists as well
max d as practitioners towards the exploitation of the re-
s.t. U C2 (a3) - U -C2 (a3) - d > 0.001 cent advances in the fields of operations research,
U C2 (a4) - U -c2 (a4) - d > 0.001
management science, and artificial intelligence.
Among these 'alternative' approaches for the
U -C2 (a5) - U C2 (a5) - d > O.O01
study of classification problem, MCDA provides an
u-C2(a6) - uC2(a6) - d > 0.001 arsenal of tools and methods to develop classifica-
W ic2
j,j+l >
-- 0.0001, W i-j c, j2+ l -- 0.0001 ,
> tion (sorting) models within a realistic and flexi-
3 3 ble context. This article outlined the main MCDA
Wij,j+l = 1 classification techniques, both from the specific
i=1 j = l
3 3 type of classification problems that they address
Wij,j+l = 1,
(ordered or non-ordered classes), as well as from
i=1 j=~ the MCDA approach that they employ (goal pro-
Vi-1,2,3, Vj=l,...,4, gramming, outranking relations, preference disag-
d>0. gregation).
Furthermore, a new MCDA approach has been
Table 6 presents the global utilities calculated proposed. The M.H.DIS method, extends the com-
according to the solution of LP3. mon two-group classification framework, through a

514
Multicriteria sorting methods

hierarchical multigroup discrimination procedure, References


taking into account three main discrimination cri-
[1] ABAD, P.L., AND BANKS, W.J.: 'New LP based heuris-
teria through a sequential process. In this way the
tics for the classification problem', Europ. J. Oper. Res.
classification problem is studied globally, in order 67 (1993), 88-100.
to achieved the higher possible classification ac- [2] BAJGIER, S.M., AND HILL, A.V.: 'A comparison of
curacy. Except for the illustrative example used statistical and linear programming approaches to the
in this paper, the M.H.DIS method has already discriminant problem', Decision Sci. 13 (1982), 604-
been used in several financial classification prob- 618.
[3] BANKS, W.J., AND ABAD, P.L.: 'An efficient optimal
lems, including the evaluation of bankruptcy risk,
solution algorithm for the classification problem', De-
portfolio selection and management, the evalua- cision Sci. 22 (1991), 1008-1023.
tion of bank branches efficiency, the assessment of [4] CHARNES, A., AND COOPER, W.W.: Managem. mod-
country risk, company mergers and acquisitions, els and industrial applications of linear programming,
etc. [43], providing very encouraging results com- Wiley, 1961.
[5] CHOO, E.-U., AND WEDLEY, W.C.: 'Optimal criterion
pared to well known statistical techniques (dis-
weights in repetitive multicriteria decision-making', J.
criminant analysis, logit and probit analysis), and Oper. Res. Soc. 36, no. 11 (1985), 983-992.
MCDA preference disaggregation techniques (fam- [6] DEVAUD, J.M., GROUSSAUD, G., AND JACQUET-
ily of UTADIS methods). LAGR~ZE, E." 'UTADIS: Une m~thode de construction
An interesting further research direction would de fonctions d'utilit~ additives rendant compte de juge-
ments globaux', Europ. Working Group on Multicrite-
be the exploration of a possible combination of
ria Decision Aid, Bochum (1980).
M.H.DIS with artificial intelligence techniques [7] DOUMPOS, M., AND ZOPOUNIDIS, C.: 'The use of the
such as fuzzy sets, in order to consider the fuzzi- preference disaggregation analysis in the assessment of
ness which may exist on the evaluation of alterna- financial risks', Fuzzy Economic Rev. 3, no. 1 (1998),
tives on each evaluation criterion, or on the classi- 39-57.
fication of the alternatives. [8] FISHER, R.A.: 'The use of multiple measurements in
taxonomic problems', Ann. Eugenics 7 (1936), 179-
See also: M u l t i - o b j e c t i v e optimization: 188.
Pareto optimal solutions, properties; Multi- [9] FREED, N., AND GLOVER, F.: 'A linear programming
objective optimization: Interactive meth- approach to the discriminant problem', Decision Sci.
ods for p r e f e r e n c e v a l u e f u n c t i o n s ; M u l t i - 12 (1981 ), 68-74.
objective optimization: Lagrange dual- [10] FREED, N., AND GLOVER, F.: 'Simple but powerful
goal programming models for discriminant problems',
ity; M u l t i - o b j e c t i v e o p t i m i z a t i o n : I n t e r a c -
Europ. J. Oper. Res. 7 (1981), 44-60.
t i o n of d e s i g n a n d control; O u t r a n k i n g [11] FREED, N., AND GLOVER, F.: 'Evaluating alternative
methods; Preference disaggregation; Fuzzy linear programming models to solve the two-group dis-
multi-objective linear programming; Multi- criminant problem', Decision Sci. 17 (1986), 151-162.
objective o p t i m i z a t i o n and decision sup- [12] GLOVER, F.: 'Improved linear programming models for
discriminant analysis', Decision Sci. 21 (1990), 771-
p o r t s y s t e m s ; P r e f e r e n c e d i s a g g r e g a t i o n ap-
785.
p r o a c h : B a s i c features~ e x a m p l e s f r o m fi- [13] GLOVER, F., KEENE, S., AND DUEA, B.: 'A new class
nancial decision making; Preference model- of models for the discriminant problem', Decision Sci.
ing; M u l t i p l e o b j e c t i v e p r o g r a m m i n g sup- 19 (1988), 269-280.
port; Multi-objective integer linear pro- [14] GOCHET, W., STAM, A., SRINIVASAN, V., AND CHEN,
gramming; Multi-objective combinatorial S.: 'Multigroup discriminant analysis using linear pro-
gramming', Oper. Res. 45, no. 2 (1997), 213-225.
optimization; Bi-objective assignment prob-
[15] HORSKY, D., AND RAO, M.R.: 'Estimation of attribute
lem; E s t i m a t i n g d a t a for m u l t i c r i t e r i a deci- weights from preference comparisons', Managem. Sci.
sion m a k i n g p r o b l e m s : O p t i m i z a t i o n t e c h - 30, no. 7 (1984), 801-822.
niques; F i n a n c i a l a p p l i c a t i o n s of m u l t i c r i t e - [16] JACQUET-LAGREZE, E." 'An application of the UTA
r i a analysis; P o r t f o l i o s e l e c t i o n a n d m u l t i - discriminant model for the evaluation of R&D
projects', in P.M. PARDALOS,Y. SISKOS,AND C. ZO-
c r i t e r i a analysis; D e c i s i o n s u p p o r t s y s t e m s
POUNIDIS (eds.): Advances in Multicriteria Analysis,
with multiple criteria. Kluwer Acad. Publ., 1995, pp. 203-211.
[17] JACQUET-LAGREZE, E., AND SISKOS, Y.: 'Assessing a

515
Multicriteria sorting methods

set of additive utility functions for multicriteria deci- mixed-integer programming discriminant model', Man -~
sion making, the UTA method', Europ. J. Oper. Res. agerial and Decision Economics 11 (1990), 255-266.
10 (1982), 151-164. [34] SMITH, C.: 'Some examples of discrimination', Ann.
[ls] JOACHIMSTHALER, E.A., AND SWAM, A.: 'Four ap- Eugenics 13 (1947), 272-282.
proaches to the classification problem in discriminant [35] SRINIVASAN, V., AND SHOCKER, A.D.: 'Estimating the
analysis: An experimental study', Decision Sci. 19 weights for multiple attributes in a composite criterion
(I 988), 322-333. using pairwise judgements', Psychometrika 38, no. 4
[19] KHOURY, N.T., AND MARTEL, J.M.: 'The relationship (1973), 473-493.
between risk-return characteristics of mutual funds and [36] SRINIVASAN, V., AND SHOCKER, A.D.: 'Linear pro-
their size', Finance 11, no. 2 (1990), 67-82. gramming techniques for multidimensional analysis of
[2o] KOEHLER, G.J., AND ERENGUC, S.S.: 'Minimizing preferences', Psychometrika 38, no. 3 (1973), 337-396.
misclassifications in linear discriminant analysis', De- [37] STAM, A., AND JOACHIMSTHALER, E.A.: 'Solving the
cision Sci. 21 (1990), 63-85. classification problem via linear and nonlinear pro-
[21] MANGASARIAN, O.L.: 'Multisurface method for patter gramming methods', Decision Sci. 20 (1989), 285-293.
separation', IEEE Trans. Inform. Theory IT- 14, no. 6 [3s] STAM, A., AND JOACHIMSTHALER, E.A.: 'A compar-
(1968), 801-807. ison of a robust mixed-integer approach to existing
[22] MARKOWSKI, C.A., AND MARKOWSKI, E.P.: 'An ex- methods for establishing classification rules for the dis'
perimental comparison of several approaches to the dis- criminant problem', Europ. J. Oper. Res. 46 (1990),
criminant problem with both qualitative and quantita- 113-122.
tive variables', Europ. J. Oper. Res. 28 (1987), 74-78. [39] WILSON, J.M.: 'Integer programming formulation of
[23] MARKOWSKI, E.P., AND MARKOWSKI, C.A.: 'Some statistical classification problems', OMEGA Internat.
difficulties and improvements in applying linear pro- J. Management Sci. 24, no. 6 (1996), 681-688.
gramming formulations to the discriminant problem', [40] Yu, W.: 'ELECTRE TRI: Aspects methodologiques
Decision Sci. 16 (1985), 237-247. et manuel d'utilisation', Document du Lamsade (Univ.
[24] MASSAGLIA, M., AND OSTANELLO, A.: 'N-TOMIC: A Paris-Dauphine) 74 (1992).
decision support for multicriteria segmentation prob- [41] ZOPOUNIDIS, C., AND DOUMPOS, M.: 'A multicriteria
lems', in P. KORHONEN (ed.): Internat. Workshop decision aid methodology for the assessment of country
Multicriteria Decision Support, Vol. 356 of Lecture risk', in C. ZOPOUNIDIS AND J.M. GARCIA V/tZQUEZ
Notes Economics and Math. Systems, Springer, 1991, (eds.): Managing in Uncertainty, Proc. VI Internat.
pp. 167-174. Conf. AEDEM, AEDEM Ed., 1997, pp. 223-236.
[25] MOUSSEAU, V., AND SLOWINSKI, R.: 'Inferring an [42] ZOPOUNIDIS, C., AND DOUMPOS, M.: 'Preference dis-
ELECTRE-TRI model from assignment examples', J. aggregation methodology in segmentation problems:
Global Optim. 12, no. 2 (1998), 157-174. The case of financial distress', in C. ZOPOUNIDIS (ed.):
[26] NAKAYAMA, H., AND KAGAKU, N.: 'Pattern classifica- New Operational Approaches for Financial Modelling,
tion by linear goal programming and its extensions', J. Physica Verlag, 1997, pp. 417-439.
Global Optim. 12, no. 2 (1992), 111-126. [43] ZOPOUNIDIS, C., AND DOUMPOS, M.: 'A multi-group
[27] PEKELMAN, D., AND SEN, S.K.: 'Mathematical pro- hierarchical discrimination method for managerial de-
gramming models for the determination of attribute cision problems: The M.H.DIS method': Paper Pre-
weights', Managem. Sci. 20, no. 8 (1974), 1217-1229. sented at the EURO X V I Conf.: Innovation and Qual-
[2s] PERNY, P.: 'Multicriteria filtering methods based on ity of Life, Brussels, 12-15 July, 1998.
concordance and non-discordance principles', Ann.
Constantin Zopounidis
Oper. Res. 80 (1998), 137-165.
[29] RAGSDALE, C.T., AND SWAM, A.: 'Mathematical pro- Dept. Production Engin. and Management
gramming formulations for the discriminant problem: Financial Engin. Lab.
An old dog does new tricks', Decision Sci. 22 (1991), Techn. Univ. Crete
296-307. Univ. Campus, 73100 Chania, Greece
[30] RoY, B • Mdthodologie multicrit~re d'aide ~ la d~cision, E-mail address: kostas@ergasya, rue. gr
Economica, 1985. Michael Doumpos
[31] RoY, B., AND MOSCAROLA, J." 'Procedure automa- Dept. Production Engin. and Management
tique d'examem de dossiers fond6e sur une segmenta- Financial Engin. Lab.
tion trichotomique en presence de crit~res multiples', Techn. Univ. Crete
RAIRO Rech. Opgrat. 11, no. 2 (1977), 145-173. Univ. Campus, 73100 Chania, Greece
[32] RUBIN, P.A.: 'A comparison of linear programming E-mail address: dmichael©ergasya, rue. gr
and parametric approaches to the two- group discrim-
inant problem', Decision Sci. 21 (1990), 373-386. MSC 2000:90C29
[33] RUBIN, P.A.: 'Heuristic solution procedures for a Key words and phrases: sorting, multicriteria analysis, goal
programming, outranking relation, preference disaggrega-

516
Multidimensional knapsack problems

tion. (m = 1). For the single constraint case the problem


is not strongly NP-hard and effective approxima-
tion algorithms have been developed for obtaining
MULTIDIMENSIONAL KNAPSACK PROB- near-optimal solutions. A good review of the single
LEMS constraint knapsack problem and its associated ex-
The multidimensional knapsack problem (MKP) act and heuristic algorithms is given by S. Martello
can be formulated as: and P. Toth [42].
}2 Below we give a very brief overview of the liter-
max ~-~ pj xj ature relating to the MKP. A more detailed liter-
j--1 ature review can be found in [10].
n (1)
s.t. ~ rijxj ~_ bi, i - 1,...,m,
j--1
xj E {0,1}, j=l,...,n, Exact A l g o r i t h m s . There have been relatively
few exact algorithms presented in the literature.
where bi _> 0, i - 1 , . . . , m , and rij >_ O, i - W. Shih [53] presented a branch and bound
1,...,m, j = 1,...,n. algorithm (cf. also I n t e g e r p r o g r a m m i n g :
Each of the m constraints in (1) is called a knap- B r a n c h a n d b o u n d m e t h o d s ) f o r the MKP
sack constraint, so the MKP is also called the m- with an upper bound obtained by computing the
dimensional knapsack problem. objective function value associated with the opti-
Other names given to this problem in the lit- mal fractional solution for each of the m single con-
erature are the multiconstraint knapsack problem, straint knapsack problems separately and select-
the multi-knapsack problem and the multiple knap- ing the minimum objective function value among
sack problem. Some authors also include the term those as the upper bound.
'zero-one' in their name for the problem, e.g., the Another branch and bound algorithm was pre-
multidimensional zero-one knapsack problem. His- sented in [25] with various relaxations of the prob-
torically the majority of authors have used the lem, including Lagrangian, surrogate and com-
name multidimensional knapsack problem and so posite relaxations being used to compute bounds.
we also use that phrase to refer to the problem. Y. Crama and J.B. Mazzola [11] showed that al-
The special case corresponding to m = 2 is known though the bounds derived from these relaxations
as the bidimensional knapsack problem or the bi- are stronger than the bounds obtained from the
knapsack problem. linear programming (LP) relaxation, the improve-
Many practical problems can be formulated as ment in the bound that can be realized using these
a MKP, for example, the capital budgeting prob- relaxations is limited.
lem where project j has profit pj and consumes
rij units of resource i. The goal is to find a sub-
set of the n projects such that the total profit is S t a t i s t i c a l / A s y m p t o t i c Analysis. There have
maximised and all resource constraints are satis- been a few papers considering a statisti-
fied. Other applications of the MKP include allo- cal/asymptotic analysis of the MKP.
cating processors and databases in a distributed An asymptotic analysis was presented by K.E.
computer system [24], project selection and cargo Schilling [51] who computed the asymptotic (n --+
loading [53], and cutting-stock problems [26]. co with m fixed) objective function value for the
The MKP can be regarded as a general state- MKP where the rij ~S and pj ~S were uniformly (and
ment of any zero-one integer programming prob- independently) distributed over the unit interval
lem with nonnegative coefficients. Indeed much of and where bi = 1. K. Szkatula [54] generalized that
the early work on the MKP (e.g., [32], [35], [52], analysis to the case where bi ~ 1 (see also [55]).
[59]) viewed the problem in this way. A statistical analysis was conducted by J.F.
Most of the research on knapsack problems deals Fontanari [18], who investigated the dependence
with the much simpler single constraint version of the objective function on bi and on m, in the

517
Multidimensional knapsack problems

case when pj 1 and the rij's


-- w e r e uniformly dis- 2) at the end of the procedure the upper bound
tributed over the unit interval. is sharpened by changing some multiplier val-
ues.

E a r l y H e u r i s t i c A l g o r i t h m s . Early heuristic al- A. Freville and G. Plateau [21] presented an effi-


gorithms for the MKP were typically based upon cient preprocessing algorithm for the MKP, based
simple constructive heuristics. on [20], which provided sharp lower and upper
S.H. Zanakis [59] gave detailed results compar- bounds on the optimal value, and also a tighter
ing three algorithms from [32], [35] and [52]. R. equivalent representation by reducing the continu-
Loulou and E. Michaelides [40] presented a greedy- ous feasible set and by eliminating constraints and
like method based on Toyoda's primal heuristic variables.
[57]. Primal heuristics start with a zero solution, They also [22] presented a heuristic for the bidi-
after which a succession of variables are assigned mensional knapsack problem which includes prob-
the value one, according to a given rule, as long as lem reduction, a bound based upon surrogate re-
the solution remain feasible. laxation and partial enumeration.

B o u n d B a s e d Heuristics. Bound based heuris- Tabu Search H e u r i s t i c s . Tabu search (TS)


tics make use of an upper bound on the optimal heuristics are based on tabu search concepts (see
solution to the MKP. [1], [29], [46]).
M.J. Magazine and O. Oguz [41] presented a F. Dammeyer and S. Vot~ [12] presented a TS
heuristic algorithm that combines the ideas of S. heuristic based on reverse elimination. R. Aboudi
Senju and Toyoda's dual heuristic [52] with Ev- and K. JSrnsten [2] combined TS with the pivot
erett's generalized Lagrange multiplier approach and complement heuristic [6] in a heuristic that
[17]. Dual heuristics start with the all-ones solu- they applied to the MKP (see also [39]). R. Battiti
tion, variables are then successively set to zero ac- and G. Tecchiolli [7] presented a heuristic based
cording to heuristic rules until a feasible solution on reactive TS (essentially TS but with the length
is obtained. Their algorithm computes an approx- of the tabu list varied over the course of the algo-
imate solution and uses the multipliers generated rithm).
to obtain an upper bound. F. Glover and G.A. Kochenberger [28] presented
H. Pirkul [45] presented a heuristic algorithm a TS heuristic with a flexible memory structure
which makes use of surrogate duality. The m knap- that integrates recency and frequency information
sack constraints were transformed into a single keyed to 'critical events' in the search process.
knapsack constraint using surrogate multipliers. A Their method was enhanced by a strategic oscilla-
feasible solution was obtained by packing this sin- tion scheme that alternates between constructive
gle knapsack in decreasing order of profit/weight (current solution feasible) and destructive (current
ratios. These ratios were defined as pj/~-~im=l wirij, solution infeasible) phases. See also [30].
where wi is the surrogate multiplier for constraint A. Lckketangen and Glover [37] presented a
i. Surrogate multipliers were determined using a heuristic based on probabilistic TS (essentially TS
descent procedure. but with the acceptance/rejection of a potential
J.S. Lee and M. Guignard [36] presented a move controlled by a probabilistic process). They
heuristic that combined Toyoda's primal heuristic also [38] presented a TS heuristic designed to solve
[57] with variable fixing, LP and a complementing general zero-one mixed integer programming prob-
procedure from [6]. lems which they applied to the MKP.
A. Volgenant and J.A. Zoos [58] extended the
heuristic in [41] in two ways:
G e n e t i c A l g o r i t h m H e u r i s t i c s . Genetic algo-
1) in each step, not one, but more, multiplier rithm (GA) heuristics are based on genetic algo-
values are computed simultaneously; and rithm concepts (see [1], [8], [43], [46]).

518
Multidimensional knapsack problems

In the GA of [34] infeasible solutions were al- ferent local search procedures (such as greedy, SA,
lowed to participate in the search and a simple threshold accepting [14], [15] and noising [9]) can
fitness function which uses a graded penalty term be used. They also presented two TS heuristics.
was used. In [56] simple heuristic operators based
on local search algorithms were used, and a hy- M u l t i p l e - C h o i c e P r o b l e m s . One problem that
brid algorithm based on combining a GA with a is related to the MKP is the multidimensional
TS heuristic was suggested. multiple-choice knapsack problem (MMKP). Sup-
In [48], [49] a GA was presented where parent pose that { 1 , . . . , n } is divided up into K sets
selection is not unrestricted (as in a standard GA) Sk, k - 1,... , K , which are mutually exclusive
but is restricted to be between 'neighboring' solu- Sk N Sl -- O, Vk # l, and exhaustive [.JkK=lSk --
tions. Infeasible solutions were penalized as in [34]. {1,... ,n}. If we then add to the formulation of
An adaptive threshold acceptance schedule (moti- the MKP given previously the constraint
vated by [14], [15]) for child acceptance was used.
In the GA of [33] only feasible solutions were al- xj-1, k-1,...,K, (2)
j6Sk
lowed. P.C. Chu and J.E. Beasley [10] presented a
GA based upon a simple repair operator to ensure we obtain the MMKP. Equation (2) ensures that
that all solutions were feasible. exactly one variable is chosen from each of the sets
Sk, k = 1 , . . . , K .
A n a l y s e d H e u r i s t i c s . Analysed heuristics have See [44] for a heuristic for MMKP based on the
some theoretical underlying analysis relating to MKP heuristic of Magazine and Oguz [41].
their worst-case or probabilistic performance. The special case of the MMKP corresponding
A.M. Frieze and M.R.B. Clarke [23] described to m = 1 is known as the multiple-choice knap-
a polynomial approximation scheme based on the sack problem (MCKP) and its LP relaxation as
use of the dual simplex algorithm for LP, and anal- the linear multiple-choice knapsack problem (LM-
ysed the asymptotic properties of a particular ran- CKP). Work on MCKP includes [16], which pre-
dom model. sented a hybrid dynamic programming tree search
In [47] a class of generalized greedy algorithms algorithm incorporating a Lagrangian relaxation
is proposed in which items are selected according bound; [4], which presented a heuristic based
to decreasing ratios of their pj's and a weighted upon SA; and [3], which presented a tree search
sum of their rij's. These heuristics were subjected algorithm incorporating a Lagrangian relaxation
to both a worst-case, and a probabilistic, perfor- bound. For work on LMCKP see [50]. Earlier work
mance analysis. on MCKP and LMCKP is cited in [3], [4], [16], [50].
I. Averbakh [5] investigated the properties of See also: Q u a d r a t i c k n a p s a c k ; I n t e g e r pro-
several dual characteristics of the MKP for differ- gramming.
ent probabilistic models. He also presented a fast
References
statistically efficient approximate algorithm with
[1] AARTS, E.H.L., AND LENSTRA, J. (eds.): Local search
linear running time complexity for problems with in combinatorial optimization, Wiley, 1997.
random coefficients. [2] ABOUDI, R., AND JORNSTEN, K.: 'Tabu search for gen-
eral zero-one integer programs using the pivot and com-
plement heuristic.', ORSA J. Comput. 6 (1994), 82-93.
O t h e r H e u r i s t i c s . G.E. Fox and G.D. Scudder [3] AGGARWAL, V., DEO, N., AND SARKAR, D.: 'The
[19] presented a heuristic based on starting from knapsack problem with disjoint multiple-choice con-
setting all variables to zero(one) and successively straints', Naval Res. Logist. 39 (1992), 213-227.
choosing variables to set to one(zero). See [13] for [4] AL-SULTAN, K.: 'A new approach to the multiple-
a heuristic based upon simulated annealing (SA). choice knapsack problem': Proc. 16th Internat. Conf.
Computers and Industr. Engineering, 1994, pp. 548-
See [27] for a heuristic based on ghost image pro-
550.
cesses. S. Hanafi and others [31] presented a simple [5] AVEP.BAKH, I.: 'Probabilistic properties of the dual
multistage algorithm within which a number of dif- structure of the multidimensional knapsack problem

519
Multidimensional knapsack problems

and fast statistically efficient algorithms.', Math. Pro- primitive tool', J. Heuristics 2 (1997), 147-167.
gram. 65 (1994), 311-330. [23] FRIEZE, A.M., AND CLARKE, M.R.B.: 'Approximation
[6] BALAS, E., AND MARTIN, C.H.: 'Pivot and comple- algorithms for the m-dimensional 0-1 knapsack prob-
m e n t - a heuristic for 0-1 programming', Managem. lem: worst-case and probabilistic analysis', Europ. J.
Sci. 26 (1980), 86-96. Oper. Res. 15 (1984), 100-109.
[7] BATTITI, a., AND TECCHIOLLI, G.: 'Local search [24] GAVISH, B., AND PmKUL, H.: 'Allocation of databases
with memory: Benchmarking RTS', OR Spektrum 17 and processors in a distributed computing system', in
(1995), 67-86. J. AKOKA (ed.): Managem. of Distributed Data Pro-
[8] B)i.CK, T., FOGEL, D.B., AND MICHALEWICZ, Z. cessing, North-Holland, 1982, pp. 215-231.
(eds.): Handbook of evolutionary computation, Ox- [25] GAVISH, B., AND PIRKUL, H.: 'Efficient algorithms for
ford Univ. Press, 1997. solving multiconstraint zero-one knapsack problems to
[9] CHARON, I., AND HUDRY, O.: 'The noising method: optimality', Math. Program. 31 (1985), 78-105.
A new method for combinatorial optimization', Oper. [26] GILMORE, B.C., AND GOMORY, R.E.: 'The theory and
Res. Left. 14 (1993), 133-137. computation of knapsack functions', Oper. Res. 14
[10] CHU, P.C., AND BEASLEY, J.E.: 'A genetic algorithm (1966), 1045-1075.
for the multidimensional knapsack problem', J. Heuris- [27] CLOVER, F.: 'Optimization by ghost image processes in
tics 4 (1998), 63-86. neural networks', Comput. Oper. Res. 21 (1994), 801-
[11] CRAMA, V., AND MAZZOLA, J.B.: 'On the strength 822.
of relaxations of multidimensional knapsack problems', [28] CLOVER, F., AND KOCHENBERGER, G.A.: 'Critical
INFOR 32 (1994), 219-225. event tabu search for multidimensional knapsack prob-
[12] DAMMEYER, F., AND Voss, S.: 'Dynamic tabu list lems', in I.H. OSMAN AND J.P. KELLY (eds.): Meta-
management using reverse elimination method', Ann. Heuristics: Theory and Applications, Kluwer Acad.
Oper. Res. 41 (1993), 31-46. Publ., 1996, pp. 407-427.
[13] DREXL, A.: 'A simulated annealing approach to the [29] CLOVER, F.W., AND LACUNA, M.: Tabu search,
multiconstraint zero-one knapsack problem', Comput- Kluwer Acad. Publ., 1997.
ing 40 (1988), 1-8. [30] HANAFI, S., AND FREVILLE, A.: 'An efficient tabu
[141 DUECK, G.: 'New optimization heuristics: the grand search approach for the 0-1 multidimensional knapsack
deluge algorithm and the record-to-record travel', J. problem', Europ. J. Oper. Res. 106 (1998), 659-675.
Comput. Phys. 104 (1993), 86-92. [31] HANAFI, S., FREVILLE, A., AND ABEDELLAOUI, A.EL.:
[15] DUECK, G., AND SCHEUER, T.: 'Threshold accepting: 'Comparison of heuristics for the 0-1 multidimen-
A general purpose optimization algorithm appearing sional knapsack problem', in I.H. OSMAN AND J.P.
superior to simulated annealing', J. Comput. Phys. 90 KELLY (eds.): Meta-Heuristics: Theory and Applica-
(1990), 161-175. tions, Kluwer Acad. Publ., 1996, pp. 449-465.
[16] DYER, M.E., RIHA, W.O., AND WALKER, J.: ' i hy- [32] HILLIER, F.S.: 'Efficient heuristic procedures for inte-
brid dynamic programming/branch-and-bound algo- ger linear programming with an interior', Oper. Res.
rithm for the multiple-choice knapsack problem', J. 17 (1969), 600-637.
Comput. Appl. Math. 58 (1995), 43-54. [33] HOFF, A., LOKKETANGEN, A., AND MITTET, I.: 'Ge-
[17] EVERETT, H.: 'Generalized Lagrange multiplier netic algorithms for 0/1 multidimensional knapsack
method for solving problems of optimum allocation of problems', Working Paper Molde College, Britveien 2,
resources', Oper. Res. 11 (1963), 399-417. 6~00 Molde, Norway (1996).
[18] FONTANARI, J.F.: 'A statistical analysis of the knap- [34] KHURI, S., Bti.CK, T., AND HEITK('3TTER, J.: 'The
sack problem', J. Phys. A: Math. Gen. 28 (1995), zero/one multiple knapsack problem and genetic algo-
4751-4759. rithms': Proc. 199~ A CM Syrup. Applied Computing
[19] Fox, G.E., AND SCUDDER, G.D.: 'A heuristic with tie (SAC'94), ACM, 1994, pp. 188-193.
breaking for certain 0-1 integer programming models', [35] KOCHENBERGER, G.A., MCCARL, B.A., AND
Naval Res. Logist. Quart. 32 (1985), 613-623. WYMANN, F.P.: 'A heuristic for general integer
[20] FREVILLE, A., AND PLATEAU, G.: 'Heuristics and re- programming', Decision Sci. 5 (1974), 36-44.
duction methods for multiple constraints 0-1 linear pro- [36] LEE, J.S., AND GUIGNARD, M.: 'An approximate al-
gramming problems', Europ. J. Oper. Res. 24 (1986), gorithm for multidimensional zero-one knapsack prob-
206-215. l e m s - a parametric approach', Managem. Sci. 34
[21] FREVILLE, i . , AND PLATEAU, G.: 'An efficient prepro- (1988), 402-410.
cessing procedure for the multidimensional 0-1 knap- [37] LOKKETANGEN, A., AND GLOVER, F.: 'Probabilistic
sack problem', Discrete Appl. Math. 49 (1994), 189- move selection in tabu search for zero-one mixed inte-
212. ger programming problems', in I.H. OSMAN AND J.P.
[22] FREVILLE, A., AND PLATEAU, G.: 'The 0-1 bidimen- KELLY (eds.): Meta-Heuristics: Theory and Applica-
sional knapsack problem: toward an efficient high-level tions, Kluwer Acad. Publ., 1996, pp. 467-487.

520
Multidisciplinary design optimization

[38] LOKKETANGEN, A., AND GLOVER, F.. 'Solving zero- [55] SZKATULA, K.: 'The growth of multi-constraint ran-
one mixed integer programming problems using tabu dom knapsacks with large right-hand sides of the con-
search', Europ. J. Oper. Res. 106 (1997), 624-658. straints', Oper. Res. Left. 21 (1997), 25-30.
[39] LOKKETANGEN, A., J(DRNSTEN, K., AND STOROY, S." [56] THIEL, J., AND VOSS, S.: 'Some experiences on solv-
'Tabu search within a pivot and complement frame- ing multiconstraint zero-one knapsack problems with
work', Internat. Trans. Oper. Res. 1 (1994), 305-316. genetic algorithms', INFOR 32 (1994), 226-242.
[40] LOULOU, R., AND MICHAELIDES,E.' 'New greedy-like [57] TOYODA, Y.: 'A simplified algorithm for obtaining ap-
heuristics for the multidimensional 0-1 knapsack prob- proximate solutions to zero-one programming prob-
lem', Oper. Res. 27 (1979), 1101-1114. lems', Managem. Sci. 21 (1975), 1417-1427.
[41] MAGAZINE, M.J., AND OGUZ, 0.' 'A heuristic al- [58] VOLGENANT, A., AND ZOON, J.A.: 'An improved
gorithm for the multidimensional zero-one knapsack heuristic for multidimensional 0-1 knapsack problems',
problem', Europ. J. Oper. Res. 16 (1984), 319-326. J. Oper. Res. Soc. 41 (1990), 963-970.
[42] MARTELLO, S., AND TOTH, P." Knapsack problems: Al- [59] ZANAKIS, S.H.: 'Heuristic 0-1 linear programming: An
gorithms and computer implementations, Wiley, 1990. experimental comparison of three methods', Managem.
[43] MITCHELL, M" An introduction to genetic algorithms, Sci. 24 (1977), 91-104.
MIT, 1996. J.E. Beasley
[44] MOSER, M., JOKANOVIC, D.P., AND SHIRATORI, N." The Management School, Imperial College
'An algorithm for the multidimensional multiple-choice London ST7 2AZ, England
knapsack problem', IEICE Trans. Fundam. Electron-
E-mail address: j. beasley©i¢, ac. uk
ics, Commun. and Computer Sci. E80A (1997), 582-
589. MSC2000: 90C27, 90C10
[45] PIRKUL, H." 'A heuristic solution procedure for the Key words and phrases: multidimensional knapsack, multi-
multiconstraint zero-one knapsack problem', Naval constraint knapsack, multiple choice knapsack, combinato-
Res. Logist. 34 (1987), 161-172. rial optimization.
[46] REEVES, C.R." Modern heuristic techniques for combi-
natorial problems, Blackwell, 1993.
[47] RINNOOY KAN, A.H.G., STOUGIE, L., AND VERCEL- MULTIDISCIPLINARY DESIGN OPTIMIZA-
LIS, C." 'A class of generalized greedy algorithms for TION, M D O
the multi-knapsack problem', Discrete Appl. Math. 42
Modern large scale vehicle design (aircraft, ships,
(1993), 279-290.
[48] RUDOLPH, G., AND SPRAVE, J." 'A cellular genetic automobiles, mass transit) requires the interac-
algorithm with self-adjusting acceptance threshold': tion of multiple disciplines, traditionally processed
Proc. First IEE/IEEE Internat. Conf. Genetic Algo- in a sequential order. Multidisciplinary optimiza-
rithms in Engineering Systems: Innovations and Ap- tion (MDO), a formal methodology for the integra-
plications, IEEE, 1995, pp. 365-372.
tion of these disciplines, is evolving toward meth-
[49] RUDOLPH, G., AND SPRAVE, J." 'Significance of locality
and selection pressure in the grand deluge evolutionary ods capable of replacing the traditional sequential
algorithm', in H.M. VOIGT, W. EBELING, I. RECHEN- methodology of vehicle design by concurrent algo-
BERG, AND H.P. SCHWEFEL (eds.)" Parallel Problem rithms, with b o t h an overall gain in product per-
Solving from Nature IV. Proc. Internat. Conf. Evo- formance and a decrease in design time. The obsta-
lutionary Computation, Lecture Notes Computer Sci., cles to M D O becoming a production methodology,
Springer, 1996, pp. 686-694.
in the same sense as quality control, are numer-
[50] SARIS, S., AND KARWAN, M.H." 'The linear multiple
choice knapsack problem', Oper. Res. Left. 8 (1989), ous and formidable. In aircraft design, for instance,
95-100. typical disciplines involved would be aerodynam-
[51] SCHILLING, K.E.. 'The growth of m-constraint random ics, structures, t h e r m o d y n a m i c s , controls, propul-
knapsacks', Europ. J. Oper. Res. 46 (1990), 109-112. sion, manufacture, and economics. Detailed anal-
[52] SENJU, S., AND TOYODA, Y." 'An approach to linear
yses in each of these disciplines could involve tens
programming with 0-1 variables', Managem. Sci. 15
to hundreds of subroutines and tens of thousands
(1968), 196-207.
[53] SHIH, W." 'A branch and bound method for the mul- of lines of code. Managing the software libraries
ticonstraint zero-one knapsack problem', J. Oper. Res. and d a t a alone is a daunting task.
Soc. 30 (1979), 369-378. Codes from different disciplines typically are
[54] SZKATULA, K." 'The growth of multi-constraint ran-
grossly incompatible, but even within disciplines,
dom knapsacks with various right-hand sides of the
constraints', Europ. J. Oper. Res. 73 (1994), 199-204. d a t a structures and solution representations may
be incompatible, requiring 'translation' routines or

521
Multidisciplinary design optimization

recoding. This incompatibility is particularly acute tion). If response surface approximations are used,
when stand-alone packages with interactive inter- two prevalent approximation methods are classical
faces are involved. Most disciplinary codes, de- least squares and DA CE (Design and Analysis of
signed years ago for small serial computers, are Computer Experiments).
very ill-suited to modern parallel architectures, S. Burgee, A.A. Giunta, V. Balabanov, B.
even with a coarse grained approach. Grossman, W.H. Mason, R. Narducci, R.T.
Detailed, highly accurate disciplinary analyses Haftka, and Watson [3] has a detailed discussion
are very expensive, requiring sometimes hours on of the multipoint, classical least squares approach
a supercomputer, even when run in parallel. The to response surface construction, and of the use of
import of this is that, regardless of the dimension parallelism within disciplines (the pipelined MDO
of the design space, it can be sampled for accurate paradigm of Burgee is also provably convergent).
function values at only a relatively small number The tack of this approach is to use classical de-
of points. Other obstacles to achieving true MDO sign of experiments theory, regression statistics,
include model verification, noisy function values, and low order polynomial approximation models.
and flawed parallel optimization methodologies. The DACE [7] model posits that the output of
a computer analysis program is
Almost every conceivable strategy for MDO has
been proposed. A good recent summary of hierar-
chical approaches can be found in [4], and [9] pi-
where Z(x) is a zero mean stationary Gaussian
oneered nonhierarchical or concurrent approaches.
process. (This is clearly a fiction since computer
The basic idea of concurrent methods, and a par-
output is deterministic. The issue is whether the
ticular variant known as concurrent subspace op-
model has predictive power.) Using Bayesian sta-
timization (CSSO), is to simultaneously and inde-
tistics, the best unbiased predictor is
pendently optimize each of the disciplines (or 'con-
tributing analyses', as they are called), and then Y(x) - ~ + r(x, S ) R - I ( Y s - 1. ~),
perform a global coordination that brings the en-
where S is a set of observation sites, Ys is the vec-
tire system closer to a globally feasible and opti-
tor of observations at S, r(x, S) is the correlation
mal point. Collaborative optimization differs from
of x with sites S, R is the correlation matrix be-
CSSO in how the global coordination is managed.
tween sites S, and ~ is the estimate of the mean.
An excellent discussion of these approaches is in
Some parametrized functional form for the correla-
the proceedings [2]. While concurrent methods are
tion is assumed, and then these correlation param-
intuitively appealing and naturally parallelizable,
eters and ~ are computed as maximum likelihood
they are not guaranteed to converge [8].
estimates.
Trust region model management [1] is a rigor- DACE models are more flexible than polynomial
ous approach to MDO that shows promise, and as- models, but with sparse data in high dimensions
pects of CSSO when combined with an extended neither DACE nor polynomial models have much
Lagrangian and response surface approximations, predictive power. To appreciate the problem, ob-
can lead to a provably convergent MDO method serve that a cube in 30 dimensions has 230 ~ 109
(J.F. Rodriguez, J.E. Renaud and L.T. Watson, vertices, and to even evaluate an algebraic formula
[6]). A noteworthy aspect of the Rodriguez method at each vertex requires supercomputer power.
[6] is that the convergence proof covers variable
fidelity data, which is crucial in practice. MDO Paradigm Example. As an illustration,
In a taxonomy of MDO approaches, one dis- an MDO paradigm for aircraft design is presented
tinction would be between hierarchic or nonhier- here. The MDO algorithm is a repeat loop, with a
archic. Another distinction is whether parallelism nominal design as its starting point, approximate
is achieved between disciplines (concurrent disci- optimal designs as loop iterates, and an optimal
plinary computation) or within disciplines (mul- design as its ending point (see Fig. 1). At the start
tipoint, response surface, local/global computa- of each loop, aerodynamic shape and mission vari-

522
Multidisciplinary design optimization

ables are obtained from either the nominal start- of the optimal weight and necessary aerodynamic
ing design or the intermediate approximate opti- quantities over the approximation domain.
real design. These shape and mission variables are A genetic algorithm (GA; cf. Genetic algo-
then used in the parallel simple aerodynamic and r i t h m s ) is used to find sets of approximate D-
structural analyses. optimal design points in the approximation do-
NOMINAL DESIGN MDO loop main obtained from the parallel simple analyses.
!
..1 Configuration design variables
The structure of a response surface model is em-
bodied in the regression matrix X, which defines
Parallel simple aerodynamic analyses the GA merit function IXTXl (maximized by a set
Parallel simple structural analyses
of points called D-optimal). These D-optimal de-
~ Approximation domain
sign points are input to the detailed aerodynamic
I Regressionanalysis I analysis code, which performs detailed analyses at
I Response surface model structures each of the D-optimal design points in parallel.
I D-optimal point selection I The analyses result in accurate aerodynamic quan-
~ D-optimal design point sets tities, such as wave drag and other drag compo-
I Parallel detailed ! A~l~Odynamic
loads ~'
_ ['iParallel structural
nents, and accurate aerodynamic loads.
aerodynamic analyses --- optimizations The accurate aerodynamic quantities are used
. ~ Accurate aerodynamic
quantities
[Accurat,
~ weights to generate reduced-term polynomial response sur-
Aerodynamic response [ [Weight response face models for each of the expensive quantities
surfaces surface
(such as wave drag). An aerodynamic load cal-
Configuration ]4 culated in the detailed aerodynamic analyses is
optimization used in a detailed structural optimization to calcu-
~ Approximate optimal design
late an accurate optimal weight for that particular
T aerodynamic load. This structural optimization is
OPTIMAL DESIGN done (in parallel) for each aerodynamic load gener-
ated in the detailed aerodynamic analyses. The ac-
Fig. 1: MDO paradigm.
The simple aerodynamic analyses are performed on a regular grid of points in the design space. Simple aerodynamic calculations evaluate the (aerodynamic) feasibility of each grid point using tolerances on the constraints and move limits on the objective function, eliminating grossly infeasible points, and generating an approximation domain. The simple structural analyses use the aerodynamic shape and mission variables in basic weight equations to calculate approximate weights needed by the objective function and constraints, further refining the approximation domain.

Using the relatively abundant data from the simple analyses, regression analysis and analysis of variance are used to identify less important terms in the polynomial response surface models. Once the less important terms are eliminated, the structure of the reduced-term polynomial regression models is known, and can be used later in the generation of response surface approximations of the optimal weight and necessary aerodynamic quantities over the approximation domain.

A genetic algorithm (GA; cf. Genetic algorithms) is used to find sets of approximate D-optimal design points in the approximation domain obtained from the parallel simple analyses. The structure of a response surface model is embodied in the regression matrix X, which defines the GA merit function |X^T X| (maximized by a set of points called D-optimal). These D-optimal design points are input to the detailed aerodynamic analysis code, which performs detailed analyses at each of the D-optimal design points in parallel. The detailed analyses yield accurate aerodynamic quantities, such as wave drag and other drag components, and accurate aerodynamic loads.

The accurate aerodynamic quantities are used to generate reduced-term polynomial response surface models for each of the expensive quantities (such as wave drag). An aerodynamic load calculated in the detailed aerodynamic analyses is used in a detailed structural optimization to calculate an accurate optimal weight for that particular aerodynamic load. This structural optimization is done (in parallel) for each aerodynamic load generated in the detailed aerodynamic analyses. The accurate optimal weights calculated in the structural optimization are used to generate a reduced-term polynomial response surface model for the optimal weight.

All the response surface models are then used in a configuration optimization to generate an approximate optimal design, which will be used as the starting design for the next iteration of the MDO loop. The grid spacing may possibly be refined for the simple analyses. When some convergence criterion is satisfied, the MDO loop exits with an optimal design.

Note that the source of parallelism in the present MDO paradigm is the multipoint approximations within each discipline, where the disciplines are visited sequentially in a pipeline. This contrasts sharply with CSSO MDO paradigms, where the source of the parallelism is processing the disciplines in parallel.
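The D-optimal point selection step described above can be illustrated with a small sketch. The quadratic monomial basis, the random candidate pool, and the exhaustive subset search below are assumptions made only for illustration; the paradigm itself uses a genetic algorithm to maximize the merit function |X^T X| over much larger candidate sets.

```python
# Minimal sketch of D-optimality-based point selection (illustrative only).
import numpy as np
from itertools import combinations

def regression_matrix(points):
    # Assumed quadratic monomial basis: 1, x_i, x_i^2 (an illustrative choice).
    return np.hstack([np.ones((len(points), 1)), points, points**2])

def d_optimality(points):
    X = regression_matrix(points)
    return np.linalg.det(X.T @ X)   # the merit |X^T X| maximized by D-optimal sets

def best_subset(candidates, k):
    # Exhaustive search over a small candidate pool; a GA would be used instead
    # when the pool is large.
    return max(combinations(range(len(candidates)), k),
               key=lambda idx: d_optimality(candidates[list(idx)]))

rng = np.random.default_rng(1)
pool = rng.uniform(-1.0, 1.0, size=(12, 2))   # candidates in the approximation domain
idx = best_subset(pool, k=6)
print("selected indices:", idx, "|X^T X| =", d_optimality(pool[list(idx)]))
```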


See also: Optimal design of composite structures; Multilevel methods for optimal design; Design optimization in computational fluid dynamics; Interval analysis: Application to chemical engineering design problems; Bilevel programming: Applications in engineering; Structural optimization: History; Optimal design in nonlinear optics.

References
[1] ALEXANDROV, N.: 'Robustness properties of a trust region framework for managing approximations in engineering optimization': Proc. 6th AIAA/NASA/USAF Multidisciplinary Analysis and Optimization Symposium, Vol. 96-4102, AIAA, 1996, pp. 1056-1059.
[2] ALEXANDROV, N., AND HUSSAINI, M.Y. (eds.): Multidisciplinary design optimization, state-of-the-art, SIAM, 1997.
[3] BURGEE, S., GIUNTA, A.A., BALABANOV, V., GROSSMAN, B., MASON, W.H., NARDUCCI, R., HAFTKA, R.T., AND WATSON, L.T.: 'A coarse grained parallel variable-complexity multidisciplinary optimization paradigm', Internat. J. Supercomputer Appl. High Performance Comput. 10 (1996), 269-299.
[4] CRAMER, E.J., DENNIS JR., J.E., FRANK, P.D., LEWIS, R.M., AND SHUBIN, G.R.: 'Problem formulation for multidisciplinary optimization', SIAM J. Optim. 4 (1994), 754-776.
[5] RENAUD, J.E., AND GABRIELE, G.A.: 'Sequential global approximation in non-hierarchic system decomposition and optimization', Adv. Design Automation 1 (1991), 191-200.
[6] RODRIGUEZ, J.F., RENAUD, J.E., AND WATSON, L.T.: 'Convergence of trust region augmented Lagrangian methods using variable fidelity approximation data', Structural Optim. 15 (1998), 141-156.
[7] SACKS, J., WELCH, W.J., MITCHELL, T.J., AND WYNN, H.P.: 'Design and analysis of computer experiments', Statistical Sci. 4 (1989), 409-435.
[8] SHANKAR, J., RIBBENS, C.J., HAFTKA, R.T., AND WATSON, L.T.: 'Computational study of a nonhierarchical decomposition algorithm', Comput. Optim. Appl. 2 (1993), 273-293.
[9] SOBIESZCZANSKI-SOBIESKI, J.: 'Optimization by decomposition: A step from hierarchic to non-hierarchic systems': Second NASA/Air Force Symposium on Recent Advances in Multidisciplinary Analysis and Optimization, 1988.

Layne T. Watson
Virginia Polytechnic Inst. and State Univ.
Virginia, USA
E-mail address: ltw@vt.edu
MSC2000: 65F10, 65F50, 65H10, 65K10
Key words and phrases: collaborative, concurrent subspace, multidisciplinary design, multipoint approximation, response surface, DACE.

MULTIFACILITY AND RESTRICTED LOCATION PROBLEMS, MFR
In location planning one is typically concerned with finding a good location for one or several new facilities with respect to a given set of existing facilities (clients). The two most common models in planar location theory are the Weber problem, where the average (weighted) distance of the new to the existing facilities is taken into account, and the Weber-Rawls problem, where the maximum (weighted) distance of the new to the existing facilities is taken into account.

More precisely, one is given a finite set Ex = {Ex_1, ..., Ex_M} of existing facilities (represented by their geographical coordinates) in the plane R^2 and distance functions d_m assigned to each existing facility m ∈ M := {1, ..., M}. The set of locations for the N new facilities one is looking for is denoted X = {X_1, ..., X_N}. The distance between the new facilities is measured by a common distance d. Additionally, a value w_mn is assigned to each pair (Ex_m, X_n), for m ∈ M, n ∈ N := {1, ..., N}, and a value v_rs is assigned to each pair (X_r, X_s), for r, s ∈ N, s > r, reflecting the level of interaction.

With these definitions the multifacility Weber objective function can be written as

\[ \sum_{m \in \mathcal{M}} \sum_{n \in \mathcal{N}} w_{mn}\, d_m(Ex_m, X_n) \;+\; \sum_{\substack{r,s \in \mathcal{N} \\ s > r}} v_{rs}\, d(X_r, X_s) \;=:\; f(X_1, \dots, X_N), \]

and the multifacility Weber-Rawls objective function can be written as

\[ \max\Big\{ \max_{m \in \mathcal{M},\, n \in \mathcal{N}} w_{mn}\, d_m(Ex_m, X_n),\; \max_{\substack{r,s \in \mathcal{N} \\ s > r}} v_{rs}\, d(X_r, X_s) \Big\} \;=:\; g(X_1, \dots, X_N). \]

In the corresponding optimization problems we may additionally assume a feasible region \mathcal{F} and we look for

\[ \min_{\{X_1, \dots, X_N\} \subset \mathcal{F}} f(X_1, \dots, X_N) \]

and

\[ \min_{\{X_1, \dots, X_N\} \subset \mathcal{F}} g(X_1, \dots, X_N). \]
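As a small numerical illustration of the two objective functions just defined, the sketch below evaluates f and g for invented data, assuming for simplicity that every d_m and d is the Euclidean distance; the general setting of this article allows arbitrary gauge distances.

```python
# Minimal sketch: evaluating the multifacility Weber objective f and the
# Weber-Rawls objective g, assuming Euclidean distances for every d_m and d.
# All data below are invented; indices are 0-based here.
import numpy as np

Ex = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])   # existing facilities Ex_1..Ex_M
X = np.array([[1.0, 1.0], [3.0, 2.0]])                 # new facilities X_1..X_N
w = np.array([[1.0, 0.5], [2.0, 1.0], [1.0, 1.0]])     # w[m, n] = w_mn
v = {(0, 1): 1.5}                                       # v_rs for r < s

def weber_f(Ex, X, w, v):
    total = sum(w[m, n] * np.linalg.norm(Ex[m] - X[n])
                for m in range(len(Ex)) for n in range(len(X)))
    total += sum(v_rs * np.linalg.norm(X[r] - X[s]) for (r, s), v_rs in v.items())
    return total

def weber_rawls_g(Ex, X, w, v):
    terms = [w[m, n] * np.linalg.norm(Ex[m] - X[n])
             for m in range(len(Ex)) for n in range(len(X))]
    terms += [v_rs * np.linalg.norm(X[r] - X[s]) for (r, s), v_rs in v.items()]
    return max(terms)

print("f(X_1, X_2) =", weber_f(Ex, X, w, v))
print("g(X_1, X_2) =", weber_rawls_g(Ex, X, w, v))
```

Minimizing f over the new locations gives the multifacility Weber problem and minimizing g the multifacility Weber-Rawls problem, possibly restricted to a feasible region as in the formulations above.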


min g(X1, . . . , XN). m E ~4. This result carries over to multi- (facil-
{x~,...,xNIcJ=
ity) Weber problems when each Bm has no more
In the first part of this survey it is assumed that than 4 extreme points [24]. For more than 4 ex-
~" -R 2 whereas $" will be a restricted set later treme points it is in general wrong (see [24] for a
on.
counterexample).
The models above implicitly assume that the In the case where all Bm are polytopes we can
new facilities can be distinguished, that the give linear programming formulations for the mul-
amount of interaction between each new and ex- tifacility Weber as well as the multifacility We-
isting facility is known and that the new facilities ber-Rawls problem ([34]) using B ° , the polar set
have mutual communication. Note, that problems of Bin, m E f i 4 .
without communication between the new facili-
ties can be separated into N independent 1-facility
min
problems which can be easily solved by suitable al-
mE.M nEAr r,sEAf
gorithms. Also, in many applications we want to s>r
0
locate a number of indistinguishable facilities to s.t. <Exm - Xn, em> < Zmn,
serve the overall demand. This implies that we are Vm E A/i, n E A/'e ° E E x t ( B ° ) ,
not only locating facilities, but we are also allo- !

cating existing facilities (clients) to the new ones.


Vs, r E N', s > r, e ° E Ext(B°),
This variation of the problem is called multi Weber
or multi Weber-Rawls problem and the objective min z

functions can be written as s.t.


0
wren <Exm - Xn, era> <_z,
Vm E .M, n EAf, e0m C Ext(BO),
m E A/i
- < z,
and
Vs, r E A/', s > r, e ° E Ext(B°).
max {wmdm(Exm { X l , . . XN})} -- ~(X)
mEA/t ' "' '
Even without polyhedral structure we still have a
respectively, where dm(Zxm, {X1,... ,XN}) :=
convex optimization problem for which several so-
minye{x,,...,X,} dm(Exm, Y).
lution techniques are available (see [21], [11], [12],
In order to discuss solution methods, suitable
[32] and references therein).
types of distance functions din, m E A/t, are spec-
In the case where we also have to deal with the
ified next.
allocation problem we still can apply discretization
Let B be a compact convex set in the plane
results from the 1-facility case. The allocation part
containing the origin in its interior and let Y be a
makes the problem however NP-hard (see [22],
point in the plane. The gauge of Y (with respect
[23]; cf. also C o m p l e x i t y t h e o r y ; C o m p l e x -
to B) is then defined as
ity classes in o p t i m i z a t i o n ) . Nevertheless, con-
VB(Y) "--inf {)~ > 0" Y E AB}. structs from computational geometry (e.g. Voronoi
This definition dates back to [25]. The distance diagrams; cf. also V o r o n o i d i a g r a m s in facility
from Exm to Y induced by ")'S is l o c a t i o n ) can be used to tackle the allocation part
efficiently and allow iterative heuristics producing
Y) .- (Y - m e M.
in general satisfactory results (see [2], [30]).
In the case where all Bm are convex polytopes Further extensions are possible and already in-
with extreme points E x t ( B m ) " - {e~n,... ,e~} we vestigated including location with attraction and
can define halflines Im starting at Exm and going repulsion, hub location, etc. (see [32] for further
through e~n. For the 1-facility case it was proved references).
in [6] for the Weber problem that there always ex- A problem common to all forms of multi- (fa-
ists an optimal solution in the set of intersection cility) location problems is, that in an optimal so-
points of the halflines Im for i - 1 , . . . , G m and lution locations of different new facilities may co-


incide with each other or with existing facilities. l m and the boundary of 7~ (see [15], [26], [28] and
This raises at least two issues" the illustration in the following figure).
• A priori detection of coincidences which re-
sult in a reduction of the dimension of the
problem and allow the exploitation of differ- /
entiability are discussed in ([20], [31], [7]).
• If coincidence is excluded, the theory of re-
stricted location can be used which is dis-
cussed next.
So far, the set ~ for placing new facilities was
the whole plane R 2. Now, the feasibility set 9r =
R 2 \ int(7~) is considered, where T/C_ R 2 is the re-
stricting set assumed to be connected in R 2. This
problem is more complicated than the unrestricted
Fig." Example of a restricted location problem with 4
one, since ~" is in general not convex. But from existing facilities and an elliptic forbidden region.
a practical point of view it is a necessary exten-
sion of the classical location model, since forbid- The discretization also works for restricted cen-
den regions appear everywhere" nature reserves, ter problems [16] and can be extended to noncon-
lakes, exclusion of coincidence in multifacility, etc. vex forbidden regions (see [15], [26]) and also to the
These problems are called restricted location prob- case of attraction and repulsion (negative weights
lems and have been developed in [1], [12], [14], [15] are allowed), see [29]. The concept of forbidden re-
and [26]. In the following we exclude the trivial gions has been successfully applied to a problem in
case and assume that none of the optimal solutions PCB assembly, where the bins holding the parts to
of the unrestricted problem is a feasible solution of be inserted into the PCB have to be stored [10].
the restricted one. Of course, the PCB itself has to be forbidden for
If the objective function h of the location prob- placing a bin. A solution approach, where also the
lem is convex it can be shown that optimal solu- issue of space requirements in a multifacility set-
tions of the restricted problem can be found on the ting is addressed can be found in [9], [15]. A more
boundary of 7~. Therefore, level curves general case where the new facility is a line has
been considered in [33]. Algorithms for multifacil-
L=(z) "- { Z e R '~" h ( X ) - z}
ity problems with forbidden regions can be found
and level sets in [8], [15], [27].
L<(z) "- <X e R '~" h ( X ) <_ z} Another type of restricted location problem is
can be used to reformulate the restricted location one, where not only placement, but also tresspass-
problem as ing of regions is forbidden. These problems are
called barrier location problems. The correspond-
min {z" L=(z) N OT~ ~ 0 and L<(z) C_ T~} .
ing models are mathematically challenging, since
A resulting search algorithm was formulated in the distance functions (and thus also the objec-
[11], but proved to be inefficient in practical appli- tive functions) are no longer convex. [17] considers
cations. Euclidean distances and one circle as forbidden re-
An efficient approach originally presented in gion. [1] and [4] develop heuristics for Ip distances
[12], [14], [15] identifies finite dominating sets and barriers that are closed polygons. [19] and [3]
(FDS) on the boundary 7~, i.e. a finite set of lo- obtain discretization results for 11 distances and
cations on 07~ which contains an optimal solution. arbitrary shaped barriers by showing an equiva-
Using this discretization, problems with gauge dis- lence of the barrier problem to a network location
tance and convex forbidden region can be solved problem. In the more general context of gauge dis-
by considering as FDS the intersection points of tances an FDS is given in [13] for median problems


and in [5] for center problems. Finally, [18] consid- [8] FLIEGE, J., AND NICKEL, S.: 'An interior point method

ers barrier problems if the distance is an a r b i t r a r y for multifacility location problems with forbidden re-
gions', Studies in Location Anal. 14 (2000), 23-45.
n o r m and the barrier consists of a line with finitely
[9] FOULDS, L.R., AND HAMACHER, H.W.: 'Optimal bin
m a n y passages. location and sequencing in printed circuit board assem-
See also" S i n g l e facility location: Multi- bly', Europ. J. Oper. Res. 66 (1993), 279-290.
objective Euclidean distance location; Sin- [10] FRANCIS, R.L., HAMACHER, H.W., LEE, C.-Y., AND
gle f a c i l i t y l o c a t i o n : M u l t i - o b j e c t i v e r e c t i - YERALAN, S.: 'Finding placement sequences and bin
locations for Cartesian robots'; Trans. Inst. Industr.
l i n e a r d i s t a n c e l o c a t i o n ; S i n g l e f a c i l i t y lo-
Engin. (IIE) (1994), 47-59.
cation: Circle covering problem; Network [11] FRANCIS, R.L., MCGINNIS JR., L.F., AND WHITE,
location: Covering problems; Warehouse J.A.: Facility layout and location: An analytical ap-
location problem; Facility location with proach, second ed., Prentice-Hall, 1992.
externalities; Production-distribution sys- [12] HAMACHER, H.W.: Mathematische LSsungsverfahren
fiir planare Standortprobleme, Vieweg, 1995.
tem design problem; Global optimization
[13] HAMACHER, H.W., AND KLAMROTH, K.: 'Planar loca-
in Weber's problem with attraction and tion problems with barriers under polyhedral gauges',
repulsion; Facility location with staircase Report in Wirtschaftsmath. Dept. Math. Univ. Kaiser-
costs; Stochastic transportation and loca- slautern 21 (1997), to appear in Ann. Oper. Res.
tion problems; Facility location problems (2000).
with spatial interaction; Voronoi diagrams [14] HAMACHER, H.W., AND NICKEL, S.: 'Combinatorial
algorithms for some 1-facility median problems in the
in f a c i l i t y l o c a t i o n ; O p t i m i z i n g facility loca-
plane', Europ. J. Oper. Res. 79 (1994), 340-351.
tion with rectilinear distances; Combinato- [15] HAMACHER, H.W., AND NICKEL, S.: 'Restricted planar
r i a l o p t i m i z a t i o n a l g o r i t h m s in r e s o u r c e al- location problems and applications', Naval Res. Logist.
l o c a t i o n p r o b l e m s ; M I N L P : A p p l i c a t i o n in 42 (1995), 967-992.
facility location-allocation; Resource alloca- [16] HAMACHER, H.W., AND SCH(3BEL, A.: 'A note on cen-
ter problems with forbidden polyhedra', Oper. Res.
t i o n for e p i d e m i c c o n t r o l ; C o m p e t i t i v e facil-
Left. 20 (1997), 165-169.
ity location.
[17] KATZ, I.N., AND COOPER, L.: 'Facility location in the
presence of forbidden regions, I: Formulation and the
case of Euclidean distance with one forbidden circle',
References Europ. J. Oper. Res. 6 (1981), 166-173.
[1] ANEJA, Y.P., AND PARLAR, M.: 'Algorithms for We- [18] KLAMROTH, K.: 'Planar location problems with line
ber facility location in the presence of forbidden regions barriers', Report in Wirtschaftsmath. Dept. Math.
and/or barriers to travel', Transport. Sci. 28 (1994), Univ. Kaiserslautern 13 (1996), to appear in Optim.
70-76. [19] LARSON, R.C., AND SADIQ, G.: 'Facility locations with
[2] AURENHAMMER, F.: 'Voronoi diagrams - A survey of the Manhattan metric in the presence of barriers to
a fundamental geometric data structure', ACM Com- travel', Oper. Res. 31 (1983), 652-669.
puting Surveys 23 (1991), 345-405. [2o] LEFEBVRE, O., MICHELOT, C., AND PLASTRIA, F.:
[3] BATTA, R., GHOSE, A., AND PALEKAR, V.S.: 'Locat- 'Sufficient conditions for coincidence in minisum multi-
ing facilities on the Manhattan metric with arbitrarily facility location problems with a general metric', Oper.
shaped barriers and convex forbidden regions', Trans- Res. 39 (1991), 437-442.
port. Sci. 23 (1989), 26-36. [21] LOVE, R.F., MORRIS, J.G., AND WESOLOWSKY,
[4] BUTT, S.E., AND CAVALIER, T.M.: 'An efficient algo- G.O.: Facilities location: Models and methods, North-
rithm for facility location in the presence of forbidden Holland, 1988.
regions', Europ. J. Oper. Res. 90 (1996), 56-70. [22] MASUYAMA, S., IBARAKI, T., AND HASEGAWA, T.:
[5] DEARING, P.M., HAMACHER, n . w . , AND KLAMROTH, 'The computational complexity of the M-center prob-
K.: 'Center problems with barriers', Techn. Report lems on the plane', Trans. IECE Japan E64 (1981),
Depts. Math. Univ. Kaiserslautern and Clemson Univ. 57-64.
(1998), To appear in Naval Research Logistics. [23] MEGIDDO, N., AND SUPOWIT, K.J.: 'On the com-
[6] DURIER, R., AND MICHELOT, C.: 'Geometrical prop- plexity of some common geometric location problems',
erties of the Fermat-Weber problem', Europ. J. Oper. SIAM J. Comput. 13 (1984), 182-196.
Res. 20 (1985), 332-343. [24] MICHELOT, C.: 'Localization in multifacility location
[7] FLIEGE, J.: 'Nondifferentiability detection and dimen- theory', Europ. J. Oper. Res. 31 (1987), 177-184.
sionality reduction in minisum multifacility location
problems', J. Optim. Th. Appl. 94 (1997).


[25] MINKOWSKI, H.: Gesammelte Abhandlungen, Vol. 2, levels and the definition of the objectives and con-
Chelsea, 1967. straints at particular levels.
[26] NICKEL, S.: Discretization of planar location problems,
Given a set of objectives {fi}i=l,...,M with
Shaker, 1995.
[27] NICKEL, S.: 'Bicriteria and restricted 2-facility Weber f i : R n --+ R and a vector of variables x E R n,
problems', Math. Meth. Oper. Res. 45, no. 2 (1997), partitioned into subsets x = ( X l , . . . , XM) for some
167-195. integer M denoting the number of subsystems, a
[28] NICKEL, S.: 'Restricted center problems under polyhe- prototypical form of MLP may be stated as fol-
dral gauges', Europ. J. Oper. Res. 104, no. 2 (1998),
lows:
343-357.
[29] NICKEL, S., AND DUDENHt3FFER, E.-M.: 'Weber's
min
problem with attraction and repulsion under polyhe- xlES1
dral gauges', J. Global Optim. 11 (1997), 409-432. s.t. x2 E argmin{f2(x)}
[30] OKABE, A., BOOTS, B., AND SUGIHARA, K.: Spatial x2ES2
tesselations. Concepts and applications of Voronoi di-
agrams, Wiley, 1992.
[31] PLASTRIA, F.: 'When facilities coincide: exact optimal-
ity conditions in multifacility Location', J. Math. Anal. XM6SM
Appl. 169 (1992), 476-498.
[32] PLASTRIA, F.: 'Continuous location problems', in
where the optimization problem at each level i
Z. DREZNER (ed.): Facility Location - A Survey of Ap- controls its own subset of variables xi, while the
plications and Methods, Springer, 1995, pp. 225-262. other subsets of variables X l , . . . , X i - l , X i + l , X M
[33] SCHOBEL, A" Locating lines and hyperplanes: Theory serve as parameters. The constraint set for each
and algorithms, Kluwer Acad. Publ., 1999. level is Si - {x: h i ( x ) - O, gi(x) >__ 0} with
[34] WARD, J.E., AND WENDELL, R.E.: 'Using block norms
hi : R '~ ---+ Rmh~ and gi : R n ~ R mgi for some
for location modeling', Oper. Res. 33 (1985), 1074-
1090. integers mh~ , mg~.
This form of MLP inspired by the work of H.
Horst W. Hamacher
Stackelberg [89] can be viewed as an M-player
Fachber. Math. Univ. Kaiserslautern
Postfach 3049, 67653 Kaiserslautern, Germany Stackelberg game ([18], [81]). Its interpretation is
E-mail address: hamacher©mathematik, uni-kl, de that of M autonomous players or decision makers
Stefan Nickel seeking to minimize their (possibly constrained)
Fachber. Math. Univ. Kaiserslautern objective functions while manipulating subsets of
Postfach 3049, 67653 Kaiserslautern, Germany decision or design variables disjoint from those of
E-mail address: nickel©mathematik, uni-kl, de other decision makers. The higher-level problems
MSC 2000:90B85 are implicit in the variables of the lower-level prob-
Key words and phrases: location theory, Weber problem, lems. This formulation has been studied widely in
Weber-Rawls problem, multifacility Weber problem, mul- the bilevel case. See, for example, [15] and the ref-
tiWeber problem, gauge, convex polytope, linear program- erences therein. In general, all problem levels, but
ming, NP-hard, Voronoi diagram, restricted location prob-
the outermost one, may contain a number of con-
lem, finite dominating set, discretization, barrier location
problems. current optimization problems.
A related variant of the problem, known as
the generalized bilevel p r o g r a m m i n g problem, rep-
MULTILEVEL METHODS FOR OPTIMAL resents the reaction of the lower-level problem to
DESIGN decisions made by the upper-level problem via a
solution of an equilibrium problem stated as a vari-
Multilevel, or hierarchical, programming problems
ational inequality:
(MLP) are constrained optimization programs in
which subsets of the solution set are themselves so-
min
lution sets of other, lower-level optimization pro- xEX,
yEY(x)
grams. Several general MLP problem statements
exist. They differ from one another in the specifics s.t. ( A ( x , y), y - z) < 0
of optimization variable distribution among the for all z E Y ( x ) ,


where the upper-level domain X is such that the cal organization. Maintaining disciplinary auton-
lower-level domain Y(x) is not empty. This formu- omy while accounting for interdisciplinary subsys-
lation was introduced by P. Marcotte in [60] and tem couplings and allowing for integrated system
studied in [44], [61], and [68]. optimization with respect to system and interdis-
Multilevel problems may be partitioned into two ciplinary objectives is one of the tasks of MDO.
classes with respect to another criterion [97]. In Overviews of multidisciplinary optimization may
one of the classes, upper-level optimization prob- be found in [6] and [87].
lems depend on the corresponding lower-level ones Practitioners of engineering have been using
through the optimal value functions (or the mar- multilevel methods, in some form, since optimi-
ginal ]unctions) of the lower-level problems. An zation algorithms made their appearance in engi-
optimal value function represents the value of a neering problems. The seminal works [57], [62], and
lower-level objective function at a solution of that [95] contributed to a systematic development and
lower-level problem. In the other class, upper-level understanding of hierarchical optimization. Mul-
problems depend on the corresponding lower-level tilevel methods have been studied extensively in
problems through the actual optimal solutions of application to multidisciplinary design ([16], [17],
the latter. An example of two such formulations [22], [93], [94])and single-discipline design areas
in engineering design optimization will be given that give rise to large problems, such as structural
further. optimization (e.g., [84], [88], [71]). Engineering
Multilevel programming problems arise in nu- multilevel optimization has always had a strong
merous applications where the structure of the ap- connection with multi-objective optimization (e.g.,
plication involves hierarchical decision making or [52]).
where the sheer size and complexity of the problem
necessitates partitioning of the system and pro- P r o b l e m F o r m u l a t i o n . The procedure of formu-
cessing the subsystems in a hierarchical fashion. lating an engineering design problem as a multi-
Information on applications of multilevel optimi- level or a bilevel problem is difficult and depends
zation in such varied areas as power systems, wa- on the complexity and size of the problem. The
ter resource systems, urban traffic systems, and general components in formulating a multilevel op-
river pollution control can be found in [35], [49], timization problem are as follows:
[50], [51], [59], [66], [67], [82], and many other ref-
The original problem is studied to determine
erences. The use of multilevel algorithms in engi-
its structure. Structure is of paramount im-
neering control is well documented, for instance,
portance in deciding to adopt a particular
in [45] and [54].
formulation. For instance, most formulations
The broad area of multidisciplinary design op- assume that the problem subsystems share
timization (MDO) ~ a term that denotes a large only a relatively small number of variables,
set of research subjects and practical techniques i.e., that the bandwidth of interdisciplinary
for the design of complex coupled engineering sys- coupling is relatively small.
tems ~ is particularly amenable to the use of
multilevel methods, due to the extreme computa- The problem is partitioned into a system
tional expense and the organizational complexity (or upper-level) problem and subsystem (or
of the field. For instance, the design of aircraft in- lower-level) problems. Decisions are made on
volves aerodynamics, structural analysis, control, inclusions of particular variables and con-
weights, propulsion, and cost, to list a few disci- straints into the system and subsystems. De-
plines. The complexity and expense of each dis- cisions are also made on the form of the sys-
cipline have assured that most disciplines have tem and subsystem objectives.
developed into vast, autonomous fields of study, Finally, algorithms are selected for solving
so that practically feasible optimization methods the system and subsystem optimization prob-
that involve the contributing disciplines must take lems. One must distinguish a formulation of
into account such an autonomy and the hierarchi- the problem from the algorithm used to solve


that formulation. While some of the mul- proaches have enjoyed success when applied to
tilevel formulations can be easily shown to specific problems, insufficient analytical founda-
be mathematically equivalent to the original tion and the difficulty of the problem usually mean
problem with respect to solution sets, they that the approaches are not robust, and extensive
may not be equivalent with respect to other 'fine-tuning' of heuristic parameters is required for
attributes, such as constraint qualifications each new problem or instance of a problem. Hence,
and optimality conditions. Hence the numer- recent years have seen renewed interest in sys-
ical properties of algorithms applied to dif- tematic, analytically substantiated approaches to
ferent formulations vary widely [8], [9], [10]. MLP. Many such developments have taken place
in bilevel optimization.
Problem decomposition constitutes a special area
of study. In general, decomposition techniques take Bilevel O p t i m i z a t i o n . Although bilevel optimi-
advantage of the problem structure and depend on zation problems (BLP) form the simplest case of
the strength and bandwidth of couplings among multilevel optimization, they are very difficult to
the subsystems. Separable and partially separable solve and constitute a fertile research area. A sur-
problems are particularly amenable to decomposi- vey of the field can be found in [28]. A large bibli-
tion. ography with an emphasis on theoretical develop-
Two types of decomposition may be considered ments is also provided in [91].
in design optimization. Coarse-grained decompo- The conventional general bilevel problem may
sition with respect to disciplines presents no diffi- be posed as follows:
culty, because the design problem initially consists
of autonomous parts. The difficulty at this level min fl (X, y)
xEX
of problem formulation is in integration or synthe- s.t. hi (x, y) - 0
sis. However, in realistic applications, even though gl(x,y) >_ O,
the coarse-grained decomposition is frequently ob-
vious, the complexity of the problem requires that where y solves for fixed x:
a dependence analysis be performed in order to de- min /2(x, y)
termine the most advantageous arrangement or se- yEY
quencing of the disciplinary subsystems in the op- s.t. h2(x, y) = 0
timization procedure. Automatic techniques based g2(x, y) >_O.
on graph-theoretic foundations may be found in
The cases of linear and convex problem functions
[75] and [76], for instance.
have been studied widely. A popular class of meth-
Finer-grained decomposition within a particu-
ods for the linear bilevel problem (extreme point
lar discipline may be addressed by a multitude
algorithms) computes global solutions by enumer-
of techniques for decomposition of mathematical
ating extreme points of the lower-level feasible
programs. Extensive references on decomposition
set (e.g., [27]). Convex bilevel problems are often
in general mathematical programming, beginning
solved by branch and bound methods (e.g., [15]).
with [19] and [31], and extended in [48] and many A survey of methods for linear and convex bilevel
others, can be found in [41] and [42]. Further ref-
programming can be found in [11].
erences to decomposition techniques aimed specif-
The considerably more difficult case of nonlin-
ically at design problems can be found in [92].
ear and nonconvex problem functions has inspired
General multilevel programming presents an ex- much research activity as well but has, to date, led
ceedingly difficult problem, and many multilevel to few computationally successful algorithms. The
formulations and algorithms of engineering de- existing approaches to nonlinear bilevel optimiza-
sign rely more on heuristics than on theoreti-
tion can be classified into several categories.
cally substantiated foundations. There are excep-
tions, for instance, such as those in [12], [65], [29], Penalty-Based Methods. This category uses
and [72]. While many engineering multilevel ap- penalty methods. In some algorithms (e.g., [1]),


a barrier function penalizes the lower-level objec- Examples: Collaborative Optimization. Col-
tive. In double-penalty methods, both the lower- laborative optimization (CO) is a general approach
level problem and the upper-level problem are ap- to solving multidisciplinary design optimization
proximated by sequences of unconstrained optimi- problems by formulating them as nonlinear bilevel
zation problems ([53], [58], [61]). Single or double- programs of special structure. CO comprises a
penalty methods are, in general, expected to con- number of methods. Its antecedents can be traced
verge slowly, especially for highly nonlinear prob- to earlier hierarchical approaches, as in [57] and
lems. Thus using these methods for the usually [95]. The underlying idea of CO appeared in [13],
large and nonlinear design optimization problems [77], [78], [79], [85] and [93], [94]. The approach
may be difficult. has recently received attention under the name of
collaborative optimization [22], [23], [83], [90].
KKT-Based Methods. The algorithms of this cate-
Given that MDO problems are naturally parti-
gory convert the bilevel problem into a noncon-
tioned into subsystems along disciplinary lines, CO
vex, single-level optimization problem by using
suggests an intuitively attractive way to formulate
the Karush-Kuhn-Tucker conditions (KKT condi-
the optimization problem so that the autonomy
tions) of the lower-level problem as constraints on
of the disciplinary subsystem computations is pre-
the upper-level problem ([14], [15], [20], [36], [43]).
served. However, the approach presents a problem
If the lower-level problem is convex, the KKT for-
that is difficult to solve by means of conventional
mulation is equivalent to the original formulation
nonlinear programming software ([7], [55]). The
[14]. However, even in this case, the KKT condi-
analytical and computational aspects of CO were
tions on the lower-level problem include the com-
addressed in [9], of which the following discussion
plementarity slackness condition as a constraint.
is an abstract. As a complete description of CO is
The form of the complementarity condition makes
lengthy, only an abbreviated version is considered
the single-level problem difficult to solve. The
here.
KKT formulation suffers from an additional dif-
It is assumed that the original system is com-
ficulty. Namely, it is well known from the study of
posed of a number, say M, of interdependent but
the sensitivity and stability of nonlinear program-
autonomous systems, each of which is described by
ming (e.g., [39]) that even if the lower-level prob-
a disciplinary analysis Ai, i - 1,... , M , expressed
lem behaves exceedingly well in that it satisfies
in the form
such stringent assumptions as strong second order
sufficiency and regularity as a constraint qualifica- Ai(xi, yi(xi)) = O,
tion, the feasible set of the single-level problem will
where, given a vector of disciplinary design vari-
generally not be differentiable with respect to x.
ables xi, the analysis (frequently represented by
Hence, the performance of gradient-based solvers
a numerical differential equation solver or simula-
on the transformed problem may be adversely af-
tor) is performed to yield the vector of state vari-
fected.
ables or responses yi(xi). The sets of disciplinary
Descent-Based Methods. Another category of algo- variables xi are not necessarily disjoint. The disci-
rithms is based on solving subproblems that result plinary constraints are usually represented by in-
in descent for the upper-level problem with gradi- equalities
ent information of the lower-level problem used in
>_ 0.
a number of ways ([33], [38], [56], [80]).
The remainder of the article will be devoted Once the system objective and variables and the
to a more detailed description of two specific ap- subsystem constraints and variables are identified,
proaches to nonlinear, nonconvex problems that the bilevel problem is formed as follows:
arose from the need to solve engineering design The constraints of the system problem com-
problems. One approach is a bilevel formulation, prise the 'consistency' (or 'coupling' or 'match-
the other is an algorithm for solving multilevel for- ing') conditions that are used to drive the discrep-
mulations. ancy among the inputs and outputs shared by the


subsystems to zero. The values of the constraints where x, solves the subsystem optimization prob-
are computed by solving the subsystem optimiza- lem.
tion problems, and the number of consistency con- Another instance of system-level consistency
straints is related to the number of subsystems and conditions matches the system-level variables with
variables shared among the subsystems. The form their subsystem counterparts computed in sub-
of the consistency constraints determines a partic- problem
ular implementation of CO.
Let ~ and 77represent system-level variables cor- gi(~, 77) = (~ - x,, ~ - y(x,)). (4)
responding to inputs and outputs of subsystems,
respectively. Then, given M subsystems, the ab- The behavior of optimization algorithms applied
breviated system program is to the original and CO formulations will differ
greatly, as the formulations are not equivalent with
min F(~, ~)
(1) respect to constraint qualifications or optimality
s.t. a(~, y) - 0 , conditions.
where In general, value functions are not differentiable,
and this may cause difficulties for optimization
- algorithms applied to the system-level problem.
However, under a number of strong assumptions,
the constraints are locally differentiable and can
is the set of system consistency constraints ob- usually be computed.
tained by solving lower-level subproblems, each of
Derivatives of the system-level constraints with
which is of the form
respect to the system-level design variables are the
/ 1[ ]
min ~ II~i- xill2 + II~i- Y(xi)ll2 (2) sensitivities of the minima or the solutions of the
s.t. ci(xi, y(xi)) >_ O, subsystem-level optimization problems to parame-
ters. The area of sensitivity in nonlinear program-
where i is the number of the subsystem. Thus, ming has been studied extensively. Relevant re-
the objective of a subsystem optimization prob- sults can be found in [39] and [40]. In particu-
lem is always to minimize the discrepancy between lar, under the assumptions of sufficient smooth-
the shared variables of the subsystems, in a least ness, second order sufficiency, regularity as con-
squares sense, subject to satisfying the disciplinary straint qualification, and strict complementarity
constraints, which do not depend explicitly on the slackness, the basic sensitivity theorem (BST)
system variables passed down to the subsystems as proves the existence of a unique, local, contin-
parameters. The subsystems remain feasible dur- uously differentiable solution-multiplier triple for
ing optimization, while interdisciplinary .feasibility the perturbed problem. Moreover, locally, the set
is gradually attained at the system level via the of active constraints remains unchanged and reg-
consistency constraints. Maintaining disciplinary ularity and strict complementarity hold, allowing
feasibility is extremely important from the design one to compute derivatives locally. In fact, under
perspective. a number of assumptions, stronger statements can
The problem now consists of a set of decoupled be made about the differentiability of the value
subproblems that can be solved independently and function ([30], [74]).
in parallel. Under the conditions of BST, local first order
One instance of the system-level consistency derivatives of the consistency constraints (3) have
conditions gives rise to the form in which CO is a particularly simple form because, in the case of
usually presented: namely, the consistency condi- CO, the constraints of the lower-level problems do
tion is intended to drive to zero the value function not depend on parameters. On the other hand, the
of the subproblem (2). That is, first order sensitivities of solutions of the lower-
1 level problem that form the derivatives of the con-
- - ,112 + 11,7 - y(x,)ll 2 , (3)
sistency constraints (4), while of closed form, are


expensive to compute and involve second order The algorithms of the class are based on trust
derivatives of the subsystem Lagrangians. region methodology (see, e.g., [34], [37], [64])and
There is another feature of the CO formula- are proven to converge under reasonable assump-
tion with compatibility constraints (3) that will tions.
cause difficulties for nonlinear programming algo- The idea of the MAESTRO algorithms is to at-
rithms applied to the system-level problem: La- tain sequential predicted sufficient decrease con-
grange multipliers will almost never exist for the ditions for all the constrained objectives, and is
equality constrained system level problem, with a direct extension of the multilevel ideas for the
all the ensuing consequences. The nonexistence of equality constrained optimization problem. The
Lagrange multipliers is due to the description of approach can be summarized as follows. Given an
the feasible region that causes the Jacobian of the initial approximation to the solution of the mul-
system-level constraints to vanish at a solution. tilevel problem, the trial step for the multilevel
The formulation with compatibility constraints (4) problem is computed as a sum of a sequence of
aims to address this problem. However, the compu- substeps, each of which predicts sufficient (or opti-
tation of derivatives for this formulation is clearly mal) decrease in the quadratic model of the objec-
expensive, as it not only involves solving a system tive of a given subproblem, subject to maintain-
of equations, but also requires the computation of ing predicted decrease in the models of the pre-
second order information for the subsystems. The vious objectives. For instance, in the case of the
difficulties are addresses in detail in [9]. unconstrained bilevel problem, the trial step for
In summary, CO is an appealing approach to the bilevel problem is a sum of two substeps. The
design optimization; however, the bilevel nature of first substep is computed to predict sufficient de-
the problem formulation will cause difficulties for crease, via the quadratic model of the innermost
conventional nonlinear programming algorithms objective f2, for the subproblem of approximately
applied to the system-level problem. This is to be optimizing
expected for a bilevel problem. 1
mf2(s) - f2(xc) + V f2(xc)Ts + -~sTH2(xc)s,
E x a m p l e : M A E S T R O , a Class of Multi- in the trust region of size 6/2 to produce the sub-
level A l g o r i t h m s . As mentioned earlier, most step s f2, where xc is the current approximation
multilevel formulations and algorithms for engi- to the solution and H2 is the current approxima-
neering design problems assume that the band- tion to the Hessian of f2. The second step s/~
width of coupling among the subsystems com- would then approximately minimize the quadratic
prised by the multilevel system is small. While model of the outermost objective fl, constructed
many problems may be stated in this way, it is at xc + s f2, in the trust region of size 6f~, subject
becoming increasingly important to consider prob- to constraints that enforce the preservation of the
lems with large bandwidth of coupling where, to predicted sufficient or optimal decrease for fl. The
use an MDO expression, 'everything affects every- total trial step is evaluated by using the merit func-
thing else'. MAESTRO (a class of multilevel al- tion designed to account for the sequential pro-
gorithms for constrained optimization; [2]) is in- cessing of the objectives. The algorithm is shown
tended for solving large nonlinear programming to converge to critical points of the bilevel or mul-
problems with arbitrary couplings among the nat- tilevel problem. Thus, the essential difference be-
urally occurring subsystems, i.e., a particular in- tween this approach and the classical approaches
stance of MDO problems with a single objective. to bilevel optimization is that instead of starting
The class was extended in, e.g., [5] to include a from the optimality conditions for the bilevel or
large class of steps for the nonlinear programming multilevel problem, the approach attempts to ob-
problem and in [3], [4] to incorporate general non- tain decrease on the sequence of subproblem mod-
linear objectives. The class makes no assumptions els, while preserving predicted decrease for the
on the structure of the problem, such as convexity previously processed subproblems, and to measure
or separability. progress via the use of an appropriate merit func-


tion with rigorously updated penalty parameters. ment; Bilevel p r o g r a m m i n g : Applications in


It is important to emphasize that the merit func- engineering; Bilevel optimization: Feasibil-
tion is used only to evaluate the steps, and not to ity test and flexibility index; Bilevel pro-
compute them. g r a m m i n g : Implicit function approach.
The ongoing work is concerned with practical
implementation issues and applications to engi- References
[1] AIYOSHI, E., AND SHIMIZU, K.: 'Hierarchical decentral-
neering design problems.
ized system and its new solution by a barrier method',
IEEE Trans. Syst., Man Cybern. 11 (1981), 444-449.
S u m m a r y . Multilevel optimization has been an [2] ALEXANDROV,N.M.: 'Multilevel algorithms for nonlin-
active research field, both in applied mathematics ear equations and equality constrained optimization',
and in engineering design. Many open questions re- PhD Thesis Rice Univ. (1993).
main, in particular, in the area of practical compu- [3] ALEXANDROV, N.M.: 'Multilevel and multiobjective
optimization in multidisciplinary design', AIAA Paper,
tational algorithms for bilevel and multilevel prob-
no. 96-4122 (1996).
lems. Overviews of some recent developments can [4] ALEXANDROV, N.M.: 'A trust-region algorithm for
be found in [63]. nonlinear bilevel optimization', in preparation (2000).
Understanding the behavior of specific, non- [5] ALEXANDROV, N.M., AND DENNIS, J.E.: 'Mul-
linear programming algorithms applied to the tilevel algorithms for nonlinear optimization', in
J. BORGGAARD, J. BURKARDT, M. GUNZBURGER,
system-level problem of the bilevel or multilevel
AND J. PETERSON (eds.): Optimal Design and Control,
formulations will present an interesting and diffi- Birkh~iuser, 1995, pp. 1-22.
cult area of inquiry, and would benefit from the [6] ALEXANDROV, N.M., AND HUSSAINI, M.Y. (eds.):
techniques of nonsmooth analysis and optimiza- Multidisciplinary design optimization: State of the art,
tion ([32], [35], [46]), unconventional notions of SIAM, 1997.
constraint qualifications ([24], [25]), and optimal- [7] ALEXANDROV, N.M., AND KODIYALAM, S.: 'Initial re-
sults of an MDO method evaluation study', AIAA Pa-
ity ([96], [97]).
per, no. 98-4884 (1998).
To facilitate research and testing in the area of [8] ALEXANDROV, N.M., AND LEWIS, R.M.: 'Algorithmic
algorithms, one may find automatic bilevel and perspectives on problem formulations in MDO', AIAA
multilevel problem generators, as well as other Paper, no. 2000-4719 (2000).
sources of multilevel problems, described in [26], [9] ALEXANDROV, N.M., AND LEWIS, R.M.: 'Analytical
and computational aspects of collaborative optimiza-
[69], [70].
tion', NASA, no. TM-210104-2000 (2000).
See also: O p t i m a l design of composite [10] ALEXANDROV, N.M., AND LEWIS, R.M.: 'Analyti-
structures; Multidisciplinary design opti- cal and computational properties of distributed ap-
mization; Design o p t i m i z a t i o n in compu- proaches to MDO', AIAA Paper, no. 2000-4718 (2000).
tational fluid dynamics; Interval analy- [11] ANADALINGAM, G., AND FRIESZ, T.L.: 'Hierarchical
optimization: An introduction', Ann. Oper. Res. 34
sis: Application to chemical engineering
(1992), 1-11.
design problems; S t r u c t u r a l optimization: [12] BADHRINATH, K., AND RAO, J.R.J.: 'Bilevel models
History; O p t i m a l design in nonlinear op- for optimum designs which are insensitive to perturba-
tics; Multilevel o p t i m i z a t i o n in mechan- tions in variables and parameters', Techn. Report Univ.
ics; Bilevel p r o g r a m m i n g : Applications; Sto- Houston U H - M E - S D L - 9 4 - 0 3 (1994).
[13] BALLING, R.J., AND SOBIESZCZANSKI-SOBIESKI,
chastic bilevel programs; Bilevel fractional
J.: 'An algorithm for solving the system-
p r o g r a m m i n g ; B ilevel p r o g r a m m i n g : Intro- level problem in multilevel optimization': Fifth
duction~ history and overview; Bilevel lin- AIAA/USAF/NASA/ISSMO Symp. Multidisciplinary
ear p r o g r a m m i n g ; B ilevel linear p r o g r a m - Analysis and Optimization (Panama Beach, Florida,
ming: Complexity~ equivalence to minmax~ Sept. 7-9, 199.~), 1994, AIAA Paper no. 94-4333
concave programs; Bilevel p r o g r a m m i n g : (1994).
[14] BhRD, J.F.: 'An algorithm for solving the general
O p t i m a l i t y conditions and duality; Bilevel
bilevel programming program', Math. Oper. Res. 8
p r o g r a m m i n g ; B ilevel p r o g r a m m i n g : Algo- (1983), 260-272.
rithms; Bilevel p r o g r a m m i n g : Global opti- [15] BhRD, J.F., hND FnLK, J.E.: 'An explicit solution to
mization; Bilevel p r o g r a m m i n g in manage- the multi-level programming problem', Comput. Oper.


Res. 9 (1982), 77-100. [34] DENNIS, JR., J.E., AND SCHNABEL, R.B.: Numerical
[16] BARTHELEMY, J-F.M.: 'Engineering applications of methods for unconstrained optimization and nonlinear
heuristic multilevel optimization methods', NASA equations, Prentice-Hall, 1983.
TM-101504 (1988). [35] DIRICKX, Y.M.I., AND JENNERGREN, L.P.: Systems
[17] BARTHELEMY, J-F.M., AND RILEY, M.F.: 'Improved analysis by multilevel methods, Wiley, 1979.
multilevel optimization approach for the design of com- [36] EDMUNDS, T.A., AND BARD, J.F.: 'Algorithm for non-
plex engineering systems', AIAA J. 26 (1988), 353-360. linear bilevel mathematical programs', IEEE Trans.
[is] BASAR, W., AND SELBUZ, H.: 'Closed loop Stackelberg Syst., Man Cybern. 21 (1991), 83-89.
strategies with applications in optimal control of mul- [37] EL-ALEM, M.M.: 'A global convergence theory for
tilevel systems', IEEE Trans. Autom. Control AC-24 the Celis-Dennis-Tapia trust region algorithm for con-
(1979), 166-178. strained optimization', SIAM J. Numer. Anal. 28
[19] BENDERS, J.F.: 'Partitioning procedures for solving (1991), 266-290.
mixed variables programming problems', Numerische [38] FALK, J.E., AND LIU, J.: 'On bilevel programming,
Math. 4 (1962), 238-252. Part I: General nonlinear cases', Math. Program. 70
[20] BIALAS, W.F., AND KARWAN, M.H.: 'On two-level (1995), 47-72.
optimization', IEEE Trans. Autom. Control AC-27 [39] FIACCO, A.V. (ed.): Introduction to sensitivity and sta-
(1982), 211-214. bility analysis in nonlinear programming, Acad. Press,
[21] BRACKEN, J., AND MCGILL, J.: 'Mathematical pro- 1983.
grams with optimization problems in the constraints', [4o] FIACCO, A.V., AND MCCORMICK, G.P. (eds.): Nonlin-
Oper. Res. 21 (1973), 37-44. ear programming, sequential unconstrained minimiza-
[22] BRAUN, R.D.: 'Collaborative optimization: An archi- tion techniques, SIAM, 1990.
tecture for large-scale distributed design', PhD Thesis [41] FLIPPO, O.E.: 'Stability, duality and decomposition in
Stanford Univ. (1996). general mathematical programming', PhD Thesis Eras-
[23] BRAUN, R.D., MOORE, A.A., AND KROO, I.M.: 'Col- mus Univ. Rotterdam, The Netherlands (1989).
laborative approach to launch vehicle design', J. Space- [42] FLIPPO, O.E., AND RINNOOY KAN, A.H.G.: 'Decom-
craft and Rockets 34 (1997), 478-486. position in general mathematical programming', Math.
[24] BURKE, J.V.: 'Calmness and exact penalization', Program. 60 (1993), 361-382.
SIAM J. Control Optim. 29 (1991), 493-497. [43] FORTUNY-AMAT, J., AND MCCARL, B.: 'A represen-
BURKE, J.V.: 'An exact penalization viewpoint of con- tation of a two-level programming problem', J. Oper.
strained optimization', SIAM J. Control Optim. 29 Res. Soc. 32 (1981), 783-792.
(1991), 968-998. [44] FRIESZ, T., TOBIN, R., CHO, H., AND MEHTA, N.:
[26] CALAMAI, P.H., AND VICENTE, L.N.: 'Generating lin- 'Sensitivity analysis based heuristic algorithms for
ear and linear-quadratic bilevel programming prob- mathematical programs with variational inequality
lems', SIAM J. Sci. Comput. 14 (1993), 770-782. constraints', Math. Program. 48 (1990), 265-284.
[27] CANDLER, W., AND TOWNSLEY, R.: 'A linear two-level [45] GAHUTU, D.W.H., AND LOOZE, D.P.: 'Parametric co-
programming problem', Comput. Oper. Res. 9 (1982), ordination in hierarchical control', Large Scale Systems
59-76. a (1985), 33-45.
[28] CHEN, Y.: 'Bilevel programming problems: Analysis, [46] GAUVIN, J.: 'The method of parametric decomposition
algorithms and applications', PhD Thesis Univ. Mon- in mathematical programming: the nonconvex case', in
treal (1994). C. LEMARECHAL AND R. GRIFFIN (eds.): Nonsmooth
[29] CHIDAMBARAM, B., AND RAO, J.R.J.: 'A study of con- optimization, Pergamon, 1978, pp. 131-149.
straint activity in bilevel models of optimal design', [47] GAUVIN, J., AND DUBEAU, F.: 'Some examples and
Techn. Report Univ. Houston UH-ME-SDL-94-01 counterexamples for stability analysis of nonlinear pro-
(1994). gramming problems', in A.V. FIACCO (ed.): Vol. 21 of
[30] CLARKE, F.H. (ed.): Optimization and nonsmooth Math. Program. Stud., North-Holland, 1983, pp. 69-78.
analysis, SIAM, 1990. [48] GEOFFRION, A.M.: 'Generalized Benders decomposi-
[31] DANZIG, G.B., AND WOLFE, P.: 'Decomposition prin- tion', J. Optim. Th. Appl. 10 (1972), 237-260.
ciple for linear programming', Oper. Res. $ (1960), [4o] GOULBECK, B., BRDYS, M., ORR, C.H., AND RANCE,
101-111. J.P.: 'A hierarchical approach to optimized control of
[a2] DE LUCA, A., AND DI PILLO, G.: 'Exact augmented water distribution systems: Part I, decomposition', Op-
Lagrangian approach to multilevel optimization of timal Control Appl. Meth. 9 (1988), 51-61.
large-scale systems', Internat. J. Syst. Sci. 18 (1987), [50] GOULBECK, B., BRDYS, M., ORR, C.H., AND RANCE,
157-176. J.P.: 'A hierarchical approach to optimized control of
[33] DE SILVA, A.H., AND McCORMICK, G.P.: 'Implicitly water distribution systems: part II, lower-level algo-
defined optimization problems', Ann. Oper. Res. 34 rithm', Optimal Control Appl. Meth. 9 (1988), 109-126.
(1992), 107-124.


[51] HAIMES, Y.Y.: Hierarchical analyses of water resources (1994), 340-357.


systems, McGraw-Hill, 1977. [69] PADULA, S.L., ALEXANDROV, N.M., AND GREEN,
[52] HAIMES, Y.Y., TARVAINEN, K., SHIMA, T., AND L.L.: 'MDO test suite at NASA Langley Research Cen-
THADATHIL, J.: Hierarchical multiobjective analysis of ter': Proc. Sixth AIAA/NASA/ISSMO Syrup. Multidis-
large-scale systems, Hemisphere, 1990. ciplinary Analysis and Optimization, AIAA, 1996.
[53] ISHIZUKA, Y., AND AIYOSHI, E.: 'Double penalty [701 PADULA, S.L., AND YOUNG, K.C.: 'Simulator for mul-
method for bilevel optimization problems', Ann. Oper. tilevel optimization research', NASA Techn. Memoran-
Res. 34 (1992), 73-88. dum TM-87751 (1986).
[54] Kmscn, U.: 'Improved optimum structural design by [71] RAO, J.R.J., AND BADHRINATH, K.: 'Solution of mul-
passive control', Engin. with Computers 5 (1989), 13- tilevel structural design problems using a nonsmooth
22. algorithm': Proc. Sixth AIAA/NASA//ISSMO Syrup.
KODIYALAM, S.: 'Evaluation of methods for multidis- Multidisciplinary Analysis and Optimization, AIAA,
ciplinary design optimization (MDO), phase I', NASA 1996, AIAA-96-3986-CR.
Contractor Report (1998). [72] RAO, J.R.J., AND CHIDAMBARAM, B.: 'Parametric de-
[56] KOLSTAD, C.D., AND LASDON, L.S.: 'Derivative eval- formation and model optimality in concurrent design',
uation and computational experience with large bilevel Techn. Report Univ. Houston UH-ME-SDL-93-01
mathematical programs', J. Optim. Th. Appl. 65 (1993).
(1990), 485-499. [73] REDDY, S.Y., FERTIG, K.W., AND SMITH, D.E.: 'Con-
[57] LASDON, L.S.: Optimization theory for large systems, straint management methodology for conceptual de-
MacMillan, 1970. sign tradeoff studies': Proc. DETC '96, Aug. 1996,
[581 LORIDAN, P., AND MORGAN, J.: 'Quasiconvex lower ASME Paper 96-DETC/DTM-1228.
level problems and applications in two-level optimi- [74] ROCKAFELLAR, R.T.: 'Directional differentiability of
zation', Techn. Report Univ. Montreal CRM-1578 the optimal value function in a nonlinear programming
(1988). problem', Math. Program. Stud. 21 (1984), 312-226.
[59] MAHMOUDI, M.S.: 'Multilevel systems control and ap- [75] ROGERS, J.L.: 'DeMAID/GA- An enhanced design
plications: A survey', IEEE Trans. Syst., Man Cybern. manager's aid for intelligent decomposition', Proc.
SMC-7 (1977), 125-143. Sixth AIAA/NASA/ISSMO Syrup. Multidisciplinary
[6o] MARCOTTE, P.: 'A new algorithm for solving varia- Analysis and Optimization (1996), AIAA Paper 96-
tional inequalities with application to the traffic assign- 4157.
ment problem', Math. Program. 33 (1985), 339-351. [76] ROGERS, J.L.: 'DeMAID/GA user's guide - Design
MARCOTTE, P., AND ZHU, D.L.: 'Exact and inexact manager's aid for intelligent decomposition with a ge-
penalty methods for the generalized bilevel program- netic algorithm', NASA Langley Res. Center TM-
ming problem', Math. Program. 74 (1996), 141-157. 110241 (1996).
[62] MESAROVI(~, M.D., MACKO, D., AND TAKAHARA, [77] SCHMITT, L.A., AND CHANG, K.J.: 'A multilevel
Y.: Theory of hierarchical, multilevel, systems, Acad. method for structural synthesis': A Collection of Tech-
Press, 1970. nical Papers: AIAA/ASME/ASCE/AHS 25th Struc-
[63] MIGDALAS, A., PARDALOS, P.M., AND VARBRAND, P. tures, Structural Dynamics and Materials Conf.,
(eds.): Multilevel optimization: Algorithms and applica- AIAA, 1984.
tions, Kluwer Acad. Publ., 1998. [78] SCHMITT, JR., L.A., AND MEHRINFAR, M.: 'Multi-
[64] MOR~, J.J • 'Recent developments in software for trust level optimum design of structures with fiber-composite
region methods', in A. BACHEM, M. GROTSCHEL, AND stiffened panel components', AiAA J. 20, no. 1 (1982),
B. KORTE (eds.): Mathematical Programming: The 138-147.
State of the Art, Springer, 1991, pp. 266-290. [79] SCHMITT, JR., L.A., AND RAMANATHAN, R.K.: 'Mul-
[65] MURALIDHAR, R., RAO, J.R.J., BADHRINATH, K., tilevel approach to minimum weight design inclusing
AND KALAGATLA, A.: 'Multilevel formulations and buckling constraints', AiAA J. 16, no. 2 (1978), 97-
limit analysis and design of structures with bilat- 104.
eral contact constraints', Techn. Report Univ. Houston [80] SHIMIZU, g., AND AIYOSHI, E.: 'A new computational
UH-ME-SDL-95-02 (1995). method for Stackelberg and min-max problems by use
[66] NACHANE, D.M.: 'Optimization methods in multilevel of a penalty method', IEEE Trans. Aurora. Control
systems: A methodological survey', Europ. J. Oper. AC- 26 (I 981), 460-466.
Res. 21 (1984), 25-38. [81] SIMAAN, M., AND CRUZ, JR., J.B.: 'On the Stackel-
[67] NICHOLLS, M.G.: 'Aluminium production modelling berg strategy in nonzero-sum games', J. Optim. Th.
- a non-linear bi-level programming approach', Oper. Appl. 11, no. 5 (1973), 535-555.
Res. 43 (1995), 208-218. [82] SINGH, M.G., MAHMOUD, M.S., AND TITLI, A.: 'A
[68] OUTRATA, J.: 'On optimization problems with vari- survey of recent developments in hierarchical optimiza-
ational inequality constraints', SIAM J. Optim. 4 tion and control': Proc. IFAC Control Sci. and Techn.

536
Multilevel optimization in mechanics

Natalia M. Alexandrov
NASA Langley Res. Center
Hampton, Virginia, USA
E-mail address: n.alexandrov@larc.nasa.gov

MSC2000: 49M37, 65K05, 65K10, 90C30, 93A13
Key words and phrases: nonlinear optimization, multilevel, bilevel, hierarchical, multidisciplinary design.


MULTILEVEL OPTIMIZATION IN MECHANICS
Multilevel optimization methods were first developed in the period after 1960. Their main scope was to facilitate the optimization of large scale systems in industrial processes and to solve trajectory determination and prediction problems using trajectory decomposition techniques. The reader may refer in this respect to the corresponding articles [3] and [26] and to the references given there, but also to the books [27] and [12]. More recent works on this subject have been published in [14], [4]. It should be mentioned that certain sources concerning the ideas of multilevel optimization may be found in well-known treatises of the calculus of variations and theoretical mechanics, cf. e.g. [10], [5]. Indeed, the well-known procedure of variational methods in mechanics of 'frozen' variables or constraints is closely related to the ideas of multilevel optimization. Also the well-known iterative methods of H. Cross and G. Kany for linear structural analysis, used after 1940 and before the development of computer codes based on the finite element method (FEM) for the calculation of framed structures, are nothing but a formulation, in the 'language' of structural analysis, of a multilevel optimization algorithm for the minimization problem of the complementary energy of the structure, expressed in terms of the bending moments of the beam and column connections.
Among the pioneers in the application of multilevel optimization methods in mechanics, and especially concerning the calculation of structures involving inequality constraints, was P.D. Panagiotopoulos [19], [20]. The idea was the following: Most mechanical problems can be expressed as minimum problems of an appropriately formulated energy function. The decomposition of this initial optimization problem into smaller subproblems corresponds to the energetic decomposition
of the initial mechanical problem into smaller fictitious subproblems. The mutual interaction of these subproblems yields, after an iterative procedure, the solution of the initial problem. The aforementioned method leads to the following three main applications of the multilevel optimization techniques in the framework of mechanics and, more generally, in the engineering sciences:
a) Calculation of large structures.
b) Validation of the simplifying assumptions used for the calculation of complex structures. Accuracy testing.
c) Accuracy improvement of simplified models used for the estimation of the behavior of complex structures.
Note that in the above, the term 'structure' can be replaced with the term 'system', meaning systems whose behavior is characterized by the solution of a minimax problem.
Since most of the multilevel techniques developed in the early sixties for the trajectory determination problems in space science are also applicable to stationarity problems, and since it has recently been proved that in dynamic problems involving impact phenomena the functional of the action is stationary [23], [22], there is also a further application of the multilevel optimization methods:
d) Calculation of the dynamic behavior of structures involving impact effects.
To the aforementioned applications the following, classical one, can be added:
e) Solution of optimal control problems (minimum of weight or cost, maximum of strength) in dynamic structural analysis.
This article deals mainly with static systems. Concerning the applications d) and e) the reader is referred to [27], [12] in relation with [23], [22]. In dynamic problems methods analogous to those for the static problems can be developed.
The classical decomposition techniques which are applied to optimization problems (cf. in this respect also [20, pp. 355ff]) have been extended and they can be applied also to substationarity problems [25], i.e. to problems of the type

0 ∈ ∂f(x),

where f is a nonconvex nonsmooth energy function and ∂ denotes the generalized gradient of F.H. Clarke [7], as it has been extended by R.T. Rockafellar [25] for non-Lipschitzian functionals. In this case the variational inequalities of the convex energy problems are replaced by hemivariational inequalities (cf. e.g. [20], [21], [17], [8]) and, instead of a global minimum of the convex potential or complementary energy functionals, the local minima and maxima are sought, and among them the global minimum as well. For the numerical treatment of hemivariational inequalities certain numerical methods have been developed (cf. e.g. [21]) and, among them, the two methods described in [15] are extensions of the multilevel optimization methods to substationarity problems.
It should also be noted that most of the domain decomposition methods are special cases of the multilevel optimization algorithms, as follows easily if one considers the energy functionals corresponding to the partial differential equations studied. Then the domain decomposition leads to energy functionals which have to be minimized on the decomposed parts of the domain.
Finally, it should be mentioned that fractal geometries in optimization problems arising in mechanics are treated by means of appropriate multilevel transformations of the problem, as will be shown further on. It is evident that an optimization problem with many variables cannot always be decomposed directly into independent optimization subproblems. The aim of multilevel optimization is to define, with respect to an optimization problem, appropriate mutually independent subproblems. Each of these, when solved independently, yields the optimum of the overall problem after an iterative procedure which is called the second-level controller. The decomposition into subproblems is achieved by choosing some variables, called coordinating variables, which are freely manipulated by the second-level controller in such a way that the subproblems (the first level of the problem) have solutions which in fact yield the optimum of the initial problem, i.e. of the problem before its decomposition into subproblems. Here, the ideas of [3] are closely followed.
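To make the role of the coordinating variables and of the second-level controller concrete, the following small sketch (Python; the quadratic subproblem energies, the variable names and the step length are invented for the illustration and are not taken from [3]) prices the single coupling equation x1 = x2 by a coordinating variable p, solves the two resulting subproblems independently at the first level, and lets the second-level controller update p until the coupling is restored; this choice of coordinating variable is made precise below as the nonfeasible (goal coordination) method.

```python
# Minimal two-level (goal-coordination) sketch on an invented toy problem:
#   minimize 0.5*a1*(x1-c1)**2 + 0.5*a2*(x2-c2)**2   subject to x1 = x2.
# The coupling x1 = x2 is priced by a coordinating variable p; each first-level
# subproblem is then solved independently, and a second-level controller
# updates p until the coupling is (approximately) restored.

a1, c1 = 2.0, 4.0          # data of subproblem 1 (assumed for illustration)
a2, c2 = 1.0, 1.0          # data of subproblem 2

def subproblem_1(p):       # min over x1 of 0.5*a1*(x1-c1)**2 + p*x1
    return c1 - p / a1

def subproblem_2(p):       # min over x2 of 0.5*a2*(x2-c2)**2 - p*x2
    return c2 + p / a2

p, alpha = 0.0, 0.5        # price and second-level step length
for it in range(100):
    x1, x2 = subproblem_1(p), subproblem_2(p)   # first level: independent solves
    gap = x1 - x2                               # violation of the coupling equation
    if abs(gap) < 1e-10:
        break
    p += alpha * gap                            # second level: price update

print(it, x1, x2, p)       # x1 ~ x2 ~ (a1*c1 + a2*c2)/(a1 + a2) = 3.0
```

With the step length chosen here the price iteration is a contraction, so the two subproblem solutions meet at the overall optimum x1 = x2 = 3.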


There are several different methods of transforming a given constrained optimization problem into a multilevel optimization problem. All these methods are basically combinations of two methods: the feasible decomposition method or model coordination method and the nonfeasible decomposition method or goal coordination method.
Let us consider the problem

min_{x,u} Π(x, u)
s.t. f(x, u) = 0,   (1)
     R(x, u) ≥ 0,

where x is a vector in E^n, u is a vector in E^m, f is an n-vector of C^2 functions, Π is a twice continuously differentiable (C^2) function, and R is an r-vector of C^2 functions. To decompose, coordinating variables s may be substituted not only for a single variable but also for functions g(x, u), so that Π is split into mutually disjoint parts and the f and R equations contain no common x, u, or s variables between the subproblems. Thus the following problem results:

Π(x, u, s) = Σ_{i=1}^{N} Π^(i)(x^(i), u^(i), s^(i)),
f^(i)(x^(i), u^(i), s^(i)) = 0,   i = 1, ..., N,
R^(i)(x^(i), u^(i), s^(i)) ≥ 0,   i = 1, ..., N.

Here (i) denotes the i-th subproblem or subsystem which must be optimized. For example, in a control problem x denotes the state, u denotes the control and x^(1) is the state vector for the first subsystem. Also the coupling equations must be added:

s^(i) = g^(i)(x^(j), u^(j))   for all j ≠ i.

The Lagrangian of the new problem reads

Π̄(x, u, s; λ, μ, ρ)   (2)
  = Σ_{i=1}^{N} Π^(i) + Σ_{i=1}^{N} λ^(i)T f^(i)
  + Σ_{i=1}^{N} μ^(i)T (R^(i) − σ^(i)) + Σ_{i=1}^{N} ρ^(i)T (g^(i) − s^(i)),

where σ^(i) ≥ 0 are additional slack variables such that R^(i) − σ^(i) = 0.
Π̄ is immediately separable into N individual subsystems, except for its last term.
In the method of nonfeasible decomposition it is assumed that ρ^(i) has a known value. The term ρ^(i)T s^(i) is put in the i-th subsystem and all of the ρ^(i)T g^(i)(x^(j), u^(j)) terms associated with the j-th variables are put in the j-th subsystem. On the other hand, in the feasible decomposition method it is assumed that s^(i) has a known value. Moreover, all of the ρ^(i)T [g^(i)(x^(j), u^(j)) − s^(i)] terms associated with the j-th variables are put in the j-th subsystem. In both cases, the optimization problem is separable and each subsystem can be optimized independently. Equation (2) is rewritten in more compact form as

Π̄(x, v; λ, μ, ρ) = F(x, v) + λ^T f(x, v) + μ^T [R(x, v) − σ] + ρ^T h(x, v),   (3)

where σ ≥ 0, v represents u and s, and h(x, v) denotes all g^(i) − s^(i); ρ is a Lagrange multiplier vector of the same dimension as g, μ is an r-vector including all Lagrange multipliers, and λ is an n-vector including all Lagrange multipliers.
The Kuhn-Tucker theory of nonlinear programming [9] implies that if Π̄(x, v) has a critical point at (x⁰, v⁰) such that the constraint equations in (1) are satisfied, and if the rank of the matrix of the constraint gradients is full and equals the rank of the matrix in (4) at (x⁰, v⁰), then a set of unique Lagrange multipliers λ⁰, μ⁰ and ρ⁰ exists at the critical point. The necessary conditions for a critical point (local minimum) are

∂Π̄/∂x = ∂Π̄/∂v = 0,   μ_i R_i = 0,   R ≥ 0,   μ ≤ 0,   (5)


∂Π̄/∂λ = f^T = 0,   ∂Π̄/∂ρ = h^T = 0.   (6)

If Π̄(x, v) is convex, if f_i(x, v) and h_i(x, v) are convex for λ⁰ and ρ⁰ positive, or if f_i(x, v), h_i(x, v), R_i(x, v) are concave for λ⁰, ρ⁰, μ⁰ negative, and the above necessary conditions are satisfied, then Π̄(x⁰, v⁰) is the absolute minimum of (1) and Π̄ has a global saddle point at (x⁰, v⁰); that is,

Π̄(x, v; λ⁰, μ⁰, ρ⁰) ≥ Π̄(x⁰, v⁰; λ⁰, μ⁰, ρ⁰) ≥ Π̄(x⁰, v⁰; λ, μ, ρ)

for all x, v, λ, μ, and ρ. These conditions can be relaxed to local convexity and concavity such that only a local minimum and saddle point are assured.
The nonfeasible gradient controller of L.C. Lasdon and J.D. Schoeffler [11] has the following form: Given (1), suppose that
a) Π̄ has a global saddle point at (x⁰, v⁰; λ⁰, μ⁰, ρ⁰); and
b) for any given ρ, a finite constrained (unique) minimum (constrained by f and R) exists.
Then the iterative procedure given by

ρ^{i+1} = ρ^i + Δρ,   where   Δρ = α h(x*, v*),   α > 0,

will converge to ρ⁰ and the absolute minimum of (1). Note that a local saddle point can replace a); then the initial guess on ρ must be within this saddle region. However, in that case the algorithm leads only to a local minimum. This Lasdon gradient controller can be considered as a variant of the modified Arrow-Hurwicz gradient method of K. Arrow, L. Hurwicz and H. Uzawa [1].
The feasible gradient controller of C.B. Brosilow et al. [6] has the following form: Given (1), suppose that
a) a finite minimum exists at (x⁰, v⁰); and
b) all the conditions of (5) and (6) are fulfilled except for ∂Π̄/∂s = 0 (where v denotes all s and u).
Then the iterative procedure given by

s^{i+1} = s^i + Δs,   where   Δs = −κ ∂Π̄/∂s,   κ > 0,

will converge to s⁰ and the minimum of (1).
A good choice of κ is important for the gradient calculations. At the second level of the feasible method we may write ([3, p. 142]) that

dΠ* = (∂Π̄/∂s) ds = −κ (∂Π̄/∂s)(∂Π̄/∂s)^T,   κ > 0.

An estimate of the expected improvement is written as −aΠ*, a > 0, where a is usually 10% or so. Then

κ = aΠ* / [(∂Π̄/∂s)(∂Π̄/∂s)^T].   (7)

In the case of nonfeasible decomposition a similar equation may be obtained [3]:

κ = aΠ* / (g^T g).   (8)

Note that Δs and Δρ become singular at the optimum if (7) and (8) are used, respectively, and therefore these values of Δs and Δρ are not appropriate for obtaining exact solutions.
There is also the possibility to apply a Newton-Raphson controller, both for the feasible and for the nonfeasible method, at the second level (cf. in this context [3, p. 173]). For instance, examining (5) and (6), it is obvious that the only necessary condition not satisfied by the subsystems is g = 0 in the nonfeasible decomposition method. Thus the Newton-Raphson method has as its task to solve g = 0 by an iterative method at the second level.
Note that the main characteristic of the aforementioned methods, i.e. the decomposition into subsystems and the separable optimization, applies also to nonsmooth convex or nonconvex optimization problems.

Large Cable Structures. Here a possibility offered in structural analysis by the multilevel optimization algorithms is presented. Certain subproblems do not contain inequalities, i.e. they are bilateral, and thus they can be treated by the available classical (i.e. based only on equalities) FEM programs.


In the majority of cable structures the number method of Lasdon and Schoeffier and the feasi-
of cables and nodes is large, and so an optimi- ble gradient controller method of Brosilow, Las-
zation problem with a large number of unknowns don and Pearson [11]. In the nonfeasible gradient
and constraints must be solved. Here, a multilevel controller method the value of p is supposed to be
optimization technique suitable for the solution of constant in the first level, say Pl, and the min-
this kind of optimization problem is proposed. The imization problem decomposes into the two sub-
initial optimization problem is decomposed into a problems
number of subproblems. In the 'first level' of the
min{H'(u) + u T G K o w - p~w}
calculation, each subproblem is optimized sepa- U,W

rately, and in the 'second level' the solutions of and


these subproblems are combined to yield the over-
all optimum.
min
t.
p:vv +, > )
v

It is interesting to note that some of these sub- After performing the optimization, the values of u,
problems constitute minimization problems with- v and w, e.g. ui, vi and wi, result. It is obvious
out inequality constraints (corresponding to clas- that vi # wi. The task of the second level is to
sical bilateral structures), and the algorithms for estimate a new value of p, e.g. P2 by means of the
their numerical treatment are much faster. The ini- equation
tial problem is decomposed into two subproblems:
P2 -- Pl -{" ~ ( V l -- Wl), g > 0,
the first involves only the displacement terms and
corresponds to a structure resulting from the given where ~ is a properly chosen constant (see, e.g.,
one by considering that all the cables act as bars [11]), and to transmit this value to the first level.
(capable of having compressive forces), and the The optimization is performed again, new values
second, including only the slackness terms, corre- u2, v2 and w2 result, etc., until the differences
sponds to a hypothetical slack structure. In order v i - wi are made negligible. The algorithm con-
to perform the decomposition, the potential energy verges in a finite number of steps, provided that
of the structure is written in the form the minima exist [11].
II(u, v) - n ' ( u ) + II"(v) + u T G K 0 v, (9) In the feasible gradient controller method, the
value of w is taken as constant in the first level,
where
e.g. wi, and thus the initial problem decomposes
1 T T into the two subproblems
II'(u) - ~ u K u - u (GK0e0 + p) (10)

and min{H'(u) + u T G K o w i }
u

H"(v) - l v T K 0 v T + v T ( a - K0e0). (11) and


In the above equations u, v, p, e0 are the displace- min { II~'(v) + pT(v ,,
-- wi)" v + b _ 0 . }
v,p
ments, slackness, loading and initial strain vectors
respectively, K0 is the natural stiffness matrix, K As a result of the optimization, the values of u,
is the stiffness matrix of the assembled structure v and p, e.g. ui, vi and Pi are calculated. By
and G is the equilibrium matrix. Introducing the means of the second level a new value of w, e.g.
variable w the minimization problem (9) takes the w2, is estimated and transmitted to the first level.
form This value is given by the equation

minH(u, v, w) - II'(u) + II"(v) + u T G K 0 w. - 0Hi (u, v, w) ) , ~>0,


W2 -- Wl -- g _ OW w=wi
The Lagrangian of this problem is
where tc is a properly chosen constant (see, e.g.,
II1 (u, v, w) - - II(u, v, w) -F pT(v -- w), [11]). The optimization yields a new set of values
where p is the vector of the Lagrange multipliers. u2, v2 and P2 and the procedure is continued un-
The decomposition can be performed by means of til the difference between the consecutive values of
two methods: the nonfeasible gradient controller vector w becomes sufficiently small.


For numerical applications the reader is referred n(u, A, w) - n(u, A, w) + ps(A _ w)


to [20].
and the minimization problem is decomposed in
the following two subproblems
L a r g e E l a s t o p l a s t i c S t r u c t u r e s . We consider
here the holonomic plasticity model [13], (ex- rain { I I ' ( u ) - u T G K o N w - p T w } (14)
U~W
tension to nonholonomic plasticity problems is
straightforward) described by the following equa- and
tions" min l~YI~'(A)+ pT)~. A _~ 0}. (15)
e = FOS, In the first step it is supposed that the value of p
e = e0 + eE + ep, is constant (say Pi) and we take as a result from
ep = NA, (14) and (15) the values Ul, A1 and wi. Obviously
Ai # wi. Then the second level controller esti-
¢- NTs -- k,
mates the new value nf p from the equation
)~_~ 0,, ¢_~0, cT,,~ -- 0,
P2 : Pl -[- g()~l -- Wl), g > 0,
where F0 is the natural flexibility matrix of the
and transmits it to the first level, and the pro-
structure, e the respective strain vector consist-
cedure is continued until the differences A i - wi
ing of three parts, the initial strain e0, the elas-
become appropriately small.
tic strain eE and the plastic strain ep, )~ are the
The same procedure can be applied also to holo-
plastic multipliers vector, ~b the yield functions, N
nomic models including hardening and to nonholo-
is the matrix of the gradients of the yield func-
nomic plasticity models [13].
tions with respect to the stresses and k is a vector
of positive constants. The potential energy of the
Validation and Improvements of Simplified
structure is written in the form
M o d e l s . In mechanics and engineering sciences
II(u, A) - II' (u) 4- II" (A) - u T G K o N A as well as in economy, simplified models are often
where considered for the treatment of complicated prob-
1 lems, e.g. concerning the calculation of stresses in
H'(u) - ~uTKu -- e 0 T K o G T u - p T u , complex structures. In these models it is assumed
that certain quantities do not influence consider-
H"(A) -- 2 A T N T K o N A + e0TKoNA - kA. ably the solution of the problem. By means of the
Again, K is the stiffness matrix of the structure multilevel decomposition, a method which permits
and K0 is the inverse of F0. the validation of these models and the improve-
The solution of the problem can be obtained by ment of their accuracy can be developed. This idea
minimizing the potential energy of the structure: is explained in the sequel.

rain {II(u, A): A _> 0}. (12) A. Consider a large structure involving also some
cables and assume that due to the pretension of the
By introducing a new variable w, (12) takes the
cables the structure is calculated as if the cables
form
are rods, i.e. by ignoring the fact that a cable may
min {II(u, A, w) - II'(u) + II"(A) (13) become slack and then it has zero stresses. Then
in the equations (9)-(11) v - 0 and the solution
~uTGKoNw e w- A, A _> 0 } .
of the minimum problem is obtained by solving an
As in the previous section, the decomposition can unconstrained minimization problem, i.e. by a lin-
be performed by the two methods of the feasi- ear system solver. In order to check whether the
ble and the nonfeasible gradient controller respec- solution of the simplified model is close to the so-
tively. For the sake of brevity only the nonfeasi- lution of the initial problem, in which some cables,
ble gradient method will be shown here. The La- say r, may become slack, i.e. vi > O, i - 1 , . . . , r,
grangian of (13) is first considered it is enough to verify whether the second level


controller which gives a value of the slackness of stresses) or displacements (respectively, forces).
the cables causes a significant change in the solu- Thus the feasible and the nonfeasible decompo-
tion of the first level problem which corresponds to sition method have a precise mechanical mean-
the simplified structure. Also the algorithm offers ing. In the first case the Lagrange multipliers, i.e.
an improvement of the solution of the simplified the strains (respectively, the stress) are controlled
model. while in the second one the coordinating variables,
i.e. the stress (respectively, the strain) of the links
B. Here, the investigation of the mutual influence
between the two substructures are controlled, in
of two subsystems is presented. Consider two sub-
order to achieve the position of equilibrium of the
structures connected together, for instance a cylin-
whole structure.
drical shell with a hemispherical shell covering the
one end of the cylinder. The solution of the whole D. Some of the resulting substructures may have
linear elastic structural compound minimizes, for a known analytical solution. Then this fact facili-
a given external loading, the potential (or the com- tates the calculation and may be applied as a test
plementary) energy of the whole structure. Let Xl for the accuracy of the resulting solution via a nu-
(respectively, x2) be the variables of the cylindri- merical technique, e.g. by the FEM model. The
cal (respectively, the hemispherical) shell and let z procedure is described in [24].
be the common variables at the contact line which
are common in both structures. In order to decom- E. The multilevel decomposition method can be
pose the potential energy into two minimum prob- used also as estimator of the sensitivity of the final
lems, one containing the unknowns of the cylindri- solution to small changes of the system to be opti-
cal shell and the other of the hemispherical shell, mized [24]. This method may be used for example
the common variables for the cylindrical (respec- in estimating how a partial change in a structure
tively, hemispherical) shell are denoted by Zl (re- influences the stress and strain field of the struc-
spectively, z2) and thus the initial problem ture without solving twice the structure.

min {YI(Xl, X2, Z) - - 1-I1(Xl, z) -{- 1-I2(x2, z)}


Xl ,X2 ~.

is written as
D e c o m p o s i t i o n A l g o r i t h m s for N o n c o n v e x
rain {IIl(Xl, Zl) + II2(x2, z2)" Zl - z2 - 0}. M i n i m i z a t i o n P r o b l e m s . In unilateral contact
Xl ,X2 ,ZI ,~'2
problems with friction, Panagiotopoulos proposed
Here HI (respectively, YI2) denotes the potential
in 1975 an algorithm [18] called later PANA-
or the complementary energy of the cylindrical (re-
algorithm for the decomposition of the quasivari-
spectively, the hemispherical).shell. Thus it can be
ational inequality problems into two classical vari-
tested by the nonfeasible controller method how
ational inequality problems which are equivalent
the difference Zl -z2 influences the solution of the
to two minimization problems. Analogous decom-
problem. The procedure is similar in the case of
position methods of complicated problems using
elastoplastic structures with the difference that the
an analogous to [18] fixed point procedure can be
minimum is constrained by inequalities.
applied to the treatment of much more compli-
The above procedure may find applications in
cated problems today involving nonconvex energy
estimating the influence of saddles on pipelines of
functions. This section is devoted to the study of
rigidity rings on long tubes etc.
multilevel decomposition algorithms for problems
C. Note that in all the above cases the Lagrange belonging to the general framework of the substa-
multipliers have a precise meaning: they corre- tionarity problems.
spond in the sense of energy to the chosen co- It is known that the equilibrium of an elastic
ordinating variables, i.e., if the coordinating vari- body gt in adhesive contact with a support F is
ables are stresses (respectively, strains) or forces governed by the following problem [21], [17]: Find
(respectively, displacements) then the coordinat- u E V such as to satisfy the hemivariational in-
ing Lagrange multipliers are strains (respectively, equality


a(u, v - u) + f j ° ( u g , VN - - ug)dF (16)


F

+ f j°(uT, VT -- uT)dr >_ ( f , v -- u), Vv ~ Y.


(17)
F is solved. The above problem yields a value of ST,
say S (1). Then the problem
Here u, v are the displacement fields, f are all the 0 (18)
applied forces, (f, v) usually a L 2 internal prod-
u c t - is the work of the applied forces, a(u, v) is
e O -~a(u, u) + jg( , uN)dr -- ( , u)
the elastic strain energy which is usually a coercive
F
form, j g (respectively, jT) denote the nonconvex,
locally Lipschitz generally nonsmooth energy den- is solved (S(T1) enters with its work into (f~l), u))
sity functions of the adhesive forces in the normal yielding a new value of SN, say S(~ ) , and so on until
(respectively, the tangential) direction to the in- the differences IIS(~) - - ~¢(i+1)
N [[ and [[S(~) - ~ ¢(i+1)
'T [I
terface F. It is assumed that the normal adhesive at each point of the discretized interface F become
action is independent of the tangential adhesive appropriately small. Here [[.[[ denotes the R3-norm
action. Moreover, j o , jo denote the directional because the values are checked pointwise. The first
derivative in the sense of Clarke [7], and uy, vg (respectively, second) problem with jN -- 0 (re-
(respectively, UT, VT) denote the normal (respec- spectively, with jT -- 0) corresponds to the first
tively, tangential) component of the displacement level (respectively, to the second level). Applica-
with respect to F. The solution of the above prob- tions of the above procedure can be found in [20],
lem can be obtained in most cases of practical [21], [15].
interest (cf. [21]) under certain mild hypotheses
which guarantee this equivalence, by solving the S t r u c t u r e s w i t h Fractal Interfaces. In this sec-
substationarity problem tion the attention is focused on the fractal geom-
etry of interfaces where their behavior is modeled
m
by means of an appropriate nonmonotone contact
0 E OI(u)
and friction mechanism. The interfaces of fractal

- ~ Ix j
~ ( u , u) + jN(UN)dF
geometry are analyzed here as a sequence of clas-
sical interface subproblems. These classical sub-
problems result from the consideration of the ]rac-
tal interface as the unique 'fixed point' of a given
iterative ]unction system (IFS), which consists of
+f - (i, u)/' N contractive mappings wi" R 2 -+ R 2 with con-
tractivity factors 0 < si < 1, i - 1 , . . . , N [2].
According to this procedure, a fractal set A is the
where 0 denotes the generalized gradient of Clarke.
'fixed point' of a transformation W i.e.
In engineering problems the nonconvex superpo-
N
tentials (cf. e.g. [16]) j g and jT are not indepen-
A- W(A) - (.j Wi(A),
dent but they depend jY (respectively, jT) on the
i=1
vectors ST (respectively, SN), where ST, SN are
where Wi is defined
the reactions corresponding to UT, ug respectively.
In this case a hemivariational inequality cannot be Wi(B) = {wi(x)" x E B}, VB E H(R2).
formulated. In order to solve this problem numer- Generally a fractal set A is given by the relation"
ically one may apply the following procedure: In
the first step it is assumed that SN is given, say,
A-lira
n--+ oo
W(~)(B), VBeH(R2),
S(~ ) and the problem (S(N°) enters with its work where H ( R 2) is the space of all compact sub-
into (f~0), u)) sets of R 2. Thus each level corresponds to a clas-


sical geometry approximating the fractal geome- obtained using numerical procedures for the solu-
try. Within each level a new optimization problem tion of ( 1 7 ) a n d (18). This procedure is repeated
is solved with the new data. Thus the multilevel several times by increasing n; at the limit n --+ co,
character of the optimization problem results from u (n) and a (n) give the solution of the fractal inter-
the necessity to take into account the fractal ge- face problem.
ometry. See also: M u l t i l e v e l m e t h o d s for o p t i m a l
In the sequel a linear elastic structure occupying design; Bilevel p r o g r a m m i n g : A p p l i c a t i o n s ;
a subset ~ of R 3 is considered. In its undeformed S t o c h a s t i c bilevel p r o g r a m s ; Bilevel frac-
state the structure has a boundary F which is de- t i o n a l p r o g r a m m i n g ; Bilevel p r o g r a m m i n g :
composed into two mutually disjoint parts Fu and I n t r o d u c t i o n , h i s t o r y a n d overview; Bilevel
F F. It is assumed that on Fu (respectively, F F) linear p r o g r a m m i n g ; B ilevel linear p r o g r a m -
the displacements (respectively, the tractions) are m i n g : C o m p l e x i t y , e q u i v a l e n c e to m i n m a x ,
given. In the structure ~ some cracks with inter- concave p r o g r a m s ; Bilevel p r o g r a m m i n g :
faces (I) of fractal type are formed. These cracks in O p t i m a l i t y c o n d i t i o n s a n d duality; Bilevel
brittle materials frequently propagate along one or p r o g r a m m i n g ; B ilevel p r o g r a m m i n g : Algo-
more irregular ways. In this case the fracture sys- r i t h m s ; Bilevel p r o g r a m m i n g : G l o b a l opti-
tem may be considered to be a cluster of branches m i z a t i o n ; Bilevel p r o g r a m m i n g in m a n a g e -
propagating in such a way that new branches in m e n t ; Bilevel p r o g r a m m i n g : A p p l i c a t i o n s in
the n + 1 step are successively created from a for- e n g i n e e r i n g ; Bilevel o p t i m i z a t i o n : Feasibil-
mer branch at the n step. In other words the frac- ity t e s t a n d flexibility index; Bilevel pro-
ture system can be modeled by an IFS procedure. gramming: Implicit function approach.
Regarding now the boundary conditions on (I), it
is assumed that nonmonotone, possibly multival- References
[1] ARROW, K.J., HURWICZ, L., AND UZAWA, H.: Studies
ued laws describe the behavior of each interface in
in linear and nonlinear programming, Stanford Univ.
the normal and tangential directions. More specif- Press, 1985.
ically, it is assumed that the following boundary [2] BARNSLEY, M.: Fractals everywhere, Acad. Press,
conditions hold: 1988.
[3] BAUMAN, E.J.: 'Trajectory decomposition', in C.T.
--~N E OjN(UN, X), LEONDES (ed.): Optimization methods for large scale
systems with applications, McGraw-Hill, 1971.
- - S T E OjT(UT, X).
[4] BERTSEKAS, D.P., AND TSITSIKLIS, J.N.: Parallel and
Then according to the previous section, an equi- distributed computation: numerical methods, Prentice-
librium position of f2 is characterized by the hemi- Hall, 1989, Last edition: Athena Sci. Belmont Mass.
1997.
variational inequality (16).
[5] BOLZA, O.: Lectures on the calculation of variations,
In this case, where the fractured body ~2 with Chicago, 1904.
fractal interfaces (I) is studied, it is necessary to [6] BROSILOW,C.B., LASDON,L.S., ANDPEARSON, J.D.:
substitute in (16) the domain F with (I). As it has 'A multi-level technique for optimization': Proc. Joint
been mentioned above, (I) is the fixed point of a Aurora. Control Conf., Rensselaer ~ Polytech. Inst.,
1965.
given transformation denoted by W, i.e.
[7] CLARKE, F.H.: Optimization and nonsmooth analysis,
,I, = W,I,, Wiley, 1983.
[8] DEMYANOV, V.F., STAVROULAKIS, G.E., POLYAKOVA,
(I)(n+l) = W(~(n) L.N., AND PANAGIOTOPOULOS, P.D.: Quasidifferentia-
@(-) _+ @. bility and nonsmooth modelling in mechanics, engineer-
n--~oo
ing and economics, Kluwer Acad. Publ., 1996.
Thus, for each approximation (I)(n) of the fractal [9] HADLEY, G.: Non-linear and dynamic programming,
Addison-Wesley, 1964.
interface (~ a structure ~(n) must be solved. Since
[10] HAMEL, G.: Theoretische Mechanik, Springer, 1967.
(I)(n) is an interface set with classical geometry the [11] LASDON, L.C., AND SCHOEFFLER, J.D.: 'A multi-level
solutions u (n) and a (n) (where u (n) and a (n) are the technique for optimization': Proc. Joint A utom. Con-
corresponding displacement and stress fields) are trol Conf., Rensselaer Polytech. Inst., 1965.


[12] LEONDES, C.T. (ed.): Advances in control systems. Braunschweig, Germany


Theory and applications, Acad. Press, 1968. E-mail address: g. stavroulakis~tu-bs, de
[13] MAIER, G.: 'A quadratic programming approach for Euripidis Mistakidis
certain classes of non linear structural problems', Mec- Univ. Thessaly
canica 2 (1968), 121-130. Volos, Greece
[14] MIGDALAS, A., AND PARDALOS, P.M.: 'Special issue E-mail address: emistaki~uth.gr
on hierarchical and bilevel programming', J. Global Op-
Olympia Panagouli
tim. 8, no. 3 (1996).
Aristotle Univ.
[15] MISTAKIDIS, E.S, AND STAVROULAKIS, G.E.: Noncon-
Thessaloniki, Greece
vex optimization in mechanics. Algorithms, heuristics
E-mail address: olympiaCheron, c i v i l , auth. gr
and engineering applications by the F.E.M., Kluwer
Acad. Publ., 1998.
MSC2000: 49Q10, 74K99, 74Pxx, 90C90, 91A65
[16] MOREAU, J.J., PANAGIOTOPOULOS, P.D., AND
Key words and phrases: multilevel optimization, computa-
STRANG, G. (eds.): Topics in nonsmooth mechanics,
tional mechanics, parallel computation in mechanics.
Birkh~iuser, 1988.
[17] NANIEWICZ, Z., AND PANAGIOTOPOULOS, P.D.: Math-
ematical theory of hemivariational inequalities and ap-
plications, M. Dekker, 1995.
[ls] PANAGIOTOPOULOS, P.D.: 'A nonlinear programming MULTIPARAMETRIC LINEAR PROGRAM-
approach to the unilateral contact - and friction - MING
boundary value problem in the theory of elasticity',
In this article we will describe some results for sen-
Ingen. Archiv 44 (1975), 421-432.
[19] PANAGIOTOPOULOS, P.D.: 'A variational inequality ap- sitivity analysis and parametric programming for
proach to the inelastic stress-unilateral analysis of cable linear models. The solution approach that is de-
structures', Comput. Structures 6 (1976), 133-139. scribed here is based upon the extension of simplex
[20] PANAGIOTOPOULOS, P.D.: Inequality problems in me- algorithm for linear programs (LP) ([5], [3]). Here
chanics and applications. Convex and nonconvex en-
we mention some references ([16], [6], [20], [7], [15],
ergy Functions, Birkh~iuser, 1985, Russian Translation
MIR, Moscow 1989.
[14], [19], [1], [2], [8], [9], [10], [18], [11], [12], [13],
[21] PANAGIOTOPOULOS, P.D.: Hemivariational inequal- [21], and [17]); however [3] is recommended for an
ities. Applications in mechanics and engineering, extensive list of references and [4] for a historical
Springer, 1993. outline on parametric linear programming.
[22] PANAGIOTOPOULOS, P.D.: 'Modelling of nonconvex
We will consider right-hand side (RHS) multi-
nonsmooth energy problems. Dynamic hemivariational
inequalities with impact effects', J. Comput. Appl.
parametric linear programming problems, where
Math. 63 (1995), 123-138. uncertain parameters are assumed to be bounded
[23] PANAGIOTOPOULOS, P.D.: 'Variational principles for in a convex region. The solution algorithm is based
contact problems including impact phenomena', in upon characterizing the given initial convex region
M. RAOUS (ed.): Contact Mechanics, Plenum, 1995. by a number of nonoverlapping smaller convex re-
[24] PANAGIOTOPOULOS, P.D., MISTAKIDIS, E.S.,
gions and obtaining optimal solutions associated
STAVROULAKIS, G.E., AND PANAGOULI, O.K.:
'Multilevel optimization methods in mechanics': with each of these regions. The basic assumptions
Multilevel Optimization: Algorithms, Complexity and for the application of the algorithm are:
Applications, Kluwer Acad. Publ., 1998, pp. 51-90.
[25] ROCKAFELLAR, R.T." La thdorie des sous-gradients et • The given region must be finite and con-
ses applications ~ l'optimization. Fonctions convexes et nected.
non-convexes, Les Presses de l'Univ. Montr6al, 1979.
[26] SCHOEFFLER,J.D.: 'Static multilevel systems', in C.T. • One should be able to characterize at least
LEONDES (ed.): Optimization methods for large scale one (smaller) region.
systems with applications, McGraw-Hill, 1971.
[27] WISMER, D.A. (ed.): Optimization methods for large • One should be able to identify all regions that
scale systems with applications, McGraw-Hill, 1971.
are adjacent to a given region.

Consider the following multiparametric lin-


Georgios E. Stavroulakis ear programming problem, when parameters are
Carolo Wilhelmina Techn. Univ. present on the right-hand side of the constraints:


z(θ) = min_x c^T x
s.t. Ax = b + Fθ,   (1)
     x ≥ 0,
     x ∈ R^n,  θ ∈ R^s,

where x is a vector of continuous variables; A and F are constant matrices, and c and b are constant vectors of appropriate dimensions; θ is a vector of uncertain parameters, such that for each θ ∈ K ⊆ R^s, (1) has a finite optimal solution, and has no optimal solution for θ ∈ R^s − K. Further, consider the following restriction on θ, {θ : Gθ ≤ g}, where G is a constant matrix and g is a constant vector; see Fig. 1 for a graphical interpretation for the two-parameter case, where θ is bounded in the region given by PQRST.

Fig. 1: Definition of critical regions.

The simplex tableau associated with (1) is given as follows:

Yx − P_F θ = x_B,
P_z^T x − f_{m+1}^T θ = −z^(p),

where

Y = B^{-1} A,   P_F = B^{-1} F,   x_B = B^{-1} b,   (2)
Z = c_B^T x_B,   P_z^T = c_B^T Y − c^T,

where p corresponds to the index set of basic variables and B is the corresponding basis matrix. The (critical) region within which the above (optimal) tableau is valid can then be derived as follows. The critical region, CR, where an optimal solution z^(p)(θ) = c_B^T x_B(θ) preserves its optimality, is given by the initial conditions on θ,

Gθ ≤ g,   (3)

together with the conditions of primal feasibility. The conditions of primal feasibility are derived as follows. The basis B is said to be primal feasible if the condition

B^{-1} b(θ) = x_B(θ) ≥ 0,   (4)

where b(θ) = b + Fθ and x_B(θ) = x_B + P_F θ, is satisfied. Then, using (2) and (4), the condition of primal feasibility is given by

−P_F θ ≤ x_B.   (5)

Thus, the critical region corresponding to p is given by (5) and (3). For illustration purposes, say in Fig. 1, the initial region of θ (condition (3)) is given by PQRST and the condition of primal feasibility is given by UVWX (condition (5)); then CR2 is the corresponding critical region. Note that CR2 is obtained by removing the redundant constraints PT, QR and RS. In order to devise a procedure to obtain 'all' the critical regions (CR1 and CR3), and optimal solutions associated with them, we first state the following:
• Two optimal bases are said to be neighbors if
  - there exists some θ* ∈ K such that both bases are optimal, and
  - it is possible to pass from one basis to the other by one dual step.
• The critical regions associated with two different optimal bases are said to be neighbors if their corresponding bases are neighbors.
• Two neighboring critical regions lie in opposite half spaces.
• The optimal value function, z(θ), is continuous and convex; see Fig. 2 for a graphical interpretation for the case of two parameters.
Based upon the above statements, the solution algorithm for identifying all the critical regions can now be described. The algorithm consists of two major parts. In the first part, an initial feasible solution is obtained and the critical region which corresponds to the initial solution is characterized. The second part then starts with this critical region and identifies all the regions and corresponding optimal solutions. The major steps of the algorithm are as follows:
1) Find a feasible solution:


- Solve (1) by treating 0 as a free variable ric m i x e d integer linear programming; Para-
to obtain 0*. If no feasible solution exists, metric m i x e d integer nonlinear optimiza-
stop; (1) is infeasible. tion.
- Fix 0 = 0* and solve (1) to obtain an
initial basis B and corresponding critical References
region. [1] GAL, T.: 'Putting the LP survey into perspective',
OR/MS Today 19, no. 6 (1992), 93.
2) Find all optimal solutions: [2] GAL, T.: 'Weakly redundant constraints and their im-
- Construct two lists V and W, where V pact on postoptimal analysis in linear programming',
consists of those optimal bases whose Europ. J. Oper. Res. 60 (1992), 315-336.
[3] GAL, T.: Postoptimal analyses, parametric program-
neighboring bases have been identified,
ming, and related topics, de Gruyter, 1995.
and W consists of those bases whose [4] GAL, T.: Advances in sensitivity analysis and paramet-
neighbors have yet not been identified. ric programming, Kluwer Acad. Publ., 1997.
- Select any basis from W and identify all [5] GAL, T., AND NEDOMA, J.: 'Multiparametric linear
its neighboring bases. From all the identi- programming', Managem. Sci. 18 (1972), 406-422.
[6] GRANOT, D., GRANOT, F., AND JOHNSON, E.L.: 'Du-
fied bases, insert in W those bases which
ality and pricing in multiple right-hand choice linear
are neither in V nor in W. The optimal programming problems', Math. Oper. Res. ? (1982),
solutions (and corresponding critical re- 545-556.
gions) are then determined by moving [7] GREENBERG, H.J.: 'An analysis of degeneracy', Naval
from the basis to its neighbors by one Res. Logist. Quart. 33 (1986), 635-655.
dual step. [8] GREENBERG, H.J.: 'How to analyze the results of lin-
ear programs- Part 1: Preliminaries', Interfaces 23,
- Repeat the procedure until W = {0}.
no. 4 (1993), 56-67.
[9] GREENBERG, H.J.: 'How to analyze the results of lin-
z(o) ear programs- Part 2: Price Interpretation', Interfaces
23, no. 5 (1993), 97-114.
[10] GREENBERG, H.J.: 'How to analyze the results of linear
~e) ~
programs- Part 3: Infeasibility Diagnosis', Interfaces
z,(o~
23, no. 6 (1993), 120-139.
[11] GREENBERG, H.J.: 'How to analyze the results of lin-
ear programs- Part 4: Forcing Structures', Interfaces
eL
24, no. 1 (1994), 121-130.
.......-" ............' [12] GREENBERG, H.J.: 'The use of optimal partition in
/~i...... " cR~ linear programming solution for postoptimal analysis',
Oper. Res. Left. 15 (1994), 179-185.
oy [13] GREENBERG, H.J.: 'The ANALYZE rulebase for sup-
porting LP analysis', Ann. (9per. Res. 65 (1996), 91-
126.
[14] HANSEN, P.M., LABBE, M., AND WENDELL, R.E.:
Fig. 2: z(0) is a continuous and convex function of 0.
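A small numerical sketch of the critical-region construction described above is given below (Python with NumPy). The data, the chosen basis and the tolerance are invented for the illustration, and this is not the algorithm of [5] itself; the sketch only assembles the region defined by conditions (3) and (5) for one fixed basis and tests membership of a few values of θ.

```python
# Illustrative sketch: for a fixed basis B of problem (1), assemble the critical
# region CR = {theta : G theta <= g, -P_F theta <= x_B} from conditions (3) and (5),
# and test whether a given theta belongs to it.
import numpy as np

# Small invented data for z(theta) = min c'x  s.t.  Ax = b + F theta,  x >= 0.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 2.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
F = np.array([[1.0], [0.0]])          # single uncertain parameter theta
c = np.array([-1.0, -2.0, 0.0, 0.0])
G = np.array([[1.0], [-1.0]])         # initial region (3): -2 <= theta <= 2
g = np.array([2.0, 2.0])

basic = [0, 1]                        # assumed optimal basis for theta near 0
B = A[:, basic]
x_B = np.linalg.solve(B, b)           # B^{-1} b
P_F = np.linalg.solve(B, F)           # B^{-1} F

def in_critical_region(theta):
    theta = np.atleast_1d(theta)
    primal_feasible = np.all(-P_F @ theta <= x_B + 1e-12)   # condition (5)
    in_initial_region = np.all(G @ theta <= g + 1e-12)      # condition (3)
    return primal_feasible and in_initial_region

print([in_critical_region(t) for t in (-1.0, 0.0, 1.0, 3.0)])
```

For these data the critical region of the chosen basis is the interval −1 ≤ θ ≤ 2, so the test reports True, True, True, False.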
'Sensitivity analysis in multiple objective linear pro-
See also: Multiplicative programming; gramming: the tolerance approach', Europ. J. Oper.
Global o p t i m i z a t i o n in multiplicative pro- Res. 38, no. 1 (1989), 63-69.
[15] MAGNATI, T.L., AND ORLIN, J.B.: 'Parametric linear
gramming; Linear programming; P a r a m e t -
programming and anti-cycling pivoting rules', Math.
ric linear programming: Cost simplex algo- Program. 41 (1988), 317-325.
rithm; Parametric global optimization: Sen- [16] MURTY, K.: 'Computational complexity of parametric
sitivity; M u l t i p a r a m e t r i c linear program- linear programming', Math. Program. 19 (1980), 213-
ming; Selfdual parametric m e t h o d for linear 219.
programs; Nondifferentiable optimization: [17] Roos, C., TERLAKY, T., AND VIAL, J.-PH.: The-
ory and algorithms for linear optimization, an interior
Parametric programming; B o u n d s and solu-
point approach, Wiley, 1997.
tion vector e s t i m a t e s for parametric NLPs; [18] WANG, H-F, AND HUANG, C-S: 'Multiparametric anal-
Parametric optimization: E m b e d d i n g s , path ysis of the maximum tolerance in a linear programming
following and singularities; M u l t i p a r a m e t - problem', Europ. J. (9per. Res. 67, no. 1 (1993), 75-87.


[19] WARD, J.E., AND WENDELL, R.E.: 'Approaches to more than one uncertain parameter the solution
sensitivity analysis in linear programming', Ann. Oper. approach is available only for the right-hand side
Res. 27 (1990), 3-38.
case. Next we will describe solution approaches for
[2o] WENDELL, R.E.: 'The tolerance approach to sensitiv-
ity analysis in linear programming', Managem. Sci. 31 i) single parametric mixed integer linear pro-
(1985), 504-578. grams for objective function coefficients
[21] WENDELL, R.E.: 'Linear programming 3: The toler- parametrization; and
ance approach', in T. GAL AND H.J. GREENBERG
(eds.): Advances in Sensitivity Analysis and Paramet- ii) single parametric pure integer programs
ric Programming, Kluwer Acad. Publ., 1997. when the uncertain parameter is present on
Vivek Dua the right-hand side of the constraints.
Imperial College These illustrate some concepts which are based
London, U.K. upon some basic observations. For other solution
Efstratios N. Pistikopoulos approaches, see the literature cited above. Finally
Imperial College
we will present a solution approach for right-
London, U.K.
E-mail address: e . pistikopoulosOic, a c . uk
hand side multiparametric mixed integer linear
programs.
MSC2000: 90C31, 90C05
Key words and phrases: sensitivity analysis with respect to
right-hand side changes, critical region.
Mixed Integer Linear Programming Prob-
l e m s I n v o l v i n g a Single U n c e r t a i n P a r a m e -
t e r in O b j e c t i v e F u n c t i o n Coefficients. These
MULTIPARAMETRIC MIXED INTEGER can be stated as follows:
LINEAR PROGRAMMING z(¢) - min(c m + c'¢)x + dmy
x,y
In this article we describe theoretical and algorith-
s.t. Ax + Ey <_ b, (1)
mic developments in the field of parametric pro-
n, ye{O, 1}
gramming for linear models involving 0-1 integer
variables. We will consider two cases of the prob- Cmin _~ ¢ <_ Cmax,
lem: single parametric (when a single uncertain where x is a vector of continuous variables; y is the
parameter is present) and multiparametric (when vector of 0-1 integer variables; ¢ is a scalar un-
more than one uncertain parameters are present certain parameter bounded between its lower and
in the model). For the case when a single uncer- upper bounds ¢min and (/)maxrespectively; A is an
tain parameter is present, solution approaches are (m x n) matrix; E is an (m x l) matrix; c, d, d
based upon and b are vectors of appropriate dimensions. Solu-
i) enumeration ([12], [13], [11]); tion procedure for (1) is based upon following two
features of the formulation in (1). First feature of
ii) cutting planes ([6]); and
this formulation is that, since the uncertain pa-
iii) branch and bound techniques ([8], [10]). rameter is present in the objective function only,
For the multiparametric case, solution algorithm the feasible region of (1) remains constant for all
that has been proposed is based upon branch and the fixed values of ¢ in [¢min,¢max]. And the sec-
bound fundamentals [1], [2]. While most of the ond feature is that, the optimal value of (1) for
work on single parametric problems has been re- ¢min _~ ~ _~ (/)max is piecewise linear, continuous,
viewed in the two excellent papers [5] and [7], and and concave on its finite domain. The solution is
has been borrowed here for the sake of complete- then approached by deriving valid upper and lower
ness, the work on multiparametric problems, the bounds, using the concavity property of the objec-
focus of this article, is quite recent and is described tive function value, and sharpening these bounds
in detail. It may be mentioned that while solution until they converge to the same value, as described
approaches for single parametric case are available next. Solving (1) for ¢ fixed at its endpoints ~min
for uncertainty in objective function coefficients and Cmax, gives upper bounds A B and B C respec-
or right-hand side of constraints, for the case of tively (see Fig. 1); and a linear interpolation, AC,


between the endpoints provides a lower bound to (2), a solution will remain optimal for some inter-
the solution. The region A B C within which the so- val of 0 and then suddenly another solution will
lution will lie is then reduced by solving (1) at ¢int, become op$imal, and remain so for the next inter-
the intersection point of two upper bounds AB and val (see Fig. 3). The problem thus reduces to solv-
BC. This results (see Fig. 2) in two smaller re- ing (2) at an end point, say 0min, and then finding
gions, A D E and EFC, within which the solution a point 0i at which the current solution becomes
will exist. This procedure is continued until the dif- infeasible. Solving (2) at Oi + c will give another
ference between upper and lower bounds becomes integer solution. This procedure is continued until
zero. we hit the other end point, 0max.
Fig. 1: Derivation of bounds (UB = upper bound; LB = lower bound).
Fig. 3: Step function nature of objective function value.
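The bounding scheme of Fig. 1 and Fig. 2 can be sketched as follows (Python). Instead of calling a MILP solver, each feasible (x, y) is represented here by an invented affine value function a + bφ, so that z(φ) is their pointwise minimum, which is piecewise linear and concave as stated above; the data and function names are assumptions made only for the illustration.

```python
# Sketch of the endpoint/intersection bounding scheme for a single parameter phi
# in the objective.  Each "solution" is abstracted as an affine value a + b*phi;
# z(phi) is their pointwise minimum (piecewise linear, concave).

solutions = [(6.0, 1.0), (4.0, 2.0), (9.0, -1.0), (2.0, 4.0)]  # invented (a, b) pairs

def solve(phi):
    """Return (optimal value, minimizing solution) at a fixed phi."""
    return min(((a + b * phi, (a, b)) for a, b in solutions), key=lambda t: t[0])

def explore(phi_lo, phi_hi, tol=1e-9):
    """Tighten the bounds on [phi_lo, phi_hi]; return the points where (1) was solved."""
    (z_lo, (a1, b1)), (z_hi, (a2, b2)) = solve(phi_lo), solve(phi_hi)
    if abs(b1 - b2) < tol:                 # same supporting line at both ends: done
        return [(phi_lo, z_lo), (phi_hi, z_hi)]
    phi_int = (a2 - a1) / (b1 - b2)        # intersection of the two upper-bound lines
    ub = a1 + b1 * phi_int                 # upper-bound value at phi_int (point B, Fig. 1)
    lb = z_lo + (z_hi - z_lo) * (phi_int - phi_lo) / (phi_hi - phi_lo)  # chord AC
    z_int, _ = solve(phi_int)              # exact value at phi_int
    assert lb <= z_int + tol               # the chord is a valid lower bound (concavity)
    if ub - z_int < tol:                   # upper and lower bounds have met
        return [(phi_lo, z_lo), (phi_int, z_int), (phi_hi, z_hi)]
    left = explore(phi_lo, phi_int, tol)   # two smaller regions, as in Fig. 2
    right = explore(phi_int, phi_hi, tol)
    return left + right[1:]

print(explore(0.0, 3.0))   # sample points of z(phi) on [0, 3]
```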
Fig. 2: Sharpening of bounds.

Consider a multiparametric mixed integer linear programming problem (mp-MILP) of the following form:

z(θ) = min_{x,y} c^T x + d^T y
s.t. Ax + Ey ≤ b + Fθ,   (3)
     Gθ ≤ g,
     x ∈ R^n,  y ∈ {0, 1}^l,  θ ∈ R^s,

where θ is a vector of uncertain parameters; F is
an (m × s) matrix, G is an (r × s) matrix, and g
Integer programming problem involving a single is a constant vector. Solving (3) implies obtaining
uncertain parameter on the right-hand side of the the optimal solution to (3) for every ~ that lies
constraints can be stated as follows: in E - {0" GO < g, 0 C RS). The algorithm for
the solution of (3) proposed in [1] is based upon
z(0) - m i n d Ty
y simultaneously using the concepts of
s.t. Ey < b + rO, (2) • branch and bound method for solving mixed
0min <_ 0 <_ 0max integer linear programming (MILP) prob-
y e {0, 1) l, lems (see, e.g., [9]); and,
• simplex algorithm for solving multiparamet-
where r is a scalar constant and 0 is a scalar un-
ric linear programming (mp-LP) problems
certain parameter bounded between ~min and 0max
respectively. For a special case of (2) when r > 0, [4].
it may be noted that as 0 is increased from 0min While a solution of (3) by relaxing the integrality
to 0max, the feasible region will enlarge, and hence condition on y (at the root node) represents a para-
the objective function value will decrease or re- metric lower bound, a solution where all the y vari-
main the same, i.e., z(Oi) >_ z(Oi+l) for Oi <_ Oi+l. ables are fixed (e.g., at a terminal node) represents
Further, since only integer variables are present in a parametric upper bound. The algorithm proceeds


from the root node (lower bound) towards terminal lution at an intermediate node, 2(0)/, valid in its
nodes (upper bound) by fixing y variables at the corresponding critical regions, CR/, is then ana-
intermediate nodes. The complete enumeration of lyzed, to decide whether to explore subnodes of
the tree is avoided by fathoming those intermedi- this intermediate node or not, by using the follow-
ate nodes which guarantee a suboptimal solution. ing fathoming criteria. A given space in any node
At the root node, by relaxing the integrality can be discarded if one of the following holds:
condition on y, i.e., considering y as a continuous • (infeasibility criterion) Problem (6) is infea-
variable bounded between 0 and 1, (3) is trans- sible in the given space.
formed to an mp-LP of the following form"
• (integrality criterion) An integer solution is
'~(0) - m i n c T x + d T~) found in the given space.
s.t. Ax + E~I < b + FO, • (dominance criterion) The solution of the
GO < g, (4) node is greater than the current upper bound
in the same space.
0_<~_<1,
If all the regions of a node are discarded the node
, xER n OER s
can be fathomed. While the first two fathoming
The solution of (4), given by linear parametric pro- criteria (Infeasibility and Integrality) are easy to
files, 2(0) i, valid in their corresponding critical re- apply, in order to apply the third one (dominance
gions, CR i, represents a parametric lower bound. criteria) we need a comparison procedure, which
Similarly, at a node where all y are fixed, y - ~', is described next.
(3) is transformed to an mp-LP of the following x2
form:
¢

~(0) -minc Tx+d T


x,y
/ /
s.t. Ax + E ~ <_ b + FO,
GO<_g, (5) d

- {0, 1} t,
1
x E Rn 0 E R s
k

The solution of (5), "~(0)i, valid in its correspond-


ing critical regions, ~-"~i, represents a parametric
x!
upper bound.
Starting from the root node, some of the y vari-
Fig. 4: Redundant constraints.
ables are systematically fixed (to 0 and 1) to gen-
erate intermediate nodes of the branch and bound The comparison procedure consists of two steps.
tree. At an intermediate node, where some y are In the first step, a region, C R int - C"R n CR,
fixed and some are relaxed, an mp-LP of the fol- where the solution of the intermediate node and
lowing form is formulated: the current upper bound are valid is defined. This
is achieved by removing the redundant constraints
2(0) - min cTx + djT ^yj + dT~k
x,y from the set of constraints which define CR and
s.t. Ax + Ej~j + Ek~tk < b + FO, CR (for a procedure to eliminate redundant con-
GO<_9, (6) straints see [3]); graphical interpretation of redun-
y"j - {0, 1}, dant constraints is given in Fig. 4, where C1 is a
strongly redundant constraint and C2 is a weakly
o<_~k_<l,
redundant constraint.
x E R n, 0 E R s,
The results of this redundancy test, which be-
where the subscripts j and k correspond to y that long to one of the following 4 cases, are then ana-
are fixed and y that are free, respectively. The so- lyzed as follows:

551
Multiparametric mixed integer linear programming

• (case 1; Fig. 5) All constraints from CR are • (case 4; Fig. 8) The problem is infeasible.
redundant. This implies that CR _~ CR, and This implies that two spaces are apart from
t h e r e f o r e C R int - C"R. each other and CR i n t - {0}.

02 02 int
k kXkkkkkkkkkkk gR CR = re)

I/ ', '
~ , •
,,~ , / C R
A = cRint

/ x CR

s •
I l l g l l g l i l l l *,

4 / / / 7 / / / / 1 1 / / / ~
....._

0L . ,

el
Fig. 5: Definition of cRint; Case 1.
Fig. 8" Definition of cRint; Case 4.
• (case 2; Fig. 6) All constraints from C~-R are
redundant. This implies that CR _D CR, and Once C R int has been defined, the second step
t h e r e f o r e C R int - CR. is to compare ~ to ~, so as to find which of the
02
two is lower. This is achieved by defining a new
% %%
• %) constraint"

2~.,. ~-~ =CR int zdiff(O) -- Z(O) -- Z'(O) ~ 0

, tS tt ,,,-..-~?
s t
2a~ca and checking for redundancy of this constraint in
• t ,

C R int. This redundancy test results in following 3


cases:
01
• (case 1; Fig. 9) The new constraint is re-
Fig. 6: Definition of cRint; Case 2. dundant. This implies that ~(O) _< ~(8) and
therefore the space must be kept for further
• (case 3; Fig. 7) Constraints from both re- analysis.
gions are nonredundant. This implies that
02
two spaces intersect with each other, and
CR__tin , ", . . 5 1 ~ , ~/, zdiff (e) ~ 0
C R int is given by the space delimited by the
nonredundant constraints.
)
02
CR int

eo ~,~

el

I •
qlg'gl~lg 0 •

Fig. 9: Compare ~(0)" ~'(0); Case 1.

CR
. . --~.~
• (case 2; Fig. 10) The problem is infeasible.
el
This implies that ~(0) >_ ~(0) and therefore
the space can be discarded from further anal-
Fig. 7" Definition of cRint; Case 3.
ysis.

552
Multiparametric mixed integer linear programming

zation: Parametric programming; Bounds


O2 CRint , ', ,, and solution vector estimates for paramet-
.<0 ric NLPs; Parametric optimization" Em-
beddings, path following and singularities;
Parametric linear programming: Cost sim-
plex algorithm; Parametric mixed integer
nonlinear optimization; Multi-objective in-
el teger linear programmingl Decomposition
techniques for MILP: Lagrangian relax-
ation; LCP: Pardalos-Rosen mixed inte-
Fig. 10: Compare ~(0)" ~'(0); Case 2.
ger formulation; Integer linear complemen-
• (case 3; Fig. 11) The new constraint is non-
tary problem; Integer programming: Cut-
redundant. This implies that ~(0) _< ~'(0) in
ting plane algorithms; Integer program-
ABCD, and therefore the rest of the space ming: Branch and cut algorithms; Integer
can be discarded from further analysis.
programming: Branch and bound methods;
diff Integer programming: Algebraic methods;
z (e).<0
Integer programming: Lagrangian relax-
ez II ,
ation; Integer programming duality; Time-
dependent traveling salesman problem; Set
covering, packing and partitioning prob-
• ."
. / // ~ C C .~.
t.
lems; Simplicial pivoting algorithms for in-
teger programming; Multi-objective mixed
integer programming; Mixed integer clas-
sification problems; Integer programming;
01 Stochastic integer programming: Continu-
ity, stability, rates of convergence; Stochas-
Fig. 11: Compare ~(0) • ~(0); Case 3. tic integer programs; Branch and price: In-
teger programming with column generation.
Based upon the above theoretical framework,
the steps of the algorithm can be summarized as References
[1] ACEVEDO, J., AND PISTIKOPOULOS, E.N.: 'A multi-
follows:
parametric programming approach for linear process
Set an upper bound of ~(0) - co. engineering problems under uncertainty', Industr. En-
Solve the fully relaxed prolJlem (4). gin. Chem. Res. 36 (1997), 717-728.
IF an integer solution is found in a critical re- [2] ACEVEDO, J., AND PISTIKOPOULOS, E.N.: 'An algo-
gion, THEN update the upper bound and dis- rithm for multiparametric mixed integer linear pro-
card the region from further analysis. gramming problems', Oper. Res. Lett. 24 (1999), 139-
Fix one of the y variables to 0 and 1 to create 148.
two new nodes. [3] GAL, T.: Postoptimal analyses, parametric program-
IF no new nodes can be generated, THEN stop. ming, and related topics, de Gruyter, 1995.
Solve the resulting problem (6). [4] GAL, T., AND NEDOMA, J.: 'Multiparametric linear
IF the problem is infeasible THEN go back to programming', Managem. Sci. 18 (1972), 406-422.
Step 3, [5] GEOFFRION, A.M., AND NAUSS, R.: 'Parametric and
ELSE compare the solution to the current up- postoptimality analysis in integer linear programming',
per bound. Managem. Sci. 23, no. 5 (1977), 453-466.
IF all regions from a node have been analyzed, [6] HOLM, S., AND KLEIN, D.: 'Three methods for postop-
THEN go to Step 3. timal analysis in integer linear programming', Math.
See also: Parametric global optimization: Program. Stud. 21 (1984), 97-109.
[7] JENKINS, L.: 'Parametric methods in integer linear pro-
Sensitivity; Multiparametric linear pro- gramming', Ann. Oper. Res. 27 (1990), 77-96.
gramming; Selfdual parametric method for [8] MARSTEN, R.E., AND MORIN, T.L.: 'Parametric in-
linear programs; Nondifferentiable optimi- teger programming: The right-hand side case', Ann.

553
Multiparametric mixed integer linear programming

Discrete Math. 1 (1977), 375-390. taneously refold to their unique, native structure
[9] NEMHAUSER, G.L., AND WOLSEY, L.A.: Integer and after denaturation. This implies that the forma-
combinatorial optimization, Wiley, 1988.
tion of the native structure is controlled primar-
[10] OHTAKE, Y., AND NISHIDA, N.: 'A branch-and-bound
algorithm for 0-1 parametric mixed-integer program- ily by the amino acid sequence. According to An-
ming', Oper. Res. Left. 4, no. 1 (1985), 41-45. finsen's hypothesis the native structure is in a
[11] PIPER, C.J., AND ZOLTNERS, A.A.: 'Some easy state of thermodynamic equilibrium correspond-
postoptimality analysis for zero-one programming', ing to the conformation with the lowest free en-
Managem. Sci. 22, no. 7 (1976), 759-765.
ergy. Through mathematical modeling of protein
[12] ROODMAN, G.M.: 'Postoptimality analysis in zero-one
programming by implicit enumeration', Naval Res. Lo-
interaction energies, the protein folding problem
gist. Quart. 19 (1972), 435-447. can be addressed as a con]ormational search for
[13] ROODMAN, G.M.: 'Postoptimality analysis in zero- the global minimum energy.
one programming by implicit enumeration: The mixed- There exists two fundamental problems associ-
integer case', Naval Res. Logist. Quart. 21 (1974), 595-
ated with protein folding in the context of a con-
607.
formational search. The first is the ability to cor-
Vivek Dua rectly model protein interactions using detailed
Imperial College
mathematical equations. The second is associated
London, U.K.
with searching the highly nonconvex energy hyper-
Efstratios N. Pistikopoulos
Imperial College surface that describes a given protein. This com-
London, U.K. plexity, coupled with an exponential growth in the
E-mail address: e . pistikopoulosOic, ac. uk number of local minima as the size of protein in-
MSC2000: 90C31, 90Cll creases, has become known as the multiple rain-
Key words and phrases: parametric bounds, branch and ima problem. There exists an obvious need for the
bound, comparison of parametric solutions. development of efficient global optimization tech-
niques. An efficient method which has been suc-
cessfully applied to detailed atomistic models of
protein folding is the c~BB [1], [2], [3], [17] global
MULTIPLE MINIMA PROBLEM IN PRO- optimization algorithm.
TEIN FOLDING: o BB GLOBAL OPTIMIZA-
TION APPROACH
M a t h e m a t i c a l D e s c r i p t i o n . Proteins are essen-
tially polymer chains composed of a predefined
M o t i v a t i o n . Proteins are arguably the most com- set of amino acid residues in which neighbor-
plex molecules in nature. This complexity arises ing residues are linked by peptidic bonds. Natu-
from an intricate balance of intra- and inter- rally occurring proteins consist of only 20 differ-
molecular interactions that define the native three- ent amino acid residues, and the form of the side
dimensional structure of the system, and subse- chain R (e.g., methyl, butyl, benzoic, etc.) defines
quently its biological functionality. The underly- the differences between these constituent groups.
ing goal of protein .folding research is to under- The chemical structure of a generic protein is il-
stand the formation of these native tertiary struc- lustrated in Fig. 1. The repeating unit - N C a C ' -
tures. Genetic engineering can be used to produce defines the backbone of the protein. The protein
proteins with specific amino acid sequences. The also possesses amino and carboxyl end groups, de-
next step involves developing the link between the
noted by EAmino and ECarboxyl, respectively..
primary protein sequence and the native struc-
The geometry of a protein can be fully described
ture. The ability to predict the folding of proteins
by assigning a three-dimensional coordinate vector
promises to have important practical and theoret-
ri:
ical ramifications, especially in the areas of medic-
inal and biophysical chemistry.
Experimental studies have shown that pro- ri ---- Yi •
teins, under native physiological conditions, spon- zi

554
Multiple minima problem in protein folding: a B B global optimization approach

These ri specify the position of each atom in the hedral angle between the normals of the planes
protein molecule. The bond vector between two formed by atoms C~_INiC ~ and NiC~C~ respec-
atoms (i, j) connected with a covalent bond is de- tively, is called ¢i, where i - 1 and i are two ad-
fined as: jacent amino acid residues. The angle defined by
the planes NiC~C~ and C~C~Ni+I, respectively,
rij -
I
xj - xi I
yj yi •
zj zi
The corresponding bond length is then equal to
is called ¢i, where i and i + 1 are two adjacent
amino acid residues. Also, wi is the dihedral angle
defined by the planes C~ C~Ni+l and CiNi+lCi+
, a 1.
The letter X is utilized to denote the dihedral an-
the Euclidean distance between these two atoms:
gles which are associated with the side groups Ri.
I~jl - X/(~j - ~ ) 2 + (y~ _ y~)2 + (zj - z~) 2 Finally, the letter 0 is used to name the dihedral
A covalent bond angle, Oijk, formed by the two ad- angles associated with the two end groups. These
jacent bond vectors rij and rjk can be computed conventions are illustrated in Fig. 3.
by the following formulas:
rij • rjk sin (Oijk) -- rij x rjk
cos (o#~) - Ir~jll~jkl' Ir~jll~jkl"
. [:.
Here, rij "rjk is the dot product of the bond vectors
rij and rjk and rij x rjk is the cross product. k
rjk k - - ,

EAmino Ecarboxyl
1

Fig. 1: Generic primary protein structure.


Fig. 2: Illustration of dihedral angle.
The dihedral angle Wijkl m e a s u r e s the relative
orientation of two adjacent covalent angles Oij k and
Ojkl. This angle is defined as the angle between the •. i-1 i
normals through the planes defined by atoms i, j, k
and j, k, l respectively, and can be calculated from i+l
the following relations:
H
cos (,,.,~jk~)- ('~j x ,'jk)" (,'~k × ,'k~) /
N\ / ....
C a
sin (~o~j~)- I~j × ~Ykll~jk × rk~l" /\
RH
An alternative to specifying the coordinate vec-
tor for all atoms in a protein molecule is to set
bond lengths, covalent bond angles and indepen-
dent dihedral angles. A common approximation is Fig. 3: Dihedral angle conventions.
to assume rigid bond lengths and bond angles so
that the dihedral angles can be used to fully char-
acterize the shape of the protein molecule. P o t e n t i a l E n e r g y M o d e l i n g . A number of em-
The names of the dihedral angles of a protein pirically based molecular mechanics models have
chain follow a standard nomenclature. The di- been developed for protein systems, including AM-

555
Multiple minima problem in protein folding: aBB global optimization approach

BER [24], CHARMM [7], E C E P P / 3 [19], GRO- replaced by a modified 10-12 Lennard-Jones type
MOS [11], MM3 [4]. These models, also known term:
as force fields, are typically expressed as summa-
tions of several potential energy components, with Ehbond -- £ij 5 \ Rij --6 ~ .
the mathematical form of individual energy terms
based on the phenomenological nature of that Finally, corrective torsional energies, Etor,
term. A general total potential energy equation which are represented by a three term Fourier se-
should include terms for bond stretching (Ebond), ries expansion, are also added:
angle bending (E angle), torsion (Etor) and non- E2
Etor - El(l" - cos ¢) + (1 - cos 2¢)
bonded (END) interactions: -Y 5-
E3
Epotential - Zbond -+-Eangle -+-Ztor -b Enb + - ~ (1 - cos 3¢).
When rigid body approximations are employed, Each term can be interpreted physically. The 1-
bond stretching and angle bending energies can be x (cos ¢) symmetry term accounts for those non-
neglected. For these force fields, torsion angles de- bonded interactions not included in general non-
fine a set of independent variables that effectively bonded terms. The 2-x (cos 2¢) symmetry term is
describe any protein conformation. This approxi- related to the interactions of orbitals, while the 3-x
mately reduces the number of variables by a factor (cos 3¢) symmetry term describes steric contribu-
of 3 over those force fields that use a Cartesian co- tions.
ordinate system to describe flexible molecular ge- Other specific potential energy terms may also
ometries.
be added to the general energy equation depend-
One example of a rigid body atomistic level po- ing on the exact protein sequence. For example,
tential energy model is the E C E P P / 3 force field. the formation of disulfide bridges can be enforced
In this case, the nonbonded energy terms, Enb, in- by adding a penalty term to constrain the values
clude electrostatic, Eelec, van der Waals, Evdw, and of particular atomic distances. Correction terms
hydrogen bonding, Ehbond, interactions. These en- have also been used to adjust conformational en-
ergies are calculated for those atoms that are sep- ergies according to the configurations of proline
arated by more than two atoms; that is, the atoms and hydroxyproline residues.
possess at least a 1-4 relationship. Electrostatic
energies, Eelec , are calculated as Coulombic forces S o l v a t i o n E n e r g y M o d e l i n g . In general, the en-
based on atomic point charges" ergetic description of a protein must also include
solvation effects. A theoretically simple approach
Eelec -- QiQi
eRij would be to explicitly surround the peptide with
Here, Qi and Qj represent the two point charges, solvent molecules and compute potential energy
while Rij equals the distance between these two contributions for intra-and inter-molecular inter-
points. The e term describes the dielectric nature actions. These explicit calculations tend to greatly
of the protein environment. increase the computational cost of the simulation.
General nonbonded van der Waals interactions, In addition, solvent configurations are not rigid,
Evdw, are modeled using a 6-12 Lennard-Jones po- so these calculations must consider an average
tential energy term, which consists of a repulsion solvent-peptide configuration, which is typically
and attraction term" generated by a number of Monte-Carlo (MC) or
molecular dynamics (MD) simulations [14]. There-
[ (Ri*j) 12 (Ri~) 6]
E aw- - 2 . fore, most simulations of this type are limited to
restricted conformational searches.
The energy minimum for a given atomic pair is de- An alternative way for effectively considering
scribed by the potential depth, eij, and position, average solvent effects is to use implicit solvation
Ri~. For those atomic pairs that may form a hy- models. One complication involves the solvent's
drogen bond, the 6-12 potential energy term is influence on electrostatic interaction energies be-

556
Multiple minima problem in protein folding: a B B global optimization approach

cause of the implicit relationship between dielec- ergy contributions can easily be added at every
tric effects and solvation. A simple solution has step of local minimizations.
been to modify the representation of the dielectric
term. In reality, however, the rigorous treatment P r o b l e m F o r m u l a t i o n . For protein folding, the
of electrostatic interactions involves the solution energy m i n i m i z a t i o n problem can be formulated as
of the Poisson-Boltzmann equation. a nonconvex, nonlinear global optimization prob-
Other simple and computationally feasible im- lem in which the energy, E, must be globally min-
plicit solvation models are based on empirical rep- imized with respect to the dihedral angles of the
resentations of the solvation energy. In these cases, protein:
the solvation energy of each functional group is re-
min E ( ¢i , ~)i , wi , xki , oN, OC)
lated to the interaction of the solvent with a hy-
dration shell for the particular group. The individ- subject to -~_<¢i_<
ual terms are then summed together to provide a -Tr_< ¢i _< 7r
total solvation energy for the system. These solva- -Tr _< ~i _< 7r
tion contributions can be described by the follow-
ing general equation"
-Tr _< 0 g _< 7r
N --Tr _< 0 c _< 7r.
Esolv Siai.
i--1 The index i - 1 , . . . , NRES defines the number of
residues, NRES, in the protein. In addition, k -
Typically, Si represents either the solvent- 1 , . . . , K i denotes the number of dihedral angles in
accessible surface area, Ai, or the solvent- the side chain of the ith residue, and j = 1 , . . . , j Y
accessible volume of hydration layer, VHSi, for the and j = 1 , . . . , J C indicates the indices of the
functional group, and ai is an empirically derived amino and carboxyl end groups, respectively. The
free energy density parameter. energy, E, represents the total potential energy
A number of algorithms have been developed for function, Epotential, plus the free energy of solva-
calculating solvent-accessible surface areas [8], [9], tion, Esolv. In most cases, this is the exact formula-
[22]. Although several of these are relatively effi- tion; that is, energetic and gradient contributions
cient, the appearance of discontinuities has been can be added at each step of the minimization.
one complication in considering solvent accessible However, in the case of surface-accessible hydra-
surface areas. In addition, a large number of pa- tion using the JRF parameters, the potential en-
rameterization strategies (JRF, OONS, WE, etc.) ergy function is minimized before adding the hy-
have been used to derive appropriate ai parame- dration energy contributions. In other words, gra-
ters [21], [23], [25]. In the case of the JRF parame- dient contributions from solvation are not consid-
ter set, discontinuities can be avoided because the ered.
surface-accessible solvation energies are only in- Even after reducing this optimization problem
cluded at local minimum conformations [23]. This to a function of internal variables, the multidimen-
is because the parameters were derived from low sional surface that describes the energy function
energy solvated configurations of actual tetrapep- possesses an astronomically large number of local
tides. minima. In addition, evaluation of the energy, es-
Several methods have also been developed for pecially with the addition of solvation, is compu-
calculating the hydration volumes and correspond- tationally expensive, which makes even local min-
ing free energy parameters [6], [12]. A recent and imization slow. A large number of techniques have
computationally inexpensive method, RRIGS, is been developed to search this nonconvex confor-
based on a Gaussian approximation for the volume mational space. Many methods employ stochastic
of a hydration layer [6]. This method also inher- search procedures, while others rely on simplifica-
ently avoids numerical problems associated with tions of the potential model and/or mathematical
possible discontinuities so that the solvation en- transformations. In addition, the use of statistical

557
Multiple minima problem in protein folding: aBB global optimization approach

and/or heuristic conformational information is of- NRES K~

ten required. In general, the major limitation is +


that there is no guarantee for convergence to the i=1 k = l
global minimum energy structure. A number of re- jN
cent reviews have focused on global optimization
issues for these systems [10], [20]. j=l
jC
The c~BB global optimization approach has
been extremely effective in identifying global rain- 0y)
j=l
imum energy conformations of peptides described
by detailed atomistic models. The development The c~ represent nonnegative parameters which
of this deterministic branch and bound method must be greater or equal to the negative one-half
was motivated by the need for an algorithm that of the minimum eigenvalue of the Hessian of E
could guarantee convergence to the global min- over the defined domain. The overall effect of these
imum of nonlinear optimization problems with terms is to overpower the nonconvexities of the
twice-differentiable functions. The application of original nonconvex terms by adding the value of 2~
this algorithm to the minimization of potential en- to the eigenvalues of the Hessian of E. The convex
ergy functions was first introduced for microclus- lower bounding functions, L, possess a number of
ters [16]. The algorithm has also been shown to be important properties which guarantee global con-
successful for isolated [5], [15], as well as solvated vergence [18]:
peptide systems [13]. i) L is a valid underestimator of E;
ii) L matches E at all corner points of the cur-
Global M i n i m i z a t i o n using c~BB. The aBB rent box constraints;
global optimization algorithm effectively brack-
iii) L is convex in the current box constraints;
ets the global minimum solution by developing
iv) the maximum separation between L and E
converging sequences of lower and upper bounds.
is bounded. This property ensures that fea-
These bounds are refined by iteratively partition-
sibility and convergence tolerances can be
ing the initial domain. Upper bounds on the global
reached for a finite size partition element;
minimum are obtained by local minimizations of
the original energy function, E. Lower bounds be- v) the underestimators L constructed over su-
long to the set of solutions of the convex lower persets of the current set are always less tight
bounding functions, which are constructed by aug- than the underestimator constructed over the
menting E with the addition of separable qua- current box constraints for every point within
dratic terms. By using ¢L, eL, w L, x k,L, 07,L, oC,L. the current box constraints.
and cU, cu, wU, , Xik'U, 0N'U, 0C'U to refer to lower Once solutions for the upper and lower bound-
and upper bounds on the corresponding dihedral ing problems have been established, the next step
angles, the lower bounding function, L, of the en- is to modify the problem for the next iteration.
ergy hypersurface can be expressed in the following This is accomplished by successively partitioning
manner: the initial domain into smaller subdomains. One
obvious strategy is to subdivide the original hyper-
L=E rectangle by bisecting the longest dimension. In
NRES order to ensure nondecreasing lower bounds, the
+ Z - ¢,) - ¢,) hyper-rectangle to be bisected is chosen by select-
i=1
ing the region which contains the infimum of the
NRES
+ _ _
minima of lower bounds. A nonincreasing sequence
for the upper bound is found by solving the non-
i=1
NRES convex problem locally and selecting it to be the
+ - - minimum over all the previously recorded upper
i=1 bounds. If the single minimum of L for any hyper-

558
Multiple minima problem in protein folding: aBB global optimization approach

rectangle is greater than the current upper bound, objective function gradient vector is below a
this hyper-rectangle can be discarded because the specified tolerance (kcal/mol/deg). The sec-
global minimum cannot be within this subdomain ond derivative matrix is also evaluated to ver-
(fathoming step). ify that the upper bound solution is a local
The computational requirement of the c~BB minimum.
algorithm depends on the number of variables 5) The hyper-rectangle with the current mini-
(global) on which branching occurs. Therefore, mum value for L is selected and partitioned
these global variables need to be chosen carefully. along one of the global variables.
Qualitatively, the branching variables should cor-
6) If the best upper and lower bounds are within
respond to those variables which substantially in-
an c tolerance the program will terminate,
fluence the nonconvexity of the surface and the
otherwise it will return to Step 2.
location of the global minimum. In terms of the
protein folding problem, it is generally accepted A novel approach has also been proposed for the
that the backbone dihedral angles (¢ and ¢) are initialization of the c~BB algorithm [5]. Specifically,
the most influential variables. Therefore, in larger an analysis of 98 proteins from the Brookhaven X-
problems, the global variable set should include ray data bank was used to develop dihedral angle
only the set of ¢ and ¢ variables. In this case, the distributions in the form of histograms from - ~ to
dihedral angles associated with the peptide bond for each dihedral angle of each of the naturally
(w) and the side chains (X) are treated as local occurring amino acids. Using this information, a
variables. set of reduced domains can be defined for every
dihedral angle of every residue in the peptide se-
A l g o r i t h m i c D e s c r i p t i o n . The basic steps of the quence. Overall initialization domains correspond
algorithm are as follows: to the Cartesian products of all the sub-domains of
individual residues in the protein. This approach
1) The initial best upper bound is set to an ar-
maintains the guarantee of global optimality over
bitrarily large value. The original domain is
the considered search space of the reduced do-
partitioned along one of the global variable
mains, and is deterministic in those subdomains
dimensions.
that possess convex underestimators. In addition,
2) A convex function L is constructed in each all variable bounds are expanded to the [-~, ~]
hyper-rectangle and minimized using a local when solving the upper bounding problem. There-
nonlinear solver, with function calls to po- fore, although the initial point of an upper bound-
tential and solvation models. If a solution is ing minimization is restricted to the search space
greater than the best upper bound the entire of the corresponding lower bounding problem, the
subregion can be fathomed, otherwise the so- solution may lie outside the original subdomain.
lution is stored.
EXAMPLE 1 Met-enkephalin (H-Tyr-Gly-Gly-
3) The local minima for L are used as ini- Phe-Met-OH) is an endogenous opioid pentapep-
tial starting points for local minimizations tide found in the human brain, pituitary, and
of the upper bounding function E in each peripheral tissues. Its biological function involves
hyper-rectangle. In solving the upper bound- a large variety of physiological processes, most no-
ing problems, all variable bounds are ex- tably the endogenous response to pain. The pep-
panded to (-Tr, 7r) domain. These solutions tide consists of 24 dihedral angles and a total of
are upper bounds on the global minimum so- 75 atoms, and has played the role of a benchmark
lution in each hyper-rectangle. molecular conformation problem. The energy hy-
4) The current best upper bound is updated to persurface is extremely complex with the number
be the minimum of those thus far stored. If of local minima estimated on the order of 1011 .
a new upper bound (from step 3) is selected, The unsolvated global minimum energy conforma-
a separate module is called to ensure that tion, which is efficiently located using the c~BB
the absolute value of each gradient in the algorithm, has been shown to exhibit a type II'

559
Multiple minima problem in protein folding: c~BB global optimization approach

/%bend along the N-C' peptidic bond of Gly 3 and


Phe4 [5]' as sh°wn in Fig" 4" ~ .~
'."~ii,
~ i],','~,~,~,O E , , ~. , ~ , ,

Fig. 6: Global minimum energy structure of


met-enkephalin using volume based hydration.
Fig. 4: Global minimum energy structure of unsolvated
D
met-enkephalin.
See also: Simulated annealing m e t h o d s in
protein folding; Packet annealing; Phase
p r o b l e m in X-ray crystallography: Shake
and bake approach; Global optimization in
protein folding; A d a p t i v e simulated anneal-
ing and its application to protein fold-
ing; Genetic algorithms; Molecular struc-
ture d e t e r m i n a t i o n : Convex global underes-
timation; Global optimization in L e n n a r d -
Jones and Morse clusters; P r o t e i n folding:
Generalized-ensemble algorithms; Monte-
Carlo s i m u l a t e d annealing in protein fold-
ing; Simulated annealing.

Fig. 5: Global minimum energy structure of References


met-enkephalin using area based hydration. [1] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS,
C.A.: 'A global optimization method, c~BB, for gen-
eral twice-differentiable N L P s - II. Implementation and
computational results', Computers Chem. Engin. 22
(1998), 1159-1179.
The algorithm has also successfully pre- [2] ADJIMAN, C.S., ANDROULAKIS,I.P., MARANAS, C.D.,
dicted global minimum energy structures of met- AND FLOUDAS, C.A.: 'A global optimization method,
enkephalin using both solvent-accessible surface ~BB, for process design', Computers Chem. Engin. 20
(1996), $419-$424.
area (JRF) and volume of hydration (RRIGS)
[3] ADJIMAN, C.S., DALLWIG, S., FLOUDAS, C.A., AND
models [13]. In both cases, extended structures NEUMAIER, A.: 'A global optimization method, c~BB,
were identified, which qualitatively agrees with ex- for general twice-differentiable N L P s - I. Theoretical
perimental results. However, differences in the role advances', Computers Chem. Engin. 22 (1998), 1137-
of nonbonded energies and the side chain confor- 1158.
[4] ALLINGER,N.L., YUH, Y.H., AND LII, J.-H.: 'Molecu-
mations have been identified. The global minimum
lar mechanics. The MM3 force field for hydrocarbons',
energy conformations of the surface area and vol- J. Amer. Chem. Soc. 111 (1989), 8551-8566.
ume of hydration models are shown in Fig. 5 and [5] ANDROuLAKIS, I.P., MARANAS, C.D., AND FLOUDAS,
Fig. 6, respectively. C.A.: 'Global minimum potential energy conforma-

560
Multiple objective dynamic programming

tions of oligopeptides', J. Global Optim. 11 (1997), 1- [20] NEUMAIER, A.: 'Molecular modeling of proteins and
34. mathematical prediction of protein structure', SIAM
[6] AUGSPURGER, J.D., AND SCHERAGA, H.A.: 'An ef- Rev. 39 (1997), 407-460.
ficient, differentiable hydration potential for peptides [21] OOBATAKE, M., NI~METHY, G., AND SCHERAGA,H.A..
and proteins', J. Comput. Chem. 17 (1996), 1549-1558. 'Accessible surface areas as a measure of the thermody-
[7] BROOKS, B.R., BRUCCOLERI, R.E., OLAFSON, B.D., namic parameters of hydration of peptides', Proc. Nat.
STATES, D.J., SWAMINATHAN,S., AND KARPLUS, M.: Acad. Sci. USA 84 (1987), 3086-3090.
'CHARMM: A program for macromolecular energy [22] PERROT, G., CHENG, B., GIBSON, K.D., VILA, J.,
minimization and dynamics calculations', J. Comput. PALMER, K.A., NAYEEM, A., MAIGRET, B., AND
Chem. 4 (1983), 187-217. SCHERAGA, H.A.: 'MSEED: A program for the rapid
Is] CONNOLLY, M.L.: 'Analytical molecular surface calcu- analytical determination of accessible surface areas and
lation', J. Appl. Crystallogr. 16 (1983), 548-558. their derivatives', J. Comput. Chem. 13 (1992), 1-11.
[9] EISENHABER, F., AND ARGOS, P.: 'Improved strategy [23] VILA, J., WILLIAMS,R.L., V/~SQUEZ, M., AND SCHER-
in analytic surface calculation for molecular systems: AGA, H.A.: 'Empirical solvation models can be used
Handling of singularities and computational efficiency', to differentiate native from near-native conformations
J. Comput. Chem. 14 (1993), 1272-1280. of bovine pancreatic trypsin inhibitor', PROTEINS:
[10] FLOUDAS, C.A., KLEPEIS, J.L., AND PARDALOS, Struct. Funct. Genet. 10 (1991), 199-218.
P.M.: 'Global optimization approaches in protein fold- [24] WEINER, S.J., KOLLMAN, P.A., CASE, D.A., SINGH,
ing and peptide docking': DIMACS, Vol. 47, Amer. U.C., GHIO, C., ALAGONA, G., PROFETA, S., AND
Math. Soc., 1998, pp. 141-171. WEINER, P.: 'A new force field for molecular mechan-
[11] GUNSTEREN, W.F. VAN, AND BERENDSEN, H.J.C.: ical simulation of nucleic acids and proteins', J. Amer.
'GROMOS', Groningen Mol. Sire. (1987). Chem. Soc. 106 (1984), 765-784.
[12] KANG, Y.K., NI~METHY, G., AND SCHERAGA, H.A.. [25] WESSON, L., AND EISENBERG, D.: 'Atomic solvation
'Free energies of hydration of solute molecules 1. Im- parameters applied to molecular dynamics of proteins
provement of hydration shell model by exact compu- in solution', Protein Sci. 1 (1992), 227-235.
tations of overlapping volumes', J. Phys. Chem. 91 John L. Klepeis
(1987), 4105-4109. Dept. Chemical Engin. Princeton Univ.
[13] KLEPEIS, J.L., ANDROULAKIS, I.P., IERAPETRITOU, Princeton, NJ 08544-5263, USA
M.G., AND FLOUDAS, C.A.: 'Predicting solvated pep- E-mail address" john©titan, princeton, edu
tide conformations via global minimization of energetic
Christodoulos A. Floudas
atom-to-atom interactions', Computers Chem. Engin.
Dept. Chemical Engin. Princeton Univ.
22 (1998), 765-788.
Princeton, NJ 08544-5263, USA
[14] KOLLMAN, P.A.: 'Free energy calculations: Applica-
E-mail address: floudas@titan, princeton, e d u
tions to chemical and biochemical phenomena', Chem.
Rev. 93 (1993), 2395-2417. MSC2000: 92C40, 65K10
[15] MARANAS, C.D., ANDROULAKIS, I.P., AND FLOUDAS, Key words and phrases: protein folding, multiple minima,
C.A.: 'A deterministic global optimization approach global optimization, c~BB.
for the protein folding problem': DIMACS, Vol. 23,
Amer. Math. Soc., 1996, pp. 133-150.
[16] MARANAS, C.D., AND FLOUDAS, C.A.: 'A global op- MULTIPLE OBJECTIVE DYNAMIC PRO-
timization approach for Lennard-Jones microclusters',
GRAMMING
J. Chem. Phys. 97 (1992), 7667-7677.
D y n a m i c p r o g r a m m i n g has b e e n an area of active
[17] MARANAS, C.D., AND FLOUDAS, C.A.: 'A determin-
istic global optimization approach for molecular struc- research since its i n t r o d u c t i o n by R. B e l l m a n [1].
ture determination', J. Chem. Phys. 100 (1994), 1247- More recently, w i t h the recognition t h a t m a n y ap-
1261. plied o p t i m i z a t i o n p r o b l e m s require more t h a n one
[lS] MARANAS, C.D., AND FLOUDAS, C.A.: 'Global objective, the s t u d y of m u l t i c r i t e r i a o p t i m i z a t i o n
minimum potential energy conformations of small
has b e c o m e a growing a r e a of research. I n c l u d e d
molecules', J. Global Optim. 4 (1994), 135-170.
[19] NI}METHY, G., GIBSON, K.D., PALMER, K.A., YOON, in this a r e a of m u l t i c r i t e r i a o p t i m i z a t i o n is the
C.N., PATERLINI, G., ZAGARI, A., RUMSEY, S., AND s t u d y of m u l t i p l e objective d y n a m i c p r o g r a m m i n g
SCHERAGA, H.A.: 'Energy parameters in polypeptides ( M O D P ) . M O D P was first used to replace multi-
10. Improved geometrical parameters and nonbonded ple objective linear p r o g r a m m i n g ( M O L P ) where
interactions for use in the ECEPP/3 algorithm with
it was not applicable, such as in p r o b l e m s w i t h dis-
application to proline containing peptides', J. Phys.
crete variables. M a n y of the techniques used are
Chem. 96 (1992), 6472-6484.
extensions of classical d y n a m i c p r o g r a m m i n g . T h e

561
Multiple objective dynamic programming

following is a discussion of some of the research from its destination, t. T h e n the algorithm is given
t h a t has been developed in the area of MODP. above.
Using multiple objective d y n a m i c p r o g r a m m i n g T h e resulting S k vectors give n o n d o m i n a t e d so-
to find the 'shortest' p a t h t h r o u g h a network lutions for the network, b u t m a y b e not all of them.
with constant costs is one of the more straight- T h e y solve an example in which the weights are
forward uses of MODP. Work has been done on not specified.
b o t h forward and backward M O D P in this area. A few years later, R. Hartley [6] proposed a sire-
First, we consider a general network containing ilar algorithm t h a t also uses backward M O D P to
a set of nodes N - { 1 , . . . , n } and a set of find all P a r e t o p a t h s from all nodes in the network
arcs A - { ( i 0 , i l ) , ( i 2 , i 3 ) , ( i 4 , i 5 ) , . . . } C N x g to a specified node. T h e algorithm is as follows:
which indicates connections between nodes. Each Let V0(i) = { c ~ , . . . , c ~ } for k = 0, 1 , . . . , and let
arc (i, j) has an associated cost vector, cij = vk(t) = {0,..., 0}.
(Cijl,..., Cijm) C R m. A p a t h from node io to ip Vk(i) -- eff [U{cij + Vk-l(j)" j E F(i)}] for i E
is the sequence of arcs P - {(i0, i l ) , . . . , (ip-1, ip)} N (i ¢ t) and k = 1 , 2 , . . . , where F(i) is the set
where the first node of each arc is the same as of nodes such t h a t (i, j) C A. T h e 'eft' operator
the terminal node of the preceding arc and each finds all n o n d o m i n a t e d vectors in the set. The as-
node in the p a t h is unique. Let Hi be the set of all sociated p a t h s must be h a n d l e d separately.
p a t h s from node 1 to node i. T h e cost to traverse H.W. Corley and I.D. Moon [4] used forward
a p a t h p in Hi is [c(p)] E(i,j)Ep[Cij]. A p a t h in
-

M O D P to find all n o n d o m i n a t e d paths from a


Hi is nondominated if there is no other p a t h p* specified node to all other nodes in a network
in Hi with [c(p*)]r _ [c(p)]r for r - 1 , . . . , m and with multiple constant costs. T h e y assumed t h a t
[c(p*)]r < [c(p)]r for some r e { 1 , . . . , m } . the network contains no loops and t h a t cij ¢
0 k~l. {0,...,0} for any (i, j) C A. Letting GI k) be the
1 Evaluate S~ for all nodes using S k = {cij +
set of vector costs of all P a r e t o paths from node 1
2 If k < N, set k - k + 1 and return to step 1; to node i containing k or fewer arcs, the algorithm
otherwise: follows:
3 For each nondominated solution at each 1 Set cii = (0,...,0), i = 1,...,n, and cij =
node determined in step 1 and for each (co,...,oo), i ~: j, if no arc exists from i to
r, r = 1,...,m, define T r as T r = j. Set k = 1 and let G~I)= {cli}, i= 1,...,n.
mini N,...,io ]~']~,cr,~j,-x, where i, is the origi- For i - 1,..., n, set G k+l) = Vmin Uj=l{cji +
nated node at stage n and I, is the set of nodes
3
that can be reached from node n. If G~k+x) -3 G"(k)
i , i - 1,...,n, stop, otherwise
4 Given weights W m E R~', compute the MIN-
go to step 4.
SUM as
If k - n - 1, stop. Else, k = k + 1 and go to step
2.
rr
r=l
V m i n is an operation t h a t computes the vector
H.G. Daellenbach and C.A. DeKluyver [5] gave costs of all n o n d o m i n a t e d p a t h s in a set of vec-
one of the earliest algorithms for backward M O D P tor costs. An a l g o r i t h m for V m i n is given in their
with constant costs, which finds n o n d o m i n a t e d paper.
paths from all nodes to the destination node. Their T h e following example uses the Corley-Moon al-
m e t h o d is basically an extension of the principle of gorithm to solve a d y n a m i c routing problem for the
optimality to a multicriteria context. T h e y state a network in Fig. 1.
principle of Pareto optimality of MODP: 'A non- Table 1 gives the results of the algorithm. The
d o m i n a t e d policy has the p r o p e r t y t h a t regardless resulting P a r e t o o p t i m a l p a t h s from node I to node
of how the process entered a given state, the re- 6 are {(1, 2), (2, 5), (5, 6)} a n d {(1, 3), (3, 5), (5, 6)}.
maining decisions must belong to a n o n d o m i n a t e d Using multiple objective d y n a m i c p r o g r a m m i n g
subpolicy.' Let S k be the n o n d o m i n a t e d vector to find the shortest p a t h t h r o u g h a network with
of objective values for a node i, exactly k links t i m e - d e p e n d e n t costs is considerably more compli-

562
Multiple objective dynamic programming

cated t h a n M O D P with constant costs. The mono- Find a time grid of discrete values S T =
tonicity assumptions necessary for the principle {t0,...,t0 + T}, to > 0 and compute [cij(t)]
of optimality in dynamic p r o g r a m m i n g can eas- for all t E ST and all (i, j) E A.
Modify [cij(t)] for all t E ST and all (i,j) E A
ily be broken when dealing with time-dependent
as follows:
costs. Reaching a node later may be less costly
[c,3(t)]' = ~[cij(t)] ift + [cij(t)]l <_to + T,
t h a n reaching it earlier. M.M. Kostreva and M.M.
[ c~ if t + [cij (t)], > to + T.
Wiecek [7] extended the work done by K.L. Cooke
and E. Halsey [3] on dynamic p r o g r a m m i n g with Find the initial array [{[Fi(t)(0)]}], i =
one time-dependent cost (travel time) to dynamic 1,... ,N, for all t E ST, where {[Fgd(t)(O)]} =
p r o g r a m m i n g with multiple time-dependent costs. {0}, and {[Fi(t)(°)]} = [Cigd(t)]' for i E N \ N d .
Find the arrays [{[Fi(t)(k)]}], i-- 1,...,N, for
This m e t h o d uses backward dynamic program-
all t E ST, for k -- 1, 2,... as follows:
ming on a discrete time grid to find all nondomi-
{[f,(t)(')]}
nated paths from every node in the network to the
destination node. = VMIN{[cij(t)]' + {[fj (t + [cij(t)]~)('-l)]}},
[6,31 i e y \ {Y~},
{[F,(t)(')]} = {0}.
E,-.~J/~...',
The sequence of sets {[Fi(t0)(k)]}, k = 1,2,...,
converges to the set {[Fi(to)]}, the set of non-
dominated vectors associated with the paths
that leave node i at time to and reach node N4.

The following example uses Algorithm One [7]


Fig. 1.
to solve a dynamic routing problem for the net-
work in Fig. 2. A grid of discrete values of time
S19 = {1, 2 , . . . , 20} for to = 1 is established.

Assume the discrete time grid ST = { t o , . . . , to +


T}, to > 0 and the cost functions [cij(t)]k > O, [2t,
l]//~22]
(i, j) E A, for all t E ST. T is the upper b o u n d
on total time to travel from any node in the net-
[t+l,31 [(t-5)"+t 2]
work to the destination node, Nd. Also assume
t h a t [cij(t)]l is the time to travel from node i to
Fig. 2.
node j when the arrival time at node i is time t. For
all i E g \ N d and all t E ST, define {[Fi(t)]} as Table 2 shows the initial array and the two
the set of n o n d o m i n a t e d vectors associated with subsequent arrays. So, the set {Eff(Ei(to))} of
the paths t h a t leave node i at time t and reach all n o n d o m i n a t e d paths t h a t leave node 1, 2,
node Nd and define {[Fi(t)(k)]} as the set of non- and 3 at time to - 1 are { ( 1 , 2 ) , ( 2 , 3 ) , ( 3 , 4 ) } ,
dominated vectors associated with the paths t h a t {(2, 3), (3, 4)}, and {(3, 4)}.
leave node i at time t and reach node Nd in at most Kostreva and Wiecek [7] also developed an al-
k + 1 links before time to + T, where k = 0, 1, .... gorithm which uses forward dynamic programming
The following is the principle of optimality used to find all n o n d o m i n a t e d p a t h s from an origin node
for this algorithm: 'A n o n d o m i n a t e d p a t h p, leav- to every other node in the network without using a
ing node i at time t E ST and reaching node N time grid. Thus, assume t is a continuous variable,
at or before time to + T, has the property t h a t for t > 0, and [cij(t)]l > 0. An assumption must be
each node j lying on this path, a s u b p a t h pt, t h a t made about the cost functions so t h a t the princi-
leaves node j at time tj E ~T, tj > t, and arrives at ple of optimality will hold for these networks: For
node N at or before time to + T, is nondominated.' any arc ( i , j ) E A and all t l , t 2 _~ 0, if tt _ t2,
The algorithm is as follows: then"

563
Multiple objective dynamic programming

k-1 k=2 k=3


r

(k)
G1

G~k) ~2~
)
( 4

i ( I(!)(:)i
(1o)
G~k) Vmin ~I (:) (00
a(:)
r ,

) V m i n , 11 (191)(1:)} (:1)(:)
I
3

Table 1.

a) tl + [cij(tl)]l <_ t2 + [cij(t2)]l, and leave the origin node at time t = 0 and lead to
node j. The algorithm is as listed above.
b) [cij(tl)]r ~ [Cij(t2)]r for all r C { 2 , . . . , m } . Another way to get around the monotonicity as-
Assuming the cost functions are monotone
sumption of dynamic p r o g r a m m i n g is to use gen-
increasing with respect to time satisfies this eralized dynamic p r o g r a m m i n g techniques. See [2]
assumption.
for a way to use generalized DP with a multicrite-
Find the initial vector {[a~°)]}, j = 1,...,N; rid preference function. Basically, generalized DP
where ([G~°)]} = {0} and {[G~°)]} -[clj(0)], uses a weaker principle of optimality than Bell-
j=2,...,N. man's famous version [1]. Generalized DP finds
Calculate the vectors {[G~k)]}, j = 1,..., N, for partial solutions that may lead to optimal solu-
k = 1, 2,..., as follows: tions even though locally they are not optimal so-
{[G}(h)(k)],l =,... ,Nj} lutions according to the preference function.
= VMIN{[G'~(tn)( k-l)] + [cij(tn)], In [2] generalized D P is applied to the multicri-
n-- 1,...,gi}, teria best p a t h problem. Assuming node 1 to be
j = 2,...,N, the origin and node N to be the destination, let H
be the set of all paths in the network. Let
{[G~(t~)(k)],l = 1}--- {0}.
P(j)= {pEH: il = 1 , i n = j }
{[GJ.k)]}, k = 1,2,..., converges to {[Gj]}, the
be the set of all paths from the origin to node j.
set of vector costs of all nondominated paths
which leave the origin node at time t = 0 and Let
lead to node j.
X ( j ) = {p C H: il = j, i n = N }
Assume that node 1 is the origin node. For nodes be the set of all paths from node j to the destina-
j - 2 , . . . , N , let [Gy(tu) (k)] be the vector cost of tion node. The vector cost along each arc is called
the nondominated p a t h u which is of at most k m E R m. A
an arc length vector, lij - (li~,... ,lij)
links leaving the origin node at time t = 0 and p a t h length function z : H --+ R m assigns a p a t h
leading to node j, where t u is the arrival time of length vector to every p a t h p E H where o is a
this p a t h at node j. Also, let {[G~k)]} be the set of binary operator on R m"
vector costs of all nondominated paths which are
of at most k links leaving the origin node at time
z(p) =/1,2 o . . . o li,-1,i,.
t = 0 and leading to node j, where Nj is the num- Thus, each different objective can have a differ-
ber of nondominated paths. Let {[Gj]} be the set ent binary operation. For example, distance would
of vector costs of all nondominated paths which have an additive binary operator and probabilities

564
Multiple objective dynamic programming

Time [{[F,(t)(°)]}] [{[F~(t)(~)]}] [{[F~(t)(~)]}]


1

2
~ :/1o)(°o)i i 2o)i
5 7 [

Lq
0

3 (~ ~
4 4 2 0 4 2 0

5 iO)o i i i i Zi
6 (~o~) (142/ ( 2 2 ) ( : ) (~o~) ( 1 4 2 ) ( X ) ( ~ )

7
(
oo oo
i
10 0
(=) (=)(~)(°0) (i)
oo oo 10 0
8

(:) (:)(:)(:) (=) (=)(=)(:)


Table 2" Sequence of arrays.

would have a multiplicative binary operator. Let These local preference relations are used to form
the weak principle of optimality. An optimal path
Z(j) - {z(p)" p E P ( j ) } must be composed of subpaths that can be part of
be the set of all length vectors of all paths from the an optimal path.
origin to node j. A multicriteria preference func- Unfortunately, in order to get these preference
tion u: R m ~ R is defined on the set of path relations one would have to complete all paths
length vectors. The objective is to maximize this from every node in the network. Since this is too
preference function. The monotonicity assumption computationally intense, the preference relations
says that for all z,z' E Z(j), u(z) < u(z') '.. are relaxed to the refining local preference rela-
u(z o ljk) <_ u(z'o ljk) for all j , k E S such that tions -~j where z -~j z' implies zpjz'. Using -~j
(j,k) C T. Unfortunately, with multi-objective avoids having to find the entire relation pj. Using
problems this assumption is easily violated. Gen- this relation means that a larger set of maximal
eralized DP tries to get around this monotonic- path length vectors will be kept by using pj than
ity assumption by having local preference relations if pj were used. A maximal path length vector is
defined as pj C Z(j) × Z(j): for z, z' C Z(j), a vector where there does not exist another vector
where zpjz ~ implies that any subpath from the ori- at that state that is strictly more preferred. Let
gin to node j whose length is z cannot be used
maxl(X, p ) - {x E X" 3x' C X" xpx' and x'px}.
in a path to produce a better overall path from
the origin to the destination node than using the The following are the equations of generalized DP:
subpath from the origin to node j whose length
is z ~. So, subpath length vector z ~ is more lo- f(1)- (zl},
cally preferred even though subpath length vector S(J) - m~xl (u(~,j)~A(f(i)o l~j) -~j)
z may be globally preferred, u(z') <_ u(z). So, for forj = 2 , . . . , N ,
z,z' E Z(j), zpjz' if and only if 3p' C X ( j ) such
that u(z o z(p)) <_ u(z' o z(p')) for all p E X(j). where { f (i) o lij } = {z o lij : z E f (i) }.

565
Multiple objective dynamic programming

When the monotonicity assumptions are satis- [4] CORLEY, H.W., AND MOON, I.D.: 'Shortest paths in
fied, the -~j relation can be replaced with the mul- networks with vector weights', J. Optim. Th. Appl. 46
(1985), 79-86.
ticriteria preference function, u, thus reducing to
[5] DAELLENBACH,H.G., AND DEKLUYVER, C.A.: 'Note
the conventional DP problem. However, when the on multiple objective dynamic programming', J. Oper.
monotonicity assumption does not hold the -~j re- Res. Soc. 31 (1980), 591-594.
lation must be defined by trying exploit any special [6] HARTLEY, R.: 'Vector optimal routing by dynamic pro-
structures of each individual problem. Also, using gramming', in P. SERAFINI (ed.): Mathematics of Mul-
tiobjective Optimization, 1984, pp. 215-224.
dynamic programming to find the entire Pareto
[7] KOSTREVA, M.M., AND WIECEK, M.M.: 'Time depen-
optimal set can be seen as another special case of dency in multiple objective dynamic programming', J.
generalized DP where Zk >_ Z~k for all k - 1 , . . . , m Math. Anal. Appl. 173 (1993), 289-307.
'.. z -~j z I (assuming minimization of each cri- Michael M. Kostreva
teria). Dept. Math. Sci. Clemson Univ.
The subject of multiple objective dynamic pro- Clemson, SC 29634-1907, USA
gramming has developed into a viable body of E-mail address: flstgla~clomson.odu
knowledge capable of providing solutions to ap- Laura C. Lancaster
plied problems in which trade-offs among objec- Dept. Math. Sci. Clemson Univ.
Clemson, SC 29634-1907, USA
tives is important. Among the multiple objective
E-mail address: 11ancas@math. clemson, e d u
techniques, it is distinctive in its ability to pro-
MSC2000: 90C39, 90C31
vide the entire Pareto optimal set. To gain such an
Key words and phrases: dynamic programming, multiple
advantage, one must be willing to perform com- objective programming, efficient set.
putationally intensive operations on large sets of
vectors.
See also: D y n a m i c p r o g r a m m i n g in cluster- MULTIPLE OBJECTIVE PROGRAMMING
ing; D y n a m i c p r o g r a m m i n g a n d N e w t o n ' s SUPPORT
m e t h o d in u n c o n s t r a i n e d o p t i m a l control; This article gives a brief introduction into multiple
Dynamic programming: Continuous-time objective programming support. We will overview
o p t i m a l control; H a m i l t o n - J a c o b i - B e l l m a n basic concepts, formulations, and principles of
equation; D y n a m i c p r o g r a m m i n g : Infinite solving multiple objective programming problems.
horizon p r o b l e m s , overview; D y n a m i c pro- To solve those problems requires the interven-
g r a m m i n g : Stochastic s h o r t e s t p a t h prob- tion of a decision-maker. That's why behavioral
lems; D y n a m i c p r o g r a m m i n g : D i s c o u n t e d assumptions play an important role in multiple
problems; D y n a m i c p r o g r a m m i n g : A v e r a g e objective programming. Which assumptions are
cost p e r stage problems; D y n a m i c p r o g r a m - made affects which kind of support is given to a
ming: U n d i s c o u n t e d problems; D y n a m i c decision maker. We will demonstrate how a free
p r o g r a m m i n g : I n v e n t o r y control; N e u r o - search type approach can be used to solve multi-
dynamic programming; Dynamic program- ple objective programming problems.
ming: O p t i m a l control applications.
I n t r o d u c t i o n . Before we can consider the con-
cept of multiple objective programming support
References
(MOPS), we have to first explain the concept of
[1] BELLMAN, R.E.: Dynamic programming, Prince-
ton Univ. Press, 1957. multiple criteria decision making (MCDM). Even
[2] CARRAWAY,R.L., MORIN, T.L., AND MOSKOWITZ, if there is a variation of different definitions, most
H.: 'Generalized dynamic programming for multicri- researchers working in the field might accept the
teria optimization', Europ. J. Oper. Res. 44 (1990), following general definition: Multiple Criteria De-
95-104. cision Making (MCDM) refers to the solving of
[3] COOKE, K.L., AND HALSEY, E.: 'The shortest route
decision and planning problems involving multi-
through a network with time-dependent internodal
transit times', J. Math. Anal. Appl. 14 (1966), 493- ple (generally conflicting) criteria. 'Solving' means
498. that a decision-maker (DM) will choose one 'rea-

566
Multiple objective programming support

sonable' alternative from among a set of available Fig. 1 and Fig. 2 and the numerical example we
ones. It is also meaningful to define that the choice consider a multiple objective linear programming
is irrevocable. For an MCDM problem it is typi- model in which all constraints and objectives are
cal that no unique solution for the problem ex- defined using linear functions.
ists. Therefore to find a solution for MCDM prob- The article consists of seven sections. In Sec-
lems requires the intervention of a decision-maker tion 2, we give a brief introduction to some foun-
(DM). In MCDM, the word 'reasonable' is replaced dations of multiple objective programming. How
by the words 'efficient/nondominated'. They will to generate potential 'reasonable' solutions for a
be defined later on. DM's evaluation is considered in Section 3, and
Actually the above definition is a strongly sim- in Section 4, we will review general principles to
plified description of the whole (multiple criteria) solve a multiple objective programming problem.
decision making process. In practice, MCDM prob- In Section 5, a multiple criteria decision support
lems are not often so well-structured, that they can system VIG is introduced, and a numerical exam-
be considered just as a choice problem. Before a de- ple is solved in Section 6. Concluding remarks are
cision problem is ready to be 'solved', the following given in Section 7.
questions require a lot of preliminary work: How
to structure the problem? How to find essential A Multiple Objective Programming Prob-
criteria? How to handle uncertainty? These ques- lem. A multiple objective programming (MOP)
tions are by no means outside the interest area of problem in a so-called criterion space can be de-
MCDM-researchers. The outranking method by B. fined as follows:
Roy [17] and the AHP (the analytic hierarchy pro-
'max' q (1)
cess) developed by T.L. Saaty [18] are examples of
s.t. q C Q,
the MCDM-methods, in which a lot of effort is de-
voted to problem structuring. Both methods are where set Q c R k is a so-called feasible region in
well known and widely used in practice. In both a k-dimensional criterion space R k. The set Q is
methods, the presence of multiple criteria is an es- of special interest. Most considerations in multi-
sential feature, but the structuring of a problem is ple objective programming are made in a criterion
an even more important part of the solution pro- space.
cess. Set Q may be convex/nonconvex,
When the term 'support' is used in connection bounded/unbounded, precisely known or un-
with MCDM, we may adopt a broad perspective known, consist of finite or infinite number of alter-
and refer with the term to all research associated natives, etc. When Q consists of a finite number of
with the relationship between the problem and the elements which are explicitly known in the begin-
decision-maker. In this article we take a narrower ning of the solution process, we have an important
perspective and focus on a v e r y essential support- class of problems which may be called e.g. (multi-
ing problem in multiple criteria decision making: ple criteria) evaluation problems. Sometimes those
How to assist a DM to find the 'best' solution from problems are referred to as discrete multiple cri.
among a set of available 'reasonable' alternatives, teria problems or selection problems (for a survey
when the alternatives are evaluated by using sev- see for example. [16]).
eral criteria? Available alternatives are assumed When the number of alternatives in Q is infi-
to be defined explicitly or implicitly by means of a nite and not countable, the alternatives are usu-
mathematical model. The term multiple objective ally defined using a mathematical model formula-
programming is usually used to refer to dealing tion, and the problem is called continuous. In this
with this kind of model. case we say that the alternatives are only implic-
The following considerations are general in the itly known. This kind of problem is referred as a
sense that usually it is not necessary to specify multiple criteria design problem (the terms 'evalu-
how the alternatives are defined. It is enough to tion' and 'design' are adopted from A. Arbel) or a
assume that they belong to set Q. However, in continuous multiple criteria problem. In this case,

567
Multiple objective programming support

the set Q is not specified directly, but by means of dominated) solutions is an acceptable and 'reason-
decision variables as usually done in single optimi- able' solution, unless we have no additional infor-
zation problems: mation about the DM's preference structure.
~Nondominated
max q-- f(x) - ( f l ( x ) , . . . ,fk(x)) (2)
s.t. x E X,

where X C R n is a feasible set and f" R n ~ R k.


The space R n is called a variable space (see Fig. 1).
The functions fi, i = 1 , . . . ,k, are objective func- *x, q~ ; v

tions. The feasible region Q can now be written as


Fig. 1: A variable, criterion, and value space.
Q={q:q=f(x),xEX}.
The MOP-problem has seldom a unique solu- Nondominated solutions are defined as follows:
tion, i.e. an optimal solution that simultaneously DEFINITION 1 In (1), q* E Q is nondominated if
maximizes all objectives. Conceptually the multi- and only if there does not exist another q E Q such
ple objective mathematical programming problem that q >_ q* and q ~ q*.
may be regarded as a value (utility) function max-
imization program: DEFINITION 2 In (1), q* E Q is weakly nondomi-
nated if and only if there does not exist another
max v(q) (3) q E Q such that q > q*. [::]
s.t. q E Q,
Correspondingly, efficient solutions are defined as
where v is a real-valued function, which is strictly
follows:
increasing in the criterion space and defined at
least in the feasible region Q. It is mapping the DEFINITION 3 In (2), x* E X is efficient if and
feasible region into a one-dimensional value space only if there does not exist another x E X such
(see Fig. 1). Function v specifies the DM's pref- that f (x) > f (x* ) and f (x) # f (x* ). [2
erence structure over the feasible region. However, DEFINITION 4 In (2), x* E X is weakly efficient
the key assumption in multiple objective program- if and only if there does not exist another x E X
ming is that v is unknown. Generally, if the value
such that f (x) > f (x*). [:]
function is estimated explicitly, the system is con-
sidered to be in the MAUT category, see for ex- The final ('best') solution q E Q of the problem
ample [7], (MAUT stands for multiple attribute (1) is called the most preferred solution. It is a so-
utility theory) and can then be solved without any lution preferred by the DM to all other solutions.
interaction of the DM. Typically, MAUT-problems At the conceptual level, we may think it is the so-
are not even classified under the MCDM-category. lution maximizing an (unknown) value function in
If the value function is implicit (assumed to exist problem (3). How to find it? That is the problem
but is otherwise unknown) or no assumption about we now proceed to consider.
the value function is made, the system is usually Unfortunately, the above characterization of the
classified under MCDM [2] or MOP. most preferred solution is not very operational, be-
Solutions of the MOP-problems are all those al- cause no system can enable the DM to simultane-
ternatives which can be the solutions of some value ously compare the final solution to all other solu-
function v: Q --+ R. Those solutions are called ef- tions with an aim to check if it is really the most
ficient or nondominated depending on the space preferred or not. It is also as difficult to maximize
where the alternatives are considered. The term a function we do not know. Some properties for a
nondominated is used in the criterion space and ef- good system are, for example, that it makes the
ficient in the variable space. (Some researchers use DM convinced that the final solution is the most
the term efficient to refer to efficient and nondom- preferred one, does not require too much time from
inated solutions without making any difference.) the DM to find the final solution, to give reliable
Any choice from among the set of efficient (non- enough information about alternatives, etc.. Even

568
Multiple objective programming support

if it is impossible to say which system provides If )~ > 0, then the solution vector x of (4) is effi-
the best support for a DM for his multiple criteria cient, but if we allow that A >_ 0, then the solu-
problem, all proper systems have to be able to rec- tion vector is weakly-efficient. (see, e.g. [21, p. 215;
ognize, generate and operate with nondominated 221]). Using the parameter set A - {A" /~ > 0}
solutions. To generate nondominated solutions for in the weighted-sums linear program we can com-
the DM's evaluation is thus one key issue in mul- pletely characterize the efficient set provided the
tiple objective programming. In the next section, constraint set is convex. However, A is an open set,
we will consider some principles. which causes difficulties in a mathematical optimi-
zation problem. If we use cl(A) - {~" )~ > 0} in-
Generating Nondominated Solutions. De- stead, the efficiency of x cannot be guaranteed any-
spite many variations among different methods of more. It is surely weakly-efficient, and not neces-
generating nondominated solutions, the ultimate sarily efficient. When the weighted-sums are used
principle is the same in all methods: a single ob- to specify a scalarizing function in multiple objec-
jective optimization problem is solved to generate tive linear program (MOLP) problems, the opti-
a new solution or solutions. The objective func- mal solution corresponding to nonextreme points
tion of this single objective problem may be called of X is never unique. The set of optimal solutions
a scalarizing function, according to [25]. It typi- always consists of at least one extreme point, or
cally has the original objectives and a set of pa- the solution is unbounded. In early methods, a
rameters as its arguments. The form of the scalar- common feature was to operate with weight vec-
izing function as well as what parameters are used tors A E R k, limiting considerations to efficient
depends on the assumptions made concerning the extreme points (see, e.g., [29]).
DM's preference structure and behavior. A Chebyshev-Type Scalarizing Function. Cur-
Two classes of parameters are widely used in rently, most solution methods are based on the
multiple objective optimization: use of a so-called Chebyshev-type scalarizing func-
1) weighting coefficients for objective functions; tion first proposed by A. Wierzbicki [25]. We will
and refer to this function by the term achievement
(scalarizing) function. The achievement (scalariz-
2) reference/aspiration/reservation levels for ing) function projects any given (feasible or infea-
objective function values. sible) point g C R k onto the set of nondominated
Based on those parameters, there exist several solutions. Point g is called a reference point, and
ways to specify a scalarizing function. An impor- its components represent the desired values of the
tant requirement is that this'function completely objective functions. These values are called aspi-
characterizes the set of nondominated solutions: ration levels.
The simplest form of achievement function is"
for each parameter value, all solution
vectors are nondominated, and for each s(g, q, w) - max gk -- qk, (5)
kCK Wk
nondominated criterion vector, there is
at least one parameter value, which pro- where w > 0 E R k is a (given) vector of weights,
duces that specific criterion vector as a g C R k, and q E Q - {f(x)" x c X}. By minimiz-
solution ing s(g, q, w) subject to q c Q, we find a weakly
nondominated solution vector q* (see, e.g. [25],
(see, for theoretical considerations, e.g. [26]).
[26]). However, if the solution is unique for the
A Linear Scalarizing Function. A classic method problem, then q* is nondominated. If g C R k is
to generate nondominated solutions is to use the feasible, then q* C Q, q* >_ g. To guarantee that
weighted-sums of objective functions, i.e. to use only nondominated (instead of weakly nondomi-
the following linear scalarizing function: nated) solutions will be generated, more compli-
cated forms for the achievement function have to
max { )~'f (x) " x e X } . (4) be used, for example"

569
Multiple objective programming support

k ative; otherwise it is nonnegative. It is zero, if an


s(g, q, w fl) - max [ gk - qk ] + p E ( g i - qi), aspiration level point is weakly-nondominated.
kEK Wk i:1

where p > 0. In practice, we cannot operate with


a definition 'any positive value'. We have to use a
pre-specified value for p. Another way is to use a
lexicographic formulation [10].
el ""
The applying of the scalarizing function (6) is I I I I : -'~
4 8 ql
easy, because given g E R k, the minimum of
s(g, v, w, p) is found by solving the following LP- Fig. 2: I l l u s t r a t i n g the p r o j e c t i o n of a feasible and an
problem: infeasible aspiration level point onto the n o n d o m i n a t e d
surface.
k
rain e + p E (gi - qi)
i=1 S o l v i n g M u l t i p l e O b j e c t i v e P r o b l e m s . Sev-
s.t. xCX
(7) eral dozen procedures and computer implementa-
tions have been developed from the 1970s onwards
e > g~ - q~ i - 1 k.
Wi to address both multiple criteria evaluation and
design problems. The multiple objective decision
Problem (7) can be further written as:
procedures always requires the intervention of a
k DM at some stage in the solution process. A pop,
rain e + p E (gi - qi) ular way to involve the DM in the solution process
i=1 is to use an interactive approach.
s.t. xEX (8) The specifics of these procedures vary, but they
q+ew-z--g have several common characteristics. For example,
z>O. at each iteration, a solution, or a set of solutions,
is generated for a DM's examination. As a result
To illustrate the use of the achievement scalariz- of the examination, the DM inputs information in
ing function, consider a two-criteria problem with the form of trade-offs, pairwise comparisons, as-
a feasible region having four extreme points (0, 0), piration levels, etc. (see [20] for a more detailed
(0, 3), (2, 3), (8, 0), as shown in Fig. 2. In Fig. 2, discussion). The responses are used to generate a
the thick solid lines describe the indifference curves presumably, improved solution. The ultimate goal
when p = 0 in the achievement scalarizing func- is to find the most preferred solution of the DM.
tion. The thin dotted lines stand for the case p :> 0. Which search technique and termination rule is
N o t e t h a t the line from (2, 3) to (8, 0) is nondom- used is heavily dependent on the underlying as-
inated and the line from (0,3) to (2,3) (exclud- sumptions postulated about the behavior of the
ing the point (2, 3)) is weakly-nondominated, but DM and the way in which these assumptions are
dominated. Let us assume that the DM first spec- implemented. In MCDM-research there is a grow-
ifies a feasible aspiration level point gl = (2, 1). ing interest in the behavioral realism of such as-
Using a weight vector w = [2, 1]~, the minimum sumptions.
value of the achievement scalarizing function ( - 1 ) Based on the role that the value function (3) is
is reached at a point v 1 = (4, 2) (cf. Fig. 2). Cor- supposed to play in the analysis, we can classify
respondingly, if an aspiration level point is infea- the assumptions into three categories:
sible, say g2 = (8,2), then the minimum of the
achievement scalarizing function (+1) is reached 1) Assume the existence of a value function v,
at point v 2 = (6, 1). When a feasible point domi- and assess it explicitly.
nates an aspiration level point, then the value of 2) Assume the existence of a stable value func-
the achievement scalarizing function is always neg- tion v, but do not attempt to assess it explic-

570
Multiple objective programming support

itly. Make assumptions of the general func- ing Chebyshev-type achievement scalarizing func-
tional form of the value function. tions as explained above. These functions can be
3) Do not assume the existence of a stable value controlled either by varying weights (keeping as-
function v, either explicitly, or implicitly. piration levels fixed) or by varying the aspiration
levels (keeping weights fixed). Instead of aspiration
The first assumption is adopted in multi- levels, some algorithms asks the DM to specify the
attribute utility or decision analysis (see, e.g. reservation levels for the criteria (see, e.g. [15]).
[7]). Interactive software implementing such ap- An achievement scalarizing function projects
proaches on personal computers exists. one aspiration (reservation) level point at a time
The second assumption was a basic paradigm onto the nondominated frontier. By parametrizing
used in interactive multiple criteria approaches in the function, it is possible to project the whole vec-
the 1970s. A classical example is the GDF-method tor onto the nondominated frontier as originally
[3]. DM's responses to specific questions were used proposed by [11]. The vector to be projected is
to guide the solution process towards an 'optimal' called a reference direction vector and the method
or 'most preferred' solution (in theory), assuming reference direction method, correspondingly. When
that the DM behaves according to some specific a direction is projected onto the nondominated
(but unknown) underlying value function (see for frontier, a curve traversing across the nondomi-
surveys, e.g. [5], [20], [21], and [24]). Interactive nated frontier is obtained. Then an interactive line
software that implements such systems for a com- search is performed along this curve. The idea en-
puter have often been developed by the authors of ables the DM to make a continuous search on the
the above procedures for experimental purposes. nondominated frontier. The corresponding mathe-
The approaches based on the assumption on matical model is a simple modification from the
'no stable value/utility function' typically operate original model (8) developed for projecting one
with a DM's aspiration levels regarding the objec- point:
tives on the feasible region. The aspiration levels r k
are projected via minimizing so called achievement
min e÷p~(gi-qi)
scalarizing functions (6) ([23], [25]). No specific i--1
behavioral assumptions e.g. transitivity are nec- s.t. x CX (9)
essary.
q + ew - z - g + tr,
In essence, this approach seeks to help the DM
z>_O,
more or less freely to search the set of efficient so-
lutions. Interactive software that implements such where t" 0 -+ ce and r E R k is a reference direc-
systems for a computer have been developed like tion. In the original approach, a reference direction
ADBASE [22], DIDAS [14], VIG [8], and VIMDA was specified as a vector starting from the current
[9]. For an excellent review of several interactive solution and passing through the aspiration levels.
multiple criteria procedures, see [21]. Other well- The DM was asked to give aspiration levels for
known books that provides a deeper background criteria.
and additional references especially in the field of The original reference direction approach has
multiple objective optimization include [I], [4], [5], been further developed into many directions. First,
[6], [19], [27] and [28]. [12] improved upon the original procedure by mak-
Multiple objective linear programming (MOLP) ing the specification of a reference direction dy-
is the most commonly studied problem in multiple namic. The dynamic version was called Pareto
criteria decision making (MCDM). Most solution race. In Pareto race, the DM can freely move in
methods are developed for this problem. any direction on the nondominated frontier he/she
likes, and no restrictive assumptions concerning
Example of a Decision Support System: the DM's behavior are made. Furthermore, the ob-
V I G . Today, many systems use aspiration level jectives and constraints are presented in a uniform
projections, where the projection is performed us- manner. Thus, their role can also be changed dur-

571
Multiple objective programming support

ing the search process. The method and its imple- An example of the Pareto race screen is given in
mentation is called Pareto race. The whole soft- Fig. 3. The screen is associated with the numerical
ware package consisting of Pareto race is called example described in the next section.
VIG.
In Pareto race, a reference direction r is deter-
mined by the system on the basis of preference in-
formation received from the DM. By pressing num- .... i : ........... ::: :::85.43:7 :: :/:i i::ii: ::: i:i: i ii:i~:::i ii~ii!::::: ili: ~ii:~:: ===+==========
=======
=!i!i:i!~i:~!:!:i:!:::i:i i::!ii:i ::!!ii.!:!:i

ber keys corresponding to the ordinal numbers of


the objectives, the DM expresses which objectives I . . . . . . . . . . . . +: ........ . . . .

he/she would like to improve and how strongly. In : ": :~: ::: : :~:+::+ : : : ::- : : : : : :;--::~:~:!: : i .:: ::;~:!!:i + !.i.: :~::::;: ::: ::::: ::::::::::::::::::::::::::::::::::::::::::
........ : : ........... i !:-:!+::~i~!:ii:i!:::i::! :

this way he/she implicitly specifies a reference di- ::Yb~es::: :!i: F2!Ge~(~:F4~Rei~::i:i!iFl0iE~{:: I! i?:::ii~:i:::iii:::i;::ii::::!i

rection. Fig. 3 shows the Pareto race interface for


Fig. 3: E x a m p l e P a r e t o race screen.
the search, embedded in the VIG software ([8]).
Thus Pareto race is a visual, dynamic, search Pareto race does not specify restrictive behav-
procedure for exploring the nondominated frontier ioral assumptions for a DM. He/she is free to make
of a multiple objective linear programming prob- a search on the nondominated surface, until he/she
lem. The user sees the objective function values believes that the solution found is his/her most
on a display in numeric form and as bar graphs, preferred one.
as he/she travels along the nondominated fron- Pareto race is only suitable for solving moder-
tier. The keyboard controls include an accelera- ate size problems. When the size of the problem
tor, gears, brakes, and a steering mechanism. The becomes large, computing time makes the interac-
search on the nondominated frontier is like driv- tive mode inconvenient. To solve large scale prob-
ing a car. The DM can, e.g., increase/decrease lems [13] proposed a method based on Pareto race.
the speed, make a turn and brake at any moment An (interactive) free search is performed to find
he/she likes. the most preferred direction. Based on the direc-
To implement those features, Pareto race uses tion, an nondominated curve can be generated in
certain control mechanisms, which are controlled a batch mode if desired.
by the following keys:
N u m e r i c a l I l l u s t r a t i o n s . For illustrative pur-
• (SPACE) BAR, an 'accelerator': Proceed in poses, we will consider the following production
the current direction at constant speed. planning problem, where a decision maker (DM)
• F1, 'gears (backward)': Increase speed in the tries to find the 'best' product-mix for three prod-
backward direction. ucts: Product 1, Product 2, and Product 3. The
production of these products requires the use
• F2, 'gears (forward)': Increase speed in the of one machine (mach. hours), man-power (man
forward direction. hours), and two critical materials (crit. mat. 1 and
• F3, 'fix" Use the current value of objective i crit. mat. 2). Selling the products results in profit
as the worst acceptable value. (profit). Assume that the DM describes his/her
decision problem as follows:
• F4, 'relax': Relax the 'bound' determined
Of course, I would like to make as much
with key F3.
profit as possible. Because it is difficult
• F5, 'brakes': Reduce speed. and quite expensive to obtain critical
materials, I would like to use them as
• F10, 'exit'.
little as possible, but never more than
• n u m , 'turn': Change the direction of motion I have presently in storage (96 units of
by increasing the component of the reference each). Only one machine is used to pro-
direction corresponding to the goal's ordinal duce the products. It operates without
number i ~ [1, k] pressed by DM. any problems for at least 9 hours. The

572
Multiple objective programming support

length of the regular working day is 10 depends entirely on his/her own preferences. Ac-
hours. People are willing to work over- tually, all sample solutions except solution II are
time which is costly and they are tired somehow consistent with his/her statement above.
the next day. Therefore, if possible, I In solution II, product 3 is excluded from the pro-
would like to avoid it. Finally, product 3 duction plan.
is very important to a major customer,
I i I ii] iii ! iv
and I cannot totally exclude it from the
Objectives:
production plan.
crit. mat. 1 91.46 94.50 93.79 90.00
The traditional single objective programming crit. mat. 2 85.44 88.00 89.15 84.62
considers the problem as a profit maximization profit 30.27 31.00 30.42 29.82
problem. The other 'requirements' are taken as product 3 0.23 0.00 0.50 0.44
, .

constraints. The multiple objective programming Constraints:


takes a 'softer' perspective. We may, for instance, mach. hours 9.00 9.00 9.00 9.00
consider the problem as a four objective problem. man hours 9.73 10.00 10.00 9.62
The DM would like to make as much profit as pos- Decision Variables:
sible, but simultaneously, he/she would like to use product 1 3.88 4.00 3.45 3.71
those two critical materials as little as possible, product 2 2.81 3.00 3.03 2.74
and in addition to maximize the use of product 3. product 3 0.23 0.00 0.50 0.44
Machine hours and man hours are considered as
Table 2: A sample of solutions for the multiple criteria
constraints, but during the search process the role
problem.
of constraints and objectives may also be changed,
if necessary.
We assume that the problem can be modeled Conclusion. In this article, we have provided an
as an MOLP-model. The coefficient matrix of the overview on multiple objective programming sup-
problem is given in Table 1. port. The emphasis was how to find the most pre-
ferred alternative from among a set of reasonable
Prod. 1 Prod. 2 Prod. 3
(nondominated) alternatives. This kind of the ap-
mach. hours 1.5 1 1.6
proach is unique for the multiple criteria decision
man hours 1 2 1
making. We have left other features like structur-
crit. mat, 1 9 19.5 7.5
ing the problem, finding relevant criteria etc. be-
crit. mat. 2 7 20 9
yond this presentation. They are important, but
profit 4 5 3
also relevant in the considerations of any decision
Table 1: The coefficient matrix of the production planning support system.
problem.
See also" M u l t i - o b j e c t i v e o p t i m i z a t i o n :
Thus, we have the following multiple objective P a r e t o optimal solutions, properties; Multi-
linear programming model: objective optimization: Interactive meth-
crit. mat. 1: 9P1 + 19.5P2 + 7.5P3 --+ min ods for p r e f e r e n c e value functions; M u l t i -
crit. mat. 2: 7P1 + 20P~. + 9P3 -+ min objective optimization: L a g r a n g e dual-
profit: 4P1 + 5/:'2 + 3P3 -+ max ity; M u l t i - o b j e c t i v e o p t i m i z a t i o n " I n t e r a c -
product 3: P3 --+ max
t i o n of d e s i g n a n d control; O u t r a n k i n g
subject to" methods; Preference disaggregation; Fuzzy
mach. hours: 1.hP1 + P2 + 1.6P3 <9 m u l t i - o b j e c t i v e linear p r o g r a m m i n g ; M u l t i -
man hours: P1 +' 2P2 + P3 <10 o b j e c t i v e o p t i m i z a t i o n a n d decision sup-
The problem has no unique solution. Using the p o r t s y s t e m s ; P r e f e r e n c e d i s a g g r e g a t i o n ap-
Pareto race (see Fig. 3) or any other software de- proach: Basic features~ e x a m p l e s f r o m fi-
veloped for multiple objective programming en- n a n c i a l decision m a k i n g ; P r e f e r e n c e m o d e l -
ables a DM to search nondominated solutions. ing; M u l t i - o b j e c t i v e i n t e g e r linear p r o g r a m -
Which solution he/she will choose as a final one ming; M u l t i - o b j e c t i v e c o m b i n a t o r i a l optio

573
Multiple objective programming support

mization; Bi-objective assignment problem; A. WIERZBICKI (eds.): Aspiration Based Decision Sup-
Estimating data for multicriteria decision port Systems, Springer, 1989, pp. 21-47.
making problems: Optimization techniques; [15] MICHALOWSKI, W., AND SZAPIRO, T.: 'A bi-reference
procedure for interactive multiple criteria program-
Multicriteria sorting methods; Financial ap-
ming', Oper. Res. 40 (1992), 247-258.
plications of multicriteria analysis; Portfo- [16] OLSON, D.: Decision aids .for selection problems, Ser.
lio s e l e c t i o n and multicriteria analysis; De- Oper. Res. Springer, 1996.
cision support systems with multiple crite- [17] RoY, B.: 'How outranking relation helps multiple crite-
ria. ria decision making', in J. COCHRANEAND M. ZELENY
(eds.): Multiple Criteria Decision Making, Univ. South
Carolina Press, 1973, pp. 179-201.
References [18] SAATY, T.: The analytic hierarchy process, McGraw-
[1] COHON, J.: Multiobjective programming and planning, Hill, 1980.
Acad. Press, 1978. [19] SAWARAGI, Y., NAKAYAMA,H., AND TANINO, T.: The-
[2] DYER, J., FISHBURN, P., WALLENIUS, J., AND ZIONTS, ory of multiobjective optimization, Acad. Press, 1985.
S.: 'Multiple criteria decision making, multiattribute [20] SHIN, W., AND RAVINDRAN, A.: 'Interactive multiple
utility theory- The next ten years', Managem. Sci. 38 objective optimization: Survey I - Continuous case',
( 1992 ), 645-654. Comput. Oper. Res. 18 (1991), 97-114.
[3] GEOFFRION, A., DYER, J., AND FEINBERC, A.: 'An [21] STEUER, R.E.: Multiple criteria optimization: Theory,
interactive approach for multi-criterion optimization, computation, and application, Wiley, 1986.
with an application to the operation of an academic [22] STEUER, R.: Manual for the ADBASE multiple objec-
department', Managem. Sci. 19 (1972), 357-368. tive linear programming package, Dept. Management
[4] HAIMES, Y., TARVAINEN, K., SHIMA, T., AND THA- Sci., Univ. Georgia, 1992.
DATHIL, J.: Hierarchical multiobjective analysis of [23] STEUER, R., AND CHOO, E.-U.: 'An interactive
large-scale systems, Hemisphere, 1990. weighted Tchebycheff procedure for multiple objective
[5] HWANG, C., AND MASUD, A.: Multiple objective de- programming', Math. Program. 26 (1983), 326-344.
cision making - Methods and applications: A state-of- [24] WHITE, D.: 'A bibliography on the applications of
the-art survey, Springer, 1979. mathematical programming multiple-objective meth-
[6] IGNIZIO, J.: Goal programming and extensions, D.C. ods', J. Oper. Res. Soc. 41 (1990), 669-691.
Heath, 1976. [25] WIERZBICKI, A.: 'The use of reference objectives
[7] KEENEY, R.L., AND RAIFFA, H.: Decisions with mul- in multiobjective optimization', in G. FANDEL AND
tiple objectives: Preferences and value tradeoffs, Wiley, T. CoAL (eds.): Multiple Objective Decision Making,
1976. Theory and Application, Springer, 1980.
[8] KORHONEN, P.: ' V I G - A visual interactive support [26] WIERZBICKI, i . : 'On the completeness and construc-
system for multiple criteria decision making', Belgian tiveness of parametric characterizations to vector opti-
J. Oper. Res., Statist. and Computer Sci. 27 (1987), mization problems', OR Spektrum 8 (1986), 73-87.
3-15. [27] Yu, P.L.: Multiple criteria decision making: Concepts,
[9] KORHONEN, P.: 'A visual reference direction approach techniques, and extensions, Plenum, 1985.
to solving discrete multiple criteria problems', Europ. [28] ZELENY, M.: Multiple criteria decision making, Mc-
J. Oper. Res. 34 (1988), 152-159. Graw-Hill, 1982.
[10] KORHONEN, P., AND HALME, M.: 'Using lexicographic [29] ZIONTS, S., AND WALLENIUS, J.: 'An interactive pro-
parametric programming for searching a nondominated gramming method for solving the multiple criteria
set in multiple objective linear programming', J. Multi- problem', Managem. Sci. 22 (1976), 652-663.
Criteria Decision Anal. 5 (1996), 291-300.
[11] KORHONEN, P., AND LAAKSO, J.: 'A visual interac- Pekka Korhonen
tive method for solving the multiple criteria problem', Internat. Inst. Applied Systems Analysis
Europ. J. Oper. Res. 24 (1986), 277-287. A-2361 Laxenburg, Austria
[12] KORHONEN, P., AND WALLENIUS, J.: 'A Pareto race', Helsinki School Economics and Business Adm.
Naval Res. Logist. 35 (1988), 615-623. Runeberginkatu 14-16
[13] KORHoNEN, P., WALLENIUS, J., AND ZIONTS, S.: 00100 Helsinki, Finland
'A computer graphics-based decision support system E-mail address: korhonon~iiasa, ac. at
for multiple objective linear programming', Europ. J.
Oper. Res. 60 (1992), 280-286. MSC 2000:90C29
[14] LEWANDOWSKI, A., KREGLEWSKI, T., ROGOWSKI, Key words and phrases: multiple criteria decision making,
W., AND WIERZBICKI, A.: 'Decision support sys- multiple objective programming, multiple objective pro-
tems of DIDAS family (Dynamic Interactive Deci- gramming support, scalarizing function, value function.
sion Analysis and Support', in A. LEWANDOWSKIAND

574
Multiplicative programming

MULTIPLICATIVE PROGRAMMING where S - {x E R n" c [ x + cio >_ O, i - 1,2}.


Multiplicative programming refers to a class of op- While (2) belongs to multi-extremal global optimi-
timization problems containing products of real- zation [6] and is known to be NP-hard [11] (cf. also
valued functions in the objective and/or in the con- C o m p l e x i t y classes in o p t i m i z a t i o n ; C o m p u -
straints. A product of convex functions is called a t a t i o n a l c o m p l e x i t y t h e o r y ) , problem (3) can
convex multiplicative ]unction; similar definitions be solved using a standard convex minimization
hold for concave and linear multiplicative func- technique because maximizing f ( x ) amounts to
tions. Multiplicative functions appear in various minimizing a convex f u n c t i o n log(c~x + c10) -
areas, including microeconomics [4], VLSI chip de- log(c~x + c20). For the same reason as (3), cer-
sign [10] and modular design [2]. Especially in mul- tain linear programs with additional linear multi-
tiple objective decision making, they play impor- plicative constraints, e.g. the modular design prob-
tant roles [3]. A typical example is a bond port- lem with xiYj ~_ bij [2], can be handled within the
folio optimization studied in [7], where a num- framework of convex programming, if xi, yj ~_ O.
ber of performance indices such as average coupon A generalization of (1) is a convex multiplica-
rate, average terminal yields and average length rive program, which minimizes a product of sev-
of maturity associated with a portfolio (a bundle eral convex functions fi(x), i - 1,... ,p, over a
of assets) are to be optimized (either minimized compact convex set D C R n"
or maximized) subject to a number of constraints. P
One handy approach to simultaneously optimizing min f ( x ) - I ~ f~(x )
multiple objectives without a common scale is to
(4)
optimize the geometric mean, or equivalently the s.t. xED.
product of these objectives. Thus, we are led to In most of the existing solutions to (4), the con-
consider a multiplicative programming problem. vex functions fi are assumed to be nonnegative-
The simplest subclass of multiplicative pro- valued on D. When fi(x ~) - 0 for some i and
gramming problems is a linear multiplicative pro- for some x ~ E D, the minimum value of (4) is
gram, which is a quadratic program of minimizing zero; and x ~ is a globally optimal solution. We may
a product of two affine functions c~x+cl0, i - 1, 2, therefore assume for each i that fi(x) > 0 for all
over a polytope D C Rn: x E D. If f is a concave multiplicative function
instead of a convex one, the problem is equiva-
min f ( x ) -- (c~x + Cl0)(c2Tx -t-- C20)
(1) lent to a concave minimization problem because
s.t. xED.
log f ( x ) - ~ i P l log fi(x) is concave. The convex
This problem was first studied by K. Swarup [13] multiplicative program (4) itself can also be trans-
many years ago, but had attracted little attention formed into a concave minimization problem (cf.
until the late 1980s when an intensive research was C o n c a v e p r o g r a m m i n g ) , though f is not a con-
undertaken [8], [12], [14]. In general, the objective cave function. For example, introducing additional
function f is indefinite; it is quasiconcave on a re- variables Yi, i - 1,... ,p, we have an equivalent
gion where the signs of c~x + cios are the same, problem:
but quasiconvex on a region where the signs are p
different [1], [8]. Therefore, to solve (1), we need min ~ log yi
to solve a quasiconcave minimization problem: i--1
s.t. x E D (5)
min f(x) (2) fi(x) < y i , i-l,...,p,
s.t. xEDNS, y>_O.
and a quasiconcave maximization problem: The number p of fis is often very small in com-
parison with the dimension n of x; e.g. five or so
max f(x) (3)
in applications to multiple objective optimization.
s.t. xE DMS, Owing to this low-rank nonconvexity [9], problem

575
Multiplicative programming

(5) can be solved far more efficiently than the usual [13] SWARUP,K.: 'Programming with indefinite quadratic
concave minimization problem of the same size. function with linear constraints', Cahiers CERO 8
(1966), 132-136.
In addition to (1) and (4), there are a
[14] THOAI, N.V.: 'A global optimization approach for solv-
number of studies on problems with gener- ing the convex multiplicative programming problem',
alized convex multiplicative ]unctions
of the J. Global Optim. I (1991), 341-357.
forms f ( x ) - 1-Iip__lfi(x)+ g ( x ) a n d f ( x ) = Takahito Kuno
p
Y~i=l f2i-1 (x)f2i(x)+g(x), where the fis and g are Univ. Tsukuba
convex functions. These are all nonconvex mini- Ibaraki, Japan
mization problems, each of which has an enormous E-mail address: takahitoOis, tsukuba, ac. jp
number of local minima. Nevertheless, algorithms
MSC2000: 90C26, 90C31
developed in the 1990s can locate a globally opti- Key words and phrases: optimization, nonconvex minimiza-
mal solution in a reasonable amount of time, by tion, low-rank nonconvexity.
exploiting special structures of f such as low-rank
nonconvexity. A comprehensive review of the al-
gorithms are given by H. Konno and T. Kuno in
MULTISTAGE STOCHASTIC PROGRAM-
[5] MING: BARYCENTRIC APPROXIMATION
See also: G l o b a l o p t i m i z a t i o n in m u l t i p l i c a - Many problems in finance, economics and other
tive p r o g r a m m i n g ; L i n e a r p r o g r a m m i n g ;
applications require that decisions xt E R m are
Multiparametric linear programming; Para-
made periodically over time, depending on obser-
metric linear programming: Cost simplex
vations of uncertain data (~t,~t) in future peri-
algorithm.
ods t = 1 , . . . , T . Here, it is distinguished be-
References tween random data ~?t C Ot C R Kt that influ-
[1] AVRIEL, M., DIEWERT, W.E., SCHAIBLE, S., AND ence prices in the objective function and random
ZANG, I.: Generalized concavity, Plenum, 1988. data ~t C Et C R L` that affect the demand on the
[2] EVANS, D.H.: 'Modular design: A special case in non- right-hand side of constraints in an optimization
linear programming', Oper. Res. 11 (1963), 637-647.
problem.
[3] GEOFFRION, M.: 'Solving bicriterion mathematical
programs', Oper. Res. 15 (1967), 39-54. Once an observation (Ut, ~t) becomes available,
[4] HENDERSON, J.M., AND QUANDT, R.E.: Microeco- the decision maker has to determine a policy
nomic theory, McGraw-Hill, 1971. xt that minimizes the costs flt(xt-l,xt, r]t) in t
[5] HORST, R., AND PARDALOS, P.M.: Handbook of global plus the expected costs in the subsequent peri-
optimization, Kluwer Acad. Publ., 1995. ods t + 1 , . . . , T , subject to a set of constraints
[6] HORST, R., AND TUY, H.: Global optimization: deter-
f t ( x t - l , x t ) ~_ h(~t). Both the objective function
ministic approaches, second ed., Springer, 1993.
[7] KONNO, H., AND INORI, M.: 'Bond portfolio optimiza- and the constraints may depend on the sequences
tion by bilinear fractional programming', J. Oper. Res. of observations rlt - ( r l l , . . . , tit), ~ t = (~1,... ,~t)
Japan 32 (1989), 143-158. up to t and earlier decisions x t-1 - ( x 0 , . . . , xt-1).
[8] KONNO, H., AND KUNO, T.: 'Linear multiplicative pro- Obviously, an action xt must be selected after
gramming', Math. Program. 56 (1992), 51-64.
(tit, ~t) is observed but before the future outcomes
[9] KONNO, H., THACH, P.T., AND Wuv, H.: Optimiza-
tion on low rank nonconvex structures, Kluwer Acad. r/t+l,...,rlT and ~ t + l , . . . , ~ T are known, i.e. the
Publ., 1997. decision is based only on information available at
[10] MALING, K., MUELLER, S.H., AND HELLER, W.R.: time t. Hence, one obtains a sequence of decisions
'On finding most optimal rectangular package plans': with the property x0, Xl (rl 1, ~ 1 ) , . . . , XT(~T, ~T),
Proc. 19th Design Automation Conf., 1982, pp. 663- called nonanticipativity. This results in a multi-
670.
[11] MATSUI, T.: 'NP-hardness of linear multiplicative pro-
stage stochastic program, which may be written
gramming and related problems', J. Global Optim. 9 in its dynamic representation as a series of nested
(1996), 113-119. two-stage programs (with C T + I ( ' ) : = 0, see [4]):
[12] PARDALOS, P.M.: 'Polynomial time algorithms for
some classes of constrained nonconvex quadratic prob- Ct(xt-1, ~?t,~t) ._ min ~pt(x t-l, xt, ~7t) (1)
lems', Optim. 21 (1990), 843-853. L

576
Multistage stochastic programming: Barycentric approximation

problem is, and if the set of scenarios can be im-


+ f ct+l(xt-1 , xt, r]t , r/t+1, ~ t, ~t+l) dPt+l} ,
proved w.r.t, the accuracy.
t - 0 , . . . ,T, For convex optimization problems where the
random data are decomposable in two groups, one
where the expectation is taken w.r.t, the probabil- that determines the cost function and the second
ity measure Pt+l(rlt+l, ~t+ll~ t, ~t) of the joint dis- one affecting the demand, it can be shown (see [4]
tribution of (~t+l,~t+l), subject to for details) that the value function (1) is a saddle
function for all t - 1,... , T under the following
.f,(x '-1, xt) < h(~t), (2)
conditions"
xt>_O.
i) Pt(') is concave in ~t,
In case of discrete distributions, it is well known ii) the left-hand sides of the constraints are de-
that one can immediately transform the stochas- terministic, and
tic multistage program given by (1) and (2) into a iii) the distribution function of Pt(.}~t-l,~ t-l)
(large) deterministic equivalent problem which can depends linearly on the past.
be solved by standard optimization tools, possibly
Then, (1) is concave in qt and convex in (xt, ~t).
combined with decomposition techniques to exploit
The situation where assumptions i)-iii) are ful-
the special structure of the problem (see e.g. [1],
filled is called the entire convex case.
[9], [10]). However, if the distribution is continuous
This underlying saddle property of the value
with some density function, it is in general impossi-
function motivates the application of barycentric
ble to do the integration in (1) exactly. One way to
approximation which derives two scenario trees ,4 u
overcome this difficulty is to approximate the (con-
and jt I. The associated approximate deterministic
tinuous) probability measure P, by a discrete one
programs provide upper and lower bounds to the
Qt. In MSP, this is usually done by constructing a
original problem. In this sense, barycentric approx-
scenario tree which can be illustrated as follows:
imation is a generalization of the inequalities due
to H.P. Edmundson [2] and A. Madansky [8] (see
--1, ~t--1) e.g. S t o c h a s t i c p r o g r a m s w i t h r e c o u r s e : U p -
p e r b o u n d s ) and J.L. Jensen [6] that is applica-
ble to saddle functions of correlated random data.
T) Here, it is assumed that Ot C R K* and ~t C R L~
are regular simplices whose vertices are denoted
Fig. 1. by u~,t, vt - 0,... ,Kt, and vt, t, #t - 0,... ,Lt.
Together with the associated scenario probabil- Both Ot and St may depend on prior observations
ities, this tree is defined formally as (~?t-1, ~t-1) although this is not stressed in the no-
tation for simplicity.
.,4._ {(rlT ~T). (~t,~t) E.At(r]t-l,~t-1) } To illustrate the way the discretization is per-
' Vt>0 '
formed, assume that a two-stage problem is given
(3)
(the time index is omitted here) with deterministic
T objective, i.e. only the right-hand side coefficients
q(,r, .- y[ q,(,,, h(~) are random (see e.g. [7]). For any ~ E ~, the
t=l
barycentric weights TO(~),..., TL(~) w.r.t, the sim-
The scenario tree represents an approximation plex .=. are given by
of the discrete-time process (r/t, ~t; t = 1 , . . . , T),
TO+'''+TL-- 1, (4)
and At(') denotes the set of finitely many outcomes
for (r/t,~t) conditioned on the history (r]t-l,~t-1). ToVo -Jr''''-Jr- TL VL -- ~ .
Again, this results in a sparse large scale program. Since ¢(x,~) is convex in ~ for all x, ~(~) "=
Naturally, the question arises how good the accu- ¢(~,~) is a convex function for any fixed first-
racy of the associated (deterministic) optimization stage decision ~. Due to convexity, ~(~) is bounded

577
Multistage stochastic programming: Barycentric approximation

from above for all ~ 6 E by a linear function and the tangent ¢(~) to ~o(~) at ~ is a lower bound
L
- ~-~t,=0Tt'(~)V~" To construct the 'classi- to the original function. Both linear approxima-
cal' Edmundson-Madansky upper bound (EM) for tions ¢(() and ~(~) to the convex value function
f ~(~)dP over the simplex E, ~ is replaced by a for a given policy are shown in Fig. 2.
discrete random variable with the same expecta- From a computational viewpoint, the original
tion, attaining values vo,..., VL. To obtain the cor- function ~o(~) is replaced by two linear affine func-
responding probabilities, ~ has to be replaced by tions. Clearly, ¢(~) and ~(~) can be integrated
- f ~ dP in (4), and the system must be solved easily over the support of ~. If there is only ran-
L
for TO,..., TL. Then, f ~(~) dR <_ Et,=0 Tt'(~)Vt,, domness in the objective with deterministic right-
and the weights may be interpreted as the proba- hand sides, a lower and an upper bound can be
bilities of the discrete outcomes. constructed by applying the same procedure to the
dual concave (maximization) problem, deriving an
upper bound from Jensen's inequality and a lower
approximation with the EM-rule.
Barycentric approximation combines these con-
cepts for stochastic objective and right-hand sides
[3] and extends them to the multistage case [4],
[5]. It derives distinguished points, so-called gen-
eralized barycenters, where the value function (1)
v0 Vl must be supported by two bilinear functions to
minimize the error induced by the approximation.
Fig. 2.
This is shown in Fig. 3 for Kt = Lt = 1, where
the minorant is supported at ~0 and ~1 and the
majorant at 770 and ~1.
Let At,0(7?t),..., At,K,(~?t), Tt,o(~t),..., Tt,L,(~t)
be the barycentric weights w.r.t. Ot and Et de-
fined analogously to (4). For both simplices, the
generalized barycenters and their probabilities are
given by

gt
r]t~t -- [q(r]#t)]-I " E uut f Aut(~]t)rttt(~t)dPt,
vt-o
q(n., ) - / rm (~t ) dPt, #t -- O , . . . , L t ,

Lt
~vt - [q(~vt)] -1" ~ vt,t / Avt(~t)Tta(~t) dPt,
#t=0

q((vt) - / A~t (7It)dPt, vt - O , . . . , K t .

Note that the integrand At,r(~t ) • T~,(~t) is a


bilinear function in (r/t,~t) since the barycentric
"'-----..... i / weights Am and Tv~ are linear in their components.
Obviously, a bilinear function is easy to integrate
Fig. 3.
which was the intention of the approximation.
On the other hand, a lower bound can be found The generalized barycenters ~v~, vt = 0 , . . . , Kt,
using Jensen's inequality" qo(~) <_f qo(~)dP, i.e. by are supporting points of the minorant. They are
evaluation of the function for the expectation of ~, combined with the vertices u~t and weighted with

578
Multistage stochastic programming: Barycentric approximation

the corresponding probabilities q ( ~ ) to obtain for t - 0 , . . . , T with CT+I (') -- ~I/T+I (') "-- 0. Ac-
discrete outcomes for the lower approximation of cording to [4], these are lower and upper bounds
the original measure Pt. This way, one derives a to the original value function, i.e.
discrete probability measure Q~ with support Ct(xt-1, ,f , < ,f ,
supp Q~ - {(u~,, ~,)" ut = 0 , . . . , Kt }. <_ t(zt-1,
Analogously, ~Tm, #t - 0 , . . . , Lt, are supporting In the entire convex case, the accuracy of the ap-
points for the majorant with assigned probabilities proximation is quantifiable by the difference be-
q(rh, ,). This induces a discrete measure Q~' for the tween the upper and lower bound. If required, the
upper approximation with approximation can be improved by partitioning
the simplices Ot and ~t. In case that the subsim-
suppQ~ - {(rh,,,vt, t). #t - O, . . . ,Lt} . plices become arbitrarily small, the extremal mea-
sures converge to Pt, and the convergence of the
Both measures represent the solutions of two upper and lower bounds to the expectation of the
corresponding moment problems. The advanta- value function is guaranteed (see [5] for details).
geous feature from a computational viewpoint is See also: S t o c h a s t i c p r o g r a m m i n g w i t h
that the generalized barycenters and their proba- simple i n t e g e r recourse; T w o - s t a g e stochas-
bilities are completely determined by the first mo- tic p r o g r a m s w i t h recourse; S t o c h a s t i c in-
ments of 7/t and ~t, and by the bilinear cross mo- t e g e r p r o g r a m m i n g : C o n t i n u i t y , stability,
ments E(~,~. ~m), u t - O , . . . , K t , #t - O , . . . , L t . r a t e s of convergence; G e n e r a l m o m e n t opti-
Note that the covariance of two random variables m i z a t i o n p r o b l e m s ; A p p r o x i m a t i o n of multi-
is derived from the first moments and the corre- v a r i a t e p r o b a b i l i t y integrals; D i s c r e t e l y dis-
sponding cross moments. Therefore, the measures t r i b u t e d s t o c h a s t i c p r o g r a m s : D e s c e n t di-
Q~ and Q~ incorporate implicitly a correlation be- r e c t i o n s a n d efficient points; S t a t i c stochas-
tween Ut and ~t. However, cross moments (or co- tic p r o g r a m m i n g models; S t a t i c stochas-
variances, respectively) between different elements tic p r o g r a m m i n g m o d e l s : C o n d i t i o n a l ex-
of Ut are not taken into account (the same holds for pectations; Stochastic programming mod-
the components of ~t). Hence, the formulae given els: R a n d o m objective; S t o c h a s t i c pro-
above are applicable without the assumption of in- g r a m m i n g : M i n i m a x a p p r o a c h ; Simple re-
dependent random variables. course p r o b l e m : P r i m a l m e t h o d ; Simple re-
Applying the approximation scheme dynam- course p r o b l e m : D u a l m e t h o d ; P r o b a b i l i s -
ically over time, one obtains two barycen- tic c o n s t r a i n e d linear p r o g r a m m i n g : Dual-
tric scenario trees A t and jt u with their ity t h e o r y ; P r o b a b i l i s t i c c o n s t r a i n e d p r o b -
path probabilities of type (3). The set of lems: C o n v e x i t y t h e o r y ; E x t r e m u m prob-
outcomes at stage t - 1 , . . . , T is given lems w i t h p r o b a b i l i t y functions: K e r n e l
by ,41(~7t-1,~t-l) -- suppQ~(.[rlt-l,~t-l) and
t y p e s o l u t i o n m e t h o d s ; A p p r o x i m a t i o n of
Au(~?t-1, ~t-1) _ supp Q~(.I~7t-l, ~t-1). Substitut- e x t r e m u m p r o b l e m s w i t h p r o b a b i l i t y func-
ing Pt in (1) by the discrete measures Q~ and q~' tionals; S t o c h a s t i c linear p r o g r a m s w i t h re-
yields two value functions course a n d a r b i t r a r y m u l t i v a r i a t e d i s t r i b u -
tions; S t o c h a s t i c p r o g r a m s w i t h recourse:
Ct(xt-1, ~t, ~t) ._ min { p t ( x t-l, xt, ut) Upper bounds; Stochastic integer programs;
L - s h a p e d m e t h o d for t w o - s t a g e stochas-
+ ~t+l ,xt ,~t+l ,~t+l)dQ~+l , tic p r o g r a m s w i t h recourse; S t o c h a s t i c lin-
ear p r o g r a m m i n g : D e c o m p o s i t i o n a n d cut-
~t (xt-1 , , )'-rain { Pt (zt-1 ,xt ) t i n g planes; S t a b i l i z a t i o n of c u t t i n g plane
a l g o r i t h m s for s t o c h a s t i c linear p r o g r a m -
..kfff2t+l(xt-lxt, l]tlTt+l,~t~t+l dQt+ }
, ,
m i n g p r o b l e m s ; T w o - s t a g e s t o c h a s t i c pro-
g r a m m i n g : Q u a s i g r a d i e n t m e t h o d ; Stochas-

579
Multistage stochastic programming: Barycentric approximation

tic quasigradient methods in minimax prob- (1906), 175-193.


lems; Stochastic programming: Nonantici- [7] KALL, P.: 'Bounds for and approximations to stochas-
pativity and Lagrange multipliers; Prepro- tic linear programs with recourse', in K. MARTI AND
P. KALL (eds.): Stochastic Programming Methods and
cessing in stochastic programming; Stochas-
Technical Applications, Springer, 1998, pp. 1-21.
tic network problems: Massively parallel so- [8] MADANSKY, A.: 'Bounds on the expectation of a con-
lution. vex function of a multivariate random variable', Ann.
Math. Statist. 30 (1959), 743-746.
References [9] MULVEY, J.M., AND RUSZCZYI<ISKI, A." 'A new sce-
[1] BIRGE, J.R., DONOHUE, C.J., HOLMES, D.F., AND nario decomposition method for large-scale stochastic
SVINTSITSKI, O.G.: 'A parallel implementation of the optimization', Oper. Res. 41 (1995), 477-490.
nested decomposition algorithm for multistage stochas- [10] ROSA, C.H., AND RUSZCZYI~SKI, A." 'On augmented
tic linear programs', Math. Program. 75 (1996), 327- Lagrangian decomposition methods for multistage sto-
352. chastic programs', Ann. Oper. Res. 64 (1996), 289-309.
[2] EDMUNDSON, H.P.: 'Bounds on the expectation of a
convex function of a random variable', Techn. Report Karl Frauendorfer
RAND Corp. 982 (1957). Inst. Operations Res. Univ. St. Gallen
[3] FRAUENDORFER, K.: Stochastic two-stage program- St. Gallen, Switzerland
ming, Springer, 1992. E-mail address: k a r l . frauendorforCunisg, ch
[4] FRAUENDORFER, K.: 'Multistage stochastic program- Michael Schiirle
ming: Error analysis for the convex case', Math. Meth. Inst. Operations Res. Univ. St. Gallen
Oper. Res. 39 (1994), 93-122. St. Gallen, Switzerland
[5] FRAUENDORFER, K.: 'Barycentric scenario trees in con- E-mail address: michael, schuorlo©unisg, ch
vex multistage stochastic programming', Math. Pro-
gram. 75 (1996), 277-293. MSC 2000:90C15
[6] JENSEN, J.L.: 'Sur les fonctions convexes et les Key words and phrases: stochastic programming, approxi-
in~galit~s entre les valeurs moyennes', Acta Math. 30 mation.

580

You might also like