Ahuja2001 ReferenceWorkEntry MaximumFlowProblemMaximumFlowP
structure'. Indeed, if a partial matrix A has a psd (pd, completely positive, distance matrix) completion, then every principal specified submatrix of A is psd (pd, completely positive, a distance matrix); similarly, if a partial matrix A admits a completion of rank $\le k$, then every specified submatrix of A has rank $\le k$. Hence, having a completion of a certain kind imposes certain 'obvious' necessary conditions. This leads to asking which patterns of the specified entries ensure that, if the obvious necessary conditions are met, then there is a completion of the desired type; this introduces a combinatorial aspect into matrix completion problems, as opposed to their analytical nature.

In this article we survey some results and provide references for the various matrix completion problems mentioned above, concerning optimization and combinatorial aspects of the problems. See [32], [47] for more detailed surveys on some of the topics treated here.

Positive Semidefinite Completion Problem. We consider here the following positive (semi)definite completion problem (PSD): Given a partial Hermitian matrix $A = (a_{ij})_{ij \in S}$ whose entries are specified on a subset S of the positions, determine whether A has a psd (or pd) completion; if yes, find such a completion. (Here, S is generally assumed to contain all diagonal positions.)

This problem is among the most studied matrix completion problems. This is due, in particular, to its many applications, e.g., in probability and statistics, systems engineering, geophysics, etc., and also to the fact that positive semidefiniteness is a basic property which is closely related to other matrix properties like being a contraction or a distance matrix. Equivalently, (PSD) is the problem of testing feasibility of the following system (in the variable $X = (x_{ij})$):

$$X \succeq 0, \quad x_{ij} = a_{ij} \ (ij \in S). \tag{1}$$

Therefore, (PSD) is an instance of the following semidefinite programming problem (P): Given Hermitian matrices $A_1, \ldots, A_m$ and scalars $b_1, \ldots, b_m$, decide whether the following system is feasible:

$$X \succeq 0, \quad A_j \cdot X = b_j \ (j = 1, \ldots, m) \tag{2}$$

(where $A \cdot X := \sum_{i,j=1}^{n} a_{ij} x_{ij}$ for two Hermitian (n × n)-matrices A and X).

The exact complexity status of problems (PSD) and (P) is not known; in particular, it is not known whether they belong to the complexity class NP. However, it is shown in [60] that (P) is neither NP-complete nor co-NP-complete if NP $\ne$ co-NP. On the other hand, the semidefinite programming problem and, thus, problem (PSD) can be solved with an arbitrary precision in polynomial time. This can be done using the ellipsoid method (since one can test in polynomial time whether a rational matrix A is positive semidefinite and, if not, find a vector x such that $x^* A x < 0$; cf. [24]), or interior point methods (cf. [56], [3], [27]). There has been a growing interest in semidefinite programming in recent years (1994), which is due, in particular, to its successful application to the approximation of hard combinatorial optimization problems (cf. the survey [20]). This has prompted active research on developing interior point algorithms for solving semidefinite programming problems; the literature is quite large, see [65], [64] for extensive information. Numerical tests are reported in [34], where an interior point algorithm is proposed for the approximate psd completion problem; it finds exact completions for random instances up to size 110.

Moreover, it is shown in [59] that problem (P) can be solved in polynomial time (for rational input data $A_j$, $b_j$) if either the number m of constraints, or the order n of the matrices X, $A_j$ in (2) is fixed (cf. also [9]). Moreover, under the same assumption, one can test in polynomial time the existence of an integer solution and find one if it exists [39].

Call a partial Hermitian matrix A partial psd (respectively, partial pd) if every principal specified submatrix of A is psd (respectively, pd). As mentioned in the Introduction, being partial psd (pd) is an obvious necessary condition for A to have a psd (pd) completion. In general, this condition is not sufficient; for instance, the partial matrix:

$$A = \begin{pmatrix} 1 & 1 & ? & 0 \\ 1 & 1 & 1 & ? \\ ? & 1 & 1 & 1 \\ 0 & ? & 1 & 1 \end{pmatrix}$$
Matrix completion problems
('?' indicates an unspecified entry) is partial psd, yet no psd completion exists; note that the pattern of A is a circuit of length 4. Call a graph chordal if it does not contain any circuit of length $\ge 4$ as an induced subgraph; chordal graphs occur in particular in connection with the Gaussian elimination process for sparse pd matrices (cf. [61], [21]). (An induced subgraph of a graph G = (V, E) is of the form H = (U, F) where $U \subseteq V$ and $F := \{ij \in E : i, j \in U\}$.) It is shown in [23] that every partial psd matrix with pattern G has a psd completion if and only if G is a chordal graph; the same holds for pd completions. This extends an earlier result from [16] which dealt with 'block-banded' partial matrices; in the Toeplitz case (all entries equal along a band), one finds the classical Carathéodory-Fejér theorem from function theory. The proof from [23] is constructive and can be turned into an algorithm with a polynomial running time [48]. Moreover, it is shown in [48] that (PSD) can be solved in polynomial time when restricted to partial rational matrices whose pattern is a graph having a fixed minimum fill-in; the minimum fill-in of a graph is the minimum number of edges that need to be added in order to obtain a chordal graph. This result is based on the above mentioned results from [59], [39] concerning the polynomial time solvability of (integer) semidefinite programming with a fixed number m of linear constraints in (2).

The result from [23] on psd completions of partial matrices with a chordal pattern has been generalized in various directions; for instance, considering general inertia possibilities for the completions ([35], [17]), or considering completions with entries in a function ring [37].

If A is a partial matrix having a pd completion, then A has a unique pd completion with maximum determinant (this unique completion being characterized by the fact that its inverse has zero entries at all unspecified positions of A) [23]. In the case when the pattern of A is chordal, explicit formulas for this maximum determinant are given in [7]. The paper [52] considers the more general problem of finding a maximum determinant psd completion satisfying some additional linear constraints.

Further necessary conditions are known for the existence of psd completions. Namely, it is shown in [8] that if a partial matrix $A = (a_{ij})$ with pattern G and diagonal entries equal to 1 is completable to a psd matrix, then the associated vector $x := (\arccos(a_{ij})/\pi)_{ij \in E}$ satisfies the inequalities:

$$\sum_{e \in F} x_e - \sum_{e \in C \setminus F} x_e \le |F| - 1 \tag{3}$$

for all $F \subseteq C$, C circuit in G, |F| odd.

Moreover, any partial matrix with pattern G satisfying (3) is completable to a psd matrix if and only if G does not contain a homeomorph of $K_4$ as an induced subgraph (then, G is also known as a series-parallel graph) [44]. (Here, $K_4$ denotes the complete graph on 4 nodes and a homeomorph of $K_4$ is obtained by replacing the edges of $K_4$ by paths of arbitrary length.) The patterns G for which every partial psd matrix satisfying (3) has a psd completion are characterized in [6]; they are the graphs G which can be made chordal by adding a set of edges in such a way that no new clique of size 4 is created. Although (3) can be checked in polynomial time for rational x [5], the complexity of problem (PSD) for series-parallel graphs (or for the subclass of circuits) is not known. A strengthening of condition (3) (involving cuts in graphs) is formulated in [44].

Another approach to problem (PSD) is considered in [1], [28], which is based on the study of the cone

$$P_G := \{ X = (x_{ij})_{i,j \in V} \succeq 0 : x_{ij} = 0 \ \ \forall i \ne j, \ ij \notin E \}$$

associated to the graph G = (V, E). Indeed, it is shown there that a partial matrix A with pattern G has a psd completion if and only if

$$\sum_{i \in V} a_{ii} x_{ii} + \sum_{i \ne j,\ ij \in E} a_{ij} x_{ij} \ge 0, \quad \forall X \in P_G. \tag{4}$$

Obviously, it suffices to check (4) for all X extremal in $P_G$ (i.e., X lying on an extremal ray of the cone $P_G$).

Define the order of G as the maximum rank of an extremal matrix in $P_G$. The graphs of order 1 are precisely the chordal graphs [1], [58] and the graphs of order 2 have been characterized in [46]. One might reasonably expect that problem (PSD) is easier for graphs having a small order. This is
indeed the case for graphs of order 1; the complexity of (PSD) remains however open for the graphs of order 2 (partial results are given in [48]).

Euclidean Distance Matrix Completion Problem. We consider here the Euclidean distance matrix completion problem (abbreviated as distance matrix completion problem) (EDM): Given a graph G = (V = [1, n], E) and a real partial symmetric matrix $A = (a_{ij})$ with pattern G and with zero diagonal entries, determine whether A can be completed to a distance matrix; that is, whether there exist vectors $v_1, \ldots, v_n \in \mathbb{R}^k$ for some $k \ge 1$ such that

$$a_{ij} = \|v_i - v_j\|^2 \quad \text{for all } ij \in E. \tag{5}$$

(Here, $\|v\| = \sqrt{\sum_{h=1}^{k} v_h^2}$ denotes the Euclidean norm of $v \in \mathbb{R}^k$.) The vectors $v_1, \ldots, v_n$ are then said to form a realization of A. A variant of problem (EDM) is the graph realization problem (EDMk), obtained by letting the dimension k of the space where one searches for a realization of A be part of the input data.

Distance matrices are a central notion in the area of distance geometry; their study was initiated by A. Cayley in the 19th century and it was continued in particular by K. Menger and I.J. Schoenberg in the 1930s. They are, in fact, closely related to psd matrices. The following basic connection was established in [63]. Given a symmetric (n × n)-matrix $D = (d_{ij})_{i,j=1}^{n}$ with zero diagonal entries, consider the symmetric ((n − 1) × (n − 1))-matrix $X = (x_{ij})_{i,j=1}^{n-1}$ defined by

$$x_{ij} = \tfrac{1}{2}\,(d_{in} + d_{jn} - d_{ij}) \tag{6}$$

for all i, j = 1, ..., n − 1. Then D is a distance matrix if and only if X is psd; moreover, D has a realization in the k-space if and only if X has rank $\le k$. Other characterizations are known for distance matrices. As the literature on this topic is quite large, see the monographs [11], [13], [14], where further references can be found.

Problems (EDM) and (EDMk) have many important applications; for instance, to multidimensional scaling problems in statistics (cf. [49]) and to position-location problems, i.e., problem (EDMk) mostly in dimension $k \le 3$. A much studied instance of the latter problem is the molecular conformation problem in chemistry; indeed, nuclear magnetic resonance spectroscopy makes it possible to determine some pairwise interatomic distances, the question being then to reconstruct the global shape of the molecule from this partial information (cf. [13], [41]).

In view of relation (6), problem (EDM) can be formulated as an instance of the semidefinite programming problem (P) and, therefore, it can be solved with an arbitrary precision in polynomial time. Exploiting this fact, some specific algorithms based on interior point methods are presented in [2] together with numerical tests. Moreover, problem (EDM) can be solved in polynomial time when restricted to partial rational matrices whose pattern is a chordal graph or, more generally, a graph with fixed minimum fill-in [48]; as in the psd case, this follows from the fact (mentioned below) that partial matrices that are completable to a distance matrix admit a good characterization when their pattern is a chordal graph.

While the exact complexity of problem (EDM) is not known, it has been shown in [62] that problem (EDMk) is NP-complete if k = 1 and NP-hard if $k \ge 2$ (even when restricted to partial matrices with entries in {1, 2}). Finding $\varepsilon$-optimal solutions to the graph realization problem is also NP-hard for small $\varepsilon$ ([53]). The graph realization problem (EDMk) has been much studied, in particular in dimension $k \le 3$, which is the case most relevant to applications. The problem can be formulated as a nonlinear global optimization problem: min f(v) such that $v = (v_1, \ldots, v_n) \in \mathbb{R}^{kn}$, where the cost function f(·) can, for instance, be chosen as

$$f(v) = \sum_{ij \in E} \left( \|v_i - v_j\|^2 - a_{ij} \right)^2.$$

Hence, f(·) is zero precisely when the $v_i$'s provide a realization of the partial matrix A. This optimization problem is hard to solve (as it may have many local optimum solutions). Several algorithms have been proposed in the literature; see, in particular, [13], [19], [26], [29], [31], [41], [54], [57]. They are based on general techniques for global optimization like tabu and pattern search [57], the continuation approach (which consists of transforming the original function f(·) into a smoother
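As a small illustration of relation (6) and of the optimization formulation of (EDMk), the following sketch builds the matrix X from a full distance matrix D and evaluates a cost function on a partial matrix. The point set (corners of a unit square), the function names, and the choice of cost as a sum of squared residuals over the specified entries are assumptions made for this example only; the text does not prescribe an implementation.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def gram_from_distances(D):
    """Relation (6): x_ij = (d_in + d_jn - d_ij) / 2 for i, j = 1..n-1."""
    last = len(D) - 1
    return [[(D[i][last] + D[j][last] - D[i][j]) / 2 for j in range(last)]
            for i in range(last)]

def realization_cost(points, partial):
    """Assumed cost: squared residuals summed over specified entries {(i, j): a_ij}."""
    return sum((sq_dist(points[i], points[j]) - a) ** 2
               for (i, j), a in partial.items())

# Hypothetical data: corners of the unit square in the plane (k = 2).
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
n = len(points)
D = [[sq_dist(points[i], points[j]) for j in range(n)] for i in range(n)]
X = gram_from_distances(D)

# X should coincide with the Gram matrix of the shifted vectors v_i - v_n.
shifted = [tuple(a - b for a, b in zip(p, points[-1])) for p in points[:-1]]
gram = [[sum(a * b for a, b in zip(u, v)) for v in shifted] for u in shifted]

# Partial matrix specified only on the 4-cycle 0-1-2-3-0 (squared side length 1).
partial = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0}
cost_at_square = realization_cost(points, partial)             # a realization
cost_at_collapsed = realization_cost([(0.0, 0.0)] * 4, partial)  # not a realization
```

Because X is a Gram matrix of vectors in the plane here, it is psd of rank at most 2, in line with the characterization attached to (6); the cost vanishes exactly at the configuration that realizes the specified entries.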
died separately). Below is an example of a partial matrix A which is a partial contraction, but which is not completable to a contraction matrix:

A = [matrix not legible in the source]

In fact, the graph pattern displayed in this example is in a sense present in every partial contraction which is not completable to a contraction. Namely, it is shown in [36] that the following assertions (i-iii) are equivalent for a connected bipartite graph G with node bipartition $U \cup V$:

i) Every partial contraction with pattern G can be completed to a contraction;

ii) G does not contain an induced matching of size 2 (i.e., if e := uv, e' := u'v' are edges in G with $u \ne u' \in U$, $v \ne v' \in V$, then at least one of the pairs uv', u'v is an edge in G; that is, G is nonseparable in the terminology of [21]);

iii) The graph $\tilde{G}$ obtained from G by adding all edges uu' ($u \ne u' \in U$) and vv' ($v \ne v' \in V$) is chordal.

(Note that the implication iii) $\Rightarrow$ i) is a consequence of the result on psd completions from [23] mentioned in the Section on the positive semidefinite completion problem above, as $\tilde{G}$ is the graph pattern of the matrix $\tilde{A}$ defined in (8).)

Rank Completions. In this section, we consider the problem of determining the possible ranks for the completions of a given partial matrix. For a partial matrix A, let mr(A) and MR(A) denote, respectively, the minimum and maximum possible ranks for a completion of A. If B, C are completions of A of respective ranks mr(A), MR(A), then changing B into C by changing one entry of B into the corresponding entry of C at a time yields completions realizing all ranks in the range [mr(A), MR(A)]. Hence, the question is to determine the two extreme values mr(A) and MR(A). As we see below, the value MR(A) can, in fact, be expressed in terms of ranks of fully specified submatrices of A and it can be computed in polynomial time; this constitutes a generalization of the celebrated Frobenius-König theorem (corresponding to the case when specified entries are equal to 0). On the other hand, determining mr(A) seems to be a much more difficult task.

We first deal with the problem of finding maximum rank completions. Let A be an n × m partial matrix with graph pattern G, i.e., G is the bipartite graph $(U \cup V, E)$ where U, V index respectively the rows and columns of A, and the edges of G correspond to the specified entries of A, and let $\overline{G}$ denote the complementary bipartite graph whose edges correspond to unspecified entries of A. Note that computing MR(A) amounts to computing the generic rank of A when viewing the unspecified entries of A as independent variables over the field containing the specified entries. For a subset $X \subseteq U \cup V$, let $A_X$ denote the submatrix of A with respective row and column index sets $\{i \in [1, n] : u_i \notin X\}$ and $\{j \in [1, m] : v_j \notin X\}$. Call X a cover of $\overline{G}$ if every edge of $\overline{G}$ has at least one end node in X; that is, if $A_X$ is a fully specified submatrix of A. Clearly, we have: $\mathrm{MR}(A) \le \mathrm{rank}(A_X) + |X|$. In fact, the following equality holds:

$$\mathrm{MR}(A) = \min_{X \text{ cover of } \overline{G}} \ \mathrm{rank}(A_X) + |X| \tag{9}$$

as shown in [12]. A determinantal version of the result was given in [25]. In the special case when all specified entries of A are equal to 0, MR(A) coincides with the maximum cardinality of a matching in $\overline{G}$ and, therefore, the minimax relation (9) reduces to the Frobenius-König theorem (cf. [50] for details on the latter result). Moreover, one can determine MR(A) and construct a maximum rank completion of A in polynomial time. This was shown in [55] by a reduction to matroid intersection and, more recently, in [18] where a simple greedy procedure is presented that solves the problem by perturbing an arbitrary completion.

We now consider minimum rank completions. To start with, note that mr(A) may depend, in general, on the actual values of the specified entries of A (and not only on the ranks of the specified submatrices of A). Indeed, consider the partial matrix

$$A = \begin{pmatrix} a & b & ? \\ ? & c & d \\ f & ? & e \end{pmatrix}$$

where $a, b, c, d, e, f \ne 0$. Then mr(A) = 1 if ace = bdf and mr(A) = 2 otherwise, while all specified submatrices have rank 1 in both cases. Thus arises the question of identifying the
bipartite graphs G for which mr(A) depends only on the ranks of the specified submatrices of A for every partial matrix A with pattern G; such graphs are called rank determined. The graph pattern of the above instance A is the circuit $C_6$. Hence, $C_6$ is not rank determined. Call a bipartite graph G bipartite chordal if it does not contain a circuit of length $\ge 6$ as an induced subgraph. Then, if a bipartite graph is rank determined, it is necessarily bipartite chordal [12]. It is conjectured there that, conversely, every bipartite chordal graph is rank determined. The conjecture was shown to be true in [66] for the nonseparable bipartite graphs (i.e., the bipartite graphs containing no induced matching of size 2; they are obviously bipartite chordal). Note that a partial matrix A has a nonseparable pattern if and only if it has (up to row/column permutation) the following 'triangular' form:

[Figure: 'triangular' form of a partial matrix with nonseparable pattern; not reproduced in the source.]

Then, mr(A) can be explicitly formulated in terms of the ranks of the specified submatrices of A; in the simplest case, the formula for mr(A) reads:

$$\mathrm{mr}\begin{pmatrix} B & ? \\ C & D \end{pmatrix} = \mathrm{rank}\begin{pmatrix} B \\ C \end{pmatrix} + \mathrm{rank}\,(C \ \ D) - \mathrm{rank}(C).$$

It is shown in [12] that the above conjecture holds when the pattern G is a path, or when G is obtained by 'gluing' a collection of circuits of length 4 along a common edge.

See also: Interior point methods for semidefinite programming; Semidefinite programming and determinant maximization.

References
[1] AGLER, J., HELTON, J.W., MCCULLOUGH, S., AND RODMAN, L.: 'Positive semidefinite matrices with a given sparsity pattern', Linear Alg. & Its Appl. 107 (1988), 101-149.
[2] ALFAKIH, A.Y., KHANDANI, A., AND WOLKOWICZ, H.: 'Solving Euclidean distance matrix completion problems via semidefinite programming', Comput. Optim. Appl. 12 (1998), 13-30.
[3] ALIZADEH, F.: 'Interior point methods in semidefinite programming with applications in combinatorial optimization', SIAM J. Optim. 5 (1995), 13-51.
[4] BAKONYI, M., AND JOHNSON, C.R.: 'The Euclidian distance matrix completion problem', SIAM J. Matrix Anal. Appl. 16 (1995), 646-654.
[5] BARAHONA, F., AND MAHJOUB, A.R.: 'On the cut polytope', Math. Program. 36 (1986), 157-173.
[6] BARRETT, W.W., JOHNSON, C.R., AND LOEWY, R.: 'The real positive definite completion problem: cycle completability', Memoirs Amer. Math. Soc. 584 (1996).
[7] BARRETT, W.W., JOHNSON, C.R., AND LUNDQUIST, M.: 'Determinantal formulas for matrix completions associated with chordal graphs', Linear Alg. & Its Appl. 121 (1989), 265-289.
[8] BARRETT, W., JOHNSON, C.R., AND TARAZAGA, P.: 'The real positive definite completion problem for a simple cycle', Linear Alg. & Its Appl. 192 (1993), 3-31.
[9] BARVINOK, A.I.: 'Feasibility testing for systems of real quadratic equations', Discrete Comput. Geom. 10 (1993), 1-13.
[10] BARVINOK, A.I.: 'Problems of distance geometry and convex properties of quadratic maps', Discrete Comput. Geom. 13 (1995), 189-202.
[11] BLUMENTHAL, L.M.: Theory and applications of distance geometry, Oxford Univ. Press, 1953.
[12] COHEN, N., JOHNSON, C.R., RODMAN, L., AND WOERDEMAN, H.J.: 'Ranks of completions of partial matrices', in H. DYM ET AL. (eds.): The Gohberg Anniv. Coll., Vol. I, Birkhäuser, 1989, p. 165-185.
[13] CRIPPEN, G.M., AND HAVEL, T.F.: Distance geometry and molecular conformation, Res. Studies Press, 1988.
[14] DEZA, M., AND LAURENT, M.: Geometry of cuts and metrics, Vol. 15 of Algorithms and Combinatorics, Springer, 1997.
[15] DREW, J.H., AND JOHNSON, C.R.: 'The completely positive and doubly nonnegative completion problems', Linear Alg. & Its Appl. 44 (1998), 85-92.
[16] DYM, H., AND GOHBERG, I.: 'Extensions of band matrices with band inverses', Linear Alg. & Its Appl. 36 (1981), 1-24.
[17] ELLIS, R.L., LAY, D.C., AND GOHBERG, I.: 'On negative eigenvalues of selfadjoint extensions of band matrices', Linear Alg. & Its Appl. 24 (1988), 15-25.
[18] GEELEN, J.: 'Maximum rank matrix completion', Linear Alg. & Its Appl. 288 (1999), 211-217.
[19] GLUNT, W., HAYDEN, T.L., AND RAYDAN, M.: 'Molecular conformations from distance matrices', J.
Comput. Chem. 14 (1998), 175-190.
[20] GOEMANS, M.X.: 'Semidefinite programming in combinatorial optimization', Math. Program. 79 (1997), 143-161.
[21] GOLUMBIC, M.C.: Algorithmic theory and perfect graphs, Acad. Press, 1980.
[22] GRAY, L.J., AND WILSON, D.G.: 'Nonnegative factorization of positive semidefinite nonnegative matrices', Linear Alg. & Its Appl. 31 (1980), 119-127.
[23] GRONE, R., JOHNSON, C.R., SÁ, E.M., AND WOLKOWICZ, H.: 'Positive definite completions of partial hermitian matrices', Linear Alg. & Its Appl. 58 (1984), 109-124.
[24] GRÖTSCHEL, M., LOVÁSZ, L., AND SCHRIJVER, A.: Geometric algorithms and combinatorial optimization, Springer, 1988.
[25] HARTFIEL, D.J., AND LOEWY, R.: 'A determinantal version of the Frobenius-König theorem', Linear Multilinear Algebra 16 (1984), 155-165.
[26] HAVEL, T.F.: 'An evaluation of computational strategies for use in the determination of protein structure from distance constraints obtained by nuclear magnetic resonance', Program. Biophys. Biophys. Chem. 56 (1991), 43-78.
[27] HELMBERG, C., RENDL, F., VANDERBEI, R.J., AND WOLKOWICZ, H.: 'An interior-point method for semidefinite programming', SIAM J. Optim. 6 (1996), 342-361.
[28] HELTON, J.W., PIERCE, S., AND RODMAN, L.: 'The ranks of extremal positive semidefinite matrices with given sparsity pattern', SIAM J. Matrix Anal. Appl. 10 (1989), 407-423.
[29] HENDRICKSON, B.: 'The molecule problem: Determining conformation from pairwise distances', Techn. Report Dept. Computer Sci. Cornell Univ. 90-1159 (1990), PhD Thesis.
[30] HENDRICKSON, B.: 'Conditions for unique graph realizations', SIAM J. Comput. 21 (1992), 65-84.
[31] HENDRICKSON, B.: 'The molecule problem: exploiting structure in global optimization', SIAM J. Optim. 5 (1995), 835-857.
[32] JOHNSON, C.R.: 'Matrix completion problems: A survey', in C.R. JOHNSON (ed.): Matrix Theory and Appl., Vol. 40 of Proc. Symp. Appl. Math., Amer. Math. Soc., 1990, p. 171-198.
[33] JOHNSON, C.R., JONES, C., AND KROSCHEL, B.: 'The distance matrix completion problem: cycle completability', Linear Multilinear Algebra 39 (1995), 195-207.
[34] JOHNSON, C.R., KROSCHEL, B., AND WOLKOWICZ, H.: 'An interior-point method for approximate positive semidefinite completions', Comput. Optim. Appl. 9 (1998), 175-190.
[35] JOHNSON, C.R., AND RODMAN, L.: 'Inertia possibilities for completions of partial Hermitian matrices', Linear Multilinear Algebra 16 (1984), 179-195.
[36] JOHNSON, C.R., AND RODMAN, L.: 'Completion of matrices to contractions', J. Funct. Anal. 69 (1986), 260-267.
[37] JOHNSON, C.R., AND RODMAN, L.: 'Chordal inheritance principles and positive definite completions of partial matrices over function rings', in I. GOHBERG ET AL. (eds.): Contributions to Operator Theory and its Applications, Birkhäuser, 1988, p. 107-127.
[38] JOHNSON, C.R., AND TARAZAGA, P.: 'Connections between the real positive semidefinite and distance matrix completion problems', Linear Alg. & Its Appl. 223/4 (1995), 375-391.
[39] KHACHIYAN, L., AND PORKOLAB, L.: 'Computing integral points in convex semi-algebraic sets': 38th Annual Symp. Foundations Computer Sci., 1997, p. 162-171.
[40] KOGAN, N., AND BERMAN, A.: 'Characterization of completely positive graphs', Discrete Math. 114 (1993), 297-304.
[41] KUNTZ, I.D., THOMASON, J.F., AND OSHIRO, C.M.: 'Distance geometry', Methods in Enzymologie 177 (1993), 159-204.
[42] LAMAN, G.: 'On graphs and rigidity of plane skeletal structures', J. Engin. Math. 4 (1970), 331-340.
[43] LAURENT, M.: 'Cuts, matrix completions and graph rigidity', Math. Program. 79 (1997), 255-283.
[44] LAURENT, M.: 'The real positive semidefinite completion problem for series-parallel graphs', Linear Alg. & Its Appl. 252 (1997), 347-366.
[45] LAURENT, M.: 'A connection between positive semidefinite and Euclidean distance matrix completion problems', Linear Alg. & Its Appl. 273 (1998), 9-22.
[46] LAURENT, M.: 'On the order of a graph and its deficiency in chordality', CWI Report PNA-R9801 (1998).
[47] LAURENT, M.: 'A tour d'horizon on positive semidefinite and Euclidean distance matrix completion problems', in P.M. PARDALOS AND H. WOLKOWICZ (eds.): Topics in Semidefinite and Interior-Point Methods, Vol. 18 of Fields Inst. Res. Math. Sci. Commun., Amer. Math. Soc., 1998, p. 51-76.
[48] LAURENT, M.: 'Polynomial instances of the positive semidefinite and Euclidean distance matrix completion problems', SIAM J. Matrix Anal. Appl. (to appear).
[49] LEEUW, J. DE, AND HEISER, W.: 'Theory of multidimensional scaling', in P.R. KRISHNAIAH AND L.N. KANAL (eds.): Handbook Statist., Vol. 2, North-Holland, 1982, p. 285-316.
[50] LOVÁSZ, L., AND PLUMMER, M.D.: Matching theory, Akad. Kiadó, 1986.
[51] LOVÁSZ, L., AND YEMINI, Y.: 'On generic rigidity in the plane', SIAM J. Alg. Discrete Meth. 3 (1982), 91-98.
[52] LUNDQUIST, M.E., AND JOHNSON, C.R.: 'Linearly constrained positive definite completions', Linear Alg. & Its Appl. 150 (1991), 195-207.
[53] MORÉ, J.J., AND WU, Z.: 'ε-optimal solutions to dis-
Matroids
p(A) - p(E).
xEA
With these further definitions at hand, the following theorems express three other equivalent axiomatic characterizations of a matroid in terms of its rank.

THEOREM 15 A function $\rho : 2^E \to \mathbb{N}$ is a rank function of a matroid M = (E, I) if and only if for all $X \subseteq E$ and for all $y, z \in E$ the following three properties hold:
1) $\rho(\emptyset) = 0$;
2) $\rho(X) \le \rho(X \cup \{y\}) \le \rho(X) + 1$;
3) $\rho(X) = \rho(X \cup \{y\}) = \rho(X \cup \{z\}) \Rightarrow \rho(X \cup \{y, z\}) = \rho(X)$.

Minor of Matroids: Restriction and Contraction. A minor of a matroid M = (E, I) is a 'submatroid' obtained by deleting or contracting one or more elements of the ground set E. A loop is an element y of a matroid such that {y} is not independent. Equivalently, y does not lie in any independent set, and hence in no maximal independent set.

DEFINITION 19 Let M = (E, I) be a matroid. If an element x is not a loop, the matroid M/x, called a contraction of M, is defined as follows:
space V over F, with some finite set E of vectors of V, so that M is isomorphic to the vectorial matroid of the set E. A binary matroid is a matroid representable over GF(2), while a ternary matroid is representable over GF(3).

In recent literature (as of 1999) the problem of classifying all the fields over which a given matroid is representable, and the inverse problem of characterizing all the matroids that are representable over a given field, have attracted growing interest. An important result for matroid representability is the following theorem.

Uniform Matroid. Let E be a set of n elements and let I be the family of subsets A of E such that $|A| \le k \le n$. Then M = (E, I) is called the uniform matroid of rank k and is denoted by $U_{k,n}$. The sets of the bases and the circuits of $U_{k,n}$ are

$$B = \{X \subseteq E : |X| = k\}$$

and

$$C = \{X \subseteq E : |X| = k + 1\},$$

respectively.

Moreover, for all $A \subseteq E$,
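The description of the bases and circuits of the uniform matroid can be checked by brute force on a small instance. The sketch below is only an illustration under assumed names (`uniform_matroid`, `bases`, `circuits`, `rank` are hypothetical helpers, and the ground set is taken to be {0, ..., n-1}); it derives bases, circuits and ranks directly from the family of independent sets rather than from the closed-form descriptions.

```python
from itertools import combinations

def uniform_matroid(n, k):
    """Independent sets of U_{k,n} on ground set {0, ..., n-1}: all subsets of size <= k."""
    return {frozenset(s) for r in range(k + 1) for s in combinations(range(n), r)}

def bases(independent):
    """Maximal independent sets (no independent proper superset)."""
    return {a for a in independent if not any(a < b for b in independent)}

def circuits(n, independent):
    """Minimal dependent sets: dependent, yet independent after deleting any one element."""
    found = set()
    for r in range(1, n + 1):
        for s in map(frozenset, combinations(range(n), r)):
            if s not in independent and all(s - {x} in independent for x in s):
                found.add(s)
    return found

def rank(independent, subset):
    """Size of a largest independent set contained in the given subset."""
    return max(len(a) for a in independent if a <= subset)

I = uniform_matroid(4, 2)   # U_{2,4}
B = bases(I)                # expected: all 2-element subsets
C = circuits(4, I)          # expected: all 3-element subsets
```

For $U_{2,4}$ this yields six bases of size 2 and four circuits of size 3, and `rank(I, frozenset({0, 1, 2}))` evaluates to 2.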
A have the same head (or, equivalently, the same tail).

Dual Matroids. Let M = (E, I) be a matroid, and let B be its set of bases.

The dual matroid $M^*$ is the matroid on the ground set E whose bases are the complements of the bases of M. Thus, a set A is independent in $M^*$ if and only if A is disjoint from some basis of M. Note that $(M^*)^* = M$.

For a pair of matroids $(M, M^*)$ and their rank functions, the following propositions hold.

PROPOSITION 25 Let M = (E, I) be a matroid, and let $\rho$ be its rank function. Let $M^* = (E, I^*)$ be the dual matroid of M; then

$$\rho^*(A) = |A| + \rho(E \setminus A) - \rho(E),$$

for each $A \subseteq E$. □

PROPOSITION 26 Let $M^*$ be the dual of the matroid M = (E, I), let A be a subset of E and let $\bar{A} = E \setminus A$. If $\rho$ and $\rho^*$ are the rank functions of M and $M^*$ respectively, then

A and $B^T A = 0$. $B^T$ is the dual matroid $M^*$ of the vectorial matroid M.

Greedy Algorithms on Weighted Matroids. Many combinatorial problems for which the greedy technique gives an optimal solution can be formulated in terms of finding a maximum-weight independent subset in a weighted matroid. In more detail, there is given a weighted matroid M = (E, I) and the objective is to find an independent set $A \in I$ such that w(A) is maximized (also called an optimal subset of M). Since the weight w(x) of any element $x \in E$ is positive, a maximum-weight independent subset is always a maximal independent subset.

In the minimum spanning tree problem, for example, there are given a connected undirected graph G = (V, E) and a length function w such that w(e) is the positive length of the edge e. The objective is to find an acyclic subset T of E that connects all of the vertices of G and whose total length
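As a rough illustration of the maximum-weight independent subset problem described above, the following sketch runs a greedy scan on the graphic matroid of a small graph. Several things here are assumptions made for the example and are not taken from the text: the graph and its lengths are hypothetical, the independence oracle `is_forest` uses a union-find structure, and the minimum spanning tree problem is turned into a maximization by the standard device of replacing each length w(e) by w0 − w(e) for a constant w0 larger than every length.

```python
def greedy(elements, weight, is_independent):
    """Scan elements by nonincreasing weight, keeping each one that preserves independence."""
    chosen = []
    for x in sorted(elements, key=weight, reverse=True):
        if is_independent(chosen + [x]):
            chosen.append(x)
    return chosen

def is_forest(edges):
    """Independence oracle of the graphic matroid: True if the edges contain no cycle."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:          # u and v already connected: adding (u, v) closes a cycle
            return False
        parent[ru] = rv
    return True

# Hypothetical example graph: edge (u, v) -> positive length.
length = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 3, ("c", "d"): 4, ("b", "d"): 5}
w0 = max(length.values()) + 1              # any constant exceeding every length
tree = greedy(length, lambda e: w0 - length[e], is_forest)
total = sum(length[e] for e in tree)
```

On this instance the scan keeps the edges of lengths 1, 2 and 4 and rejects the two edges that would close a cycle, so `total` is 7. Recomputing the oracle from scratch at every step keeps the sketch short; an incremental union-find would be the usual choice for larger inputs.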
(E, I) and a weight function w and returns an optimal subset A.

    set A = ∅
    sort E[M] = {x_1, ..., x_t} into nonincreasing order by weight w
    FOR i = 1 to t
        IF A ∪ {x_i} ∈ I[M]
            set A = A ∪ {x_i}
    return(A)

GREEDY(M, w).

Like any other greedy algorithm, GREEDY always makes the choice that looks best at the moment. In fact, it considers in turn each element $x_i$ belonging to E[M], whose elements are sorted into nonincreasing order by weight w, and immediately adds $x_i$ to the set A under construction if $A \cup \{x_i\}$ is still independent. Note that the returned set A is always independent, because it is initialized to the empty set, which is independent by definition of a matroid, and then at each iteration an element $x_i$ is added to A while preserving the independence of A. A is also an optimal subset of the matroid M and therefore, a minimum spanning tree for the original graph G. To prove its optimality, it is enough to show that weighted matroids exhibit the two ingredients whose existence guarantees that a greedy strategy will solve the given problem optimally: the greedy-choice property and the optimal substructure property. The proof that matroids exhibit both these properties can be found in [4]. Generally speaking, the proof of the greedy-choice property consists of showing that a globally optimal solution can be obtained by making a locally optimal (greedy) choice. The proof examines a globally optimal solution. It shows that the solution can be modified so that a greedy choice is made at the first step and that this choice reduces the original problem to an equivalent problem of smaller size. By induction, it is proved that a greedy choice can be made at each step. Showing that making a greedy choice reduces the original problem to a similar but smaller problem reduces the proof of correctness to demonstrating that an optimal solution must exhibit optimal substructure. The optimal substruc-

this property guarantee the applicability of greedy strategies as well as dynamic programming algorithms.

See also: Oriented matroids.

References
[1] AIGNER, M.: Combinatorial theory, Springer, 1979.
[2] BACHEM, A., AND KERN, W.: Linear programming duality: An introduction to oriented matroids, Springer, 1992.
[3] BJÖRNER, A., VERGNAS, M. LAS, STURMFELS, B., WHITE, N., AND ZIEGLER, G.M.: Oriented matroids, Vol. 46 of Encycl. Math. Appl., Cambridge Univ. Press, 1993.
[4] CORMEN, T.H., LEISERSON, C.E., AND RIVEST, R.L.: Introduction to algorithms, MIT, 1990.
[5] EDMONDS, J.: 'Lehman's switching game and a theorem of Tutte and Nash-Williams', J. Res. Nat. Bureau Standards (B) 69 (1965), 73-77.
[6] EDMONDS, J.: 'Maximum matching and a polyhedron with {0, 1} vertices', J. Res. Nat. Bureau Standards (B) 69 (1965), 125-130.
[7] EDMONDS, J.: 'Minimum partition of a matroid into independent subsets', J. Res. Nat. Bureau Standards (B) 69 (1965), 67-72.
[8] EDMONDS, J.: 'Paths, trees, and flowers', Canad. J. Math. 17 (1965), 449-467.
[9] EDMONDS, J.: 'Optimum branchings', J. Res. Nat. Bureau Standards (B) 71 (1967), 233-240.
[10] EDMONDS, J.: 'Systems of distinct representatives and linear algebra', J. Res. Nat. Bureau Standards (B) 71 (1967), 241-245.
[11] EDMONDS, J.: 'Submodular functions, matroids, and certain polyhedra', in R. GUY, H. HANANI, N. SAUER, AND J. SCHÖNHEIM (eds.): Combinatorial Structures and Their Applications, Gordon and Breach, 1970.
[12] FUJISHIGE, S.: 'Submodular functions and optimization', Ann. Discrete Math. 47 (1991).
[13] CRAPO, H.H., AND ROTA, G.C.: On the foundations of combinatorial theory: Combinatorial geometries, preliminary ed., MIT, 1970.
[14] KRUSKAL, J.B.: 'On the shortest spanning subtree of a graph and the traveling salesman problem', Proc. Amer. Math. Soc. 7 (1956), 48-50.
[15] KUNG, J.P.S.: 'Numerically regular hereditary classes of combinatorial geometries', Geometriae Dedicata 21 (1986), 85-105.
[16] KUNG, J.P.S.: A source book in matroid theory, Birkhäuser, 1986.
[17] LAWLER, E.L.: Combinatorial optimization: Networks and matroids, Holt, Rinehart and Winston, 1976.
[18] LOV~.SZ, L., AND PLUMMER, M.D." Matching theory,
ture property is exhibited by a given problem, if an Akad. Kiad6, 1986.
optimal solution to the problem contains within it [19] MAFFIOLI, F.: Elementi di programmazione matemat-
optimal solutions to subproblems. The validity of ica, Masson, 1990.
234
[20] MUROTA, K., IRI, M., AND NAKAMURA, M.: 'Combinatorial canonical form of layered mixed matrices and its application to block-triangularization of systems of linear/nonlinear equations', SIAM J. Alg. Discrete Meth. 8 (1987), 123-149.
[21] OXLEY, J.G.: Matroid theory, Oxford Univ. Press, 1992.
[22] PRIM, R.C.: 'Shortest connection networks and some generalizations', Bell System Techn. J. 36 (1957), 1389-1401.
[23] RECSKI, A.: Matroid theory and its application in electrical networks and statics, Springer, 1989.
[24] SCHRIJVER, A.: Theory of linear and integer programming, Wiley, 1986.
[25] TUTTE, W.T.: 'A homotopy theorem for matroids I-II', Trans. Amer. Math. Soc. 88 (1958), 144-160; 161-174.
[26] TUTTE, W.T.: 'Matroids and graphs', Trans. Amer. Math. Soc. 90 (1959), 527-552.
[27] TUTTE, W.T.: 'An algorithm for determining whether a given binary matroid is graphic', Proc. Amer. Math. Soc. 11 (1960), 905-917.
[28] TUTTE, W.T.: 'On the problem of decomposing a graph into n connected factors', J. London Math. Soc. 36 (1961), 221-230.
[29] TUTTE, W.T.: 'From matrices to graphs', Canad. J. Math. 16 (1964), 108-127.
[30] TUTTE, W.T.: 'Lectures on matroids', J. Res. Nat. Bureau Standards (B) 69 (1965), 1-47.
[31] TUTTE, W.T.: Connectivity in graphs, Univ. Toronto Press, 1966.
[32] TUTTE, W.T.: 'Connectivity in matroids', Canad. J. Math. 18 (1966), 1301-1324.
[33] TUTTE, W.T.: Introduction to the theory of matroids, Amer. Elsevier, 1971.
[34] WELSH, D.J.A.: Matroid theory, Acad. Press, 1976.
[35] WHITE, N.: Theory of matroids, Cambridge Univ. Press, 1986.
[36] WHITE, N.: Combinatorial geometries, Cambridge Univ. Press, 1987.
[37] WHITE, N.: Matroid applications, Cambridge Univ. Press, 1991.
[38] WHITNEY, H.: 'On the abstract properties of linear dependence', Amer. J. Math. 57 (1935), 509-533.

Paola Festa
Dip. Mat. e Inform. Univ. Salerno
Via S. Allende
84081 Baronissi (SA), Italy
E-mail address: paofes@unisa.it

MSC2000: 90C09, 90C10
Key words and phrases: combinatorial optimization, greedy technique, graph optimization.

MAXIMUM CONSTRAINT SATISFACTION: RELAXATIONS AND UPPER BOUNDS

Maximum constraint satisfaction problems (MAX-CSPs) generalize maximum satisfiability (MAX-SAT) to include cases where the variables are no longer restricted to binary (or Boolean) values. MAX-CSP is NP-complete even in the special case of binary CSPs. Designing procedures to compute upper bounds on the exact (unknown) optimum value (the maximum number of satisfied constraints) is therefore a relevant issue. Such bounds are useful, in particular, for estimating the quality of solutions obtained from various heuristic approaches.

This article describes a systematic way of computing upper bounds for large scale MAX-CSP instances such as those arising from the so-called radio link frequency assignment problem (RLFAP). After discussing the general relaxation principle and the basic procedure from which the bounds are derived, we present results of extensive computational experiments on series of 90 instances of RLFAP including both real test problems and randomly generated 'realistic' test problems (for sizes ranging from 396 variables and about 1700 constraints to 831 variables and about 4800 constraints).

These results clearly indicate that the proposed approach is practically useful for producing fairly accurate upper bounds for such large MAX-CSP problems.

Introduction. Constraint satisfaction problems (CSPs) may be viewed as a generalization of satisfiability (SAT) to include cases where, instead of taking binary values only (0-1 or true-false), the variables may take on a finite number (> 2) of given possible values.

For an infeasible CSP, a relevant question, both theoretically and practically, is to determine an assignment of values to variables such that the number of satisfied constraints is the largest possible. This is the so-called maximum constraint satisfaction problem (MAX-CSP), which generalizes maximum satisfiability (MAX-SAT) in a natural way. Since MAX-2SAT is NP-complete (see e.g. [12, pp. 259-260]), even the subclass of MAX-CSP corresponding to binary CSPs (those problems with constraints involving pairs of variables only) is NP-complete. Therefore, for very large instances such as those arising from practical applications (e.g. the RLFAP discussed below) one can only hope for approximate solutions using some of the currently available heuristic approaches such as simulated annealing, tabu search, genetic algorithms, or local search of various kinds.

However, for many applications, getting an approximate solution without any information about the quality of this solution (e.g. measured by the difference between the cost of this solution and the optimal cost) may be of little value.

We address in this paper the problem of computing upper bounds on the optimum cost of MAX-CSP problems, from which estimates of the quality of heuristic solutions can be derived.

The article is organized as follows. Basic definitions about CSPs and MAX-CSPs are recalled in the second section. Modeling the so-called radio link frequency assignment problem (RLFAP) in terms of CSP and MAX-CSP is addressed in the third section. Then we present a general class of relaxations for MAX-CSP problems and its specialization to the computation of MAX-CSP bounds for RLFAP. Finally, results of extensive computational experiments carried out on series of both real test problems and realistic randomly generated test problems are presented. To our knowledge, this is the first time extensive computational results of this kind are reported for such large scale MAX-CSP problems.

CSP and MAX-CSP. A constraint satisfaction problem (CSP) is defined by specifying:
• a set of n variables $x_1, \dots, x_n$;
• for each variable $x_i$, $i \in I = \{1, \dots, n\}$, the domain of i, i.e. the (finite) set $D_i$ of possible values for $x_i$;
• a set of K constraints $\varphi_k$, $k = 1, \dots, K$. For each $k \in [1, K]$, constraint $\varphi_k$ is defined by its support set (i.e. the subset $S_k = \mathrm{supp}(\varphi_k)$ of indices of the variables involved in the constraint) and an oracle which, given any combination $x_{[S_k]}$ of values for the variables in $S_k$, answers TRUE if $\varphi_k(x_{[S_k]}) = \text{TRUE}$, i.e. if the combination is allowed, and FALSE otherwise. (For any $S \subset \{1, \dots, n\}$ and $x \in D_1 \times \dots \times D_n$, $x_{[S]}$ denotes the vector x restricted to the components in S.)

Given a CSP specified as above, we define a free assignment as any n-tuple $x \in D = D_1 \times \dots \times D_n$. A feasible assignment (or solution) is a free assignment such that $\varphi_k(x_{[S_k]}) = \text{TRUE}$ for all $k = 1, \dots, K$.

For simplicity, we restrict ourselves here to the case where each variable takes scalar values only (i.e. real or integer values), but we note that more general CSPs may be defined with variables taking, for instance, vector values.

The arity of a constraint $\varphi_k$ is the cardinality of its support set: $|S_k| = |\mathrm{supp}(\varphi_k)|$. A binary CSP is a constraint satisfaction problem in which $|\mathrm{supp}(\varphi_k)| \le 2$ for all $k = 1, \dots, K$.

The constraint hypergraph associated with a given CSP is the hypergraph having vertex set $I = \{1, \dots, n\}$ and edge set $\{S_1, \dots, S_K\}$. In the case of a binary CSP this is a graph.

The two examples below are interesting special cases of the general definition and show NP-completeness of arbitrary CSPs.

EXAMPLE 1 (Satisfiability) SAT is easily recognized as a special case of CSP where $D_i = \{\text{TRUE}, \text{FALSE}\}$ for all i and where there is a constraint $\varphi_k$ corresponding to each clause $C_k$, with $\varphi_k(x) = \text{TRUE}$ ⇔ clause $C_k$ is satisfied under truth assignment x. □

EXAMPLE 2 (Hypergraph q-coloring; see [2, Chap. 19]) Let q > 1 be a given integer and H = [V, E] a hypergraph with vertex set V and edge set E. The problem is to assign one out of q colors to each vertex of H so that each edge of H has vertices of different colors. Clearly this may be formulated as a CSP where there is one variable $x_i$ for each $v_i \in V$, with domain $D_i = \{1, \dots, q\}$, and one constraint $\varphi_k$ for each edge $e_k = \{i_1, \dots, i_p\} \in E$ such that $\varphi_k(x_{i_1}, \dots, x_{i_p}) = \text{TRUE}$ ⇔ no two values in $\{x_{i_1}, \dots, x_{i_p}\}$ are equal. Note that when H is a graph (i.e. $|e_k| = 2$ for all $e_k \in E$), the resulting CSP is a binary CSP. □

For an infeasible CSP, one basic question is to determine a 'best possible' or 'least infeasible' assignment. If the criterion for quality (or degree of 'feasibility') of an assignment x is taken to be the number a(x) of constraints satisfied under that assignment, we are led to the so-called MAX-CSP problem:
• Given: a CSP defined by its variables $x_1, \dots, x_n$, domains $D_1, \dots, D_n$, and constraints $\varphi_1, \dots, \varphi_K$.
• Find: $x \in D_1 \times \dots \times D_n$ such that $a(x) = |\{k \in [1, K] : \varphi_k(x_{[S_k]}) = \text{TRUE}\}|$ is maximized.

EXAMPLE 3 (MAX-SAT, MAX-2SAT) Clearly, MAX-SAT is a special case of MAX-CSP when the given CSP is a satisfiability problem. The associated decision problem is NP-complete even for the special case of MAX-2SAT ([13]), showing that MAX-CSP is NP-complete even for binary CSPs. □

[...] of noninterference constraints (most constraints usually involving pairs of links). A CSP formulation of RLFAP is as follows. With n denoting the number of links, for each link $i = 1, \dots, n$ there is an associated variable $x_i$ representing the frequency to be assigned to link i. The domain $D_i$ of $x_i$ is the (finite) set of allowed frequencies for link i (frequencies are expressed in Hz, kHz, MHz or any other specified unit).

Not every assignment $x \in S = D_1 \times \dots \times D_n$ is allowed, because a number of constraints, called noninterference constraints, have to be satisfied.

We will only consider here the case of binary noninterference constraints (i.e. involving only pairs of links), which is relevant to many applications of interest (see e.g. [16], [15]). For a given pair of links i and j, two (exclusive) types of constraints are possible:
• equality constraints of the form [...]
must be disregarded because they are physically meaningless; therefore, from now on, we will only consider assignments in $S'$ as possible solutions to RLFAP.

An assignment in $S'$ which satisfies all constraints of type (I) will be called feasible. The feasibility version of RLFAP may therefore be stated as the following CSP:
• Given: an instance of RLFAP.
• Question: does there exist a feasible frequency assignment?
• Answer: yes or no and, if yes, output a feasible assignment x.

Efficient solution methods for RLFAP are of major interest to numerous practical applications in the context of civilian mobile communication networks as well as of military networks. Since the available spectrum is severely limited and the communication needs (traffic requirements) are continuously increasing, a high proportion of the instances of the RLFAP encountered in applications turn out to be infeasible.

When faced with an instance which is either infeasible or presumably infeasible (e.g. because running a heuristic solution method just failed to produce a feasible solution), a key question for the practitioner is to determine a 'best possible' or 'least infeasible' assignment.

This leads to the 'optimization version' of the RLFAP in the form of the following MAX-CSP:
• Given: an instance of RLFAP with n variables (links) and K constraints.
• Question: determine $x^* \in S'$ such that $a(x^*)$ (number of satisfied constraints) is maximized: $a(x^*) = \max_{x \in S'} \{a(x)\}$.

In view of the NP-completeness of MAX-CSP for binary CSPs, guaranteed optimal solutions to the above for large scale instances (such as those of the CELAR benchmarks) cannot reasonably be expected from currently available techniques in combinatorial optimization. A less ambitious, though practically relevant, objective, addressed in the following section, is to try to obtain good upper bounds on an optimal solution value.

We note here that if an upper bound $\bar z$ is found such that $\bar z < K$, then we can deduce that the given RLFAP has no feasible solution. Thus, an interesting by-product of computing bounds is to produce proofs of infeasibility of a given instance of RLFAP. Clearly, such information may be of considerable importance to the practitioner.

A General Class of Relaxations for Computing MAX-CSP Bounds. MAX-CSP may be reformulated as the discrete optimization problem

$$
\begin{aligned}
\max\ & z = \sum_{k=1}^{K} y_k \\
\text{s.t. } & g_k(x) \ge y_k, && \forall k = 1, \dots, K, \\
& y_k = 0 \text{ or } 1, && \forall k, \\
& x = (x_1, \dots, x_n)^T \in S'.
\end{aligned}
\tag{1}
$$

In the above, for all $k = 1, \dots, K$, $g_k(x) \ge 1$ if $\varphi_k(x_{[S_k]}) = \text{TRUE}$, and $g_k(x) < 1$ if $\varphi_k(x_{[S_k]}) = \text{FALSE}$. Note that in the case of RLFAP, this specializes to $g_k(x) = |x_i - x_j| / w_k$, where $x_i$ and $x_j$ are the two variables involved in constraint k, and $w_k$ is the weight of constraint k.

A relaxation of an optimization problem such as (1) is obtained by replacing its solution set by a larger solution set. Clearly, if the relaxed problem can be solved exactly (i.e. to guaranteed optimality), then its optimal objective function value is an upper bound (in the case of maximization) on the optimum objective function value of the original problem.

There exist a number of standard ways of relaxing an optimization problem such as (1), e.g. using Lagrangian relaxation (e.g. [10]) or considering the so-called continuous relaxation of some of the variables (e.g. relaxing the constraints on the $y_k$ variables in (1) to $0 \le y_k \le 1$). However, in our treatment of RLFAP, those standard relaxations have not been considered because they do not give rise to easily solvable relaxed problems. We therefore investigated a different approach according to the following general principle.

The relaxations we consider are based on the identification of those parts of the constraint graph or hypergraph which are responsible for the infeasibility of the whole problem. Preliminary computational results obtained in [25] have shown that, at least for MAX-CSP problems deriving from RLFAP, it is most often possible to identify in a given instance an infeasible induced subproblem of sufficiently reduced size to make the corresponding MAX-CSP bound computable in reasonable time.

This suggests considering relaxations of (1) formed by subproblems induced by properly chosen subsets of constraints. Thus, if $\mathcal{K}' \subset \mathcal{K} = \{1, \dots, K\}$ is the subset of constraints chosen, the induced relaxation considered is:

$$
\begin{aligned}
\max\ & z = \sum_{k=1}^{K} y_k \\
\text{s.t. } & g_k(x) \ge y_k, && \forall k \in \mathcal{K}', \\
& y_k = 0 \text{ or } 1, && \forall k = 1, \dots, K, \\
& x \in S'.
\end{aligned}
\tag{2}
$$

Note that, in an optimal solution to (2),

$$k \in \mathcal{K} \setminus \mathcal{K}' \Rightarrow y_k = 1.$$

Therefore $\bar z$, the optimum objective function value of (2), may be rewritten as:

$$\bar z = K - |\mathcal{K}'| + \bar z',$$

where $\bar z'$ is the optimum value of the problem:

$$
\begin{aligned}
\max\ & z' = \sum_{k \in \mathcal{K}'} y_k \\
\text{s.t. } & g_k(x) \ge y_k, && \forall k \in \mathcal{K}', \\
& y_k = 0 \text{ or } 1, && \forall k \in \mathcal{K}', \\
& x \in S'.
\end{aligned}
$$

Clearly, the constraint graph or hypergraph $G'$ corresponding to a relaxation $R[\mathcal{K}']$ is deduced from the constraint graph or hypergraph G by deleting all edges associated with the constraints in $\mathcal{K} \setminus \mathcal{K}'$. Also observe that if $G'$ has several distinct connected components, then the solution of $R[\mathcal{K}']$ decomposes into independent subproblems, one for each connected component.

If the constraint graph or hypergraph $G'$ is of sufficiently small size, then it is possible to solve $R[\mathcal{K}']$ exactly, and the optimum solution value obtained clearly leads to an upper bound on the optimum value of the original problem. When $G'$ is too large to get the exact optimal solution value of $R[\mathcal{K}']$, we content ourselves with getting an upper bound on this exact optimal value (see the procedure SOLVE.RELAX below). Clearly, any such upper bound still provides a valid upper bound for the original problem. Of course, in the above approach, the quality of the bound derived from $R[\mathcal{K}']$ essentially depends on how the subset $\mathcal{K}'$ is selected. We now describe the selection procedure which has been used in our computational experiments.

Building Relaxations for RLFAP Using Maximum Cliques. We now specialize the general relaxation scheme described above to derive bounds for RLFAP. The presentation below improves and extends our preliminary work in [25].

The basic idea of our selection procedure for choosing $\mathcal{K}' \subset \mathcal{K}$ is that, for RLFAP, infeasibility is more likely to occur on subsets of links which are all mutually constrained, i.e. on subsets of links which induce a clique (complete subgraph) in the constraint graph. Since for RLFAP the constraint graphs arising from applications are always very sparse (less than 1% density for the CELAR instances), it is known that finding a clique of maximum cardinality can be done efficiently even using simple approaches such as implicit enumeration.

In [6] an efficient implicit enumeration based algorithm with good computational results for large sparse graphs of up to 3000 vertices is described; however, it assumes very small maximum clique sizes (in the computational results presented in [6], maximum clique sizes do not exceed 11, and the running times seem to increase extremely fast with this parameter). Unfortunately, since for our large RLFAP instances the maximum clique sizes turned out to be commonly in the range [12, 25], the above algorithm could not be used.

We therefore worked out a different implementation of the implicit enumeration technique which allowed us to find guaranteed maximum cliques for all the test problems treated within acceptable computing times (see results at the end of the paper). Using this maximum clique algorithm, the procedure for building a relaxation to MAX-CSP for RLFAP is as follows.

[...]

The heuristic solution method used in our experiments to implement step b1) is a variant of [...]
[...]tion of the tree search (i.e. when all the nodes of the tree have been explored implicitly or explicitly).

Computational Results. In order to validate the approach described above, systematic computational experiments have been carried out on two series of test problems.

The first set was composed of 15 infeasible real problems which arose from actual network engineering studies carried out on three distinct large radio link networks (one in the 2 GHz frequency range, one in the 2.5 GHz frequency range and one in the 4 GHz frequency range).

The second series consisted of a set of 5 × 15 = 75 'realistic' test problems generated by applying some random perturbation to the above 15 real problems. More precisely, each problem of the second series is generated from one problem of the first series by changing the weight $w_{ij}$ of each inequality constraint of the form $|x_i - x_j| \ge w_{ij}$ to $w_{ij} \times (\alpha + \beta \Phi)$, where $\Phi$ is a pseudorandom number drawn from a uniform distribution on [0, 1] and $\alpha$, $\beta$ are chosen parameters (of course the pseudorandom drawing is assumed to be independent from one constraint to the next).

Table I presents the characteristics of the 15 real test problems treated, numbered 1 to 15, and provides for each problem: number of variables (n), number of constraints (K), number of distinct frequencies used (NF) and the main characteristics of the relaxed subproblem obtained from the procedure BUILD.RELAX: number of variables (#var.) and number of constraints (#const.).

Table III presents in a similar way the characteristics of the 5 × 15 = 75 test problems deduced from the previous ones by random perturbation. The 5 instances corresponding to each basic problem i are numbered $i_1, \dots, i_5$. For each instance the values of the parameters $\alpha$ and $\beta$ used to generate the instance are displayed together with the characteristics (number of variables, number of constraints) of the relaxed subproblem produced by BUILD.RELAX.

The computation times taken to construct the relaxed subproblems (using BUILD.RELAX) on the problems of Tables I and III are all between 5 minutes and 35 minutes, with an average of about 12 minutes.

Prob.   n     K      NF   Relaxation
                          #var.  #const.
1       680   2389   8    44     257
2       680   3367   16   38     339
3       680   4103   24   84     671
4       680   2725   8    74     490
5       680   2576   8    46     311
6       680   2470   8    44     284
7       831   3451   16   16     113
8       831   4802   24   33     248
9       396   1792   12   70     375
10      396   1792   12   70     375
11      396   1792   12   70     375
12      396   1792   12   70     375
13      396   1792   12   70     375
14      396   1792   12   70     375
15      396   1792   12   70     375
Table I.

Prob.   HS     Best upper bound obtained within
               15 s    5 min   1 h
1       2376   2387    2385    2383
2       3358   3367    3366    3365
3       4090   4102    4098    4098
4       2700   2720    2713    2708
5       2559   2571    2569    2564
6       2457   2467    2464    2459
7       3440   3450    3450    3450
8       4781   4800    4800    4799
9       1762   1786    1780    1777
10      1759   1786    1780    1776
11      1761   1786    1780    1778
12      1764   1786    1780    1776
13      1761   1786    1780    1775
14      1757   1786    1780    1775
15      1764   1786    1783    1777
Table II.

Table II shows the results obtained on the 15 real test problems of Table I and Table IV shows the results for the 5 × 15 problems of Table III. The computer used was a PC Pentium 166 workstation with 32 MB RAM. For each problem we provide: HS, the best heuristic solution value obtained (number of satisfied constraints), and the best upper bounds obtained after 15 seconds, 5 minutes and 1 hour. The results in Table II confirm that our approach consistently produces good bounds for real RLFAP instances within acceptable solution times.
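To make the tightness of the bounds in Table II concrete, the relative gap $R = (UB - HS)/UB$ between the best upper bound UB and the heuristic value HS can be recomputed directly from the table. The following is a minimal sketch in Python; the numbers are copied from Table II, while the names `TABLE_II`, `relative_gap` and `gaps_after_one_hour` are illustrative and not part of the original study.

```python
# Table II data: problem -> (HS, UB after 15 s, UB after 5 min, UB after 1 h).
TABLE_II = {
    1: (2376, 2387, 2385, 2383),
    2: (3358, 3367, 3366, 3365),
    3: (4090, 4102, 4098, 4098),
    4: (2700, 2720, 2713, 2708),
    5: (2559, 2571, 2569, 2564),
    6: (2457, 2467, 2464, 2459),
    7: (3440, 3450, 3450, 3450),
    8: (4781, 4800, 4800, 4799),
    9: (1762, 1786, 1780, 1777),
    10: (1759, 1786, 1780, 1776),
    11: (1761, 1786, 1780, 1778),
    12: (1764, 1786, 1780, 1776),
    13: (1761, 1786, 1780, 1775),
    14: (1757, 1786, 1780, 1775),
    15: (1764, 1786, 1783, 1777),
}


def relative_gap(hs, ub):
    """Relative gap R = (UB - HS) / UB between an upper bound and a heuristic value."""
    return (ub - hs) / ub


def gaps_after_one_hour(table):
    """Map each problem to its gap R, using the upper bound obtained after 1 hour."""
    return {prob: relative_gap(hs, ub_1h) for prob, (hs, _, _, ub_1h) in table.items()}


if __name__ == "__main__":
    for prob, r in gaps_after_one_hour(TABLE_II).items():
        print(f"problem {prob:2d}: R = {100 * r:.2f}%")
```

Because HS is only a lower bound on the optimum, each value of R computed this way overestimates the true distance between the upper bound and the (unknown) optimal value.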
Prob.   α     β     Relaxation        |  Prob.   α     β     Relaxation
                    #var.  #const.    |                      #var.  #const.
1_1     0.5   1     42     257        |  9_1     0.2   1.6   12     375
1_2     0.5   1     42     261        |  9_2     0.2   1.6   12     66
1_3     0.8   0.4   42     261        |  9_3     0.2   1.6   48     66
1_4     0.8   0.4   42     261        |  9_4     0.2   1.6   24     264
1_5     0.8   0.4   42     261        |  9_5     0.2   1.6   36     132
2_1     0.5   1     38     339        |  10_1    0.2   1.6   36     375
2_2     0.5   1     38     339        |  10_2    0.2   1.6   12     198
2_3     0.8   0.4   38     339        |  10_3    0.2   1.6   48     66
2_4     0.8   0.4   38     339        |  10_4    0.2   1.6   24     264
2_5     0.8   0.4   38     339        |  10_5    0.2   1.6   36     132
3_1     0.5   1     54     671        |  11_1    0.2   1.6   36     375
3_2     0.5   1     70     460        |  11_2    0.2   1.6   12     198
3_3     0.8   0.4   84     480        |  11_3    0.2   1.6   48     66
3_4     0.8   0.4   84     671        |  11_4    0.2   1.6   24     264
3_5     0.8   0.4   54     671        |  11_5    0.2   1.6   36     132
4_1     0.5   1     74     490        |  12_1    0.2   1.6   24     375
4_2     0.5   1     74     490        |  12_2    0.2   1.6   12     132
4_3     0.8   0.4   74     490        |  12_3    0.2   1.6   48     66
4_4     0.8   0.4   74     490        |  12_4    0.2   1.6   24     264
4_5     0.8   0.4   74     490        |  12_5    0.2   1.6   36     162
5_1     0.5   1     46     311        |  13_1    0.2   1.6   36     375
5_2     0.5   1     46     311        |  13_2    0.2   1.6   12     198
5_3     0.8   0.4   46     311        |  13_3    0.2   1.6   48     66
5_4     0.8   0.4   46     311        |  13_4    0.2   1.6   24     264
5_5     0.8   0.4   46     311        |  13_5    0.2   1.6   36     132
6_1     0.5   1     44     284        |  14_1    0.2   1.6   36     375
6_2     0.5   1     44     284        |  14_2    0.2   1.6   12     198
6_3     0.8   0.4   44     284        |  14_3    0.2   1.6   48     66
6_4     0.8   0.4   44     284        |  14_4    0.2   1.6   24     264
6_5     0.8   0.4   44     284        |  14_5    0.2   1.6   36     132
7_1     0.8   0.4   16     113        |  15_1    0.2   1.6   24     375
7_2     0.8   0.4   16     113        |  15_2    0.2   1.6   12     132
7_3     0.8   0.4   16     113        |  15_3    0.2   1.6   48     66
7_4     0.8   0.4   16     113        |  15_4    0.2   1.6   24     264
7_5     0.8   0.4   16     113        |  15_5    0.2   1.6   36     132
8_1     0.5   1     33     248        |
8_2     0.5   1     33     248        |
8_3     0.5   1     33     248        |
8_4     0.5   1     33     248        |
8_5     0.5   1     33     248        |
Table III.
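The parameters α and β listed in Table III drive the random perturbation of the constraint weights used to generate the second series of instances: each weight is multiplied by α + βΦ, with Φ drawn uniformly from [0, 1] independently for each constraint. The sketch below illustrates that rule in Python. The dictionary representation of the weights, the function name `perturb_weights` and the absence of any rounding are assumptions made for illustration; the original generator is not described at this level of detail.

```python
import random


def perturb_weights(weights, alpha, beta, rng=None):
    """Return a new weight map with every weight w replaced by w * (alpha + beta * phi).

    weights maps a pair of links (i, j) to the weight of the constraint
    |x_i - x_j| >= w_ij (hypothetical representation).
    phi is drawn uniformly from [0, 1), independently for each constraint.
    """
    rng = rng or random.Random()
    return {
        pair: w * (alpha + beta * rng.random())
        for pair, w in weights.items()
    }


if __name__ == "__main__":
    # Toy instance with two constraints; alpha = 0.8 and beta = 0.4 scale
    # each weight by a factor between 0.8 and 1.2.
    base = {(1, 2): 10.0, (2, 3): 4.0}
    print(perturb_weights(base, alpha=0.8, beta=0.4, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance makes a generated instance reproducible, which is convenient when the same perturbed problem has to be handed to several solution methods.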
Table IV.
From Tables II and IV, it is seen that for all the instances treated, the difference between the heuristic solution value HS and the best upper bound obtained is always quite small. More precisely, for all the examples treated, the ratio R = (UB − HS)/UB is most often well below 1% (Problem 14 in Table II is the only one for which R > 1%). We note that since HS is only a lower bound, R is a pessimistic estimate of the relative difference between the best upper bound obtained and the optimal, unknown, solution value.

Also, from Table IV, it is seen that the results obtained appear to be fairly stable, in spite of the magnitude of the perturbations applied to generate the corresponding 75 instances. In addition to practical applicability and efficiency, this clearly shows good stability and robustness in the behavior of our algorithms. To our knowledge, this is the first time a systematic way of deriving upper bounds for such large scale MAX-CSP problems has been implemented and fully tested.

To conclude, let us mention that, in view of the results obtained, the techniques described here have been included in an industrial software tool for radio network engineering developed by the French MOD (DGA/CELAR).

See also: Frequency assignment problem; Graph coloring.

References
[1] BABEL, L., AND TINHOFER, G.: 'A branch and bound algorithm for the maximum clique problem', ZOR: Methods and Models of Oper. Res. 34 (1990), 207-217.
[2] BERGE, C.: Graphes et hypergraphes, second ed., Dunod, 1973.
[3] BOROS, E., AND PRÉKOPA, A.: 'Probabilistic bounds and algorithms for the maximum satisfiability problem', RUTCOR Res. Report Rutgers Univ., New Jersey (USA) RRR#17-88 (1988).
[4] BOUJU, A., BOYCE, J.F., DIMITROPOULOS, C.H.D., SCHEIDT, G. VON, AND TAYLOR, J.G.: 'Tabu search for the radio link frequency assignment problem': Proc. Conf. on Applied Decision Technologies: Modern Heuristic methods, Brunel Univ., Uxbridge (UK), 1995, pp. 233-250, work carried out in the CALMA PROJECT.
[5] BOURRET, P.: 'Simulated annealing': Deliverable 2.3 of the CALMA Project. Report 3/3507.00/DERI-ONERA-CERT, CALMA, 1995.
[6] CARRAGHAN, R., AND PARDALOS, P.M.: 'An exact algorithm for the maximum clique problem', Oper. Res. Lett. 9 (1990), 375-382.
[7] CASTELINO, D.J., HURLEY, S., AND STEPHENS, N.M.: 'A tabu search algorithm for frequency assignment', Ann. Oper. Res. 63 (1996), 301-319.
[8] CELAR: Radio link frequency assignment problem benchmark, CELAR, 1994, ftp.cs.unh.edu/pub/csp/archive/code/benchmarks.
[9] DU, D.-Z., GU, J., AND PARDALOS, P.M. (eds.): Satisfiability problem: Theory and applications, Vol. 35 of DIMACS, Amer. Math. Soc., 1997.
[10] FISHER, M.L.: 'The Lagrangian relaxation method for solving integer programming problems', Managem. Sci. 27 (1981), 11-18.
[11] FREUDER, E.G., AND WALLACE, R.J.: 'Partial constraint satisfaction', Artif. Intell. 58 (1992), 21-70.
[12] GAREY, M.R., AND JOHNSON, D.S.: Computers and intractability. A guide to the theory of NP-completeness, Freeman, 1979.
[13] GAREY, M.R., JOHNSON, D.S., AND STOCKMEYER, L.: 'Some simplified NP-complete graph problems', Theoret. Computer Sci. 1 (1976), 237-267.
[14] GONDRAN, M., AND MINOUX, M.: Graphes et algorithmes, third ed., Eyrolles, 1995.
[15] HAJEMA, W., MINOUX, M., AND WEST, C.: 'CALMA project specification': Statement of the Radio Link Frequency Assignment Problem. Appendix 3 to EUCLID RTP 6.4 Implementing Arrangement, June 25, 1992.
[16] HALE, W.K.: 'Frequency assignment theory and applications', Proc. IEEE 68, no. 12 (1980), 1497-1514.
[17] HANSEN, P., AND JAUMARD, B.: 'Algorithms for the maximum satisfiability problem', RUTCOR Res. Report Rutgers Univ., New Jersey (USA) RRR#43-87 (1987).
[18] HURLEY, S., THIEL, S.U., AND SMITH, D.H.: 'A comparison of local search algorithms for radio link frequency assignment problems': Proc. ACM Symposium on Applied Computing, 1996, pp. 251-257.
[19] JOHNSON, D.S.: 'Approximation algorithms for combinatorial problems', J. Comput. Syst. Sci. 9 (1974), 256-278.
[20] JOY, S., MITCHELL, J., AND BORCHERS, B.: 'A branch and cut algorithm for MAX-SAT and weighted MAX-SAT': Satisfiability Problem: Theory and Appl., DIMACS 35, Amer. Math. Soc., 1997, pp. 519-536.
[21] KUMAR, V.: 'Algorithms for constraint satisfaction problems: A survey', AI Magazine Spring (1992), 32-44.
[22] LANFEAR, T.A.: 'Graph theory and radio frequency assignment': NATO EMC Analysis Project, Vol. 5, NATO, 1989.
[23] LIEBERHERR, K.J.: 'Algorithmic extremal problems in combinatorial optimization', J. Algorithms 3 (1982), 225-244.
[24] LIEBERHERR, K.J., AND SPECKER, E.: 'Complexity of partial satisfaction', J. ACM 28, no. 2 (1981), 411-421.
[25] MAVROCORDATOS, P., AND MINOUX, M.: 'Allocation de ressources dans les réseaux (frequency allocation in networks)', Final Techn. Report CELAR contract #0114193 (1995), Résolution de problèmes d'optimisation combinatoire pour application à l'allocation optimisée de fréquences dans les grands réseaux.
[26] PARDALOS, P.M., AND RODGERS, G.P.: 'A branch and bound algorithm for the maximum clique problem', Comput. Oper. Res. 19, no. 5 (1992), 363-375.
[27] POLJAK, S., AND TURZIK, D.: 'A polynomial algorithm for constructing a large bipartite subgraph, with an application to a satisfiability problem', Canad. J. Math. XXXIV, no. 3 (1982), 519-524.
[28] RESENDE, M.G.C., PITSOULIS, L.S., AND PARDALOS, P.M.: 'Approximate solution of weighted MAX-SAT problems using GRASP': Satisfiability Problem: Theory and Appl., DIMACS 35, Amer. Math. Soc., 1997, pp. 393-405.
[29] ROBERTS, F.S.: 'T-colorings of graphs; recent results and open problems', Discrete Math. 93 (1991), 229-245.
[30] SMITH, D.H., AND HURLEY, S.: 'Bounds for the frequency assignment problem', Discrete Math. 167/168 (1997), 571-582.
[31] SMITH, D.H., HURLEY, S., AND THIEL, S.U.: 'Improving heuristics for the frequency assignment problem', Europ. J. Oper. Res. 107 (1998), 76-86.
[32] WERRA, D. DE, AND GAY, Y.: 'Chromatic scheduling and frequency assignment', Discrete Appl. Math. 49 (1994), 165-174.

M. Minoux
Univ. Paris 6
4 place Jussieu
75005 Paris, France
E-mail address: Michel.Minoux@lip6.fr

P. Mavrocordatos
Algotheque and Univ. Paris 6
4 place Jussieu
75005 Paris, France
E-mail address: p2m@algotheque.com

MSC 2000: 90C10

Maximum entropy principle: Image reconstruction

Image reconstruction is a procedure for processing the measurement data to construct an image of the object. This section introduces the basic concept of image reconstruction from projection data. Two types of entropy optimization models, namely, the finite-dimensional model and the vector-space model, and three classes of entropy optimization methodologies, namely, discretization methods, Banach-space methods (e.g., MENT) and Hilbert-space methods (e.g., the finite element method), are included. For more details about image reconstruction, the reader is referred to [7], [2], [13] and the references therein.

A very important scientific application of image reconstruction is in computerized tomography (CT) for medical diagnosis. Physicians need to know, for example, the location, shape, and size of a suspected tumor inside a patient's brain in order to plan a suitable course of treatment. With computerized tomography, images of cross-sections of a human body can be constructed from data obtained by measuring the attenuation of X-rays along a large number of straight lines (or strips) through each cross-section. For ease of introduction, we illustrate the basic ideas about image reconstruction with the example of two-dimensional X-ray CT, with the understanding that the discussion can be generalized to higher-dimensional settings.

In this example, the distribution to be determined is that of the X-ray linear attenuation coefficient of human body tissues. The total attenuation of the X-ray beam between a source and a detector is approximately the integral of the linear attenuation coefficient along the line between the source and the detector. The unknown distribution
Key words and phrases: constraint satisfaction, relaxation. of the X-ray linear attenuation coefficient is rep-
resented by a density function f of two variables,
which assumes zero-value outside a squared-shape
MAXIMUM ENTROPY PRINCIPLE: IM- region. The squared region is usually referred to as
AGE R E C O N S T R U C T I O N , entropy optimiza- the support of the image.
tion for image reconstruction Two basic types of entropy optimization models,
Images can be used to characterize the underlying namely, finite-dimensional model and vector-space
distribution of certain physical properties, such as model, are commonly used to decide the density
density, shape, and brightness, of an object un- function f. The finite-dimensional models approx-
der investigation. In many applications where an imate the density values over the support of the
image is required, only a finite number of observa- image at a finite number of grid points, while the
tions and/or indirect measurements can be made. density is approximated by a real-value function
for the entire scanning region in the vector-space models. The latter models were motivated by the need to reconstruct the image from only a small number of available projections.
In the finite-dimensional models, the support of the density f is represented by n (given by the users) regularly spaced grid points, and the values of the density function f at these points are denoted by $f = (f_1, \ldots, f_n)$. Assume that m projections are made and the measurement data $d = (d_1, \ldots, d_m)$ are obtained. The relationship between the unknown density values f and the observed measurement d can be expressed as
\[ d \approx A f, \tag{1} \]
where $A = [a_{ij}]$ is a projection matrix.
Note that the approximation sign in (1) reflects possible errors in modeling and measurement. Also note that, in the classical square pixel model, the image is discretized by partitioning its support into a finite number of equi-sized square regions (called pixels or cells) whose centers are those n sample points. By assuming that the density function f is constant in each of the equi-sized pixels, i.e., $f = f_j$ throughout pixel j, the value of $a_{ij}$ in the projection matrix is simply the length of the intersection of the line corresponding to the ith projection with the pixel surrounding the jth sample point.
Once the projection matrix A is defined and the measurement d is known, the problem is to find an f satisfying (1). To cope with the errors mentioned above, G.T. Herman [6] suggested that (1) be replaced by an 'interval constraint' and a nonnegativity constraint be added:
\[ d - e \le A f \le d + e, \tag{2} \]
\[ f \ge 0, \tag{3} \]
where $e = (e_1, \ldots, e_m)$ is an m-vector of user-chosen tolerance levels. Note that (2) can be replaced by an equivalent system of inequalities
\[ A' f \le d', \tag{4} \]
with twice as many one-sided inequalities [2], [6].
For such an image reconstruction model, we can adopt either the 'feasibility approach' to find a solution to (2) and (3) directly, or the 'optimization approach' to find a solution that is not only feasible in the above sense but also optimal with respect to a certain criterion. In the literature, at least three different types of optimization problems have been proposed, namely, the entropy maximization problem, the quadratic minimization problem, and the maximum likelihood problem.
The entropy optimization problem seeks to optimize an entropic objective function subject to (2) and (3) as follows.
Model 1:
\[ \max\; -\sum_{j=1}^{n} f_j \ln f_j \quad \text{s.t. } d - e \le A f \le d + e, \quad f \ge 0. \]
Some researchers proposed models in which the $f_j$'s are normalized in such a way that $\sum_{j=1}^{n} f_j = 1$, and the projection matrix and the measurement data differ from those of Model 1. See, e.g., [4]. In this way, a solution that is consistent with the measurement data but remains maximally noncommittal can be found. Note that an optimal solution to such models can also be interpreted as the most probable solution that is consistent with the measurement data [3].
Other variations of Model 1 exist. Despite possible modeling and measurement errors, one common practice is to replace (1) and inequalities (2), and (5) by a system of equations: $A f = d$.
A different version of the finite-dimensional entropy optimization model begins with the definition of an error vector $e = (e_1, \ldots, e_m)^T$, where
\[ e_i = d_i - \sum_{j=1}^{n} a_{ij} f_j, \quad i = 1, \ldots, m. \]
Assume that errors $e_1, \ldots, e_m$ exist due to imprecise measurement and are independent noise terms with zero mean and known variance $\sigma_i^2$. S.F. Burch et al. [1] observed that the strong law of large numbers implies that
\[ \frac{1}{m} \sum_{i=1}^{m} \frac{e_i^2}{\sigma_i^2} \to 1 \quad \text{as } m \to \infty. \]
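The limit attributed to Burch et al. [1] can be checked numerically. The following is only an illustrative sketch: the function name `normalized_squared_error`, the Gaussian noise model, and the range of standard deviations are assumptions made for the demonstration and do not come from the article.

```python
import random

def normalized_squared_error(m, seed=0):
    """Return (1/m) * sum_i e_i^2 / sigma_i^2 for m simulated noise terms."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        sigma = rng.uniform(0.5, 2.0)   # assumed known standard deviation sigma_i
        e = rng.gauss(0.0, sigma)       # independent zero-mean noise term e_i
        total += (e / sigma) ** 2
    return total / m

if __name__ == "__main__":
    # The statistic should move toward 1 as the number of projections m grows.
    for m in (10, 1000, 100000):
        print(m, round(normalized_squared_error(m), 3))
```

For small m the statistic can be far from 1, which is why a constraint that sets it equal to 1 is reasonable only when m is large.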
Thus, if m is sufficiently large, the following en- the desirability of employing iterative methods in
tropy optimization problem with quadratic con- CT systems.
straints can be useful: In many situations, e.g., conducting diagnostic
Model 2: experiments on plasma in magnetic confinement
devices or laser target impositions with measure-
n
ments on fusion reactor cores, only few projec-
max -- ~ f j ln f j
j=l
tions are available, e.g., less than 10. When the
1 finite-dimensional entropy optimization model is
s.t. - - ( A f - d ) T S 2 ( A f - d) - 1,
m applied, it tends to produce 'streaking' artifacts.
fj>_O, j- 1,...,n, This motivated the use of the vector-space model.
Take the two-dimensional X-ray CT problem as
where S is a diagonal matrix with 1/ai being its an example. By assuming that the unknown den-
ith diagonal element. sity function f(x, y) is continuous over a compact
Concerns such as the smoothing effect, nonuni- support D such that
formity, peakness, and exactness [14] of a con-
structed image can also be addressed in this model I(x, Y) > 0 and [ f f(x, y)dx dy - 1, (6)
t]
with proper modification of the objective functions , 2
D
and constraints. So far, we have used the square
pixel model to illustrate the idea of entropy opti- G. Minerbo [9] defined the entropy of ](x, y) as
mization for image reconstruction. Other models
exist [2].
i(I) - - [ [ f(x, y)ln[f(x, y)A] dx dy,
i] i]
non's entropy and related entropy optimization where A is the area of D. Denote the set of contin-
principles, i.e., principle of maximum entropy and uous, nonnegative functions with compact support
principle of minimum cross-entropy, see E n t r o p y in D by C+ (D).
o p t i m i z a t i o n : S h a n n o n m e a s u r e of e n t r o p y The scanning area is partitioned into parallel
a n d its p r o p e r t i e s . A large amount of literature strips, each of which is penetrated by an X-ray
has been devoted to developing iterative methods beam. Let Oj, j = 1,..., J, be the J distinct pro-
for solving finite-dimensional entropy optimization jection angles with respect to the X-axis of the
problems with linear and/or quadratic constraints. scanning area. Also let M(j) be the number of
For details and a unification of such methods, see parallel beams associated with the j t h projection
[3]. or view, and Sjl < " " < S j / ( j ) be a set of ab-
The method currently employed in most CT scissas for the j t h view. The projection data are
systems is the 'filtered back-projection' method, assumed to be in the form of the following 'strip
which is based on a finite-dimensional model. (See integrals':
[5], [10] for details.) Compared to the iterative
Sj(m+1) oo
methods for solving entropy optimization prob-
lems, this method provides speed, which enables
reconstruction of the image while X-ray transmis-
sion data are being collected. Hence the time be- f (s cos Oj - t sin Oj, s sin Oj + t cos Oj) dt ds,
tween scanning and obtaining reconstructed im-
ages is reduced. However, there are situations where m = 1 , . . . , M ( j ) and j = 1 , . . . , J. It is as-
where iterative methods produce comparable or sumed that, for j = 1 , . . . , J,
better reconstructed images than the filtered back- O0
Let Gjm denotethe observed values of Pjm(f), for sup G(f) -- ~(f) -- "/~-~.[Gjm -- Pjm(f)] 2,
m - 1,... ,M(j), and j - 1 , . . . , J. Note that (6) left j,m
implies aim >_ 0 and )--~M(j) Gjm - 1. where ~ > 0 is an adjustable penalty parameter
Then the vector-space model results in the fol- and ft is a convex and weakly (sequentially) com-
lowing optimization problem" pact set of nonnegative functions in L2(D), with
Model 3" a compact support in D and containing physical
information known a priori about the object to be
sup ¢(:) scanned, e.g., upper and lower bounds on the den-
C+(D) sity function. (A set gt of nonnegative functions
s.t. Pjm(f) - Gjm, (7) in L2+(D)is weakly (sequentially) compact if and
m - 1,...,M(j); only if every sequence in f~ has a weakly convergent
subsequence whose weak limit lies in ft; a sequence
j-1,...,J.
{ fn (x, y) } converges weakly to f (x, y) if and only
if the sequence {(fn(x, y), g(x, y)} } converges to
A finite-dimensional unconstrained dual prob- (f(x,y),g(x,y)} for every g(x,y) e L2+(D), where
lem can be derived by using the technique of La- (hi, h2) - f f hi (x, y)h2(x, y)dxdy denotes the in-
grange multipliers. An algorithm known as MENT ner product of hi and h2 in the space of L2+(D).)
[9] was proposed. It was shown that the solu- With the aid of the theory of Hilbert space, it
tions produced by MENT converge to a density can be shown [8] that G has a unique maximizer
function f* which satisfies the constraint (7) with in f~, for any given data Gjm, m = 1,... , M ( j ) ,
~(f*) - SuPc+(D)~(f). However, the limiting den- j-1,...,J.
sity function f* is not continuous. Actually, as
Based on this alternative formulation, the den-
pointed out in [8], f* is piecewise constant and
sity function f(x, y) can be approximated by using
f* ~ C+(D). When few projections are available the finite element method [11]. For simplicity, as-
and the object being scanned has a simple struc-
sume that D - [-1, 1] x [-1, 1]. First, we superim-
ture (or close to circular symmetry in density),
pose a fixed rectangular mesh on D, with uniform
some preliminary computational results indicated
mesh size h - 1In in both the x and y directions.
the potential of this approach.
We also use the product of piecewise linear func-
Recognizing the fact that the supremum of tions in x and y as the finite element space S h. In
Model 3 is not attained by any function f C this way, a basis for S h has the form
C+(D), M. Klaus and R.T. Smith [8] defined an al-
ternative formulation in a richer class of functions
than C+(D). More precisely, they replaced C+(D) for k- 1 , . . . , ( 2 n + 1) 2 ,
by L~_(D), the set of all nonnegative square inte-
where
grable functions on D, as the setting. Note that
(k-1)-(k-1) (mod2n+l)~_
all piecewise-constant functions over D are con- n,
2n+1 J
tained in L2+(D). Also recognizing that measure-
i-k-(l+n)(2n+l)-n-1,
ments may not be consistent and even be flawed,
they considered an optimization problem where and
the objective function is the original entropy func- if t<(j-1)h
tional ~(f) minus a penalty term corresponding or t_>(j+l)h,
to the residual error in meeting the measurement t-(j- 1)h
Cj(t) -
constraints, and the constraint is that the maxi- if (j-1)h<t<jh,
mizer lies in a weakly compact set that is deter-
if jh <_t <_ (j + l)h.
mined by known physical information about the
density function of the object to be scanned. A It is reasonable to expect that, in practice, one
corresponding formulation becomes should know a priori the minimum and maximum
Model 4: densities of the object being examined. Hence we
Maximum flow problem
focus on a simple constraint set [4] FRIEDEN, B.R.: 'Restoring with maximum likelihood
and maximum entropy', J. Optical Soc. Amer. 62
~ _ { f E L2+(D). O < a < f < b < oc a.e., } (1972), 511-518.
f - 0 a.e., i n R 2 \ D " [5] HENDEE, W.R.: The physical principles of computed
tomography, Little, Brown and Company, 1983.
The density function f ( x , y ) is then approxi-
[6] HERMAN, G.T.: 'A relaxation method for reconstruct-
m a t e d in S h by ing objects from noisy X-rays', Math. Program. 8
N (1975), 1-19.
[7] HERMAN, G.T. (ed.): Image reconstruction from pro-
](x, y) - Z y),
k-1 jections: implementation and applications, Springer,
1979.
where N - (2n + 1) 2 and Ck'S are chosen as the [8] KLAUS, M., AND SMITH, R.T.: 'A Hilbert space ap-
optimal solution of the following finite-dimensional proach to maximum entropy reconstruction', Math.
optimization problem: Meth. Appl. Sci. 10 (1988), 397-406.
[9] MINERBO, G.: 'MENT: A maximum entropy algorithm
for reconstructing a source from projection data', Com-
puter Graphics and Image Processing 10 (1979), 48-68.
eERN k=l
N [10] NATTERER, F.: Mathematics of computerized tomogra-
phy, Wiley, 1986.
s.t. 0 < a <_ ~ CkCk(X, y) <_ b. [11] SMITH, R.T., AND ZOLTANI, C.K.: 'An application of
k=l
the finite element method to maximum entropy tomo-
This problem can be further reduced to graphic image reconstruction', J. Sci. Comput. 2, no. 3
(1987), 283-295.
[12] SMITH,R.T., ZOLTANI,C.K., KLEM,G.J., ANDCOLE-
sup ~ CkCk(x,y) MAN, M.W.: 'Reconstruction of tomographic images
cER g k--1
from sparse data sets by a new finite element maximum
entropy approach', Applied Optics 30, no. 5 (1991),
- ~ ~_~ aim - ~ ckPjm(¢k(x, y)) 573-582.
j,m k=l
[13] STARK, H. (ed.): Image recovery: theory and applica-
,s.t. O < a < ck < b, k-1,...,N. tion, Acad. Press, 1987.
[14] WANG, Y., AND LU, W.: 'Multi-criterion maximum
Preliminary computational results reported in entropy image reconstruction from projections', IEEE
[11], [12] indicate some improvements of this alter- Trans. Medical Imaging 11 (1992), 70-75.
native approach over the M E N T algorithm when
Shu-Cherng Fang
the object under investigation does not have cir- North Carolina State Univ.
cular s y m m e t r y in density and has a high density North Carolina, USA
area near the edge of the scanning region. E-mail address: fang(Deos .ncsu. edu
See also" Entropy optimization: Shannon H.-S. Jacob Tsao
measure of entropy and its properties; San Jose State Univ.
San Jose, California, USA
Jaynes' maximum entropy principle; En-
E-mail address: jtsao©email, sjsu. e d u
tropy optimization: Parameter estimation;
Entropy optimization: Interior point meth- MSC 2000: 94A17, 94A08
Key words and phrases: entropy optimization, image recon-
ods; Optimization in medical imaging.
struction, maximum entropy principle, principle of maxi-
mum entropy.
References
[1] BURCH, S.F., GULL, S.F., AND SKILLING, J.K.: 'Image restoration by a powerful maximum entropy method', Computer Vision, Graphics, and Image Processing 23 (1983), 113-128.
[2] CENSOR, Y., AND HERMAN, G.T.: 'On some optimization techniques in image reconstruction', Applied Numer. Math. 3 (1987), 365-391.
[3] FANG, S.-C., RAJASEKERA, J.R., AND TSAO, H.-S.J.: Entropy optimization and mathematical programming, Kluwer Acad. Publ., 1997.

MAXIMUM FLOW PROBLEM
The maximum flow problem seeks the maximum possible flow in a capacitated network from a specified source node s to a specified sink node t without exceeding the capacity of any arc. A closely related problem is the minimum cut problem, which is to find a set of arcs with the smallest total capacity whose removal separates node s and node t. The maximum flow and minimum cut problems arise in a variety of application settings as diverse as manufacturing, communication systems, distribution planning, matrix rounding, and scheduling. These problems also arise as subproblems in the solution of more difficult network optimization problems. In this article, we study the maximum flow and minimum cut problems, briefly introducing the underlying theory and algorithms, and presenting some applications. See [2] for a wealth of additional material that amplifies on this discussion.
Let $G = (N, A)$ be a directed network defined by a set N of n nodes and a set A of m directed arcs. We refer to nodes i and j as endpoints of arc (i, j). A directed path $i_1 - i_2 - \cdots - i_k$ is a set of arcs $(i_1, i_2), \ldots, (i_{k-1}, i_k)$. Each arc (i, j) has an associated capacity $u_{ij}$ denoting the maximum amount of flow on this arc. We assume that each arc capacity $u_{ij}$ is an integer, and let $U = \max\{u_{ij} : (i, j) \in A\}$. The network has two distinguished nodes, a source node s and a sink node t. To help in representing a network, we use the arc adjacency list A(i) of node i, which is the set of arcs emanating from it, that is, $A(i) = \{(i, j) \in A : j \in N\}$.
The maximum flow problem is to find the maximum flow from the source node s to the sink node t that satisfies the arc capacities and mass balance constraints at all nodes. We can state the problem formally as follows.
\[ \max\; v \tag{1} \]
subject to
\[ \sum_{\{j : (i,j) \in A\}} x_{ij} - \sum_{\{j : (j,i) \in A\}} x_{ji} = \begin{cases} v & \text{for } i = s, \\ 0 & \text{for } i \notin \{s, t\}, \\ -v & \text{for } i = t, \end{cases} \tag{2} \]
\[ 0 \le x_{ij} \le u_{ij} \quad \text{for all } (i, j) \in A. \tag{3} \]
We refer to a vector $x = \{x_{ij}\}$ satisfying (2) and (3) as a flow and the corresponding value of the scalar variable v as the value of the flow. We refer to the constraints (2) as the mass balance constraints, and refer to the constraints (3) as the flow bound constraints.
In examining the maximum flow problem, we impose two assumptions:
i) all arc capacities are integer; and
ii) whenever the network contains arc (i, j), then it also contains arc (j, i).
The second assumption is nonrestrictive since we allow arcs with zero capacity.
Sometimes the flow vector x might be required to satisfy lower bound constraints imposed upon the arc flows; that is, if $l_{ij} \ge 0$ specifies the lower bound on the flow on arc $(i, j) \in A$, we impose the condition $x_{ij} \ge l_{ij}$. We refer to this problem as the maximum flow problem with nonnegative lower bounds. It is possible to transform a maximum flow problem with nonnegative lower bounds into a maximum flow problem with zero lower bounds.
The minimum cut problem is a close relative of the maximum flow problem. A cut $[S, \bar{S}]$ partitions the node set N into two subsets S and $\bar{S} = N - S$. It consists of all arcs with one endpoint in S and the other in $\bar{S}$. We refer to the arcs directed from S to $\bar{S}$, denoted by $(S, \bar{S})$, as forward arcs in the cut and the arcs directed from $\bar{S}$ to S, denoted by $(\bar{S}, S)$, as backward arcs in the cut. The cut $[S, \bar{S}]$ is called an s-t-cut if $s \in S$ and $t \in \bar{S}$. We define the capacity of the cut $[S, \bar{S}]$, denoted $u[S, \bar{S}]$, as $\sum_{(i,j) \in (S, \bar{S})} u_{ij}$. A minimum cut in G is an s-t-cut of minimum capacity. We will show that any algorithm that determines a maximum flow in the network also determines a minimum cut in the network.
The remainder of this article is organized as follows. To help in understanding the importance of the maximum flow problem, we begin by describing several applications. In the next section we present some preliminary results concerning flows and cuts. We next discuss two important classes of algorithms for solving the maximum flow problem: augmenting path algorithms, and preflow-push algorithms. As described in the next section, augmenting path algorithms augment flow along directed paths from the source node to the sink node. The proof of the validity of the augmenting path algorithm yields the well-known max-flow min-cut theorem, which implies that the value of a maximum flow in a network equals the capacity of a minimum cut in the network. In the next section,
we study preflow-push algorithms that 'flood' the network so that some nodes have excesses and then incrementally 'relieve' the flow from nodes with excesses by sending flow from excess nodes forward toward the sink node or backward toward the source node. In the final section, we study implications of the max-flow min-cut theorem and prove some max-min results in combinatorics.
We would like to design maximum flow algorithms that are guaranteed to be efficient in the sense that their worst-case running times, that is, the total number of multiplications, divisions, additions, subtractions, and comparisons in the worst case, grow slowly in some measure of the problem's size. We say that a maximum flow algorithm is an $O(n^3)$ algorithm, or has a worst-case complexity of $O(n^3)$, if it is possible to solve any maximum flow problem using a number of computations that is asymptotically bounded by some constant times the term $n^3$. We say that an algorithm is a polynomial time algorithm if its worst-case running time is bounded by a polynomial function of the input size parameters. For a maximum flow problem, the input size parameters are n, m, and log U (the number of bits needed to specify the largest arc capacity). We refer to a maximum flow algorithm as a pseudopolynomial time algorithm if its worst-case running time is bounded by a polynomial function of n, m, and U. For example, an algorithm with worst-case complexity of $O(nm \log U)$ is a polynomial time algorithm, but an algorithm with worst-case complexity of $O(nmU)$ is a pseudopolynomial time algorithm.

Applications. The maximum flow problem arises in a variety of situations and in several forms. Sometimes, it arises directly in combinatorial applications that on the surface might not appear to be maximum flow problems at all; at other times, it occurs as a subproblem in the solution of more difficult network optimization problems. In this section, we describe three applications of the maximum flow problem.

Capacity of Physical Networks. An oil company needs to ship oil from a refinery to a storage facility using the pipelines of its underlying distribution network. In this problem context, the refinery corresponds to a particular node s in the distribution network and the storage facility corresponds to another node t. The capacity of each arc is the maximum amount of oil per unit time that can flow along it. The value of a maximum s-t flow determines the maximum flow rate from the source node s to the sink node t. Similar applications arise in other settings, for example, determining the transmission capacity between two nodes of a telecommunications network.

Feasible Flow Problem. The feasible flow problem consists of finding a feasible flow satisfying the following constraints:
\[ \sum_{\{j : (i,j) \in A\}} x_{ij} - \sum_{\{j : (j,i) \in A\}} x_{ji} = b(i) \quad \text{for all } i \in N, \tag{4} \]
\[ 0 \le x_{ij} \le u_{ij} \quad \text{for all } (i, j) \in A. \tag{5} \]
We assume that $\sum_{i \in N} b(i) = 0$. The following distribution scenario illustrates how the feasible flow problem arises in practice. Suppose that merchandise available at several seaports is desired by other ports. We know the stock of merchandise available at the 'supply' ports, the amount required at the other ports, and the maximum quantity of merchandise that can be shipped on a particular sea route. We wish to know whether we can satisfy all of the demands by using the available supplies.
We can solve the feasible flow problem by solving a maximum flow problem defined on an augmented network as follows. We introduce two new nodes, a source node s and a sink node t. For each node i with supply (that is, with $b(i) > 0$), we add an arc (s, i) with capacity b(i), and for each node i with demand (that is, with $b(i) < 0$), we add an arc (i, t) with capacity $-b(i)$. We refer to the new network as the transformed network. We then solve a maximum flow problem from node s to node t in the transformed network. It is easy to show that the model (4)-(5) has a feasible solution if and only if the maximum flow saturates all the arcs emanating from the source node, that is, $x_{sj} = u_{sj}$ for all arcs $(s, j) \in A(s)$. Moreover, if each b(i) and $u_{ij}$ is integer, then model (4)-(5)
always has an integer feasible solution whenever it has a feasible solution (see Theorem 3).
Sometimes in a feasible flow problem arcs have nonnegative lower bounds, that is, the flow bound constraints are $l_{ij} \le x_{ij} \le u_{ij}$ instead of $0 \le x_{ij} \le u_{ij}$, for some constants $l_{ij} \ge 0$ for each $(i, j) \in A$. By substituting $y_{ij} = x_{ij} - l_{ij}$ for $x_{ij}$, we can transform this problem to the formulation (4)-(5). Then (5) reduces to $0 \le y_{ij} \le (u_{ij} - l_{ij})$ and (4) reduces to the same set of equations, but with a different right-hand side vector $b'$.

Matrix Rounding Problem. This application is concerned with consistent rounding of the elements, the row sums, and the column sums of a matrix. We are given a $p \times q$ matrix of real numbers $D = \{d_{ij}\}$, with row sums $\alpha_i$ and column sums $\beta_j$. We can round any real number d to the next smaller integer $\lfloor d \rfloor$ or to the next larger integer $\lceil d \rceil$, and the decision to round up or round down is entirely up to us. The matrix-rounding problem requires that we round the matrix elements, and the row and column sums of the matrix so that the sum of the rounded elements in each row equals the rounded row sum, and the sum of the rounded elements in each column equals the rounded column sum. We refer to such a rounding as a consistent rounding. The matrix-rounding problem arises in several application contexts, for example, the rounding of census data to disguise data on individuals.
Using a numerical example, we will show how to transform a matrix rounding problem into a maximum flow problem. Fig. 1a) shows an instance of the matrix rounding problem and Fig. 1b) gives the maximum flow network G for this problem. The network G contains a node i corresponding to each row i of the matrix D, a node j corresponding to each column j of D, a source node s, and a sink node t. The network contains an arc (i, j) corresponding to the ijth element in the matrix, an arc (s, i) for each row i (this arc represents the sum of row i), an arc (j, t) for each column j (this arc represents the sum of column j). For any arc (i, j), we define its upper bound $u_{ij} = \lceil d_{ij} \rceil$ and lower bound $l_{ij} = \lfloor d_{ij} \rfloor$. Notice that the flow $x_{ij} = d_{ij}$ is a real-valued feasible flow x in the network. Since there is a one-to-one correspondence between the consistent roundings of the matrix and feasible integer flows in the corresponding network, we can find a consistent rounding by solving a feasible flow problem on the corresponding network. The feasible flow algorithm will produce an integer feasible flow (because of Theorem 3), which corresponds to a consistent rounding.

Preliminaries. In this section, we discuss some elementary properties of flows and cuts. We will use these properties to prove the celebrated max-flow min-cut theorem and to establish the correctness of the augmenting path algorithm described in the next section.
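The matrix rounding network described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the 3 × 3 matrix is a hypothetical instance (it is not taken from Fig. 1), the node labels and the function names `rounding_network` and `is_consistent_rounding` are invented for the example, and exact `Fraction` arithmetic is used so that floors and ceilings of sums are not disturbed by floating-point error.

```python
import math
from fractions import Fraction

def rounding_network(D):
    """Return {(tail, head): (lower, upper, value)} for the rounding network of D."""
    p, q = len(D), len(D[0])
    row = [sum(D[i]) for i in range(p)]
    col = [sum(D[i][j] for i in range(p)) for j in range(q)]
    arcs = {}
    for i in range(p):
        # Arc (s, row i) carries the row sum; arc (row i, col j) carries d_ij.
        arcs[("s", ("row", i))] = (math.floor(row[i]), math.ceil(row[i]), row[i])
        for j in range(q):
            d = D[i][j]
            arcs[(("row", i), ("col", j))] = (math.floor(d), math.ceil(d), d)
    for j in range(q):
        # Arc (col j, t) carries the column sum.
        arcs[(("col", j), "t")] = (math.floor(col[j]), math.ceil(col[j]), col[j])
    return arcs

def is_consistent_rounding(D, R):
    """Check that the integer matrix R is a consistent rounding of D."""
    p, q = len(D), len(D[0])
    def ok(real, rounded):
        return rounded in (math.floor(real), math.ceil(real))
    entries = all(ok(D[i][j], R[i][j]) for i in range(p) for j in range(q))
    rows = all(ok(sum(D[i]), sum(R[i])) for i in range(p))
    cols = all(ok(sum(D[i][j] for i in range(p)),
                  sum(R[i][j] for i in range(p))) for j in range(q))
    return entries and rows and cols

# Hypothetical instance, entered as decimal strings to keep the arithmetic exact.
D = [[Fraction(v) for v in r] for r in
     (["3.1", "6.8", "7.3"], ["9.6", "2.4", "0.7"], ["3.6", "1.2", "6.5"])]
arcs = rounding_network(D)
R_good = [[3, 7, 7], [10, 2, 1], [3, 1, 7]]
R_floor = [[3, 6, 7], [9, 2, 0], [3, 1, 6]]   # rounding every entry down
```

In this instance the real-valued flow $x_{ij} = d_{ij}$ respects every pair of bounds, `R_good` passes the consistency check, and `R_floor` fails it because its first row sums to 16 while the row sum 17.2 may only be rounded to 17 or 18.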
Residual Network. The concept of residual network plays a central role in the development of maximum flow algorithms. Given a flow x, the residual capacity $r_{ij}$ of any arc $(i, j) \in A$ is the maximum additional flow that can be sent from node i to node j using the arcs (i, j) and (j, i). (Recall the assumption from the first section that whenever the network contains arc (i, j), it also contains the arc (j, i).) The residual capacity $r_{ij}$ has two components:
i) $u_{ij} - x_{ij}$, the unused capacity of arc (i, j);
ii) the current flow $x_{ji}$ on arc (j, i), which we can cancel to increase the flow from node i to node j.
Consequently, $r_{ij} = u_{ij} - x_{ij} + x_{ji}$. We refer to the network G(x) consisting of the arcs with positive residual capacities as the residual network (with respect to the flow x). Fig. 2 gives an example of a residual network.
[...] amount of flow from the nodes in S to nodes in $\bar{S}$, and the second expression denotes the amount of flow returning from the nodes in $\bar{S}$ to the nodes in S. Therefore, the right-hand side denotes the total (net) flow across the cut, and (6) implies that the flow across any s-t-cut $[S, \bar{S}]$ equals v. Substituting $x_{ij} \le u_{ij}$ in the first expression of (6) and $x_{ij} \ge 0$ in the second expression yields: $v \le \sum_{(i,j) \in (S, \bar{S})} u_{ij} = u[S, \bar{S}]$, implying that the value of any flow can never exceed the capacity of any cut in the network. We record this result formally for future reference.
LEMMA 1 The value of any flow can never exceed the capacity of any cut in the network. Consequently, if the value of some flow x equals the capacity of some cut $[S, \bar{S}]$, then x is a maximum flow and the cut $[S, \bar{S}]$ is a minimum cut. □
The max-flow min-cut theorem, to be proved in the next section, states that the value of some flow always equals the capacity of some cut.
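The residual capacity formula and Lemma 1 can be checked on a small instance. The sketch below uses a hypothetical four-node network and flow (neither comes from Fig. 2), and the helper names `residual`, `cut_capacity` and `net_flow_across` are invented for the illustration; arcs absent from the data are treated as zero-capacity arcs, in line with assumption ii).

```python
from itertools import combinations

# Hypothetical instance: arc (i, j) -> capacity u_ij and a feasible flow x_ij.
capacity = {(1, 2): 4, (1, 3): 3, (2, 3): 2, (2, 4): 3, (3, 4): 5}
flow = {(1, 2): 3, (1, 3): 3, (2, 3): 1, (2, 4): 2, (3, 4): 4}
nodes, s, t = [1, 2, 3, 4], 1, 4

def residual(i, j):
    """r_ij = u_ij - x_ij + x_ji, with a missing arc treated as capacity 0."""
    return capacity.get((i, j), 0) - flow.get((i, j), 0) + flow.get((j, i), 0)

# Residual network G(x): only arcs with positive residual capacity.
residual_network = {(i, j): residual(i, j)
                    for i in nodes for j in nodes
                    if i != j and residual(i, j) > 0}

# Value of the flow: net flow leaving the source.
value = (sum(x for (i, j), x in flow.items() if i == s)
         - sum(x for (i, j), x in flow.items() if j == s))

def cut_capacity(S):
    """Sum of capacities of the forward arcs of the cut [S, N - S]."""
    return sum(u for (i, j), u in capacity.items() if i in S and j not in S)

def net_flow_across(S):
    """Flow on forward arcs minus flow on backward arcs of the cut."""
    forward = sum(x for (i, j), x in flow.items() if i in S and j not in S)
    backward = sum(x for (i, j), x in flow.items() if i not in S and j in S)
    return forward - backward

# Enumerate every s-t-cut: S contains s, excludes t.
inner = [n for n in nodes if n not in (s, t)]
cuts = [{s, *extra} for k in range(len(inner) + 1)
        for extra in combinations(inner, k)]

if __name__ == "__main__":
    for S in cuts:
        print(sorted(S), net_flow_across(S), cut_capacity(S))
```

For this instance the net flow across every s-t-cut equals the flow value, and no cut capacity falls below it, as Lemma 1 requires. Because the smallest cut capacity is strictly larger than the flow value here, the lemma alone does not certify that this particular flow is maximum.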
[...] the network contains no such path. The algorithm below describes the generic augmenting path algorithm.
We can identify an augmenting path P in G(x) by using a graph search algorithm. A graph search algorithm starts at node s and progressively finds all nodes that are reachable from the source node using directed paths. Most search algorithms run in time proportional to the number of arcs in the network, that is, O(m) time, and either identify an augmenting path or conclude that G(x) contains no augmenting path; the latter happens when the sink node is not reachable from the source node.

BEGIN
  x := 0;
  WHILE G(x) contains a directed path from node s to node t DO
  BEGIN
    identify an augmenting path P from s to t;
    set δ := min{r_ij : (i, j) ∈ P};
    augment δ units of flow along P;
    update G(x);
  END;
END;

[...] capacity of arc (4, 3) from 0 to 4. Fig. 3b) shows the residual network at this stage. In the second iteration, the algorithm selects the path 1 - 2 - 3 - 4 and augments 1 unit of flow; Fig. 3c) shows the residual network after the augmentation. In the third iteration, the algorithm augments one unit of flow along the path 1 - 2 - 4. Fig. 3d) shows the corresponding residual network. Now the residual network contains no augmenting path and so the algorithm terminates.
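The generic augmenting path algorithm can be sketched in Python as follows. Several choices here are assumptions made for the illustration rather than part of the article: breadth-first search is used as the graph search (any search that finds a directed s-t path in G(x) would do), the function name `max_flow` is invented, and the five-arc network is a hypothetical instance rather than the one shown in Fig. 3.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Generic augmenting path algorithm.

    Returns (flow value, set S of nodes reachable from s in the final
    residual network), so that [S, N - S] is the cut found on termination.
    """
    # Residual capacities r[i][j]; reverse arcs start at 0 (zero-capacity arcs).
    r = {s: {}, t: {}}
    for (i, j), u in capacity.items():
        r.setdefault(i, {}).setdefault(j, 0)
        r.setdefault(j, {}).setdefault(i, 0)
        r[i][j] += u
    value = 0
    while True:
        # Graph search: label nodes reachable from s along positive residual arcs.
        pred = {s: None}
        queue = deque([s])
        while queue and t not in pred:
            i = queue.popleft()
            for j, rij in r[i].items():
                if rij > 0 and j not in pred:
                    pred[j] = i
                    queue.append(j)
        if t not in pred:
            # Sink unreachable: no augmenting path remains.
            return value, set(pred)
        # Recover the augmenting path P by walking predecessors back from t.
        path = []
        j = t
        while pred[j] is not None:
            path.append((pred[j], j))
            j = pred[j]
        delta = min(r[i][j] for i, j in path)
        # Augment delta units along P and update G(x).
        for i, j in path:
            r[i][j] -= delta
            r[j][i] += delta
        value += delta

# Hypothetical network: arc (i, j) -> capacity u_ij, source 1, sink 4.
capacity = {(1, 2): 2, (1, 3): 4, (2, 3): 3, (2, 4): 1, (3, 4): 5}
value, S = max_flow(capacity, 1, 4)
cut_capacity = sum(u for (i, j), u in capacity.items() if i in S and j not in S)
```

Updating `r[i][j]` and `r[j][i]` together keeps the residual capacities consistent with $r_{ij} = u_{ij} - x_{ij} + x_{ji}$ without storing the flow x explicitly. On termination the set of labeled nodes gives the source side of a cut whose capacity can be compared with the returned flow value.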
that $v = u[S, \bar{S}]$. Therefore, the value of the current flow x equals the capacity of the cut. Lemma 1 implies that x is a maximum flow and $[S, \bar{S}]$ is a minimum cut. This conclusion establishes the correctness of the generic augmenting path algorithm and, as a byproduct, proves the following max-flow min-cut theorem.

THEOREM 2 The maximum value of the flow from a source node s to a sink node t in a capacitated network equals the minimum capacity among all s-t cuts. □

The proof of the max-flow min-cut theorem shows that when the augmenting path algorithm terminates, it also discovers a minimum cut $[S, \bar{S}]$, with S defined as the set of all nodes reachable from the source node in the residual network corresponding to the maximum flow. For our previous numerical example, the algorithm finds the minimum cut in the network, which is $[S, \bar{S}]$ with […]

The augmenting path algorithm also establishes another important result, the integrality theorem:

THEOREM 3 If all arc capacities are integer, then the maximum flow problem always has an integer maximum flow. □

This result follows from the facts that the initial (zero) flow is integer and all arc capacities are integer; consequently, all initial residual capacities will be integer. Since subsequently all arc flows change by integer amounts (because residual capacities are integer), the residual capacities remain integer throughout the algorithm. Further, the final integer residual capacities determine an integer maximum flow. The integrality theorem does not imply that every optimal solution of the maximum flow problem is integer. The maximum flow problem might have noninteger solutions and, most often, it has such solutions. The integrality theorem shows that the problem always has at least one integer optimal solution.

What is the worst-case running time of the algorithm? An augmenting path is a directed path in G(x) from node s to node t. We have seen earlier that each iteration of the algorithm requires O(m) time. In each iteration, the algorithm augments a positive integer amount of flow from the source node to the sink node. To bound the number of iterations, we will determine a bound on the maximum flow value. By definition, U denotes the largest arc capacity, and so the capacity of the cut $(\{s\}, N - \{s\})$ is at most nU. Since the value of any flow can never exceed the capacity of any cut in the network, we obtain a bound of nU on the maximum flow value and also on the number of iterations performed by the algorithm. Consequently, the running time of the algorithm is O(nmU), which is a pseudopolynomial time bound. We summarize the preceding discussion with the following theorem.

THEOREM 4 The generic augmenting path algorithm solves the maximum flow problem in O(nmU) time. □

The augmenting path algorithm is possibly the simplest algorithm for solving the maximum flow problem. Empirically, the algorithm performs reasonably well. However, the worst-case bound on the number of iterations is poor for large values of U. For example, if $U = 2^n$, the bound is exponential in the number of nodes. Moreover, as shown by known examples, the algorithm can indeed perform this many iterations. A second drawback of the augmenting path algorithm is that if the capacities are irrational, the algorithm might not terminate. For some pathological instances of the maximum flow problem, the augmenting path algorithm does not terminate in a finite number of iterations and, although the successive flow values converge to some value, they might converge to a value strictly less than the maximum flow value. (Note, however, that the max-flow min-cut theorem is valid even if arc capacities are irrational.) Therefore, if the augmenting path algorithm is to be guaranteed to be effective in all situations, it must select augmenting paths carefully.

Researchers have developed specific implementations of the generic augmenting path algorithms that overcome these drawbacks. Of these, the following three implementations are particularly noteworthy:

i) the maximum capacity augmenting path algorithm which always augments flow along a path in the residual network with the max-
any (shortest) path from node i to node t contains at least d(i) arcs. We say that an arc (i, j) in the residual network is admissible if it satisfies the condition d(i) = d(j) + 1; we refer to all other arcs as inadmissible.

The basic operation in the preflow-push algorithm is to select an active node i and try to remove the excess by pushing flow to a node with smaller distance label. (We will use the distance labels as estimates of the length of the shortest path to the sink node.) If node i has an admissible arc (i, j), then d(j) = d(i) - 1 and the algorithm sends flow on admissible arcs to relieve the node's excess. If node i has no admissible arc, then the algorithm increases the distance label of node i so that node i has an admissible arc. The algorithm terminates when the network contains no active nodes, that is, excess resides only at the source and sink nodes. The next algorithm describes the generic preflow-push algorithm.

BEGIN
  set x := 0 and d(j) := 0 for all j ∈ N;
  set x_sj := u_sj for each arc (s, j) ∈ A(s);
  d(s) := n;
  WHILE the residual network G(x) contains an active node DO
  BEGIN
    select an active node i;
    push/relabel(i);
  END;
END;

procedure push/relabel(i);
BEGIN
  IF the network contains an admissible arc (i, j)
  THEN push δ := min{e(i), r_ij} units of flow from node i to node j
  ELSE replace d(i) by min{d(j) + 1 : (i, j) ∈ A(i), r_ij > 0};
END;

The generic preflow-push algorithm.

The algorithm first saturates all arcs emanating from the source node; then each node adjacent to node s has a positive excess, so that the algorithm can begin pushing flow from active nodes. Since the preprocessing operation saturates all the arcs incident to node s, none of these arcs is admissible and setting d(s) = n will satisfy the validity conditions (8), (9). But then, since d(s) = n, and a distance label is a lower bound on the length of the shortest path from that node to node t, the residual network contains no directed path from s to t. The subsequent pushes maintain this property and drive the solution toward feasibility. Consequently, when there are no active nodes, the flow is a maximum flow.

A push of δ units from node i to node j decreases both the excess e(i) of node i and the residual capacity $r_{ij}$ of arc (i, j) by δ units and increases both e(j) and $r_{ji}$ by δ units. We say that a push of δ units of flow on an arc (i, j) is saturating if $\delta = r_{ij}$ and is nonsaturating otherwise. A nonsaturating push at node i reduces e(i) to zero. We refer to the process of increasing the distance label of a node as a relabel operation. The purpose of the relabel operation is to create at least one admissible arc on which the algorithm can perform further pushes.

It is instructive to visualize the generic preflow-push algorithm in terms of a physical network: arcs represent flexible water pipes, nodes represent joints, and the distance function measures how far nodes are above the ground. In this network, we wish to send water from the source to the sink. We visualize flow in an admissible arc as water flowing downhill. Initially, we move the source node upward, and water flows to its neighbors. Although we would like water to flow downhill toward the sink, occasionally flow becomes trapped locally at a node that has no downhill neighbors. At this point, we move the node upward, and again water flows downhill toward the sink.

Eventually, no more flow can reach the sink. As we continue to move nodes upward, the remaining excess flow eventually flows back toward the source. The algorithm terminates when all the water has flowed either into the sink or back to the source.

To illustrate the generic preflow-push algorithm, we use the example given in Fig. 4. Fig. 4a) specifies the initial residual network. We first saturate the arcs emanating from the source node, node 1, and set d(1) = n = 4. Fig. 4b) shows the residual graph at this stage. At this point, the network has two active nodes, nodes 2 and 3. Suppose that the algorithm selects node 2 for the push/relabel operation. Arc (2, 4) is the only admissible arc and the algorithm performs a saturating push of value
$\delta = \min\{e(2), r_{24}\} = \min\{2, 1\} = 1$. Fig. 4c) gives the residual network at this stage. Suppose the algorithm again selects node 2. Since no admissible arc emanates from node 2, the algorithm performs a relabel operation and gives node 2 a new distance label $d(2) = \min\{d(3) + 1, d(1) + 1\} = \min\{2, 5\} = 2$. The new residual network is the same as the one shown in Fig. 4c) except that d(2) = 2 instead of 1. Suppose this time the algorithm selects node 3. Arc (3, 4) is the only admissible arc emanating from node 3, and so the algorithm performs a nonsaturating push of value $\delta = \min\{e(3), r_{34}\} = \min\{4, 5\} = 4$. Fig. 4d) specifies the residual network at the end of this iteration. Using this process for a few more iterations, the algorithm will determine a maximum flow.

Fig. 4: Illustrating the preflow-push algorithm: a) the residual network G(x) for x = 0; b) the residual network after saturating arcs emanating from the source; c) the residual network after pushing flow on arc (2, 4); d) the residual network after pushing flow on arc (3, 4).

The preflow-push algorithm has several attractive features, particularly its flexibility and its potential for further improvements. Different rules for selecting active nodes for the push/relabel operations create many different versions of the generic algorithm, each with different worst-case complexity. As we have noted, the bottleneck operation in the generic preflow-push algorithm is the number of nonsaturating pushes, and many specific rules for examining active nodes can produce substantial reductions in the number of nonsaturating pushes. The following specific implementations of the generic preflow-push algorithms are noteworthy:

the FIFO preflow-push algorithm examines the active nodes in the first-in, first-out (FIFO) order and runs in $O(n^3)$ time;

These algorithms are due to A.V. Goldberg and R.E. Tarjan [10], J. Cheriyan and S.N. Maheshwari [4], and R.K. Ahuja and J.B. Orlin [3], respectively. These preflow-push algorithms are more general, more powerful, and more flexible than augmenting path algorithms. The best preflow-push algorithms currently outperform the best augmenting path algorithms in theory as well as in practice (see, for example, [1]).
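The generic preflow-push algorithm described above can be sketched in Python as follows. This is a hedged illustration rather than any of the cited implementations: it assumes nodes numbered 1..n and integer capacities in a dictionary keyed by arcs, starts all distance labels (other than d(s) = n) at zero as in the generic pseudocode, and selects the most recently activated node, which is an arbitrary choice. The function name `max_flow_preflow_push` and the test network are hypothetical.

```python
def max_flow_preflow_push(n, capacity, s, t):
    """Generic preflow-push algorithm on nodes 1..n.

    capacity: dict {(i, j): u_ij} with nonnegative integer capacities.
    Returns the maximum flow value, i.e. the excess accumulated at t.
    """
    r = {}                                    # residual capacities r_ij
    adj = {i: set() for i in range(1, n + 1)}
    for (i, j), u in capacity.items():
        r[(i, j)] = r.get((i, j), 0) + u
        r.setdefault((j, i), 0)
        adj[i].add(j)
        adj[j].add(i)

    d = {i: 0 for i in adj}                   # distance labels d(i)
    e = {i: 0 for i in adj}                   # excesses e(i)
    d[s] = n

    # Preprocessing: saturate every arc emanating from the source.
    for j in adj[s]:
        delta = r[(s, j)]
        if delta > 0:
            r[(s, j)] -= delta
            r[(j, s)] += delta
            e[j] += delta
            e[s] -= delta

    # Active nodes: nodes other than s and t with positive excess.
    active = [i for i in adj if i not in (s, t) and e[i] > 0]
    while active:
        i = active[-1]
        # push/relabel(i)
        for j in adj[i]:
            if r[(i, j)] > 0 and d[i] == d[j] + 1:      # admissible arc
                delta = min(e[i], r[(i, j)])
                r[(i, j)] -= delta
                r[(j, i)] += delta
                e[i] -= delta
                e[j] += delta
                if j not in (s, t) and j not in active:
                    active.append(j)
                break
        else:
            # No admissible arc: relabel node i.
            d[i] = min(d[j] + 1 for j in adj[i] if r[(i, j)] > 0)
        if e[i] == 0:
            active.remove(i)
    return e[t]


# Hypothetical 4-node network with source 1 and sink 4.
cap = {(1, 2): 2, (1, 3): 4, (2, 3): 3, (2, 4): 1, (3, 4): 5}
print(max_flow_preflow_push(4, cap, 1, 4))  # → 6
```

Replacing the rule `i = active[-1]` by a first-in, first-out queue would give a FIFO-style variant; the sketch keeps the selection rule in one place so that such variants are easy to try.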
i) what is the maximum number of arc-disjoint (directed) paths from node s to node t; and
ii) what is the minimum number of arcs that we should remove from the network so that it contains no directed paths from node s to node t.

We will show that these two questions are closely related. The second question shows how robust a network, for example, a telecommunications network, is to the failure of its arcs.

In the network G, let us define the capacity of each arc as equal to one. Consider any feasible flow x of value v in the resulting unit capacity network. We can decompose the flow x into flows along v directed paths from node s to node t, each path carrying a unit flow. Now consider any s-t cut $[S, \bar{S}]$ in the network. The capacity of this cut is $|(S, \bar{S})|$, that is, it equals the number of forward arcs in the cut. Since each path joining nodes s and t contains at least one arc in the set $(S, \bar{S})$, the removal of all the arcs in $(S, \bar{S})$ disconnects all paths from node s to node t. Consequently, the network contains a disconnecting set of arcs of cardinality equal to the capacity of any s-t cut $[S, \bar{S}]$. The max-flow min-cut theorem immediately implies the following result:

COROLLARY 5 The maximum number of arc-disjoint paths from s to t in a directed network equals the minimum number of arcs whose removal will disconnect all paths from node s to node t. □

a corresponding node cover with v nodes. Consequently, the max-flow min-cut theorem establishes the following result:

COROLLARY 6 In a bipartite network $G = (N_1 \cup N_2, A)$, the maximum cardinality of any matching equals the minimum cardinality of any node cover of G. □

These two examples illustrate important relationships between maximum flows, minimum cuts, and many other problems in the field of combinatorics. The maximum flow problem is of interest because it provides a unifying tool for viewing many such results, because it arises directly in many applications, and because it has been a rich arena for developing new results concerning the design and analysis of algorithms.

See also: Minimum cost flow problem; Nonconvex network flow problems; Traffic network equilibrium; Network location: Covering problems; Shortest path tree algorithms; Steiner tree problems; Equilibrium networks; Survivable networks; Directed tree networks; Dynamic traffic networks; Auction algorithms; Piecewise linear network flow problems; Communication network assignment problem; Generalized networks; Evacuation networks; Network design problems; Stochastic network problems: Massively parallel solution; Nonoriented multicommodity flow problems.
lems', J. ACM 19 (1972), 248-264.
[7] ELIAS, P., FEINSTEIN, A., AND SHANNON, C.E.: 'Note on maximum flow through a network', IRE Trans. Inform. Theory IT-2 (1956), 117-119.
[8] FORD, L.R., AND FULKERSON, D.R.: 'Maximal flow through a network', Canad. J. Math. 8 (1956), 399-404.
[9] GABOW, H.N.: 'Scaling algorithms for network problems', J. Comput. Syst. Sci. 31 (1985), 148-168.
[10] GOLDBERG, A.V., AND TARJAN, R.E.: 'A new approach to the maximum flow problem', J. ACM 35 (1988), 921-940, also: Proc. 19th ACM Symp. Theory of Computing, pp. 136-146.

Ravindra K. Ahuja
Dept. Industrial and Systems Engin. Univ. Florida
Gainesville, FL 32611, USA
E-mail address: ahuja@ufl.edu
Thomas L. Magnanti
Sloan School of Management
and
Dept. Electrical Engin. and Computer Sci.
Massachusetts Inst. Technol.
Cambridge, MA 02139, USA
E-mail address: magnanti@mit.edu
James B. Orlin
Sloan School of Management
Massachusetts Inst. Technol.
Cambridge, MA 02139, USA
E-mail address: jorlin@mit.edu
MSC 2000: 90C35
Key words and phrases: network, maximum flow problem, minimum cut problem, augmenting path algorithm, preflow-push algorithm, max-flow min-cut theorem.

MAXIMUM PARTITION MATCHING, MPM
The maximum partition matching problem was introduced recently in the study of routing schemes on interconnection networks [2]. In this article, we study the basic properties of the problem. An efficient algorithm for the maximum partition matching problem is presented.

Definitions and Motivation. Let $S = \{C_1, \ldots, C_k\}$ be a collection of subsets of the universal set $U = \{1, \ldots, n\}$ such that $\bigcup_{i=1}^{k} C_i = U$ and $C_i \cap C_j = \emptyset$ for all $i \ne j$. A partition (A, B) of S pairs two elements a and b in U if a is contained in a subset in A and b is contained in a subset in B. A partition matching (of order m) of S consists of two ordered subsets $L = \{a_1, \ldots, a_m\}$ and $R = \{b_1, \ldots, b_m\}$ of m elements of U (the subsets L and R need not be disjoint), together with a sequence of m distinct partitions of S: $(A_1, B_1), \ldots, (A_m, B_m)$ such that for all $i = 1, \ldots, m$, the partition $(A_i, B_i)$ pairs the elements $a_i$ and $b_i$. The maximum partition matching problem is to construct a partition matching of order m for a given collection S with m maximized.

The maximum partition matching problem arises in connection with the parallel routing problem in interconnection networks, in particular in the study of the star networks [1], which are attractive alternatives to the popular hypercube networks. It can be shown that constructing an optimal parallel routing scheme in the star networks can be effectively reduced to the maximum partition matching problem. Readers interested in this connection are referred to [2] for a detailed discussion.

The maximum partition matching problem can be formulated in terms of the 3-dimensional matching problem as follows: given an instance $S = \{C_1, \ldots, C_k\}$ of the maximum partition matching problem, we construct an instance M for the 3-dimensional matching problem such that a triple (a, b, P) is contained in M if and only if the partition P of S pairs the elements a and b. However, since the number of partitions of the collection S can be as large as $2^n$ and the 3-dimensional matching problem is NP-hard [4], this reduction does not suggest a polynomial time algorithm for the maximum partition matching problem.

In the rest of this article, we study the basic properties of the maximum partition matching problem, and present an algorithm of running time $O(n^2 \log n)$ for the problem. We first introduce the necessary terminology that will be used in our discussion.

Let $\pi = (L, R, (A_1, B_1), \ldots, (A_m, B_m))$ be a partition matching of the collection S, where $L = \{a_1, \ldots, a_m\}$ and $R = \{b_1, \ldots, b_m\}$. We will say that the partition $(A_i, B_i)$ left-pairs the element $a_i$ and right-pairs the element $b_i$. An element a is said to be left-paired if it is in the set L. Otherwise, the element a is left-unpaired. Similarly we define right-paired and right-unpaired elements. The collections $A_i$ and $B_i$ are called the left-collection and right-collection of the partition $(A_i, B_i)$. The partition matching π may also be written as $\pi[(a_1, b_1), \ldots, (a_m, b_m)]$ if the corresponding partitions are implied.

For the rest of this paper, we assume that $U = \{1, \ldots, n\}$ and that $S = \{C_1, \ldots, C_k\}$ is a collection of pairwise disjoint subsets of U such that $\bigcup_{i=1}^{k} C_i = U$.

Case I. Via Pre-Matching when ||S|| is Large. A necessary condition for two ordered subsets $L = \{a_1, \ldots, a_m\}$ and $R = \{b_1, \ldots, b_m\}$ of U to form a partition matching for the collection S is that $a_i$ and $b_i$ belong to different subsets in the collection S, for all $i = 1, \ldots, m$. We say that the two ordered subsets L and R of U form a pre-matching $\sigma = \{(a_i, b_i) : 1 \le i \le m\}$ if $a_i$ and $b_i$ do not belong to the same subset in the collection S, for all $i = 1, \ldots, m$. The pre-matching σ is maximum if m is the largest among all pre-matchings of S.

A maximum pre-matching can be constructed efficiently by the algorithm pre-matching given below, where we say that a set is singular if it consists of a single element. See [3] for a proof of the correctness of the algorithm.

Input: the collection S = {C_1, ..., C_k} of subsets of U
Output: a maximum pre-matching σ in S
1. T = S; σ = ∅;
2. WHILE T contains more than one set but does not consist of exactly three singular sets DO
   2.1. pick two sets C and C' of largest cardinality in T;
   2.2. pick an element a in C and an element b in C';
   2.3. σ = σ ∪ {(a, b), (b, a)};
   2.4. C = C - {a}; C' = C' - {b};
   2.5. if C or C' is empty now, delete it from T;
3. IF T consists of exactly three singular sets C_1 = {a_1}, C_2 = {a_2}, and C_3 = {a_3}
   THEN σ = σ ∪ {(a_1, a_2), (a_2, a_3), (a_3, a_1)}.

Algorithm pre-matching.

In the following, we show that when the cardinality of the collection S is large enough, a maximum partition matching of S can be constructed from the maximum pre-matching σ produced by the algorithm pre-matching.

Suppose that the collection S consists of k subsets $C_1, \ldots, C_k$ and $2^k \ge 4n$. The pre-matching σ contains at most n pairs. Let (a, b) be a pair in σ and let C and C' be two arbitrary subsets in S such that C contains a and C' contains b. Note that the number of partitions (A, B) of S such that C is in A and C' is in B is equal to $2^{k-2} \ge n$. Therefore, at least one such partition can be used to left-pair a and right-pair b. This observation results in the following theorem.

THEOREM 1 Let $S = \{C_1, \ldots, C_k\}$ be a collection of nonempty subsets of the universal set $U = \{1, \ldots, n\}$ such that $\bigcup_{i=1}^{k} C_i = U$ and $C_i \cap C_j = \emptyset$ for $i \ne j$. If $2^k \ge 4n$, then a maximum partition matching in S can be constructed in time $O(n^2)$. □

PROOF. Consider the following algorithm partition-matching-I.

Input: the collection S = {C_1, ..., C_k} of subsets of U
Output: a partition matching π in S
1. construct a maximum pre-matching σ of S;
2. FOR each pair (a, b) in σ DO
   use an unused partition of S to pair a and b.

Algorithm partition-matching-I.

Suppose the pre-matching σ constructed in step 1 is $\sigma = \{(a_1, b_1), \ldots, (a_m, b_m)\}$. According to the above discussion, for each pair $(a_i, b_i)$ in σ, there is always an unused partition of S that left-pairs $a_i$ and right-pairs $b_i$. Therefore, step 2 of the algorithm partition-matching-I is valid and constructs a partition matching π for the collection S. Since each partition matching for S induces a pre-matching in S and σ is a maximum pre-matching, we conclude that the partition matching π is a maximum partition matching for the collection S.

By carefully organizing the elements in U and the partitions of S, we can show that the algorithm partition-matching-I runs in time $O(n^2)$. See [3]. □

Case II. Via Greedy Method when ||S|| is Small. Now we consider the case $2^k < 4n$. Since the number $2^k$ of partitions of the collection S is small, we can apply a greedy strategy that expands
a current partition matching by trying to add each of the unused partitions to the partition matching. We show in this section that a careful use of this greedy method constructs a maximum partition matching for the given collection.

Suppose we have a partition matching $\pi = \pi[(a_1, b_1), \ldots, (a_h, b_h)]$ and want to expand it. The partitions of the collection S can then be classified into two classes: h of the partitions are used to pair the h pairs $(a_i, b_i)$, $i = 1, \ldots, h$, and the remaining $2^k - h$ partitions are unused. Now if there is an unused partition P = (A, B) such that there is a left-unpaired element a in A and a right-unpaired element b in B, then we simply pair the element a with the element b using the partition P, thus expanding the partition matching π.

Now suppose that there is no such unused partition, i.e., for all unused partitions (A, B), either A contains no left-unpaired elements or B contains no right-unpaired elements. This does not necessarily imply that the current partition matching is maximum. For example, suppose that (A, B) is an unused partition such that there is a left-unpaired element a in A but no right-unpaired elements in B. Assume further that there is a used partition (A', B') that pairs elements (a', b'), such that the element b' is in B and there is a right-unpaired element b in B'. Then we can let the partition (A', B') pair the elements (a', b), and then let the partition (A, B) pair the elements (a, b'),

tion $P_1$ can be either used or unused). The partition P is directly right-reachable from a partition $P_2 = (A_2, B_2)$ if the right-paired set of P is contained in $B_2$. A partition $P_s$ is left-reachable (resp. right-reachable) from a partition $P_1$ if there are partitions $P_2, \ldots, P_{s-1}$ such that $P_i$ is directly left-reachable (resp. directly right-reachable) from $P_{i-1}$, for all $i = 2, \ldots, s$. □

The left-reachability and the right-reachability are transitive relations.

Let $P_1 = (A_1, B_1)$ be an unused partition such that there are no left-unpaired elements in $A_1$, and let $P_s = (A_s, B_s)$ be a partition left-reachable from $P_1$ such that there is a left-unpaired element $a_s$ in $A_s$. We show how we can use a chain justification to make a left-unpaired element for the collection $A_1$.

By the definition, there are used partitions $P_2, \ldots, P_{s-1}$ such that $P_i$ is directly left-reachable from $P_{i-1}$, for $i = 2, \ldots, s$. We can further assume that $P_i$ is not directly left-reachable from $P_{i-2}$ for $i = 3, \ldots, s$ (otherwise we simply delete the partition $P_{i-1}$ from the sequence). Thus, these partitions can be written as

$P_1 = (\{C_1\} \cup A_1', B_1)$,
$P_2 = (\{C_1, C_2\} \cup A_2', B_2)$,
$P_3 = (\{C_2, C_3\} \cup A_3', B_3)$,
$\vdots$
joint, then we can construct a maximum partition matching from the partition matching $\pi_{\mathrm{exp}}$ constructed by the algorithm greedy-expanding. For this, we need the following technical lemma.

LEMMA 5 If the sets $L_{\mathrm{reac}}$ and $R_{\mathrm{reac}}$ contain a common partition and the partition matching $\pi_{\mathrm{exp}}$ has less than n pairs, then there is a set $C_0$ in S, $|C_0| \le n/2$, such that either all elements in each set $C \ne C_0$ are left-paired and every used partition whose left-paired set is not $C_0$ is contained in $L_{\mathrm{reac}}$, or all elements in each set $C \ne C_0$ are right-paired and every used partition whose right-paired set is not $C_0$ is contained in $R_{\mathrm{reac}}$. □

For a proof, see [3].

THEOREM 6 If $L_{\mathrm{reac}}$ and $R_{\mathrm{reac}}$ have a common partition, then the collection S has a maximum partition matching of n pairs, which can be constructed in linear time from the partition matching $\pi_{\mathrm{exp}}$. □

PROOF. If $\pi_{\mathrm{exp}}$ has n pairs, then $\pi_{\mathrm{exp}}$ is already a maximum partition matching. Thus we assume that $\pi_{\mathrm{exp}}$ has less than n pairs. According to the above lemma, we can assume, without loss of generality, that all elements in each set $C_i$, $i = 2, \ldots, k$, are left-paired, and that every used partition whose left-paired set is not $C_1$ is in $L_{\mathrm{reac}}$. Moreover, $|C_1| \le \sum_{i=2}^{k} |C_i|$.

Let $t = \sum_{i=2}^{k} |C_i|$ and $d = |C_1|$. Then we can assume that the partition matching $\pi_{\mathrm{exp}}$ consists of the partitions

$P_1, \ldots, P_t, P_{t+1}, \ldots, P_{t+h}$

where $P_1, \ldots, P_t$ are used by $\pi_{\mathrm{exp}}$ to left-pair the elements in $\bigcup_{i=2}^{k} C_i$, and $P_{t+1}, \ldots, P_{t+h}$ are used by $\pi_{\mathrm{exp}}$ to left-pair the elements in $C_1$, h < d. Moreover, all partitions $P_1, \ldots, P_t$ are in the set $L_{\mathrm{reac}}$. Thus, the set $C_1$ must be contained in the right-collection in each of the partitions $P_1, \ldots, P_t$.

We ignore the partitions $P_{t+1}, \ldots, P_{t+h}$ and use the partitions $P_1, \ldots, P_t$ to construct a maximum partition matching of n pairs. Note that $\{P_1, \ldots, P_t\}$ also forms a partition matching in the collection S.

For a partition (A, B) of S, we say that the partition (B, A) is obtained by flipping the partition (A, B). In the following algorithm partition-flipping, we show that a maximum partition matching of n pairs can be constructed by flipping d partitions in the partitions $P_1, \ldots, P_t$.

Input: a partition matching {P_1, ..., P_t} that left-pairs all elements in C_2 ∪ ... ∪ C_k, t = |C_2| + ... + |C_k|, and the set C_1 is contained in the right-collection of each partition P_i, i = 1, ..., t, d = |C_1| ≤ t
Output: a maximum partition matching in S with n pairs.
1. if not all elements in the set C_1 are right-paired by P_1, ..., P_t, replace a proper number of right-paired elements in C_2 ∪ ... ∪ C_k by the right-unpaired elements in C_1 so that all elements in C_1 are right-paired by P_1, ..., P_t;
2. suppose that the partitions P_1, ..., P_{t-d} right-pair t - d elements b_1, ..., b_{t-d} in C_2 ∪ ... ∪ C_k, and that P_{t-d+1}, ..., P_t right-pair the d elements in C_1;
3. suppose that P'_1, ..., P'_{t-d} are the t - d partitions in {P_1, ..., P_t} that left-pair the elements b_1, ..., b_{t-d};
4. flip each of the d partitions in {P_1, ..., P_t} - {P'_1, ..., P'_{t-d}} to get d partitions P''_1, ..., P''_d to left-pair the d elements in C_1. The right-paired element of each P''_i is the left-paired element before the flipping;
5. {P_1, ..., P_t, P''_1, ..., P''_d} is a partition matching of n pairs.

Algorithm partition-flipping.

Step 1 of the algorithm is always possible: since $C_1$ is contained in the right-collection of each partition $P_i$, $i = 1, \ldots, t$, and $t \ge d$, for each right-unpaired element b in $C_1$, we can always pick a partition $P_i$ that right-pairs an element in $\bigcup_{i=2}^{k} C_i$, and let $P_i$ right-pair the element b. We keep doing this replacement until all d elements in $C_1$ get right-paired. At this point, the number of partitions in $\{P_1, \ldots, P_t\}$ that right-pair elements in $\bigcup_{i=2}^{k} C_i$ is exactly t - d. Step 3 is always possible since the partitions $P_1, \ldots, P_t$ left-pair all elements in $\bigcup_{i=2}^{k} C_i$.

Now we verify that the constructed sequence $\{P_1, \ldots, P_t, P''_1, \ldots, P''_d\}$ is a partition matching in S. No two partitions $P_i$ and $P_j$ can be identical since $\{P_1, \ldots, P_t\}$ is supposed to be a partition matching in S. No two partitions $P''_i$ and $P''_j$ can be identical since they are obtained by flipping two different partitions in $\{P_1, \ldots, P_t\}$. No partition $P_i$ is identical to a partition $P''_j$ because $P_i$ has $C_1$ in its right-collection while $P''_j$ has $C_1$ in its left-collection. Therefore, the partitions $P_1, \ldots, P_t, P''_1, \ldots, P''_d$ are all distinct.

Each of the partitions $P_1, \ldots, P_t$ left-pairs an element in $\bigcup_{i=2}^{k} C_i$, and each of the partitions $P''_1, \ldots, P''_d$ left-pairs an element in $C_1$. Thus, all elements in the universal set U get left-paired in $\{P_1, \ldots, P_t, P''_1, \ldots, P''_d\}$.

Finally, the partitions $P_1, \ldots, P_t$ right-pair all elements in $C_1$ and the elements $b_1, \ldots, b_{t-d}$ in $\bigcup_{i=2}^{k} C_i$. Now by our selection of the partitions, the partitions $P''_1, \ldots, P''_d$ precisely right-pair all the elements in $\bigcup_{i=2}^{k} C_i - \{b_1, \ldots, b_{t-d}\}$. Thus, all elements in U also get right-paired in $\{P_1, \ldots, P_t, P''_1, \ldots, P''_d\}$.

This concludes that the constructed sequence $\{P_1, \ldots, P_t, P''_1, \ldots, P''_d\}$ is a maximum partition matching in the collection S. The running time of the algorithm partition-flipping is obviously linear. □

Now we consider the case when the sets $L_{\mathrm{reac}}$ and $R_{\mathrm{reac}}$ have no common partitions.

THEOREM 7 If $L_{\mathrm{reac}}$ and $R_{\mathrm{reac}}$ have no common partitions, then the partition matching $\pi_{\mathrm{exp}}$ is a maximum partition matching. □

PROOF. Let $W_{\mathrm{other}}$ be the set of used partitions in $\pi_{\mathrm{exp}}$ that belong to neither $L_{\mathrm{reac}}$ nor $R_{\mathrm{reac}}$. Then $L_{\mathrm{free}} \cup R_{\mathrm{free}} \cup L_{\mathrm{reac}} \cup R_{\mathrm{reac}} \cup W_{\mathrm{other}}$ is the set of all partitions of the collection S, and $L_{\mathrm{reac}} \cup R_{\mathrm{reac}} \cup W_{\mathrm{other}}$ is the set of partitions contained in the partition matching $\pi_{\mathrm{exp}}$. Since the sets $L_{\mathrm{reac}}$, $R_{\mathrm{reac}}$, and $W_{\mathrm{other}}$ are pairwise disjoint, the number of partitions in $\pi_{\mathrm{exp}}$ is precisely $|L_{\mathrm{reac}}| + |R_{\mathrm{reac}}| + |W_{\mathrm{other}}|$.

Now consider the set $W_L = L_{\mathrm{free}} \cup L_{\mathrm{reac}}$. Let $U_L$ be the set of elements that appear in the left-collection of a partition in $W_L$. We have
• Every $P \in L_{\mathrm{reac}}$ left-pairs an element in $U_L$;
• Every element in $U_L$ is left-paired;
• If an element a in $U_L$ is left-paired by a partition P, then $P \in L_{\mathrm{reac}}$.
Therefore, the partitions in $L_{\mathrm{reac}}$ precisely left-pair the elements in $U_L$. This gives $|L_{\mathrm{reac}}| = |U_L|$. Since there are only $|U_L|$ elements that appear in the left-collections in partitions in $L_{\mathrm{free}} \cup L_{\mathrm{reac}}$, we conclude that the partitions in $W_L = L_{\mathrm{free}} \cup L_{\mathrm{reac}}$ can be used to left-pair at most $|U_L| = |L_{\mathrm{reac}}|$ elements in any partition matching in S.

Similarly, the partitions in the set $W_R = R_{\mathrm{free}} \cup R_{\mathrm{reac}}$ can be used to right-pair at most $|R_{\mathrm{reac}}|$ elements in any partition matching in S.

Therefore, any partition matching in the collection S can include at most $|L_{\mathrm{reac}}|$ partitions in the set $W_L$, at most $|R_{\mathrm{reac}}|$ partitions in the set $W_R$, and at most all partitions in the set $W_{\mathrm{other}}$. Consequently, a maximum partition matching in S consists of at most $|L_{\mathrm{reac}}| + |R_{\mathrm{reac}}| + |W_{\mathrm{other}}|$ partitions. Since the partition matching $\pi_{\mathrm{exp}}$ constructed by the algorithm greedy-expanding contains just this many partitions, $\pi_{\mathrm{exp}}$ is a maximum partition matching in the collection S. □

Now it is clear how the maximum partition matching problem is solved.

THEOREM 8 The maximum partition matching problem is solvable in time $O(n^2 \log n)$. □

PROOF. Suppose that we are given a collection $S = \{C_1, \ldots, C_k\}$ of pairwise disjoint subsets of $U = \{1, \ldots, n\}$.

In case $2^k \ge 4n$, we can call the algorithm partition-matching-I to construct a maximum partition matching in time $O(n^2)$.

In case $2^k < 4n$, we first call the algorithm greedy-expanding to construct a partition matching $\pi_{\mathrm{exp}}$ and compute the sets $L_{\mathrm{reac}}$ and $R_{\mathrm{reac}}$. If $L_{\mathrm{reac}}$ and $R_{\mathrm{reac}}$ have no common partition, then according to the previous theorem, $\pi_{\mathrm{exp}}$ is already a maximum partition matching. Otherwise, we call the algorithm partition-flipping to construct a maximum partition matching. All these can be done in time $O(n^2 \log n)$. A detailed analysis of this algorithm can be found in [3]. □

See also: Frequency assignment problem; Bi-objective assignment problem; Assignment and matching; Assignment methods in clustering; Quadratic assignment problem; Communication network assignment problem.

References
[1] AKERS, S.B., AND KRISHNAMURTHY, B.: 'A group-theoretic model for symmetric interconnection networks', IEEE Trans. Computers 38 (1989), 555-565.
[2] CHEN, C-C., AND CHEN, J.: 'Optimal parallel routing in star networks', IEEE Trans. Computers 48 (1997), 1293-1303.
[3] CHEN, C-C., AND CHEN, J.: 'The maximum partition matching problem with applications', SIAM J. Comput. 28 (1999), 935-954.
[4] GAREY, M.R., AND JOHNSON, D.S.: Computers and intractability: A guide to the theory of NP-completeness, Freeman, 1979.
Jianer Chen
Texas A&M Univ.
College Station
Texas, USA
E-mail address: chen@cs.tamu.edu
MSC2000: 05A18, 05D15, 68M07, 68M10, 68Q25, 68R05
Key words and phrases: maximum matching, greedy algorithm, star network, parallel routing algorithm.

MAXIMUM SATISFIABILITY PROBLEM, MAX-SAT
In the maximum satisfiability (MAX-SAT) problem one is given a Boolean formula in conjunctive normal form, i.e., as a conjunction of clauses, each clause being a disjunction. The task is to find an assignment of truth values to the variables that satisfies the maximum number of clauses.
Let n be the number of variables and m the number of clauses, so that a formula has the following form:
$$\bigwedge_{1 \le i \le m} \Bigl( \bigvee_{1 \le k \le |C_i|} l_{ik} \Bigr),$$
where $|C_i|$ is the number of literals in clause $C_i$ and $l_{ik}$ is a literal, i.e., a propositional variable $u_j$ or its negation $\bar{u}_j$, for $1 \le j \le n$. The set of clauses in the formula is denoted by C. If one associates a weight $w_i$ to each clause $C_i$ one obtains the weighted MAX-SAT problem, denoted as MAX W-SAT: one is to determine the assignment of truth values to the n variables that maximizes the sum of the weights of the satisfied clauses. In the literature one often considers problems with different numbers k of literals per clause, defined as MAX-k-SAT, or MAX W-k-SAT in the weighted case. In some papers MAX-k-SAT instances contain up to k literals per clause, while in other papers they contain exactly k literals per clause. We consider the second option unless otherwise stated.
MAX-SAT is of considerable interest not only from the theoretical side but also from the practical one. On one hand, the decision version SAT was the first example of an NP-complete problem [16]; moreover, MAX-SAT and related variants play an important role in the characterization of different approximation classes like APX and PTAS [5]. On the other hand, many issues in mathematical logic and artificial intelligence can be expressed in the form of satisfiability or some of its variants, like constraint satisfaction. Some exemplary problems are consistency in expert system knowledge bases [46], integrity constraints in databases [4], [23], approaches to inductive inference [35], [40], and asynchronous circuit synthesis [32]. An extensive review of algorithms for MAX-SAT appeared in [9].
M. Davis and H. Putnam [19] started in 1960 the investigation of useful strategies for handling resolution in the satisfiability problem. Davis, G. Logemann and D. Loveland [18] avoid the memory explosion of the original DP algorithm by replacing the resolution rule with the splitting rule. A recent review of advanced techniques for resolution and splitting is presented in [31].
The MAX W-SAT problem has a natural integer linear programming formulation. Let $y_j = 1$ if Boolean variable $u_j$ is 'true', $y_j = 0$ if it is 'false', and let the Boolean variable $z_i = 1$ if clause $C_i$ is satisfied, $z_i = 0$ otherwise. The integer linear program is:
$$\max \sum_{i=1}^{m} w_i z_i$$
subject to the constraints:
$$\sum_{j \in U_i^+} y_j + \sum_{j \in U_i^-} (1 - y_j) \ge z_i, \quad i = 1, \ldots, m,$$
$$y_j \in \{0, 1\}, \quad j = 1, \ldots, n,$$
$$z_i \in \{0, 1\}, \quad i = 1, \ldots, m,$$
where $U_i^+$ and $U_i^-$ denote the set of indices of variables that appear unnegated and negated in clause $C_i$, respectively. If one neglects the objective function and sets all $z_i$ variables to 1, one obtains an integer programming feasibility problem associated to the SAT problem [11].
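The integer linear programming formulation above can be checked on a tiny instance with a short sketch. The clause encoding (a list of signed integers, where j stands for $u_j$ and -j for its negation) and all function names are assumptions made for illustration; the brute-force enumeration is only meant to make the objective and constraints concrete, not to suggest a practical solver.

```python
from itertools import product

# Hypothetical encoding (not from the text): a clause is a list of nonzero
# integers; literal j means u_j, literal -j means its negation (1-based).
def clause_satisfied(clause, y):
    """True if at least one literal of the clause holds under assignment y."""
    return any((y[abs(l) - 1] == 1) == (l > 0) for l in clause)

def ilp_feasible(clauses, y, z):
    """Check sum_{j in U_i^+} y_j + sum_{j in U_i^-} (1 - y_j) >= z_i for all i."""
    for clause, z_i in zip(clauses, z):
        lhs = sum(y[l - 1] if l > 0 else 1 - y[-l - 1] for l in clause)
        if lhs < z_i:
            return False
    return True

def weighted_value(clauses, weights, y):
    """Objective sum_i w_i z_i, with z_i = 1 exactly when clause i is satisfied."""
    return sum(w for c, w in zip(clauses, weights) if clause_satisfied(c, y))

def brute_force_max_w_sat(n, clauses, weights):
    """Enumerate all 2^n assignments; only sensible for very small n."""
    best = max(product((0, 1), repeat=n),
               key=lambda y: weighted_value(clauses, weights, y))
    return list(best), weighted_value(clauses, weights, best)

# (u1 or u2), (not u1), (not u2), (u1 or not u2) with weights 3, 2, 2, 1
clauses = [[1, 2], [-1], [-2], [1, -2]]
weights = [3, 2, 2, 1]
assignment, value = brute_force_max_w_sat(2, clauses, weights)
```

On this instance no assignment satisfies all four clauses, so setting every $z_i$ to 1 makes the constraint system infeasible, which matches the remark that the feasibility version corresponds to SAT.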
The integer linear programming formulation of MAX-SAT suggests that this problem could be solved by a branch and bound method (cf. also Integer programming: Branch and bound methods). A usable method uses Chvátal cuts. In [35] it is shown that the resolvents in the propositional calculus correspond to certain cutting planes in the integer programming model of inference problems.
Linear programming relaxations of integer linear programming formulations of MAX-SAT have been used to obtain upper bounds in [33], [55], [27]. A linear programming and rounding approach for MAX-2-SAT is presented in [13]. A method for strengthening the generalized set covering formulation is presented in [47], where Lagrangian multipliers guide the generation of cutting planes.
The first approximation algorithms with a 'guaranteed' quality of approximation [5] were proposed by D.S. Johnson [38] and use greedy construction strategies. The original paper [38] demonstrated for both of them a performance ratio 1/2. In detail, let k be the minimum number of variables occurring in any clause of the formula, m(x, y) the number of clauses satisfied by the feasible solution y on instance x, and m*(x) the maximum number of clauses that can be satisfied.
For any integer $k \ge 1$, the first algorithm achieves a feasible solution y of an instance x such that
$$\frac{m(x, y)}{m^*(x)} \ge 1 - \frac{1}{k+1},$$
while the second algorithm obtains
$$\frac{m(x, y)}{m^*(x)} \ge 1 - \frac{1}{2^k}.$$
Recently (1997) it has been proved [12] that the second algorithm reaches a performance ratio 2/3. There are formulas for which the second algorithm finds a truth assignment such that the ratio is 2/3. Therefore this bound cannot be improved [12].
One of the most interesting approaches in the design of new algorithms is the use of randomization. During the computation, random bits are generated and used to influence the algorithm process. In many cases randomization allows one to obtain better (expected) performance or to simplify the construction of the algorithm. Two randomized algorithms that achieve a performance ratio of 3/4 have been proposed in [27] and [55]. Moreover, it is possible to derandomize these algorithms, that is, to obtain deterministic algorithms that preserve the same bound 3/4 for every instance. The approximation ratio 3/4 can be slightly improved [28]. T. Asano [2] (following [3]) has improved the bound to 0.77. For the restricted case of MAX-2-SAT, one can obtain a more substantial improvement (performance ratio 0.931) with the technique in [21]. If one considers only satisfiable MAX W-SAT instances, L. Trevisan [54] obtains a 0.8 approximation factor, while H. Karloff and U. Zwick [41] claim a 0.875 performance ratio for satisfiable instances of MAX W-3-SAT. A strong negative result about the approximability can be found in [36]: unless P = NP, MAX W-SAT cannot be approximated in polynomial time within a performance ratio greater than 7/8.
MAX-SAT is among the problems for which local search has been very successful: in practice, local search and its variations are the only efficient and effective method to address large and complex real-world instances. Different variations of local search with randomness techniques have been proposed for SAT and MAX-SAT starting from the late 1980s, see for example [30], [52], motivated by previous applications of 'min-conflicts' heuristics in the area of artificial intelligence [44].
The general scheme is based on generating a starting point in the set of admissible solutions and trying to improve it through the application of basic moves. The search space is given by all possible truth assignments. Let us consider the elementary changes to the current assignment obtained by changing a single truth value. The definitions are as follows.
Let U be the discrete search space: $U = \{0, 1\}^n$, and let f be the number of satisfied clauses. In addition, let $U^{(t)} \in U$ be the current configuration along the search trajectory at iteration t, and $N(U^{(t)})$ the neighborhood of point $U^{(t)}$, obtained by applying a set of basic moves $\mu_i$ ($1 \le i \le n$), where $\mu_i$ complements the ith bit $u_i$ of the string:
$$\mu_i(u_1, \ldots, u_i, \ldots, u_n) = (u_1, \ldots, 1 - u_i, \ldots, u_n):$$
$N(U^{(t)})$
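The single-bit moves $\mu_i$ and the function f described above can be illustrated with a minimal local search sketch. The acceptance rule used here (move to the best neighbor, stop when no single flip improves f) is an assumption chosen for simplicity, as are the clause encoding and all names; the schemes in the cited literature add randomness and other refinements.

```python
import random

def num_satisfied(clauses, u):
    """f(u): number of clauses with at least one true literal.
    Hypothetical encoding: literal j > 0 is u_j, j < 0 is its negation."""
    return sum(any((u[abs(l) - 1] == 1) == (l > 0) for l in clause)
               for clause in clauses)

def flip(u, i):
    """Basic move mu_i: complement the i-th bit (0-based here), returning a new list."""
    return u[:i] + [1 - u[i]] + u[i + 1:]

def local_search(clauses, n, max_iters=1000, seed=0):
    rng = random.Random(seed)
    u = [rng.randint(0, 1) for _ in range(n)]   # random starting point U^(0)
    best = num_satisfied(clauses, u)
    for _ in range(max_iters):
        # The neighborhood of the current point: all strings one flip away.
        neighbors = [flip(u, i) for i in range(n)]
        candidate = max(neighbors, key=lambda v: num_satisfied(clauses, v))
        value = num_satisfied(clauses, candidate)
        if value <= best:        # local optimum: no single flip improves f
            break
        u, best = candidate, value
    return u, best

# Three unit clauses: every flip of a 0 bit gains one clause.
easy_result = local_search([[1], [2], [3]], 3)
# All four sign patterns on two variables: exactly 3 clauses hold for any assignment.
hard_result = local_search([[1, 2], [-1, 2], [1, -2], [-1, -2]], 2)
```

Plain improvement-only search of this kind can stop at a local optimum that is not a global one, which is one reason the literature combines the basic moves with randomization.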
ACM 42, no. 6 (1995), 1115-1145.
[29] GOERDT, A.: 'A threshold for unsatisfiability', J. Comput. Syst. Sci. 53 (1996), 469-486.
[30] GU, J.: 'Efficient local search for very large-scale satisfiability problem', ACM SIGART Bull. 3, no. 1 (1992), 8-12.
[31] GU, J., PURDOM, P.W., FRANCO, J., AND WAH, B.W.: 'Algorithms for the satisfiability (SAT) problem: A survey', in D.-Z. DU, J. GU, AND P.M. PARDALOS (eds.): Satisfiability Problem: Theory and Applications, Vol. 35 of DIMACS, Amer. Math. Soc. and ACM, 1997.
[32] GU, J., AND PURI, R.: 'Asynchronous circuit synthesis with Boolean satisfiability', IEEE Trans. Computer-Aided Design Integr. Circuits 14, no. 8 (1995), 961-973.
[33] HAMMER, P.L., HANSEN, P., AND SIMEONE, B.: 'Roof duality, complementation and persistency in quadratic 0-1 optimization', Math. Program. 28 (1984), 121-155.
[34] HANSEN, P., AND JAUMARD, B.: 'Algorithms for the maximum satisfiability problem', Computing 44 (1990), 279-303.
[35] HOOKER, J.N.: 'Resolution vs. cutting plane solution of inference problems: some computational experience', Oper. Res. Lett. 7, no. 1 (1988), 1-7.
[36] HÅSTAD, J.: 'Some optimal inapproximability results': Proc. 28th Annual ACM Symp. on Theory of Computing, El Paso, Texas, 1997, pp. 1-10.
[37] JAUMARD, B., STAN, M., AND DESROSIERS, J.: 'Tabu search and a quadratic relaxation for the satisfiability problem', in M. TRICK AND D.S. JOHNSON (eds.): Proc. Second DIMACS Algorithm Implementation Challenge on Cliques, Coloring and Satisfiability, DIMACS 26, 1996, pp. 457-477.
[38] JOHNSON, D.S.: 'Approximation algorithms for combinatorial problems', J. Comput. Syst. Sci. 9 (1974), 256-278.
[39] JOHNSON, D.S., AND TRICK, M. (eds.): Cliques, coloring, and satisfiability: Second DIMACS implementation challenge, Vol. 26 of DIMACS, Amer. Math. Soc., 1996.
[40] KAMATH, A.P., KARMARKAR, N.K., RAMAKRISHNAN, K.G., AND RESENDE, M.G.: 'Computational experience with an interior point algorithm on the satisfiability problem', Ann. Oper. Res. 25 (1990), 43-58.
[41] KARLOFF, H., AND ZWICK, U.: 'A 7/8-approximation algorithm for MAX 3SAT?': Proc. 38th Annual IEEE Symp. Foundations of Computer Sci., IEEE Computer Soc., 1997.
[42] KIRKPATRICK, S., AND SELMAN, B.: 'Critical behavior in the satisfiability of random Boolean expressions', Science 264 (1994), 1297-1301.
[43] KIROUSIS, L.M., KRANAKIS, E., AND KRIZANC, D.: 'Approximating the unsatisfiability threshold of random formulas': Proc. Fourth Annual European Symp. Algorithms, Springer, Sept. 1996, pp. 27-38.
[44] MINTON, S., JOHNSTON, M.D., PHILIPS, A.B., AND LAIRD, P.: 'Solving large-scale constraint satisfaction and scheduling problems using a heuristic repair method': Proc. 8th Nat. Conf. Artificial Intelligence (AAAI-90), 1990, pp. 17-24.
[45] MITCHELL, D., SELMAN, B., AND LEVESQUE, H.: 'Hard and easy distributions of SAT problems': Proc. 10th Nat. Conf. Artificial Intelligence (AAAI-92), July 1992, pp. 459-465.
[46] NGUYEN, T.A., PERKINS, W.A., LAFFEY, T.J., AND PECORA, D.: 'Checking an expert system knowledge base for consistency and completeness': Proc. Internat. Joint Conf. on Artificial Intelligence, 1985, pp. 375-378.
[47] NOBILI, P., AND SASSANO, A.: 'Strengthening Lagrangian bounds for the MAX-SAT problem', Techn. Report Inst. Informatik Köln Univ., Germany, no. 96-230 (1996), Proc. Work Satisfiability Problem, Siena, Italy (J. Franco and G. Gallo and H. Kleine Buening, Eds.).
[48] RESENDE, M.G.C., AND FEO, T.A.: 'A GRASP for satisfiability', in M. TRICK AND D.S. JOHNSON (eds.): Proc. Second DIMACS Algorithm Implementation Challenge on Cliques, Coloring and Satisfiability, DIMACS 26, Amer. Math. Soc., 1996, pp. 499-520.
[49] RESENDE, M.G.C., PITSOULIS, L.S., AND PARDALOS, P.M.: 'Approximate solution of weighted MAX-SAT problems using GRASP', in D.-Z. DU, J. GU, AND P.M. PARDALOS (eds.): Satisfiability Problem: Theory and Applications, DIMACS 35, Amer. Math. Soc., 1997.
[50] SELMAN, B., AND KAUTZ, H.: 'Domain-independent extensions to GSAT: Solving large structured satisfiability problems': Proc. Internat. Joint Conf. Artificial Intelligence, 1993, pp. 290-295.
[51] SELMAN, B., KAUTZ, H.A., AND COHEN, B.: 'Local search strategies for satisfiability testing', in M. TRICK AND D.S. JOHNSON (eds.): Proc. Second DIMACS Algorithm Implementation Challenge on Cliques, Coloring and Satisfiability, DIMACS 26, 1996, pp. 521-531.
[52] SELMAN, B., LEVESQUE, H., AND MITCHELL, D.: 'A new method for solving hard satisfiability problems': Proc. 10th Nat. Conf. Artificial Intelligence (AAAI-92), July 1992, pp. 440-446.
[53] SPEARS, W.M.: 'Simulated annealing for hard satisfiability problems', in M. TRICK AND D.S. JOHNSON (eds.): Proc. Second DIMACS Algorithm Implementation Challenge on Cliques, Coloring and Satisfiability, DIMACS 26, 1996, pp. 533-555.
[54] TREVISAN, L.: 'Approximating satisfiable satisfiability problems': Proc. 5th Annual European Symp. Algorithms, Graz, Springer, 1997, pp. 472-485.
[55] YANNAKAKIS, M.: 'On the approximation of maximum satisfiability', J. Algorithms 17 (1994), 475-502.
Roberto Battiti
Dip. Mat. Univ. Trento
Via Sommarive, 14, 38050 Povo (Trento), Italy
E-mail address: battiti@science.unitn.it
MSC 2000: 03B05, 68Q25, 90C09, 90C27, 68P10, 68R05, 68T15, 68T20, 94C10
Key words and phrases: maximum satisfiability, local search, approximation algorithms, history-sensitive heuristics.

METROPOLIS, NICHOLAS CONSTANTINE
Nicholas Constantine Metropolis was born in Chicago on June 11, 1915 and died on October 17, 1999 in Los Alamos. At Los Alamos, Metropolis was the main driving force behind the development of the MANIAC series of electronic computers. He was the first to code a problem for the ENIAC in 1945-1946 (together with S. Frankel), a task which consumed approximately 1,000,000 IBM punched cards.
Metropolis received his PhD in physics from the University of Chicago in 1941. He went to Los Alamos in 1943 as a member of the initial staff of fifty scientists of the Manhattan Project. He spent his entire career at Los Alamos, except for two periods (1946-1948 and 1957-1965), during which he was professor of Physics at the University of Chicago.
Metropolis is best known for the development (joint with S. Ulam and J. von Neumann) of the Monte-Carlo method. The Monte-Carlo method provides approximate solutions to a variety of mathematical problems by performing statistical sampling experiments on a computer. However, the real use of Monte-Carlo methods as a research tool stems from work on the atomic bomb during the second world war. This work involved a direct simulation of the probabilistic problems concerned with random neutron diffusion in fissile material. Metropolis and his collaborators obtained Monte-Carlo estimates for the eigenvalues of the Schrödinger equation.
In 1953, Metropolis co-authored the first paper on the technique that came to be known as simulated annealing [3], [8]. Simulated annealing is a method for solving optimization problems. The name of the algorithm derives from an analogy with the simulation of the annealing of solids. Annealing refers to a process of cooling material slowly until it reaches a stable state.
Metropolis also made several early contributions to the use of computers in the exploration of nonlinear dynamics. In the Sixties and Seventies he collaborated with G.-C. Rota and others on significance arithmetic. Another contribution of Metropolis to numerical analysis is an early paper on the use of Chebyshev's iterative method for solving large scale linear systems [1].

References
[1] BLAIR, A., METROPOLIS, N., NEUMANN, J. VON, TAUB, A.H., AND TSINGOU, M.: 'A study of a numerical solution to a two-dimensional hydrodynamical problem', Math. Tables and Other Aids to Computation 13, no. 67 (July 1959), 145-184.
[2] HARLOW, F., AND METROPOLIS, N.: 'Computing and computers: Weapons simulation leads to the computer era', Los Alamos Sci. (1983), 132-141.
[3] KIRKPATRICK, S., GELATT, C.D., AND VECCHI JR., M.P.: 'Optimization by simulated annealing', Science 220, no. 4598 (1983), 671-680.
[4] METROPOLIS, N.: 'The beginning of the Monte Carlo method', Los Alamos Sci. 15 (1987).
[5] METROPOLIS, N.: The age of computing: A personal memoir, Daedalus, 1992.
[6] METROPOLIS, N., HOWLETT, J., AND ROTA, G.-C. (eds.): A history of computing in the twentieth century, Acad. Press, 1980.
[7] METROPOLIS, N., AND NELSON, E.C.: 'Early computing at Los Alamos', Ann. Hist. Comput. 4, no. 4 (Oct. 1982), 348-357.
[8] METROPOLIS, N., ROSENBLUTH, A., TELLER, A., AND TELLER, E.: 'Equation of state calculation by fast computing machines', J. Chem. Phys. 21 (1953).
Panos M. Pardalos
Center for Applied Optim.
Dept. Industrial and Systems Engin. Univ. Florida
Gainesville, FL 32611, USA
E-mail address: pardalos@ufl.edu
MSC2000: 90C05, 90C25
Key words and phrases: Metropolis, simulated annealing, Monte-Carlo method.

MINIMAX: DIRECTIONAL DIFFERENTIABILITY
Minimax is a principle of optimal choice (of some parameters or functions). If applied, this principle requires finding extremal values of some max-type function. Since the operation of taking the pointwise maximum (of a finite or infinite number
$\bar\partial\varphi(x, y)$ is the superdifferential of $\varphi$ at the point [x, y], and $N_{x,y}$ is the cone normal to a(x) at y. □

Recall that if a function $F : R^s \to R$ is concave, $Z \subset R^s$ is open, $z \in Z$, then the set
$$\bar\partial F(z) = \{ v \in R^s : F(z') - F(z) \le (v, z' - z) \ \ \forall z' \in Z \}$$
is called the superdifferential of F at $z \in Z$. It is convex and compact.

A maxmin function. Let $\varphi(x, y, z) : S \times G_1 \times G_2 \to R$ be continuous jointly in all variables, $S \subset R^n$ be an open set, $G_1 \subset R^m$, $G_2 \subset R^p$ be compact. Put

REMARK 5 More sophisticated results on the directional differentiability of max- and maxmin functions can be found, e.g., in [8]. □

Higher-Order Directional Derivatives. The results above are related to the first order directional derivatives. Using these derivatives, it is possible to construct the following first order expansion:
$$f(x + \alpha g) = f(x) + \alpha f'(x, g) + o_{x,g}(\alpha), \qquad (6)$$
where $f'$ is either $f'_D$ or $f'_H$.
In some cases it is possible to get 'higher-order' expansions.
Let
Clearly f ( x + A) (13)
Ro(x, g) D Rl (x, g) D R2(x, g) D . . . . [
= max fi(x) + E
iEI
,1 ]
~. Aik/kk + O(llAIIk)
Note that R0(x, g) does not depend on x and g, k=l
and R1 (x, g) does not depend on g.
= f(x) + max Ak Ak k
PROPOSITION 6 [3, Thm. 9.1] The following ex-
pansion holds:
where
l C~k
f ( x + ag) -- f ( x ) + E -~. f(k)(x' g) + o(g, cJ), d ' f ( x ) = co {A (i) - ( A i o , . . . , A i l ) " i E I } ,
k=l
Aio - f i ( x ) - f(x), A - ( A o , . . . , Al),
(10)
Ao E R, A 1 E R ~ ,
Vg ~ R ~,
k times
where A2 E Rn×n, . . . , Ak E ~t n×''×2.
f(k)(x,g)-- max f[k)(x,g), (11) k times
Minimax game tree searching
Minimax tree notion              Minimax game notion
Minimax tree                     All board configurations
Node in the tree                 Board configuration
Edge from 'max' to 'min' node    Move by player 'max'
Edge from 'min' to 'max' node    Move by player 'min'
Node value                       Quality of a board position
Leaf node                        Outcome of a game
Solution path                    Sequence of moves leading to the best outcome

Sequential Minimax Game Tree Algorithms. Let t be a node of a minimax tree. Then the function first_son(t) returns the first son node $s_1$ of t and next_son($s_i$, t) returns the (i+1)th son of node t. The function no_more_sons(s, t) returns true if s is the last son of t. Otherwise it returns false. The ordering of the sons introduced by these functions is arbitrary. In practice it is given by some heuristic function. The function father(t) returns the father node of t, is_leave(t) whether or not t is a leaf node and node_type(t) the type of node t.

Minimax Algorithm. The most basic minimax algorithm is called the minimax algorithm. It systematically traverses, in a depth first, left to right fashion, the complete minimax tree. All nodes are visited exactly once.

Alpha-Beta Algorithm. The first nontrivial algorithm introduced to compute the minimax value of a game tree was the alpha-beta algorithm. According to D. Knuth and R. Moore, McCarthy's comments at the Dartmouth summer research conference on artificial intelligence led to the use of alpha-beta pruning in game playing programs since the late 1950s. The first published discussion of an algorithm for minimax tree pruning appeared in 1958 (see [11, p. 56]). Two early extensive studies of the algorithm may be found in [18] and [27].
The idea behind the alpha-beta algorithm is to traverse the minimax tree in a depth first, left to right fashion. It tries to prune sub-trees that cannot influence the minimax value of the tree. The conditions used to prune sub-trees are called cut conditions. The idea behind the suggested cut conditions is to associate to each node a lower and an upper bound, called α and β bounds. The bounds of a node are passed to its sons and tightened during the execution of the algorithm. It is easy to see that if the lower bound of a node t of type 'max' is larger than its upper bound then all not yet visited sons of node t can be pruned, and similarly for nodes of type 'min'.

    FUNCTION AlphaBeta(n, α, β) IS
    BEGIN
      IF is_leave(n) THEN RETURN f(n)
      s ← first_son(n)
      IF node_type(n) = max THEN
        LOOP
          α ← max{α, AlphaBeta(s, α, β)}
          IF α ≥ β THEN RETURN β
          EXIT LOOP WHEN no_more_sons(s, n)
          s ← next_son(s, n)
        END LOOP
        RETURN α
      ELSE
        LOOP
          β ← min{β, AlphaBeta(s, α, β)}
          IF β ≤ α THEN RETURN α
          EXIT LOOP WHEN no_more_sons(s, n)
          s ← next_son(s, n)
        END LOOP
        RETURN β
      END IF
    END AlphaBeta

Pseudocode for the alpha-beta algorithm.

It has been proved in [18] that the alpha-beta algorithm correctly calculates the minimax value of a tree. The above pseudocode describes the alpha-beta algorithm.
The minimax value of a tree T is computed as follows.

    e(root(T)) ← AlphaBeta(root(T), −∞, +∞).

Optimal State Space Search Algorithm SSS*. It was introduced by Stockman in 1979 [29]. It originates not in game playing but in systematic pattern recognition. The algorithm was first analyzed and criticized in [26].
The idea behind the SSS* algorithm is to use a tree traversal strategy that is better than the depth first and left to right strategy found in the alpha-beta algorithm. The criterion used to order the nodes yet to visit is an upper bound of their value. Nodes are stored in nonincreasing order of their upper bound in a list called 'open'.
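The AlphaBeta pseudocode given earlier on this page can be sketched in Python as follows. The tree encoding (a leaf is a number, an inner node is a list of sons, with node types alternating from a 'max' root) is an assumption made for illustration, and the plain minimax function is included only as a reference to compare against.

```python
import math

# Hypothetical tree encoding (not from the text): a leaf is a number f(n),
# an inner node is a list of its sons; the root is of type 'max'.
def alpha_beta(node, alpha, beta, is_max=True):
    if not isinstance(node, list):           # is_leave(n)
        return node
    if is_max:
        for son in node:
            alpha = max(alpha, alpha_beta(son, alpha, beta, False))
            if alpha >= beta:                # cut condition: prune remaining sons
                return beta
        return alpha
    for son in node:
        beta = min(beta, alpha_beta(son, alpha, beta, True))
        if beta <= alpha:                    # cut condition for 'min' nodes
            return alpha
    return beta

def minimax(node, is_max=True):
    """Reference implementation that visits every node exactly once."""
    if not isinstance(node, list):
        return node
    values = [minimax(son, not is_max) for son in node]
    return max(values) if is_max else min(values)

tree = [[3, 5], [2, 9], [0, 7]]              # max root over three min nodes
full_window = alpha_beta(tree, -math.inf, math.inf)
narrow_window = alpha_beta(tree, 4, 10)      # window that excludes the true value
```

With the full window the result equals the minimax value. With a window that does not contain the minimax value, this variant returns one of the two window bounds instead.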
The SSS* algorithm first traverses the minimax tree from top to bottom. Nodes whose sons have not yet been visited and which cannot yet be pruned are marked 'live'. Nodes marked 'solved' have already been visited once and therefore have their best upper bound associated.
The operation purge(t, open) removes all nodes from the open list for which the node t is an ancestor. Due to the fact that the nodes in the open list are sorted in nonincreasing order of their associated upper bound, the pruning operation only eliminates nodes that need no further consideration.
The SSS* algorithm is described by the following pseudocode.

    FUNCTION SSS* IS
    BEGIN
      open ← ∅
      insert(root, live, +∞, open)
      LOOP
        (s, t, m) ← remove(open)
        IF s = root AND t = solved THEN RETURN m
        (Apply the Γ operator to node s)
      END LOOP
    END SSS*

    -- (Apply the Γ operator to node s; t is the status of s) --
    IF t = live AND node_type(s) = max
        AND NOT is_leave(s) THEN
      c ← first_son(s)
      LOOP
        insert(c, live, m, open)
        EXIT LOOP WHEN no_more_sons(c, s)
        c ← next_son(c, s)
      END LOOP
    END IF
    IF t = live AND node_type(s) = min
        AND NOT is_leave(s) THEN
      insert(first_son(s), live, m, open)
    END IF
    IF t = live AND is_leave(s) THEN
      insert(s, solved, min{f(s), m}, open)
    END IF
    IF t = solved AND node_type(s) = max
        AND NOT no_more_sons(s, father(s)) THEN
      insert(next_son(s, father(s)), live, m, open)
    END IF
    IF t = solved AND node_type(s) = max
        AND no_more_sons(s, father(s)) THEN
      insert(father(s), solved, m, open)
    END IF
    IF t = solved AND node_type(s) = min THEN
      insert(father(s), solved, m, open)
      purge(father(s), open)
    END IF

Pseudocode for the SSS* algorithm.

The operator Γ(s) is applied to each node s extracted from the 'open' list.
It is possible to define a dual version of SSS*, which may be called SSS*-dual, in which the computation of upper bounds is replaced by the computation of lower bounds. The SSS*-dual algorithm has been suggested in [21].
Stockman has shown that if the SSS* algorithm explores a node, then this node is also explored by the alpha-beta algorithm. In fact, the alpha-beta algorithm loses efficiency (in the number of nodes visited) against the SSS* algorithm when the value of the minimax tree is found towards the right of the tree. If the SSS* algorithm is applied to win-lose trees then it visits exactly the same nodes in the same order as would the alpha-beta algorithm.

SCOUT: Minimax Algorithm of Theoretical Interest. In the previous sections, we have described the most common minimax algorithms. While trying to show the optimality of the alpha-beta algorithm, J. Pearl [23] introduced the SCOUT algorithm. His idea was to show that the SCOUT algorithm is dominated by the alpha-beta algorithm and to prove that SCOUT achieves an optimal performance. But counterexamples showed that the alpha-beta algorithm does not dominate the SCOUT algorithm because the conservative testing approach of the SCOUT algorithm may sometimes cut off nodes that would have been explored by the alpha-beta algorithm.
The SCOUT algorithm itself recursively computes the value of the first of its sons. Then it tests to see if the value of the first son is better than the value of the other sons. In case of a negative result, the son that failed the test is completely evaluated by recursively calling SCOUT.
Although the SCOUT algorithm is more of theoretical interest, there are some problem instances where it outperforms all other minimax algorithms. A last advantage of the SCOUT algorithm versus one of its major competitors, the SSS* algorithm, is that its storage requirements are similar to those of the alpha-beta algorithm.

GSEARCH: Generalized Game Tree Search Algorithm. In 1986, T. Ibaraki [16] proposed a generalization of the previously known algorithms to compute the minimax value of a game tree. His idea was to use a branch and bound like approach. Nodes of the considered tree which have not yet been evaluated are stored in a list which is ordered according to a given criterion. Different orderings give different traversal strategies. A lower and an upper bound are associated to each node. These bounds generalize the α and β values found in the alpha-beta algorithm.
Finally Ibaraki showed how the algorithm GSEARCH is related to other minimax algorithms like alpha-beta or SSS*, and proved that his algorithm always surpasses the alpha-beta algorithm.

SSS-2: Recursive State Space Search Algorithm. The SSS-2 algorithm has been proposed by W. Pijls and A. de Bruin [24]. It is based on the idea of computing an upper bound for the root node and then repeatedly transforming this upper bound into a tighter one. They have shown that the SSS-2 algorithm expands exactly the same nodes as those to which the SSS* algorithm applies the Γ operator.

Some Variations On The Subject. Computing the minimax value of a game tree may be seen as aspiring the solution value from a leaf node through the whole tree up to the root node. While moving closer to the root node, more and more useless subtrees will be eliminated, as we have already stated for the alpha-beta algorithm. The better the α and β bounds, the more subtrees may be pruned. If, for instance, one knows that the minimax value will, with high probability, be found in the subset ]a, b[, then it may be worth calling the alpha-beta algorithm as

    e ← AlphaBeta(root(T), a, b)

If, indeed, the minimax value e(root(T)) belongs to the set ]a, b[, then the algorithm will correctly return that value. If the minimax value does not belong to the set ]a, b[, then the value returned will be either a or b, depending on whether the minimax value belongs to ]−∞, a] or [b, +∞[. We then say that the alpha-beta algorithm failed low, respectively high. In the case where the algorithm failed low, the call

    e ← AlphaBeta(root(T), −∞, a + 1)

will return the correct value. But it would also be possible to reiterate this procedure on a subset of ]−∞, a + 1[.
The technique of limiting the interval in which the solution may be found is called aspiration search. If the minimax value belongs to the specified interval, then a much larger number of cut conditions are verified and the tree actually traversed is much smaller than the one traversed by the alpha-beta algorithm without initial alpha and beta bounds.
Furthermore it is interesting to note that aspiration search is at the basis of a technique called iterative deepening which is used in many game playing programs.
I. Althöfer [5] suggested an incremental negamax algorithm which uses estimates of all nodes in the minimax tree, rather than only those of the leaf nodes, to determine the value of the root node. This algorithm is useful when dealing with erroneous leaf evaluation functions. Under the assumption of independently occurring and sufficiently small errors, the proposed algorithm is shown to have exponentially reduced error probabilities with respect to the depth of the tree.
R.L. Rivest [25] proposed an algorithm for searching minimax trees based on the idea of approximating the min and the max operators by generalized mean value operators. The approximation is used to guide the selection of the next leaf node to expand, since the approximation allows one to select efficiently the leaf node upon whose value the minimax value most highly depends. B.W. Ballard [6] proposed a similar algorithm where the value of some nodes (chance nodes, as he calls them) is a possibly weighted average of the values of their sons. In fact he considers one additional type of node, called chance nodes.
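The aspiration search procedure described on this page (a first call with the window ]a, b[ followed by a second call when the first one fails) can be sketched as follows. The nested-list tree encoding, the restriction to integer leaf values, and the symmetric handling of the fail-high case with the window ]b - 1, +∞[ are assumptions made for illustration; the text only spells out the fail-low call.

```python
import math

def alpha_beta(node, alpha, beta, is_max=True):
    """Alpha-beta that returns a window bound on failure.
    Hypothetical encoding: a leaf is an integer, an inner node is a list of sons."""
    if not isinstance(node, list):
        return node
    for son in node:
        value = alpha_beta(son, alpha, beta, not is_max)
        if is_max:
            alpha = max(alpha, value)
        else:
            beta = min(beta, value)
        if alpha >= beta:                    # cut condition
            return beta if is_max else alpha
    return alpha if is_max else beta

def aspiration_search(tree, a, b):
    e = alpha_beta(tree, a, b)
    if e <= a:                               # failed low: value lies in ]-inf, a]
        return alpha_beta(tree, -math.inf, a + 1)
    if e >= b:                               # failed high (assumed symmetric case)
        return alpha_beta(tree, b - 1, math.inf)
    return e                                 # value lies inside ]a, b[

tree = [[3, 5], [2, 9], [0, 7]]              # minimax value is 3
inside = aspiration_search(tree, 2, 6)
fail_low = aspiration_search(tree, 4, 10)
fail_high = aspiration_search(tree, -5, 2)
```

The second call costs an extra traversal, so the approach only pays off when the initial guess ]a, b[ contains the minimax value often enough.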
Conspiracy numbers have been introduced by D.A. McAllester in [22] as a measurement of the accuracy of the minimax value of an incomplete tree. They measure the number of leaf nodes whose value must change in order to change the minimax value of the root node by a given amount.

Parallel Minimax Tree Algorithms. Parallelizing the minimax algorithm is trivial over uniform trees. Even on irregular trees, the parallelization remains easy. The only additional problem arises from the fact that the size of the subtrees to explore may now vary. Different processors will be attributed problems of varying computational volume. All that is needed then to achieve excellent speedups is a load-balancing scheme, that is, a mechanism by means of which processors may, during run-time, exchange problems so as to keep all processors busy all the time.

bors. The probability to find the optimum in the subtree rooted at a given son then always decreases when traversing the sons in a left to right order. Such ordering information is generally available in game-playing programs, the ordering function being a heuristic function based on the knowledge of the game to be played.

A Mandatory Work First Algorithm. R. Hewett and G. Krishnamurthy [15] proposed an algorithm that achieves an efficiency of roughly 50% for a number of processors in the range of 2 to 25. All the nodes that still need to be explored are maintained in a list called 'open' list. This list is ordered with respect to how the nodes have been reached. More precisely, the algorithm maintains two lists called 'open' and 'closed', and a tree called 'cut'. The 'open' list contains all the nodes yet to be explored, the 'closed' list contains the expanded nodes not yet pruned and the 'cut' tree contains
The parallelization of the alpha-beta and the the pruned nodes. The 'open' list initially contains
SSS. algorithms are much more interesting than only the root node. All processors fetch nodes from
the more theoretical minimax algorithm. There ex- the 'open' list and process them if they cannot be
ist basically two approaches or techniques to par- discarded, that is, they do not have any of their
allelize the alpha-beta algorithm. In the first ap- ancestors in the 'cut' tree. Leave nodes are eval-
proach, which has been one of the first techniques uated and their result is returned to the parent
used, all processors explore the entire tree but us- which may update its value and check for possi-
ing different search-intervals. This approach is at ble pruning by traversing the 'cut' tree up to the
the basic of the algorithm called parallel aspiration root node applying the usual alpha and beta cut-
search by G. Baudet [7]. The second one consists offs. If the node selected is not a leave node, it is
in exploring simultaneously different parts of the expanded and its sons are inserted into the 'open'
minimax tree. list and itself into the 'closed' list.
S.G. Akl et al. [1], [2] proposed an algorithm
A Simple Way to Parallelize the Exploration o]
that uses the same approach for exploring the
Minimax Trees. Exploring a minimax tree in par-
minimax tree. Their priority function is computed
allel can very simply be obtained by generating
as
the sons of the root node, and their sons and so
on up to the point where one has as many son p(ni) - p(father(ni)) - (bn, + 1 - i) . 10 (h-/-l>,
nodes waiting to be explored as there are proces-
sors. At this point, each processor explores the sub- where ni is the ith son of node father(hi), bn~ the
tree rooted at one of these nodes, using any given branching of node father(hi), h the search depth
sequential minimax algorithm. When all proces- (the maximal depth of the minimax tree) and ]
sors have completed their exploration, the solution the depth of node father(ni) in the minimax tree.
for the entire tree is computed by using the partial K. Almquist et al. [3] also developed an algo-
results obtained from each of the processors. rithm based on the idea of having two categories
In practice the sons of a node may be ordered in of unexplored nodes which are ordered according
such a way that any son has a probability of yield- to a given priority function. Furthermore they add
ing the locally optimal path that is no smaller than to this concept parallel aspiration search as well as
the corresponding probabilities for its right neigh- a novel scheduling algorithm.
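The simple parallelization scheme described earlier in this section (expand the top of the tree until there are at least as many pending nodes as processors, search each pending subtree with a sequential minimax algorithm, then combine the partial results) can be sketched as follows. This is only an illustration under stated assumptions: the nested-list tree encoding and the names `split_frontier` and `parallel_minimax` are hypothetical, and a thread pool stands in for the processors.

```python
from concurrent.futures import ThreadPoolExecutor


def minimax(node, maximizing):
    """Sequential minimax on nested lists; leaves are numbers."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def split_frontier(root, processors):
    """Expand level by level until at least `processors` nodes are pending.

    Returns (path, node, maximizing) triples, where path is the tuple of
    child indices leading from the root to the pending node.
    """
    frontier = [((), root, True)]
    while len(frontier) < processors and any(
        isinstance(node, list) for _, node, _ in frontier
    ):
        expanded = []
        for path, node, is_max in frontier:
            if isinstance(node, list):
                expanded.extend(
                    (path + (i,), child, not is_max)
                    for i, child in enumerate(node)
                )
            else:
                expanded.append((path, node, is_max))   # leaves stay pending
        frontier = expanded
    return frontier


def parallel_minimax(root, processors=4):
    frontier = split_frontier(root, processors)
    # Each worker explores one pending subtree with the sequential algorithm.
    with ThreadPoolExecutor(max_workers=processors) as pool:
        results = list(pool.map(lambda t: minimax(t[1], t[2]), frontier))
    partial = {path: value for (path, _, _), value in zip(frontier, results)}

    def combine(node, path, maximizing):
        """Minimax over the top of the tree, reusing the partial results."""
        if path in partial:
            return partial[path]
        values = [
            combine(child, path + (i,), not maximizing)
            for i, child in enumerate(node)
        ]
        return max(values) if maximizing else min(values)

    return combine(root, (), True)


if __name__ == "__main__":
    tree = [[[3, 5], [2, 9]], [[0, 7], [4, 6]]]
    print(parallel_minimax(tree, processors=4))
```

In CPython the threads mainly illustrate the structure of the scheme; a process pool or a distributed runtime would be needed for an actual speedup, and since subtrees differ in size, a load-balancing mechanism such as the one mentioned above would still be required.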
In the same direction, V.-D. Cung and C. Roucairol [9] have proposed a shared memory parallel minimax algorithm which distinguishes between critical and non critical nodes. In their algorithm one processor is assigned to each node.

In the algorithm by I.R. Steinberg and M. Solomon [28], which is also a mandatory work first type algorithm, the list containing the speculative work or non critical nodes is dynamically ordered.

Aspiration Search. The parallel algorithm called aspiration search has been introduced by Baudet in 1978 [7]. In this algorithm the search interval ]−∞, +∞[ used by the sequential alpha-beta algorithm is divided into a certain number of subintervals that cover the entire range ]−∞, +∞[. Now, every processor explores the entire minimax tree using one subinterval, different processors being assigned different intervals. Any processor searching an interval ]a_i, a_{i+1}] may either fail low or high. The principle is the same as in the sequential version of the algorithm. Exactly one processor will neither fail low, nor fail high. The value computed by this processor is the value of the minimax tree to explore.

The implementation of the aspiration search algorithm is really simple. Furthermore, there is no information exchange needed between processors. If the nodes in the minimax tree to explore are ordered in such a way that the alpha-beta algorithm has to explore the whole tree, then the speedup obtained by using the aspiration search algorithm is maximal. But when the aspiration search algorithm is applied to randomly generated trees, Baudet has shown that the speedup is limited to about six and is independent of the number of processors used.

Tree-Splitting Algorithm. Among the early parallel minimax algorithms is the tree-splitting algorithm by R.A. Finkel and J.P. Fishburn [14]. This algorithm is based on the idea to look at the available processors as a tree of processors. Each processor, except for the ones representing leaves in the processor tree, has a fixed number p_b of son or slave processors. During the execution of the algorithm a non-leaf processor associated with a node n in the minimax tree spawns the exploration of the sons s_i of n to its p_b slaves. As soon as one slave returns, the next unexplored son s_j is spawned to that slave, or the current value is returned to the father processor if the cut condition is satisfied. If all the sons of a node have been spawned to its slaves, the father processor waits for the results of all its slaves. Leaf processors simply compute the value of their associated node using the sequential alpha-beta algorithm.

An important advantage of the tree-splitting algorithm over other more elaborate algorithms is that it may be implemented as simply on a shared memory parallel machine as on a distributed memory parallel machine.

The tree-splitting algorithm has been implemented and its execution has been simulated. On a 27 processor simulated machine, in which each processor has three slave sons associated, the average speedup was 5.31 for trees of depth eight and a branching of three.

PVSPLIT: Principal Variation Splitting Algorithm. It has been proposed by T.A. Marsland and M.S. Campbell [19] and is by far the most often implemented algorithm, especially in chess playing programs. The algorithm is based on the structure of the sequential alpha-beta algorithm. The idea is to first explore in a sequential fashion a path from the root node to its leftmost leaf. This path is called the principal variation path. The traversal is done to obtain alpha and beta bounds. If the minimax tree to explore is of type best first, then the explored principal variation path represents the solution path. In a second phase, for each level of the minimax tree all the yet to be visited sons are explored in parallel by using the bounds computed during the principal variation path computation and the traversal of the lower levels of the minimax tree.

The PVSPLIT algorithm is completely described by the pseudocode given below, using the negamax notation.

The PVSPLIT algorithm has been implemented in [20] on a network of Sun workstations. An acceleration of 3.06 has been measured on 4 processors when traversing minimax trees representing real chess games. The main problem of the PVSPLIT algorithm is that, during the second phase,
the subtrees explored in parallel are not necessarily of the same size.

The PVSPLIT algorithm is most efficient when the iterative deepening technique is used, because with each iteration it is increasingly likely that the first move tried, that is, the one on the principal variation path, is the best one.

    FUNCTION PVSplit(n, α, β) IS
    BEGIN
      IF is_leaf(n) THEN RETURN f(n)
      s ← first_son(n)
      α ← −PVSplit(s, −β, −α)
      IF α ≥ β THEN RETURN α
      FOR s' ∈ sons(n) − {s} LOOP IN PARALLEL
        ⟨wait until a slave node is idle⟩
        v_i ← −TreeSplit(s', −β, −α)
        IF v_i > α THEN
          α ← v_i
          ⟨Update the bounds according to α on all slaves⟩
        END IF
        IF α ≥ β THEN
          ⟨Terminate all slave processors⟩
          RETURN α
        END IF
      END LOOP
      RETURN α
    END PVSplit

Pseudocode for the PVSPLIT algorithm.

Synchronized Distributed State Space Search. A completely different approach to parallelizing the SSS* algorithm has been taken by C.G. Diderich and M. Gengler [10]. The algorithm proposed is called synchronized distributed state space search (SDSSS). It is an alternation of computation and synchronization phases. The algorithm has been designed for a distributed memory multiprocessor machine. Each processor manages its own local 'open' list of unvisited nodes.

The synchronization phase may be subdivided in three major parts. First, the processors exchange information about which nodes can be removed from the local 'open' lists. This corresponds to each processor sending the nodes for which the 'purge' operation may be applied by all the other processors. Next, all the processors agree on the globally lowest upper bound m* for which nodes exist in some of the 'open' lists. Finally, all the nodes having the same upper bound m* are evenly distributed among all the processors. This operation concludes the synchronization phase.

The computation phase of the SDSSS algorithm may be described by the following pseudocode.

    (Computation phase)
    WHILE ⟨there exists a node in the open list having an upper bound of m*⟩ LOOP
      (s, t, m*) ← remove(open)
      IF s = root AND t = solved THEN
        BROADCAST 'the solution has been found'
        RETURN m*
      END IF
      ⟨Apply the Γ operator to node s⟩
    END LOOP

Pseudocode for the computation phase of the SDSSS algorithm.

Experiments executing the SDSSS algorithm on an Intel iPSC/2 parallel machine have been conducted. Speedups of up to 11.4 have been measured for 32 processors.

Distributed Game Tree Search Algorithm. R. Feldmann et al. [12] parallelized the alpha-beta algorithm for massively parallel distributed memory machines. Different subtrees are searched in parallel by different processors. The allocation of processors to trees is done by imposing certain conditions on the nodes which are selectable. They introduce the concept of younger brother waits. This concept essentially says that if the subtree rooted at s_1, where s_1 is the first son node of a node n, is not yet evaluated, then the other sons s_2, ..., s_b of node n are not selectable. Younger brothers may only be considered after their elder brothers, which has as a consequence that the value of the elder brothers may be used to give a tight search window to the younger brothers.

This concept is nevertheless not sufficient to achieve the same good search window as the alpha-beta algorithm achieves. Indeed, when node s_1 is computed, then the younger brothers may all be explored in parallel using the value of node s_1. Thus the node s_2 has the same search window as it would have in the sequential alpha-beta algorithm, but this is not true anymore for s_i, where i ≥ 3. Indeed, if nodes s_2 and s_3 are processed in parallel, they only know the value of node s_1, while in the sequential alpha-beta algorithm, the node s_3
would have known the value of both s_1 and s_2. This fact forces the parallel algorithm to provide an information dissemination protocol.

In case the nodes s_2 and s_3 are evaluated on processors P and P', and processor P finishes its work before P', producing a better value than node s_1 did, then processor P will inform processor P' of this value, allowing it to continue with better information on the rest of its subtree or to terminate its work if the new value allows P' to conclude that its computation becomes useless. The load distribution is realized by means of a dynamic load balancing scheme, where idle processors ask other processors for work.

Speedups as high as 100 have been obtained on a 256 processor machine. In [13], a speedup of 344 on a 1024 transputer network interconnected as a grid and a speedup of 142 on a 256 processor transputer de Bruijn interconnected network have been shown.

Parallel Minimax Algorithm with Linear Speedup. In 1988, Althöfer [4] proved that it is possible to develop a parallel minimax algorithm which achieves linear speedup in the average case. With the assumption that all minimax trees are binary win-loss trees, he exhibited such a parallel minimax algorithm.

M. Böhm and E. Speckenmeyer [8] also suggested an algorithm which uses the same basic ideas as Althöfer. Their algorithm is more general in the sense that it needs only to know the distribution of the leaf values and is independent of the branching of the tree explored.

In 1989, R.M. Karp and Y. Zhang [17] proved that it is possible to obtain linear speedup on every instance of a random uniform minimax tree if the number of processors is close to the height of the tree.

See also: Shortest path tree algorithms; Directed tree networks; Bottleneck Steiner tree problems.

References
[1] AKL, S.G., BARNARD, D.T., AND DORAN, R.J.: 'Searching game trees in parallel': Proc. 3rd Biennial Conf. Canad. Soc. Computation Studies of Intelligence, 1979, pp. 224-231.
[2] AKL, S.G., BARNARD, D.T., AND DORAN, R.J.: 'Design, analysis, and implementation of a parallel tree search algorithm', IEEE Trans. Pattern Anal. Machine Intell. PAMI-4, no. 2 (1982), 192-203.
[3] ALMQUIST, K., MCKENZIE, N., AND SLOAN, K.: 'An inquiry into parallel algorithms for searching game trees', Techn. Report Univ. Washington, Seattle, WA 12, no. 3 (1988).
[4] ALTHÖFER, I.: 'On the complexity of searching game trees and other recursion trees', J. Algorithms 9 (1988), 538-567.
[5] ALTHÖFER, I.: 'An incremental negamax algorithm', Artif. Intell. 43 (1990), 57-65.
[6] BALLARD, B.W.: 'The *-minimax search procedure for trees containing chance nodes', Artif. Intell. 21 (1983), 327-350.
[7] BAUDET, G.M.: 'The design and analysis of algorithms for asynchronous multiprocessors', PhD Thesis Carnegie-Mellon Univ. Pittsburgh, PA, no. CMU-CS-78-116 (1978).
[8] BÖHM, M., AND SPECKENMEYER, E.: 'A dynamic processor tree for solving game trees in parallel': Proc. SOR '89, 1989.
[9] CUNG, V.-D., AND ROUCAIROL, C.: 'Parallel minimax tree searching', Res. Report INRIA 1549 (1991). (In French.)
[10] DIDERICH, C.G.: 'Evaluation des performances de l'algorithme SSS* avec phases de synchronisation sur une machine parallèle à mémoires distribuées', Techn. Report Computer Sci. Dept. Swiss Federal Inst. Techn. Lausanne, Switzerland, no. LiTH-99 (July 1992). (In French.)
[11] FEIGENBAUM, E.A., AND FELDMAN, J.: Computers and thought, McGraw-Hill, 1963.
[12] FELDMANN, R., MONIEN, B., MYSLIWIETZ, P., AND VORNBERGER, O.: 'Distributed game tree search', ICCA J. 12, no. 2 (1989), 65-73.
[13] FELDMANN, R., MYSLIWIETZ, P., AND MONIEN, B.: 'Game tree search on a massively parallel system', in H.J. VAN DEN HERIK, I.S. HERSCHBERG, AND J.W.H.M. UITERWIJK (eds.): Advances in Computer Chess, Vol. 7, Univ. Limburg, 1994, pp. 203-218.
[14] FINKEL, R.A., AND FISHBURN, J.P.: 'Parallelism in alpha-beta search', Artif. Intell. 19 (1982), 89-106.
[15] HEWETT, R., AND KRISHNAMURTHY, G.: 'Consistent linear speedup in parallel alpha-beta search': Proc. ICCI'92, Computing and Information, IEEE Computer Soc. Press, 1992, pp. 237-240.
[16] IBARAKI, T.: 'Generalization of alpha-beta and SSS* search procedures', Artif. Intell. 29 (1986), 73-117.
[17] KARP, R.M., AND ZHANG, Y.: 'On parallel evaluation of game trees': ACM Annual Symp. Parallel Algorithms and Architectures (SPAA'89), ACM, 1989, pp. 409-420.
[18] KNUTH, D.E., AND MOORE, R.W.: 'An analysis of alpha-beta pruning', Artif. Intell. 6, no. 4 (1975), 293-
Minimax theorems
in particular in convex analysis and also in the theory of monotone operators on a Banach space. (See [32] for more details of these kinds of applications.)

Minimax Theorems that Depend on Connectedness. It was believed for some time that proofs of minimax theorems required either the fixed point machinery of algebraic topology, or the functional-analytic machinery of convexity. However, in 1959, W.-T. Wu proved the first minimax theorem in which the conditions of convexity were totally replaced by conditions related to connectedness. This line of research was continued by H. Tuy, L.L. Stachó, M.A. Geraghty with B.-L. Lin, and J. Kindler with R. Trost, whose results

THEOREM 7 (1972) Let $X$ be a nonempty set and $Y$ be a nonempty compact topological space. Let $f : X \times Y \to \mathbb{R}$ be lower semicontinuous on $Y$. Suppose that,
• for all $x_1, x_2 \in X$ there exists $x_3 \in X$ such that
$$ f(x_3, \cdot) \ge \frac{f(x_1, \cdot) + f(x_2, \cdot)}{2} \quad \text{on } Y. $$

Suppose also that, for all $\lambda \in \mathbb{R}$ and for all nonempty finite subsets $W$ of $X$,
$LE(W, \lambda)$ is connected in $Y$.

Then
$$ \min_Y \sup_X f = \sup_X \min_Y f. $$
We say that a family $\mathcal{H}$ of sets is pseudoconnected if

$H_0, H_1, H \in \mathcal{H}$ and $H_0$ and $H_1$ joined by $H$

More recent work by Kindler ([12], [13] and [14]) on abstract intersection theorems has been at the interface between minimax theory and abstract set theory.
Minimax Inequalities for Two or More Functions. Motivated by Nash equilibrium and the theory of noncooperative games, Fan generalized Theorem 2 to the case of more than one function. In particular, he proved in [3] the following two-function minimax inequality (since the compactness of $X$ is not needed, this result can in fact be strengthened to include Sion's theorem, Theorem 3, by taking $g = f$):

THEOREM 12 (1964) Let $X$ and $Y$ be nonempty compact, convex subsets of topological vector spaces and $f, g : X \times Y \to \mathbb{R}$. Suppose that $f$ is lower semicontinuous on $Y$ and quasiconcave on $X$, $g$ is upper semicontinuous on $X$ and quasiconvex on $Y$, and
$$ f \le g \quad \text{on } X \times Y. $$
Then
$$ \min_Y \sup_X f \le \sup_X \inf_Y g. $$
□

Fan (unpublished) and Simons (see [27]) generalized König's theorem, Theorem 4, with the following two-function minimax inequality:

THEOREM 13 (1981) Let $X$ be a nonempty set, $Y$ be a compact topological space and $f, g : X \times Y \to \mathbb{R}$. Suppose that $f$ is lower semicontinuous on $Y$, and

tures'. This question is discussed in [27] and [28]. The relationship between Theorem 12 and Brouwer's fixed point theorem is quite interesting. As we have already pointed out, Sion's theorem, Theorem 3, can be proved in an elementary fashion without recourse to fixed point related concepts. On the other hand, Theorem 12 can, in fact, be used to prove Tychonoff's fixed point theorem, which is itself a generalization of Brouwer's fixed point theorem. (See [3] for more details of this.)

A number of authors have proved minimax inequalities for more than two functions. See [31] for more details of these results.

Coincidence Theorems. A coincidence theorem is a theorem that asserts that if $S : X \to 2^Y$ and $T : Y \to 2^X$ have nonempty values and satisfy certain other conditions, then there exist $x_0 \in X$ and $y_0 \in Y$ such that $y_0 \in Sx_0$ and $x_0 \in Ty_0$. The connection with minimax theorems is as follows: Suppose that $\inf_Y \sup_X f \ne \sup_X \inf_Y f$. Then there exists $\lambda \in \mathbb{R}$ such that
$$ \sup_X \inf_Y f < \lambda < \inf_Y \sup_X f. $$
Hence,
• for all $x \in X$ there exists $y \in Y$ such that $f(x, y) < \lambda$; and
• for all $y \in Y$ there exists $x \in X$ such that
See also: Stochastic quasigradient methods in minimax problems; Stochastic programming: Minimax approach; Minimax: Directional differentiability; Bilevel linear programming: Complexity, equivalence to minmax, concave programs; Bilevel optimization: Feasibility test and flexibility index; Nondifferentiable optimization: Minimax problems.

References
[1] FAN, K.: 'Fixed-point and minimax theorems in locally convex topological linear spaces', Proc. Nat. Acad. Sci. USA 38 (1952), 121-126.
[2] FAN, K.: 'Minimax theorems', Proc. Nat. Acad. Sci. USA 39 (1953), 42-47.
[3] FAN, K.: 'Sur un théorème minimax', C.R. Acad. Sci. Paris 259 (1964), 3925-3928.
[4] FAN, K.: 'A minimax inequality and its applications', in O. SHISHA (ed.): Inequalities, Vol. III, Acad. Press, 1972, pp. 103-113.
[5] GERAGHTY, M.A., AND LIN, B.-L.: 'Minimax theorems without linear structure', Linear Multilinear Algebra 17 (1985), 171-180.
[6] GERAGHTY, M.A., AND LIN, B.-L.: 'Minimax theorems without convexity', Contemp. Math. 52 (1986), 102-108.
[7] GHOUILA-HOURI, M.A.: 'Le théorème minimax de Sion': Theory of games, English Univ. Press, 1966, pp. 123-129.
[8] IRLE, A.: 'Minimax theorems in convex situations', in O. MOESCHLIN AND D. PALLASCHKE (eds.): Game Theory and Mathematical Economics, North-Holland, 1981, pp. 321-331.
[9] JOÓ, I., AND STACHÓ, L.L.: 'A note on Ky Fan's minimax theorem', Acta Math. Acad. Sci. Hung. 39 (1982), 401-407.
[10] KAKUTANI, S.: 'A generalization of Brouwer's fixed-point theorem', Duke Math. J. 8 (1941), 457-459.
[11] KINDLER, J.: 'On a minimax theorem of Terkelsen's', Arch. Math. 55 (1990), 573-583.
[12] KINDLER, J.: 'Intersection theorems and minimax theorems based on connectedness', J. Math. Anal. Appl. 178 (1993), 529-546.
[13] KINDLER, J.: 'Intersecting sets in midset spaces. I', Arch. Math. 62 (1994), 49-57.
[14] KINDLER, J.: 'Intersecting sets in midset spaces. II', Arch. Math. 62 (1994), 168-176.
[15] KÖNIG, H.: 'Über das von Neumannsche Minimax-Theorem', Arch. Math. 19 (1968), 482-487.
[16] KÖNIG, H.: 'On certain applications of the Hahn-Banach and minimax theorems', Arch. Math. 21 (1970), 583-591.
[17] KÖNIG, H.: 'A general minimax theorem based on connectedness', Arch. Math. 59 (1992), 55-64.
[18] KOMIYA, H.: 'On minimax theorems', Bull. Inst. Math. Acad. Sinica 17 (1989), 171-178.
[19] NEUMANN, J. VON: 'Zur Theorie der Gesellschaftsspiele', Math. Ann. 100 (1928), 295-320.
[20] NEUMANN, J. VON: 'Ueber ein ökonomisches Gleichungssystem und eine Verallgemeinerung des Brouwerschen Fixpunktsatzes', Ergebn. Math. Kolloq. Wien 8 (1937), 73-83.
[21] NEUMANN, J. VON: 'On the theory of games of strategy', in A.W. TUCKER AND R.D. LUCE (eds.): Contributions to the Theory of Games, Vol. 4, Princeton Univ. Press, 1959, pp. 13-42.
[22] NEUMANN, M.: 'Some unexpected applications of the sandwich theorem': Proc. Conf. Optimization and Convex Analysis, Univ. Mississippi, 1989.
[23] NEUMANN, M.: 'Generalized convexity and the Mazur-Orlicz theorem': Proc. Orlicz Memorial Conf., Univ. Mississippi, 1991.
[24] PRYCE, J.D.: 'Weak compactness in locally convex spaces', Proc. Amer. Math. Soc. 17 (1966), 148-155.
[25] SIMONS, S.: 'Critères de faible compacité en termes du théorème de minimax', Sém. Choquet, no. 23 (1970/1), 8.
[26] SIMONS, S.: 'Maximinimax, minimax, and antiminimax theorems and a result of R.C. James', Pacific J. Math. 40 (1972), 709-718.
[27] SIMONS, S.: 'Minimax and variational inequalities: Are they of fixed point or Hahn-Banach type?', in O. MOESCHLIN AND D. PALLASCHKE (eds.): Game Theory and Mathematical Economics, North-Holland, 1981, pp. 379-388.
[28] SIMONS, S.: 'Two-function minimax theorems and variational inequalities for functions on compact and noncompact sets with some comments on fixed-points theorems', Proc. Symp. Pure Math. 45 (1986), 377-392.
[29] SIMONS, S.: 'A flexible minimax theorem', Acta Math. Hungarica 63 (1994), 119-132.
[30] SIMONS, S.: 'Addendum to: A flexible minimax theorem', Acta Math. Hungarica 69 (1995), 359-360.
[31] SIMONS, S.: 'Minimax theorems and their proofs', in DING-ZHU DU AND PANOS M. PARDALOS (eds.): Minimax and Applications, Kluwer Acad. Publ., 1995, pp. 1-23.
[32] SIMONS, S.: Minimax and monotonicity, Vol. 1693 of Lecture Notes Math., Springer, 1998.
[33] SION, M.: 'On general minimax theorems', Pacific J. Math. 8 (1958), 171-176.
[34] TERKELSEN, F.: 'Some minimax theorems', Math. Scand. 31 (1972), 405-413.
[35] YANOVSKAYA, E.B.: 'Infinite zero-sum two-person games', J. Soviet Math. 2 (1974), 520-541.

Stephen Simons
Dept. Math. Univ. California
Santa Barbara, California 93106-3080, USA
E-mail address: simons@math.ucsb.edu
MSC 2000: 46A22, 49J35, 49J40, 54D05, 54H25, 55M20, 91A05
Key words and phrases: minimax theorem, fixed point theorem, Hahn-Banach theorem, connectedness.

MINIMUM CONCAVE TRANSPORTATION PROBLEMS, MCTP
The minimum concave transportation problem (MCTP) concerns the least cost method of carrying flow on a bipartite network in which the marginal cost for an arc is a nonincreasing function of the flow on that arc. A bipartite network contains source nodes and sink nodes, but no transshipment (i.e., intermediate) nodes. The MCTP can be formulated as

$$ \min \sum_{(i,j) \in A} \phi_{ij}(x_{ij}) \tag{1} $$

Arc flow capacities can be removed by adding additional source nodes, one for each capacitated arc [19], [23].

The fixed charge transportation problem (FCTP) is a type of MCTP in which the cost function $\phi_{ij}(x_{ij})$ for each arc $(i, j) \in A$ is of the form

$$ \phi_{ij}(x_{ij}) = \begin{cases} 0 & \text{if } x_{ij} = 0, \\ f_{ij} + g_{ij} \cdot x_{ij} & \text{if } x_{ij} > 0, \end{cases} \tag{5} $$

where $f_{ij}$ and $g_{ij}$ are coefficients with $f_{ij} \ge 0$. FCTPs are commonly used to model network flow problems involving setup costs [9]. Furthermore, a variety of combinatorial problems can be converted to FCTPs. For instance, consider the 0-1 knapsack problem (KP). The KP is formulated as
Minimum concave transportation problems
FCTP by first converting the integer program to a KP [10].

Exact solution methods for the MCTP are predominantly branch and bound enumeration procedures [2], [3], [4], [6], [8], [11], [12], [15]. Binary partitioning is used for the FCTP; and interval partitioning is used for the MCTP with arbitrary concave arc cost functions. Finite convergence of the method was shown by R.M. Soland [22]. The convex envelope of the cost function $\phi_{ij}(x_{ij})$ is an affine function. Hence, a subproblem in the branch and bound procedure can be solved efficiently as a linear transportation problem (LTP) [1]. Fathoming techniques (such as 'up and down penalties' and 'capacity improvement') based on post-optimality analysis of the LTP facilitate the branch and bound procedure for the MCTP [2], [3], [18], [20]. The LTP is also used in approximate solution methods for the MCTP which rely on successive linearizations of the concave cost function [5], [13], [14].

Test problems for the MCTP are given in [7], [8], [12], [17], [20].

See also: Concave programming; Bilevel linear programming: Complexity, equivalence to minmax, concave programs; Motzkin transposition theorem; Multi-index transportation problems; Stochastic transportation and location problems.

References
[1] BALINSKI, M.L.: 'Fixed-cost transportation problems', Naval Res. Logist. 8 (1961), 41-54.
[2] BARR, R.S., GLOVER, F., AND KLINGMAN, D.: 'A new optimization method for large scale fixed charge transportation problems', Oper. Res. 29 (1981), 448-463.
[3] BELL, G.B., AND LAMAR, B.W.: 'Solution methods for nonconvex network problems', in P.M. PARDALOS, D.W. HEARN, AND W.W. HAGER (eds.): Network Optimization, Vol. 450 of Lecture Notes Economics and Math. Systems, Springer, 1997, pp. 32-50.
[4] CABOT, A.V., AND ERENGUC, S.S.: 'Some branch-and-bound procedures for fixed-cost transportation problems', Naval Res. Logist. 31 (1984), 145-154.
[5] DIABY, M.: 'Successive linear approximation procedure for generalized fixed-charge transportation problems', J. Oper. Res. Soc. 42 (1991), 991-1001.
[6] FLORIAN, M., AND ROBILLARD, P.: 'An implicit enumeration algorithm for the concave cost network flow problem', Managem. Sci. 18 (1971), 184-193.
[7] FLOUDAS, C.A., AND PARDALOS, P.M.: A collection of test problems for constrained global optimization algorithms, Vol. 455 of Lecture Notes Computer Sci., Springer, 1990.
[8] GRAY, P.: 'Exact solution of the fixed-charge transportation problem', Oper. Res. 19 (1971), 1529-1538.
[9] GUISEWITE, G.M., AND PARDALOS, P.M.: 'Minimum concave-cost network flow problems: Applications, complexity, and algorithms', Ann. Oper. Res. 25 (1990), 75-100.
[10] KENDALL, K.E., AND ZIONTS, S.: 'Solving integer programming problems by aggregating constraints', Oper. Res. 25 (1977), 346-351.
[11] KENNINGTON, J.: 'The fixed-charge transportation problem: A computational study with a branch-and-bound code', AIIE Trans. 8 (1976), 241-247.
[12] KENNINGTON, J., AND UNGER, V.E.: 'A new branch-and-bound algorithm for the fixed charge transportation problem', Managem. Sci. 22 (1976), 1116-1126.
[13] KHANG, D.B., AND FUJIWARA, O.: 'Approximate solutions of capacitated fixed-charge minimum cost network flow problems', Networks 21 (1991), 689-704.
[14] KIM, D., AND PARDALOS, P.M.: 'A solution approach to the fixed charge network flow problem using a dynamic slope scaling procedure', Oper. Res. Lett. 24 (1999), 195-203.
[15] LAMAR, B.W.: 'An improved branch and bound algorithm for minimum concave cost network flow problems', J. Global Optim. 3 (1993), 261-287.
[16] LAMAR, B.W.: 'A method for solving network flow problems with general nonlinear arc costs', in D.-Z. DU AND P.M. PARDALOS (eds.): Network Optimization Problems: Algorithms, Applications, and Complexity, World Sci., 1993, pp. 147-167.
[17] LAMAR, B.W., AND WALLACE, C.A.: 'A comparison of conditional penalties for the fixed charge transportation problem', Techn. Report Dept. Management Univ. Canterbury (1996).
[18] LAMAR, B.W., AND WALLACE, C.A.: 'Revised-modified penalties for fixed charge transportation problems', Managem. Sci. 43 (1997), 1431-1436.
[19] LAWLER, E.L.: Combinatorial optimization: Networks and matroids, Holt, Rinehart and Winston, 1976.
[20] PALEKAR, U.S., KARWAN, M.H., AND ZIONTS, S.: 'A branch-and-bound method for the fixed charge transportation problem', Managem. Sci. 36 (1990), 1092-1105.
[21] RECH, P., AND BARTON, L.G.: 'A non-convex transportation algorithm', in E.M.L. BEALE (ed.): Applications of Mathematical Programming Techniques, English Univ. Press, 1970.
[22] SOLAND, R.M.: 'Optimal facility location with concave costs', Oper. Res. 22 (1974), 373-382.
[23] WAGNER, H.M.: 'On a class of capacitated transportation problems', Managem. Sci. 5 (1959), 304-318.
Bruce W. Lamar
Economic and Decision Analysis Center
The MITRE Corp.
Bedford, MA 01730 USA
E-mail address: bwlamar@mitre.org
MSC2000: 90C26, 90C35, 90B06, 90B10
Key words and phrases: flows in networks, global optimization, nonconvex programming, fixed charge transportation problem.

MINIMUM COST FLOW PROBLEM
The minimum cost flow problem seeks a least cost shipment of a commodity through a network to satisfy demands at certain nodes by available supplies at other nodes. This problem has many, varied applications: the distribution of a product from manufacturing plants to warehouses, or from warehouses to retailers; the flow of raw material and intermediate goods through various machining stations in a production line; the routing of automobiles through an urban street network; and the routing of calls through the telephone system. The minimum cost flow problem also has many less direct applications. In this article, we briefly introduce the theory, algorithms and applications of the minimum cost flow problem. [1] contains much additional material on this topic.

Let $G = (N, A)$ be a directed network defined by a set $N$ of $n$ nodes and a set $A$ of $m$ directed arcs. Each arc $(i, j) \in A$ has an associated cost $c_{ij}$ that denotes the cost per unit flow on that arc. We assume that the flow cost varies linearly with the amount of flow. Each arc $(i, j) \in A$ has an associated capacity $u_{ij}$ denoting the maximum amount that can flow on this arc, and a lower bound $l_{ij}$ that denotes the minimum amount that must flow on the arc. We assume that the capacity and flow lower bound for each arc $(i, j)$ are integers. We associate with each node $i \in N$ an integer $b(i)$ representing its supply/demand. If $b(i) > 0$, node $i$ is a supply node; if $b(i) < 0$, then node $i$ is a demand node with a demand of $-b(i)$; and if $b(i) = 0$, then node $i$ is a transshipment node. We assume that $\sum_{i \in N} b(i) = 0$. The decision variables $x_{ij}$ are arc flows defined for each arc $(i, j) \in A$.

The minimum cost flow problem is an optimization model formulated as follows:

$$\text{Minimize} \sum_{(i,j) \in A} c_{ij} x_{ij} \tag{1}$$

subject to

$$\sum_{\{j : (i,j) \in A\}} x_{ij} - \sum_{\{j : (j,i) \in A\}} x_{ji} = b(i) \quad \text{for all } i \in N, \tag{2}$$

$$l_{ij} \le x_{ij} \le u_{ij} \quad \text{for all } (i, j) \in A. \tag{3}$$

We refer to the constraints (2) as the mass balance constraints. For a fixed node $i$, the first term in the constraint (2) represents the total outflow of node $i$ and the second term represents the total inflow of node $i$. The mass balance constraints state that outflow minus inflow must equal the supply/demand of each node. The flow must also satisfy the lower bound and capacity constraints (3), which we refer to as flow bound constraints.

This article is organized as follows. To help in understanding the applicability of the minimum cost flow problem, we begin in Section 2 by describing several applications. In Section 3, we present preliminary material needed in the subsequent sections. We next discuss algorithms for the minimum cost flow problem, describing the cycle-canceling algorithm in Section 4 and the successive shortest path algorithm in Section 5. The cycle-canceling algorithm identifies negative cost cycles in the network and augments flows along them. The successive shortest path algorithm augments flow along shortest cost augmenting paths from the supply nodes to the demand nodes. In Section 6, we describe the network simplex algorithm.

Applications. Minimum cost flow problems arise in almost all industries, including agriculture, communications, defense, education, energy, health care, manufacturing, medicine, retailing, and transportation. Indeed, minimum cost flow problems are pervasive in practice. In this section, by considering a few selected applications that arise in distribution systems planning, capacity planning, and vehicle routing, we give a passing glimpse of these applications.

Distribution Problems. A large class of network flow problems center around distribution applications. One core model is often described in terms of shipments from plants to warehouses (or, alternatively, from warehouses to retailers). Suppose a firm has $p$ plants with known supplies and $q$ warehouses with known demands. It wishes to identify a flow that satisfies the demands at the warehouses from the available supplies at the plants and that minimizes its shipping costs. This problem is a well-known special case of the minimum cost flow problem, known as the transportation problem. We next describe in more detail a slight generalization of this model that also incorporates manufacturing costs at the plants.

iii) retailer/model nodes, corresponding to the models required by each retailer; and
iv) retailer nodes corresponding to each retailer.

The network contains three types of arcs:
i) production arcs;
ii) transportation arcs; and
iii) demand arcs.

The production arcs connect a plant node to a plant/model node; the cost of this arc is the cost of producing the model at that plant. We might place lower and upper bounds on production arcs to control for the minimum and maximum production of each particular car model at the plants. Transportation arcs connect plant/model nodes to retailer/model nodes; the cost of any such arc is the total cost of shipping one car from the manufacturing plant to the retail center. The transportation arcs might have lower or upper bounds imposed upon their flows to model contractual agreements with shippers or capacities imposed upon any distribution channel. Finally, demand arcs connect retailer/model nodes to the retailer nodes. These arcs have zero costs and positive lower bounds that equal the demand of that model at that retail center.

Airplane Hopping Problem. A small commuter airline uses a plane, with a capacity to carry at most $p$ passengers, on a 'hopping flight' as shown in Fig. 2a). The hopping flight visits the cities $1, \ldots, n$, in a fixed sequence. The plane can pick up passengers at any node and drop them off at any other node. Let $b_{ij}$ denote the number of passengers available at node $i$ who want to go to node $j$, and let $f_{ij}$ denote the fare per passenger from node $i$ to node $j$. The airline would like to determine the number of passengers that the plane should carry between the various origins to destinations in order to maximize the total fare per trip while never exceeding the plane's capacity.

Fig. 2b) shows a minimum cost flow formulation of this hopping plane flight problem. The network contains data for only those arcs with nonzero costs and with finite capacities: any arc listed without an associated cost has a zero cost; any arc
listed without an associated capacity has an infinite capacity. Consider, for example, node 1. Three types of passengers are available at node 1: those whose destination is node 2, node 3 or node 4. We represent these three types of passengers in a new derived network by the nodes 1-2, 1-3 and 1-4 with supplies $b_{12}$, $b_{13}$ and $b_{14}$. A passenger available at any such node, say 1-3, could board the plane at its origin node represented by flowing through the arc (1-3, 1) and incurring a cost of $-f_{13}$ units (or profit of $f_{13}$ units). Or, the passenger might never board the plane, which we represent by the flow through the arc (1-3, 3). It is easy to establish a one-to-one correspondence between feasible flows in Fig. 2b) and feasible loading of the plane with passengers. Consequently, a minimum cost flow in Fig. 2b) will prescribe a most profitable loading of the plane.

Fig. 2: Formulation of the hopping plane flight problem as a minimum cost flow problem.

Directed Chinese Postman Problem. The directed Chinese postman problem is a generic routing problem that can be stated as follows. In a directed network $G = (N, A)$ in which each arc $(i, j)$ has an associated cost $c_{ij}$, we wish to identify a walk of minimum cost that starts at some node (the post office), visits each arc of the network at least once, and returns to the starting point (see the next Section for the definition of a walk). This problem has become known as the Chinese postman problem because a Chinese mathematician, K. Mei-Ko, first discussed it. The Chinese postman problem arises in other settings as well; for instance, patrolling streets by police, routing street sweepers and household refuse collection vehicles, fuel oil delivery to households, and spraying roads with sand during snowstorms. The directed Chinese postman problem assumes that all arcs are directed, that is, the postal carrier can traverse an arc in only one direction (like one-way streets).

In the directed Chinese postman problem, we are interested in a closed (directed) walk that traverses each arc of the network at least once. The network might not contain any such walk. It is easy to show that a network contains a desired walk if and only if the network is strongly connected, that is, every node in the network is reachable from every other node via a directed path. Simple graph search algorithms are able to determine whether the network is strongly connected, and we shall therefore assume that the network is strongly connected.

In an optimal walk, a postal carrier might traverse arcs more than once. The minimum length walk minimizes the sum of lengths of the repeated arcs. Let $x_{ij}$ denote the number of times the postal carrier traverses arc $(i, j)$ in a walk. Any carrier walk must satisfy the following conditions:

$$\sum_{\{j : (i,j) \in A\}} x_{ij} - \sum_{\{j : (j,i) \in A\}} x_{ji} = 0 \quad \text{for all } i \in N, \tag{4}$$

$$x_{ij} \ge 1 \quad \text{for all } (i, j) \in A. \tag{5}$$

The constraints (4) state that the carrier enters a node the same number of times that he or she leaves it. The constraints (5) state that the carrier must visit each arc at least once. Any solution $x$ satisfying the system (4)-(5) defines a carrier's walk. We can construct a walk in the following manner. Given a flow $x_{ij}$, we replace each arc $(i, j)$ with $x_{ij}$ copies of the arc, each arc carrying a unit flow. In the resulting network, say $G' = (N, A')$, each node has the same number of outgoing arcs as it has incoming arcs. It is possible to decompose this network into at most $m/2$ arc-disjoint directed cycles (by walking along an arc $(i, j)$ from some node $i$ with $x_{ij} > 0$, leaving a node each time we enter it until we repeat a node). We can
connect these cycles together to form a closed walk of the carrier.

The preceding discussion shows that the solution $x$ defined by a feasible walk for the carrier satisfies conditions (4)-(5), and, conversely, every feasible solution of system (4)-(5) defines a walk of the postman. The length of a walk defined by the solution $x$ equals $\sum_{(i,j) \in A} c_{ij} x_{ij}$. This problem is an instance of the minimum cost flow problem.

A path is a walk without any repetition of nodes, and a directed path is a directed walk without any repetition of nodes. A cycle is a path $i_1, i_2, \ldots, i_r$ together with the arc $(i_r, i_1)$ or $(i_1, i_r)$. A directed cycle is a directed path $i_1, i_2, \ldots, i_r$ together with the arc $(i_r, i_1)$. A spanning tree of a directed graph $G$ is a subgraph $G' = (N, A')$ with $A' \subseteq A$ that is connected (that is, contains a path between every pair of nodes) and contains no cycle.
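
The spanning tree definition above can be checked mechanically. The following Python sketch is one possible way to do so and is not taken from the article: the function name `is_spanning_tree` and the representation of arcs as a list of pairs are assumptions made for illustration. As in the definition, arc directions are ignored; a union-find structure detects cycles, and connectivity then follows from an arc count.

```python
def is_spanning_tree(nodes, tree_arcs):
    """Return True if tree_arcs (pairs (i, j)) form a spanning tree on nodes.

    Arc directions are ignored: the subgraph must be connected and acyclic.
    """
    nodes = list(nodes)
    arcs = list(tree_arcs)
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent pointers to the component representative (path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, j in arcs:
        if i not in parent or j not in parent:
            return False  # arc touches a node outside N
        ri, rj = find(i), find(j)
        if ri == rj:
            return False  # adding (i, j) would close a cycle
        parent[ri] = rj

    # An acyclic subgraph on n nodes is connected exactly when it has n - 1 arcs.
    return len(arcs) == len(nodes) - 1


# Hypothetical 4-node examples.
print(is_spanning_tree([1, 2, 3, 4], [(1, 2), (3, 2), (4, 3)]))  # a tree
print(is_spanning_tree([1, 2, 3, 4], [(1, 2), (2, 3), (3, 1)]))  # contains a cycle
```

The arc-count test relies on the fact that a forest with $n$ nodes and $n - 1$ arcs has exactly one component, so no separate graph search is needed.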
complexity terms include $\log C$ or $\log U$ or both). We say that an algorithm is a pseudopolynomial time algorithm if its worst-case running time is bounded by a polynomial function of $n$, $m$ and $U$. For example, an algorithm with worst-case complexity of $O(nm^2 \log n)$ is a strongly polynomial time algorithm, an algorithm with worst-case complexity $O(nm^2 \log U)$ is a weakly polynomial time algorithm, and an algorithm with worst-case complexity of $O(n^2 m U)$ is a pseudopolynomial time algorithm.

can decrease the cost of the flow. Conversely, it is possible to show that if the residual network $G(x^*)$ does not contain any negative cost cycle, then $x^*$ must be an optimal flow.

The negative cycle optimality condition suggests one simple algorithmic approach for solving the minimum cost flow problem, which we call the cycle-canceling algorithm. This algorithm maintains a feasible solution and at every iteration improves the objective function value. The algorithm first establishes a feasible flow $x$ in the network by solving a related (and easily solved) problem known as the maximum flow problem. Then it iteratively finds negative cycles in the residual network and augments flows on these cycles. The algorithm terminates when the residual network contains no negative cost directed cycle. Theorem 1 implies that when the algorithm terminates, it has found a minimum cost flow. Fig. 3 specifies this generic version of the cycle-canceling algorithm.

BEGIN
    establish a feasible flow $x$ in the network;
    WHILE $G(x)$ contains a negative cycle DO
    BEGIN
        identify a negative cycle $W$;
        $\delta := \min\{r_{ij} : (i, j) \in W\}$;
        augment $\delta$ units of flow in the cycle $W$ and update $G(x)$;
    END;
END

The numerical example shown in Fig. 4a) illustrates the cycle-canceling algorithm. This figure shows the arc costs and the starting feasible flow in the network. Each arc in the network has a capacity of 2 units. Fig. 4b) shows the residual network corresponding to the initial flow. We do not show the residual capacities of the arcs in Fig. 4b) since they are implicit in the network structure. If the residual network contains both arcs $(i, j)$ and $(j, i)$ for any pair $i$ and $j$ of nodes, then both have residual capacity equal to 1; and if the residual network contains only one arc, then its capacity is 2 (this observation uses the fact that each arc capacity equals 2). The residual network shown in Fig. 4b) contains a negative cycle 1-3-2-1 with cost $-3$. By augmenting a unit flow along this cycle, we obtain the residual network shown in Fig. 4c). The residual network shown in Fig. 4c)
contains a negative cycle 6-4-5-6 with cost $-4$. We augment unit flow along this cycle, producing the residual network shown in Fig. 4d), which contains no negative cycle. Given the optimal residual network, we can determine optimal flow using the method described in the previous Section.

A byproduct of the cycle-canceling algorithm is the following important result.

THEOREM 2 (Integrality property) If all arc capacities and supply/demands of nodes are integer, then the minimum cost flow problem always has an integer minimum cost flow. □

This result follows from the fact that for problems with integer arc capacities and integer node supplies/demand, the cycle-canceling algorithm starts with an integer solution (which is provided by the maximum flow algorithm used to obtain the initial feasible flow) and at each iteration augments flow by an integral amount.

What is the worst-case computational requirement (complexity) of the cycle-canceling algorithm? The algorithm must repeatedly identify negative cycles in the residual network. We can identify a negative cycle in the residual network in $O(nm)$ time using a shortest path label-correcting algorithm [1]. How many times must the generic cycle-canceling algorithm perform this computation? For the minimum cost flow problem, $mCU$ is an upper bound on the initial flow cost (since $c_{ij} \le C$ and $x_{ij} \le U$ for all $(i, j) \in A$) and $-mCU$ is a lower bound on the optimal flow cost (since $c_{ij} \ge -C$ and $x_{ij} \le U$ for all $(i, j) \in A$). Any iteration of the cycle-canceling algorithm changes the objective function value by an amount $\left(\sum_{(i,j) \in W} c_{ij}\right) \delta$, which is strictly negative. Since we have assumed that the problem has integral data, the algorithm terminates within $O(mCU)$ iterations and runs in $O(nm^2 CU)$ time, which is a pseudopolynomial running time.

The generic version of the cycle-canceling algorithm does not specify the order for selecting negative cycles from the network. Different rules for selecting negative cycles produce different versions of the algorithm, each with different worst-case and theoretical behavior. Two versions of the cycle-canceling algorithm are polynomial time implementations:

i) a version that augments flow in arc-disjoint negative cycles with the maximum improvement [2]; and
ii) a version that augments flow along a negative cycle with minimum mean cost (that is, the average cost per arc in the cycle) [4].

Successive Shortest Path Algorithm. The cycle-canceling algorithm maintains feasibility of the solution at every step and attempts to achieve optimality. In contrast, the successive shortest path algorithm maintains optimality of the solution at every step (that is, the condition that the residual network $G(x)$ contains no negative cost cycle) and strives to attain feasibility. It maintains a solution $x$, called a pseudoflow (see below), that is nonnegative and satisfies the arcs' flow capacity restrictions, but violates the mass balance constraints of the nodes. At each step, the algorithm selects a node $k$ with excess supply (i.e., supply not yet sent to some demand node), a node $l$ with unfulfilled demand, and sends flow from node $k$ to node $l$ along a shortest path in the residual network. The algorithm terminates when the current solution satisfies all the mass balance constraints.

To be more precise, a pseudoflow is a vector $x$ satisfying only the capacity and nonnegativity constraints; it need not satisfy the mass balance constraints. For any pseudoflow $x$, we define the imbalance of node $i$ as

$$e(i) = b(i) + \sum_{\{j : (j,i) \in A\}} x_{ji} - \sum_{\{j : (i,j) \in A\}} x_{ij} \quad \text{for all } i \in N. \tag{6}$$

If $e(i) > 0$ for some node $i$, then we refer to $e(i)$ as the excess of node $i$; if $e(i) < 0$, then we refer to $-e(i)$ as the node's deficit. We refer to a node $i$ with $e(i) = 0$ as balanced. Let $E$ and $D$ denote the sets of excess and deficit nodes in the network. Notice that $\sum_{i \in N} e(i) = \sum_{i \in N} b(i) = 0$, which implies that $\sum_{i \in E} e(i) = -\sum_{i \in D} e(i)$. Consequently, if the network contains an excess node, then it must also contain a deficit node. The residual network corresponding to a pseudoflow is defined in the same way that we define the residual network for a flow. The successive shortest path algorithm uses the following result.
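
As a small illustration of definition (6), the following Python sketch computes the node imbalances of a pseudoflow and collects the excess and deficit sets $E$ and $D$. It is only a sketch under stated assumptions: the function name `imbalances`, the dictionary representation of $b$ and $x$, and the 4-node instance are hypothetical and do not come from the article.

```python
def imbalances(b, flow):
    """Compute e(i) = b(i) + inflow(i) - outflow(i) for a pseudoflow.

    b    : dict mapping node -> supply/demand b(i)
    flow : dict mapping arc (i, j) -> flow x_ij
    Returns the imbalance dict e, the excess set E and the deficit set D.
    """
    e = dict(b)
    for (i, j), x in flow.items():
        e[i] -= x  # flow leaving node i
        e[j] += x  # flow entering node j
    excess = {i for i, v in e.items() if v > 0}   # the set E
    deficit = {i for i, v in e.items() if v < 0}  # the set D
    return e, excess, deficit


# Hypothetical instance: node 1 supplies 4 units, node 4 demands 4 units.
b = {1: 4, 2: 0, 3: 0, 4: -4}
x = {(1, 2): 3, (2, 4): 1}
e, E, D = imbalances(b, x)
print(e, E, D)
```

In this instance the total excess over $E$ equals the total deficit over $D$, as the identity $\sum_{i \in E} e(i) = -\sum_{i \in D} e(i)$ requires.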
cost flow problem (such as the assignment problem for which $U = 1$). Researchers have developed weakly polynomial time and strongly polynomial time versions of the successive shortest path algorithm; some notable implementations are due to [3] and [5].

Network Simplex Algorithm. The network simplex algorithm for solving the minimum cost flow problem is an adaptation of the well-known simplex method for general linear programs. Because the minimum cost flow problem is a highly structured linear programming problem, when applied to it, the computations of the simplex method become considerably streamlined. In fact, we need not explicitly maintain the matrix representation (known as the simplex tableau) of the linear program and can perform all of the computations directly on the network. Rather than presenting the network simplex algorithm as a special case of the linear programming simplex method, we will develop it as a special case of the cycle-canceling algorithm described above. The primary advantage of our approach is that it permits the network simplex algorithm to be understood without relying on linear programming theory.

The network simplex algorithm maintains solutions called spanning tree solutions. A spanning tree solution partitions the arc set $A$ into three subsets:
i) $T$, the arcs in the spanning tree;
ii) $L$, the nontree arcs whose flows are restricted to value zero;
iii) $U$, the nontree arcs whose flow values are restricted in value to the arcs' flow capacities.

We refer to the triple $(T, L, U)$ as a spanning tree structure. Each spanning tree structure $(T, L, U)$ has a unique solution that satisfies the mass balance constraints (2). To determine this solution, we set $x_{ij} = 0$ for all arcs $(i, j) \in L$, $x_{ij} = u_{ij}$ for all arcs $(i, j) \in U$, and then solve the mass balance equations (2) to determine the flow values for arcs in $T$.

To show that the flows on spanning tree arcs are unique, we use a numerical example. Consider the spanning tree $T$ shown in Fig. 7a). Assume that $U = \emptyset$, that is, all nontree arcs are at their lower bounds. Consider the leaf node 4 (a leaf node is a node with exactly one arc incident to it). Node 4 has a supply of 5 units and has only one arc (4, 2) incident to it. Consequently, arc (4, 2) must carry 5 units of flow. So we set $x_{42} = 5$, add 5 units to $b(2)$ (because it receives 5 units of flow sent from node 4), and delete arc (4, 2) from the tree. We now have a tree with one fewer node and next select another leaf node, node 5 with the supply of 5 units and the single arc (5, 2) incident to it. We set $x_{52} = 5$, again add 5 units to $b(2)$, and delete the arc (5, 2) from the tree. Now node 2 becomes a leaf node with modified supply/demand of $b(2) = -10$, implying that node 2 has an unfulfilled demand of 10 units. Node 2 has exactly one incoming arc (1, 2) and to meet the demand of 10 units of node 2, we must send 10 units of flow on this arc. We set $x_{12} = 10$, subtract 10 units from $b(1)$ (since node 1 sends 10 units), and delete the arc (1, 2) from the tree. We repeat this process until we have identified flow on all arcs in the tree. Fig. 7b) shows the corresponding flow. Our discussion assumed that $U$ is empty. If $U$ were nonempty, we would first set $x_{ij} = u_{ij}$, add $u_{ij}$ to $b(j)$, and subtract $u_{ij}$ from $b(i)$ for each arc $(i, j) \in U$, and then apply the preceding method.

Fig. 7: Computing flows for a spanning tree.

We say a spanning tree structure is feasible if its associated spanning tree solution satisfies all of the arcs' flow bounds. We refer to a spanning tree structure as optimal if its associated spanning tree solution is an optimal solution of the minimum cost flow problem. We will now derive the optimality conditions for a spanning tree structure $(T, L, U)$.

The network simplex algorithm augments flow along negative cycles. To identify negative cycles quickly, we use the concept of node potentials. We define node potentials $\pi(i)$ so that the reduced cost for any arc in the spanning tree $T$ is zero. That is, $c^{\pi}_{ij} = c_{ij} - \pi(i) + \pi(j) = 0$ for
$T$, all arcs in $A$ define the set $L$, and $U = \emptyset$. Since these artificial arcs have large costs, subsequent iterations will drive the flow on these arcs to zero.

BEGIN
    determine an initial feasible tree structure $(T, L, U)$;
    let $x$ be the flow and let $\pi$ be the corresponding node potentials;
    WHILE (some nontree arc violates its optimality condition) DO
    BEGIN
        select an entering arc $(k, l)$ violating the optimality conditions;
        add arc $(k, l)$ to the spanning tree $T$, thus forming a unique cycle $W_{kl}$;
        augment the maximum possible flow $\delta$ in the cycle $W_{kl}$ and identify a leaving arc $(p, q)$ that reaches its lower or upper flow bound;
        update the flow $x$, the spanning tree structure $(T, L, U)$ and the potentials $\pi$;
    END;
END

Fig. 9: The network simplex algorithm.

Given a spanning tree structure $(T, L, U)$, we first check whether it satisfies the optimality conditions (7) and (8). If yes, we stop; otherwise, we select an arc $(k, l) \in L$ or $(k, l) \in U$ violating its optimality condition as an entering arc to be added to the tree $T$, obtain the fundamental cycle $W_{kl}$ induced by this arc, and augment the maximum possible flow in the cycle $W_{kl}$ without vi-
say arc $(p, q)$, reaches its lower or upper bound; we

the tree, creating a cycle. Since (3, 5) is at its upper bound, the orientation of the cycle is opposite to that of (3, 5). The arcs (1, 2) and (2, 5) are forward arcs in the cycle and arcs (3, 5) and (1, 3) are backward arcs. The maximum increase in flow permitted by the arcs (3, 5), (1, 3), (1, 2), and (2, 5) without violating their upper and lower bounds is, respectively, 3, 3, 2, and 1 units. Thus, we augment 1 unit of flow along the cycle. The augmentation increases the flow on arcs (1, 2) and (2, 5) by one unit and decreases the flow on arcs (1, 3) and (3, 5) by one unit. Arc (2, 5) reaches its upper bound and we select it as the leaving arc. We update the spanning tree structure; Fig. 10c) shows the new spanning tree $T$ and the new node potentials. The sets $L$ and $U$ become $L = \{(2, 3), (5, 4)\}$ and $U = \{(2, 5), (4, 6)\}$. In the next iteration, we select arc (4, 6) since this arc violates the arc optimality condition. We augment one unit flow along the cycle 6-4-2-1-3-5-6 and arc (3, 5) leaves the spanning tree. Fig. 10d) shows the next spanning tree and the updated node potentials. All nontree arcs satisfy the optimality conditions and the algorithm terminates with an optimal solution of the minimum cost flow problem.
operation. By choosing the right data structures for representing the tree $T$, it is possible to perform a pivot operation in $O(m)$ time.

To determine the number of iterations performed by the network simplex algorithm, we distinguish two cases. We refer to a pivot operation as nondegenerate if it augments a positive amount of flow in the cycle $W_{kl}$ (that is, $\delta > 0$), and degenerate otherwise (that is, $\delta = 0$). During a nondegenerate pivot, the cost of the spanning tree solution decreases by $|c^{\pi}_{kl}| \delta$. When combined with the integrality of data assumption (Assumption 2 above), this result yields a pseudopolynomial bound on the number of nondegenerate iterations. However, degenerate pivots do not decrease the cost of flow and so are difficult to bound. There are methods to bound the number of degenerate pivots. Obtaining a polynomial bound on the number of iterations remained an open problem for quite some time; [6] suggested an implementation of the network simplex algorithm that runs in polynomial time. In any event, the empirical performance of the network simplex algorithm is very attractive. Empirically, it is one of the fastest known algorithms for solving the minimum cost flow problem.

See also: Nonconvex network flow problems; Traffic network equilibrium; Network location: Covering problems; Maximum flow problem; Shortest path tree algorithms; Steiner tree problems; Equilibrium networks; Survivable networks; Directed tree networks; Dynamic traffic networks; Auction algorithms; Piecewise linear network flow problems; Communication network assignment problem; Generalized networks; Evacuation networks; Network design problems; Stochastic network problems: Massively parallel solution; Multicommodity flow problems; Nonoriented multicommodity flow problems.

References
[1] AHUJA, R.K., MAGNANTI, T.L., AND ORLIN, J.B.: Network flows: Theory, algorithms, and applications, Prentice-Hall, 1993.
[2] BARAHONA, F., AND TARDOS, E.: 'Note on Weintraub's minimum cost circulation algorithm', SIAM J. Comput. 18 (1989), 579-583.
[3] EDMONDS, J., AND KARP, R.M.: 'Theoretical improvements in algorithmic efficiency for network flow problems', J. ACM 19 (1972), 248-264.
[4] GOLDBERG, A.V., AND TARJAN, R.E.: 'Finding minimum-cost circulations by canceling negative cycles': Proc. 20th ACM Symposium on the Theory of Computing, 1988, pp. 388-397, Full paper: J. ACM 36 (1989), 873-886.
[5] ORLIN, J.B.: 'A faster strongly polynomial minimum cost flow algorithm': Proc. 20th ACM Symp. Theory of Computing, 1988, pp. 377-387, Full paper: Oper. Res. 41 (1993), 338-350.
[6] ORLIN, J.B.: 'A polynomial time primal network simplex algorithm for minimum cost flows', Math. Program. 78B (1997), 109-129.

Ravindra K. Ahuja
Dept. Industrial and Systems Engin. Univ. Florida
Gainesville, FL 32611, USA
E-mail address: ahuja@ufl.edu
Thomas L. Magnanti
Sloan School of Management
and
Dept. Electrical Engin. and Computer Sci.
Massachusetts Inst. Technol.
Cambridge, MA 02139, USA
E-mail address: magnanti@mit.edu
James B. Orlin
Sloan School of Management
Massachusetts Inst. Technol.
Cambridge, MA 02139, USA
E-mail address: jorlin@mit.edu
MSC 2000: 90C35
Key words and phrases: network, minimum cost flow problem, cycle-canceling algorithm, successive shortest path algorithm, network simplex algorithm.

MINLP: APPLICATION IN FACILITY LOCATION-ALLOCATION
The location-allocation problem may be stated in the following general way: Given the location or distribution of a set of customers which could be probabilistic and their associated demands for a given product or service, determine the optimal locations for a number of service facilities and the allocation of their products or services to the customers, so as to minimize total (expected) location and transportation costs. This problem finds a variety of applications involving the location of warehouses, distribution centers, service and production facilities and emergency service facilities. In the last section we are going to consider the de-
MINLP: Application in facility location-allocation
of centers required to serve the population. This objective is appropriate when the demand is ex-

$$\text{s.t.} \quad \sum_{j=1} w_{ij} = s_i, \quad i = 1, \ldots, n,$$
eral services for which consumers choose their service facility center. The travel patterns of the consumers for example can produce a variety of allocations that differ from the nearest center rule. In order to accommodate such behavior a spatial-interaction model is incorporated within the uncapacitated p-median location-allocation model in the following manner:

$$\min \ \frac{1}{\beta} \sum_j \sum_i S_{ij} \log(S_{ij} - 1) + \sum_j \sum_i Y_j S_{ij} c_{ij}$$

$$\text{s.t.} \quad \sum_j Y_j S_{ij} = O_i, \quad i = 1, \ldots, n,$$

$$\sum_j \ \ldots$$

$$\ldots \qquad i = 1, \ldots, n, \quad j = 1, \ldots, p,$$

$$Y_j = 0, 1, \quad j = 1, \ldots, p,$$

where the decision variables include $Y_j$ which takes the value of one if the facility is located at $j$ and zero otherwise;

$$S_{ij} = A_i O_i Y_j \exp(-\beta c_{ij})$$

that defines the interaction of facility $i$ and consumer $j$.

$$A_i = \frac{1}{\sum_l Y_l \exp(-\beta c_{il})}, \quad i = 1, \ldots, m,$$

that ensures that the sum of all outflows from the origin $i$ add up to the amount of demand at that location; $\beta$ is either calibrated to match some known interaction data or is defined exogenously. The fol-

from $i$ to any other facility and zero otherwise. Therefore, the $S_{ij}$ tends to $O_i X_{ij}$ and this model allocates the demand to the nearest facility as the original p-median problem.

All the models mentioned above consider the static location-allocation problem where all the activities take place at one instance. These formulations are sufficient if neither the level nor the location of demand alters over time. An important factor however, in any location-allocation problem is the dynamics of the system involving demand changes over time. Particularly, in the competitive environment, an optimal center location could become undesirable as new competing centers develop. Potential directions include the literature on decision making under uncertainty, [12]. A.J. Scott [18] proposed a general framework for the integration of the spatial and discrete temporal dimensions in the location-allocation models. He proposed a modification of the location-allocation models so as to minimize an aggregate weighted transport cost over $T$ time periods, during which time the number $n_t$, level $O_{it}$ and the location $(x_{it}, y_{it})$ of the demand points change. If the locations were greatly different the center would be likely to relocate at some time and costs of relocation are included in the model. It was assumed that when a center relocates it incurs a fixed cost, $a$. Based on these ideas the formulation proposed for the uncapacitated location-allocation problem has the following form:
ous period's facility locations in order to minimize the present period cost. This strategy is appropriate whenever the period durations are sufficiently long or under uncertainty regarding future data or decisions. An alternative approach proposed in [24] is a discounted present worth strategy which is appropriate whenever the foregoing conditions do not hold. In this case the facilities are being located one per period and the decisions are made in a rolling horizon framework.

Solution Approaches. For the uncapacitated location-allocation problem using the Euclidean metric for the distances between each facility and the different demand points, R.F. Love and H. Juel [15] showed that this problem is equivalent to a concave minimization problem for which they used several heuristic procedures. For the capacitated problems, assuming that the costs are proportional to $l^q$ using $l_p$ distances where $p \geq 1$ and $q > 1$ are integers, M. Avriel [1] developed a geometric programming approach. H.D. Sherali and C.M. Shetty [22] proposed a polar cutting plane algorithm for the case $p = q = 1$. For the case $p = q = 2$, Sherali and C.H. Tuncbilek [23] proposed a branch and bound algorithm (cf. MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm) that utilizes a specialized tight, linear programming representation to calculate strong upper bounds via a Lagrangian relaxation scheme. They exploit the special structure of the transportation constraints to derive a partitioning scheme. Additional cut-set inequalities are also incorporated to preserve partial solution.

For the uncapacitated location-allocation model using the rectilinear distance metric, Love and J.G. Morris [16] have developed an exact two-stage algorithm. R.E. Kuenne and R.M. Soland [14] have developed a branch and bound algorithm based on a constructive assignment of customers to sources.

The capacitated problem has been addressed in [19], [21], which utilize the discrete equivalence of the capacitated location-allocation problem. In particular, [8] and [26] showed that

a) the optimal values of $x_i$ and $y_i$ for each $i$ must satisfy $x_i = a_j$ for some $j$ and $y_i = \beta_j$ for some $j$, which means that the rectilinear distance location problem always has an optimal solution with the sources located at the grid points of the vertical and horizontal lines drawn through the existing customer locations; and

b) the optimal source locations lie in the convex hull of the existing facility locations.

Based on these ideas and by denoting $k = 1, \dots, K$ the intersection grid points that also belong to the convex hull of the existing facility locations, [21] introduced the decision binary variables $z_{ik}$ that take the value of 1 if source $i$ is located at point $k$ and zero otherwise. This leads to the following discrete location-allocation problem:

$$
\begin{aligned}
\min\ & \sum_{i=1}^{n} \sum_{j=1}^{p} \sum_{k=1}^{K} C_{ijk} w_{ij} z_{ik} \\
\text{s.t.}\ & \sum_{k=1}^{K} z_{ik} = 1, \quad i = 1, \dots, n, \\
& \sum_{j=1}^{p} w_{ij} = s_i, \quad i = 1, \dots, n, \\
& \sum_{i=1}^{n} w_{ij} = d_j, \quad j = 1, \dots, p, \\
& w_{ij} \geq 0, \quad i = 1, \dots, n, \quad j = 1, \dots, p, \\
& z_{ik} = 0, 1, \quad i = 1, \dots, n, \quad k = 1, \dots, K,
\end{aligned}
$$

where $C_{ijk} = c_{ij} [\, |a_k - a_j| + |\beta_k - \beta_j| \,]$. The above model corresponds to a mixed integer bilinear programming problem. See [19] for a related version of this discrete-site location-allocation problem involving one-to-one assignment restriction and fixed charges. See [20] for the solution of the problem as a bilinear programming problem, since the binary variables $z$ can be treated as positive variables because of the problem structure that preserves the binariness of $z$ at optimality. However, in [21] it is proved that it is more useful to exploit the binary nature of the $z$ variables for the efficient solution of the above model. Before giving more details of this proposed branch and bound based approach we should mention the heuristic approach proposed in [4], which is very widely used. This so-called alternating procedure exploits the fundamental concepts of the location-allocation problem and simply involves allocating demand to centers and relocating centers until some convergence criterion is achieved. For the uncapacitated p-median
problem, the alternating procedure involves iterating through the following equations:

$$
x_j = \frac{\sum_{i=1}^{n} O_i \lambda_{ij} x_i / c_{ij}}{\sum_{i=1}^{n} O_i \lambda_{ij} / c_{ij}}, \qquad
y_j = \frac{\sum_{i=1}^{n} O_i \lambda_{ij} y_i / c_{ij}}{\sum_{i=1}^{n} O_i \lambda_{ij} / c_{ij}},
$$

$$
\begin{aligned}
& \dots \\
& z_{ik} = 0, 1, \quad \forall (i, k), \\
& x_{ijk} \geq 0, \quad \forall (i, j, k),
\end{aligned}
$$

where $u_{ij} = \min\{s_i, d_j\}$. The above model corresponds to a mixed integer linear programming problem for which a special branch and bound algorithm is applied based on the derivation of tight lower bounds via a suitable Lagrangian dual formulation.

Briefly, for the location-allocation problems that have embedded spatial-interaction equations, dual-based exact methods [17] and heuristic approaches [2] have been developed.

... + (y_j - ... is horizontal Euclidean distance between well $i$ and platform $j$, $g(d_{ij})$ denotes the drilling cost function that depends on distance $d_{ij}$, $P(S_j, x_j, y_j)$ is the platform cost which is a function of platform size $S_j$ and its location. Based on this notation the location-allocation problem can be formulated as follows:
ing locations and for given locations, the solution cannot be improved by altering the assignment of wells to platforms. The mathematical formulation

erwise and the problem for platform $j$ takes the form:
Note that the platform cost is a function of platform location only since the size is assumed known. Since the drilling cost function is convex, if the platform cost is also convex then the problem corresponds to the minimization of a convex function that can be achieved through a local minimization algorithm. Of course if the platform cost is nonconvex then global optimality cannot be guaranteed and global optimization techniques should be considered, [7].

Finally, M.D. Devine and W.G. Lesso [6] applied the aforementioned procedure to two test problems, one involving 60 wells and 7 platforms and a second one involving 102 wells and 3 platforms. In both cases they reported large economic savings in the field development.

See also: Combinatorial optimization algorithms in resource allocation problems; Optimizing facility location with rectilinear distances; Single facility location: Multi-objective Euclidean distance location; Single facility location: Multi-objective rectilinear distance location; Single facility location: Circle covering problem; Multifacility and restricted location problems; Network location: Covering problems; Warehouse location problem; Facility location with externalities; Production-distribution system design problem; Global optimization in Weber's problem with attraction and repulsion; Facility location with staircase costs; Stochastic transportation and location problems; Facility location problems with spatial interaction; Voronoi diagrams in facility location; Resource allocation for epidemic control; Competitive facility location; Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane algorithm; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with αBB; MINLP: Heat exchanger network synthesis; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; Generalized Benders decomposition; MINLP: Applications in blending and pooling problems.

References
[1] AVRIEL, M.: 'A geometric programming approach to the solution of locational problems', J. Reg. Sci. 20 (1980), 239-246.
[2] BEAUMONT, J.R.: 'Spatial interaction models and the location-allocation problem', J. Reg. Sci. 20 (1980), 37-50.
[3] CAVALIER, T.M., AND SHERALI, H.D.: 'Sequential location-allocation problems on chains and trees with probabilistic link demands', Math. Program. 32 (1985), 249-277.
[4] COOPER, L.: 'Heuristic methods for location-allocation problems', SIAM Rev. 6 (1964), 37-53.
[5] CRISTALLER, W.: Central places in southern Germany, Prentice-Hall, 1966.
[6] DEVINE, M.D., AND LESSO, W.G.: 'Models for the minimum cost development of offshore oil fields', Managem. Sci. 18 (1972), 378-387.
[7] FLOUDAS, C.A.: 'Deterministic global optimization in design, control, and computational chemistry', IMA Proc.: Large Scale Optimization with Applications. Part II: Optimal Design and Control 93 (1997), 129-184.
[8] FRANCIS, R.L., AND WHITE, J.A.: Facility layout and location: An analytical approach, Prentice-Hall, 1974.
[9] FRIEDRICH, C.J.: Alfred Weber's theory of the location of industries, Univ. Chicago Press, 1929.
[10] GETIS, A., AND GETIS, J.: 'Cristaller's central place theory', J. Geography 65 (1966), 200-226.
[11] HUBBARD, M.J.: 'A review of selected factors conditioning consumer travel behavior', J. Consumer Res. 5 (1978), 1-21.
[12] IERAPETRITOU, M.G., ACEVEDO, J., AND PISTIKOPOULOS, E.N.: 'An optimization approach for process engineering problems under uncertainty', Computers Chem. Engin. 20 (1996), 703-709.
[13] KOSHAKA, R.E.: 'A central-place model as a two-level location-allocation system', Environm. Plan. 15 (1983), 5-14.
[14] KUENNE, R.E., AND SOLAND, R.M.: 'Exact and approximate solutions to the multisource Weber problem', Math. Program. 3 (1972), 193-209.
[15] LOVE, R.F., AND JUEL, H.: 'Properties and solution methods for large location-allocation problems', J. Oper. Res. Soc. 33 (1982), 443-452.
[16] LOVE, R.F., AND MORRIS, J.G.: 'A computational procedure for the exact solution of location-allocation problems with rectangular distances', Naval Res. Logist. Quart. 22 (1975), 441-453.
[17] O'KELLY, M.: 'Spatial interaction based location-allocation models', in A. GHOSH AND G. RUSHTON (eds.): Spatial Analysis and Location Allocation Models, v. Nostrand, 1987, pp. 302-326.
[18] SCOTT, A.J.: 'Dynamic location-allocation systems: Some basic planning strategies', Environm. Plan. 3 (1971), 73-82.
[19] SHERALI, H.D., AND ADAMS, W.P.: 'A decomposition algorithm for a discrete location-allocation problem', Oper. Res. 32 (1984), 878-900.
[20] SHERALI, H.D., AND ALAMEDDINE, A.R.: 'A new reformulation-linearization technique for the bilinear programming problems', J. Global Optim. 2 (1992), 379-410.
[21] SHERALI, H.D., RAMACHANDRAN, S., AND KIM, S.: 'A localization and reformulation discrete programming approach for the rectilinear discrete location-allocation problem', Discrete Appl. Math. 49 (1994), 357-378.
[22] SHERALI, H.D., AND SHETTY, C.M.: 'The rectilinear distance location-allocation problem', AIIE Trans. 9 (1977), 136-143.
[23] SHERALI, H.D., AND TUNCBILEK, C.H.: 'A squared-Euclidean distance location-allocation problem', Naval Res. Logist. 39 (1992), 447-469.
[24] SHERALI, H.D.: 'Capacitated, balanced, sequential location-allocation problems on chain and trees', Math. Program. 49 (1991), 381-396.
[25] SHERALI, H.D., AND NORDAI, F.L.: 'NP-hard, capacitated, balanced p-median problems on a chain graph with a continuum of link demands', Math. Oper. Res. 13 (1988), 32-49.
[26] WENDELL, R.E., AND HURTER, A.P.: 'Location theory, dominance and convexity', Oper. Res. 21 (1973), 314-320.

Marianthi Ierapetritou
Dept. Chemical and Biochemical Engin. Rutgers Univ.
98 Brett Road
Piscataway, NJ 08854, USA
E-mail address: mariemth@sol.rutgers.edu
Christodoulos A. Floudas
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: floudas@titan.princeton.edu
MSC 2000: 90C26
Key words and phrases: MINLP, facility location-allocation.

MINLP: APPLICATIONS IN BLENDING AND POOLING PROBLEMS

Pooling and blending is inherent in many manufacturing plants with limited tankage available to store the intermediate streams produced by various processes. Also, chemical products often need to be transported as a mixture, either in a pipeline, a tank car or a tanker. In each case, blended or pooled streams are then used in further downstream processing. In modeling these processes, it is necessary to model not only product flows but the properties of intermediate streams as well. The presence of these pools can introduce nonlinearities and nonconvexities in the model of the process, resulting in difficult problems with multiple local optima.

Fig. 1: General pooling and blending problem.

Given a set of components $i$, a set of products $j$, a set of pools $l$ and a set of qualities $k$, let $x_{il}$ be the amount of component $i$ allocated to pool $l$, $y_{lj}$ be the amount going from pool $l$ to product $j$, $z_{ij}$ be the amount of component $i$ going directly to product $j$ and $p_{lk}$ be the level of quality $k$ in pool $l$. Furthermore, let $A_i$, $D_j$ and $S_l$ be upper bounds for component availabilities, product demands and pool sizes respectively, let $C_{ik}$ be the level of quality $k$ in component $i$, $P_{jk}$ be upper bounds on product qualities, $c_i$ be the unit price of component $i$ and $d_j$ be the unit price of product $j$. The general pooling and blending model can then be written as [1]:
$$
\begin{aligned}
& \vdots \\
& \sum_l y_{lj} + \sum_i z_{ij} \leq D_j \\
& \sum_i x_{il} - \sum_j y_{lj} = 0 \\
& \sum_i x_{il} \leq S_l \\
& -\sum_i C_{ik} x_{il} + p_{lk} \sum_j y_{lj} = 0 \\
& \sum_l (p_{lk} - P_{jk}) y_{lj} + \sum_i (C_{ik} - P_{jk}) z_{ij} \leq 0 \\
& x_{il},\ y_{lj},\ z_{ij},\ p_{lk} \geq 0.
\end{aligned}
$$

The first two sets of constraints ensure that the amount of components used and products made do not exceed the respective availabilities or demands. The third and fourth set of constraints are material balance constraints around each pool, which ensure that there is no accumulation or overflow of material in the pools. The fifth set of constraints relates the quality of each pool to the quality of the components going into the pool (in this case, the qualities are assumed to blend linearly, that

$$
\begin{aligned}
\max\ & 9 (y_{11} + z_{31}) + 15 (y_{12} + z_{32}) - 6 x_{11} - 13 x_{21} - 10 (z_{31} + z_{32}) \\
\text{s.t.}\ & x_{11} + x_{21} - y_{11} - y_{12} = 0 \\
& p\, y_{11} + 2 z_{31} - 2.5 (y_{11} + z_{31}) \leq 0 \\
& p\, y_{12} + 2 z_{32} - 1.5 (y_{12} + z_{32}) \leq 0 \\
& p\, (y_{11} + y_{12}) - 3 x_{11} - x_{21} = 0 \\
& y_{11} + z_{31} \leq 100 \\
& y_{12} + z_{32} \leq 200.
\end{aligned}
$$

The variable $p$ represents the sulfur content of the pool (and of $y_{11}$ and $y_{12}$) and is determined as an average of the sulfur contents of $x_{11}$ and $x_{21}$.
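The small example with the single pool quality p can be checked numerically at any trial point. The following is a minimal sketch, not a solver: the function names `pool_sulfur`, `profit` and `is_feasible` are hypothetical, the coefficients are copied from the example formulation, and p is computed from the quality balance p(y11 + y12) = 3 x11 + x21 together with the pool balance, so that constraint holds by construction.

```python
def pool_sulfur(x11, x21):
    """Flow-weighted sulfur content p of the pool (inputs with content 3 and 1)."""
    total = x11 + x21
    # With no flow into the pool p is undefined; 0.0 is an arbitrary placeholder.
    return (3.0 * x11 + x21) / total if total > 0 else 0.0


def profit(x11, x21, y11, y12, z31, z32):
    """Objective of the example: product revenue minus component costs."""
    return (9.0 * (y11 + z31) + 15.0 * (y12 + z32)
            - 6.0 * x11 - 13.0 * x21 - 10.0 * (z31 + z32))


def is_feasible(x11, x21, y11, y12, z31, z32, tol=1e-9):
    """Check every constraint of the example at the given flows."""
    p = pool_sulfur(x11, x21)
    flows = (x11, x21, y11, y12, z31, z32)
    return (
        all(v >= -tol for v in flows)                           # nonnegative flows
        and abs(x11 + x21 - y11 - y12) <= tol                   # pool material balance
        and p * y11 + 2.0 * z31 - 2.5 * (y11 + z31) <= tol      # sulfur limit, product 1
        and p * y12 + 2.0 * z32 - 1.5 * (y12 + z32) <= tol      # sulfur limit, product 2
        and y11 + z31 <= 100.0 + tol                            # demand, product 1
        and y12 + z32 <= 200.0 + tol                            # demand, product 2
    )
```

For instance, `profit(50, 150, 0, 200, 0, 0)` evaluates to 750 with `pool_sulfur(50, 150)` equal to 1.5, and `is_feasible` accepts this point. Because p multiplies the flows y11 and y12, the feasible set is nonconvex, which is why a point-wise check like this cannot by itself certify global optimality.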
3) a global maximum of 750 at $p = 1.5$ with $x_{11} = 50$, $x_{21} = 150$, $y_{12} = 200$ and all other variables zero.

It is not uncommon for a large pooling problem to have many dozen local optima, with the objective function varying by small amounts but with all the flow and quality variables taking on vastly different values.

Then, all specifications on the blend RVP can be converted using the same index. For example, if there is a lower bound $R^L$ on the blend RVP, then using the blending index results in the constraints as:

In some cases, the properties (such as octane number or pour point) can require complex blending rules which cannot be simplified using the blending index, and the full nonlinear blending equation must be included in the model as is.
- R1.25. model can decide if it is worthwhile to pro-
duce stock to sell at the end of the final period.

Logical Constraints and MINLP Formulations. It is often necessary to impose additional logical constraints that dictate how various components are to be blended in relation to each other. Modeling such constraints often requires the addition of integer variables, as discussed below.

a) If a component is to be used in a particular blend, then it must be present in at least a certain amount in the blend. This arises from the fact that it is usually not practical to blend in infinitesimally small quantities. If $x$ represents the volume of such a component, then introducing a new binary variable $\delta$ (i.e. $\delta$ is either 0 or 1) and the constraints

$$x - M\delta \leq 0, \qquad x - m\delta \geq 0$$

are sufficient to ensure this condition is satisfied. Here, $M$ is a sufficiently large number, while $m$ represents the threshold value below which a component should not be blended in.

b) Each product can have at most $k$ components in its blend. This is typically imposed by limitations on how many streams can be physically blended in a reasonable amount of time. Again, introducing the new variables and constraints as below:

$$
\begin{aligned}
& x_1 - m\delta_1 \geq 0, \\
& \quad \vdots \\
& x_n - m\delta_n \geq 0, \\
& \delta_1 + \cdots + \delta_n \leq k, \\
& (\delta_1, \dots, \delta_n) \in \{0, 1\}^n,
\end{aligned}
$$

ensures this condition is met.

c) If component $A$ is to be present in the blend, then component $B$ must also be present:

$$x_A - m\delta_A \geq 0, \qquad x_B - m\delta_B \geq 0, \qquad \delta_B \geq \delta_A.$$

Each of these logical constraints results in a mixed integer nonlinear programming (MINLP) model (cf. also Mixed integer nonlinear programming). To date (2000), such models have not been used extensively in the practical solution of these problems in industry.

Complexity of Models. With the various options of single versus multiperiod and linear versus nonlinear blending, the models for pooling and blending can vary significantly in complexity. This is shown pictorially in Fig. 4.

Fig. 4: Types of pooling problems. (Horizontal axis: Complexity of Process, from Simple to Complex.)

Solution Methods. Pooling problems can be solved using a variety of solution algorithms. These can be broadly classified as local and global solution methods.

Local Optimization Approaches. Traditionally, pooling and blending problems have been solved using various recursion and successive linear programming (SLP) techniques. The first published approach for solving the pooling problem was due to Haverly [8], who proposed the following recursion approach for solving the problem given in Fig. 2:

1) Start with a guess for the pool quality $p$.
2) Solve the remaining linear problem for all other variables.
3) Calculate a new value for $p$ from the solution in 2).

Unfortunately, this rather simple recursion will converge to a suboptimal solution regardless of the starting value for $p$. This can be partially addressed by using a 'distributed recursion' approach, where an additional recursion coefficient $f$ and two additional 'correction vectors' are introduced, modifying the inequalities in the model as follows:

$$
\begin{aligned}
& p\, y_{11} + 2 z_{31} - 2.5 (y_{11} + z_{31}) + f (\text{over} - \text{under}) \leq 0, \\
& p\, y_{12} + 2 z_{32} - 1.5 (y_{12} + z_{32}) \cdots
\end{aligned}
$$
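The logical constraints a) to c) can be illustrated with a small feasibility check. This is only a sketch under stated assumptions: the function names are hypothetical, the same threshold m and big-M value M are used for every component, and the upper link x - M*delta <= 0 from a) is also applied in b), which the text does not list explicitly for that case.

```python
def min_usage_ok(x, delta, m, M, tol=1e-9):
    """Constraint a): x - M*delta <= 0 and x - m*delta >= 0 with delta in {0, 1}."""
    return delta in (0, 1) and x - M * delta <= tol and x - m * delta >= -tol


def blend_ok(x, delta, m, M, k):
    """Constraints a) and b): each used component meets the threshold m,
    and at most k components are used (sum of delta <= k)."""
    return (len(x) == len(delta)
            and all(min_usage_ok(xi, di, m, M) for xi, di in zip(x, delta))
            and sum(delta) <= k)


def implies_ok(delta_a, delta_b):
    """Constraint c): if A is present then B is present (delta_B >= delta_A)."""
    return delta_b >= delta_a
```

With m = 5 and M = 100, a component volume of 3 is rejected for either value of delta (too small to blend if delta = 1, not allowed to flow if delta = 0), while volumes of 0 with delta = 0 or 10 with delta = 1 are accepted. In an optimization model the delta values are decision variables chosen by the solver rather than inputs as in this check.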
[4] BAKER, T.E., AND LASDON, L.S.: 'Successive linear programming at Exxon', Managem. Sci. 31, no. 3 (1994), 264.
[5] FLOUDAS, C.A., AND VISWESWARAN, V.: 'A global optimization algorithm (GOP) for certain classes of nonconvex NLPs: I. Theory', Computers Chem. Engin. 14 (1990), 1397.
[6] FLOUDAS, C.A., AND VISWESWARAN, V.: 'A primal-relaxed dual global optimization approach', J. Optim. Th. Appl. 78, no. 2 (1993), 187.
[7] FOULDS, L.R., HAUGLAND, D., AND JÖRNSTEN, K.: 'A bilinear approach to the pooling problem', Chr. Michelsen Inst. Working Paper 90, no. 3 (1990).
[8] HAVERLY, C.A.: 'Studies of the behaviour of recursion for the pooling problem', ACM SIGMAP Bull. 25 (1978), 19.
[9] HAVERLY, C.A.: 'Behaviour of recursion model-more studies', ACM SIGMAP Bull. 26 (1979), 22.
[10] HORST, R., AND TUY, H.: Global optimization: Deterministic approaches, second ed., Springer, 1993.
[11] LASDON, L.S., WAREN, A.D., SARKAR, S., AND PALACIOS-GOMEZ, F.: 'Solving the pooling problem using generalized reduced gradient and successive linear programming algorithms', ACM SIGMAP Bull. 27 (1979), 9.
[12] PARDALOS, P.M., AND ROSEN, J.B.: Constrained global optimization: Algorithms and applications, Vol. 268 of Lecture Notes Computer Sci., Springer, 1987.
[13] VISWESWARAN, V., AND FLOUDAS, C.A.: 'Computational results for an efficient implementation of the GOP algorithm and its variants', in I.E. GROSSMANN (ed.): Global Optimization in Engineering Design, Nonconvex Optim. Appl., Kluwer Acad. Publ., 1996, pp. 111-154.
[14] VISWESWARAN, V., AND FLOUDAS, C.A.: 'New formulations and branching strategies for the GOP algorithm', in I.E. GROSSMANN (ed.): Global Optimization in Engineering Design, Kluwer Acad. Publ., 1996, pp. 75-110.

Viswanathan Visweswaran
SCA Technologies LLC
Pittsburgh, PA, USA
E-mail address: vishy.visweswaran@sca-tech.com
MSC 2000: 90C90, 90C30
Key words and phrases: pooling, blending, multiperiod optimization.

MINLP: APPLICATIONS IN THE INTERACTION OF DESIGN AND CONTROL

In the development of a process, the steady state design aspects and dynamic operability issues are usually handled sequentially. First, the design engineers develop and synthesize the structure of the flowsheet and determine the operating parameters and steady-state operating conditions. Then, the control engineer takes the fixed design and develops a control system to maintain the system at the desired specifications. During the first step, the dynamic operation of the process is generally not considered, and in the second step, changes to the flowsheet and operating conditions generally cannot be made.

Process design seeks to determine the arrangement of processing units that will convert the given raw materials into the desired products. The idea is to develop a process flowsheet from the large number of possible design alternatives. Numerous process design methods and techniques exist for determining the best process flowsheet and operating conditions. This best design is determined by optimizing some economic criteria and the quality of the design is based on its economic value. Hence, the process is designed to operate at steady state and issues relating to the process dynamics, operability, and controllability are usually not considered.

Once the process has been designed, the plans are handed over to the process control engineer whose task is to ensure the stable dynamic performance of the process. The control engineer is concerned with developing a control system which maintains the operation of the process at the desired steady state in the presence of ever-changing external influences. Issues such as disturbances, uncertainty, and changes in production rates must be addressed so as to maintain product quality and safe operation. By addressing the design and control sequentially, the inherent connection between the two is neglected. For instance, the steady-state design of a process may appear to produce great economic profits. However, unfavorable dynamic operation may lead to a product which does not meet the required specifications. This may result in an economic loss due to disposal or reworking costs. Thus, a process design with good controllability aspects may have better economic value than an economically optimal steady state design when the dynamic operation is considered. This trade-off between the steady state design and the dynamic controllability motivates the treatment of
from the disturbances-free operating point to the feasible operating point can be determined and thus the cost of the disturbance can be evaluated. This concept is illustrated in Fig. 1. Point A indicates the nominal steady-state design, and point B is the back-off point which corresponds to the design which will not violate the constraints $h_1$ and $h_2$ in the presence of uncertainties and disturbances.

The method is further developed in [17], where the control structure selection problem is analyzed. Perfect control assumptions are used along with a linearized model to formulate a mixed integer linear program (MILP) where the integer variables indicate the pairings between the manipulated and controlled variables. The back-off approach incorporated the dynamic operation of the process into the design, but it only ensures the feasible operation of the process and does not directly address controllability aspects.

An approach for determining process designs which are both steady-state and operationally optimal was presented in [2]. The controllability of potential designs is evaluated along with their economic performance by incorporating a model predictive control algorithm into the process design optimization algorithm. This coordinated approach uses an objective function which is a weighted sum of economic and controllability measures.

A multi-objective approach was proposed in [9], [10] to simultaneously consider both controllability and economic aspects of the design. This approach incorporates both design and control aspects into a process synthesis framework where the trade-offs between various open-loop controllability measures and the economics of the process can be observed. The problem is formulated as a mixed integer nonlinear program (MINLP), where integer variables are utilized for structural alternatives in the process flowsheet. Through the application of multi-objective techniques, a process design which is both economic and controllable is determined.

A screening approach was proposed in [4], where the variability in the product quality is used to compare different steady-state process designs. The dynamic controllability is measured economically by calculating the amount of material produced that is off-specification and on-specification. The on-specification material leads to profits while the off-spec material results in costs for reworking or disposal.

A back-off technique was also developed in [1] for the design of steady-state and open-loop dynamic processes. Both uncertainties and disturbances are considered for determining the amount of back-off. In order to address the fact that back-off approaches address the feasible operation and do not address controllability aspects, [5] introduces a recovery factor which is defined as the ratio of the amount of penalty recovered with control to the penalty with no control. This ratio is then used to rank different control strategies.

The advantage of the back-off approaches is that they determine the cost increase associated with moving to the back-off position which is attributed to the uncertainties and disturbances. A limitation of this approach is that it can lead to rather conservative designs since the worst-case uncertainty scenario is considered. Although the probability of the worst-case uncertainty occurring may not be high, this is the basis for the final design. Also, the method has not been applied to the design/synthesis problem. A fixed design is considered and then the back-off is considered as a modification of this design.

The optimal design of dynamic systems under uncertainty was addressed in [13]. Flexibility aspects as well as the control design were considered simultaneously with the process design. The algorithm is used to find the economic optimum which satisfies all of the constraints for a given set of uncertainties and disturbances when the control system is included.

S. Walsh and Perkins [23] outline the use of optimization as a tool for the design/control problem. They note that the advances in computational hardware and optimization tools have made it possible to solve the complex problems that arise in design/control. Their assessment focuses on the control structure selection problem where the economic cost of a disturbance is balanced against the performance of the controller.
The increasing importance of design and control issues has led to more and more discussion on the topic. One contribution to the area has been [11]. The fundamental design and control concepts are described and several quantitative examples are given which illustrate the interaction of design and control.

Most of the previous work does not address synthesis issues and does not treat the problem quantitatively. Two methods employ the optimization approach in process synthesis to arrive at mathematical programming formulations which are solved to determine the trade-offs between the steady-state design and dynamic controllability. The first method [9], [10] uses steady state linear controllability measures while the second method [20] uses full nonlinear dynamic models of the process.

Process Synthesis. Mathematical programming has been found to be a very useful tool for process synthesis. Its application in analyzing the interaction of design and control has followed directly along the process synthesis methodology.

The goal in process synthesis is to determine the structure and operating conditions of the process flowsheet. The optimization approach to the synthesis problem involves three steps:

1) The representation of process design alternatives of interest through a process superstructure.
2) The mathematical modeling of the superstructure.
3) The algorithmic development of solution procedure to extract the optimal process flowsheet from the superstructure and solution of the optimization problem.

The key aspect is the postulation of a superstructure which contains all possible design alternatives of interest. The superstructure must be sufficiently rich so as to include the numerous design possibilities yet succinct enough to eliminate redundancies and reduce complexities.

The mathematical model is characterized by the variables and equations used in the model. Continuous variables are used to represent flowrates, compositions, temperatures, etc. Binary variables are used to represent structural alternatives such as the existence of process units. The modeling of steady-state processes leads to algebraic equations and constraints and results in an MINLP. When dynamic models are to be used, the continuous variables are partitioned into dynamic state variables, control variables, and time invariant variables, and the resulting formulation is classified as a mixed integer optimal control problem (MIOCP).

Steady-State Modeling Approach. This approach was outlined in [9], [10] and follows the optimization approach for process synthesis. A systematic procedure is presented for incorporating open-loop steady-state controllability measures into the process synthesis problem. The problem is formulated mathematically as a MINLP and a multi-objective optimization problem is solved to quantitatively determine the best-compromise solution among the economic and control objectives. The ε-constraint method is used to determine the noninferior solution set where one objective can be improved only at the expense of another, and the best-compromise solution is determined using a cutting plane algorithm.

In order to apply the process synthesis approach, the controllability measure must be expressed as a function of the unknown design parameters. Steady-state controllability measures are used to simplify the problem and reduce implementation difficulties that arise when considering controllability measures as functions of frequency. The steady-state gains of the process can be written in an analytical form thus allowing for an algebraic representation.

The starting point for the controllability analysis is the linear multiple input/multiple output system written in the Laplace domain as

$$z(s) = G(s) u(s) + G_d(s) d(s),$$

where $z$ are the output variables, $u$ are the control variables, $G(s)$ is the process transfer function matrix, and $G_d(s)$ is the disturbance transfer function matrix.

Closed-loop control can be considered by expressing the control variable $u(s)$ as

$$u(s) = G_c(s) (z^*(s) - z(s)),$$
MINLP: Applications in the interaction of design and control
ity measure is the integral square error (ISE). The benefit of this measure is that it is easy to calculate and does reflect the dynamics of the process, albeit only in the outputs of the process. One downside of this measure is that there is no one-to-one correspondence between the control structure and the ISE measure. Thus, different dynamic characteristics of the process may not be reflected in the ISE.
The superstructure is the same as in the previous approach, but a dynamic model is used instead of a steady-state model. The dynamic modeling of the superstructure leads to a problem that includes differential and algebraic equations (DAEs) and the formulation is a multi-objective MIOCP. New algorithmic techniques must be developed for the solution of the formulation.
The general formulation for the multi-objective MIOCP is as follows:
$$
\begin{aligned}
\min\ & J(\dot z_1(t_i), z_1(t_i), z_2(t_i), u(t_i), x, y) \\
\text{s.t.}\ & f_1(\dot z_1(t), z_1(t), z_2(t), u(t), x, y, t) = 0 \\
& f_2(z_1(t), z_2(t), u(t), x, y, t) = 0 \\
& z_1(t_0) = z_1^0 \\
& z_2(t_0) = z_2^0 \\
& h'(\dot z_1(t_i), z_1(t_i), z_2(t_i), u(t_i), x, y) = 0 \\
& g'(\dot z_1(t_i), z_1(t_i), z_2(t_i), u(t_i), x, y) \le 0 \\
& h''(x, y) = 0 \\
& g''(x, y) \le 0 \\
& x \in X \subseteq \mathbb{R}^p \\
& y \in \{0, 1\}^q \\
& t_i \in [t_0, t_N] \\
& i = 0, \ldots, N.
\end{aligned}
\tag{1}
$$
Here, $z_1(t)$ is a vector of n dynamic variables whose time derivatives, $\dot z_1(t)$, appear explicitly, and $z_2(t)$ is a vector of m dynamic variables whose time derivatives do not appear explicitly, x is a vector of p time invariant continuous variables, y is a vector of q binary variables, and u(t) is a vector of r control variables. Time t is the independent variable for the DAE system where $t_0$ is the fixed initial time, $t_i$ are time instances, and $t_N$ is the final time. The DAE system is represented by $f_1$, the n differential equations, and $f_2$, the m dynamic algebraic equations. The constraints $h'$ and $g'$ are point constraints where $t_i$ represents the time instance at which the constraint is enforced and $h''$ and $g''$ are general constraints. The objective functions for the economic and controllability measures are represented by the vector J.
The initial condition for the above system is determined by specifying n of the 2n + m variables $z_1(t_0)$, $\dot z_1(t_0)$, $z_2(t_0)$. For DAE systems with index 0 or 1, the remaining n + m values can be determined. In this work, DAE systems of index 0 or 1 are considered and the initial conditions for $z_1(t)$ and $z_2(t)$ are $z_1^0$ and $z_2^0$ respectively.
Note that in this general formulation, the y variables appear in the DAE system as well as in the point constraints and general constraints. This has implications on the solution strategy.
An approach similar to the previous one is applied to address the multi-objective nature of the problem. An $\epsilon$-constraint method is applied to reduce the problem to an iterative solution of single objective MIOCPs.
MIOCP Solution Algorithm. The strategy for solving the MIOCP is to apply iterative decomposition strategies similar to existing MINLP algorithms with extensions for handling the DAE system. The algorithm developed for the solution of the MIOCP closely parallels existing algorithms for MINLP optimization (GBD, OA, OA/ER, OA/ER/AP). The presence of the y variables in the DAE system for the general case prohibits the use of Outer Approximation and its variants. For the special cases where the y variables do not appear in the DAEs and do participate in a linear and separable fashion, outer approximation and its variants can be applied to the problem. The GBD algorithm can be applied to the solution of the general problem, and the algorithmic development closely follows that of GBD.
The GBD algorithm is an iterative procedure which generates upper and lower bounds on the solution of the MINLP formulation. The upper bound results from the solution of an NLP primal problem and the lower bound from an MILP master problem. The bounds on the solution converge in a finite number of iterations to yield the solution to the MINLP model. A similar methodology is applied to the MIOCP problem, but the
forms of the primal and master problems have to be altered.
Primal Problem. The primal problem is obtained by fixing the y variables which leads to an optimal control problem. For fixed values of $y = y^k$, the MIOCP has the following form:
$$
\begin{aligned}
\min\ & J(\dot z_1(t_i), z_1(t_i), z_2(t_i), u(t_i), x, y^k) \\
\text{s.t.}\ & f_1(\dot z_1(t), z_1(t), z_2(t), u(t), x, y^k, t) = 0 \\
& f_2(z_1(t), z_2(t), u(t), x, y^k, t) = 0 \\
& z_1(t_0) = z_1^0 \\
& z_2(t_0) = z_2^0 \\
& h'(\dot z_1(t_i), z_1(t_i), z_2(t_i), u(t_i), x, y^k) = 0 \\
& g'(\dot z_1(t_i), z_1(t_i), z_2(t_i), u(t_i), x, y^k) \le 0 \\
& h''(x, y^k) = 0 \\
& g''(x, y^k) \le 0 \\
& x \in X \subseteq \mathbb{R}^p \\
& t_i \in [t_0, t_N] \\
& i = 0, \ldots, N.
\end{aligned}
\tag{2}
$$
The solution of this optimal control problem can be handled in several ways: complete discretization, solution of the necessary conditions, dynamic programming, and control parameterization. This work focuses on the control parameterization techniques which parameterize only the control variables u(t) in terms of time invariant parameters. At each step of the optimization procedure, the DAEs are solved for given values of the decision variables and a feasible path for z(t) is obtained. This solution is used to evaluate the objective function and remaining constraints. The control parameterization can either be open loop as described in [21] or closed-loop such as that described in [17] and [16] which also includes the control structure selection.
The basic idea behind the control parameterization is to express the control variables u(t) as functions of time invariant parameters. This parameterization can be done in terms of the independent variable t (open loop):
$$u(t) = \phi(w, t).$$
Alternatively, the parameterization can be done in terms of the state variables z(t) (closed-loop):
$$u(t) = \psi(w, z_1(t), z_2(t)).$$
In both cases, w are the time invariant control parameters. The set of time invariant parameters, x, is now expanded to include the control parameters:
$$x = \{x, w\}.$$
The set of DAEs (f) is expanded to include the parameterization functions
$$f(\cdot) = \{f(\cdot), \phi(\cdot), \psi(\cdot)\}$$
and the control variables are converted to dynamic state variables:
$$z = \{z, u\}.$$
Through the application of the control parameterization, the control variables are effectively removed from the problem and the following problem results:
$$
\begin{aligned}
\min\ & J(\dot z_1(t_i), z_1(t_i), z_2(t_i), x, y^k) \\
\text{s.t.}\ & f_1(\dot z_1(t), z_1(t), z_2(t), x, y^k, t) = 0 \\
& f_2(z_1(t), z_2(t), x, y^k, t) = 0 \\
& z_1(t_0) = z_1^0 \\
& z_2(t_0) = z_2^0 \\
& h'(\dot z_1(t_i), z_1(t_i), z_2(t_i), x, y^k) = 0 \\
& g'(\dot z_1(t_i), z_1(t_i), z_2(t_i), x, y^k) \le 0 \\
& h''(x, y^k) = 0 \\
& g''(x, y^k) \le 0 \\
& x \in X \subseteq \mathbb{R}^p \\
& t_i \in [t_0, t_N] \\
& i = 0, \ldots, N.
\end{aligned}
\tag{3}
$$
This problem is a nonlinear program with differential and algebraic constraints (NLP/DAE). This problem is solved using a parametric method where the DAE system is solved as a function of the x variables. The solution of the DAE system is achieved through an integration routine which returns the values of the z variables at the time instances, $z(t_i)$, along with their sensitivities with respect to the parameters, $dz/dx(t_i)$. The resulting problem is an NLP optimization over the space of x variables which has the form:
$$
\begin{aligned}
\min\ & J(\dot z_1(t_i), z_1(t_i), z_2(t_i), x, y^k) \\
\text{s.t.}\ & h'(\dot z_1(t_i), z_1(t_i), z_2(t_i), x, y^k) = 0 \\
& g'(\dot z_1(t_i), z_1(t_i), z_2(t_i), x, y^k) \le 0 \\
& h''(x, y^k) = 0 \\
& g''(x, y^k) \le 0 \\
& x \in X \\
& t_i \in [t_0, t_N] \\
& i = 0, \ldots, N,
\end{aligned}
\tag{4}
$$
where the variables $\dot z_1(t_i)$, $z_1(t_i)$, and $z_2(t_i)$ are determined through the solution of the DAE system by integration:
$$
\begin{aligned}
& f_1(\dot z_1(t), z_1(t), z_2(t), x, y^k, t) = 0, \\
& f_2(z_1(t), z_2(t), x, y^k, t) = 0, \\
& z_1(t_0) = z_1^0, \\
& z_2(t_0) = z_2^0.
\end{aligned}
\tag{5}
$$
The functions $J(\cdot)$, $g'(\cdot)$, and $h'(\cdot)$ are functions of $z(t_i)$ which are implicit functions of the x variables through the integration of the DAE system. For the solution of the NLP, the objective and constraint evaluations, along with their gradients with respect to x, are required. These are evaluated directly for the constraints $g''(x)$ and $h''(x)$. However, for the functions $J(\cdot)$, $g'(\cdot)$, and $h'(\cdot)$, the values $z(t_i)$, and the gradients $dz/dx(t_i)$, as returned from the integration, are used. The functions $J(\cdot)$, $g'(\cdot)$, and $h'(\cdot)$ are evaluated directly and the gradients $dJ/dx$, $dg'/dx$, and $dh'/dx$ are evaluated by using the chain rule:
$$
\begin{aligned}
\frac{dJ}{dx} &= \left(\frac{\partial J}{\partial z}\right)\left(\frac{dz}{dx}\right) + \frac{\partial J}{\partial x}, \\
\frac{dh'}{dx} &= \left(\frac{\partial h'}{\partial z}\right)\left(\frac{dz}{dx}\right) + \frac{\partial h'}{\partial x}, \\
\frac{dg'}{dx} &= \left(\frac{\partial g'}{\partial z}\right)\left(\frac{dz}{dx}\right) + \frac{\partial g'}{\partial x}.
\end{aligned}
\tag{6}
$$
Standard gradient based optimization techniques can be applied to solve this problem as an NLP. The solution of this problem provides values of the x variables and trajectories for z(t).
The master problem is formulated using dual information and the solution of the primal problem. Provided that the y variables participate linearly, the problem is an MILP whose solution provides a lower bound and y variables for the next primal problem. Dual information is required from all of the constraints including the DAEs whose dual variables, or adjoint variables, are dynamic. The constraints and their corresponding dual variables are listed in Table 1.

constraint    dual variable
$f_1$         $\nu_1(t)$
$f_2$         $\nu_2(t)$
$g'$          $\mu'$
$h'$          $\lambda'$
$g''$         $\mu''$
$h''$         $\lambda''$
Table 1: Constraints and their corresponding dual variables.

The dual variables $\mu'$, $\lambda'$, $\mu''$, and $\lambda''$ are generally obtained from the solution technique for the primal problem. Dual information from the DAE system is obtained by solving the adjoint problem for the DAE system which has the following formulation:
[Equation (7): adjoint DAE system in $\nu_1(t)$ and $\nu_2(t)$; not recoverable from the source text.]
This is a set of DAEs where the solutions for $df_1/d\dot z_1$, $df_1/dz_1$, $df_2/dz_1$, $df_1/dz_2$, and $df_2/dz_2$ are known functions of time obtained from the solution of the primal problem. The variables $\nu_1(t)$ and $\nu_2(t)$ are the adjoint variables and the solution of this problem is a backward integration in time with the following final time conditions:
[Final time conditions at $t_N$; not recoverable from the source text.]
Thus, the Lagrange multipliers for the end-time constraints are used as the final time conditions for the adjoint problem and are not included in the master problem formulation.
The master problem is formulated using the solution of the primal problem, $x^k$ and $z^k(t)$, along with the dual information, $\mu''^k$, $\lambda''^k$, and $\nu^k(t)$. The master problem has the following form:
$$
\begin{aligned}
& \ldots + \int_{t_0}^{t_N} \nu_2^k(t)\, f_2(z_1^k(t), z_2^k(t), x^k, y, t)\, dt + \mu''^k g''(x^k, y) + \lambda''^k h''(x^k, y), \quad k \in K_{\text{feas}}, \\
& 0 \ge \int_{t_0}^{t_N} \nu_1^k(t)\, f_1(\dot z_1^k(t), z_1^k(t), z_2^k(t), x^k, y, t)\, dt + \int_{t_0}^{t_N} \nu_2^k(t)\, f_2(z_1^k(t), z_2^k(t), x^k, y, t)\, dt \\
& \qquad + \mu''^k g''(x^k, y) + \lambda''^k h''(x^k, y), \quad k \in K_{\text{infeas}}, \\
& y \in \{0, 1\}^q.
\end{aligned}
\tag{8}
$$
The integral term can be evaluated since the pro- [...] DAE system.
Example: Reactor-Separator-Recycle System. The example problem considered here is the design of a process involving a reaction step, a separation step, and a recycle loop. Fresh feed containing A and B flows into an isothermal reactor where the first order irreversible reaction A → B takes place. The product from the reactor is sent to a distillation column where the unreacted A is separated from the product B and sent back to the reactor. The superstructure is shown in Fig. 3.
The model equations for the reactor (CSTR) [...]
For this problem, the single output is the product composition. The bottoms (product) composition is controlled by the vapor boil-up and the [...]
Fig. 3: Superstructure for reactor-separator-recycle system.
Fig. 4: Noninferior solution set for the reactor-separator-recycle system.
The cost function includes column and reactor capital and utility costs.
$$
\begin{aligned}
\text{cost}_{\text{reactor}} &= 17639\, D_r^{1.066} (2 D_r)^{0.802}, \\
\text{cost}_{\text{column}} &= 6802\, D_c^{1.066} (2.4 N_t)^{0.802} + 548.8\, D_c^{1.55} N_t, \\
\text{cost}_{\text{exchangers}} &= 193023\, V_s^{0.65}, \\
& \ldots\ \beta_{\text{tax}}\, [\text{cost}_{\text{utilities}}].
\end{aligned}
$$
The controllability measure is the time weighted
Fig. 5: Dynamic responses of product compositions for three designs.
See also: Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane al-
References
[1] BAHRI, P.A., BANDONI, J.A., AND ROMAGNOLI, J.A.: 'Effect of disturbances in optimizing control: Steady-state open-loop backoff problem', AIChE J. 42, no. 4 (1996), 983-994.
[2] BRENGEL, D.D., AND SEIDER, W.D.: 'Coordinated design and control optimization of nonlinear processes', Computers Chem. Engin. 16, no. 9 (1992), 861-886.
[3] DURAN, M.A., AND GROSSMANN, I.E.: 'An outer-approximation algorithm for a class of mixed-integer nonlinear programs', Math. Program. 36 (1986), 307-339.
[4] ELLIOTT, T.R., AND LUYBEN, W.L.: 'Capacity-based approach for the quantitative assessment of process controllability during the conceptual design stage', Industr. Engin. Chem. Res. 34 (1995), 3907-3915.
[5] FIGUEROA, J.L., BAHRI, P.A., BANDONI, J.A., AND ROMAGNOLI, J.A.: 'Economic impact of disturbances and uncertain parameters in chemical processes - A dynamic back-off analysis', Computers Chem. Engin. 20, no. 4 (1996), 453-461.
[6] FLOUDAS, C.A.: Nonlinear and mixed integer optimization: Fundamentals and applications, Oxford Univ. Press, 1995.
[7] GEOFFRION, A.M.: 'Generalized Benders decomposition', J. Optim. Th. Appl. 10, no. 4 (1972), 237-260.
[8] KOCIS, G.R., AND GROSSMANN, I.E.: 'Relaxation strategy for the structural optimization of process flow sheets', Industr. Engin. Chem. Res. 26, no. 9 (1987), 1869.
[9] LUYBEN, M.L., AND FLOUDAS, C.A.: 'Analyzing the interaction of design and control-1. A multiobjective framework and application to binary distillation synthesis', Computers Chem. Engin. 18, no. 10 (1994), 933-969.
[10] LUYBEN, M.L., AND FLOUDAS, C.A.: 'Analyzing the interaction of design and control-2. Reactor-separator-recycle system', Computers Chem. Engin. 18, no. 10 (1994), 971-994.
[11] LUYBEN, M.L., AND LUYBEN, W.L.: Essentials of process control, McGraw-Hill, 1997.
[12] LUYBEN, W.L.: Process modeling, simulation, and control for chemical engineers, second ed., McGraw-Hill, 1990.
[13] MOHIDEEN, M.J., PERKINS, J.D., AND PISTIKOPOULOS, E.N.: 'Optimal design of dynamic systems under uncertainty', AIChE J. 42, no. 8 (1996), 2251-2272.
[14] MORARI, M., AND PERKINS, J.: 'Design for operations', FOCAPD Conf. Proc., 1994.
[15] MORARI, M., AND ZAFIRIOU, E.: Robust process control, Prentice-Hall, 1989.
[16] NARRAWAY, L.T., AND PERKINS, J.D.: 'Selection of control structure based on economics', Computers Chem. Engin. 18 (1993), S511-515.
[17] NARRAWAY, L.T., AND PERKINS, J.D.: 'Selection of process control structure based on linear dynamic economics', Industr. Engin. Chem. Res. 32 (1993), 2681-2692.
[18] NARRAWAY, L.T., PERKINS, J.D., AND BARTON, G.W.: 'Interaction between process design and process control: Economic analysis of process dynamics', J. Process Control 1 (1991), 243-250.
[19] PAULES IV, G.E., AND FLOUDAS, C.A.: 'APROS: Algorithmic development methodology for discrete-continuous optimization problems', Oper. Res. 37, no. 6 (1989), 902-915.
[20] SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'Interaction of design and control: Optimization with dynamic models', in W.W. HAGER AND P.M. PARDALOS (eds.): Optimal Control: Theory, Algorithms, and Applications, Kluwer Acad. Publ., 1997, pp. 388-435.
[21] VASSILIADIS, V.S., SARGENT, R.W.H., AND PANTELIDES, C.C.: 'Solution of a class of multistage dynamic optimization problems 1. Problems without path constraints', Industr. Engin. Chem. Res. 33 (1994), 2111-2122.
[22] VISWANATHAN, J., AND GROSSMANN, I.E.: 'A combined penalty function and outer approximation method for MINLP optimization', Computers Chem. Engin. 14, no. 7 (1990), 769-782.
[23] WALSH, S., AND PERKINS, J.D.: 'Operability and control in process synthesis and design', in J.L. ANDERSON (ed.): Adv. Chem. Engin., Vol. 23, Acad. Press, 1996, pp. 301-402.

Carl A. Schweiger
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: carl@titan.princeton.edu

Christodoulos A. Floudas
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: floudas@titan.princeton.edu

MSC2000: 90C11, 49M37
Key words and phrases: mixed integer nonlinear optimization, parametric optimal control, interaction of design and control.

MINLP: BRANCH AND BOUND GLOBAL OPTIMIZATION ALGORITHM

A wide range of nonlinear optimization problems involve integer or discrete variables in addition to continuous ones. These problems are denoted as mixed integer nonlinear programming (MINLP) problems. Integer variables correspond to logical decisions describing whether certain actions do or do not take place, or modeling the sequence according to which those decisions take place. The nonlinear nature of the MINLP models may arise from:
• nonlinear relations in the integer domain only
• nonlinear relations in the continuous domain only
• nonlinear relations in the joint domain, i.e., products of continuous and binary/integer variables.
The general mathematical formulation of the MINLP problems can be stated as follows:
$$
\begin{aligned}
\min_{x, y}\ & f(x, y) \\
\text{s.t.}\ & h(x, y) = 0 \\
& g(x, y) \le 0 \\
& x \in X \subseteq \mathbb{R}^n \\
& y \in Y \ \text{(integer)}.
\end{aligned}
$$
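To make the notation concrete, the following is a minimal sketch of a toy instance of this form, solved by plain enumeration of the integer variable. The objective, the single inequality constraint, the sets X = [0, 2] and Y = {0, 1, 2, 3}, and the function name are all assumptions chosen for illustration; there is no equality constraint h in this instance, and the continuous subproblem is simple enough to solve in closed form.

```python
def solve_toy_minlp():
    """Toy MINLP (hypothetical instance):
        min  f(x, y) = (x - 1.2)**2 + (y - 1.6)**2
        s.t. g(x, y) = x + y - 2.5 <= 0
             x in X = [0, 2],  y in Y = {0, 1, 2, 3}
    Enumerate y; for each fixed y the problem in x is a convex
    quadratic over an interval, solved by clipping x = 1.2.
    """
    best = None
    for y in range(4):
        # Feasible x for this y: 0 <= x <= min(2, 2.5 - y).
        x_upper = min(2.0, 2.5 - y)
        if x_upper < 0.0:
            continue  # no feasible x for this y
        x = min(max(1.2, 0.0), x_upper)  # clip the unconstrained minimizer
        value = (x - 1.2) ** 2 + (y - 1.6) ** 2
        if best is None or value < best[0]:
            best = (value, x, y)
    return best


if __name__ == "__main__":
    value, x, y = solve_toy_minlp()
    print(f"f = {value:.4f} at x = {x}, y = {y}")
```

Enumeration is only workable here because Y has four elements; the number of combinations grows exponentially with the number of integer variables, which is the motivation for branch and bound.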
In the formulation above, x represents a vector of n continuous variables, y is a vector of integer variables, and f(x, y), h(x, y), g(x, y) represent the objective function, equality and inequality constraints, respectively. It should be noted that every problem of the form just presented can be transformed into one where all integer variables have been replaced by binary, i.e., 0-1, variables, by realizing that every integer $y^L \le y \le y^U$ can be expressed through 0-1 variables, $z = (z_1, \ldots, z_N)$, as:
$$y = y^L + z_1 + 2 z_2 + 4 z_3 + \cdots + 2^{N-1} z_N,$$
where
$$N = 1 + \operatorname{INT}\left\{ \frac{\log (y^U - y^L)}{\log 2} \right\}.$$
Therefore, any MINLP problem can be written as:
$$
\begin{aligned}
\min_{x, y}\ & f(x, y) \\
\text{s.t.}\ & h(x, y) = 0 \\
& g(x, y) \le 0 \\
& x \in X \subseteq \mathbb{R}^n \\
& y \in Y = \{0, 1\}^m.
\end{aligned}
$$
In the analysis of MINLP problems two issues are of paramount importance:
• combinatorial explosion of computational requirements as the number of binary variables increases
• NP-hard nature of the problem of determining the global minimum solution of general nonconvex MINLP problems.
A complexity analysis of the former is presented in [16], while the complexity of determining global minimum solutions of MINLPs is discussed in [15].
Various methods exist for identifying a locally optimum solution of MINLP problems. These are discussed in great detail in [9] and in a recent thorough review paper, [6], which presents a comprehensive account of the various approaches for addressing issues related to the solution of mixed integer nonlinear optimization problems.
The main objective in a general branch and bound algorithm is to perform an enumeration of the alternatives without examining all 0-1 combinations of the binary variables. A key element in such an enumeration is the representation of alternatives via a binary tree. The basic ideas in a branch and bound algorithm are the following. First, a reasonable effort is made in solving the original problem, by considering for instance its continuous relaxation. If the relaxation does not result in an integer-feasible solution, i.e., one in which the binary variables achieve 0-1 values at the optimal point, then the root node is separated into two candidate subproblems which are subsequently solved. The separation aims at creating simpler instances of the original problem. This process of generating candidate subproblems is repeated until the problem is successfully solved. Branch and bound algorithms are also known as divide-and-conquer for that very reason. A basic principle common to all branch and bound algorithms is that the solution of the subproblems aims at generating valid lower bounds on the original MINLP through its relaxation to a continuous problem. The relaxation, in the case of MINLP, results in a nonlinear programming problem (NLP) which, in the general case, is nonconvex and needs to be solved to global optimality so as to provide a valid lower bound. If the NLP relaxation renders an integer solution, then this solution is referred to as a valid upper bound. The generation of the sequence of valid upper and lower bounds is called the bounding step. Subproblems are created by forcing some of the binary variables to take on a value of 0 or 1. This is known as the branching step. Nodes in the tree are pruned when the corresponding valid lower bound exceeds the valid upper bound; this stage is known as the fathoming step. The selection of the branching node, the branching variable and the generation of the lower bound are crucial steps whose importance becomes even more pronounced when addressing nonconvex MINLP problems. Two basic strategies exist regarding the selection of the branching node, depending on whether one designs a branch and bound based on a depth-first or a breadth-first approach. In the former, the last node created is selected for branching; in the latter, the node that generated the best lower bound is selected. It is not clear which strategy is the best, and often the one that minimizes the computational requirement is selected, [13]. Another alternative is to select nodes based on the deviation of the solution from integrality, [12]. The most common strategy for selecting a branching variable is to select
the variable whose value at the solution of some relaxed problem is the farthest from integer, i.e., the most fractional variable, [17]. In [12] a method based on the concept of pseudocosts, which quantify the effect of binary variables, is also proposed; it essentially assigns priorities to the order of branching variables. Finally, one of the most important computational steps is the generation of the lower bound, in other words the solution of the relaxed problem. The effectiveness of a branch and bound algorithm depends on the quality of the lower bound that is generated. At every node of the branch and bound tree a nonconvex NLP is solved. Two issues are important: the lower bound must be valid, in other words the relaxation at a particular node must underestimate the solution of the original problem for this node, and the lower bounds must be tight so as to enhance the fathoming step. The key complexity when dealing with nonconvex MINLPs is that the relaxation solved at each node is, of course, a nonconvex NLP that has to be solved to global optimality. With the exception of problems which are convex in the x and relaxed y-space, for which variants of the branch and bound algorithms will lead to the correct solution, [18], in all other cases global optimization algorithms have to be employed for the generation of valid lower bounds.
In [19] the scope of branch and bound algorithms was extended to problems for which valid convex underestimating NLPs can be constructed for the convex relaxations. The problems included bilinear and separable problems for which convex underestimators can be built [14]. A number of very useful tests were proposed to accelerate the reduction of the solution space. Namely:
1) Optimality based range reduction tests: For the first set of tests, an upper bound U on the nonconvex MINLP must be computed and a convex lower bounding NLP must be solved to obtain a lower bound L. If a bound constraint for variable $x_i$, with $x_i^L \le x_i \le x_i^U$, is active at the solution of the convex NLP and has multiplier $\lambda_i^* > 0$, the bounds on $x_i$ can be updated as follows:
a) If $x_i - x_i^U = 0$ at the solution of the convex NLP and $\kappa_i = x_i^U - (U - L)/\lambda_i^*$ is such that $\kappa_i > x_i^L$, then $x_i^L = \kappa_i$.
b) If $x_i - x_i^L = 0$ at the solution of the convex NLP and $\kappa_i = x_i^L + (U - L)/\lambda_i^*$ is such that $\kappa_i < x_i^U$, then $x_i^U = \kappa_i$.
If neither bound constraint is active at the solution of the convex NLP for some variable $x_j$, the problem can be solved by setting $x_j = x_j^U$ or $x_j = x_j^L$. Tests similar to those presented above are then used to update the bounds on $x_j$.
2) Feasibility based range reduction tests: In addition to ensuring that tight bounds are available for the variables, the constraint underestimators are used to generate new constraints for the problem. Consider the constraint $g_i(x, y) \le 0$. If its underestimating function $\underline{g}_i(x, y) = 0$ at the solution of the convex NLP and its multiplier is $\mu_i^* > 0$, the constraint
$$\underline{g}_i(x, y) \ge -\frac{U - L}{\mu_i^*}$$
can be included in subsequent problems.
A global optimization branch and bound algorithm has been proposed in [20]. It can be applied to problems in which the objective and constraints are functions involving any combination of binary arithmetic operations (addition, subtraction, multiplication and division) and functions that are either concave over the entire solution space (such as ln) or convex over this domain (such as exp).
The algorithm starts with an automatic reformulation of the original nonlinear problem into a problem that involves only linear, bilinear, linear fractional, simple exponentiation, univariate concave and univariate convex terms. This is achieved through the introduction of new constraints and variables. The reformulated problem is then solved to global optimality using a branch and bound approach. Its special structure allows the construction of a convex relaxation at each node of the tree. The integer variables can be handled in two ways during the generation of the convex lower bounding problem. The integrality condition on the variables can be relaxed to yield a convex NLP which can then be solved globally. Alternatively, the integer variables can be treated directly and the convex lower bounding MINLP can be solved using a branch and bound algorithm as described earlier. This second approach is more computationally intensive but is likely to result in tighter lower bounds on the global optimum solution. In order to obtain an upper bound for the optimum solution, several methods have been suggested. The MINLP can be transformed to an equivalent nonconvex NLP by relaxing the integer variables. For example, a variable $y \in \{0, 1\}$ can be replaced by a continuous variable $z \in [0, 1]$ by including the constraint $z - z \cdot z = 0$. The nonconvex NLP is then solved locally to provide an upper bound. Finally, the discrete variables could be fixed to some arbitrary value and the nonconvex NLP solved locally.
In [1] SMIN was proposed, which is designed to address the following class of problems to global optimality:
$$
\begin{aligned}
\min\ & f(x) + x^T A_0 y + c_0^T y \\
\text{s.t.}\ & h(x) + x^T A_1 y + c_1^T y = 0 \\
& g(x) + x^T A_2 y + c_2^T y \le 0 \\
& x \in X \subseteq \mathbb{R}^n \\
& y \in Y \ \text{(integer)},
\end{aligned}
$$
where $c_0^T$, $c_1^T$ and $c_2^T$ are constant vectors, $A_0$, $A_1$ and $A_2$ are constant matrices and f(x), h(x) and g(x) are functions with continuous second order derivatives. The solution strategy is an extension of the αBB algorithm for twice-differentiable NLPs [7], [5], [4]. It is based on the generation of two converging sequences of upper and lower bounds on the global optimum solution. A rigorous underestimation and convexification strategy for functions with continuous second order derivatives allows the construction of a lower bounding MINLP problem with convex functions in the continuous variables. If no mixed-bilinear terms are present ($A_i = 0$, $\forall i$), the resulting MINLP can be solved to global optimality using the outer approximation algorithm (OA), [8]. Otherwise, the generalized Benders decomposition (GBD) can be used, [10], or the Glover transformations [11] can be applied to remove these bilinearities and permit the use of the OA algorithm. This convex MINLP provides a valid lower bound on the original MINLP. An upper bound on the problem can be obtained by applying the OA algorithm or the GBD to find a local solution. This bound generation strategy is incorporated within a branch and bound scheme: a lower and upper bound on the global solution are first obtained for the entire solution space. Subsequently, the domain is subdivided by branching on a binary or a continuous variable, thus creating new nodes for which upper and lower bounds can be computed. At each iteration, the node with the lowest lower bound is selected for branching. If the lower bounding MINLP for a node is infeasible or if its lower bound is greater than the best upper bound, this node is fathomed. The algorithm is terminated when the best lower and upper bound are within a pre-specified tolerance of each other.
Before presenting the algorithmic procedure, an overview of the underestimation and convexification strategy is given, and some of the options available within the algorithm are discussed.
In order to transform the MINLP problem of the form just described into a convex problem which can be solved to global optimality with the OA or GBD algorithm, the functions f(x), h(x) and g(x) must be convexified. The underestimation and convexification strategy used in the αBB algorithm has previously been described in detail [3], [5], [4]. Its main features are outlined here.
In order to construct as tight an underestimator as possible, the nonconvex functions are decomposed into a sum of convex, bilinear, univariate concave and general nonconvex terms. The overall function underestimator can then be built by summing up the convex underestimators for all terms, according to their type. In particular, a new variable is introduced to replace each bilinear term, and is bounded by its convex envelope. The univariate concave terms are linearized. For each nonconvex term nt(x) with Hessian matrix $H_{nt}(x)$, a convex underestimator L(x) is defined as
$$L(x) = nt(x) - \sum_i \alpha_i (x_i^U - x_i)(x_i - x_i^L), \tag{1}$$
where $x_i^U$ and $x_i^L$ are the upper and lower bounds on variable $x_i$, respectively, and the α parameters are nonnegative scalars such that $H_{nt}(x) + 2\,\mathrm{diag}(\alpha_i)$ is positive semidefinite over the domain $[x^L, x^U]$. The rigorous computation of the α parameters using interval Hessian matrices is described in [3], [5], [4].
The underestimators are updated at each node of the branch and bound tree as their quality strongly depends on the bounds on the variables.

An unusual feature of the SMIN-αBB algorithm is the strategy used to select branching variables. It follows a hybrid approach where branching may occur both on the integer and the continuous variables in order to fully exploit the structure of the problem being solved. After the node with the lowest lower bound has been identified for branching, the type of branching variable must be determined according to one of the following two criteria:

1) Branch on the binary variables first.
2) Solve a continuous relaxation of the nonconvex MINLP locally. Branch on a binary variable with a low degree of fractionality at the solution. If there is no such variable, branch on a continuous variable.

The first criterion results in the creation of an integer tree for the first q levels of the branch and bound tree, where q is the number of binary variables. At the lowest level of this integer tree, each node corresponds to a nonconvex NLP and the lower and upper bounding problems at subsequent levels of the tree are NLP problems. The efficiency of this strategy lies in the minimization of the number of MINLPs that need to be solved. The combinatorial nature of the problem and its nonconvexities are handled sequentially. If branching occurs on a binary variable, the selection of that variable can be done randomly or by solving a relaxation of the nonconvex MINLP and choosing the most fractional variable at the solution.

The second criterion selects a binary variable for branching only if it appears that the two newly created nodes will have significantly different lower bounds. Thus, if a variable is close to integrality at the solution of the relaxed problem, forcing it to take on a fixed value may lead to the infeasibility of one of the nodes or the generation of a high value for a lower bound, and therefore the fathoming of a branch of the tree. If no binary variable is close to integrality, a continuous variable is selected for branching.

A number of rules have been developed for the selection of a continuous branching variable. Their aim is to determine which variable is responsible for the largest separation distances between the convex underestimating functions and the original nonconvex functions. These efficient rules are exposed in [2]. Variable bound updates performed before the generation of the convex MINLP have been found to greatly enhance the speed of convergence of the αBB algorithm for continuous problems [2]. For continuous variables, the variable bounds are updated by minimizing or maximizing the chosen variable subject to the convexified constraints being satisfied. In spite of its computational cost, this procedure often leads to significant improvements in the quality of the underestimators and hence a noticeable reduction in the number of iterations required.

In addition to the update of continuous variable bounds, the SMIN-αBB algorithm also relies on binary variable bound updates. Through simple computations, an entire branch of the branch and bound tree may be eliminated when a binary variable is found to be restricted to 0 or 1. The bound update procedure for a given binary variable is as follows:

1) Set the variable to be updated to one of its bounds, $y = y_B$.
2) Perform interval evaluations of all the constraints in the nonconvex MINLP, using the bounds on the solution space for the current node.
3) If any of the constraints are found infeasible, fix the variable to $y = 1 - y_B$.
4) If both bounds have been tested, repeat this procedure for the next variable to be updated. Otherwise, try the second bound.

In [1] GMIN, which operates within a classical branch and bound framework, was proposed. The main difference with similar branch and bound algorithms [12], [17] is its ability to identify the global optimum solution of a much larger class of problems of the form

\[ \begin{aligned} \min_{x,y}\ & f(x,y) \\ \text{s.t.}\ & h(x,y) = 0 \\ & g(x,y) \le 0 \\ & x \in X \subseteq \mathbb{R}^n \\ & y \in \mathbb{N}^q, \end{aligned} \]
where N is the set of nonnegative integers and the only condition imposed on the functions f(x, y), g(x, y) and h(x, y) is that their continuous relaxations possess continuous second order derivatives. This increased applicability results from the use of the αBB global optimization algorithm for continuous twice-differentiable NLPs [7], [5], [4].

At each node of the branch and bound tree, the nonconvex MINLP is relaxed to give a nonconvex NLP, which is then solved with the αBB algorithm. This allows the identification of rigorously valid lower bounds and therefore ensures convergence to the global optimum. In general, it is not necessary to let the αBB algorithm run to completion as each one of its iterations generates a lower bound on the global solution of the NLP being solved. A strategy of early termination leads to a reduction in the computational requirements of each node of the binary branch and bound tree and faster overall convergence.

The GMIN-αBB algorithm selects the node with the lowest lower bound for branching at every iteration. The branching variable selection strategy combines several approaches: branching priorities can be specified for some of the integer variables. When no variable has a priority greater than all other variables, the solution of the continuous relaxation is used to identify either the most fractional variable or the least fractional variable for branching.

Other strategies have been implemented to ensure a satisfactory convergence rate. In particular, bound updates on the integer variables can be performed at each level of the branch and bound tree. These can be carried out through the use of interval analysis. An integer variable, y*, is fixed at its lower (or upper) bound and the range of the constraints is evaluated with interval arithmetic, using the bounds on all other variables. If the range of any constraint is such that this constraint is violated, the lower (or upper) bound on variable y* can be increased (or decreased) by one. Another strategy for bound updates is to relax the integer variables, to convexify and underestimate the nonconvex constraints and to minimize (or maximize) a variable y* in this convexified feasible region. The resulting lower (or upper) bound on relaxed variable y* can then be rounded up (or down) to the nearest integer to provide an updated bound for y*.

See also: Global optimization in batch design under uncertainty; Smooth nonlinear nonconvex optimization; Interval global optimization; αBB algorithm; Global optimization in generalized geometric programming; Global optimization in phase and chemical reaction equilibrium; Global optimization methods for systems of nonlinear equations; Continuous global optimization: Models, algorithms and software; Disjunctive programming; Reformulation-linearization methods for global optimization; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Global optimization with αBB; Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane algorithm; Generalized Benders decomposition; MINLP: Heat exchanger network synthesis; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems.

References
[1] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS, C.A.: 'Global optimization of MINLP problems in process synthesis and design', Computers Chem. Engin. 21 (1997), S445-S450.
[2] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS, C.A.: 'A global optimization method, αBB, for twice-differentiable NLP's - II. Implementation and computational results', Computers Chem. Engin. 22 (1998), 1137-1158.
[3] ADJIMAN, C.S., ANDROULAKIS, I.P., MARANAS, C.D., AND FLOUDAS, C.A.: 'A global optimization method, αBB, for process design', Computers Chem. Engin. 20 (1996), S419-S424.
[4] ADJIMAN, C.S., DALLWIG, S., FLOUDAS, C.A., AND NEUMAIER, A.: 'A global optimization method, αBB, for twice-differentiable NLP's - I. Theoretical advances', Computers Chem. Engin. 22 (1998), 1159-1179.
[5] ADJIMAN, C.S., AND FLOUDAS, C.A.: 'Rigorous convex underestimators for general twice-differentiable problems', J. Global Optim. 9 (1996), 23-40.
[6] ADJIMAN, C.S., SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'Mixed-integer nonlinear optimization in process synthesis', in D.-Z. DU AND P.M. PARDALOS (eds.): Handbook Combinatorial Optim., Kluwer Acad. Publ., 1998.
[7] ANDROULAKIS, I.P., MARANAS, C.D., AND FLOUDAS, C.A.: 'αBB, a global optimization method for general constrained nonconvex problems', J. Global Optim. 7 (1995), 337-363.
[8] DURAN, M.A., AND GROSSMANN, I.E.: 'An outer-approximation algorithm for a class of mixed-integer nonlinear programs', Math. Program. 36 (1986), 307-339.
[9] FLOUDAS, C.A.: Nonlinear and mixed-integer optimization: Fundamentals and applications, Oxford Univ. Press, 1995.
[10] GEOFFRION, A.M.: 'Generalized Benders decomposition', J. Optim. Th. Appl. 10 (1972), 237-260.
[11] GLOVER, F.: 'Improved linear integer programming formulations of nonlinear integer problems', Managem. Sci. 22 (1975), 445-452.
[12] GUPTA, O.K., AND RAVINDRAN, R.: 'Branch and bound experiments in convex nonlinear integer programming', Managem. Sci. 31 (1985), 1533-1546.
[13] LAWLER, E.L., AND WOOD, D.E.: 'Branch and bound methods: A survey', Oper. Res. (1966), 699-719.
[14] MCCORMICK, G.P.: 'Computability of global solutions to factorable nonconvex programs; Part I - convex underestimating problems', Math. Program. 10 (1976), 147-175.
[15] MURTY, K.G., AND KABADI, S.N.: 'Some NP-complete problems in quadratic and nonlinear programming', Math. Program. 39 (1987), 117-123.
[16] NEMHAUSER, G.L., AND WOLSEY, L.A.: Integer and combinatorial optimization, Wiley, 1988.
[17] OSTROVSKY, G.M., AND MIKHAILOV, G.W.: 'Discrete optimization of chemical processes', Computers Chem. Engin. 14 (1990), 111-124.
[18] QUESADA, I., AND GROSSMANN, I.E.: 'An LP/NLP based branch and bound algorithm for convex MINLP optimization problems', Computers Chem. Engin. 16 (1992), 937-947.
[19] RYOO, H.S., AND SAHINIDIS, N.V.: 'Global optimization of nonconvex NLPs and MINLPs with applications in process design', Computers Chem. Engin. 19 (1995), 551-566.
[20] SMITH, E.M.B., AND PANTELIDES, C.C.: 'Global optimization of nonconvex MINLPs', Computers Chem. Engin. 21 (1997), S333-S338.

Ioannis P. Androulakis
Corp. Strategic Res.
ExxonMobil Res. & Engin.
Annandale, New Jersey 08801, USA
E-mail address: ipandro@erenj.com

MSC2000: 90C10, 90C26
Key words and phrases: mixed integer nonlinear programming, global optimization, branch and bound algorithms.

MINLP: BRANCH AND BOUND METHODS

A general mixed integer nonlinear programming problem (MINLP) can be written as

\[ \text{(MINLP)} \quad \begin{aligned} \min\ & f(x,y) \\ \text{s.t.}\ & h(x,y) = 0 \\ & g(x,y) \le 0 \\ & x \in \mathbb{R}^n \\ & y \in \mathbb{Z}^m. \end{aligned} \]

Here x is a vector of n continuous variables and y is a vector of m integer variables. In many cases, the integer variables y are restricted to the values 0 and 1. Such variables are called binary variables. The function f is a scalar valued objective function, while the vector functions h and g express linear or nonlinear constraints. Problems of this form have a wide variety of applications, in areas as diverse as IR spectroscopy [6], finance [3], chemical process synthesis [9], topological design of transportation networks [12], and marketing [10].

The earliest work on branch and bound algorithms for mixed integer linear programming dates back to the early 1960s [7], [13], [15]. Although the possibility of applying branch and bound methods to mixed integer nonlinear programming problems was apparent from the beginning, actual work on such problems did not begin until later. Early papers on branch and bound algorithms for mixed integer nonlinear programming include [11], [14].

A branch and bound algorithm for solving (MINLP) requires the following data structures. The algorithm maintains a list L of unsolved subproblems. The algorithm also maintains a record of the best integer solution that has been found. This solution, (x*, y*), is called the incumbent solution. The incumbent solution provides an upper bound, ub, on the objective value of an optimal solution to (MINLP).
The basic branch and bound procedure is as follows.

1) Initialize: Create the list L with (MINLP) as the initial subproblem. If a good integer solution is known, then initialize x*, y*, and ub to this solution. If there is no incumbent solution, then initialize ub to $+\infty$.
2) Select: Select an unsolved subproblem, S, from the list L. If L is empty, then stop: if there is an incumbent solution, then that solution is optimal; if there is no incumbent solution, then (MINLP) is infeasible.
3) Solve: Relax the integrality constraints in S and solve the resulting nonlinear programming relaxation. Obtain a solution $\hat{x}$, $\hat{y}$, and a lower bound, lb, on the optimal value of the subproblem.
4) Fathom: If the relaxed subproblem was infeasible, then S will clearly not yield a better solution to (MINLP) than the incumbent solution. Similarly, if lb > ub, then the current subproblem cannot yield a better solution to (MINLP) than the incumbent solution. Remove S from L, and return to step 2.
5) Integer Solution: If $\hat{y}$ is integer, then a new incumbent integer solution has been obtained. Update x*, y*, and ub. Remove S from L and return to step 2.
6) Branch: At least one of the integer variables $y_k$ takes on a fractional value in the solution to the current subproblem. Create a new subproblem, $S_1$, by adding the constraint $y_k \le \lfloor \hat{y}_k \rfloor$. Create a second new subproblem, $S_2$, by adding the constraint $y_k \ge \lceil \hat{y}_k \rceil$.

The optimal solution to the initial nonlinear programming relaxation is y = (1/4, 1/4, 0), with an objective value of z = 0. Both $y_1$ and $y_2$ take on fractional values in this solution, so it is necessary to select a branching variable. The algorithm arbitrarily selects $y_1$ as the branching variable, and creates two new subproblems in which $y_1$ is fixed at 0 or 1. In the subproblem with $y_1$ fixed at 0, the optimal solution is y = (0, 1/4, 0), with z = 1/16. Since the optimal value of $y_2$ is fractional, the algorithm again creates two new subproblems, with $y_2$ fixed at 0 and 1. The optimal solution to the subproblem with $y_1 = 0$ and $y_2 = 0$ is y = (0, 0, 0), with z = 1/8. This establishes an incumbent integer solution. The subproblem with $y_1 = 0$ and $y_2 = 1$ is infeasible and can be eliminated from consideration. The subproblem with $y_1 = 1$ has an optimal solution with y = (1, 1/4, 0) and objective value z = 9/16. Since 9/16 is larger than the objective value of the incumbent solution, this subproblem can be eliminated from consideration. Thus the optimal solution to the example problem is y* = (0, 0, 0) with objective value z* = 1/8.

[Figure: branch and bound tree for the example problem. Root node: y = (1/4, 1/4, 0), z = 0. Child nodes: y = (0, 1/4, 0), z = 1/16; and y = (1, 1/4, 0), z = 9/16 (bound > ub).]
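Steps 1 to 6 of the basic procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the article's implementation. The toy problem (minimize $\sum_i (y_i - c_i)^2$ over integer $y$ in a box) is hypothetical and is chosen only because its relaxation has a closed-form solution; it is not the article's example problem. The helper names `solve_relaxation` and `branch_and_bound` are also illustrative. The list L is kept as a heap ordered by the parent's lower bound, and the most fractional variable is chosen for branching; both are design choices of this sketch.

```python
import heapq
import math


def solve_relaxation(center, lower, upper):
    """Relaxation of the toy problem: clip the unconstrained minimizer to the box."""
    if any(l > u for l, u in zip(lower, upper)):
        return None  # infeasible subproblem
    y = [min(max(c, l), u) for c, l, u in zip(center, lower, upper)]
    lb = sum((yi - c) ** 2 for yi, c in zip(y, center))
    return y, lb


def branch_and_bound(center, lower, upper, tol=1e-9):
    incumbent, ub = None, math.inf                      # step 1: no incumbent yet
    open_list = [(-math.inf, 0, list(lower), list(upper))]
    pushed = 1                                          # unique tie-breaker for the heap
    while open_list:                                    # step 2: stop when L is empty
        _, _, lo, hi = heapq.heappop(open_list)
        relaxed = solve_relaxation(center, lo, hi)      # step 3: solve the relaxation
        if relaxed is None:                             # step 4: infeasible, fathom
            continue
        y, lb = relaxed
        if lb >= ub:                                    # step 4: cannot beat the incumbent
            continue
        frac = [abs(v - round(v)) for v in y]
        k = max(range(len(y)), key=frac.__getitem__)    # most fractional variable
        if frac[k] <= tol:                              # step 5: integer solution found
            incumbent, ub = [round(v) for v in y], lb
            continue
        # step 6: branch with y_k <= floor(y_k) and y_k >= ceil(y_k)
        for new_lo, new_hi in ((lo[k], math.floor(y[k])), (math.ceil(y[k]), hi[k])):
            child_lo, child_hi = list(lo), list(hi)
            child_lo[k], child_hi[k] = new_lo, new_hi
            heapq.heappush(open_list, (lb, pushed, child_lo, child_hi))
            pushed += 1
    return incumbent, ub


best_y, best_z = branch_and_bound([0.25, 0.25], [0, 0], [1, 1])
print(best_y, best_z)
```

A real MINLP code would replace `solve_relaxation` with a call to an NLP solver. The fathoming test here also prunes ties (`lb >= ub`), since a subproblem whose bound equals the incumbent value cannot give a strictly better solution.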
The figure shows the branch and bound tree for the example problem.

There are a number of important issues in the implementation of a branch and bound algorithm for (MINLP).

The first important issue is how to solve the nonlinear programming relaxations of the subproblems in step 3. If the objective function f and the constraint functions g are convex, while the constraint functions h are linear, then the nonlinear programming subproblems in step 3 are convex and thus relatively easy to solve. A variety of methods have been used to solve these subproblems including generalized reduced gradient (GRG) methods [11], sequential quadratic programming (SQP) [4], active set methods for quadratic programming [8], and interior point methods [16].

However, if the nonlinear programming subproblems are nonconvex, then it can be extremely difficult to solve the nonlinear programming relaxation of S or even obtain a lower bound on the optimal objective function value. For some specialized classes of nonconvex optimization problems, including indefinite quadratic programming, bilinear programming, and fractional linear programming, convex functions which underestimate the nonconvex objective function are known. These convex underestimators are widely used in branch and bound algorithms for nonconvex nonlinear programming problems. Branch and bound techniques for nonconvex continuous optimization problems can also be used within a branch and bound algorithm for nonconvex mixed integer nonlinear programming problems. For instance, the BARON system uses this approach to solve a variety of nonconvex mixed integer nonlinear programming problems [17], [18]. This approach is also used in the GMIN-αBB algorithm to solve nonconvex 0-1 mixed integer nonlinear programming problems with twice differentiable objective and constraint functions [1].

The choice of the next subproblem to be solved in step 2 can have a significant influence on the performance of the branch and bound algorithm. In mixed integer linear programming, a variety of heuristics are employed to select the next subproblem [2]. One popular heuristic used in branch and bound algorithms for MILP is the 'best bound rule', in which the subproblem with the smallest lower bound is selected. The best bound rule is widely used within branch and bound algorithms for (MINLP) [4], [11], [18].

In step 6, there may be a choice of several variables with fractional values to be the branching variable. A simple approach is to select the variable whose value $\hat{y}_k$ is furthest from being an integer [4], [11]. In mixed integer linear programming, estimates of the increase in the objective function that will result from forcing a variable to an integer value are often made. These estimates, called 'pseudocosts' or 'penalties', are used to select the branching variable. Penalties have also been used in branch and bound algorithms for mixed integer nonlinear programming problems [11], [18].

The performance of the branch and bound algorithm can be improved by computing lower bounds on the optimal value of a subproblem without actually solving the subproblem. In [8], lower bounds on the optimal objective value of a subproblem are derived from an optimal dual solution to the subproblem's parent problem. If this lower bound is larger than the objective value of the incumbent solution, then the subproblem can be eliminated from consideration. In [4], Lagrangian duality is used to compute lower bounds during the solution of a subproblem. When the lower bound exceeds the value of the incumbent solution, the current subproblem can be discarded.

Another way to improve the performance of a branch and bound algorithm for (MINLP) is to tighten the formulation of the nonlinear programming subproblems before solving them. In the BARON package, dual information from the solution to a nonlinear programming subproblem is used to restrict the ranges of variables and constraints in the children of the subproblem [17], [18].

In branch and cut approaches, constraints called cutting planes are added to the nonlinear programming subproblems [3], [19]. These additional constraints are selected so that they reduce the size of the feasible region of nonlinear programming subproblems without eliminating any integer solutions from consideration. This tightens the formulations of the subproblems and thus increases the
MINLP: Design and scheduling of batch processes
[15] LAWLER, E.L., AND WOOD, D.E.: 'Branch and bound methods: A survey', Oper. Res. 14, no. 4 (1966), 699-719.
[16] LEE, E.K., AND MITCHELL, J.E.: 'Computational experience of an interior point algorithm in a parallel branch-and-cut framework': Proc. Eighth SIAM Conf. Parallel Processing for Sci. Computing, Minneapolis, March 1997, 1997, www.siam.org/catalog/mcc07/heath97.htm.
[17] RYOO, H.S., AND SAHINIDIS, N.V.: 'A branch-and-reduce approach to global optimization', J. Global Optim. 8, no. 2 (1996), 107-139.
[18] SAHINIDIS, N.V.: 'BARON: A general purpose global optimization software package', J. Global Optim. 8, no. 2 (1996), 201-205.
[19] STUBBS, R.A., AND MEHROTRA, S.: 'A branch-and-cut method for 0-1 mixed convex programming', Math. Program. 80 (1999), 515-532.

Brian Borchers
Dept. Math. New Mexico Tech.
Socorro, NM 87801, USA
E-mail address: borchers@nmt.edu

MSC 2000: 90C11
Key words and phrases: mixed integer programming, branch and bound, MINLP.

MINLP: DESIGN AND SCHEDULING OF BATCH PROCESSES

The design of batch processes has been a major area of research for the past several decades. In conjunction with the design of batch plants, many different approaches have been proposed for the determination of an optimal schedule for the plant. It has been recognized for some time that in order to increase the efficiency of batch processes, the two tasks of design and scheduling should be considered simultaneously.

The problem is to design a batch process consisting of M processing steps, in which N products are made, where all materials follow the same path through the process. This is commonly known as a multiproduct batch plant, or a flow-shop.

There are two predominant methods for formulating the batch process design and scheduling problem. The first is a continuous-time formulation in which the scheduling information is incorporated through a planning horizon constraint. This problem can be formulated as a NLP or MINLP depending on whether the number of parallel units is fixed or variable. The solution of this problem does not give the actual schedule, but does guarantee that a feasible schedule exists. A separate problem, typically a MILP, must be solved to find the actual schedule.

The second method for formulating the batch process design and scheduling problem is based on a state-task-network (STN) representation. In this approach, the planning horizon is discretized into time steps. Each task must be assigned to both a unit and a time slot. The formulation results in a large MINLP whose solution provides both the plant design and the actual schedule.

Continuous-Time Formulations. The early work of [10] was based on the single product campaign (SPC) scheduling policy. In a single product campaign, all batches of one product are processed one after the other, followed by all of the batches of the next product, and so on.

In this approach, the scheduling information is incorporated by way of a planning horizon constraint. This constraint requires that all products must be completed before the planning horizon, H, is reached. In a single product campaign, the time between batches of product i is based on the maximum processing time over all of the stages,

\[ t_{Li} = \max_j (t_{ij}), \]

where $t_{Li}$ is the 'limiting' time for product i. The planning horizon constraint can be written as the sum over all of the products of the limiting time multiplied by the number of batches of each product

\[ \sum_i \frac{Q_i}{B_i}\, t_{Li} \le H, \]

where $Q_i$ is the total production of i and $B_i$ is the batch size for i. Because $Q_i$ and $B_i$ are variables, this results in a NLP.

In [4] the authors formulated the batch process design and scheduling problem as a MINLP. Their model was based on the SPC model of [10]. In this problem, more than one piece of equipment per stage is available for use in parallel. Rather than solve the MINLP rigorously, they relaxed the number of units per stage to be continuous and solved the resulting NLP. [5] formulated the MINLP using binary 0-1 variables and solved it with an outer
Problem formulation. Nj - Z YC j.
(2
For the UIS policy with zero cleanup times, the planning horizon constraint derived by [2] is used,

\[ \sum_i n_i \, pt_{ij} \le H \cdot N_j. \]

5) Logical constraints

- If a stage j exists, then at least one processing task must be assigned to it,

\[ \sum_t Y_{tj} \ge YEX_j. \]

- If a stage j does not exist, there can be no tasks assigned to it,

\[ Y_{tj} \le YEX_j. \]

- If a stage j exists, then one of the tasks assigned to it must be the first task assigned to stage j,

\[ \sum_t YF_{tj} = YEX_j. \]

- There cannot be more than one first task assigned to each stage,

\[ \sum_t YF_{tj} \le 1. \]

- A task can be the first task assigned to a stage only if the task is among those assigned to the stage,

\[ YF_{tj} \le Y_{tj}. \]

- No tasks that occur before the first task assigned to stage j can be among those assigned to the stage,

\[ Y_{t'j} \le 1 - YF_{tj} \quad \text{for } t' < t. \]

- If multiple tasks are assigned to a unit, they must be consecutive tasks,

\[ Y_{tj} \le YF_{tj} + Y_{t-1,j}. \]

The objective is to minimize the cost of the plant. [3] used a fixed-charge cost for each unit, plus a nonlinear cost function on the size of the unit,

\[ \text{Cost} = \sum_j \cdots \]

This formulation is a MINLP where all binary variables participate linearly and separably. However, it is a nonconvex problem due to the cost function, and the bilinear terms in the batch size constraints and the planning horizon constraints. [3] used the outer approximation method implemented in DICOPT ([11]) to solve a number of example problems. Due to the nonconvexities in the formulation, there is no guarantee of global optimality with the outer approximation method, but they report good results for the examples presented in the paper.

Two examples are briefly discussed to illustrate the proposed approach for multiproduct batch plants with a variety of scheduling policies. The first example consists of three products with four processing tasks and five potential units and superunits. The MINLP formulation with the SPC policy contains 33 binary variables and 54 continuous variables. With the ZW policy, the number of binary variables drops to 8, with 98 continuous variables. For the UIS policy, the formulation has 33 binary variables with 51 continuous variables.

The second example is larger and contains 6 products with 7 potential units and superunits. The SPC policy formulation contains 46 binary variables and 101 continuous variables. The MINLP formulation for the ZW policy has 11 binary and 374 continuous variables. The UIS policy formulation has 46 binary and 95 continuous variables. In all cases the examples were solved in less than 50 minutes using GAMS/DICOPT++ on Microvax II.
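The logical constraints on the binary variables can be checked directly for a candidate assignment, which may help when reading them. The sketch below is an illustration only: the names `Y`, `YF` and `YEX` follow the text, but the nested-list data layout, the function name, and the convention that $Y_{t-1,j} = 0$ for the first task are assumptions of this example.

```python
def satisfies_logical_constraints(Y, YF, YEX):
    """Check the task-to-stage logical constraints for 0-1 data.

    Y[t][j]  = 1 if task t is assigned to stage j
    YF[t][j] = 1 if task t is the first task assigned to stage j
    YEX[j]   = 1 if stage j exists
    """
    n_tasks, n_stages = len(Y), len(YEX)
    for j in range(n_stages):
        assigned = sum(Y[t][j] for t in range(n_tasks))
        firsts = sum(YF[t][j] for t in range(n_tasks))
        if assigned < YEX[j]:                 # an existing stage needs a task
            return False
        if firsts != YEX[j] or firsts > 1:    # exactly one first task if the stage exists
            return False
        for t in range(n_tasks):
            if Y[t][j] > YEX[j]:              # no tasks on a nonexistent stage
                return False
            if YF[t][j] > Y[t][j]:            # the first task must itself be assigned
                return False
            if YF[t][j] and any(Y[s][j] for s in range(t)):
                return False                  # nothing assigned before the first task
            previous = Y[t - 1][j] if t > 0 else 0   # assumed boundary convention
            if Y[t][j] > YF[t][j] + previous:
                return False                  # assigned tasks must be consecutive
    return True


# Four tasks, three potential stages; stage 2 is not used.
Y = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0]]
YF = [[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
YEX = [1, 1, 0]
print(satisfies_logical_constraints(Y, YF, YEX))
```

In the MINLP itself these conditions are linear constraints handled by the solver; the function above only evaluates them for fixed 0-1 values.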
The main variables of the formulation are:

a) binary structural variables representing the existence of an equipment;
b) binary allocation variables for the assignment of a task to a unit at the beginning of a time period;
c) continuous variables representing the capacity of a unit;
d) continuous variables corresponding to the batch size of a task to a unit at each time period;

i) they correspond to an approximation of the time horizon; and
ii) they result in an unnecessary increase of the number of binary variables in particular, and in the overall size of the mathematical model.

A continuous-time formulation was proposed in [12], based on the STN representation and the scheduling formulation proposed in [13]. It gives rise to a mixed integer nonlinear programming problem which is solved using a stochastic MINLP optimizer based on an evolutionary algorithm (EA) with simulated annealing (SA) presented in [12]. The method is based on a guided stochastic generation of alternative vectors of decision variables, which explore promising areas of the search space through selection, crossover, and mutation operations applied to individuals in a population of solution candidates. It can be used to deal with nonconvex, nondifferentiable functions although it has no guarantee of convergence to even a local optimal solution. The proposed formulation involves the following basic variables:

• Main design variables representing the discrete decisions of selecting a unit (j), $E_j$, or a storage (s), $E_s$, or continuous decisions corresponding to the capacity of unit, storage or utility, $V_j$, $V_s$, and $U_u$, respectively.
• Main operation variables corresponding to the discrete decision of allocation of task (i) in unit (j) at time $T_l$, $W_{ijl}$, and the decision of assigning task (i) in unit (j) between starting time $T_l$ and end time $T_{l'}$, $X_{ijll'}$, and continuous variables, the time of event (l), $T_l$, the batch size, the processing time and utility requirement of task (i) allocated to unit (j) starting at $T_l$, $B_{ijl}$, $T_{ijl}$, $U_{ijl}$, respectively.

Based on these variables the proposed formulation involves:

1) Processing task models:

which establish the relationship between processing time, $T_{ijl}$, and time of event (l), $T_l$.

\[ 0 \le T_1 \le T_2 \le \cdots \le T_{l_{\max}} \le H, \]

expressing the monotonic increase in event times.

4) Allocation constraints:

[Equation: allocation inequalities over the tasks $i \in I_j$ and events, bounded above by $E_j$; the individual terms are not recoverable from the source.]

\[ W_{ijl} = \sum_{l' > l} X_{ijll'}, \]

expressing the relationship between $W_{ijl}$ and $X_{ijll'}$ operation variables, [13].

5) Material balances written for state s at event time $T_{l'}$:

\[ C_{sl'} = C_{s,l'-1} + \sum_{i \in I_s} \sum_{j \in J_i} \rho^{in}_{sij} \sum_{l < l'} B_{ijl} X_{ijll'} - \sum_{i \in I_s} \sum_{j \in J_i} \rho^{out}_{sij} B_{ijl'}, \]

[Equation: lower bound 0 and capacity upper bound on $C_{sl'}$; the upper bound is not recoverable from the source.]
MINLP: Generalized cross decomposition
Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane algorithm; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with αBB; MINLP: Heat exchanger network synthesis; MINLP: Reactive distillation column synthesis; Generalized Benders decomposition; MINLP: Applications in the interaction of design and control; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems; Job-shop scheduling problem; Stochastic scheduling; Vehicle scheduling.

References
[1] BARBOSA-PÓVOA, A.P.F.D., AND MACCHIETTO, S.: 'Detailed design of multipurpose batch plants', Computers Chem. Engin. 18 (1994), 1014-1042.
[2] BIREWAR, D.B., AND GROSSMANN, I.E.: 'Incorporating scheduling in the optimal design of multiproduct batch plants', Computers Chem. Engin. 13 (1989), 141-161.
[3] BIREWAR, D.B., AND GROSSMANN, I.E.: 'Simultaneous synthesis, sizing and scheduling of multiproduct batch plants', Industr. Engin. Chem. Res. 29 (1990), 2242-2251.
[4] GROSSMANN, I.E., AND SARGENT, R.W.H.: 'Optimum design of multipurpose batch plants', Industr. Engin. Chem. Process Des. Developm. 18 (1979), 343-348.
[5] KOCIS, G.R., AND GROSSMANN, I.E.: 'Computational experience with DICOPT solving MINLP problems in process synthesis engineering', Computers Chem. Engin. 13 (1989), 307-315.
[6] KONDILI, E., PANTELIDES, C.C., AND SARGENT, R.W.H.: 'A general algorithm for short-term scheduling of batch operations - I. MILP formulation', Computers Chem. Engin. 17 (1993), 211-227.
[7] KU, H., AND KARIMI, I.: 'Scheduling in multistage serial batch processes with finite intermediate storage - Part I. MILP formulation; Part II. Approximate algorithms', AIChE Annual Meeting, Miami (1986).
[8] MANUAL: gBSS, general batch scheduling system - User manual and language reference, Imperial College, 1996.
[9] PANTELIDES, C.C.: 'Unified frameworks for the optimal process planning and scheduling', Proc. Second Conf. Foundations of Computer Aided Operations (1994), 253-274.
[10] SPARROW, R.E., FORDER, G.J., AND RIPPIN, D.W.T.: 'The choice of equipment sizes for multiproduct batch plants. Heuristics vs. branch and bound', Industr. Engin. Chem. Process Des. Developm. 14 (1975), 197-203.
[11] VISWANATHAN, J., AND GROSSMANN, I.E.: 'A combined penalty function and outer-approximation method for MINLP optimization', Computers Chem. Engin. 14 (1990), 769-782.
[12] XIA, Q., AND MACCHIETTO, S.: 'Design and synthesis of batch plants - MINLP solution based on a stochastic method', Computers Chem. Engin. 21 (1997), S697-S702.
[13] ZHANG, X., AND SARGENT, R.W.H.: 'The optimal operation of mixed production facilities - general formulation and some solution approaches for the solution', Proc. 5th Internat. Symp. Process Systems Engin. (Kyongju, Korea) (1994), 171-177.

Christodoulos A. Floudas
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: floudas@titan.princeton.edu

S. T. Harding
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA

Marianthi Ierapetritou
Dept. Chemical and Biochemical Engin. Rutgers Univ.
98 Brett Road
Piscataway, NJ 08854, USA
E-mail address: marianth@sol.rutgers.edu

MSC 2000: 90C26
Key words and phrases: batch process, design, scheduling, continuous and discrete time models.

MINLP: GENERALIZED CROSS DECOMPOSITION

Decomposition methods, such as the classical Benders decomposition (cf. Generalized Benders decomposition), [1], and Dantzig-Wolfe decomposition, [3], have been used to solve many different large structured optimization problems, by decomposing them with the help of relaxation of constraints or fixation of variables. The success of such an approach depends very much on the structure of the problem. In some cases these methods are very efficient, but in other cases they are not competitive with other techniques.

However, the simple elegance of these basic principles has inspired many researchers to propose
modifications of the basic methods, mostly aimed at improving the efficiency of the methods, but also aimed at extending the applicability of the approaches.

Dantzig-Wolfe decomposition, originally for linear programming problems, [3], has been extended to convex nonlinear programming problems, [2], under several names, for example generalized linear programming. We will here simply use the term 'nonlinear Dantzig-Wolfe decomposition'.

Benders decomposition, originally for linear mixed integer programming problems, [1], has been extended to partly convex nonlinear programming problems, [5], under the name 'generalized Benders decomposition'.

On the other hand, among the numerous suggestions for modifications to increase the efficiency, there is one which in a way shares the simplicity and clear principle of the basic methods, namely cross decomposition, [11]. Usually described as a combination of Benders decomposition and Dantzig-Wolfe decomposition, simultaneously using the two methods in an iterative manner, the method borrows its basic convergence properties from these two methods. However, one can also view cross decomposition as the more general method, and Benders and Dantzig-Wolfe decomposition as modifications of cross decomposition, obtained by excluding one of the subproblems and one of the master problems.

Cross decomposition was originally developed for linear mixed integer programming problems, [11], but the approach is more general and not restricted to such problems. The first application of cross decomposition was to the capacitated facility location problem, [12], and produced a solution method which is recognized as one of the most efficient existing methods for that problem. However, another early application was to the stochastic transportation problem (a convex problem with linear parts), [10].

Here we will describe 'generalized cross decomposition', which was first proposed in [6], and more thoroughly treated in [7]. The generalization of the procedure, parallel to that in [5] for generalized Benders decomposition, enables the solving of nonlinear programming problems with convex parts, for example nonlinear mixed integer programming problems, see for example [4].

The Problem. Consider the following general optimization problem.

$$
\begin{aligned}
v^* = \min\ & f(x, y) \\
\text{s.t.}\ & G_1(x, y) \le 0, \\
& G_2(x, y) \le 0, \\
& x \in X, \\
& y \in Y,
\end{aligned}
\tag{P}
$$

where $X$ and $Y$ are compact, nonempty sets. Assume that $X$ is convex and $f$, $G_1$ and $G_2$ are proper convex functions in $x$ for any fixed $y \in Y$, i.e. that the problem is convex in $x$. Also assume that $f$, $G_1$ and $G_2$ are bounded and Lipschitzian on $(X, Y)$. Note that we do not assume any convexity in the $y$-variables. An important case is when $Y$ is a (finite) set of integers.

Furthermore we assume the following (as was done in [5] for generalized Benders decomposition). The optimization with respect to $x$ of the Lagrangian functions must be possible to do 'essentially independent' of $y$ (called property P by A.M. Geoffrion). We therefore assume that the functions $q_1$, $q_2$, $q_3$ and $q_4$ exist, such that

$$f(x, y) + u_1^T G_1(x, y) + u_2^T G_2(x, y) = q_1(q_3(x, u), y, u), \quad \forall x, y, u,$$

and

$$\tilde{u}_1^T G_1(x, y) + \tilde{u}_2^T G_2(x, y) = q_2(q_4(x, \tilde{u}), y, \tilde{u}), \quad \forall x, y, \tilde{u},$$

where $q_3$ and $q_4$ are scalar functions, $q_1$ and $q_2$ are increasing in their first argument, and $\tilde{u}$ is assumed to belong to the set of all possible nonnegative, normalized directions $C = \{\tilde{u} \ge 0 : e^T \tilde{u} = 1\}$, where $e$ is a vector of ones. Since $f$, $G_1$ and $G_2$ are convex in $x$ and bounded and Lipschitzian on $(X, Y)$, the same applies to $q_1$ for any fixed $u \ge 0$, and to $q_2$ for any fixed $\tilde{u} \in C$.

The optimal solution of (P) is denoted by $(x^*, y^*)$. We will also mention the case when (P) is convex, i.e. where $f$, $G_1$ and $G_2$ are convex functions (in $y$ too) and $Y$ is a convex set. Lagrangian duality can be used to get a dual solution (the optimal Lagrange multipliers), denoted by $u^* = (u_1^*, u_2^*)$.

Let us for convenience introduce the following notation.

$$L(x, y, u) = f(x, y) + u_1^T G_1(x, y) + u_2^T G_2(x, y),$$
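$$\tilde{L}(x, y, \tilde{u}) = \tilde{u}_1^T G_1(x, y) + \tilde{u}_2^T G_2(x, y),$$

$$L_1(x, y, u_1) = f(x, y) + u_1^T G_1(x, y),$$

$$\tilde{L}_1(x, y, \tilde{u}_1) = \tilde{u}_1^T G_1(x, y),$$

$$V = \{\, y \in Y : \exists x \in X : G_1(x, y) \le 0,\ G_2(x, y) \le 0 \,\}.$$

The problem is convex in $x$, so we can use Lagrangian duality to describe $V$ as

$$V = \Big\{\, y \in Y : \max_{\tilde{u} \in C} \min_{x \in X} \tilde{L}(x, y, \tilde{u}) \le 0 \,\Big\}.$$

The full primal master problem is given below:

$$
\begin{aligned}
v^* = \min\ & q \\
\text{s.t.}\ & q \ge \min_{x \in X} L(x, y, u), \quad \forall u \ge 0, \\
& 0 \ge \min_{x \in X} \tilde{L}(x, y, \tilde{u}), \quad \forall \tilde{u} \in C, \\
& y \in Y.
\end{aligned}
$$

This problem has an infinite number of constraints, one for each nonnegative dual point and one for each nonnegative dual direction. Each constraint contains an optimization problem (minimization with respect to $x$), which should in theory be solved for all $y \in Y$ before the main problem, $\min_{y \in Y} h(y)$, can be solved. However, we have

$$\min_{x \in X} L(x, y, u) = q_1\Big(\min_{x \in X} q_3(x, u),\ y,\ u\Big)$$

and

$$\min_{x \in X} \tilde{L}(x, y, \tilde{u}) = q_2\Big(\min_{x \in X} q_4(x, \tilde{u}),\ y,\ \tilde{u}\Big).$$

Since $q_1$ and $q_2$ are proper, convex, bounded and Lipschitzian on $X$, and $X$ is compact and convex, the minima over $x$ are attained. In a relaxed master problem only finite sets of dual points, $u^{(k)}$, $k \in P_u$, and dual directions, $\tilde{u}^{(k)}$, $k \in R_u$, are included, which gives the constraints

$$q \ge q_1\Big(\min_{x \in X} q_3(x, u^{(k)}),\ y,\ u^{(k)}\Big), \quad \forall k \in P_u,$$

$$0 \ge q_2\Big(\min_{x \in X} q_4(x, \tilde{u}^{(k)}),\ y,\ \tilde{u}^{(k)}\Big), \quad \forall k \in R_u.$$

The minimizations over $x$ can be done independently of $y$, since the remaining arguments in $q_3$ and $q_4$, namely $u$ and $\tilde{u}$, are fixed. We use the notation $x^{(k)}$, $\forall k \in P_u$, and $\tilde{x}^{(k)}$, $\forall k \in R_u$, for the minimizers of $q_3$ and $q_4$.

Inserting this, we obtain the final form of the relaxed primal master problem.

$$
\begin{aligned}
v_{PM} = \min\ & q \\
\text{s.t.}\ & q \ge L(x^{(k)}, y, u^{(k)}), \quad \forall k \in P_u, \\
& 0 \ge \tilde{L}(\tilde{x}^{(k)}, y, \tilde{u}^{(k)}), \quad \forall k \in R_u, \\
& y \in Y.
\end{aligned}
\tag{PM}
$$

The constraints in the first set are called value cuts, and those in the second set are called feasibility cuts.

The Dual Master Problem. Using Lagrangian duality on (P) yields a relaxation and a lower bound, $v_L$, on $v^*$:

$$v_L = \max_{u_1 \ge 0} g(u_1)$$

where, $\forall u_1 \ge 0$,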
$$
\begin{aligned}
g(u_1) = \min\ & L_1(x, y, u_1) \\
\text{s.t.}\ & G_2(x, y) \le 0, \\
& x \in X, \\
& y \in Y.
\end{aligned}
$$

This leads to a dual master problem, which is a convexification of the problem. If (P) is not convex a duality gap might occur. We denote the subset of the solutions that are included by $(x^{(k)}, y^{(k)})$, $\forall k \in P_x$, and obtain the restricted dual master problem as

$$
\begin{aligned}
v_{DM} = \max\ & q \\
\text{s.t.}\ & q \le L_1(x^{(k)}, y^{(k)}, u_1), \quad \forall k \in P_x, \\
& u_1 \ge 0.
\end{aligned}
\tag{DM}
$$

The Subproblems. The primal subproblem is a convex problem in $x$, obtained by fixing $y$ to $\bar{y}$.

$$
\begin{aligned}
h(\bar{y}) = \min\ & f(x, \bar{y}) \\
\text{s.t.}\ & G_1(x, \bar{y}) \le 0, \\
& G_2(x, \bar{y}) \le 0, \\
& x \in X.
\end{aligned}
\tag{PS}
$$

A solution to (PS) is assumed to consist of both a primal solution, $x^{(k)}$, and a dual solution, $(u_1^{(k)}, u_2^{(k)})$. Due to the convexity we can use Lagrangian duality without creating a duality gap.

$$h(\bar{y}) = \sup_{u \ge 0} \min_{x \in X} L(x, \bar{y}, u). \tag{PSL}$$

If (PS) is infeasible, (PSL) will be unbounded in $u$, and a solution is represented by a direction, $\tilde{u}^{(k)}$. A valid cut for the primal master problem also requires a corresponding primal solution, $\tilde{x}^{(k)}$, obtained by solving

$$\min_{x \in X} \tilde{L}(x, \bar{y}, \tilde{u}^{(k)}).$$

(Note that $\tilde{x}^{(k)}$ is not feasible in (PS).)

The dual subproblem is the following (nonconvex) problem, obtained by relaxing the first set of constraints in (P) and fixing the Lagrange multipliers $u_1$ to $\bar{u}_1$:

$$
\begin{aligned}
g(\bar{u}_1) = \min\ & L_1(x, y, \bar{u}_1) \\
\text{s.t.}\ & G_2(x, y) \le 0, \\
& x \in X, \\
& y \in Y.
\end{aligned}
\tag{DS}
$$

To handle unbounded dual solutions, $\tilde{u}_1$, we can use the following subproblem:

$$
\begin{aligned}
\tilde{g}(\tilde{u}_1) = \min\ & \tilde{L}_1(x, y, \tilde{u}_1) \\
\text{s.t.}\ & G_2(x, y) \le 0, \\
& x \in X, \\
& y \in Y.
\end{aligned}
\tag{UDS}
$$

(UDS) does not produce a bound on $v^*$, but if $\tilde{g}(\tilde{u}_1) \le 0$ it yields a dual cut that will eliminate $\tilde{u}_1$.

The Cross Decomposition Algorithm. In the subproblem phase of the cross decomposition method we iterate between the primal subproblem (PS) and the dual subproblem (DS) (or (UDS)). The primal subproblem, (PS), supplies an upper bound, $h(\bar{y})$, on $v^*$, and $\bar{u}_1$ for the dual subproblem. The dual subproblem, (DS), supplies a lower bound, $g(\bar{u}_1)$, on $v^*$, and $\bar{y}$ for the primal subproblem. If (PS) has an unbounded solution, $\tilde{u}_1$, we use (UDS) (instead of (DS)) to get $\bar{y}$.

Unfortunately, the lack of controllability for the important parts of the solutions, $\bar{y}$ and $\bar{u}_1$, which occurs unless the problem is strictly convex, implies that this procedure alone cannot be expected to converge to the optimal solution.

We therefore need to use the master problems to ensure convergence. (PM) or (DM) can be solved with all the constraints generated by the subproblem solutions. We have all the known results for generalized Benders or nonlinear Dantzig-Wolfe decomposition to fall back on, so this technique is well known. After the solution of one master problem, the subproblem phase is reentered. (We do not switch to Benders or Dantzig-Wolfe decomposition completely.)

We will later describe convergence tests that tell us exactly when to use a master problem. The existence of such convergence tests is a very important aspect of cross decomposition. Let us, before getting any further, give below a short algorithm for cross decomposition.

Let us denote the convergence test in step 3 (before (PS)) by CTP and the convergence test in step 6 (before (DS)) by CTD. The optimality tests (step 2 and step 5) are included in the convergence tests, and the decision about where to go
is based on the results of both tests. The algorithm is pictured in Fig. 1.

If CTP indicates that (PS) will not give further convergence, we use (PM). If CTD indicates failure of convergence for (DS), we can use (DM) (which however gives certain convergence only if (P) is convex). After (PM) we go to (PS) and after (DM) we go to (DS), in order to make use of the output of the master problems. In the general nonconvex case, it is not necessary to use (DM). It is even possible to omit the convergence tests CTD if only (PM) is used.

The Convergence Tests. Returning to the question of convergence in the subproblem phase, we make the following definitions of $\varepsilon$-improvements.

'$\varepsilon$-bound-improvement' is an improvement of at least $\varepsilon$ of the upper or lower bound.

'$\varepsilon$-cut-improvement' is a generation of a new, so far unknown cut, that is at least $\varepsilon$ better (i.e. has a value of at least $\varepsilon$ higher or lower) than all known cuts at some point.

Discussing linear mixed integer problems, as in [11], one can let $\varepsilon = 0$. In such a case we simply omit $\varepsilon$ from the above notation.

Cut-improvement thus means that a new cut will be included in one of the restricted master problems and that the description of the functions $h(y)$ or $g(u_1)$ or the set $Y$ is refined.

[...]

• Can $\bar{y}$ give a bound-improvement in (PS)?
• Can $\bar{u}_1$ give a bound-improvement in (DS)?

Testing extreme rays, $\tilde{u}_1$, for convergence, we note that the subproblem (UDS) cannot give bound-improvement. We call the test of unbounded solutions CTDU.

We now give the convergence tests, CT, with strict inequalities, following [11]. Here $\bar{v}$ and $\underline{v}$ denote the best known upper and lower bounds on $v^*$.

CTP: If $L(x^{(k)}, \bar{y}, u^{(k)}) < \bar{v}$, $\forall k \in P_u$, and $\tilde{L}(\tilde{x}^{(k)}, \bar{y}, \tilde{u}^{(k)}) < 0$, $\forall k \in R_u$, then $\bar{y}$ will give primal improvement. If not, use a master problem.

CTD: If $L_1(x^{(k)}, y^{(k)}, \bar{u}_1) > \underline{v}$, $\forall k \in P_x$, then $\bar{u}_1$ will give dual improvement. If not, use a master problem.

CTDU: If $\tilde{L}_1(x^{(k)}, y^{(k)}, \tilde{u}_1) > 0$, $\forall k \in P_x$, then $\tilde{u}_1$ will give dual cut-improvement. If not, use a master problem.

We call CTD and the first part of CTP value convergence tests and CTDU and the second part of CTP feasibility convergence tests. This conforms to the notation of value and feasibility cuts in the master problems.

One can show that the convergence tests CTP and CTD are necessary for bound-improvement and sufficient for cut- or bound-improvement, see
[7]. The convergence tests CTDU are sufficient for cut-improvement.

However, there can be an infinite number of primal and/or dual improvements, so one cannot be certain that CT will fail within a finite number of steps. For this reason it is necessary to consider $\varepsilon$-improvements.

We need the following $\varepsilon$-convergence tests, CT$\varepsilon$:

CTP$\varepsilon$: If $L(x^{(k)}, \bar{y}, u^{(k)}) < \bar{v} - \varepsilon$, $\forall k \in P_u$, and $\tilde{L}(\tilde{x}^{(k)}, \bar{y}, \tilde{u}^{(k)}) < -\varepsilon$, $\forall k \in R_u$, then $\bar{y}$ will give primal $\varepsilon$-improvement. If not, use a master problem.

CTD$\varepsilon$: If $L_1(x^{(k)}, y^{(k)}, \bar{u}_1) \ge \underline{v} + \varepsilon$, $\forall k \in P_x$, then $\bar{u}_1$ will give dual $\varepsilon$-improvement. If not, use a master problem.

CTDU$\varepsilon$: If $\tilde{L}_1(x^{(k)}, y^{(k)}, \tilde{u}_1) \ge \varepsilon$, $\forall k \in P_x$, then $\tilde{u}_1$ will give dual $\varepsilon$-cut-improvement. If not, use a master problem.

The $\varepsilon$-value convergence tests correspond to the value cuts of the master problems, and the $\varepsilon$ used corresponds directly to a change of $\varepsilon$ of the bounds ($\varepsilon$-bound-improvement). The $\varepsilon$-feasibility convergence tests, on the other hand, correspond to feasibility cuts of the master problems, and the $\varepsilon$ used corresponds to the 'infeasibility' it gives some previously feasible points, which is what we call $\varepsilon$-cut-improvement for feasibility cuts. While these $\varepsilon$-tests are sufficient for $\varepsilon$-improvement, they are not necessary. To prove necessity would require an inverse Lipschitz assumption, namely that for points a certain distance apart, the value of a function (the feasibility cut) should differ by at least a certain amount. The following result is proved in [7].

The $\varepsilon$-value convergence tests of CTP$\varepsilon$, the feasibility convergence tests of CTP and the $\varepsilon$-convergence tests CTD$\varepsilon$ are necessary for $\varepsilon$-bound-improvement. The $\varepsilon$-convergence tests CT$\varepsilon$ are sufficient for $\varepsilon$-bound- or $\varepsilon$-cut-improvement, in the sense that they are sufficient for one of the following.

I) $\varepsilon$-bound-improvement.
II) $\varepsilon$-cut-improvement.
III) $\varepsilon_1$-bound-improvement and $\varepsilon_2$-cut-improvement, where $\varepsilon_1 + \varepsilon_2 = \varepsilon$.

Now it is possible to verify finiteness of the convergence tests. A formal proof for this can be found in [7]. The following reasoning is used.

When the bounded set $Y$ is completely described with an accuracy better than $\varepsilon$ by either value cuts or feasibility cuts, the $\varepsilon$-convergence tests will fail (if not earlier). Each time the $\varepsilon$-convergence tests do not fail, we will get improvement according to one of the three cases mentioned above.

A finite number of $\varepsilon$-bound-improvements is obviously sufficient to decrease the finite distance between $\bar{v}$ and $v^*$ to less than $\varepsilon$. After an $\varepsilon$-cut-improvement, the new cut describes $h(y)$ with an accuracy better than $\varepsilon$ in the area around $\bar{y}$ where $h(y) < L(x^{(l)}, y, u^{(l)}) + \varepsilon$. Due to the Lipschitzian property of the functions $f$, $G_1$ and $G_2$, there is a least distance, $\delta$, proportional to $\varepsilon$, from $\bar{y}$ to any point $y$ violating this inequality, and the $\varepsilon$-convergence tests will fail for any point with a distance to $\bar{y}$ less than $\delta$. The bounded set $V$ can be completely covered by a finite number of such areas.

In the third case, an $\varepsilon_1$-bound-improvement together with an $\varepsilon_2$-cut-improvement, where $\varepsilon_1 + \varepsilon_2 = \varepsilon$, we can ignore the smaller of $\varepsilon_1$ and $\varepsilon_2$, leaving us with the other one greater than or equal to $\varepsilon/2$. This yields one of the two cases above, so after exchanging $\varepsilon$ for $\varepsilon/2$ finiteness is still assured.

For unbounded solutions to (PS), any $y$ satisfying $\tilde{L}(\tilde{x}^{(l)}, y, \tilde{u}^{(l)}) \ge -\varepsilon$ will make the $\varepsilon$-convergence tests fail, and because of the Lipschitzian property of $G_1$ and $G_2$ there is a least distance, $\delta$ (proportional to $\varepsilon$), from $\bar{y}$ to any $y$ not making the $\varepsilon$-convergence tests fail. Thus an area of a certain least size is made 'infeasible', and the bounded set $Y \setminus V$ can be covered by a finite set of such areas. Thus CTP$\varepsilon$ will fail within a finite number of steps.

Note that it is enough that CTP$\varepsilon$ fails. To obtain finiteness we do not need to use CTD$\varepsilon$, even if it might be useful in practice. We cannot show that CTD$\varepsilon$ will fail within a finite number of steps. Dual $\varepsilon$-bound-improvement can only occur a finite number of times, but dual $\varepsilon$-cut-improvement can occur an infinite number of times, since the area to be covered by the cuts is the nonnegative orthant of $u_1$.

We therefore require that (PM) is used regularly. (One could even skip (DM) completely.) The
following is our main result.

THEOREM 1 The generalized cross decomposition algorithm equipped with $\varepsilon$-convergence tests CT$\varepsilon$ finds an $\varepsilon$-optimal solution to (P) in a finite number of steps, if the generalized Benders decomposition algorithm does. □

All the results for generalized Benders decomposition can be directly used for generalized cross decomposition, especially the following two.

In [5] it is shown that generalized Benders decomposition has finite exact convergence if $Y$ is a finite discrete set. The worst case is solving the primal subproblem with each possible $y \in Y$, which will give a perfect description of $h(y)$ and $V$ on $Y$. Therefore we know that if $Y$ is a finite discrete set, the generalized cross decomposition algorithm will solve (P) exactly in a finite number of steps.

It is also shown in [5] that generalized Benders decomposition terminates in a finite number of steps to an $\varepsilon$-optimal solution, i.e. where $\bar{v} - \underline{v} < \varepsilon$ for any given $\varepsilon > 0$, if the set of interesting $(u_1, u_2)$-solutions (possible optimal solutions to the primal subproblem) is bounded and $Y \subseteq V$. This makes the primal feasibility cuts (and the corresponding convergence tests) unnecessary. So for generalized cross decomposition, we know the following.

If $h(y)$ is bounded from above for all $y \in Y$, i.e. (PS) has a feasible solution for every $y \in Y$, then the cross decomposition algorithm (without UDS and the $\varepsilon$-feasibility convergence tests of CT$\varepsilon$) will yield finite $\varepsilon$-convergence, i.e. yield $\bar{v} - \underline{v} < \varepsilon$ in a finite number of steps, for any given $\varepsilon > 0$.

If $Y \not\subseteq V$ one might get asymptotic convergence of the feasibility cuts, i.e. solutions getting closer and closer to the feasible set, but never actually becoming feasible. If one is reluctant to base a stopping criterion on $\varepsilon$-feasible solutions, one could use penalty functions, which transform feasibility cuts to value cuts and give better possibilities of handling cases where $Y \not\subseteq V$. One could also use artificial variables for this purpose. As for nonlinear penalty function techniques, one should not forget the Lipschitzian assumption made.

The practical motivation behind cross decomposition is to replace the hard primal master problem with the easier dual subproblem to the largest possible extent. Therefore the theoretical result that generalized cross decomposition equipped with $\varepsilon$-convergence tests does not have asymptotically weaker convergence than generalized Benders decomposition is quite satisfactory.

Finally one might mention that these approaches have also been applied to pure (not mixed) integer programming problems in [8] (nonlinear) and [9] (linear). In such cases, various duality gaps appear, and exact solution is not possible. However, the approach may be useful for obtaining good bounds on the objective function value, which are to be used in branch and bound methods.

See also: Decomposition principle of linear programming; Generalized Benders decomposition; MINLP: Logic-based methods; Simplicial decomposition algorithms; Stochastic linear programming: Decomposition and cutting planes; Simplicial decomposition; Successive quadratic programming: Decomposition methods; Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; MINLP: Outer approximation algorithm; Generalized outer approximation; Extended cutting plane algorithm; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with αBB; MINLP: Heat exchanger network synthesis; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems.

References
[1] BENDERS, J.F.: 'Partitioning procedures for solving mixed-variables programming problems', Numerische Math. 4 (1962), 238-252.
[2] DANTZIG, G.B.: Linear programming and extensions, Princeton Univ. Press, 1963.
[3] DANTZIG, G.B., AND WOLFE, P.: 'Decomposition principle for linear programs', Oper. Res. 8 (1960), 101-111.
[4] FLOUDAS, C.A.: Nonlinear and mixed-integer optimization: Fundamentals and applications, Oxford Univ. Press, 1995.
[5] GEOFFRION, A.M.: 'Generalized Benders decomposition', J. Optim. Th. Appl. 10 (1972), 237-260.
[6] HOLMBERG, K.: 'Decomposition in large scale mathematical programming', PhD Thesis Dept. Math. Linköping Univ. (1985).
[7] HOLMBERG, K.: 'On the convergence of cross decomposition', Math. Program. 47 (1990), 269-296.
[8] HOLMBERG, K.: 'Generalized cross decomposition applied to nonlinear integer programming problems: Duality gaps and convexification in parts', Optim. 23 (1992), 341-356.
[9] HOLMBERG, K.: 'Cross decomposition applied to integer programming problems: Duality gaps and convexification in parts', Oper. Res. 42, no. 4 (1994), 657-668.
[10] HOLMBERG, K., AND JÖRNSTEN, K.: 'Cross decomposition applied to the stochastic transportation problem', Europ. J. Oper. Res. 17 (1984), 361-368.
[11] ROY, T.J. VAN: 'Cross decomposition for mixed integer programming', Math. Program. 25 (1983), 46-63.
[12] ROY, T.J. VAN: 'A cross decomposition algorithm for capacitated facility location', Oper. Res. 34 (1986), 145-163.

Kaj Holmberg
Dept. Math. Linköping Inst. Technol.
SE-581 83 Linköping, Sweden
E-mail address: kahol@mai.liu.se

MSC2000: 90C11, 90C30, 49M27
Key words and phrases: decomposition, primal-dual, nonlinear, mixed integer.

MINLP: GLOBAL OPTIMIZATION WITH αBB

The αBB global optimization algorithm for continuous twice-differentiable NLPs (cf. αBB algorithm) [2], [4], [5], [6], [8], [18] can be used to design global optimization algorithms for mixed integer nonconvex problems [1], [3], [7]. One such algorithm, the special structure mixed integer αBB algorithm (SMIN-αBB) is designed to address the class of MINLPs in which all the integer variables are binary variables that participate in linear or mixed-bilinear terms and in which the nonconvex functions in the continuous variables have continuous second order derivatives. This algorithm is an extension of the αBB algorithm and branching is performed on both the continuous and the binary variables. A second algorithm, the general structure mixed integer αBB algorithm (GMIN-αBB), guarantees convergence to the global optimum of a much broader class of problems. The integer variables may participate in the problem in a very general way, provided that the continuous relaxation of the MINLP is $C^2$ continuous. This article describes both algorithms.

The SMIN-αBB Algorithm. The SMIN-αBB algorithm [1], [3], [7] guarantees finite $\varepsilon$-convergence to the global solution of MINLPs belonging to the class

$$
\begin{aligned}
\min_{x, y}\ & f(x) + x^T A_f y + c_f^T y \\
\text{s.t.}\ & g_i(x) + x^T A_{g,i} y + c_{g,i}^T y \le 0, \quad i = 1, \dots, m, \\
& h_i(x) + x^T A_{h,i} y + c_{h,i}^T y = 0, \quad i = 1, \dots, p, \\
& x \in [x^L, x^U], \\
& y \in \{0, 1\}^q,
\end{aligned}
\tag{1}
$$

where $f(x)$, $g(x)$, and $h(x)$ are continuous, twice-differentiable functions, $m$ is the number of inequality constraints, $p$ is the number of equality constraints, $q$ is the dimension of the binary variable vector, $A_f$, $A_{g,i}$ and $A_{h,i}$ are $n \times q$ matrices, and $c_f$, $c_{g,i}$ and $c_{h,i}$ are $q$-dimensional vectors.

The main features of any branch and bound algorithm are the strategy used to generate valid lower and upper bounds for the problem and the selection criteria for the branching node and the branching variable. Optionally, a procedure to tighten the variable bounds may be considered. Each one of these issues is examined in the context of the SMIN-αBB algorithm.

Generation of Valid Upper and Lower Bounds. A local solution of the nonconvex MINLP (1) using one of the algorithms described in [13] constitutes a valid upper bound on the global optimum solution of that problem. The generalized Benders decomposition (GBD) [10], [14] or a standard MINLP branch and bound algorithm (B&B) [9], [11], [15], [19], [20] may be used to obtain such a solution. When there are no mixed-bilinear terms, the outer approximation with equality relaxation (OA/ER) [12], [16] may also be used. Alternatively, the binary variables may be fixed to a combination of
0 and 1 values and the resulting nonconvex NLP may be solved locally.

A relaxed problem which can be solved to global optimality must be constructed from problem (1) in order to obtain a valid lower bound. The class of MINLPs in which the continuous functions $f(x)$, $g_i(x)$, and $h_i(x)$ are convex can be solved to global optimality using the GBD or B&B algorithms, and, when there are no mixed-bilinear terms, the OA/ER algorithm. To identify a guaranteed lower bound on the solution of the problem, it therefore suffices to construct convex underestimators for the nonconvex functions $f(x)$, $g_i(x)$, and $h_i(x)$, and to solve the resulting problem with one of these algorithms. The rigorous convexification/relaxation strategy used in the αBB algorithm for nonconvex continuous problems [2], [4], [5], [6] allows the construction of the desired lower bounding MINLP. This scheme is based on a decomposition of the functions into a sum of terms with special mathematical structure, such as linear, convex, bilinear, trilinear, fractional, fractional trilinear, univariate concave and general nonconvex terms. A different convex relaxation technique is then applied for each class of term. The fact that a summation of convex functions is itself a convex function is then used to construct overall function underestimators and arrive at a convex lower bounding MINLP.

Selection of Branching Node. A list of the lower bounds on all the nodes that have not yet been explored during the branch and bound procedure is maintained. A number of approaches can be used to select the next branching node, such as depth-first, breadth-first or smallest lower bound first. Since the purpose of the algorithm is to identify the global solution of the problem, all promising regions, that is, all regions for which the lower bound is less than or equal to the best upper bound on the solution, must be explored. The strategy that usually minimizes the number of nodes to be examined and therefore the CPU requirements of the algorithm is used to choose the next branching node in the SMIN-αBB algorithm. Thus, the node with the smallest lower bound is selected.

Selection of Branching Variable. Several strategies can be used to select the next variable to be branched on. If a continuous variable is judiciously chosen, the partition results in an improvement of the lower bound on the problem through a tightening of the convex relaxation of the nonconvex continuous functions. Binary variables have an indirect effect on the quality of the convex underestimators as they influence the range of values that the continuous variables can take on.

A first branching variable selection scheme exploits the direct relationship between the range of the continuous variables and the quality of the lower bounds and therefore branches only on these variables. One of the rules available for the αBB algorithm [2] is used for the selection. These are based on the size of the variable ranges, or on a measure of the quality of the underestimator for each term, or on a measure of each variable's overall contribution to the quality of the underestimators.

A second approach aims to first tackle the combinatorial aspects of the problem by branching only on binary variables for the first $q$ levels of the branch and bound tree, where $q$ is the number of binary variables. The nonconvexities are dealt with on subsequent levels of the tree, by branching on the continuous variables. The specific binary variable used for branching is chosen randomly or from a priority assigned on the basis of its effect on the structure of the problem. In particular, the binary variables that influence the bounds on the greatest number of variables are given the highest priorities. Once all the binary variables have been fixed, the problems that must be considered are continuous nonconvex and convex problems for the upper and lower bound respectively. The bounding of the nodes below level $q$ is therefore less computationally intensive than above that level.

A third approach also involves branching on the continuous and binary variables although the choice is no longer based on the level in the tree. To increase the impact of binary variable branching on the quality of the lower bound, such a variable is selected when a continuous relaxation of the problem indicates that the two child nodes will have significantly different lower bounds, and that one of them may even be infeasible. Thus, if one of the binary variables is close to 0 or 1 at a local solution of the continuous relaxation, it
is branched on. The degree of closeness is an arbitrary parameter which can typically be set to 0.1 or 0.2. If no 'almost-integer' binary variable is found, a continuous variable is selected for branching. In general, this hybrid strategy results in a faster improvement in the lower bounds than the second approach, but it is more computationally intensive because a continuous relaxation must be solved before selecting a branching variable and a larger number of MINLP nodes may be encountered during the branch and bound search.

Variable Bound Updates. The tightening of variable bounds is a very important step because of [...] updates are beneficial in two ways. First, they indirectly lead to the construction of tighter underestimators as they affect the continuous variable bounds. Second, they allow a binary variable to be fixed and therefore decrease the number of combinations that potentially need to be explored. An interval-based strategy can be used to carry out binary variable bound updates. Given the current upper bound $\bar{f}^*$ on the global optimum solution, the feasible region $F$ is defined by the constraints appearing in the nonconvex problem, a new constraint $f(x) + x^T A_f y + c_f^T y \le \bar{f}^*$, and the box $(x, y) \in [x^L, x^U] \times [y^L, y^U]$. Consider a variable $y_i \in \{0, 1\}$ whose bounds are being updated. The procedure above is used.

Algorithmic Procedure. The algorithmic procedure for the SMIN-αBB algorithm is as follows:

PROCEDURE SMIN-αBB algorithm()
  Decompose functions in problem;
  Set tolerance $\varepsilon$;
  Set $\underline{f}^* = \underline{f}^0 = -\infty$ and $\bar{f}^* = \bar{f}^0 = +\infty$;
  Initialize list of lower bounds $\{\underline{f}^0\}$;
  [...]

Pseudocode for the SMIN-αBB algorithm.

In order to illustrate the algorithmic procedure, a small example proposed in [17] is used. It is a simple design problem where one of two reactors must be chosen to produce a given product at the lowest possible cost. It involves two binary variables, one for each reactor, and seven continuous variables. The formulation is:
The SMIN-αBB algorithm is especially effective for chemical process synthesis problems such as distillation network or heat exchanger network synthesis [1], [3].

The GMIN-αBB Algorithm. The GMIN-αBB algorithm is designed to address the broad class of problems represented by

\[
\begin{array}{ll}
\min\limits_{x,y} & f(x, y) \\
\text{s.t.} & g(x, y) \le 0 \\
 & h(x, y) = 0 \\
 & x \in [x^L, x^U] \\
 & y \in [y^L, y^U] \cap \mathbb{N}^q
\end{array}
\tag{2}
\]

where $f(x, y)$, $g(x, y)$, and $h(x, y)$ are functions whose continuous relaxation is twice continuously differentiable.

The GMIN-αBB algorithm [2], [3], [7] extends the applicability of the standard branch and bound approaches for MINLPs [9], [11], [13], [15], [19], [20] by making use of the αBB algorithm. The most crucial characteristics of the algorithm are the branching strategy, the derivation of a valid lower bound on problem (2), and the variable bound update strategies.

Branching Variable Selection. Branching in the GMIN-αBB algorithm is carried out on the integer variables only. When it is a bisection, the partition takes place either at the midpoint of the range of the selected variable, or at the value of that variable at the solution of the lower bounding problem. It is also possible to branch on more than one variable at a given node, or to perform k-section on one of the variables. More than two child nodes may be created from a parent node when the structure of the problem is such that the bounds on a small fraction of the integer variables affect the bounds on many of the other variables in the problem. As in the SMIN-αBB algorithm, an integer variable is chosen randomly or according to branching priorities. An additional rule consists of selecting the most or least fractional variable at the solution of a continuous relaxation of the problem.

Generation of a Valid Lower Bound. A guaranteed lower bound on the global solution of the current node of the branch and bound tree is obtained by solving a continuous relaxation of the nonconvex MINLP at that node. When the integer variables that have not yet been fixed are allowed to vary continuously between their bounds, the problem becomes a nonconvex NLP. The validity of the lower bound can only be ensured if the global solution of this nonconvex NLP is identified or if a lower bound on this solution is found. On the other hand, when all integer variables have been fixed to integer values at a node, no additional partitioning of this node can take place and the global optimum solution of the nonconvex NLP is required to guarantee convergence of the GMIN-αBB algorithm. Based on these conditions, the αBB algorithm can be used as a subroutine to generate valid lower bounds:

• If at least one integer variable can be relaxed at the current node, run the αBB algorithm for a few iterations to obtain a valid lower bound on the global solution of the continuous relaxation or run the αBB algorithm to completion to obtain the global solution of the continuous relaxation.

• Otherwise, run the αBB algorithm to completion to obtain the global solution for the current node.

This strategy makes use of the convergence characteristics of the αBB algorithm to improve the performance of the GMIN-αBB algorithm. The rate of improvement of the lower bound on the global solution of a nonconvex NLP is usually very high at early iterations and then gradually tapers off. At later stages of an αBB run, the computationally expensive reduction of the gap between the bounds on the solution of the continuous relaxation does not result in a sufficiently significant increase in the lower bound to affect the performance of the GMIN-αBB algorithm and can therefore be bypassed.

Generation of a Valid Upper Bound. Because of the finite size of the branch and bound tree, it is not necessary to generate an upper bound on the nonconvex MINLP at each node in order to guarantee convergence of the GMIN-αBB algorithm. In the worst case, the integer variables are fixed at every node of the last level of the tree, and the
solutions of the corresponding NLPs provide the upper bounds needed to identify the global optimum solution. However, upper bounds play a significant role in improving the convergence rate of the algorithm by allowing the fathoming of nodes whose lower bound is greater than the smallest upper bound and therefore reducing the final size of the branch and bound tree. An upper bound on the solution of a given node can be obtained in several ways. For example, if the solution of the continuous relaxation is integer-feasible, that is, all the relaxed integer variables have integer values at the solution, this solution is both a lower and an upper bound on the current node. If the αBB algorithm was run for only a few iterations and the relaxed integer variables are integer at the lower bound, they can be fixed to these integer values and the resulting nonconvex NLP can be solved locally to yield an upper bound on the solution of the node. Finally, a set of integer values satisfying the integer constraints can be used to construct a nonconvex NLP whose local solutions are upper bounds on the current node solution.

Variable Bound Updates. If the bounds on the integer variables at any given node can be tightened, the solution space can be significantly reduced due to the combinatorial nature of the problem. The allocation of computational resources for this purpose is therefore a potentially worthwhile investment. An optimization-based approach or an interval-based approach may be used to update the variable bounds. These approaches are similar to those developed for the αBB algorithm but they take advantage of the integrality of the variables. Thus, in the optimization approach, the lower or upper bound on variable $y_i$ is improved by first relaxing the integer variables, and then solving the convex NLP

\[
\begin{array}{ll}
\min \text{ or } \max\limits_{x,y,w} & y_i \\
\text{s.t.} & f(x, y, w) \le f^* \\
 & C(x, y, w) \\
 & x \in [x^L, x^U] \\
 & y \in [y^L, y^U] \\
 & w \in [w^L, w^U]
\end{array}
\tag{3}
\]

where $f(x, y, w)$ denotes the convex underestimator of the objective function, $f^*$ denotes the current best upper bound on the global optimum solution, $C(x, y, w)$ denotes the set of convexified constraints, and $w$ is the set of new variables introduced during the convexification/relaxation procedure. Finally, the improved lower or upper bound is obtained by setting $y_i^L = \lceil y_i^* \rceil$ or $y_i^U = \lfloor y_i^* \rfloor$.

In the interval-based approach, an iterative procedure is followed based on an interval test which provides sufficient conditions for the infeasibility of the original constraints and the 'bound improvement constraint' $f(x, y) \le f^*$, given the relaxed region $(x, y) \in [x^L, x^U] \times [y^L, y^U]$. This set of constraints defines a region denoted by $F$. The procedure to improve the lower (upper) bound on variable $y_i$ is as follows:

    PROCEDURE interval-based bound update()
      Set initial bounds L = y_i^L and U = y_i^U;
      Set iteration counter k = 0;
      Set maximum number of iterations K;
      DO k < K and L ≠ U
        Compute 'midpoint' M = ⌊(U + L)/2⌋;
        Set left region {(x, y) ∈ F : y_i ∈ [L, M]};
        Set right region {(x, y) ∈ F : y_i ∈ [M + 1, U]};
        Test interval feasibility of left (right) region;
        IF feasible,
          Set U = M (L = M);
        ELSE
          Test interval feasibility of right (left) region;
          IF feasible,
            Set L = M (U = M);
          ELSE
            IF k = 0,
              RETURN(infeasible node);
            ELSE
              Set L = U (U = L);
              Set U = y_i^U (L = y_i^L);
        Set k = k + 1;
      OD;
      RETURN(y_i^L = L (y_i^U = U));
    END interval-based bound update;

Interval-based bound update procedure.

The variable bound tightening is performed before calling the αBB algorithm to obtain a lower bound on the solution of the current node. In many cases, during an αBB run, variable bound updates are also used to improve the quality of the gener-
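The interval-based update of an integer bound can be sketched in a few lines of Python. This is a simplified variant written for illustration, not the authors' exact procedure: the function name `tighten_lower_bound` and the callback `may_be_feasible(a, b)` are hypothetical, and the callback stands in for the interval test (it must return False only when the region with $y_i \in [a, b]$ is proven infeasible). Unlike the printed pseudocode, the sketch moves the lower bound to M + 1 once the left region is discarded.

```python
from typing import Callable, Optional


def tighten_lower_bound(
    y_lo: int,
    y_hi: int,
    may_be_feasible: Callable[[int, int], bool],
    max_iter: int = 20,
) -> Optional[int]:
    """Return a valid (possibly tighter) lower bound for an integer variable.

    Invariant: every integer in [y_lo, lo - 1] has been proven infeasible.
    Returns None when the whole range [y_lo, y_hi] is proven infeasible.
    `may_be_feasible` is a hypothetical stand-in for the interval test.
    """
    lo, hi = y_lo, y_hi
    for _ in range(max_iter):
        if lo == hi:
            break
        mid = (lo + hi) // 2  # floor of the midpoint
        if may_be_feasible(lo, mid):
            hi = mid  # left region not excluded: search inside it
        elif may_be_feasible(mid + 1, hi):
            lo = mid + 1  # left region excluded: raise the bound
        else:
            # The whole window [lo, hi] is excluded.
            if hi == y_hi:
                return None  # nothing left: the node is infeasible
            lo, hi = hi + 1, y_hi  # restart on the untested part
    return lo


def tighten_upper_bound(y_lo, y_hi, may_be_feasible, max_iter=20):
    """Mirror image of tighten_lower_bound, obtained by negating the variable."""
    mirrored = tighten_lower_bound(
        -y_hi, -y_lo, lambda a, b: may_be_feasible(-b, -a), max_iter
    )
    return None if mirrored is None else -mirrored


if __name__ == "__main__":
    feasible_values = {5, 6, 7}  # toy data: an exact oracle on [0, 10]

    def oracle(a, b):
        return any(a <= v <= b for v in feasible_values)

    print(tighten_lower_bound(0, 10, oracle), tighten_upper_bound(0, 10, oracle))
```

With a real interval test the returned bound is valid but not necessarily tight, because the test gives only sufficient conditions for infeasibility.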
[13] FLOUDAS, C.A.: Nonlinear and mixed integer optimization: Fundamentals and applications, Oxford Univ. Press, 1995.
[14] GEOFFRION, A.M.: 'Generalized Benders decomposition', J. Optim. Th. Appl. 10 (1972), 237-260.
[15] GUPTA, O.K., AND RAVINDRAN, R.: 'Branch and bound experiments in convex nonlinear integer programming', Managem. Sci. 31 (1985), 1533-1546.
[16] KOCIS, G.R., AND GROSSMANN, I.E.: 'Relaxation strategy for the structural optimization of process flow sheets', Industr. Engin. Chem. Res. 26 (1987), 1869.
[17] KOCIS, G.R., AND GROSSMANN, I.E.: 'A modelling and decomposition strategy for the MINLP optimization of process flowsheets', Computers Chem. Engin. 13 (1989), 797-819.
[18] MARANAS, C.D., AND FLOUDAS, C.A.: 'Global minimum potential energy conformations of small molecules', J. Global Optim. 4 (1994), 135-170.
[19] OSTROVSKY, G.M., OSTROVSKY, M.G., AND MIKHAILOW, G.W.: 'Discrete optimization of chemical processes', Computers Chem. Engin. 14 (1990), 111.
[20] QUESADA, I., AND GROSSMANN, I.E.: 'An LP/NLP based branch and bound algorithm for convex MINLP optimization problems', Computers Chem. Engin. 16 (1992), 937-947.

Claire S. Adjiman
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: claire@titan.princeton.edu
Christodoulos A. Floudas
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: floudas@titan.princeton.edu
MSC2000: 65K05, 90C11, 90C26
Key words and phrases: global optimization, twice-differentiable MINLPs, branch and bound, αBB algorithm.

MINLP: HEAT EXCHANGER NETWORK SYNTHESIS

Heat exchanger network synthesis problems arise in chemical process design when the heat released by hot process streams is used to satisfy the demands of cold process streams. These problems have been the subject of an intensive research effort, and over 400 publications have been written in the area. See [7], [8], [9] for reviews of the area, and [1], [3] for detailed analysis of HEN synthesis.

The discovery by T. Umeda et al. [11] of a thermodynamic pinch point that limits heat integration in a heat exchanger network led to much of this research effort. They showed that setting the minimum temperature approach, $\Delta T_{\min}$, places a lower bound on the utility consumption in a heat exchanger network and decomposes a heat exchanger network into independent subnetworks. This enables the heat exchanger network synthesis problem to be decomposed into four subproblems. The first subproblem finds the appropriate minimum temperature approach, the second subproblem minimizes the utility consumption, the third subproblem finds the minimum number of matches and identifies the matches and their heat duty, and the fourth finds and optimizes the actual network structure.

See [5] for a systematic scheme for solving these problems sequentially. First, the utility consumption is minimized using the linear programming (LP) transshipment model approach of [10]. Second, a set of process matches and their heat duties that minimize the total number of units is found with the mixed integer linear programming (MILP) strategy of [10]. Then, the network structure is found [5] by optimizing a superstructure that contains all possible network configurations embedded within it using a nonlinear programming (NLP) problem. When there is more than one combination of matches and heat duties that satisfies the minimum unit criterion, the best combination is found by exhaustive enumeration. The minimum temperature approach is optimized with a golden section search that solves all three of these optimization problems at each iteration.

In the late 1980s it was found [4], [12] that better network designs could be obtained by solving some of the heat exchanger network design subproblems simultaneously. C.A. Floudas and A.R. Ciric [4] combined the MILP stream matching problem with the NLP superstructure optimization problem formulated in [5], creating a mixed integer nonlinear programming problem (MINLP) that avoided the exhaustive search through all combinations of matches that minimize the number of units. In 1990, they [2] formulated the entire heat exchanger network design problem as a MINLP. The solution of this problem yields the optimal temperature approach, utility level, process matches, heat duties, and network structure, eliminating the need for a golden section search for the optimum minimum temperature approach.
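To make the outer loop of the sequential scheme concrete, the following Python sketch applies a golden section search to a one-dimensional cost function of the minimum temperature approach. Everything specific here is an assumption for illustration: `total_cost` is a hypothetical smooth surrogate (a utility term that grows with $\Delta T_{\min}$ plus a capital term that shrinks with it), whereas in the scheme described above each evaluation would require solving the LP, MILP and NLP subproblems in turn.

```python
import math


def golden_section_minimize(cost, lo, hi, tol=1e-4):
    """Minimize a unimodal function `cost` on [lo, hi] by golden section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = cost(c), cost(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = cost(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = cost(d)
    return 0.5 * (a + b)


def total_cost(dt_min):
    """Hypothetical surrogate for the annual cost as a function of dT_min."""
    utility_cost = 100.0 * dt_min   # utilities grow with a larger approach
    capital_cost = 4000.0 / dt_min  # exchanger area shrinks with it
    return utility_cost + capital_cost


if __name__ == "__main__":
    best = golden_section_minimize(total_cost, 1.0, 30.0)
    print(f"dT_min = {best:.3f}, cost = {total_cost(best):.1f}")
```

Only one new cost evaluation is needed per iteration, which matters when every evaluation hides three optimization problems; the simultaneous MINLP formulations remove this outer loop altogether.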
MINLP: Heat exchanger network synthesis
T.F. Yee and I.E. Grossmann [12] used a smaller superstructure proposed in [6] that embodies a sequential-parallel network structure to formulate an alternative MINLP for heat exchanger network [...]

Problem Statement. This article will explore two mixed integer nonlinear programming problems in heat exchanger network synthesis: combined match-network optimization and heat exchanger network synthesis without decomposition. The synthesis without decomposition problem can be stated as follows:

Given:
1) A set of hot process streams and hot utilities $i \in H$, their inlet and outlet temperatures $T^{i}$, $T^{O,i}$, and heat capacity flow rates $F^{i}$;
2) A set of cold process streams and cold utilities $j \in C$, their inlet and outlet temperatures $T^{j}$, $T^{O,j}$, and heat capacity flow rates $F^{j}$; and
3) Overall heat transfer coefficients $U_{ij}$.

Determine:
A) The stream matches $(ij)$, the heat duty $Q_{ij}$ of match $(ij)$, and the heat exchanger area $A_{ij}$ of match $(ij)$;
B) the piping structure for each stream in the network; and
C) the temperature and flowrate within each pipe of the network.

In the match-network problem, one is also given [...]

[...] within it. Two superstructures are particularly interesting.

Fig. 1. A superstructure for one hot stream exchanging heat with two cold streams.

Fig. 1 shows a superstructure of a hot stream, above the thermodynamic pinch point, that may exchange heat with two cold streams [5]. Notice that the stream can be piped in series, in parallel, and in split-mix-bypass configurations, as shown in Fig. 2. As we shall see, this richness leads to nonconvex constraints in the MINLP. The first network superstructure is created by constructing similar structures for every other stream above the pinch point.
pinch point has partitioned the temperature range into two intervals, and in each interval, individual process streams can only exchange heat once.

One could increase the number of times two streams can exchange heat by partitioning the temperature range further. This is the basic strategy behind the second superstructure [6], [12] shown in Fig. 3. Here, the temperature range has been partitioned into many intervals, or stages. Within any particular stage, each hot stream may exchange heat with each cold stream; multiple intervals allow any particular match to take place many times in the network. Unlike the first superstructure, each stream in each stage is piped in a parallel configuration, and the inlet and outlet temperature of each parallel line is fixed by the temperature interval. Series piping structures arise when a stream exchanges heat only once per interval. The superstructure does not contain split-mix-bypass or series-parallel structures but, as we shall see, in exchange the nonconvex constraints that arise from the first superstructure have been eliminated.

[Fig. 3 diagram: stage 1 and stage 2, separated by temperature locations k = 1, k = 2 and k = 3.]
Fig. 3: Two-stage superstructure.

[...] neously minimizes the utility consumption, selects the stream matches, and optimizes the network layout, in heat exchanger network synthesis without decomposition.

Match-Network Problem. The MINLP model of the match-network problem has three components: a transshipment model [10] that identifies feasible process stream matches and their heat duties, a superstructure model of all possible network structures, and an objective function.

The transshipment model partitions the temperature range into $t = 1, \dots, T$ temperature intervals, using the inlet and outlet temperatures of the streams and the temperature interval approach temperature (TIAT). Hot streams release heat into the temperature intervals, where it either flows to the cold streams in the same interval or cascades down to the next colder interval. The binary variable $Y_{ij}$ denotes the existence of a match between hot stream $i$ and cold stream $j$, where heat loads are $q_{ijt}$ and $Q_{ij}$, and heat residuals are $R_k$. The model is composed of the following constraints:

[The interval energy balances for the hot and cold streams, $t = 1, \dots, T$, are illegible in the source.]

\[
\sum_{t} q_{ijt} = Q_{ij}, \quad i \in H,\ j \in C,
\]
\[
Q_{ij} - U\,Y_{ij} \le 0, \quad i \in H,\ j \in C.
\]
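The cascading of heat through the temperature intervals can be illustrated with a small Python sketch. It is an aggregated heat cascade only, under stated assumptions: all hot and all cold streams are lumped into a net surplus per interval, there are no match variables $q_{ijt}$ or binaries $Y_{ij}$, and the function name and the interval data are hypothetical. The match-level transshipment model of [10] is richer than this.

```python
def minimum_utilities(surplus):
    """Aggregated heat cascade over temperature intervals (hottest first).

    surplus[t] is the heat released by hot streams minus the heat required
    by cold streams in interval t. Returns (hot_utility, cold_utility,
    residuals), where residuals[t] is the heat cascaded out of interval t.
    """
    # The hot utility must cover the largest cumulative deficit.
    running = 0.0
    largest_deficit = 0.0
    for s in surplus:
        running += s
        largest_deficit = min(largest_deficit, running)
    hot_utility = -largest_deficit

    # Cascade the heat: residual out = residual in + net surplus.
    residuals = []
    residual = hot_utility
    for s in surplus:
        residual += s
        residuals.append(residual)

    cold_utility = residuals[-1] if residuals else hot_utility
    return hot_utility, cold_utility, residuals


if __name__ == "__main__":
    # Hypothetical net surpluses in kW for four intervals.
    hot, cold, res = minimum_utilities([-30.0, 50.0, -40.0, 60.0])
    print(hot, cold, res)
```

An interval whose outgoing residual is zero marks a pinch: no heat flows across that temperature level when the utility consumption is at its minimum.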
The second part of the match-network synthesis model is the hyperstructure topology model, which consists of mass and energy balances for the mixers and splitters, feasibility constraints, utility load constraint and bounds on the flow rate heat-capacities.

Mass balances for the splitters at the inlet of the superstructure:

\[
\sum_{k'} f^{I,k}_{k'} = F^{k}, \quad k \in HCT.
\]

Here, $HCT$ is the set of all process streams and utilities. Mass balances for the mixers at the inlets of the exchangers:

\[
f^{I,k}_{k'} + \sum_{k''} f^{B,k}_{k',k''} - f^{E,k}_{k'} = 0, \quad k', k \in HCT.
\]

Mass balances for the splitters at the outlets of the exchangers:

\[
f^{O,k}_{k'} + \sum_{k''} f^{B,k}_{k'',k'} - f^{E,k}_{k'} = 0, \quad k', k \in HCT.
\]

Energy balances for the mixers at the inlets of the exchangers:

\[
T^{k} f^{I,k}_{k'} + \sum_{k''} f^{B,k}_{k',k''}\, t^{O,k}_{k''} - f^{E,k}_{k'}\, t^{I,k}_{k'} = 0, \quad k', k \in HCT.
\]

Energy balances over the heat exchangers:

\[
Q_{ij} - f^{E,i}_{j} \left( t^{I,i}_{j} - t^{O,i}_{j} \right) = 0, \quad
Q_{ij} - f^{E,j}_{i} \left( t^{O,j}_{i} - t^{I,j}_{i} \right) = 0, \quad i \in H,\ j \in C.
\]

The minimum temperature approach between a hot stream and a cold stream:

\[
t^{I,i}_{j} - t^{O,j}_{i} \ge \Delta T_{\min}, \quad
t^{O,i}_{j} - t^{I,j}_{i} \ge \Delta T_{\min}.
\]

Logical relations between the heat-capacity flow rates and the existence of a match:

\[
f^{E,i}_{j} - F^{i} Y_{ij} \le 0, \quad
f^{E,j}_{i} - F^{j} Y_{ij} \le 0.
\]

Lower bounds on the heat-capacity flow rates through the exchanger:

[...]

where $\Delta T_{ij,\max}$ equals $T^{i} - T^{j}$. Lastly, the objective function minimizes the total investment cost:

[The objective function is illegible in the source; the surviving fragments show a sum over $i \in H$, $j \in C$ involving $Q_{ij}$, $U_{ij}$, $Y_{ij}$ and the logarithmic mean of the temperature differences at the two ends of each exchanger.]

The model is a mixed integer nonlinear programming (MINLP) problem, as the objective function and the energy balances are nonlinear, and the decision variables $Y_{ij}$ are binary. Notice that the energy balances are bilinear, creating a nonconvex feasible region.

Heat Exchanger Network Synthesis without Decomposition. MINLP models that optimize utility consumption as well as process matches, heat duties, and network configurations can also be formulated. See [2] and [12] for pseudopinch approaches that set the TIAT to a small value and let heat flow across the pinch. A strict decomposition at the pinch can also be maintained by letting TIAT vary, and using integer variables to model the changing structure of the temperature cascade.

EXAMPLE 1 These techniques are demonstrated with a problem given in both [12] and [2]. The problem consists of five hot streams, one cold stream, one hot utility (steam), and one cold utility (cooling water). The stream data is given in Table 1.

    Stream   T_in (C)   T_out (C)   FCp (kW/C)
    H1       500        320         6
    H2       480        380         4
    H3       460        360         6
    H4       380        360         20
    H5       380        320         12
    C1       290        660         18
    S        700        700
    CW       300        320

    U = 1.0 kW/(m^2 C)
    Annual cost = 1200 A^0.6 for all exchangers
    C_S = 140 $/kW
    C_W = 10 $/kW

Table 1: Stream data for example problem.
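The stream data of Table 1 already fix the overall energy balance of any network for this example. The Python sketch below computes the heat load of each process stream as FCp times its temperature change and the net heating deficit of the process; the dictionary layout and function name are illustrative choices, and the columns are read as inlet temperature, outlet temperature and heat-capacity flow rate.

```python
# Process stream data transcribed from Table 1:
# name -> (inlet temperature in C, outlet temperature in C, FCp in kW/C)
HOT_STREAMS = {
    "H1": (500, 320, 6),
    "H2": (480, 380, 4),
    "H3": (460, 360, 6),
    "H4": (380, 360, 20),
    "H5": (380, 320, 12),
}
COLD_STREAMS = {
    "C1": (290, 660, 18),
}


def heat_load(t_in, t_out, fcp):
    """Heat released (hot stream) or required (cold stream), in kW."""
    return fcp * abs(t_in - t_out)


hot_loads = {name: heat_load(*data) for name, data in HOT_STREAMS.items()}
cold_loads = {name: heat_load(*data) for name, data in COLD_STREAMS.items()}

hot_total = sum(hot_loads.values())    # heat available from hot streams
cold_total = sum(cold_loads.values())  # heat demanded by cold streams

# Steam duty minus cooling-water duty must equal this deficit
# in any network that satisfies all stream targets.
net_heating = cold_total - hot_total

if __name__ == "__main__":
    print(hot_loads, cold_loads, net_heating)
```

The deficit comes out at 3460 kW, which matches the difference between the steam and cooling-water duties in both Table 2 (3591.546 and 131.546 kW) and Table 3 (3676.4 and 216.4 kW).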
    Match    Q (kW)      A (m^2)
    H1-C1    948.454     79.391
    H1-CW    131.546     6.280
    H2-C1    400.000     29.057
    H3-C1    600.000     57.488
    H4-C1    400.000     14.880
    H5-C1    720.000     25.509
    S-C1     3591.546    32.112

Table 2: Match data for example problem; pseudo-pinch method [2].

Yee and Grossmann [12] used the same problem to demonstrate the simultaneous optimization approach. The problem is again formulated as a MINLP problem. The optimal network configuration is given in Fig. 4. The annual cost of this network is $576,640. HRAT is 13.1 C. The match data of this network is given in Table 3.

    Match    Q (kW)    A (m^2)
    S-C1     3676.4    32.6
    H1-C1    863.6     64.1
    H2-C1    400.0     17.1
    H3-C1    600.0     47.0
    H4-C1    400.0     13.8
    H1-CW    216.4     7.9
    H5-C1    720.0     18.4

Table 3: Match data for example problem; simultaneous approach [12].

Fig. 4: Optimal network configuration for example problem; pseudopinch [2].

Fig. 5. Optimal network configuration for example problem; simultaneous approach [12].

Conclusions. Mixed integer nonlinear programming offers a powerful approach to heat exchanger network synthesis. Using these techniques, stream matching, the combinatorial component of heat exchanger network synthesis, can be performed while simultaneously minimizing the utility consumption and selecting the cost-optimal heat exchanger network configuration. Merging these tasks leads to more cost-effective stream matches and lower exchanger costs.

See also: Global optimization of heat exchanger networks; Mixed integer linear programming: Heat exchanger network synthesis; MINLP: Mass and heat exchanger networks; Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane algorithm; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with αBB; Generalized Benders decomposition; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems.
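The reported annual cost of the simultaneous design can be cross-checked from the match data of Table 3 and the cost data of Table 1. The Python sketch below assumes that the total annual cost is the sum of the exchanger costs (1200 A^0.6 per exchanger) and the utility costs (140 $/kW of steam, 10 $/kW of cooling water); this cost structure is an interpretation of the tables, not a formula quoted from the source, and the function name is illustrative.

```python
# Match data transcribed from Table 3: (match, heat duty in kW, area in m^2)
TABLE_3 = [
    ("S-C1", 3676.4, 32.6),
    ("H1-C1", 863.6, 64.1),
    ("H2-C1", 400.0, 17.1),
    ("H3-C1", 600.0, 47.0),
    ("H4-C1", 400.0, 13.8),
    ("H1-CW", 216.4, 7.9),
    ("H5-C1", 720.0, 18.4),
]


def annual_cost(matches, steam_cost=140.0, water_cost=10.0):
    """Assumed cost model: exchanger capital plus utility charges, in $/yr."""
    capital = sum(1200.0 * area ** 0.6 for _, _, area in matches)
    steam = sum(q for name, q, _ in matches if name.startswith("S-"))
    water = sum(q for name, q, _ in matches if name.endswith("-CW"))
    return capital + steam_cost * steam + water_cost * water


# Heat delivered to the single cold stream C1 by all of its matches.
heat_into_c1 = sum(q for name, q, _ in TABLE_3 if name.endswith("-C1"))

if __name__ == "__main__":
    print(round(annual_cost(TABLE_3)), heat_into_c1)
```

Under this assumed cost model the result lies within a few dollars of the reported $576,640, the gap being consistent with the rounding of the tabulated areas, and the duties delivered to C1 add up to its 6660 kW demand.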
MINLP: Logic-based methods
Kemal Sahin
Dept. Chemical Engin. Univ. Cincinnati
Cincinnati OH 45221, USA
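\[
\begin{bmatrix} Y_i \\ h_i(x) \le 0 \\ c_i = \gamma_i \end{bmatrix}
\vee
\begin{bmatrix} \neg Y_i \\ B^{i} x = 0 \\ c_i = 0 \end{bmatrix},
\quad i \in I,
\tag{1}
\]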
for MINLP [3] for solving problem (DP1), and in which the disjunctions are given as in equation (1), and all the functions are assumed to be convex. The algorithm consists of solving a sequence of NLP subproblems and master problems, which are as follows.

For fixed values of the boolean variables, $Y_{\hat{\imath}k} = \text{true}$ and $Y_{ik} = \text{false}$ for $i \ne \hat{\imath}$, the corresponding NLP subproblem is as follows:

\[
\text{(NLPD)} \quad
\begin{array}{ll}
\min & Z = \sum_{k} c_k + f(x) \\
\text{s.t.} & g(x) \le 0 \\
 & \left. \begin{array}{l} h_{ik}(x) \le 0 \\ c_k = \gamma_{ik} \end{array} \right\} \ \text{for } Y_{ik} = \text{true}, \\
 & \left. \begin{array}{l} B^{i} x = 0 \\ c_k = 0 \end{array} \right\} \ \text{for } Y_{ik} = \text{false}, \quad k \in SD, \\
 & x \in \mathbb{R}^{n}, \ c_i \in \mathbb{R}^{1}
\end{array}
\]

Note that for every disjunction $k \in SD$ only constraints corresponding to the boolean variable $Y_{ik}$ that is true are imposed. Also, fixed charges $\gamma_{ik}$ are only applied to these terms. Assuming that $K$ subproblems (NLPD) are solved in which sets of linearizations $l = 1, \dots, K$ are generated for subsets of disjunction terms $L(ik) = \{ l : Y^{l}_{ik} = \text{true} \}$, one can define the following disjunctive OA master problem:

[...]

[...] one linear approximation of each of the terms in the disjunctions. Selecting the smallest number of subproblems amounts to the solution of a set covering problem, which is of small size and easy to solve [9].

The above problem (MDP1) can be solved by the methods described in [1] and [7]. It is also interesting to note that for the case of process networks Turkay and Grossmann [9] have shown that if the convex hull representation of the disjunctions in (1) is used in (MDP1), then assuming $B^{i} = I$ and converting the logic relations $\Omega(Y)$ into the inequalities $Ay \le a$, leads to the MILP problem,

\[
\text{(MIPDF)} \quad
\begin{array}{l}
\min \ Z = [\dots] \\
\text{such that} \\
{[\dots]} \ge f(x^{l}) + \nabla f(x^{l})^{T} (x - x^{l}), \\
g(x^{l}) + \nabla g(x^{l})^{T} (x - x^{l}) \le 0, \quad l = 1, \dots, L, \\
\nabla_{x_{Z_i}} h_i(x^{l})^{T} x_{Z_i} + \nabla_{x_{N_i}} h_i(x^{l})^{T} x^{1}_{N_i} \le [\dots]\, y_i, \quad l \in K^{i}_{L},\ i \in I, \\
x_{N_i} = x^{1}_{N_i} + x^{2}_{N_i}, \\
0 \le x^{1}_{N_i} \le x^{U}_{N_i}\, y_i, \\
0 \le x^{2}_{N_i} \le x^{U}_{N_i}\, (1 - y_i)
\end{array}
\]
MINLP: Mass and heat exchanger networks
[2] on the MILP master problem of the OA algorithm, is equivalent to generating a generalized Benders cut. Therefore, a logic-based version of the generalized Benders method consists of performing one Benders iteration on the MILP master problem (MIPDF). It should also be noted that slacks can be introduced to (MDP1) and to (MIPDF) to reduce the effect of nonconvexities as in the augmented-penalty MILP master problem [10].

Finally, it should be noted that S. Lee and Grossmann [6] have developed a new branch and bound method and a MINLP reformulation that is based on the convex hull of each of the disjunctions in (DP1) with nonlinear inequalities.

See also: Disjunctive programming; Reformulation-linearization methods for global optimization; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with αBB; MINLP: Generalized cross decomposition; Decomposition principle of linear programming; Generalized Benders decomposition; Simplicial decomposition algorithms; Stochastic linear programming: Decomposition and cutting planes; Simplicial decomposition; Successive quadratic programming: Decomposition methods; Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; MINLP: Outer approximation algorithm; Generalized outer approximation; Extended cutting plane algorithm; MINLP: Heat exchanger network synthesis; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems.

References
[1] BEAUMONT, N.: 'An algorithm for disjunctive programs', Europ. J. Oper. Res. 48 (1991), 362-371.
[2] BENDERS, J.F.: 'Partitioning procedures for solving mixed-variables programming problems', Numer. Math. 4 (1962), 238-252.
[3] DURAN, M.A., AND GROSSMANN, I.E.: 'An outer-approximation algorithm for a class of mixed-integer nonlinear programs', Math. Program. 36 (1986), 307.
[4] GEOFFRION, A.M.: 'Generalized Benders decomposition', J. Optim. Th. Appl. 10, no. 4 (1972), 237-260.
[5] KOCIS, G.R., AND GROSSMANN, I.E.: 'A modeling and decomposition strategy for the MINLP optimization of process flowsheets', Computers Chem. Engin. 13 (1989), 797.
[6] LEE, S., AND GROSSMANN, I.E.: 'New algorithms for nonlinear generalized disjunctive programming', Computers Chem. Engin. 24 (2000), 2125.
[7] RAMAN, R., AND GROSSMANN, I.E.: 'Symbolic integration of logic in mixed integer linear programming techniques for process synthesis', Computers Chem. Engin. 17 (1993), 909.
[8] RAMAN, R., AND GROSSMANN, I.E.: 'Modelling and computational techniques for logic based integer programming', Computers Chem. Engin. 18 (1994), 563.
[9] TURKAY, M., AND GROSSMANN, I.E.: 'A logic based outer-approximation algorithm for MINLP optimization of process flowsheets', Computers Chem. Engin. 20 (1996), 959-978.
[10] VISWANATHAN, J., AND GROSSMANN, I.E.: 'A combined penalty function and outer-approximation method for MINLP optimization', Computers Chem. Engin. 14 (1990), 769.

Ignacio E. Grossmann
Carnegie Mellon Univ.
Pittsburgh, PA, USA
E-mail address: grossmann@cmu.edu
MSC2000: 90C10, 90C09, 90C11
Key words and phrases: generalized disjunctive programming, disjunctive programming, outer approximation method, generalized Benders decomposition, mixed integer programming.

MINLP: MASS AND HEAT EXCHANGER NETWORKS, MEN, MHEN

Mass integration in the form of mass exchanger networks, MEN, appears in the chemical industries as an economic alternative in waste treatment, feed preparation, product separation, recovery of valuable materials, etc. The MEN involves a set of rich streams, wherefrom one or more components are removed by means of lean streams (mass separating agents) in mass transfer operations that do not require energy (constant pressure and temperature).

The MEN synthesis/design problem is posed as a combinatorial problem, involving discrete and
continuous decisions (e.g. the mass exchange operations/matches and the unit sizes, respectively), that both affect the overall mass integration cost.

[Fig. 1 diagram: a mass exchange network fed by rich streams $R = \{ i \mid i = 1, \dots, N_R \}$ and lean streams $S = \{ j \mid j = 1, \dots, N_S \}$, with stream flows $G_i$ and $L_j \le L_j^{U}$, supply compositions $y^{s}_{i,c}$, $x^{s}_{j,c}$, and bounded target compositions $y^{l}_{i,c} \le y^{t}_{i,c} \le y^{u}_{i,c}$, $x^{l}_{j,c} \le x^{t}_{j,c} \le x^{u}_{j,c}$.]
Fig. 1.

When the mass transfer operations can take place at different temperatures, heat integration of the rich and lean streams is also considered within a combined mass and heat exchanger network, MHEN, synthesis problem.

In isothermal MEN synthesis, the simultaneous optimization of the mass exchange operations, the mass separating agent flows and the network configuration has been formulated by K.P. Papalexandri, E.N. Pistikopoulos and C.A. Floudas [9] as an MINLP problem based on:

a) the MEN superstructure of synthesis/design alternatives;
b) modeling of mass exchange in each mass exchanger; and,
c) minimization of a total annualized network cost.

Details are given below [...]

[...] increasing thus the considered MEN structures and the combinatorial complexity of the synthesis problem. Note that this is not similar to an a priori decomposition of the network into separable subnetworks.

• Each stream entering the network is split towards all its potential mass exchanger units.

• After each mass exchanger, a splitter is considered for each stream, where the stream is split towards its final mixer and all the other potential stream exchangers.

• Prior to each potential mass exchanger, a mixer is considered for each participating stream, where the flow from the initial splitter and connecting (bypass) flows from all the other exchangers of the stream are merged into the flow towards the exchanger.

• A mixer is considered at the network outlet of each stream, where flows from all the potential stream exchangers are merged into the outlet flow.

For example, for a rich stream $i$ and its $m$th and $m'$th possible exchangers with lean streams $j$ and $j'$ respectively, we have Fig. 2:

Fig. 2: Rich stream superstructure.
zero stream flows (e.g. $g^{I}_{ij'm'} = g^{O}_{ijm} = 0$ and $g^{B}_{ij'jm'm} = 0$ results in the exchangers in series).

For a lean stream j and its mth and m'th exchangers with rich streams i and i', we have Fig. 3.

The MEN superstructure is described by mass balances for the overall streams and each transferable component at the exchangers, splitters and mixers of the superstructure:

$$ g^{E}_{ijm}\,(y^{I}_{ijmc} - y^{O}_{ijmc}) = M_{ijmc}, \quad i \in R,\ c = 1,\dots,C, $$
$$ l^{E}_{ijm}\,(x^{O}_{ijmc} - x^{I}_{ijmc}) = M_{ijmc}, \quad j \in S,\ c = 1,\dots,C, \tag{1} $$

$$ \sum_{j \in S,\,m} g^{I}_{ijm} - G_i = 0, \quad i \in R, $$
$$ \sum_{i \in R,\,m} l^{I}_{ijm} - L_j = 0, \quad j \in S, \tag{2} $$

$$ g^{O}_{ijm} + \sum_{j',m'} g^{B}_{ijj'mm'} - g^{E}_{ijm} = 0, $$

intermediate compositions of components (molar fractions $x^{I}$, $x^{O}$, $y^{I}$, $y^{O}$) are illustrated in the corresponding superstructure figures.

Modeling Mass Exchange. The existence of each potential mass exchanger in the network is denoted by a binary variable:

$$ E_{ijm} = \begin{cases} 1, & \text{when the $m$th exchanger between streams $i$ and $j$ exists,} \\ 0, & \text{otherwise,} \end{cases} $$

and defined by

$$ g^{E}_{ijm} - E_{ijm} U \le 0, \quad l^{E}_{ijm} - E_{ijm} U \le 0, \quad M_{ijmc} - E_{ijm} U \le 0, \quad g^{E}_{ijm},\ l^{E}_{ijm} \ge 0, \tag{7} $$

where $M_{ijmc}$ is the mass exchange load of component c in mass exchanger (ijm), and U a large
force constraints in (8) are activated only when the corresponding exchanger exists ($E_{ijm} = 1$).

The size of each potential mass exchanger (number of mass transfer stages, $N^{st}$, etc.) is calculated as a function of the variable mass transfer, through appropriate design equations (e.g. for perforated-plate columns the Kremser equation):

$$ N^{st}_{ijm} = N^{st}\bigl(g^{E}_{ijm},\ l^{E}_{ijm},\ x^{I}_{ijmc},\ x^{O}_{ijmc},\ y^{I}_{ijmc},\ y^{O}_{ijmc}\bigr). \tag{9} $$

Minimizing Network Cost. The total network cost comprises

• the annualized capital cost of the mass exchangers, which may be discontinuous (involve a fixed charge cost factor), and
• the annualized operating cost, i.e. the cost of the mass separating agents.

Consequently, the MEN MINLP synthesis model is formulated as follows:

(P1)
$$ \min \sum_{ijm} \bigl( AC^{1}_{ijm} E_{ijm} + AC^{2}_{ijm}(N^{st}_{ijm}) \bigr) + \sum_{j} AC_{j} L_{j} $$
such that
$$ \text{(2)-(9)}, $$
$$ g^{I}_{ijm},\ g^{E}_{ijm},\ g^{B}_{ijj'mm'},\ g^{O}_{ijm} \ge 0, \quad y^{I}_{ijmc},\ y^{O}_{ijmc} \ge 0, \quad i \in R,\ j, j' \in S,\ m, m' = 1,\dots,M,\ c = 1,\dots,C, $$
$$ l^{I}_{ijm},\ l^{E}_{ijm},\ l^{B}_{ii'jmm'},\ l^{O}_{ijm} \ge 0, \quad x^{I}_{ijmc},\ x^{O}_{ijmc} \ge 0, \quad i \in R,\ j, j' \in S,\ m, m' = 1,\dots,M,\ c = 1,\dots,C, $$
$$ E_{ijm} = 0, 1, \quad i \in R,\ j, j' \in S,\ m, m' = 1,\dots,M. $$

(P1) is a nonconvex MINLP problem and global optimization methods are required to guarantee global optimal solutions.

The main advantage of the simultaneous MEN synthesis model (P1), as opposed to the sequential MEN synthesis method, is that the trade-off between the capital and operating costs is systematically considered. Also,

• (P1) derives the optimal network with respect to all the transferable components, considering the mass transfer of each component separately within the calculated mass-transfer stages of each exchanger.
• Forbidden mass exchange matches, limited mass exchange and/or forbidden exchanger connections can be explicitly considered in (P1).
• Variable target compositions are handled in a straightforward manner.

When the mass exchange matches and mass exchange loads are fixed (e.g. when these are determined within a sequential MEN synthesis framework), (P1) reduces to an NLP and can be solved to derive a network configuration and unit sizes with minimum capital cost.

Extending the concept of cost optimality of the mass exchanger network, two special cases have been studied:

• MEN and regeneration networks.
When regenerating agents are available for some (or all) lean streams, the total mass integration cost also involves the regeneration cost. The regeneration network can be considered simultaneously within the MINLP MEN synthesis model [9], accounting for all the regeneration alternatives of the lean streams and employing binary variables to denote the existence of the regenerating exchangers. In this case, the mass separating agents behave as lean streams in the mass exchangers of the main MEN and as rich streams in the regenerating mass exchangers. The regeneration network is not necessarily separable from the main MEN, as a lean stream may be partly regenerated before being used as a separating agent in another mass exchanger. Thus, the lean stream superstructures involve all the possible interconnections between the exchangers of the main MEN and the regenerating exchangers.
For example, for a lean stream j and its mth and m'th exchangers with rich stream i and regenerant k we have Fig. 4.

[Fig. 4: Regenerable lean stream superstructure.]

The overall superstructure of mass exchange and regeneration alternatives involves also the superstructures of the regenerating agents, that have variable flows, while the overall network cost includes the main MEN and the regeneration cost (capital and operating cost).

[Fig. 5: Regenerating stream superstructure.]

• Flexible mass exchange networks.
The ability of MEN to accommodate variations in the rich stream flows and inlet compositions in an efficient manner affects cost optimality. A multiperiod MINLP MEN synthesis model has been suggested in [7], to derive mass exchange networks, flexible to accommodate in an optimal manner different mass integration requirements. In the multiperiod MINLP model a weighted operating cost is optimized simultaneously with the capital cost for mass exchangers that can operate feasibly under the different conditions. The MEN superstructure is extended to include control variables that enhance flexibility (as exchanger-bypassing streams and overall bypass streams that are accordingly penalized).

When the alternative mass transfer operations take place at different and/or variable temperatures, heat integration between the network streams can be simultaneously considered within a combined MEN and HEN synthesis problem [7]. The available rich and lean streams define hot, cold or hot-and-cold streams in the heat integration problem, depending on whether their supply and target compositions are above or below the mass exchange temperatures. Thus, their heat exchange alternatives include both hot- and cold-side matching. Inlet and outlet temperatures and compositions in mass and heat exchangers are variables.

The combined mass and heat exchanger superstructure involves all the possible mass and heat exchangers of a stream and all the possible interconnections between them, Fig. 6.

[Fig. 6: Combined MEN and HEN superstructure.]

The combined MEN and HEN superstructure is described by

• mass balances at the superstructure splitters (i.e. the initial stream splitters and the splitters after each side of the possible mass and heat exchangers), similar to (2) and (3), and considering all the connecting flows;
• mass balances for overall flows and transferable components at the superstructure mixers (i.e. the final stream mixers and the mixers prior to each side of the potential mass and heat exchangers), similar to (4), (5) and (6), and considering all the connecting flows;
• energy balances at the superstructure mixers;
• mass balances at the mass exchangers, similar to (1), and
• energy balances at the heat exchangers.

The MHEN synthesis model also involves

• binary variables, to denote the existence of mass and heat exchangers, and their definition (mixed integer constraints),
• driving force constraints for mass exchange (8) at the potential mass exchangers, and for
MINLP: Outer approximation algorithm
Extending the concept of mass exchange to nonisothermal mass transfer operations, Papalexandri and Pistikopoulos introduced a mass/heat transfer module [8], where mass is transferred between different phases or reacting species if that is thermodynamically feasible, i.e. if that decreases the total Gibbs free energy of the system. Mass and energy balances, taking into account possible reactions, and mass-transfer driving-force constraints based on total Gibbs free energy are employed to model the mass/heat transfer module as an aggregate of differential mass and energy transfer phenomena. Considering a superstructure of mass/heat and heat exchange modules in a process and all possible interconnections between them, process synthesis tasks can be formulated as mass/heat and heat exchange superstructure MINLP problems, where binary variables are employed to denote the existence of mass/heat and heat exchangers. Then, process operations (conventional and/or hybrid) and networks are derived as combinations of mass/heat and heat exchange phenomena [8].

See also: MINLP: Heat exchanger network synthesis; Global optimization of heat exchanger networks; Mixed integer linear programming: Heat exchanger network synthesis; Mixed integer linear programming: Mass and heat exchanger networks; MINLP: Global optimization with αBB.

References
[1] BAGAJEWICZ, M.J., AND MANOUSIOUTHAKIS, V.: 'Mass/heat exchange network representation of distillation networks', AIChE J. 38 (1992), 1769-1800.
[2] EL-HALWAGI, M.M., AND MANOUSIOUTHAKIS, V.: 'Simultaneous synthesis of mass-exchange and regeneration networks', AIChE J. 36 (1990), 1209-1219.
[3] EL-HALWAGI, M.M., AND SRINIVAS, B.K.: 'Synthesis of reactive mass exchange networks', Chem. Engin. Sci. 47 (1992), 2113-2119.
[4] EL-HALWAGI, M.M., SRINIVAS, B.K., AND DUNN, R.F.: 'Synthesis of optimal heat-induced separation networks', Chem. Engin. Sci. 50 (1995), 81-97.
[5] GUPTA, A., AND MANOUSIOUTHAKIS, V.: 'Minimum utility cost of mass exchange networks with variable single component supplies and targets', Industr. Engin. Chem. Res. 32 (1993), 1937-1950.
[6] LAKSHMANAN, A., AND BIEGLER, L.T.: 'Synthesis of optimal chemical reactor networks with simultaneous mass integration', Industr. Engin. Chem. Res. 35 (1996), 4523-4536.
[7] PAPALEXANDRI, K.P., AND PISTIKOPOULOS, E.N.: 'A multiperiod MINLP model for the synthesis of flexible heat and mass exchange networks', Computers Chem. Engin. 18 (1994), 1125-1139.
[8] PAPALEXANDRI, K.P., AND PISTIKOPOULOS, E.N.: 'A generalized modular representation framework for process synthesis', AIChE J. 42 (1996), 1010-1032.
[9] PAPALEXANDRI, K.P., PISTIKOPOULOS, E.N., AND FLOUDAS, C.A.: 'Mass exchange networks for waste minimization: A simultaneous approach', Chem. Engin. Res. Des. 72 (1994), 279-294.
[10] SEBASTIAN, P., NADEAU, J.P., AND PUIGGALI, J.R.: 'Designing dryers using heat and mass exchange networks: An application to conveyor belt dryers', Chem. Engin. Res. Des. 74 (1996), 934-943.
[11] SRINIVAS, B.K., AND EL-HALWAGI, M.M.: 'Synthesis of combined heat and reactive mass exchange networks', Chem. Engin. Sci. 49 (1994), 2059-2074.

Katerina P. Papalexandri
bp Upstream Technol.
U.K.
E-mail address: papaloxk@bp.com

MSC2000: 93A30, 93B50
Key words and phrases: MINLP, mass and heat exchange, separation.

MINLP: OUTER APPROXIMATION ALGORITHM

The outer approximation algorithm (OA algorithm) ([1], [2], [9]) addresses mixed integer nonlinear programs of the form:

(P)
$$ \min Z = f(x, y) $$
$$ \text{s.t.} \quad g_j(x, y) \le 0, \quad j \in J, $$
$$ x \in X, \quad y \in Y, $$

where $f(\cdot)$, $g(\cdot)$ are convex, differentiable functions, J is the index set of inequalities, and x and y are the continuous and discrete variables, respectively. The set X is commonly assumed to be a convex compact set, e.g. $X = \{x : x \in R^{n},\ Dx \le d,\ x^{L} \le x \le x^{U}\}$; the discrete set Y corresponds to a polyhedral set of integer points, $Y = \{y : y \in Z^{m},\ Ay \le a\}$, and in most cases is restricted to 0-1 values, $y \in \{0, 1\}^{m}$. In most applications of interest the objective and constraint functions $f(\cdot)$, $g(\cdot)$ are linear in y (e.g. fixed cost charges and logic constraints).
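As a rough illustration of the problem class (P), the sketch below builds a tiny hypothetical instance with a convex quadratic cost in x, fixed charges that are linear in y, and logic constraints of the form $x_i - U y_i \le 0$. The function name `solve_fixed_charge`, the data and the demand constraint are assumptions made for this example only. The instance is small enough to be solved by enumerating y, with the convex subproblem in x solved in closed form; this is not the OA algorithm itself.

```python
from itertools import product


def solve_fixed_charge(costs, demand, upper):
    """Brute-force a tiny (hypothetical) instance of problem (P):

        min  sum_i costs[i]*y_i + sum_i x_i**2
        s.t. x_i - upper*y_i <= 0          (logic constraints, linear in y)
             demand - sum_i x_i <= 0       (linear demand constraint)
             0 <= x_i <= upper,  y_i in {0, 1}

    Assumes demand > 0, so at least one y_i must equal 1.
    """
    best = None
    for y in product((0, 1), repeat=len(costs)):
        active = sum(y)
        # With y fixed, x_i = 0 wherever y_i = 0; the rest must cover demand.
        if active == 0 or demand / active > upper:
            continue  # the continuous subproblem is infeasible for this y
        share = demand / active  # an equal split minimises the sum of squares
        x = tuple(share * yi for yi in y)
        z = sum(c * yi for c, yi in zip(costs, y)) + sum(xi * xi for xi in x)
        if best is None or z < best[0]:
            best = (z, x, y)
    return best


z, x, y = solve_fixed_charge(costs=[1.0, 2.0, 0.5], demand=3.0, upper=4.0)
print(z, x, y)
```

Enumeration grows as $2^{m}$ in the number of binary variables, which is why decomposition schemes that alternate between continuous subproblems and a discrete master problem are of interest for larger instances.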
The OA algorithm is based on the following theorem [1]:

THEOREM 1 Problem (P) and the following mixed-integer linear program (MILP) master problem (M-OA) have the same optimal solution $(x^{*}, y^{*})$,

(M-OA)
$$ \min Z_L = \alpha $$
such that
$$ \alpha \ge f(x^{k}, y^{k}) + \nabla f(x^{k}, y^{k})^{T} \begin{pmatrix} x - x^{k} \\ y - y^{k} \end{pmatrix}, $$
$$ g_j(x^{k}, y^{k}) + \nabla g_j(x^{k}, y^{k})^{T} \begin{pmatrix} x - x^{k} \\ y - y^{k} \end{pmatrix} \le 0, \quad j \in J,\ k \in K^{*}, $$
$$ x \in X, \quad y \in Y, $$

where

$$ K^{*} = \{ k : (x^{k}, y^{k}) \text{ is the optimal solution to (NLP1) for all feasible } y^{k} \in Y \}, $$

(NLP1)
$$ \min Z^{k}_{U} = f(x, y^{k}) $$
$$ \text{s.t.} \quad g_j(x, y^{k}) \le 0, \quad j \in J, $$
$$ x \in X, $$

where $Z^{k}_{U}$ is an upper bound to the optimum of problem (P). □

Note that since the functions f(x, y) and g(x, y) are convex, the linearizations in (M-OA) correspond to outer approximations of the nonlinear feasible region in problem (P). Also, since the master problem (M-OA) requires the solution of all feasible discrete variables $y^{k}$, the following MILP relaxation is considered, assuming that the solution of K NLP subproblems is available:

(RM-OA)
$$ \min Z^{K}_{L} = \alpha $$
such that
$$ \alpha \ge f(x^{k}, y^{k}) + \nabla f(x^{k}, y^{k})^{T} \begin{pmatrix} x - x^{k} \\ y - y^{k} \end{pmatrix}, $$
$$ g_j(x^{k}, y^{k}) + \nabla g_j(x^{k}, y^{k})^{T} \begin{pmatrix} x - x^{k} \\ y - y^{k} \end{pmatrix} \le 0, \quad j \in J,\ k = 1,\dots,K. $$

Given the assumption on convexity of the functions f(x, y) and g(x, y), the following property can easily be established,

PROPERTY 2 The solution of problem (RM-OA) corresponds to a lower bound to the solution of problem (P). □

Note that since function linearizations are accumulated as iterations k proceed, the master problems (RM-OA) yield a nondecreasing sequence of lower bounds, $Z^{1}_{L} \le \dots \le Z^{K}_{L}$.

The OA algorithm as proposed by M.A. Duran and I.E. Grossmann [1] consists of performing a cycle of major iterations, k = 1, ..., K, in which (NLP1) is solved for the corresponding $y^{k}$ and the relaxed MILP master problem (RM-OA) is updated and solved with the corresponding function linearizations at the point $(x^{k}, y^{k})$. The (NLP1) subproblems yield an upper bound that is used to define the best current solution, $UB^{K} = \min_{k}(Z^{k}_{U})$. The cycle of iterations is continued until this upper bound and the lower bound of the relaxed master problem are within a specified tolerance.

It should be noted that when problem (NLP1) has no feasible solution, there are two major ways to handle this. The more general option is to consider the solution of the feasibility problem,

(NLFP)
$$ \min u $$
$$ \text{s.t.} \quad g_j(x, y^{k}) \le u, \quad j \in J, $$
$$ x \in X, \quad u \in R^{1}. $$

R. Fletcher and S. Leyffer [2] have shown that for infeasible NLP subproblems, if the linearization at the solution of problem (NLFP) is included, this will guarantee convergence to the optimal solution.

For the case when the discrete set Y is given by 0-1 values in problem (P), the other option to ensure convergence of the OA algorithm without solving the feasibility subproblems (NLFP) is to introduce the following integer cut, whose objective is to make infeasible the choice of the previous 0-1 values generated at the K previous iterations [1]:

$$ \sum_{i \in B^{k}} y_i - \sum_{i \in N^{k}} y_i \le |B^{k}| - 1, $$

where $B^{k} = \{i : y^{k}_{i} = 1\}$, $N^{k} = \{i : y^{k}_{i} = 0\}$, k = 1, ..., K. This cut becomes very weak as the dimensionality of the 0-1 variables increases. How-
(M-OAF)
$$ \min Z^{K} = 0\,\alpha $$
such that
$$ \alpha \le UB^{K} - \epsilon, $$
$$ \alpha \ge f(x^{k}, y^{k}) + \nabla f(x^{k}, y^{k})^{T} \begin{pmatrix} x - x^{k} \\ y - y^{k} \end{pmatrix}, $$
$$ g_j(x^{k}, y^{k}) + \nabla g_j(x^{k}, y^{k})^{T} \begin{pmatrix} x - x^{k} \\ y - y^{k} \end{pmatrix} \le 0, \quad j \in J,\ k = 1,\dots,K, $$
$$ x \in X, \quad y \in Y. $$

While in (M-OA) the interpretation of the new point $y^{K}$ is that it represents the best integer solution to the approximating master problem, in (M-OAF) it represents an integer solution whose lower bounding objective does not exceed the current upper bound $UB^{K}$; in other words it is a feasible solution to (M-OA) with an objective below the current estimate. Note that in this case the OA iterations are terminated when (M-OAF) is infeasible.

Another interesting point about the OA algorithm is the relationship of its master problem with the one of the generalized Benders decomposition method [3], which is given by:

The above proof follows from the fact that the Lagrangian and feasibility cuts in (RM-GBD) are surrogates of the outer approximations in the master problem (M-OA). Given the fact that the lower bounds of GBD are generally weaker, this method commonly requires a larger number of cycles or major iterations. As the number of 0-1 variables increases this difference becomes more pronounced. This is to be expected since only one new cut is generated per iteration. Therefore user-supplied constraints must often be added to the master problem to strengthen the bounds. As for the OA algorithm, the trade-off is that while it generally predicts stronger lower bounds than GBD, the computational cost for solving the master problem (M-OA) is greater since the number of constraints added per iteration is equal to the number of nonlinear constraints plus the nonlinear objective.

The OA algorithm is also closely related to the extended cutting plane (ECP) method by T. Westerlund and F. Pettersson [8]. The main difference is that in the ECP method no NLP subproblem is solved, and that linearization simply takes place over the predicted continuous points from the MILP master problem, which in turn will nor-
mally only include linearizations of the most violated constraints.

Extensions of the OA algorithm [4] include the LP/NLP based branch and bound [6], which avoids the complete solution of the MILP master problem (M-OA) at each major iteration. The method starts by solving an initial NLP subproblem which is linearized as in (M-OA). The basic idea then consists of performing an LP-based branch and bound method for (M-OA) in which NLP subproblems (NLP1) are solved at those nodes in which feasible integer solutions are found. By updating the representation of the master problem in the current open nodes of the tree with the addition of the corresponding linearizations, the need to restart the tree search is avoided. Another important extension has been the method by Fletcher and Leyffer [2] who included a quadratic approximation based on the Hessian of the Lagrangian to the master problem

no feasible solution can be found in the MIQP master.

for the MIP master problem according to the sign

EXAMPLE 5 In order to illustrate the performance of the OA algorithm, a simple numerical MINLP example is considered.

(MIP-EX)
$$ \min Z = y_1 + 1.5 y_2 + 0.5 y_3 + x_1^{2} + x_2^{2} $$
$$ \text{s.t.} \quad (x_1 - 2)^{2} - x_2 \le 0, $$
$$ x_1 - 2 y_1 \ge 0, $$
$$ x_1 - x_2 - 4 (1 - y_2) \le 0, $$
$$ x_1 - (1 - y_1) \ge 0, $$
$$ x_2 - y_2 \ge 0, $$
$$ x_1 + x_2 \ge 3 y_3, $$
$$ y_1 + y_2 + y_3 \ge 1, $$
$$ 0 \le x_1 \le 4, \quad 0 \le x_2 \le 4, $$
$$ y_1, y_2, y_3 = 0, 1. $$

[Figure: plot of the objective function.]
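Since (MIP-EX) has only three binary and two continuous variables, it can be checked by brute force. The sketch below enumerates the eight 0-1 combinations and scans a coarse grid in $(x_1, x_2)$. The objective $y_1 + 1.5 y_2 + 0.5 y_3 + x_1^{2} + x_2^{2}$ and the inequality directions used in the code are assumptions taken from the reconstructed statement of the example, and the grid search is only an approximate check, not one of the methods discussed in this article.

```python
from itertools import product


def feasible(x1, x2, y1, y2, y3, tol=1e-9):
    """Constraints of (MIP-EX) as assumed in the lead-in above."""
    return (
        (x1 - 2) ** 2 - x2 <= tol
        and x1 - 2 * y1 >= -tol
        and x1 - x2 - 4 * (1 - y2) <= tol
        and x1 - (1 - y1) >= -tol
        and x2 - y2 >= -tol
        and x1 + x2 >= 3 * y3 - tol
        and y1 + y2 + y3 >= 1
    )


def objective(x1, x2, y1, y2, y3):
    # Quadratic terms in x are an assumption about the example's objective.
    return y1 + 1.5 * y2 + 0.5 * y3 + x1 ** 2 + x2 ** 2


# Grid 0, 0.05, ..., 4 for both continuous variables (bounds 0 <= x <= 4).
grid = [i / 20 for i in range(81)]

best = min(
    (objective(x1, x2, *y), (x1, x2), y)
    for y in product((0, 1), repeat=3)
    for x1 in grid
    for x2 in grid
    if feasible(x1, x2, *y)
)
print(best)
```

Under these assumptions the scan selects y = (0, 1, 0) with $x_1 = x_2 = 1$ and Z = 3.5. A grid search of this kind is only practical for very small instances and gives no optimality guarantee when the optimum does not lie on the grid.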
MINLP: Reactive distillation column synthesis
only one more iteration than OA (4 versus 3). It is interesting to note that the NLP relaxation of this problem is 2.53, which is significantly lower than the optimal mixed integer solution. Also, as can be seen in Table 1, an NLP-based branch and bound method requires the solution of 5 NLP subproblems, while the ECP method requires 5 successive MILP problems.

Method   Subproblems   Master problems   LPs solved
BB       5 (NLP1)
OA       3 (NLP2)      3 (M-MIP)         19 LPs
GBD      4 (NLP2)      4 (M-GBD)         10 LPs
ECP      --            5 (M-MIP)         18 LPs
Table 1: Summary of computational results.

See also: Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; Generalized Benders decomposition; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane algorithm; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with αBB; MINLP: Heat exchanger network synthesis; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems.

References
[1] DURAN, M.A., AND GROSSMANN, I.E.: 'An outer-approximation algorithm for a class of mixed-integer nonlinear programs', Math. Program. 36 (1986), 307.
[2] FLETCHER, R., AND LEYFFER, S.: 'Solving mixed integer nonlinear programs by outer approximation', Math. Program. 66 (1994), 327.
[3] GEOFFRION, A.M.: 'Generalized Benders decomposition', J. Optim. Th. Appl. 10, no. 4 (1972), 237-260.
[4] GROSSMANN, I.E., AND KRAVANJA, Z.: 'Mixed-integer nonlinear programming: A survey of algorithms and applications', in L.T. BIEGLER, T.F. COLEMAN, A.R. CONN, AND F.N. SANTOSA (eds.): Large-Scale Optimization with Applications. Part II: Optimal Design and Control, Vol. 93 of IMA Vol. Math. Appl., Springer, 1997, pp. 73-100.
[5] KOCIS, G.R., AND GROSSMANN, I.E.: 'Relaxation strategy for the structural optimization of process flow sheets', Industr. Engin. Chem. Res. 26 (1987), 1869.
[6] QUESADA, I., AND GROSSMANN, I.E.: 'An LP/NLP based branch and bound algorithm for convex MINLP optimization problems', Computers Chem. Engin. 16 (1992), 937-947.
[7] VISWANATHAN, J., AND GROSSMANN, I.E.: 'A combined penalty function and outer-approximation method for MINLP optimization', Computers Chem. Engin. 14 (1990), 769.
[8] WESTERLUND, T., AND PETTERSSON, F.: 'A cutting plane method for solving convex MINLP problems', Computers Chem. Engin. 19 (1995), S131-S136.
[9] YUAN, X., ZHANG, S., PIBOULEAU, L., AND DOMENECH, S.: 'Une methode d'optimisation nonlineare en variables pour la conception de procedes', Oper. Res. 22, no. 331 (1988).

Ignacio E. Grossmann
Carnegie Mellon Univ.
Pittsburgh, PA, USA
E-mail address: grossmann@cmu.edu

MSC 2000: 90C11
Key words and phrases: mixed integer nonlinear programming, outer approximation method, generalized Benders decomposition, extended cutting plane method.

MINLP: REACTIVE DISTILLATION COLUMN SYNTHESIS

Reactive distillation (RD) occurs when a reaction takes place in the liquid holdup on the trays, in the reboiler, or in the condenser of a distillation column. Reactive distillation can increase the conversion of equilibrium limited reactions by continuously separating products and reactants, improve the selectivity in some kinetically limited reaction systems, and separate azeotropic and isomeric mixtures by converting one species into another that is easy to remove. It can also create a natural heat integration that uses an exothermic heat of reaction to create vapor boilup in a distillation column, and reduce capital costs by completing several processing steps in a single vessel.

Reactive distillation is used commercially to produce methyl tert-butyl ether [13], esters including methyl acetate [1], and nylon 6,6 [9]. It has also been proposed for hydrolysis reactions [7], ethy-
[ ~ s xisFsk+Vk-lKi,k-lXik-1 (2)
(3)
$$ B_i = (1 - \beta) L_1 x_{i1}, \tag{5} $$
$$ B_i = P_i, \quad i \in P, \tag{6} $$
1] o, (7)
Fig. 2. Tray-by-tray superstructure of [8].
Feed type   Diam. (m)   Height (m)   Boilup ratio   Reboiler duty (MW)   Condenser duty (MW)
Distr.      1.3         12           0.958          6.7                  7.31
Two-feed    1.3         12           0.96           6.9                  7.5
Table 2: Column specifications.

the same. Fig. 3 shows the solutions. The column specifications are given in Table 2.

Heat and Mass Exchange Networks. In this approach, process units are defined as combinations of heat and mass exchanger blocks, and the alternatives for the synthesis are explored simultaneously in a superstructure. A reactive distillation column can be described as a combination of mass/heat exchanger units with a condenser and a reboiler [10]. Heat and mass transfer takes place between the contacting vapor and liquid phases and from reactants to products. Multiple feeds and products and side heating and cooling tasks can be included in the description in the form of multiple mass and heat exchanger blocks between liquid and vapor streams. Its phase and quality define

temperature approach is the driving force for heat transfer. Concentration and temperature approach constraints are considered at each end of the exchanger. Equilibrium can be represented by a zero concentration approach, which means no driving force for mass transfer.

[Figure: mass/heat exchanger blocks V(ABC)D-Water, L(ABC)D-V(ABC)D and Steam-VABC(D), with feed and product streams.]
ture. Usually, the objective function of the optimization problem is a cost function. If the cost function includes only operating cost, which depends on the raw material and utility consumption, the objective function can be easily formulated from the superstructure. If, however, capital investment costs are involved in the objective cost function, the formulation is not straightforward from the superstructure, since process unit specifications are not considered in the superstructure. In this case, capital cost is to be approximated using cost functions that take operating conditions into account. Separation difficulty can be used in evaluating the capital cost of a distillation tray.

[Fig. 5. Steps of the synthesis framework.]

K.P. Papalexandri and E.N. Pistikopoulos [10] used the production of ethylene glycol from ethylene oxide and water to demonstrate this approach. The reactions involved in this production were given before. Physical properties, cost and reaction data are the same as given earlier in Table 1. The difference from the example problem studied in [6] is the objective, which is the minimization of operating cost only. The set of streams includes the intermediate streams L{EO, H2O, EG, DEG} and V{EO, H2O, EG, DEG} and the product streams L(EG) and L(DEG). Five liquid-liquid mass/heat exchange matches and 15 liquid-vapor mass/heat exchange matches are considered. Representing each match with a binary variable, and considering all possible interactions between units, the problem is formulated as a mixed integer nonlinear programming problem with the objective of minimizing operating cost, which includes raw material cost, purification, and utility cost. The optimal reactive distillation column obtained is pictured in Figure 6. The column has two reaction zones and multiple feeds, and the operating cost is 1.17 × 10^6 $/yr.

[Fig. 6. Optimal reactive distillation column for ethylene glycol production.]

Conclusions. This paper discussed the MINLP applications in reactive distillation design problems. Two main approaches are studied: the distillation based superstructure approach, which uses a rigorous tray-by-tray method to model reactive distillation, and the heat and mass exchanger network superstructure approach, which realizes reactive distillation processes as combinations of several mass/heat exchangers with a condenser and a reboiler. Examples are included to demonstrate the approaches.

See also: Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane algorithm; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with αBB; MINLP: Heat exchanger network synthesis; Generalized Benders decomposition; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; MINLP: Application in facility
MINLP: Trim-loss problem
used solution methods for trim-loss and assortment problems is given in [7].

When using an LP-approach to solve an integer problem the biggest difficulty is to convert the continuous solution such that the integer variables obtain integer values. The rounding methods are heuristic [8] and often fail to give the optimal integer solution even though the solution may be fairly good.

Problem Formulation. The trim-loss problem is a bilinear nonconvex integer nonlinear programming (INLP) problem. The appearance of a cutting pattern needs to be determined by integer variables and the bilinearity comes from the demand constraints.

A cutting pattern tells how many times a certain product is cut out from the raw paper. Let a cutting pattern have the index j and a product the index i. Assume a customer demand with I different products and further assume that the maximum allowed number of different cutting patterns is J. Further let $m_j$ be the number of times a certain cutting pattern is repeated and $n_{ij}$ be the number of times a product i appears in cutting pattern j. If the demand of a product i is expressed by $n_{i,\mathrm{order}}$, the demand constraints can be written as

$$ n_{i,\mathrm{order}} - \sum_{j=1}^{J} m_j \cdot n_{ij} \le 0, \quad i = 1,\dots,I, \tag{1} $$
$$ m_j,\ n_{ij} \in Z^{+}. $$

reel the problem can be simplified by omitting the raw paper length and assuming that the pattern lengths are equal.

Besides the demand constraint, certain constraints are needed to keep the problem feasible. Let the width of a product i be expressed by $b_i$ and the width of the raw paper used for cutting pattern j by $B_{j,\max}$. The trim-loss width cannot exceed, for instance, 200 mm owing to the machinery. This limit is represented by $\Delta_j$. Furthermore, the maximum number of products that can be cut out from a pattern often has a physical restriction. The outcoming product reels have to form an angle big enough so that the reels do not attach together, yet with too big an angle between the outermost reels the paper may be torn off. Let this upper limit be $N_{j,\max}$.

Besides the total number of patterns, the pattern changes are also of interest when doing the optimization. This is due to the fact that the machinery normally needs to be stopped for a knife change which causes a production stop. Let therefore the variable $y_j$ be 1 if the cutting pattern j exists and 0 if not. The sum of $y_j$ variables then indicates how many different cutting patterns are needed to satisfy the production and the sum of $m_j$ indicates the total number of all patterns which are related to the running metres of the raw material.

Now the basic formulation can be written in mathematical form. The objective is to minimize the total number of patterns and the number of pattern changes.
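The quantities introduced above can be made concrete with a small feasibility check. The sketch below evaluates the bilinear demand constraint (1) for a candidate cutting plan and counts the total number of patterns (the sum of $m_j$) and the number of pattern changes (the sum of $y_j$). The function name `check_cutting_plan`, the data, and the exact form of the width and knife-count checks are assumptions chosen for illustration; only the demand constraint is taken directly from the text.

```python
def check_cutting_plan(order, widths, raw_width, max_trim, max_cuts, m, n):
    """Check a candidate cutting plan (hypothetical helper).

    order[i]  : demanded number of reels of product i
    widths[i] : width of product i
    m[j]      : number of times pattern j is repeated
    n[i][j]   : number of times product i appears in pattern j
    """
    products = range(len(order))
    patterns = range(len(m))

    # Bilinear demand constraint (1): order_i - sum_j m_j * n_ij <= 0
    demand_ok = all(
        order[i] - sum(m[j] * n[i][j] for j in patterns) <= 0 for i in products
    )

    # Assumed per-pattern checks: trim loss within [0, max_trim] and at most
    # max_cuts products per pattern. Unused patterns (m_j = 0) are skipped.
    patterns_ok = True
    for j in patterns:
        if m[j] == 0:
            continue
        trim = raw_width - sum(widths[i] * n[i][j] for i in products)
        cuts = sum(n[i][j] for i in products)
        if not (0 <= trim <= max_trim and cuts <= max_cuts):
            patterns_ok = False

    total_patterns = sum(m)                          # sum of m_j
    pattern_changes = sum(1 for mj in m if mj > 0)   # sum of y_j
    return demand_ok and patterns_ok, total_patterns, pattern_changes


result = check_cutting_plan(
    order=[12, 16, 12],
    widths=[330, 360, 385],          # mm, illustrative values
    raw_width=1900,
    max_trim=200,
    max_cuts=5,
    m=[6, 2],
    n=[[2, 0], [1, 5], [2, 0]],
)
print(result)
```

The product `m[j] * n[i][j]` of two integer decision variables is what makes the demand constraint bilinear; a checker like this only verifies a given plan and does not search for one.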
[Fig. 2: The integer variables.]

The last constraint, the demand constraint (8), is an integer bilinear constraint where both variables in bilinear terms are pure integers. This makes the problem a nonconvex MINLP problem where the nonconvexity appears in the integer variables.

There are very few methods available that are capable of solving similar nonconvex MINLP problems. Some heuristic methods such as simulated annealing [9] may find the global optimal solution within infinite time but algorithmic methods have not been proven to converge with such types of problems. Only recently (1999) some advancements have been reported in [1] and [11].

However, it is fully possible to transform the trim-loss problem into convex or linear form and use some established MINLP or MILP solver to solve the resulting problem to global optimality. Some linear transformations are presented in [6] and methods to transform the nonconvex problem into a convex form can be found in [10] and [5].

Linear Transformations. As can be seen from

$$ s_{ijk} - n_{ij} \le 0, \tag{10} $$
$$ -s_{ijk} + n_{ij} - L_{ij} \cdot (1 - \beta_{jk}) \le 0, \tag{11} $$
$$ s_{ijk} - L_{ij} \cdot \beta_{jk} \le 0. \tag{12} $$

Using the above constraints the bilinear demand constraint can be written in linear form

$$ n_{i,\mathrm{order}} - \sum_{j=1}^{J} \sum_{k=1}^{K} 2^{k-1} \cdot s_{ijk} \le 0. \tag{13} $$

The $m_j$ could also be represented by special ordered sets (SOS) where at most one of the binary variables is allowed to be nonzero.

$$ m_j = \sum_{k=1}^{K} k \cdot \beta_{jk}, \tag{14} $$
$$ \sum_{k=1}^{K} \beta_{jk} \le 1. $$

It should be noted that the usage of this kind of transformation may enlarge the integrality gap unless for instance the $n_{ij}$ variables in equations (3)-(5) are replaced with corresponding variables $s_{ijk}$.

The same transformation can be modified such that $n_{ij}$ is replaced by a binary representation and $m_j$ is defined through the slack-variables $s_{ijk}$.

Parameterization Methods. Beyond the linear transformation, the problem can be written in linear form by simply parameterizing one of the vari-
381
MINLP: Trim-loss problem
ables in the bilinear term. This m e t h o d though ders. This creates an interesting problem, where
may lead to global optimality only in such cases the integer search space is reduced at the expense
where all the possible combinations have been con- of more complex nonlinear functions, which could,
sidered. This strategy may be good for smaller in principle, be used as benchmarks for the perfor-
problems but it may also generate far too many mance of MINLP algorithms.
integer variables in solving larger trim-loss prob- The basic principle for the convex transforma-
lems. tion is to first expand the bilinearity in the demand
It is quite easy to generate all the possible com- constraint
binations of nij variables satisfying the constraints mj. -- + + (16)
(3)-(5). This strategy results in a problem where
-- T " ( rnj q- nij ) -- T 2.
these constraints can be removed and where the
nij variables in the resulting linear d e m a n d con- In the following text, the translation constant T =
straint are parameters: 1 is used for simplicity. The second step is to sub-
I stitute the bilinear t e r m in the original demand
hi,order -- m j • nij < O, (15)
constraint
mjEZ.
J
The same type of parameterization strategy may ni,orde r -- ~ ( m j nt-- 1)(nij + 1) (17)
also be applied to the other variable mj but in j=l
this case it may be more difficult to define the ex- J
act values of the parameters. One strategy is to use + (mj + w ) + J < 0.
the upper bounds M j or define all the mj variables j=l
to be equal to one and make sure that a sufficient It should be noted t h a t the transformations that
amount of the variables m j a r e considered. follow need to consider the whole problem not only
Another alternative is to combine the param- individual functions, which makes the transforma-
eterization and transformation methods so that a tion techniques more demanding. A transforma-
proper amount of parameterized variables are com- tion of a single function may cause linear con-
bined with original variables. This strategy may be straints to become nonlinear if one is unaware of
very efficient but often requires such information this fact.
that may be difficult to obtain from a larger prob-
Exponential T r a n s f o r m a t i o n . The demand con-
lem without any knowledge of the solution.
straint is originally a negative bilinear constraint.
The exponential transformation can only be ap-
C o n v e x T r a n s f o r m a t i o n s . In the previous sec-
plied to a positive bilinear constraint. Therefore,
tions a number of methods were presented where
one of the variables in the bilinear term needs to
the nonconvex problem can be transformed or pa-
be substituted with its reversed value.
rameterized into linear form. The main drawback
for this linear transformation strategy is the large rij -- N j , m a x - nij (18)
number of extra constraints and continuous vari- and the d e m a n d constraint is modified to
ables. The parameterization strategy results in a J J
formulation with a few constraints but many extra ni,orde r -- ~ mj. N j , m a x -[- ~ mj.rij. (19)
integer variables. j=l j=l
In the following a number of convexification Now the exponential transformation can be ap-
methods are presented. Generally, the convex for- plied. The transformation is of the form
mulations need fewer extra constraints and contin-
uous variables as the linear strategies and no extra m j + 1 - e Mj , rij + 1 = e R~ (20)
integer variables as is the case with the parameter- and the variables are defined as
ization methods. Thus, the convex transformation Lj
could be expected to result in formulations which mj - ~~jl . l, (21)
are easier to solve especially for larger-scale or- /=1
\[ R_{ij} = 1 + \sum_{k=1}^{K_i} \beta_{ijk} \cdot \left( \frac{1}{k+1} - 1 \right). \tag{37} \]

The demand constraint is obtained exactly in the same way as before

\[ n_{i,\mathrm{order}} - J + \sum_{j=1}^{J} \frac{1}{M_j \cdot R_{ij}} - \sum_{j=1}^{J} \left[ (N_{j,\max} + 1) \cdot \sum_{l=1}^{L_j} \beta_{jl} \cdot l + \sum_{k=1}^{K_i} \beta_{ijk} \cdot k \right] \le 0. \tag{38} \]

Modified Square-Root Transformation. As the last transformation, a modification to the previously presented square-root transformation is introduced. In such cases where the variable $m_j$ may take large values, it may be more efficient to use another type of binary representation.

\[ m_j = \sum_{l=1}^{L_j} 2^{l-1} \cdot \beta_{jl}, \tag{39} \]

where $L_j = \lfloor \log_2(m_{j,\max}) \rfloor + 1$ if $m_{j,\max}$ is the

Five methods for transforming the originally nonconvex trim-loss problem into convex form have been discussed. Three of them were directly applicable to a negative bilinear function but for two methods some operations were needed to change the demand constraint into a positive bilinear constraint.

Example: A Numerical Problem. In this last section a numerical example is solved with all of the presented methods. To improve the performance of the solution procedure some extra linear constraints need to be defined. They are, however, not specified here.

In the following example order, an upper limit $n_{i,\max}$ on the number of products that are allowed to be produced has also been defined. Here, the maximal possible overproduction of any product is 2. This limit is somewhat unnatural and is therefore not used as a constraint. However, the use of this type of upper bound makes it possible to efficiently reduce the combinatorial space.

i    b_i (mm)    n_{i,order}    n_{i,max}
The parameter $M_{\min}$ is the lower bound for the sum of the variables $m_j$. This sum can easily be calculated in advance and significantly enhances the optimization performance.

Before doing the actual optimization it should be pointed out that the results are not comparable. The main purpose for showing the numerical results is to demonstrate that the above presented strategies are fully usable and result in quite efficiently solvable formulations. The transformation [...] which the linear transformation and the parameterization strategies result in MILP formulations. The third group, the convex transformation strategy, produces MINLP formulations that have in this case been solved using the extended cutting plane (ECP) algorithm by T. Westerlund and F. Pettersson [12].

In the parameterization strategies the problem is redefined by parameterizing certain variables, which means that the resulting problem has already been partly solved. This may, however, not always be a benefit, especially in such problems where a huge number of parameters increases the integer search space for other variables.

The strategies are numbered as follows:

1. binary representation of $m_j$
2. binary representation of $n_{ij}$
3. parameterization of $n_{ij}$
4. parameterization of $m_j$
5. exponential transformation
6. square-root transformation
7. logarithmic and square-root transformation
8. inverted transformation
9. modified square-root transformation

The strategies enlarge the problem both in terms of variables and constraints. In the following the number of variables and constraints are given. All the constraints are linear except in the convex transformation strategies where six of the constraints are nonlinear.

The strategies 1-4 are linear formulations of which 3-4 use the parameterization strategy to overcome the bilinearity. Strategies 5-9 are convex transformations. The field with combinations gives simply the number of unconstrained discrete variable combinations as a function of the number of binary variables. This information is more informative than just the number of variables.

Strategy    Constraints    Variables (I/B/C)    Comb. 2^n
1.          408            36/23/120            2^98
2.          366            6/88/144             2^105
3.          59             51/51/-              2^140
4.          201            282/47/-             2^634
5.          199            -/169/84             2^96
6.          199            -/169/84             2^96

The MILP problems 1-4 were solved with CPLEX-5.0 using default settings and the MINLP problems 5-9 were solved by 'mittlp', an ECP application written by H. Skrifvars. The optimization was done on a Pentium Pro 200MHz running the Linux operating system.

The optimization results can be seen in the following table.

Strategy    Nodes (MILP)    ECP-iter. (MINLP)    CPU-time (s)
1.          265             -                    716
2.          51              -                    0.51
3.          2174            -                    3.2
4.          265             -                    7.7
5.          -               4                    8.6
6.          -               7                    66.6
7.          -               9                    138.6
8.          -               10                   736.4
9.          -               6                    49.9

The optimal result has two cutting patterns with the widths $B_1 = 2110$ mm and $B_2 = 2170$ mm and multiples $m_1 = 8$, $m_2 = 7$. The appearances of the patterns are given by the following variables: $n_{1,1} = 1$, $n_{2,1} = 2$, $n_{6,1} = 2$, $n_{3,2} = 2$, $n_{4,2} = 1$, $n_{5,2} = 2$.

Conclusions. The study above is not a fair comparison. Experience has shown that the performance order is highly dependent on the specific problem. In order to get an idea of which of the methods is, on average, the most efficient one, tens of problems of different sizes need to be solved. However, the study illustrates that it is fully possible to apply the transformation methods to a well explored real industrial problem.
Notation.

$i$                   product index
$j$                   cutting pattern index
$I$                   number of products in the order
$J$                   number of possible cutting patterns
$m_j$                 number of times the pattern $j$ is used
$n_{ij}$              number of product $i$ in pattern $j$
$r_{ij}$              reversed value of $n_{ij}$
$n_{i,\mathrm{order}}$  number of product $i$ ordered
$b_i$                 width of product $i$
$B_{j,\max}$          width of raw paper of pattern $j$
$\Delta_j$            max. trim-loss width
$N_{j,\max}$          max. number of products in pattern $j$
$y_j$                 binary variable that is one if $m_j > 0$
$c_j, C_j$            cost coefficients
$M_j$                 upper bound / transformation variable
$\beta_{jl}$          binary variables for defining $m_j$
$L_{ij}$              upper bound
$s_{ijk}$             slack variable for linear transformations
$\beta_{ijk}$         binary variables for defining $n_{ij}$ or $r_{ij}$
$n'_{ij}$             fixed $n_{ij}$ values
$T$                   translation constant
$N_{ij}$              transformation variable
$R_{ij}$              transformation variable
$l, k, m$             indices of binary variables
$L_j, K_i$            number of binary variables needed

See also: Decomposition techniques for MILP: Lagrangian relaxation; LCP: Pardalos-Rosen mixed integer formulation; Integer linear complementary problem; Integer programming: Cutting plane algorithms; Integer programming: Branch and cut algorithms; Integer programming: Branch and bound methods; Integer programming: Algebraic methods; Integer programming: Lagrangian relaxation; Integer programming duality; Time-dependent traveling salesman problem; Set covering, packing and partitioning problems; Simplicial pivoting algorithms for integer programming; Multi-objective mixed integer programming; Mixed integer classification problems; Integer programming; Multi-objective integer linear programming; Multiparametric mixed integer linear programming; Parametric mixed integer nonlinear optimiza-

References
[1] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS, C.A.: 'Global optimization of MINLP problems in process synthesis and design', Computers Chem. Engin. 21 (1997), S445-S450.
[2] DAKIN, R.J.: 'A tree search algorithm for mixed integer programming problems', Computer J. 8 (1965), 250-255.
[3] GILMORE, P.C., AND GOMORY, R.E.: 'A linear programming approach to the cutting-stock problem', Oper. Res. 9 (1961), 849-859.
[4] HAESSLER, R.W.: 'A heuristic programming solution to a non-linear cutting stock problem', Managem. Sci. 17 (1971), S793-S802.
[5] HARJUNKOSKI, I., PÖRN, R., WESTERLUND, T., AND SKRIFVARS, H.: 'Different strategies for solving bilinear integer problems with convex transformations', Computers Chem. Engin. 21 (1997), S487-S492.
[6] HARJUNKOSKI, I., WESTERLUND, T., ISAKSSON, J., AND SKRIFVARS, H.: 'Different formulations for solving trim-loss problems in a paper converting mill with ILP', Computers Chem. Engin. 20 (1996), S121-S126.
[7] HINXMAN, A.I.: 'The trim-loss and assortment problems: A survey', Europ. J. Oper. Res. 5 (1980), 8-18.
[8] JOHNSTON, R.E.: 'Rounding algorithms for cutting stock problems', Asia-Pacific J. Oper. Res. 3 (1986), 166-171.
[9] KIRKPATRICK, S., GELATT, C.D., AND VECCHI, M.P.: 'Optimization by simulated annealing', Science 220 (1983), 671-680.
[10] SKRIFVARS, H., HARJUNKOSKI, I., WESTERLUND, T., KRAVANJA, Z., AND PÖRN, R.: 'Comparison of different MINLP methods applied on certain chemical engineering problems', Computers Chem. Engin. 20 (1996), S333-S338.
[11] SMITH, E.M.B., AND PANTELIDES, C.C.: 'Global optimization of nonconvex MINLPs', Computers Chem. Engin. 21 (1997), S791-S796.
[12] WESTERLUND, T., AND PETTERSSON, F.: 'An extended cutting plane method for solving convex MINLP problems', Computers Chem. Engin. 19 (1995), S131-S136.

Iiro Harjunkoski
Process Design Lab. Åbo Akad. Univ.
Biskopsgatan 8
FIN-20500 Turku, Finland
E-mail address: iharjunk@abo.fi
Ray Pörn
Dept. Math. Åbo Akad. Univ.
Mixed integer classification problems
efficient solutions, authors often introduce additional terms in the objective function. As an example, S.M. Bajgier and A.V. Hill [2] used a formulation similar to the following:

\[
\begin{aligned}
\min \quad & \sum_{g=1}^{2} \frac{\pi_g}{N_g} \sum_{n=1}^{N_g} \left[ C_g z_{gn} + \epsilon_1 d^{-}_{gn} - \epsilon_2 d^{+}_{gn} \right] \\
\text{s.t.} \quad & X_1 w + w_0 \cdot \mathbf{1} + d_1^{+} - d_1^{-} \le 0 \\
& X_2 w + w_0 \cdot \mathbf{1} - d_2^{+} + d_2^{-} \ge 0 \\
& d_g^{-} - M z_g \le 0 \\
& w, w_0 \ \text{free}; \quad d_g^{+}, d_g^{-} \ge 0; \quad z_g \in \{0, 1\}^{N_g}.
\end{aligned}
\]

The deviation variables $d_g^{+}$ and $d_g^{-}$ measure the amount by which each score falls on the correct and incorrect side of the zero cutoff, respectively. The objective function rewards the former and penalizes the latter, using small positive objective coefficients $\epsilon_1$ and $\epsilon_2$ to prevent improvements in these terms from inducing unnecessary misclassifications.

The motivation for formulation (1) is simple: if the training samples are representative of the overall population, the discriminant function that minimizes misclassification costs on the training samples should come close to minimizing expected misclassification cost on the overall population. Models like (1) tend to be computationally expensive, however. The constant $M$ must be chosen large enough that the best choice of $w$ and $w_0$ is not rendered infeasible by a misclassified observation with score larger than $M$ in magnitude; but the larger $M$ is, the weaker the bounds in a branch and bound solution of the problem, and thus the longer the solution time. As is typical with mixed integer programming models, computation time increases modestly with the number of attributes ($p$) but more dramatically with the number of zero-one variables ($N_1 + N_2$, the combined sample size). Unfortunately, the reliability of the discriminant functions improves as the training samples grow, creating a tension between validity and tractability.

In the special case where all attribute variables are discrete, it is likely that some observation vectors will appear more than once in the training samples. When that occurs, the number of zero-one variables can be reduced from one per observation to one per distinguishable observation, yielding a variation of (1) in which the objective function is replaced with

\[ \min \sum_{g=1}^{2} \frac{\pi_g C_g}{N_g} \sum_{k=1}^{K_g} N_{gk} z_{gk}. \]

In this formulation [1], $K_g$ is the number of distinct attribute vectors $x$ in the training sample from group $g$, $N_{gk}$ is the number of repetitions of the $k$th distinct observation from group $g$, and the matrices $X_g$ contain only one copy of each such observation.

Heuristics. Advances in computer hardware, optimization software and algorithms for the mixed integer classification problem have allowed progressively larger training samples to be employed: where G.J. Koehler and S.S. Erenguc [5] were restricted to combined training samples of 100 in 1990 (on a mainframe), P.A. Rubin [8] was able to handle over 600 observations in 1997 (on a personal computer). Nonetheless, a variety of heuristics have been developed to find near optimal solutions to the problem. Several revolve around this property of the problem: if the training samples can be classified with perfect accuracy by a linear function, then problem (1) can be solved as a linear program, with the $z_{gn}$ deleted, to obtain a discriminant function. Deletion of the $z_{gn}$ reduces the objective function to a constant 0. Although this is perfectly acceptable, heuristics may substitute an objective function from one of the linear programming classification models, to encourage the chosen discriminant function to separate scores of the two groups as much as possible. This often also necessitates inclusion of a normalization constraint, to keep the resulting linear program from being unbounded. Alternatively, (1) may be solved heuristically to determine which training observations to misclassify, and then a linear programming model using the remaining observations may be employed to select the final discriminant function.

The BPMM heuristic of [5] solves the linear program dual to a relaxation of the mixed integer problem, notes which observations would be misclassified by the resulting discriminant function, and then solves the dual of each linear relaxation obtainable by deleting one of those observations. Solving the dual problem tends to be
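The reduction from one zero-one variable per observation to one per distinguishable observation can be sketched in a few lines. The code below is a minimal illustration, not the procedure of [1]: it collapses repeated attribute vectors of one group into distinct vectors with repetition counts $N_{gk}$ and evaluates that group's term of the reduced objective for a given 0-1 vector $z_g$. The function names and the sample data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def collapse(observations: Sequence[Sequence[int]]) -> Tuple[List[tuple], List[int]]:
    """Return the distinct attribute vectors (in first-seen order)
    and the number of repetitions N_gk of each."""
    counts = Counter(tuple(x) for x in observations)
    distinct = list(counts.keys())
    reps = [counts[x] for x in distinct]
    return distinct, reps


def group_cost(reps: List[int], z: List[int], prior: float, cost: float) -> float:
    """One group's term of the reduced objective:
    (pi_g * C_g / N_g) * sum_k N_gk * z_gk."""
    n_g = sum(reps)  # N_g, the original sample size of the group
    return prior * cost / n_g * sum(n_gk * z_gk for n_gk, z_gk in zip(reps, z))


# Hypothetical training sample of one group with discrete attributes.
sample = [(0, 1), (1, 1), (0, 1), (0, 0), (1, 1), (0, 1)]
distinct, reps = collapse(sample)

# Misclassifying the second distinct vector misclassifies both of its copies.
cost_value = group_cost(reps, z=[0, 1, 0], prior=0.5, cost=1.0)
```

Here six observations collapse to three distinct vectors, so the group needs three zero-one variables instead of six.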
also operate on the dual of the linear relaxation of the mixed integer problem, restricting basis entry to force certain dual variables to take value zero (equivalent to relaxing the corresponding primal constraints, thus allowing the associated observations to be misclassified).

Multiple Groups. When $G > 2$ groups are involved, the problem becomes considerably more complicated. In a practical application with multiple groups, it is plausible that misclassification costs would depend not only on the group to which a misclassified point belonged but also the one into which it was classified. Thus an appropriate objective function might look like

\[ \sum_{g=1}^{G} \sum_{\substack{h=1 \\ h \ne g}}^{G} \frac{\pi_g}{N_g} \sum_{n=1}^{N_g} C_{gh} z_{ghn} \]

where $C_{gh}$ is the cost of classifying a point from group $g$ into group $h$ and $z_{ghn}$ is 1 if the $n$th observation of group $g$ is classified into group $h$ and 0 otherwise. This represents a substantial escala-

\[
\begin{aligned}
\text{s.t.} \quad & X_g w + w_0 \cdot \mathbf{1} - M \cdot z_g - U_g \cdot \mathbf{1} \le 0 \\
& X_g w + w_0 \cdot \mathbf{1} + M \cdot z_g - L_g \cdot \mathbf{1} \ge 0 \\
& U_g - L_g \ge 0 \\
& L_h - U_g + M y_{hg} \ge \epsilon \\
& y_{gh} + y_{hg} = 1 \\
& w, w_0, L, U \ \text{free}; \quad z_g \in \{0, 1\}^{N_g}; \quad y \in \{0, 1\}^{G(G-1)}.
\end{aligned}
\]

The first three constraints are repeated for $g = 1, \dots, G$ while the next two are repeated for all pairs $g, h = 1, \dots, G$ such that $g \ne h$. Observations are classified into group $g$ if their scores fall in the interval $[L_g, U_g]$. Variable $y_{gh} = 1$ if the scoring interval for group $g$ precedes that for group $h$. Parameter $\epsilon > 0$ dictates a minimum separation between intervals.

Using a single scalar-valued discriminant function with $G > 2$ groups is restrictive; it assumes that the groups project onto some line in an orderly manner. In [3], Gehrlein also suggested a model using a vector-valued discriminant function $f()$ of dimension $G$. Observation $x$ would be classified into the group corresponding to the largest component of $f(x)$. The model increases the number of coefficient variables and the number of constraints but not the number of 0-1 variables, the primary determinant of execution time. The model is:

\[ \min \sum_{g=1}^{G} \frac{\pi_g C_g}{N_g} \sum_{n=1}^{N_g} z_{gn} \]
mixed integer model similar to Gehrlein's is solved to obtain the first scalar function. Thereafter, a sequence of similar mixed integer models is solved, with each model bearing additional constraints compelling the scores produced by the next scoring function to have sample covariance zero with the scores of each of the preceding functions. The covariance constraints impose a sort of probabilistic 'orthogonality' on the dimensions of the composite (vector-valued) scoring function.

See also: Integer programming; Multi-objective mixed integer programming; Simplicial pivoting algorithms for integer programming; Set covering, packing and partitioning problems; Time-dependent traveling salesman problem; Graph coloring; Integer programming duality; Integer programming: Lagrangian relaxation; Integer programming: Algebraic methods; Integer programming: Branch and bound methods; Integer programming: Branch and cut algorithms; Integer programming: Cutting plane algorithms; Integer linear complementary problem; LCP: Pardalos-Rosen mixed integer formulation; Decomposition techniques for MILP: Lagrangian relaxation; Multi-objective integer linear programming; Multiparametric mixed integer linear programming; Parametric mixed integer nonlinear optimization; Stochastic integer programming: Continuity, stability, rates of convergence; Stochastic integer programs; Branch and price: Integer programming with column generation; Statistical classification: Optimization approaches; Linear programming models for classification; Optimization in Boolean classification problems; Optimization in classifying text documents.

References
[1] ASPAROUKHOV, O.K., AND STAM, A.: 'Mathematical programming formulations for two-group classification with binary variables', Ann. Oper. Res. 74 (1997), 89-112.
[2] BAJGIER, S.M., AND HILL, A.V.: 'An experimental comparison of statistical and linear programming approaches to the discriminant problem', Decision Sci. 13, no. 4 (1982), 604-618.
[3] GEHRLEIN, W.V.: 'General mathematical programming formulations for the statistical classification problem', Oper. Res. Lett. 5, no. 6 (1986), 299-304.
[4] HAND, D.J.: Construction and assessment of classification rules, Wiley, 1997.
[5] KOEHLER, G.J., AND ERENGUC, S.S.: 'Minimizing misclassifications in linear discriminant analysis', Decision Sci. 21, no. 1 (1990), 63-85.
[6] PAVUR, R.: 'Dimensionality representation of linear discriminant function space for the multiple-group problem: An MIP approach', Ann. Oper. Res. 74 (1997), 37-50.
[7] RUBIN, P.A.: 'Heuristic solution procedures for a mixed-integer programming discriminant model', Managerial and Decision Economics 11, no. 4 (1990), 255-266.
[8] RUBIN, P.A.: 'Solving mixed integer classification problems by decomposition', Ann. Oper. Res. 74 (1997), 51-64.

Paul A. Rubin
Michigan State Univ.
East Lansing, MI, USA
E-mail address: rubin@msu.edu
MSC 2000: 62H30, 65Cxx, 65C30, 65C40, 65C50, 65C60, 90C11
Key words and phrases: classification, discriminant analysis, integer programming.

MIXED INTEGER LINEAR PROGRAMMING: HEAT EXCHANGER NETWORK SYNTHESIS

Heat exchanger networks use the waste heat released by hot process streams to heat the cold process streams of a chemical manufacturing plant, reducing utility costs by as much as 80%. Heat exchanger network synthesis has been an active area of process research ever since the energy crisis of the 1970s, and over 400 research papers have been published in the area. See [1], [2], [4], [5], [6] for recent reviews.

In 1979, T. Umeda et al. [8] discovered a thermodynamic pinch point that limits the energy savings of a heat exchanger network, establishes minimum utility levels, and partitions the heat exchanger network into two independent subnetworks. This discovery revolutionized heat exchanger network synthesis: with it, designers could compute utility levels a priori, then seek the heat exchanger network structure that uses the minimum utility consumption while also minimizing the total investment cost. This remaining problem requires matching the hot utilities and process streams that release heat with the cold process streams and utilities that require heat, choosing the network structure of each stream, and designing the individual heat exchanger networks. In general, this is a mixed integer nonlinear programming problem (MINLP), but it can be decomposed into two smaller problems by first selecting the matches between hot and cold process streams and utilities by minimizing the total number of units, then optimizing the network structure. The first problem is a mixed integer linear programming problem that will be discussed in detail here.

Using MILP Models to Find the Minimum Number of Units. Stated formally, the minimum-units problem is:

Given
1) A set of hot process streams and utilities $i \in H$, and for each hot stream $i$:
   a) the inlet and outlet temperatures $T_i^I$ and $T_i^O$;
   b) either the heat capacity flow rate $FCp_i$ or the heat duty $Q_i$.
2) A set of cold process streams $j \in C$, and for each cold stream $j$:
   a) the inlet and outlet temperatures $T_j^I$ and $T_j^O$;
   b) either the heat capacity flow rate $FCp_j$ or the heat duty $Q_j$.
3) The minimum temperature difference between hot and cold streams exchanging heat, $\Delta T_{\min}$.

Identify a set of stream matches $(ij)$ and their heat duties $Q_{ij}$ that
a) meets the heating and cooling needs of each stream; and
b) minimizes the total number of matches.

S.A. Papoulias and I.E. Grossmann [7] formulated this as a mixed integer programming problem using a transshipment model, by making an analogy between heat exchanger networks and transportation networks. In the transshipment analogy, hot process streams, the sources of heat, are similar to manufacturing plants, the sources of goods, while cold process streams, the heat sinks, are akin to stores and shopping malls, the sinks of manufactured goods.

The analogy is not perfect, as heat only flows from a high temperature to a lower one, in obedience to the second law of thermodynamics. Partitioning the temperature range of the heat exchanger network into intervals can capture this heat flow pattern. Each interval sends excess, or residual, heat to the interval below it, just as excess manufactured goods are sent to a discount warehouse.

The hot side of this temperature cascade is created by ordering $T_i^I$ and $T_j^I + \Delta T_{\min}$ from the highest to the lowest value, creating $t = 1, \dots, TI$ temperature intervals. Temperatures on the cold side of the cascade equal the temperature on the hot side minus $\Delta T_{\min}$. Hot stream $i$ releases $Q_{it}^H$ units of heat to temperature interval $t$. $Q_{it}^H$ is equal to

\[
Q_{it}^H =
\begin{cases}
FCp_i (T_{t-1} - T_t) & \text{if } T_i^I \ge T_{t-1} \text{ and } T_i^O \le T_t, \\
FCp_i (T_{t-1} - T_i^O) & \text{if } T_i^I \ge T_{t-1} \text{ and } T_i^O \ge T_t, \\
Q_i & [\dots]
\end{cases}
\]

Cold stream $j$ absorbs $Q_{jt}^C$ units of heat from temperature interval $t$. $Q_{jt}^C$ equals

\[
Q_{jt}^C =
\begin{cases}
FCp_j (T_{t-1} - T_t) & \text{if } T_j^I \le T_t - \Delta T_{\min} \text{ and } T_j^O \ge T_{t-1} - \Delta T_{\min}, \\
FCp_j (T_j^O - T_t + \Delta T_{\min}) & \text{if } T_j^I \le T_t - \Delta T_{\min} \text{ and } T_j^O \le T_{t-1} - \Delta T_{\min}, \\
Q_j & \text{if } T_j^I = T_j^O \text{ and } T_j^I = T_{t-1} - \Delta T_{\min}.
\end{cases}
\]

Any excess heat sent to interval $t$ from hot stream $i$ cascades down to interval $t + 1$ through the residual flow $R_{it}$. Process utilities may be treated as process streams, or may be placed at the top or bottom of the cascade.

This transshipment model of heat flow leads to the following mixed integer linear programming
ing this a mixed integer linear programming problem.

Table 3: $Q^C$, heat absorbed by cold stream i from temperature interval t.
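The interval heat loads $Q_{it}^H$ and $Q_{jt}^C$ of the temperature cascade can be computed as the overlap between a stream's temperature range and the interval. The Python sketch below is a generic overlap formulation written for illustration, not the authors' tabulation: it assumes non-isothermal streams with a known heat capacity flow rate, takes the interval bounds on the hot-side scale, and shifts them by $\Delta T_{\min}$ for cold streams. Function names and the sample numbers are hypothetical.

```python
def hot_interval_load(fcp: float, t_in: float, t_out: float,
                      t_upper: float, t_lower: float) -> float:
    """Heat released by a hot stream (t_in > t_out) inside the
    interval [t_lower, t_upper] on the hot-side temperature scale."""
    overlap = min(t_in, t_upper) - max(t_out, t_lower)
    return fcp * max(0.0, overlap)


def cold_interval_load(fcp: float, t_in: float, t_out: float,
                       t_upper: float, t_lower: float, dt_min: float) -> float:
    """Heat absorbed by a cold stream (t_in < t_out) from the same
    interval; cold-side bounds are the hot-side bounds minus dt_min."""
    overlap = min(t_out, t_upper - dt_min) - max(t_in, t_lower - dt_min)
    return fcp * max(0.0, overlap)


# Hypothetical streams and one interval with hot-side bounds 350..420.
q_hot = hot_interval_load(fcp=2.0, t_in=400.0, t_out=300.0,
                          t_upper=420.0, t_lower=350.0)
# Interval with hot-side bounds 300..350 and dt_min = 10 (cold side 290..340).
q_cold = cold_interval_load(fcp=3.0, t_in=280.0, t_out=360.0,
                            t_upper=350.0, t_lower=300.0, dt_min=10.0)
```

Isothermal streams, for which only a heat duty $Q_i$ or $Q_j$ is given, would need a separate rule that assigns the whole duty to a single interval.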
In this example, the minimum number of units is 5, and there are four solutions to this MILP that meet this minimum (cf. Table 4).

Conclusions. Mixed integer linear programs are used in heat exchanger network synthesis to identify the minimum number of units, and a set of matches and their heat loads meeting the minimum. These MILPs are based upon a transshipment model of heat flow.

See also: Global optimization of heat exchanger networks; Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; Generalized Benders decomposition; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane algorithm; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with αBB; MINLP: Heat exchanger network synthesis; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems.

References
[1] BIEGLER, L.T., GROSSMANN, I.E., AND WESTERBERG, A.W.: Systematic methods of chemical process design, Prentice-Hall, 1997.
[2] FLOUDAS, C.A.: Nonlinear and mixed-integer optimization, Oxford Univ. Press, 1995.
[3] GUNDERSEN, T., AND GROSSMANN, I.E.: 'Improved optimization strategies for automated heat exchanger synthesis through physical insights', Computers Chem. Engin. 14, no. 9 (1990), 925.
[4] GUNDERSEN, T., AND NAESS, L.: 'The synthesis of cost optimal heat exchanger networks: An industrial review of the state-of-the-art', Computers Chem. Engin. 12, no. 6 (1988), 503.
[5] JEZOWSKI, J.: 'Heat exchanger network grassroot and retrofit design: The review of the state-of-the-art: Part I', Hungarian J. Industr. Chem. 22 (1994), 279-294.
[6] JEZOWSKI, J.: 'Heat exchanger network grassroot and retrofit design: The review of the state-of-the-art: Part II', Hungarian J. Industr. Chem. 22 (1994), 295-308.
[7] PAPOULIAS, S.A., AND GROSSMANN, I.E.: 'A structural optimization approach in process synthesis - II: Heat recovery networks', Computers Chem. Engin. 7 (1983), 707.
[8] UMEDA, T., HARADA, T., AND SHIROKO, K.: 'A thermodynamic approach to the structure in chemical processes', Computers Chem. Engin. 3 (1979), 373.

Kemal Sahin
Dept. Chemical Engin. Univ. Cincinnati
Cincinnati OH 45221, USA
Korhan Gursoy
Dept. Chemical Engin. Univ. Cincinnati
Cincinnati OH 45221, USA
Amy Ciric
Dept. Chemical Engin. Univ. Cincinnati
Cincinnati OH 45221, USA
MSC 2000: 90C90
Key words and phrases: MILP, HEN synthesis, transshipment model.
Mixed integer linear programming: Mass and heat exchanger networks
is explored mapping the rich and the lean streams on equivalent composition scales, that are derived from the mass transfer feasibility requirements in (1). In general, the composition equivalent scales and the minimum composition difference, $\epsilon$, are defined for each component of interest and each pair of rich and lean streams. In the simple case of a single component, where mass transfer is independent of the presence of other components in the rich streams, the CID is constructed as illustrated in Fig. 2.

• $R_k$ is the set of rich streams present in interval $k$;
• $S_k$ is the set of lean streams present in interval $k$;
• $N_{\mathrm{int}}$ is the number of composition intervals;
• $W_{ik}^R$ is the mass exchange load of rich stream $i$ in interval $k$,
  \[ W_{ik}^R = G_i \left( y_k - \max(y_{k+1}, y_i^t) \right); \]
• $W_{jk}^S$ is the mass exchange load of lean stream $j$ in interval $k$,
Variable inlet compositions usually require flexible mass exchange networks to accommodate the variations and define a different problem. For a single component it has been shown that the minimum MSA cost corresponds to the lower bounds of the inlet compositions [8].

For nonconvex equilibrium relations, (TP1) cannot guarantee feasible mass transfer throughout the composition range, while the predicted MSA cost is a lower bound to the actual minimum one. B.K. Srinivas and M.M. El-Halwagi suggested in [14] an iterative procedure to calculate the minimum required MSA cost, that involves two major steps:

i) a 'feasibility problem', where 'critical' composition levels are identified and included in the CID (nonconvex NLP step, that requires global optimization methods), and
ii) (TP1) with updated intervals, which calculates increasing lower bounds to the minimum MSA cost.

Instead of target outlet compositions for the rich streams, it may be of interest to remove a certain total mass load of pollutants. Then, (TP1) is solved with variable rich outlets and a fixed total mass exchange load [10]:

[...]

The minimum-utility-cost problem has been alternatively formulated as an LP or MINLP problem, based on total mass balances and the following property:

$$\left[\begin{array}{c}\text{Mass lost by all the rich}\\ \text{streams below each}\\ \text{pinch point candidate}\end{array}\right] - \left[\begin{array}{c}\text{Mass gained by all the lean}\\ \text{streams below each}\\ \text{pinch point candidate}\end{array}\right] \le 0 \tag{2}$$

and employing binary variables to denote the relative position of variable outlet compositions with respect to each pinch point candidate in the CID [5], [6], [8], [9].

The minimum number of mass exchange operations (units) for fixed MSA cost is determined in each subnetwork in a second step, in an attempt to minimize the fixed cost of the separators. The minimum number of mass exchangers is found employing the expanded transshipment model, where the existence of a mass exchange match-separator in a subnetwork is denoted by a binary variable:

$$E_{ijm} = \begin{cases} 1, & \text{when streams } i, j \text{ exchange mass in subnetwork } m, \\ 0, & \text{otherwise.} \end{cases}$$

For a single component, the minimum number of mass exchanger units is given by the following MILP problem [4]:

$$
\begin{aligned}
\min\ & \sum_{m} \sum_{i \in R_m} \sum_{j \in S_m} E_{ijm} \\
\text{s.t. } & \delta_{ik} - \delta_{i,k-1} + \sum_{j \in S_{km}} M_{ijk} = W^R_{ik}, && k \in I_m,\ i \in R_{km},\ m \in M, \\
& \sum_{i \in R_{km}} M_{ijk} = W^S_{jk}, && k \in I_m,\ j \in S_{km},\ m \in M, \\
& \sum_{k \in I_m} M_{ijk} - E_{ijm} U_{ijm} \le 0, \\
& \delta_{ik} \ge 0, && k \in I_m,\ i \in R_m, \\
& M_{ijk} \ge 0, && k \in I_m,\ i \in R_{km},\ j \in S_{km}, \\
& E_{ijm} = 0, 1, && k \in I_m,\ i \in R_{km},\ j \in S_{km},
\end{aligned}
\tag{TP2}
$$

where
• $R_m$ is the set of rich streams, present in subnetwork $m$,
• $S_m$ is the set of lean streams, present in subnetwork $m$,
• $I_m$ is the set of intervals in subnetwork $m$,
• $R_{km}$ is the set of rich streams, present in interval $k$ of subnetwork $m$, or above,
• $S_{km}$ is the set of lean streams, present in interval $k$ of subnetwork $m$,
• $W^R_{ik}$ is the mass exchange load of rich stream $i$ in interval $k$,
• $\delta_{ik}$ is the residual mass exchange load of rich stream $i$ in interval $k$,
• $W^S_{jk}$ is the mass exchange load of lean stream $j$ in interval $k$, as determined by (TP1),
• $M_{ijk}$ is the mass exchange load between $i$ and $j$ in interval $k$,
• $U_{ijm}$ is an upper bound to the possible mass exchange load between $i$ and $j$ in subnetwork $m$.

[...] matches, featuring the minimum MSA cost, may be generated by solving (TP2) iteratively and including integer cuts. These do not necessarily correspond to networks of the same overall cost.

The expanded transshipment model can also be employed to determine the minimum MSA cost, considering variable mass loads for the lean streams. Then, forbidden or restricted mass ex-
an optimization variable for each mass exchanger separately. In the sequential synthesis method this is fixed arbitrarily to a possibly conservative value for the construction of the CID. El-Halwagi and V. Manousiouthakis [4] suggested a two-level optimization procedure to select a unique ε for all mass exchange operations, based on the impact of ε on the final MEN cost, still, not exploiting the overall cost trade-offs.

When isothermal mass exchange operations take place at different temperature levels, the operating and overall mass integration costs are affected by the heating and cooling requirements of the system. Energy integration between the rich and lean streams can be considered within a mass and heat exchanger network synthesis problem (MHEN) to reduce the total cost. The overall problem is addressed combining MEN and HEN synthesis tools. The optimal temperature of mass exchange is defined for each pair of rich and lean streams by the equilibrium relations that limit mass transfer

$$y_i \ge K_{ij}(T)\, x_j,$$

where $K_{ij}(T)$ is a known function of temperature.

Fig. 3. [Diagram: Heat Exchange Network and Mass Exchange Network.]

In the sequential synthesis framework, the overall minimum operating cost for the network (cost of mass separating agents and heating/cooling utilities) may be calculated from a combined mass and heat transshipment model. Each stream is considered to consist of substreams, of the same inlet and outlet composition and temperature, each of which participates to isothermal mass exchange operations at a different temperature. Srinivas and El-Halwagi proved [13], that, for monotonic dependence of the equilibrium constant on temperature [...] MHEN is independent of such a stream decomposition, see Fig. 3.

Fig. 4: Rich substream with T2 ≤ Tl ≤ T[. [Diagram: hot stream, rich stream, cold stream.]

Although the mass exchange temperatures $(T_1, \ldots, T_N)$ are variables, their relative position with respect to inlet and outlet stream temperatures (greater or less) can be prepostulated. Thus, the rich and lean substreams define hot (or cold) streams before their mass exchange operations and cold (or hot) streams afterwards, cf. Fig. 4.

A CID is constructed, similarly to the simple MEN case, involving the several substreams with variable flows, and thus, variable mass loads in each composition interval. Mass exchange is permitted between substreams of the same temperature. A temperature interval diagram, TID, is also constructed, involving the hot and cold substreams and the available heating and cooling utilities, with variable heat loads per interval, due to the variable substream flows. In order to avoid discrete decisions (i.e. presence or not of streams in temperature intervals with variable limits), the temperature range for each mass transfer operation is discretized and a substream is associated with each candidate temperature [13].

The minimum utility cost is found from the solution of the combined LP transshipment model, which, for a single component is as follows:

$$\min\ \cdots + \sum_{n \in TI} \sum_{h \in HU_n} c_h\, QHU_{hn} + \sum_{n \in TI} \sum_{c \in CU_n} \cdots \tag{TP3}$$

such that [...]
Mixed integer nonlinear programming
[12] SRINIVAS, B.K., AND EL-HALWAGI, M.M.: 'Optimal design of pervaporation systems for waste reduction', Computers Chem. Engin. 17 (1993), 957-970.
[13] SRINIVAS, B.K., AND EL-HALWAGI, M.M.: 'Synthesis of combined heat and reactive mass exchange networks', Chem. Engin. Sci. 49 (1994), 2059-2074.
[14] SRINIVAS, B.K., AND EL-HALWAGI, M.M.: 'Synthesis of reactive mass exchange networks with general nonlinear equilibrium relations', AIChE J. 40 (1994), 463-472.

Katerina P. Papalexandri
bp Upstream Technol.
U.K.
E-mail address: papalexk@bp.com
MSC2000: 93A30, 93B50
Key words and phrases: MILP, mass and heat exchange, separation.

MIXED INTEGER NONLINEAR PROGRAMMING, MINLP

A wide range of nonlinear optimization problems [...] scheduling/planning of batch processes and retrofit of heat recovery systems).

The book [88] studies mixed integer linear optimization and combinatorial optimization, while the [40] studies mixed integer nonlinear optimization problems.

The coupling of the integer domain and the continuous domain with their associated nonlinearities make the class of MINLP problems very challenging from the theoretical, algorithmic, and computational point of view. Mixed integer nonlinear optimization problems are encountered in a variety of applications in all branches of engineering and applied science, applied mathematics, and operations research. These represent very important and active research areas that include:

• process synthesis
  - heat exchanger networks
  - retrofit of heat recovery systems
  - distillation sequencing
  - design and retrofit of multiproduct plants
  - synthesis, design and scheduling of multipurpose plants
  - planning under uncertainty
• facility location and allocation
• facility planning and scheduling
• topology of transportation networks

The applications in the area of process synthesis in chemical engineering include:

i) the synthesis of grassroot heat recovery networks [43], [25], [24], [139], [138], [140];
ii) the retrofit of heat exchanger systems [25], [95];
iii) the synthesis of distillation-based separation systems [102], [131], [8], [9], [104], [90];
iv) the synthesis of mass exchange networks [99];
v) the synthesis of complex reactor networks [71], [73], [74], [119];
vi) the synthesis of reactor-separator-recycle systems [72];
vii) the synthesis of utility systems [65];
viii) the synthesis of total process systems [68], [69], [75], [28], [29], [98], [76]; and
ix) the analysis and synthesis of metabolic pathways [30], [58], [59], [107].

Reviews of the mixed integer nonlinear optimization frameworks and applications in Process Synthesis are provided in [49], [40], [50], and [7], while algorithmic advances for logic and global optimization in Process Synthesis are reviewed in [44].

The MINLP applications in the area of process design include:

i) reactive distillation processes [26];
ii) design of dynamic systems [14], [11], [117], [118];
iii) plant layout systems [105], [47]; and
iv) environmentally benign systems [27], [123].

The MINLP applications in the area of process synthesis and design under uncertainty include:

i) deterministic and stochastic uncertainty analysis [51], [1], [33];
ii) design of dynamic systems under uncertainty [31], [85]; and
iii) design of batch processes under uncertainty [109], [63], [57], [108].

In the area of molecular design, the MINLP applications include:

i) the computer-aided molecular design aspects of selecting the best solvents [91];
ii) design of polymers and refrigerants [80], [22], [23], [35], [111], [21], [126]; and
iii) property prediction under uncertainty [81].

The MINLP applications in the area of interaction of design, synthesis and control include:

i) studies under steady state operation of chemical processes [78], [79], [96], [97]; and
ii) studies under dynamic operation [85], [86], [54], [118].

Applications of MINLP approaches have also emerged in the area of process operations and include:

i) short term scheduling of batch and semicontinuous processes [143], [84];
ii) the design of multiproduct plants [53], [17], [18];
iii) the synthesis, design and scheduling of multipurpose plants [127], [128], [36], [93], [94], [132], [133], [116], [37], [13], [137]; and
iv) planning under uncertainty [64], [62], [63], [77], [106].

Reviews of the advances in the design, scheduling and planning of batch plants can be found in [113], [52], while a collection of recent contributions can be found in the proceedings of the 1998 FOCAPO meeting.

MINLP applications received significant attention in other engineering disciplines. These include

i) the facility location in a multi-attribute space [45];
ii) the optimal unit allocation in an electric power system [16];
iii) the facility planning of an electric power generation [19], [114];
iv) the chip layout and compaction [32];
Mathematical Description. The general algebraic MINLP formulation can be stated as:

$$
\begin{aligned}
\min_{x,y}\ & f(x, y) \\
\text{s.t. } & h(x, y) = 0 \\
& g(x, y) \le 0 \\
& x \in X \subseteq \mathbb{R}^n \\
& y \in Y \text{ integer}.
\end{aligned}
\tag{1}
$$

Here $x$ represents a vector of $n$ continuous variables (e.g., flows, pressures, compositions, temperatures, sizes of units), and $y$ is a vector of integer variables (e.g., alternative solvents or materials); $h(x, y) = 0$ denote the $m$ equality constraints (e.g., mass, energy balances, equilibrium relationships); $g(x, y) \le 0$ are the $p$ inequality constraints (e.g., specifications on purity of distillation products, environmental regulations, feasibility constraints in heat recovery systems, logical constraints); and $f(x, y)$ is the objective function (e.g., annualized total cost, profit, thermodynamic criteria).

REMARK 1 The integer variables $y$ with given lower and upper bounds

$$y^L \le y \le y^U$$

can be expressed through 0-1 variables (i.e., binary), denoted as $z$, by the following formula:

$$y = y^L + z_1 + 2 z_2 + 4 z_3 + \cdots + 2^{N-1} z_N,$$

where $N$ is the minimum number of 0-1 variables needed. This minimum number is given by:

$$N = 1 + \mathrm{INT}\left\{ \frac{\log (y^U - y^L)}{\log 2} \right\}.$$

[...] $y \in \{0, 1\}^q$,

where $y$ now is a vector of $q$ 0-1 variables (e.g., existence of a process unit ($y_i = 1$) or nonexistence ($y_i = 0$)).

Challenges in MINLP. Dealing with mixed integer nonlinear optimization models of the form (1) or (2) presents two major challenges. These difficulties are associated with the nature of the problem, namely, the combinatorial domain ($y$-domain) and the continuous domain ($x$-domain).

As the number of binary variables $y$ in (2) increases, one is faced with a large combinatorial problem, and the complexity analysis results characterize MINLP problems as NP-complete [88]. At the same time, due to the nonlinearities the MINLP problems are in general nonconvex, which implies the potential existence of multiple local solutions. The determination of a global solution of the nonconvex MINLP problems is also NP-hard, since even the global optimization of constrained nonlinear programming problems can be NP-hard [100], and even quadratic problems with one negative eigenvalue are NP-hard [101]. An excellent book on complexity issues for nonlinear optimization is [129].

Despite the aforementioned discouraging results from complexity analysis, which are worst-case results, significant progress has been achieved in the MINLP area from the theoretical, algorithmic, and computational perspective. As a result, several algorithms have been proposed for convex and nonconvex MINLP models, their convergence properties have been investigated, and a large number of applications now exist that cross the boundaries of several disciplines. In the sequel, we will discuss these developments.
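The binary expansion of Remark 1 can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names `num_binaries`, `encode` and `decode` are hypothetical, integer bounds with $y^U > y^L$ are assumed, and the count of 0-1 variables is computed with the integer bit length, which for a positive integer equals one plus the truncated base-2 logarithm of $y^U - y^L$.

```python
def num_binaries(y_lo: int, y_hi: int) -> int:
    """Minimum number N of 0-1 variables z_1..z_N such that
    y = y_lo + z_1 + 2*z_2 + ... + 2**(N-1)*z_N reaches every integer in [y_lo, y_hi]."""
    if y_hi <= y_lo:
        raise ValueError("expected integer bounds with y_hi > y_lo")
    # For a positive integer d, d.bit_length() == 1 + floor(log2(d)),
    # so no floating-point logarithm is needed.
    return (y_hi - y_lo).bit_length()


def encode(y: int, y_lo: int, y_hi: int) -> list[int]:
    """Return [z_1, ..., z_N] (least significant first) representing y."""
    if not y_lo <= y <= y_hi:
        raise ValueError("y is outside its bounds")
    offset = y - y_lo
    return [(offset >> k) & 1 for k in range(num_binaries(y_lo, y_hi))]


def decode(z: list[int], y_lo: int) -> int:
    """Evaluate y = y_lo + sum over k of 2**(k-1) * z_k."""
    return y_lo + sum(bit << k for k, bit in enumerate(z))


if __name__ == "__main__":
    z = encode(5, y_lo=2, y_hi=9)
    print(num_binaries(2, 9), z, decode(z, 2))
```

When $y^U - y^L + 1$ is not a power of two, the $N$ binaries can also encode values above $y^U$, so the bound $y \le y^U$ would still have to be kept as a constraint in the reformulated model.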
[...] the form (1) or restricted classes of (2) includes the following:

1) generalized Benders decomposition, GBD, [46], [103], [42];
2) outer approximation, OA, [34];
3) outer approximation with equality relaxation, OA/ER, [67];
4) outer approximation with equality relaxation and augmented penalty, OA/ER/AP, [131];
5) generalized outer approximation, GOA, [38];
6) generalized cross decomposition, GCD, [61];
7) branch and bound, BB, [15], [55], [92], [20], [110], [39];
8) feasibility approach, FA, [82];
9) extended cutting plane, ECP, [135], [134];
10) logic-based methods, [124], [130].

In the pioneering work [46] on the generalized Benders decomposition, GBD, two sequences of updated upper (nonincreasing) and lower (nondecreasing) bounds are created that converge within ε in a finite number of iterations. The upper bounds correspond to solving subproblems in the $x$ variables by fixing the $y$ variables, while the lower bounds are based on duality theory.

The outer approximation, OA, addresses problems with nonlinear inequalities, and creates sequences of upper and lower bounds as the GBD, but it has the distinct feature of using primal information, that is the solution of the upper bound problems, so as to linearize the objective and constraints around that point. The lower bounds in OA are based upon the accumulation of the linearized objective function and constraints, around the generated primal solution points.

The OA/ER algorithm extends the OA to handle nonlinear equality constraints by relaxing them into inequalities according to the sign of their associated multipliers.

The OA/ER/AP algorithm introduces an augmented penalty function in the lower bound subproblems of the OA/ER approach.

The generalized outer approximation, GOA, extends the OA to the MINLP problems that the GBD addresses and introduces exact penalty functions.

The generalized cross decomposition, GCD, simultaneously utilizes primal and dual information by exploiting the advantages of Dantzig-Wolfe and generalized Benders decomposition.

An overview of these local MINLP algorithms and extensive theoretical, algorithmic, and applications of GBD, OA, OA/ER, OA/ER/AP, GOA, and GCD algorithms can be found in [40].

The branch and bound, BB, approaches start by solving the continuous relaxation of the MINLP and subsequently perform an implicit enumeration where a subset of the 0-1 variables is fixed at each node. The lower bound corresponds to the NLP solution at each node and it is used to expand on the node with the lowest lower bound or it is used to eliminate nodes if the lower bound exceeds the current upper bound. If the continuous relaxation of the MINLP (an NLP in most cases, with the exception of the algorithm of [110] where an LP problem is obtained) has a 0-1 solution for the $y$ variables, then the BB algorithm will terminate at that node. With a similar argument, if a tight NLP relaxation results in the first node of the tree, then the number of nodes that would need to be eliminated can be low. However, loose NLP relaxations may result in having a large number of NLP subproblems to be solved. The algorithm terminates when the lowest lower bound is within a prespecified tolerance of the best upper bound.

The feasibility approach, FA, rounds the relaxed NLP solution to an integer solution with the least local degradation by successively forcing the superbasic variables to become nonbasic based on the reduced cost information. The premise of this approach is that the problems to be treated are sufficiently large so that techniques requiring the solution of several NLP relaxations, such as the branch and bound approach, have prohibitively large costs. They therefore wish to account for the presence of the integer variables in the formulation and solve the mixed integer problem directly. This is achieved by fixing most of the integer variables to one of their bounds (the nonbasic variables) and allowing the remaining small subset (the basic variables) to take discrete values in order to identify feasible solutions. After each iteration, the reduced costs of the variables in the nonbasic set are computed to measure their effect on the objective
function. If a change causes the objective function to decrease, the appropriate variables are removed from the nonbasic set and allowed to vary for the next iteration. When no more improvement in the objective function is possible, the algorithm is terminated. This strategy leads to the identification of a local solution.

The cutting plane algorithm proposed in [66] for NLP problems has been extended to MINLPs [135], [134]. The ECP algorithm relies on the linearization of one of the nonlinear constraints at each iteration and the solution of the increasingly tight MILP made up of these linearizations. The solution of the MILP problem provides a new point on which to base the choice of the constraint to be linearized for the next iteration of the algorithm. The ECP does not require the solution of any NLP problems for the generation of an upper bound. As a result, a large number of linearizations are required for the approximation of highly nonlinear problems and the algorithm does not perform well in such cases. Due to the use of linearizations, convergence to the global optimum solution is guaranteed only for problems involving inequality constraints which are convex in the $x$ and relaxed $y$-space.

An alternative to the direct solution of the MINLP problem was proposed by [124]. Their approach stems from the work of [70] on a modeling/decomposition strategy which avoids the zero-flows generated by the nonexistence of a unit in a process network. The first stage of the algorithm is the reformulation of the MINLP into a generalized disjunctive program. A vector of Boolean variables indicate the status of a disjunction (True or False) and are associated with the alternatives. The set of disjunctions allows the representation of several alternatives. A set of logical relationships between the Boolean variables is introduced. Instead of resorting to binary variables within a single model, the disjunctions are used to generate a different model for each alternative. Since all continuous variables associated with the nonexisting alternatives are set to zero, this representation helps to reduce the size of the problems to be solved. Two algorithms are suggested by [124]. They are logic-based variants of the outer approximation and generalized Benders decomposition.

[130] introduced LOGMIP, a computer code for disjunctive programming and MINLP problems, and studied modeling alternatives and process synthesis applications.

Overview of Global Optimization Approaches for Nonconvex MINLP Models. In the previous Section we discussed local MINLP algorithms which are applicable to convex MINLP models. While identification of the global solution for convex problems can be guaranteed, a local solution is often obtained for nonconvex problems. The recent book by [41] discusses the theoretical, algorithmic and applications oriented advances in the global optimization of mixed integer nonlinear models. A number of global MINLP algorithms that have been developed to address different types of nonconvex MINLPs are presented in this section. These include:

1) Branch and reduce approach, [115];
2) interval analysis based approach, [125];
3) extended cutting plane approach, [135], [136];
4) reformulation/spatial branch and bound approach, [121], [122];
5) hybrid branch and bound and outer approximation approach, [142], [141];
6) The SMIN-αBB approach, [2], [4];
7) The GMIN-αBB approach, [2], [4].

In the sequel, we will briefly discuss the approaches 1)-7).

Branch and Reduce algorithm. [115] extended the scope of branch and bound algorithms to problems for which valid convex underestimating NLPs can be constructed for the nonconvex relaxations. The range of application of the proposed algorithm encompasses bilinear problems and separable problems involving functions for which convex underestimators can be built [83], [10]. Because the nonconvex NLPs must be underestimated at each node, convergence can only be achieved if the continuous variables are branched on. A number of tests are suggested to accelerate the reduction of the solution space. They are summarized in the following.
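The role of convex underestimators for bilinear terms, and the reason branching on continuous variables is needed for convergence, can be illustrated with a small Python sketch. The source does not spell out which underestimators are used; the code below assumes the standard McCormick envelope of a single bilinear term $x \cdot y$ over a box, and the function name `mccormick_bounds` is hypothetical.

```python
def mccormick_bounds(x, y, x_lo, x_hi, y_lo, y_hi):
    """Evaluate the McCormick under- and overestimator of the bilinear
    term x*y at the point (x, y) inside the box [x_lo, x_hi] x [y_lo, y_hi].

    Each linear piece follows from a product of two nonnegative factors,
    e.g. (x - x_lo) * (y - y_lo) >= 0 gives x*y >= x_lo*y + x*y_lo - x_lo*y_lo.
    """
    under = max(x_lo * y + x * y_lo - x_lo * y_lo,
                x_hi * y + x * y_hi - x_hi * y_hi)
    over = min(x_hi * y + x * y_lo - x_hi * y_lo,
               x_lo * y + x * y_hi - x_lo * y_hi)
    return under, over


if __name__ == "__main__":
    # The same point (2, 2) evaluated on a wide box and on a narrower box:
    # the gap between x*y and its underestimator shrinks with the box.
    for lo, hi in [(0, 4), (1, 3)]:
        under, over = mccormick_bounds(2, 2, lo, hi, lo, hi)
        print(f"box [{lo},{hi}]^2: under={under}, x*y={2 * 2}, over={over}")
```

In this sketch the underestimation gap at the box center drops from 4 on $[0,4]^2$ to 1 on $[1,3]^2$, which is the kind of tightening that branching on the continuous variables provides.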
Optimality Based Range Reduction Tests. For the first set of tests, an upper bound $U$ on the nonconvex MINLP must be computed and a convex lower bounding NLP must be solved to obtain a lower bound $L$. If a bound constraint for variable $x_i$, with $x_i^L \le x_i \le x_i^U$, is active at the solution of the convex NLP and has multiplier $\lambda_i^* > 0$, the bounds on $x_i$ can be updated as follows:

1) If $x_i - x_i^U = 0$ at the solution of the convex NLP and $\kappa_i = x_i^U - (U - L)/\lambda_i^*$ is such that $\kappa_i > x_i^L$, then $x_i^L = \kappa_i$.
2) If $x_i - x_i^L = 0$ at the solution of the convex NLP and $\kappa_i = x_i^L + (U - L)/\lambda_i^*$ is such that $\kappa_i < x_i^U$, then $x_i^U = \kappa_i$.

If neither bound constraint is active at the solution of the convex NLP for some variable $x_j$, the problem can be solved by setting $x_j = x_j^U$ or $x_j = x_j^L$. Tests similar to those presented above are then used to update the bounds on $x_j$.

Feasibility Based Range Reduction Tests. In addition to ensuring that tight bounds are available for the variables, the underestimators of the constraints are used to generate new constraints for the problem. Consider the constraint $g_i(x, y) \le 0$. If its underestimating function $\underline{g}_i(x, y) = 0$ at the solution of the convex NLP and its multiplier is $\mu_i^* > 0$, the constraint

$$\underline{g}_i(x, y) \ge -\frac{U - L}{\mu_i^*}$$

can be included in subsequent problems.

The branch and reduce algorithm has been tested on a set of small problems.

Interval Analysis Based Approach. An approach based on interval analysis was proposed by [125] to solve to global optimality problems with a twice-differentiable objective function and once-differentiable constraints. Interval arithmetic allows the computation of guaranteed ranges for these functions [87], [112], [89]. The approach relies on the same concepts of successive partitioning of the domain and bounding of the objective function, while the branching takes place on the discrete and continuous variables. The main difference with the branch and bound algorithms is that bounds on the problem solution in a given domain are not obtained through optimization. Instead, they are based on the range of the objective function in the domain under consideration, as computed with interval arithmetic. As a consequence, these bounds may be quite loose and efficient fathoming techniques are required in order to enhance convergence. [125] suggested node fathoming tests and branching strategies which are outlined in the sequel. Convergence is declared when best upper and lower bounds are within a prespecified tolerance and when the width of the corresponding region is below a prespecified tolerance.

Node Fathoming Tests. The upper-bound test is a classical criterion used in all branch and bound schemes: If the lower bound for a node is greater than the best upper bound for the MINLP, the node can be fathomed.

The infeasibility test is also used by all branch and bound algorithms. However, the identification of infeasibility using interval arithmetic differs from its identification using optimization schemes. An inequality constraint $g_i(x, y) \le 0$ is declared infeasible if its interval inclusion over the current domain is positive. If a constraint is found to be infeasible, the current node is fathomed.

The monotonicity test is used in interval-based approaches. If a region is feasible, the monotonicity properties of the objective function can be tested. For this purpose, the inclusions of the gradients of the objective with respect to each variable are evaluated. If all the gradients have a constant sign for the current region, the objective function is monotonic and only one point needs to be retained from the current node.

The nonconvexity test is used to test the existence of a solution (local or global) within a region. If such a point exists, the Hessian matrix of the objective function at this point must be positive semidefinite. A sufficient condition is the nonnegativity of at least one of the diagonal elements of its interval Hessian matrix.

[125] suggested two additional tests to accelerate the fathoming process. The first is denoted as lower bound test. It requires the computation of a valid lower bound on the objective function through a method other than interval arithmetic. If the upper bound at a node is less than this lower
bound, the region can be eliminated. The second test, the distrust region method, aims to help the algorithm identify infeasible regions so that they can be removed from consideration. Based on the knowledge of an infeasible point, interval arithmetic is used to identify an infeasible hypercube centered on that point.

Branching Strategies. The variable with the widest range is selected for branching. It can be a continuous or a discrete variable. In order to determine where to split the chosen variable, a relaxation of the MINLP is solved locally.

• Continuous Branching Variable: If the optimal value of the continuous branching variable, $x^*$, is equal to one of the variable bounds, branch at the midpoint of the interval. Otherwise, branch at $x^* - \epsilon$, where $\epsilon$ is a very small scalar.
• Discrete Branching Variable: If the optimal value of the discrete branching variable, $y^*$, is equal to the upper bound on the variable, define a region with $y = y^*$ and one with $y^L \le y \le y^* - 1$, where $y^L$ is the lower bound on $y$. Otherwise, create two regions $y^L \le y \le \mathrm{int}(y^*)$ and $\mathrm{int}(y^*) + 1 \le y \le y^U$, where $y^U$ is the upper bound on $y$.

This algorithm has been tested on a small example problem and a molecular design problem [125].

Extended Cutting Plane for Pseudoconvex MINLPs. The use of the ECP algorithm for nonconvex MINLP problems was suggested in [135], using a modified algorithmic procedure as described in [136]. The main changes occur in the generation of new constraints for the MILP at each iteration (Step 4). In addition to the construction of the linear function $l_k(x, y)$ at iteration $k$, the following steps are taken:

1) Remove all constraints for which $l_i(x^k, y^k) > g_{j_i}(x^k, y^k)$. These correspond to linearizations which did not underestimate the corresponding nonlinear constraint at all points due to the presence of nonconvexities.
2) Replace all constraints for which $l_i(x^k, y^k) = g_{j_i}(x^k, y^k) = 0$ by their linearization around $(x^k, y^k)$.
3) If constraint $i$ is such that $g_i(x^k, y^k) > 0$, add its linearization around $(x^k, y^k)$.

The convergence criterion is also modified. In addition to the test used in Step 3, the following two conditions must be met:

1) $(x^k - x^{k-1})^T (x^k - x^{k-1}) \le \delta$, a pre-specified tolerance.
2) $y^k - y^{k-1} = 0$.

The ECP algorithm for pseudoconvex MINLPs has been used to address a trim loss problem arising in the paper industry [136]. A comparative study between the outer approximation, the generalized Benders decomposition and the extended cutting plane algorithm for convex MINLPs was presented in [120].

Reformulation/Spatial Branch and Bound Algorithm. A global optimization algorithm of the branch and bound type was proposed in [121]. It can be applied to problems in which the objective and constraints are functions involving any combination of binary arithmetic operations (addition, subtraction, multiplication and division) and functions that are either concave over the entire solution space (such as ln) or convex over this domain (such as exp).

The algorithm starts with an automatic reformulation of the original nonlinear problem into a problem that involves only linear, bilinear, linear fractional, simple exponentiation, univariate concave and univariate convex terms. This is achieved through the introduction of new constraints and variables. The reformulated problem is then solved to global optimality using a branch and bound approach. Its special structure allows the construction of a convex relaxation at each node of the tree. It should be noted that due to the introduction of many new constraints and variables the size of the convex relaxation of the reformulated problem increases substantially even for modest size problems. The integer variables can be handled in two ways during the generation of the convex lower bounding problem. The integrality condition on the variables can be relaxed to yield a convex NLP which can then be solved globally. Alternatively, the integer variables can be treated directly and the convex lower bounding MINLP can be solved
using a branch and bound algorithm. This second approach is more computationally intensive but is likely to result in tighter lower bounds on the global optimum solution.

In order to obtain an upper bound on the optimum solution, a local MINLP algorithm can be used. Alternatively, the MINLP can be transformed to an equivalent nonconvex NLP by relaxing the integer variables. For example, a variable $y \in \{0, 1\}$ can be replaced by a continuous variable $z \in [0, 1]$ by including the constraint $z - z \cdot z = 0$.

This algorithm has been applied to reactor selection, distillation column design, nuclear waste blending, heat exchanger network design and multilevel pump configuration problems.

Hybrid Branch and Bound and Outer Approximation. [142] proposed a global optimization MINLP approach for the synthesis of heat exchanger networks without stream splitting. This approach is a hybrid branch and bound with outer approximation. It is based on two alternative convex underestimators for the heat transfer area. The first type of these convex underestimators along with the variable bounds and techniques for the bound contraction are based on a thermodynamic analysis. The second type is based on a relaxation and transformation so as to employ specific underestimation schemes. These convex underestimators result in a convex MINLP that is solved using the Outer Approximation approach and which provides valid lower bounds on the global solution. This approach has been applied to five heat exchanger network examples that employ the MINLP model of [138] that contains linear constraints and a nonconvex objective function.

[141] introduced a deterministic branch and contract approach for structured process systems that have univariate concave, bilinear and linear fractional terms. They proposed properties of the contraction operation and studied their effect on several applications.

The SMIN-αBB Algorithm. The SMIN-αBB global optimization algorithm, proposed by [2], is designed to solve to global optimality mathematical models where the binary/integer variables appear linearly and hence separably from the continuous variables and/or appear in at most bilinear terms, while nonlinear terms in the continuous variables appear separably from the binary/integer variables. These mathematical models become:

\[
\begin{aligned}
\min_{x,y} \quad & f(x) + x^T A_0 y + c_0^T y \\
\text{s.t.} \quad & h(x) + x^T A_1 y + c_1^T y = 0 \\
& g(x) + x^T A_2 y + c_2^T y \le 0 \\
& x \in X \subseteq \mathbb{R}^n \\
& y \in Y \ \text{integer},
\end{aligned}
\tag{3}
\]

where $c_0^T$, $c_1^T$ and $c_2^T$ are constant vectors, $A_0$, $A_1$ and $A_2$ are constant matrices and $f(x)$, $h(x)$ and $g(x)$ are functions with continuous second order derivatives.

The theoretical, algorithmic and computational studies of the SMIN-αBB algorithm are presented in detail in [41].

The GMIN-αBB Algorithm. The GMIN-αBB global optimization algorithm proposed in [2] operates within a branch and bound framework. The main difference with the algorithms of [56], [92] and [20] is its ability to identify the global optimum solution of a much larger class of problems of the form

\[
\begin{aligned}
\min_{x,y} \quad & f(x, y) \\
\text{s.t.} \quad & h(x, y) = 0 \\
& g(x, y) \le 0 \\
& x \in X \subseteq \mathbb{R}^n \\
& y \in \mathbb{N}^q,
\end{aligned}
\]

where $\mathbb{N}$ is the set of nonnegative integers and the only condition imposed on the functions $f(x, y)$, $g(x, y)$ and $h(x, y)$ is that their continuous relaxations possess continuous second order derivatives. This increased applicability results from the use of the αBB global optimization algorithm for continuous twice-differentiable NLPs [12], [6], [5], [3].

The theoretical, algorithmic and computational studies of the GMIN-αBB Algorithm are presented in detail in [41].

See also: Complexity theory: Quadratic programming; Complexity of degeneracy; Complexity classes in optimization; Information-based complexity and information-based optimization; Fractional
combinatorial optimization; Complexity of gradients, Jacobians, and Hessians; Complexity theory; Computational complexity theory; Parallel computing: Complexity classes; Kolmogorov complexity; Global optimization in the analysis and management of environmental systems; Interval global optimization; Continuous global optimization: Applications; Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Generalized Benders decomposition; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Extended cutting plane algorithm; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with αBB; MINLP: Heat exchanger network synthesis; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems; MINLP: Trim-loss problem.

References
[1] ACEVEDO, J., AND PISTIKOPOULOS, E.N.: 'A parametric MINLP algorithm for process synthesis problems under uncertainty', I-EC Res. 35, no. 1 (1996), 147-158.
[2] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS, C.A.: 'Global optimization of MINLP problems in process synthesis and design', Computers Chem. Engin. Suppl. 21 (1997), S445-S450.
[3] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS, C.A.: 'A global optimization method, αBB, for general twice-differentiable NLPs - II. Implementation and computational results', Computers Chem. Engin. 22, no. 9 (1998), 1159-1179.
[4] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS, C.A.: 'Global optimization of mixed-integer nonlinear problems', AIChE J. 46 (2000), 1769-1797.
[5] ADJIMAN, C.S., DALLWIG, S., FLOUDAS, C.A., AND NEUMAIER, A.: 'A global optimization method, αBB, for general twice-differentiable NLPs - I. Theoretical advances', Computers Chem. Engin. 22, no. 9 (1998), 1137-1158.
[6] ADJIMAN, C.S., AND FLOUDAS, C.A.: 'Rigorous convex underestimators for general twice-differentiable problems', J. Global Optim. 9 (1996), 23-40.
[7] ADJIMAN, C.S., SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'Mixed-integer nonlinear optimization in process synthesis', in D.-Z. DU AND P.M. PARDALOS (eds.): Handbook Combinatorial Optim., Kluwer Acad. Publ., 1998.
[8] AGGARWAL, A., AND FLOUDAS, C.A.: 'Synthesis of general distillation sequences - Nonsharp separations', Computers Chem. Engin. 14, no. 6 (1990), 631.
[9] AGGARWAL, A., AND FLOUDAS, C.A.: 'Synthesis of heat integrated nonsharp distillation sequences', Computers Chem. Engin. 16 (1992), 89.
[10] AL-KHAYYAL, F.A.: 'Jointly constrained bilinear programs and related problems: An overview', Comput. Math. Appl. 19 (1990), 53.
[11] ALLGOR, R.J., AND BARTON, P.I.: 'Mixed integer dynamic optimization', Computers Chem. Engin. 21 (1997), S451-S456.
[12] ANDROULAKIS, I.P., MARANAS, C.D., AND FLOUDAS, C.A.: 'αBB: A global optimization method for general constrained nonconvex problems', J. Global Optim. 7 (1995), 337-363.
[13] BARBOSA-POVOA, A.P., AND MACCHIETTO, S.: 'Detailed design of multipurpose batch plants', Computers Chem. Engin. 18, no. 11-12 (1994), 1013-1042.
[14] BARTON, P.I., ALLGOR, R.J., AND FEEHERY, W.F.: 'Dynamic optimization in a discontinuous world', I-EC Res. 37, no. 3 (1998), 966-981.
[15] BEALE, E.M.L.: 'Integer programming': The State of the Art in Numerical Analysis, Acad. Press, 1977, pp. 409-448.
[16] BERTSEKAS, D.L., LOWER, G.S., SANDELL, N.R., AND POSBERGH, T.A.: 'Optimal short term scheduling of large-scale power systems', IEEE Trans. Autom. Control AC-28 (1983), 1.
[17] BIREWAR, D.B., AND GROSSMANN, I.E.: 'Incorporating scheduling in the optimal design of multiproduct batch plants', Computers Chem. Engin. 13 (1989), 141-161.
[18] BIREWAR, D.B., AND GROSSMANN, I.E.: 'Simultaneous synthesis, sizing and scheduling of multiproduct batch plants', Computers Chem. Engin. 29, no. 11 (1990), 2242.
[19] BLOOM, J.A.: 'Solving an electricity generating capacity expansion planning problem by generalized benders' decomposition', Oper. Res. 31, no. 5 (1983), 84.
[20] BORCHERS, B., AND MITCHELL, J.E.: 'An improved branch and bound algorithm for mixed integer nonlinear programs', Techn. Report Rensselaer Polytechnic Inst., no. 200 (1991).
[21] CAMARDA, K.V., AND MARANAS, C.D.: 'Optimization in polymer design using connectivity indices', I-EC Res. 38 (1999), 1884-1892.
[22] CHURI, N., AND ACHENIE, L.E.K.: 'Novel mathematical programming model for computer aided molecular design', Industr. Engin. Chem. Res. 35, no. 10 (1996), 3788-3794.
[23] CHURI, N., AND ACHENIE, L.E.K.: 'The optimal design of refrigerant mixtures for a two-evaporator refrigeration system', Computers Chem. Engin. 21, no. 13 (1997), 349-354.
[24] CIRIC, A.R., AND FLOUDAS, C.A.: 'A retrofit approach of heat exchanger networks', Computers Chem. Engin. 13, no. 6 (1989), 703.
[25] CIRIC, A.R., AND FLOUDAS, C.A.: 'A mixed-integer nonlinear programming model for retrofitting heat exchanger networks', I-EC Res. 29 (1990), 239.
[26] CIRIC, A.R., AND GU, D.Y.: 'Synthesis of nonequilibrium reactive distillation processes by MINLP optimization', AIChE J. 40, no. 9 (1994), 1479-1487.
[27] CIRIC, A.R., AND HUCHETTE, S.G.: 'Multiobjective optimization approach to sensitivity analysis waste treatment costs in discrete process synthesis and optimization problems', I-EC Res. 32, no. 11 (1993), 2636-2646.
[28] DAICHENDT, M.M., AND GROSSMANN, I.E.: 'Preliminary screening for the MINLP synthesis of process systems: I. Aggregation techniques', Computers Chem. Engin. 18 (1994), 663.
[29] DAICHENDT, M.M., AND GROSSMANN, I.E.: 'Preliminary screening for the MINLP synthesis of process systems: II. Heat exchanger networks', Computers Chem. Engin. 18 (1994), 679.
[30] DEAN, J.P., AND DERVAKOS, G.A.: 'Design of process-compatible biological agents', Computers Chem. Engin. Suppl. A 20 (1996), S67-S72.
[31] DIMITRIADIS, V.D., AND PISTIKOPOULOS, E.N.: 'Flexibility analysis of dynamic systems', I-EC Res. 34, no. 12 (1995), 4451-4462.
[32] DORNEICH, M.C., AND SAHINIDIS, N.Y.: 'Global optimization algorithms for chip layout and compaction', Engin. Optim. 25, no. 2 (1995), 131-154.
[33] DUA, V., AND PISTIKOPOULOS, E.N.: 'Optimization techniques for process synthesis and material design under uncertainty', Chem. Engin. Res. Des. 76, no. A3 (1998), 408-416.
[34] DURAN, M.A., AND GROSSMANN, I.E.: 'An outer approximation algorithm for a class of mixed-integer nonlinear programs', Math. Program. 36 (1986), 307.
[35] DUVEDI, A., AND ACHENIE, L.E.K.: 'On the design of environmentally benign refrigerant mixtures: A Mathematical Programming Approach', Computers Chem. Engin. 21, no. 8 (1997), 915-923.
[36] FAQIR, N.M., AND KARIMI, I.A.: 'Design of multipurpose batch plants with multiple production routes': Proc. FOCAPD '89, Snowmass, Colorado, 1990, p. 451.
[37] FLETCHER, R., HALL, J.A.J., AND JOHNS, W.R.: 'Flexible retrofit design of multiproduct batch plants', Computers Chem. Engin. 15, no. 12 (1991), 843.
[38] FLETCHER, R., AND LEYFFER, S.: 'Solving mixed integer nonlinear programs by outer approximation', Math. Program. 66, no. 3 (1994), 327-349.
[39] FLETCHER, R., AND LEYFFER, S.: 'Numerical experience with lower bounds for MIQP branch and bound', SIAM J. Optim. 8, no. 2 (1998), 604-616.
[40] FLOUDAS, C.A.: Nonlinear and mixed integer optimization: Fundamentals and applications, Oxford Univ. Press, 1995.
[41] FLOUDAS, C.A.: Deterministic global optimization: Theory, methods and applications, Nonconvex Optim. Appl. Kluwer Acad. Publ., 2000.
[42] FLOUDAS, C.A., AGGARWAL, A., AND CIRIC, A.R.: 'Global optimum search for nonconvex NLP and MINLP problems', Computers Chem. Engin. 13, no. 10 (1989), 1117-1132.
[43] FLOUDAS, C.A., AND CIRIC, A.R.: 'Strategies for overcoming uncertainties in heat exchanger network synthesis', Computers Chem. Engin. 13, no. 10 (1989), 1133.
[44] FLOUDAS, C.A., AND GROSSMANN, I.E.: 'Algorithmic approaches to process synthesis: logic and global optimization': FOCAPD'94, Vol. 91 of AIChE Symp. Ser., 1995, pp. 198-221.
[45] GANISH, B., HORSKY, D., AND SRIKANTH, K.: 'An approach to optimal positioning of a new product', Managem. Sci. 29 (1983), 1277.
[46] GEOFFRION, A.M.: 'Generalized Benders decomposition', J. Optim. Th. Appl. 10 (1972), 237-260.
[47] GEORGIADIS, M.C., ROTSTEIN, G.E., AND MACCHIETTO, S.: 'Optimal layout design in multipurpose batch plants', Industr. Engin. Chem. Res. 36, no. 11 (1997), 4852-4863.
[48] GEROMEL, J.C., AND BELLONI, M.R.: 'Nonlinear programs with complicating variables: theoretical analysis and numerical experience', IEEE Trans. Syst., Man Cybern. SMC-16 (1986), 231.
[49] GROSSMANN, I.E.: 'MINLP optimization strategies and algorithms for process synthesis': Proc. 3rd. Internat. Conf. on Foundations of Computer-Aided Process Design, 1990, p. 105.
[50] GROSSMANN, I.E.: 'Mixed integer optimization techniques for algorithmic process synthesis', in J.L. ANDERSON (ed.): Advances In Chemical Engineering, Process Synthesis, Vol. 23, Acad. Press, 1996, pp. 171-246.
[51] GROSSMANN, I.E., AND FLOUDAS, C.A.: 'Active constraint strategy for flexibility analysis in chemical processes', Computers Chem. Engin. 11, no. 6 (1987), 675.
[52] GROSSMANN, I.E., QUESADA, I., RAMON, R., AND VOUDOURIS, V.T.: 'Mixed-integer optimization techniques for the design and scheduling of batch processes': Proc. NATO Advanced Study Inst. Batch Process Systems Engineering, 1992.
[53] GROSSMANN, I.E., AND SARGENT, R.W.H.: 'Optimal design of multipurpose chemical plants', Industr. Engin. Chem. Process Des. Developm. 18 (1979), 343.
[54] GUPTA, A., AND MANOUSIOUTHAKIS, V.: 'Minimum utility cost of mass exchange networks with variable single component supplies and targets', I-EC Res. 32, no. 9 (1993), 1937-1950.
[55] GUPTA, O.K.: 'Branch and bound experiments in nonlinear integer programming', PhD Thesis Purdue Univ. (1980).
[56] GUPTA, O.K., AND RAVINDRAN, R.: 'Branch and bound experiments in convex nonlinear integer programming', Managem. Sci. 31, no. 12 (1985), 1533-1546.
[57] HARDING, S.T., AND FLOUDAS, C.A.: 'Global optimization in multiproduct and multipurpose batch design under uncertainty', Industr. Engin. Chem. Res. 36, no. 5 (1997), 1644-1664.
[58] HATZIMANIKATIS, V., FLOUDAS, C.A., AND BAILEY, J.E.: 'Analysis and design of metabolic reaction networks via mixed-integer linear optimization', AIChE J. 42, no. 5 (1996), 1277-1292.
[59] HATZIMANIKATIS, V., FLOUDAS, C.A., AND BAILEY, J.E.: 'Optimization of regulatory architectures in metabolic reaction networks', Biotechnol. and Bioengin. 52 (1996), 485-500.
[60] HOANG, H.H.: 'Topological optimization of networks: A nonlinear mixed integer model employing generalized Benders decomposition', IEEE Trans. Autom. Control AC-27 (1982), 164.
[61] HOLMBERG, K.: 'On the convergence of the cross decomposition', Math. Program. 47 (1990), 269.
[62] IERAPETRITOU, M.G., AND PISTIKOPOULOS, E.N.: 'Simultaneous incorporation of flexibility and economic risk in operational planning under uncertainty', Computers Chem. Engin. 18, no. 3 (1994), 163-189.
[63] IERAPETRITOU, M.G., AND PISTIKOPOULOS, E.N.: 'Batch plant design and operations under uncertainty', Industr. Engin. Chem. Res. 35, no. 2 (1996), 772-787.
[64] IERAPETRITOU, M.G., PISTIKOPOULOS, E.N., AND FLOUDAS, C.A.: 'Operational planning under uncertainty', Computers Chem. Engin. 20, no. 12 (1996), 1499-1516.
[65] KALITVENTZEFF, B., AND MARECHAL, F.: 'The management of a utility network': Process Systems Engineering. PSE '88, Sydney, Australia, 1988, p. 223.
[66] KELLEY, J.E.: 'The cutting plane method for solving convex programs', J. SIAM 8, no. 4 (1960), 703-712.
[67] KOCIS, G.R., AND GROSSMANN, I.E.: 'Relaxation strategy for the structural optimization of process flow sheets', I-EC Res. 26, no. 9 (1987), 1869.
[68] KOCIS, G.R., AND GROSSMANN, I.E.: 'Global optimization of nonconvex MINLP problems in process synthesis', I-EC Res. 27, no. 8 (1988), 1407.
[69] KOCIS, G.R., AND GROSSMANN, I.E.: 'Computational experience with DICOPT solving MINLP problems in process systems engineering', Computers Chem. Engin. 13 (1989), 307.
[70] KOCIS, G.R., AND GROSSMANN, I.E.: 'A modelling and decomposition strategy for the MINLP optimization of process flowsheets', Computers Chem. Engin. 13, no. 7 (1989), 797-819.
[71] KOKOSSIS, A.C., AND FLOUDAS, C.A.: 'Optimization of complex reactor networks - I. Isothermal operation', Chem. Engin. Sci. 45, no. 3 (1990), 595.
[72] KOKOSSIS, A.C., AND FLOUDAS, C.A.: 'Optimal synthesis of isothermal reactor-separator-recycle systems', Chem. Engin. Sci. 46 (1991), 1361.
[73] KOKOSSIS, A.C., AND FLOUDAS, C.A.: 'Optimization of complex reactor networks - II. Nonisothermal operation', Chem. Engin. Sci. 49, no. 7 (1994), 1037.
[74] KOKOSSIS, A.C., AND FLOUDAS, C.A.: 'Stability in optimal design: Synthesis of complex reactor networks', AIChE J. 40, no. 5 (1994), 849-861.
[75] KRAVANJA, Z., AND GROSSMANN, I.E.: 'PROSYN - An MINLP process synthesizer', Computers Chem. Engin. 14 (1990), 1363.
[76] KRAVANJA, Z., AND GROSSMANN, I.E.: 'A computational approach for the modeling/decomposition strategy in the MINLP optimization of process flowsheets with implicit models', I-EC Res. 35, no. 6 (1996), 2065-2070.
[77] LIU, M.L., AND SAHINIDIS, N.Y.: 'Process planning in a fuzzy environment', Europ. J. Oper. Res. 100, no. 1 (1997), 142-169.
[78] LUYBEN, M.L., AND FLOUDAS, C.A.: 'Analyzing the interaction of design and control, Part 1: A multiobjective framework and application to binary distillation synthesis', Computers Chem. Engin. 18, no. 10 (1994), 933-969.
[79] LUYBEN, M.L., AND FLOUDAS, C.A.: 'Analyzing the interaction of design and control, Part 2: Reactor-separator-recycle system', Computers Chem. Engin. 18, no. 10 (1994), 971-994.
[80] MARANAS, C.D.: 'Optimal computer-aided molecular design: A polymer design case study', Industr. Engin. Chem. Res. 35, no. 10 (1996), 3403-3414.
[81] MARANAS, C.D.: 'Optimal molecular design under property prediction uncertainty', AIChE J. 43, no. 5 (1997), 1250-1264.
[82] MAWENGKANG, H., AND MURTAGH, B.A.: 'Solving nonlinear integer programs with large scale optimization software', Ann. Oper. Res. 5 (1986), 425.
[83] MCCORMICK, G.P.: 'Computability of global solutions to factorable nonconvex programs: Part I - Convex underestimating problems', Math. Program. 10 (1976), 147-175.
[84] MOCKUS, L., AND REKLAITIS, G.V.: 'A new global optimization algorithm for batch process scheduling', in C.A. FLOUDAS AND P.M. PARDALOS (eds.): State of the Art In Global Optimization, Kluwer Acad.
Publ., 1996, pp. 521-538.
[85] MOHIDEEN, M.J., PERKINS, J.D., AND PISTIKOPOULOS, E.N.: 'Optimal design of dynamic systems under uncertainty', AIChE J. 42, no. 8 (1996), 2251-2272.
[86] MOHIDEEN, M.J., PERKINS, J.D., AND PISTIKOPOULOS, E.N.: 'Robust stability considerations in optimal design of dynamic systems under uncertainty', J. Process Control 7, no. 5 (1997), 371-385.
[87] MOORE, R.E.: Methods and applications of interval analysis, SIAM, 1979.
[88] NEMHAUSER, G.L., AND WOLSEY, L.A.: Integer and combinatorial optimization, Interscience Ser. Discrete Math. and Optim. Wiley, 1988.
[89] NEUMAIER, A.: Interval methods for systems of equations, Encycl. Math. Appl. Cambridge Univ. Press, 1990.
[90] NOVAK, Z., KRAVANJA, Z., AND GROSSMANN, I.E.: 'Simultaneous synthesis of distillation sequences in overall process schemes using an improved MINLP approach', Computers Chem. Engin. 20, no. 12 (1996), 1425-1440.
[91] ODELE, O., AND MACCHIETTO, S.: 'Computer aided molecular design: A novel method for optimal solvent selection', Fluid Phase Equilib. 82 (1993), 47.
[92] OSTROVSKY, G.M., OSTROVSKY, M.G., AND MIKHAILOW, G.W.: 'Discrete optimization of chemical processes', Computers Chem. Engin. 14 (1990), 111-117.
[93] PAPAGEORGAKI, S., AND REKLAITIS, G.V.: 'Optimal design of multipurpose batch plants: 1. Problem formulation', I-EC Res. 29, no. 10 (1990), 2054.
[94] PAPAGEORGAKI, S., AND REKLAITIS, G.V.: 'Optimal design of multipurpose batch plants: 2. A decomposition solution strategy', I-EC Res. 29, no. 10 (1990), 2062.
[95] PAPALEXANDRI, K.P., AND PISTIKOPOULOS, E.N.: 'An MINLP retrofit approach for improving the flexibility of heat exchanger networks', Ann. Oper. Res. 42 (1993), 119.
[96] PAPALEXANDRI, K.P., AND PISTIKOPOULOS, E.N.: 'Synthesis and retrofit design of operable heat exchanger networks: 1. Flexibility and structural controllability aspects', I-EC Res. 33 (1994), 1718.
[97] PAPALEXANDRI, K.P., AND PISTIKOPOULOS, E.N.: 'Synthesis and retrofit design of operable heat exchanger networks: 2. Dynamics and control structure considerations', I-EC Res. 33 (1994), 1738.
[98] PAPALEXANDRI, K.P., AND PISTIKOPOULOS, E.N.: 'Generalized modular representation framework for process synthesis', AIChE J. 42 (1996), 1010.
[99] PAPALEXANDRI, K.P., PISTIKOPOULOS, E.N., AND FLOUDAS, C.A.: 'Mass exchange networks for waste minimization: A simultaneous approach', Chem. Engin. Res. Developm. 72 (1994), 279.
[100] PARDALOS, P.M., AND SCHNITGER, G.: 'Checking local optimality in constrained quadratic programming is NP-Hard', Oper. Res. Lett. 7, no. 1 (1988), 33.
[101] PARDALOS, P.M., AND VAVASIS, S.A.: 'Quadratic programming with one negative eigenvalue is NP-hard', J. Global Optim. 1 (1991), 15.
[102] PAULES IV, G.E., AND FLOUDAS, C.A.: 'Synthesis of flexible distillation sequences for multiperiod operation', Computers Chem. Engin. 12, no. 4 (1988), 267.
[103] PAULES IV, G.E., AND FLOUDAS, C.A.: 'APROS: Algorithmic development methodology for discrete-continuous optimization problems', Oper. Res. 37, no. 6 (1989), 902.
[104] PAULES IV, G.E., AND FLOUDAS, C.A.: 'Stochastic programming in process synthesis: A two-stage model with MINLP recourse for multiperiod heat-integrated distillation sequences', Computers Chem. Engin. 16, no. 3 (1992), 189.
[105] PENTEADO, F.D., AND CIRIC, A.R.: 'An MINLP approach for safe process plant layout', I-EC Res. 35, no. 4 (1996), 1354-1361.
[106] PETKOV, S.B., AND MARANAS, C.D.: 'Multiperiod planning and scheduling of multiproduct batch plants under demand uncertainty', I-EC Res. 36, no. 11 (1997), 4864-4881.
[107] PETKOV, S.B., AND MARANAS, C.D.: 'Quantitative assessment of uncertainty in the optimization of metabolic pathways', Biotechnol. and Bioengin. 56, no. 2 (1997), 145-161.
[108] PETKOV, S.B., AND MARANAS, C.D.: 'Design of single product campaign batch plants under demand uncertainty', AIChE J. 44, no. 4 (1998), 896-911.
[109] PISTIKOPOULOS, E.N., AND IERAPETRITOU, M.G.: 'Novel approach for optimal process design under uncertainty', Computers Chem. Engin. 19, no. 10 (1995), 1089-1110.
[110] QUESADA, I., AND GROSSMANN, I.E.: 'An LP/NLP based branch and bound algorithm for convex MINLP optimization problems', Computers Chem. Engin. 16 (1992), 937-947.
[111] RAMAN, V.S., AND MARANAS, C.D.: 'Optimization in product design with properties correlated with topological indices', Computers Chem. Engin. 45 (1999), 997-1017.
[112] RATSCHEK, H., AND ROKNE, J.: Computer methods for the range of functions, Ellis Horwood Ser. Math. Appl. Halsted Press, 1984.
[113] REKLAITIS, G.V.: 'Perspectives on scheduling and planning process operations': Proc. 4th. Internat. Symp. on Process Systems Engineering, Montreal, Canada, 1991.
[114] ROUHANI, R., LASDON, L., LEBOW, W., AND WARREN, A.D.: 'A generalized Benders decomposition approach to reactive source planning in power systems', Math. Program. Stud. 25 (1985), 62.
[115] RYOO, H.S., AND SAHINIDIS, N.Y.: 'Global optimization of nonconvex NLPs and MINLPs with applications in process design', Computers Chem. Engin. 19,
no. 5 (1995), 551-566.
[116] SAHINIDIS, N.Y., AND GROSSMANN, I.E.: 'Convergence properties of generalized Benders decomposition', Computers Chem. Engin. 15, no. 7 (1991), 481.
[117] SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'MINOPT: A software package for mixed-integer nonlinear optimization, user's guide', Manual Computer-Aided Systems Lab. Dept. Chemical Engin. Princeton Univ. (1997).
[118] SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'Interaction of design and control: optimization with dynamic models', in W.W. HAGER AND P.M. PARDALOS (eds.): Optimal Control: Theory, Algorithms, and Applications, Kluwer Acad. Publ., 1998, pp. 388-435.
[119] SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'Optimization framework for the synthesis of complex reactor networks', I-EC Res. 38 (1999), 744-766.
[120] SKRIFVARS, H., HARJUNKOSKI, I., WESTERLUND, T., KRAVANJA, Z., AND PÖRN, R.: 'Comparison of different MINLP methods applied on certain chemical engineering problems', Computers Chem. Engin. Suppl. 20 (1996), S333-S338.
[121] SMITH, E.M.B., AND PANTELIDES, C.C.: 'Global optimisation of general process models', in I.E. GROSSMANN (ed.): Global Optimization in Engineering Design, Kluwer Acad. Publ., 1996, pp. 355-386.
[122] SMITH, E.M.B., AND PANTELIDES, C.C.: 'A symbolic reformulation/spatial branch and bound algorithm for the global optimization of nonconvex MINLPs', Computers Chem. Engin. 23 (1999), 457-478.
[123] STEFANIS, S.K., LIVINGSTON, A.G., AND PISTIKOPOULOS, E.N.: 'Environmental impact considerations in the optimal design and scheduling of batch processes', Computers Chem. Engin. 21, no. 10 (1997), 1073-1094.
[124] TURKAY, M., AND GROSSMANN, I.E.: 'Logic-based MINLP algorithms for the optimal synthesis of process networks', Computers Chem. Engin. 20, no. 8 (1996), 959-978.
[125] VAIDYANATHAN, R., AND EL-HALWAGI, M.: 'Global optimization of nonconvex MINLP's by interval analysis', in I.E. GROSSMANN (ed.): Global Optimization in Engineering Design, Kluwer Acad. Publ., 1996, pp. 175-193.
[126] VAIDYARAMAN, S., AND MARANAS, C.D.: 'Optimal synthesis of refrigeration cycles and selection of refrigerants', AIChE J. 45 (1999), 997-1017.
[127] VASELENAK, J., GROSSMANN, I.E., AND WESTERBERG, A.W.: 'An embedding formulation for the optimal scheduling and design of multiproduct batch plants', I-EC Res. 26, no. 1 (1987), 139.
[128] VASELENAK, J., GROSSMANN, I.E., AND WESTERBERG, A.W.: 'Optimal retrofit design of multipurpose batch plants', I-EC Res. 26, no. 4 (1987), 718.
[129] VAVASIS, S.: Nonlinear optimization: Complexity issues, Oxford Univ. Press, 1991.
[130] VECCHIETTI, A., AND GROSSMANN, I.E.: 'LOGMIP: A disjunctive 0-1 nonlinear optimizer for process system models', Computers Chem. Engin. 23 (1999), 555-565.
[131] VISWANATHAN, J., AND GROSSMANN, I.E.: 'A combined penalty function and outer-approximation for MINLP optimization', Computers Chem. Engin. 14, no. 7 (1990), 769-782.
[132] WELLONS, M.C., AND REKLAITIS, G.V.: 'Scheduling of multipurpose batch chemical plants: 1. Formulation of single-product campaigns', I-EC Res. 30, no. 4 (1991), 671.
[133] WELLONS, M.C., AND REKLAITIS, G.V.: 'Scheduling of multipurpose batch chemical plants: 1. Multiple product campaign formulation and production planning', I-EC Res. 30, no. 4 (1991), 688.
[134] WESTERLUND, W., AND PETTERSSON, F.: 'An extended cutting plane method for solving convex MINLP problems', Computers Chem. Engin. 19 (1995), 131-136.
[135] WESTERLUND, T., PETTERSSON, F., AND GROSSMANN, I.E.: 'Optimization of pump configuration problems as a MINLP problem', Computers Chem. Engin. 18, no. 9 (1994), 845-858.
[136] WESTERLUND, T., SKRIFVARS, H., HARJUNKOSKI, I., AND PÖRN, R.: 'An extended cutting plane method for a class of non-convex MINLP problems', Computers Chem. Engin. 22, no. 3 (1998), 357-365.
[137] XIA, Q., AND MACCHIETTO, S.: 'Design and synthesis of batch plants: MINLP solution based on a stochastic method', Computers Chem. Engin. 21 (1997), S697-S702.
[138] YEE, T.F., AND GROSSMANN, I.E.: 'Simultaneous optimization models for heat integration - II. Heat exchanger network synthesis', Computers Chem. Engin. 14, no. 10 (1990), 1165-1184.
[139] YEE, T.F., GROSSMANN, I.E., AND KRAVANJA, Z.: 'Simultaneous optimization models for heat integration - I. Area and energy targeting and modeling of multi-stream exchangers', Computers Chem. Engin. 14, no. 10 (1990), 1151-1164.
[140] YEE, T.F., GROSSMANN, I.E., AND KRAVANJA, Z.: 'Simultaneous optimization models for heat integration - III. Area and energy targeting and modeling of multi-stream exchangers', Computers Chem. Engin. 14, no. 11 (1990), 1185-1200.
[141] ZAMORA, J.M., AND GROSSMANN, I.E.: 'Continuous global optimization of structured process systems models', Computers Chem. Engin. 22, no. 12 (1998), 1749-1770.
[142] ZAMORA, J.M., AND GROSSMANN, I.E.: 'A global MINLP optimization algorithm for the synthesis of heat exchanger networks with no stream splits', Computers Chem. Engin. 22, no. 3 (1998), 367-384.
[143] ZHANG, X., AND SARGENT, R.W.H.: 'The optimal operation of mixed production facilities: General formulation and some solution approaches': Proc. 5th Int. Symp. Process Systems Engineering, 1994, pp. 171-177.

Christodoulos A. Floudas
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: floudas@titan.princeton.edu

MSC2000: 90C11, 49M37
Key words and phrases: decomposition, outer approximation, branch and bound, global optimization.

MODELING LANGUAGES IN OPTIMIZATION: A NEW PARADIGM

In this paper, modeling languages are identified as a new computer language paradigm and their application for representing optimization problems is illustrated by examples.

Programming languages can be classified into three paradigms: imperative, functional, and logic programming [14]. The imperative programming paradigm is closely related to the physical way in which a (von Neumann) computer works: given a set of memory locations, a program is a sequence of well-defined instructions on retrieving, storing and transforming the content of these locations. The functional paradigm of computation is based on the evaluation of functions. Every program can be viewed as a function which translates an input into a unique output. Functions are first-class values, that is, they must be viewed as values themselves. The computational model is based on the λ-calculus invented by A. Church (1936) as a mathematical formalism for expressing the concept of a computation. The paradigm of logic programming is based on the insight that a computation can be viewed as a kind of (constructive) proof. Hence, a program is a notation for writing logical statements together with specified algorithms for implementing inference rules.

All three programming paradigms concentrate on problem representation as a computation, that is, the problem is stated in a way that describes the process of solving it. The computation on how to solve a problem 'is' its representation. One may

DEFINITION 1 An algorithmic language describes (explicitly or implicitly) the computation of solving a problem, that is, 'how' a problem can be processed using a machine. The computation consists of a sequence of well-defined instructions which can be executed in a finite time by a Turing machine. The information of a problem which is captured by an algorithmic language is called algorithmic knowledge of the problem. □

Algorithmic knowledge to describe a problem is so common in our everyday life (one need only look at cookery-books or technical maintenance manuals) that one may ask whether the human brain is 'predisposed' to preferably present a problem by describing its solution recipe.

However, there exists at least one different way to capture knowledge about a problem; it is the method which describes 'what' the problem is by defining its properties, rather than saying 'how' to solve it. Mathematically, this can be expressed by a set $\{x \in X : R(x)\}$, where $X$ is a continuous or discrete state space and $R(x)$ is a Boolean relation, defining the properties or the constraints of the problem; $x$ is called the variable(s). A notational system that represents a problem in this way is called a declarative language.

DEFINITION 2 A declarative language describes the problem as a set using mathematical variables and constraints defined over a given state space. This space can be finite or infinite, countable or noncountable. The information of a problem which is captured by a declarative language is called declarative knowledge of the problem. □

The declarative representation, in general, does not give any indication on how to solve the problem. It only states what the problem is. Of course, there exists a trivial algorithm to solve a declaratively stated problem, which is to enumerate the state space and to check whether a given $x \in X$ violates the constraint $R(x)$. The algorithm breaks down, however, whenever the state space is infinite. But even if the state space is finite, it is, for most nontrivial problems, so large that a full enumeration is practically impossible.
call such a notational system an algorithmic lan- Algorithmic and declarative representations are
guage. two fundamentally different kinds of modeling
and representing knowledge. Declarative knowledge answers the question 'what is?', whereas algorithmic knowledge asks 'how to?' [4]. An algorithm gives an exact recipe of how to solve a problem. A mathematical model, i.e. its declarative representation, on the other hand, (only) defines the problem as a subspace of the state space. No algorithm is given to find all or a single element of the feasible subspace.

Why Declarative Representation. The question arises, therefore, why one should present a problem in a declarative way, since one must solve it anyway and, hence, represent it as an algorithm. The reasons are, first of all, conciseness, insight, and documentation. Many problems can be represented declaratively in a very concise way, while the representation of their computation is long and complex. Concise writing also favors insight into a problem. Furthermore, in many scientific papers a problem is stated in a declarative way using mathematical equations and inequalities for documentation purposes. This gives a clear statement of the problem and is an efficient way to communicate it to other scientists. However, documentation is by no means limited to human beings. One can imagine declarative languages implemented on a computer like algorithmic languages, which are parsed and interpreted by a compiler. In this way, an interpretative system can analyse the structure of a declarative program, can pretty-print it on a printer or a screen, can classify it, or symbolically transform it in order to view it as a diagram or in another textual form.

Of course, the most interesting question is whether the declarative way of representing a problem could be of any help in solving the problem.

Indeed, for certain classes of problems the computation can be obtained directly from a declarative formulation. This is true for all recursive definitions. A classical example is the algorithm of Euclid to find the greatest common divisor (gcd) of two integers. One can prove that

$$\gcd(a, b) = \begin{cases} \gcd(b,\ a \bmod b), & b > 0, \\ a, & b = 0, \end{cases}$$

which is clearly a declarative statement of the problem. In Scheme, a functional language, this formula can be implemented directly as a function in the following way:

    (define (gcd a b)
      (if (= b 0)
          a
          (gcd b (remainder a b))))

Similar formulations can be given for any other language which includes recursion as a basic control structure. This class of problems is surprisingly rich. The whole paradigm of dynamic programming can be subsumed under this class.

A class of problems of a very different kind are linear programs, which can be represented declaratively in the following way:

$$\{\min cx : Ax \ge b\}$$

From this formulation, in contrast to the class of recursive definitions, nothing can be deduced that would be useful in solving the problem. However, there exist well-known methods, for example the simplex method, which solve almost all instances very efficiently. Hence, to make the declarative formulation of a linear program useful for solving it, one only needs to translate it into a form that the simplex algorithm accepts as input. The translation from the declarative formulation $\{\min cx : Ax \ge b\}$ to such an input form can be automated. This concept can be extended to nonlinear and discrete problems.

Algebraic Modeling Languages. The idea of stating the mathematical problem in a declarative way and translating it into an 'algorithmic' form by a standard procedure led to a new language paradigm, which emerged mainly in the operations research community at the end of the 1980s: the algebraic modeling languages (AIMMS [1], AMPL [7], GAMS [2], LINGO [18], LPL [12], and others). These languages are becoming increasingly popular even outside the community of operations research. Algebraic modeling languages represent a problem in a purely declarative way, although most of them include computational facilities to manipulate the data as well as certain control structures.

One of their strengths is the complete separation of the problem formulation as a declarative model
from finding a solution, which is supposed to be computed by an external program called a solver. This allows the modeler not only to separate the two main tasks of model formulation and model solution, but also to switch easily between several solvers. This is an invaluable benefit for many difficult problems, since it is not uncommon that one model instance can be solved using one method, while another instance is solvable only using another method. Another advantage of such languages is that they separate clearly between model structure, which only contains parameters (place-holders for data) but no data, and model instance, in which the parameters are replaced by a specific data set. This leads to a natural separation between model formulation and data gathering, with the data stored in databases. Hence, the main features of these algebraic modeling languages are:

• purely declarative representation of the problem;
• clear separation between formulation and solution;
• clear separation between model structure and model data.

It is, however, naive to think that one only needs to formulate a problem in a concise declarative form and to link it somehow to a solver in order to solve it. First of all, the 'linking process' is not as straightforward as it seems initially. Second, a solver may not exist which could solve the problem at hand in an efficient way. One only needs to look at Fermat's last conjecture, which can be stated in a declarative way as $\{a, b, c, n \in \mathbb{N}^+ : a^n + b^n = c^n,\ a, b, c \ge 1,\ n > 2\}$, to convince oneself of this fact. Even worse, one can state a problem declaratively for which no solver can exist. This is true already for the rather limited declarative language of first order logic, for which no algorithm exists which decides whether a formula is true or false in general (see [5]).

In this sense, efforts are currently under way in the design of such languages which focus on flexibly linking the declarative formulation to a specific solver, in order to make this paradigm of purely declarative formulation more powerful. This language-solver-interface problem has different aspects and research goes in many directions. A main effort is to integrate symbolic model transformation rules into the declarative language in order to generate formulations which are more useful for a solver. AMPL, for example, automatically detects partially separable structure and computes second derivatives [8]. This information is also handed over to a nonlinear solver. LPL, to cite a very different undertaking, has integrated a set of rules to translate logical constraints symbolically into 0-1 constraints [11]. To do this in an intelligent way is anything but easy, because the resulting 0-1 formulation should be as sharp as possible. This translation is useful for large mathematical models which must be extended by a few logical conditions. For many applications the original model becomes straightforward, while the transformed model is complicated but still relatively easy to solve (examples were given in [11]). Even if the resulting formulation is not solvable efficiently, the modeler can gain more insight into the structure of the model from such a symbolic translation procedure, and may then modify the original formulation.

Second Generation Modeling Languages. Another research activity, currently under way, goes in the direction of extending the algebraic modeling languages in order to express algorithmic knowledge as well. This is necessary because, even if one could link a purely declarative language to any solver, it remains doubtful whether this can be done efficiently in all cases. Furthermore, for many problems it is not useful to formulate them in a declarative way: the algorithmic way is more straightforward and easier to understand. For still other problems a mixture of declarative and algorithmic knowledge leads to a superior formulation in terms of understandability as well as in terms of efficiency (examples are given below to confirm these findings).

Therefore, AIMMS integrates control structures and procedure definitions. GAMS, AMPL and LPL also allow the modeler to write algorithms powerful enough to solve models repeatedly.

A theoretical effort was undertaken in [10] to specify a modeling language which allows the modeler (or the programmer) to combine algorithmic and declarative knowledge within the same language framework without intermingling them. The
overall syntax structure of a model (or a program) in this framework is as follows:

    MODEL ModelName
      <declarative part of the model>
    BEGIN
      <algorithmic part of the model>
    END ModelName.

Declarative and algorithmic knowledge are clearly separated. Either part can be empty, meaning that the problem is represented in a purely declarative or in a purely algorithmic form. The declarative part consists of the basic building blocks of declarative knowledge: variables, parameters, constraints, model checking facilities, and sets (that is, a way to 'multiply' basic building blocks). This part may also contain 'ordinary declarations' of an algorithmic language (e.g., type and function declarations). Furthermore, one can declare whole models within this part, leading to nested model structures, which is very useful in decomposing a complex problem into smaller parts. The algorithmic part, on the other hand, consists of all control structures which make the language Turing complete. One may imagine one's favorite programming language being implemented in this part. A language which combines declarative and algorithmic knowledge in this way is called a modeling language.

DEFINITION 3 A modeling language is a notational system which allows one to combine (not to merge) declarative and algorithmic knowledge in the same language framework. The content captured by such a notation is called a model. □

Such a language framework is very flexible. Purely declarative models are linked to external solvers to be solved; purely algorithmic models are programs, that is, algorithms + data structures, in the ordinary sense.

Modeling Language and Constraint Logic Programming. Merging declarative and algorithmic knowledge is not new, although it is not very common in language design. The only existing language paradigm doing it is constraint logic programming (CLP), a refinement of logic programming [13]. There are, however, important differences between the CLP paradigm and the paradigm of modeling language as defined above.

1) In CLP the algorithmic part, normally a search mechanism, is behind the scenes and the computation is intrinsically coupled with the declarative language itself. This could be a strength, because the programmer does not have to be aware of how the computation is taking place; he or she only writes the rules in a descriptive, that is declarative, way and triggers the computation by a request. In reality, however, it is an important drawback, because for most nontrivial problems the programmer 'must' be aware of how the computation is taking place. Therefore, to guide the computation in CLP, the declarative program is interspersed with additional rules which have nothing to do with the description of the original problem. In a modeling language, the user either links the declarative part to an external solver or writes the solver within the language. In either case, both parts are strictly separated. Why is this separation so important? Because it allows the modeler to 'plug in' different solvers without touching the overall model formulation.

2) The second difference is that the modeling language paradigm leads automatically to modular design. This is probably the hottest topic in software engineering: building components. Software engineering teaches us that a complex structure can only be managed efficiently by breaking it down into many relatively independent components. The CLP approach is more likely to lead to programs that are difficult to survey and hard to debug and to maintain, because such considerations are entirely absent within the CLP paradigm.

3) On the other hand, the CLP community has developed methods to solve specific classes of combinatorial problems which seem to be superior to other methods. This is because they rely on propagation, simplification of constraints, and various consistency techniques. In this sense, CLP solvers could be used and linked with modeling languages. Such a project is currently under way between the AMPL language and the ILOG solver [6], [17].
Hence, while the representation of models is probably best done in the language framework of modeling languages, the solution process can take place in a CLP solver for certain problems.

Modeling Examples. Five modeling examples are chosen from very different problem domains to illustrate the highlights of the presented paradigm of modeling language. The first two examples show that certain problems are best formulated using algorithmic knowledge, the next two examples show the power of a declarative formulation, and a last example indicates that mixing both paradigms is sometimes more advantageous.

Sorting. Sorting is a problem which is preferably expressed in an algorithmic way. Declaratively, the problem could be formulated as follows: Find a permutation $\pi$ such that $A_{\pi_i} \le A_{\pi_{i+1}}$ for all $i \in \{1, \ldots, n-1\}$, where $A_{1,\ldots,n}$ is an array of objects on which an order is defined. It is difficult to imagine a 'solver' that could solve this problem as efficiently as the best known sorting algorithms, such as Quicksort, whose implementation is straightforward.

The reason why the sorting problem is best formulated as an algorithm is probably that the state space is exponential in the number of items, whereas the best algorithm only has complexity $O(n \log n)$.

The n-queens problem. The n-queens problem is to place n queens on a chessboard of dimension n × n in such a way that they cannot attack each other. This problem can be formulated declaratively as follows: $\{x_i, x_j \in \{1, \ldots, n\} : x_i \ne x_j,\ x_i + i \ne x_j + j,\ x_i - i \ne x_j - j\}$, where $x_i$ is the column position of the ith queen (i.e. the queen in row i). Using the LPL [12] formulation:

    MODEL nQueens;
      PARAMETER n; SET i ALIAS j ::= {1,...,n};
      DISTINCT VARIABLE x{i}[1,...,n];
      CONSTRAINT S{i,j: i < j}:
        x[i] + i <> x[j] + j AND x[i] - i <> x[j] - j;
    END

the author was able to solve problems for n < 8 using a general MIP solver. The problem is automatically translated into a 0-1 problem by LPL. Replacing the MIP solver by a tabu search heuristic, problems with n < 50 were solvable within the LPL framework. Using the constraint language Oz [19], problems of n ≤ 200 are efficiently solvable using techniques of propagation and variable domain reductions. However, the success of all these methods seems to be limited compared to the best we can attain. In [20], [21], R. Sosic and J. Gu presented a polynomial time local heuristic that can solve problems of n < 3 000 000 in less than one minute. The presented algorithm is very simple. The conclusion for the n-queens problem seems to be that an algorithmic formulation is advantageous.

A Two-Person Game. Two players each choose a positive number at random and note it on a piece of paper. They then compare the numbers. If both numbers are equal, then neither player gets a payoff. If the difference between the two numbers is one, then the player who has chosen the higher number obtains the sum of both; otherwise the player who has chosen the smaller number obtains the sum of both. What is the optimal strategy for a player, i.e. which numbers should be chosen with what frequencies to get the maximal payoff? This problem was presented in [9] and is a typical two-person zero-sum game. In LPL, it can be formulated as follows:

    MODEL Game 'finite two-person zero-sum game';
      SET i ALIAS j := /1:50/;
      PARAMETER p{i,j} := IF(j > i, IF(j = i+1, -i-j, MIN(i,j)), IF(j < i, -p[j,i], 0));
      VARIABLE x{i};
      CONSTRAINT R: SUM{i} x[i] = 1;
      MAXIMIZE gain: MIN{j} (SUM{i} p[j,i]*x[i]);
    END Game.

This is a very compact way to formulate the problem declaratively, and it is difficult to imagine how this could be achieved using algorithmic knowledge alone. It is also an efficient way to state the problem, because large instances can be solved by a linear programming solver. LPL automatically transforms it into a linear program. (By the way, the problem has an interesting solution: each player should only choose numbers smaller than six.)

Equal Circles in a Square. The problem is to find the maximum diameter of n equal mutually disjoint circles packed inside a unit square.
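To make the quantity being maximized concrete, the following Python sketch evaluates a given candidate placement. It rests on one reading of the problem statement, namely that each circle must lie entirely inside the unit square; the function name `max_common_diameter` is hypothetical. The sketch only scores a fixed set of centers and does not search for the best placement, which is the hard part left to a solver.

```python
from itertools import combinations
from math import dist


def max_common_diameter(centers):
    """Largest diameter d for which equal circles at `centers` are pairwise
    disjoint and contained in the unit square (needs at least two centers)."""
    # Two circles of diameter d are disjoint when their centers are at least d apart.
    pair_limit = min(dist(p, q) for p, q in combinations(centers, 2))
    # A circle of diameter d stays inside [0, 1]^2 when its center is at least
    # d / 2 away from every side of the square.
    wall_limit = min(2 * min(x, 1 - x, y, 1 - y) for x, y in centers)
    return min(pair_limit, wall_limit)


grid = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]
print(max_common_diameter(grid))
```

For the 2 × 2 grid of centers both limits coincide: neighbouring centers are 0.5 apart and every center is 0.25 away from the nearest side.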
In LPL, this problem can be compactly formulated as follows:

    MODEL circles 'pack equal circles in a square';
      PARAMETER n 'number of circles';
      SET i ALIAS j = 1,...,n;
      VARIABLE
        t 'diameter of the circles';
        x{i}[0,1] 'x-position of the center';
        y{i}[0,1] 'y-position of the center';
      CONSTRAINT
        R{i,j: i < j} 'circles must be disjoint':
          (x[i] - x[j])^2 + (y[i] - y[j])^2 >= t;
      MAXIMIZE obj 'maximize diameter': t;
    END

C.D. Maranas et al. [15] obtained the best known solutions for all n < 30 and, for n = 15, an even better one, using an equivalent formulation in GAMS and linking it to MINOS [16], a well-known nonlinear solver.

The (Fractional) Cutting-Stock Problem. Paper is manufactured in rolls of width $B$. A set of customers $W$ orders $d_w$ rolls of width $b_w$ (with $w \in W$). Rolls can be cut in many ways: every subset $P' \subseteq W$ such that $\sum_{i \in P'} y_i b_i \le B$ is a possible cut-pattern, where $y_i$ is a positive integer. The question is how the initial roll of width $B$ should be cut, that is, which patterns should be used, in order to minimize the overall paper waste. A straightforward formulation of this problem is to enumerate all patterns, each giving a variable, and then to minimize the number of used patterns while fulfilling the demands. The resulting model is a very large linear program which cannot be solved.

A well-known method in operations research to solve this kind of problem is to use a column generation method (see [3] for details), that is, a small instance with only a few patterns is solved and a rewarding column (a pattern) is added repeatedly to the problem. The new problem is then solved again. This process is repeated until no pattern can be added. To find a rewarding pattern, another problem, named a knapsack problem, must be solved.

The problem can be formulated partly by algorithmic and partly by declarative knowledge. It consists of two declaratively formulated problems (a linear program and a knapsack problem), which are both repeatedly solved. In pseudocode one could formulate the algorithmic knowledge as follows:

    SOLVE the small cutting-stock problem
    SOLVE the knapsack problem
    WHILE a rewarding pattern was found DO
      add pattern to the cutting-stock problem
      SOLVE the cutting-stock problem again
      SOLVE the knapsack problem again
    ENDWHILE

The two models (the cutting-stock problem and the knapsack problem) can be formulated declaratively. In the proposed framework of modeling language, the complete problem can now be expressed as in the program below.

    MODEL CuttingStock;
      MODEL Knapsack(i, w, p, K, x, obj);
        SET i;
        PARAMETER w{i}; p{i}; K;
        INTEGER VARIABLE x{i};
        CONSTRAINT R: SUM{i} w*x <= K;
        MAXIMIZE obj: SUM{i} p*x;
      END Knapsack.
      SET
        w 'rolls ordered'; p 'possible patterns';
      PARAMETER
        a{w,p} 'pattern table';
        d{w} 'demands';
        b{w} 'widths of ordered rolls';
        B 'initial width';
        INTEGER y{w} 'new added pattern';
        C 'contribution of a cut';
      VARIABLE
        X{p} 'number of rolls cut according to p';
      CONSTRAINT
        Dem{w}: SUM{p} a*X >= d;
      MINIMIZE obj: SUM{p} X;
    BEGIN
      SOLVE;
      SOLVE Knapsack(w, b, Dem.dual, B, y, C);
      WHILE (C > 1) DO
        p := p + {'pattern_'+str(#p)};
        a{w,#p} := y[w];
        SOLVE;
        SOLVE Knapsack(w, b, Dem.dual, B, y, C);
      END;
    END CuttingStock.

This formulation has several remarkable properties:

1) It is short and readable. The declarative part consists of the (small) linear cutting-stock problem; it also contains, as a submodel, a knapsack problem. The algorithmic part
implements the column generation method. Both parts are entirely separated.

2) It is a complete formulation, except for the data. No other code is needed; both models can be solved using a standard MIP solver (since the knapsack problem is small in general).

3) It has a modular structure. The knapsack problem is an independent component with its own name space; there is no interference with the surrounding model. It could even be declared outside the cutting-stock problem.

4) The cutting-stock problem is only one problem of a large class of relevant problems which are solved using a column generation or, alternatively, a row-cut generation.

Conclusion. It has been shown that certain problems are best formulated as algorithms, others in a declarative way, and still others need both paradigms to be stated concisely. Computer science has made available many algorithmic languages; they can be contrasted with the algebraic modeling languages, which are purely declarative. A language, called modeling language, which combines both paradigms was defined in this paper, and examples were given showing clear advantages of doing so. It is more powerful than either paradigm taken separately.

However, the integration of algorithmic and declarative knowledge cannot be done in an arbitrary way. The language design must follow certain criteria well-known in computer science. The main criteria are reliability and transparency. Reliability can be achieved by a unique notation to code models, that is, by a modeling language, and by various checking mechanisms (type checking, unit checking, data integrity checking and others). Transparency can be obtained by flexible decomposition techniques, like modular structure as well as access and protection mechanisms for these structures, which are well-known techniques in language design and software engineering.

Solving relevant optimization problems efficiently on present-day desktop machines calls not only for fast machines and sophisticated solvers, but also for formulation techniques that allow the modeler to communicate the model easily and to build it in a readable and maintainable way.

See also: Large scale unconstrained optimization; Optimization software; Continuous global optimization: Models, algorithms and software.

References
[1] BISSCHOP, J.: AIMMS, the modeling system, Paragon Decision Techn., 1998, www.paragon.nl.
[2] BROOKE, A., KENDRICK, D., AND MEERAUS, A.: GAMS. A user's guide, Sci. Press, 1988.
[3] CHVATAL, V.: Linear programming, Freeman, 1973.
[4] FEIGENBAUM, E.A.: 'How the 'what' becomes the 'how'', Comm. ACM 39, no. 5 (1996), 97-104.
[5] FLOYD, R.W., AND BEIGEL, R.: The language of machines, an introduction to computability and formal languages, Computer Sci. Press, 1994.
[6] FOURER, R.: 'Extending a general-purpose algebraic modeling language to combinatorial optimization: A logic programming approach', in D.L. WOODRUFF (ed.): Advances in Computational and Stochastic Optimization, Logic Programming, and Heuristic Search: Interfaces in Computer Sci. and Oper. Res., Kluwer Acad. Publ., 1998, pp. 31-74.
[7] FOURER, R., GAY, D.M., AND KERNIGHAN, B.W.: AMPL, a modeling language for mathematical programming, Sci. Press, 1993.
[8] GAY, D.M.: 'Automatically finding and exploiting partially separable structure in nonlinear programming problems', AT&T Bell Lab. Murray Hill, New Jersey (1996).
[9] HOFSTADTER, D.R.: Metamagicum, Fragen nach der Essenz von Geist und Struktur, Klett-Cotta, Stuttgart, 1988.
[10] HÜRLIMANN, T.: 'Computer-based mathematical modeling', Habilitations Script Fac. Economic and Social Sci. Inst. Informatics Univ. Fribourg Dec. (1997).
[11] HÜRLIMANN, W.: 'An efficient logic-to-IP translation procedure', Working Paper Inst. Informatics Univ. Fribourg March (1998), ftp://ftp-iiuf.unifr.ch/pub/lpl/doc/APMOD1.pdf.
[12] HÜRLIMANN, T.: Reference manual for the LPL modeling language, working paper version ~.30, June, Inst. Informatics Univ. Fribourg, 1998, ftp://ftp-iiuf.unifr.ch/pub/lpl/doc/Manual.ps.
[13] JAFFAR, J., AND MAHER, M.J.: Constraint logic programming: A survey, Handbook Artificial Intelligence and Logic Programming. Oxford Univ. Press, 1995.
[14] LOUDEN, K.C.: Programming languages - Principles and practice, PWS/Kent Publ., 1993.
[15] MARANAS, C.D., FLOUDAS, C.A., AND PARDALOS, P.M.: New results in the packing of equal circles in a square, Dept. Chemical Engin. Princeton Univ., 1993.
[16] MURTAGH, B.A., AND SAUNDERS, M.A.: MINOS 5.0,
user guide, Systems Optim. Lab. Dept. Oper. Res. Stanford Univ., 1987.
[17] SA, ILOG: ILOG solver ~.0 user's manual; ILOG solver ~.0 reference manual, ILOG, 1997.
[18] SCHRAGE, L.: Optimization modeling with LINGO, Lindo Systems, 1998, www.lindo.com.
[19] SMOLKA, G.: 'The Oz programming model', in J. VAN LEEUWEN (ed.): Computer Sci. Today, Vol. 1000 of Lecture Notes Computer Sci., Springer, 1995, pp. 324-343.
[20] SOSIC, R., AND GU, J.: 'A polynomial time algorithm for the n-queens problem', SIGART Bull. 1, no. 3 (1990), 7-11.
[21] SOSIC, R., AND GU, J.: '3,000,000 queens in less than one minute', SIGART Bull. 2, no. 1 (1991), 22-24.

Tony Hürlimann
Inst. Informatics Univ. Fribourg
Fribourg, Switzerland
E-mail address: tony.huerlimann@unifr.ch

MSC2000: 90C10, 90C30
Key words and phrases: algorithmic language, declarative language, modeling language, solver.

MOLECULAR STRUCTURE DETERMINATION: CONVEX GLOBAL UNDERESTIMATION

An important class of difficult global minimization problems arises as an essential feature of molecular structure calculations. The determination of a stable molecular structure can often be formulated in terms of calculating the global (or approximate global) minimum of a potential energy function (see [6]). Computing the global minimum of this function is very difficult because it typically has a very large number of local minima, which may grow exponentially with molecule size.

One such application is the well known protein folding problem. It is widely accepted that the folded state of a protein is completely dependent on the one-dimensional linear sequence (i.e., 'primary' sequence) of amino acids from which the protein is constructed: external factors, such as enzymes, present at the time of folding have no effect on the final, or native, state of the protein. This led to the formulation of the protein folding problem: given a known primary sequence of amino acids, what would be its native, or folded, state in three-dimensional space?

Several successful predictions of folded protein structures have been made and announced before the experimental structures were known (see [3], [9]). While most of these have been made with a blend of a human expert's abilities and computer assistance, fully automated methods have shown promise for producing previously unattainable accuracy [2].

These machine based prediction strategies attempt to lessen the reliance on experts by developing a completely computational method. Such approaches are generally based on two assumptions. First, that there exists a potential energy function for the protein; and second, that the folded state corresponds to the structure with the lowest potential energy (minimum of the potential energy function) and is thus in a state of thermodynamic equilibrium. This view is supported by in vitro observations that proteins can successfully refold from a variety of denatured states. Evolutionary theory also supports a folded state at a global energy minimum. Protein sequences have evolved under pressure to perform certain functions, which for most known occurrences requires a stable, unique, and compact structure. Unless specifically required for a certain function, there was no biochemical need for proteins to hide their global minimum behind a large kinetic energy barrier. While kinetic blocks may occur, they should be limited to special proteins developed for certain functions (see [1]).

Molecular Model. Unfortunately, finding the 'true' energy function of a molecular structure, if one even exists, is virtually impossible. For example, with proteins ranging in size up to 1,053 amino acids (a collagen found in tendons), exhaustive conformational searches will never be tractable. Practical search strategies for the protein folding problem currently require a simplified, yet sufficiently realistic, molecular model with an associated potential energy function representing the dominant forces involved in protein folding [4]. In one such simplified model, each residue in the primary sequence of a protein is characterized by its backbone components NH-CαH-C'O and one of 20 possible amino acid sidechains attached to the central Cα atom. The three-dimensional structure of the chain is determined by internal molecular coordinates consisting of bond lengths $l$, bond angles $\theta$, sidechain torsion angles $\chi$, and the
backbone dihedral angles $\phi$, $\psi$, and $\omega$. Fortunately, these $10r - 6$ parameters (for an $r$-residue structure) do not all vary independently. Some of these ($7r - 4$ of them) are regarded as fixed since they are found to vary within only a very small neighborhood of an experimentally determined value. Among these are the $3r - 1$ backbone bond lengths $l$, the $3r - 2$ backbone bond angles $\theta$, and the $r - 1$ peptide bond dihedral angles $\omega$ (fixed in the trans conformation). This leaves only the $r$ sidechain torsion angles $\chi$, and the $r - 1$ backbone dihedral angle pairs $(\phi, \psi)$. In the reduced representation model presented here, the sidechain angles $\chi$ are also fixed since sidechains are treated as united atoms (see below) with their respective torsion angles $\chi$ fixed at an 'average' value taken from the Brookhaven Protein Databank. Remaining are the $r - 1$ backbone dihedral angle pairs. These also are not completely independent; they are severely constrained by known chemical data (the Ramachandran plot) for each of the 20 amino acid residues. Furthermore, since the atoms from one C$^{\alpha}$ to the next C$^{\alpha}$ along the backbone can be grouped into rigid planar peptide units, there are no extra parameters required to express the three-dimensional position of the attached O and H peptide atoms. Hence, these bond lengths and bond angles are also known and fixed.
Another key element of this simplified polypeptide model is that each sidechain is classified as either hydrophobic or polar, and is represented by only a single 'virtual' center of mass atom. Since each sidechain is represented by only the single center of mass 'virtual atom' C$_s$, no extra parameters are needed to define the position of each sidechain with respect to the backbone mainchain. The twenty amino acids are thus classified into two groups, hydrophobic and polar, according to the scale given by S. Miyazawa and R.L. Jernigan [7].
Corresponding to this simplified polypeptide model is a simple energy function. This function includes four components: a contact energy term favoring pairwise hydrophobic residues, a second contact term favoring hydrogen bond formation between donor NH and acceptor C'=O pairs, a steric repulsive term which rejects any conformation that would permit unreasonably small interatomic distances, and a main chain torsional term that allows only certain preset values for the backbone dihedral angle pairs $(\phi, \psi)$. Since the residues in this model come in only two forms, hydrophobic and polar, where the hydrophobic monomers exhibit a strong pairwise attraction, the lowest free energy state involves those conformations with the greatest number of hydrophobic 'contacts' [4] and intrastrand hydrogen bonds. Simplified potential functions have been successful in [11], [10], and [12]. Here we use a simple modification of the energy function from [11].
The Convex Global Underestimator. One practical means for finding the global minimum of the polypeptide's potential energy function is to use a convex global underestimator to localize the search in the region of the global minimum. The idea is to fit all known local minima with a convex function which underestimates all of them, but which differs from them by the minimum possible amount in the discrete $L_1$ norm. The minimum of this underestimator is used to predict the global minimum for the function, allowing a more localized conformer search to be performed based on the predicted minimum.
More precisely, given an $r$-residue structure with $n = 2r - 2$ backbone dihedral angles, denote a conformation of this simplified model by $\phi \in R^n$, and the corresponding simplified potential energy function value by $F(\phi)$. Then, assuming that $k \ge 2n + 1$ local minimum conformations $\phi^{(j)}$, for $j = 1, \ldots, k$, have been computed, a convex quadratic underestimating function $U(\phi)$ is fitted to these local minima so that it underestimates all the local minima, and normally interpolates $F(\phi^{(j)})$ at $2n + 1$ points. This is accomplished by determining the coefficients in the function $U(\phi)$ so that
$$\delta_j = F(\phi^{(j)}) - U(\phi^{(j)}) \ge 0 \qquad (1)$$
for $j = 1, \ldots, k$, and where $\sum_{j=1}^{k} \delta_j$ is minimized. That is, the difference between $F(\phi)$ and $U(\phi)$ is minimized in the discrete $L_1$ norm over the set of $k$ local minima $\phi^{(j)}$, $j = 1, \ldots, k$. Of course, this 'underestimator' only underestimates known local minima. The specific underestimating function $U(\phi)$ used in this convex global underestimator (CGU) method is given by
additional set of constraints are imposed on the coefficients of $U(\phi)$:
$$c_i + \underline{\phi}_i d_i \le 0, \qquad c_i + \overline{\phi}_i d_i \ge 0, \qquad i = 1, \ldots, n. \qquad (3)$$
Note that the satisfaction of (3) implies that $c_i \le 0$ and $d_i \ge 0$ for $i = 1, \ldots, n$.
The unknown coefficients $c_i$, $i = 0, \ldots, n$, and $d_i$, $i = 1, \ldots, n$, can be determined by a linear program which may be considered to be in the dual form. For reasons of efficiency, the equivalent primal of this problem is actually solved, as described
for an approximation to the global minimum of the correct energy function $F(\phi)$.
An efficient linear programming formulation and solution satisfying (1)-(3) will now be summarized. Let $f^{(j)} = F(\phi^{(j)})$, for $j = 1, \ldots, k$, and let $f \in R^k$ be the vector with elements $f^{(j)}$. Also let $\omega^{(j)} \in R^n$ be the vector with elements $\frac{1}{2}(\phi_i^{(j)})^2$, $i = 1, \ldots, n$, and let $e_k \in R^k$ be the vector of ones. Now define the following two matrices $\Phi \in R^{(n+1) \times k}$ and $\Omega \in R^{n \times k}$:
$$\Phi = \begin{pmatrix} e_k^T \\ \phi^{(1)} \cdots \phi^{(k)} \end{pmatrix}, \qquad \Omega = \left( \omega^{(1)} \cdots \omega^{(k)} \right). \qquad (4)$$
Finally, let $c \in R^{n+1}$, $d \in R^n$, and $\delta \in R^k$ be the vectors with elements $c_i$, $d_i$, and $\delta_i$, respectively.
order $k$, and $\bar{I}_n$ is the $n \times (n + 1)$ 'augmented' matrix $(0 \,\vdots\, I_n)$ where $I_n$ is the identity matrix of order $n$.
Since the matrix in (5) has more rows than columns ($2(k + n)$ rows and $k + 2n + 1$ columns, where $k \ge 2n + 1$), it is computationally more efficient to consider it as a dual problem, and to solve the equivalent primal. After some simple transformations, this primal problem reduces to:
$$\min \; f^T y_1 - f^T e_k \qquad \text{s.t.} \; \ldots$$
first of the $2n + 1$ constraints in (6) in fact requires that $e_k^T y_1 = 1$, then the function $f^T y_1 - f^T e_k$ is also bounded below, and so this primal linear program always has an optimal solution. This optimal solution gives the values of $c$, $d$, and $\delta$ via the dual vectors, and also determines which values of $f^{(j)}$ are interpolated by the potential function $U(\phi)$. That is, the basic columns in the optimal solution to (6) correspond to the conformations $\phi^{(j)}$ for which $F(\phi^{(j)}) = U(\phi^{(j)})$.
Note that once an optimal solution to (6) has been obtained, the addition of new local minima is very easy. It is done by simply adding new columns to $\Phi$ and $\Omega$, and therefore to the constraint matrix in (6). The number of primal rows remains fixed at $2n + 1$, independent of the number $k$ of local minima.
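As an illustration of conditions (1) and (3), the following Python sketch evaluates a separable convex quadratic underestimator at a set of local minima and checks feasibility. It is only a sketch under stated assumptions: the form $U(\phi) = c_0 + \sum_i (c_i \phi_i + \frac{1}{2} d_i \phi_i^2)$ is assumed from the minimum-point formula used in this article, the function names and the one-dimensional data are hypothetical, and the coefficients are supplied by hand rather than obtained from the linear program.

```python
def underestimator(c0, c, d, phi):
    """Evaluate U(phi) = c0 + sum_i (c_i*phi_i + 0.5*d_i*phi_i**2) (assumed form)."""
    return c0 + sum(ci * x + 0.5 * di * x * x for ci, di, x in zip(c, d, phi))


def slacks(c0, c, d, minima, energies):
    """Return delta_j = F(phi_j) - U(phi_j) for every known local minimum."""
    return [f - underestimator(c0, c, d, phi) for phi, f in zip(minima, energies)]


def is_feasible(c0, c, d, minima, energies, lower, upper, tol=1e-9):
    """Check condition (1) (nonnegative slacks) and the bound constraints (3)."""
    under = all(s >= -tol for s in slacks(c0, c, d, minima, energies))
    inside = all(
        ci + lo * di <= tol and ci + hi * di >= -tol
        for ci, di, lo, hi in zip(c, d, lower, upper)
    )
    return under and inside


# Hypothetical one-dimensional data: three local minima and their energies.
minima = [[0.0], [1.0], [2.0]]
energies = [0.5, -1.0, 0.25]
c0, c, d = 0.0, [-2.0], [2.0]  # U(phi) = -2*phi + phi**2, chosen by hand
deltas = slacks(c0, c, d, minima, energies)
l1_gap = sum(deltas)  # the quantity the linear program would minimize
feasible = is_feasible(c0, c, d, minima, energies, lower=[0.0], upper=[2.0])
```

In this toy case the underestimator touches the middle minimum (zero slack) and lies below the other two, which mirrors the interpolation property described above.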
The convex quadratic underestimating function $U(\phi)$ determined by the values $c \in R^{n+1}$ and $d \in R^n$ now provides a global approximation to the local minima of $F(\phi)$, and its easily computed global minimum point $\phi_{\min}$ is given by $(\phi_{\min})_i = -c_i/d_i$, $i = 1, \ldots, n$, with corresponding function value $U_{\min}$ given by $U_{\min} = c_0 - \sum_{i=1}^{n} c_i^2/(2d_i)$. The value $U_{\min}$ is a good candidate for an approximation to the global minimum of the correct energy function $F(\phi)$, and so $\phi_{\min}$ can be used as an initial starting point around which additional configurations (i.e., local minima) should be generated. These local minima are added to the constraint matrix in (6) and the process is repeated. Before each iteration of this process, it is necessary to reduce the volume of the hyperrectangle $H_\phi$ over which the new configurations are produced so that a tighter fit of $U(\phi)$ to the local minima 'near' $\phi_{\min}$ is constructed.
The rate and method by which the hyperrectangle size is decreased, and the number of additional local minima computed at each iteration must be determined by computational testing. But clearly the method depends most heavily on computing local minima quickly and on solving the resulting linear program efficiently to determine the approximating function $U(\phi)$ over the current hyperrectangle.
If $E_c$ is a cutoff energy, then one means for decreasing the size of the hyperrectangle $H_\phi$ at any step is to let $H_\phi = \{\phi : U(\phi) \le E_c\}$. To get the bounds of $H_\phi$, consider $U(\phi) \le E_c$ where $U(\phi)$ satisfies (2). Then limiting $\phi_i$ requires that
$$\sum_{i=1}^{n} \left( c_i \phi_i + \tfrac{1}{2} d_i \phi_i^2 \right) \le E_c - c_0. \qquad (7)$$
As before, the minimum value of $U(\phi)$ is attained when $\phi_i = -c_i/d_i$, $i = 1, \ldots, n$. Assigning this minimum value to each $\phi_i$, except $\phi_k$, then results in
$$c_k \phi_k + \tfrac{1}{2} d_k \phi_k^2 \le E_c - c_0 + \tfrac{1}{2} \sum_{i \ne k} \frac{c_i^2}{d_i} = \beta_k. \qquad (8)$$
The lower and upper bounds on $\phi_k$, $k = 1, \ldots, n$, are given by the roots of the quadratic equation
$$c_k \phi_k + \tfrac{1}{2} d_k \phi_k^2 = \beta_k. \qquad (9)$$
Hence, these bounds can be used to define the new hyperrectangle $H_\phi$ in which to generate new configurations.
Clearly, if $E_c$ is reduced, the size of $H_\phi$ is also reduced. At every iteration the predicted global minimum value $U_{\min}$ satisfies $U_{\min} \le F(\phi^*)$, where $\phi^*$ is the smallest known local minimum conformation. Therefore, $E_c = F(\phi^*)$ is often a good choice. If at least one improved point $\phi$, with $F(\phi) < F(\phi^*)$, is obtained in each iteration, then the search domain $H_\phi$ will strictly decrease at each iteration, and may decrease substantially in some iterations.
The CGU Algorithm. Based on the preceding description, a general method for computing the global, or near global, energy minimum of the potential energy function $F(\phi)$ can now be described.
1) Compute $k \ge 2n + 1$ distinct local minima $\phi^{(j)}$, for $j = 1, \ldots, k$, of the function $F(\phi)$.
2) Compute the convex quadratic underestimator function given in (2) by solving the linear program given in (6). The optimal solution to this linear program gives the values of $c$ and $d$ via the dual vectors.
3) Compute the predicted global minimum point $\phi_{\min}$ given by $(\phi_{\min})_i = -c_i/d_i$, $i = 1, \ldots, n$, with corresponding function value $U_{\min}$ given by $U_{\min} = c_0 - \sum_{i=1}^{n} c_i^2/(2d_i)$.
4) If $\phi_{\min} = \phi^*$, where $\phi^* = \mathrm{argmin}\{F(\phi^{(j)}) : j = 1, 2, \ldots\}$ is the best local minimum found so far, then stop and report $\phi^*$ as the approximate global minimum conformation.
5) Reduce the volume of the hyperrectangle $H_\phi$ over which the new configurations will be produced, and remove all columns from $\Phi$ and $\Omega$ which correspond to the conformations which are excluded from $H_\phi$.
6) Use $\phi_{\min}$ as an initial starting point around which additional local minima $\phi^{(j)}$ of $F(\phi)$ (restricted to the hyperrectangle $H_\phi$) are generated. Add these new local minimum conformations as columns to the matrices $\Phi$ and $\Omega$.
7) Return to step 2.
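Steps 3 and 5 reduce to closed-form arithmetic once the coefficients are known. The sketch below is a minimal illustration, not the authors' implementation: it assumes $d_i > 0$ for all $i$, takes $c_0$, $c$, $d$ and the cutoff energy as given, and uses hypothetical function names. It computes the predicted minimum and, for each coordinate, the two roots of the quadratic (9) with $\beta_k$ as in (8).

```python
import math


def cgu_minimum(c0, c, d):
    """Return (phi_min, U_min) with (phi_min)_i = -c_i/d_i and
    U_min = c0 - sum_i c_i**2 / (2*d_i). Assumes every d_i > 0."""
    phi_min = [-ci / di for ci, di in zip(c, d)]
    u_min = c0 - sum(ci * ci / (2.0 * di) for ci, di in zip(c, d))
    return phi_min, u_min


def hyperrectangle(c0, c, d, e_cut):
    """Return per-coordinate (lower, upper) bounds from the roots of
    c_k*phi_k + 0.5*d_k*phi_k**2 = beta_k, with beta_k as in (8)."""
    total = sum(ci * ci / di for ci, di in zip(c, d))
    bounds = []
    for ck, dk in zip(c, d):
        beta = e_cut - c0 + 0.5 * (total - ck * ck / dk)
        disc = ck * ck + 2.0 * dk * beta  # discriminant of the quadratic
        if disc < 0.0:
            raise ValueError("cutoff energy lies below U_min; the region is empty")
        root = math.sqrt(disc)
        bounds.append(((-ck - root) / dk, (-ck + root) / dk))
    return bounds


# Hypothetical two-dimensional coefficients.
c0, c, d = 1.0, [-2.0, -4.0], [2.0, 4.0]
phi_min, u_min = cgu_minimum(c0, c, d)
bounds = hyperrectangle(c0, c, d, e_cut=0.0)
```

For these coefficients the predicted minimum is at (1, 1) with value -2, and a cutoff of 0 gives the box $[1 - \sqrt{2}, 1 + \sqrt{2}] \times [0, 2]$, which always contains $\phi_{\min}$.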
Monte-Carlo simulated annealing in protein folding
The number of new local minima to be generated in step 6 is unspecified since there is currently no theory to guide this choice. In general, a value exceeding $2n + 1$ would be required for the construction of another convex quadratic underestimator in the next iteration (step 2). In addition, the means by which the volume of the hyperrectangle $H_\phi$ is reduced in step 5 may vary. One could use the two roots of (7) to define the new bounds of $H_\phi$. Another method would be simply to use $H_\phi = \{\phi_i : (\phi_{\min})_i - \delta_i \le \phi_i \le (\phi_{\min})_i + \delta_i\}$ where $\delta_i = |(\phi_{\min})_i - (\phi^*)_i|$, $i = 1, \ldots, n$.
For complete details of the CGU method and its computational results, see [5], [8].
See also: Simulated annealing methods in protein folding; Packet annealing; Phase problem in X-ray crystallography: Shake and bake approach; Global optimization in protein folding; Multiple minima problem in protein folding: αBB global optimization approach; Adaptive simulated annealing and its application to protein folding; Genetic algorithms; Global optimization in Lennard-Jones and Morse clusters; Protein folding: Generalized-ensemble algorithms; Monte-Carlo simulated annealing in protein folding; Simulated annealing.
References
[1] ABAGYAN, R.A.: 'Towards protein folding by global energy optimization', Federation of Europ. Biochemical Soc.: Lett. 325 (1993), 17-22.
[2] ANDROULAKIS, I.R., MARANAS, C.D., AND FLOUDAS, C.A.: 'Prediction of oligopeptide conformations via deterministic global optimization', J. Global Optim. 11 (1997), 1-34.
[3] BENNER, S.A., AND GERLOFF, D.L.: 'Predicting the conformation of proteins: man versus machine', Federation of Europ. Biochemical Soc.: Lett. 325 (1993), 29-33.
[4] DILL, K.A.: 'Dominant forces in protein folding', Biochemistry 29, no. 31 (1990), 7133-7155.
[5] DILL, K.A., PHILLIPS, A.T., AND ROSEN, J.B.: 'Protein structure and energy landscape dependence on sequence using a continuous energy function', J. Comput. Biol. 4, no. 3 (1997), 227-239.
[6] MERZ, K., AND GRAND, S. LE: The protein folding problem and tertiary structure prediction, Birkhäuser, 1994.
[7] MIYAZAWA, S., AND JERNIGAN, R.L.: 'A new substitution matrix for protein sequence searches based on contact frequencies in protein structures', Protein Engin. 6 (1993), 267-278.
[8] PHILLIPS, A.T., ROSEN, J.B., AND WALKE, V.H.: 'Molecular structure determination by global optimization', in P.M. PARDALOS, G.L. XUE, AND D. SHALLOWAY (eds.): DIMACS, Amer. Math. Soc., 1995, pp. 181-198.
[9] RICHARDS, F.M.: 'The protein folding problem', Scientif. Amer. (1991), 54-63.
[10] SRINIVASAN, R., AND ROSE, G.D.: 'LINUS: A hierarchic procedure to predict the fold of a protein', PROTEINS: Struct. Funct. Genet. 22 (1995), 81-99.
[11] SUN, S., THOMAS, P.D., AND DILL, K.A.: 'A simple protein folding algorithm using binary code and secondary structure constraints', Protein Engin. 8, no. 8 (1995), 769-778.
[12] YUE, K., AND DILL, K.A.: 'Folding proteins with a simple energy function and extensive conformational searching', Protein Sci. 5 (1996), 254-261.
A. T. Phillips
Computer Sci. Dept. Univ. Wisconsin-Eau Claire
Eau Claire, WI 54701, USA
E-mail address: phillipa@uwec.edu
MSC2000: 65K05, 90C26
Key words and phrases: protein folding, molecular structure determination, convex global underestimation.

MONTE-CARLO SIMULATED ANNEALING IN PROTEIN FOLDING
We review uses of Monte-Carlo simulated annealing in the protein folding problem. We will discuss the strategy for tackling the protein folding problem based on all-atom models. Our approach consists of two elements: the inclusion of accurate solvent effects and the development of powerful simulation algorithms that can avoid getting trapped in states of energy local minima. For the former, we discuss several models varying in nature from crude (distance-dependent dielectric function) to rigorous (reference interaction site model). For the latter, we show the effectiveness of Monte-Carlo simulated annealing.
Introduction. Proteins under their native physiological conditions spontaneously fold into unique three-dimensional structures (tertiary structures) in the time scale of milliseconds to minutes. Although protein structures appear to be dependent on various environmental factors within the cell where they are synthesized, it was inferred by
ing function of energy and the Boltzmann factor $W_B(E)$ decreases exponentially with $E$, the probability distribution $P_B(T, E)$ has a bell-like shape in general. When the temperature is high, $\beta$ is small, and $W_B(E)$ decreases slowly with $E$. So, $P_B(T, E)$ has a wide bell-shape. On the other hand, at low temperature $\beta$ is large, and $W_B(E)$ decreases rapidly with $E$. So, $P_B(T, E)$ has a narrow bell-shape (and in the limit $T \to 0$ K, $P_B(T, E) \propto \delta(E - E_{GS})$, where $E_{GS}$ is the global-minimum energy). However, it is very difficult to obtain canonical distributions at low temperatures with conventional simulation methods. This is because the thermal fluctuations at low temperatures are small and the simulation will certainly get trapped in states of energy local minima. Simulated annealing [27] is based on the process of
K (the final temperature $T_F$ was sometimes set equal to 100 K, 50 K, or 1 K) [23], [46]. The temperature for the $n$th MC sweep is given by
$$T_n = T_I \gamma^{n-1}, \qquad (9)$$
where $\gamma$ is a constant which is determined by $T_I$, $T_F$, and the total number of MC sweeps of the run. Each run consists of $10^4$ to $10^6$ MC sweeps, and we usually made 10 to 20 runs from different initial conformations.
Results. We now present the results of our simulations based on Monte-Carlo simulated annealing. All the simulations were started from randomly-generated conformations.
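The exponential schedule (9) can be made concrete with a short Python sketch. It assumes, as one natural reading of the text, that the last sweep is run exactly at the final temperature, so that the constant in (9) equals $(T_F/T_I)^{1/(N-1)}$ for a run of $N$ sweeps; the function name and the three-sweep example are hypothetical.

```python
def annealing_schedule(t_initial, t_final, n_sweeps):
    """Return (gamma, temperatures) for T_n = T_I * gamma**(n - 1), n = 1..N.

    Assumption: the N-th sweep runs at exactly t_final, which fixes
    gamma = (t_final / t_initial) ** (1 / (N - 1)).
    """
    if n_sweeps < 2:
        raise ValueError("need at least two sweeps to define a schedule")
    gamma = (t_final / t_initial) ** (1.0 / (n_sweeps - 1))
    temperatures = [t_initial * gamma ** (n - 1) for n in range(1, n_sweeps + 1)]
    return gamma, temperatures


# Tiny illustrative run: cool from 1000 K to 250 K in three sweeps.
gamma, temps = annealing_schedule(1000.0, 250.0, 3)
```

With realistic run lengths the constant is very close to 1, so the temperature drops by a tiny fraction per sweep.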
The first example is Met-enkephalin. This brain neuropeptide consists of 5 amino acids with the amino-acid sequence: Tyr-Gly-Gly-Phe-Met. Because it is one of the smallest peptides that have biological functions, it has served as a benchmark for testing a new simulation method. The global minimum conformation of this peptide for ECEPP/2 energy function in gas phase ($\epsilon = 2$) is known [31], [49]. For KONF90 realization of ECEPP/2 energy, the peptide is essentially in the ground state for $E_P \le -11$ kcal/mol [49], [15] and the lowest value is $-12.2$ kcal/mol [17], [16].
In Fig. 1, we show the 'time series' of the total conformational energy $E_P$ (in (1)) obtained by conventional canonical Monte-Carlo simulations at $T = 1000$, 300, and 50 K.
K in Fig. 1c) are very small and this run has appar-
creased during the simulation from 1000 K to 50 K.
Fig. 2: Time series of energy $E_P$ (kcal/mol) of Met-enkephalin from a Monte-Carlo simulated annealing run (horizontal axis: MC sweep).
We have up to now presented the results in gas phase ($\epsilon = 2$). In Fig. 3 we compare the superposed structures of lowest-energy conformations from 8 Monte-Carlo simulated annealing runs in gas phase, simple-repulsive solvent, and water (the latter two contributions were calculated by the RISM theory) [26] with those of 5 structures inferred from NMR experiments ([13, Fig. 2]). The figures were created with RasMol [55].
We see a striking similarity between simulation results in water (Fig. 3c)) and those of NMR experiments (Fig. 3d)). The simulation results in Fig. 3 are from the same number of MC sweeps. It seems that the presence of water speeds up the convergence of the backbone structures in the sense that it requires a smaller number of MC sweeps for convergence [26].
Fig. 3: Superposition of the eight conformations of Met-enkephalin obtained as the lowest-energy structures by Monte-Carlo simulated annealing in gas phase (a), simple-repulsive solvent (b), and water (c) together with superposition of five conformations deduced from the NMR experiment (d).
The solvation free energy based on the RISM theory is very accurate, but it is also computationally very demanding. We are currently trying to solve this problem by making the algorithm more efficient and robust [24]. Hereafter, we discuss how well other solvation theories can still describe the effects of solvent in the prediction of three-dimensional structures of oligopeptides and small proteins.
The next systems we discuss are homo-oligomers with a length of 10 amino acids. From the structural data base of X-ray experiments of protein structures [8] and CD experiments [6], it is known that certain amino acids have a stronger tendency toward α-helix formation than others. For instance, alanine is a helix former and glycine is a helix breaker, while phenylalanine has intermediate helix-forming tendency. We have performed 20 Monte-Carlo simulated annealing runs of 10,000 MC sweeps in gas phase ($\epsilon = 2$) with each of (Ala)$_{10}$, (Leu)$_{10}$, (Met)$_{10}$, (Phe)$_{10}$, (Ile)$_{10}$, (Val)$_{10}$, and (Gly)$_{10}$ [44]. These amino acids are nonpolar and we can avoid the complications of electrostatic and hydrogen-bond interactions of side chains with each other, with main chain, and with the solvent.
In order to analyze how much α-helix formation is obtained by simulations, we first define the α-helix state of a residue. We consider that a residue is in the α-helix state when the dihedral angles $(\phi, \psi)$ fall in the range $(-60 \pm 45^\circ, -50 \pm 45^\circ)$ (Definition I) [23], [46]. The length $\ell$ of a helical segment is then defined by the number of successive residues that are in the α-helix state. The number $n$ of helical residues in a conformation is defined by the sum of $\ell$ over all helical segments in the conformation. Note that $\ell = 3$ corresponds to roughly one turn of α-helix. We therefore consider a conformation as helical if it has a segment with helix length $\ell \ge 3$.
The average values of the dihedral angles $\phi$ and $\psi$ for the helical segments based on Definition I (with helix length $\ell \ge 3$) are $-70^\circ$ and $-37^\circ$, respectively, and the standard deviation is about $10^\circ$ for ECEPP/2 energy function [46], [44]. Hence, for detailed analyses of the data we adopt a more stringent criterion for α-helix state (Definition II): The range is $(\phi, \psi) = (-70 \pm 20^\circ, -37 \pm 20^\circ)$ [44].
We likewise consider that a residue is in the β-strand state when the dihedral angles $(\phi, \psi)$ fall in the range $(-130 \pm 50^\circ, 135 \pm 45^\circ)$ [44]. The β-strand length $m$ is then defined to be the number of successive residues that are in the β-strand state. We consider a conformation as β-stranded if it has a segment with β-strand length $m \ge 3$.
In Table 1 we summarize the α-helix formation in the 20 Monte-Carlo simulated annealing runs [44]. The results are for Definition II of the α-helix state.
We see that (Met)$_{10}$, (Ala)$_{10}$, and (Leu)$_{10}$ gave many helical conformations: 15, 9, and 9 (out of 20), respectively. In particular, (Met)$_{10}$ and (Ala)$_{10}$ produced long helices, some conformations being almost entirely helical ($\ell \ge 8$). On the other hand, (Val)$_{10}$, (Ile)$_{10}$, and (Gly)$_{10}$ gave few helical conformations: 2, 2, and 1 (out of 20), respectively. We obtained not only a smaller number of helices but also shorter helices for these homo-oligomers than the above three homo-oligomers. Finally, the results for (Phe)$_{10}$ indicate that Phe has intermediate helix-forming tendency between these two groups. We thus have the following rank order of helix-forming tendency for the seven amino acids [44]:
$$\text{Met} > \text{Ala} > \text{Leu} > \text{Phe} > \text{Val} > \text{Ile} > \text{Gly}. \qquad (10)$$
This can be compared with the experimentally determined helix propensities [8], [6]. Our rank order (10) is in good agreement with the experimental data.
We then analyzed the relation between helix-forming tendency and energy. We found that the difference $\Delta E = E_{NH} - E_H$ between minimum energies for nonhelical (NH) and helical (H) conformations is large for homo-oligomers with high helix-forming tendency (9.7, 10.2, 21.5 kcal/mol for (Met)$_{10}$, (Ala)$_{10}$, (Leu)$_{10}$, respectively) and small for those with low helix-forming tendency (0.5, 1.6, $-3.2$ kcal/mol for (Val)$_{10}$, (Ile)$_{10}$, (Gly)$_{10}$, respectively). Moreover, we found that the large $\Delta E$ for the former homo-oligomers is caused by the Lennard-Jones term $\Delta E_{LJ}$ (13.3, 8.0, 17.5 kcal/mol for (Met)$_{10}$, (Ala)$_{10}$, (Leu)$_{10}$, respectively). Hence, we conjecture that the differences in helix-forming tendencies are determined by the following factors [44]. A helical conformation is energetically favored in general because of the Lennard-Jones term $E_{LJ}$. For amino acids with low helix-forming tendency except for Gly, however, the steric hindrance of side chains raises $E_{LJ}$ of helical conformations so that the difference $\Delta E_{LJ}$ between nonhelical and helical
conformations is reduced significantly. The small $\Delta E_{LJ}$ for these amino acids can be easily overcome by the entropic effects and their helix-forming tendencies are small. Note that such amino acids (Val and Ile here) have two large side-chain branches at C$^{\beta}$, while the helix forming amino acids such as Met and Leu have only one branch at C$^{\beta}$ and Ala has a small side chain.
We now study the β-strand forming tendencies of these seven homo-oligomers. In Table 2 we summarize the β-strand formation in 20 Monte-Carlo simulated annealing runs [44].
The implications of the results are not as obvious as in the α-helix case. This is presumably because a short, isolated β-strand is not very stable by itself, since hydrogen bonds between β-strands are needed to stabilize them. However, we can still give a rough estimate for the rank order of strand-forming tendency for the seven amino acids [44]:
$$\text{Val} > \text{Ile} > \text{Phe} > \text{Leu} > \text{Ala} > \text{Met} > \text{Gly}. \qquad (11)$$
Here, we considered Val as more strand-forming than Ile, since the longer the strand segment is, the harder it is to form by simulation. Our rank order (11) is again in good agreement with the experimental data [8].
By comparing (11) with (10), we find that the helix-forming group is the strand-breaking group and vice versa, except for Gly. Gly is both helix and strand breaking. This reflects the fact that Gly, having no side chain, has a much larger (backbone) conformational space than other amino acids.
The helix-coil transitions of homo-oligomer systems were further analyzed by multicanonical algorithms [3] in [48], [47]. The obtained results gave quantitative support to those by Monte-Carlo simulated annealing described above [44].
We have so far studied peptides with nonpolar amino acids each of which is electrically neutral as a whole. We now discuss the helix-forming tendencies of peptides with polar amino acids where side chains are charged by protonation or deprotonation. One example is the C-peptide, residues 1-13 of ribonuclease A. It is known from the X-ray diffraction data of the whole enzyme that the segment from Ala-4 to Gln-11 exhibits a nearly 3-turn α-helix [64], [58]. It was also found by CD [56] and NMR [53] experiments that the isolated C-peptide also has significant α-helix formation in aqueous solution at temperatures near 0°C.
Furthermore, the CD experiment of the isolated C-peptide showed that the side-chain charges of residues Glu-2$^-$ and His-12$^+$ enhance the stability of the α-helix, while the rest of the charges of other side chains do not [56]. The NMR experiment [53] of the isolated C-peptide further observed the formation of the characteristic salt bridge between Glu-2$^-$ and Arg-10$^+$ that exists in the native structure determined by the X-ray experiments of the whole protein [64], [58].
In order to test whether our simulations can reproduce these experimental results, we made 20 Monte-Carlo simulated annealing runs of 10,000 MC sweeps with several C-peptide analogues [23], [46]. The amino-acid sequences of four of the analogues are listed in Table 3.
Table 2: β-Strand formation in homo-oligomers from 20 Monte-Carlo simulated annealing runs.
chains is the key factor for the difference in helix-forming tendency. When some of the side chains are charged, however, these charges play an important role in the helix stability in addition to the above factor: Some charges enhance helix stability, while others reduce it.
Fig. 4: The lowest-energy conformations of Peptide II (a) and Peptide III (b) of C-peptide analogues obtained from 20 Monte-Carlo simulated annealing runs in gas phase, and the corresponding X-ray structure (c).
We have up to now discussed α-helix formations in our simulations of oligopeptide systems. We have also studied β-sheet formations by Monte-Carlo simulated annealing [38], [39], [51]. The peptide that we studied is the fragment corresponding to residues 16-36 of bovine pancreatic trypsin inhibitor (BPTI) and has the amino-acid sequence: Ala$^{16}$-Arg$^+$-Ile-Ile-Arg$^+$-Tyr-Phe-Tyr-Asn-Ala-Lys$^+$-Ala-Gly-Leu-Cys-Gln-Thr-Phe-Val-Tyr-Gly$^{36}$. An antiparallel β-sheet structure in residues 18-35 is observed in X-ray crystallographic data of the whole protein [10].
We first performed 20 Monte-Carlo simulated annealing runs of 10,000 MC sweeps in gas phase ($\epsilon = 2$) with the same protocol as in the previous simulations [38]. Namely, the temperature was decreased exponentially from 1000 K to 250 K for each run, and all the simulations were started from random conformations. The difference of the present simulation and the previous ones comes only from that of the amino-acid sequences.
The most notable feature of the obtained results is that α-helices, which were the dominant motif in previous simulations of C-peptide and other peptides, are absent in the present simulation. Most of the conformations obtained consist of stretched strands and a 'turn' which connects them. The lowest-energy structure indeed exhibits an antiparallel β-sheet [38].
We next made 10 Monte-Carlo simulated annealing runs of 100,000 MC sweeps for BPTI(16-36) with two dielectric functions: $\epsilon = 2$ and the sigmoidal, distance-dependent dielectric function of (2) [39]. The results with $\epsilon = 2$ reproduced our previous results: Most of the obtained conformations have β-strand structures and no extended α-helix is observed. Those with the sigmoidal dielectric function, on the other hand, indicated formation of α-helices. One of the low-energy conformations, for instance, exhibited about a four-turn α-helix from Ala-16 to Gly-28 [39]. This presents an example in which a peptide with the same amino-acid sequence can form both α-helix and β-sheet structures, depending on its electrostatic environment.
Fig. 5: The structure of BPTI(16-36) deduced from X-ray experiments (a) and the lowest-energy conformation of BPTI(16-36) obtained from 20 Monte-Carlo simulated annealing runs in aqueous solution represented by solvent-accessible surface area (b).
NMR experiments suggest that this peptide actually forms a β-sheet structure [40]. The representation of solvent by the sigmoidal dielectric function (which gave α-helices instead) is therefore not sufficient. Hence, the same peptide fragment, BPTI(16-36), was further studied in aqueous solution that is represented by solvent-accessible surface area of (3) by Monte-Carlo simulated annealing [51]. Twenty simulation runs of 100,000 MC sweeps were made. It was indeed found that the lowest-energy structure obtained has a β-sheet structure (actually, type II' β-turn) at the very location suggested by the NMR experiments [40]. This structure and that deduced from the X-ray experiments [10] are compared in Fig. 5. The figures were created with Molscript [29] and Raster3D [2], [35].
Although both conformations are β-sheet structures, there are important differences between the two: The positions and types of the turns are different. Since the X-ray structure is taken from the experiments on the whole BPTI molecule, it does not have to agree with that of the isolated BPTI(16-36) fragment. It was found [51] that the simulated results in Fig. 5b) have remarkable agreement with those in the NMR experiments of the isolated fragment [40].
We have so far dealt with peptides with a small number of amino acids (up to 21) with simple secondary structural elements: a single α-helix or β-sheet. The native proteins usually have more than one secondary structural element. We now discuss our attempts at the first-principles tertiary structure predictions of larger and more complicated systems.
The first example is the fragment corresponding to residues 1-34 of human parathyroid hormone (PTH). An NMR experiment of PTH(1-34) suggested the existence of two α-helices around residues from Ser-3 to His-9 and from Ser-17 to Leu-28 [28]. Another NMR experiment of a slightly longer fragment, PTH(1-37), in aqueous solution also suggested the existence of the two helices [32]. One of the determined structures, for instance, has α-helices in residues from Gln-6 to His-9 and from Ser-17 to Lys-27 [32].
For PTH(1-34) we performed 20 Monte-Carlo simulated annealing runs of 10,000 MC sweeps in gas phase ($\epsilon = 2$) with the same protocol as
in the previous simulations [50]. Many conformations among the 20 final conformations obtained exhibited α-helix structures (especially in the N-terminus area). In Fig. 6 we show the lowest-energy conformation of PTH(1-34) [50].

This conformation indeed has two α-helices around residues from Val-2 to Asn-10 (Helix 1) and from Met-18 to Glu-22 (Helix 2), which are precisely the same locations as suggested by experiment [28], although Helix 2 is somewhat shorter (5 residues long) than the corresponding one (12 residues long) in the experimental data.

Fig. 6: Lowest-energy conformation of PTH(1-34) obtained from 20 Monte-Carlo simulated annealing runs in gas phase.

Fig. 7: A structure of PTH(1-37) deduced from NMR experiments (a) and the lowest-energy conformation of PTH(1-37) obtained from 10 Monte-Carlo simulated annealing runs in aqueous solution represented by solvent-accessible surface area (b).

A slightly larger peptide fragment, PTH(1-37), was also studied by Monte-Carlo simulated annealing [34] to compare with the results of the recent NMR experiment in aqueous solution [32]. Ten simulation runs of 100,000 MC sweeps were made in gas phase (ε = 2) and in aqueous solution that is represented by the terms proportional to the solvent-accessible surface area (see (3)). Although the results are preliminary, the simulations in gas phase did not produce two helices this time, in contrast to the previous work [50], where a short second helix was observed, as discussed in the previous paragraph. The lowest-energy conformation has an α-helix from Val-2 to Asn-10. The simulations in aqueous solution, on the other hand, did observe the two α-helices. The lowest-energy conformation obtained has α-helices from Gln-6 to His-9 and from Gly-12 to Glu-22. Note that the second helix is now more extended than the first one, in agreement with experiments. This structure together with one of the NMR structures [32] is shown in Fig. 7. The figures were again created with Molscript [29] and Raster3D [2], [35].

Generalized-ensemble simulations of PTH(1-37) are now in progress in order to obtain more quantitative information such as average helicity as a function of residue number, etc.

The second example of a more complicated system is the immunoglobulin-binding domain of streptococcal protein G. This protein is composed of 56 amino acids and the structure determined by an NMR experiment [14] and an X-ray diffraction experiment [1] has an α-helix and a β-sheet. The α-helix extends from residue Ala-23 to residue Asp-36. The β-sheet is made of four β-strands: from Met-1 to Gly-9, from Leu-12 to Ala-20, from Glu-42 to Asp-46, and from Lys-50 to Glu-56. This structure is shown in Fig. 8a). The figures in Fig. 8 were again created with Molscript [29] and Raster3D [2], [35].

We have performed eight Monte-Carlo simulated annealing runs of 50,000 to 400,000 MC sweeps with the sigmoidal, distance-dependent dielectric function of (2). The lowest-energy conformation so far obtained has four α-helices and no β-sheet, in disagreement with the X-ray structure. This structure is shown in Fig. 8b).

The disagreement of the lowest-energy structure (Fig. 8b)) so far obtained with the X-ray structure
(Fig. 8a)) is presumably caused by the poor representation of the solvent effects. As can be seen in Fig. 8a), the X-ray structure has both an interior, where a well-defined hydrophobic core is formed, and an exterior, where it is exposed to the solvent. The distance-dependent dielectric function, which mimics the solvent effects only in electrostatic interactions, is therefore not sufficient to represent the effects of the solvent here.

Fig. 8: A structure of protein G deduced from an X-ray experiment (a) and the lowest-energy conformation of protein G obtained from Monte-Carlo simulated annealing runs with the distance-dependent dielectric function (b).

Conclusions. In this article we have reviewed theoretical aspects of the protein folding problem. Our strategy in tackling this problem consists of two elements: 1) inclusion of accurate solvent effects, and 2) development of powerful simulation algorithms that can avoid getting trapped in states of energy local minima.

We have shown the effectiveness of Monte-Carlo simulated annealing by showing that direct folding of α-helix and β-sheet structures from randomly-generated initial conformations is possible.

As for the solvent effects, we considered several methods: a distance-dependent dielectric function, a term proportional to solvent-accessible surface area, and the reference interaction site model (RISM). These methods vary in nature from crude but computationally inexpensive (distance-dependent dielectric function) to accurate but computationally demanding (RISM theory). In the present article, we have shown that the inclusion of some solvent effects is very important for a successful prediction of the tertiary structures of small peptides and proteins.

See also: Simulated annealing methods in protein folding; Packet annealing; Phase problem in X-ray crystallography: Shake and bake approach; Global optimization in protein folding; Multiple minima problem in protein folding: αBB global optimization approach; Adaptive simulated annealing and its application to protein folding; Genetic algorithms; Molecular structure determination: Convex global underestimation; Global optimization in Lennard-Jones and Morse clusters; Protein folding: Generalized-ensemble algorithms; Simulated annealing; Bayesian global optimization; Random search methods; Stochastic global optimization: Two-phase methods; Global optimization based on statistical models; Stochastic global optimization: Stopping rules; Genetic algorithms for protein structure prediction; Monte-Carlo simulations for stochastic optimization.

References
[1] ACHARI, A., HALE, S.P., HOWARD, A.J., CLORE, G.M., GRONENBORN, A.M., HARDMAN, K.D., AND WHITLOW, M.: '1.67-Å X-ray structure of the B2 immunoglobulin-binding domain of streptococcal protein G and comparison to the NMR structure of the B1 domain', Biochemistry 31 (1992), 10449-10457.
[2] BACON, D., AND ANDERSON, W.F.: 'A fast algorithm for rendering space-filling molecular pictures', J. Mol. Graphics 6 (1988), 219-220.
[3] BERG, B.A., AND NEUHAUS, W.: 'Multicanonical algorithms for first order phase transitions', Phys. Lett. B267 (1991), 249-253.
[4] BROOKS, III, C.L.: 'Simulations of protein folding and unfolding', Curr. Opin. Struct. Biol. 8 (1998), 222-226.
[5] BRUNGER, A.T.: 'Crystallographic refinement by simulated annealing: Application to a 2.8 Å resolution structure of aspartate aminotransferase', J. Mol. Biol. 203 (1988), 803-816.
[6] CHAKRABARTTY, A., KORTEMME, W., AND BALDWIN, R.L.: 'Helix propensities of the amino acids measured in alanine-based peptides without helix-stabilizing side-chain interactions', Protein Sci. 3 (1994), 843-852.
[7] CHANDLER, D., AND ANDERSEN, H.C.: 'Optimized cluster expansions for classical fluids. Theory of molecular liquids', J. Chem. Phys. 57 (1972), 1930-1937.
[8] CHOU, P.Y., AND FASMAN, G.D.: 'Prediction of protein conformation', Biochemistry 13 (1974), 222-245.
[9] DAGGETT, V., KOLLMAN, P.A., AND KUNTZ, I.D.: 'Molecular dynamics simulations of small peptides: dependence on dielectric model and pH', Biopolymers 31 (1991), 285-304.
[10] DEISENHOFER, J., AND STEIGEMANN, W.: 'Crystallographic refinement of the structure of bovine pancreatic trypsin inhibitor at 1.5 Å resolution', Acta Crystallogr. B31 (1975), 238-250.
[11] DILL, K.: 'The meaning of hydrophobicity', Science 250 (1990), 297-297.
[12] EPSTEIN, C.J., GOLDBERGER, R.F., AND ANFINSEN, C.B.: 'The genetic control of tertiary protein structure: studies with model systems', Cold Spring Harbor Symp. Quant. Biol. 28 (1963), 439-449.
[13] GRAHAM, W.H., CARTER, II, E.S., AND HICKS, R.P.: 'Conformational analysis of Met-enkephalin in both aqueous solution and in the presence of sodium dodecyl sulfate micelles using multidimensional NMR and molecular modeling', Biopolymers 32 (1992), 1755-1764.
[14] GRONENBORN, A.M., FILPULA, D.R., ESSIG, N.Z., ACHARI, A., WHITLOW, M., WINGFIELD, P.T., AND CLORE, G.M.: 'A novel, highly stable fold of the immunoglobulin binding domain of streptococcal protein G', Science 253 (1991), 657-661.
[15] HANSMANN, U.H.E., AND OKAMOTO, Y.: 'Prediction of peptide conformation by multicanonical algorithm: new approach to the multiple-minima problem', J. Comput. Chem. 14 (1993), 1333-1338.
[16] HANSMANN, U.H.E., AND OKAMOTO, Y.: 'Comparative study of multicanonical and simulated annealing algorithms in the protein folding problem', Phys. A 212 (1994), 415-437.
[17] HANSMANN, U.H.E., AND OKAMOTO, Y.: 'Sampling ground-state configurations of a peptide by multicanonical annealing', J. Phys. Soc. Japan 63 (1994), 3945-3949.
[18] HANSMANN, U.H.E., AND OKAMOTO, Y.: 'Tertiary structure prediction of C-peptide of ribonuclease A by multicanonical algorithm', J. Phys. Chem. B 102 (1998), 653-656.
[19] HANSMANN, U.H.E., AND OKAMOTO, Y.: 'Effects of side-chain charges on α-helix stability in C-peptide of ribonuclease A studied by multicanonical algorithm', J. Phys. Chem. B 103 (1999), 1595-1604.
[20] HINGERTY, B.E., RITCHIE, R.H., FERRELL, T., AND TURNER, J.E.: 'Dielectric effects in biopolymers: the theory of ionic saturation revisited', Biopolymers 24 (1985), 427-439.
[21] HIRATA, F., AND ROSSKY, P.J.: 'An extended RISM equation for molecular polar fluids', Chem. Phys. Lett. 83 (1981), 329-334.
[22] KAWAI, H., KIKUCHI, T., AND OKAMOTO, Y.: 'A prediction of tertiary structures of peptide by the Monte Carlo simulated annealing method', Protein Engin. 3 (1989), 85-94.
[23] KAWAI, H., OKAMOTO, Y., FUKUGITA, M., NAKAZAWA, T., AND KIKUCHI, T.: 'Prediction of α-helix folding of isolated C-peptide of ribonuclease A by Monte Carlo simulated annealing', Chem. Lett. (1991), 213-216.
[24] KINOSHITA, M., OKAMOTO, Y., AND HIRATA, F.: 'Calculation of hydration free energy for a solute with many atomic sites using the RISM theory: robust and efficient algorithm', J. Comput. Chem. 18 (1997), 1320-1326.
[25] KINOSHITA, M., OKAMOTO, Y., AND HIRATA, F.: 'Solvation structure and stability of peptides in aqueous solutions analyzed by the reference interaction site model theory', J. Chem. Phys. 107 (1997), 1586-1599.
[26] KINOSHITA, M., OKAMOTO, Y., AND HIRATA, F.: 'First-principle determination of peptide conformations in solvents: combination of Monte Carlo simulated annealing and RISM theory', J. Amer. Chem. Soc. 120 (1998), 1855-1863.
[27] KIRKPATRICK, S., GELATT, JR., C.D., AND VECCHI, M.P.: 'Optimization by simulated annealing', Science 220 (1983), 671-680.
[28] KLAUS, W., DIECKMANN, T., WRAY, V., SCHOMBURG, D., WINGENDER, E., AND MAYER, H.: 'Investigation of the solution structure of the human parathyroid hormone fragment (1-34) by 1H NMR spectroscopy, distance geometry, and molecular dynamics calculations', Biochemistry 30 (1991), 6936-6942.
[29] KRAULIS, P.J.: 'MOLSCRIPT: A program to produce both detailed and schematic plots of protein structures', J. Appl. Crystallogr. 24 (1991), 946-950.
[30] LEVINTHAL, C.: 'Are there pathways for protein folding?', J. Chem. Phys. 65 (1968), 44-45.
[31] LI, Z., AND SCHERAGA, H.A.: 'Monte Carlo-minimization approach to the multiple-minima problem in protein folding', Proc. Nat. Acad. Sci. USA 84 (1987), 6611-6615.
[32] MARX, U.T., AUSTERMANN, S., BAYER, P., ADERMANN, K., EJCHART, A., STICHT, H., WALTER, S., SCHMID, F.-X., JAENICKE, R., FORSSMANN, W.-G., AND ROSCH, P.: 'Structure of human parathyroid hormone 1-37 in solution', J. Biol. Chem. 270 (1995), 15194-15202.
[33] MASUYA, M., AND OKAMOTO, Y., in preparation.
[34] MASUYA, M., AND OKAMOTO, Y., in preparation.
[35] MERRITT, E.A., AND MURPHY, M.E.P.: 'Raster3D version 2.0. A program for photorealistic molecular graphics', Acta Crystallogr. D50 (1994), 869-873.
[36] METROPOLIS, N., ROSENBLUTH, A., ROSENBLUTH, M., TELLER, A., AND TELLER, E.: 'Equation of state calculations by fast computing machines', J. Chem. Phys. 21 (1953), 1087-1092.
[37] MOMANY, F.A., MCGUIRE, R.F., BURGESS, A.W.,
AND SCHERAGA, H.A.: 'Energy parameters in polypeptides. VII. Geometric parameters, partial atomic charges, nonbonded interactions, hydrogen bond interactions, and intrinsic torsional potentials for the naturally occurring amino acids', J. Phys. Chem. 79 (1975), 2361-2381.
[38] NAKAZAWA, T., KAWAI, H., OKAMOTO, Y., AND FUKUGITA, M.: 'β-sheet folding of bovine pancreatic trypsin inhibitor fragment (16-36) as predicted by Monte Carlo simulated annealing', Protein Engin. 5 (1992), 495-503.
[39] NAKAZAWA, T., AND OKAMOTO, Y.: 'Electrostatic effects on the α-helix and β-strand folding of BPTI(16-36) as predicted by Monte Carlo simulated annealing', J. Peptide Res. 54 (1999), 230-236.
[40] NAKAZAWA, T., OKAMOTO, Y., KOBAYASHI, Y., KYOGOKU, Y., AND AIMOTO, S., in preparation.
[41] NÉMETHY, G., POTTLE, M.S., AND SCHERAGA, H.A.: 'Energy parameters in polypeptides. 9. Updating of geometrical parameters, nonbonded interactions, and hydrogen bond interactions for the naturally occurring amino acids', J. Phys. Chem. 87 (1983), 1883-1887.
[42] NILGES, M., CLORE, G.M., AND GRONENBORN, A.M.: 'Determination of three-dimensional structures of proteins from interproton distance data by hybrid distance geometry-dynamical simulated annealing calculations', FEBS Lett. 229 (1988), 317-324.
[43] OKAMOTO, Y.: 'Dependence on the dielectric model and pH in a synthetic helical peptide studied by Monte Carlo simulated annealing', Biopolymers 34 (1994), 529-539.
[44] OKAMOTO, Y.: 'Helix-forming tendencies of nonpolar amino acids predicted by Monte Carlo simulated annealing', PROTEINS: Struct. Funct. Genet. 19 (1994), 14-23.
[45] OKAMOTO, Y.: 'Protein folding problem as studied by new simulation algorithms', Recent Res. Developm. Pure Appl. Chem. 2 (1998), 1-23.
[46] OKAMOTO, Y., FUKUGITA, M., NAKAZAWA, T., AND KAWAI, H.: 'α-helix folding by Monte Carlo simulated annealing in isolated C-peptide of ribonuclease A', Protein Engin. 4 (1991), 639-647.
[47] OKAMOTO, Y., AND HANSMANN, U.H.E.: 'Thermodynamics of helix-coil transitions studied by multicanonical algorithms', J. Phys. Chem. 99 (1995), 11276-11287.
[48] OKAMOTO, Y., HANSMANN, U.H.E., AND NAKAZAWA, T.: 'α-Helix propensities of amino acids studied by multicanonical algorithm', Chem. Lett. (1995), 391-392.
[49] OKAMOTO, Y., KIKUCHI, T., AND KAWAI, H.: 'Prediction of low-energy structures of Met-enkephalin by Monte Carlo simulated annealing', Chem. Lett. (1992), 1275-1278.
[50] OKAMOTO, Y., KIKUCHI, T., NAKAZAWA, T., AND KAWAI, H.: 'α-Helix structure of parathyroid hormone fragment (1-34) predicted by Monte Carlo simulated annealing', Internat. J. Peptide Protein Res. 42 (1993), 300-303.
[51] OKAMOTO, Y., MASUYA, M., NABESHIMA, M., AND NAKAZAWA, T.: 'β-Sheet formation in BPTI(16-36) by Monte Carlo simulated annealing', Chem. Phys. Lett. 299 (1999), 17-24.
[52] OOI, T., OOBATAKE, M., NÉMETHY, G., AND SCHERAGA, H.A.: 'Accessible surface areas as a measure of the thermodynamic parameters of hydration of peptides', Proc. Nat. Acad. Sci. USA 84 (1987), 3086-3090.
[53] OSTERHOUT, J.J., BALDWIN, R.L., YORK, E.J., STEWART, J.M., DYSON, H.J., AND WRIGHT, P.E.: '1H NMR studies of the solution conformations of an analogue of the C-peptide of ribonuclease A', Biochemistry 28 (1989), 7059-7064.
[54] RAMSTEIN, J., AND LAVERY, R.: 'Energetic coupling between DNA bending and base pair opening', Proc. Nat. Acad. Sci. USA 85 (1988), 7231-7235.
[55] SAYLE, R.A., AND MILNER-WHITE, E.J.: 'RasMol: biomolecular graphics for all', TIBS 20 (1995), 374-376.
[56] SHOEMAKER, K.R., KIM, P.S., BREMS, D.N., MARQUSEE, S., YORK, E.J., CHAIKEN, I.M., STEWART, J.M., AND BALDWIN, R.L.: 'Nature of the charged-group effect on the stability of the C-peptide helix', Proc. Nat. Acad. Sci. USA 82 (1985), 2349-2353.
[57] SIPPL, M.J., NÉMETHY, G., AND SCHERAGA, H.A.: 'Intermolecular potentials from crystal data. 6. Determination of empirical potentials for O-H...O=C hydrogen bonds from packing configurations', J. Phys. Chem. 88 (1984), 6231-6233.
[58] TILTON, JR., R.F., DEWAN, J.C., AND PETSKO, G.A.: 'Effects of temperature on protein structure and dynamics: X-ray crystallographic studies of the protein ribonuclease-A at nine different temperatures from 98 to 320 K', Biochemistry 31 (1992), 2469-2481.
[59] WESSON, L., AND EISENBERG, D.: 'Atomic solvation parameters applied to molecular dynamics of proteins in solution', Protein Sci. 1 (1992), 227-235.
[60] WETLAUFER, D.B.: 'Nucleation, rapid folding, and globular intrachain regions in proteins', Proc. Nat. Acad. Sci. USA 70 (1973), 697-701.
[61] WILSON, C., AND DONIACH, S.: 'A computer model to dynamically simulate protein folding: studies with crambin', PROTEINS: Struct. Funct. Genet. 6 (1989), 193-209.
[62] WILSON, S.R., AND CUI, W.: 'Conformation searching using simulated annealing': The Protein Folding Problem and Tertiary Structure Prediction, Lecture Notes, Birkhäuser, 1994, pp. 43-70.
[63] WILSON, S.R., CUI, W., MOSKOWITZ, J.W., AND SCHMIDT, K.E.: 'Conformational analysis of flexible molecules: location of the global minimum energy conformation by the simulated annealing method', Tetrahedron Lett. 29 (1988), 4373-4376.
[64] WYCKOFF, H.W., TSERNOGLOU, D., HANSON, A.W., KNOX, J.R., LEE, B., AND RICHARDS, F.M.: 'The three-dimensional structure of ribonuclease-S', J. Biol. Chem. 245 (1970), 305-328.

Yuko Okamoto
Dept. Theoret. Stud. Inst. Molecular Sci.
and
Dept. Functional Molecular Sci.
Graduate Univ. Adv. Stud.
Okazaki, Aichi 444-8585, Japan
E-mail address: okamotoy@ims.ac.jp

MSC 2000: 92C40
Key words and phrases: simulated annealing, protein folding, tertiary structure prediction, α-helix, β-sheet.

MONTE-CARLO SIMULATIONS FOR STOCHASTIC OPTIMIZATION

Many important real-world problems contain stochastic elements and require optimization. Stochastic programming and simulation-based optimization are two approaches used to address this issue. We do not explicitly discuss other related areas including stochastic control, stochastic dynamic programming, and Markov decision processes. We consider a stochastic optimization problem of the form

\[
z^* = \min_{x \in X} E f(x, \xi), \tag{SP}
\]

where $x$ is a vector of decision variables with deterministic feasible region $X \subset \mathbb{R}^d$, $\xi$ is a random vector, and $f$ is a real-valued function with finite expectation, $Ef(x,\xi)$, for all $x \in X$. We use $x^*$ to denote an optimal solution to (SP). Note that the decision $x$ must be made prior to observing the realization of $\xi$.

A wide variety of types of problems can be expressed as (SP) depending on the definitions of $f$ and $X$. Two of the most commonly-used approaches are rooted in mathematical programming and in discrete-event simulation modeling.

In a two-stage stochastic linear program with recourse [6], [14], $X$ is a polyhedral set and $f$ is defined as the optimal value of a linear program, given $x$ and $\xi$, i.e.,

\[
\begin{aligned}
f(x,\xi) = cx + \min_{y \ge 0}\ & qy \\
\text{s.t.}\ & Wy = Tx + h.
\end{aligned} \tag{1}
\]

Here, $\xi$ is the vector of random elements from $h$, $q$, $T$, and $W$. A prototypical problem of this nature is a capacity allocation model under uncertain demand and/or capacity availabilities; $x$ is a strategic decision allocating resources while $y$ represents an operational recourse decision that is made after observing the demand and availabilities. Example applications of this type include capacity expansion planning in an electric power system [16] and in a telecommunications network [61]. The two-stage model generalizes to a more dynamic, multistage model (see, e.g., [10]) in which decisions are made, and random events unfold, over time. For multistage applications in asset-liability management see [13] and in hydro-electric scheduling see [39].

In the context of a simulation model, $f(x,\xi)$ could represent a performance measure under a design specified by $x$. For example, $f(x,\xi)$ might represent the number of hours in a workday that a critical machine is blocked in a queueing network model of a manufacturing system in which buffer sizes are determined by $x$. In another application, E.L. Plambeck et al. [53] allocate constrained processing rates to unreliable machines with buffers in a fluid serial queueing network in order to maximize steady-state throughput. In nonterminating simulations, the expectation in $Ef(x,\xi)$ is typically with respect to a steady-state distribution.

Note that $Ef(x,\xi)$ can capture objectives not usually thought of as a 'mean'. For example, if $c$ represents random rates of return and $x$ investment amounts, we might want to maximize the probability of exceeding a return threshold, $T$. We can write $P(cx > T) = E\,I(cx > T)$ where $I(\cdot)$ is the indicator function that takes value one if its argument is true and zero otherwise. For more on probability maximization models (and generalizations of (SP) in which $X$ contains probabilistic constraints) see [54]. See [45] for a discussion of risk modeling in stochastic optimization.

A more general model than (SP) allows the distribution of $\xi$ to depend on $x$. Some simple types of dependencies can effectively be captured in (SP) via modeling tricks, such as the $x$ scaling random elements of $T$ in (1). General dependencies, however, are difficult to handle. For work on decision-dependent distributions when there are a finite
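As a small numerical illustration of the identity $P(cx > T) = E\,I(cx > T)$ noted above, the following Python sketch estimates the probability by a sample mean of indicator values. The return model (two independent, normally distributed rates of return), the investment amounts, the threshold and the function names are assumptions made for this example only; they do not come from the references cited in this article.

```python
import random


def estimate_exceedance_probability(x, sample_returns, threshold, n, seed=0):
    """Sample-mean estimate of P(cx > T) = E I(cx > T) from n independent draws of c."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(n):
        c = sample_returns(rng)  # one observation of the random return vector c
        total_return = sum(ci * xi for ci, xi in zip(c, x))
        hits += 1 if total_return > threshold else 0  # indicator I(cx > T)
    return hits / n


def sample_returns(rng):
    # Hypothetical return model: two independent, normally distributed rates of return.
    return [rng.gauss(0.05, 0.10), rng.gauss(0.08, 0.20)]


x = [0.5, 0.5]  # hypothetical investment amounts
p_hat = estimate_exceedance_probability(x, sample_returns, threshold=0.0, n=20000)
print(p_hat)
```

Because the objective is written as an expectation, the same sample-mean machinery used for any other $Ef(x,\xi)$ applies unchanged; for this particular normal model the probability can also be computed in closed form, which gives an independent check on the estimate.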
and review technique) problem with 70 nodes and 110 stochastic arcs. The arcs model the times required to complete activities and a decision variable associated with each arc influences (parameterizes) the distribution of the random activity duration. These problems contain objectives with high-dimensional expectations and all were solved using Monte-Carlo methods.

In this article we discuss:

i) several types of Monte-Carlo-based solution procedures that can be used for solving (SP);
ii) methods for testing the quality of a candidate solution $\hat{x} \in X$;
iii) variance reduction techniques used in stochastic optimization; and
iv) theoretical justification for using sampling.

\[
z_n^* = \min_{x \in X} \frac{1}{n} \sum_{i=1}^{n} f(x, \xi^i). \tag{SP$_n$}
\]

Even when it is possible to construct (SP$_n$) using i.i.d. variates, it may be preferable to use another sampling scheme in order to reduce the variance of the resulting estimators. Moreover, in nonterminating simulation models, generating i.i.d. replicates from a stationary distribution is often impossible (for exceptions see recent work on exact sampling, e.g., [3], [22]), but under appropriate conditions we may run the simulation for a length $n$ and replace the objective function in (SP$_n$) with a consistent estimate of the desired long-run average performance measure.

After constructing an instance of (SP$_n$) we employ a (deterministic) optimization algorithm to obtain a solution $x_n^*$. In the case of stochastic lin-
ear programming, (SP$_n$) is a large scale linear program. The cutting plane algorithm of R.M. Van Slyke and R.J-B. Wets [64], its variant with a quadratic proximal term [58], and its multistage version [7], [9] are powerful tools for solving such problems. A cutting plane algorithm with a proximal term and IPA-based gradients is used in an external sampling method for solving the queueing network problem in [53]. See [8] for a recent survey of computational methods for stochastic programming instances of (SP$_n$).

Intuitively, we might expect solutions of (SP$_n$) to more accurately approximate solutions of (SP) as $n$ increases. We discuss results supporting this in the section 'Theoretical Justification for Sampling'. In addition, after having solved (SP$_n$) to obtain $x_n^*$ it would be desirable to know whether $n$ was 'large enough'. More generally, we would like to be able to test the quality of a candidate solution (such as $x_n^*$). This is discussed in the next section.

We now turn to solution procedures based on internal sampling. These algorithms adapt deterministic optimization algorithms by replacing exact function and gradient evaluations with Monte-Carlo estimates. The sampling is internal because new observations of $\xi$ are generated on an as-needed basis at each iteration of the algorithm. We briefly discuss stochastic adaptations of steepest descent and cutting plane methods.

A deterministic steepest descent algorithm for (SP) forms iterates $\{x^\ell\}$ using the recursion

\[
x^{\ell+1} = \Pi_X \left[ x^\ell - \rho^\ell \nabla E f(x^\ell, \xi) \right].
\]

$\Pi_X$ performs a projection onto $X$ and $\{\rho^\ell\}$ are steplengths. It is usually impossible to calculate $\nabla Ef(x,\xi)$ exactly and it must be estimated. Stochastic approximation (SA) and stochastic quasi-gradient (SQG) algorithms are stochastic variants of a steepest descent search. The Kiefer-Wolfowitz SA method uses unbiased estimates of $Ef(x,\xi)$ to form finite-difference approximations of the gradient. The Robbins-Monro SA procedure requires unbiased estimates of $\nabla Ef(x,\xi)$. SQG methods do not require that $Ef(x,\xi)$ be differentiable and work under more general assumptions concerning the estimates of (sub)gradients of $Ef(x,\xi)$. In particular, the estimates need not be unbiased but the bias must effectively shrink to zero as the algorithm proceeds. For convergence properties of SA methods see [49] and for SQG procedures see [23].

Cutting plane methods are applicable when $Ef(x,\xi)$ is convex. The iterates $\{x^\ell\}$ are found by solving a sequence of optimization problems of the form

\[
\min_{x \in X} \max_{\ell = 1, \dots, L} \; E f(x^\ell, \xi) + \nabla E f(x^\ell, \xi)(x - x^\ell),
\]

where $L$ grows as the algorithm proceeds. At each iteration a first order Taylor approximation of $Ef(x,\xi)$, i.e., a cutting plane, is computed at the current iterate $x^\ell$ and is used to refine the piecewise-linear outer approximation of $Ef(x,\xi)$. The key idea is that this approximation need only be accurate in the neighborhood of an optimal solution. For stochastic linear programs, G.B. Dantzig, P.W. Glynn [15], and G. Infanger [37], [38] and J.L. Higle and S. Sen [32], [34] have developed Monte-Carlo-based cutting plane methods by using statistical estimates for the cut intercepts and gradients. Dantzig, Glynn, and Infanger use separate streams of observations of $\xi$ to estimate each cut. The stochastic decomposition algorithm of Higle and Sen uses common random number streams to calculate each cut and employs an updating procedure to ensure that the statistical cuts are asymptotically valid (i.e., lie below $Ef(x,\xi)$). Relative to SA and SQG methods, cutting plane procedures avoid potentially difficult projections and, in practice, have a reputation for converging more quickly, particularly when $X$ is high dimensional.

Grid search and optimization of metamodels are two common approaches to optimizing system performance in discrete-event simulation models. In grid search, $X$ is replaced by a 'grid' of points $X_m = \{x^1, \dots, x^m\}$ and sample-mean estimates

\[
\bar{f}_n(x) = \frac{1}{n} \sum_{i=1}^{n} f(x, \xi^i)
\]

are formed at each $x \in X_m$. (SP) is then approximately solved by $z_n^* = \min_{x \in X_m} \bar{f}_n(x)$ with $x_n^*$ being the associated minimizer. Grid search is attractive because it requires minimal structure, but in implementing this procedure, we must exercise care in selecting $m$ and $n$. With independent sam-
pling at each grid point, K.B. Ensor and Glynn [21] consider the rate at which $n$ must grow relative to $m$ in order to achieve consistency and they also discuss the method's limiting behavior when the rate of growth is at (and slower than) the critical rate.

A metamodel can be used to approximate a more complex simulation model which, in turn, is an approximation of the real system. In such a metamodel, estimates of $Ef(x,\xi)$ are formed at each point in a set specified by an experimental design, and the parameters of the postulated response surface are fit to these observed values. The resulting function is then optimized with respect to $x$. For more on metamodels see, e.g., [11], [47]. The review in [25] includes optimization using response surfaces, and metamodeling has also been applied in stochastic programming [5].

The grid-search and metamodel approaches are classified as external sampling procedures if the procedure is executed once. However, it may be desirable to refine the grid (or the region covered by the experimental design) in the neighborhood of promising values of $x$ and repeat the methodology. When it is adaptively repeated in this fashion the procedure is classified as an internal sampling method.

We have not explicitly discussed approaches for when $X$ is discrete. These range from methods for selecting the best design in simulation to those for solving stochastic integer programming models. Finally, sampling-based procedures for multistage stochastic programs have been proposed in [17].

Establishing Solution Quality. Establishing solution quality is a key concept when using an approximation scheme to solve an optimization problem. When applying Monte-Carlo techniques to (SP), the best we can expect are probabilistic quality statements. In the context of external sampling, there has been significant work on studying the behavior of solutions to (SP$_n$) for large sample sizes (see the last section). There are analogous convergence results for algorithms based on internal sampling. Such results take a number of forms but perhaps the most fundamental is to show that limit points of the sequence of solutions are, say, almost surely optimal to (SP). Next, it is desirable to have a statement regarding the rate of convergence and an associated asymptotic distribution. These consistency and limiting distribution results are aimed at justifying sampling-based methods and may be viewed as establishing solution quality. However, the approach discussed in this section centers on the question: Given a candidate solution $\hat{x} \in X$, what can be said regarding its quality? Because candidate solutions may be obtained by internal or external sampling schemes or via another, heuristic, method, procedures that can directly test the quality of $\hat{x}$, regardless of its origin, are very attractive.

One natural way of defining solution quality is by the optimality gap, $Ef(\hat{x},\xi) - z^*$. An optimal solution has an optimality gap of zero, but in our setting we hope to make probabilistic statements such as

\[
P\{ E f(\hat{x}, \xi) - z^* \le \epsilon \} \ge \alpha, \tag{2}
\]

where $\epsilon$ is a random confidence interval width and $\alpha$ is a confidence level, e.g., $\alpha = 0.95$. Unfortunately, exact confidence intervals such as (2) can be difficult to obtain even in relatively simple statistical settings so we attempt to construct approximate confidence intervals

\[
P\{ E f(\hat{x}, \xi) - z^* \le \epsilon \} \approx \alpha. \tag{3}
\]

To form a confidence interval (3) for $Ef(\hat{x},\xi) - z^*$ we estimate the mean of a gap random variable $G_n = U_n - L_n$ that is expressed as the difference between upper and lower bound estimators and satisfies $EG_n \ge Ef(\hat{x},\xi) - z^*$.

In many problems it is relatively straightforward to estimate the performance of a suboptimal decision $\hat{x}$ via simulation. For example, the standard sample mean estimator, $\bar{U}_n = \frac{1}{n} \sum_{i=1}^{n} f(\hat{x}, \xi^i)$, provides an unbiased estimate of the expected cost of using decision $\hat{x}$, i.e., $Ef(\hat{x},\xi)$. To construct a confidence interval for the optimality gap we also want an estimate of $z^*$. However, unbiased estimates of $z^*$ are difficult to obtain so an estimator $L_n$ that satisfies $EL_n \le z^*$ is used. In [51] it is shown that if the objective in (SP$_n$) is an unbiased estimate of $Ef(x,\xi)$ then $Ez_n^* \le z^*$, i.e., $z_n^*$ is one possible lower bound estimator $L_n$. Higle and Sen [33] perform a Lagrangian
relaxation of a reformulation of (SP$_n$) which uses explicit 'nonanticipativity' constraints. The resulting lower bound is weaker in expectation than $z_n^*$ but has the computational advantage that the optimization problem separates by scenario.

Once observations of $G_n$ can be formed, we can appeal to the batch means method and use the central limit theorem [51], or a nonparametric approach [31], [33], to construct approximate confidence intervals (3). Another approach to examining solution quality is to test the null hypothesis that the (generalized) Karush-Kuhn-Tucker (KKT) optimality conditions are satisfied; see [63]. Higle and Sen [31] also consider the KKT conditions but use them to derive bounds on the optimality gap.

Variance Reduction Techniques. When applying the 'crude' Monte-Carlo method to estimate $E f(x,\tilde{\xi})$ for fixed $x$, we use the standard sample mean estimator based on i.i.d. terms,

\[ \frac{1}{n}\sum_{i=1}^{n} f(x,\tilde{\xi}^i). \]

The error associated with this estimate is proportional to

\[ \frac{[\operatorname{var} f(x,\tilde{\xi})]^{1/2}}{n^{1/2}}. \tag{4} \]

This error can be decreased by increasing the sample size. However, obtaining an additional digit of accuracy requires increasing the sample size by a factor of 100. If $f$ is defined as the optimal value of a mathematical program or as the performance measure of a simulation model, increasing the number of evaluations of $f$ in this fashion can be prohibitively expensive. Variance reduction techniques (VRTs) effectively decrease the numerator in (4) instead of increasing the denominator. Many problems for which crude Monte-Carlo would yield useless results are instead made computationally tractable via VRTs. As described in the section 'Solution Procedures', sampling is also used to estimate $\nabla E f(x,\tilde{\xi})$, but for simplicity we primarily restrict our attention to VRTs for estimating $E f(x,\tilde{\xi})$.

Some VRTs, including control variates (CVs) and importance sampling (IS), exploit special structures of $f(x,\tilde{\xi})$. Suppose that we have $\Gamma_x(\tilde{\xi})$, with known mean $\mu_\Gamma$, which is believed to approximate (be positively correlated with) $f(x,\tilde{\xi})$. In CVs we attempt to 'subtract out' variation by generating observations of $[f(x,\tilde{\xi}) - \Gamma_x(\tilde{\xi})] + \mu_\Gamma$, which has the same expectation as $f(x,\tilde{\xi})$. (It is common to incorporate a multiplicative factor with the control variate $\Gamma_x(\tilde{\xi})$ and also possible to use multiple controls.) In IS we attempt to reduce variance by generating observations of $\mu_\Gamma\,[f(x,\tilde{\xi})/\Gamma_x(\tilde{\xi})]$. In CVs observations of $\tilde{\xi}$ are generated from its original distribution. However, in IS the expected value of the ratio is not the ratio of expectations and, as a result, there is a change of measure induced by $\Gamma_x$ that is required to yield an unbiased estimate. Under the new IS distribution, we are more likely to sample $\tilde{\xi}$ where $\Gamma_x(\tilde{\xi})$ is large, i.e., scenarios that our approximation function predicts have high cost. In an IS scheme for stochastic linear programs, [15], [37] use an approximation function that is separable in the components of $\tilde{\xi}$ while [48] utilizes a piecewise-linear approximation. See [12] for the solution of a stochastic optimization problem to price American-style financial options using the simpler European option as a control variate. These papers report significant variance reduction in computational results.

Other VRTs exploit correlation structures in the solution methodology. Common random numbers (CRNs) are often used in simulation when comparing the performance of two systems. The use of CRNs has been suggested in a stochastic approximation method with finite differences where the same stream is used for the forward and backward point estimates [50]. The upper and lower bounds used to determine solution quality (see the previous section) may be viewed as two 'systems' and the use of CRNs in estimating their difference has been advocated in [34], [51]. In order to reduce the error in the resulting response surface, various methods have been proposed for generating the streams of observations of $\tilde{\xi}$ at each point in the experimental design. The Schruben-Margolin scheme [59] uses a mixture of CRNs and antithetic variates and an extension [65] also incorporates CVs.

Another group of VRTs attempts to more regularly spread the sampled observations over the
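
The control-variate construction described in this section can be illustrated with a minimal Python sketch. It compares the crude sample-mean estimator with observations of $[f - \Gamma] + \mu_\Gamma$. Everything specific here is an assumption made for the example and is not taken from the cited papers: the integrand $f(\xi) = e^{\xi}$ with $\xi$ uniform on $(0,1)$, the control $\Gamma(\xi) = 1 + \xi$ with known mean $1.5$, and the function names.

```python
import math
import random
import statistics

def f(xi):
    # Hypothetical cost function; its true mean under U(0,1) is e - 1.
    return math.exp(xi)

def control(xi):
    # Hypothetical control variate, positively correlated with f.
    return 1.0 + xi

CONTROL_MEAN = 1.5  # known mean of control(xi) when xi ~ U(0,1)

def crude_and_cv_estimates(n, seed=0):
    """Return (crude mean, crude variance, CV mean, CV variance).

    Both estimators use the same n observations of xi, drawn from the
    original distribution, as in the control-variate scheme.
    """
    rng = random.Random(seed)
    xis = [rng.random() for _ in range(n)]
    crude = [f(xi) for xi in xis]
    # 'Subtract out' variation: [f - control] + known control mean.
    cv = [f(xi) - control(xi) + CONTROL_MEAN for xi in xis]
    return (statistics.fmean(crude), statistics.variance(crude),
            statistics.fmean(cv), statistics.variance(cv))

if __name__ == "__main__":
    crude_mean, crude_var, cv_mean, cv_var = crude_and_cv_estimates(10_000)
    print("crude:", crude_mean, "per-observation variance:", crude_var)
    print("CV:   ", cv_mean, "per-observation variance:", cv_var)
```

Both estimators are unbiased for the same expectation; the per-observation variance of the control-variate terms is what enters the numerator of (4). The sketch omits the multiplicative factor on the control mentioned in the text.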
sumptions, asymptotic normality for $\sqrt{n}(z_n^* - z^*)$ and $\sqrt{n}(x_n^* - x^*)$ may be obtained, e.g., [19]. However, when inequality constraints in $X$ play a nontrivial role we cannot, in general, expect to obtain limiting distributions that are normal [44], [62], [18]. See [44] for a limiting distribution for $\sqrt{n}(x_n^* - x^*)$ that is the solution of a (random) quadratic program.

See also: Monte-Carlo simulated annealing in protein folding.

References
[1] AITCHISON, J., AND SILVEY, S.D.: 'Maximum-likelihood estimation of parameters subject to restraints', Ann. Math. Statist. 29 (1958), 813-828.
[2] ARTSTEIN, Z., AND WETS, R.J.-B.: 'Stability results for stochastic programs and sensors, allowing for discontinuous objective functions', SIAM J. Optim. 4 (1994), 537-550.
[3] ASMUSSEN, S., GLYNN, P.W., AND THORISSON, H.: 'Stationary detection in the initial transient problem', ACM Trans. Modeling and Computer Simulation 2 (1992), 130-157.
[4] ATTOUCH, H., AND WETS, R.J.-B.: 'Approximation and convergence in nonlinear optimization', in O. MANGASARIAN, R. MEYER, AND S. ROBINSON (eds.): Nonlinear Programming, Vol. 4, Acad. Press, 1981, pp. 367-394.
[5] BAILEY, T.G., JENSEN, P.A., AND MORTON, D.P.: 'Response surface analysis of two-stage stochastic linear programming with recourse', Naval Res. Logist. 46 (1999), 753-778.
[6] BEALE, E.M.L.: 'On minimizing a convex function subject to linear inequalities', J. Royal Statist. Soc. 17B (1955), 173-184.
[7] BIRGE, J.R.: 'Decomposition and partitioning methods for multistage stochastic linear programs', Oper. Res. 33 (1985), 989-1007.
[8] BIRGE, J.R.: 'Stochastic programming computation and applications', INFORMS J. Comput. 9 (1997), 111-133.
[9] BIRGE, J.R., DONOHUE, C.J., HOLMES, D.F., AND SVINTSITSKI, O.G.: 'A parallel implementation of the nested decomposition algorithm for multistage stochastic linear programs', Math. Program. 75 (1996), 327-352.
[10] BIRGE, J.R., AND LOUVEAUX, F.: Introduction to stochastic programming, Springer, 1997.
[11] BOX, G.E.P., AND DRAPER, N.R.: Empirical model-building and response surfaces, Wiley, 1987.
[12] BROADIE, M., AND GLASSERMAN, P.: 'Pricing American-style options using simulation', J. Econom. Dynam. Control 21 (1997), 1323-1352.
[13] CARINO, D.R., KENT, T., MEYERS, D.H., STACY, C., SYLVANUS, M., TURNER, A.L., WATANABE, K., AND ZIEMBA, W.T.: 'The Russell-Yasuda Kasai model: An asset/liability model for a Japanese insurance company using multistage stochastic programming', Interfaces 24 (1994), 29-49.
[14] DANTZIG, G.B.: 'Linear programming under uncertainty', Managem. Sci. 1 (1955), 197-206.
[15] DANTZIG, G.B., AND GLYNN, P.W.: 'Parallel processors for planning under uncertainty', Ann. Oper. Res. 22 (1990), 1-21.
[16] DANTZIG, G.B., GLYNN, P.W., AVRIEL, M., STONE, J.C., ENTRIKEN, R., AND NAKAYAMA, M.: 'Decomposition techniques for multi-area generation and transmission planning under uncertainty', Report Electric Power Res. Inst., no. EPRI 2940-1 (1989).
[17] DEMPSTER, M.A.H., AND THOMPSON, R.T.: 'EVPI-based importance sampling solution procedures for multistage stochastic linear programmes on parallel MIMD architectures', Ann. Oper. Res. 90 (1999), 161-184.
[18] DUPAČOVÁ, J.: 'On non-normal asymptotic behavior of optimal solutions for stochastic programming problems and on related problems of mathematical statistics', Kybernetika 27 (1991), 38-51.
[19] DUPAČOVÁ, J., AND WETS, R.J.-B.: 'Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems', Ann. Statist. 16 (1988), 1517-1549.
[20] EDIRISINGHE, C., AND ZIEMBA, W.T.: 'Implementing bounds-based approximations in convex-concave two-stage stochastic programming', Math. Program. 75 (1996), 295-325.
[21] ENSOR, K.B., AND GLYNN, P.W.: 'Stochastic optimization via grid search', in G.G. YIN AND Q. ZHANG (eds.): Mathematics of Stochastic Manufacturing Systems, Vol. 33 of Lect. Applied Math., Amer. Math. Soc., 1997, pp. 89-100.
[22] ENSOR, K.B., AND GLYNN, P.W.: 'Simulating the maximum of a random walk', J. Statist. Planning Inference 85 (2000), 127-135.
[23] ERMOLIEV, Y.: 'Stochastic quasigradient methods', in Y. ERMOLIEV AND R.J.-B. WETS (eds.): Numerical Techniques for Stochastic Optimization, Springer, 1988, pp. 141-185.
[24] FRAUENDORFER, K.: Stochastic two-stage programming, Vol. 392 of Lecture Notes Economics and Math. Systems, Springer, 1992.
[25] FU, M.C.: 'Optimization via simulation: A review', Ann. Oper. Res. 53 (1994), 199-248.
[26] FUTSCHIK, A., AND PFLUG, G.CH.: 'Optimal allocation of simulation experiments in discrete stochastic optimization and approximative algorithms', Europ. J. Oper. Res. 101 (1997), 245-260.
[27] GLASSERMAN, P.: Gradient estimation via perturbation analysis, Kluwer Acad. Publ., 1991.
[28] GLYNN, P.W.: 'Optimization of stochastic systems via simulation': Proc. 1989 Winter Simulation Conf., 1989,
Motzkin transposition theorem
being infeasible. By Principle A this happens if has no solutions. By the M T T this is the case if
and only if there exist nonnegative vectors y and and only if at least one of the systems
v and a nonnegative scalar A such that l y T A - - yoc -- O, yTb-- yoz > O,
(Wlz)
(yTA+vTB--~c) x>yTa+vTb--~z
t y>0, y0>0
This is the duality theorem for linear optimization. Note that one other case may occur, namely that both problems are infeasible. It became clear above that (P) is infeasible if and only if (Wlz) has a solution, so

    the primal problem (P) is infeasible if and only if there exists a dual ray $y$, i.e., a vector $y$ such that

    \[ y^T A = 0, \quad y^T b > 0, \quad y \ge 0. \tag{3} \]

In fact, the latter statement is equivalent to the statement that (3) and $Ax \ge b$ are alternative systems, which is the special case of the MTT occurring when $B$ is vacuous and which is known as Farkas' lemma. (See Linear optimization: Theorems of the alternative and Farkas lemma.) In just the same way it can be derived from a variant of Farkas' lemma that:

    the dual problem (D) is infeasible if and only if there exists a primal ray $x$, i.e., a vector $x$ such that

    \[ Ax \ge 0, \quad c^T x < 0. \tag{4} \]

It has been shown above that the MTT implies the duality theorem for linear optimization. The converse is also true: assuming the duality theorem for linear optimization, the MTT can easily be proved, showing that the two results are logically equivalent. This goes in two steps. Assuming the duality theorem for linear optimization, first one derives Farkas' lemma and then it is shown that the MTT follows. To derive Farkas' lemma, consider the problem

\[ \min\{0^T x : Ax \ge b\}. \]

Clearly, the system $Ax \ge b$ has a solution if and only if the optimal value of this problem is zero. By the duality theorem this holds if and only if the optimal value of the dual problem

\[ \max\{b^T y : y^T A = 0,\ y \ge 0\} \]

is also zero. This holds if and only if

\[ y^T A = 0,\ y \ge 0 \;\Longrightarrow\; b^T y \le 0, \]

which is true if and only if the system

\[ y^T A = 0,\ y \ge 0,\ b^T y > 0 \]

has no solution, proving Farkas' lemma.

To prove the MTT, one derives from Farkas' lemma that the 'weaker' system

\[ \text{(S1)} \qquad Ax \ge a, \quad Bx \ge b \]

is infeasible if and only if the system (T1) has a solution. If (S1) is feasible then one easily verifies that (S) has no solution if and only if the optimal value of the problem

\[ \text{(P1)} \qquad \min\{u : Ax \ge a,\ Bx + ue \ge b\} \]

is a nonnegative real. Here $e$ denotes the all-one vector. Since (P1) is feasible and bounded below, by the duality theorem this happens if and only if the optimal value of the dual problem

\[ \text{(D1)} \qquad \max\{a^T y + b^T v : y^T A + v^T B = 0,\ e^T v = 1,\ y \ge 0,\ v \ge 0\} \]

is a nonnegative real and, finally, this occurs if and only if (T2) has a solution. Thus it has been shown that the MTT is logically equivalent to the duality theorem for linear optimization.

So far the issue of how to prove the MTT has not been touched. One possible approach is to prove the duality theorem for linear optimization and then derive the MTT in the way described above. This approach is now quite popular in textbooks. For a recent example see, e.g., [2]. The easiest way to obtain a direct proof is to prove Farkas' lemma first and then derive the MTT from this lemma. The latter step uses the easily verified statement that (S) has no solution if and only if the system

\[ Ax - ta \ge 0, \quad Bx - tb - se \ge 0, \quad t - s \ge 0, \quad -s < 0 \]

has no solution. Application of a suitable variant of Farkas' lemma to this system yields the MTT. Farkas' lemma and its proof have a rich history; for a nice and detailed survey one might consult [3].

See also: Multi-index transportation problems; Minimum concave transportation problems; Stochastic transportation and location problems; Linear optimization: Theorems of the alternative; Linear programming; Farkas lemma; Tucker homogeneous systems of linear relations.
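
Farkas' lemma, as stated in the entry above, says that exactly one of the systems $Ax \ge b$ and (3) has a solution, so a solution of either system serves as a certificate. The following Python sketch only checks such certificates; it does not search for them. The function names, the tolerance parameter and the small example data are assumptions made for illustration.

```python
def is_solution(A, b, x, tol=1e-9):
    """Return True if A x >= b holds componentwise (up to tol)."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) >= b_i - tol
        for row, b_i in zip(A, b)
    )

def is_dual_ray(A, b, y, tol=1e-9):
    """Return True if y >= 0, y^T A = 0 and y^T b > 0, i.e. y solves (3)."""
    n_cols = len(A[0])
    nonnegative = all(y_i >= -tol for y_i in y)
    yTA = [sum(y_i * row[j] for y_i, row in zip(y, A)) for j in range(n_cols)]
    yTb = sum(y_i * b_i for y_i, b_i in zip(y, b))
    return nonnegative and all(abs(v) <= tol for v in yTA) and yTb > tol

if __name__ == "__main__":
    # Hypothetical data: x >= 1 and -x >= 0 cannot both hold.
    A = [[1.0], [-1.0]]
    b_infeasible = [1.0, 0.0]
    print(is_dual_ray(A, b_infeasible, [1.0, 1.0]))  # y = (1, 1) certifies infeasibility

    # Relaxing the second row to -x >= -2 makes the system solvable.
    b_feasible = [1.0, -2.0]
    print(is_solution(A, b_feasible, [1.5]))
```

In the first data set the multipliers $y = (1, 1)$ add the two rows to the contradiction $0 \ge 1$, which is the mechanism behind the dual ray in (3).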
Multi-index transportation problems
note the Cartesian product of these index sets, that is, the set of all joint indices ($k$-tuples) $a = (a(1),\dots,a(k))$ with $a(i) \in A_i$ for all $i \in K$. One variable $x_a$ is associated with each joint index $a \in A$. Thus, for example in a 3ITP with index sets $I$, $J$ and $L$, the variable $x_a$ stands for $x_{ij\ell}$ when the joint index is $a = (i,j,\ell)$.

Given unit costs $c_a \in \mathbb{R}$ for all $a \in A$, a linear objective function is

\[ \min \sum_{a \in A} c_a x_a \tag{1} \]

and the variables are usually restricted to be nonnegative:

\[ x_a \ge 0 \quad \text{for all } a \in A. \tag{2} \]

Given the integer $m$ with $0 < m < k$, the demand constraints of the $m$-fold $k$ITP are defined as follows. Let $\binom{K}{k-m}$ denote the set of all $(k-m)$-element subsets of $K$; an $F \in \binom{K}{k-m}$ is interpreted as a set of $k-m$ 'fixed indices'. Given such an $F$ and a $(k-m)$-tuple $g \in A_F = \bigotimes_{f \in F} A_f$ of 'fixed values', let

\[ A(F,g) = \{a \in A : a(f) = g(f),\ \forall f \in F\} \]

be the set of $k$-tuples which coincide with $g$ on the fixed indices. The $m$-fold demand constraints are

\[ \sum_{a \in A(F,g)} x_a = d_{Fg} \quad \text{for all } F \in \tbinom{K}{k-m},\ g \in A_F. \tag{3} \]

the same number $k$ of axial and planar demand constraints; however there are only $\sum_{i \in K} |A_i|$ axial constraints, versus $\sum_{i \in K} \prod_{s \in K \setminus \{i\}} |A_s|$ planar constraints. Of course, it is possible to combine demand constraints with different values of $m$, so as to formulate different types of restrictions (e.g., see [5] and [16]).

Reductions between MITPs are presented in [16], where it is shown in particular that an $m$-fold $k$ITP can be reduced to a 1-fold $k$ITP for any $m$ (with $0 < m < k$), thereby generalizing a result in [14]. Thus, an algorithm that solves planar $k$ITPs is in principle capable of solving $m$-fold $k$ITPs for any $m$ (with $0 < m < k$).

Notice that any MITP with arbitrary right hand sides can be transformed to a MITP with right hand sides 1. This is a (pseudopolynomial) transformation and simply involves duplicating a resource with a supply of $q$ units by $q$ unit-supply resources. There seems to be little advantage in doing so, except perhaps in converting an integer MITP into one with 0-1 variables.

Another issue is the existence of feasible solutions. For an axial MITP the requirement of equal total demands $\sum_g d_{ig} = \sum_g d_{jg}$ for all $i,j \in K$ is a necessary and sufficient condition for the existence of feasible solutions. Feasibility conditions
(at identical cost) to all sources and destinations, while a constraint with $F = \{2,3\}$ will be used if there are $d_{Fg}$ vehicles of type $g(3)$ available at the different sources $g(2)$.

Interesting cases arise when each resource or factor $g \in A_i$ corresponds to a point $P_{i,g}$ in a metric space, i.e., a set with a distance $\delta$, and the unit costs $c_a$ are 'decomposable' as defined below. Each joint index $a \in A$ may be interpreted as a cluster of points among which transportation and other activities are conducted. The unit cost $c_a$ reflects the within-cluster transportation costs associated with these activities; it is decomposable if it can be expressed as a function of the distances between pairs of points in the cluster $a$. Examples include the diameter $\max_{i,j} \delta(P_{i,a(i)}, P_{j,a(j)})$, when all these activities are performed simultaneously; the sum costs $\sum_{i,j} \delta(P_{i,a(i)}, P_{j,a(j)})$ when all activities are performed sequentially; and the Hamiltonian path or path costs, when all points $P_{i,a(i)}$ in the cluster have to be visited in a shortest sequence.

Other interesting cases arise when one of the indices denotes time. A simple dynamic location problem [27] may be modeled as an axial $k$ITP, where index set $A_1$ may denote the set of facilities (say, warehouses) to be located; $A_2$ that of candidate locations; and $A_3$ that of time periods. The costs $c_{ijt}$ may include discounted construction and operating costs of these facilities. See [38] and [33] for other applications of this type.

Timetabling. Other problems involving time and which can be formulated as MITPs arise in timetabling or staffing applications. To illustrate, consider the following generic situation. Given are $N$ employees (index $i$), each of which can be assigned to one of $M$ tasks (index $j$) during each of $T$ time periods (index $k$). Moreover, for each pair consisting of a task and a time period a number $r_{jk}$ is given denoting the number of employees required for task $j$ in period $k$. Also, a number $r_{ij}$ is given denoting the number of periods that task $j$ requires employee $i$. An employee can only be assigned to one task during each time period. Finally, there is a cost-coefficient $c_{ijk}$ which gives the cost of employee $i$ performing task $j$ in period $k$. This problem is called the multiperiod assignment problem in [21] (see also the references contained therein). To model this as a planar 3ITP, let $A_1$ be the set of employees; $A_2$ the set of tasks; $A_3$ the set of time periods;

\[ d_{Fg} = \begin{cases} r_{jk} & \text{for } F = \{2,3\},\ \forall g = (j,k); \\ 1 & \text{for } F = \{1,3\},\ \forall g = (i,k); \\ r_{ij} & \text{for } F = \{1,2\},\ \forall g = (i,j); \end{cases} \]

and require the decision variables to be in $\{0,1\}$. A special case arises when $r_{jk} = 1$ for all $j,k$ and $N = M$. The polyhedral structure of the resulting planar 3ITP is investigated in [7]. Other references dealing with timetabling problems formulated as MITPs are [15], [10] and [12].

Multitarget Tracking. Consider the following (idealized) situation. $N$ objects move along straight lines in the plane. At each of $T$ time instants a scan has been made, and the approximate position of each object is observed and recorded. From such a scan it is not possible to deduce which object generated which observation. Also, a small error may be associated with each observation. A track is defined as a $T$-tuple of observations, one from each scan. For each possible track a cost is computed based on a least squares criterion associated with the observations in the track. The problem is now to identify $N$ tracks while minimizing the sum of the costs of these tracks. This problem is called the data-association problem in [25]. It can be modeled as an axial integer $T$IAP as follows: let $A_i$ be the set of observations in scan $i$, $i = 1,\dots,T$, and let $d_{ig} = 1$, $i = 1,\dots,T$, $g = 1,\dots,N$. Not surprisingly, this problem is NP-hard already for $T = 3$ (see [37]; notice however that this does not follow from the NP-hardness of 3IAP due to the structure present in the cost-coefficients in the objective function of multitarget tracking problems). Other references dealing with target tracking problems formulated as axial MIAPs are [23] and [24]; see also [20].

Tables with Given Marginals. Other statistical applications of MITPs require finding multidimensional tables with given sums across rows or higher-dimensional planes, as specified in constraints (3). The right-hand sides $d_{Fg}$ of such constraints are often known as marginals. In a simple application [3] arising in the integration of surveys
and controlled selection, each index set represents a population from which a sample is to be drawn. A (joint) sample is a $k$-tuple, one from each population. The marginals are specified marginal probability distributions over each population, giving rise to axial demand constraints. Given sample costs $c_a$, the problem is to find a joint probability distribution, defined by $(x_a)$, of all the samples, consistent with these marginal distributions and of minimum expected cost (1).

In contrast, problems of updating input-output matrices (see [34] and references therein) typically have nonlinear objectives. In such problems, given are a $k$-dimensional array $B$ of data (for example, past input-output coefficients) and arrays $d$ of marginals (for example, forecast aggregate coefficients) with appropriate dimensions. The problem is to determine values $x_a$, the updated array entries, satisfying the demand constraints corresponding to the given marginals, and such that the resulting updated array $X = (x_a)$ differs as little as possible from the given array $B$, as specified by an appropriate (nonlinear) objective function. A (nonlinear) MITP arises when the values $x_a$ are constrained to be nonnegative, a natural requirement in many contexts.

Other Applications. These include an axial integer 3ITP model for planning the launching of weather satellites [27], and an axial integer 5IAP arising in routing meshes in circuit design [9].

Solution Methods. As noted above, MITPs are linear programming problems with a special structure. There are several proposals for extensions of LP (transportation) algorithms to MITPs (e.g., [13], [4] for 3ITPs and [1] for a 4ITP).

As also mentioned earlier, integer MITPs are hard to solve. Exact algorithms have been proposed for the axial integer 3IAP (see Three-index assignment problem) and for the planar integer 3IAP (see [39] and [19]). Other exact approaches for integer MITPs rely on structure that is present in the particular application considered (see, e.g., [12]).

Several methods have been proposed to obtain good approximate solutions to integer MITPs. In [21] results are reported for a rounding heuristic on some medium-sized planar integer 3ITPs. A tabu search algorithm for this problem is described in [18]. Heuristic solution approaches based on Lagrangian relaxation are proposed in [26], [28] and [29] for multitarget tracking problems.

One major difficulty with these exact or approximate solution methods may be the sheer size of MITP formulations; if, for example, all $|A_i| = n$ then an $m$-fold $k$ITP has $n^k$ variables and $\binom{k}{m} n^{k-m}$ constraints. In contrast, the two approaches sketched below yield feasible solutions to axial MITPs much more quickly than simply writing down all the cost coefficients. In particular, these algorithms only produce the nonzero variables $x_a$ and their values; all other variables are zero in the solution. In addition, this solution is integral if all demands are integral. Of course, the effectiveness of these methods relies on some assumptions on the cost coefficients $c_a$, assumptions which are verified in several applications.

A Greedy Algorithm for Axial MITPs. The greedy algorithm below (a multi-index extension of the North-West corner rule) finds a feasible solution to axial MITPs in $O(k \sum_i |A_i|)$ time, which is (for fixed $k$) linear in the size of the demand data $d_{ig}$. This solution is in fact optimal if the cost coefficients are known to satisfy a 'Monge property' [3], [31], [32] defined below. (For $k = 3$, this greedy algorithm is already described in [4] to obtain a basic feasible solution).

Consider the axial $k$ITP with equality constraints (3) and assume that each $A_i = \{1,\dots,|A_i|\}$. Recalling that the demands are denoted $d_{ig}$, assume that $\sum_{g \in A_i} d_{ig} = \sum_{g \in A_1} d_{1g}$ for all $i \in K$, a necessary and sufficient condition for the problem to be feasible.

    PROCEDURE greedy MITP algorithm
      WHILE ($\sum_{g \in A_i} d_{ig} > 0$ for all $i \in K$) DO
        let $a(i) = \min\{g \in A_i : d_{ig} > 0\}$ for each $i \in K$;
        let $\Delta = \min\{d_{i,a(i)} : i \in K\}$;
        let $x_a = \Delta$;
        FOR $i \in K$ DO let $d_{i,a(i)} = d_{i,a(i)} - \Delta$;
      RETURN $x$
    END

    A greedy algorithm for axial MITPs.

A Monge Property. The join $a \vee b$ and meet $a \wedge b$ of $a, b \in A$ are
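
The greedy procedure for axial MITPs given in the pseudocode of this section can be sketched in Python as follows. This is an illustrative reading of that pseudocode rather than a reference implementation: the function name, the list-of-lists input layout, the 0-based indices and the use of a dictionary for the nonzero variables are all assumptions, and demands are assumed to be nonnegative integers with equal totals.

```python
def greedy_axial_mitp(demands):
    """Greedy (north-west corner style) solution of an axial MITP.

    demands[i][g] plays the role of d_{ig}, with 0-based i and g.
    Returns a dict mapping joint indices a (tuples) to x_a > 0;
    all variables not listed are zero.
    """
    d = [list(row) for row in demands]  # work on a copy of the demand data
    if len({sum(row) for row in d}) != 1:
        raise ValueError("total demands must be equal for all indices")
    k = len(d)
    pointer = [0] * k  # pointer[i] scans A_i for the smallest g with d_{ig} > 0
    x = {}
    while True:
        for i in range(k):
            while pointer[i] < len(d[i]) and d[i][pointer[i]] <= 0:
                pointer[i] += 1
        if any(pointer[i] == len(d[i]) for i in range(k)):
            break  # some index has no residual demand left
        a = tuple(pointer)                          # a(i) = min{g : d_{ig} > 0}
        delta = min(d[i][a[i]] for i in range(k))   # largest shippable amount
        x[a] = delta
        for i in range(k):
            d[i][a[i]] -= delta                     # update residual demands
    return x

if __name__ == "__main__":
    # Hypothetical 3-index instance with total demand 3 on every index.
    print(greedy_axial_mitp([[2, 1], [1, 2], [3]]))
```

On the small instance in the usage example the procedure returns three nonzero variables, $x_{(0,0,0)} = x_{(0,1,0)} = x_{(1,1,0)} = 1$. Each pass through the loop exhausts at least one residual demand, so the pointers only move forward, which matches the $O(k \sum_i |A_i|)$ running time stated in the text.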
[2] BANDELT, H.-J., CRAMA, Y., AND SPIEKSMA, F.C.R.: 'Approximation algorithms for multidimensional assignment problems with decomposable costs', Discrete Appl. Math. 49 (1994), 25-50.
[3] BEIN, W.W., BRUCKER, P., PARK, J.K., AND PATHAK, P.K.: 'A Monge property for the d-dimensional transportation problem', Discrete Appl. Math. 58 (1995), 97-109.
[4] CORBAN, A.: 'A multidimensional transportation problem', Rev. Roumaine Math. Pures et Appl. IX (1964), 721-735.
[5] CORBAN, A.: 'On a three-dimensional transportation problem', Rev. Roumaine Math. Pures et Appl. XI (1966), 57-75.
[6] CRAMA, Y., AND SPIEKSMA, F.C.R.: 'Approximation algorithms for three-dimensional assignment problems with triangle inequalities', Europ. J. Oper. Res. 60 (1992), 273-279.
[7] EULER, R., AND LE VERGE, H.: 'Time-tables, polyhedra and the greedy algorithm', Discrete Appl. Math. 65 (1996), 207-222.
[8] FAIGLE, U., AND KERN, W.: 'Submodular linear programs on forests', Math. Program. 72 (1996), 195-206.
[9] FORTIN, D., AND TUSERA, A.: 'Routing in meshes using linear assignment', in A. BACHEM, U. DERIGS, M. JÜNGER, AND R. SCHRADER (eds.): Oper. Res. '93, 1994, pp. 169-171.
[10] FRIEZE, A.M., AND YADEGAR, J.: 'An algorithm for solving 3-dimensional assignment problems with application to scheduling a teaching practice', J. Oper. Res. Soc. 32 (1981), 989-995.
[11] GEETHA, S., AND VARTAK, M.N.: 'The three-dimensional bottleneck assignment problem with capacity constraints', Europ. J. Oper. Res. 73 (1994), 562-568.
[12] GILBERT, K.C., AND HOFSTRA, R.B.: 'An algorithm for a class of three-dimensional assignment problems arising in scheduling applications', IIE Trans. 8 (1987), 29-33.
[13] HALEY, K.B.: 'The solid transportation problem', Oper. Res. 10 (1962), 448-463.
[14] HALEY, K.B.: 'The multi-index problem', Oper. Res. 11 (1963), 368-379.
[15] JUNGINGER, W.: 'Zurückführung des Stundenplanproblems auf einen dreidimensionales Transportproblem', Z. Oper. Res. 16 (1972), 11-25.
[16] JUNGINGER, W.: 'On representatives of multi-index transportation problems', Europ. J. Oper. Res. 66 (1993), 353-371.
[17] KARP, R.M.: 'Reducibility among combinatorial problems', in R.E. MILLER AND J.W. THATCHER (eds.): Complexity of Computer Computations, Plenum, 1972, pp. 85-103.
[18] MAGOS, D.: 'Tabu search for the planar three-index assignment problem', J. Global Optim. 8 (1996), 35-48.
[19] MAGOS, D., AND MILIOTIS, P.: 'An algorithm for the planar three-index assignment problem', Europ. J. Oper. Res. 77 (1994), 141-153.
[20] MAVRIDOU, T., PARDALOS, P.M., PITSOULIS, L., AND RESENDE, M.G.C.: 'A GRASP for the biquadratic assignment problem', Europ. J. Oper. Res. 105/3 (March 1998), 613-621.
[21] MILLER, J.L., AND FRANK, L.S.: 'A binary-rounding heuristic for multi-period variable-task duration assignment problems', Computers Oper. Res. 23 (1996), 819-828.
[22] MOTZKIN, T.: 'The multi-index transportation problem', Bull. Amer. Math. Soc. 58 (1952), 494.
[23] MURPHEY, R., PARDALOS, P.M., AND PITSOULIS, L.: 'A GRASP for the multitarget multisensor tracking problem': DIMACS, Vol. 40, Amer. Math. Soc., 1998, pp. 277-302.
[24] MURPHEY, R., PARDALOS, P.M., AND PITSOULIS, L.: 'A parallel GRASP for the data association multidimensional assignment problem': IMA Vol. Math. Appl., Vol. 106, Springer, 1998, pp. 159-180.
[25] PATTIPATTI, K.R., DEB, S., BAR-SHALOM, Y., AND WASHBURN JR., R.B.: 'Passive multisensor data association using a new relaxation algorithm', in Y. BAR-SHALOM (ed.): Multitarget-multisensor tracking: Advances and applications, 1990, p. 111.
[26] PATTIPATTI, K.R., DEB, S., BAR-SHALOM, Y., AND WASHBURN JR., R.B.: 'A new relaxation algorithm for passive sensor data association', IEEE Trans. Autom. Control 37 (1992), 198-213.
[27] PIERSKALLA, W.P.: 'The multidimensional assignment problem', Oper. Res. 16 (1968), 422-431.
[28] POORE, A.B.: 'Multidimensional assignment formulation of data-association problems arising from multitarget and multisensor tracking', Comput. Optim. Appl. 3 (1994), 27-57.
[29] POORE, A.B., AND RIJAVEC, N.: 'A Lagrangian relaxation algorithm for multidimensional assignment problems arising from multitarget tracking', SIAM J. Optim. 3 (1993), 544-563.
[30] QUEYRANNE, M., AND SPIEKSMA, F.C.R.: 'Approximation algorithms for multi-index transportation problems with decomposable costs', Discrete Appl. Math. 76 (1997), 239-253.
[31] QUEYRANNE, M., SPIEKSMA, F.C.R., AND TARDELLA, F.: 'A general class of greedily solvable linear programs', in G. RINALDI AND L. WOLSEY (eds.): Proc. Third IPCO Conf. (Integer Programming and Combinatorial Optimization), 1993, pp. 385-399.
[32] QUEYRANNE, M., SPIEKSMA, F.C.R., AND TARDELLA, F.: 'A general class of greedily solvable linear programs', Math. Oper. Res. (to appear).
[33] RAUTMAN, C.A., REID, R.A., AND RYDER, E.E.: 'Scheduling the disposal of nuclear waste material in a geologic repository using the transportation model', Oper. Res. 41 (1993), 459-469.
[34] ROMERO, D.: 'Easy transportation-like problems on K-
dimensional arrays', J. Optim. Th. Appl. 66 (1990), 137-147.
[35] SCHELL, E.: 'Distribution of a product by several properties', in DIRECTORATE OF MANAGEMENT ANALYSIS (ed.): Second Symposium in Linear Programming 2, DCS/Comptroller HQ, US Air Force, Washington DC, 1955, pp. 615-642.
[36] SHARMA, J.K., AND SHARUP, K.: 'Time-minimizing multidimensional transportation problem', J. Engin. Production 1 (1977), 121-129.
[37] SPIEKSMA, F.C.R., AND WOEGINGER, G.J.: 'Geometric three-dimensional assignment problems', Europ. J. Oper. Res. 91 (1996), 611-618.
[38] TZENG, G., TEODOROVIC, D., AND HWANG, M.: 'Fuzzy bicriteria multi-index transportation problems for coal allocation planning of Taipower', Europ. J. Oper. Res. 95 (1996), 62-72.
[39] VLACH, M.: 'Branch and bound method for the three-index assignment problem', Ekonomicko-Matematicky Obzor 3 (1967), 181-191.
[40] VLACH, M.: 'Conditions for the existence of solutions of the three-dimensional planar transportation problem', Discrete Appl. Math. 13 (1986), 61-78.
[41] YEMELICHEV, V.A., KOVALEV, M.M., AND KRATSOV, M.K.: Polytopes, graphs and optimization, Cambridge Univ. Press, 1984.

Maurice Queyranne
Univ. British Columbia
Vancouver, B.C., Canada
E-mail address: Maurice.Queyranne@commerce.ubc.ca
Frits Spieksma
Maastricht Univ.
Maastricht, The Netherlands
E-mail address: spieksma@math.unimaas.nl

MSC 2000: 90C35
Key words and phrases: transportation problem, three-dimensional transportation problem, greedy algorithm, Monge property, approximation algorithms.

ficulties. The first two are the same as those existing for the multi-objective integer linear programming (MOILP) problem (cf. Multi-objective integer linear programming), i.e.
• the number of efficient solutions may be very large;
• the nonconvex character of the feasible set requires devising specific techniques to generate the so-called 'nonsupported' efficient solutions (cf. Multi-objective integer linear programming).

A particular single CO problem is characterized by some specificities of the problem, generally a special form of the constraints; the existing methods for such a problem use these specificities to define efficient ways to obtain an optimal solution. For a MOCO problem, it appears interesting to do the same to obtain the set of efficient solutions. Consequently, and contrary to what is often done in MOLP and MOILP methods, a third difficulty is to elaborate methods that avoid introducing additional constraints, so that the particular form of the constraints is preserved throughout the procedure.

The general form of a MOCO problem is

\[ \text{(P)} \qquad \text{'min'}_{X \in S}\; z_k(X) = c^k X, \quad k = 1,\dots,K, \qquad \text{where } S = D \cap B^n \text{ with } X\,(n \times 1),\ B = \{0,1\} \]

and $D$ is a specific polytope characterizing the CO problem: assignment problem, knapsack problem, traveling salesman problem, etc.
MULTI-OBJECTIVE COMBINATORIAL OP- There exists several surveys on MOCO; some
TIMIZATION, MOCO are devoted to specific problems (i.e., the partic-
It is well known that, on the one hand, combina- ular form of D): the shortest path problem [8],
torial optimization (CO) provides a powerful tool transportation networks [2], and the scheduling
to formulate and model many optimization prob- problem [6], [7]; the survey [9] is more general
lems, on the other hand, a multi-objective (MO) examining successively the literature on MO as-
approach is often a realistic and efficient way to signment problems, knapsack problems, network
treat many real world applications. Nevertheless, flow problems, traveling salesman problems, loca-
until recently, Multi-objective combinatorial opti- tion problems, set covering problems.
mization (MOCO) did not receive much attention In the present article we put our attention on
in spite of its potential applications. One of the the existing methodologies for MOCO. First we
reason is probably due to specific difficulties of examine how to determine the set E(P) of all
MOCO models. We can distinguish three main dif- the efficient solutions and we distinguish three ap-
456
Multi-objective combinatorial optimization
proaches: direct methods, two-phase methods and At any node of the branch and bound tree, vari-
heuristic methods. Subsequently we analyse inter- ables are set to 0 or 1; let B0 and B1 denote the
active approaches to generate a 'good compromise' index sets of variables assigned to the values 0 and
satisfying the decision maker. 1, respectively. Let F be the index set of free vari-
ables which always follow, in the order O, those
Generation of E(P). belonging to B1 U B0. If i - 1 is the last index of
fixed variables, we have B1 [2 B0 - { 1 , . . . , i - 1};
Direct methods. The first idea is to use intensively
F-
classical methods for single objective problem (P)
Initially, i - 1. Let
existing in the literature to determine E(P). Of
course, each time a feasible solution is obtained the • W- W - EjEB1 wj ~ 0 be the leftover ca-
Ph.D. thesis of E.L. Ulungu, which gave rise to the so-called MOSA method to approximate E(P) (see, in particular, [11]). After this pioneering study, this direction has been taken up by other research teams: P. Czyzak and A. Jaszkiewicz ([3]) proposed another way to adapt simulated annealing to a MOCO problem; independently, [5], [4] and [1] did the same with tabu search, the latter also combining tabu search and genetic algorithms; genetic algorithms are also used in [13].
The main idea of the MOSA method can be summarized briefly. One begins with an initial iterate $X_0$ and initializes the set PE of potentially efficient points to contain just $X_0$. One then samples a point Y in the neighborhood of the current iterate. But instead of accepting Y if it is better than the current iterate on a single objective, we now accept it if it is not dominated by any of the points currently in the set PE. If it is not dominated, we make Y the current iterate, add it to PE, and throw out any points in PE that are dominated by Y. On the other hand, if Y is dominated, we still make it the current iterate with some probability. In this way, as we move the iterate through the space, we simultaneously build up a set PE of potentially efficient points. The only complicated aspect of this scheme is the method for computing the acceptance probability for Y when it is dominated by a point in PE. The MOSA method is described in detail in [11] and in Bi-objective assignment problem.

Interactive Determination of a Good Compromise. The general idea of interactive methods is described in Multi-objective integer linear programming. Two types of methods can be distinguished, which we treat in the following subsections.

Goal Programming. As pointed out in [9], this methodology is often used by American researchers to treat several case studies. The general idea of the goal programming method is to introduce, for each objective k, deviation variables $d_k^+$ and $d_k^-$, respectively by excess and by default, with respect to a certain a priori goal $g_k$, so that goal constraints are defined. If some priorities expressed by some weights $p_k$ are given, this results in a single-objective problem $(\mathrm{P}_g)$ defined by the global weighted deviation function:
\[
(\mathrm{P}_g) \qquad
\begin{cases}
\min \ \sum_{k=1}^{K} p_k d_k^+ \\
\text{s.t. } z_k(X) + d_k^- - d_k^+ = g_k, \quad \forall k, \\
X \in S = D \cap B^n.
\end{cases}
\]
When a solution is obtained, the decision maker can modify the values of the goals $g_k$ before a new iteration is performed. One drawback is that the additional goal constraints destroy the particular structure of the initial CO problem, so that general ILP software must be used to solve problem $(\mathrm{P}_g)$.

Interactive Two-Phase Methods and MOSA Method. The two-phase methodology described above can easily be adapted to build a good compromise interactively. At each step of the first phase, the decision maker can indicate which pair $(X_r, X_s)$ he prefers, so that only a small subset of SE(P) is generated in the direction given by the decision maker; in the second phase, only one (or a small number of) triangle(s) $\triangle Z_r Z_s$ is (are) analysed to verify whether it contains a more satisfying nonsupported efficient solution. In the same spirit, an interactive MOSA method can be designed (see also [12]): the decision maker gives some goals $g_k$ and only the solutions satisfying $z_k(X) \le g_k$ are put in the list of potentially efficient solutions. When this list contains a certain a priori fixed number of solutions, the decision maker indicates which one is preferred and modifies the goals $g_k$ in a more restrictive sense before continuing the search with MOSA.
An example of such an interactive procedure is given in [12] for a real case study.

See also: Fractional combinatorial optimization; Replicator dynamics in combinatorial optimization; Neural networks for combinatorial optimization; Combinatorial matrix analysis; Combinatorial optimization algorithms in resource allocation problems; Combinatorial optimization games; Evolutionary algorithms in combinatorial optimization; Multi-objective optimization: Pareto optimal solutions, properties; Multi-objective optimization: Interactive methods for preference value functions; Multi-objective optimization: Lagrange duality; Multi-objective optimization: Interaction of design and control; Outranking methods; Preference disaggregation; Fuzzy multi-objective linear programming; Multi-objective optimization and decision support systems; Preference disaggregation approach: Basic features, examples from financial decision making; Preference modeling; Multiple objective programming support; Multi-objective integer linear programming; Bi-objective assignment problem; Estimating data for multicriteria decision making problems: Optimization techniques; Multicriteria sorting methods; Financial applications of multicriteria analysis; Portfolio selection and multicriteria analysis; Decision support systems with multiple criteria.

References
[1] BEN ABDELAZIZ, F., CHAOUACHI, J., AND KRICHEN, S.: 'A hybrid heuristic for multiobjective knapsack problems', Techn. Report Inst. Sup. Gestion, Tunisie, submitted (1997).
[2] CURRENT, J.R., AND MIN, H.: 'Multiobjective design of transportation networks: taxonomy and annotation', Europ. J. Oper. Res. 26, no. 2 (1986), 187-201.
[3] CZYZAK, P., AND JASZKIEWICZ, A.: 'Pareto simulated annealing - A metaheuristic technique for multiple objective combinatorial optimization', J. Multi-Criteria Decision Anal. 7 (1998), 34-47.
[4] GANDIBLEUX, X., MEZDAOUI, N., AND FRÉVILLE, A.: 'A tabu search procedure to solve multiobjective combinatorial optimisation problems', in R. CABALLERO AND R. STEUER (eds.): Proc. Volume of MOPGP'96, Springer, 1997.
[5] HANSEN, M.P.: 'Tabu search for multiobjective optimization: MOTS', Techn. Report Inst. Math. Modelling, Techn. Univ. Denmark (1996), Submitted for publication.
[6] HOOGEVEEN, H.: 'Single machine bicriteria scheduling', PhD Diss. Univ. Eindhoven (1992).
[7] KÖKSALAN, M., AND KÖKSALAN-KONDAKCI, S.: 'Multiple criteria scheduling on single machine: A review and a general approach', in M. KARWAN ET AL. (eds.): Essays in Decision Making, Springer, 1997.
[8] ULUNGU, E.L., AND TEGHEM, J.: 'Multi-objective shortest path problem: A survey', in M. CERNY, D. GLACKAUFOVA, AND D. LOULA (eds.): Proc. Internat. Workshop on MCDM, Liblice (Czechoslovakia), 1991, pp. 176-188.
[9] ULUNGU, E.L., AND TEGHEM, J.: 'Multi-objective combinatorial optimization problems: A survey', J. Multi-Criteria Decision Anal. 3 (1994), 83-104.
[10] ULUNGU, E.L., AND TEGHEM, J.: 'Solving multi-objective knapsack problem by a branch and bound procedure', in J. CLIMACO (ed.): Multicriteria Analysis, Springer, 1997, pp. 269-278.
[11] ULUNGU, E.L., TEGHEM, J., FORTEMPS, PH., AND TUYTTENS, D.: 'MOSA method: A tool for solving MOCO problems', J. Multi-Criteria Decision Anal. 8 (1999), 221-236.
[12] ULUNGU, E.L., TEGHEM, J., AND OST, CH.: 'Efficiency of interactive multi-objective simulated annealing through a case study', J. Oper. Res. Soc. 49 (1998), 1044-1050.
[13] VIENNET, R., AND FONTEX, M.: 'Multi-objective combinatorial optimization using a genetic algorithm for determining a Pareto set', Internat. J. Syst. Sci. 27, no. 2 (1996), 255-260.
[14] VISÉE, M., TEGHEM, J., PIRLOT, M., AND ULUNGU, E.L.: 'Two-phases method and branch and bound procedures to solve the bi-objective knapsack problem', J. Global Optim. 12 (1998), 139-155.
Jacques Teghem
Lab. Math. & Operational Research Fac. Polytechn. Mons
9, rue de Houdain
B-7000 Mons, Belgium
E-mail address: teghem@mathro.fpms.ac.be
MSC2000: 90C29, 90C27
Key words and phrases: multi-objective programming, combinatorial optimization.

MULTI-OBJECTIVE INTEGER LINEAR PROGRAMMING, MOILP

From the 1970s onwards, multi-objective linear programming (MOLP) methods with continuous solutions have been developed [8]. However, it is well known that discrete variables are unavoidable in the linear programming modeling of many applications, for instance, to represent an investment choice, a production level, etc.
The mathematical structure is then integer linear programming (ILP), associated with MOLP giving a MOILP problem. Unfortunately, MOILP cannot be solved by simply combining ILP and MOLP methods, because it has its own specific difficulties.
The problem (P) considered is defined as
\[
(\mathrm{P}) \qquad \underset{X \in D}{\text{'max'}} \; z_k(X) = \sum_{j=1}^{n} c_j^{(k)} x_j, \qquad k = 1, \dots, K,
\]
where
\[
D = \{ X \in \mathbb{R}^n : TX \le d, \; X \ge 0, \; x_j \text{ integer}, \; j \in J \},
\]
with T(m × n), d(m × 1), X(n × 1), $J \subset \{1, \dots, n\}$.
If we denote $LD = \{X : TX \le d, \; X \ge 0\}$, problem (LP) is the linear relaxation of problem (P):
\[
(\mathrm{LP}) \qquad \underset{X \in LD}{\text{'max'}} \; z_k(X), \qquad k = 1, \dots, K.
\]
A solution X* in D (or LD) is said to be efficient for problem (P) (or (LP)) if there does not exist any other solution X in D (or LD) such that $z_k(X) \ge z_k(X^*)$, k = 1, ..., K, with at least one strict inequality.
Let E(·) denote the set of all efficient solutions of problem (·). It is well known (see [8]) that E(LP) may be characterized by the optimal solutions of the single objective and parametrized problem:
\[
(\mathrm{LP}_\lambda) \qquad \max_{X \in LD} \sum_{k=1}^{K} \lambda_k z_k(X), \qquad \text{with } \lambda_k > 0 \ \forall k, \quad \sum_{k=1}^{K} \lambda_k = 1.
\]
This fundamental principle, often called Geoffrion's theorem, is no longer valid in the presence of discrete variables because the set D is not convex. The set of optimal solutions of problem $(\mathrm{P}_\lambda)$, defined as problem $(\mathrm{LP}_\lambda)$ in which LD is replaced by D, is only a subset SE(P) of E(P); the solutions in SE(P) are called supported efficient solutions, while the solutions belonging to NSE(P) = E(P) \ SE(P) are called nonsupported efficient solutions.
The breakdown of Geoffrion's theorem for problem (P) can be illustrated by the following obvious example:
\[
z_1(X) = 6x_1 + 3x_2 + x_3, \qquad z_2(X) = x_1 + 3x_2 + 6x_3,
\]
\[
D = \{ X : x_1 + x_2 + x_3 \le 1, \; x_i \in \{0, 1\} \}.
\]
For this problem,
\[
E(\mathrm{P}) = \{ (1, 0, 0); (0, 1, 0); (0, 0, 1) \}
\]
while NSE(P) = {(0, 1, 0)}.
Nevertheless, V.J. Bowman [1] has given a theoretical characterization of E(P). Setting
\[
M_k = \max_{X \in D} z_k(X), \qquad \bar{z}_k = M_k + \varepsilon_k, \quad \text{with } \varepsilon_k > 0, \qquad \rho > 0,
\]
then E(P) is characterized by the optimal solutions of the problem $(\mathrm{P}^T_\lambda)$:
\[
(\mathrm{P}^T_\lambda) \qquad \min_{X \in D} \Bigl[ \max_{k} \lambda_k \bigl( \bar{z}_k - z_k(X) \bigr) + \rho \sum_{k=1}^{K} \bigl( \bar{z}_k - z_k(X) \bigr) \Bigr],
\]
consisting of minimizing the augmented weighted Tchebychev distance between $z_k(X)$ and $\bar{z}_k$.
Let us note that another characterization of E(P) is given in [2] for the particular case of binary variables.
Two types of problems can be analysed:
• Generate E(P) explicitly. Several methods have been proposed; they are reviewed in [10]. Below we present two of them, which appear general, characteristic and efficient.
• Determine interactively with the decision maker a 'best compromise' in E(P) according to the preferences of the decision maker. Some of the existing approaches are reviewed in [11]; below we describe three of these interactive methods.

Generation of E(P).

Klein-Hannan method. See [5]. This is an iterative procedure for sequentially generating the complete set of efficient solutions for problem (P) (we suppose that the coefficients $c_j^{(k)}$ are integers); it consists in solving a sequence of progressively more
these constraints is given in [5]. The set of solutions $E_j(\mathrm{P})$ is then defined as $E_j(\mathrm{P}) = E(\mathrm{P}_j) \cap E(\mathrm{P})$, where $E(\mathrm{P}_j)$ is the set of all optimal solutions of $(\mathrm{P}_j)$.
The procedure continues until, at some iteration J, the problem $(\mathrm{P}_J)$ becomes infeasible; at this time
\[
E(\mathrm{P}) = \bigcup_{j=0}^{J-1} E_j(\mathrm{P}).
\]

Kiziltan-Yucaoglu method. See [4]. This is a direct adaptation to a multi-objective framework of the well-known Balas algorithm for the ILP problem with binary variables.
At node $S^r$ of the branch and bound scheme, the following problem is considered:
- the node $S^r$ is feasible and $Z^r = z^r$;
- the node $S^r$ is unfeasible and $\sum_{j \in F^r} \min(0, t_{ij}) > d_i^r$ for some i = 1, ..., m.
The usual backtracking rules are applied.
• (branching rule) A variable $x_l$, $l \in F^r$, is selected to be the branching variable.
- If the node $S^r$ is feasible, l ∈ {j ∈ F 0}.
- Otherwise, index l is selected by the minimum unfeasibility criterion:
\[
\min_{j \in F^r} \sum_{i=1}^{m} \max(0, \; -d_i^r + t_{ij}).
\]
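The infeasibility test and the minimum-unfeasibility branching rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the data layout (a row-major list `T`, a residual right-hand side `d_r` for the current node, a list `free` of free indices) and all function names are hypothetical, indices are 0-based unlike in the text, and the sketch is not the full Kiziltan-Yucaoglu algorithm.

```python
def cannot_be_completed(T, d_r, free):
    """Unfeasibility test: True if, for some row i, even setting to 1 every
    free variable with a negative coefficient leaves row i violated, i.e.
    sum_{j in F} min(0, t_ij) > d_i^r."""
    return any(
        sum(min(0, T[i][j]) for j in free) > d_r[i]
        for i in range(len(d_r))
    )


def min_unfeasibility_index(T, d_r, free):
    """Branching rule: pick the free index j that minimizes
    sum_i max(0, -d_i^r + t_ij), the total violation left after x_j = 1."""
    def total_violation(j):
        return sum(max(0, T[i][j] - d_r[i]) for i in range(len(d_r)))
    return min(free, key=total_violation)


# Hypothetical node data: two constraints, three free binary variables.
T = [[-2, -1, 3],
     [1, -3, -1]]
d_r = [-2, -1]          # residual right-hand side at the node (assumed given)
free = [0, 1, 2]

print(cannot_be_completed(T, d_r, free))       # False: the node may still be completed
print(min_unfeasibility_index(T, d_r, free))   # 1: x_1 leaves the least violation
```

With these data the total violations are 2, 1 and 5 for the three candidates, so index 1 is selected; ties are broken by the order of `free`, which is a design choice of this sketch.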
When the explicit enumeration is complete, $E(\mathrm{P}) = \hat{E}$.

Interactive Methods. Such methods are particularly important to solve multi-objective applications. The general idea is to determine progressively a good compromise solution integrating the preferences of the decision maker.
The dialog with the decision maker consists of a succession of 'calculation phases' managed by the model and 'information phases' managed by the decision maker.
At each calculation phase, one or several new efficient solutions are determined taking into account the information given by the decision maker at the preceding information phase. At each information phase, a small number of easy questions are asked to the decision maker to collect information about his preferences in regard to the new solutions.

Gonzalez-Reeves-Franz Algorithm. See [3]. In this method a set E of K efficient solutions is selected and updated in each algorithm step according to the decision maker's preferences. At the end of the procedure, E will contain the most preferred solutions. The method is divided in two stages: in the first one, the supported efficient solutions are considered, while the second one deals with nonsupported efficient solutions.
• (Stage 1): Determination of the best supported efficient solutions. E is initialized with K optimal solutions of the K single objective ILP problems. Let us denote by Z the K corresponding points in the objective space of the solutions of E. At each iteration, a linear direction of search G(X) is built: G(X) is the inverse mapping of the hyperplane defined by the points of Z in the objective space into the decision space. A new supported efficient solution X* is determined by solving the single objective ILP problem $\max_{X \in D} G(X)$, and Z* is the corresponding point in the objective space. Then:
- if $Z^* \notin Z$ and the decision maker prefers solution X* to at least one solution of E: the least preferred solution is replaced in E by X* and a new iteration is performed;
- if $Z^* \notin Z$ and X* is not preferred to any solution in E: E is not modified and the second stage is initiated;
- if $Z^* \in Z$: Z defines a face of the efficient surface and the second stage is initiated.
• (Stage 2): Introduction of the best nonsupported solutions. We will not give details about this second stage (see [3] or [10]); let us just say that it is performed in the same spirit but considering the single objective problem
\[
\max_{X \in D} G(X), \qquad G(X) \le \bar{G} - \varepsilon \quad \text{with } \varepsilon > 0,
\]
where $\bar{G}$ is the optimal value obtained for the last function G(X) considered.

Steuer-Choo Method. See [9]. Several interactive approaches of MOLP problems can also be applied to MOILP; among them, we mention only the Steuer-Choo method, which is a very general procedure based on problem $(\mathrm{P}^T_\lambda)$ defined in the introduction.
The first iteration uses a widely dispersed group of weighting vectors λ to sample the set of efficient solutions. The sample is obtained by solving problem $(\mathrm{P}^T_\lambda)$ for each of the λ values in the set. Then the decision maker is asked to identify the most preferred solution $X^{(1)}$ among the sample. At iteration j, a more refined grid of weighting vectors is used to sample the set of efficient solutions in the neighborhood of the point $z_k(X^{(j)})$ (k = 1, ..., K) in the objective space. Again the sample is obtained by solving several problems $(\mathrm{P}^T_\lambda)$ and the most preferred solution $X^{(j+1)}$ is selected. The procedure continues using increasingly finer sampling until the solution is deemed to be acceptable.

The MOMIX Method. (See [6].) The main characteristic of this method is the use of an interactive branch and bound concept, initially introduced in [7], to design the interactive phase.
• (First compromise): The following minimax optimization, with m = 1, is performed to determine the compromise $\tilde{X}^{(1)}$:
\[
(\mathrm{P}^m) \qquad
\begin{cases}
\min \ \delta \\
\Pi_k^{(m)} \bigl( M_k^{(m)} - z_k(X) \bigr) \le \delta, \quad \forall k, \\
X \in D^{(m)},
\end{cases}
\]
where
- $D^{(1)} = D$;
- $[m_k^{(1)}, M_k^{(1)}]$ are the variation intervals of the criteria k, provided by the pay-off table (see [8]);
- $\Pi_k^{(1)}$ are certain normalizing weights taking into account these variation intervals (see [8]).
REMARK 1 If the optimal solution is not unique, an augmented weighted Tchebychev distance is required in order to obtain an efficient first solution. □
• (Interactive phases): These are integrated in an interactive branch and bound tree; a first step (a depth-first progression in the tree) leads to the determination of a first good compromise; the second step (a backtracking procedure) confirms the degree of satisfaction achieved by the decision maker or finds a better compromise if necessary.
- (Depth-first progression): For $m \ge 1$, let at the mth iteration
1) $\tilde{X}^{(m)}$ be the mth compromise;
2) $z_k^{(m)}$ be the corresponding values of the criteria;
3) $[m_k^{(m)}, M_k^{(m)}]$ be the variation intervals of the criteria; and
4) $\Pi_k^{(m)}$ be the weights of the criteria.
The decision maker has to choose, at this mth iteration, the criterion $l_m(1) \in \{k : k = 1, \dots, K\}$ he is willing to improve in priority. Then a new constraint is introduced so that the feasible set becomes
\[
D^{(m+1)} = D^{(m)} \cap \bigl\{ z_{l_m(1)}(X) > z_{l_m(1)}^{(m)} \bigr\}.
\]
Further, the variation intervals $[m_k^{(m+1)}, M_k^{(m+1)}]$ and the weights $\Pi_k^{(m+1)}$ are updated on the new feasible set $D^{(m+1)}$. The new compromise $\tilde{X}^{(m+1)}$ is obtained by solving the problem $(\mathrm{P}^{m+1})$.
Different tests allow one to terminate this first step. The node (m + 1) is fathomed if one of the following conditions is verified:
a) $D^{(m+1)} = \emptyset$;
b) $M_k^{(m+1)} - m_k^{(m+1)} \le \varepsilon_k$, ∀k;
c) the vector Z of the incumbent values (values of the criteria for the best compromise already determined) is preferred to the new ideal point (of components $M_k^{(m+1)}$).
The first step of the procedure is stopped if either more than q successive iterations do not bring an improvement of the incumbent point Z or more than Q iterations have been performed.
Note that the parameters $\varepsilon_k$, q and Q are fixed in agreement with the decision maker.
• (Backtracking procedure): It can be hoped that the appropriate choice of the criterion $z_{l_m(1)}$, at each level m of the depth-first progression, has been made so that at the end of the first step, a good compromise has been found.
Nevertheless, it is worth examining some other parts of the tree to confirm the satisfaction of the decision maker. The complete tree is generated in the following manner: at each level, K subnodes are introduced by successively adding the constraints:
\[
\begin{aligned}
& z_{l_m(1)}(X) > z_{l_m(1)}^{(m)}, \\
& z_{l_m(2)}(X) > z_{l_m(2)}^{(m)}; \quad z_{l_m(1)}(X) \le z_{l_m(1)}^{(m)}, \\
& \qquad \vdots \\
& z_{l_m(K)}(X) > z_{l_m(K)}^{(m)}; \quad z_{l_m(k)}(X) \le z_{l_m(k)}^{(m)}, \ \text{for all } k = 1, \dots, K-1,
\end{aligned}
\]
where $l_m(k) \in \{k : k = 1, \dots, K\}$ is the kth objective that the decision maker wants to improve at the mth level of the branch and bound tree.
At each level m, the criteria are thus ordered according to the priorities of the decision maker in regard to the compromise $\tilde{X}^{(m)}$.
The usual backtracking procedure is applied; yet it seems unnecessary to explore the whole tree. Indeed, the subnode k > K of each branching corresponds to a simultaneous
relaxation of those criteria $l_m(k)$, $k \le K$, that the decision maker wants to improve in priority! Therefore, the subnodes k > K − 2 or 3, for instance, almost certainly do not bring any improved solutions.
The fathoming tests and the stopping tests are again applied in this second step.

See also: Decomposition techniques for MILP: Lagrangian relaxation; LCP: Pardalos-Rosen mixed integer formulation; Integer linear complementary problem; Integer programming: Cutting plane algorithms; Integer programming: Branch and cut algorithms; Integer programming: Branch and bound methods; Integer programming: Algebraic methods; Integer programming: Lagrangian relaxation; Integer programming duality; Time-dependent traveling salesman problem; Set covering, packing and partitioning problems; Simplicial pivoting algorithms for integer programming; Multi-objective mixed integer programming; Mixed integer classification problems; Integer programming; Multiparametric mixed integer linear programming; Parametric mixed integer nonlinear optimization; Stochastic integer programming: Continuity, stability, rates of convergence; Stochastic integer programs; Branch and price: Integer programming with column generation; Multi-objective optimization: Pareto optimal solutions, properties; Multi-objective optimization: Interactive methods for preference value functions; Multi-objective optimization: Lagrange duality; Multi-objective optimization: Interaction of design and control; Outranking methods; Preference disaggregation; Fuzzy multi-objective linear programming; Multi-objective optimization and decision support systems; Preference disaggregation approach: Basic features, examples from financial decision making; Preference modeling; Multiple objective programming support; Multi-objective combinatorial optimization; Bi-objective assignment problem; Estimating data for multicriteria decision making problems: Optimization techniques; Multicriteria sorting methods; Financial applications of multicriteria analysis; Portfolio selection and multicriteria analysis; Decision support systems with multiple criteria.

References
[1] BOWMAN JR., V.J.: 'On the relationship of the Tchebycheff norm of the efficient frontier of multi-criteria objectives', in H. THIRIEZ AND S. ZIONTS (eds.): Multiple Criteria Decision Making, Springer, 1976, pp. 76-85.
[2] BURKARD, R.E.: 'A relationship between optimality and efficiency in multiple criteria 0-1 programming problems', Comput. Oper. Res. 8 (1981), 241-247.
[3] GONZALEZ, J.J., REEVES, G.R., AND FRANZ, L.S.: 'An interactive procedure for solving multiple objective integer linear programming problems', in Y. HAIMES AND V. CHANKONG (eds.): Decision Making with Multiple Objectives, Springer, 1985, pp. 250-260.
[4] KIZILTAN, G., AND YUCAOGLU, E.: 'An algorithm for multiobjective zero-one linear programming', Managem. Sci. 29, no. 12 (1983), 1444-1453.
[5] KLEIN, D., AND HANNAN, E.: 'An algorithm for the multiple objective integer linear programming', Europ. J. Oper. Res. 9, no. 4 (1982), 378-385.
[6] L'HOIR, H., AND TEGHEM, J.: 'Portfolio selection by MOLP using an interactive branch and bound', Found. Computing and Decision Sci. 20, no. 3 (1995), 175-185.
[7] MARCOTTE, O., AND SOLAND, R.: 'An interactive branch and bound algorithm for multiple criteria optimization', Managem. Sci. 32, no. 1 (1986), 61-75.
[8] STEUER, R.E.: Multiple criteria optimization theory, computation and applications, Wiley, 1986.
[9] STEUER, R.E., AND CHOO, E.-U.: 'An interactive weighted Tchebycheff procedure for multiple objective programming', Math. Program. 26 (1983), 326-344.
[10] TEGHEM, J., AND KUNSCH, P.: 'Interactive method for multi-objective integer linear programming', in G. FANDEL ET AL. (eds.): Large Scale Modelling and Interactive Decision Analysis, Springer, 1986, pp. 75-87.
[11] TEGHEM, J., AND KUNSCH, P.: 'A survey of techniques for finding efficient solutions to multi-objective integer linear programming', Asia-Pacific J. Oper. Res. 3 (1986), 1195-106.
Jacques Teghem
Lab. Math. & Operational Research Fac. Polytechn. Mons
9, rue de Houdain
B-7000 Mons, Belgium
E-mail address: teghem@mathro.fpms.ac.be
MSC2000: 90C29, 90C10
Key words and phrases: multi-objective programming, integer, linear programming.
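The breakdown of Geoffrion's theorem in the three-variable example of this entry can be checked by brute force. The following Python sketch enumerates D, keeps the efficient solutions by pairwise dominance, and scans a grid of weights λ for the weighted-sum problem to collect the supported ones. The function names and the grid size are illustrative assumptions, not part of any of the cited methods, and a weight grid is only a heuristic way to find supported solutions in general.

```python
from itertools import product

# Objective coefficients of the example: z1 = 6x1 + 3x2 + x3, z2 = x1 + 3x2 + 6x3.
C = [(6, 3, 1), (1, 3, 6)]


def feasible_set():
    """All binary X with x1 + x2 + x3 <= 1."""
    return [x for x in product((0, 1), repeat=3) if sum(x) <= 1]


def objectives(x):
    """Vector (z1(X), z2(X))."""
    return tuple(sum(c * xi for c, xi in zip(row, x)) for row in C)


def dominates(za, zb):
    """For a 'max' problem: za >= zb componentwise, with one strict inequality."""
    return all(a >= b for a, b in zip(za, zb)) and za != zb


def efficient_solutions(D):
    return {x for x in D
            if not any(dominates(objectives(y), objectives(x)) for y in D)}


def supported_solutions(D, steps=100):
    """Optimal solutions of the weighted-sum problem on a grid of lambda in (0, 1)."""
    found = set()
    for s in range(1, steps):
        lam = (s / steps, 1 - s / steps)
        values = {x: sum(l * z for l, z in zip(lam, objectives(x))) for x in D}
        best = max(values.values())
        found |= {x for x, v in values.items() if abs(v - best) < 1e-12}
    return found


D = feasible_set()
E = efficient_solutions(D)
SE = supported_solutions(D)
NSE = E - SE
print(sorted(E))    # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
print(sorted(NSE))  # [(0, 1, 0)]
```

The point (0, 1, 0) has objective vector (3, 3), while any positive weighting gives at least 3.5 to one of the two extreme solutions, so no weighted sum ever selects it.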
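An augmented weighted Tchebychev scalarization, by contrast, can reach the nonsupported solution of the same three-variable MOILP example. The sketch below assumes the standard form of that distance, max_k λ_k (z̄_k − z_k(X)) + ρ Σ_k (z̄_k − z_k(X)), with ε_k = 1 and ρ = 0.001; these values, the weight grid and the function names are illustrative assumptions, and the tiny feasible set is simply enumerated instead of calling an ILP solver.

```python
from itertools import product

# Same example data: z1 = 6x1 + 3x2 + x3, z2 = x1 + 3x2 + 6x3, x1 + x2 + x3 <= 1.
C = [(6, 3, 1), (1, 3, 6)]
D = [x for x in product((0, 1), repeat=3) if sum(x) <= 1]


def z(x):
    """Objective vector (z1(X), z2(X))."""
    return tuple(sum(c * xi for c, xi in zip(row, x)) for row in C)


def augmented_tchebychev(x, lam, z_bar, rho):
    """Assumed distance: max_k lam_k * gap_k + rho * sum_k gap_k."""
    gaps = [zb - zk for zb, zk in zip(z_bar, z(x))]
    return max(l * g for l, g in zip(lam, gaps)) + rho * sum(gaps)


def solve(lam, eps=1.0, rho=1e-3):
    """Minimize the distance to the shifted ideal point by enumeration of D."""
    M = [max(z(x)[k] for x in D) for k in range(len(C))]
    z_bar = [m + eps for m in M]
    return min(D, key=lambda x: augmented_tchebychev(x, lam, z_bar, rho))


# Sweep a coarse grid of weight vectors (lambda, 1 - lambda).
reached = {solve((s / 20, 1 - s / 20)) for s in range(1, 20)}
print(solve((0.5, 0.5)))  # (0, 1, 0)
print(sorted(reached))    # [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
```

With equal weights the shifted ideal point is (7, 7), and the balanced solution (0, 1, 0) is closest in the Tchebychev sense even though no weighted sum selects it.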
Multi-objective mixed integer programming
DM's implicit utility function.
C. Ferreira et al. [5] proposed a decision support system for bicriterion mixed integer programs. The interactive process follows an open communication protocol asking the DM to specify bounds for the objective function values. These bounds are input into $(\mathrm{P}_{\lambda,g})$, defining subregions in which to carry on the search for nondominated solutions. Some objective space regions are progressively eliminated either by dominance or infeasibility.

Tchebycheff and Achievement Scalarizing Programs. Bowman [3] proved that the parameterization on w of $\min_{x \in X} \| \bar{f} - f(x) \|_w$ generates the nondominated set, where $w_i > 0$ for all i, $\sum_{i=1}^{k} w_i = 1$, $\bar{f}$ is a criterion point such that $\bar{f} > f(x)$ for all $x \in X$ and $\| \bar{f} - f(x) \|_w$ denotes the w-weighted Tchebycheff metric, that is, $\max_{1 \le i \le k} \{ w_i | \bar{f}_i - f_i(x) | \}$. This scalarizing program is equivalent to
\[
(T_w) \qquad
\begin{cases}
\min \ \alpha \\
\text{s.t. } w_i \bigl( \bar{f}_i - f_i(x) \bigr) \le \alpha, \quad 1 \le i \le k, \\
x \in X, \quad \alpha \ge 0.
\end{cases}
\]
$(T_w)$ may yield weakly nondominated solutions (for instance, point C in Fig. 1). Replacing the objective function in $(T_w)$ by $\alpha - \rho \sum_{i=1}^{k} f_i(x)$ with ρ a small positive value, all the solutions returned by this augmented weighted Tchebycheff program are nondominated. R.E. Steuer and E.-U. Choo [16] proved that there always exist values of ρ small enough to reach the whole nondominated set for the finite-discrete and polyhedral feasible region cases.

Fig. 2: Illustration of the augmented weighted Tchebycheff metric (weighted Tchebycheff contour, augmented weighted Tchebycheff contour, points C and C').

Concerning the MOMIP case, although there may be portions of the nondominated set that the program is unable to compute, even considering ρ very small (for example, the line segment from C to C' in Fig. 2, for a given ρ), this characterization is still possible in practice. Note that ρ can be set so small that the DM is unable to discriminate between those solutions and a nearby weakly nondominated solution (this corresponds to C' getting closer to C in Fig. 2).
In [16] and [15] a lexicographic weighted Tchebycheff program is proposed for the nonlinear and infinite-discrete feasible region cases to overcome this drawback of the augmented weighted Tchebycheff program. The lexicographic approach can also be applied to the mixed integer (linear) case. However, it is more difficult to implement since two stages of optimization are employed. At the first stage only α is minimized. When the first stage results in alternative optima, a second stage is required. It consists of minimizing $-\sum_{i=1}^{k} f_i(x)$ over the solutions that minimize α in order to eliminate the weakly nondominated solutions.
Besides $(T_w)$ (either the augmented or the lexicographic forms), there are other similar approaches that also allow one to characterize the nondominated set of multi-objective mixed integer programs. An approach of this type consists in discarding the w-vector or fixing it and varying $\bar{f}$, the criterion reference point that represents the DM's aspiration levels. This scalarizing program can be denoted by $(T_{\bar{f}})$. There always exist reference points satisfying $\bar{f} > f(x)$ for all $x \in X$, such that $(T_{\bar{f}})$ produces a particular nondominated solution $\tilde{z} = f(\tilde{x})$. The variation of $\bar{f}$ can be done according to a vector direction θ, leading to $(T_{\bar{f}+\theta})$. The reference points are thus projected onto the nondominated set. Reference points that do not satisfy the condition $\bar{f} > f(x)$ for all $x \in X$ may also be considered provided that the α variable is defined without sign restriction. This corresponds to the minimization of a distance from Z to the reference point if the latter is not attainable and to the maximization of such a distance if the reference point is attainable. If reference or aspiration levels are used as controlling parameters, the (weighted) Tchebycheff metric changes its form of dependence on controlling parameters and should
be interpreted as an achievement function [9].

Like $(T_w)$, the simplest form of $(T_{\bar f})$ may produce weakly nondominated solutions. The augmented form is a good substitute in practice and the lexicographic approach guarantees that all nondominated solutions can be reached. In what follows, let $(T_{\cdot})$ denote either the simplest, the augmented or the lexicographic form.

Scalarizing programs $(T_w)$, $(T_{\bar f})$ and their extensions or slightly different formulations are used to generate nondominated solutions in several (interactive) methods proposed in the literature, namely in the following ones.

Steuer and Choo [16] proposed a general purpose multi-objective programming interactive method that assumes an implicit DM's utility function without any special restriction on shape. The strategy of the interactive procedure is to sample series of progressively smaller subsets of nondominated solutions. At each interaction, the DM selects his/her preferred solution from a sample of nondominated solutions obtained from $(T_w)$ with several $w$-vectors and the ideal criterion point in the role of $\bar f$. The solution preferred by the DM provides information to tighten the set of $w$-vectors for the next interaction. The procedure terminates when a nondominated criterion point sufficiently close to the optimal criterion point of the underlying utility function is found.

Solanki's method [14], which is designed for bi-objective mixed integer linear programs, is an adaptation of the noninferior set estimation (NISE) method developed by J.L. Cohon for bi-objective linear programs. It seeks to generate a representative subset of nondominated solutions by combining the NISE's key features with weighted Tchebycheff scalarizing programs. At each iteration, a new nondominated solution, say $z^3$, is computed by solving $(T_w)$ for specific $w$ and $\bar f$, assuring that $z^3$ belongs to the region between a pair of nondominated criterion points previously determined, say $(z^1, z^2)$. This pair is then replaced by $(z^1, z^3)$ and $(z^3, z^2)$. The approximation of the nondominated surface is progressively improved, thus decreasing the 'errors' associated with the approximate representation of the pairs. This 'error' is measured by the largest range of the two objectives for the points forming the pair. The algorithm finishes when the maximum 'error' is lower than a predefined maximum allowable 'error'.

Another interactive method capable of solving MOMIP problems was developed by A. Durso [4]. This method employs a branching scheme considering progressively smaller portions of the nondominated set by imposing lower bounds on the criterion values. At each interaction, the $k$ nondominated solutions that define the (quasi)ideal criterion point for each new node are calculated. The DM is then asked to select the node for branching by choosing the preferred ideal point. The branching process begins by solving an equally weighted augmented Tchebycheff program to determine a 'centralized' nondominated point for the subset of the node under exploration. Once the DM chooses the most preferred of the $k + 1$ nondominated points already known for this node, say $\tilde z$, up to $k$ new nodes (children) are created. Each child inherits its parent's bounding constraints and uses $\tilde z$ to further restrict one of them. Thus, the $i$th child restricts the $i$th criterion by imposing $f_i(x) \ge \tilde z_i + \delta$ with $\delta$ small positive. This approach may be regarded as an open communication procedure that terminates when the DM is satisfied with the incumbent solution (the preferred nondominated solution obtained so far).

M.J. Alves and J. Climaco [2] proposed a MOMILP open communication interactive approach. It combines the Tchebycheff theory with the traditional branch and bound technique for solving single-objective mixed integer programs. At each interaction, the DM specifies either a reference point $\bar f$, which is input in $(T_{\bar f})$ to compute a nondominated solution via branch and bound, or just selects an objective function, say $f_j$, he/she wants to improve with respect to the previous nondominated solution. In the latter case, the reference point is automatically adjusted by increasing the $j$th component of $\bar f$, keeping the others equal, in order to produce new nondominated solutions (directional search) more suited to the DM's preferences. This involves an iterative process of sensitivity analysis and operations to update the branch and bound tree. The sensitivity analysis takes advantage of the special behavior of the parametric scalarizing program $(T_{\bar f + \theta})$. It returns a
value $\bar\theta_j > 0$ such that the structure of the previous branch and bound tree remains unchanged for variations in $\bar f_j$ up to $\bar f_j + \bar\theta_j$. Therefore, reference points $\bar f + \theta = (\bar f_1, \ldots, \bar f_j + \theta_j, \ldots, \bar f_k)$ with $\theta_j \le \bar\theta_j$ lead to nondominated solutions that may be obtained in a straightforward way. If the DM wishes to continue the search in the same direction, a slight increase over $\bar\theta_j$, say $\bar\theta_j + \varepsilon$, is first considered. In this case, the previous sensitivity analysis also returns the best candidate node, i.e., an ancestor of the node that will produce the next nondominated solution. The previous branch and bound tree is thus used to proceed to the next computations. Since further branching is usually required, an attempt is made to simplify the tree before enlarging it. The underlying idea is to avoid an ever-growing tree. This simplification means cutting off parts of the tree linked by branching constraints no longer active. In sum, this approach brings together sensitivity analysis phases meant to adjust the reference point and simplification/branching operations of the search tree to compute nondominated solutions. This process is repeated as long as the DM wishes to continue the directional search or if the reference point has not been adjusted enough to yield a nondominated solution different from the previous one (a situation that occurs more often in all-integer programs than in mixed integer models). Computational experiments have shown that this multi-objective approach succeeds in performing directional searches. The times of computing phases using simplification/branching operations have been significantly reduced by this strategy.

Some researchers have developed other methods for multi-objective integer programming that are also applicable to the mixed integer case. Good examples of such approaches are those in [17], [10] and [7]. In our opinion, they all are open communication procedures that share some key features, namely the concept of projecting a reference direction onto the nondominated surface (although this procedure is used in different ways) and the type of information required about the DM's preferences. This information lies fundamentally in the specification of aspiration levels for the objective function values (reference points). Some of these approaches are continuous/integer ([7], [10]), working almost all the time with nondominated continuous solutions of the linear relaxation of the problem. Whenever the DM finds a satisfactory continuous solution, an integer nondominated solution close to it is then computed.

Conclusions and Future Developments. Most methods developed so far for MOMIP problems require an excessive amount of computational effort, or require too much cognitive load from the DM, or only address bi-objective problems. In addition, computational experience with real-world applications is lacking. Although interesting or promising approaches have been developed, further research efforts must be made in order to build effective interactive methods able to handle real-sized problems.

See also: Mixed integer classification problems; Integer programming; Simplicial pivoting algorithms for integer programming; Set covering, packing and partitioning problems; Time-dependent traveling salesman problem; Graph coloring; Integer programming duality; Integer programming: Lagrangian relaxation; Integer programming: Algebraic methods; Integer programming: Branch and bound methods; Integer programming: Branch and cut algorithms; Integer programming: Cutting plane algorithms; Integer linear complementary problem; LCP: Pardalos-Rosen mixed integer formulation; Decomposition techniques for MILP: Lagrangian relaxation; Multi-objective integer linear programming; Multiparametric mixed integer linear programming; Parametric mixed integer nonlinear optimization; Stochastic integer programming: Continuity, stability, rates of convergence; Stochastic integer programs; Branch and price: Integer programming with column generation.

References
[1] AKSOY, Y.: 'An interactive branch-and-bound algorithm for bicriterion nonconvex/mixed integer programming', Naval Res. Logist. 37 (1990), 403-417.
[2] ALVES, M.J., AND CLIMACO, J.: 'An interactive reference point approach for multiobjective mixed-integer programming using branch-and-bound', Europ. J. Oper. Res. 124, no. 3 (2000), 478-494.
[3] BOWMAN, V.J.: 'On the relationship of the Tchebycheff norm and the efficient frontier of multiple-criteria objectives', in H. THIRIEZ AND S. ZIONTS (eds.): Multiple Criteria Decision Making, Vol. 130 of Lecture Notes Economics and Math. Systems, Springer, 1976, pp. 76-86.
[4] DURSO, A.: 'An interactive combined branch-and-bound/Tchebycheff algorithm for multiple criteria optimization', in A. GOICOECHEA, L. DUCKSTEIN, AND S. ZIONTS (eds.): Multiple Criteria Decision Making, Proc. 9th Internat. Conf., Springer, 1992, pp. 107-122.
[5] FERREIRA, C., SANTOS, B.S., CAPTIVO, M.E., CLIMACO, J., AND SILVA, C.C.: 'Multiobjective location of unwelcome or central facilities involving environmental aspects: A prototype of a decision support system', Belgian J. Oper. Res., Statist. and Computer Sci. 36, no. 2-3 (1996), 159-172.
[6] FEYERABEND, P.: Against method, Verso, 1975.
[7] KARAIVANOVA, J., KORHONEN, P., NARULA, S., WALLENIUS, J., AND VASSILEV, V.: 'A reference direction approach to multiple objective integer linear programming', Europ. J. Oper. Res. 81 (1995), 176-187.
[8] KARWAN, M.H., ZIONTS, S., VILLARREAL, B., AND RAMESH, R.: 'An improved interactive multicriteria integer programming algorithm', in Y. HAIMES AND V. CHANKONG (eds.): Decision Making with Multiple Objectives, Vol. 242 of Lecture Notes Economics and Math. Systems, Springer, 1985, pp. 261-271.
[9] LEWANDOWSKI, A., AND WIERZBICKI, A.: 'Aspiration based decision analysis and support. Part I: Theoretical and methodological backgrounds', WP-88-03, Internat. Inst. Appl. Systems Anal. (IIASA), Austria (1988).
[10] NARULA, S.C., AND VASSILEV, V.: 'An interactive algorithm for solving multiple objective integer linear programming problems', Europ. J. Oper. Res. 79 (1994), 443-450.
[11] RAMESH, R., ZIONTS, S., AND KARWAN, M.H.: 'A class of practical interactive branch and bound algorithms for multicriteria integer programming', Europ. J. Oper. Res. 26 (1986), 161-172.
[12] ROY, B.: 'Meaning and validity of interactive procedures as tools for decision making', Europ. J. Oper. Res. 31 (1987), 297-303.
[13] SOLAND, R.M.: 'Multicriteria optimization: A general characterization of efficient solutions', Decision Sci. 10 (1979), 26-38.
[14] SOLANKI, R.: 'Generating the noninferior set in mixed integer biobjective linear programs: An application to a location problem', Comput. Oper. Res. 18, no. 1 (1991), 1-15.
[15] STEUER, R.: Multiple criteria optimization: Theory, computation and application, Wiley, 1986.
[16] STEUER, R., AND CHOO, E.-U.: 'An interactive weighted Tchebycheff procedure for multiple objective programming', Math. Program. 26 (1983), 326-344.
[17] VASSILEV, V., AND NARULA, S.C.: 'A reference direction algorithm for solving multiple objective integer linear programming problems', J. Oper. Res. Soc. 44, no. 12 (1993), 1201-1209.
[18] VILLARREAL, B., KARWAN, M.H., AND ZIONTS, S.: 'An interactive branch and bound procedure for multicriterion integer linear programming', in G. FANDEL AND T. GAL (eds.): Multiple Criteria Decision Making: Theory and Application, Vol. 177 of Lecture Notes Economics and Math. Systems, Springer, 1980, pp. 448-467.

Maria João Alves
Fac. Economics Univ. Coimbra and INESC
Coimbra, Portugal
E-mail address: mjoao@inescc.pt
João Climaco
Fac. Economics Univ. Coimbra and INESC
Coimbra, Portugal

MSC2000: 90C29, 90C11
Key words and phrases: multi-objective mathematical programming, multicriteria analysis, interactive method.

MULTI-OBJECTIVE OPTIMIZATION AND DECISION SUPPORT SYSTEMS

Multiple criteria decision making (MCDM) refers to the explicit incorporation of more than one evaluation criterion into a decision problem. MCDM has been a very active field of research roughly since the 1970s. Although boundaries might be fuzzy and overlapping, multicriteria decision analysis (studying the problem of identifying the 'most-preferred' among a finite discrete set of alternatives), multi-attribute utility theory (using utility functions explicitly to model a decision maker's preferences) and multi-objective optimization (modeling the decision problem within a mathematical programming framework) have emerged as major fields of interest under MCDM. For more information on the general field of MCDM, see [21].

Multi-objective mathematical programming provides a flexible modeling framework that allows for simultaneous optimization of more than one objective function over a feasible set. Mathematically, the multi-objective optimization problem can be expressed as:
Multi-objective optimization and decision support systems
Perhaps the most straight-forward way of approaching the (MOO) problem is as in vector optimization methods. Also referred to as posterior methods, these methods are based on the sole assumption that the decision maker prefers more to less in each objective function in (MOO); hence they propose identifying all of the efficient solutions of (MOO) and presenting them to the decision maker for the identification of the most-preferred solution. Along with theoretical findings [11], [2], some vector optimization methods have been proposed; however, the methods have not gained practical recognition in general. The failure in the implementation of the proposed methods can be explained by the heavy computational requirements of these methods. Perhaps a more important factor is the difficulty of presenting the efficient set in a 'legible' way to the decision maker. Furthermore, as the efficient set is usually continuous when the feasible region is, the task of identifying the most-preferred solution is a monstrous one attributed to the decision maker.

methods that rely on simplex-like procedures or parametric searches that incorporate book-keeping mechanisms based on the fact that the set of efficient extreme points is connected. A well-known procedure that solves (MOLP) for all of its extreme points is ADBASE, which was developed by R.E. Steuer [19].

EXAMPLE 1 Consider the MOLP problem [18]:
\[
\begin{array}{ll}
\max & x_1,\ x_2,\ x_3 \\
\text{s.t.} & 2x_1 + 3x_2 + 4x_3 \le 12 \\
 & 4x_1 + x_2 + x_3 \le 8 \\
 & x_1,\ x_2,\ x_3 \ge 0.
\end{array}
\tag{1}
\]

[Figure for Example 1: axes $x_1$ and $x_3$; labels $E_1$, $e_1$, $e_2$.]
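The extreme points of the feasible region in Example 1 can be enumerated directly, since the problem has only three variables and five constraints. The Python sketch below is an illustration and not the ADBASE procedure: it intersects every triple of constraint hyperplanes with exact rational arithmetic, keeps the feasible intersections, and then screens them by pairwise dominance. The function names are hypothetical, and pairwise screening among vertices is in general only a necessary check, not a proof of efficiency.

```python
from fractions import Fraction
from itertools import combinations

# Feasible region of problem (1) written as rows a.x <= b;
# the last three rows encode x1, x2, x3 >= 0.
A = [(2, 3, 4), (4, 1, 1), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
b = [12, 8, 0, 0, 0]


def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b_, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b_ * (d * i - f * g) + c * (d * h - e * g)


def solve3(rows, rhs):
    """Solve a 3x3 linear system by Cramer's rule; return None if singular."""
    d = det3(rows)
    if d == 0:
        return None
    x = []
    for j in range(3):
        m = [tuple(rhs[r] if c == j else rows[r][c] for c in range(3))
             for r in range(3)]
        x.append(Fraction(det3(m), d))
    return tuple(x)


def extreme_points():
    """Intersect every triple of constraints and keep the feasible points."""
    found = set()
    for idx in combinations(range(len(A)), 3):
        x = solve3([A[i] for i in idx], [b[i] for i in idx])
        if x is not None and all(
            sum(a * xi for a, xi in zip(row, x)) <= bi
            for row, bi in zip(A, b)
        ):
            found.add(x)
    return sorted(found)


def dominates(u, v):
    """u dominates v when u >= v componentwise and u != v (maximization)."""
    return u != v and all(ui >= vi for ui, vi in zip(u, v))


vertices = extreme_points()
candidates = [v for v in vertices if not any(dominates(u, v) for u in vertices)]
for v in candidates:
    print(tuple(str(c) for c in v))
```

For this instance the screening leaves five of the six vertices, and each of them maximizes a strictly positive weighted sum of the objectives (for example, the weights (6, 4, 5) obtained by adding the two constraint rows), which is sufficient for efficiency.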
studied by the decision maker. Moreover, extreme efficient points may not carry the trade-off information well since some portions of the efficient set may end up being over-emphasized whereas some regions are highly missed. Indeed, there is no reason for a decision maker to be solely interested in extreme point efficient solutions. The attractiveness of efficient extreme points mostly lies in their mathematical properties. With this motivation, a method that applies to a general set of (MOO) problems has been suggested to find globally-representative subsets of the efficient set.

Working in the Outcome Space. The outcome set $Y = \{y \in \mathbb{R}^p : y = f(x) \text{ for some } x \in X\}$ helps redefine an equivalent problem to (MOO) in $p$-dimensional outcome space:
\[
(\mathrm{MOO}_O) \qquad \max\ y \quad \text{s.t.}\quad y \in Y.
\]
As the number of objectives $p$ is usually much less than the number of variables $n$, the structure of $Y$ is simpler than that of $X$ [4], [8]. The ability to work directly with $(\mathrm{MOO}_O)$ thus has the potential of providing significant computational benefits that vector optimization algorithms have tried to realize [3].

Reflections on Optimization Trends. As a field within the general field of optimization, multi-objective optimization is naturally affected by the trends that become dominant in optimization. Consequently, interior point methods, genetic algorithms and neural networks have been applied to the (MOO) problem in various ways. As there are difficult problems under (MOO) that cannot yet be practically solved, new developments in the general field of optimization constitute a potential to solve these problems.

Nonlinear and Integer Problems. Most of the algorithms proposed to solve problem (MOO) concentrate on the fully linear case. In general, when nonlinearities are introduced, the efficient solutions and the efficient set become difficult to characterize. There are some algorithms that allow for nonlinearities in the objective functions, and in the constraints that define the feasible region, but usually in a conservative way so as to retain some computational tractability. Similarly, the multiple objective integer programming problem is a very difficult one to solve due to the additional complications related to integrality.

Applications. Along with what one can call 'case studies', certain applications that are more generic than a case study but more specific than problem (MOO) itself have appeared. Typical examples include, but are not limited to, bicriteria network optimization problems, bicriteria knapsack problems, and multicriteria scheduling problems. Since usually these are problems that naturally involve multiple criteria, the methods developed for these problems have practical implications. Most of the methods developed can be categorized under a priori methods. A typical approach is to form a weighted combination of the objective functions. Recently, interactive and vector optimization approaches that deal with similar problems have also appeared.

A Related Optimization Problem. A related problem is the problem of optimizing a function $g\colon \mathbb{R}^n \to \mathbb{R}$ over the efficient set $X_E$. This can be a difficult global optimization problem depending on the properties of the objective function $g$. The problem is motivated in different ways. Sometimes, in certain settings, a function that is to serve as a pseudo utility function is available. Then optimizing this pseudo utility function over the efficient set in a sense corresponds to solving problem (MOO) itself. In addition, when $g$ becomes one of the objective functions, then solving this problem provides the range of values the objective function takes over the efficient set. This information is valuable for a decision maker who is trying to make assessments to solve a problem and is used in some of the interactive algorithms. The difficulty of the problem has also resulted in heuristic solution approaches.

Trends. The advances in information technology affect the field of multiple criteria decision making heavily. Faster computers and parallel processing opportunities make it timewise feasible to solve
optimization problems that would be deemed impractical in the past. Improved graphical capabilities make it feasible to accommodate sophisticated user interfaces to invite the decision maker in the problem solving process more actively and reliably. The developments in the World Wide Web present many opportunities to explore for individual and group decision support. At this point in time, there is still a need to solve the MOO problem in a rigorous, user-friendly and creative way. The decision support systems that enable the involvement of the decision maker in modeling and problem solving practically seem to be the way of solving (MOO) problems. The vector optimization approaches can also benefit from a decision support framework in their effort to help the decision maker identify a most-preferred solution.

See also: Multi-objective optimization: Pareto optimal solutions, properties; Multi-objective optimization: Interactive methods for preference value functions; Multi-objective optimization: Lagrange duality; Multi-objective optimization: Interaction of design and control; Outranking methods; Preference disaggregation; Fuzzy multi-objective linear programming; Preference disaggregation approach: Basic features, examples from financial decision making; Preference modeling; Multiple objective programming support; Multi-objective integer linear programming; Multi-objective combinatorial optimization; Bi-objective assignment problem; Estimating data for multicriteria decision making problems: Optimization techniques; Multicriteria sorting methods; Financial applications of multicriteria analysis; Portfolio selection and multicriteria analysis; Decision support systems with multiple criteria.

References
[1] BENAYOUN, R., MONTGOLFIER, J. DE, TERGNY, J., AND LARICHEV, O.: 'Linear programming with multiple objective functions: Step method (STEM)', Math. Program. 1 (1971), 366-375.
[2] BENSON, H.P.: 'Existence of efficient solutions for vector maximization problems', J. Optim. Th. Appl. 26 (1978), 569-580.
[3] BENSON, H.P.: 'A hybrid approach for solving multiple objective linear programs in outcome space', J. Optim. Th. Appl. 98 (1998), 17-35.
[4] BENSON, H.P., AND LEE, D.: 'Outcome-based algorithm for optimizing over the efficient set of a bicriteria linear programming problem', J. Optim. Th. Appl. 88, no. 1 (1996), 77-105.
[5] BENSON, H.P., LEE, D., AND MCCLURE, J.P.: 'A multiple-objective linear programming model for the citrus rootstock selection problem in Florida', J. Multi-Criteria Decision Anal. 6 (1997), 1-13.
[6] BENSON, H.P., AND SAYIN, S.: 'Towards finding global representations of the efficient set in multiple objective mathematical programming', Naval Res. Logist. 44 (1997), 47-67.
[7] CHARNES, A., AND COOPER, W.W.: 'Goal programming and multiple objective optimization-Part 1', Europ. J. Oper. Res. 1 (1977), 39.
[8] DAUER, J.P., AND LIU, Y.-H.: 'Solving multiple objective linear programs in objective space', Europ. J. Oper. Res. 46 (1990), 350-357.
[9] ECKER, J.G., HEGNER, N.S., AND KOUADA, I.A.: 'Generating all maximal efficient faces for multiple objective linear programs', J. Optim. Th. Appl. 30 (1980), 353-381.
[10] GARDINER, L.R., AND STEUER, R.E.: 'Unified interactive multiple-objective programming - An open-architecture for accommodating new procedures', J. Oper. Res. Soc. 45, no. 12 (1994), 1456-1466.
[11] GEOFFRION, A.M.: 'Proper efficiency and the theory of vector maximization', J. Math. Anal. Appl. 22 (1968), 618-630.
[12] GEOFFRION, A.M., DYER, J.S., AND FEINBERG, A.: 'An interactive approach for multi-criterion optimization with an application to the operations of an academic department', Managem. Sci. 19 (1972), 357-368.
[13] HWANG, C.L., AND MASUD, A.S.M.: Multiple objective decision making-methods and applications, A state of the art survey, Lecture Notes Economics and Math. Systems, Springer, 1979.
[14] KEENEY, R.L., AND RAIFFA, H.: Decisions with multiple objectives: Preferences and value tradeoffs, Wiley, 1976.
[15] KORHONEN, P., AND LAAKSO, J.: 'A visual interactive method for solving the multiple criteria problem', Europ. J. Oper. Res. 24 (1986), 277-287.
[16] KORHONEN, P., MOSKOWITZ, H., AND WALLENIUS, J.: 'Choice behavior in interactive multiple criteria decision making', Ann. Oper. Res. 23 (1990), 161-179.
[17] KORHONEN, P., AND WALLENIUS, J.: 'A Pareto race', Naval Res. Logist. 35 (1988), 615-623.
[18] SAYIN, S.: 'An algorithm based on facial decomposition for finding the efficient set in multiple objective linear programming', Oper. Res. Lett. 19 (1996), 87-94.
[19] STEUER, R.E.: 'Operating manual for the ADBASE multiple objective linear programming package', Techn. Report College Business Admin. Univ. Georgia,
Multi-objective optimization: Interaction of design and control
MULTI-OBJECTIVE OPTIMIZATION: INTERACTION OF DESIGN AND CONTROL

Traditionally, process design and process control are treated sequentially. Dynamics are not considered during the design phase, and flowsheet changes cannot be made during the control phase. The problem with this approach is that the two are inherently connected, as the design of the process affects its controllability. Thus, the steady state design and the dynamic operability issues should be treated simultaneously. Analyzing the interaction of design and control addresses the issue of quantitatively determining the trade-offs between the steady state economics and the dynamic controllability.

The interaction of design and control problem is to determine the process flowsheet which is both economically optimal and controllable. There are different methods for addressing this problem. One common approach is to use overdesign where, once the economic steady state design is determined, surge tanks are added or equipment sizes are increased in order to handle any dynamic problems which may arise. This overdesign is usually based on heuristic rules and will likely move the design away from its economic optimum. There is no guarantee that the measures taken will even

Multi-Objective Optimization. In any decision making process, the goal is to reach the best compromise solution among a number of competing objectives. Many examples of competing objectives exist in the field of engineering. For example, in the design of a process, one may have to consider safety and operational issues as well as economic issues. A decision making process is necessary when the most economic design is not the safest or most operable.

The best compromise solution depends on the relative importance of the conflicting objectives. This relative importance is not easily determined and is usually a subjective decision. The one responsible for making this decision is the decision maker (DM), whose choice can be based on a number of factors. Since subjective measures and decisions do not translate well into mathematics, a quantitative way of determining the trade-offs and relative importance among the objectives is necessary for a multi-objective optimization framework.

Multi-Objective Framework for the Interaction of Design and Control. In analyzing the interaction of design and control, the objectives that are considered measure the steady state economics and the dynamic controllability of the pro-
1) Process representation;
2) Mathematical modeling;
3) Generation of noninferior solution set (determine trade-offs);
4) Best-compromise examination.
[Figure label: Noninferior Solution Set]
material and energy balances, thermodynamic relations, and other constraints. The controllability measures are included in the formulation as $\eta$. The variables in this problem are partitioned as continuous $x$ and binary $y$.

Solution of the MOP. One way to address the solution of the MOP is to formulate it using a utility function $U$ which implicitly relates the multiple objectives in terms of some common basis:
\[
\begin{array}{ll}
\min & U[J(x, y)] \\
\text{s.t.} & h(x, y) = 0 \\
 & g(x, y) \le 0 \\
 & x \in \mathbb{R}^p \\
 & y \in \{0, 1\}^q.
\end{array}
\tag{2}
\]
By introducing the utility function, the vector optimization problem has been reduced to a scalar optimization problem and MINLP techniques can be applied to solve the problem. These MINLP techniques include generalized Benders decomposition (GBD) [4], [14], outer approximation (OA) [2], outer approximation with equality relaxation (OA/ER) [8], and outer approximation with equality relaxation and augmented penalty (OA/ER/AP) [16]. These methods are discussed in detail in [3].

With the definition of the noninferior solution set, the optimization problem can be formulated as
\[
\begin{array}{ll}
\min & U[J(x, y)] \\
\text{s.t.} & a(J) = 0.
\end{array}
\tag{3}
\]

the noninferior solution set, determining the utility function based on information from the DM, and determining the best-compromise solution.

Different techniques have been developed in order to assess the trade-offs among the objectives quantitatively. See [7] for a tutorial in multi-objective optimization. A review is also available in [17]. Many of the fundamental aspects of multi-objective optimization can be found in [1].

Noninferior Solution Sets. The noninferior solution set can be determined in a number of ways. One approach is to formulate the problem as
\[
\begin{array}{ll}
\min & \sum_{i \in I} w_i J_i(x, y) \\
\text{s.t.} & h(x, y) = 0 \\
 & g(x, y) \le 0 \\
 & x \in \mathbb{R}^p \\
 & y \in \{0, 1\}^q,
\end{array}
\tag{4}
\]
where the weights $w_i$ are selected such that $w_i \ge 0$ for all $i$ and $\sum_{i \in I} w_i = 1$. Through a suitable choice of the weights, the noninferior solution set can be found. This approach can miss some points in the noninferior solution set if the solution region is nonconvex. In order to address this problem, a weighted norm can be used as follows:
\[
\begin{array}{ll}
\min & \sum_{i \in I} [w_i J_i(x, y)]^p \\
\text{s.t.} & h(x, y) = 0 \\
 & g(x, y) \le 0.
\end{array}
\tag{5}
\]
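The remark that the weighted sum (4) can miss noninferior points when the solution region is nonconvex, while a weighted norm as in (5) can still reach them, can be checked on a small finite set of objective vectors. The Python sketch below is an illustration only: the three objective vectors, the weight grid and the exponent $p = 4$ are made-up assumptions, both objectives are taken to be nonnegative and minimized, and the constraints $h$ and $g$ are replaced by a plain enumeration of candidate designs.

```python
# Objective vectors (J1, J2) of three hypothetical feasible designs; both
# objectives are minimized and nonnegative. (0.6, 0.6) is noninferior but
# lies in a nonconvex part of the trade-off curve.
designs = [(0.0, 1.0), (1.0, 0.0), (0.6, 0.6)]


def weighted_norm(J, w, p=1):
    """Objective of (5); with p = 1 it reduces to the weighted sum in (4)."""
    return sum((wi * Ji) ** p for wi, Ji in zip(w, J))


def sweep(p, steps=20):
    """Collect the minimizers found while w1 runs over a grid, w2 = 1 - w1."""
    found = set()
    for k in range(steps + 1):
        w = (k / steps, 1.0 - k / steps)
        found.add(min(designs, key=lambda J: weighted_norm(J, w, p)))
    return found


by_sum = sweep(p=1)    # weighted sum, formulation (4)
by_norm = sweep(p=4)   # weighted norm, formulation (5) with p = 4
print(sorted(by_sum))
print(sorted(by_norm))
```

On this data no weight vector makes the weighted sum select (0.6, 0.6), because its value 0.6 always exceeds the smaller of the two weights, whereas equal weights with $p = 4$ do select it.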
The advantage of this formulation is that the weights have a physical meaning for the DM. If the DM knows the desired values for each objective for a given noninferior point, the weights can be set to the reciprocal of these values. The noninferior solution will be the one that is most like the one with the values specified by the DM. The disadvantage of this formulation is that it can be difficult to solve.

Another way to determine the noninferior solution set is through the $\varepsilon$-constraint method [6]. In this approach, all but one of the objectives are incorporated into the problem as constraints of the form less than or equal to $\varepsilon$. This results in the following formulation:
\[
\begin{array}{ll}
\min & J_1(x, y) \\
\text{s.t.} & J_i(x, y) \le \varepsilon_i, \quad i = 2, \ldots, q, \\
 & h(x, y) = 0 \\
 & g(x, y) \le 0 \\
 & x \in \mathbb{R}^p \\
 & y \in \{0, 1\}^q.
\end{array}
\tag{7}
\]
By varying the values of $\varepsilon_i$, the points of the noninferior solution set can be found.

Choosing the Best-Compromise Solution. To this point, the focus has been on determining the noninferior solution set. Only one of the points can be chosen as the best solution for the problem, and the task of the DM is to determine this point. Once the noninferior solution set is determined, it is presented to the DM who will choose the solution point he prefers. The selection of this point is based on the relative importance of the objectives in the eyes of the decision maker.

Instead of assigning arbitrary weights to the various objectives, a systematic approach can be applied which uses the trade-off information in the noninferior solution set. The slope of the noninferior solution set at any point reveals how much one objective will be improved at the expense of another objective. This information is used in an interactive, iterative cutting plane algorithm to determine the best compromise solution.

Cutting Plane Algorithm. The cutting plane algorithm described in [11] is based on [5] and [10]. Marginal rates of substitution were used to solve problems of the form (2) where $U$ is unknown, convex, and continuously differentiable. Due to convexity, the partial derivatives of $U$ with respect to each of the arguments in the objective space are positive. This is expressed mathematically as
\[
\frac{\partial U(J)}{\partial J_i} > 0.
\]
Thus, a decrease in $J_i$ will lead to a decrease in $U$. In the interactive scheme, the DM is asked for the positive trade-off weights, $w^k$, for a given solution $k$. This weight is defined as the ratio of the change in the utility function with respect to one function divided by the change in the utility function with respect to another. This is expressed mathematically as
\[
w_i^k = \frac{\partial U(J^k) / \partial J_i}{\partial U(J^k) / \partial J_1},
\]
where $J^k = [J_1(x^k, y^k), \ldots, J_q(x^k, y^k)]$. A line search along a feasible direction of steepest descent locates an improved solution for the next iteration.

By exploiting the fact that the utility function is convex, cutting planes can be introduced to reduce the search to improving directions [10]. Since $U$ is convex,
\[
0 \ge U(J^*) - U(J^k) \ge \nabla_J U(J^k)(J^* - J^k).
\tag{8}
\]
\[
\begin{array}{ll}
\min & \nabla_J U(J^k)(J - J^k) \\
\text{s.t.} & h(x, y) = 0 \\
 & g(x, y) \le 0 \\
 & x \in \mathbb{R}^p \\
 & y \in \{0, 1\}^q.
\end{array}
\]
This involves the linearization in the objective space around the point $J^k$. If the solution to the minimization is zero, then the optimal solution $J^*$ has been found. If the solution has a negative value, then the direction leads to an improvement in the objective space. This minimization can be performed over a number of points $k = 1, \ldots, K$ to find a direction which improves all of them. Cutting planes in the objective space are formed to find new values of the objectives which improve the utility function according to the trade-off weights, $\nabla U$, which the DM provides. At each iteration of the algorithm, the following problem must be
Multi-objective optimization: Interactive methods for preference value functions
recycle system', Computers Chem. Engin. 18, no. 10 (1994), 971-994.
[13] PALAZOGLU, A., AND ARKUN, Y.: 'A multiobjective approach to design chemical plants with robust dynamic operability characteristics', Computers Chem. Engin. 10, no. 6 (1986), 567-575.
[14] PAULES, IV, G.E., AND FLOUDAS, C.A.: 'APROS: Algorithmic development methodology for discrete-continuous optimization problems', Oper. Res. 37, no. 6 (1989), 902-915.
[15] SCHWEIGER, C.A., AND FLOUDAS, C.A.: 'Interaction of design and control: Optimization with dynamic models', in W.W. HAGER AND P.M. PARDALOS (eds.): Optimal Control: Theory, Algorithms, and Applications, Kluwer Acad. Publ., 1997, pp. 388-435.
[16] VISWANATHAN, J., AND GROSSMANN, I.E.: 'A combined penalty function and outer approximation method for MINLP optimization', Computers Chem. Engin. 14, no. 7 (1990), 769-782.
[17] ZIONTS, S.: 'Methods for solving management problems involving multiple objectives', Working Paper SUNY at Buffalo (1979).

Carl A. Schweiger
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: carl@titan.princeton.edu

Christodoulos A. Floudas
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: floudas@titan.princeton.edu

MSC2000: 90C29, 90C11, 90C90
Key words and phrases: interaction of design and control, multi-objective optimization, mixed integer nonlinear optimization, Pareto optimal solution.


MULTI-OBJECTIVE OPTIMIZATION: INTERACTIVE METHODS FOR PREFERENCE VALUE FUNCTIONS

The multi-objective optimization (multiple criteria decision making) problem is the problem of choosing a most preferred solution when two or more incommensurate, conflicting objective functions (criteria) are to be simultaneously maximized. Interest in multi-objective optimization has risen sharply during the past 30 years. There are at least three reasons for this. First, and most importantly, there is increasing recognition that most applied problems in both the private and public sectors involve multiple objectives rather than one objective. Second, a variety of solution algorithms for multi-objective optimization are now available. Finally, the enormous improvements in the speed and storage of computers make it practical to apply these algorithms to the solution of realistically sized problem applications.

Formally, the statement of the multi-objective optimization problem of interest here is

\[
\begin{aligned}
\text{VMAX}\ & f(x) = [f_1(x), \dots, f_p(x)], \\
\text{s.t.}\ & x \in X.
\end{aligned} \tag{V}
\]

Here, $p \ge 2$, $X$ is a nonempty subset of $R^n$, each $f_j$, $j = 1, \dots, p$, is a real-valued function defined on $X$ or on some suitable set containing $X$, and VMAX indicates that, in some unspecified sense, we are to 'vector maximize' the vector $f(x)$ of objective functions (criteria) over $X$. The set $X$ is called the set of decision alternatives or the decision set, and $\{f(x) \in R^p : x \in X\}$ is called the outcome set.

There are a large number of diverse solution algorithms for problem (V). All are intended to help the decision maker (DM) find a most preferred solution to the problem. In the majority of these algorithms, the notion of efficiency plays an indispensable role. An efficient (nondominated, noninferior, Pareto optimal) solution for problem (V) is a solution $\bar{x} \in X$ such that there exists no other solution $x \in X$ that satisfies $f(x) \ge f(\bar{x})$ and $f(x) \ne f(\bar{x})$. Let $X_E$ denote the set of efficient solutions for problem (V). Notice that if $\bar{x} \in X_E$, then there is no other feasible solution for problem (V) that achieves at least as large a value as $\bar{x}$ in each criterion of the problem and a strictly larger value than $\bar{x}$ in at least one criterion of the problem.

In the great majority of instances of problem (V), the preference value function (value function) $v$ of the DM is unknown. This is a function $v : R^p \to R$ that maps the outcomes of problem (V) to real numbers in such a way that for any two outcomes $y^1$ and $y^2$, the DM prefers $y^1$ to $y^2$ if and only if $v(y^1) > v(y^2)$. Although $v$ is unknown, what is known is that for each objective function $f_j$, the DM prefers more of $f_j$ to less of $f_j$. Mathematically, this means that $v$ is coordinatewise increasing, i.e., that whenever $\bar{z}, z \in R^p$ satisfy $\bar{z} \ge z$ and $\bar{z}_j > z_j$ for some $j = 1, \dots, p$, then $v(\bar{z}) > v(z)$. It is easy to show that when $v$ is coordinatewise increasing, any maximizer $x^*$
of $v[f(x)]$ over $X$ must satisfy $x^* \in X_E$. In other words, as long as the DM prefers more to less, the search for a most preferred solution to problem (V) can be confined to $X_E$. This is one of the key reasons that the concept of efficiency is so important to the majority of the algorithms for problem (V).

The interactive methods constitute one of the most popular categories of algorithms for solving problem (V). An interactive method for problem (V) consists of a sequence of DM-computer interactions designed to create a sequence of decision alternatives that terminates with a most preferred solution to the problem. In a majority of cases, the generated alternatives are efficient. Each iteration of the interactive process consists of three steps. First, an initial solution is found with the aid of the computer. Typically, this solution is found by solving a single-objective optimization problem that generates either an efficient point or, at worst, a feasible point. Next, the DM is asked to react to the generated point by answering one or more questions involving his preferences for it. Last, based upon the answers given, the computer generates a new point, typically by modifying parameters in the single-objective optimization problem. This process continues until either the computer or the DM identifies a most preferred solution. The value function $v$ of the DM is never needed and, in fact, is assumed to be unavailable.

There are several advantages to using interactive methods as compared to other categories of methods for problem (V). For instance, the preference information asked of the DM at each iteration is not difficult to supply. Furthermore, the DM thereby learns about his value function, which is often initially vague or mostly unknown. As the search continues, the DM also learns about the decision or efficient decision alternatives available and the trade-offs in the objective functions across these decision alternatives. The optimizations required of the computer are also usually not difficult to perform. Finally, because the DM is highly involved in the process, his confidence in the most preferred solution that is eventually found is enhanced.

A frequent criticism of the interactive methods is that, in practice, the work required of the DM during the iterations seems to be burdensome for him in many cases. This may cause the DM to prematurely terminate the search so that a most preferred solution is not found.

There are literally hundreds of interactive algorithms for problem (V). Many are limited to cases where problem (V) is a multiple objective linear programming problem. Others apply when problem (V) is a multiple objective convex, nonlinear programming problem, a multiple objective integer programming problem, or some other type of multiple objective optimization problem. Instead of examining these algorithms individually, we will describe them by groups according to the characteristics that they possess.

One of the key characteristics of the interactive algorithms concerns the type of information required of the DM at each iteration. For instance, at each iteration, the DM may be asked to intuitively assign or re-assign weights to the criteria according to his current assessment of their relative importance. R.E. Steuer [13] has shown some important stumbling blocks to this approach, however. Other algorithms may instead elicit relaxation quantities from the DM. In these cases, the DM is asked how much he would be willing to relax the level of one objective function in order to obtain possible improvements in the levels of other objective functions. Some of the oldest interactive algorithms use this approach [1], [9]. Still other types of algorithms ask the DM various types of trade-off questions. The trade-off questions are designed to obtain an estimate of the gradient of the value function of the DM at the current solution. This approach is also relatively old, but difficult for the DM to accomplish [5], [14]. Finally, a number of algorithms call for the DM to make paired comparisons at each iteration. In a paired comparison, the DM is given two solutions to compare and must give his preference for one or the other. Usually, the DM can accomplish this. But when the two solutions are quite similar, difficulties can arise [15]. In addition, algorithms that use paired comparisons can sometimes call for excessive numbers of these comparisons [12].

A second dimension where the interactive algorithms differ is in the approach used to explore the feasible region $X$ or the efficient set $X_E$. Some
algorithms use feasible direction methods [2]. In these algorithms, at each iteration, the direction to move from a point that was last found and the distance to move along the direction are determined with the aid of the DM. By moving along the direction by the specified amount, the next solution point is found. In many algorithms, all such points are efficient. In another group of algorithms, feasible region reduction is used to explore $X$ or $X_E$. As points in $X$ or in $X_E$ are examined in these methods, portions of $X$ are removed, usually via linear cuts. Another set of algorithms uses weighting space reduction. In these algorithms, a weighted sum of $f_j$, $j = 1, \dots, p$, is maximized at each iteration, thereby yielding a point in $X_E$. Based upon the DM's responses to these maximizations, portions of the weighting space are removed. Eventually, the portion of the weighting space remaining is so small that the DM can pick out the set of weights associated with a most preferred solution. Other approaches used to explore $X$ or $X_E$ include the trade-off cutting plane method [10], Lagrange multiplier methods, visual interactive methods (see, e.g. [7]), and the branch and bound method [8], among others. For further reading concerning these methods, see [3], [4], [6], [11], [12], [13].

Another way to group the interactive algorithms for problem (V) is according to whether or not they handle inconsistencies in the DM's preference responses. As human beings, DMs are prone to giving preference responses over the course of the solution procedure that imply inconsistencies such as asymmetries or intransitivities of preference. Some algorithms take no account of these possible inconsistencies and have been criticized for this [12]. Others attempt to reduce inconsistency by either minimizing the DM's cognitive burden or by incorporating tests for inconsistency that are used as the interactive solution process proceeds.

W.S. Shin and A. Ravindran [12] have compared various of the classes of interactive algorithms according to four criteria that are important in practice. These criteria are the DM's cognitive burden, the ease with which the single-objective optimizations called for can be used, implemented and solved, the handling of inconsistency, and the overall quality of the solution process and the answers obtained. Although preliminary, these comparisons seem to show the relative superiority of the weighting space reduction and other criterion weight space search methods, and of the visual interactive methods. Readers should note, however, that the rankings in the study are subjectively obtained by the authors [7].

For further general reading on interactive methods, see [2], [3], [4], [6], [11], [12], [13], [14].

See also: Multi-objective optimization: Pareto optimal solutions, properties; Multi-objective optimization: Lagrange duality; Multi-objective optimization: Interaction of design and control; Outranking methods; Preference disaggregation; Fuzzy multi-objective linear programming; Multi-objective optimization and decision support systems; Preference disaggregation approach: Basic features, examples from financial decision making; Preference modeling; Multiple objective programming support; Multi-objective integer linear programming; Multi-objective combinatorial optimization; Bi-objective assignment problem; Estimating data for multicriteria decision making problems: Optimization techniques; Multicriteria sorting methods; Financial applications of multicriteria analysis; Portfolio selection and multicriteria analysis; Decision support systems with multiple criteria.

References
[1] BENAYOUN, R., MONTGOLFIER, J., TERGNY, J., AND LARITCHEV, O.: 'Linear programming with multiple objective functions: Step method (STEM)', Math. Program. 1 (1971), 366-375.
[2] BENSON, H.P., AND AKSOY, Y.: 'Using efficient feasible directions in interactive multiple objective linear programming', Oper. Res. Lett. 10 (1991), 203-209.
[3] BUCHANAN, J.T., AND DAELLENBACH, H.G.: 'A comparative evaluation of interactive solution methods for multiple objective decision models', Europ. J. Oper. Res. 29 (1987), 353-359.
[4] EVANS, G.W.: 'An overview of techniques for solving multiobjective mathematical programs', Managem. Sci. 30 (1984), 1268-1282.
[5] GEOFFRION, A.M., DYER, J.S., AND FEINBERG, A.: 'An interactive approach for multicriterion optimiza-
Multi-objective optimization: Lagrange duality
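Associated with the primal problem (P), we consider the following two kinds of dual problems:

\[
(D_N) \qquad \bigcup_{\Lambda \in \mathcal{L}} Y_{S(\Lambda)}
\]

where

\[
Y_{S(\Lambda)} = \{ y \in R^p : f(x) + \Lambda g(x) \nleq y, \ \forall x \in X' \}
\]

and

\[
(D_J) \qquad \bigcup_{\mu > 0,\ \lambda \ge 0} Y_{H^-(\lambda, \mu)}
\]

where

\[
Y_{H^-(\lambda, \mu)} = \{ y \in R^p : \langle \mu, f(x) \rangle + \langle \lambda, g(x) \rangle \ge \langle \mu, y \rangle, \ \forall x \in X' \}.
\]

THEOREM 3 i) For any feasible $x$ in (P) and for any feasible $y$ in $(D_N)$ or $(D_J)$,

for some $x \in X'$
problem

\[
\sup_{\lambda \ge 0} \inf_{x \in X'} \langle \mu, f(x) \rangle + \langle \lambda, g(x) \rangle
\]

has at least one solution. □

\[
Y_G = \{ y : (0, y) \in G, \ 0 \in R^m, \ y \in R^p \}.
\]

On the other hand, J.W. Nieuwenhuis [7] suggested another normality condition:

DEFINITION 6 The primal problem (P) is said to be N-normal, if

\[
\operatorname{cl} Y_G = Y_{\operatorname{cl} G}.
\]

□

LEMMA 7 Slater's constraint qualification ($\exists \bar{x}$, $g(\bar{x}) < 0$) yields J-stability and N-normality. □

THEOREM 8 Suppose that $Y_G$ is closed, $\min_D (P) \ne \emptyset$, and the efficient solutions to (P) are all proper. Then, under the condition of J-stability,

\[
\min (P) = \max (D_N) = \max (D_J).
\]

□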
Multi-objective optimization: Pareto optimal solutions, properties
Here, $p \ge 2$, $X$ is a nonempty subset of $R^n$, each $f_j$, $j = 1, \dots, p$, is a real-valued function defined on $X$ or on a suitable set containing $X$, and VMAX indicates that we are to, in some as-yet unspecified sense, 'vector maximize' the vector

\[
f(x) = [f_1(x), \dots, f_p(x)]
\]

of objective functions (criteria) over $X$. The set $X$ is called the set of alternatives or the decision set.

Of all of the solution concepts proposed for helping the DM find a most preferred solution for problem (V), the concept of efficiency has proven to be of overriding importance. An efficient (Pareto optimal, noninferior, nondominated) solution for problem (V) is a point $\bar{x} \in X$ such that there exists no other point $x \in X$ that satisfies $f(x) \ge f(\bar{x})$ and $f(x) \ne f(\bar{x})$. Letting $X_E$ denote the set of all efficient points for problem (V), we see that whenever $\bar{x} \in X_E$, there is no other feasible point that does at least as well as $\bar{x}$ in all of the criteria for problem (V) and strictly better in at least one criterion. A point $\bar{x} \in X$ is called dominated when for some other point $x \in X$, $f(x) \ge f(\bar{x})$ and, for at least one $j = 1, \dots, p$, $f_j(x) > f_j(\bar{x})$. Thus, we have the alternate definition for efficiency that states that a point $\bar{x}$ is an efficient solution for problem (V) when $\bar{x} \in X$ and there are no other points in $X$ that dominate $\bar{x}$.

One of the reasons for the fundamental importance of the efficiency concept is that it has proven to be highly useful in a variety of algorithms for problem (V). Among these algorithms are the satisficing methods, compromise programming, most interactive methods, and the vector maximization method. The latter method, for instance, seeks to generate either all of $X_E$ or key parts of $X_E$. The generated set is shown to the DM. Then, based upon the DM's internal utility (or value) function, the DM chooses from the generated set a most preferred solution. For details concerning these methods for problem (V), see [7], [10], [12], [13], [14].

In some cases, it is useful to consider a slightly relaxed concept of efficiency called weak efficiency. A point $\bar{x} \in X$ is called a weakly efficient (weakly Pareto optimal, weakly noninferior, weakly nondominated) solution for problem (V) when there is no other point $x \in X$ such that $f(x) > f(\bar{x})$. Let $X_{WE}$ denote the set of all weakly efficient points for problem (V). Notice that $X_E$ is a subset of $X_{WE}$. In some cases of problem (V), such as when the objective functions are ratios of linear functions, it is easier to analyze and generate points in $X_{WE}$ than points in $X_E$.

Let $U$ represent a utility function defined on the space $R^p$ of the objective functions of problem (V). Suppose that $U$ is coordinatewise increasing, i.e., that whenever $\bar{z}, z \in R^p$ satisfy $\bar{z} \ge z$ and $\bar{z}_j > z_j$ for some $j = 1, \dots, p$, then $U(\bar{z}) > U(z)$. Suppose that $x^*$ is an optimal solution to the single objective problem

\[
\begin{aligned}
\max\ & U[f(x)], \\
\text{s.t.}\ & x \in X.
\end{aligned} \tag{S}
\]

Then $x^*$ must be an efficient solution for problem (V) (cf. [11]).

The property in the previous paragraph explains to a great extent why the concept of efficiency is of such fundamental value. The assumption that the utility function $U$ in the above paragraph is coordinatewise increasing implies that in problem (S), for each $j = 1, \dots, p$, more of $f_j$ is preferred to less of $f_j$. Thus, if we imagine that $U$ is the utility (or value) function of the DM over the objective function space of problem (V), then the previous paragraph implies that whenever the DM prefers more to less in each objective function of problem (V), any point that maximizes the DM's utility for $f(x)$ over $X$ must be an efficient point in problem (V). In short, as long as we know that the DM prefers more to less, we can confine the search for a most preferred solution to $X_E$. Although the utility function of the DM is generally not actually available, in virtually all applications the DM does, indeed, prefer more to less in each objective function of problem (V). Thus, in essentially all cases, any most preferred solution for problem (V) will be found in $X_E$.

Because of the central importance of efficiency, a great deal of effort has been made by researchers to delineate the properties of the efficient points and of the efficient set for problem (V). In what follows, we shall briefly highlight some of the most important of these properties.

Consider the single-objective optimization problem
\[
\begin{aligned}
\max\ & \sum_{j=1}^{p} w_j f_j(x), \\
\text{s.t.}\ & x \in X.
\end{aligned} \tag{W}
\]

Here, $w_j$, $j = 1, \dots, p$, are parameters, which are often thought of as weights associated with the objective functions $f_j$, $j = 1, \dots, p$, of problem (V). A number of so-called scalarization properties for efficient points of problem (V) are expressed in terms of problem (W). To present some of these, another efficiency concept, called proper efficiency, is needed. A point $x^\circ$ is said to be a properly efficient solution for problem (V) when $x^\circ \in X_E$ and, for some sufficiently large number $M$, whenever $f_i(x) > f_i(x^\circ)$ for some $i = 1, \dots, p$ and some $x \in X$, there exists some $j = 1, \dots, p$ such that $f_j(x) < f_j(x^\circ)$ and

\[
\frac{f_i(x) - f_i(x^\circ)}{f_j(x^\circ) - f_j(x)} \le M.
\]

In words, for each properly efficient solution of problem (V), for each criterion, the possible marginal gains in that criterion relative to the losses in the criteria that have losses cannot all be unbounded from above. Let $X_{PRE}$ denote the set of properly efficient solutions for problem (V), and let $w^T = (w_1, \dots, w_p)$. Then some key scalarization properties are as follows.

1) If $\bar{x}$ is the unique optimal solution to problem (W) for some $w \ge 0$, $w \ne 0$, then $\bar{x} \in X_E$.

2) If $\bar{x}$ is an optimal solution to problem (W) for some $w \ge 0$, $w \ne 0$, then $\bar{x} \in X_{WE}$.

3) Assume that for each $j = 1, \dots, p$, $f_j$ is a concave function on the convex set $X$. Then $\bar{x} \in X_{PRE}$ if and only if $\bar{x}$ is an optimal solution to problem (W) for some $w > 0$.

4) Under the assumptions in property 3), $\bar{x} \in X_{WE}$ if and only if $\bar{x}$ is an optimal solution to problem (W) for some $w \ge 0$, $w \ne 0$.

5) Under the assumptions of property 3), if $\bar{x} \in X_E$ but $\bar{x} \notin X_{PRE}$, then there exists a $w \ge 0$, $w \ne 0$ with $w_j = 0$ for at least one $j = 1, \dots, p$ such that $\bar{x}$ is an optimal solution to problem (W).

6) If each $f_j$, $j = 1, \dots, p$, is a linear function and $X$ is a polyhedron, $X_{PRE} = X_E$.

The scalarization properties can be used for various purposes, including the generation of points in $X_E$, $X_{WE}$ and $X_{PRE}$. For instance, when each $f_j$, $j = 1, \dots, p$, is a linear function and $X$ is a polyhedron, from properties 3) and 6), points in $X_E$, including, at least potentially, all of $X_E$, can be generated by solving problem (W) as the parameter $w > 0$ is varied. Under the assumptions of property 3), the same process will generate points in $X_{PRE}$, including, at least potentially, all of $X_{PRE}$. However, from properties 3)-5), it is apparent that no such simple process for generating $X_E$ exists, even under the assumptions of property 3). This is another motivation for the proper efficiency concept.

Another important issue in efficiency concerns testing. One may want to test a given point for efficiency in problem (V), and one may want to test whether $X_E$ and $X_{PRE}$ are empty or not. We will present several of the properties of efficiency that provide some of the theory for these tests. These properties all utilize the single-objective problem

\[
\begin{aligned}
\max\ & \sum_{j=1}^{p} f_j(x), \\
\text{s.t.}\ & f_j(x) \ge f_j(x^\circ), \quad j = 1, \dots, p, \\
& x \in X.
\end{aligned} \tag{T}
\]

Here, $x^\circ$ is an arbitrary element of $R^n$. The properties are as follows.

7) The point $x^\circ \in R^n$ belongs to $X_E$ if and only if $x^\circ$ is an optimal solution to problem (T).

8) Suppose that $x^\circ \in X$ in problem (T), and that problem (T) has no finite maximum value. Then $X_{PRE} = \emptyset$ [1].

9) Suppose that the assumptions of property 3) hold, that $x^\circ \in X$ in problem (T), and that problem (T) has no finite maximum value. Then, if the set

\[
Z = \{ z \in R^p : z \le f(x) \text{ for some } x \in X \}
\]

is closed, $X_E = \emptyset$.

10) Assume that each $f_j$, $j = 1, \dots, p$, is a linear function and that $X$ is a polyhedron. Suppose that $x^\circ \in X$ in problem (T), and that problem (T) has no finite maximum value. Then $X_E = \emptyset$.
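The definitions of dominance, problem (W) and the test problem (T) can be illustrated on a finite outcome set, where each optimization reduces to enumeration. The sketch below is only that: the outcome vectors are made up, the function names are hypothetical, and a finite set does not satisfy the concavity and convexity assumptions of property 3), so weighted sums are not guaranteed to reach every efficient point.

```python
from typing import List, Sequence, Tuple

Outcome = Tuple[int, ...]


def dominates(a: Outcome, b: Outcome) -> bool:
    """True when a >= b in every criterion and a > b in at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def efficient_set(outcomes: Sequence[Outcome]) -> List[Outcome]:
    """Outcomes not dominated by any other outcome (finite analogue of X_E)."""
    return [a for a in outcomes if not any(dominates(b, a) for b in outcomes)]


def solve_w(outcomes: Sequence[Outcome], w: Sequence[float]) -> Outcome:
    """Problem (W) by enumeration: maximize sum_j w_j * f_j."""
    return max(outcomes, key=lambda f: sum(wj * fj for wj, fj in zip(w, f)))


def passes_test_t(outcomes: Sequence[Outcome], f0: Outcome) -> bool:
    """Problem (T) by enumeration: f0 is efficient exactly when it attains
    the largest criterion sum among outcomes that are >= f0 componentwise."""
    feasible = [f for f in outcomes if all(x >= y for x, y in zip(f, f0))]
    return max(sum(f) for f in feasible) == sum(f0)


# Hypothetical outcome vectors (f_1, f_2), both to be maximized.
outcomes = [(1, 5), (3, 4), (2, 2), (5, 1), (3, 3)]

efficient = efficient_set(outcomes)
tested = [f for f in outcomes if passes_test_t(outcomes, f)]
equal_weights = solve_w(outcomes, (1.0, 1.0))
first_only = solve_w(outcomes, (1.0, 0.0))
```

Here the dominance filter and the enumeration version of test (T) agree, as property 7) suggests they should, and each weighted-sum maximizer shown is an efficient outcome.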
Multicommodity flow problems
[2] BENSON, H.P.: 'On a domination property for vector maximization with respect to cones', J. Optim. Th. Appl. 39 (1983), 125-132.
[3] BENSON, H.P.: 'Errata corrige', J. Optim. Th. Appl. 43 (1984), 477-479.
[4] BENSON, H.P.: 'Complete efficiency and the initialization of algorithms for multiple objective programming', Oper. Res. Lett. 10 (1991), 481-487.
[5] BENSON, H.P., AND SAYIN, S.: 'Towards finding global representations of efficient sets in multiple objective mathematical programming', Naval Res. Logist. 44 (1997), 47-67.
[6] BENVENISTE, M.: 'Testing for complete efficiency in a vector maximization problem', Math. Program. 12 (1977), 285-288.
[7] GOICOECHEA, A., HANSEN, D.R., AND DUCKSTEIN, L.: Multiobjective decision analysis with engineering and business applications, Wiley, 1982.
[8] HENIG, M.I.: 'The domination property in multicriteria optimization', J. Math. Anal. Appl. 114 (1986), 7-16.
[9] LUC, D.T.: Theory of vector optimization, Springer, 1989.
[10] SAWARAGI, Y., NAKAYAMA, H., AND TANINO, T.: Theory of multiobjective optimization, Acad. Press, 1985.
[11] SOLAND, R.M.: 'Multicriteria optimization: A general characterization of efficient solutions', Decision Sci. 10 (1979), 26-38.
[12] STEUER, R.: Multiple criteria optimization: Theory, computation and application, Wiley, 1986.
[13] YU, P.L.: Multiple-criteria decision making, Plenum, 1985.
[14] ZELENY, M.: Multiple criteria decision making, McGraw-Hill, 1982.

Harold P. Benson
Dept. Decision and Information Sci. Univ. Florida
Gainesville, Florida 32611-7169, USA
E-mail address: benson@dale.cba.ufl.edu

MSC2000: 90C29
Key words and phrases: multi-objective optimization, multiple criteria decision making, efficient solution, Pareto optimal solution, noninferior solution, nondominated solution, weakly efficient solution, weakly Pareto optimal solution, weakly noninferior solution, weakly nondominated solution, properly efficient solution.


MULTICOMMODITY FLOW PROBLEMS

Linear multicommodity flow problems (MCF) are linear programs (LPs) that can be characterized by a set of commodities and an underlying network. A commodity is a good that must be transported from one or more origin nodes to one or more destination nodes in the network. In practice these commodities might be telephone calls in a telecommunications network, packages in a distribution network, or airplanes in an airline flight network. Each commodity has a unique set of characteristics and the commodities are not interchangeable. That is, you cannot satisfy demand for one commodity with another commodity. The objective of the MCF problem is to flow the commodities through the network at minimum cost without exceeding arc capacities. A comprehensive survey of linear multicommodity flow models and solution procedures is presented in [2].

Integer multicommodity flow (IMCF) problems are a constrained version of the linear multicommodity flow problem in which the flow of a commodity (specified in this case by an origin-destination pair) may use only one path from origin to destination.

MCF and IMCF problems are prevalent in a number of application contexts, including transportation, communication and production.

MCF Example Applications.

• Routing vehicles in traffic networks (dynamic traffic assignment). This involves the determination of minimum delay routes for vehicles from their origins to their respective destinations over the traffic network. The allowable congestion levels determine the arc capacities. Alternatively, there are no capacities but the cost on an arc is a function of the amount of flow on the arc. In the former case, the objective function is linear while in the latter it is nonlinear.

• Distribution systems planning. In this problem there are different products (or commodities) produced at several plants with known production capacities. Each commodity has a certain demand in each customer zone. The demand is satisfied by shipping via regional distribution centers with finite storage capacities. A.M. Geoffrion and G.W. Graves [28] model this problem of routing the commodities from the manufacturing plants to the customer zones through the distribution centers as a MCF problem.

• Import and export models. One of the factors that may affect export is handling capacity at
ports. D. Barnett, J. Binkley and B. McCarl [8] use a MCF model to analyze the effect of US port capacities on the export of wheat, corn and soybean.

• Optimization of freight operations. T. Crainic, J.A. Ferland and J.M. Rousseau [20] develop a MCF-based routing and scheduling optimization model that considers the planning issues for the railroad industry. More recently, H.N. Newton [48] and C. Barnhart, H. Jin and P.H. Vance [13] study the railroad blocking problem using multicommodity based formulations.

• Freight Assignment in the Less-than-Truckload (LTL) industry. An LTL carrier has to consolidate many shipments to make economic use of the vehicles. This requires the establishment of a large number of terminals to sort freight. Trucking companies use forecasted demands to define routes for each vehicle to carry freight to and from the terminals. Once the routes are fixed, the problem is to deliver all the shipments with minimum total service time or cost. This problem is formulated as a MCF problem in [17] and [24].

• Express Shipment Delivery. D. Kim [40] models the shipment delivery problem faced by express carriers like Federal Express, United States Postal Service, United Parcel Service, etc. as a MCF problem on a network in space and time.

• Routing messages in a telecommunications or computer network. The network consists of transmission lines. Each message request is a commodity. The problem is to route the messages from origins to the respective destinations at a minimum cost. T.L. Magnanti et al. [42] and others provide MCF-based formulations for this problem.

• Long-term hydro-generation optimization. The task in this case is to determine the amount of hydro-generation at a reservoir in an interval of time that minimizes the expected cost of power generation over a period of time, divided into several intervals. N. Nabonna [47] showed that this problem can be modeled as a MCF problem with inflows given as probabilistic density functions.

• Forest management. For each planning period, forest managers have to make decisions concerning the land areas to be harvested, the volume of timber to be harvested from these areas, the land areas to be developed for recreation and the road network to be built and maintained in order to support both the timber haulers and recreationists. This problem has been formulated as a MCF problem in [33].

• Street planning. L.R. Foulds [26] introduced this problem and modeled it as a MCF problem. The objective is to identify a set of two-way streets such that making these streets one-way minimizes the total congestion cost in the network.

• Spatial price equilibrium (SPE) problem. This problem requires modeling consumer flows within a general network. The SPE problem determines the optimum levels of production and consumption at each market and the optimal flows satisfy the equilibrium property. R.S. Segall [59] models and solves the SPE problem as a MCF problem.

For a more comprehensive description of MCF applications, see [57], [2], [37].

IMCF Example Applications.

• Airline fleet assignment. Given a time table of flight arrivals and departures, the expected demand on the flights and a set of aircraft, the objective is to arrive at a minimum cost assignment of aircraft to the flights. This problem has been extensively studied in [1], [31].

• Airline crew scheduling. This problem deals with the minimum cost scheduling of crews. Factors such as hours of work limitations and Federal Aviation Administration regulations must be taken into account while solving the problem. For an in-depth study see [5], [14].

• Airline maintenance routing problems require that single aircraft be routed such that maintenance requirements are satisfied and each flight is assigned to exactly one aircraft.
This problem has been studied in [19], [10], [25].
• Bandwidth packing problems require that bandwidth be allocated in telecommunications networks to maximize total revenue. The demands, or calls, on the networks are the commodities and the objective is to route the calls from their origin to their destination. In the case of video teleconferencing, since call splitting is not allowed, each call must be routed on exactly one network path. This IMCF problem is described in [49].
• Package flow problems, such as those arising in express package delivery operations, require that shipments, each with a specific origin and destination, be routed over a transportation network. Each set of packages with a common origin-destination pair can be considered as a commodity and often, to facilitate operations and ensure customer satisfaction, must be assigned to a single network path. These problems are cast as IMCF problems in [12].

Formulations. Multicommodity flow problems can be modeled in a number of ways depending on how one defines a commodity. There are three major options: a commodity may originate at a subset of nodes in the network and be destined for another subset of nodes, or it may originate at a single node and be destined for a subset of the nodes, or finally it may originate at a single node and be destined for a single node. K.L. Jones et al. [34] present models for each of these different cases. In the interest of space, we will only consider models for the last case. The other cases can also be modeled using variants of the models presented here.
We present two different formulations of the MCF problem: the node-arc or conventional formulation and the path or column generation formulation. The MCF is defined over the network $G$ comprised of node set $N$ and arc set $A$. MCF contains decision variables $x$, where $x_{ij}^k$ is the fraction of the total quantity (denoted $q^k$) of commodity $k$ assigned to arc $ij$. In the IMCF problem these variables are restricted to be binary. The cost of assigning commodity $k$ in its entirety to arc $ij$ equals $q^k$ times the unit flow cost for arc $ij$, denoted $c_{ij}^k$. Arc $ij$ has capacity $d_{ij}$, for all $ij \in A$. Node $i$ has supply of commodity $k$, denoted $b_i^k$, equal to 1 if $i$ is the origin node for $k$, equal to $-1$ if $i$ is the destination node for $k$, and equal to 0 otherwise.
The node-arc MCF formulation is:

\[ \text{minimize} \quad \sum_{k \in K} \sum_{ij \in A} c_{ij}^k q^k x_{ij}^k \tag{1} \]

such that

\[ \sum_{ij \in A} x_{ij}^k - \sum_{ji \in A} x_{ji}^k = b_i^k, \quad \forall i \in N, \ \forall k \in K, \tag{2} \]
\[ \sum_{k \in K} q^k x_{ij}^k \le d_{ij}, \quad \forall ij \in A, \tag{3} \]
\[ x_{ij}^k \ge 0, \quad \forall ij \in A, \ \forall k \in K. \tag{4} \]

Note that without restricting generality of the problem, we model the arc flow variables $x$ having values between 0 and 1. To do this, we scale the demand for each commodity to 1 and accordingly adjust the coefficients in the objective function (1) and in constraints (3). Also note the block-angular structure of this model. The conservation of flow constraints (2) form nonoverlapping blocks, one for each commodity. Only the arc capacity constraints (3) link the values of the flow variables of different commodities.
To contrast, the path-based or column generation MCF formulation has fewer constraints, and far more variables. Again, the underlying network $G$ is comprised of node set $N$ and arc set $A$, with $q^k$ representing the quantity of commodity $k$. $P(k)$ represents the set of all origin-destination paths in $G$ for $k$, for all $k \in K$. In the column generation model, the binary decision variables are denoted $y^k$, where $y_p^k$ is the fraction of the total flow of commodity $k$ assigned to path $p \in P(k)$. The cost of assigning commodity $k$ in its entirety to path $p$ equals $q^k$ times the unit flow cost for path $p$, denoted $c_p^k$. $c_p^k$ represents the sum of the $c_{ij}^k$ costs for all arcs $ij$ contained in path $p$. As before, arc $ij$ has capacity $d_{ij}$, for all $ij \in A$. Finally, $\delta_{ij}^p$ is equal to 1 if arc $ij$ is contained in path $p \in P(k)$, for all $k \in K$; and is equal to 0 otherwise.
The path or column generation IMCF formulation is then:
\[ \text{minimize} \quad \sum_{k \in K} \sum_{p \in P(k)} c_p^k q^k y_p^k \tag{5} \]

such that

\[ \sum_{k \in K} \sum_{p \in P(k)} q^k y_p^k \delta_{ij}^p \le d_{ij}, \quad \forall ij \in A, \tag{6} \]
\[ \sum_{p \in P(k)} y_p^k = 1, \quad \forall k \in K, \tag{7} \]
\[ y_p^k \ge 0, \quad \forall p \in P(k), \ \forall k \in K. \tag{8} \]

LP Solution Methods. Comprehensive surveys of the available multicommodity network flow solution techniques are provided in [6], [37]. Descriptions of these approaches are also provided in [2], [38].
Price-directive decomposition techniques use the path-based MCF model. To limit the number of variables considered in finding an optimal solution, column generation techniques are used. Further details of price-directive decomposition and column generation are provided in [22], [41], [61], [18], [45].
Resource-directive decomposition techniques attempt to solve MCF problems by allocating arc capacity by commodity and solving the resulting decoupled minimum cost flow problems for each commodity. Additional descriptions of this technique can be found in [52], [61], [27], [41], [30], [37], [39], [35], [60].
Computational comparisons of the performance of price- and resource-directive decomposition methods can be found in [3], [4]. A. Ali, R.V. Helgason, J.L. Kennington, and H. Lall [4] report that specialized decomposition codes can be expected to run from three to ten times faster than a general linear programming package. Furthermore, A.A. Assad [7] reports that resource-directive algorithms converge quickly for small problems but are outperformed by the price-directive method for larger MCF problems.
G. Saviozzi [56] uses subgradient techniques on the Lagrangian relaxation of the bundle constraints and proposes a method of arriving at an advanced starting basis for the minimum cost multicommodity flow problem.
Partitioning methods specialize the simplex method by partitioning the current basis to exploit the underlying network structure. Experiences with primal partitioning techniques have been reported in [51], [53], [54], [55], [32], [43], [36], [24], among others. J.B. Rosen [53] develops a partitioning strategy for angular problems. J.K. Hartman and L.S. Lasdon [32] develop a generalized upper bounding algorithm for multicommodity network flow problems in which the special structure of the MCF problem is exploited. Their primal partitioning procedure, a specialization of the generalized upper bounding procedure developed by G.B. Dantzig and R.M. Van Slyke [21], involves the determination at each iteration of the inverse of a basis containing only one row for each saturated arc. Similarly, C.J. McCallum [44] developed a generalized upper bounding algorithm for a communications network planning problem. All of these procedures exploit the block-diagonal problem structure and perform all steps of the simplex method on a reduced working basis of dimension $m$, where $m$ represents the size of set $A$.
Interior point methods and parallel computing techniques have also been applied to MCF problems. Interior point methods provide polynomial time algorithms for the MCF problems. The best time bound is due to P.M. Vaidya [62]. G.L. Schultz and R.R. Meyer [58] provide an interior point method with massive parallel computing to solve multicommodity flow problems.
Development of new heuristic procedures for MCF problems includes the primal and dual-ascent heuristics described in [17] and [9], respectively. A. Gersht and A. Shulman [29] use a barrier-penalty method to find nearly optimal solutions for multicommodity problems, while R. Schneur [62] describes a scaling algorithm to determine nearly feasible MCF solutions.
Recently, price-directive decomposition or column generation approaches, such as those presented in [2], [11], [23], [34], have been the most extensively used method for solving large versions of the linear MCF problem. The general idea of column generation is that optimal solutions to large LPs can be obtained without explicitly including all columns (i.e., variables) in the constraint matrix (called the Master Problem or MP). In fact, only a very small subset of all columns will be in an optimal solution and all other (nonbasic) columns
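To make the path-based formulation (5)-(8) concrete, the following is a minimal Python sketch that evaluates a given path-flow assignment: it computes the objective (5) and checks the capacity constraints (6), the convexity constraints (7) and nonnegativity (8). It does not solve the LP. The function name `evaluate_path_solution`, the data layout and the three-arc toy network are illustrative assumptions and do not come from the cited literature; paths are assumed to be simple, so an arc appears at most once on a path.

```python
from collections import defaultdict


def evaluate_path_solution(paths, quantity, arc_cost, capacity, y, tol=1e-9):
    """Return the objective value (5) if y satisfies (6)-(8), else None.

    paths[k]            list of paths for commodity k, each a list of arcs (i, j)
    quantity[k]         q^k, total quantity of commodity k
    arc_cost[k][(i,j)]  c_ij^k, unit flow cost of commodity k on arc ij
    capacity[(i,j)]     d_ij
    y[k][p]             fraction of commodity k assigned to the p-th path
    """
    total_cost = 0.0
    load = defaultdict(float)
    for k, k_paths in paths.items():
        # Constraints (8) and (7): nonnegative fractions that sum to one.
        if any(f < -tol for f in y[k]) or abs(sum(y[k]) - 1.0) > tol:
            return None
        for p, arcs in enumerate(k_paths):
            path_cost = sum(arc_cost[k][a] for a in arcs)  # c_p^k
            total_cost += path_cost * quantity[k] * y[k][p]
            for a in arcs:  # delta_ij^p = 1 exactly for the arcs on path p
                load[a] += quantity[k] * y[k][p]
    # Constraint (6): total flow on each arc within its capacity.
    if any(load[a] > capacity[a] + tol for a in load):
        return None
    return total_cost


# Hypothetical toy instance: two commodities from s to t.
paths = {"k1": [[("s", "t")], [("s", "a"), ("a", "t")]], "k2": [[("s", "t")]]}
quantity = {"k1": 4.0, "k2": 2.0}
unit = {("s", "t"): 1.0, ("s", "a"): 1.0, ("a", "t"): 1.0}
arc_cost = {"k1": unit, "k2": unit}
capacity = {("s", "t"): 4.0, ("s", "a"): 3.0, ("a", "t"): 3.0}

split = {"k1": [0.5, 0.5], "k2": [1.0]}    # fractional assignment
unsplit = {"k1": [1.0, 0.0], "k2": [1.0]}  # every commodity on one path

cost_split = evaluate_path_solution(paths, quantity, arc_cost, capacity, split)
cost_unsplit = evaluate_path_solution(paths, quantity, arc_cost, capacity, unsplit)
```

In this toy instance the fractional assignment respects all capacities, whereas the particular single-path assignment shown overloads arc (s, t). This illustrates why restricting the $y_p^k$ to binary values can exclude solutions that are feasible for the linear MCF problem.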
p. The latter branch, however, cannot be enforced if the pricing problem is solved as a shortest path problem. There is no guarantee that the solution to the shortest path problem is not path p. In fact, it is likely that the shortest path for k is indeed path p. As a result, to enforce a branching decision, the pricing problem solution must be achieved using a next shortest path procedure. In general, for a subproblem involving a set of branching decisions, the pricing problem solution must be achieved using a kth shortest path procedure.
The key to developing a branch and price procedure is to identify a branching rule that eliminates the current fractional solution without compromising the tractability of the pricing problem. In general, J. Desrosiers et al. [23] argue this can be achieved by basing branching rules on variables in the original formulation, and not on variables in the column generation formulation. This means that branching rules should be based on the arc flow variables $x_{ij}^k$ from the node-arc formulation of the problem. Barnhart et al. [15] develop branching rules for a number of different master problem structures. They also survey specialized algorithms that have appeared in the literature for a broad range of applications.
M. Parker and J. Ryan [49] present a branch and price algorithm for the bandwidth packing problem, in which the objective is to choose which of a set of commodities to send in order to maximize revenue. They use a path-based formulation. Their branching scheme selects a fractional path and creates a number of new subproblems equal to the length of the path (measured in the number of arcs it contains) plus one. On one branch, the path is fixed into the solution and on each other branch, one of the arcs on the path is forbidden. To limit time spent searching the tree they use a dynamic optimality tolerance. They report the solution of 14 problems with as many as 93 commodities on networks with up to 29 nodes and 42 arcs. All but two of the instances are solved to within 95% of optimality.
K. Ziarati et al. [16] consider the problem of assigning railway locomotives to trains. They model the problem as an integer multicommodity flow problem with side constraints and solve using a Dantzig-Wolfe decomposition technique, where subproblems are formulated as constrained or unconstrained shortest path problems.
P. Raghavan and C.D. Thompson [50] illustrate the use of randomized algorithms to solve some integer multicommodity flow problems. They use randomized rounding procedures that give provably good solutions in the sense that they have a very high probability of being close to optimality.
Barnhart et al. [12] present a branch and price and cut algorithm for general IMCF problems where each commodity is represented by an origin-destination pair and flow volume. Branch and cut, another variant of branch and bound, allows valid inequalities, or cuts, to be added throughout the branch and bound tree. Branch and price and cut combines column and row generation to yield very strong LP relaxations at nodes of the branch and bound tree.

See also: Minimum cost flow problem; Nonconvex network flow problems; Traffic network equilibrium; Network location: Covering problems; Maximum flow problem; Shortest path tree algorithms; Steiner tree problems; Equilibrium networks; Survivable networks; Directed tree networks; Dynamic traffic networks; Auction algorithms; Piecewise linear network flow problems; Nonoriented multicommodity flow problems; Communication network assignment problem; Generalized networks; Evacuation networks; Network design problems; Stochastic network problems: Massively parallel solution.

References
[1] ABARA, J.: 'Applying integer linear programming to the fleet assignment problem', Interfaces 19 (1989), 20-28.
[2] AHUJA, R.K., MAGNANTI, T.L., AND ORLIN, J.B.: Network flows: Theory, algorithms, and applications, Prentice-Hall, 1993.
[3] ALI, A.I., BARNETT, D., FARHANGIAN, K., KENNINGTON, J.L., PATTY, B., SHETTY, B., MCCARL, B., AND TONG, P.: 'Multicommodity network problems: Applications and computations', IIE Trans. 16 (1984), 127-134.
[4] ALI, A., HELGASON, R., KENNINGTON, J., AND LALL, H.: 'Computational comparison among three multicommodity network flow algorithms', Oper. Res. 28 (1980), 995-1000.
[5] ANBIL, R., GELMAN, E., PATTY, B., AND TANGA, R.: 'Recent advances in crew-pairing optimization at American Airlines', Interfaces 21 (1991), 62-64.
[6] ASSAD, A.A.: 'Multicommodity network flows: A survey', Networks 8 (1978), 37-91.
[7] ASSAD, A.A.: 'Solving linear multicommodity flow problems': Proc. IEEE Internat. Conf. Circuits and Computers, Vol. 1, 1980, pp. 157-161.
[8] BARNETT, D., BINKLEY, J., AND MCCARL, B.: 'The effects of US port capacity constraints on national and world grain shipments', Techn. Paper Purdue Univ. (1982).
[9] BARNHART, C.: 'Dual-ascent methods for large-scale multi-commodity flow problems', Naval Res. Logist. 40 (1993), 305-324.
[10] BARNHART, C., BOLAND, N.L., CLARKE, L.W., JOHNSON, E.L., NEMHAUSER, G.L., AND SHENOI, R.G.: 'Flight string models for aircraft fleeting and routing', Transport. Sci. 32, no. 3 (1998), 208-220, Focused Issue on Airline Optimization.
[11] BARNHART, C., HANK, C.A., JOHNSON, E.L., AND SIGISMONDI, G.: 'A column generation and partitioning approach for multicommodity flow problems', Telecommunication Systems 3 (1995), 239-258.
[12] BARNHART, C., HANK, C.A., AND VANCE, P.H.: 'Using branch-and-price-and-cut to solve origin-destination integer multicommodity flow problems', Oper. Res. 48, no. 2 (2000), 318-326.
[13] BARNHART, C., JIN, H., AND VANCE, P.H.: 'Railroad blocking: A network design application', Working Paper Center Transport. Stud., MIT (1997).
[14] BARNHART, C., JOHNSON, E.L., ANBIL, R., AND HATAY, L.: 'A column generation technique for the long-haul crew-assignment problem', in T.A. CIRIANI AND R.C. LEACHMAN (eds.): Optimization in Industry: Math. Programming and Optimization Techniques, Vol. 2, Wiley, 1994, pp. 7-24.
[15] BARNHART, C., JOHNSON, E.L., NEMHAUSER, G.L., SAVELSBERGH, M.W.F., AND VANCE, P.H.: 'Branch-and-price: Column generation for solving huge integer programs', Oper. Res. 46, no. 3 (1998), 316-329.
[16] BARNHART, C., JOHNSON, E.L., NEMHAUSER, G.L., SIGISMONDI, G., AND VANCE, P.: 'Formulating a mixed integer programming problem to improve solvability', Oper. Res. 41 (1993), 1013-1019.
[17] BARNHART, C., AND SHEFFI, Y.: 'A network-based primal-dual heuristic for the solution of multicommodity network flow problems', Transport. Sci. 27 (1993), 102-117.
[18] BAZARAA, M.S., AND JARVIS, J.J.: Linear programming and network flows, Wiley, 1977.
[19] CLARKE, L.W., JOHNSON, G.L., NEMHAUSER, G.L., AND ZHU, Z.: 'The aircraft rotation problem', Ann. Oper. Res.: Math. Industr. Systems II 69 (1997), 33-46.
[20] CRAINIC, T., FERLAND, J.A., AND ROUSSEAU, J.M.: 'A tactical planning model for Rail freight transportation', Transport. Sci. 18 (1984), 165-184.
[21] DANTZIG, G.B., AND SLYKE, R.M. VAN: 'Generalized upper bounding techniques', J. Comput. Syst. Sci. 1 (1967), 213-226.
[22] DANTZIG, G.B., AND WOLFE, P.: 'Decomposition principle for linear programs', Oper. Res. 8 (1960), 108-111.
[23] DESROSIERS, J., DUMAS, Y., SOLOMON, M.M., AND SOUMIS, F.: 'Time constrained routing and scheduling', in M.E. BALL, T.L. MAGNANTI, C. MONMA, AND G.L. NEMHAUSER (eds.): Handbook Oper. Res. and Management Sci., Vol. 8, Elsevier, 1995, pp. 35-139.
[24] FARVOLDEN, J.M., POWELL, W.B., AND LUSTIG, I.J.: 'A primal partitioning solution for the arc-chain formulation of a multicommodity network flow problem', Oper. Res. 4, no. 4 (1993), 669-693.
[25] FEO, T.A., AND BARD, J.F.: 'Flight scheduling and maintenance base planning', Managem. Sci. 35 (1989), 1415-1432.
[26] FOULDS, L.R.: 'A multicommodity flow network design problem', Transport. Res. B 15 (1981), 273-283.
[27] GEOFFRION, A.M.: 'Primal resource-directive approaches for optimizing non-linear decomposable systems', Oper. Res. 18 (1970), 375-403.
[28] GEOFFRION, A.M., AND GRAVES, G.W.: 'Multicommodity distribution systems design by Bender's decomposition', Managem. Sci. 20 (1974), 822-844.
[29] GERSHT, A., AND SHULMAN, A.: 'A new algorithm for the solution of the minimum cost multicommodity flow problem', Proc. 26th IEEE Conf. Decision and Control (1987), 748-758.
[30] GRINOLD, R.C.: 'Steepest ascent for large scale linear program', SIAM Rev. 14 (1972), 447-464.
[31] HANK, C.A., BARNHART, C., JOHNSON, E.L., MARSTEN, R.E., NEMHAUSER, G.L., AND SIGISMONDI, G.: 'The fleet assignment problem: solving a large-scale integer program', Math. Program. 70 (1995), 211-232.
[32] HARTMAN, J.K., AND LASDON, L.S.: 'A generalized upper bounding algorithm for multicommodity network flow problems', Networks 1 (1972), 333-354.
[33] HELGASON, R., KENNINGTON, J., AND WONG, P.: 'An application of network programming for national forest planning', Techn. Report Dept. Oper. Res. Southern Methodist Univ., Dallas OR 81006 (1981).
[34] JONES, K.L., LUSTIG, I.J., FARVOLDEN, J.M., AND POWELL, W.B.: 'Multicommodity network flows: The impact of formulation on decomposition', Math. Program. 62 (1993), 95-117.
[35] KARKAZIS, J., AND BOFFEY, T.B.: 'A subgradient based optimal solution method for the multicommodity problem', in R.E. BURKARD AND T. ELLINGER (eds.): Methods Oper. Res., Vol. 40, Anton Hain Verlag, 1981, pp. 339-344.
[36] KENNINGTON, J.L.: 'Solving multicommodity transportation problems using a primal partitioning simplex technique', Naval Res. Logist. Quart. 24 (1977), 309-
Multicriteria sorting methods
on MCDA and its application in the study of classification problems with or without ordered classes. MCDA provides an arsenal of powerful and efficient nonparametric classification methods and approaches, which are free of statistical assumptions and restrictions, while furthermore they are able to incorporate the decision maker's preferences in a flexible and realistic way.
The remainder of the article is organized as follows. Section 2 provides a review of MCDA sorting approaches and techniques, outlining their basic characteristics, concepts and limitations. In section 3, a new MCDA sorting method is described and its operation is depicted through a simple illustrative example. Finally, section 4 concludes the paper and outlines some possible future research directions concerning the application of MCDA in sorting problems.

Multicriteria Sorting Methods. The MCDA methods which have been proposed for the study of sorting problems can be distinguished either according to the approach from which they originate (multi-objective/goal programming, multi-attribute utility theory, outranking relations, preference disaggregation), or according to the type of problem that they address (ordered or non-ordered classes). The review presented in this section will distinguish the methods according to their origin, but at the same time the type of problems that they address will also be discussed.
Goal Programming Approaches. The work of A. Charnes and W.W. Cooper [4] set the foundations of goal/multi-objective programming, but it can also be considered as one of the pioneering studies in the field of MCDA in general. Since then, both multi-objective and goal programming constitute two major fields of interest from the theoretical and practical points of view in the MCDA and operations research communities. In particular, goal programming approaches were used during the 1960s and the 1970s to elicit attribute weights in multiple criteria ranking decision problems ([15], [27], [36], [35]). N. Freed and F. Glover [9] were among the first to investigate the potential of goal programming techniques in the discriminant problem. Their aim was to develop a linear discriminant model so that the minimum distance of the score of each alternative from a predefined cut-off point is maximized (maximize the minimum distance, MMD). To develop this model, they proposed the following goal programming formulation:

\[
\begin{aligned}
\max \quad & d \\
\text{s.t.} \quad & \sum_i w_i x_{ij} + d \le c, \quad \forall j \in \text{Group 1}, \\
& \sum_i w_i x_{ij} - d \ge c, \quad \forall j \in \text{Group 2},
\end{aligned}
\]

where $w_i$ is the weight of attribute $i$, $x_{ij}$ is the evaluation of alternative $j$ on attribute $i$, and $c$ is the cut-off score ($w_i$ and $d$ are unrestricted in sign).
Soon after proposing this model, the same authors proposed a variety of similar goal programming formulations incorporating several other discrimination criteria, such as the sum of deviations (optimize the sum of deviations, OSD), the sum of interior deviations (minimize the sum of interior deviations, MSID) and the maximum deviation [10].
These two studies attracted the interest of several operational researchers and management scientists. S.M. Bajgier and A.V. Hill [2] proposed a new goal programming approach in order to minimize the number of misclassifications using a mixed integer programming formulation (MIP) and conducted a first experimental study to compare the MMD model, the OSD model, and their MIP formulation with LDA. They concluded that the goal programming formulations are generally superior to LDA, except for the case of moderate to low overlap between groups and equal dispersion matrices, where LDA outperforms all the examined goal programming formulations.
The performance of goal programming approaches compared to statistical techniques was an issue that several researchers tried to investigate using mainly experimental data sets. Freed and Glover [11] compared MMD, MSID, OSD and LDA and they concluded that although the presence of outliers poses a greater problem for the two simpler goal programming formulations (MMD and MSID) than for LDA, generally the goal programming approaches outperform LDA. E.A. Joachimsthaler and A. Stam [18] compared the LDA, QDA, logistic regression and OSD procedures and they concluded that these methodologies produce similar results although the misclassification rates for LDA and QDA tended to increase with highly kurtotic data and increased dispersion heterogeneity. C.A. Markowski and E.P. Markowski [22] examined the influence of qualitative attributes on the discriminating performance of MMD and LDA. Although the incorporation of qualitative attributes in LDA violates the normality assumption, the experimental study of the authors showed that the incorporation of qualitative variables improved the performance of LDA, while on the other hand MMD did not appear to be particularly well-suited for use with qualitative variables. In another experimental study conducted by P.A. Rubin [32], QDA outperformed 15 goal programming approaches, leading the author to indicate that 'if LP models are to be considered seriously as an alternative to conventional procedures, they must be shown to outperform QDA under plausible conditions, presumably involving non-Gaussian data'. These experimental studies clearly indicate the confusion concerning the discriminating performance of the goal programming formulations as opposed to well known multivariate statistical techniques. Apart from this issue, the research in the field of goal programming approaches for discriminant problems also focused on the theoretical drawbacks that were often met. Markowski and Markowski [23] were the first to identify two major drawbacks of the goal programming formulations (MMD and OSD) proposed by Freed and Glover ([9], [10]). More specifically, they proved that if each quadrant contains at least one case from the second group, unacceptable solutions will result in MMD (all coefficients in the discriminant function are zeros, which leads all the observations to be classified in the same group), while furthermore they showed that the solutions (discriminant functions) obtained through the MMD and the OSD models are not stable when the data are transformed (when there is a shift from the origin). Apart from these two problems, many goal programming formulations were found to suffer from two additional theoretical shortcomings [29]:
a) they produce unbounded solutions, and
b) they produce improper solutions.
A solution is considered unbounded if the objective function can be increased or decreased without limit, in which case the discrimination rule (function) may be meaningless, whereas a solution is improper if all observations fall on the classification hyperplane.
To overcome these problems new goal programming formulations were proposed, including hybrid models ([12], [13]), nonlinear programming formulations [37], as well as several mixed integer programming formulations ([1], [3], [5], [20], [33], [38], [39]).
In the light of this review of goal programming approaches for discriminant problems it is possible to identify the following three characteristics of the research in this field:
1) The majority of the proposed models aim at developing a linear discrimination rule (function). The extension of the models to develop a nonlinear discriminant function leads to nonlinear programming formulations which are generally computationally intensive and difficult to solve. Among the few alternative approaches is the MSM method (multisurface method) proposed by O.L. Mangasarian [21] that leads to the construction of a piecewise linear discrimination surface between two groups (see also [26] for a revision of the method using multi-objective programming and fuzzy mathematical programming techniques).
2) Little research has been done on extending the existing framework to the multigroup discriminant problem. E.-U. Choo and W.C. Wedley [5], W. Gochet et al. [14], as well as J.M. Wilson [39] applied goal programming approaches in multigroup discriminant problems, but generally most of the studies in this field were focused on two-group discrimination, trying to extend the original goal programming models of Freed and Glover ([9], [10]) in order to achieve higher classification accuracy and predicting ability.
3) The models based on the goal programming approach can be applied in any classification problem with or without ordered classes.
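As a small illustration of the MMD model discussed in this section, the sketch below evaluates, for a fixed weight vector and cut-off score, the largest deviation d that satisfies both groups of constraints of the MMD formulation. It is only a minimal Python sketch under stated assumptions: it does not optimize over the weights (the actual model is a linear program in the weights and d), and the function name `mmd_margin` and the sample data are hypothetical.

```python
def mmd_margin(weights, cutoff, group1, group2):
    """Largest d with score + d <= cutoff for every alternative in Group 1
    and score - d >= cutoff for every alternative in Group 2.

    A positive value means the fixed linear rule separates the two groups;
    a negative value means at least one alternative is on the wrong side.
    """
    def score(x):
        # Weighted sum of the attribute evaluations of one alternative.
        return sum(w * v for w, v in zip(weights, x))

    slack1 = [cutoff - score(x) for x in group1]  # Group 1 must score below c
    slack2 = [score(x) - cutoff for x in group2]  # Group 2 must score above c
    return min(slack1 + slack2)


# Hypothetical data: two attributes, equal weights, cut-off score 5.
weights = (1, 1)
cutoff = 5
group1 = [(1, 2), (2, 1)]
group2 = [(4, 3), (5, 4)]

separated = mmd_margin(weights, cutoff, group1, group2)
overlapping = mmd_margin(weights, cutoff, group1 + [(3, 3)], group2)
```

For the sample data the first call gives a margin of 2, while adding the alternative (3, 3) to Group 1 makes the margin negative, because its score of 6 lies above the cut-off.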
Outranking Relations Approaches. In contrast to is based on the definition of a veto threshold vj(ri)
the goal programming approaches, outranking re- for criterion j and the profile ri. The veto threshold
lations procedures study the classification problem vj(ri) for criterion j defines the minimum accepted
on a completely different basis. The aim of such difference between the values of the profile ri and
procedures is not to develop a discriminant func- alternative a on the specific criterion so that we
tion (linear or nonlinear), but instead their aim is can say that they have totally different preference
to model the decision makers" preferences and de- according to criterion j.
velop a global preference model which can be used Let F(a, ri) be the set consisted of all criteria for
to assign the alternatives (observations) into the which the discordance index value is greater than
predefined classes. To achieve the classification of the value of global concordance index. For each af-
the alternatives some reference profiles are deter- firmation of the type: 'alternative a outranks pro-
mined which can be considered as representative file ri according to all criteria', the credibility in-
examples of each class. Through the comparison of dex as(a, ri) is calculated. If F(a, ri) is empty then
each alternative with these reference profiles the classification of the alternatives is accomplished.

A representative example of MCDA sorting method based on the outranking relations approach is the ELECTRE TRI method proposed by W. Yu [40]. The aim of ELECTRE TRI is to provide a sorting of the alternatives under consideration into two or more ordered categories. In order to define the categories ELECTRE TRI uses some reference alternatives (reference profiles) $r_i$, $i = 1, \ldots, k-1$, which can be considered as fictitious alternatives different from the alternatives under consideration. The profile $r_i$ is the theoretical limit between the categories $C_i$ and $C_{i+1}$ ($C_{i+1}$ is preferred to $C_i$) and $r_i$ is strictly better than $r_{i-1}$ for each criterion. To provide a sorting of the alternatives in categories ELECTRE TRI makes comparisons of each alternative with the profiles.

For an alternative $a$ and a profile $r_i$ the concordance index $c_j(a, r_i)$ is calculated. This index expresses the strength of the affirmation 'alternative $a$ is at least as good as profile $r_i$ on criterion $j$'. In order to compare the alternative to a reference profile on the basis of more than one criteria, a global concordance index $C(a, r_i)$ is calculated. This index expresses the strength of the affirmation '$a$ is at least as good as $r_i$ according to all criteria'. Setting $w_j$ as the weight of the criterion $j$, $C(a, r_i)$ is constructed as the weighted average of all $c_j(a, r_i)$.

In contrast to the concordance index, the discordance index $D_j(a, r_i)$ expresses the strength of the opposition to the affirmation 'alternative $a$ is at least as good as profile $r_i$ according to criterion $g_j$'. The calculation of the discordance index

$\sigma_s(a, r_i) = C(a, r_i)$, otherwise the credibility index is calculated as follows:

\[
\sigma_s(a, r_i) = C(a, r_i) \prod_{j \in F} \frac{1 - D_j(a, r_i)}{1 - C(a, r_i)}.
\]

If the value of the credibility index of the affirmation 'alternative $a$ outranks profile $r_i$ according to all criteria' exceeds a predefined cut-off value $\lambda$, then the proposition '$a$ outranks $r_i$' can be considered to be valid. Denoting the outranking relation as $S$, the preference ($P$), indifference ($I$) and incomparability ($R$) relations between alternative $a$ and profile $r_i$ can be defined as follows:

• $a I r_i$ if and only if $a S r_i$ and $r_i S a$;
• $a P r_i$ if and only if $a S r_i$ and no $r_i S a$;
• $r_i P a$ if and only if no $a S r_i$ and $r_i S a$;
• $a R r_i$ if and only if no $a S r_i$ and no $r_i S a$.

According to these relations two sorting procedures are applied: the pessimistic and the optimistic one. The sorting procedure starts by comparing alternative $a$ to the worst profile $r_1$ and in the case where $a P r_1$, $a$ is compared to the second profile $r_2$, etc., until one of the following two situations appears:

i) $a P r_i$ and $r_{i+1} P a$ or $a I r_{i+1}$;
ii) $a P r_i$ and $a R r_{i+1}, \ldots, a R r_{i+k}$, $r_{i+k+1} P a$.

If situation i) appears, then alternative $a$ is assigned to category $i + 1$ by both pessimistic and optimistic procedures. If situation ii) appears, then $a$ is assigned to category $i + 1$ by the pessimistic procedure and to category $i + k + 1$ by the optimistic procedure.
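The credibility index and the four relations between an alternative and a profile can be sketched in a few lines of Python. This is a minimal illustration and not Yu's implementation: the function names are hypothetical, the set $F$ is assumed to collect the criteria whose discordance exceeds the global concordance (its definition is not reproduced in this passage), and 'exceeds the cut-off value' is read as a strict inequality.

```python
from math import prod


def credibility(C, D):
    """Credibility index sigma(a, r) from the global concordance C
    and the list D of per-criterion discordance indices D_j(a, r)."""
    # Assumption: F holds the criteria whose discordance exceeds C.
    F = [d for d in D if d > C]
    if not F:
        # No criterion opposes strongly enough: credibility equals C.
        return C
    # C < 1 whenever F is non-empty (since d <= 1), so no division by zero.
    return C * prod((1.0 - d) / (1.0 - C) for d in F)


def relation(sigma_ar, sigma_ra, lam):
    """Relation between alternative a and profile r for cut-off value lam."""
    a_S_r = sigma_ar > lam  # 'a outranks r' is considered valid
    r_S_a = sigma_ra > lam  # 'r outranks a' is considered valid
    if a_S_r and r_S_a:
        return "aIr"  # indifference
    if a_S_r:
        return "aPr"  # a is preferred to r
    if r_S_a:
        return "rPa"  # r is preferred to a
    return "aRr"      # incomparability
```

With a cut-off of 0.75, credibilities of 0.8 for '$a$ outranks $r$' and 0.3 for '$r$ outranks $a$' yield `"aPr"`, which is the relation the sorting procedures test when moving from one profile to the next.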
Multicriteria sorting methods
It is clear that the ELECTRE TRI method is a powerful tool for analyzing the decision maker's preference in sorting problems involving multiple criteria where the classes are ordered. However, the major drawback of the method is the significant amount of information that it requires from the decision maker (weights of the criteria, preference and indifference thresholds, veto thresholds, etc.). This problem can be overcome using decision instances (assignment examples) as proposed in [25].

Other MCDA sorting methods based on the outranking relations approach have been proposed in [24] (N-TOMIC method), [31] and the PROMETHEE method as it has been modified in [19]. Furthermore, P. Perny [28] extended the existing framework of the sorting methods based on the outranking relations approach to the case in which the groups are not ordered. More specifically, he proposed the construction of a fuzzy outranking relation in order to estimate the membership of each alternative for each group, and suggested two assignment procedures:

a) filtering by strict preference (the assignment rule consists of testing whether an alternative is preferred or not to a reference profile reflecting the lower limit of a group), and
b) filtering by indifference (the assignment rule consists of testing whether an alternative is indifferent or not to a reference profile representing a prototype of a group).

Overall the main characteristics of sorting methods based on the outranking relations approach of MCDA include their application to both sorting (ordered classes) as well as discrimination (non ordered classes) problems, and the significant amount of information that they require from the decision maker.

is not constructed through a direct interrogation procedure between the decision analyst and the decision maker. Instead, decision instances (e.g. past decisions) are used in order to analyze the decision policy of the decision maker, to specify his/her preferences and construct the corresponding global preference model as consistently as possible.

A well known preference disaggregation method is the UTA method (UTilités Additives) proposed in [17]. Given a predefined ranking of a reference set of alternatives, the aim of the UTA method is to construct a set of additive utility functions which are as consistent as possible with the preordering of the alternatives (and consequently with the decision maker's preferences). The form of the additive utility function is the following:

\[
U(\mathbf{g}) = \sum_{j} u_j(g_j),
\]

where $U(\mathbf{g})$ denotes the global utility of an alternative described over a vector of criteria $\mathbf{g}$, while $u_j(g_j)$ is the partial or marginal utility of an alternative on criterion $g_j$.

Except for the study of ranking problems, the methodological framework of the preference disaggregation approach using the UTA method is also applicable in sorting problems. The UTADIS method (UTilités Additives DIScriminantes) ([6], [16], [17], [42]) is a representative example. In the UTADIS method, the sorting of the alternatives is accomplished by comparing the global utility (scores) of each alternative $a$, denoted as $U(a)$, with some thresholds $(u_1, \ldots, u_{q-1})$ which distinguish the classes $C_1, \ldots, C_q$ (the classes are ordered, so that $C_1$ is the class of the best alternatives and $C_q$ is the class of the worst alternatives).

\[
\begin{aligned}
U(a) \ge u_1 &\;\Rightarrow\; a \in C_1 \\
u_2 \le U(a) < u_1 &\;\Rightarrow\; a \in C_2
\end{aligned}
\]
two error functions denoted as $\sigma^+(a)$ and $\sigma^-(a)$, representing the deviations of a misclassified alternative from the utility threshold. The estimation of both the additive utility model and the utility thresholds is achieved through linear programming techniques ([6], [42]).

See [7] and [41] for three variants of the UTADIS method to improve the classification accuracy of the obtained additive utility models as well as their predicting ability. The first variant (UTADIS I), in addition to the classification errors, also incorporates the distances of the correctly classified alternatives from the utility thresholds, which have to be maximized. The second variant (UTADIS II) is based on a mixed integer programming formulation minimizing the number of misclassifications instead of their magnitude, while the third variant (UTADIS III) combines UTADIS I and II, and its aim is to minimize the number of misclassifications and maximize the distances of the correctly classified alternatives from the utility thresholds.

Overall the main characteristics of the application of the preference disaggregation approach in the study of sorting problems can be summarized in the following three aspects.

1) The information that is required is minimal, since, similarly to the goal programming approaches, only a predefined classification of a reference set of alternatives is required.
2) The preference disaggregation approach is focused only on decision problems where the classes are ordered, since it is assumed that there is a strict preference relation between the classes.
3) The classification/sorting models which are developed have a nonlinear form, since the marginal utilities of the evaluation criteria are piecewise linear and consequently the global utility model is also nonlinear, in contrast to the linear discriminant models used in the goal programming approaches.

A Multigroup Hierarchical Discrimination Method. In this section a new method is presented for the study of discrimination problems with two or more ordered groups (multigroup discrimination). The proposed method is called M.H.DIS (Multigroup Hierarchical DIScrimination) and differs from most of the aforementioned MCDA approaches in two major aspects.

1) It employs a hierarchical discrimination approach: the method does not aim at the development of an overall global preference model (discriminant function) which will characterize all the observations (alternatives or objects). Instead the method tries to distinguish the groups progressively, starting by discriminating the first group (best alternatives) from all the others, and then proceeding to the discrimination between the objects which belong to the other groups.
2) It accommodates three different discrimination criteria in a very flexible and efficient way. The most common discrimination criterion in the previous approaches is the minimization of the classification error, which is measured as the deviations of the scores of the misclassified alternatives from some cut-off points. However, such an objective does not necessarily yield the optimal classification rule. For instance, consider that in a discrimination problem, three alternatives are misclassified with the following deviations from the cut-off point: [0.25, 0.25, 0.25], with the overall objective of minimizing the total classification error being 0.75. It is obvious that this classification result is not optimal, since a classification result [0, 0, 0.75] yields the same value for the overall classification error (0.75), but there is only one misclassified alternative instead of three. Several mixed integer programming formulations have been proposed to confront this issue, but their application in real world problems is prohibited by the significant amount of time required to solve such problems. M.H.DIS employs an efficient mixed integer programming (MIP) formulation for minimizing the number of misclassifications, once the minimization of the classification error has been achieved. Furthermore, M.H.DIS also considers a third criterion in order to achieve the highest possible discrimination. These three discrimination criteria have been
used in previous studies separately, or in hybrid models ([12], [13]), but they have never been used through a sequential procedure. Instead, in M.H.DIS initially the classification error is minimized. Then, considering only the misclassified alternatives, M.H.DIS tries to 're-arrange' their classification error in order to minimize the number of misclassifications, and finally the maximum discrimination between the alternatives is attempted.

Model Formulation. Let $A = \{a_1, \ldots, a_n\}$ be a set of $n$ alternatives which should be classified into $q$ ordered classes $C_1, \ldots, C_q$. ($C_1$ is preferred to $C_2$, $C_2$ is preferred to $C_3$, etc.) Each alternative is described (evaluated) along a set $G = \{g_1, \ldots, g_m\}$ of $m$ evaluation criteria. The evaluation of each alternative $a$ on criterion $g_i$ is denoted as $g_i(a)$. According to the set $A$ of alternatives, $p_i$ different values for each criterion $g_i$ can be distinguished. These $p_i$ values are rank-ordered from the smallest value $g_i^1$ to the largest value $g_i^{p_i}$. Furthermore, among the set of criteria it is possible to distinguish two subsets: a subset $G_1$ consisting of $m_1$ criteria for which higher values indicate higher preference, and a second subset $G_2$ consisting of $m_2$ criteria for which the decision maker's preference is a decreasing function of the criterion's scale. For instance, in an investment decision problem $G_1$ may include criteria related to the return of an investment project (projects with higher return are

alternatives have been classified in the predefined classes.

Throughout this hierarchical classification procedure, it is assumed that the decision maker's preferences are monotone functions (increasing or decreasing) on the criteria's scale. This assumption implies that in the case of a criterion $g_i \in G_1$, as the evaluation of an alternative on this criterion increases, the decision of classifying this alternative into a higher (better) class becomes preferable to a decision of classifying the alternative into a lower (worse) class. For instance, in the credit granting problem, as the profitability of a firm increases, the credit analyst will be more inclined to classify the firm as a healthy firm rather than as a risky one. A similar implication is also made for each criterion $g_i \in G_2$.

This preference relation between the several possible decisions of classifying a specific alternative $a$ into one of the predefined classes imposes the following general classification rule:

    The decision concerning the classification of an alternative $a$ into one of the predefined classes should be made in such a way that the utility (value) of such a decision for the decision maker is maximized.

The utility of a decision concerning the classification of an alternative $a$ into group $C_j$ can be expressed in the form of additive utility function:
increases the preference of the decision concerning the classification of a firm in the group of healthy firms is also increasing. On the other hand, for the group of risky firms the marginal utility will be a decreasing function of the criterion's (profitability) values, indicating that as profitability increases the preference of the decision concerning the classification of a firm in the group of risky firms is decreasing.

Consequently, at each stage of the hierarchical classification procedure that was described above, two utility functions are constructed. The first one corresponds to the utility of a decision concerning the classification of an alternative $a$ into class $C_k$ (denoted as $U^{C_k}(a)$), while the second one corresponds to the utility of a decision concerning the nonclassification of an alternative $a$ into class $C_k$ (denoted as $U^{-C_k}(a)$). Based on these two utility functions the aforementioned general classification rule can be expressed as follows:

\[
\begin{aligned}
&\text{if } U^{C_k}(a) > U^{-C_k}(a), \text{ then } a \in C_k, \\
&\text{if } U^{C_k}(a) < U^{-C_k}(a), \text{ then } a \notin C_k.
\end{aligned}
\tag{1}
\]

Following this rule, the overall hierarchical discrimination procedure is presented in Fig. 1.

[Flowchart: at each stage $k$ the test $U^{C_k}(a) > U^{-C_k}(a)$ assigns $a$ to $C_k$ (Yes) or passes it to the next stage (No).]

Fig. 1: The hierarchical classification procedure.

Estimation of Utility Functions. According to the hierarchical discrimination procedure which was described above, to achieve the classification of the alternatives in $q$ classes, the number of utility functions which must be estimated is $2(q - 1)$. The estimation of these utility functions in M.H.DIS is accomplished through linear programming techniques. More specifically, at each stage of the hierarchical discrimination procedure, two linear programs and one mixed integer program are solved to estimate 'optimally' the two utility functions.

LP1: Minimizing the Overall Classification Error. According to the classification rule (1), to achieve the correct classification of an alternative $a \in C_k$ at stage $k$ (cf. Fig. 1), the estimated utility functions should satisfy the following constraint:

\[
U^{C_k}(a) > U^{-C_k}(a).
\]

Since in linear programming it is not possible to use strict inequality constraints, a small positive real number $s$ may be used as follows:

\[
U^{C_k}(a) - U^{-C_k}(a) \ge s.
\]

If for an alternative $a \in C_k$ the classification rule at stage $k$ yields $U^{C_k}(a) < U^{-C_k}(a)$, then this alternative is misclassified, since it should be classified in one of the lower classes (the specific classification of the alternative will be determined in the next stages of the hierarchical discrimination process). The classification error in this case is:

\[
e(a) = U^{-C_k}(a) - U^{C_k}(a) + s.
\]

Similarly, to achieve the correct classification of an alternative $b \notin C_k$ at stage $k$, the estimated utility functions should satisfy the following constraint:

\[
U^{-C_k}(b) - U^{C_k}(b) \ge s.
\]

If this constraint is not satisfied for an alternative $b \notin C_k$ at stage $k$, then this fact implies that this alternative should be classified in class $C_k$ and the classification error in this case is $e(b) = U^{C_k}(b) - U^{-C_k}(b) + s$.

Moreover, to achieve the monotonicity of the marginal utilities, the following constraints are imposed:

\[
\text{if } g_i \in G_1: \quad
\begin{cases}
u_i^{C_k}(g_i^{1}) = 0, \\
u_i^{-C_k}(g_i^{p_i}) = 0, \\
u_i^{C_k}(g_i^{j+1}) > u_i^{C_k}(g_i^{j}), \\
u_i^{-C_k}(g_i^{j+1}) < u_i^{-C_k}(g_i^{j}),
\end{cases}
\tag{2}
\]

\[
\text{if } g_i \in G_2: \quad
\begin{cases}
u_i^{C_k}(g_i^{p_i}) = 0, \\
u_i^{-C_k}(g_i^{1}) = 0, \\
u_i^{C_k}(g_i^{j+1}) < u_i^{C_k}(g_i^{j}), \\
u_i^{-C_k}(g_i^{j+1}) > u_i^{-C_k}(g_i^{j}),
\end{cases}
\tag{3}
\]
where $g_i^{j}$ and $g_i^{j+1}$ are two consecutive values of criterion $g_i$ ($g_i^{j+1} > g_i^{j}$ for all $g_i \in G$). These constraints can be simplified by setting:

\[
\text{if } g_i \in G_1: \quad
\begin{cases}
w^{C_k}_{ij,j+1} = u_i^{C_k}(g_i^{j+1}) - u_i^{C_k}(g_i^{j}), \\
w^{-C_k}_{ij,j+1} = u_i^{-C_k}(g_i^{j}) - u_i^{-C_k}(g_i^{j+1}),
\end{cases}
\tag{4}
\]

\[
\text{if } g_i \in G_2: \quad
\begin{cases}
w^{C_k}_{ij,j+1} = u_i^{C_k}(g_i^{j}) - u_i^{C_k}(g_i^{j+1}), \\
w^{-C_k}_{ij,j+1} = u_i^{-C_k}(g_i^{j+1}) - u_i^{-C_k}(g_i^{j}).
\end{cases}
\tag{5}
\]

The marginal utility of criterion $g_i$ at point $g_i^{j}$ can then be calculated through the following formulas:

\[
u_i^{C_k}(g_i^{j}) = \sum_{l=1}^{j-1} w^{C_k}_{il,l+1}, \qquad
u_i^{-C_k}(g_i^{j}) = \sum_{l=j}^{p_i-1} w^{-C_k}_{il,l+1}.
\tag{6}
\]

Using these transformations, constraints (2) and (3) can be rewritten as follows (a small positive number $t$ is used to ensure the strict inequality):

\[
w^{C_k}_{ij,j+1} \ge t, \qquad w^{-C_k}_{ij,j+1} \ge t, \qquad \forall g_i.
\]

Consequently, the initial linear program (LP1) to be solved can be formulated as follows:

\[
\begin{aligned}
\min\ & F = \sum_{a \in A} e(a) \\
\text{s.t.}\ & U^{C_k}(a) - U^{-C_k}(a) + e(a) \ge s, && \forall a \in C_k, \\
& U^{-C_k}(b) - U^{C_k}(b) + e(b) \ge s, && \forall b \notin C_k, \\
& w^{C_k}_{ij,j+1} \ge t, \quad w^{-C_k}_{ij,j+1} \ge t, \\
& \sum_{i} \sum_{j} w^{C_k}_{ij,j+1} = 1, \\
& \sum_{i} \sum_{j} w^{-C_k}_{ij,j+1} = 1, \\
& e(a),\ s,\ t \ge 0.
\end{aligned}
\]

LP2: Minimizing the Number of Misclassifications. If after the solution of (LP1) there exist some alternatives $a \in A$ for which $e(a) > 0$, then obviously these alternatives are misclassified. However, as has already been illustrated during the discussion of the main characteristics of M.H.DIS, it may be possible to achieve a 're-arrangement' of the classification errors which may lead to the reduction of the number of misclassifications. In M.H.DIS this is achieved through a mixed integer programming (MIP) formulation. However, since MIP formulations are difficult to solve, especially in cases where the number of integer or binary variables is large, the MIP formulation used in M.H.DIS considers only the misclassifications that occurred by solving (LP1), while retaining all the correct classifications. Let $C$ be the set of alternatives which have been correctly classified after solving (LP1), and $M$ be the set of misclassified alternatives for which $e(a) > 0$. The MIP formulation used in M.H.DIS is the following (LP2):

\[
\begin{aligned}
\min\ & \sum_{a \in A} I(a) \\
\text{s.t.}\ & U^{C_k}(a) - U^{-C_k}(a) \ge s, && \forall a \in C_k \cap C, \\
& U^{-C_k}(b) - U^{C_k}(b) \ge s, && \forall b \notin C_k,\ b \in C, \\
& U^{C_k}(a) - U^{-C_k}(a) + I(a) \ge s, && \forall a \in C_k \cap M, \\
& U^{-C_k}(b) - U^{C_k}(b) + I(b) \ge s, && \forall b \notin C_k,\ b \in M, \\
& w^{C_k}_{ij,j+1} \ge t, \quad w^{-C_k}_{ij,j+1} \ge t, \\
& \sum_{i} \sum_{j} w^{C_k}_{ij,j+1} = 1, \\
& \sum_{i} \sum_{j} w^{-C_k}_{ij,j+1} = 1, \\
& s,\ t \ge 0, \quad I(a) \text{ integer}.
\end{aligned}
\]

The first set of constraints is used to ensure that all the correct classifications achieved by solving (LP1) are retained. The second set of constraints is used only for the alternatives which were misclassified by (LP1). Their meaning is similar to the constraints in LP1, with the only difference being the transformation of the continuous variables $e(a)$ of LP1 (classification errors) into integer variables $I(a)$ which indicate whether an alternative is misclassified or not. The meaning of the final two constraints has already been illustrated in the discussion of the LP1 formulation. The objective of LP2 is to minimize the number of misclassifications that occurred through the solution of LP1.
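The building blocks of LP1 and LP2 can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the increments are given as plain lists for a single criterion in $G_1$, and no linear program is actually solved. The sketch rebuilds marginal utilities from the increments $w$, evaluates the smallest error $e(a)$ that satisfies an LP1 constraint, and contrasts the LP1 objective (total error) with the LP2 objective (number of misclassified alternatives).

```python
def marginal_utilities(w_in, w_out):
    """Marginal utilities at the p ordered values of one criterion in G1.

    w_in[l], w_out[l] are the increments between consecutive values
    l+1 and l+2 (0-based list index l), so each list has length p - 1.
    """
    p = len(w_in) + 1
    # u^{Ck}(g^j) accumulates the increments below point j.
    u_in = [sum(w_in[:j]) for j in range(p)]
    # u^{-Ck}(g^j) accumulates the increments from point j upwards.
    u_out = [sum(w_out[j:]) for j in range(p)]
    return u_in, u_out


def classification_error(U_in, U_out, belongs_to_Ck, s):
    """Smallest e >= 0 that satisfies the LP1 constraint of one alternative."""
    margin = U_in - U_out if belongs_to_Ck else U_out - U_in
    return max(0.0, s - margin)


def total_error(errors):
    """LP1-style objective: sum of the classification errors."""
    return sum(errors)


def misclassified_count(errors):
    """LP2-style objective: number of alternatives with a positive error."""
    return sum(1 for e in errors if e > 0)
```

Applied to the error vectors `[0.25, 0.25, 0.25]` and `[0, 0, 0.75]`, `total_error` gives the same value for both, while `misclassified_count` gives 3 and 1. This is the situation that motivates solving LP2 after LP1.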
LP3: Maximizing the Minimum Distance. Solving LP1 and LP2 the 'optimal' classification of the alternatives has been achieved, where the term 'optimal' refers to the minimization of the number of misclassified alternatives. However, the correct classification of some alternatives may have been 'marginal', that is although they are correctly classified, their global utilities according to the two utility functions developed may have been very close. The objective of LP3 is to maximize the minimum difference between the global utilities of the correctly classified alternatives achieved according to the two utility functions.

Similarly to LP2, let $C$ be the set of alternatives which have been correctly classified after solving LP1 and LP2, and $M$ be the set of misclassified alternatives. LP3 can be formulated as follows:

\[
\begin{aligned}
\max\ & d \\
\text{s.t.}\ & U^{C_k}(a) - U^{-C_k}(a) - d \ge s, && \forall a \in C_k \cap C, \\
& U^{-C_k}(b) - U^{C_k}(b) - d \ge s,
\end{aligned}
\]

evaluation criteria [25] for which higher values are preferred. The alternatives must be classified in three ordered classes. Table 1 illustrates the evaluation of the alternatives on the criteria as well as the predefined classification.

        g1      g2       g3        Class
a1      70      64.75    46.25     C1
a2      61      62       60        C1
a3      40      50       37        C2
a4      66      40       23.125    C2
a5      20      20       20        C3
a6      15      15       30        C3

Table 1: Data of the illustrative example (Source: [25]).

Distinguishing between C1 and C2-C3

In the first stage of the hierarchical discrimination procedure, the aim is to distinguish the alternatives belonging in class $C_1$ from the alternatives belonging in classes $C_2$ and $C_3$. To achieve this classification two utility functions are developed, denoted as $U^{C_1}(a)$ and $U^{-C_1}(a)$.

The utility of the decision of classifying the alternative $a_1$ in class $C_1$ can be expressed as follows:
\[
\begin{aligned}
\ldots &= 0.29615, \\
u_1^{-C_1}(15) &= w^{-C_1}_{11,2} + w^{-C_1}_{12,3} + w^{-C_1}_{13,4} + w^{-C_1}_{14,5} + w^{-C_1}_{15,6} = 0.25937, \\
u_1^{C_1}(20) &= w^{C_1}_{11,2} = 0.0001, \\
u_1^{-C_1}(20) &= w^{-C_1}_{12,3} + w^{-C_1}_{13,4} + w^{-C_1}_{14,5} + w^{-C_1}_{15,6} = 0.22229, \\
u_1^{C_1}(40) &= w^{C_1}_{11,2} + w^{C_1}_{12,3} = 0.000\ldots, \\
u_3^{C_1}(37) &= w^{C_1}_{31,2} + w^{C_1}_{32,3} + w^{C_1}_{33,4} = 0.19753, \\
u_3^{-C_1}(37) &= w^{-C_1}_{34,5} + w^{-C_1}_{35,6} = 0.22209, \\
u_3^{C_1}(46.25) &= w^{C_1}_{31,2} + w^{C_1}_{32,3} + w^{C_1}_{33,4} + w^{C_1}_{34,5} = 0.33323, \\
u_3^{-C_1}(46.25) &= w^{-C_1}_{35,6} = 0.11104,
\end{aligned}
\]
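Which increments $w$ enter each marginal utility in the worked example depends only on the rank of a value among the distinct values of its criterion. The following Python sketch derives these index sets from the $g_1$ and $g_3$ columns of Table 1. The names `TABLE_1` and `increment_indices` are hypothetical, and the sketch reproduces only the index pattern, not the numerical solution of the linear program.

```python
# Evaluations of a1..a6 on criteria g1 and g3, taken from Table 1.
TABLE_1 = {
    "g1": [70, 61, 40, 66, 20, 15],
    "g3": [46.25, 60, 37, 23.125, 20, 30],
}


def increment_indices(values, x):
    """Indices l of the increments w_{il,l+1} that sum to the marginal
    utilities of value x: (terms of u^{C}, terms of u^{-C})."""
    levels = sorted(set(values))      # the p_i distinct values, ascending
    j = levels.index(x) + 1           # 1-based rank of x among them
    in_terms = list(range(1, j))              # l = 1, ..., j-1
    out_terms = list(range(j, len(levels)))   # l = j, ..., p_i-1
    return in_terms, out_terms
```

For example, `increment_indices(TABLE_1["g1"], 20)` returns `([1], [2, 3, 4, 5])`: the utility of classification at $g_1 = 20$ uses only $w_{11,2}$, while the utility of nonclassification sums $w_{12,3}$ through $w_{15,6}$.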
\[
\begin{aligned}
\min\ & F = e(a_3) + e(a_4) + e(a_5) + e(a_6) \\
\text{s.t.}\ & U^{C_2}(a_3) - U^{-C_2}(a_3) + e(a_3) \ge 0.001 \\
& U^{C_2}(a_4) - U^{-C_2}(a_4) + e(a_4) \ge 0.001 \\
& U^{-C_2}(a_5) - U^{C_2}(a_5) + e(a_5) \ge 0.001 \\
& U^{-C_2}(a_6) - U^{C_2}(a_6) + e(a_6) \ge 0.001 \\
& w^{C_2}_{ij,j+1} \ge 0.0001, \quad w^{-C_2}_{ij,j+1} \ge 0.0001, \\
& \sum_{i=1}^{3} \sum_{j=1}^{3} w^{C_2}_{ij,j+1} = 1, \\
& \sum_{i=1}^{3} \sum_{j=1}^{3} w^{-C_2}_{ij,j+1} = 1, \\
& \forall i = 1, 2, 3, \quad \forall j = 1, \ldots, 4, \\
& e(a_3),\ e(a_4),\ e(a_5),\ e(a_6) \ge 0.
\end{aligned}
\]

Table 5 presents the global utilities of the alternatives according to the solution obtained by LP1 in this second stage.

        U^{C2}(a)    U^{-C2}(a)
a3      0.8944       0.1000
a4      0.7333       0.2501
a5      0.2111       0.8000
a6      0.1612       0.7500

Table 5: Global utilities obtained through the solution of LP1 (stage 2).

The alternatives are correctly classified in their original classes, and therefore it is not necessary to proceed with LP2 (similarly to the first stage). Instead, the method proceeds to solve LP3 to achieve better discrimination of the alternatives.

\[
\begin{aligned}
\max\ & d \\
\text{s.t.}\ & U^{C_2}(a_3) - U^{-C_2}(a_3) - d \ge 0.001 \\
& U^{C_2}(a_4) - U^{-C_2}(a_4) - d \ge 0.001 \\
& U^{-C_2}(a_5) - U^{C_2}(a_5) - d \ge 0.001 \\
& U^{-C_2}(a_6) - U^{C_2}(a_6) - d \ge 0.001 \\
& w^{C_2}_{ij,j+1} \ge 0.0001, \quad w^{-C_2}_{ij,j+1} \ge 0.0001, \\
& \sum_{i=1}^{3} \sum_{j=1}^{3} w^{C_2}_{ij,j+1} = 1, \\
& \sum_{i=1}^{3} \sum_{j=1}^{3} w^{-C_2}_{ij,j+1} = 1, \\
& \forall i = 1, 2, 3, \quad \forall j = 1, \ldots, 4, \\
& d \ge 0.
\end{aligned}
\]

Table 6 presents the global utilities calculated according to the solution of LP3.

        U^{C2}(a)    U^{-C2}(a)
a3      0.9999       0.0005
a4      0.9997       0.0003
a5      0.0002       0.9996
a6      0.0005       0.7949

Table 6: Global utilities obtained through the solution of LP3 (stage 2).

At this point the hierarchical discrimination procedure ends, since all the alternatives have been classified in the three predefined classes. Moreover, this classification is correct. In particular, in stage 1 $a_1$ and $a_2$ have been correctly classified in class $C_1$, while in stage 2 $a_3$ and $a_4$ have been correctly classified in class $C_2$, and $a_5$ and $a_6$ have been classified into the final class $C_3$ (cf. Table 6).

Concluding Remarks And Future Perspectives. The focal point of interest in this article was the application of MCDA in the study of sorting or more generally discrimination (classification) problems. Such types of problems have major practical interest in several fields including finance, environmental and energy policy and planning, marketing, medical diagnosis, robotics (pattern recognition), etc. The multivariate statistical classification techniques have been used for decades to study such problems. However, their inability to provide a realistic and flexible approach to support real world decision making problems in situations where classification is required led operational researchers, management scientists as well as practitioners towards the exploitation of the recent advances in the fields of operations research, management science, and artificial intelligence.

Among these 'alternative' approaches for the study of classification problems, MCDA provides an arsenal of tools and methods to develop classification (sorting) models within a realistic and flexible context. This article outlined the main MCDA classification techniques, both from the specific type of classification problems that they address (ordered or non-ordered classes), as well as from the MCDA approach that they employ (goal programming, outranking relations, preference disaggregation).

Furthermore, a new MCDA approach has been proposed. The M.H.DIS method extends the common two-group classification framework, through a
set of additive utility functions for multicriteria decision making, the UTA method', Europ. J. Oper. Res. 10 (1982), 151-164.
[18] JOACHIMSTHALER, E.A., AND STAM, A.: 'Four approaches to the classification problem in discriminant analysis: An experimental study', Decision Sci. 19 (1988), 322-333.
[19] KHOURY, N.T., AND MARTEL, J.M.: 'The relationship between risk-return characteristics of mutual funds and their size', Finance 11, no. 2 (1990), 67-82.
[20] KOEHLER, G.J., AND ERENGUC, S.S.: 'Minimizing misclassifications in linear discriminant analysis', Decision Sci. 21 (1990), 63-85.
[21] MANGASARIAN, O.L.: 'Multisurface method for pattern separation', IEEE Trans. Inform. Theory IT-14, no. 6 (1968), 801-807.
[22] MARKOWSKI, C.A., AND MARKOWSKI, E.P.: 'An experimental comparison of several approaches to the discriminant problem with both qualitative and quantitative variables', Europ. J. Oper. Res. 28 (1987), 74-78.
[23] MARKOWSKI, E.P., AND MARKOWSKI, C.A.: 'Some difficulties and improvements in applying linear programming formulations to the discriminant problem', Decision Sci. 16 (1985), 237-247.
[24] MASSAGLIA, M., AND OSTANELLO, A.: 'N-TOMIC: A decision support for multicriteria segmentation problems', in P. KORHONEN (ed.): Internat. Workshop Multicriteria Decision Support, Vol. 356 of Lecture Notes Economics and Math. Systems, Springer, 1991, pp. 167-174.
[25] MOUSSEAU, V., AND SLOWINSKI, R.: 'Inferring an ELECTRE-TRI model from assignment examples', J. Global Optim. 12, no. 2 (1998), 157-174.
[26] NAKAYAMA, H., AND KAGAKU, N.: 'Pattern classification by linear goal programming and its extensions', J. Global Optim. 12, no. 2 (1992), 111-126.
[27] PEKELMAN, D., AND SEN, S.K.: 'Mathematical programming models for the determination of attribute weights', Managem. Sci. 20, no. 8 (1974), 1217-1229.
[28] PERNY, P.: 'Multicriteria filtering methods based on concordance and non-discordance principles', Ann. Oper. Res. 80 (1998), 137-165.
[29] RAGSDALE, C.T., AND STAM, A.: 'Mathematical programming formulations for the discriminant problem: An old dog does new tricks', Decision Sci. 22 (1991), 296-307.
[30] ROY, B.: Méthodologie multicritère d'aide à la décision, Economica, 1985.
[31] ROY, B., AND MOSCAROLA, J.: 'Procédure automatique d'examen de dossiers fondée sur une segmentation trichotomique en présence de critères multiples', RAIRO Rech. Opérat. 11, no. 2 (1977), 145-173.
[32] RUBIN, P.A.: 'A comparison of linear programming and parametric approaches to the two-group discriminant problem', Decision Sci. 21 (1990), 373-386.
[33] RUBIN, P.A.: 'Heuristic solution procedures for a mixed-integer programming discriminant model', Managerial and Decision Economics 11 (1990), 255-266.
[34] SMITH, C.: 'Some examples of discrimination', Ann. Eugenics 13 (1947), 272-282.
[35] SRINIVASAN, V., AND SHOCKER, A.D.: 'Estimating the weights for multiple attributes in a composite criterion using pairwise judgements', Psychometrika 38, no. 4 (1973), 473-493.
[36] SRINIVASAN, V., AND SHOCKER, A.D.: 'Linear programming techniques for multidimensional analysis of preferences', Psychometrika 38, no. 3 (1973), 337-396.
[37] STAM, A., AND JOACHIMSTHALER, E.A.: 'Solving the classification problem via linear and nonlinear programming methods', Decision Sci. 20 (1989), 285-293.
[38] STAM, A., AND JOACHIMSTHALER, E.A.: 'A comparison of a robust mixed-integer approach to existing methods for establishing classification rules for the discriminant problem', Europ. J. Oper. Res. 46 (1990), 113-122.
[39] WILSON, J.M.: 'Integer programming formulation of statistical classification problems', OMEGA Internat. J. Management Sci. 24, no. 6 (1996), 681-688.
[40] YU, W.: 'ELECTRE TRI: Aspects methodologiques et manuel d'utilisation', Document du Lamsade (Univ. Paris-Dauphine) 74 (1992).
[41] ZOPOUNIDIS, C., AND DOUMPOS, M.: 'A multicriteria decision aid methodology for the assessment of country risk', in C. ZOPOUNIDIS AND J.M. GARCIA VÁZQUEZ (eds.): Managing in Uncertainty, Proc. VI Internat. Conf. AEDEM, AEDEM Ed., 1997, pp. 223-236.
[42] ZOPOUNIDIS, C., AND DOUMPOS, M.: 'Preference disaggregation methodology in segmentation problems: The case of financial distress', in C. ZOPOUNIDIS (ed.): New Operational Approaches for Financial Modelling, Physica Verlag, 1997, pp. 417-439.
[43] ZOPOUNIDIS, C., AND DOUMPOS, M.: 'A multi-group hierarchical discrimination method for managerial decision problems: The M.H.DIS method': Paper Presented at the EURO XVI Conf.: Innovation and Quality of Life, Brussels, 12-15 July, 1998.

Constantin Zopounidis
Dept. Production Engin. and Management
Financial Engin. Lab.
Techn. Univ. Crete
Univ. Campus, 73100 Chania, Greece
E-mail address: kostas@ergasya.tuc.gr

Michael Doumpos
Dept. Production Engin. and Management
Financial Engin. Lab.
Techn. Univ. Crete
Univ. Campus, 73100 Chania, Greece
E-mail address: dmichael@ergasya.tuc.gr

MSC 2000: 90C29
Key words and phrases: sorting, multicriteria analysis, goal programming, outranking relation, preference disaggregation
Multidimensional knapsack problems
In the GA of [34] infeasible solutions were allowed to participate in the search and a simple fitness function which uses a graded penalty term was used. In [56] simple heuristic operators based on local search algorithms were used, and a hybrid algorithm based on combining a GA with a TS heuristic was suggested.

In [48], [49] a GA was presented where parent selection is not unrestricted (as in a standard GA) but is restricted to be between 'neighboring' solutions. Infeasible solutions were penalized as in [34]. An adaptive threshold acceptance schedule (motivated by [14], [15]) for child acceptance was used.

In the GA of [33] only feasible solutions were allowed. P.C. Chu and J.E. Beasley [10] presented a GA based upon a simple repair operator to ensure that all solutions were feasible.

Analysed Heuristics. Analysed heuristics have some theoretical underlying analysis relating to their worst-case or probabilistic performance.

A.M. Frieze and M.R.B. Clarke [23] described a polynomial approximation scheme based on the use of the dual simplex algorithm for LP, and analysed the asymptotic properties of a particular random model.

In [47] a class of generalized greedy algorithms is proposed in which items are selected according to decreasing ratios of their $p_j$'s and a weighted sum of their $r_{ij}$'s. These heuristics were subjected to both a worst-case, and a probabilistic, performance analysis.

I. Averbakh [5] investigated the properties of several dual characteristics of the MKP for different probabilistic models. He also presented a fast statistically efficient approximate algorithm with linear running time complexity for problems with random coefficients.

Other Heuristics. G.E. Fox and G.D. Scudder [19] presented a heuristic based on starting from setting all variables to zero(one) and successively choosing variables to set to one(zero). See [13] for a heuristic based upon simulated annealing (SA). See [27] for a heuristic based on ghost image processes. S. Hanafi and others [31] presented a simple multistage algorithm within which a number of different local search procedures (such as greedy, SA, threshold accepting [14], [15] and noising [9]) can be used. They also presented two TS heuristics.

Multiple-Choice Problems. One problem that is related to the MKP is the multidimensional multiple-choice knapsack problem (MMKP). Suppose that $\{1, \ldots, n\}$ is divided up into $K$ sets $S_k$, $k = 1, \ldots, K$, which are mutually exclusive, $S_k \cap S_l = \emptyset$, $\forall k \ne l$, and exhaustive, $\bigcup_{k=1}^{K} S_k = \{1, \ldots, n\}$. If we then add to the formulation of the MKP given previously the constraint

\[
\sum_{j \in S_k} x_j = 1, \qquad k = 1, \ldots, K, \tag{2}
\]

we obtain the MMKP. Equation (2) ensures that exactly one variable is chosen from each of the sets $S_k$, $k = 1, \ldots, K$.

See [44] for a heuristic for MMKP based on the MKP heuristic of Magazine and Oguz [41].

The special case of the MMKP corresponding to $m = 1$ is known as the multiple-choice knapsack problem (MCKP) and its LP relaxation as the linear multiple-choice knapsack problem (LMCKP). Work on MCKP includes [16], which presented a hybrid dynamic programming tree search algorithm incorporating a Lagrangian relaxation bound; [4], which presented a heuristic based upon SA; and [3], which presented a tree search algorithm incorporating a Lagrangian relaxation bound. For work on LMCKP see [50]. Earlier work on MCKP and LMCKP is cited in [3], [4], [16], [50].

See also: Quadratic knapsack; Integer programming.

References
[1] AARTS, E.H.L., AND LENSTRA, J. (eds.): Local search in combinatorial optimization, Wiley, 1997.
[2] ABOUDI, R., AND JORNSTEN, K.: 'Tabu search for general zero-one integer programs using the pivot and complement heuristic.', ORSA J. Comput. 6 (1994), 82-93.
[3] AGGARWAL, V., DEO, N., AND SARKAR, D.: 'The knapsack problem with disjoint multiple-choice constraints', Naval Res. Logist. 39 (1992), 213-227.
[4] AL-SULTAN, K.: 'A new approach to the multiple-choice knapsack problem': Proc. 16th Internat. Conf. Computers and Industr. Engineering, 1994, pp. 548-550.
[5] AVERBAKH, I.: 'Probabilistic properties of the dual structure of the multidimensional knapsack problem
519
Multidimensional knapsack problems
and fast statistically efficient algorithms.', Math. Pro- primitive tool', J. Heuristics 2 (1997), 147-167.
gram. 65 (1994), 311-330. [23] FRIEZE, A.M., AND CLARKE, M.R.B.: 'Approximation
[6] BALAS, E., AND MARTIN, C.H.: 'Pivot and comple- algorithms for the m-dimensional 0-1 knapsack prob-
m e n t - a heuristic for 0-1 programming', Managem. lem: worst-case and probabilistic analysis', Europ. J.
Sci. 26 (1980), 86-96. Oper. Res. 15 (1984), 100-109.
[7] BATTITI, a., AND TECCHIOLLI, G.: 'Local search [24] GAVISH, B., AND PmKUL, H.: 'Allocation of databases
with memory: Benchmarking RTS', OR Spektrum 17 and processors in a distributed computing system', in
(1995), 67-86. J. AKOKA (ed.): Managem. of Distributed Data Pro-
[8] B)i.CK, T., FOGEL, D.B., AND MICHALEWICZ, Z. cessing, North-Holland, 1982, pp. 215-231.
(eds.): Handbook of evolutionary computation, Ox- [25] GAVISH, B., AND PIRKUL, H.: 'Efficient algorithms for
ford Univ. Press, 1997. solving multiconstraint zero-one knapsack problems to
[9] CHARON, I., AND HUDRY, O.: 'The noising method: optimality', Math. Program. 31 (1985), 78-105.
A new method for combinatorial optimization', Oper. [26] GILMORE, B.C., AND GOMORY, R.E.: 'The theory and
Res. Left. 14 (1993), 133-137. computation of knapsack functions', Oper. Res. 14
[10] CHU, P.C., AND BEASLEY, J.E.: 'A genetic algorithm (1966), 1045-1075.
for the multidimensional knapsack problem', J. Heuris- [27] CLOVER, F.: 'Optimization by ghost image processes in
tics 4 (1998), 63-86. neural networks', Comput. Oper. Res. 21 (1994), 801-
[11] CRAMA, V., AND MAZZOLA, J.B.: 'On the strength 822.
of relaxations of multidimensional knapsack problems', [28] CLOVER, F., AND KOCHENBERGER, G.A.: 'Critical
INFOR 32 (1994), 219-225. event tabu search for multidimensional knapsack prob-
[12] DAMMEYER, F., AND Voss, S.: 'Dynamic tabu list lems', in I.H. OSMAN AND J.P. KELLY (eds.): Meta-
management using reverse elimination method', Ann. Heuristics: Theory and Applications, Kluwer Acad.
Oper. Res. 41 (1993), 31-46. Publ., 1996, pp. 407-427.
[13] DREXL, A.: 'A simulated annealing approach to the [29] CLOVER, F.W., AND LACUNA, M.: Tabu search,
multiconstraint zero-one knapsack problem', Comput- Kluwer Acad. Publ., 1997.
ing 40 (1988), 1-8. [30] HANAFI, S., AND FREVILLE, A.: 'An efficient tabu
[141 DUECK, G.: 'New optimization heuristics: the grand search approach for the 0-1 multidimensional knapsack
deluge algorithm and the record-to-record travel', J. problem', Europ. J. Oper. Res. 106 (1998), 659-675.
Comput. Phys. 104 (1993), 86-92. [31] HANAFI, S., FREVILLE, A., AND ABEDELLAOUI, A.EL.:
[15] DUECK, G., AND SCHEUER, T.: 'Threshold accepting: 'Comparison of heuristics for the 0-1 multidimen-
A general purpose optimization algorithm appearing sional knapsack problem', in I.H. OSMAN AND J.P.
superior to simulated annealing', J. Comput. Phys. 90 KELLY (eds.): Meta-Heuristics: Theory and Applica-
(1990), 161-175. tions, Kluwer Acad. Publ., 1996, pp. 449-465.
[16] DYER, M.E., RIHA, W.O., AND WALKER, J.: ' i hy- [32] HILLIER, F.S.: 'Efficient heuristic procedures for inte-
brid dynamic programming/branch-and-bound algo- ger linear programming with an interior', Oper. Res.
rithm for the multiple-choice knapsack problem', J. 17 (1969), 600-637.
Comput. Appl. Math. 58 (1995), 43-54. [33] HOFF, A., LOKKETANGEN, A., AND MITTET, I.: 'Ge-
[17] EVERETT, H.: 'Generalized Lagrange multiplier netic algorithms for 0/1 multidimensional knapsack
method for solving problems of optimum allocation of problems', Working Paper Molde College, Britveien 2,
resources', Oper. Res. 11 (1963), 399-417. 6~00 Molde, Norway (1996).
[18] FONTANARI, J.F.: 'A statistical analysis of the knap- [34] KHURI, S., Bti.CK, T., AND HEITK('3TTER, J.: 'The
sack problem', J. Phys. A: Math. Gen. 28 (1995), zero/one multiple knapsack problem and genetic algo-
4751-4759. rithms': Proc. 199~ A CM Syrup. Applied Computing
[19] Fox, G.E., AND SCUDDER, G.D.: 'A heuristic with tie (SAC'94), ACM, 1994, pp. 188-193.
breaking for certain 0-1 integer programming models', [35] KOCHENBERGER, G.A., MCCARL, B.A., AND
Naval Res. Logist. Quart. 32 (1985), 613-623. WYMANN, F.P.: 'A heuristic for general integer
[20] FREVILLE, A., AND PLATEAU, G.: 'Heuristics and re- programming', Decision Sci. 5 (1974), 36-44.
duction methods for multiple constraints 0-1 linear pro- [36] LEE, J.S., AND GUIGNARD, M.: 'An approximate al-
gramming problems', Europ. J. Oper. Res. 24 (1986), gorithm for multidimensional zero-one knapsack prob-
206-215. l e m s - a parametric approach', Managem. Sci. 34
[21] FREVILLE, i . , AND PLATEAU, G.: 'An efficient prepro- (1988), 402-410.
cessing procedure for the multidimensional 0-1 knap- [37] LOKKETANGEN, A., AND GLOVER, F.: 'Probabilistic
sack problem', Discrete Appl. Math. 49 (1994), 189- move selection in tabu search for zero-one mixed inte-
212. ger programming problems', in I.H. OSMAN AND J.P.
[22] FREVILLE, A., AND PLATEAU, G.: 'The 0-1 bidimen- KELLY (eds.): Meta-Heuristics: Theory and Applica-
sional knapsack problem: toward an efficient high-level tions, Kluwer Acad. Publ., 1996, pp. 467-487.
[38] LOKKETANGEN, A., AND GLOVER, F.: 'Solving zero-one mixed integer programming problems using tabu search', Europ. J. Oper. Res. 106 (1997), 624-658.
[39] LOKKETANGEN, A., JØRNSTEN, K., AND STOROY, S.: 'Tabu search within a pivot and complement framework', Internat. Trans. Oper. Res. 1 (1994), 305-316.
[40] LOULOU, R., AND MICHAELIDES, E.: 'New greedy-like heuristics for the multidimensional 0-1 knapsack problem', Oper. Res. 27 (1979), 1101-1114.
[41] MAGAZINE, M.J., AND OGUZ, O.: 'A heuristic algorithm for the multidimensional zero-one knapsack problem', Europ. J. Oper. Res. 16 (1984), 319-326.
[42] MARTELLO, S., AND TOTH, P.: Knapsack problems: Algorithms and computer implementations, Wiley, 1990.
[43] MITCHELL, M.: An introduction to genetic algorithms, MIT, 1996.
[44] MOSER, M., JOKANOVIC, D.P., AND SHIRATORI, N.: 'An algorithm for the multidimensional multiple-choice knapsack problem', IEICE Trans. Fundam. Electronics, Commun. and Computer Sci. E80A (1997), 582-589.
[45] PIRKUL, H.: 'A heuristic solution procedure for the multiconstraint zero-one knapsack problem', Naval Res. Logist. 34 (1987), 161-172.
[46] REEVES, C.R.: Modern heuristic techniques for combinatorial problems, Blackwell, 1993.
[47] RINNOOY KAN, A.H.G., STOUGIE, L., AND VERCELLIS, C.: 'A class of generalized greedy algorithms for the multi-knapsack problem', Discrete Appl. Math. 42 (1993), 279-290.
[48] RUDOLPH, G., AND SPRAVE, J.: 'A cellular genetic algorithm with self-adjusting acceptance threshold': Proc. First IEE/IEEE Internat. Conf. Genetic Algorithms in Engineering Systems: Innovations and Applications, IEEE, 1995, pp. 365-372.
[49] RUDOLPH, G., AND SPRAVE, J.: 'Significance of locality and selection pressure in the grand deluge evolutionary algorithm', in H.M. VOIGT, W. EBELING, I. RECHENBERG, AND H.P. SCHWEFEL (eds.): Parallel Problem Solving from Nature IV. Proc. Internat. Conf. Evolutionary Computation, Lecture Notes Computer Sci., Springer, 1996, pp. 686-694.
[50] SARIS, S., AND KARWAN, M.H.: 'The linear multiple choice knapsack problem', Oper. Res. Lett. 8 (1989), 95-100.
[51] SCHILLING, K.E.: 'The growth of m-constraint random knapsacks', Europ. J. Oper. Res. 46 (1990), 109-112.
[52] SENJU, S., AND TOYODA, Y.: 'An approach to linear programming with 0-1 variables', Managem. Sci. 15 (1968), 196-207.
[53] SHIH, W.: 'A branch and bound method for the multiconstraint zero-one knapsack problem', J. Oper. Res. Soc. 30 (1979), 369-378.
[54] SZKATULA, K.: 'The growth of multi-constraint random knapsacks with various right-hand sides of the constraints', Europ. J. Oper. Res. 73 (1994), 199-204.
[55] SZKATULA, K.: 'The growth of multi-constraint random knapsacks with large right-hand sides of the constraints', Oper. Res. Lett. 21 (1997), 25-30.
[56] THIEL, J., AND VOSS, S.: 'Some experiences on solving multiconstraint zero-one knapsack problems with genetic algorithms', INFOR 32 (1994), 226-242.
[57] TOYODA, Y.: 'A simplified algorithm for obtaining approximate solutions to zero-one programming problems', Managem. Sci. 21 (1975), 1417-1427.
[58] VOLGENANT, A., AND ZOON, J.A.: 'An improved heuristic for multidimensional 0-1 knapsack problems', J. Oper. Res. Soc. 41 (1990), 963-970.
[59] ZANAKIS, S.H.: 'Heuristic 0-1 linear programming: An experimental comparison of three methods', Managem. Sci. 24 (1977), 91-104.

J.E. Beasley
The Management School, Imperial College
London SW7 2AZ, England
E-mail address: j.beasley@ic.ac.uk

MSC2000: 90C27, 90C10
Key words and phrases: multidimensional knapsack, multiconstraint knapsack, multiple choice knapsack, combinatorial optimization.

MULTIDISCIPLINARY DESIGN OPTIMIZATION, MDO

Modern large scale vehicle design (aircraft, ships, automobiles, mass transit) requires the interaction of multiple disciplines, traditionally processed in a sequential order. Multidisciplinary optimization (MDO), a formal methodology for the integration of these disciplines, is evolving toward methods capable of replacing the traditional sequential methodology of vehicle design by concurrent algorithms, with both an overall gain in product performance and a decrease in design time. The obstacles to MDO becoming a production methodology, in the same sense as quality control, are numerous and formidable. In aircraft design, for instance, typical disciplines involved would be aerodynamics, structures, thermodynamics, controls, propulsion, manufacture, and economics. Detailed analyses in each of these disciplines could involve tens to hundreds of subroutines and tens of thousands of lines of code. Managing the software libraries and data alone is a daunting task.

Codes from different disciplines typically are grossly incompatible, but even within disciplines, data structures and solution representations may be incompatible, requiring 'translation' routines or
recoding. This incompatibility is particularly acute when stand-alone packages with interactive interfaces are involved. Most disciplinary codes, designed years ago for small serial computers, are very ill-suited to modern parallel architectures, even with a coarse grained approach.

Detailed, highly accurate disciplinary analyses are very expensive, sometimes requiring hours on a supercomputer, even when run in parallel. The import of this is that, regardless of the dimension of the design space, it can be sampled for accurate function values at only a relatively small number of points. Other obstacles to achieving true MDO include model verification, noisy function values, and flawed parallel optimization methodologies.

Almost every conceivable strategy for MDO has been proposed. A good recent summary of hierarchical approaches can be found in [4], and [9] pioneered nonhierarchical or concurrent approaches. The basic idea of concurrent methods, and a particular variant known as concurrent subspace optimization (CSSO), is to simultaneously and independently optimize each of the disciplines (or 'contributing analyses', as they are called), and then perform a global coordination that brings the entire system closer to a globally feasible and optimal point. Collaborative optimization differs from CSSO in how the global coordination is managed. An excellent discussion of these approaches is in the proceedings [2]. While concurrent methods are intuitively appealing and naturally parallelizable, they are not guaranteed to converge [8].

Trust region model management [1] is a rigorous approach to MDO that shows promise, and aspects of CSSO, when combined with an extended Lagrangian and response surface approximations, can lead to a provably convergent MDO method (J.F. Rodriguez, J.E. Renaud and L.T. Watson, [6]). A noteworthy aspect of the Rodriguez method [6] is that the convergence proof covers variable fidelity data, which is crucial in practice.

In a taxonomy of MDO approaches, one distinction would be between hierarchic and nonhierarchic. Another distinction is whether parallelism is achieved between disciplines (concurrent disciplinary computation) or within disciplines (multipoint, response surface, local/global computation). If response surface approximations are used, two prevalent approximation methods are classical least squares and DACE (Design and Analysis of Computer Experiments).

S. Burgee, A.A. Giunta, V. Balabanov, B. Grossman, W.H. Mason, R. Narducci, R.T. Haftka, and Watson [3] give a detailed discussion of the multipoint, classical least squares approach to response surface construction, and of the use of parallelism within disciplines (the pipelined MDO paradigm of Burgee is also provably convergent). The tack of this approach is to use classical design of experiments theory, regression statistics, and low order polynomial approximation models.

The DACE [7] model posits that the output of a computer analysis program is

$$Y(x) = \beta + Z(x),$$

where $Z(x)$ is a zero mean stationary Gaussian process. (This is clearly a fiction since computer output is deterministic. The issue is whether the model has predictive power.) Using Bayesian statistics, the best unbiased predictor is

$$\hat{Y}(x) = \hat{\beta} + r(x, S) R^{-1} (Y_S - \mathbf{1} \hat{\beta}),$$

where $S$ is a set of observation sites, $Y_S$ is the vector of observations at $S$, $r(x, S)$ is the correlation of $x$ with sites $S$, $R$ is the correlation matrix between sites $S$, and $\hat{\beta}$ is the estimate of the mean. Some parametrized functional form for the correlation is assumed, and then these correlation parameters and $\hat{\beta}$ are computed as maximum likelihood estimates.

DACE models are more flexible than polynomial models, but with sparse data in high dimensions neither DACE nor polynomial models have much predictive power. To appreciate the problem, observe that a cube in 30 dimensions has $2^{30} \approx 10^9$ vertices, and to even evaluate an algebraic formula at each vertex requires supercomputer power.

MDO Paradigm Example. As an illustration, an MDO paradigm for aircraft design is presented here. The MDO algorithm is a repeat loop, with a nominal design as its starting point, approximate optimal designs as loop iterates, and an optimal design as its ending point (see Fig. 1). At the start of each loop, aerodynamic shape and mission variables are obtained from either the nominal starting design or the intermediate approximate optimal design. These shape and mission variables are then used in the parallel simple aerodynamic and structural analyses.

[Figure: flowchart of the MDO loop. NOMINAL DESIGN; configuration design variables; parallel simple aerodynamic analyses and parallel simple structural analyses; approximation domain; regression analysis; response surface model structures; D-optimal point selection; D-optimal design point sets; parallel detailed aerodynamic analyses (aerodynamic loads passed to parallel structural optimizations); accurate aerodynamic quantities and accurate weights; aerodynamic response surfaces and weight response surface; configuration optimization; approximate optimal design; OPTIMAL DESIGN.]

Fig. 1: MDO paradigm.

The simple aerodynamic analyses are performed on a regular grid of points in the design space. Simple aerodynamic calculations evaluate the (aerodynamic) feasibility of each grid point using tolerances on the constraints and move limits on the objective function, eliminating grossly infeasible points, and generating an approximation domain. The simple structural analyses use the aerodynamic shape and mission variables in basic weight equations to calculate approximate weights needed by the objective function and constraints, further refining the approximation domain.

Using the relatively abundant data from the simple analyses, regression analysis and analysis of variance are used to identify less important terms in the polynomial response surface models. Once the less important terms are eliminated, the structure of the reduced-term polynomial regression models is known, and can be used later in the generation of response surface approximations of the optimal weight and necessary aerodynamic quantities over the approximation domain.

A genetic algorithm (GA; cf. Genetic algorithms) is used to find sets of approximate D-optimal design points in the approximation domain obtained from the parallel simple analyses. The structure of a response surface model is embodied in the regression matrix $X$, which defines the GA merit function $|X^T X|$ (maximized by a set of points called D-optimal). These D-optimal design points are input to the detailed aerodynamic analysis code, which performs detailed analyses at each of the D-optimal design points in parallel. The analyses result in accurate aerodynamic quantities, such as wave drag and other drag components, and accurate aerodynamic loads.

The accurate aerodynamic quantities are used to generate reduced-term polynomial response surface models for each of the expensive quantities (such as wave drag). An aerodynamic load calculated in the detailed aerodynamic analyses is used in a detailed structural optimization to calculate an accurate optimal weight for that particular aerodynamic load. This structural optimization is done (in parallel) for each aerodynamic load generated in the detailed aerodynamic analyses. The accurate optimal weights calculated in the structural optimization are used to generate a reduced-term polynomial response surface model for the optimal weight.

All the response surface models are then used in a configuration optimization to generate an approximate optimal design, which will be used as the starting design for the next iteration of the MDO loop. The grid spacing may possibly be refined for the simple analyses. When some convergence criterion is satisfied, the MDO loop exits with an optimal design.

Note that the source of parallelism in the present MDO paradigm is the multipoint approximations within each discipline, where the disciplines are visited sequentially in a pipeline. This contrasts sharply with CSSO MDO paradigms, where the source of the parallelism is processing the disciplines in parallel.

See also: Optimal design of composite structures; Multilevel methods for optimal
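
The DACE predictor discussed in this article, $\hat{\beta} + r(x, S) R^{-1}(Y_S - \mathbf{1}\hat{\beta})$, can be illustrated with a minimal sketch. Several things here are assumptions rather than part of the source: the Gaussian correlation $\exp(-\theta \|a - b\|^2)$, the fixed hand-picked value of `theta` (the article instead estimates the correlation parameters by maximum likelihood), the use of the generalized least squares mean for $\hat{\beta}$ at that fixed `theta`, and the hypothetical names `gauss_corr`, `solve` and `dace_predictor`.

```python
import math


def gauss_corr(a, b, theta):
    """Assumed correlation form: exp(-theta * squared Euclidean distance)."""
    return math.exp(-theta * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def dace_predictor(sites, ys, theta):
    """Return x -> beta_hat + r(x, S) R^{-1} (Y_S - 1 * beta_hat)."""
    n = len(sites)
    R = [[gauss_corr(si, sj, theta) for sj in sites] for si in sites]
    rinv_y = solve(R, ys)
    rinv_1 = solve(R, [1.0] * n)
    # Generalized least squares estimate of the mean for this fixed theta.
    beta_hat = sum(rinv_y) / sum(rinv_1)
    weights = solve(R, [y - beta_hat for y in ys])  # R^{-1} (Y_S - 1 * beta_hat)

    def predict(x):
        r = [gauss_corr(x, s, theta) for s in sites]
        return beta_hat + sum(ri * wi for ri, wi in zip(r, weights))

    return predict


# Hypothetical one-dimensional data: three observation sites and their outputs.
sites = [(0.0,), (0.5,), (1.0,)]
ys = [1.0, 2.0, 0.5]
predict = dace_predictor(sites, ys, theta=4.0)
```

At an observation site the vector $r(x, S)$ equals the corresponding row of $R$, so the predictor returns the observed value there; between sites it blends the observations according to the assumed correlation.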
Multifacility and restricted location problems
$$\min_{\{X_1, \ldots, X_N\} \subset \mathcal{F}} g(X_1, \ldots, X_N).$$

In the first part of this survey it is assumed that $\mathcal{F} = \mathbb{R}^2$, whereas $\mathcal{F}$ will be a restricted set later on.

The models above implicitly assume that the new facilities can be distinguished, that the amount of interaction between each new and existing facility is known, and that the new facilities have mutual communication. Note that problems without communication between the new facilities can be separated into $N$ independent 1-facility problems which can be easily solved by suitable algorithms. Also, in many applications we want to locate a number of indistinguishable facilities to serve the overall demand. This implies that we are not only locating facilities, but we are also allo-

$m \in \mathcal{M}$. This result carries over to multi-(facility) Weber problems when each $B_m$ has no more than 4 extreme points [24]. For more than 4 extreme points it is in general wrong (see [24] for a counterexample).

In the case where all $B_m$ are polytopes we can give linear programming formulations for the multifacility Weber as well as the multifacility Weber-Rawls problem ([34]) using $B_m^0$, the polar set of $B_m$, $m \in \mathcal{M}$.

$$\begin{aligned} \min \quad & \sum_{m \in \mathcal{M}} \sum_{n \in \mathcal{N}} \ldots + \sum_{r, s \in \mathcal{N},\, s > r} \ldots \\ \text{s.t.} \quad & \langle Ex_m - X_n, e_m^0 \rangle \le z_{mn}, \quad \forall m \in \mathcal{M},\ n \in \mathcal{N},\ e_m^0 \in \operatorname{Ext}(B_m^0), \\ & \ldots \end{aligned}$$
coincide with each other or with existing facilities. This raises at least two issues:

• A priori detection of coincidences, which results in a reduction of the dimension of the problem and allows the exploitation of differentiability, is discussed in [20], [31], [7].
• If coincidence is excluded, the theory of restricted location can be used, which is discussed next.

So far, the set $\mathcal{F}$ for placing new facilities was the whole plane $\mathbb{R}^2$. Now, the feasibility set $\mathcal{F} = \mathbb{R}^2 \setminus \operatorname{int}(\mathcal{R})$ is considered, where $\mathcal{R} \subseteq \mathbb{R}^2$ is the restricting set, assumed to be connected in $\mathbb{R}^2$. This problem is more complicated than the unrestricted one, since $\mathcal{F}$ is in general not convex. But from a practical point of view it is a necessary extension of the classical location model, since forbidden regions appear everywhere: nature reserves, lakes, exclusion of coincidence in multifacility, etc. These problems are called restricted location problems and have been developed in [1], [12], [14], [15] and [26]. In the following we exclude the trivial case and assume that none of the optimal solutions of the unrestricted problem is a feasible solution of the restricted one.

If the objective function $h$ of the location problem is convex it can be shown that optimal solutions of the restricted problem can be found on the boundary of $\mathcal{R}$. Therefore, level curves

$$L_{=}(z) := \{ X \in \mathbb{R}^n : h(X) = z \}$$

and level sets

$$L_{\le}(z) := \{ X \in \mathbb{R}^n : h(X) \le z \}$$

can be used to reformulate the restricted location problem as

$$\min \{ z : L_{=}(z) \cap \partial\mathcal{R} \neq \emptyset \text{ and } L_{\le}(z) \subseteq \mathcal{R} \}.$$

A resulting search algorithm was formulated in [11], but proved to be inefficient in practical applications.

An efficient approach originally presented in [12], [14], [15] identifies finite dominating sets (FDS) on the boundary of $\mathcal{R}$, i.e. a finite set of locations on $\partial\mathcal{R}$ which contains an optimal solution. Using this discretization, problems with gauge distance and convex forbidden region can be solved by considering as FDS the intersection points of $l_m$ and the boundary of $\mathcal{R}$ (see [15], [26], [28] and the illustration in the following figure).

Fig.: Example of a restricted location problem with 4 existing facilities and an elliptic forbidden region.

The discretization also works for restricted center problems [16] and can be extended to nonconvex forbidden regions (see [15], [26]) and also to the case of attraction and repulsion (negative weights are allowed), see [29]. The concept of forbidden regions has been successfully applied to a problem in PCB assembly, where the bins holding the parts to be inserted into the PCB have to be stored [10]. Of course, the PCB itself has to be forbidden for placing a bin. A solution approach which also addresses the issue of space requirements in a multifacility setting can be found in [9], [15]. A more general case where the new facility is a line has been considered in [33]. Algorithms for multifacility problems with forbidden regions can be found in [8], [15], [27].

Another type of restricted location problem is one where not only placement, but also trespassing of regions is forbidden. These problems are called barrier location problems. The corresponding models are mathematically challenging, since the distance functions (and thus also the objective functions) are no longer convex. [17] considers Euclidean distances and one circle as forbidden region. [1] and [4] develop heuristics for $l_p$ distances and barriers that are closed polygons. [19] and [3] obtain discretization results for $l_1$ distances and arbitrarily shaped barriers by showing an equivalence of the barrier problem to a network location problem. In the more general context of gauge distances an FDS is given in [13] for median problems and in [5] for center problems. Finally, [18] considers barrier problems if the distance is an arbitrary norm and the barrier consists of a line with finitely many passages.

See also: Single facility location: Multi-objective Euclidean distance location; Single facility location: Multi-objective rectilinear distance location; Single facility location: Circle covering problem; Network location: Covering problems; Warehouse location problem; Facility location with externalities; Production-distribution system design problem; Global optimization in Weber's problem with attraction and repulsion; Facility location with staircase costs; Stochastic transportation and location problems; Facility location problems with spatial interaction; Voronoi diagrams in facility location; Optimizing facility location with rectilinear distances; Combinatorial optimization algorithms in resource allocation problems; MINLP: Application in facility location-allocation; Resource allocation for epidemic control; Competitive facility location.

References
[1] ANEJA, Y.P., AND PARLAR, M.: 'Algorithms for Weber facility location in the presence of forbidden regions and/or barriers to travel', Transport. Sci. 28 (1994), 70-76.
[2] AURENHAMMER, F.: 'Voronoi diagrams - A survey of a fundamental geometric data structure', ACM Computing Surveys 23 (1991), 345-405.
[3] BATTA, R., GHOSE, A., AND PALEKAR, U.S.: 'Locating facilities on the Manhattan metric with arbitrarily shaped barriers and convex forbidden regions', Transport. Sci. 23 (1989), 26-36.
[4] BUTT, S.E., AND CAVALIER, T.M.: 'An efficient algorithm for facility location in the presence of forbidden regions', Europ. J. Oper. Res. 90 (1996), 56-70.
[5] DEARING, P.M., HAMACHER, H.W., AND KLAMROTH, K.: 'Center problems with barriers', Techn. Report Depts. Math. Univ. Kaiserslautern and Clemson Univ. (1998), To appear in Naval Research Logistics.
[6] DURIER, R., AND MICHELOT, C.: 'Geometrical properties of the Fermat-Weber problem', Europ. J. Oper. Res. 20 (1985), 332-343.
[7] FLIEGE, J.: 'Nondifferentiability detection and dimensionality reduction in minisum multifacility location problems', J. Optim. Th. Appl. 94 (1997).
[8] FLIEGE, J., AND NICKEL, S.: 'An interior point method for multifacility location problems with forbidden regions', Studies in Location Anal. 14 (2000), 23-45.
[9] FOULDS, L.R., AND HAMACHER, H.W.: 'Optimal bin location and sequencing in printed circuit board assembly', Europ. J. Oper. Res. 66 (1993), 279-290.
[10] FRANCIS, R.L., HAMACHER, H.W., LEE, C.-Y., AND YERALAN, S.: 'Finding placement sequences and bin locations for Cartesian robots', Trans. Inst. Industr. Engin. (IIE) (1994), 47-59.
[11] FRANCIS, R.L., MCGINNIS JR., L.F., AND WHITE, J.A.: Facility layout and location: An analytical approach, second ed., Prentice-Hall, 1992.
[12] HAMACHER, H.W.: Mathematische Lösungsverfahren für planare Standortprobleme, Vieweg, 1995.
[13] HAMACHER, H.W., AND KLAMROTH, K.: 'Planar location problems with barriers under polyhedral gauges', Report in Wirtschaftsmath. Dept. Math. Univ. Kaiserslautern 21 (1997), to appear in Ann. Oper. Res. (2000).
[14] HAMACHER, H.W., AND NICKEL, S.: 'Combinatorial algorithms for some 1-facility median problems in the plane', Europ. J. Oper. Res. 79 (1994), 340-351.
[15] HAMACHER, H.W., AND NICKEL, S.: 'Restricted planar location problems and applications', Naval Res. Logist. 42 (1995), 967-992.
[16] HAMACHER, H.W., AND SCHÖBEL, A.: 'A note on center problems with forbidden polyhedra', Oper. Res. Lett. 20 (1997), 165-169.
[17] KATZ, I.N., AND COOPER, L.: 'Facility location in the presence of forbidden regions, I: Formulation and the case of Euclidean distance with one forbidden circle', Europ. J. Oper. Res. 6 (1981), 166-173.
[18] KLAMROTH, K.: 'Planar location problems with line barriers', Report in Wirtschaftsmath. Dept. Math. Univ. Kaiserslautern 13 (1996), to appear in Optim.
[19] LARSON, R.C., AND SADIQ, G.: 'Facility locations with the Manhattan metric in the presence of barriers to travel', Oper. Res. 31 (1983), 652-669.
[20] LEFEBVRE, O., MICHELOT, C., AND PLASTRIA, F.: 'Sufficient conditions for coincidence in minisum multifacility location problems with a general metric', Oper. Res. 39 (1991), 437-442.
[21] LOVE, R.F., MORRIS, J.G., AND WESOLOWSKY, G.O.: Facilities location: Models and methods, North-Holland, 1988.
[22] MASUYAMA, S., IBARAKI, T., AND HASEGAWA, T.: 'The computational complexity of the M-center problems on the plane', Trans. IECE Japan E64 (1981), 57-64.
[23] MEGIDDO, N., AND SUPOWIT, K.J.: 'On the complexity of some common geometric location problems', SIAM J. Comput. 13 (1984), 182-196.
[24] MICHELOT, C.: 'Localization in multifacility location theory', Europ. J. Oper. Res. 31 (1987), 177-184.
[25] MINKOWSKI, H.: Gesammelte Abhandlungen, Vol. 2, Chelsea, 1967.
[26] NICKEL, S.: Discretization of planar location problems, Shaker, 1995.
[27] NICKEL, S.: 'Bicriteria and restricted 2-facility Weber problems', Math. Meth. Oper. Res. 45, no. 2 (1997), 167-195.
[28] NICKEL, S.: 'Restricted center problems under polyhedral gauges', Europ. J. Oper. Res. 104, no. 2 (1998), 343-357.
[29] NICKEL, S., AND DUDENHÖFFER, E.-M.: 'Weber's problem with attraction and repulsion under polyhedral gauges', J. Global Optim. 11 (1997), 409-432.
[30] OKABE, A., BOOTS, B., AND SUGIHARA, K.: Spatial tesselations. Concepts and applications of Voronoi diagrams, Wiley, 1992.
[31] PLASTRIA, F.: 'When facilities coincide: exact optimality conditions in multifacility location', J. Math. Anal. Appl. 169 (1992), 476-498.
[32] PLASTRIA, F.: 'Continuous location problems', in Z. DREZNER (ed.): Facility Location - A Survey of Applications and Methods, Springer, 1995, pp. 225-262.
[33] SCHÖBEL, A.: Locating lines and hyperplanes: Theory and algorithms, Kluwer Acad. Publ., 1999.
[34] WARD, J.E., AND WENDELL, R.E.: 'Using block norms for location modeling', Oper. Res. 33 (1985), 1074-1090.

Horst W. Hamacher
Fachber. Math. Univ. Kaiserslautern
Postfach 3049, 67653 Kaiserslautern, Germany
E-mail address: hamacher@mathematik.uni-kl.de
Stefan Nickel
Fachber. Math. Univ. Kaiserslautern
Postfach 3049, 67653 Kaiserslautern, Germany
E-mail address: nickel@mathematik.uni-kl.de
MSC 2000: 90B85
Key words and phrases: location theory, Weber problem, Weber-Rawls problem, multifacility Weber problem, multiWeber problem, gauge, convex polytope, linear programming, NP-hard, Voronoi diagram, restricted location problem, finite dominating set, discretization, barrier location problems.

MULTILEVEL METHODS FOR OPTIMAL DESIGN

Multilevel, or hierarchical, programming problems (MLP) are constrained optimization programs in which subsets of the solution set are themselves solution sets of other, lower-level optimization programs. Several general MLP problem statements exist. They differ from one another in the specifics of optimization variable distribution among the levels and the definition of the objectives and constraints at particular levels.

Given a set of objectives $\{f_i\}_{i=1,\dots,M}$ with $f_i : \mathbb{R}^n \to \mathbb{R}$ and a vector of variables $x \in \mathbb{R}^n$, partitioned into subsets $x = (x_1, \dots, x_M)$ for some integer $M$ denoting the number of subsystems, a prototypical form of MLP may be stated as follows:

\[
\begin{array}{ll}
\min\limits_{x_1 \in S_1} & f_1(x) \\
\text{s.t.} & x_2 \in \operatorname*{argmin}\limits_{x_2 \in S_2} \{ f_2(x) \} \\
 & \qquad \vdots \\
 & \text{s.t.}\quad x_M \in \operatorname*{argmin}\limits_{x_M \in S_M} \{ f_M(x) \},
\end{array}
\]

where the optimization problem at each level $i$ controls its own subset of variables $x_i$, while the other subsets of variables $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_M$ serve as parameters. The constraint set for each level is $S_i = \{x : h_i(x) = 0,\ g_i(x) \ge 0\}$ with $h_i : \mathbb{R}^n \to \mathbb{R}^{m_{h_i}}$ and $g_i : \mathbb{R}^n \to \mathbb{R}^{m_{g_i}}$ for some integers $m_{h_i}$, $m_{g_i}$.

This form of MLP, inspired by the work of H. Stackelberg [89], can be viewed as an M-player Stackelberg game ([18], [81]). Its interpretation is that of M autonomous players or decision makers seeking to minimize their (possibly constrained) objective functions while manipulating subsets of decision or design variables disjoint from those of other decision makers. The higher-level problems are implicit in the variables of the lower-level problems. This formulation has been studied widely in the bilevel case. See, for example, [15] and the references therein. In general, all problem levels, but the outermost one, may contain a number of concurrent optimization problems.

A related variant of the problem, known as the generalized bilevel programming problem, represents the reaction of the lower-level problem to decisions made by the upper-level problem via a solution of an equilibrium problem stated as a variational inequality:

\[
\begin{array}{ll}
\min\limits_{x \in X,\ y \in Y(x)} & f(x, y) \\
\text{s.t.} & \langle A(x, y),\, y - z \rangle \le 0 \quad \text{for all } z \in Y(x),
\end{array}
\]
where the upper-level domain $X$ is such that the lower-level domain $Y(x)$ is not empty. This formulation was introduced by P. Marcotte in [60] and studied in [44], [61], and [68].

Multilevel problems may be partitioned into two classes with respect to another criterion [97]. In one of the classes, upper-level optimization problems depend on the corresponding lower-level ones through the optimal value functions (or the marginal functions) of the lower-level problems. An optimal value function represents the value of a lower-level objective function at a solution of that lower-level problem. In the other class, upper-level problems depend on the corresponding lower-level problems through the actual optimal solutions of the latter. An example of two such formulations in engineering design optimization will be given further.

Multilevel programming problems arise in numerous applications where the structure of the application involves hierarchical decision making or where the sheer size and complexity of the problem necessitates partitioning of the system and processing the subsystems in a hierarchical fashion. Information on applications of multilevel optimization in such varied areas as power systems, water resource systems, urban traffic systems, and river pollution control can be found in [35], [49], [50], [51], [59], [66], [67], [82], and many other references. The use of multilevel algorithms in engineering control is well documented, for instance, in [45] and [54].

The broad area of multidisciplinary design optimization (MDO), a term that denotes a large set of research subjects and practical techniques for the design of complex coupled engineering systems, is particularly amenable to the use of multilevel methods, due to the extreme computational expense and the organizational complexity of the field. For instance, the design of aircraft involves aerodynamics, structural analysis, control, weights, propulsion, and cost, to list a few disciplines. The complexity and expense of each discipline have assured that most disciplines have developed into vast, autonomous fields of study, so that practically feasible optimization methods that involve the contributing disciplines must take into account such an autonomy and the hierarchical organization. Maintaining disciplinary autonomy while accounting for interdisciplinary subsystem couplings and allowing for integrated system optimization with respect to system and interdisciplinary objectives is one of the tasks of MDO. Overviews of multidisciplinary optimization may be found in [6] and [87].

Practitioners of engineering have been using multilevel methods, in some form, since optimization algorithms made their appearance in engineering problems. The seminal works [57], [62], and [95] contributed to a systematic development and understanding of hierarchical optimization. Multilevel methods have been studied extensively in application to multidisciplinary design ([16], [17], [22], [93], [94]) and single-discipline design areas that give rise to large problems, such as structural optimization (e.g., [84], [88], [71]). Engineering multilevel optimization has always had a strong connection with multi-objective optimization (e.g., [52]).

Problem Formulation. The procedure of formulating an engineering design problem as a multilevel or a bilevel problem is difficult and depends on the complexity and size of the problem. The general components in formulating a multilevel optimization problem are as follows:

• The original problem is studied to determine its structure. Structure is of paramount importance in deciding to adopt a particular formulation. For instance, most formulations assume that the problem subsystems share only a relatively small number of variables, i.e., that the bandwidth of interdisciplinary coupling is relatively small.

• The problem is partitioned into a system (or upper-level) problem and subsystem (or lower-level) problems. Decisions are made on inclusions of particular variables and constraints into the system and subsystems. Decisions are also made on the form of the system and subsystem objectives.

• Finally, algorithms are selected for solving the system and subsystem optimization problems. One must distinguish a formulation of the problem from the algorithm used to solve
that formulation. While some of the multilevel formulations can be easily shown to be mathematically equivalent to the original problem with respect to solution sets, they may not be equivalent with respect to other attributes, such as constraint qualifications and optimality conditions. Hence the numerical properties of algorithms applied to different formulations vary widely [8], [9], [10].

Problem decomposition constitutes a special area of study. In general, decomposition techniques take advantage of the problem structure and depend on the strength and bandwidth of couplings among the subsystems. Separable and partially separable problems are particularly amenable to decomposition.

Two types of decomposition may be considered in design optimization. Coarse-grained decomposition with respect to disciplines presents no difficulty, because the design problem initially consists of autonomous parts. The difficulty at this level of problem formulation is in integration or synthesis. However, in realistic applications, even though the coarse-grained decomposition is frequently obvious, the complexity of the problem requires that a dependence analysis be performed in order to determine the most advantageous arrangement or sequencing of the disciplinary subsystems in the optimization procedure. Automatic techniques based on graph-theoretic foundations may be found in [75] and [76], for instance.

Finer-grained decomposition within a particular discipline may be addressed by a multitude of techniques for decomposition of mathematical programs. Extensive references on decomposition in general mathematical programming, beginning with [19] and [31], and extended in [48] and many others, can be found in [41] and [42]. Further references to decomposition techniques aimed specifically at design problems can be found in [92].

General multilevel programming presents an exceedingly difficult problem, and many multilevel formulations and algorithms of engineering design rely more on heuristics than on theoretically substantiated foundations. There are exceptions, for instance, such as those in [12], [65], [29], and [72]. While many engineering multilevel approaches have enjoyed success when applied to specific problems, insufficient analytical foundation and the difficulty of the problem usually mean that the approaches are not robust, and extensive 'fine-tuning' of heuristic parameters is required for each new problem or instance of a problem. Hence, recent years have seen renewed interest in systematic, analytically substantiated approaches to MLP. Many such developments have taken place in bilevel optimization.

Bilevel Optimization. Although bilevel optimization problems (BLP) form the simplest case of multilevel optimization, they are very difficult to solve and constitute a fertile research area. A survey of the field can be found in [28]. A large bibliography with an emphasis on theoretical developments is also provided in [91].

The conventional general bilevel problem may be posed as follows:

\[
\begin{array}{ll}
\min\limits_{x \in X} & f_1(x, y) \\
\text{s.t.} & h_1(x, y) = 0 \\
 & g_1(x, y) \ge 0,
\end{array}
\]

where $y$ solves for fixed $x$:

\[
\begin{array}{ll}
\min\limits_{y \in Y} & f_2(x, y) \\
\text{s.t.} & h_2(x, y) = 0 \\
 & g_2(x, y) \ge 0.
\end{array}
\]

The cases of linear and convex problem functions have been studied widely. A popular class of methods for the linear bilevel problem (extreme point algorithms) computes global solutions by enumerating extreme points of the lower-level feasible set (e.g., [27]). Convex bilevel problems are often solved by branch and bound methods (e.g., [15]). A survey of methods for linear and convex bilevel programming can be found in [11].

The considerably more difficult case of nonlinear and nonconvex problem functions has inspired much research activity as well but has, to date, led to few computationally successful algorithms. The existing approaches to nonlinear bilevel optimization can be classified into several categories.

Penalty-Based Methods. This category uses penalty methods. In some algorithms (e.g., [1]),
a barrier function penalizes the lower-level objective. In double-penalty methods, both the lower-level problem and the upper-level problem are approximated by sequences of unconstrained optimization problems ([53], [58], [61]). Single or double-penalty methods are, in general, expected to converge slowly, especially for highly nonlinear problems. Thus using these methods for the usually large and nonlinear design optimization problems may be difficult.

KKT-Based Methods. The algorithms of this category convert the bilevel problem into a nonconvex, single-level optimization problem by using the Karush-Kuhn-Tucker conditions (KKT conditions) of the lower-level problem as constraints on the upper-level problem ([14], [15], [20], [36], [43]). If the lower-level problem is convex, the KKT formulation is equivalent to the original formulation [14]. However, even in this case, the KKT conditions on the lower-level problem include the complementarity slackness condition as a constraint. The form of the complementarity condition makes the single-level problem difficult to solve. The KKT formulation suffers from an additional difficulty. Namely, it is well known from the study of the sensitivity and stability of nonlinear programming (e.g., [39]) that even if the lower-level problem behaves exceedingly well in that it satisfies such stringent assumptions as strong second order sufficiency and regularity as a constraint qualification, the feasible set of the single-level problem will generally not be differentiable with respect to $x$. Hence, the performance of gradient-based solvers on the transformed problem may be adversely affected.

Descent-Based Methods. Another category of algorithms is based on solving subproblems that result in descent for the upper-level problem with gradient information of the lower-level problem used in a number of ways ([33], [38], [56], [80]).

The remainder of the article will be devoted to a more detailed description of two specific approaches to nonlinear, nonconvex problems that arose from the need to solve engineering design problems. One approach is a bilevel formulation, the other is an algorithm for solving multilevel formulations.

Examples: Collaborative Optimization. Collaborative optimization (CO) is a general approach to solving multidisciplinary design optimization problems by formulating them as nonlinear bilevel programs of special structure. CO comprises a number of methods. Its antecedents can be traced to earlier hierarchical approaches, as in [57] and [95]. The underlying idea of CO appeared in [13], [77], [78], [79], [85] and [93], [94]. The approach has recently received attention under the name of collaborative optimization [22], [23], [83], [90].

Given that MDO problems are naturally partitioned into subsystems along disciplinary lines, CO suggests an intuitively attractive way to formulate the optimization problem so that the autonomy of the disciplinary subsystem computations is preserved. However, the approach presents a problem that is difficult to solve by means of conventional nonlinear programming software ([7], [55]). The analytical and computational aspects of CO were addressed in [9], of which the following discussion is an abstract. As a complete description of CO is lengthy, only an abbreviated version is considered here.

It is assumed that the original system is composed of a number, say $M$, of interdependent but autonomous systems, each of which is described by a disciplinary analysis $A_i$, $i = 1, \dots, M$, expressed in the form

\[
A_i(x_i, y_i(x_i)) = 0,
\]

where, given a vector of disciplinary design variables $x_i$, the analysis (frequently represented by a numerical differential equation solver or simulator) is performed to yield the vector of state variables or responses $y_i(x_i)$. The sets of disciplinary variables $x_i$ are not necessarily disjoint. The disciplinary constraints are usually represented by inequalities

\[
c_i(x_i, y_i(x_i)) \ge 0.
\]

Once the system objective and variables and the subsystem constraints and variables are identified, the bilevel problem is formed as follows:

The constraints of the system problem comprise the 'consistency' (or 'coupling' or 'matching') conditions that are used to drive the discrepancy among the inputs and outputs shared by the
subsystems to zero. The values of the constraints are computed by solving the subsystem optimization problems, and the number of consistency constraints is related to the number of subsystems and variables shared among the subsystems. The form of the consistency constraints determines a particular implementation of CO.

Let $\xi$ and $\eta$ represent system-level variables corresponding to inputs and outputs of subsystems, respectively. Then, given $M$ subsystems, the abbreviated system program is

\[
\begin{array}{ll}
\min & F(\xi, \eta) \\
\text{s.t.} & G(\xi, \eta) = 0,
\end{array}
\tag{1}
\]

where

\[
G(\xi, \eta) = \bigl( g_1(\xi, \eta), \dots, g_M(\xi, \eta) \bigr)
\]

is the set of system consistency constraints obtained by solving lower-level subproblems, each of which is of the form

\[
\begin{array}{ll}
\min\limits_{x_i} & \tfrac{1}{2} \left[ \| \xi_i - x_i \|^2 + \| \eta_i - y(x_i) \|^2 \right] \\
\text{s.t.} & c_i(x_i, y(x_i)) \ge 0,
\end{array}
\tag{2}
\]

where $i$ is the number of the subsystem. Thus, the objective of a subsystem optimization problem is always to minimize the discrepancy between the shared variables of the subsystems, in a least squares sense, subject to satisfying the disciplinary constraints, which do not depend explicitly on the system variables passed down to the subsystems as parameters. The subsystems remain feasible during optimization, while interdisciplinary feasibility is gradually attained at the system level via the consistency constraints. Maintaining disciplinary feasibility is extremely important from the design perspective.

The problem now consists of a set of decoupled subproblems that can be solved independently and in parallel.

One instance of the system-level consistency conditions gives rise to the form in which CO is usually presented: namely, the consistency condition is intended to drive to zero the value function of the subproblem (2). That is,

\[
g_i(\xi, \eta) = \tfrac{1}{2} \left[ \| \xi - x_* \|^2 + \| \eta - y(x_*) \|^2 \right],
\tag{3}
\]

where $x_*$ solves the subsystem optimization problem.

Another instance of system-level consistency conditions matches the system-level variables with their subsystem counterparts computed in subproblem

\[
g_i(\xi, \eta) = \bigl( \xi - x_*,\ \eta - y(x_*) \bigr).
\tag{4}
\]

The behavior of optimization algorithms applied to the original and CO formulations will differ greatly, as the formulations are not equivalent with respect to constraint qualifications or optimality conditions.

In general, value functions are not differentiable, and this may cause difficulties for optimization algorithms applied to the system-level problem. However, under a number of strong assumptions, the constraints are locally differentiable and can usually be computed.

Derivatives of the system-level constraints with respect to the system-level design variables are the sensitivities of the minima or the solutions of the subsystem-level optimization problems to parameters. The area of sensitivity in nonlinear programming has been studied extensively. Relevant results can be found in [39] and [40]. In particular, under the assumptions of sufficient smoothness, second order sufficiency, regularity as constraint qualification, and strict complementarity slackness, the basic sensitivity theorem (BST) proves the existence of a unique, local, continuously differentiable solution-multiplier triple for the perturbed problem. Moreover, locally, the set of active constraints remains unchanged and regularity and strict complementarity hold, allowing one to compute derivatives locally. In fact, under a number of assumptions, stronger statements can be made about the differentiability of the value function ([30], [74]).

Under the conditions of BST, local first order derivatives of the consistency constraints (3) have a particularly simple form because, in the case of CO, the constraints of the lower-level problems do not depend on parameters. On the other hand, the first order sensitivities of solutions of the lower-level problem that form the derivatives of the consistency constraints (4), while of closed form, are
expensive to compute and involve second order derivatives of the subsystem Lagrangians.

There is another feature of the CO formulation with compatibility constraints (3) that will cause difficulties for nonlinear programming algorithms applied to the system-level problem: Lagrange multipliers will almost never exist for the equality constrained system-level problem, with all the ensuing consequences. The nonexistence of Lagrange multipliers is due to the description of the feasible region that causes the Jacobian of the system-level constraints to vanish at a solution. The formulation with compatibility constraints (4) aims to address this problem. However, the computation of derivatives for this formulation is clearly expensive, as it not only involves solving a system of equations, but also requires the computation of second order information for the subsystems. The difficulties are addressed in detail in [9].

In summary, CO is an appealing approach to design optimization; however, the bilevel nature of the problem formulation will cause difficulties for conventional nonlinear programming algorithms applied to the system-level problem. This is to be expected for a bilevel problem.

Example: MAESTRO, a Class of Multilevel Algorithms. As mentioned earlier, most multilevel formulations and algorithms for engineering design problems assume that the bandwidth of coupling among the subsystems comprised by the multilevel system is small. While many problems may be stated in this way, it is becoming increasingly important to consider problems with large bandwidth of coupling where, to use an MDO expression, 'everything affects everything else'. MAESTRO (a class of multilevel algorithms for constrained optimization; [2]) is intended for solving large nonlinear programming problems with arbitrary couplings among the naturally occurring subsystems, i.e., a particular instance of MDO problems with a single objective. The class was extended in, e.g., [5] to include a large class of steps for the nonlinear programming problem and in [3], [4] to incorporate general nonlinear objectives. The class makes no assumptions on the structure of the problem, such as convexity or separability.

The algorithms of the class are based on trust region methodology (see, e.g., [34], [37], [64]) and are proven to converge under reasonable assumptions.

The idea of the MAESTRO algorithms is to attain sequential predicted sufficient decrease conditions for all the constrained objectives, and is a direct extension of the multilevel ideas for the equality constrained optimization problem. The approach can be summarized as follows. Given an initial approximation to the solution of the multilevel problem, the trial step for the multilevel problem is computed as a sum of a sequence of substeps, each of which predicts sufficient (or optimal) decrease in the quadratic model of the objective of a given subproblem, subject to maintaining predicted decrease in the models of the previous objectives. For instance, in the case of the unconstrained bilevel problem, the trial step for the bilevel problem is a sum of two substeps. The first substep is computed to predict sufficient decrease, via the quadratic model of the innermost objective $f_2$, for the subproblem of approximately optimizing

\[
m_{f_2}(s) = f_2(x_c) + \nabla f_2(x_c)^T s + \tfrac{1}{2} s^T H_2(x_c) s,
\]

in the trust region of size $\delta_{f_2}$ to produce the substep $s_{f_2}$, where $x_c$ is the current approximation to the solution and $H_2$ is the current approximation to the Hessian of $f_2$. The second step $s_{f_1}$ would then approximately minimize the quadratic model of the outermost objective $f_1$, constructed at $x_c + s_{f_2}$, in the trust region of size $\delta_{f_1}$, subject to constraints that enforce the preservation of the predicted sufficient or optimal decrease for $f_2$. The total trial step is evaluated by using the merit function designed to account for the sequential processing of the objectives. The algorithm is shown to converge to critical points of the bilevel or multilevel problem. Thus, the essential difference between this approach and the classical approaches to bilevel optimization is that instead of starting from the optimality conditions for the bilevel or multilevel problem, the approach attempts to obtain decrease on the sequence of subproblem models, while preserving predicted decrease for the previously processed subproblems, and to measure progress via the use of an appropriate merit function
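
The two-substep construction for the unconstrained bilevel case can be illustrated with a small sketch. It is not the MAESTRO algorithm itself: each substep here is a plain Cauchy (steepest descent) step on the quadratic model inside its trust region, and the requirement that the second substep preserve the predicted decrease for $f_2$ is imitated by simply halving $s_{f_1}$ until a fraction `keep` of that decrease survives. The function names, the `keep` parameter, and the halving safeguard are all assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(H, v):
    return [dot(row, v) for row in H]

def norm(v):
    return math.sqrt(dot(v, v))

def predicted_decrease(g, H, s):
    """m(0) - m(s) for the quadratic model m(s) = f + g^T s + 0.5 s^T H s."""
    return -(dot(g, s) + 0.5 * dot(s, matvec(H, s)))

def cauchy_step(g, H, delta):
    """Minimizer of the quadratic model along -g inside the ball ||s|| <= delta."""
    gnorm = norm(g)
    if gnorm == 0.0:
        return [0.0] * len(g)
    curvature = dot(g, matvec(H, g))
    if curvature <= 0.0:
        tau = 1.0  # model decreases all the way to the boundary
    else:
        tau = min(1.0, gnorm ** 3 / (delta * curvature))
    return [-(tau * delta / gnorm) * gi for gi in g]

def bilevel_trial_step(grad_f1, grad_f2, H1, H2, xc, delta_f1, delta_f2, keep=0.5):
    """Return (s_f2, s_f1); the trial step is their sum (illustrative only)."""
    # Substep 1: decrease the model of the innermost objective f2 at xc.
    g2 = grad_f2(xc)
    s_f2 = cauchy_step(g2, H2, delta_f2)
    pred2 = predicted_decrease(g2, H2, s_f2)

    # Substep 2: decrease the model of the outermost objective f1 at xc + s_f2.
    x_mid = [x + s for x, s in zip(xc, s_f2)]
    g1 = grad_f1(x_mid)
    s_f1 = cauchy_step(g1, H1, delta_f1)

    # Assumed safeguard: shrink s_f1 until the total step still predicts
    # at least keep * pred2 decrease in the model of f2.
    for _ in range(30):
        total = [a + b for a, b in zip(s_f2, s_f1)]
        if predicted_decrease(g2, H2, total) >= keep * pred2:
            break
        s_f1 = [0.5 * c for c in s_f1]
    else:
        s_f1 = [0.0] * len(xc)
    return s_f2, s_f1
```

A real implementation would solve the second subproblem with the decrease condition as an explicit constraint and would then accept or reject the total step with a merit function; the halving loop above is only the simplest stand-in for that constraint.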
Res. 9 (1982), 77-100.
[16] BARTHELEMY, J-F.M.: 'Engineering applications of heuristic multilevel optimization methods', NASA TM-101504 (1988).
[17] BARTHELEMY, J-F.M., AND RILEY, M.F.: 'Improved multilevel optimization approach for the design of complex engineering systems', AIAA J. 26 (1988), 353-360.
[18] BASAR, T., AND SELBUZ, H.: 'Closed loop Stackelberg strategies with applications in optimal control of multilevel systems', IEEE Trans. Autom. Control AC-24 (1979), 166-178.
[19] BENDERS, J.F.: 'Partitioning procedures for solving mixed variables programming problems', Numerische Math. 4 (1962), 238-252.
[20] BIALAS, W.F., AND KARWAN, M.H.: 'On two-level optimization', IEEE Trans. Autom. Control AC-27 (1982), 211-214.
[21] BRACKEN, J., AND MCGILL, J.: 'Mathematical programs with optimization problems in the constraints', Oper. Res. 21 (1973), 37-44.
[22] BRAUN, R.D.: 'Collaborative optimization: An architecture for large-scale distributed design', PhD Thesis Stanford Univ. (1996).
[23] BRAUN, R.D., MOORE, A.A., AND KROO, I.M.: 'Collaborative approach to launch vehicle design', J. Spacecraft and Rockets 34 (1997), 478-486.
[24] BURKE, J.V.: 'Calmness and exact penalization', SIAM J. Control Optim. 29 (1991), 493-497.
[25] BURKE, J.V.: 'An exact penalization viewpoint of constrained optimization', SIAM J. Control Optim. 29 (1991), 968-998.
[26] CALAMAI, P.H., AND VICENTE, L.N.: 'Generating linear and linear-quadratic bilevel programming problems', SIAM J. Sci. Comput. 14 (1993), 770-782.
[27] CANDLER, W., AND TOWNSLEY, R.: 'A linear two-level programming problem', Comput. Oper. Res. 9 (1982), 59-76.
[28] CHEN, Y.: 'Bilevel programming problems: Analysis, algorithms and applications', PhD Thesis Univ. Montreal (1994).
[29] CHIDAMBARAM, B., AND RAO, J.R.J.: 'A study of constraint activity in bilevel models of optimal design', Techn. Report Univ. Houston UH-ME-SDL-94-01 (1994).
[30] CLARKE, F.H. (ed.): Optimization and nonsmooth analysis, SIAM, 1990.
[31] DANTZIG, G.B., AND WOLFE, P.: 'Decomposition principle for linear programming', Oper. Res. 8 (1960), 101-111.
[32] DE LUCA, A., AND DI PILLO, G.: 'Exact augmented Lagrangian approach to multilevel optimization of large-scale systems', Internat. J. Syst. Sci. 18 (1987), 157-176.
[33] DE SILVA, A.H., AND MCCORMICK, G.P.: 'Implicitly defined optimization problems', Ann. Oper. Res. 34 (1992), 107-124.
[34] DENNIS, JR., J.E., AND SCHNABEL, R.B.: Numerical methods for unconstrained optimization and nonlinear equations, Prentice-Hall, 1983.
[35] DIRICKX, Y.M.I., AND JENNERGREN, L.P.: Systems analysis by multilevel methods, Wiley, 1979.
[36] EDMUNDS, T.A., AND BARD, J.F.: 'Algorithm for nonlinear bilevel mathematical programs', IEEE Trans. Syst., Man Cybern. 21 (1991), 83-89.
[37] EL-ALEM, M.M.: 'A global convergence theory for the Celis-Dennis-Tapia trust region algorithm for constrained optimization', SIAM J. Numer. Anal. 28 (1991), 266-290.
[38] FALK, J.E., AND LIU, J.: 'On bilevel programming, Part I: General nonlinear cases', Math. Program. 70 (1995), 47-72.
[39] FIACCO, A.V. (ed.): Introduction to sensitivity and stability analysis in nonlinear programming, Acad. Press, 1983.
[40] FIACCO, A.V., AND MCCORMICK, G.P. (eds.): Nonlinear programming, sequential unconstrained minimization techniques, SIAM, 1990.
[41] FLIPPO, O.E.: 'Stability, duality and decomposition in general mathematical programming', PhD Thesis Erasmus Univ. Rotterdam, The Netherlands (1989).
[42] FLIPPO, O.E., AND RINNOOY KAN, A.H.G.: 'Decomposition in general mathematical programming', Math. Program. 60 (1993), 361-382.
[43] FORTUNY-AMAT, J., AND MCCARL, B.: 'A representation of a two-level programming problem', J. Oper. Res. Soc. 32 (1981), 783-792.
[44] FRIESZ, T., TOBIN, R., CHO, H., AND MEHTA, N.: 'Sensitivity analysis based heuristic algorithms for mathematical programs with variational inequality constraints', Math. Program. 48 (1990), 265-284.
[45] GAHUTU, D.W.H., AND LOOZE, D.P.: 'Parametric coordination in hierarchical control', Large Scale Systems 8 (1985), 33-45.
[46] GAUVIN, J.: 'The method of parametric decomposition in mathematical programming: the nonconvex case', in C. LEMARECHAL AND R. GRIFFIN (eds.): Nonsmooth optimization, Pergamon, 1978, pp. 131-149.
[47] GAUVIN, J., AND DUBEAU, F.: 'Some examples and counterexamples for stability analysis of nonlinear programming problems', in A.V. FIACCO (ed.): Vol. 21 of Math. Program. Stud., North-Holland, 1983, pp. 69-78.
[48] GEOFFRION, A.M.: 'Generalized Benders decomposition', J. Optim. Th. Appl. 10 (1972), 237-260.
[49] COULBECK, B., BRDYS, M., ORR, C.H., AND RANCE, J.P.: 'A hierarchical approach to optimized control of water distribution systems: Part I, decomposition', Optimal Control Appl. Meth. 9 (1988), 51-61.
[50] COULBECK, B., BRDYS, M., ORR, C.H., AND RANCE, J.P.: 'A hierarchical approach to optimized control of water distribution systems: Part II, lower-level algorithm', Optimal Control Appl. Meth. 9 (1988), 109-126.
Multilevel optimization in mechanics
the solution of the initial problem. The aforementioned method leads to the following three main applications of the multilevel optimization techniques in the framework of Mechanics and more generally in engineering sciences.

a) Calculation of large structures.

b) Validation of the simplifying assumptions used for the calculation of complex structures. Accuracy testing.

c) Accuracy improvement of simplified models used for the estimation of the behavior of complex structures.

Note that in the above, the term 'structure' can be replaced with the term 'systems', meaning systems whose behavior is characterized by the solution of a minimax problem.

Since most of the multilevel techniques developed in the early sixties for the trajectory determination problems in space science are also applicable to stationarity problems, and since it has recently been proved that in dynamic problems involving impact phenomena the functional of the action is stationary [23], [22], it follows that there is a further application of the multilevel optimization methods:

d) Calculation of the dynamic behavior of structures involving impact effects.

To the aforementioned applications the following classical one can be added.

e) Solution of optimal control (minimum of weight or cost, maximum of strength) in dynamic structural analysis problems.

This article deals mainly with static systems. Concerning the applications d) and e) the reader is referred to [27], [12] in relation with [23], [22]. In dynamic problems, methods analogous to those for static problems can be developed.

The classical decomposition techniques which are applied to optimization problems (cf. in this respect also [20, pp. 355ff]) have been extended and they can be applied also to substationarity problems [25], i.e. to problems of the type

tion and $\partial$ denotes the generalized gradient of F.H. Clarke [7] as it has been extended by R.T. Rockafellar [25] for nonLipschitzian functionals. In this case the variational inequalities of the convex energy problems are replaced by hemivariational inequalities (cf. e.g. [20], [21], [17], [8]) and instead of a global minimum of the convex potential or complementary energy functionals, the local minima and maxima are searched and among them the global minimum as well. For the numerical treatment of hemivariational inequalities certain numerical methods have been developed (cf. e.g. [21]) and among them, the two methods described in [15] are extensions of the multilevel optimization methods to substationarity problems.

It should also be noted that most of the domain decomposition methods are special cases of the multilevel optimization algorithms, as follows easily if one considers the energy functionals corresponding to the partial differential equations studied. Then the domain decomposition leads to energy functionals which have to be minimized on the decomposed parts of the domain.

Finally, it should be mentioned that fractal geometries in optimization problems arising in Mechanics are treated by means of appropriate multilevel transformations of the problem, as will be shown further. It is evident that an optimization problem with many variables cannot always directly be decomposed into independent optimization subproblems. The aim of the multilevel optimization is to define, with respect to an optimization problem, appropriate mutually independent subproblems. Each of these, when solved independently, yields the optimum of the overall problem after an iterative procedure which is called the second-level controller. The decomposition into subproblems is achieved by choosing some variables, called coordinating variables, which are freely manipulated by the second-level controller in such a way that the subproblems (first level of the problem) have solutions which in fact yield the optimum of the initial problem, i.e. before its decomposition into subproblems. Here, the ideas of [3] are closely followed.
Multilevel optimization in mechanics
There are several different methods of transforming a given constrained optimization problem into a multilevel optimization problem. All these methods are basically combinations of two methods: the feasible decomposition method or model coordination method and the nonfeasible decomposition method or goal coordination method.

Let us consider the problem

\[
\min_{x,u} \Pi(x,u) \quad \text{s.t.} \quad f(x,u) = 0, \quad R(x,u) \ge 0, \tag{1}
\]

where $x$ is a vector in $E^n$, $u$ is a vector in $E^m$, $f$ is an $n$ vector of $C^2$ functions, $\Pi$ is a twice continuously differentiable ($C^2$) function, and $R$ is an $r$ vector of $C^2$ functions. To decompose, coordinating variables $s$ may be substituted not only for a single variable but also for functions $g(x,u)$, so that $\Pi$ is split into mutually disjoint parts and the $f$ and $R$ equations contain no common $x$, $u$, or $s$ variables between the subproblems. Thus the following problem results:

\[
\Pi(x,u,s) = \sum_{i=1}^{N} \Pi^{(i)}\bigl(x^{(i)}, u^{(i)}, s^{(i)}\bigr),
\]
\[
f^{(i)}\bigl(x^{(i)}, u^{(i)}, s^{(i)}\bigr) = 0, \quad i = 1,\dots,N,
\]
\[
R^{(i)}\bigl(x^{(i)}, u^{(i)}, s^{(i)}\bigr) \ge 0, \quad i = 1,\dots,N.
\]

The superscript $(i)$ refers to the $i$th subproblem or subsystem which must be optimized. For example, in a control problem $x$ denotes the state, $u$ denotes the control and $x^{(1)}$ is the state vector for the first subsystem. Also the coupling equations must be added:

\[
s^{(i)} = g^{(i)}\bigl(x^{(j)}, u^{(j)}\bigr) \quad \text{for all } j \ne i.
\]

The Lagrangian of the new problem reads

\[
\Pi(x,u,s;\lambda,\mu,\rho) = \sum_{i=1}^{N} \Pi^{(i)} + \sum_{i=1}^{N} \lambda^{(i)T} f^{(i)} + \sum_{i=1}^{N} \mu^{(i)T}\bigl(R^{(i)} - \sigma^{(i)}\bigr) + \sum_{i=1}^{N} \rho^{(i)T}\bigl(g^{(i)} - s^{(i)}\bigr), \tag{2}
\]

where $\sigma^{(i)} \ge 0$ are additional slack variables such that

\[
R^{(i)} - \sigma^{(i)} = 0.
\]

$\Pi$ is immediately separable into $N$ individual subsystems, except for its last term.

In the method of nonfeasible decomposition it is assumed that $\rho^{(i)}$ has a known value. The term $\rho^{(i)T}s^{(i)}$ is put in the $i$th subsystem and all of the $\rho^{(i)T}g^{(i)}(x^{(j)},u^{(j)})$ terms associated with the $j$th variables are put in the $j$th subsystem. On the other hand, in the feasible decomposition method it is assumed that $s^{(i)}$ has a known value. Moreover, all of the $\rho^{(i)T}[g^{(i)}(x^{(j)},u^{(j)}) - s^{(i)}]$ terms associated with the $j$th variables are put in the $j$th subsystem. In both cases, the optimization problem is separable and each subsystem can be optimized independently. Equation (2) is rewritten in more compact form as

\[
\Pi(x,v;\lambda,\mu,\rho) = F(x,v) + \lambda^T f(x,v) + \mu^T\bigl[R(x,v) - \sigma\bigr] + \rho^T h(x,v), \tag{3}
\]

where $\sigma \ge 0$, $v$ represents $u$ and $s$, $h(x,v)$ denotes all $g^{(i)} - s^{(i)}$, $\rho$ is a Lagrange multiplier vector of the same dimension as $g$, $\mu$ is an $r$ vector including all Lagrange multipliers, and $\lambda$ is an $n$ vector including all Lagrange multipliers.

The Kuhn-Tucker theory of nonlinear programming [9] implies that if $\Pi(x,v)$ has a critical point at $(x^0,v^0)$ such that the constraint equations in (1) are satisfied, and if the rank of [...] is full and equals the rank of [...] (4), where [...] at $(x^0,v^0)$, then a set of unique Lagrange multipliers $\lambda^0$, $\mu^0$ and $\rho^0$ exists at the critical point. The necessary conditions for a critical point (local minimum) are

\[
\frac{\partial \Pi}{\partial x} = \frac{\partial \Pi}{\partial v} = 0, \quad \mu_i R_i = 0, \quad R \ge 0, \quad \mu \le 0, \tag{5}
\]
\[
\frac{\partial \Pi}{\partial \lambda} = f^T = 0, \quad \frac{\partial \Pi}{\partial \rho} = h^T = 0. \tag{6}
\]

[...] $\Delta s = -\alpha\,[\ldots]$, with $\alpha > 0$, [...]
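As a hedged illustration of the feasible (model coordination) idea, the following Python sketch uses a toy problem that is not taken from this article: minimize (x1 - 3)^2 + (x2 - s)^2 + (x2 + 1)^2 subject to the coupling s = x1. The function names, the step size `alpha` and the toy objective are assumptions made only for this example. The second level keeps the coordinating variable s fixed, the first level solves each subsystem on its own, and the second level then moves s along the negative gradient of the total objective.

```python
def solve_subsystem_1(s):
    """Subsystem 1 (toy): the coupling x1 = s is enforced, so x1 is fixed by s."""
    return s


def solve_subsystem_2(s):
    """Subsystem 2 (toy): argmin over x2 of (x2 - s)**2 + (x2 + 1)**2, closed form."""
    return (s - 1.0) / 2.0


def feasible_coordination(s=0.0, alpha=0.2, tol=1e-10, max_iter=1000):
    """Second-level controller: gradient steps on the coordinating variable s."""
    for _ in range(max_iter):
        x1 = solve_subsystem_1(s)
        x2 = solve_subsystem_2(s)
        # Gradient of the total objective with respect to s at the first-level
        # optimum: x2 is optimal, so only the explicit s terms contribute.
        grad = 2.0 * (x1 - 3.0) - 2.0 * (x2 - s)
        if abs(grad) < tol:
            break
        s -= alpha * grad
    return solve_subsystem_1(s), solve_subsystem_2(s), s


if __name__ == "__main__":
    print(feasible_coordination())
```

For this toy problem the iteration approaches x1 = s = 5/3 and x2 = 1/3, which is also the minimizer of the undecomposed problem; every intermediate iterate satisfies the coupling exactly, which is what makes the decomposition "feasible".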
In the majority of cable structures the number of cables and nodes is large, and so an optimization problem with a large number of unknowns and constraints must be solved. Here, a multilevel optimization technique suitable for the solution of this kind of optimization problem is proposed. The initial optimization problem is decomposed into a number of subproblems. In the 'first level' of the calculation, each subproblem is optimized separately [...]

It is interesting to note that some of these subproblems constitute minimization problems without inequality constraints (corresponding to classical bilateral structures), and the algorithms for their numerical treatment are much faster. The initial problem is decomposed into two subproblems: the first involves only the displacement terms and corresponds to a structure resulting from the given one by considering that all the cables act as bars (capable of having compressive forces), and the second, including only the slackness terms, corresponds to a hypothetical slack structure. In order to perform the decomposition, the potential energy of the structure is written in the form

\[
\Pi(u,v) = \Pi'(u) + \Pi''(v) + u^T G K_0 v, \tag{9}
\]

where

\[
\Pi'(u) = \tfrac{1}{2} u^T K u - u^T (G K_0 e_0 + p) \tag{10}
\]

and

[...]

[...] method of Lasdon and Schoeffler and the feasible gradient controller method of Brosilow, Lasdon and Pearson [11]. In the nonfeasible gradient controller method the value of $\rho$ is supposed to be constant in the first level, say $\rho_1$, and the minimization problem decomposes into the two subproblems

\[
\min_{u,w} \bigl\{ \Pi'(u) + u^T G K_0 w - \rho_1^T w \bigr\}
\]

[...]

After performing the optimization, the values of $u$, $v$ and $w$, e.g. $u_1$, $v_1$ and $w_1$, result. It is obvious that $v_1 \ne w_1$. The task of the second level is to estimate a new value of $\rho$, e.g. $\rho_2$, by means of the equation

\[
\rho_2 = \rho_1 + g\,(v_1 - w_1), \quad g > 0,
\]

where $g$ is a properly chosen constant (see, e.g., [11]), and to transmit this value to the first level. The optimization is performed again, new values $u_2$, $v_2$ and $w_2$ result, etc., until the differences $v_i - w_i$ are made negligible. The algorithm converges in a finite number of steps, provided that the minima exist [11].

In the feasible gradient controller method, the value of $w$ is taken as constant in the first level, e.g. $w_1$, and thus the initial problem decomposes into the two subproblems

\[
\min_{u} \bigl\{ \Pi'(u) + u^T G K_0 w_1 \bigr\}
\]
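The multiplier update of the nonfeasible gradient controller can be sketched on a small stand-in problem. The Python code below is only an illustration under stated assumptions: it does not use the cable energy (9)-(10), but a toy problem, minimize (w - 3)^2 + (v + 1)^2 subject to v = w and v >= 0, whose two first-level subproblems have closed-form solutions. The names `first_level`, `nonfeasible_gradient_controller` and `step` are hypothetical; `step` plays the role of the positive constant in the update rule.

```python
def first_level(rho):
    """Solve both toy subproblems for a fixed multiplier rho.

    Subproblem 1: argmin over w of (w - 3)**2 - rho * w
    Subproblem 2: argmin over v >= 0 of (v + 1)**2 + rho * v
    """
    w = 3.0 + rho / 2.0
    v = max(0.0, -1.0 - rho / 2.0)
    return v, w


def nonfeasible_gradient_controller(rho=0.0, step=0.5, tol=1e-9, max_iter=500):
    """Second level: update rho until the coupling error v - w is negligible."""
    for _ in range(max_iter):
        v, w = first_level(rho)
        if abs(v - w) < tol:
            break
        # Same form as the update in the text: rho_new = rho_old + step * (v - w).
        rho += step * (v - w)
    return v, w, rho


if __name__ == "__main__":
    print(nonfeasible_gradient_controller())
```

In this toy setting the intermediate iterates violate the coupling v = w (hence "nonfeasible"), and the iteration approaches v = w = 1 with rho = -4. The step size is a free choice here; too large a value would make the update oscillate or diverge.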
\[
\min \{ \Pi(u,\lambda) : \lambda \ge 0 \}. \tag{12}
\]

By introducing a new variable $w$, (12) takes the form

\[
\min \bigl\{ \Pi(u,\lambda,w) = \Pi'(u) + \Pi''(\lambda) + u^T G K_0 N w \; : \; w = \lambda, \; \lambda \ge 0 \bigr\}. \tag{13}
\]

As in the previous section, the decomposition can be performed by the two methods of the feasible and the nonfeasible gradient controller respectively. For the sake of brevity only the nonfeasible gradient method will be shown here. The Lagrangian of (13) is first considered

[...]

A. Consider a large structure involving also some cables and assume that due to the pretension of the cables the structure is calculated as if the cables are rods, i.e. by ignoring the fact that a cable may become slack and then it has zero stresses. Then in the equations (9)-(11) $v = 0$ and the solution of the minimum problem is obtained by solving an unconstrained minimization problem, i.e. by a linear system solver. In order to check whether the solution of the simplified model is close to the solution of the initial problem, in which some cables, say $r$, may become slack, i.e. $v_i > 0$, $i = 1,\dots,r$, it is enough to verify whether the second level
controller which gives a value of the slackness of the cables causes a significant change in the solution of the first level problem which corresponds to the simplified structure. Also the algorithm offers an improvement of the solution of the simplified model.

B. Here, the investigation of the mutual influence of two subsystems is presented. Consider two substructures connected together, for instance a cylindrical shell with a hemispherical shell covering the one end of the cylinder. The solution of the whole linear elastic structural compound minimizes, for a given external loading, the potential (or the complementary) energy of the whole structure. Let $x_1$ (respectively, $x_2$) be the variables of the cylindrical (respectively, the hemispherical) shell and let $z$ be the common variables at the contact line which are common in both structures. In order to decompose the potential energy into two minimum problems, one containing the unknowns of the cylindrical shell and the other of the hemispherical shell, the common variables for the cylindrical (respectively, hemispherical) shell are denoted by $z_1$ (respectively, $z_2$) and thus the initial problem is written as

\[
\min_{x_1, x_2, z_1, z_2} \bigl\{ \Pi_1(x_1, z_1) + \Pi_2(x_2, z_2) : z_1 - z_2 = 0 \bigr\}.
\]

Here $\Pi_1$ (respectively, $\Pi_2$) denotes the potential or the complementary energy of the cylindrical (respectively, the hemispherical) shell. Thus it can be tested by the nonfeasible controller method how the difference $z_1 - z_2$ influences the solution of the problem. The procedure is similar in the case of elastoplastic structures with the difference that the minimum is constrained by inequalities.

The above procedure may find applications in estimating the influence of saddles on pipelines, of rigidity rings on long tubes, etc.

C. Note that in all the above cases the Lagrange multipliers have a precise meaning: they correspond in the sense of energy to the chosen coordinating variables, i.e., if the coordinating variables are stresses (respectively, strains) or forces (respectively, displacements) then the coordinating Lagrange multipliers are strains (respectively, stresses) or displacements (respectively, forces). Thus the feasible and the nonfeasible decomposition method have a precise mechanical meaning. In the first case the Lagrange multipliers, i.e. the strains (respectively, the stress) are controlled while in the second one the coordinating variables, i.e. the stress (respectively, the strain) of the links between the two substructures are controlled, in order to achieve the position of equilibrium of the whole structure.

D. Some of the resulting substructures may have a known analytical solution. Then this fact facilitates the calculation and may be applied as a test for the accuracy of the resulting solution via a numerical technique, e.g. by the FEM model. The procedure is described in [24].

E. The multilevel decomposition method can be used also as estimator of the sensitivity of the final solution to small changes of the system to be optimized [24]. This method may be used for example in estimating how a partial change in a structure influences the stress and strain field of the structure without solving twice the structure.

Decomposition Algorithms for Nonconvex Minimization Problems. In unilateral contact problems with friction, Panagiotopoulos proposed in 1975 an algorithm [18] called later PANA-algorithm for the decomposition of the quasivariational inequality problems into two classical variational inequality problems which are equivalent to two minimization problems. Analogous decomposition methods of complicated problems using an analogous to [18] fixed point procedure can be applied to the treatment of much more complicated problems today involving nonconvex energy functions. This section is devoted to the study of multilevel decomposition algorithms for problems belonging to the general framework of the substationarity problems.

It is known that the equilibrium of an elastic body $\Omega$ in adhesive contact with a support $\Gamma$ is governed by the following problem [21], [17]: Find $u \in V$ such as to satisfy the hemivariational inequality
[...]

where $\partial$ denotes the generalized gradient of Clarke. In engineering problems the nonconvex superpotentials (cf. e.g. [16]) $j_N$ and $j_T$ are not independent but they depend $j_N$ (respectively, $j_T$) on the vectors $S_T$ (respectively, $S_N$), where $S_T$, $S_N$ are the reactions corresponding to $u_T$, $u_N$ respectively. In this case a hemivariational inequality cannot be formulated. In order to solve this problem numerically one may apply the following procedure: In the first step it is assumed that $S_N$ is given, say, $S_N^{(0)}$ and the problem ($S_N^{(0)}$ enters with its work into $(f^{(0)}, u)$)

[...]

geometry are analyzed here as a sequence of classical interface subproblems. These classical subproblems result from the consideration of the fractal interface as the unique 'fixed point' of a given iterative function system (IFS), which consists of $N$ contractive mappings $w_i : \mathbb{R}^2 \to \mathbb{R}^2$ with contractivity factors $0 < s_i < 1$, $i = 1,\dots,N$ [2]. According to this procedure, a fractal set $A$ is the 'fixed point' of a transformation $W$, i.e.

\[
A = W(A) = \bigcup_{i=1}^{N} W_i(A),
\]

where $W_i$ is defined

\[
W_i(B) = \{ w_i(x) : x \in B \}, \quad \forall B \in H(\mathbb{R}^2).
\]

Generally a fractal set $A$ is given by the relation:

\[
A = \lim_{n \to \infty} W^{(n)}(B), \quad \forall B \in H(\mathbb{R}^2),
\]

where $H(\mathbb{R}^2)$ is the space of all compact subsets of $\mathbb{R}^2$. Thus each level corresponds to a clas-
sical geometry approximating the fractal geometry. Within each level a new optimization problem is solved with the new data. Thus the multilevel character of the optimization problem results from the necessity to take into account the fractal geometry.

In the sequel a linear elastic structure occupying a subset $\Omega$ of $\mathbb{R}^3$ is considered. In its undeformed state the structure has a boundary $\Gamma$ which is decomposed into two mutually disjoint parts $\Gamma_U$ and $\Gamma_F$. It is assumed that on $\Gamma_U$ (respectively, $\Gamma_F$) the displacements (respectively, the tractions) are given. In the structure $\Omega$ some cracks with interfaces $\Phi$ of fractal type are formed. These cracks in brittle materials frequently propagate along one or more irregular ways. In this case the fracture system may be considered to be a cluster of branches propagating in such a way that new branches in the $n + 1$ step are successively created from a former branch at the $n$ step. In other words the fracture system can be modeled by an IFS procedure. Regarding now the boundary conditions on $\Phi$, it is assumed that nonmonotone, possibly multivalued laws describe the behavior of each interface in the normal and tangential directions. More specifically, it is assumed that the following boundary conditions hold:

\[
-S_N \in \partial j_N(u_N, x),
\]
\[
-S_T \in \partial j_T(u_T, x).
\]

Then according to the previous section, an equilibrium position of $\Omega$ is characterized by the hemivariational inequality (16).

In this case, where the fractured body $\Omega$ with fractal interfaces $\Phi$ is studied, it is necessary to substitute in (16) the domain $\Gamma$ with $\Phi$. As it has been mentioned above, $\Phi$ is the fixed point of a given transformation denoted by $W$, i.e.

\[
\Phi = W\Phi,
\]
\[
\Phi^{(n+1)} = W\Phi^{(n)},
\]
\[
\Phi^{(n)} \to \Phi \quad \text{as } n \to \infty.
\]

Thus, for each approximation $\Phi^{(n)}$ of the fractal interface $\Phi$ a structure $\Omega^{(n)}$ must be solved. Since $\Phi^{(n)}$ is an interface set with classical geometry the solutions $u^{(n)}$ and $\sigma^{(n)}$ (where $u^{(n)}$ and $\sigma^{(n)}$ are the corresponding displacement and stress fields) are obtained using numerical procedures for the solution of (17) and (18). This procedure is repeated several times by increasing $n$; at the limit $n \to \infty$, $u^{(n)}$ and $\sigma^{(n)}$ give the solution of the fractal interface problem.

See also: Multilevel methods for optimal design; Bilevel programming: Applications; Stochastic bilevel programs; Bilevel fractional programming; Bilevel programming: Introduction, history and overview; Bilevel linear programming; Bilevel linear programming: Complexity, equivalence to minmax, concave programs; Bilevel programming: Optimality conditions and duality; Bilevel programming; Bilevel programming: Algorithms; Bilevel programming: Global optimization; Bilevel programming in management; Bilevel programming: Applications in engineering; Bilevel optimization: Feasibility test and flexibility index; Bilevel programming: Implicit function approach.

References
[1] ARROW, K.J., HURWICZ, L., AND UZAWA, H.: Studies in linear and nonlinear programming, Stanford Univ. Press, 1985.
[2] BARNSLEY, M.: Fractals everywhere, Acad. Press, 1988.
[3] BAUMAN, E.J.: 'Trajectory decomposition', in C.T. LEONDES (ed.): Optimization methods for large scale systems with applications, McGraw-Hill, 1971.
[4] BERTSEKAS, D.P., AND TSITSIKLIS, J.N.: Parallel and distributed computation: numerical methods, Prentice-Hall, 1989, Last edition: Athena Sci. Belmont Mass. 1997.
[5] BOLZA, O.: Lectures on the calculation of variations, Chicago, 1904.
[6] BROSILOW, C.B., LASDON, L.S., AND PEARSON, J.D.: 'A multi-level technique for optimization': Proc. Joint Autom. Control Conf., Rensselaer Polytech. Inst., 1965.
[7] CLARKE, F.H.: Optimization and nonsmooth analysis, Wiley, 1983.
[8] DEMYANOV, V.F., STAVROULAKIS, G.E., POLYAKOVA, L.N., AND PANAGIOTOPOULOS, P.D.: Quasidifferentiability and nonsmooth modelling in mechanics, engineering and economics, Kluwer Acad. Publ., 1996.
[9] HADLEY, G.: Non-linear and dynamic programming, Addison-Wesley, 1964.
[10] HAMEL, G.: Theoretische Mechanik, Springer, 1967.
[11] LASDON, L.C., AND SCHOEFFLER, J.D.: 'A multi-level technique for optimization': Proc. Joint Autom. Control Conf., Rensselaer Polytech. Inst., 1965.
Multiparametric linear programming
- Solve (1) by treating $\theta$ as a free variable to obtain $\theta^*$. If no feasible solution exists, stop; (1) is infeasible.
- Fix $\theta = \theta^*$ and solve (1) to obtain an initial basis $B$ and corresponding critical region.

2) Find all optimal solutions:
- Construct two lists $V$ and $W$, where $V$ consists of those optimal bases whose neighboring bases have been identified, and $W$ consists of those bases whose neighbors have not yet been identified.
- Select any basis from $W$ and identify all its neighboring bases. From all the identified bases, insert in $W$ those bases which are neither in $V$ nor in $W$. The optimal solutions (and corresponding critical regions) are then determined by moving from the basis to its neighbors by one dual step.
- Repeat the procedure until $W = \emptyset$.

Fig. 2: $z(\theta)$ is a continuous and convex function of $\theta$.

See also: Multiplicative programming; Global optimization in multiplicative programming; Linear programming; Parametric linear programming: Cost simplex algorithm; Parametric global optimization: Sensitivity; Multiparametric linear programming; Selfdual parametric method for linear programs; Nondifferentiable optimization: Parametric programming; Bounds and solution vector estimates for parametric NLPs; Parametric optimization: Embeddings, path following and singularities; Multiparametric mixed integer linear programming; Parametric mixed integer nonlinear optimization.

References
[1] GAL, T.: 'Putting the LP survey into perspective', OR/MS Today 19, no. 6 (1992), 93.
[2] GAL, T.: 'Weakly redundant constraints and their impact on postoptimal analysis in linear programming', Europ. J. Oper. Res. 60 (1992), 315-336.
[3] GAL, T.: Postoptimal analyses, parametric programming, and related topics, de Gruyter, 1995.
[4] GAL, T.: Advances in sensitivity analysis and parametric programming, Kluwer Acad. Publ., 1997.
[5] GAL, T., AND NEDOMA, J.: 'Multiparametric linear programming', Managem. Sci. 18 (1972), 406-422.
[6] GRANOT, D., GRANOT, F., AND JOHNSON, E.L.: 'Duality and pricing in multiple right-hand choice linear programming problems', Math. Oper. Res. 7 (1982), 545-556.
[7] GREENBERG, H.J.: 'An analysis of degeneracy', Naval Res. Logist. Quart. 33 (1986), 635-655.
[8] GREENBERG, H.J.: 'How to analyze the results of linear programs- Part 1: Preliminaries', Interfaces 23, no. 4 (1993), 56-67.
[9] GREENBERG, H.J.: 'How to analyze the results of linear programs- Part 2: Price Interpretation', Interfaces 23, no. 5 (1993), 97-114.
[10] GREENBERG, H.J.: 'How to analyze the results of linear programs- Part 3: Infeasibility Diagnosis', Interfaces 23, no. 6 (1993), 120-139.
[11] GREENBERG, H.J.: 'How to analyze the results of linear programs- Part 4: Forcing Structures', Interfaces 24, no. 1 (1994), 121-130.
[12] GREENBERG, H.J.: 'The use of optimal partition in linear programming solution for postoptimal analysis', Oper. Res. Lett. 15 (1994), 179-185.
[13] GREENBERG, H.J.: 'The ANALYZE rulebase for supporting LP analysis', Ann. Oper. Res. 65 (1996), 91-126.
[14] HANSEN, P.M., LABBE, M., AND WENDELL, R.E.: 'Sensitivity analysis in multiple objective linear programming: the tolerance approach', Europ. J. Oper. Res. 38, no. 1 (1989), 63-69.
[15] MAGNATI, T.L., AND ORLIN, J.B.: 'Parametric linear programming and anti-cycling pivoting rules', Math. Program. 41 (1988), 317-325.
[16] MURTY, K.: 'Computational complexity of parametric linear programming', Math. Program. 19 (1980), 213-219.
[17] ROOS, C., TERLAKY, T., AND VIAL, J.-PH.: Theory and algorithms for linear optimization, an interior point approach, Wiley, 1997.
[18] WANG, H-F, AND HUANG, C-S: 'Multiparametric analysis of the maximum tolerance in a linear programming problem', Europ. J. Oper. Res. 67, no. 1 (1993), 75-87.
Multiparametric mixed integer linear programming
[19] WARD, J.E., AND WENDELL, R.E.: 'Approaches to sensitivity analysis in linear programming', Ann. Oper. Res. 27 (1990), 3-38.
[20] WENDELL, R.E.: 'The tolerance approach to sensitivity analysis in linear programming', Managem. Sci. 31 (1985), 504-578.
[21] WENDELL, R.E.: 'Linear programming 3: The tolerance approach', in T. GAL AND H.J. GREENBERG (eds.): Advances in Sensitivity Analysis and Parametric Programming, Kluwer Acad. Publ., 1997.

Vivek Dua
Imperial College
London, U.K.
Efstratios N. Pistikopoulos
Imperial College
London, U.K.
E-mail address: e.pistikopoulos@ic.ac.uk

MSC2000: 90C31, 90C05
Key words and phrases: sensitivity analysis with respect to right-hand side changes, critical region.

MULTIPARAMETRIC MIXED INTEGER LINEAR PROGRAMMING

In this article we describe theoretical and algorithmic developments in the field of parametric programming for linear models involving 0-1 integer variables. We will consider two cases of the problem: single parametric (when a single uncertain parameter is present) and multiparametric (when more than one uncertain parameters are present in the model). For the case when a single uncertain parameter is present, solution approaches are based upon
i) enumeration ([12], [13], [11]);
ii) cutting planes ([6]); and
iii) branch and bound techniques ([8], [10]).
For the multiparametric case, solution algorithm that has been proposed is based upon branch and bound fundamentals [1], [2]. While most of the work on single parametric problems has been reviewed in the two excellent papers [5] and [7], and has been borrowed here for the sake of completeness, the work on multiparametric problems, the focus of this article, is quite recent and is described in detail. It may be mentioned that while solution approaches for single parametric case are available for uncertainty in objective function coefficients or right-hand side of constraints, for the case of more than one uncertain parameter the solution approach is available only for the right-hand side case. Next we will describe solution approaches for
i) single parametric mixed integer linear programs for objective function coefficients parametrization; and
ii) single parametric pure integer programs when the uncertain parameter is present on the right-hand side of the constraints.
These illustrate some concepts which are based upon some basic observations. For other solution approaches, see the literature cited above. Finally we will present a solution approach for right-hand side multiparametric mixed integer linear programs.

Mixed Integer Linear Programming Problems Involving a Single Uncertain Parameter in Objective Function Coefficients. These can be stated as follows:

\[
\begin{aligned}
z(\phi) = \min_{x,y}\ & (c^T + c'^T \phi)\,x + d^T y \\
\text{s.t.}\ & Ax + Ey \le b, \\
& x \in \mathbb{R}^n,\ y \in \{0,1\}^l, \\
& \phi_{\min} \le \phi \le \phi_{\max},
\end{aligned} \tag{1}
\]

where $x$ is a vector of continuous variables; $y$ is the vector of 0-1 integer variables; $\phi$ is a scalar uncertain parameter bounded between its lower and upper bounds $\phi_{\min}$ and $\phi_{\max}$ respectively; $A$ is an $(m \times n)$ matrix; $E$ is an $(m \times l)$ matrix; $c$, $c'$, $d$ and $b$ are vectors of appropriate dimensions. Solution procedure for (1) is based upon following two features of the formulation in (1). First feature of this formulation is that, since the uncertain parameter is present in the objective function only, the feasible region of (1) remains constant for all the fixed values of $\phi$ in $[\phi_{\min}, \phi_{\max}]$. And the second feature is that, the optimal value of (1) for $\phi_{\min} \le \phi \le \phi_{\max}$ is piecewise linear, continuous, and concave on its finite domain. The solution is then approached by deriving valid upper and lower bounds, using the concavity property of the objective function value, and sharpening these bounds until they converge to the same value, as described next. Solving (1) for $\phi$ fixed at its endpoints $\phi_{\min}$ and $\phi_{\max}$, gives upper bounds $AB$ and $BC$ respectively (see Fig. 1); and a linear interpolation, $AC$,
between the endpoints provides a lower bound to the solution. The region $ABC$ within which the solution will lie is then reduced by solving (1) at $\phi_{int}$, the intersection point of two upper bounds $AB$ and $BC$. This results (see Fig. 2) in two smaller regions, $ADE$ and $EFC$, within which the solution will exist. This procedure is continued until the difference between upper and lower bounds becomes zero.

[...]

(2), a solution will remain optimal for some interval of $\theta$ and then suddenly another solution will become optimal, and remain so for the next interval (see Fig. 3). The problem thus reduces to solving (2) at an end point, say $\theta_{\min}$, and then finding a point $\theta_i$ at which the current solution becomes infeasible. Solving (2) at $\theta_i + \epsilon$ will give another integer solution. This procedure is continued until we hit the other end point, $\theta_{\max}$.
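The bound-sharpening procedure for the objective-coefficient case (1) can be sketched as follows. This Python code is an illustration under stated assumptions, not the authors' implementation: the MILP solver is replaced by a hypothetical oracle `solve(phi)` that returns the cost line (a, b) of an optimal solution at a fixed parameter value, meaning that the solution costs a + b*phi. Each returned line is an upper bound on z over the whole interval; the routine intersects two neighboring upper bounds, re-solves at the intersection, and stops refining when no lower value is found there.

```python
def make_oracle(lines):
    """Toy stand-in for a MILP solver (assumption): each (a, b) is the cost
    a + b*phi of one feasible solution; return the cheapest one at phi."""
    def solve(phi):
        return min(lines, key=lambda ab: ab[0] + ab[1] * phi)
    return solve


def parametric_profile(solve, lo, hi, tol=1e-9):
    """Return the linear pieces (a, b) of z(phi) on [lo, hi], left to right."""
    def refine(left, right):
        (a1, b1), (a2, b2) = left, right
        if abs(b1 - b2) <= tol:
            return []                      # same piece, nothing in between
        phi_int = (a2 - a1) / (b1 - b2)    # intersection of two upper bounds
        upper = a1 + b1 * phi_int
        a3, b3 = solve(phi_int)
        if a3 + b3 * phi_int >= upper - tol:
            return []                      # bounds coincide: breakpoint found
        mid = (a3, b3)
        return refine(left, mid) + [mid] + refine(mid, right)

    first, last = solve(lo), solve(hi)
    if abs(first[1] - last[1]) <= tol:
        return [first]
    return [first] + refine(first, last) + [last]


if __name__ == "__main__":
    oracle = make_oracle([(0.0, 2.0), (1.0, 1.0), (3.0, 0.0), (2.5, 0.9)])
    print(parametric_profile(oracle, 0.0, 4.0))
```

In the toy data the fourth candidate is never optimal, so the profile consists of three pieces with breakpoints at phi = 1 and phi = 2. Because z is the minimum of finitely many lines, the number of oracle calls is bounded by a small multiple of the number of pieces.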
Fig. 1: Derivation of bounds. (UB = upper bound; LB = lower bound.)

Fig. 3: Step function nature of objective function value.

Consider a multiparametric mixed integer linear programming problem (mp-MILP) of the following form:

\[
z(\theta) = \min_{x,y}\ c^T x + d^T y
\]

[...]
from the root node (lower bound) towards terminal nodes (upper bound) by fixing $y$ variables at the intermediate nodes. The complete enumeration of the tree is avoided by fathoming those intermediate nodes which guarantee a suboptimal solution.

At the root node, by relaxing the integrality condition on $y$, i.e., considering $y$ as a continuous variable bounded between 0 and 1, (3) is transformed to an mp-LP of the following form:

\[
\begin{aligned}
\hat z(\theta) = \min_{x,\breve y}\ & c^T x + d^T \breve y \\
\text{s.t.}\ & Ax + E\breve y \le b + F\theta, \\
& G\theta \le g, \\
& 0 \le \breve y \le 1, \\
& x \in \mathbb{R}^n,\ \theta \in \mathbb{R}^s.
\end{aligned} \tag{4}
\]

The solution of (4), given by linear parametric profiles, $\hat z(\theta)^i$, valid in their corresponding critical regions, $CR^i$, represents a parametric lower bound. Similarly, at a node where all $y$ are fixed, $y = \hat y$, (3) is transformed to an mp-LP of the following form:

[...] $\{0,1\}^l$, $x \in \mathbb{R}^n$, $\theta \in \mathbb{R}^s$

[...] solution at an intermediate node, $\hat z(\theta)^i$, valid in its corresponding critical regions, $CR^i$, is then analyzed, to decide whether to explore subnodes of this intermediate node or not, by using the following fathoming criteria. A given space in any node can be discarded if one of the following holds:
• (infeasibility criterion) Problem (6) is infeasible in the given space.
• (integrality criterion) An integer solution is found in the given space.
• (dominance criterion) The solution of the node is greater than the current upper bound in the same space.
If all the regions of a node are discarded the node can be fathomed. While the first two fathoming criteria (Infeasibility and Integrality) are easy to apply, in order to apply the third one (dominance criteria) we need a comparison procedure, which is described next.
• (case 1; Fig. 5) All constraints from $CR$ are redundant. This implies that $\overline{CR} \subseteq CR$, and therefore $CR^{int} = \overline{CR}$.

Fig. 5: Definition of $CR^{int}$; Case 1.

• (case 2; Fig. 6) All constraints from $\overline{CR}$ are redundant. This implies that $\overline{CR} \supseteq CR$, and therefore $CR^{int} = CR$.

[...]

Fig. 7: Definition of $CR^{int}$; Case 3.

• (case 4; Fig. 8) The problem is infeasible. This implies that two spaces are apart from each other and $CR^{int} = \emptyset$.

Fig. 8: Definition of $CR^{int}$; Case 4.

Once $CR^{int}$ has been defined, the second step is to compare $\hat z(\theta)$ to $\bar z(\theta)$, so as to find which of the two is lower. This is achieved by defining a new constraint:

[...]

and checking for redundancy of this constraint in [...]

• (case 2; Fig. 10) The problem is infeasible. This implies that $\hat z(\theta) \ge \bar z(\theta)$ and therefore the space can be discarded from further analysis.
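For a single parameter the comparison step behind the dominance criterion becomes elementary, which makes it easy to illustrate. The Python sketch below is a simplified, one-dimensional stand-in and not the redundancy test of the article: critical regions are assumed to be intervals [lo, hi], both profiles are assumed linear and given as (intercept, slope) pairs, and the function name `region_where_lower` is hypothetical. It returns the part of the interval in which the node solution lies strictly below the current upper bound; the rest can be discarded.

```python
def region_where_lower(node, upper, lo, hi, tol=1e-12):
    """Sub-interval of [lo, hi] where node(theta) < upper(theta), or None.

    node and upper are (intercept, slope) pairs of linear profiles.
    None means the node is dominated on the whole interval.
    """
    a = node[0] - upper[0]          # difference d(theta) = a + b * theta
    b = node[1] - upper[1]
    d_lo, d_hi = a + b * lo, a + b * hi
    if d_lo >= -tol and d_hi >= -tol:
        return None                 # node never better: discard the space
    if d_lo < 0 and d_hi < 0:
        return (lo, hi)             # node better everywhere: keep all of it
    cut = -a / b                    # signs differ, so b is nonzero
    return (lo, cut) if d_lo < 0 else (cut, hi)


if __name__ == "__main__":
    print(region_where_lower((0.0, 1.0), (2.0, 0.0), 0.0, 5.0))
```

In more than one parameter the same decision is made by adding the constraint that compares the two profiles to the description of the intersected region and testing it for redundancy or infeasibility, as the cases listed in the text describe.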
Discrete Math. 1 (1977), 375-390.
[9] NEMHAUSER, G.L., AND WOLSEY, L.A.: Integer and combinatorial optimization, Wiley, 1988.
[10] OHTAKE, Y., AND NISHIDA, N.: 'A branch-and-bound algorithm for 0-1 parametric mixed-integer programming', Oper. Res. Lett. 4, no. 1 (1985), 41-45.
[11] PIPER, C.J., AND ZOLTNERS, A.A.: 'Some easy postoptimality analysis for zero-one programming', Managem. Sci. 22, no. 7 (1976), 759-765.
[12] ROODMAN, G.M.: 'Postoptimality analysis in zero-one programming by implicit enumeration', Naval Res. Logist. Quart. 19 (1972), 435-447.
[13] ROODMAN, G.M.: 'Postoptimality analysis in zero-one programming by implicit enumeration: The mixed-integer case', Naval Res. Logist. Quart. 21 (1974), 595-607.

Vivek Dua
Imperial College
London, U.K.
Efstratios N. Pistikopoulos
Imperial College
London, U.K.
E-mail address: e.pistikopoulos@ic.ac.uk
MSC2000: 90C31, 90C11
Key words and phrases: parametric bounds, branch and bound, comparison of parametric solutions.

MULTIPLE MINIMA PROBLEM IN PROTEIN FOLDING: αBB GLOBAL OPTIMIZATION APPROACH

Motivation. Proteins are arguably the most complex molecules in nature. This complexity arises from an intricate balance of intra- and inter-molecular interactions that define the native three-dimensional structure of the system, and subsequently its biological functionality. The underlying goal of protein folding research is to understand the formation of these native tertiary structures. Genetic engineering can be used to produce proteins with specific amino acid sequences. The next step involves developing the link between the primary protein sequence and the native structure. The ability to predict the folding of proteins promises to have important practical and theoretical ramifications, especially in the areas of medicinal and biophysical chemistry.

Experimental studies have shown that proteins, under native physiological conditions, spontaneously refold to their unique, native structure after denaturation. This implies that the formation of the native structure is controlled primarily by the amino acid sequence. According to Anfinsen's hypothesis the native structure is in a state of thermodynamic equilibrium corresponding to the conformation with the lowest free energy. Through mathematical modeling of protein interaction energies, the protein folding problem can be addressed as a conformational search for the global minimum energy.

There exist two fundamental problems associated with protein folding in the context of a conformational search. The first is the ability to correctly model protein interactions using detailed mathematical equations. The second is associated with searching the highly nonconvex energy hypersurface that describes a given protein. This complexity, coupled with an exponential growth in the number of local minima as the size of the protein increases, has become known as the multiple minima problem. There exists an obvious need for the development of efficient global optimization techniques. An efficient method which has been successfully applied to detailed atomistic models of protein folding is the αBB [1], [2], [3], [17] global optimization algorithm.

Mathematical Description. Proteins are essentially polymer chains composed of a predefined set of amino acid residues in which neighboring residues are linked by peptidic bonds. Naturally occurring proteins consist of only 20 different amino acid residues, and the form of the side chain R (e.g., methyl, butyl, benzoic, etc.) defines the differences between these constituent groups. The chemical structure of a generic protein is illustrated in Fig. 1. The repeating unit $-\mathrm{N}\mathrm{C}^{\alpha}\mathrm{C}'-$ defines the backbone of the protein. The protein also possesses amino and carboxyl end groups, denoted by $E_{\mathrm{Amino}}$ and $E_{\mathrm{Carboxyl}}$, respectively.

The geometry of a protein can be fully described by assigning a three-dimensional coordinate vector $r_i$:

\[ r_i = \begin{pmatrix} x_i \\ y_i \\ z_i \end{pmatrix}. \]
These $r_i$ specify the position of each atom in the protein molecule. The bond vector between two atoms $(i, j)$ connected with a covalent bond is defined as:

\[ r_{ij} = \begin{pmatrix} x_j - x_i \\ y_j - y_i \\ z_j - z_i \end{pmatrix}. \]

The corresponding bond length is then equal to the Euclidean distance between these two atoms:

\[ |r_{ij}| = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2 + (z_j - z_i)^2}. \]

A covalent bond angle, $\theta_{ijk}$, formed by the two adjacent bond vectors $r_{ij}$ and $r_{jk}$ can be computed by the following formulas:

\[ \cos(\theta_{ijk}) = \frac{r_{ij} \cdot r_{jk}}{|r_{ij}|\,|r_{jk}|}, \qquad \sin(\theta_{ijk}) = \frac{|r_{ij} \times r_{jk}|}{|r_{ij}|\,|r_{jk}|}. \]

Here, $r_{ij} \cdot r_{jk}$ is the dot product of the bond vectors $r_{ij}$ and $r_{jk}$ and $r_{ij} \times r_{jk}$ is the cross product.

The dihedral angle between the normals of the planes formed by atoms $C'_{i-1} N_i C^{\alpha}_i$ and $N_i C^{\alpha}_i C'_i$, respectively, is called $\phi_i$, where $i-1$ and $i$ are two adjacent amino acid residues. The angle defined by the planes $N_i C^{\alpha}_i C'_i$ and $C^{\alpha}_i C'_i N_{i+1}$, respectively, is called $\psi_i$, where $i$ and $i+1$ are two adjacent amino acid residues. Also, $\omega_i$ is the dihedral angle defined by the planes $C^{\alpha}_i C'_i N_{i+1}$ and $C'_i N_{i+1} C^{\alpha}_{i+1}$. The letter $\chi$ is utilized to denote the dihedral angles which are associated with the side groups $R_i$. Finally, the letter $\theta$ is used to name the dihedral angles associated with the two end groups. These conventions are illustrated in Fig. 3.
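As a minimal sketch, the bond vector, bond length and bond angle formulas above can be evaluated directly from Cartesian coordinates. The function names are illustrative assumptions, not part of the original text. The bond angle follows the definition given here (the angle between $r_{ij}$ and $r_{jk}$). The dihedral angle is computed from the normals of the two planes using a common atan2 form, whose sign convention is an assumption.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def bond_vector(r_i, r_j):
    """r_ij = r_j - r_i."""
    return sub(r_j, r_i)

def bond_length(r_i, r_j):
    """Euclidean distance |r_ij| between atoms i and j."""
    return norm(bond_vector(r_i, r_j))

def bond_angle(r_i, r_j, r_k):
    """Angle between r_ij and r_jk, from the cosine and sine formulas."""
    r_ij = bond_vector(r_i, r_j)
    r_jk = bond_vector(r_j, r_k)
    denom = norm(r_ij) * norm(r_jk)
    cos_t = dot(r_ij, r_jk) / denom
    sin_t = norm(cross(r_ij, r_jk)) / denom
    return math.atan2(sin_t, cos_t)

def dihedral_angle(r_i, r_j, r_k, r_l):
    """Angle between the normals of planes (i, j, k) and (j, k, l).

    The sign convention of the returned angle is an assumption.
    """
    b1 = bond_vector(r_i, r_j)
    b2 = bond_vector(r_j, r_k)
    b3 = bond_vector(r_k, r_l)
    n1 = cross(b1, b2)  # normal of the first plane
    n2 = cross(b2, b3)  # normal of the second plane
    return math.atan2(norm(b2) * dot(b1, n2), dot(n1, n2))
```

For four atoms lying in one plane the function returns 0 for a cis arrangement and an angle of magnitude $\pi$ for a trans arrangement.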
... AMBER [24], CHARMM [7], ECEPP/3 [19], GROMOS [11], MM3 [4]. These models, also known as force fields, are typically expressed as summations of several potential energy components, with the mathematical form of individual energy terms based on the phenomenological nature of that term. A general total potential energy equation should include terms for bond stretching ($E_{bond}$), angle bending ($E_{angle}$), torsion ($E_{tor}$) and nonbonded ($E_{nb}$) interactions:

\[ E_{potential} = E_{bond} + E_{angle} + E_{tor} + E_{nb}. \]

When rigid body approximations are employed, bond stretching and angle bending energies can be neglected. For these force fields, torsion angles define a set of independent variables that effectively describe any protein conformation. This approximately reduces the number of variables by a factor of 3 over those force fields that use a Cartesian coordinate system to describe flexible molecular geometries.

One example of a rigid body atomistic level potential energy model is the ECEPP/3 force field. In this case, the nonbonded energy terms, $E_{nb}$, include electrostatic, $E_{elec}$, van der Waals, $E_{vdw}$, and hydrogen bonding, $E_{hbond}$, interactions. These energies are calculated for those atoms that are separated by more than two atoms; that is, the atoms possess at least a 1-4 relationship. Electrostatic energies, $E_{elec}$, are calculated as Coulombic forces based on atomic point charges:

\[ E_{elec} = \frac{Q_i Q_j}{\epsilon R_{ij}}. \]

Here, $Q_i$ and $Q_j$ represent the two point charges, while $R_{ij}$ equals the distance between these two points. The $\epsilon$ term describes the dielectric nature of the protein environment.

General nonbonded van der Waals interactions, $E_{vdw}$, are modeled using a 6-12 Lennard-Jones potential energy term, which consists of a repulsion and attraction term:

\[ E_{vdw} = \epsilon_{ij} \left[ \left( \frac{R^{*}_{ij}}{R_{ij}} \right)^{12} - 2 \left( \frac{R^{*}_{ij}}{R_{ij}} \right)^{6} \right]. \]

The energy minimum for a given atomic pair is described by the potential depth, $\epsilon_{ij}$, and position, $R^{*}_{ij}$. For those atomic pairs that may form a hydrogen bond, the 6-12 potential energy term is replaced by a modified 10-12 Lennard-Jones type term:

\[ E_{hbond} = \epsilon_{ij} \left[ 5 \left( \frac{R^{*}_{ij}}{R_{ij}} \right)^{12} - 6 \left( \frac{R^{*}_{ij}}{R_{ij}} \right)^{10} \right]. \]

Finally, corrective torsional energies, $E_{tor}$, which are represented by a three term Fourier series expansion, are also added:

\[ E_{tor} = \frac{E_1}{2}(1 - \cos\phi) + \frac{E_2}{2}(1 - \cos 2\phi) + \frac{E_3}{2}(1 - \cos 3\phi). \]

Each term can be interpreted physically. The 1-x ($\cos\phi$) symmetry term accounts for those nonbonded interactions not included in general nonbonded terms. The 2-x ($\cos 2\phi$) symmetry term is related to the interactions of orbitals, while the 3-x ($\cos 3\phi$) symmetry term describes steric contributions.

Other specific potential energy terms may also be added to the general energy equation depending on the exact protein sequence. For example, the formation of disulfide bridges can be enforced by adding a penalty term to constrain the values of particular atomic distances. Correction terms have also been used to adjust conformational energies according to the configurations of proline and hydroxyproline residues.

Solvation Energy Modeling. In general, the energetic description of a protein must also include solvation effects. A theoretically simple approach would be to explicitly surround the peptide with solvent molecules and compute potential energy contributions for intra- and inter-molecular interactions. These explicit calculations tend to greatly increase the computational cost of the simulation. In addition, solvent configurations are not rigid, so these calculations must consider an average solvent-peptide configuration, which is typically generated by a number of Monte-Carlo (MC) or molecular dynamics (MD) simulations [14]. Therefore, most simulations of this type are limited to restricted conformational searches.

An alternative way for effectively considering average solvent effects is to use implicit solvation models. One complication involves the solvent's influence on electrostatic interaction energies because of the implicit relationship between dielectric effects and solvation. A simple solution has been to modify the representation of the dielectric term. In reality, however, the rigorous treatment of electrostatic interactions involves the solution of the Poisson-Boltzmann equation.

Other simple and computationally feasible implicit solvation models are based on empirical representations of the solvation energy. In these cases, the solvation energy of each functional group is related to the interaction of the solvent with a hydration shell for the particular group. The individual terms are then summed together to provide a total solvation energy for the system. These solvation contributions can be described by the following general equation:

\[ E_{solv} = \sum_{i=1}^{N} S_i \sigma_i. \]

Typically, $S_i$ represents either the solvent-accessible surface area, $A_i$, or the solvent-accessible volume of hydration layer, $VHS_i$, for the functional group, and $\sigma_i$ is an empirically derived free energy density parameter.

A number of algorithms have been developed for calculating solvent-accessible surface areas [8], [9], [22]. Although several of these are relatively efficient, the appearance of discontinuities has been one complication in considering solvent accessible surface areas. In addition, a large number of parameterization strategies (JRF, OONS, WE, etc.) have been used to derive appropriate $\sigma_i$ parameters [21], [23], [25]. In the case of the JRF parameter set, discontinuities can be avoided because the surface-accessible solvation energies are only included at local minimum conformations [23]. This is because the parameters were derived from low energy solvated configurations of actual tetrapeptides.

Several methods have also been developed for calculating the hydration volumes and corresponding free energy parameters [6], [12]. A recent and computationally inexpensive method, RRIGS, is based on a Gaussian approximation for the volume of a hydration layer [6]. This method also inherently avoids numerical problems associated with possible discontinuities so that the solvation energy contributions can easily be added at every step of local minimizations.

Problem Formulation. For protein folding, the energy minimization problem can be formulated as a nonconvex, nonlinear global optimization problem in which the energy, $E$, must be globally minimized with respect to the dihedral angles of the protein:

\[
\begin{aligned}
\min\ & E(\phi_i, \psi_i, \omega_i, \chi_i^k, \theta_j^N, \theta_j^C) \\
\text{subject to}\ & -\pi \le \phi_i \le \pi, \\
& -\pi \le \psi_i \le \pi, \\
& -\pi \le \omega_i \le \pi, \\
& -\pi \le \chi_i^k \le \pi, \\
& -\pi \le \theta_j^N \le \pi, \\
& -\pi \le \theta_j^C \le \pi.
\end{aligned}
\]

The index $i = 1, \dots, N_{RES}$ defines the number of residues, $N_{RES}$, in the protein. In addition, $k = 1, \dots, K^i$ denotes the number of dihedral angles in the side chain of the $i$th residue, and $j = 1, \dots, J^N$ and $j = 1, \dots, J^C$ indicate the indices of the amino and carboxyl end groups, respectively. The energy, $E$, represents the total potential energy function, $E_{potential}$, plus the free energy of solvation, $E_{solv}$. In most cases, this is the exact formulation; that is, energetic and gradient contributions can be added at each step of the minimization. However, in the case of surface-accessible hydration using the JRF parameters, the potential energy function is minimized before adding the hydration energy contributions. In other words, gradient contributions from solvation are not considered.

Even after reducing this optimization problem to a function of internal variables, the multidimensional surface that describes the energy function possesses an astronomically large number of local minima. In addition, evaluation of the energy, especially with the addition of solvation, is computationally expensive, which makes even local minimization slow. A large number of techniques have been developed to search this nonconvex conformational space. Many methods employ stochastic search procedures, while others rely on simplifications of the potential model and/or mathematical transformations. In addition, the use of statistical
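The individual energy terms described for the ECEPP/3-style model (Coulombic, 6-12 Lennard-Jones, 10-12 hydrogen bonding, three-term torsion, and the empirical solvation sum) can be sketched as small functions. This is a minimal sketch under stated assumptions: the function names and all numerical parameter values are hypothetical, units are assumed consistent with no conversion factors applied, and the torsion term assumes each Fourier coefficient enters with a factor of one half.

```python
import math

def coulomb(q_i, q_j, r_ij, dielectric):
    """E_elec = Q_i Q_j / (eps R_ij); units assumed consistent."""
    return q_i * q_j / (dielectric * r_ij)

def lennard_jones_6_12(r_ij, depth, r_star):
    """E_vdw = eps_ij [(R*/R)^12 - 2 (R*/R)^6]; minimum -eps_ij at R = R*."""
    x = r_star / r_ij
    return depth * (x ** 12 - 2.0 * x ** 6)

def hbond_10_12(r_ij, depth, r_star):
    """E_hbond = eps_ij [5 (R*/R)^12 - 6 (R*/R)^10]; minimum -eps_ij at R = R*."""
    x = r_star / r_ij
    return depth * (5.0 * x ** 12 - 6.0 * x ** 10)

def torsion(phi, e1, e2, e3):
    """Three-term Fourier torsional energy (factor 1/2 per term assumed)."""
    return (0.5 * e1 * (1.0 - math.cos(phi))
            + 0.5 * e2 * (1.0 - math.cos(2.0 * phi))
            + 0.5 * e3 * (1.0 - math.cos(3.0 * phi)))

def solvation(s, sigma):
    """E_solv = sum_i S_i sigma_i over functional groups."""
    return sum(s_i * sigma_i for s_i, sigma_i in zip(s, sigma))

# Hypothetical parameter values, for illustration only.
e_min = lennard_jones_6_12(r_ij=3.0, depth=0.2, r_star=3.0)
```

Both Lennard-Jones type terms reach their minimum value, the negative of the potential depth, at the separation $R^{*}_{ij}$, which matches the description of $\epsilon_{ij}$ and $R^{*}_{ij}$ in the text.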
... rectangle is greater than the current upper bound, this hyper-rectangle can be discarded because the global minimum cannot be within this subdomain (fathoming step).

The computational requirement of the αBB algorithm depends on the number of variables (global) on which branching occurs. Therefore, these global variables need to be chosen carefully. Qualitatively, the branching variables should correspond to those variables which substantially influence the nonconvexity of the surface and the location of the global minimum. In terms of the protein folding problem, it is generally accepted that the backbone dihedral angles ($\phi$ and $\psi$) are the most influential variables. Therefore, in larger problems, the global variable set should include only the set of $\phi$ and $\psi$ variables. In this case, the dihedral angles associated with the peptide bond ($\omega$) and the side chains ($\chi$) are treated as local variables.

Algorithmic Description. The basic steps of the algorithm are as follows:

1) The initial best upper bound is set to an arbitrarily large value. The original domain is partitioned along one of the global variable dimensions.
2) A convex function $L$ is constructed in each hyper-rectangle and minimized using a local nonlinear solver, with function calls to potential and solvation models. If a solution is greater than the best upper bound the entire subregion can be fathomed, otherwise the solution is stored.
3) The local minima for $L$ are used as initial starting points for local minimizations of the upper bounding function $E$ in each hyper-rectangle. In solving the upper bounding problems, all variable bounds are expanded to the $[-\pi, \pi]$ domain. These solutions are upper bounds on the global minimum solution in each hyper-rectangle.
4) The current best upper bound is updated to be the minimum of those thus far stored. If a new upper bound (from step 3) is selected, a separate module is called to ensure that the absolute value of each gradient in the objective function gradient vector is below a specified tolerance (kcal/mol/deg). The second derivative matrix is also evaluated to verify that the upper bound solution is a local minimum.
5) The hyper-rectangle with the current minimum value for $L$ is selected and partitioned along one of the global variables.
6) If the best upper and lower bounds are within an $\epsilon$ tolerance the program will terminate, otherwise it will return to Step 2.

A novel approach has also been proposed for the initialization of the αBB algorithm [5]. Specifically, an analysis of 98 proteins from the Brookhaven X-ray data bank was used to develop dihedral angle distributions in the form of histograms from $-\pi$ to $\pi$ for each dihedral angle of each of the naturally occurring amino acids. Using this information, a set of reduced domains can be defined for every dihedral angle of every residue in the peptide sequence. Overall initialization domains correspond to the Cartesian products of all the sub-domains of individual residues in the protein. This approach maintains the guarantee of global optimality over the considered search space of the reduced domains, and is deterministic in those subdomains that possess convex underestimators. In addition, all variable bounds are expanded to $[-\pi, \pi]$ when solving the upper bounding problem. Therefore, although the initial point of an upper bounding minimization is restricted to the search space of the corresponding lower bounding problem, the solution may lie outside the original subdomain.

EXAMPLE 1 Met-enkephalin (H-Tyr-Gly-Gly-Phe-Met-OH) is an endogenous opioid pentapeptide found in the human brain, pituitary, and peripheral tissues. Its biological function involves a large variety of physiological processes, most notably the endogenous response to pain. The peptide consists of 24 dihedral angles and a total of 75 atoms, and has played the role of a benchmark molecular conformation problem. The energy hypersurface is extremely complex with the number of local minima estimated on the order of $10^{11}$. The unsolvated global minimum energy conformation, which is efficiently located using the αBB algorithm, has been shown to exhibit a type II'
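The branch-and-bound steps listed in the algorithmic description can be illustrated on a one-dimensional toy problem. The sketch below is not the authors' implementation, and it makes several assumptions that are not stated in the text above: the objective `f` is a hypothetical stand-in for the energy; the convex lower bounding function is taken in the usual αBB form $L(x) = f(x) + \alpha (x^L - x)(x^U - x)$ with $\alpha$ chosen from a known bound on the second derivative; the upper bound is obtained by simply evaluating `f` at the minimizer of $L$ rather than by a local minimization; and partitioning is plain bisection.

```python
import heapq
import math

def f(x):
    # Hypothetical 1-D nonconvex "energy" on [-pi, pi] (not from the text).
    return math.sin(3.0 * x) + 0.3 * x * x

# f''(x) = -9 sin(3x) + 0.6 >= -8.4, so any alpha >= 4.2 makes
# L(x) = f(x) + alpha (lo - x)(hi - x) convex on every subinterval.
ALPHA = 4.5

def minimize_underestimator(lo, hi):
    """Minimize the convex underestimator L on [lo, hi]."""
    def L(x):
        return f(x) + ALPHA * (lo - x) * (hi - x)
    a, b = lo, hi
    for _ in range(100):  # ternary search is valid because L is convex
        m1, m2 = a + (b - a) / 3.0, b - (b - a) / 3.0
        if L(m1) < L(m2):
            b = m2
        else:
            a = m1
    x = 0.5 * (a + b)
    return L(x), x

def alpha_bb(lo, hi, eps=1e-4):
    lb, x = minimize_underestimator(lo, hi)
    best_ub, best_x = f(x), x            # simplified upper bounding step
    heap = [(lb, lo, hi)]                # subdomains ordered by lower bound
    while heap:
        lb, a, b = heapq.heappop(heap)   # subdomain with minimum L
        if best_ub - lb <= eps:          # bounds within tolerance: stop
            break
        mid = 0.5 * (a + b)
        for c, d in ((a, mid), (mid, b)):
            child_lb, child_x = minimize_underestimator(c, d)
            if f(child_x) < best_ub:     # update best upper bound
                best_ub, best_x = f(child_x), child_x
            if child_lb <= best_ub:      # otherwise the subdomain is fathomed
                heapq.heappush(heap, (child_lb, c, d))
    return best_ub, best_x

best_val, best_x = alpha_bb(-math.pi, math.pi)
```

Because the gap between `f` and its underestimator shrinks quadratically with the width of the subinterval, the loop terminates once the subdomain holding the global minimum is small enough for the bounds to agree within `eps`.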
Multiple objective dynamic programming
tions of oligopeptides', J. Global Optim. 11 (1997), 1-34.
[6] AUGSPURGER, J.D., AND SCHERAGA, H.A.: 'An efficient, differentiable hydration potential for peptides and proteins', J. Comput. Chem. 17 (1996), 1549-1558.
[7] BROOKS, B.R., BRUCCOLERI, R.E., OLAFSON, B.D., STATES, D.J., SWAMINATHAN, S., AND KARPLUS, M.: 'CHARMM: A program for macromolecular energy minimization and dynamics calculations', J. Comput. Chem. 4 (1983), 187-217.
[8] CONNOLLY, M.L.: 'Analytical molecular surface calculation', J. Appl. Crystallogr. 16 (1983), 548-558.
[9] EISENHABER, F., AND ARGOS, P.: 'Improved strategy in analytic surface calculation for molecular systems: Handling of singularities and computational efficiency', J. Comput. Chem. 14 (1993), 1272-1280.
[10] FLOUDAS, C.A., KLEPEIS, J.L., AND PARDALOS, P.M.: 'Global optimization approaches in protein folding and peptide docking': DIMACS, Vol. 47, Amer. Math. Soc., 1998, pp. 141-171.
[11] GUNSTEREN, W.F. VAN, AND BERENDSEN, H.J.C.: 'GROMOS', Groningen Mol. Sim. (1987).
[12] KANG, Y.K., NÉMETHY, G., AND SCHERAGA, H.A.: 'Free energies of hydration of solute molecules 1. Improvement of hydration shell model by exact computations of overlapping volumes', J. Phys. Chem. 91 (1987), 4105-4109.
[13] KLEPEIS, J.L., ANDROULAKIS, I.P., IERAPETRITOU, M.G., AND FLOUDAS, C.A.: 'Predicting solvated peptide conformations via global minimization of energetic atom-to-atom interactions', Computers Chem. Engin. 22 (1998), 765-788.
[14] KOLLMAN, P.A.: 'Free energy calculations: Applications to chemical and biochemical phenomena', Chem. Rev. 93 (1993), 2395-2417.
[15] MARANAS, C.D., ANDROULAKIS, I.P., AND FLOUDAS, C.A.: 'A deterministic global optimization approach for the protein folding problem': DIMACS, Vol. 23, Amer. Math. Soc., 1996, pp. 133-150.
[16] MARANAS, C.D., AND FLOUDAS, C.A.: 'A global optimization approach for Lennard-Jones microclusters', J. Chem. Phys. 97 (1992), 7667-7677.
[17] MARANAS, C.D., AND FLOUDAS, C.A.: 'A deterministic global optimization approach for molecular structure determination', J. Chem. Phys. 100 (1994), 1247-1261.
[18] MARANAS, C.D., AND FLOUDAS, C.A.: 'Global minimum potential energy conformations of small molecules', J. Global Optim. 4 (1994), 135-170.
[19] NÉMETHY, G., GIBSON, K.D., PALMER, K.A., YOON, C.N., PATERLINI, G., ZAGARI, A., RUMSEY, S., AND SCHERAGA, H.A.: 'Energy parameters in polypeptides 10. Improved geometrical parameters and nonbonded interactions for use in the ECEPP/3 algorithm with application to proline containing peptides', J. Phys. Chem. 96 (1992), 6472-6484.
[20] NEUMAIER, A.: 'Molecular modeling of proteins and mathematical prediction of protein structure', SIAM Rev. 39 (1997), 407-460.
[21] OOBATAKE, M., NÉMETHY, G., AND SCHERAGA, H.A.: 'Accessible surface areas as a measure of the thermodynamic parameters of hydration of peptides', Proc. Nat. Acad. Sci. USA 84 (1987), 3086-3090.
[22] PERROT, G., CHENG, B., GIBSON, K.D., VILA, J., PALMER, K.A., NAYEEM, A., MAIGRET, B., AND SCHERAGA, H.A.: 'MSEED: A program for the rapid analytical determination of accessible surface areas and their derivatives', J. Comput. Chem. 13 (1992), 1-11.
[23] VILA, J., WILLIAMS, R.L., VÁSQUEZ, M., AND SCHERAGA, H.A.: 'Empirical solvation models can be used to differentiate native from near-native conformations of bovine pancreatic trypsin inhibitor', PROTEINS: Struct. Funct. Genet. 10 (1991), 199-218.
[24] WEINER, S.J., KOLLMAN, P.A., CASE, D.A., SINGH, U.C., GHIO, C., ALAGONA, G., PROFETA, S., AND WEINER, P.: 'A new force field for molecular mechanical simulation of nucleic acids and proteins', J. Amer. Chem. Soc. 106 (1984), 765-784.
[25] WESSON, L., AND EISENBERG, D.: 'Atomic solvation parameters applied to molecular dynamics of proteins in solution', Protein Sci. 1 (1992), 227-235.

John L. Klepeis
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: john@titan.princeton.edu
Christodoulos A. Floudas
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: floudas@titan.princeton.edu
MSC2000: 92C40, 65K10
Key words and phrases: protein folding, multiple minima, global optimization, αBB.

MULTIPLE OBJECTIVE DYNAMIC PROGRAMMING

Dynamic programming has been an area of active research since its introduction by R. Bellman [1]. More recently, with the recognition that many applied optimization problems require more than one objective, the study of multicriteria optimization has become a growing area of research. Included in this area of multicriteria optimization is the study of multiple objective dynamic programming (MODP). MODP was first used to replace multiple objective linear programming (MOLP) where it was not applicable, such as in problems with discrete variables. Many of the techniques used are extensions of classical dynamic programming. The
following is a discussion of some of the research that has been developed in the area of MODP.

Using multiple objective dynamic programming to find the 'shortest' path through a network with constant costs is one of the more straightforward uses of MODP. Work has been done on both forward and backward MODP in this area. First, we consider a general network containing a set of nodes $N = \{1, \dots, n\}$ and a set of arcs $A = \{(i_0, i_1), (i_2, i_3), (i_4, i_5), \dots\} \subset N \times N$ which indicates connections between nodes. Each arc $(i, j)$ has an associated cost vector, $c_{ij} = (c_{ij1}, \dots, c_{ijm}) \in R^m$. A path from node $i_0$ to $i_p$ is the sequence of arcs $P = \{(i_0, i_1), \dots, (i_{p-1}, i_p)\}$ where the first node of each arc is the same as the terminal node of the preceding arc and each node in the path is unique. Let $H_i$ be the set of all paths from node 1 to node $i$. The cost to traverse a path $p$ in $H_i$ is $[c(p)] = \sum_{(i,j) \in p} [c_{ij}]$. A path in

... from its destination, $t$. Then the algorithm is given above.

The resulting $S^k$ vectors give nondominated solutions for the network, but maybe not all of them. They solve an example in which the weights are not specified.

A few years later, R. Hartley [6] proposed a similar algorithm that also uses backward MODP to find all Pareto paths from all nodes in the network to a specified node. The algorithm is as follows:

Let $V_0(i) = \{(\infty, \dots, \infty)\}$ and, for $k = 0, 1, \dots$, let $V_k(t) = \{(0, \dots, 0)\}$.

\[ V_k(i) = \mathrm{eff}\left[ \bigcup \{ c_{ij} + V_{k-1}(j) : j \in \Gamma(i) \} \right] \]

for $i \in N$ ($i \ne t$) and $k = 1, 2, \dots$, where $\Gamma(i)$ is the set of nodes $j$ such that $(i, j) \in A$. The 'eff' operator finds all nondominated vectors in the set. The associated paths must be handled separately.

H.W. Corley and I.D. Moon [4] used forward
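The backward recursion attributed to Hartley above can be sketched for a small network with two cost components. This is an illustrative sketch only: the network, the function names, and the stopping rule (iterate until the sets no longer change) are assumptions, and costs are assumed nonnegative so that the iteration stabilizes.

```python
INF = float("inf")

def dominates(a, b):
    """a dominates b (minimization): a <= b componentwise and a != b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def eff(vectors):
    """Return the nondominated vectors of a set."""
    vs = set(vectors)
    return {v for v in vs if not any(dominates(w, v) for w in vs)}

def backward_modp(nodes, arcs, t, m):
    """V_k(i) = eff[ union of c_ij + V_{k-1}(j) over arcs (i, j) ]."""
    zero, inf_vec = (0,) * m, (INF,) * m
    V = {i: {inf_vec} for i in nodes}
    V[t] = {zero}
    while True:
        new = {t: {zero}}
        for i in nodes:
            if i == t:
                continue
            cand = set()
            for (a, j), c in arcs.items():
                if a == i:
                    for v in V[j]:
                        cand.add(tuple(x + y for x, y in zip(c, v)))
            new[i] = eff(cand) if cand else {inf_vec}
        if new == V:          # sets have stabilized
            return V
        V = new

# Hypothetical 4-node network with destination t = 4.
arcs = {(1, 2): (1, 4), (1, 3): (3, 1), (1, 4): (5, 5),
        (2, 3): (1, 1), (2, 4): (1, 4), (3, 4): (3, 1)}
V = backward_modp([1, 2, 3, 4], arcs, t=4, m=2)
```

In this example the path 1-2-3-4 has cost (5, 6), which is dominated by the direct arc with cost (5, 5), so only three cost vectors remain at node 1. As the text notes, the recursion returns cost vectors only; the associated paths would have to be tracked separately.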
... complicated than MODP with constant costs. The monotonicity assumptions necessary for the principle of optimality in dynamic programming can easily be broken when dealing with time-dependent costs. Reaching a node later may be less costly than reaching it earlier. M.M. Kostreva and M.M. Wiecek [7] extended the work done by K.L. Cooke and E. Halsey [3] on dynamic programming with one time-dependent cost (travel time) to dynamic programming with multiple time-dependent costs. This method uses backward dynamic programming on a discrete time grid to find all nondominated paths from every node in the network to the destination node.

Find a time grid of discrete values $S_T = \{t_0, \dots, t_0 + T\}$, $t_0 > 0$, and compute $[c_{ij}(t)]$ for all $t \in S_T$ and all $(i, j) \in A$.

Modify $[c_{ij}(t)]$ for all $t \in S_T$ and all $(i, j) \in A$ as follows:

\[ [c_{ij}(t)]' = \begin{cases} [c_{ij}(t)] & \text{if } t + [c_{ij}(t)]_1 \le t_0 + T, \\ \infty & \text{if } t + [c_{ij}(t)]_1 > t_0 + T. \end{cases} \]

Find the initial array $[\{[F_i(t)^{(0)}]\}]$, $i = 1, \dots, N$, for all $t \in S_T$, where $\{[F_{N_d}(t)^{(0)}]\} = \{0\}$, and $\{[F_i(t)^{(0)}]\} = [c_{iN_d}(t)]'$ for $i \in N \setminus \{N_d\}$.

Find the arrays $[\{[F_i(t)^{(k)}]\}]$, $i = 1, \dots, N$, for all $t \in S_T$, for $k = 1, 2, \dots$ as follows:

\[ \{[F_i(t)^{(k)}]\} = \mathrm{VMIN}_{j \ne i} \left\{ [c_{ij}(t)]' + \{[F_j(t + [c_{ij}(t)]_1)^{(k-1)}]\} \right\}, \quad i \in N \setminus \{N_d\}, \]
\[ \{[F_{N_d}(t)^{(k)}]\} = \{0\}. \]

The sequence of sets $\{[F_i(t_0)^{(k)}]\}$, $k = 1, 2, \dots$, converges to the set $\{[F_i(t_0)]\}$, the set of nondominated vectors associated with the paths that leave node $i$ at time $t_0$ and reach node $N_d$.
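A minimal sketch of the backward time-dependent recursion described above, on an integer time grid. Everything concrete here is an assumption made for illustration: the three-node network, the cost function `cost`, the use of integer travel times so that arrival times stay on the grid, and the stopping rule of iterating until the arrays no longer change. The first cost component is taken to be the travel time, as in the text.

```python
INF = float("inf")

def vmin(vectors):
    """Nondominated (vector-minimal) elements of a set of cost vectors."""
    vs = set(vectors)
    def dominated(v):
        return any(w != v and all(x <= y for x, y in zip(w, v)) for w in vs)
    return {v for v in vs if not dominated(v)}

def backward_time_dependent(nodes, dest, cost, t0, T, m):
    grid = range(t0, t0 + T + 1)
    zero, inf_vec = (0,) * m, (INF,) * m

    def c_mod(i, j, t):
        """Modified cost: infinite if there is no arc or arrival is off the grid."""
        c = cost(i, j, t)
        if c is None or t + c[0] > t0 + T:
            return inf_vec
        return c

    F = {(i, t): ({zero} if i == dest else {c_mod(i, dest, t)})
         for i in nodes for t in grid}
    while True:
        G = {}
        for i in nodes:
            for t in grid:
                if i == dest:
                    G[(i, t)] = {zero}
                    continue
                cand = set()
                for j in nodes:
                    c = c_mod(i, j, t) if j != i else inf_vec
                    if c == inf_vec:
                        continue
                    for v in F[(j, t + c[0])]:   # continue from arrival time
                        cand.add(tuple(a + b for a, b in zip(c, v)))
                G[(i, t)] = vmin(cand) if cand else {inf_vec}
        if G == F:
            return F
        F = G

def cost(i, j, t):
    """Hypothetical costs (travel time, second criterion); None if no arc."""
    if (i, j) == (1, 2):
        return (1, 1)
    if (i, j) == (1, 3):
        return (4, 1)
    if (i, j) == (2, 3):
        return (1, 5) if t < 2 else (1, 1)   # leaving later is cheaper
    return None

F = backward_time_dependent([1, 2, 3], dest=3, cost=cost, t0=0, T=6, m=2)
```

In this example, leaving node 1 at time 0 gives the nondominated costs (2, 6) and (4, 1), while leaving at time 1 gives (2, 2) and (4, 1), which illustrates how a later departure can be less costly.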
Table 1.
a) $t_1 + [c_{ij}(t_1)]_1 \le t_2 + [c_{ij}(t_2)]_1$, and
b) $[c_{ij}(t_1)]_r \le [c_{ij}(t_2)]_r$ for all $r \in \{2, \dots, m\}$.

Assuming the cost functions are monotone increasing with respect to time satisfies this assumption.

Find the initial vector $\{[G_j^{(0)}]\}$, $j = 1, \dots, N$, where $\{[G_1^{(0)}]\} = \{0\}$ and $\{[G_j^{(0)}]\} = [c_{1j}(0)]$, $j = 2, \dots, N$.

Calculate the vectors $\{[G_j^{(k)}]\}$, $j = 1, \dots, N$, for $k = 1, 2, \dots$, as follows:

\[ \{[G_j^{l}(t_l)^{(k)}],\ l = 1, \dots, N_j\} = \mathrm{VMIN}\{[G_i^{n}(t_n)^{(k-1)}] + [c_{ij}(t_n)],\ n = 1, \dots, N_i\}, \quad j = 2, \dots, N, \]
\[ \{[G_1^{l}(t_l)^{(k)}],\ l = 1\} = \{0\}. \]

$\{[G_j^{(k)}]\}$, $k = 1, 2, \dots$, converges to $\{[G_j]\}$, the set of vector costs of all nondominated paths which leave the origin node at time $t = 0$ and lead to node $j$.

Assume that node 1 is the origin node. For nodes $j = 2, \dots, N$, let $[G_j^{u}(t_u)^{(k)}]$ be the vector cost of the nondominated path $u$ which is of at most $k$ links leaving the origin node at time $t = 0$ and leading to node $j$, where $t_u$ is the arrival time of this path at node $j$. Also, let $\{[G_j^{(k)}]\}$ be the set of vector costs of all nondominated paths which are of at most $k$ links leaving the origin node at time $t = 0$ and leading to node $j$, where $N_j$ is the number of nondominated paths. Let $\{[G_j]\}$ be the set of vector costs of all nondominated paths which leave the origin node at time $t = 0$ and lead to node $j$. The algorithm is as listed above.

Another way to get around the monotonicity assumption of dynamic programming is to use generalized dynamic programming techniques. See [2] for a way to use generalized DP with a multicriteria preference function. Basically, generalized DP uses a weaker principle of optimality than Bellman's famous version [1]. Generalized DP finds partial solutions that may lead to optimal solutions even though locally they are not optimal solutions according to the preference function.

In [2] generalized DP is applied to the multicriteria best path problem. Assuming node 1 to be the origin and node $N$ to be the destination, let $H$ be the set of all paths in the network. Let

\[ P(j) = \{p \in H : i_1 = 1,\ i_n = j\} \]

be the set of all paths from the origin to node $j$. Let

\[ X(j) = \{p \in H : i_1 = j,\ i_n = N\} \]

be the set of all paths from node $j$ to the destination node. The vector cost along each arc is called an arc length vector, $l_{ij} = (l_{ij}^1, \dots, l_{ij}^m) \in R^m$. A path length function $z : H \to R^m$ assigns a path length vector to every path $p \in H$ where $\circ$ is a binary operator on $R^m$:

\[ z(p) = l_{i_1, i_2} \circ \dots \circ l_{i_{n-1}, i_n}. \]

Thus, each different objective can have a different binary operation. For example, distance would have an additive binary operator and probabilities
would have a multiplicative binary operator. Let

\[ Z(j) = \{z(p) : p \in P(j)\} \]

be the set of all length vectors of all paths from the origin to node $j$. A multicriteria preference function $u : R^m \to R$ is defined on the set of path length vectors. The objective is to maximize this preference function. The monotonicity assumption says that for all $z, z' \in Z(j)$, $u(z) \le u(z') \Rightarrow u(z \circ l_{jk}) \le u(z' \circ l_{jk})$ for all $j, k \in S$ such that $(j, k) \in T$. Unfortunately, with multi-objective problems this assumption is easily violated. Generalized DP tries to get around this monotonicity assumption by having local preference relations defined as $\rho_j \subset Z(j) \times Z(j)$: for $z, z' \in Z(j)$, $z \rho_j z'$ implies that any subpath from the origin to node $j$ whose length is $z$ cannot be used in a path to produce a better overall path from the origin to the destination node than using the subpath from the origin to node $j$ whose length is $z'$. So, subpath length vector $z'$ is more locally preferred even though subpath length vector $z$ may be globally preferred, $u(z') \le u(z)$. So, for $z, z' \in Z(j)$, $z \rho_j z'$ if and only if $\exists p' \in X(j)$ such that $u(z \circ z(p)) \le u(z' \circ z(p'))$ for all $p \in X(j)$.

These local preference relations are used to form the weak principle of optimality. An optimal path must be composed of subpaths that can be part of an optimal path.

Unfortunately, in order to get these preference relations one would have to complete all paths from every node in the network. Since this is too computationally intense, the preference relations are relaxed to the refining local preference relations $\prec_j$ where $z \prec_j z'$ implies $z \rho_j z'$. Using $\prec_j$ avoids having to find the entire relation $\rho_j$. Using this relation means that a larger set of maximal path length vectors will be kept by using $\prec_j$ than if $\rho_j$ were used. A maximal path length vector is a vector where there does not exist another vector at that state that is strictly more preferred. Let

\[ \mathrm{maxl}(X, \rho) = \{x \in X : \nexists\, x' \in X \text{ with } x \rho x' \text{ and not } x' \rho x\}. \]

The following are the equations of generalized DP:

\[ f(1) = \{z_1\}, \]
\[ f(j) = \mathrm{maxl}\left( \bigcup_{(i,j) \in A} \{f(i) \circ l_{ij}\},\ \prec_j \right) \quad \text{for } j = 2, \dots, N, \]

where $\{f(i) \circ l_{ij}\} = \{z \circ l_{ij} : z \in f(i)\}$.
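The generalized DP equations above can be sketched on a small acyclic network with two criteria: a distance, combined additively, and a reliability, combined multiplicatively. All concrete choices are assumptions for illustration: the network data, the preference function `u`, the identity vector `z1 = (0, 1.0)`, and the use of simple dominance (no longer and no less reliable) as the refining relation $\prec_j$. Nodes are assumed to be numbered so that every arc goes from a lower to a higher index.

```python
def compose(z, l):
    """Binary operator: distances add, reliabilities multiply."""
    return (z[0] + l[0], z[1] * l[1])

def refines(z, z_prime):
    """z is refined by z_prime: z_prime is no longer, no less reliable, and differs."""
    return z_prime[0] <= z[0] and z_prime[1] >= z[1] and z_prime != z

def maxl(vectors, relation):
    """Vectors x with no x' such that x rel x' holds and x' rel x does not."""
    vs = set(vectors)
    return {x for x in vs
            if not any(relation(x, y) and not relation(y, x) for y in vs)}

def generalized_dp(n, arcs, z1):
    """f(1) = {z1}; f(j) = maxl(union of f(i) o l_ij over arcs (i, j))."""
    f = {1: {z1}}
    for j in range(2, n + 1):
        cand = {compose(z, l)
                for (i, jj), l in arcs.items() if jj == j
                for z in f[i]}
        f[j] = maxl(cand, refines)
    return f

def u(z):
    """Hypothetical preference function, to be maximized."""
    return 10.0 * z[1] - z[0]

# Hypothetical network: arc -> (distance, reliability).
arcs = {(1, 2): (2, 0.9), (1, 3): (1, 0.5), (2, 3): (1, 1.0),
        (2, 4): (2, 0.9), (3, 4): (1, 0.8)}
f = generalized_dp(4, arcs, z1=(0, 1.0))
best = max(f[4], key=u)
```

Here two path length vectors survive at the destination, and the preference function is applied only at the end to choose between them. With this `u`, which is nonincreasing in distance and nondecreasing in reliability, dominance is a valid refining relation because a dominated subpath can never be completed into a more preferred path.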
When the monotonicity assumptions are satisfied, the $\prec_j$ relation can be replaced with the multicriteria preference function, $u$, thus reducing to the conventional DP problem. However, when the monotonicity assumption does not hold the $\prec_j$ relation must be defined by trying to exploit any special structures of each individual problem. Also, using dynamic programming to find the entire Pareto optimal set can be seen as another special case of generalized DP where $z_k \ge z'_k$ for all $k = 1, \dots, m$ $\Rightarrow$ $z \prec_j z'$ (assuming minimization of each criterion).

The subject of multiple objective dynamic programming has developed into a viable body of knowledge capable of providing solutions to applied problems in which trade-offs among objectives are important. Among the multiple objective techniques, it is distinctive in its ability to provide the entire Pareto optimal set. To gain such an advantage, one must be willing to perform computationally intensive operations on large sets of vectors.

See also: Dynamic programming in clustering; Dynamic programming and Newton's method in unconstrained optimal control; Dynamic programming: Continuous-time optimal control; Hamilton-Jacobi-Bellman equation; Dynamic programming: Infinite horizon problems, overview; Dynamic programming: Stochastic shortest path problems; Dynamic programming: Discounted problems; Dynamic programming: Average cost per stage problems; Dynamic programming: Undiscounted problems; Dynamic programming: Inventory control; Neuro-dynamic programming; Dynamic programming: Optimal control applications.

[4] CORLEY, H.W., AND MOON, I.D.: 'Shortest paths in networks with vector weights', J. Optim. Th. Appl. 46 (1985), 79-86.
[5] DAELLENBACH, H.G., AND DEKLUYVER, C.A.: 'Note on multiple objective dynamic programming', J. Oper. Res. Soc. 31 (1980), 591-594.
[6] HARTLEY, R.: 'Vector optimal routing by dynamic programming', in P. SERAFINI (ed.): Mathematics of Multiobjective Optimization, 1984, pp. 215-224.
[7] KOSTREVA, M.M., AND WIECEK, M.M.: 'Time dependency in multiple objective dynamic programming', J. Math. Anal. Appl. 173 (1993), 289-307.

Michael M. Kostreva
Dept. Math. Sci. Clemson Univ.
Clemson, SC 29634-1907, USA
E-mail address: flstgla@clemson.edu
Laura C. Lancaster
Dept. Math. Sci. Clemson Univ.
Clemson, SC 29634-1907, USA
E-mail address: llancas@math.clemson.edu
MSC2000: 90C39, 90C31
Key words and phrases: dynamic programming, multiple objective programming, efficient set.

MULTIPLE OBJECTIVE PROGRAMMING SUPPORT

This article gives a brief introduction into multiple objective programming support. We will overview basic concepts, formulations, and principles of solving multiple objective programming problems. To solve those problems requires the intervention of a decision-maker. That's why behavioral assumptions play an important role in multiple objective programming. Which assumptions are made affects which kind of support is given to a decision maker. We will demonstrate how a free search type approach can be used to solve multiple objective programming problems.
I n t r o d u c t i o n . Before we can consider the con-
cept of multiple objective programming support
References
(MOPS), we have to first explain the concept of
[1] BELLMAN, R.E.: Dynamic programming, Prince-
ton Univ. Press, 1957. multiple criteria decision making (MCDM). Even
[2] CARRAWAY,R.L., MORIN, T.L., AND MOSKOWITZ, if there is a variation of different definitions, most
H.: 'Generalized dynamic programming for multicri- researchers working in the field might accept the
teria optimization', Europ. J. Oper. Res. 44 (1990), following general definition: Multiple Criteria De-
95-104. cision Making (MCDM) refers to the solving of
[3] COOKE, K.L., AND HALSEY, E.: 'The shortest route
decision and planning problems involving multi-
through a network with time-dependent internodal
transit times', J. Math. Anal. Appl. 14 (1966), 493- ple (generally conflicting) criteria. 'Solving' means
498. that a decision-maker (DM) will choose one 'rea-
566
Multiple objective programming support
sonable' alternative from among a set of available Fig. 1 and Fig. 2 and the numerical example we
ones. It is also meaningful to define that the choice consider a multiple objective linear programming
is irrevocable. For an MCDM problem it is typi- model in which all constraints and objectives are
cal that no unique solution for the problem ex- defined using linear functions.
ists. Therefore to find a solution for MCDM prob- The article consists of seven sections. In Sec-
lems requires the intervention of a decision-maker tion 2, we give a brief introduction to some foun-
(DM). In MCDM, the word 'reasonable' is replaced dations of multiple objective programming. How
by the words 'efficient/nondominated'. They will to generate potential 'reasonable' solutions for a
be defined later on. DM's evaluation is considered in Section 3, and
Actually the above definition is a strongly sim- in Section 4, we will review general principles to
plified description of the whole (multiple criteria) solve a multiple objective programming problem.
decision making process. In practice, MCDM prob- In Section 5, a multiple criteria decision support
lems are not often so well-structured, that they can system VIG is introduced, and a numerical exam-
be considered just as a choice problem. Before a de- ple is solved in Section 6. Concluding remarks are
cision problem is ready to be 'solved', the following given in Section 7.
questions require a lot of preliminary work: How
to structure the problem? How to find essential A Multiple Objective Programming Prob-
criteria? How to handle uncertainty? These ques- lem. A multiple objective programming (MOP)
tions are by no means outside the interest area of problem in a so-called criterion space can be de-
MCDM-researchers. The outranking method by B. fined as follows:
Roy [17] and the AHP (the analytic hierarchy pro-
'max' q (1)
cess) developed by T.L. Saaty [18] are examples of
s.t. q C Q,
the MCDM-methods, in which a lot of effort is de-
voted to problem structuring. Both methods are where set Q c R k is a so-called feasible region in
well known and widely used in practice. In both a k-dimensional criterion space R k. The set Q is
methods, the presence of multiple criteria is an es- of special interest. Most considerations in multi-
sential feature, but the structuring of a problem is ple objective programming are made in a criterion
an even more important part of the solution pro- space.
cess. Set Q may be convex/nonconvex,
When the term 'support' is used in connection bounded/unbounded, precisely known or un-
with MCDM, we may adopt a broad perspective known, consist of finite or infinite number of alter-
and refer with the term to all research associated natives, etc. When Q consists of a finite number of
with the relationship between the problem and the elements which are explicitly known in the begin-
decision-maker. In this article we take a narrower ning of the solution process, we have an important
perspective and focus on a v e r y essential support- class of problems which may be called e.g. (multi-
ing problem in multiple criteria decision making: ple criteria) evaluation problems. Sometimes those
How to assist a DM to find the 'best' solution from problems are referred to as discrete multiple cri.
among a set of available 'reasonable' alternatives, teria problems or selection problems (for a survey
when the alternatives are evaluated by using sev- see for example. [16]).
eral criteria? Available alternatives are assumed When the number of alternatives in Q is infi-
to be defined explicitly or implicitly by means of a nite and not countable, the alternatives are usu-
mathematical model. The term multiple objective ally defined using a mathematical model formula-
programming is usually used to refer to dealing tion, and the problem is called continuous. In this
with this kind of model. case we say that the alternatives are only implic-
The following considerations are general in the itly known. This kind of problem is referred as a
sense that usually it is not necessary to specify multiple criteria design problem (the terms 'evalu-
how the alternatives are defined. It is enough to tion' and 'design' are adopted from A. Arbel) or a
assume that they belong to set Q. However, in continuous multiple criteria problem. In this case,
the set $Q$ is not specified directly, but by means of decision variables, as is usually done in single objective optimization problems:

  max $q = f(x) = (f_1(x), \dots, f_k(x))$          (2)
  s.t. $x \in X$,

dominated) solutions is an acceptable and 'reasonable' solution, as long as we have no additional information about the DM's preference structure.
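For a finite set $Q$ of criterion vectors, the nondominated subset can be found by pairwise comparison. The following Python sketch assumes that every criterion is maximized and uses the standard definition of dominance (a vector is dominated if some other vector is at least as good in every criterion and strictly better in at least one). The function names and the sample data are illustrative assumptions, not part of the article.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b when every criterion is maximized."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def nondominated(points: Sequence[Vector]) -> List[Vector]:
    """Keep the points that no other point in the set dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical criterion vectors (two criteria, both maximized).
Q = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (2.0, 2.0), (1.0, 1.0)]
front = nondominated(Q)
print(front)
```

The pairwise filter needs a number of comparisons that grows quadratically with the size of $Q$, which is acceptable for small evaluation problems but not for implicitly defined (continuous) sets.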
if it is impossible to say which system provides the best support for a DM for his multiple criteria problem, all proper systems have to be able to recognize, generate and operate with nondominated solutions. Generating nondominated solutions for the DM's evaluation is thus one key issue in multiple objective programming. In the next section, we will consider some principles.

Generating Nondominated Solutions. Despite many variations among different methods of generating nondominated solutions, the ultimate principle is the same in all methods: a single objective optimization problem is solved to generate a new solution or solutions. The objective function of this single objective problem may be called a scalarizing function, according to [25]. It typically has the original objectives and a set of parameters as its arguments. The form of the scalarizing function, as well as what parameters are used, depends on the assumptions made concerning the DM's preference structure and behavior.
Two classes of parameters are widely used in multiple objective optimization:
1) weighting coefficients for objective functions; and
2) reference/aspiration/reservation levels for objective function values.
Based on those parameters, there exist several ways to specify a scalarizing function. An important requirement is that this function completely characterizes the set of nondominated solutions:

  for each parameter value, all solution vectors are nondominated, and for each nondominated criterion vector, there is at least one parameter value which produces that specific criterion vector as a solution

(see, for theoretical considerations, e.g. [26]).
A Linear Scalarizing Function. A classic method to generate nondominated solutions is to use the weighted sums of objective functions, i.e. to use the following linear scalarizing function:

  $\max \{ \lambda' f(x) : x \in X \}$.               (4)

If $\lambda > 0$, then the solution vector $x$ of (4) is efficient, but if we allow that $\lambda \ge 0$, then the solution vector is weakly efficient (see, e.g. [21, p. 215; 221]). Using the parameter set $\Lambda = \{\lambda : \lambda > 0\}$ in the weighted-sums linear program we can completely characterize the efficient set provided the constraint set is convex. However, $\Lambda$ is an open set, which causes difficulties in a mathematical optimization problem. If we use $\mathrm{cl}(\Lambda) = \{\lambda : \lambda \ge 0\}$ instead, the efficiency of $x$ cannot be guaranteed anymore. It is surely weakly efficient, but not necessarily efficient. When the weighted sums are used to specify a scalarizing function in multiple objective linear program (MOLP) problems, the optimal solution corresponding to nonextreme points of $X$ is never unique. The set of optimal solutions always consists of at least one extreme point, or the solution is unbounded. In early methods, a common feature was to operate with weight vectors $\lambda \in \mathbb{R}^k$, limiting considerations to efficient extreme points (see, e.g., [29]).
A Chebyshev-Type Scalarizing Function. Currently, most solution methods are based on the use of a so-called Chebyshev-type scalarizing function first proposed by A. Wierzbicki [25]. We will refer to this function by the term achievement (scalarizing) function. The achievement (scalarizing) function projects any given (feasible or infeasible) point $g \in \mathbb{R}^k$ onto the set of nondominated solutions. Point $g$ is called a reference point, and its components represent the desired values of the objective functions. These values are called aspiration levels.
The simplest form of achievement function is:

  $s(g, q, w) = \max_{k \in K} \frac{g_k - q_k}{w_k}$,          (5)

where $w > 0 \in \mathbb{R}^k$ is a (given) vector of weights, $g \in \mathbb{R}^k$, and $q \in Q = \{f(x) : x \in X\}$. By minimizing $s(g, q, w)$ subject to $q \in Q$, we find a weakly nondominated solution vector $q^*$ (see, e.g. [25], [26]). However, if the solution is unique for the problem, then $q^*$ is nondominated. If $g \in \mathbb{R}^k$ is feasible, then $q^* \in Q$, $q^* \ge g$. To guarantee that only nondominated (instead of weakly nondominated) solutions will be generated, more complicated forms for the achievement function have to be used, for example:
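The weighted-sum problem (4) and the simplest achievement function (5) can be compared on a small finite set $Q$ by plain enumeration. The Python sketch below is only an illustration under stated assumptions: the three criterion vectors, the weights and the reference point are invented, all criteria are maximized, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def weighted_sum_choice(Q: Sequence[Vector], lam: Sequence[float]) -> Vector:
    """Maximize the weighted sum lam'q over a finite set Q, as in (4)."""
    return max(Q, key=lambda q: sum(l * v for l, v in zip(lam, q)))


def achievement(g: Sequence[float], q: Sequence[float], w: Sequence[float]) -> float:
    """Simplest achievement function (5): max over k of (g_k - q_k) / w_k."""
    return max((gk - qk) / wk for gk, qk, wk in zip(g, q, w))


def project(Q: Sequence[Vector], g: Sequence[float], w: Sequence[float]) -> Vector:
    """Minimize the achievement function over Q for reference point g."""
    return min(Q, key=lambda q: achievement(g, q, w))


# Hypothetical nonconvex example: (2.5, 2.5) is nondominated, but it lies
# below the line joining the other two points, so no positive weight
# vector makes it the weighted-sum maximizer.
Q = [(1.0, 5.0), (2.5, 2.5), (5.0, 1.0)]

by_weights = weighted_sum_choice(Q, (0.4, 0.6))
by_reference = project(Q, g=(3.0, 3.0), w=(1.0, 1.0))
print(by_weights, by_reference)
```

In this example the weighted sum selects an extreme alternative, whereas the reference point (3, 3) is projected onto the middle alternative, which is consistent with the remark that weighted sums characterize the efficient set only under convexity.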
itly. Make assumptions of the general functional form of the value function.
3) Do not assume the existence of a stable value function $v$, either explicitly or implicitly.

The first assumption is adopted in multi-attribute utility or decision analysis (see, e.g. [7]). Interactive software implementing such approaches on personal computers exists.
The second assumption was a basic paradigm used in interactive multiple criteria approaches in the 1970s. A classical example is the GDF-method [3]. The DM's responses to specific questions were used to guide the solution process towards an 'optimal' or 'most preferred' solution (in theory), assuming that the DM behaves according to some specific (but unknown) underlying value function (see for surveys, e.g. [5], [20], [21], and [24]). Interactive software that implements such systems has often been developed by the authors of the above procedures for experimental purposes.
The approaches based on the assumption of 'no stable value/utility function' typically operate with a DM's aspiration levels regarding the objectives on the feasible region. The aspiration levels are projected via minimizing so-called achievement scalarizing functions (6) ([23], [25]). No specific behavioral assumptions, e.g. transitivity, are necessary.
In essence, this approach seeks to help the DM to search the set of efficient solutions more or less freely. Interactive software that implements such systems has been developed, like ADBASE [22], DIDAS [14], VIG [8], and VIMDA [9]. For an excellent review of several interactive multiple criteria procedures, see [21]. Other well-known books that provide a deeper background and additional references, especially in the field of multiple objective optimization, include [1], [4], [5], [6], [19], [27] and [28].
Multiple objective linear programming (MOLP) is the most commonly studied problem in multiple criteria decision making (MCDM). Most solution methods are developed for this problem.

Example of a Decision Support System: VIG. Today, many systems use aspiration level projections, where the projection is performed using Chebyshev-type achievement scalarizing functions as explained above. These functions can be controlled either by varying weights (keeping aspiration levels fixed) or by varying the aspiration levels (keeping weights fixed). Instead of aspiration levels, some algorithms ask the DM to specify the reservation levels for the criteria (see, e.g. [15]).
An achievement scalarizing function projects one aspiration (reservation) level point at a time onto the nondominated frontier. By parametrizing the function, it is possible to project the whole vector onto the nondominated frontier as originally proposed by [11]. The vector to be projected is called a reference direction vector and the method, correspondingly, the reference direction method. When a direction is projected onto the nondominated frontier, a curve traversing across the nondominated frontier is obtained. Then an interactive line search is performed along this curve. The idea enables the DM to make a continuous search on the nondominated frontier. The corresponding mathematical model is a simple modification of the original model (8) developed for projecting one point:

  min  $\varepsilon + \rho \sum_{i=1}^{k} (g_i - q_i)$
  s.t. $x \in X$,                                   (9)
       $q + \varepsilon w - z = g + t r$,
       $z \ge 0$,

where $t : 0 \to \infty$ and $r \in \mathbb{R}^k$ is a reference direction. In the original approach, a reference direction was specified as a vector starting from the current solution and passing through the aspiration levels. The DM was asked to give aspiration levels for criteria.
The original reference direction approach has been further developed in many directions. First, [12] improved upon the original procedure by making the specification of a reference direction dynamic. The dynamic version was called Pareto race. In Pareto race, the DM can freely move in any direction on the nondominated frontier he/she likes, and no restrictive assumptions concerning the DM's behavior are made. Furthermore, the objectives and constraints are presented in a uniform manner. Thus, their role can also be changed during the search process. The method and its implementation are called Pareto race. The whole software package consisting of Pareto race is called VIG.
In Pareto race, a reference direction $r$ is determined by the system on the basis of preference information received from the DM. By pressing num-
he/she would like to improve and how strongly. In this way he/she implicitly specifies a reference di-
An example of the Pareto race screen is given in Fig. 3. The screen is associated with the numerical example described in the next section.
[Fig. 3: Pareto race screen]
  length of the regular working day is 10 hours. People are willing to work overtime, which is costly, and they are tired the next day. Therefore, if possible, I would like to avoid it. Finally, product 3 is very important to a major customer, and I cannot totally exclude it from the production plan.

The traditional single objective programming considers the problem as a profit maximization problem. The other 'requirements' are taken as
depends entirely on his/her own preferences. Actually, all sample solutions except solution II are somehow consistent with his/her statement above. In solution II, product 3 is excluded from the production plan.

                   I       II      III     IV
  Objectives:
  crit. mat. 1     91.46   94.50   93.79   90.00
  crit. mat. 2     85.44   88.00   89.15   84.62
  profit           30.27   31.00   30.42   29.82
  product 3         0.23    0.00    0.50    0.44
mization; Bi-objective assignment problem; Estimating data for multicriteria decision making problems: Optimization techniques; Multicriteria sorting methods; Financial applications of multicriteria analysis; Portfolio selection and multicriteria analysis; Decision support systems with multiple criteria.

References
[1] COHON, J.: Multiobjective programming and planning, Acad. Press, 1978.
[2] DYER, J., FISHBURN, P., WALLENIUS, J., AND ZIONTS, S.: 'Multiple criteria decision making, multiattribute utility theory - The next ten years', Managem. Sci. 38 (1992), 645-654.
[3] GEOFFRION, A., DYER, J., AND FEINBERG, A.: 'An interactive approach for multi-criterion optimization, with an application to the operation of an academic department', Managem. Sci. 19 (1972), 357-368.
[4] HAIMES, Y., TARVAINEN, K., SHIMA, T., AND THADATHIL, J.: Hierarchical multiobjective analysis of large-scale systems, Hemisphere, 1990.
[5] HWANG, C., AND MASUD, A.: Multiple objective decision making - Methods and applications: A state-of-the-art survey, Springer, 1979.
[6] IGNIZIO, J.: Goal programming and extensions, D.C. Heath, 1976.
[7] KEENEY, R.L., AND RAIFFA, H.: Decisions with multiple objectives: Preferences and value tradeoffs, Wiley, 1976.
[8] KORHONEN, P.: 'VIG - A visual interactive support system for multiple criteria decision making', Belgian J. Oper. Res., Statist. and Computer Sci. 27 (1987), 3-15.
[9] KORHONEN, P.: 'A visual reference direction approach to solving discrete multiple criteria problems', Europ. J. Oper. Res. 34 (1988), 152-159.
[10] KORHONEN, P., AND HALME, M.: 'Using lexicographic parametric programming for searching a nondominated set in multiple objective linear programming', J. Multi-Criteria Decision Anal. 5 (1996), 291-300.
[11] KORHONEN, P., AND LAAKSO, J.: 'A visual interactive method for solving the multiple criteria problem', Europ. J. Oper. Res. 24 (1986), 277-287.
[12] KORHONEN, P., AND WALLENIUS, J.: 'A Pareto race', Naval Res. Logist. 35 (1988), 615-623.
[13] KORHONEN, P., WALLENIUS, J., AND ZIONTS, S.: 'A computer graphics-based decision support system for multiple objective linear programming', Europ. J. Oper. Res. 60 (1992), 280-286.
[14] LEWANDOWSKI, A., KREGLEWSKI, T., ROGOWSKI, W., AND WIERZBICKI, A.: 'Decision support systems of DIDAS family (Dynamic Interactive Decision Analysis and Support)', in A. LEWANDOWSKI AND A. WIERZBICKI (eds.): Aspiration Based Decision Support Systems, Springer, 1989, pp. 21-47.
[15] MICHALOWSKI, W., AND SZAPIRO, T.: 'A bi-reference procedure for interactive multiple criteria programming', Oper. Res. 40 (1992), 247-258.
[16] OLSON, D.: Decision aids for selection problems, Ser. Oper. Res. Springer, 1996.
[17] ROY, B.: 'How outranking relation helps multiple criteria decision making', in J. COCHRANE AND M. ZELENY (eds.): Multiple Criteria Decision Making, Univ. South Carolina Press, 1973, pp. 179-201.
[18] SAATY, T.: The analytic hierarchy process, McGraw-Hill, 1980.
[19] SAWARAGI, Y., NAKAYAMA, H., AND TANINO, T.: Theory of multiobjective optimization, Acad. Press, 1985.
[20] SHIN, W., AND RAVINDRAN, A.: 'Interactive multiple objective optimization: Survey I - Continuous case', Comput. Oper. Res. 18 (1991), 97-114.
[21] STEUER, R.E.: Multiple criteria optimization: Theory, computation, and application, Wiley, 1986.
[22] STEUER, R.: Manual for the ADBASE multiple objective linear programming package, Dept. Management Sci., Univ. Georgia, 1992.
[23] STEUER, R., AND CHOO, E.-U.: 'An interactive weighted Tchebycheff procedure for multiple objective programming', Math. Program. 26 (1983), 326-344.
[24] WHITE, D.: 'A bibliography on the applications of mathematical programming multiple-objective methods', J. Oper. Res. Soc. 41 (1990), 669-691.
[25] WIERZBICKI, A.: 'The use of reference objectives in multiobjective optimization', in G. FANDEL AND T. GAL (eds.): Multiple Objective Decision Making, Theory and Application, Springer, 1980.
[26] WIERZBICKI, A.: 'On the completeness and constructiveness of parametric characterizations to vector optimization problems', OR Spektrum 8 (1986), 73-87.
[27] YU, P.L.: Multiple criteria decision making: Concepts, techniques, and extensions, Plenum, 1985.
[28] ZELENY, M.: Multiple criteria decision making, McGraw-Hill, 1982.
[29] ZIONTS, S., AND WALLENIUS, J.: 'An interactive programming method for solving the multiple criteria problem', Managem. Sci. 22 (1976), 652-663.

Pekka Korhonen
Internat. Inst. Applied Systems Analysis
A-2361 Laxenburg, Austria
Helsinki School Economics and Business Adm.
Runeberginkatu 14-16
00100 Helsinki, Finland
E-mail address: korhonen@iiasa.ac.at

MSC2000: 90C29
Key words and phrases: multiple criteria decision making, multiple objective programming, multiple objective programming support, scalarizing function, value function.
Multiplicative programming
(5) can be solved far more efficiently than the usual concave minimization problem of the same size.
In addition to (1) and (4), there are a number of studies on problems with generalized convex multiplicative functions of the forms $f(x) = \prod_{i=1}^{p} f_i(x) + g(x)$ and $f(x) = \sum_{i=1}^{p} f_{2i-1}(x) f_{2i}(x) + g(x)$, where the $f_i$s and $g$ are convex functions. These are all nonconvex minimization problems, each of which has an enormous number of local minima. Nevertheless, algorithms developed in the 1990s can locate a globally optimal solution in a reasonable amount of time, by exploiting special structures of $f$ such as low-rank nonconvexity. A comprehensive review of the algorithms is given by H. Konno and T. Kuno in [5].
See also: Global optimization in multiplicative programming; Linear programming; Multiparametric linear programming; Parametric linear programming: Cost simplex algorithm.

References
[1] AVRIEL, M., DIEWERT, W.E., SCHAIBLE, S., AND ZANG, I.: Generalized concavity, Plenum, 1988.
[2] EVANS, D.H.: 'Modular design: A special case in nonlinear programming', Oper. Res. 11 (1963), 637-647.
[3] GEOFFRION, M.: 'Solving bicriterion mathematical programs', Oper. Res. 15 (1967), 39-54.
[4] HENDERSON, J.M., AND QUANDT, R.E.: Microeconomic theory, McGraw-Hill, 1971.
[5] HORST, R., AND PARDALOS, P.M.: Handbook of global optimization, Kluwer Acad. Publ., 1995.
[6] HORST, R., AND TUY, H.: Global optimization: deterministic approaches, second ed., Springer, 1993.
[7] KONNO, H., AND INORI, M.: 'Bond portfolio optimization by bilinear fractional programming', J. Oper. Res. Japan 32 (1989), 143-158.
[8] KONNO, H., AND KUNO, T.: 'Linear multiplicative programming', Math. Program. 56 (1992), 51-64.
[9] KONNO, H., THACH, P.T., AND TUY, H.: Optimization on low rank nonconvex structures, Kluwer Acad. Publ., 1997.
[10] MALING, K., MUELLER, S.H., AND HELLER, W.R.: 'On finding most optimal rectangular package plans': Proc. 19th Design Automation Conf., 1982, pp. 663-670.
[11] MATSUI, T.: 'NP-hardness of linear multiplicative programming and related problems', J. Global Optim. 9 (1996), 113-119.
[12] PARDALOS, P.M.: 'Polynomial time algorithms for some classes of constrained nonconvex quadratic problems', Optim. 21 (1990), 843-853.
[13] SWARUP, K.: 'Programming with indefinite quadratic function with linear constraints', Cahiers CERO 8 (1966), 132-136.
[14] THOAI, N.V.: 'A global optimization approach for solving the convex multiplicative programming problem', J. Global Optim. 1 (1991), 341-357.

Takahito Kuno
Univ. Tsukuba
Ibaraki, Japan
E-mail address: takahito@is.tsukuba.ac.jp

MSC2000: 90C26, 90C31
Key words and phrases: optimization, nonconvex minimization, low-rank nonconvexity.

MULTISTAGE STOCHASTIC PROGRAMMING: BARYCENTRIC APPROXIMATION
Many problems in finance, economics and other applications require that decisions $x_t \in \mathbb{R}^m$ are made periodically over time, depending on observations of uncertain data $(\eta_t, \xi_t)$ in future periods $t = 1, \dots, T$. Here, a distinction is made between random data $\eta_t \in \Theta_t \subset \mathbb{R}^{K_t}$ that influence prices in the objective function and random data $\xi_t \in \Xi_t \subset \mathbb{R}^{L_t}$ that affect the demand on the right-hand side of constraints in an optimization problem.
Once an observation $(\eta_t, \xi_t)$ becomes available, the decision maker has to determine a policy $x_t$ that minimizes the costs $\rho_t(x^{t-1}, x_t, \eta^t)$ in $t$ plus the expected costs in the subsequent periods $t+1, \dots, T$, subject to a set of constraints $f_t(x^{t-1}, x_t) \le h(\xi^t)$. Both the objective function and the constraints may depend on the sequences of observations $\eta^t = (\eta_1, \dots, \eta_t)$, $\xi^t = (\xi_1, \dots, \xi_t)$ up to $t$ and earlier decisions $x^{t-1} = (x_0, \dots, x_{t-1})$. Obviously, an action $x_t$ must be selected after $(\eta^t, \xi^t)$ is observed but before the future outcomes $\eta_{t+1}, \dots, \eta_T$ and $\xi_{t+1}, \dots, \xi_T$ are known, i.e. the decision is based only on information available at time $t$. Hence, one obtains a sequence of decisions $x_0, x_1(\eta^1, \xi^1), \dots, x_T(\eta^T, \xi^T)$, a property called nonanticipativity. This results in a multistage stochastic program, which may be written in its dynamic representation as a series of nested two-stage programs (with $\phi_{T+1}(\cdot) := 0$, see [4]):

  $\phi_t(x^{t-1}, \eta^t, \xi^t) := \min \rho_t(x^{t-1}, x_t, \eta^t) \dots$          (1)
Multistage stochastic programming: Barycentric approximation
from above for all ~ 6 E by a linear function and the tangent ¢(~) to ~o(~) at ~ is a lower bound
L
- ~-~t,=0Tt'(~)V~" To construct the 'classi- to the original function. Both linear approxima-
cal' Edmundson-Madansky upper bound (EM) for tions ¢(() and ~(~) to the convex value function
f ~(~)dP over the simplex E, ~ is replaced by a for a given policy are shown in Fig. 2.
discrete random variable with the same expecta- From a computational viewpoint, the original
tion, attaining values vo,..., VL. To obtain the cor- function ~o(~) is replaced by two linear affine func-
responding probabilities, ~ has to be replaced by tions. Clearly, ¢(~) and ~(~) can be integrated
- f ~ dP in (4), and the system must be solved easily over the support of ~. If there is only ran-
L
for TO,..., TL. Then, f ~(~) dR <_ Et,=0 Tt'(~)Vt,, domness in the objective with deterministic right-
and the weights may be interpreted as the proba- hand sides, a lower and an upper bound can be
bilities of the discrete outcomes. constructed by applying the same procedure to the
dual concave (maximization) problem, deriving an
upper bound from Jensen's inequality and a lower
approximation with the EM-rule.
Barycentric approximation combines these con-
cepts for stochastic objective and right-hand sides
[3] and extends them to the multistage case [4],
[5]. It derives distinguished points, so-called gen-
eralized barycenters, where the value function (1)
v0 Vl must be supported by two bilinear functions to
minimize the error induced by the approximation.
Fig. 2.
This is shown in Fig. 3 for Kt = Lt = 1, where
the minorant is supported at ~0 and ~1 and the
majorant at 770 and ~1.
Let At,0(7?t),..., At,K,(~?t), Tt,o(~t),..., Tt,L,(~t)
be the barycentric weights w.r.t. Ot and Et de-
fined analogously to (4). For both simplices, the
generalized barycenters and their probabilities are
given by
gt
r]t~t -- [q(r]#t)]-I " E uut f Aut(~]t)rttt(~t)dPt,
vt-o
q(n., ) - / rm (~t ) dPt, #t -- O , . . . , L t ,
Lt
~vt - [q(~vt)] -1" ~ vt,t / Avt(~t)Tta(~t) dPt,
#t=0
the corresponding probabilities q ( ~ ) to obtain for t - 0 , . . . , T with CT+I (') -- ~I/T+I (') "-- 0. Ac-
discrete outcomes for the lower approximation of cording to [4], these are lower and upper bounds
the original measure Pt. This way, one derives a to the original value function, i.e.
discrete probability measure Q~ with support Ct(xt-1, ,f , < ,f ,
supp Q~ - {(u~,, ~,)" ut = 0 , . . . , Kt }. <_ t(zt-1,
Analogously, ~Tm, #t - 0 , . . . , Lt, are supporting In the entire convex case, the accuracy of the ap-
points for the majorant with assigned probabilities proximation is quantifiable by the difference be-
q(rh, ,). This induces a discrete measure Q~' for the tween the upper and lower bound. If required, the
upper approximation with approximation can be improved by partitioning
the simplices Ot and ~t. In case that the subsim-
suppQ~ - {(rh,,,vt, t). #t - O, . . . ,Lt} . plices become arbitrarily small, the extremal mea-
sures converge to Pt, and the convergence of the
Both measures represent the solutions of two upper and lower bounds to the expectation of the
corresponding moment problems. The advanta- value function is guaranteed (see [5] for details).
geous feature from a computational viewpoint is See also: S t o c h a s t i c p r o g r a m m i n g w i t h
that the generalized barycenters and their proba- simple i n t e g e r recourse; T w o - s t a g e stochas-
bilities are completely determined by the first mo- tic p r o g r a m s w i t h recourse; S t o c h a s t i c in-
ments of 7/t and ~t, and by the bilinear cross mo- t e g e r p r o g r a m m i n g : C o n t i n u i t y , stability,
ments E(~,~. ~m), u t - O , . . . , K t , #t - O , . . . , L t . r a t e s of convergence; G e n e r a l m o m e n t opti-
Note that the covariance of two random variables m i z a t i o n p r o b l e m s ; A p p r o x i m a t i o n of multi-
is derived from the first moments and the corre- v a r i a t e p r o b a b i l i t y integrals; D i s c r e t e l y dis-
sponding cross moments. Therefore, the measures t r i b u t e d s t o c h a s t i c p r o g r a m s : D e s c e n t di-
Q~ and Q~ incorporate implicitly a correlation be- r e c t i o n s a n d efficient points; S t a t i c stochas-
tween Ut and ~t. However, cross moments (or co- tic p r o g r a m m i n g models; S t a t i c stochas-
variances, respectively) between different elements tic p r o g r a m m i n g m o d e l s : C o n d i t i o n a l ex-
of Ut are not taken into account (the same holds for pectations; Stochastic programming mod-
the components of ~t). Hence, the formulae given els: R a n d o m objective; S t o c h a s t i c pro-
above are applicable without the assumption of in- g r a m m i n g : M i n i m a x a p p r o a c h ; Simple re-
dependent random variables. course p r o b l e m : P r i m a l m e t h o d ; Simple re-
Applying the approximation scheme dynam- course p r o b l e m : D u a l m e t h o d ; P r o b a b i l i s -
ically over time, one obtains two barycen- tic c o n s t r a i n e d linear p r o g r a m m i n g : Dual-
tric scenario trees A t and jt u with their ity t h e o r y ; P r o b a b i l i s t i c c o n s t r a i n e d p r o b -
path probabilities of type (3). The set of lems: C o n v e x i t y t h e o r y ; E x t r e m u m prob-
outcomes at stage t - 1 , . . . , T is given lems w i t h p r o b a b i l i t y functions: K e r n e l
by ,41(~7t-1,~t-l) -- suppQ~(.[rlt-l,~t-l) and
t y p e s o l u t i o n m e t h o d s ; A p p r o x i m a t i o n of
Au(~?t-1, ~t-1) _ supp Q~(.I~7t-l, ~t-1). Substitut- e x t r e m u m p r o b l e m s w i t h p r o b a b i l i t y func-
ing Pt in (1) by the discrete measures Q~ and q~' tionals; S t o c h a s t i c linear p r o g r a m s w i t h re-
yields two value functions course a n d a r b i t r a r y m u l t i v a r i a t e d i s t r i b u -
tions; S t o c h a s t i c p r o g r a m s w i t h recourse:
Ct(xt-1, ~t, ~t) ._ min { p t ( x t-l, xt, ut) Upper bounds; Stochastic integer programs;
L - s h a p e d m e t h o d for t w o - s t a g e stochas-
+ ~t+l ,xt ,~t+l ,~t+l)dQ~+l , tic p r o g r a m s w i t h recourse; S t o c h a s t i c lin-
ear p r o g r a m m i n g : D e c o m p o s i t i o n a n d cut-
~t (xt-1 , , )'-rain { Pt (zt-1 ,xt ) t i n g planes; S t a b i l i z a t i o n of c u t t i n g plane
a l g o r i t h m s for s t o c h a s t i c linear p r o g r a m -
..kfff2t+l(xt-lxt, l]tlTt+l,~t~t+l dQt+ }
, ,
m i n g p r o b l e m s ; T w o - s t a g e s t o c h a s t i c pro-
g r a m m i n g : Q u a s i g r a d i e n t m e t h o d ; Stochas-