
EIGENVALUE ENCLOSURES FOR ORDINARY DIFFERENTIAL EQUATIONS

Selfadjoint eigenvalue problems for ordinary differential equations are very important in the sciences and in engineering. The characterization of eigenvalues by a minimum-maximum principle for the Rayleigh quotient forms the basis for the famous Rayleigh-Ritz method. This method allows for an efficient computation of nonincreasing upper eigenvalue bounds. N.J. Lehmann and H.J. Maehly [6], [7], [8] independently developed complementary characterizations that can be used to compute lower bounds. These methods are based on extremal principles for the Temple quotient. In general, however, an application of the Lehmann-Maehly method requires that certain quantities can be determined explicitly. This may be difficult or even impossible when dealing with partial differential equations. Of great importance is therefore a generalization, the Goerisch method [3], [4], [5], that may be used to overcome these problems. Nevertheless, the original Lehmann-Maehly method can easily be applied to a large class of ordinary differential equations; in [10] it is shown that the method can be interpreted as a special application of the Rayleigh-Ritz method.

Inclusion Method. Let (H, (·|·)) be an infinite-dimensional Hilbert space with the inner product (·|·) and the norm ‖·‖. Suppose that V is a dense subspace of H and that one has an inner product [·|·] in V such that (V, [·|·]) is a Hilbert space (the norm in V is denoted by ‖·‖_V). The embedding V ↪ H is assumed to be compact.

One can consider the right-definite eigenvalue problem

  Find λ ∈ R and φ ∈ V, φ ≠ 0,  (1)
  s.t. [φ|v] = λ(φ|v) for all v ∈ V.

Problem (1) has a countable spectrum of eigenvalues, and the eigenvalues can be ordered by magnitude:

  0 < λ_1 ≤ λ_2 ≤ ···,  lim_{j→∞} λ_j = ∞.

The Rayleigh-Ritz procedure for calculating upper bounds is a discretization of the Poincaré principle (cf. [9, Chapt. 22])

  λ_j = min_{E ⊂ V, dim E = j}  max_{u ∈ E, u ≠ 0}  [u|u] / (u|u),  j ∈ N.  (2)

If the linearly independent trial functions u_1, ..., u_n ∈ V, n ∈ N, are chosen, one can reduce (2) to the n-dimensional subspace V_n (the span of the chosen functions {u_1, ..., u_n}) and obtains the values

  Λ_1^[n] ≤ ··· ≤ Λ_n^[n],

which are upper bounds to the λ_j:

  λ_j ≤ Λ_j^[n],  j = 1, ..., n.

Λ_j^[n] is called a Rayleigh-Ritz bound for λ_j. Now one forms the real n×n-matrices

  A_0 := ((u_i|u_k))_{i,k=1,...,n},  (3)
  A_1 := ([u_i|u_k])_{i,k=1,...,n};

the Rayleigh-Ritz bounds are the eigenvalues of the matrix eigenvalue problem

  A_1 x = Λ^[n] A_0 x,  (Λ^[n], x) ∈ R × R^n.  (4)
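The discretization (3)-(4) is easy to exercise numerically. The sketch below is not from the article: the model operator -u'' on (0, π) with Dirichlet boundary conditions (exact eigenvalues j²) and the polynomial trial functions u_k(x) = x^k(π - x) are assumptions chosen purely for illustration.

```python
import numpy as np

# Rayleigh-Ritz upper bounds, cf. (3)-(4), for the model problem
# -u'' = lambda * u on (0, pi), u(0) = u(pi) = 0 (exact eigenvalues j^2).
# Trial functions u_k(x) = x^k (pi - x) are an illustrative assumption.
def rayleigh_ritz_bounds(n, m=4001):
    x = np.linspace(0.0, np.pi, m)
    dx = x[1] - x[0]
    w = np.full(m, dx); w[0] = w[-1] = dx / 2           # trapezoidal weights
    U = np.array([x**k * (np.pi - x) for k in range(1, n + 1)])
    dU = np.array([k * x**(k - 1) * (np.pi - x) - x**k for k in range(1, n + 1)])
    A0 = (U * w) @ U.T      # A0[i,k] = (u_i|u_k) = integral of u_i u_k
    A1 = (dU * w) @ dU.T    # A1[i,k] = [u_i|u_k] = integral of u_i' u_k'
    # Rayleigh-Ritz bounds: eigenvalues of A1 x = Lambda A0 x, cf. (4)
    return np.sort(np.linalg.eigvals(np.linalg.solve(A0, A1)).real)

upper3 = rayleigh_ritz_bounds(3)   # upper bounds for lambda_1..3 = 1, 4, 9
upper5 = rayleigh_ritz_bounds(5)   # nested larger subspace: bounds can only improve
```

Because the trial subspaces are nested, enlarging n can only lower the bounds, and every bound stays above the corresponding exact eigenvalue.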

The Rayleigh-Ritz bounds are monotonically decreasing in n ∈ N.

The Lehmann-Goerisch procedure (see [6], [7], [4], [5], [3]) for calculating lower bounds can be understood as the discretization of a variational principle for characterizing the eigenvalues as well. This principle and a proof of the method are due to S. Zimmermann and U. Mertins [10].

Let ρ ∈ R be a spectral parameter such that for an N ∈ N the inequality

  λ_N < ρ < λ_{N+1}  (5)

holds true. One expresses the first N eigenvalues in the form

  λ_{N+1-i} = ρ + 1/σ_i,  i = 1, ..., N

(assuming σ_i < 0). For u ∈ V, w_u ∈ H denotes the uniquely determined solution of the equation

  [u|v] = (w_u|v) for all v ∈ V;

the σ_i therefore are characterized by

  σ_i = inf_{E ⊂ V, dim E = i}  max_{u ∈ E, u ≠ 0}  ([u|u] - ρ(u|u)) / ((w_u|w_u) - 2ρ[u|u] + ρ²(u|u)),  (6)

i = 1, ..., N. A negative upper bound for σ_i results in a lower bound for λ_{N+1-i}. In order to discretize (6), one determines w_1, ..., w_n ∈ H such that

  [u_i|v] = (w_i|v) for all v ∈ V,  (7)

then one defines the matrix

  A_2 := ((w_i|w_k))_{i,k=1,...,n},  (8)

and solves the matrix eigenvalue problem

  (A_1 - ρA_0) x = τ (A_2 - 2ρA_1 + ρ²A_0) x,  (9)
  (τ, x) ∈ R × R^n.

If for n ∈ N the condition Λ_N^[n] < ρ is fulfilled, then (9) has exactly N negative eigenvalues τ_1 ≤ ··· ≤ τ_N < 0 ≤ ··· ≤ τ_n. These τ_i are upper bounds for our σ_i (σ_i ≤ τ_i, i = 1, ..., N). One obtains the lower bounds

  Λ_j^ρ[n] := ρ + 1/τ_{N+1-j} ≤ λ_j,  (10)

j = 1, ..., N. This discretization (9), (10) is the Lehmann-Goerisch procedure. Λ_j^ρ[n] is called a Lehmann-Goerisch bound for λ_j.

Numerical Example. The numerical example is the well known Mathieu equation. This equation has been considered by several authors; bounds for eigenvalues of the Mathieu equation can be found in [1], [9] and [3]. The eigenvalue problem reads as follows:

  -φ''(x) + s cos²(x) φ(x) = λ φ(x),  x ∈ (0, π/2),
  φ'(0) = φ'(π/2) = 0,

where s ∈ R, s > 0, is a parameter.

In order to treat this problem, the required quantities can be defined as follows: I := (0, π/2),

  H := L²(I),  V := H¹(I).

The inner products (·,·) and [·,·] are given by

  (f, g) := ∫_0^{π/2} f(x) g(x) dx  for all f, g ∈ H,

  [f, g] := ∫_0^{π/2} ( f'(x) g'(x) + s cos²(x) f(x) g(x) ) dx  for all f, g ∈ V.

With this definition the inner product [·,·] and the usual H¹ inner product are equivalent; the embedding (V, [·,·]) ↪ (H, (·,·)) is compact. Now the eigenvalue problem

  Find λ ∈ R and φ ∈ V, φ ≠ 0,
  s.t. [φ|v] = λ(φ|v) for all v ∈ V,

is equivalent to the Mathieu equation. The trial functions v_k ∈ V are defined by

  v_1(x) := 1,  (11)
  v_k(x) := cos(2(k-1)x)  for x ∈ I, k = 2, ..., n.

With these trial functions the Rayleigh-Ritz upper bounds Λ_j^[n] (cf. (3), (4)) can be computed. For n = 5 one obtains

  j   Λ_j^[5]
  1   2.28404873592
  2   8.4560567005
  3   19.606719005
  4   39.5439779
  5   67.609198

The quality of these upper bounds can be increased by increasing n.

An application of the Lehmann-Goerisch procedure requires a spectral parameter ρ which is a rough eigenvalue bound (cf. (5)). For this aim the Mathieu equation is considered for s = 0. This is a second order problem with constant coefficients and can be solved in closed form. Its eigenvalues are λ̃_i = 4(i-1)², i ∈ N. From the comparison theorem (see [3]) one can see that the λ̃_i are lower bounds for the eigenvalues of the Mathieu equation with s > 0; this can be used to verify the left hand side inequality of (5); the right-hand side inequality can be examined by means of the Rayleigh-Ritz bounds. For N = 3 one obtains

  λ_3 ≤ Λ_3^[n] ≤ 19.607 < ρ := λ̃_4 = 36 < λ_4.

If s is increased dramatically, it may be impossible to satisfy (5). If this happens, one can link the eigenvalue problem under consideration and the comparison problem by a homotopy method (cf. [3]).

The next task is the determination of w_i ∈ H such that (7) holds true. In general this is a problem, but for differential equations where the right-hand side is the identity, one can proceed as follows: The operator on the left-hand side of the differential equation is denoted by M; then the trial functions v_i are chosen from D(M) (that means sufficiently smooth) such that all essential and natural boundary conditions are satisfied. Now w_i := M v_i fulfills (7). For the Mathieu equation one can define

  (Mf)(x) := -f''(x) + s cos²(x) f(x)

and

  D(M) := { f ∈ H²(I) : f'(0) = f'(π/2) = 0 };

now it is easy to see that the v_i from (11) fulfill v_i ∈ D(M), and w_i := M v_i can be used in (7), (8).

From the eigenvalues of the matrix eigenvalue problem (9) one obtains the following bounds:

  j   lower bound      upper bound
  1   2.28404873561    2.28404873592
  2   8.4560566942     8.4560567005
  3   19.6067171       19.6067191

For an example with a system of ordinary differential equations see [2].

See also: Hemivariational inequalities: Eigenvalue problems; Interval analysis: Eigenvalue bounds of interval matrices; Semidefinite programming and determinant maximization; αBB algorithm.

References
[1] ALBRECHT, J.: 'Iterationsverfahren zur Berechnung der Eigenwerte der Mathieuschen Differentialgleichung', Z. Angew. Math. Mech. 44 (1964), 453-458.
[2] BEHNKE, H.: 'A numerically rigorous proof of curve veering in an eigenvalue problem for differential equations', Z. Anal. Anwend. 15 (1996), 181-200.
[3] BEHNKE, H., AND GOERISCH, F.: 'Inclusions for eigenvalues of selfadjoint problems', in J. HERZBERGER (ed.): Topics in validated computations, Elsevier, 1994, pp. 277-322.
[4] GOERISCH, F.: 'Eine Verallgemeinerung eines Verfahrens von N.J. Lehmann zur Einschließung von Eigenwerten', Wiss. Z. Techn. Univ. Dresden 29 (1980), 429-431.
[5] GOERISCH, F., AND HAUNHORST, H.: 'Eigenwertschranken für Eigenwertaufgaben mit partiellen Differentialgleichungen', Z. Angew. Math. Mech. 65, no. 3 (1985), 129-135.
[6] LEHMANN, N.J.: 'Beiträge zur Lösung linearer Eigenwertprobleme I', Z. Angew. Math. Mech. 29 (1949), 341-356.
[7] LEHMANN, N.J.: 'Beiträge zur Lösung linearer Eigenwertprobleme II', Z. Angew. Math. Mech. 30 (1950), 1-16.
[8] MAEHLY, H.J.: 'Ein neues Verfahren zur genäherten Berechnung der Eigenwerte hermitescher Operatoren', Helv. Phys. Acta 25 (1952), 547-568.
[9] WEINSTEIN, A., AND STENGER, W.: Methods of intermediate problems for eigenvalues, Acad. Press, 1972.
[10] ZIMMERMANN, S., AND MERTINS, U.: 'Variational bounds to eigenvalues of self-adjoint problems with arbitrary spectrum', Z. Anal. Anwend. 14 (1995), 327-345.

H. Behnke
Inst. Math. TU Clausthal
Erzstr. 1, 38678 Clausthal, Germany
E-mail address: behnke@math.tu-clausthal.de

MSC2000: 49R50, 65L15, 65L60, 65G20, 65G30, 65G40
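The whole enclosure computation fits in a short script. The following sketch is an illustration only, not the rigorous computation of the article: it uses plain floating-point quadrature and eigensolvers (no verified arithmetic), and the parameter value s = 2 is an assumption. For s = 0 the trial functions (11) are exact eigenfunctions, so both discretizations must reproduce the closed-form eigenvalues 4(i-1)².

```python
import numpy as np

def mathieu_enclosure(s, n, rho, N, m=4001):
    # Trial functions (11): v_1 = 1, v_k = cos(2(k-1)x); w_k = M v_k.
    x = np.linspace(0.0, np.pi / 2, m)
    dx = x[1] - x[0]
    wq = np.full(m, dx); wq[0] = wq[-1] = dx / 2        # trapezoidal weights
    V = np.array([np.cos(2 * k * x) for k in range(n)])
    W = np.array([((2 * k) ** 2 + s * np.cos(x) ** 2) * np.cos(2 * k * x)
                  for k in range(n)])                   # w = M v
    def gram(P, Q):
        G = (P * wq) @ Q.T
        return 0.5 * (G + G.T)                          # symmetrize quadrature noise
    A0, A1, A2 = gram(V, V), gram(V, W), gram(W, W)     # (3), (8); [v_i|v_k] = (v_i|Mv_k)
    upper = np.sort(np.linalg.eigvals(np.linalg.solve(A0, A1)).real)       # (4)
    tau = np.sort(np.linalg.eigvals(np.linalg.solve(
        A2 - 2 * rho * A1 + rho ** 2 * A0, A1 - rho * A0)).real)           # (9)
    neg = tau[tau < 0]
    assert len(neg) == N, "need Lambda_N^[n] < rho, cf. (5)"
    lower = rho + 1.0 / neg[::-1]                       # (10), ascending in j
    return lower, upper

# s = 0: closed-form eigenvalues 4*(i-1)^2 = 0, 4, 16, 36, 64; take N = 3, rho = 30.
low0, up0 = mathieu_enclosure(s=0.0, n=5, rho=30.0, N=3)
# s = 2 (an assumed parameter value): genuine two-sided enclosures.
low2, up2 = mathieu_enclosure(s=2.0, n=5, rho=30.0, N=3)
```

The choice ρ = 30 is legitimate for s = 2 because λ_3 ≤ λ_3(s=0) + s = 18 < 30 and λ_4 ≥ λ̃_4 = 36 > 30.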

Key words and phrases: upper and lower bounds to eigenvalues, Rayleigh-Ritz method, Lehmann-Maehly method.

ENTROPY OPTIMIZATION: INTERIOR POINT METHODS, interior point algorithms for entropy optimization, interior point methods for entropy optimization

This section introduces the interior point approach to solving entropy optimization problems with linear constraints. In particular, we consider the following problem:

Program EL:

  min f(x) := c^T x + Σ_{j=1}^n d_j x_j ln x_j  (1)
  s.t. Ax = b,  x ≥ 0,

where c ∈ R^n, d ∈ R^n, d > 0, b ∈ R^m, A is an (m×n)-matrix, 0 is an n-dimensional zero vector, and 0 ln 0 := 0. When c = 0 and d_j = 1, j = 1, ..., n, Program EL becomes a pure entropy optimization problem.

Denote the feasible region of Program EL by F_p := {x ∈ R^n : Ax = b, x ≥ 0} and the (relative) interior of F_p by F_p° := {x ∈ R^n : Ax = b, x > 0}. An n-vector x is called an interior solution of Program EL if x ∈ F_p°. With these definitions, we have the following verifiable result:

LEMMA 1 If F_p is nonempty, then Program EL has a unique optimal solution. Moreover, if F_p has a nonempty interior, then the unique optimal solution is strictly positive.

All interior point methods, including those to be discussed in this section, require the fundamental assumption that F_p has a nonempty interior, i.e., F_p° ≠ ∅. A Lagrangian dual can be derived in the following manner. For all x ∈ R^n, y ∈ R^m, and z ∈ R_+^n := {x : x ∈ R^n, x ≥ 0}, define the following Lagrangian function:

  L(x, y, z) := Σ_{j=1}^n c_j x_j + Σ_{j=1}^n d_j e(x_j)
      - Σ_{i=1}^m ( Σ_{j=1}^n a_{ij} x_j - b_i ) y_i - Σ_{j=1}^n z_j x_j,  (2)

where

  e(x) := { x ln x  if x ≥ 0,
            +∞      if x < 0 }

is a proper convex function with the set {x : x ∈ R, x ≥ 0} being its effective domain [6]. The concept of proper convex function has often been used to simplify convex analysis. For details about the theory of using Lagrange multipliers for solving constrained optimization problems defined in terms of proper convex functions, the reader is referred to [6, Chap. 28].

Rearranging terms in (2) results in

  L(x, y, z) = Σ_{i=1}^m b_i y_i + Σ_{j=1}^n ( c_j + d_j ln x_j - Σ_{i=1}^m a_{ij} y_i - z_j ) x_j.

Considering the fact that d_j > 0 and the shape of the entropic function x ln x, we know that, for any given y ∈ R^m and z ∈ R_+^n, L(x, y, z) achieves its unique minimum at some x* > 0. Also, its first derivative at x* vanishes. This implies

  d_j ln x_j* - Σ_{i=1}^m a_{ij} y_i + c_j + d_j - z_j = 0.  (3)

Multiplying both sides of (3) by x_j* and summing over j produces

  Σ_{j=1}^n c_j x_j* + Σ_{j=1}^n d_j x_j* ln x_j*
      = Σ_{j=1}^n ( Σ_{i=1}^m a_{ij} y_i + z_j ) x_j* - Σ_{j=1}^n d_j x_j*.

Consequently, for any y ∈ R^m and z ∈ R_+^n,

  L(x*, y, z) = Σ_{i=1}^m b_i y_i - Σ_{j=1}^n d_j x_j*,

where x* satisfies (3). Therefore, a Lagrangian dual of Program EL becomes

  max_{y ∈ R^m, z ∈ R_+^n}  Σ_{i=1}^m b_i y_i - Σ_{j=1}^n d_j x_j*
  s.t.  d_j ln x_j* = Σ_{i=1}^m a_{ij} y_i - c_j - d_j + z_j,  j = 1, ..., n.

This dual is equivalent to

Program DEL:

  max_{y ∈ R^m, 0 < x ∈ R^n}  L(x, y) := Σ_{i=1}^m b_i y_i - Σ_{j=1}^n d_j x_j
  s.t.  d_j ln x_j + c_j + d_j - Σ_{i=1}^m a_{ij} y_i ≥ 0,  j = 1, ..., n.  (4)

Note that x is strictly positive because ln 0 is not well-defined. However, if we define ln 0 := -∞, the domain of x in Program DEL can be replaced by {x : x ∈ R^n, x ≥ 0}. Denote the excess vector ∇f(x) - A^T y by s. The jth component of s is simply d_j ln x_j + c_j + d_j - Σ_{i=1}^m a_{ij} y_i, which is the left-hand side of (4). Denote the feasible region of Program DEL by F_d := {(x, y) : ∇f(x) - A^T y ≥ 0}, and assume that F_d has a nonempty interior.

We now derive the Karush-Kuhn-Tucker conditions for Program DEL. First, define, for all x ≥ 0, u ≥ 0, the following Lagrangian:

  L̄(x, y, u) := Σ_{i=1}^m b_i y_i - Σ_{j=1}^n d_j x_j
      + Σ_{j=1}^n u_j ( d_j ln x_j + c_j + d_j - Σ_{i=1}^m a_{ij} y_i ).

Setting the partial derivatives with respect to y_i and x_j to zero gives

  b_i - Σ_{j=1}^n a_{ij} u_j = 0,  i = 1, ..., m,
  -d_j + u_j d_j / x_j = 0,  j = 1, ..., n.  (5)

Note that (5) is equivalent to u_j = x_j. Therefore, the KKT conditions for Program DEL become:

i) There exists x ∈ R^n such that Ax = b and x ≥ 0. This can be viewed as the 'primal feasibility condition'.

ii) There exists y ∈ R^m such that, together with x, d_j ln x_j + c_j + d_j - Σ_{i=1}^m a_{ij} y_i ≥ 0, or ∇f(x) - A^T y ≥ 0. Similarly, this can be viewed as the 'dual feasibility condition'.

iii) For all j = 1, ..., n, ( d_j ln x_j + c_j + d_j - Σ_{i=1}^m a_{ij} y_i ) x_j = 0. This can be viewed as the 'complementary slackness condition'.

Note that, by (5), the Lagrange multipliers associated with the constraints of Program DEL at its optimal solution happen to coincide with the x-component of the optimal solution of Program DEL. This, together with the fact that the dual of Program DEL is Program EL, implies that the optimal solution of Program DEL contains the optimal solution of Program EL.

Also note that an alternative dual program can be defined by considering the following Lagrangian:

  L̃(x, y) := Σ_{j=1}^n c_j x_j + Σ_{j=1}^n d_j x_j ln x_j - Σ_{i=1}^m ( Σ_{j=1}^n a_{ij} x_j - b_i ) y_i,

for x ≥ 0 and y ∈ R^m. In this expression, no Lagrange multipliers are defined for the constraints x ≥ 0, and it leads to the following dual program:

  max_{y ∈ R^m}  Σ_{i=1}^m b_i y_i - Σ_{j=1}^n d_j exp( ( Σ_{i=1}^m a_{ij} y_i - c_j ) / d_j - 1 ).

Since this dual program is unconstrained, any solution algorithm can be viewed as an interior point algorithm. For details about this approach and companion efficient solution algorithms, see [2].

In the rest of this section, we focus on the development of a primal-dual interior point algorithm [5]. Note that, to obtain the algorithm, Program DEL, rather than the unconstrained dual program, was used in [5]. The primal-dual interior point algorithm starts with an initial primal feasible solution x⁰ and an initial dual feasible solution y⁰. While the algorithm iterates, it maintains the primal and dual feasibility conditions and reduces the complementary slackness. In other words, the algorithm iterates from a pair of interior solutions (x^k, y^k), with Ax^k = b, x^k > 0 and s^k := ∇f(x^k) - A^T y^k > 0, to a new interior solution pair (x^{k+1}, y^{k+1}) such that the complementary slackness is reduced from δ^k := (x^k)^T s^k to δ^{k+1} := (x^{k+1})^T s^{k+1}. The algorithm terminates when δ^k ≤ ε, for some given ε > 0 (or when the difference between f(x^k) and the optimum is sufficiently small).

To describe the algorithm, we use the boldface upper-case letters X, S, and W to denote the diagonal matrices formed by the components of vectors x, s, and w, respectively. We also denote the vectors of all ones of appropriate dimensions by e, the l₂ norm by ‖·‖, and the vector whose components are ln(x_j), j = 1, ..., n, by ln x.

Rather than dealing with the complementary slackness δ^k directly, the following primal-dual potential function [8],

  φ(x, s) := ρ ln(x^T s) - Σ_{j=1}^n ln(x_j s_j),

where ρ ≥ n + √n, can be used as a surrogate measure [5].

Given the initial solution pair, the potential of the associated complementary slackness can be calculated. Given the inaccuracy tolerance ε, a target potential can be calculated. Therefore, the amount of required potential reduction can be calculated. The primal-dual interior point algorithm, under proper conditions, will reduce the potential by a constant amount in each iteration.

Note that two different pairs of (x, s) that have the same complementary slackness measure may have different potentials. Therefore, to ensure that the target potential is sufficiently small, we need to find the minimum potential among all those (x, s) pairs such that x^T s = ε, or a lower bound of this minimum potential.

Rewrite the potential function as

  φ(x, s) = (ρ - n) ln(x^T s) - Σ_{j=1}^n ln( x_j s_j / x^T s ).

Applying the geometric-arithmetic inequality results in

  ( Π_{j=1}^n x_j s_j / x^T s )^{1/n} ≤ (1/n) Σ_{j=1}^n x_j s_j / x^T s = 1/n.

Taking the natural logarithm leads to

  (1/n) Σ_{j=1}^n ln( x_j s_j / x^T s ) ≤ -ln n.

Consequently, for any pair with x^T s = ε,

  φ(x, s) ≥ (ρ - n) ln ε + n ln n.

Therefore, the target potential should be (ρ - n) ln ε + n ln n. Given the potential associated with the initial solution, the exact amount of potential reduction is φ(x⁰, s⁰) - (ρ - n) ln ε - n ln n. Note that for a given inaccuracy tolerance ε, the target potential is indeed the minimum of all the potentials associated with all (x, s) pairs such that x^T s = ε. This is indicated by the tight geometric-arithmetic inequality.

Given the knowledge of how much potential reduction is needed, if an algorithm reduces the potential by a constant amount in each iteration, then the complexity of the algorithm is O( φ(x⁰, s⁰) - (ρ - n) ln ε - n ln n ).

Assume that, in iteration k, we have a primal-dual feasible solution pair (x^k, y^k) and the slack vector s^k = ∇f(x^k) - A^T y^k > 0. Ideally, one would like to find (x^{k+1}, y^{k+1}) such that the KKT conditions are met, i.e.,

  A x^{k+1} = b,  x^{k+1} ≥ 0,
  ∇f(x^{k+1}) - A^T y^{k+1} ≥ 0,
  X^{k+1} ( ∇f(x^{k+1}) - A^T y^{k+1} ) = 0.

Define

  Δx := x^{k+1} - x^k,
  Δy := y^{k+1} - y^k,
  Δs := s^{k+1} - s^k,
  ΔX := X^{k+1} - X^k.

With these definitions, the conditions stated above become

  A (x^k + Δx) = b,  x^k + Δx ≥ 0,
  ∇f(x^k + Δx) - A^T (y^k + Δy) ≥ 0,
  (X^k + ΔX) [ ∇f(x^k + Δx) - A^T (y^k + Δy) ] = 0.  (6)

Note that the quantity in the bracket of (6) is simply s^{k+1} = s^k + Δs, where

  Δs = ∇f(x^k + Δx) - ∇f(x^k) - A^T Δy.

Therefore, we have

  (X^k + ΔX)(s^k + Δs) = 0,

or

  X^k s^k + X^k Δs + ΔX s^k + ΔX Δs = 0,

or

  X^k Δs + S^k Δx = -ΔX Δs - X^k s^k.

Solving the equations

  A (x^k + Δx) = b,
  X^k Δs + S^k Δx = -ΔX Δs - X^k s^k,

subject to the condition

  ∇f(x^k + Δx) - A^T (y^k + Δy) ≥ 0

is in general difficult.

Given 0 < x^k ∈ F_p, s^k = ∇f(x^k) - A^T y^k > 0, and δ^k = (x^k)^T s^k, the algorithm proposed in [4] solves the following system of nonlinear equations for Δx and Δy:

  X^k Δs + S^k Δx = θ p^k,  (7)
  A Δx = 0,  (8)

where θ > 0 is a constant to be specified later,

  p^k := (δ^k / ρ) e - X^k S^k e,

and n + √n ≤ ρ < 2n. By choosing

  θ = β min_j ( √(x_j^k s_j^k) ) / ‖ (X^k S^k)^{-0.5} p^k ‖

for some 0 < β < 1 yet to be determined, we obtain

  φ(x^{k+1}, s^{k+1}) ≤ φ(x^k, s^k) - γ

for a constant γ > 0. Let C > 0 be a real number. Choose β such that

  0 < β < 1,  (9)
  β (1 + Cβ) ≤ 1/2,  1 - Cβ ≥ 0.  (10)

It can be shown [5] that, to reduce the potential by a constant amount in each iteration, solving a linear approximation of equations (7) and (8) can achieve the required accuracy.

Suppose that n + √n ≤ ρ < 2n and that Δx and Δy satisfy

  A Δx = 0,  X^k Δs + S^k Δx = θ p^k + z^k,
  ‖z^k‖ ≤ C β² min_j (x_j^k s_j^k);  (11)

then

  φ(x^k, s^k) - φ(x^{k+1}, s^{k+1}) ≥ γ,

where γ = (√3/2) β (1 - Cβ) - β² (1 + Cβ)².

Condition (11) can be achieved by solving the following set of linear equations:

  X^k ( ∇²f(x^k) Δx - A^T Δy ) + S^k Δx = θ p^k,  (12)
  A Δx = 0.  (13)

Note that the vector ∇²f(x^k) Δx replaces ∇f(x^k + Δx) - ∇f(x^k) of (7) and serves as a simple linear approximation. Equations (12) and (13) are key to the 'potential-reduction' primal-dual interior point algorithm.

Given an initial interior point solution, an interior point algorithm can be stated as follows.

Initialization:
Given an initial primal interior point solution x⁰ and an initial dual solution y⁰ such that Ax⁰ = b, x⁰ > 0, and s⁰ = ∇f(x⁰) - A^T y⁰ > 0, calculate δ⁰ = (x⁰)^T s⁰;
set k ← 0.

Iteration:
IF δ^k ≤ ε, THEN STOP
ELSE
  solve (12), (13) for Δx and Δy;
  set
    x^{k+1} = x^k + Δx;
    y^{k+1} = y^k + Δy;
    s^{k+1} = ∇f(x^{k+1}) - A^T y^{k+1};
    δ^{k+1} = (x^{k+1})^T s^{k+1};
  reset k ← k + 1 for the next iteration.
END IF

With a standard procedure for obtaining an initial solution [1], the following theorem of polynomial time convergence was shown in [5].

THEOREM 2 Suppose that ε > 0 and 2n > ρ ≥ n + √n. Then, in the kth iteration, x^k > 0, s^k > 0, and x^k and y^k are feasible for Programs EL and DEL. Moreover, the interior point algorithm terminates in at most O( φ(x⁰, s⁰) - (ρ - n) ln ε - n ln n ) iterations. □

It was also suggested that, in practical implementation, the stepsize can be set to η̄ based on a line search such that η̄ = argmin_{η>0} φ(x^k + ηΔx, s^k + ηΔs). With this stepsize, one can set x^{k+1} = x^k + η̄Δx, y^{k+1} = y^k + η̄Δy.

The search direction is a combination of a descent direction and a centering direction. To enable local quadratic convergence, a computable criterion was developed under which a pure Newton method for solving ∇f(x) - A^T y = 0, Ax = b (by solving the linear system ∇²f(x^k)Δx - A^T Δy = -s^k and AΔx = 0) can be applied for the rest of the search process. Note that when x^k is close to the optimal solution, we have x^k being strictly positive, and therefore ∇f(x) - A^T y should be close to 0. Implementation of the primal-dual interior point algorithms proposed in [5] is discussed in [3].

In addition to the 'potential-reduction' interior point method described above, the 'path following' interior point method, which follows an ideal interior trajectory to reach an optimal solution, was proposed in [9], [7]. The convergence of the path following interior point method has been established. However, to the best of our knowledge, possible polynomial time convergence behavior remains an open issue.

See also: Entropy optimization: Shannon measure of entropy and its properties; Jaynes' maximum entropy principle; Maximum entropy principle: Image reconstruction; Entropy optimization: Parameter estimation; Homogeneous selfdual methods for linear programming; Linear programming: Interior point methods; Linear programming: Karmarkar projective algorithm; Potential reduction methods for linear programming; Successive quadratic programming: Solution by active sets and interior point methods; Sequential quadratic programming: Interior point methods for distributed optimal control problems; Interior point methods for semidefinite programming.

References
[1] FANG, S.-C., AND PUTHENPURA, S.: Linear optimization and extensions: theory and algorithms, Prentice-Hall, 1993.
[2] FANG, S.-C., RAJASEKERA, J.R., AND TSAO, H.-S.J.: Entropy optimization and mathematical programming, Kluwer Acad. Publ., 1997.
[3] HAN, C.-G., PARDALOS, P.M., AND YE, Y.: 'Implementation of interior-point algorithms for some entropy optimization problems', Optim. and Software 1 (1992), 71-80.
[4] KORTANEK, K.O., POTRA, F., AND YE, Y.: 'On some efficient interior point methods for nonlinear convex programming', Linear Alg. & Its Appl. 152 (1991), 169-189.
[5] POTRA, F., AND YE, Y.: 'A quadratically convergent polynomial algorithm for solving entropy optimization problems', SIAM J. Optim. 3 (1993), 843-860.
[6] ROCKAFELLAR, R.T.: Convex analysis, Princeton Univ. Press, 1970.
[7] SHEU, R.L., AND FANG, S.-C.: 'On the generalized path-following methods for linear programming', Optim. 30 (1994), 235-249.
[8] TODD, M.J., AND YE, Y.: 'A centered projective algorithm for linear programming', Math. Oper. Res. 15 (1990), 508-529.
[9] ZHU, J., AND YE, Y.: 'A path-following algorithm for a class of convex programming problems', Working Paper College of Business Administration, Univ. Iowa, no. 90-14 (1990).

Shu-Cherng Fang
North Carolina State Univ.
North Carolina, USA
E-mail address: fang@eos.ncsu.edu

H.-S. Jacob Tsao
San Jose State Univ.
San Jose, California, USA
E-mail address: jtsao@email.sjsu.edu

MSC2000: 94A17, 90C51, 90C25

Key words and phrases: entropy optimization, interior point methods, primal-dual algorithm, polynomial time convergence.

ENTROPY OPTIMIZATION: PARAMETER ESTIMATION

Introduction. Entropy optimization has been applied to problems in various fields of interest from thermodynamics to financial planning. In this context 'entropy' refers to the amount of uncertainty in a system, rather than the amount of disorder. A detailed definition of entropy can be found in [4].

One area of application, which has not received much attention in recent years, is that of parameter estimation. The estimation of parameters in semi-empirical mathematical models is a process which is important in many disciplines in the sciences and engineering. This article will focus on a few different areas of the parameter estimation problem which have been approached from an entropy perspective. Jaynes' maximum entropy principle allows for the estimation of parameters in a statistical distribution function by specification of the characteristic moments. This method can also be used to derive the principle of maximum likelihood, one of the most widely used parameter estimation approaches. Entropy principles have also been used to derive theoretical 'best estimators' for recursive parameter estimation schemes. These results can then be used to gauge the performance of various nonoptimal approaches. A final application involves the development of a measure which not only allows for the estimation of model parameters, but also simultaneously choosing the best mathematical form of the model.

Entropy Measures. In order to optimize entropy, one must possess some quantitative measure of the entropy of a given distribution. One such measure was developed by C.E. Shannon [8]. Shannon arrived at the function by postulating a set of properties which the measure should have, and then deriving a form which possesses those properties. For a probability distribution p = (p_1, ..., p_n), the function takes the form:

  S = -Σ_{i=1}^n p_i ln p_i.  (1)

Shannon also proved that this function was unique for the postulated set of properties. Other researchers have postulated different sets of properties, but arrived at the same result [4].

Another measure of entropy, in this case the cross entropy or distance between two distributions, was presented by S. Kullback and R.A. Leibler [5]. For two given distributions, p = (p_1, ..., p_n) and q = (q_1, ..., q_n), the function takes the form:

  I = Σ_{i=1}^n p_i ln( p_i / q_i ).  (2)

It is assumed that when q_i = 0, the associated p_i also is zero, and 0 ln 0 := 0. This function is referred to as the Kullback-Leibler measure of cross-entropy.

Jaynes' Maximum Entropy for Continuous Distributions. Since most distributions encountered in practice are continuous in nature, Jaynes' principle of maximum entropy (MaxEnt) must first be extended to continuous distributions. This extension is straightforward and results in:

  max  -∫_a^b f(x) ln f(x) dx
  s.t.  ∫_a^b f(x) dx = 1,  (3)
        ∫_a^b f(x) g_r(x) dx = a_r,  r = 1, ..., m,

where f(x) is a continuous probability density function from a to b. The Lagrange function takes the form:

  L = -∫_a^b f(x) ln f(x) dx - (λ_0 - 1) ( ∫_a^b f(x) dx - 1 )
      - Σ_{r=1}^m λ_r ( ∫_a^b f(x) g_r(x) dx - a_r ).  (4)

Using the Euler-Lagrange equation the following expression results:

  f(x) = exp[ -λ_0 - λ_1 g_1(x) - ··· - λ_m g_m(x) ].  (5)

A detailed discussion can be found in [4].

MaxEnt Estimation Method. The estimation of parameters in a statistical distribution using MaxEnt follows these steps:

1) Specify m characterizing functions g_r(x).

2) Use MaxEnt to find f(x), which is given by (5).

3) Find estimates of the values of the moment equations from the observed data set x = {x_1, ..., x_n} through the relationship:

  a_r = (1/n) [ g_r(x_1) + ··· + g_r(x_n) ].  (6)

4) Determine estimates of the Lagrange multipliers, λ_0, ..., λ_m, from:

  a_r = ∫_a^b g_r(x) e^{-λ_1 g_1(x) - ··· - λ_m g_m(x)} dx / ∫_a^b e^{-λ_1 g_1(x) - ··· - λ_m g_m(x)} dx  (7)

and

  e^{λ_0} = ∫_a^b e^{-λ_1 g_1(x) - ··· - λ_m g_m(x)} dx.  (8)

5) The estimated function then takes the form:

  f̂(x) = exp[ -λ̂_0 - λ̂_1 g_1(x) - ··· - λ̂_m g_m(x) ].  (9)

Maximum Likelihood from MaxEnt. The principle of maximum likelihood has been widely used to estimate the parameters of both statistical distributions and semi-empirical models. Maximum likelihood assumes that information exists about a random variable in the form of an observation, x_1, ..., x_n, and a density function, f(x; θ_1, ..., θ_m), unlike in MaxEnt where the forms of the characterizing moments are known. The approach seeks to maximize the likelihood that the given observations will occur given a set of parameters. If each observation is independent, then this 'likelihood' is defined as:

  L(X; Θ) = Π_{i=1}^n f(x_i | Θ).  (10)

The log likelihood function is most often used:

  ln L(X; Θ) = Σ_{i=1}^n ln f(x_i | Θ).  (11)

The ln L is maximized to determine the optimal parameter estimates Θ̂.

The same objective can also be derived using the concept of MaxEnt, even though the former predates the latter. The parameters need to be chosen such that the entropy which remains after the observed values are known is as large as possible. This implies that the entropy of the observation itself has to be a minimum. The entropy is given by:

  -∫_a^b f(x, Θ) ln f(x, Θ) dx = -∫_a^b ln f(x, Θ) dF.  (12)

The knowledge which is given by the observation is:

  F(x, Θ) = 0    when x < x_1,
  F(x, Θ) = 1/n  when x_1 ≤ x < x_2,
  ···
  F(x, Θ) = r/n  when x_r ≤ x < x_{r+1},
  ···
  F(x, Θ) = 1    when x_n ≤ x,

where F(x, Θ) is the cumulative density. Thus the entropy of the sample is then written as:

  -(1/n) [ ln f(x_1, Θ) + ··· + ln f(x_n, Θ) ],  (13)

which is equal to

  -(1/n) [ ln L(x_1, ..., x_n; θ_1, ..., θ_m) ],  (14)

where L is the same as described by (11). Therefore, to minimize the entropy of the sample, the likelihood function must be maximized.

Recursive Parameter Estimation. The determination of the parameters of a dynamical system on-line is a key step in the implementation of a wide range of control schemes. The estimation procedure is conducted in a recursive fashion in which the estimates from the previous time step are combined with the current state observations to calculate a new set of parameter estimates. The analysis of the estimation process used is typically approached from a mean square error criterion. This method requires some assumptions about the error to be made and the form of the data processor to be restricted. H.L. Weidemann and E.B. Stear [9] presented an approach based on entropy concepts which has various benefits over the mean squared error method:

• The form of the optimal data processor is not constrained nor does it have to be known.
• Errors are not restricted to have a normal probability distribution.


• None of the operators in the system are required to be linear.
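The maximization in (10)-(11) can be sketched for an assumed normal density, for which the sample mean maximizes ln L over the location parameter. A minimal Python sketch; the sample values and the choice of a unit-variance normal density are illustrations, not taken from the article:

```python
import math

def log_likelihood(sample, mu, sigma=1.0):
    """ln L(X; mu) of eq. (11) for an assumed N(mu, sigma^2) density."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu)**2 / (2 * sigma**2) for x in sample)

sample = [1.2, 0.8, 1.1, 0.9, 1.0]   # illustrative observations
mu_hat = sum(sample) / len(sample)   # closed-form maximizer for the normal

# ln L is largest at the maximum-likelihood estimate:
best = log_likelihood(sample, mu_hat)
assert all(log_likelihood(sample, mu) <= best for mu in [0.0, 0.5, 1.5, 2.0])
```

For densities without a closed-form maximizer, the same ln L would be handed to a numerical optimizer instead.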
Before continuing with the analysis, various measures need to be defined. The entropy of a K-dimensional random vector X with the joint probability density function pX(x1,...,xK) is defined as:

H(X) = −∫_{−∞}^{∞} pX(X) ln pX(X) dX.   (15)

If RX is the covariance matrix of the vector X then the following holds:

H(X) ≤ (1/2) ln { (2πe)^K det[RX] }.   (16)

When X is a Gaussian random vector then (16) holds as an equality. Another quantity which will be used in the analysis is referred to as the mutual information between X and Y:

I(X; Y) = ∫∫ pXY(x, y) ln [ pXY(x, y) / (pX(x) pY(y)) ] dx dy.   (17)

Fig.: Typical parameter estimator (a dynamical system produces the outputs yk, which a sensor measures as zk; the estimator processes zk).

The object is to estimate a vector Θ of unknown parameters with the joint probability density function pΘ(θ1,...,θm). The output of the dynamical model as a function of these parameters is expressed as yk(θ1,...,θm, k). These outputs are then measured by a sensor to produce {zk}. These measurements are then used by the data processor F to produce an r-dimensional vector V which is an estimate of D(Θ). The estimation error is given by:

X = D(Θ) − V = D(Θ) − F(Z) = U − V.   (18)

Also, under certain conditions, the transform D(Θ) will possess a property that for any given random vector ξ, the following holds:

I(ξ; Θ) = I(ξ; D(Θ)),   (19)

or, in words, that D(Θ) preserves energy. When this does not hold, I(ξ; Θ) > I(ξ; D(Θ)).

The problem now is to determine the function F which will produce an optimal estimator. The theoretically best function results in a minimum of the error entropy, defined to be H0. The only constraint on the approach is that the mutual information, I(Θ; Z), must be known. With that the following can be stated:

• The minimum entropy of the error vector is given by:

H0 = H(U) − I(U; Z).   (20)

• Minimizing the mutual information, I(X; Z), is equivalent to the minimization of the error vector. This is achieved by choosing F(Z) such that Z and X are independent.

• Whether or not D(Θ) preserves energy, the reduction in the processed parameter entropy, H(U), is bounded above by I(Θ; Z), that is,

H(D(Θ)) − H(X) ≤ I(Θ; Z),   (21)

and the equality holds when D(Θ) preserves energy and the optimal processor, F, is used.

These three statements now make it possible to determine the best possible performance an estimator can achieve for a given system. The proofs of these statements and a simple example can be found in [9]. The extension of the theorems to the continuous time case is given in [6], and to the similar problem of state estimation in [7].
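The Gaussian upper bound (16) can be checked in closed form for a scalar example (K = 1). The uniform density used below is an illustration chosen here, not part of the article: its differential entropy on [0, 1] is ln 1 = 0, strictly below the right-hand side of (16) evaluated at its variance 1/12, while a Gaussian of the same variance attains the bound exactly.

```python
import math

def gaussian_entropy_bound(var):
    """Right-hand side of (16) for K = 1: 0.5 * ln(2*pi*e*var)."""
    return 0.5 * math.log(2 * math.pi * math.e * var)

# Uniform on [0, 1]: differential entropy ln(1) = 0, variance 1/12.
h_uniform = 0.0
bound = gaussian_entropy_bound(1.0 / 12.0)
assert h_uniform < bound  # (16) is strict for this non-Gaussian density

# A Gaussian with the same variance attains the bound with equality:
h_gauss = 0.5 * math.log(2 * math.pi * math.e * (1.0 / 12.0))
assert abs(h_gauss - bound) < 1e-12
```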


Parameter Estimation and Model Selection. For most problems of any physical significance the form of the model equations is not known with absolute certainty. In this lies the problem of not only estimating unknown parameters, but also determining the best fitting model. Given a set of N independent observations, x1,...,xN, of a random variable from an unknown true distribution g(x), the objective is to estimate this true distribution by choosing a member of a family of distributions given by f(x|Θ) where Θ is a vector of parameters. In order to accomplish this, the distance between the two distributions needs to be minimized. The entropy of the true distribution is given by:

S(g; g) = ∫ g(x) ln g(x) dx,   (22)

while a measure of the cross-entropy is given by:

S(g; f(x|Θ)) = ∫ g(x) ln f(x|Θ) dx.   (23)

The Kullback-Leibler (K-L) measure is defined as:

I = S(g; g) − S(g; f(x|Θ))   (24)
  = ∫ g(x) ln [ g(x) / f(x|Θ) ] dx.

Therefore the solution involves the minimization of the K-L measure [3].

Take the example of a family of possible distributions, each one having a different number, k, of unknown parameters, Θk. These are denoted by f(x|Θk). The resulting form of the measure to choose the correct distribution is referred to as Akaike's information criterion (AIC) [1]:

AIC(k) = −2 ln L(Θk) + 2k,   (25)

where ln L(Θk) is the value of the log likelihood function with optimally determined parameters Θk. It is proven in [3] that this result is obtained by the minimization of the K-L measure given by (24).

A secondary problem in the area of model selection is sequential design of experiments. The concept of entropy has been applied to this problem in [2]. A total entropy criterion is developed which includes the uncertainty in the model selected as well as the uncertainty in the parameter values in each model. The use of this measure leads to a choice of an experiment for which the outcome is the most uncertain.

See also: Entropy optimization: Shannon measure of entropy and its properties; Jaynes' maximum entropy principle; Maximum entropy principle: Image reconstruction; Entropy optimization: Interior point methods.

References
[1] AKAIKE, H.: 'A new look at the statistical model identification', IEEE Trans. Autom. Control 19, no. 6 (1974), 716-723.
[2] BORTH, D.M.: 'A total entropy criterion for the dual problem of model discrimination and parameter estimation', J. Royal Statist. Soc. B 37 (1975), 77-87.
[3] BOZDOGAN, H.: 'Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions', Psychometrika 52, no. 3 (1987), 345-370.
[4] KAPUR, J.N., AND KESAVAN, H.K.: Entropy optimization principles and applications, Acad. Press, 1992.
[5] KULLBACK, S., AND LEIBLER, R.A.: 'On information and sufficiency', Ann. Math. Statist. 22 (1951), 79-86.
[6] MINAMIDE, N.: 'An extension of the entropy theorem for parameter estimation', Inform. and Control 53 (1982), 81-90.
[7] MINAMIDE, N., AND NIKIFORUK, P.N.: 'Conditional entropy theorem for recursive parameter estimation and its application to state estimation problems', Internat. J. Syst. Sci. 24, no. 1 (1993), 53-63.
[8] SHANNON, C.E.: 'A mathematical theory of communication', Bell System Techn. J. 27 (1948), 379-423; 623-656.
[9] WEIDEMANN, H.L., AND STEAR, E.B.: 'Entropy analysis of parameter estimation', Inform. and Control 14 (1969), 493-506.

William R. Esposito
Dept. Chemical Engin., Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: randy@titan.princeton.edu

Christodoulos A. Floudas
Dept. Chemical Engin., Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: floudas@titan.princeton.edu

MSC2000: 94A17, 62F10
Key words and phrases: maximum entropy, parameter estimation, model identification.

ENTROPY OPTIMIZATION: SHANNON MEASURE OF ENTROPY AND ITS PROPERTIES

The word entropy originated in the literature on thermodynamics around 1865 in Germany and was coined by R. Clausius [4] to represent a measure of the amount of energy in a thermodynamic system as a function of the temperature of the system and the heat that enters the system. Clausius wanted a word similar to the German word energie (i.e., energy) and found it in the Greek word τροπή, which means transformation [1]. The word entropy had belonged to the domain of physics until 1948 when C.E. Shannon, while developing his theory of communication at Bell Laboratories, used the term to represent a measure of information after a suggestion made by J. von Neumann.


Shannon wanted a word to describe his newly found measure of uncertainty and sought Von Neumann's advice. Von Neumann's reasoning to Shannon [25] was that: 'No one really understands entropy. Therefore, if you know what you mean by it and you use it when you are in an argument, you will win every time.'

Whatever the reason for the name is, the concept of Shannon's entropy has penetrated a wide range of disciplines, including statistical mechanics [12], thermodynamics [12], statistical inference [24], business and finance [5], nonlinear spectral analysis [21], image reconstruction [3], transportation and regional planning [26], queueing theory [10], information theory [20], [9], statistics [17], econometrics [8], and linear and nonlinear programming [6], [7].

The concept of entropy is closely tied to the concept of uncertainty embedded in a probability distribution. In fact, entropy can be defined as a measure of probabilistic uncertainty. For example, suppose the probability distribution for the outcome of a coin-toss experiment is (0.0001, 0.9999), with 0.0001 being the probability of having a tail. One is likely to notice that there is much more 'certainty' than 'uncertainty' about the outcome of this experiment and hence about the probability distribution. In fact, one is almost certain that the outcome will be a head. If, on the other hand, the probability distribution governing that same experiment were (0.5, 0.5), one would realize that there is much less 'certainty' and much more 'uncertainty,' when compared to the previous distribution. Generalizing this observation to the case of n possible outcomes, we conclude that the uniform distribution has the highest uncertainty out of all possible probability distributions. This implies that, if one had to choose a probability distribution for a chance experiment without any prior knowledge about that distribution, it would seem reasonable to pick the uniform distribution. This is because one would have no reason to choose any other and because that distribution maximizes the 'uncertainty' of the outcome. This is called Laplace's principle of insufficient reasoning [15]. Note that we are able to justify this principle without resorting to a rigorous definition of 'uncertainty.' However, this principle is inadequate when one has some prior knowledge about the distribution. Suppose, for example, that one knows some particular moments of the distribution, e.g., the expected value. In this case, a mathematical definition of 'uncertainty' is crucial. This is the case where Shannon's measure of uncertainty, or Shannon's entropy, plays an indispensable role [20].

To define entropy, Shannon proposed some axioms that he thought any measure of uncertainty should satisfy and deduced a unique function, up to a multiplicative constant, that satisfies them. It turned out that this function actually possesses many more desirable properties. In later years, many researchers modified and replaced some of his axioms in an attempt to simplify the reasoning. However, they all deduced that same function. We first focus on finite-dimensional entropy, i.e., Shannon's entropy defined on discrete probability distributions that have a finite number of outcomes (or states). Let p = (p1,...,pn)ᵀ be a probability distribution associated with n possible outcomes, denoted by x = (x1,...,xn)ᵀ, of an experiment. Denote its entropy by Sn(p). Among those defining axioms, J.N. Kapur and H.K. Kesavan stated the following [15]:

1) Sn(p) should depend on all the pj's, j = 1,...,n.
2) Sn(p) should be a continuous function of pj, j = 1,...,n.
3) Sn(p) should be permutationally symmetric. In other words, if the pj's are merely permuted, then Sn(p) should remain the same.
4) Sn(1/n,...,1/n) should be a monotonically increasing function of n.
5) Sn(p1,...,pn) = Sn−1(p1 + p2, p3,...,pn) + (p1 + p2) S2(p1/(p1 + p2), p2/(p1 + p2)).

Properties 1, 2 and 3 are obvious. Property 4 states that the maximum uncertainty of a probability distribution should increase as the number of possible outcomes increases. Property 5 is the least obvious but states that the uncertainty of a probability distribution is the sum of the uncertainty of the probability distribution that combines two of the outcomes and the uncertainty of the probability distribution consisting of only those two outcomes adjusted by the combined probabilities of the two outcomes.
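The recursivity axiom 5) can be checked numerically once the entropy function is in hand; the sketch below uses Sn(p) = −Σ_{j} pj ln pj, the function these axioms in fact determine, with an arbitrary illustrative distribution (none of the numbers are from the article):

```python
import math

def S(p):
    """Shannon entropy S_n(p) = -sum p_j ln p_j, with the convention 0 ln 0 = 0."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0)

p = [0.1, 0.2, 0.3, 0.4]
q = p[0] + p[1]  # combine the first two outcomes

# Axiom 5: S_4(p) = S_3(p1+p2, p3, p4) + (p1+p2) * S_2(p1/(p1+p2), p2/(p1+p2))
lhs = S(p)
rhs = S([q, p[2], p[3]]) + q * S([p[0] / q, p[1] / q])
assert abs(lhs - rhs) < 1e-12
```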


It turns out that the unique family of functions that satisfy the defining axioms has the form Sn(p) = −k Σ_{j=1}^n pj ln pj, where k is a positive constant, ln represents the natural logarithmic function, and 0 ln 0 = 0 [15]. Shannon chose −Σ_{j=1}^n pj ln pj to represent his concept of entropy [20]. Among its many other desirable properties, we state the following:

6) Shannon's measure is nonnegative and concave in p1,...,pn.
7) The measure does not change with the inclusion of a zero-probability outcome.
8) The entropy of a probability distribution representing a completely certain outcome is 0, and the entropy of any probability distribution representing uncertain outcomes is positive.
9) Given any fixed number of outcomes, the maximum possible entropy is that of the uniform distribution.
10) The entropy of the joint distribution of two independent distributions is the sum of the individual entropies.
11) The entropy of the joint distribution of two dependent distributions is no greater than the sum of the two individual entropies.

Property 6 is desirable because it is much easier to maximize a concave function than a nonconcave one. Properties 7 and 8 are appealing because a zero-probability outcome contributes nothing to uncertainty, and neither does a completely certain outcome. Property 9 was discussed earlier. Properties 10 and 11 state that joining two distributions does not affect the entropy, if they are independent, and may actually reduce the entropy, if they are dependent.

Shannon's entropy was originally defined for a probability distribution over a finite sample space, i.e., a finite number of possible outcomes, and can be interpreted as a measure of uncertainty of the probability distribution. It has subsequently been defined for general discrete and continuous random vectors. It has been rigorously proved that Shannon's entropy is the unique measure of uncertainty (up to a multiplicative constant) of a finite probability distribution that satisfies a set of axioms considered necessary for any reasonable measure of uncertainty [19], [20], [16]. The concept of entropy, when extended for probability distributions defined on a countably infinite sample space, takes the form of −Σ_{j=1}^∞ pj ln pj. It can still be viewed as a measure of uncertainty but such an interpretation does not enjoy the same degree of mathematical rigor as its finite-sample-space counterpart. When the concept is extended for continuous probability distributions, it is defined to be −∫ p(x) ln p(x) dx. However, it can no longer be interpreted as a measure of uncertainty at all [9], [11]. Rather, it can only be viewed as a measure of relative uncertainty [15].

Note that, with Shannon's entropy as the measure of uncertainty, in the absence of any prior information about the underlying probability distribution, the best course of action suggested by the principle of insufficient reasoning is to choose the uniform distribution because it possesses maximum uncertainty. Given the knowledge of some moments of the underlying distribution, the same reasoning leads to the following principle:

• Out of all possible distributions that are consistent with the moment constraints, choose the one that has maximum entropy.

This principle was proposed by E.T. Jaynes ([15, Chapter 2]), and has been known as the principle of maximum entropy or Jaynes' maximum entropy principle. It has often been abbreviated as MaxEnt in literature.

Let X be a random variable with n possible outcomes {x1,...,xn} and p = (p1,...,pn)ᵀ be a vector consisting of corresponding probabilities. Suppose that g1(X),...,gm(X) are m functions of X with known expected values a1,...,am, respectively. The principle of maximum entropy leads to the following mathematical optimization problem:

max H1(p) = −Σ_{j=1}^n pj ln pj
s.t. Σ_{j=1}^n pj gi(xj) = ai, i = 1,...,m,
     Σ_{j=1}^n pj = 1,
     pj ≥ 0, j = 1,...,n.
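For a single moment constraint (m = 1) this program can be solved by exploiting the known exponential form of the maximum entropy solution, pj ∝ exp(−λ g1(xj)), with λ fixed by bisection so that the moment constraint holds. A Python sketch; the die outcomes and target mean are illustrative choices, not taken from the article:

```python
import math

def maxent(x, g, a, lo=-50.0, hi=50.0):
    """Maximum-entropy p over outcomes x subject to sum_j p_j g(x_j) = a,
    using the solution form p_j proportional to exp(-lam * g(x_j))."""
    def p_of(lam):
        w = [math.exp(-lam * g(xj)) for xj in x]
        z = sum(w)
        return [wj / z for wj in w]
    def moment(lam):
        return sum(pj * g(xj) for pj, xj in zip(p_of(lam), x))
    # moment(lam) decreases in lam, so bisect on lam:
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if moment(mid) > a:
            lo = mid
        else:
            hi = mid
    return p_of((lo + hi) / 2.0)

# Die faces 1..6 constrained to mean 4.5 (an illustrative target):
x = [1, 2, 3, 4, 5, 6]
p = maxent(x, lambda t: t, 4.5)
assert abs(sum(pi * xi for pi, xi in zip(p, x)) - 4.5) < 1e-9
```

As the nonnegativity remark below the program suggests, the exponential form keeps every pj strictly positive without imposing pj ≥ 0 explicitly.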


This is a convex programming problem with linear constraints. The nonnegativity constraints are not binding for the optimal solution p* because each pj can be expressed as an exponential function in terms of the Lagrange multipliers associated with the equality constraints. Note that, in the absence of the moment constraints, the solution to the problem is the uniform probability distribution, whose entropy is ln n. As such, the maximum entropy principle can be viewed as an extension of the Laplace's principle of insufficient reasoning. The distribution selected under the maximum entropy principle has also been interpreted as one that is the 'most probable' in the sense that the maximum entropy distribution coincides with the frequency distribution that can be realized in the greatest number of ways [13]. An explanation of this linkage in the context of the well-known application of entropy maximization in transportation planning can be found in [7].

Recall that the above discussion was originally motivated by the task of choosing a probability distribution among those that are consistent with some given moments. Now, in addition to the moment constraints, suppose that we have an a priori probability distribution p⁰ that we think our probability distribution p should be close to. In fact, in the absence of the moment constraints, we would like to choose p⁰ for p because it is clearly the closest to p⁰. However, in the presence of some moment constraints which p⁰ does not satisfy, we need a precise definition of 'closeness' or 'deviation'. In other words, we need to define some sort of deviation or, more precisely, 'directed divergence' [15] on the space of discrete probability distributions where the distribution is chosen from. Note that we deliberately avoid calling this measure a 'distance'. This is because a distance measure should be symmetric and should satisfy the triangular inequality, but these two properties are not important in this context. In fact, we can be content with a 'one-way (asymmetric) deviation measure', D(p, p⁰), from p to p⁰. If a 'one-way deviation measure' from p to p⁰ is not satisfactory, one can consider using a symmetric measure defined as the sum of D(p, p⁰) and D(p⁰, p). What is desirable for this 'directed divergence' measure includes the following properties:

1) D(p, p⁰) should be nonnegative for all p and p⁰.
2) D(p, p⁰) = 0 if and only if p = p⁰.
3) D(p, p⁰) should be a convex function of p1,...,pn.
4) When D(p, p⁰) is minimized subject to moment constraints but without the explicit presence of the nonnegativity constraints, the resulting pj's should be nonnegative.

Property 1 is desirable for any such measure of deviation. If property 2 were not satisfied, then it would be possible to choose a vector p that has a zero directed divergence from p⁰, i.e., one that is as 'close' to p⁰ as p⁰ itself, but differs from p⁰. Property 3 makes minimizing the measure much simpler, and property 4 spares us from explicitly considering n nonnegativity constraints. Fortunately, there are many measures that satisfy these properties. We may even be able to find one that satisfies the triangular inequality. But, simplicity of the measure is also desirable. The simplest and most important of those measures is the Kullback-Leibler measure ([15, Chapt. 4]), defined as D(p, p⁰) = Σ_{j=1}^n pj ln(pj/pj⁰), with the convention that, whenever pj⁰ is 0, pj is set to 0 and 0 ln(0/0) is defined to be 0. This measure is also known as the cross-entropy, relative entropy, directed divergence or expected weight of evidence of p with respect to p⁰. A. Hobson [11] provided an axiomatic characterization of cross-entropy. He interpreted D(p, p⁰) as the 'information in p relative to p⁰', and showed that the only function I(p, p⁰) satisfying the following five properties has the form of k Σ_{j=1}^n pj ln(pj/pj⁰), where k is a positive constant:

5) I(p, p⁰) is a continuous function of p and p⁰.
6) I(p, p⁰) is permutationally symmetric, i.e., the measure does not change if the pairs of (pj, pj⁰) are permuted among themselves.
7) I(p, p) = 0.
8) For any pair of integers n and n0 such that n0 > n > 0, I(1/n,...,1/n, 0,...,0; 1/n0,...,1/n0) is an increasing function of n0 and a decreasing function of n, where I(1/n,...,1/n, 0,...,0; 1/n0,...,1/n0) denotes the information obtained when the number of equally likely possibilities is reduced from n0 to n.


9) I(p1,...,pn; p1⁰,...,pn⁰) = I(q1, q2; q1⁰, q2⁰)
   + q1 I(p1/q1,...,pr/q1; p1⁰/q1⁰,...,pr⁰/q1⁰)
   + q2 I(pr+1/q2,...,pn/q2; pr+1⁰/q2⁰,...,pn⁰/q2⁰),
where 1 ≤ r ≤ n, q1 = p1 + ... + pr, q2 = pr+1 + ... + pn, q1⁰ = p1⁰ + ... + pr⁰, q2⁰ = pr+1⁰ + ... + pn⁰.

Property 8 says, for example, that the information obtained upon reducing the number of equally likely sides on a die from 6 to 3 is greater than the information obtained upon reducing the number from 6 to 4. Property 9 says that one may give information about the outcome associated with the random event either by specifying the probabilities p1,...,pn directly, or by specifying the probabilities q1 and q2 first and then specifying the conditional probabilities pi/q1 and pi/q2.

In addition to the nine properties discussed above, we state the following desirable properties for cross-entropy:

10) D(p, p⁰) is convex in both p and p⁰.
11) D(p, p⁰) is not symmetric.
12) If p and q are independent and r and s are also independent, then D(p • q, r • s) = D(p, r) + D(q, s), where • denotes the convolution operation between two independent distributions.
13) In general, the triangular inequality does not hold. But, if distribution p minimizes D(p, p⁰) subject to some moment constraints and q is any other distribution that satisfies those same constraints, then D(q, p⁰) = D(q, p) + D(p, p⁰). Thus, in this special case, the triangular inequality holds, but as an equality.

Kullback and Leibler's cross-entropy was also originally defined for probability distributions with a finite sample space and can be interpreted as a measure of deviation of one probability distribution from another. It has been extended subsequently for distributions defined on countably infinite and continuous sample spaces. The corresponding forms become Σ_{j=1}^∞ pj ln(pj/pj⁰) and ∫ p(x) ln(p(x)/p⁰(x)) dx, respectively. It has also been derived rigorously as the unique measure of deviation of one probability distribution from another that satisfies a set of axioms considered necessary for any reasonable measure of deviation, for both finite probability distributions [11] and continuous distributions [14]. Cross-entropy for probability distributions with a countably infinite sample space can be viewed and has been used as a measure of deviation, although the justification is not as strong as for its finite-sample-space and continuous counterparts.

With cross-entropy interpreted as a measure of 'deviation', the Kullback-Leibler's principle of minimum cross-entropy, or MinxEnt, can be stated as follows [15]:

Out of all possible distributions that are consistent with the moment constraints, choose the one that minimizes the cross-entropy with respect to the given a priori distribution.

Mathematically, we consider the following optimization problem:

min H2(p) = Σ_{j=1}^n pj ln(pj/pj⁰)
s.t. Σ_{j=1}^n pj gi(xj) = ai, i = 1,...,m,
     Σ_{j=1}^n pj = 1,
     pj ≥ 0, j = 1,...,n.

Note that the nonnegativity constraints are not binding, for the same reason as in the MaxEnt problem. For a detailed discussion of the properties of MinxEnt, the reader is referred to [23].
16
Entropy optimization: Shannon measure of entropy and its properties

Note that, if there is no a priori information, then one may use the uniform distribution, denoted by u, as the a priori distribution. In this case, D(p, p⁰) = D(p, u) = Σ_{j=1}^n pj ln(pj/(1/n)) = ln n + Σ_{j=1}^n pj ln pj. Since minimizing Σ_{j=1}^n pj ln pj is equivalent to maximizing −Σ_{j=1}^n pj ln pj, minimizing the cross-entropy with respect to the uniform distribution is equivalent to maximizing entropy and, therefore, MaxEnt is a special case of MinxEnt. These two principles can now be combined into a general principle:

Out of all probability distributions satisfying the given moment constraints, choose the distribution that minimizes the cross-entropy with respect to the given a priori distribution and, in the absence of it, choose the distribution that minimizes the cross-entropy with respect to the uniform distribution.

Both the MaxEnt and MinxEnt principles for selecting finite-sample-space probability distributions and the MinxEnt principle for selecting continuous probability distributions can be axiomatically derived [22]. Under four consistency axioms, it was shown that the two principles are uniquely correct methods for inductive inference when new information is given in the form of expected values. Many well-known and widely used distributions, including the normal, gamma and geometric distributions, can actually be derived as solutions to some MaxEnt or MinxEnt problems [15].

The maximum entropy principle has also been shown to be a dual principle of the maximum likelihood principle for the exponential family of probability distributions in the sense that a dual problem to the linearly constrained entropy maximization problem is equivalent to the problem of maximizing a likelihood function with respect to the parameters of an exponential family [2]. This principle has also been shown to be related to the Bayesian parameter estimation problem [7]. Duality theory and major mathematical algorithms for solving finite-dimensional MaxEnt or MinxEnt problems can be found in [7] and the references therein.

See also: Jaynes' maximum entropy principle; Maximum entropy principle: Image reconstruction; Entropy optimization: Parameter estimation; Entropy optimization: Interior point methods; Optimization in medical imaging.

References
[1] BAIERLEIN, R.: 'How entropy got its name', Amer. J. Phys. 60 (1992), 1151.
[2] BEN-TAL, A., TEBOULLE, M., AND CHARNES, A.: 'The role of duality in optimization problems involving entropy functionals with applications to information theory', J. Optim. Th. Appl. 58 (1988), 209-223.
[3] BURCH, S.F., GULL, S.F., AND SKILLING, J.K.: 'Image restoration by a powerful maximum entropy method', Computer Vision, Graphics, and Image Processing 23 (1983), 113-128.
[4] CLAUSIUS, R.: 'Ueber verschiedene für die Anwendung bequeme Formen der Hauptgleichungen der mechanischen Wärmetheorie', Ann. Physik und Chemie 125 (1865), 353-400.
[5] COZZOLINO, J.M., AND ZAHNER, M.J.: 'The maximum entropy distribution of the future market price of a stock', Oper. Res. 21 (1973), 1200-1211.
[6] ERLANDER, S.: 'Entropy in linear programming', Math. Program. 21 (1981), 137-151.
[7] FANG, S.-C., RAJASEKERA, J.R., AND TSAO, H.-S.J.: Entropy optimization and mathematical programming, Kluwer Acad. Publ., 1997.
[8] GOLAN, A., JUDGE, G., AND MILLER, D.: Maximum entropy econometrics: robust estimation with limited data, Wiley, 1996.
[9] GUIASU, S.: Information theory with applications, McGraw-Hill, 1977.
[10] GUIASU, S.: 'Maximum entropy condition in queueing theory', J. Oper. Res. Soc. 37 (1986), 293-301.
[11] HOBSON, A.: Concepts in statistical mechanics, Gordon and Breach, 1987.
[12] JAYNES, E.T.: 'Information theory and statistical mechanics II', Phys. Rev. 108 (1957), 171-190.
[13] JAYNES, E.T.: 'Prior probabilities', IEEE Trans. Syst. Sci. Cybern. SSC-4 (1968), 227-241.
[14] JOHNSON, R.W.: 'Axiomatic characterization of the directed divergences and their linear combinations', IEEE Trans. Inform. Theory 25 (1979), 709-716.
[15] KAPUR, J.N., AND KESAVAN, H.K.: Entropy optimization principles with applications, Acad. Press, 1992.
[16] KHINCHIN, A.I.: Mathematical foundations of information theory, Dover, 1957.
[17] KULLBACK, S.: Information theory and statistics, Dover, 1968.
[18] SCOTT, C.H., AND JEFFERSON, T.R.: 'Entropy maximizing models of residential location via geometric programming', Geographical Anal. 9 (1977), 181-187.
[19] SHANNON, C.E.: 'A mathematical theory of communication', Bell System Techn. J. 27 (1948), 379-423; 623-656.
[20] SHANNON, C.E., AND WEAVER, W.: The mathematical theory of communication, Univ. Illinois Press, 1962.
[21] SHORE, J.E.: 'Minimum cross-entropy spectral analysis', IEEE Trans. Acoustics, Speech and Signal Processing 29 (1981), 230-237.
[22] SHORE, J.E., AND JOHNSON, R.W.: 'Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy', IEEE Trans. Inform. Theory 26 (1980), 26-37.


[23] SHORE, J.E., AND JOHNSON, R.W.: 'Properties of cross-entropy minimization', IEEE Trans. Inform. Theory 27 (1981), 472-482.
[24] TRIBUS, M.: Rational descriptions, decisions, and designs, Pergamon, 1969.
[25] TRIBUS, M.: 'An engineer looks at Bayes', in G.J. ERICKSON AND C.R. SMITH (eds.): Maximum-Entropy and Bayesian Methods in Sci. and Engineering: Foundations, Vol. 1, Kluwer Acad. Publ., 1988, pp. 31-52.
[26] WILSON, A.G.: Entropy in urban and regional modeling, Pion, 1970.

Shu-Cherng Fang
North Carolina State Univ.
North Carolina, USA
E-mail address: fang@cos.ncsu.edu

H.-S. Jacob Tsao
San Jose State Univ.
San Jose, California, USA
E-mail address: jtsao@email.sjsu.edu

MSC 2000: 94A17, 90C25
Key words and phrases: entropy, cross-entropy, maximum entropy principle, minimum cross-entropy principle.

EQUALITY-CONSTRAINED NONLINEAR PROGRAMMING: KKT NECESSARY OPTIMALITY CONDITIONS, EQNLP

An equality-constrained nonlinear programming problem may be posed in the form

min_{x∈Rⁿ} f(x)   subject to   c(x) = 0,   (1)

where f is a real-valued nonlinear function and c is an m-vector of real-valued nonlinear functions with ith component ci(x), i = 1,...,m. Normally, with the term equality-constrained nonlinear programming problem is meant a problem of the form (1) where f and c are sufficiently smooth, at least continuously differentiable. This will be assumed throughout this discussion, with the gradient of f(x) denoted by g(x) and the m × n Jacobian of c(x) denoted by J(x).

Of fundamental importance for equality-constrained optimization problems are the first order necessary optimality conditions. These conditions are often referred to as the KKT necessary optimality conditions, or more briefly, the KKT conditions. The KKT conditions state that if x* is a local minimizer to (1) that satisfies a certain constraint qualification, then there exists an m-dimensional vector λ* such that

g(x*) − J(x*)ᵀλ* = 0,
c(x*) = 0.

The vector λ* is usually referred to as the vector of Lagrange multipliers. For equality-constrained problems, the KKT conditions are attributed to J.L. Lagrange, and hence 'classical'. The acronym KKT arises from the more general results on inequality-constrained problems provided by W. Karush [3], H.W. Kuhn and A.W. Tucker [4], [5].

For an equality-constrained problem, the KKT conditions state that x* must be feasible, i.e., c(x*) = 0; and that the gradient must have zero projection onto the null space of the constraint gradients, i.e., there exists a λ* such that g(x*) = J(x*)ᵀλ*. In the case of linear equality constraints, i.e., c(x) = Ax − b for some (m × n)-matrix A and m-vector b, it follows that if x* is feasible, then x* + p is feasible if and only if Ap = 0. Hence, in this situation, if x* is a local minimizer, it must hold that g(x*)ᵀp = 0 for all p such that Ap = 0. But this is equivalent to the existence of a λ* such that g(x*) = Aᵀλ*. Consequently, in the case of linear constraints, the KKT conditions are necessary for x* to be a local minimizer to problem (1). Constraint qualifications essentially ensure that the linearization of c at x* provided by J(x*) adequately describes c in a neighborhood of x*. A constraint qualification which is frequently used is that J(x*) has rank m, i.e., that the gradients of the constraints are linearly independent at x*. The related Fritz John necessary optimality conditions are valid without any constraint qualification.

The KKT conditions are of fundamental importance, not only from a theoretical point of view, but also because algorithms for solving equality-constrained nonlinear programming problems are often based on finding a solution to the KKT conditions. In general, the KKT conditions are not sufficient for x* to be a local minimizer, but second order optimality conditions need to be considered. However, if c is affine and f is a convex function on the feasible region, then the KKT conditions are sufficient for x* to be a global minimizer.

be found in textbooks on nonlinear programming, e.g., [1], [2], [6].

As a simple example, consider the two-dimensional problem where f(x) = x_1 and c(x) = (x_1^2 + x_2^2 − 1)/2. Then the KKT conditions have two solutions: x = (1, 0)^T together with λ = 1, and x = (−1, 0)^T together with λ = −1. However, only x = (−1, 0)^T is a local minimizer (and in fact also a global minimizer).

See also: Inequality-constrained nonlinear optimization; Second order optimality conditions for nonlinear optimization; Lagrangian duality: Basics; Saddle point theory and optimality conditions; First order constraint qualifications; Second order constraint qualifications; Kuhn-Tucker optimality conditions; Rosen's method, global convergence, and Powell's conjecture; Relaxation in projection methods; SSC minimization algorithms; SSC minimization algorithms for nonsmooth and stochastic optimization.

References
[1] BAZARAA, M.S., SHERALI, H.D., AND SHETTY, C.M.: Nonlinear programming: Theory and algorithms, second ed., Wiley, 1993.
[2] BERTSEKAS, D.P.: Nonlinear programming, Athena Sci., 1995.
[3] KARUSH, W.: 'Minima of functions of several variables with inequalities as side constraints', Master's Thesis, Dept. Math. Univ. Chicago (1939).
[4] KUHN, H.W.: 'Nonlinear programming: A historical note', in J.K. LENSTRA, A.H.G. RINNOOY KAN, AND A. SCHRIJVER (eds.): History of Mathematical Programming: A Collection of Personal Reminiscences, Elsevier, 1991, pp. 82-96.
[5] KUHN, H.W., AND TUCKER, A.W.: 'Nonlinear programming', in J. NEYMAN (ed.): Proc. Second Berkeley Symp. Math. Stat. Probab., Univ. Calif. Press, 1951, pp. 481-492.
[6] NASH, S.G., AND SOFER, A.: Linear and nonlinear programming, McGraw-Hill, 1996.

Anders Forsgren
Royal Inst. Technol. (KTH)
Stockholm, Sweden
E-mail address: andersf@math.kth.se
MSC 2000: 49M37, 65K05, 90C30
Key words and phrases: equality-constrained optimization, KKT necessary optimality conditions.


EQUILIBRIUM NETWORKS

Many complex systems in which agents compete for scarce resources on a network, be it a physical one, as in the case of congested urban transportation systems, or an abstract one, as in the case of certain economic and financial problems, can be formulated and studied as network equilibrium problems. Applications of network equilibrium problems are common in many disciplines, in particular, in operations research and management science and in economics and engineering (cf. [17], [10]).

Network equilibrium problems, as opposed to network optimization problems, involve competition among the agents or users of the network system. Moreover, network equilibrium problems are governed by an underlying behavioral principle as to the behavior of the agents as well as the equilibrium conditions. For example, in congested urban transportation systems in which users seek to determine their cost minimizing routes of travel, the equilibrium conditions, due to J.G. Wardrop [23] (see also [2] and [8]), state that, in equilibrium, all used paths connecting an origin/destination pair will have minimal and equal user travel costs. On the other hand, in the case of spatial price equilibrium patterns one seeks to determine the commodity production, trade, and consumption pattern satisfying the equilibrium condition, due to S. Enke [9] and P.A. Samuelson [20], which expresses that there will be trade between a pair of spatially separated supply and demand markets provided that the supply price of the commodity at the supply market plus the unit cost of transportation associated with shipping the commodity is equal to the demand price of the commodity at the demand market; if the supply price plus the transportation cost exceeds the demand price, then there will be no trade between this pair of supply and demand markets.

M.J. Beckmann, C.B. McGuire, and C.B. Winsten [2] initiated the systematic study of network equilibrium problems in the general setting of traffic networks and demonstrated that the equilibrium flow pattern satisfying the traffic network equilibrium conditions (see also [23]), under certain symmetry assumptions on the underlying

functions, could be reformulated as the solution to an optimization problem. Samuelson [20], following [9], had made a similar connection but in the more specialized context of spatial price equilibrium problems on networks that were bipartite.

M.J. Smith [22] later proposed an alternative formulation of traffic network equilibrium conditions, which was then identified by S.C. Dafermos [3] to satisfy a finite-dimensional variational inequality problem. This connection allowed for the relaxation of the symmetry assumption and, consequently, for the construction of more realistic models (cf. [17], [21], and the references therein).

Other network equilibrium applications whose study and understanding have benefited from this methodology (cf. [10], [14], [17], [19]) include: spatial price equilibrium problems (see, e.g., [11], [15]), oligopolistic market equilibrium problems ([7], [12], [13]), migration equilibrium problems (cf. [16], [18]), and general economic equilibrium problems (cf. [5]).

Here we present two examples of network equilibrium problems for illustrative purposes, the first example being a multimodal/multiclass transportation network equilibrium problem in which the network is a physical one, whereas the second problem is a multiclass migration equilibrium problem which is isomorphic to a specially structured multiclass traffic network equilibrium problem.

Additional background, models and applications, qualitative results, as well as computational procedures and references can be found in [17] and [10].

A Multimodal Traffic Network Equilibrium Model. We now present a multimodal traffic network equilibrium model (cf. [3], [4], [6]). The model is a fixed demand model in that the demands associated with traveling between the origin/destination pairs are assumed known. See [17] for additional background, as well as elastic demand traffic network equilibrium models and other network equilibrium problems.

Consider a general network G = [N, A], where N denotes the set of nodes and A the set of directed links. Let a, b, c, ... denote the links, and p, q, ... the paths. Assume that there are J origin/destination (O/D) pairs, with a typical O/D pair denoted by w, and n modes of transportation on the network with typical modes denoted by i, j, ....

The flow on a link a generated by mode i is denoted by f_a^i, and the user cost associated with traveling by mode i on link a is denoted by c_a^i. Group the link flows into a column vector f ∈ R^{nL}, where L is the number of links in the network. Group the link costs into a row vector c ∈ R^{nL}. Assume that the user cost on a link and a particular mode may, in general, depend upon the flows of every mode on every link in the network, that is,

    c = c(f),

where c is a known smooth function.

The travel demand of users of mode i traveling between O/D pair w is denoted by d_w^i and the travel disutility associated with traveling between this O/D pair using the mode is denoted by λ_w^i. Group the demands into a vector d ∈ R^{nJ}.

The flow on path p due to mode i is denoted by x_p^i. Group the path flows into a column vector x ∈ R^{nQ}, where Q denotes the number of paths in the network.

The conservation of flow equations are as follows. The demand for a mode and O/D pair must be equal to the sum of the flows of the mode on the paths joining the O/D pair, that is,

    d_w^i = Σ_{p ∈ P_w} x_p^i,  ∀i, ∀w,

where P_w denotes the set of paths connecting w. A nonnegative path flow vector x which satisfies the demand constraints is termed feasible. Moreover, we must have that

    f_a^i = Σ_p x_p^i δ_{ap},

where δ_{ap} = 1 if link a is contained in path p and δ_{ap} = 0 otherwise; that is, for each mode, the link load associated with a mode is equal to the sum of the path flows of that mode on paths that utilize that link.

A user traveling on path p using mode i incurs a user (or personal) travel cost C_p^i satisfying

    C_p^i = Σ_a c_a^i δ_{ap},

in other words, the cost on a path p due to mode i is equal to the sum of the link costs of links comprising that path and using that mode.

The traffic network equilibrium conditions are given below.

DEFINITION 1 (multimodal traffic network equilibrium) ([2], [3], [4]) A link load pattern f* satisfying the feasibility conditions is an equilibrium pattern, if, once established, no user has any incentive to alter his travel arrangements. This state is characterized by the following equilibrium conditions, which must hold for every mode i, every O/D pair w, and every path p ∈ P_w:

    C_p^i = λ_w^i  if x_p^{i*} > 0,
    C_p^i ≥ λ_w^i  if x_p^{i*} = 0,

where λ_w^i is the equilibrium travel disutility associated with the O/D pair w and mode i. □

We now define the feasible set K as

    K = {f : ∃ x ≥ 0 such that the demand constraints and the link load constraints hold}.

One can verify (see [3]) that the variational inequality governing the equilibrium conditions for this model would be given as in the subsequent theorem.

THEOREM 2 (variational inequality formulation) A vector f* ∈ K is an equilibrium pattern, if and only if, it satisfies the variational inequality problem

    ⟨c(f*), f − f*⟩ ≥ 0,  ∀f ∈ K. □

Note that this variational inequality is in link loads. One can also derive a variational inequality problem in path flows (see also [1], [4], [17]). Existence of an equilibrium f* follows from the standard theory of variational inequalities (cf. [14]) solely from the assumption that c is continuous, since the feasible set K is now compact.

In the special case where the symmetry condition

    ∂c_a^i/∂f_b^j = ∂c_b^j/∂f_a^i,  ∀i, j; a, b,

holds, then the variational inequality problem can be reformulated as the solution to an optimization problem. This symmetry assumption, however, is not expected to hold in most applications. Consequently, the variational inequality problem, which is the more general problem formulation, is needed. For example, the symmetry condition essentially says that the flow on link b due to mode j should affect the cost of mode i on link a in the same manner that the flow of mode i on link a affects the cost on link b and mode j. In the case of a single mode problem, the symmetry condition would imply that the cost on link a is affected by the flow on link b in the same manner as the cost on link b is affected by the flow on link a.

A Migration Network Equilibrium Model. Human migration is a topic that has been studied not only by economists, but also by demographers, sociologists, and geographers. Here a model of human migration is described, which is shown to have a simple, abstract network structure in which the links correspond to locations and the flows on the links to populations of a particular class at the particular location. Hence, the model is isomorphic to the traffic network equilibrium problem just described on a network with special structure. For additional details, see [16], [17], [18].

Assume a closed economy in which there are n locations, typically denoted by i, and J classes, typically denoted by k. Assume further that the attractiveness of any location i as perceived by class k is represented by a utility u_i^k. Let p̄^k denote the fixed and known population of class k in the economy, and let p_i^k denote the population of class k at location i. Group the utilities into a row vector u ∈ R^{Jn} and the populations into a column vector p ∈ R^{Jn}. Assume no births and no deaths in the economy.

The conservation of flow equation for each class k is given by

    p̄^k = Σ_{i=1}^n p_i^k,

where p_i^k ≥ 0, k = 1, ..., J; i = 1, ..., n. Let

    K = {p : p ≥ 0 and p satisfies the conservation of flow equations}.
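To make the conservation of flow constraint and the equilibrium notion concrete, the following is a minimal numerical sketch: a hypothetical single-class, two-location instance with linear, decreasing utilities. All numbers, the linear utility form, and the helper name migration_equilibrium are illustrative assumptions, not data from the article.

```python
# Hypothetical instance: one class, two locations, linearly decreasing
# utilities u_i(p_i) = a_i - b_i * p_i, total population pbar conserved.
# In an interior equilibrium both locations are populated, so the
# utilities equalize: a1 - b1*p1 = a2 - b2*(pbar - p1).

def migration_equilibrium(a1, b1, a2, b2, pbar):
    """Closed-form interior equilibrium for two locations, one class."""
    p1 = (a1 - a2 + b2 * pbar) / (b1 + b2)  # solve the equal-utility equation
    p1 = min(max(p1, 0.0), pbar)            # clip to the feasible set K
    return p1, pbar - p1

p1, p2 = migration_equilibrium(a1=10.0, b1=1.0, a2=8.0, b2=1.0, pbar=6.0)
u1, u2 = 10.0 - p1, 8.0 - p2
# p1 = 4, p2 = 2; both utilities equal 6, so no individual gains by moving,
# and p1 + p2 = pbar, i.e., the conservation of flow equation holds.
```

In this two-location case the clipping step also covers the boundary situation: if the equal-utility solution leaves K, all of the population concentrates in the more attractive location, matching the complementarity form of the equilibrium conditions.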

The conservation of flow equation expresses that the population of each class k must be conserved in the economy.

DEFINITION 3 (migration equilibrium) Assume that the migrants are rational and that migration will continue until no individual of any class has any incentive to move, since a unilateral decision will no longer yield an increase in the utility. Mathematically, hence, a multiclass population vector p* ∈ K is said to be in equilibrium if for each class k, k = 1, ..., J:

    u_i^k = λ^k  if p_i^{k*} > 0,
    u_i^k ≤ λ^k  if p_i^{k*} = 0. □

The equilibrium conditions express that for a given class k only those locations i with maximal utility will have a positive population volume of the class. Moreover, the utilities for a given class are equilibrated across the locations.

THEOREM 4 (variational inequality formulation) A population pattern p* ∈ K is in equilibrium, if and only if it satisfies the variational inequality problem:

    ⟨−u(p*), p − p*⟩ ≥ 0,  ∀p ∈ K. □

We now discuss the utility functions. Assume that, in general, the utility associated with a particular location as perceived by a particular class may depend upon the population associated with every class and every location, that is, assume that

    u = u(p).

Note that in allowing the utility to depend upon the populations of the classes, we are using populations as a proxy for amenities associated with a particular location. Such a utility function can also model the negative externalities associated with overpopulation, such as congestion, increased crime, competition for scarce resources, etc.

As illustrated in [17], the above migration model is equivalent to a network equilibrium model with a single origin/destination pair and fixed demands. Indeed, one can make the identification as follows. Construct a network consisting of two nodes, an origin node 0 and a destination node 1, and n links connecting the origin node to the destination node. Associate with each link i the J costs −u_i^1, ..., −u_i^J and link flows represented by p_i^1, ..., p_i^J. This model is, hence, equivalent to a multimodal traffic network equilibrium model with fixed demand for each mode, consisting of a single origin/destination pair, and J paths connecting the O/D pair. Note that one can make J copies of the network, in which case each ith network will correspond to class i with the cost functions on the links defined accordingly. This identification enables us to immediately write down the following:

[Fig. 1 depicts this network: an origin node 0 and a destination node 1 joined by n links, with link i carrying the costs −u_i^1, ..., −u_i^J, and with the class demands p̄^1 = Σ_{i=1}^n p_i^1, ..., p̄^J = Σ_{i=1}^n p_i^J.]
Fig. 1: Network equilibrium formulation of a multiclass migration equilibrium model.

Existence of an equilibrium then follows from the standard theory of variational inequalities, since the feasible set K is compact, assuming that the utility functions are continuous. Uniqueness of the equilibrium population pattern also follows from the standard theory provided that the −u function is strictly monotone. The interpretation of this monotonicity condition in the context of applications is that the condition implies that the utility associated with a given class and location is expected to be a decreasing function of the population of that class at that location.

See also: Spatial price equilibrium; Traffic network equilibrium; Oligopolistic market equilibrium; Walrasian price equilibrium; Financial equilibrium; Generalized monotonicity: Applications to variational inequalities and equilibrium problems; Minimum cost flow problem; Nonconvex network

flow problems; Network location: Covering problems; Maximum flow problem; Shortest path tree algorithms; Steiner tree problems; Survivable networks; Directed tree networks; Dynamic traffic networks; Auction algorithms; Piecewise linear network flow problems; Communication network assignment problem; Generalized networks; Evacuation networks; Network design problems; Stochastic network problems: Massively parallel solution.

References
[1] AASHTIANI, H.Z., AND MAGNANTI, T.L.: 'Equilibria on a congested transportation network', SIAM J. Alg. Discrete Meth. 2 (1981), 213-226.
[2] BECKMANN, M.J., MCGUIRE, C.B., AND WINSTEN, C.B.: Studies in the economics of transportation, Yale Univ. Press, 1956.
[3] DAFERMOS, S.: 'Traffic equilibrium and variational inequalities', Transport. Sci. 14 (1980), 43-54.
[4] DAFERMOS, S.: 'The general multimodal network equilibrium problem with elastic demand', Networks 14 (1982), 43-54.
[5] DAFERMOS, S.: 'Exchange price equilibria and variational inequalities', Math. Program. 46 (1990), 391-402.
[6] DAFERMOS, S., AND NAGURNEY, A.: 'Stability and sensitivity analysis for the general network equilibrium-travel choice model', in J. VOLMULLER AND R. HAMERSLAG (eds.): Proc. 9th Internat. Symp. Transportation and Traffic Theory, VNU Sci. Press, 1984, pp. 217-234.
[7] DAFERMOS, S., AND NAGURNEY, A.: 'Oligopolistic and competitive behavior of spatially separated markets', Regional Sci. and Urban Economics 17 (1987), 245-254.
[8] DAFERMOS, S., AND SPARROW, F.T.: 'The traffic assignment problem for a general network', J. Res. Nat. Bureau Standards 73B (1969), 91-118.
[9] ENKE, S.: 'Equilibrium among spatially separated markets: solution by electronic analogue', Econometrica 19 (1951), 40-47.
[10] FLORIAN, M., AND HEARN, D.: 'Network equilibrium models and algorithms', in M.O. BALL, T.L. MAGNANTI, C.L. MONMA, AND G.L. NEMHAUSER (eds.): Network Routing, Vol. 8 of Handbooks Oper. Res. and Management Sci., Elsevier, 1995, pp. 485-550.
[11] FLORIAN, M., AND LOS, M.: 'A new look at static spatial price equilibrium models', Regional Sci. and Urban Economics 12 (1982), 579-597.
[12] GABAY, D., AND MOULIN, H.: 'On the uniqueness and stability of Nash-equilibria in noncooperative games', in A. BENSOUSSAN, P. KLEINDORFER, AND C.S. TAPIERO (eds.): Applied Stochastic Control in Econometrics and Management Sci., North-Holland, 1980, pp. 271-294.
[13] HAURIE, A., AND MARCOTTE, P.: 'On the relationship between Nash-Cournot and Wardrop equilibria', Networks 15 (1985), 295-308.
[14] KINDERLEHRER, D., AND STAMPACCHIA, G.: An introduction to variational inequalities and their applications, Acad. Press, 1980.
[15] NAGURNEY, A.: 'Computational comparisons of spatial price equilibrium methods', J. Reg. Sci. 27 (1987), 55-76.
[16] NAGURNEY, A.: 'Migration equilibrium and variational inequalities', Economics Lett. 31 (1989), 109-112.
[17] NAGURNEY, A.: Network economics: A variational inequality approach, second ed., Kluwer Acad. Publ., 1999.
[18] NAGURNEY, A., PAN, J., AND ZHAO, L.: 'Human migration networks', Europ. J. Oper. Res. (1991).
[19] PATRIKSSON, M.: The traffic assignment problem, VSP, 1994.
[20] SAMUELSON, P.A.: 'A spatial price equilibrium and linear programming', Amer. Economic Rev. 42 (1952), 283-303.
[21] SHEFFI, Y.: Urban transportation networks, Prentice-Hall, 1985.
[22] SMITH, M.J.: 'The existence, uniqueness, and stability of traffic equilibria', Transport. Res. 13B (1979), 259-304.
[23] WARDROP, J.G.: 'Some theoretical aspects of road traffic research', Proc. Inst. Civil Engineers II (1952), 325-378.

Anna Nagurney
Univ. Massachusetts
Amherst, Massachusetts 01003, USA
E-mail address: nagurney@gbfin.umass.edu
MSC 2000: 90C30
Key words and phrases: traffic network equilibrium, spatial price equilibrium, migration equilibrium, multimodal networks, multiclass migration.


EQUIVALENCE BETWEEN NONLINEAR COMPLEMENTARITY PROBLEM AND FIXED POINT PROBLEM

Complementarity theory is a new domain of applied mathematics strongly related to Linear Analysis, Nonlinear Analysis, Topology, Variational Inequalities Theory, Ordered Topological Vector Spaces, Numerical Analysis etc. The main goal in this theory is the study of complementarity problems. It is well known that complementarity problems encompass a variety of practical problems arising in: Optimization, Structural Mechan-

ics, Elasticity, Economics etc. [8]. The relation between the general nonlinear complementarity problem and the fixed point problem seems to be remarkable. The main aim of this article is the study of this relation.

Preliminaries. Let E, E* be a pair of real locally convex spaces. The space E* can be the topological dual of E. Let ⟨·,·⟩ be a bilinear form on E × E* satisfying the separation axioms:

s1) ⟨x_0, y⟩ = 0 for all y ∈ E* implies x_0 = 0;
s2) ⟨x, y_0⟩ = 0 for all x ∈ E implies y_0 = 0.

The triplet (E, E*, ⟨·,·⟩) is called a dual system or a duality (denoted by ⟨E, E*⟩). In practical problems, the space E can be a Banach space and E* its topological dual, with ⟨x, y⟩ = y(x) for all x ∈ E and y ∈ E*. When E is a Hilbert space (H, ⟨·,·⟩) or the Euclidean space (R^n, ⟨·,·⟩) we have that H* (respectively, (R^n)*) is isomorphic to H (respectively, to R^n). Let ⟨E, E*⟩ be a dual system of locally convex spaces. Denote by K a pointed convex cone in E, i.e., a subset of E satisfying the following properties:

1) K + K ⊆ K;
2) λK ⊆ K for all λ ∈ R_+ (the set of nonnegative real numbers); and
3) K ∩ (−K) = {0}.

The closed convex cone

    K* = {y ∈ E* : ⟨x, y⟩ ≥ 0 for all x ∈ K}

is called the dual of K. The polar of K is K° = −K*. Given the pointed convex cone K ⊆ E we denote by ≤ the ordering defined on E by K, i.e., x ≤ y if and only if y − x ∈ K. In some situations, E is a vector lattice with respect to this ordering, i.e., for every pair x, y ∈ E there exist inf(x, y) (denoted by x ∧ y) and sup(x, y) (denoted by x ∨ y). We say that the bilinear form ⟨·,·⟩ is K-local if ⟨x, y⟩ = 0 whenever x, y ∈ K and x ∧ y = 0.

Let (H, ⟨·,·⟩) be a Hilbert space and K ⊆ H a closed pointed convex cone. It is known that the projection operator onto K, denoted by P_K, is well defined [20] and for every x ∈ H, P_K(x) is the unique element of K satisfying ||x − P_K(x)|| = min_{y ∈ K} ||x − y||.

THEOREM 1 For every x ∈ H, P_K(x) is characterized by the following properties:
1) ⟨P_K(x) − x, y⟩ ≥ 0 for all y ∈ K;
2) ⟨P_K(x) − x, P_K(x)⟩ = 0. □

PROOF. A proof of this theorem is in [20]. □

Also very useful is the following classical theorem of Moreau:

THEOREM 2 If K ⊆ H is a closed convex cone and x, y, z ∈ H, then the following statements are equivalent:
i) z = x + y, x ∈ K, y ∈ K° and ⟨x, y⟩ = 0;
ii) x = P_K(z) and y = P_{K°}(z). □

PROOF. For the proof the reader is referred to [16]. □

We say that the closed pointed convex cone K ⊆ H is an isotone projection cone if and only if, for every x, y ∈ H such that y − x ∈ K, we have P_K(y) − P_K(x) ∈ K. This remarkable class of cones has been studied in several papers (see for example [13]). We say that a closed pointed convex cone K ⊆ H is a Galerkin cone if there exists a family of convex subcones {K_n}_{n∈N} of K such that:

1) K_n is a locally compact cone, for every n ∈ N;
2) if n ≤ m, then K_n ⊆ K_m;
3) K = ∪_{n∈N} K_n.

We denote a Galerkin cone by K(K_n)_{n∈N}. For more information about the application of Galerkin cones in complementarity theory, we indicate the papers [7], [8], [10], [11], [12], [13] and [14].

Nonlinear Complementarity Problem. Let (E, E*) be a dual system of locally convex spaces and K ⊆ E a pointed convex cone. Given the mapping f : K → E*, the nonlinear complementarity problem associated to f and K is:

    NLCP(f, K): find x_0 ∈ K s.t. f(x_0) ∈ K* and ⟨x_0, f(x_0)⟩ = 0.
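In the simplest concrete setting E = E* = R^n with K = R^n_+ (so that K* = R^n_+ as well), NLCP(f, K) can be solved numerically for a small affine map f(x) = Mx + q by repeatedly projecting the step x − γ f(x) back onto the cone, a damped variant of the mapping x ↦ P_K(x − f(x)). The sketch below is illustrative only: M, q, and the step size γ are assumed data, and convergence of this iteration requires additional hypotheses on f (e.g., strong monotonicity together with a suitably small γ).

```python
# Solve NLCP(f, K) for f(x) = M x + q on K = R^n_+ by the projection
# iteration x <- P_K(x - gamma * f(x)).

def f(x, M, q):
    return [sum(M[i][j] * x[j] for j in range(len(x))) + q[i]
            for i in range(len(x))]

def project(x):
    # P_K for K = R^n_+: componentwise positive part
    return [max(xi, 0.0) for xi in x]

def solve_ncp(M, q, gamma=0.1, iters=2000):
    x = [0.0] * len(q)
    for _ in range(iters):
        fx = f(x, M, q)
        x = project([xi - gamma * fi for xi, fi in zip(x, fx)])
    return x

M = [[2.0, 1.0], [1.0, 2.0]]   # symmetric positive definite, so f is monotone
q = [-2.0, 1.0]
x = solve_ncp(M, q)
fx = f(x, M, q)
# Solution x = (1, 0): x >= 0, f(x) = (0, 2) >= 0, and <x, f(x)> = 0.
```

At the computed point all three defining requirements of the problem hold: x lies in K, f(x) lies in K*, and the pairing ⟨x, f(x)⟩ vanishes, with the positive component of x matched by a zero component of f(x) and vice versa.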

Given two mappings f : K → E* and g : K → E, the implicit complementarity problem is:

    ICP(f, g, K): find x_0 ∈ K s.t. g(x_0) ∈ K, f(x_0) ∈ K* and ⟨g(x_0), f(x_0)⟩ = 0.

The problem NLCP(f, K) is important in optimization, economics, mechanics, engineering, game theory, etc. [8]. The problem ICP(f, g, K) was defined in relation with the study of some problems in stochastic optimal control [8]. The problems NLCP(f, K), ICP(f, g, K) can be solvable or unsolvable.

Solvability By Fixed Point Theorems. Given a topological space X and a mapping f : X → X, the fixed point problem is to know under what conditions there exists a point x* ∈ X such that f(x*) = x*. This problem is studied in Fixed Point Theory, which is a very popular domain in Nonlinear Analysis. In particular, Fixed Point Theory has been used by several authors in the study of solvability of the problem NLCP(f, K). The results obtained in this sense are based on some equivalences between NLCP(f, K) and the fixed point problem. Let (H, ⟨·,·⟩) be a Hilbert space, K ⊆ H a pointed closed convex cone and f : K → H a mapping.

THEOREM 3 The element x* ∈ K is a solution of the problem NLCP(f, K) if and only if x* is a fixed point in K of the mapping T(x) = P_K(x − f(x)). □

PROOF. Suppose that x* ∈ K is a solution of the problem NLCP(f, K). We can show that x* satisfies properties 1), 2) of Theorem 1 for x = x* − f(x*). Conversely, if x* ∈ K and x* = P_K(x* − f(x*)), then since P_K(x* − f(x*)) satisfies properties 1), 2) of Theorem 1 we deduce that x* is a solution of the problem NLCP(f, K). □

THEOREM 4 The problem NLCP(f, K) has a solution if and only if the mapping Φ(x) = P_K(x) − f(P_K(x)), defined for every x ∈ H, has a fixed point in H. Moreover, if x_0 is a fixed point of Φ, then x* = P_K(x_0) is a solution of the problem NLCP(f, K). □

PROOF. Suppose that x_0 is a fixed point for the mapping Φ, i.e.,

    x_0 = P_K(x_0) − f(P_K(x_0)).

If we denote x* = P_K(x_0), we have that x* ∈ K and x_0 = x* − f(x*), or x* − x_0 = f(x*). Applying Theorem 1 we can show that f(x*) ∈ K* and ⟨x*, f(x*)⟩ = 0, i.e., x* is a solution of the problem NLCP(f, K).

Conversely, if x* ∈ K is a solution of the problem NLCP(f, K), then denoting x_0 = x* − f(x*) and applying Theorem 2 we deduce that P_K(x_0) = x* and finally

    x_0 = P_K(x_0) − f(P_K(x_0)),

i.e., x_0 is a fixed point of Φ. □

The mapping Φ defined in Theorem 4 was applied in complementarity theory in 1988, [7], while the mapping Ψ(x) = x − Φ(x) was used in 1992 [19]. The mapping Ψ is known as the normal map. By Theorem 3 the NLCP(f, K) is transformed into a fixed point problem for the mapping T with respect to the cone K, while by Theorem 4 the problem NLCP(f, K) is transformed into a fixed point problem with respect to the whole space H. Several existence results for the problem NLCP(f, K) have been obtained by several authors using fixed point theory and the mappings T and Φ, [6], [7], [8], [10], [13], [3]. The fixed point problem associated to the mappings T and Φ has also been used in several iterative methods for solving numerically the problem NLCP(f, K) [1], [8], [13], [17], [18] etc.

In [15] and also in [2] it is shown that the problem NLCP(f, K) is equivalent to the following variational inequality:

    VI(f, K): find x ∈ K s.t. ⟨f(x), y − x⟩ ≥ 0 for all y ∈ K.

Because fixed point theory is systematically applied to the study of variational inequalities, we have in this way another possibility to use fixed point theory in the study of the problem NLCP(f, K). In this sense are relevant the results obtained in [5], [7], [8], [12] and in many other papers dedicated to the study of variational inequal-

ities. In the study of some economical problems, we are interested in finding a solution of the problem NLCP(f, K) which is also the least element of the feasible set

    F = {x ∈ K : f(x) ∈ K*}.

This particular problem can also be studied by fixed point theory [5], [8]. If the cone K is an isotone projection cone in a Hilbert space H and if the mapping f : H → H satisfies some properties with respect to the ordering defined by K, we obtain that the mappings T and Φ are monotone increasing or the difference of two monotone increasing mappings. In this case, we can apply some fixed point theorems based on the ordering to the study of the problem NLCP(f, K). Several results in this sense are presented in [13].

The Nonlinear Complementarity Problem As a Mathematical Tool In Fixed Point Theory. The fixed point theorems on cones have attracted the attention of many mathematicians. The applications of such kind of fixed point theorems are very important. We will show now how the problem NLCP(f, K) can be used to obtain new fixed point theorems on cones.

Let H be a Hilbert space, K ⊆ H a closed pointed convex cone and h : K → K a mapping. The fixed point problem associated to h and K is:

    FP(h, K): find x_0 ∈ K s.t. h(x_0) = x_0.

Consider the mapping f : K → H defined by f(x) = x − h(x) for all x ∈ K.

THEOREM 5 The problems NLCP(f, K) and FP(h, K) are equivalent. □

PROOF. Suppose that x* is a solution of the problem FP(h, K). In this case we have h(x*) = x*, which implies that f(x*) = 0. It is evident that x* is a solution of the problem NLCP(f, K).

Conversely, if x* is a solution of the problem NLCP(f, K) we have that x* is a solution of the problem VI(f, K), i.e., x* ∈ K and ⟨f(x*), y − x*⟩ ≥ 0 for all y ∈ K. But f(x*) = x* − h(x*) and h(x*) ∈ K (by hypothesis). This means that

    0 ≤ ⟨x* − h(x*), x* − h(x*)⟩ ≤ 0,

which implies that h(x*) = x*. □

We note that Theorem 5 was applied to obtain new fixed point theorems [7], [10], [11]. We cite only the following two fixed point theorems.

THEOREM 6 Let (H, ⟨·,·⟩) be a Hilbert space ordered by a Galerkin cone K(K_n)_{n∈N}. Let T : K → K be a mapping satisfying the following assumptions:
1) T(0) ≠ 0;
2) T is a (ws)-compact operator;
3) T is φ-asymptotically bounded, with lim_{t→∞} φ(t) ≠ +∞.

Then T has a fixed point x* ∈ K \ {0}. Moreover, x* is the limit of a sequence {x_m}_{m∈N} where for every m ∈ N, x_m is a solution of the problem NLCP(T, K_m). □

PROOF. The terminology and the proof are in [7]. □

Recently, a new proof for this theorem was proposed in [14].

THEOREM 7 Let (H, ⟨·,·⟩) be a Hilbert space ordered by a Galerkin cone K(K_n)_{n∈N} ⊆ H. Suppose given two continuous operators S, T : K → H such that S is bounded, T is compact and (S + T)(K) ⊆ K. If the following assumptions are satisfied:
1) I − S satisfies condition (S)_+;
2) I − S − T satisfies condition (GM),
then S + T has a fixed point in K. □

PROOF. The terminology and the proof are in [11]. □

We note that Theorem 7 has several interesting corollaries. In [10] the reader can find other fixed point theorems for set-valued operators.

Conclusions. This interesting double relation between the nonlinear complementarity problem and fixed point theory can be exploited to obtain new results in complementarity theory and also in fixed point theory.

See also: Principal pivoting methods for linear complementarity problems; Linear complementarity problem; Convex-simplex algorithm; Sequential simplex method; Parametric linear programming: Cost sim-

26
Estimating d a t a / o r multicriteria decision making problems: Optimization techniques

plex algorithm; Linear programming; Lemke C.R. Acad. Sci. Paris 225 (1962), 238-240.
method; Integer linear complementary [17] NOOR, M.A.: 'Fixed point approach for complementar-
problem; LCP: Pardalos-Rosen mixed in- ity problems', J. Math. Anal. Appl. 133 (1988), 437-
448.
teger formulation; Order complementar-
[181 NooR, M.A.: 'Iterative methods for a class of comple-
ity; Generalized nonlinear complementarity mentarity problems', J. Math. Anal. Appl. 133 (1988),
problem; Topological methods in comple- 366-382.
mentarity theory. [19] ROBINSON, S.M.: 'Normal maps induced by linear
transformations', Math. Oper. Res. 17, no. 3 (1992),
691-714.
References [20] ZARANTONELLO, E.H.: 'Projection on convex sets in
[1] AHN, B.H.: 'Solution of nonsymmetric linear comple- Hilbert space and spectral theory', in E.H. ZARAN-
mentarity problems by iterative methods', J. Optim. TONELLO (ed.): Contributions to Nonlinear Functional
Th. Appl. 33, no. 2 (1981), 175-185. Analysis, Acad. Press, 1971, pp. 237-424.
[2] COTTLE, R.W.: Complementarity and variational
problems, Vol. 19, Amer. Math. Soc., 1976, pp. 177- George Isac
208. Royal Military College of Canada
[3] HYERS, D.H., ISAC, G., AND RASSIAS, T.M.: Topics in Kingston, Ontario, Canada
non-linear analysis and applications, World Sci., 1997. E-mail address: isac-gCrmc, ca
[4] ISAC, G.: 'On the implicit complementarity problem
MSC 2000:90C33
in Hilbert spaces', Bull. Austral. Math. Soc. 32, no. 2
Key words and phrases: nonlinear complementarity prob-
(1985), 251-260.
lem, fixed point problem.
[5] ISAC, G.: 'Complementarity problem and coincidence
equations o convex cones', Boll. Unione Mat. Ital. Set.
B 6 (1986), 925-943.
[6] ISAC, G.: 'Fixed point theory and complementarity ESTIMATING DATA FOR MULTICRITERIA
problems in Hilbert spaces', Bull. Austral. Math. Soc. DECISION MAKING PROBLEMS: OPTIMI-
36, no. 2 (1987), 295-310.
[7] ISAC, G." 'Fixed point theory, coincidence equations on"
ZATION TECHNIQUES
convex cones and complementarity problem', Contemp. One of the most crucial steps in many multicrite-
Math. 72 (1988), 139-155. ria decision making methods (MCDM) is the ac-
[8] ISAC, G.: Complementarity problems, Vol. 1528 of Lec- curate estimation of the pertinent data [18]. Very
ture Notes Math., Springer, 1992.
often these data cannot be known in terms of ab-
[9] ISAC, G.: 'Tihonov's regularization and the comple-
solute values. For instance, what is the worth of
mentarity problem in Hilbert spaces', J. Math. Anal.
Appl. 174, no. 1 (1993), 53-66. the ith alternative in terms of a political impact
[10] ISAC, G.: 'Fixed point theorems on convex cones, gen- criterion? Although information about questions
eralized pseudo-contractive mappings and the comple- like the previous one is vital in making the cor-
mentarity problem', Bull. Inst. Math. Acad. Sinica 23, rect decision, it is very difficult, if not impossible,
no. 1 (1995), 21-35.
to quantify it correctly. Therefore, many decision
[11] ISAC, G.: 'On an Altman type fixed point theorem
on convex cones', Rocky Mountain J. Math. 25, no. 2 making methods attempt to determine the rela-
(1995), 701-714. tive importance, or weight, of the alternatives in
[12] ISAC, G., AND GOELEVEN, D.: 'Existence theorems terms of each criterion involved in a given decision
for the implicit complementarity problem', Internat. J. making problem.
Math. and Math. Sci. 16, no. 1 (1993), 67-74.
Consider the case of having a single decision
[13] ISAC, G., AND NEMI~TH, A.B" 'Projection meth-
ods, isotone projection cones and the complementar-
criterion and a set of n alternatives, denoted as
ity problem', J. Math. Anal. Appl. 153, no. 1 (1990), Ai (for i = 1 , . . . , n ) . The decision maker wants
258-275. to determine the relative performance of these al-
[14] JACHYMSKI, J.: 'On Isac's fixed point theorem for self- ternatives in terms of a single criterion. An ap-
maps of a Galerkin cone', Ann. Sci. Math. Qudbec 18, proach based on pairwise comparisons which was
no. 2 (1994), 169-171.
[15] KARAMARDIAN, S.: 'Generalized complementarity proposed by T.L. Saaty [Ii], and [12] has long at-
problem', J. Optim. Th. Appl. 8 (1971), 161-168. tracted the interest of many researchers, because
[16] MOREAU, J." 'D~composition orthogonale d'un espace both of its easy applicability and interesting math-
hilbertien selon deux cones mutuellement polaires', ematical properties. Pairwise comparisons are used

to determine the relative importance of each alternative in terms of each criterion.

In that approach the decision maker has to express his/her opinion about the value of one single pairwise comparison at a time. Usually, the decision maker has to choose his/her answer among 10-17 discrete choices. Each choice is a linguistic phrase. Some examples of such linguistic phrases when two concepts, A and B, are considered might be: 'A is more important than B', or 'A is of the same importance as B', or 'A is a little more important than B', and so on. When one focuses directly on the data elicitation issue, one may use linguistic statements such as 'How much more does alternative A belong to the set S than alternative B?'

The main problem with the pairwise comparisons is how to quantify the linguistic choices selected by the decision maker during the evaluation of the pairwise comparisons. All the methods which use the pairwise comparisons approach eventually express the qualitative answers of a decision maker as some numbers.

Pairwise comparisons are quantified by using a scale. Such a scale is nothing but a one-to-one mapping between the set of discrete linguistic choices available to the decision maker and a discrete set of numbers which represent the importance, or weight, of the previous linguistic choices. There are two major approaches in developing such scales. The first approach is based on the linear scale proposed by Saaty [12] as part of the analytic hierarchy process (AHP). The second approach was proposed by F. Lootsma [8], [9], [10] and determines exponential scales. Both approaches depart from some psychological theories and develop the numbers to be used based on these psychological theories. For an extensive study of the scale issue, see [18] and [19].

In this article we examine three problems related to the use of pairwise comparisons for data elicitation in MCDM. The first problem is how to combine the n(n − 1)/2 comparisons needed to compare n entities (alternatives or criteria) under a given goal and extract their relative preferences. This subject was extensively studied in [21] and it is briefly discussed in the second section. The second problem in this article is how to estimate missing comparisons. The third problem is how to select the order for eliciting the comparisons and determine whether all comparisons are needed. These problems are examined in detail in the following sections.

Extraction of Relative Priorities from Complete Pairwise Matrices. Let A1, ..., An be n alternatives (or criteria or, in general, concepts) to be compared. We are interested in evaluating the relative preference values of the above concepts. Saaty [11], [12], [14] proposed to use a matrix A of rational numbers taken from the set {1/9, 1/8, 1/7, ..., 1, ..., 9}. Each entry of the above matrix A represents a pairwise judgment. Specifically, the entry aij denotes the number that estimates the relative preference of element Ai when it is compared with element Aj. Obviously, aij = 1/aji and aii = 1. That is, the matrix is reciprocal.

The Eigenvalue Approach. Let us first examine the case in which it is possible to have perfect values aij. In this case it is aij = Wi/Wj (Ws denotes the actual value of element s) and the previous reciprocal matrix A is consistent. That is:

    aij = aik × akj   for i, j, k = 1, ..., n,   (1)

where n is the number of elements in the comparison set. It can be proved [12] that the matrix A has rank 1, with n being its nonzero eigenvalue. Thus, we have:

    Ax = nx,   (2)

where x is an eigenvector. From the fact that aij = Wi/Wj, the following are obtained:

    Σ_{j=1}^{n} aij Wj = Σ_{j=1}^{n} Wi = nWi,   i = 1, ..., n,   (3)

or

    AW = nW.   (4)

Equation (4) states that n is an eigenvalue of A with W being a corresponding eigenvector. The same equation also states that in the perfectly consistent case (i.e., when aij = aik × akj for all possible triplets), the vector W, with the relative preferences of the elements A1, ..., An, is the principal right eigenvector (after normalization) of A.
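In the consistent case these relations are easy to check numerically. The short sketch below (plain Python; the helper name consistent_matrix and the example weights are ours, chosen for illustration only) builds A from a weight vector W and verifies (1) and (4):

```python
def consistent_matrix(w):
    """Build the perfectly consistent pairwise matrix a_ij = W_i / W_j."""
    return [[wi / wj for wj in w] for wi in w]

w = [0.1, 0.2, 0.3, 0.4]          # illustrative relative preferences, sum 1
A = consistent_matrix(w)
n = len(w)

# Relation (1): a_ij = a_ik * a_kj for every triplet (i, j, k).
assert all(abs(A[i][j] - A[i][k] * A[k][j]) < 1e-9
           for i in range(n) for j in range(n) for k in range(n))

# Relation (4): A W = n W, i.e. W is an eigenvector of A with eigenvalue n.
Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
assert all(abs(Aw[i] - n * w[i]) < 1e-9 for i in range(n))
```

Note that any rescaling of w produces exactly the same matrix A, which is why only relative (not absolute) preferences can be recovered from pairwise judgments.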
In the nonconsistent case (which is the most common) the pairwise comparisons are not perfect; that is, the entry aij might deviate from the real ratio Wi/Wj (i.e., from the ratio of the real relative preference values Wi and Wj). In this case the previous expression (1) does not hold for all possible combinations. Now the new matrix A can be considered as a perturbation of the previous consistent case. When the entries aij change slightly, then the eigenvalues change in a similar fashion [12]. Moreover, the maximum eigenvalue is close to n (actually greater than n) while the remaining eigenvalues are close to zero. Thus, in order to find the relative preferences in the nonconsistent cases, one should find the eigenvector that corresponds to the maximum eigenvalue λmax. That is to say, one should find the principal right eigenvector W that satisfies:

    AW = λmax W,   where λmax ≥ n.

Saaty estimates the principal right eigenvector W by multiplying the entries in each row of A together and taking the nth root (n being the number of the elements in the comparison set). Since we desire to have values that add up to 1, we normalize the previously found vector by the sum of the above values. If we want the element with the highest value to have a relative preference value equal to 1, we divide the previously found vector by the highest value.

Under the assumption of total consistency, if the judgments are gamma distributed (something that Saaty claims to be the case), the principal right eigenvector of the resultant reciprocal matrix A is Dirichlet distributed. If the assumption of total consistency is relaxed, then L.G. Vargas [23] proved that the hypothesis that the principal right eigenvector follows a Dirichlet distribution is accepted if the consistency ratio is 10% or less.

The consistency ratio (CR) is obtained by first estimating λmax. Saaty estimates λmax by adding the columns of matrix A and then multiplying the resulting vector with the vector W. Then he uses what he calls the consistency index (CI) of the matrix A. He defined CI as follows:

    CI = (λmax − n) / (n − 1).

Then, the consistency ratio CR is obtained by dividing the CI by the random consistency index (RCI), as given in Table 1. Each RCI is an average consistency index derived from a sample of size 500 of randomly generated reciprocal matrices with entries from the set {1/9, 1/8, 1/7, ..., 1, ..., 9}. The resulting CR is examined to see whether it is 10% or less. If the previous approach yields a CR greater than 10%, then a reexamination of the pairwise judgments is recommended until a CR less than or equal to 10% is achieved.
n      1     2     3      4      5      6      7      8      9
RCI    0     0     0.58   0.90   1.12   1.24   1.32   1.41   1.45

Table 1: RCI values for sets of different order n [12].
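The whole procedure can be sketched in a few lines of plain Python (the function names are ours; priorities implements Saaty's row geometric mean approximation of W, and the λmax estimate uses his column-sum rule). Run on the 4×4 matrix of Example 1 below, it gives λmax ≈ 4.226 and CR ≈ 0.084, i.e. an acceptable matrix:

```python
from math import prod

# RCI values from Table 1 [12]
RCI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
       6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def priorities(A):
    """Saaty's approximation of W: nth root of each row product, normalized."""
    n = len(A)
    w = [prod(row) ** (1.0 / n) for row in A]
    s = sum(w)
    return [x / s for x in w]

def consistency_ratio(A):
    """Estimate lambda_max (column sums times W), then compute CI and CR."""
    n = len(A)
    w = priorities(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    lam = sum(c * x for c, x in zip(col_sums, w))
    ci = (lam - n) / (n - 1)
    cr = ci / RCI[n] if RCI[n] else 0.0
    return w, lam, ci, cr

# The judgment matrix of Example 1 below.
A = [[1, 2, 1/5, 1/9],
     [1/2, 1, 1/8, 1/9],
     [5, 8, 1, 1/4],
     [9, 9, 4, 1]]
w, lam, ci, cr = consistency_ratio(A)
```

If CR came out above 0.10, the judgments aij would be returned to the decision maker for reexamination.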

Optimization Approaches. A.T.W. Chu, R.E. Kalaba and K. Spingarn [2] claimed that given the data aij, the values Wi to be estimated are desired to have the property:

    aij ≈ Wi/Wj.   (5)

This is reasonable since aij is meant to be the estimation of the ratio Wi/Wj. Then, in order to get the estimates for the Wi given the data aij, they proposed the following constrained optimization problem:

    min S = Σ_{i=1}^{n} Σ_{j=1}^{n} (aij Wj − Wi)²
    s.t. Σ_{i=1}^{n} Wi = 1,   (6)
         Wi > 0 for i = 1, ..., n.

They also provide an alternative expression S1 that is more difficult to solve numerically. That is,

    S1 = Σ_{i=1}^{n} Σ_{j=1}^{n} (aij − Wi/Wj)².   (7)

In [3] a variation of the above least squares formulation is proposed. For the case of only one decision maker it recommends the following models:

    log aij = log Wi − log Wj + φ1(Wi, Wj) εij,   (8)

    aij = Wi/Wj + φ2(Wi, Wj) εij,   (9)

where Wi and Wj are the true (and hence unknown) relative preferences; φ1(X, Z) and φ2(X, Z) are given positive functions (where X, Z > 0). The random errors εij are assumed independent with zero mean and unit variance. Using these two assumptions one is able to calculate the variance of each individual estimated relative preference. However, it fails to give a way of selecting the appropriate positive functions. In the second example, presented later, a sample problem which originates in [11] and later in [3] is solved for different functions φ1, φ2 using this method.

Considering the Human Rationality Factor. According to the human rationality assumption [21] the decision maker is a rational person. Rational persons are defined here as individuals who try to minimize their regret [15], to minimize losses, or to maximize profit [24]. In the relative preference evaluation problem, minimization of regret or losses, or maximization of profit, could be interpreted as the effort of the decision maker to minimize the errors involved in the pairwise comparisons.

As it is stated in previous paragraphs, in the inconsistent case the entry aij of the matrix A is an estimation of the real ratio Wi/Wj. Since it is an estimation, the following is true:

    aij = (Wi/Wj) dij,   i, j = 1, ..., n.   (10)

In the above relation dij denotes the deviation of aij from being an accurate judgment. Obviously, if dij = 1, then the aij was perfectly estimated. From the previous formulation we conclude that the errors involved in these pairwise comparisons are given by:

    εij = dij − 1.00,

or, after using (10) above:

    εij = aij (Wj/Wi) − 1.00.   (11)

When a comparison set contains n elements, then Saaty's method requires the estimation of the following n(n − 1)/2 pairwise comparisons:

    W2/W1, W3/W1, ..., Wn/W1, W3/W2, ..., Wn/W2, ..., Wn/Wn−1.   (12)

The corresponding n(n − 1)/2 errors are (after using relations (11) and (12)):

    εij = aij (Wj/Wi) − 1.00,   i, j = 1, ..., n, and j > i.   (13)

Since the Wi are relative preferences that add up to 1, the following relation (14) should also be satisfied:

    Σ_{i=1}^{n} Wi = 1.00.   (14)

Apparently, since the Wi represent relative preferences, we also have:

    Wi > 0,   i = 1, ..., n.   (15)

Relations (13) and (14), when the data are consistent (i.e., all the errors are equal to zero), can be written as follows:

    BW = b.   (16)

The vector b has zero entries everywhere except the last one, which is equal to 1. The matrix B has one row for each pair (i, j) with j > i, with the entry −1 in column i and the entry ai,j in column j, plus a final row of all ones (blank entries represent zeros):

           1     2      3     ...    n
         [ -1   a1,2                    ]
         [ -1          a1,3             ]
         [  :                           ]
         [ -1                     a1,n  ]
    B =  [       -1    a2,3             ]
         [  :                           ]
         [       -1               a2,n  ]
         [  :                           ]
         [              ...  -1  an-1,n ]
         [  1    1      1    ...    1   ]

The error minimization issue is interpreted in many cases (regression analysis, the linear least squares problem) as the minimization of the sum of squares of the residual vector r = b − BW [16]. In terms of formulation (16) this means that in a real life situation (i.e., when the errors are not zero any more) the real intention of the decision maker is to minimize the expression:

    f2(W) = ‖b − BW‖²,   (17)

which, apparently, expresses a typical linear least squares problem.

If we use the notation described previously, then the quantity (6) which is minimized in [2] becomes:

    S = Σ_{i=1}^{n} Σ_{j=1}^{n} (aij Wj − Wi)² = Σ_{i=1}^{n} Σ_{j=1}^{n} (εij Wi)²,

and the alternative expression (7) becomes:

    S1 = Σ_{i=1}^{n} Σ_{j=1}^{n} (aij − Wi/Wj)² = Σ_{i=1}^{n} Σ_{j=1}^{n} (εij Wi/Wj)².

Clearly, both expressions are too complicated to reflect, in a reasonable way, the intentions of the decision maker.

The models proposed in [3] are closer to the one developed under the human rationality assumption. The only difference is that instead of the relations:

    log aij = log Wi − log Wj + φ1(Wi, Wj) εij

and

    aij = Wi/Wj + φ2(Wi, Wj) εij,

the following simpler expression is used:

    aij = (Wi/Wj) dij,   (18)

or

    aij = (Wi/Wj) × (εij + 1.00).

However, as the second example illustrates, the performance of this method is greatly dependent on the selection of the φ1(X, Z) or φ2(X, Z) functions. Now, however, these functions are further modified by (17).

EXAMPLE 1 Let us assume that the following is the matrix with the pairwise comparisons for a set of four elements:

        [ 1     2     1/5   1/9 ]
    A = [ 1/2   1     1/8   1/9 ]
        [ 5     8     1     1/4 ]
        [ 9     9     4     1   ]

Using the methods presented in the previous sections we can see that:

    λmax = 4.226;
    CI = (4.226 − 4) / (4 − 1) = 0.0753;
    CR = CI / 0.90 = 0.0837 < 0.10.

The formulation (16) that corresponds to this example is as follows:

    [ -1    2     0     0   ]          [ 0 ]
    [ -1    0    1/5    0   ]   [W1]   [ 0 ]
    [ -1    0     0    1/9  ]   [W2]   [ 0 ]
    [  0   -1    1/8    0   ] × [W3] = [ 0 ]
    [  0   -1     0    1/9  ]   [W4]   [ 0 ]
    [  0    0    -1    1/4  ]          [ 0 ]
    [  1    1     1     1   ]          [ 1 ]

The vector V that solves the above least squares problem is calculated to be:

    V = (0.065841, 0.039398, 0.186926, 0.704808).

Hence, the sum of squares of the residual vector components is 0.003030. The average squared residual for this problem is 0.003030/((4(4 − 1)/2) + 1) = 0.000433; that is, the average residual is √0.000433 = 0.020806. □
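The least squares problem (16)-(17) is small enough to solve directly. The sketch below (plain Python; build_system and lstsq are our helper names) forms B and b as described above and solves the normal equations BᵀBW = Bᵀb by Gaussian elimination; on the matrix of Example 1 it reproduces the published vector V and the residual sum of squares 0.003030 to the printed precision:

```python
def build_system(A):
    """Matrix B and vector b of (16): one row -W_i + a_ij W_j = 0 per pair
    i < j, plus a final row enforcing W_1 + ... + W_n = 1."""
    n = len(A)
    B, b = [], []
    for i in range(n):
        for j in range(i + 1, n):
            row = [0.0] * n
            row[i], row[j] = -1.0, A[i][j]
            B.append(row)
            b.append(0.0)
    B.append([1.0] * n)
    b.append(1.0)
    return B, b

def lstsq(B, b):
    """Least squares solution of B W = b via the normal equations,
    solved by Gaussian elimination with partial pivoting."""
    n = len(B[0])
    M = [[sum(r[i] * r[j] for r in B) for j in range(n)] for i in range(n)]
    v = [sum(r[i] * bk for r, bk in zip(B, b)) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p], v[c], v[p] = M[p], M[c], v[p], v[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
            v[r] -= f * v[c]
    W = [0.0] * n
    for r in range(n - 1, -1, -1):
        W[r] = (v[r] - sum(M[r][k] * W[k] for k in range(r + 1, n))) / M[r][r]
    return W

A = [[1, 2, 1/5, 1/9], [1/2, 1, 1/8, 1/9], [5, 8, 1, 1/4], [9, 9, 4, 1]]
B, b = build_system(A)
V = lstsq(B, b)
# Sum of squares of the residual vector b - B V.
ss = sum((sum(r[i] * V[i] for i in range(len(V))) - bk) ** 2
         for r, bk in zip(B, b))
```

Note that the positivity constraints (15) are inactive at this solution, so the unconstrained normal-equations solution coincides with the constrained one here.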
EXAMPLE 2 The second example uses the same data used originally in [11], and later in [2] and [3]. These data are presented in Table 2.

         (1)    (2)    (3)   (4)   (5)   (6)    (7)
    (1)  1      4      9     6     6     5      5
    (2)  1/4    1      7     5     5     3      5
    (3)  1/9    1/7    1     1/5   1/5   1/7    1/5
    (4)  1/6    1/5    5     1     1     1/3    1/3
    (5)  1/6    1/5    5     1     1     1/3    1/3
    (6)  1/5    1/3    7     3     3     1      2
    (7)  1/5    1/4    5     3     3     1/2    1

Table 2: Data for the second example.

Table 3 presents a summary of the results (as found in the corresponding references) when the methods described in the subsections above are used. The power method for deriving the eigenvector was applied as presented in [7]. In the last row of Table 3 are the results obtained by using the
least squares method under the human rationality assumption (HR).

As it is shown in the last column of Table 3, the performance of each method is very different as far as the mean residual is concerned. The results also illustrate how critical the role of the functions φ1(X, Z) and φ2(X, Z) is in the method of [3]. The mean residual obtained by using the least squares method under the human rationality assumption is the smallest one by 16%.

Matrices with Missing Comparisons. For one to evaluate n concepts, normally all the required n(n − 1)/2 pairwise comparisons are needed. However, for large numbers of concepts to be compared, the decision maker may become quite bored, tired and inattentive while assigning the values to the comparisons as time goes on, which may easily lead to erroneous judgments. Moreover, the time spent to elicit all the comparisons for a judgment matrix may be unaffordable. Also, the decision maker may not be sure about the values of some comparisons and thus may not want to make a direct evaluation of them. In cases like the previous ones, the decision maker may wish to stop the process and then try to derive the relative preferences from an incomplete pairwise comparison (judgment) matrix.

Given an incomplete pairwise comparison matrix, there are two central and closely interrelated problems. The first problem is how to estimate the missing comparisons. The second problem is which comparison to evaluate next. In other words, if the decision maker wishes to estimate a few extra comparisons (from the remaining undetermined ones), how should the next comparison be selected? Should it be selected randomly or according to some rule (to be determined)? Next, we study the first of these two closely related problems.

Estimating Missing Comparisons.

Using Connecting Paths. Suppose that Xi,j is a missing comparison to be estimated. Next, also assume that there are two known comparisons ai,k and ak,j for some index k. In the perfectly consistent case the following relationship should be true:

    Xi,j = ai,k × ak,j.

In the more general inconsistent case, the Xi,j value can be approximated by the product ai,k × ak,j. In [5] and [6] the pair ai,k and ak,j is called an elementary connecting path connecting the missing comparison Xi,j. Obviously, given a missing comparison, more than one such connecting path may exist (i.e., if there is more than one index k which satisfies the above relationship). Moreover, it is also possible to have connecting paths comprised of more than two known comparisons (i.e., paths of size larger than 2). The general structure of a connecting path of size r, denoted as CPr, has the following form:

    CPr: Xi,j = ai,k1 × ak1,k2 × ... × akr,j,
    for i, j, k1, ..., kr = 1, ..., n,   1 ≤ r ≤ n − 2.

According to P.T. Harker [5], [6], the value of the missing comparison Xi,j should be equal to the geometric mean of all connecting paths related to this missing comparison. That is, the following should be true:

    Xi,j = ( Π_{r=1}^{q} CPr )^{1/q}.

In the previous expression it is assumed that there are q such connecting paths. For the above reasons, this method is known as the geometric mean method for estimating missing comparisons.
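For elementary (size-2) paths the estimate is short to write down. In this sketch (plain Python; the function name and the dictionary encoding of the known judgments are ours) a full implementation would also have to enumerate the longer paths CPr:

```python
from math import prod

def estimate_missing(known, n, i, j):
    """Estimate the missing comparison X_ij as the geometric mean of all
    elementary (size-2) connecting paths a_ik * a_kj whose two factors
    are both known.  `known` maps ordered index pairs to judgment values
    (reciprocal entries included)."""
    paths = [known[i, k] * known[k, j]
             for k in range(n)
             if k not in (i, j) and (i, k) in known and (k, j) in known]
    return prod(paths) ** (1.0 / len(paths))

# Perfectly consistent data with weights (1, 2, 4), i.e. a_ij = W_i / W_j,
# where the pair (0, 2) is missing.  The single path recovers X_02 = 1/4.
known = {(0, 1): 0.5, (1, 0): 2.0, (1, 2): 0.5, (2, 1): 2.0}
x02 = estimate_missing(known, 3, 0, 2)
```

The reciprocal missing entry is estimated the same way and, on consistent data, comes out as exactly 1/X_02.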
32
Estimating data for multicriteria decision making problems: Optimization techniques

A method alternative to the geometric means method is to express the missing comparisons in terms of the arithmetic averages of all related connecting paths and some error terms. In this way, one can also introduce error terms on consistency relations which are defined on pairs of missing comparisons (for more details, please see [1]). A natural objective, then, could be to minimize the sum of the absolute values of all these error terms (which can be of any sign). That is, the above consideration leads to the formulation of a linear programming (LP) problem. A similar approach is presented in [17] (in which the path problem does not occur).

However, there is a serious drawback with any method which attempts to use connecting paths. The number of connecting paths may be astronomically large, rendering any such method computationally intractable. For instance, for a comparison matrix of dimension six, the number of possible connecting paths to be considered might be equal to 64, while in a case of dimension equal to ten, the number of paths may become equal to 109,600. As a result, some alternative approaches have been developed. The revised geometric mean (RGM) method and a least squares formulation are two such methods and are discussed next.

                                            relative preferences of the elements in the set
    method used                             (1)    (2)    (3)    (4)    (5)    (6)    (7)    Ave. residual
    Saaty eigenvector method                0.429  0.231  0.021  0.053  0.053  0.119  0.095  0.134
    Power method eigenvector                0.427  0.230  0.021  0.052  0.052  0.123  0.094  0.135
    Chu's method                            0.487  0.175  0.030  0.059  0.059  0.104  0.085  0.097
    Federov model 1 with φ1 = 1             0.422  0.232  0.021  0.052  0.052  0.127  0.094  0.138
    Federov model 2 with φ2 = 1             0.386  0.287  0.042  0.061  0.061  0.088  0.075  0.161
    Federov model 2 with φ2 = |Wi − Wj|     0.383  0.262  0.032  0.059  0.059  0.122  0.083  0.152
    Federov model 2 with φ2 = Wi/Wj         0.047  0.229  0.021  0.051  0.051  0.120  0.081  0.130
    Least squares method under the HR
    assumption                              0.408  0.147  0.037  0.054  0.054  0.080  0.066  0.082

Table 3: Comparison of the relative preferences for the data in Table 2.

Revised Geometric Mean Method (RGM). An alternative approach to the use of connecting paths is to convert the incomplete judgement matrix into a transformed matrix and then determine its principal right eigenvector. This was proposed by Harker [4] and it is best illustrated by means of an example.

Suppose that the following is an incomplete judgement matrix of order 3 (taken from [4]):

         [ 1     2     -   ]
    A0 = [ 1/2   1     2   ]
         [ -     1/2   1   ]

One can replace the missing elements (denoted by -) by the corresponding ratios of weights. Therefore, the previous matrix becomes:

         [ 1       2     w1/w3 ]
    A1 = [ 1/2     1     2     ]
         [ w3/w1   1/2   1     ]

That is, the missing comparison X1,3 was replaced by the ratio w1/w3 (similarly for the reciprocal entry X3,1). Next observe that the product A1W is equal to:

          [ 1       2     w1/w3 ] [ w1 ]   [ 2w1 + 2w2       ]
    A1W = [ 1/2     1     2     ] [ w2 ] = [ w1/2 + w2 + 2w3 ]
          [ w3/w1   1/2   1     ] [ w3 ]   [ w2/2 + 2w3      ]

The same result can also be obtained if one considers the matrix C, given as follows:

        [ 2     2     0 ]
    C = [ 1/2   1     2 ]
        [ 0     1/2   2 ]

that is, matrix C satisfies the relationship A1W = CW. Therefore, the desired relative preferences (i.e., the entries of vector W) can be determined as the principal right eigenvector of the new matrix C. This is true because:

    A1W = CW = λW.

In general, the entries of matrix C can be determined from the entries of an incomplete judgement matrix A0 as follows (where ci,j and ai,j are the elements of the matrices C and A0, respectively):

    ci,i = 1 + mi,

and for i ≠ j:

    ci,j = ai,j   if ai,j is a known (positive) comparison,
    ci,j = 0      otherwise,

where mi is the number of unanswered questions in the ith row of the incomplete comparison matrix.
Estimating data for multicriteria decision making problems: Optimization techniques

Next, the elements of the W vector can be de- best stated as follows: Given an incomplete judg-
termined by using one of the methods presented ment matrix, and the option to elicit just some
in the second section. additional comparisons, then which one should be
the comparison to elicit next?
Least Squares Formulation. This formulation is a
One obvious approach is to select the next com-
natural extension of the formulation discussed ear-
parison just randomly among the missing ones.
lier in the section on the HR factor. The only differ-
This problem was examined by Harker in [5] and
ence is that in relations (12) one should only con-
[6]. Harker focused his attention on how to deter-
sider known comparisons. This, as a result, implies
mine which comparison, among the missing ones,
that the new matrix B (as defined earlier) should
is the most critical one. He determined as the most
not have rows which would correspond to missing
critical one, to be the comparison which would
comparisons. Finally, observe that in order to solve
have the largest impact (when the appropriate
the least squares problem given as (16), one has to
derivatives are considered) on the vector W.
calculate the vector W as follows:
He observed that the largest absolute gradi-
W-(BTB)-IBTb, ent (i.e., the largest partial derivative) means that
a unit change of the specific missing comparison
where B T stands for the transpose of B. brings out the biggest change on the vector W.
In [1] the revised geometric means and the pre- Therefore, he asserted, that the missing compari-
vious least squares method were tested on ran- son related to the largest absolute gradient should
dom problems. First, a complete judgment matrix be the most critical one and therefore, the one to
was determined. These matrices, in general, were evaluate next. Then, the following formula calcu-
slightly inconsistent. They were derived according lating the largest absolute gradient can be used to
to the procedures used in [22], [20], and [19]. Then, choose the most critical comparison index (i, j):
some comparisons were randomly removed and set
as missing. Then, the previous two methods were Ox(A)
(i j ) - arg max
applied on the incomplete judgment matrix and ' (k,l)eQ Ok,l OO

the missing comparisons were estimated. The es- where Q is the set of missing comparisons and
timated matrix was used to derive a ranking of []'[[oo is the Tchebyshev norm. The most critical
the compared entities. This ranking was compared comparison index (i, j) is determined by the max-
with the ranking derived when the original com- imum norm of the vector of Ox(A)/Ok,l which cor-
plete judgment matrix is used. In these compu- responds to all missing comparisons.
tational experiments it was found that the two The previous approach is intuitively plausible
estimation methods for missing comparisons per- but computationally non trivial. Moreover, its ef-
formed almost in a similar manner. This manner fectiveness had not been addressed until recently.
was different for matrices of different order and In [1] Harker's derivatives approach was tested
various percentages of missing comparisons. More versus a method which randomly selects the next
details on these issues can be found in [1]. comparison to elicit. The test problems were gen-
erated similarly to the ones described at the end
Determining the Comparison to Elicit Next. of the previous section. The two methods were
Suppose that the decision maker has determined also tested in a similar manner as before. To our
some of the n ( n - 1)/2 comparisons when a set surprise, the two methods performed in a similar
of n entities is considered for extracting relative manner. Therefore, the obvious conclusion is that
preferences. Next assume that the decision maker one does not have to implement the more com-
wishes to proceed with only a few additional com- plex derivatives method. It is sufficient to select
parisons and not determine the entire judgment the next comparison just randomly. Of course, the
matrix. The question we examine at this point is more comparisons are selected, the better is for
which ones the additional comparisons should be. the accuracy of the final results. Since the order
To be more specific, the question we consider is of comparisons seems not to have an impact, the

best strategy is to select as the next comparison the one which is easier for the decision maker to elicit.

Conclusions. Deriving the data for MCDM problems is an approach which requires trade-offs. Thus, it should not come as a surprise that optimization can be used at various stages of this crucial phase in solving many MCDM problems. The previous analysis of some key problems signifies that optimization becomes more critical as the size of the decision problem increases.

Finally, it should be stated here that an in-depth analysis of many key issues in multicriteria decision making theory and practice is provided in [18].

See also: Multi-objective optimization: Pareto optimal solutions, properties; Multi-objective optimization: Interactive methods for preference value functions; Multi-objective optimization: Lagrange duality; Multi-objective optimization: Interaction of design and control; Outranking methods; Preference disaggregation; Fuzzy multi-objective linear programming; Multi-objective optimization and decision support systems; Preference disaggregation approach: Basic features, examples from financial decision making; Preference modeling; Multiple objective programming support; Multi-objective integer linear programming; Multi-objective combinatorial optimization; Bi-objective assignment problem; Multicriteria sorting methods; Financial applications of multicriteria analysis; Portfolio selection and multicriteria analysis; Decision support systems with multiple criteria.

References
[1] CHEN, Q., TRIANTAPHYLLOU, E., AND ZANAKIS, S.: 'Estimating missing comparisons and selecting the next comparison to elicit in MCDM', Working Paper Dept. Industrial Engin. Louisiana State Univ. (2001),
... totalities', in M.M. GUPTA AND E. SANCHEZ (eds.): Approximate Reasoning in Decision Analysis, North-Holland, 1982, pp. 23-30.
[4] HARKER, P.T.: 'Alternative modes of questioning in the analytic hierarchy process', Math. Model. 9, no. 3-5 (1987), 353-360.
[5] HARKER, P.T.: 'Derivatives of the Perron root of a positive reciprocal matrix: With application to the analytic hierarchy process', Appl. Math. Comput. 22 (1987), 217-232.
[6] HARKER, P.T.: 'Incomplete pairwise comparisons in the analytic hierarchy process', Math. Model. 9, no. 11 (1987), 837-848.
[7] KALABA, R., AND SPINGARN, K.: 'Numerical approaches to the eigenvalues of Saaty's matrices for fuzzy sets', Comput. Math. Appl. 4 (1979).
[8] LOOTSMA, F.A.: 'Numerical scaling of human judgment in pairwise-comparison methods for fuzzy multi-criteria decision analysis', in Mathematical Models for Decision Support, Vol. 48 of NATO ASI F: Computer and System Sci., Springer, 1988, pp. 57-88.
[9] LOOTSMA, F.A.: 'The French and the American school in multi-criteria decision analysis', Rech. Opér./Operat. Res. 24, no. 3 (1990), 263-285.
[10] LOOTSMA, F.A.: 'Scale sensitivity and rank preservation in a multiplicative variant of the AHP and SMART', Techn. Report Fac. Techn. Math. and Informatics Delft Univ. Techn., no. 91-67 (1991).
[11] SAATY, T.L.: 'A scaling method for priorities in hierarchical structures', J. Math. Psych. 15, no. 3 (1977), 234-281.
[12] SAATY, T.L.: The analytic hierarchy process, McGraw-Hill, 1980.
[13] SAATY, T.L.: 'Priority setting in complex problems', IEEE Trans. Engin. Management EM-30, no. 3 (1983), 140-155.
[14] SAATY, T.L.: Fundamentals of decision making and priority theory with the analytic hierarchy process, Vol. VI, RWS Publ., 1994.
[15] SIMON, H.A.: Models of man, 2nd ed., Wiley, 1961.
[16] STEWART, S.M.: Introduction to matrix computations, Acad. Press, 1973.
[17] TRIANTAPHYLLOU, E.: 'Linear programming based decomposition approach in evaluating priorities from pairwise comparisons and error analysis', J. Optim. Th. Appl. 84, no. 1 (1995), 207-234.
[18] TRIANTAPHYLLOU, E.: Multi-criteria decision making methods: A comparative study, Kluwer Acad. Publ., 2000.
[19] TRIANTAPHYLLOU, E., LOOTSMA, F.A., PARDALOS,
http: / / www.imse, lsu. edu / vangelis. P.M., AND MANN, S.H.: 'On the evaluation and appli-
[2] CHU, A.T.W., KALABA, R.E., AND SPINGARN, K.: 'A cation of different scales for quantifying pairwise com-
comparison of two methods for determining the weights parisons in fuzzy sets', J. Multi-Criteria Decision Anal.
of belonging to fuzzy sets', J. Optim. Th. Appl. 27, 3 (1994), 133-155.
no. 4 (1979), 321-538. [20] TRIANTAPHYLLOU, E., AND MANN, S.H.: 'A compu-
[3] FEDEROV, V.V., KUZMIN, V.B., AND VERESKOV, A.I.: tational evaluation of the AHP and the revised AHP
'Membership degrees determination from Saaty matrix

35
Estimating data for multicriteria decision making problems: Optimization techniques

when the eigenvalue method is used under a continuity assumption', Computers and Industrial Engin. 26, no. 3 (1994), 609-618.
[21] TRIANTAPHYLLOU, E., PARDALOS, P.M., AND MANN, S.H.: 'A minimization approach to membership evaluation in fuzzy sets and error analysis', J. Optim. Th. Appl. 66, no. 2 (1990), 275-287.
[22] TRIANTAPHYLLOU, E., AND SANCHEZ, A.: 'A sensitivity analysis approach for some deterministic multicriteria decision-making methods', Decision Sci. 28, no. 1 (1997), 151-194.
[23] VARGAS, L.G.: 'Reciprocal matrices with random coefficients', Math. Model. 3 (1982), 69-81.
[24] WRITE, C., AND TATE, M.D.: Economics and systems analysis: Introduction for public managers, Addison-Wesley, 1973.

Qing Chen
Dept. Industrial and Manufacturing Systems Engin.
3128 CEBA Building
Louisiana State Univ.
Baton Rouge, LA 70803-6409, USA
Evangelos Triantaphyllou
Dept. Industrial and Manufacturing Systems Engin.
3128 CEBA Building
Louisiana State Univ.
Baton Rouge, LA 70803-6409, USA
E-mail address: trianta@lsu.edu
Web address: www.imse.lsu.edu/vangelis

MSC 2000: 90C29
Key words and phrases: pairwise comparisons, data elicitation, multicriteria decision making, MCDM, scale, analytic hierarchy process, AHP, consistent judgment matrix, eigenvalue, eigenvector, least squares problem, incomplete judgments.

EVACUATION NETWORKS
Planning and design of evacuation networks is both a complex and a critically important optimization problem for a number of emergency situations. One particularly critical class of examples concerns the emergency evacuation of chemical plants, high-rise buildings, and naval vessels due to fire, explosion or other emergencies. The problem is compounded because the solution must take into account the fact that human occupants may panic during the evacuation; therefore, there must be a well-defined set of evacuation routes in order to minimize the sense of panic and at the same time create safe, effective routes for evacuation. The problem is a highly transient, stochastic, nonlinear, combinatorial optimization problem. We focus on evacuation networks where congestion is a significant problem.

Introduction. Evacuation is one of the most perilous, pernicious, and persistent problems faced by humanity. Hurricanes, fires, earthquakes, explosions and other natural and man-made disasters happen on almost a daily basis throughout the world. How to safely evacuate a collection of occupants within an affected region or facility is the fundamental problem faced in evacuation.

Purpose. The purpose of this article is both to introduce the reader to the problem of evacuation and its manifest nature, and to suggest some alternative approaches to optimizing this process. That life-threatening evacuations happen as often as they do is somewhat surprising. That people often do not know how to safely evacuate in time of need is a sad reality. That people must help people plan for evacuation is one of the most important activities of a research scientist.

Phase I: Warning Siren or Alarm Goes Off
Phase II: Reaction to Warning Siren or Alarm
Phase III: Decision to Evacuate
Phase IV: Evacuate the Region or the Facility
Phase V: Verification Process

Fig. 1: Processes for an evacuation.

Outline. In this article we first introduce the problem in Section 1 and then describe our fundamental 3-step modeling methodology in Section 2.

In Section 3, we survey a number of different static and dynamic approaches to this problem and present our general approach which has guided our research on the problem. Finally, in Section 4 we discuss the algorithmic approaches to the problem, where we capture the congested flow of occupants in the network and attempt to define the safest evacuation routes, trading off the different objective performance measures in the network.

Fig. 2: Evacuation plan for a hospital complex.

Modeling Fundamentals. The process of an evacuation is captured in the simplified flow chart of Fig. 1. There are essentially five phases which underlie the evacuation process. The first and foremost is a warning bell or siren signaling the occupant population to leave. Unfortunately, one must react to the warning and recognize the problem at hand, so there is often a great deal of uncertainty associated with the second phase. Thirdly, after the warning is taken seriously, the occupants must decide to evacuate. The first three phases are highly uncertain and transient. Once the occupants decide to evacuate, the general evacuation process gets underway and this is where the evacuation plans should be followed. Finally, there is a verification phase, where one must account for all the occupants to ensure their safe arrival at the destination. As a constructive framework for this article on evacuation networks, we establish that the modeling of evacuation problems has three fundamental steps:
1) Representation: How should a region (e.g. Fig. 2) or facility be represented or modeled?
2) Analysis: Given the model, how should one analyze the evacuation of the occupants, i.e. of a deterministic or stochastic evacuation process? What performance measures are crucial to measuring performance of the evacuation? and
3) Synthesis: How should one synthesize the results of the analysis step so as to best evacuate the occupants in light of the performance measures?

Representation Stage. Fig. 2 depicts a large hospital campus with many inter-connected buildings, many different levels, and a complex array of circulation passages, and illustrates that the evacuation problem is a difficult one to represent. However, one can begin to accurately model the evacuation process through a network as depicted in Fig. 3. By definition, an evacuation network (graph) G(V, E)^l is comprised of a finite set V of nodes (vertices) of size N, where V = {V_1, ..., V_N}, together with a finite set E of arcs e_k = (v_i, v_j), i.e. nodal pairs, and an indication of the level l at which the network is defined. The levels actually correspond to the degree of aggregation inherent in modeling large complex networks. V can further be partitioned into three sets of nodes:
V1) representing the occupant source nodes during the evacuation;
V2) representing the intermediate nodes during the evacuation;
V3) representing the sink or destination nodes of the occupants.
The set of arcs represents the different streets, passageways, or routes from V1 to V3. Associated with each node v_i in V and each arc (v_i, v_j) in E are variables and parameters which represent node and arc processing times, node and arc capacities, arrival times to the network, distances, and occupant population sizes at the source nodes.
Fig. 3 illustrates the example evacuation network with the key congested routes in the evacuation planning problem embedded in the network model.
The Representation Step is often defined in terms of the size and composition of the customer population: infinite, finite, or mixed, and how the facility under study should be decomposed by V, E, and l. The crucial link between the Representation and Analysis Steps is the complexity (i.e., number of nodes and arcs) of G^l, which governs the number of equations used in the mathematical model in the Analysis Step. The Representation Step presents an interesting and challenging problem because of the many possible ways of representing regions, facilities, ships, vehicles, and building components.

Fig. 3: Route site plan (legend: Topology I, Critical Route 3; Topology II, Critical Route 8; Topology III, Critical Route 9).

Analysis Step. The Analysis Step is the point at which the methodology and mathematical models underlying the flow processes, and the algorithmic structure for computing the performance characteristics of G^l(V, E), come together. Mathematically, we have a network G(V, E), with a finite set of nodes V and edges (arcs) E, over which multiple classes of customers (occupants) flow from source(s) to sink(s) while a vector of objective functions f(x) = {f_1(x), ..., f_p(x)} is simultaneously extremized subject to a set of constraints on the occupants flowing through the network. Fig. 4 captures many of the recognized criteria appropriate in analyzing a network evacuation problem. In our studies, we have often used Minimum Total Evacuation Time and Minimum Total Distance Travelled to capture the evacuation problem. The Total Distance Travelled is a suitable surrogate objective for approaching the route complexity, since reducing the evacuation path length will often begin to capture the path complexity and, hopefully, minimizing this measure will abate the occupants' sense of panic. Other objectives might be appropriate given the particular context or decision situation.

Synthesis Step. Given the performance characteristics determined during the Analysis Step, we can begin to optimize the network topology itself, along with the routing and resource allocation problems within:
• Topological Network Design (TND): Determination of the number, type, and subset of nodes and arcs, as well as the particular node and arc topology, to be used for the evacuation.
• Routing Network Design (RND): Determination of the routing scheme in both steady-state and real time.
• Capacitated Network Design (CND): Determination of the network resources: number of highway lanes, corridor length, widths, areas, landing shape, reception center capacity, configuration, etc.

Mathematical Models. There are many possible mathematical modeling approaches once our network is constructed, and Fig. 5 represents the range of approaches many research scientists have followed. References are provided for further details. The boldface text along the morphological tree represents the approach suggested in this article, which we have applied in many different contexts.
Many mathematical models have appeared in the literature for generating and evaluating evacuation paths for an occupant population [5], [2], [8].

Set Partitioning Model. The model which is presented below is a variation of one model appearing in [8]. It was one of the first to account for the critical features of the stochastic evacuation problem. Another class of models that one might utilize to formulate the problem is the class of multicommodity flow models. Unfortunately, these models will not control the Bernoulli splitting of the occupant population along the different evacuation paths, which is problematic since splitting the different source populations will engender confusion and create a potential sense of panic among the evacuating occupants. The integer set partitioning programming model presented below has the desired property to control splitting of the flows.

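To make the Representation Step concrete, the following minimal sketch (Python is chosen here purely for illustration; all node names, distances, and capacities are hypothetical and not from the article) builds a small evacuation network G(V, E) with the V1/V2/V3 node partition and per-arc distance and capacity attributes, and enumerates the source-to-sink routes:

```python
from collections import defaultdict

# Hypothetical sketch of an evacuation network G(V, E): nodes are
# partitioned into sources ("V1"), intermediate nodes ("V2"), and
# sinks ("V3"); each arc carries a distance and a capacity.
class EvacuationNetwork:
    def __init__(self):
        self.nodes = {}               # name -> partition label
        self.arcs = {}                # (tail, head) -> {"dist", "cap"}
        self.adj = defaultdict(list)  # tail -> list of heads

    def add_node(self, name, part):
        assert part in ("V1", "V2", "V3")
        self.nodes[name] = part

    def add_arc(self, tail, head, dist, cap):
        self.arcs[(tail, head)] = {"dist": dist, "cap": cap}
        self.adj[tail].append(head)

    def routes(self, src):
        """Enumerate all simple paths from a source node to any sink (V3)."""
        found, stack = [], [(src, [src])]
        while stack:
            node, path = stack.pop()
            if self.nodes[node] == "V3":
                found.append(path)
                continue
            for nxt in self.adj[node]:
                if nxt not in path:   # keep paths simple (no cycles)
                    stack.append((nxt, path + [nxt]))
        return found

# Toy instance: one ward (source), a hallway and stairwell
# (intermediate), and a parking lot (sink).
net = EvacuationNetwork()
for name, part in [("ward", "V1"), ("hall", "V2"),
                   ("stair", "V2"), ("lot", "V3")]:
    net.add_node(name, part)
net.add_arc("ward", "hall", dist=30, cap=40)
net.add_arc("hall", "stair", dist=12, cap=25)
net.add_arc("hall", "lot", dist=80, cap=60)
net.add_arc("stair", "lot", dist=45, cap=25)

paths = net.routes("ward")
```

The enumerated routes, together with their distances, are exactly the kind of route alternatives indexed by k in the set partitioning model that follows.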
Fig. 4: Morphological diagram of multi-objective approaches. The Overall Safety criterion branches into:
• Minimize Evacuation Time: Minimize Congestion (Minimize Maximum Queue Lengths; Minimize Average Queue Lengths); Minimize Travel Time (Minimize Latest Arrival Time; Minimize Total Evacuation Time).
• Minimize Routing Complexity: Minimize Shortest Routes (Minimize Total Distance Travelled; Minimize Maximum Path Lengths); Minimize Path Complexity (Minimize # of Turns; Minimize # Up-down Transitions).
• Maximize Path Reliability: Minimize Reception Center Failures (Minimize Maximum Flow Capacity; Equalize Average Flow Capacities); Minimize Arc Failures (Equalize Average Arc Flows; Minimize Maximum Arc Flows).

The multi-objective model of our routing problem is:

minimize {f_1(x); f_2(x)}

where the Evacuation Time, respectively the Distance Travelled, are:

f_1(x) = sum_i sum_j sum_k q_ijk p_ijk x_ijk,
f_2(x) = sum_i sum_j sum_k d_ijk p_ijk x_ijk,

subject to:
• V2 Arcs:
sum_i sum_j sum_k alpha_lijk p_ijk x_ijk <= rho_l, for all l,
• V3 Sinks:
sum over all (i, j, k) whose route terminates at sink q of p_ijk x_ijk <= C_q, for all q,
• Occupant Classes:
sum_k x_ijk = 1, for all i, j,
• Routes:
x_ijk = 0, 1, for all i, j, k,

and where:
• x_ijk = 1 if the ith occupant class from the jth source is assigned the kth route alternative.
• alpha_lijk is a data coefficient which equals 1 if the lth arc is included in the ijkth route assignment and equals 0 otherwise.
• rho_l is the maximum allowable traffic along arc l.
• C_q is the capacity of sink (destination) node q.
• p_ijk is the occupant population of source ij on the kth route alternative.
• q_ijk is the expected evacuation (sojourn) time of the ijkth occupant class. These values must be calculated from the particular stochastic model used in the evacuation study; see the discussion below.
• d_ijk is the average distance travelled for the ijkth occupant class.

Since we have two objectives in our model, it makes sense to talk of the NonInferior (NI) set of route alternatives, since the trade-offs between f_1 and f_2 naturally underlie the optimal set of solutions we seek. Because of the complexity of solving this model directly, an alternative approach, which systematically generates feasible routing alternatives to a relaxed version of our mathematical model but at the same time measures the critical objectives of evacuation time and distance travelled, is proposed and demonstrated in the next two sections.

Fig. 5: Morphological diagram of EEP approaches. The tree branches into:
• Stochastic Networks: Transient Networks (Simulation Approaches [18], [8], [21]; Analytical Approaches); Steady State Networks (Mean Value Analysis (MVA) [17], [19], [14]; MVA with Finite Waiting Room [4]).
• Deterministic Flow Networks: Static Networks (Transshipment); Dynamic Networks (Dynamic Network Flows [5]; Dynamic Programming [10]); Simulation Networks (Deterministic [1]).

Congestion Models. The real crux of the evacuation problem is to capture the congestion that naturally occurs when occupants choose the shortest routes to evacuate. There are some deterministic measures possible for measuring congestion, yet stochastic ones are the most accurate, because queueing is a nonlinear complex phenomenon.

Erlang Loss/Delay Networks. Fundamentally, each node S_j in the circulation network is an M/G/C/C queue, i.e. there is no waiting room and C depends on the square footage area of the circulation segment or the number of vehicles which can maximally occupy a highway segment [23]. Let us, for the sake of the argument, focus on pedestrian evacuation. Later on we will show how our model extends to vehicular congestion. Each occupant in the circulation system consumes approximately 0.2 m^2 of floorspace, and, therefore, the capacity of a circulation system element is

C = 5LW,

where L (length) and W (width) are given in meters.
Each circulation segment is a representative 'building block' for modeling pedestrian movements through the facility. Corridor segments, intersections, landings, stairwells, ramps, and so on represent a network of interconnected M/G/C/C queues. The separations of the circulation blocks are due to changes in flow direction, level, or merging and splitting decisions. Further, the cardinality of S depends on the configuration and complexity of movement patterns within the facility.

Fig. 6: Three-dimensional network models.

Flows through the nodes of S, the circulation system of a building, are largely state dependent, in that a customer receives service in the circulation node S_j and this service rate decays with increasing amounts of customer traffic.
Fig. 7 shows a family of curves which represent the variety of empirical studies that document the decay rate of the customer service rate as a function of population density in a corridor. Empirical models are also available showing distributions for stairs and other circulation elements with bi- and multidirectional pedestrian flows [6], [20].
Finally, there is a set of classical linear and exponential curves which relate vehicle speed and vehicle density, captured in Fig. 8. We have utilized these types of vehicular speed/density relations to develop state dependent models for vehicular traffic analysis [7].

Fig. 7: Empirical distributions of pedestrian traffic flows (walking speed in m/sec versus crowd density in p/m^2).

In general, the service rate mu is a function of velocity v_i, which is a constant for each individual in the corridor. Thus, it takes

t_i = L / v_i

(seconds) for each person to traverse the corridor, where i is the number of occupants in the circulation system when an individual enters.
Because of the complexity of dynamically updating the service rate as a function of the number of customers within a corridor segment, it becomes extremely difficult to utilize digital simulation models in the design of circulation systems within buildings. Our computational experience in digital simulation of access and egress networks underscores this defect in simulation models. We must, therefore, look to analytical models to aid the network design process if state dependent models are to be effectively utilized. Also, since we are examining the pedestrian/vehicular network as a design problem rather than as a control problem, it makes most sense to look at steady state measures rather than transient ones.
We have recently developed a generalized M/G/C/C Erlang loss queueing model for service rate decay which can model any service rate distribution (linear, exponential, etc.) [3], [4], [15]. It is a special case of an Erlang loss model. F.P. Kelly [9] has treated M/G/C/C state dependent models in his book, but only ones with a linear, increasing function of the number of customers in the queue, whereas we treat the queue with a nonlinear, decreasing service rate, see Fig. 7.

Fig. 8: Empirical speed/density relations for vehicular traffic (speed versus density in veh/mile/lane).

Our M/G/C/C state dependent model dynamically models the flow rate of pedestrians within a corridor as a function of the population within the corridor. Suppose that G is a continuous distribution having density g and failure rate mu(t) = g(t)/(1 - G(t)). Loosely speaking, mu(t) is the instantaneous probability intensity that a service t units old will end. The service rate depends on the number of customers in the system: given that there are n people in the system, each server processes work at rate f(n). In other words, if there is an arrival, the service rate will change to f(n + 1), and if there is a departure, the service rate will change to f(n - 1).
In particular, the probability distribution of the number of occupants in the corridor is given by:

P(n in system) = [lambda E(S)]^n P_0 / (n! f(n) ... f(1)),   n = 1, ..., C,

where

P_0 = 1 / (1 + sum_{i=1}^{C} [lambda E(S)]^i / (i! f(i) ... f(1))),

E(S) = L / 1.5,   f(n) = v_n / v_1,

and E(S) is the mean service time of a lone occupant flowing through a corridor of length L, with service rate 1.5 m/sec (see Fig. 7). The term v_n is defined as the average walking speed when n people are in the corridor.
For the M/G/C/C state dependent model, we have also shown that the departure process (including customers completing service and those that are lost) is a Poisson process with rate lambda [3], [4].

Algorithms. The problem we face in our evacuation planning problem is that we do not know a priori which paths are NI without assessing the congestion in G(V, E). We must iteratively generate candidate paths, assess the congestion in G(V, E), and then iterate again until the desired trade-off between distance travelled and evacuation time is acceptable to the planner. This iterative process leads to the algorithm described below. For product form networks, where the estimate of time delays in the Expected Savings calculation for re-routing among the alternative NonInferior paths can be computed exactly, the algorithm will guarantee finding a NonInferior path for re-routing the occupant classes. For nonproduct form networks, which are typically the case, we can only approximate these time delays; therefore, the algorithm can only guarantee an approximate NonInferior solution. Considering the complexity of the underlying stochastic-integer programming problem, this is a reasonable and practical strategy.

k-Shortest Paths. The algorithm to facilitate the design methodology can be incorporated into any simulation (e.g. Q-GERT) or analytical model (e.g. QNET-C) to estimate f_1, f_2, and carry out the evacuation planning/routing analysis. To summarize and focus the efforts in this article, an algorithmic description of Steps 1-3 and their substeps is presented.
1) Representation Step: Represent the underlying facility or region as a network G(V, E), where V is a finite set of nodes and E is a finite set of arcs or nodal pairs.
2) Analysis Step: Analyze G(V, E) as a queueing network, either with a transient or steady-state model, and compute the total evacuation time of the occupant population along with the total distance travelled to evacuate, given a set of evacuation paths.
3) Synthesis Step:
3.1) Analyze the queueing output from the evacuation model and compute the set of NonInferior evacuation paths which simultaneously minimize time and distance travelled in G(V, E) for each occupant population.
3.1.1) If the set of NI paths is uniquely optimal, then

E_ijk = q_ijk - (d_ijk / w + q_k) <= 0,   for all i, j, k;

go to Step 3.2, where:
a) E_ijk is the net increase or decrease in the average egress time per person caused by re-routing occupants to the (k + 1)st NonInferior route.
b) q_ijk is the sum of the average queue times per person on the original route.
c) d_ijk is the increased distance travelled on the (k + 1)st NonInferior route (e.g. if the kth NonInferior route is 100 feet and the (k + 1)st NonInferior route is 120 feet, d_ijk is equal to 20 feet, i.e. 120 - 100).
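The M/G/C/C corridor formulas above (C = 5LW, E(S) = L/1.5, f(n) = v_n/v_1, and the distribution P(n in system)) can be sketched numerically. This is a hedged illustration only: the walking-speed decay v_n is assumed linear here for demonstration, whereas the article relies on empirically fitted decay curves (Fig. 7), and the arrival rate `lam` is an arbitrary made-up input.

```python
from math import factorial

# Hedged numerical sketch of the M/G/C/C state-dependent corridor model.
# ASSUMPTION: walking speed decays linearly with occupancy, v(1) = 1.5 m/sec.
def corridor_probabilities(L, W, lam):
    C = int(5 * L * W)        # capacity: roughly 0.2 m^2 per occupant
    ES = L / 1.5              # mean service time of a lone occupant (sec)

    def v(n):                 # assumed linear speed decay (illustrative)
        return 1.5 * (1.0 - (n - 1) / C)

    def f(n):                 # relative service rate with n occupants present
        return v(n) / v(1)

    def f_prod(n):            # f(n) * f(n-1) * ... * f(1)
        prod = 1.0
        for m in range(1, n + 1):
            prod *= f(m)
        return prod

    # P(n in system) = [lam*E(S)]^n * P0 / (n! f(n)...f(1)), n = 1..C
    weights = [(lam * ES) ** n / (factorial(n) * f_prod(n))
               for n in range(1, C + 1)]
    P0 = 1.0 / (1.0 + sum(weights))
    P = [P0] + [w * P0 for w in weights]   # P[n] for n = 0..C
    return C, P

# Toy 10 m x 2 m corridor with one pedestrian arriving per second.
C, P = corridor_probabilities(L=10.0, W=2.0, lam=1.0)
```

By construction the probabilities P[0..C] sum to one; P[C] is the blocking (loss) probability of the Erlang loss interpretation.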

d) w is the average travel speed for d_ijk.
e) q_k is the sum of the expected queue times per person on the (k + 1)st NonInferior route.
Otherwise:
3.1.2) Significant queueing (congestion) exists on one or more routes; then go to Step 3.3.
3.2) STOP! The NI shortest time/distance routes are optimal and identical, and total evacuation time, distance and congestion are minimized.
3.3) Determine the total number of occupants who pass through the queueing area(s) and trace them back to their origins.
3.4) Select the total number of occupants to be re-routed from each source node. The total number of occupants re-routed is correlated to both the size of the queues and the number of occupants on each route. In selecting the population, the analyst should strive to achieve uniformity of occupants and queues on each egress route.
3.5) Re-route the population to the kth route of the NI set of paths, where k is selected by employing the following formula:

E_ijk = q_ijk - (d_ijk / w + q_k),   for all i, j, k.

3.6) Select the largest positive E* for each set of populations to be re-routed, where

E* = max {E_11, ..., E_IJ}

over all i, j sources and all possible savings, and then re-run the computer evacuation planning model with the new set of routes, by returning to Step 2.0 of the General Algorithm. If all E's are negative, stop! The current set of NI shortest routes used on the previous iteration are selected.

Other Algorithms. Besides the k-shortest path approach, one might utilize a turn-penalty algorithm to guide the process of determining the evacuation paths. This is probably very appropriate in vehicular evacuation schemes. Also, another approach which seems quite viable would be to define the set of arc-disjoint paths, since this would tend to completely separate the occupant congestion along the paths. We have not experimented with these approaches to define the evacuation routes, but their use might be quite appropriate in the future.

Summary and Conclusion. We have given some insights into the performance modeling and optimization problems associated with evacuation networks. As the maturity of this application area grows, and more research is devoted to the area, more theoretical and algorithmic issues and progress will emerge.

See also: Minimum cost flow problem; Nonconvex network flow problems; Traffic network equilibrium; Network location: Covering problems; Maximum flow problem; Shortest path tree algorithms; Steiner tree problems; Equilibrium networks; Survivable networks; Directed tree networks; Dynamic traffic networks; Auction algorithms; Piecewise linear network flow problems; Communication network assignment problem; Generalized networks; Network design problems; Stochastic network problems: Massively parallel solution.

References
[1] BERLIN, G.N.: 'A simulation model for assessing building firesafety', Fire Techn. 18, no. 1 (1982), 66-76.
[2] CHALMET, L.G., FRANCIS, R.L., AND SAUNDERS, P.B.: 'Network models for building evacuation', Managem. Sci. 28, no. 1 (1982), 86-105.
[3] CHEAH, JENYENG: 'State dependent queueing models', Master's Thesis, Dept. Industr. Engin. and Oper. Res., Univ. Massachusetts, Amherst MA 01003 (1990).
[4] CHEAH, JENYENG, AND MACGREGOR SMITH, J.: 'Generalized M/G/C/C state dependent queueing models and pedestrian traffic flows', Queueing Systems and Their Applications 15 (1994), 365-386.
[5] FRANCIS, R.L., AND CHALMET, L.G.: 'Network models for building evacuation: A prototype primer', Unpublished Paper, Dept. Industr. Systems Engin., Univ. Florida, Gainesville, Florida (1980).
[6] FRUIN, J.J.: Pedestrian planning and design, Metropolitan Assoc. Urban Designers and Environmental Planners, 1971.
[7] JAIN, R., AND MACGREGOR SMITH, J.: 'Modeling vehicular traffic flow using M/G/C/C state dependent queueing models', Transportation Sci. 31, no. 4 (1997), 324-336.

[8] KARBOWICZ, C.J., AND MACGREGOR SMITH, J.: 'A k-shortest path routing heuristic for stochastic evacuation networks', Engin. Optim. 7 (1984), 253-280.
[9] KELLY, F.P.: Reversibility and stochastic networks, Wiley, 1979.
[10] KOSTREVA, M., AND WIECEK, M.W.: 'Time dependency in multiple objective dynamic programming', J. Math. Anal. Appl. 173 (1993), 289-307.
[11] MACGREGOR SMITH, J.: 'The use of queueing networks and mixed integer programming to optimally allocate resources within a library layout', JASIS 32, no. 1 (1981), 33-42.
[12] MACGREGOR SMITH, J.: 'An analytical queueing network computer program for the optimal egress problem', Fire Techn. 18, no. 1 (1982), 18-37.
[13] MACGREGOR SMITH, J.: 'Queueing networks and facility planning', Building and Environment 17, no. 1 (1982), 33-45.
[14] MACGREGOR SMITH, J.: 'QNET-C: An interactive graphics computer program for evacuation planning', in R. NEWKIRK (ed.): Proc. Soc. for Computer Simulation Emergency Planning Session, 1987, pp. 19-24.
[15] MACGREGOR SMITH, J.: 'State dependent queueing models in emergency evacuation networks', Transport. Sci. B 25B, no. 6 (1991), 373-389.
[16] MACGREGOR SMITH, J., AND ROUSE, W.B.: 'Application of queueing network models to optimization of resource allocation within libraries', JASIS 30, no. 5 (1979), 250-263.
[17] MACGREGOR SMITH, J., AND TOWSLEY, S.: 'The use of queueing networks in the evaluation of egress from buildings', Environment & Planning B 8 (1981), 125-139.
[18] STAHL, FRED I.: 'BFIRES-II: A behavior based computer simulation of emergency egress during fires', Fire Techn. 18, no. 1 (1982), 49-65.
[19] TALEBI, K., AND MACGREGOR SMITH, J.: 'Stochastic network evacuation models', Comput. Oper. Res. 12, no. 6 (1985), 559-577.
[20] TREGENZA, P.: The design of interior circulation, V. Nostrand Reinhold, 1976.
[21] WATTS, J.M.: 'Computer models for evacuation analysis. Paper presented at the SFPE Symposium': Quantitative Methods for Life Safety Analysis, College Park, Maryland, 1986. Available from the Fire Safety Inst., Middlebury, Vermont.
[22] WOODSIDE, C.M., AND HUNT, R.E.: 'Medical facilities planning using general queueing network analysis', IEEE Trans. SMC-7, no. 11 (1977), 793-799.
[23] YUHASKI, S., AND MACGREGOR SMITH, J.: 'Modeling circulation systems in buildings using state dependent queueing models', Queueing Systems and Their Applications 4 (1989), 319-338.

J. MacGregor Smith
Amherst, Massachusetts 01003, USA
E-mail address: jmsmith@ecs.umass.edu

MSC 2000: 90-XX
Key words and phrases: combinatorial optimization, evacuation network, congestion.

EVOLUTIONARY ALGORITHMS IN COMBINATORIAL OPTIMIZATION, EACO
Most of the NP-hard combinatorial optimization problems cannot be solved to optimality in practice. Therefore heuristic techniques have to be used to obtain solutions of high quality. There exist different approaches to design a heuristic algorithm, such as tabu search and genetic algorithms, for example. The latter solution method belongs to a wider class of algorithms, called evolutionary algorithms, that handle a set of several solutions. Within this class, the best known algorithms that are applied to combinatorial optimization problems are genetic algorithms (cf. Genetic algorithms) and ant systems. For a general presentation, one can mention [22], [72] for genetic algorithms and [12], [23] for ant systems.
In this article, a review of the evolutionary algorithms used up to 1998 in combinatorial optimization is made. For a certain number of combinatorial problems, the main papers that present an evolutionary algorithm for that problem are referenced, and some short remarks are given. While it is difficult to provide a very precise definition of an evolutionary algorithm, this term will be used here as a synonym of population-based algorithm: an algorithm that makes several solutions evolve, in particular by exchanging some kind of information between them. Algorithms that iteratively modify a solution in order to obtain a good one (like tabu search or genetic algorithms with a 'population' of size 1) will not be considered as evolutionary algorithms.

The Traveling Salesman Problem. The traveling salesman problem (or TSP) is probably the problem on which the largest number of evolutionary algorithms have been applied. It consists in determining a shortest tour visiting all of the given
Dept. Mechanical and Industrial Engin. Univ. cities exactly once. A very complete survey of lo-
Massachusetts cal search approaches to this problem has been

44
Evolutionary algorithms in combinatorial optimization

provided by D.S. Johnson and L.A. McGeoch [51], while J.-Y. Potvin [70] compared several genetic algorithms for the TSP. In [51], the authors recommend different solving techniques depending on the quality of the solution desired and the time available. Genetic algorithms or ant systems are a good choice if enough running time is allowed and good solutions are needed. With similar running times, the iterated Lin-Kernighan algorithm (or ILK) yields better results but is more complex to implement. In ILK, a single solution instead of a population of individuals is considered, and this method will therefore not be referred to as an evolutionary algorithm. If there is no restriction on the running time, the best results can be obtained by genetic algorithms based on ILK.

An important breakthrough in the field of evolutionary algorithms for the TSP was the paper [67] by H. Mühlenbein, M. Gorges-Schleuter and O. Krämer. In their algorithm, implemented on a parallel machine, a solution was allowed to mate only with certain other solutions, and some optimization technique was applied to the offspring. Indeed, the use of a local search algorithm to improve created offspring is a necessary condition for an evolutionary algorithm to be efficient. Moreover, they designed a crossover specific to the TSP, called MPX (maximum preservative crossover). It consists in copying a segment of a certain length from a first parent into the offspring and adding cities consecutively from the second parent according to some rules. This crossover is very suitable for the TSP, as shown in [66]. Further research studied the impact of the different elements on the results and improved the quality of the solutions obtained [44], [7], [89]. Several other crossovers, most of them using two parents, have been suggested by various authors. In particular, B. Freisleben and P. Merz proposed [37], [38] the distance preserving crossover (or DPX): an offspring is created by keeping the edges that are found in both parents, and greedily reconnecting the different pieces without using the edges contained in only one parent. They obtain a very efficient algorithm, which won both the ATSP (asymmetric TSP) and the TSP competitions at the First International Contest in Evolutionary Optimization [6]. They further improved their algorithm, in terms of speed and quality of solutions, in [39]. Their use of an edge-preserving crossover and of a hill-climbing algorithm illustrates important elements necessary to obtain an efficient genetic algorithm for the TSP. These elements have been put forward in different comparisons between various genetic algorithms for the TSP [78], [70], together with the necessity to split the population into several subpopulations for solving large instances (more than a few hundred cities).

The first presentation of ant colony optimization (ACO) [12] was made with the TSP as illustration, and this problem remains the most often used application problem of works on ant colony optimization. The initial ACO system, named ant system, has been extended to what is called ant colony system (ACS). A description of this algorithm can be found in [23] by M. Dorigo and L.M. Gambardella. In the same paper, local search has been added to ACS and the resulting algorithm has been applied to the ATSP and the TSP. The results reported are better in [39] for the TSP, but are better in [23] for the ATSP. Another proposed extension of ant system, called MAX-MIN ant system [79], consists in introducing explicit maximum and minimum values for the trail factors on the arcs. Good results are obtained with such an algorithm when local search is added.

The Vehicle Routing Problem. The most studied extension of the vehicle routing problem (VRP) is the one with time windows (VRPTW). In order to solve this problem, a two-phase heuristic, called GIDEON, has been proposed in [84]. The first phase uses a genetic algorithm to cluster the customers, and the solutions obtained are improved by local optimization techniques in the second phase. This procedure has first been improved in [83], and then extended in [85]. In this last paper, S.R. Thangiah, I.H. Osman and T. Sun present several metaheuristics, all having a first phase similar to the one in GIDEON. These algorithms have been compared to several other heuristics and showed very good results on test problems taken from the literature. Some improvements still have to be brought for solving problems with large time windows. For such problems, a heuristic based on simulated annealing and a population-based algorithm called GENEROUS [71] are shown to be a little more efficient. The latter is not a standard genetic algorithm since it does not represent solutions by chromosomes, but it nevertheless handles several solutions and uses a recombination operator. An adaptive memory procedure, in conjunction with tabu search, has also been applied to this problem [75].

Improvements of the GIDEON approach with local post-optimization procedures have also been used for the VRP with time deadlines. A comparison done in [87], [86] with two other heuristics shows that the cluster-first route-second algorithm with a genetic algorithm in the first phase performs well for problems in which the customers are distributed uniformly and/or with short time deadlines.

The Quadratic Assignment Problem. The quadratic assignment problem (or QAP) allows the modeling of many practical problems in location science, but can be solved optimally only for very small instances. Therefore, different heuristics have been proposed for this problem. Several of them are compared in [13], [81]. For real-world problems (irregular and structured), the genetic hybrid by C. Fleurent and J.A. Ferland in [33] appears to be one of the most efficient algorithms [81]. Based on a standard genetic algorithm with solutions encoded as permutations [82], this genetic hybrid applies a robust tabu search to the offspring and was able to find several new best solutions on some benchmark problems.

The ant colony optimization approach has also been considered, first in [64]. This ant system algorithm, hybridized with a local search, has been improved in [63], [62] and provides very good results. A different ACO approach, where at each iteration the solutions are modified instead of newly constructed, has been proposed in [40]. This algorithm, also hybridized with a local search procedure, yields better results on real-world problems than the genetic hybrid of [33], but is not competitive on random problems. A further promising method, based on scatter search, has been presented in [19].

The Satisfiability Problem (SAT). The problem of finding a truth assignment for variables to make a propositional formula true is probably the best known, and historically the first, NP-complete problem. But only few evolutionary algorithms for SAT can be found in the literature. After a straightforward approach in [52], a rather different solution representation has been proposed in [45]. But the drawback of this method, despite adapted operators, is that it increases the size of the individuals in an important way, compared to the coding 'one gene for one variable'. This last coding has been used in [35], together with a SAT-adapted crossover (the objective function being simply the number of satisfied clauses). But the evolutionary algorithm thus obtained was not able to compete with a tabu search (also presented in [35]). The tabu search-genetic hybrid (where some iterations of tabu search are used for mutation) is computationally expensive, but is able to solve large instances that a tabu search alone cannot solve. For smaller instances, the hybridization is not useful.

Another heuristic approach to SAT consists in assigning weights to the different clauses and minimizing the sum of the weights of the unsatisfied clauses. These weights are adapted during the algorithm depending on the 'difficulty' of each constraint. This mechanism has been used in evolutionary algorithms in [25] and [90], but in both cases it came out that the best results are obtained with a 'population' of size 1. Such an algorithm is therefore no longer considered as an evolutionary algorithm.

The Set Covering and Set Partitioning Problems. The set covering problem (SCP) is a zero-one integer programming problem where the constraints are all of the type ∑_j a_ij x_j ≥ 1 with zero-one coefficients. It is a well-known problem that has also been used to study penalty functions in genetic algorithms [74], [3].

Different genetic algorithm approaches have been proposed in the literature (see for example [60], [61], [50]), and a very efficient one has been presented by J.E. Beasley and P.C. Chu in [5]. This algorithm uses a binary representation of the solutions, and a repair operator to preserve the feasibility of the individuals and to improve the

solutions. Moreover, a variable mutation rate has been introduced. Results on standard test problems with up to 1000 constraints and 10,000 variables show the efficiency of this algorithm, which was able to improve the best-known result on some of the larger instances. The same paper shows no significant difference between various crossovers.

The set partitioning problem (SPP) is also a zero-one integer programming problem, the difference with the SCP being that the constraints are equalities instead of inequalities. Relatively few heuristics have been developed for this problem. D. Levine investigated sequential and parallel genetic algorithms for the SPP [59]. His best algorithm was a genetic algorithm in an island model, hybridized with a local search heuristic. But this algorithm remained less efficient, both in terms of quality of the solutions and in terms of running time, than the branch and cut approach of [49]. Some problems met by his algorithm were due to the penalty term for infeasible solutions in the fitness function. In order to overcome these problems, other authors decomposed the single fitness measure into two distinct parts (the objective function and a measure of 'infeasibility') [10]. Adapting the parent selection method to this modification, and also using an improvement operator, they obtained a better genetic algorithm, which is still not able, for the problems they considered, to compete with a commercial mixed integer solver.

The Knapsack Problem. The multidimensional (zero-one) knapsack problem is equivalent to the zero-one integer programming problem with nonnegative coefficients. Only few papers have tried to solve this problem with evolutionary algorithms. While the first such algorithms did not give high-quality results and were not competitive with other heuristics [56], [88], the quality has improved. Genetic algorithms as presented in [11], [48], both working only with feasible solutions, are able to obtain optimal solutions on standard test problems (instances with at most 105 variables and 30 constraints). In [11], Chu and Beasley proposed some larger test problems (up to 500 variables and 30 constraints), without known optimal solution, and used them for a comparison with other heuristics. Their genetic algorithm uses a 'repair' operator specific to this problem to ensure good feasible offspring and obtained high-quality results, but also needed more computation time (on the same machine, about one hour for the genetic algorithm against a few seconds for the other heuristics).

The Bin Packing Problem. The standard one-dimensional bin packing problem consists in putting items of given sizes in bins of given capacity. Many evolutionary algorithms proposed for this problem (genetic algorithms and evolution strategies, see for example [77], [16], [57]) performed worse than a simple heuristic like first fit decreasing. E. Falkenauer and A. Delchambre then suggested in [30] a genetic algorithm designed for grouping problems: the grouping genetic algorithm (GGA). In this algorithm, solutions are represented by chromosomes having two parts: the item part encodes for each item its bin, and the group part, of variable length, encodes the bin identifiers used. The crossover, mutation and inversion operators have been adapted to this encoding. Instead of simply using the number of bins, the authors designed a fitness function that also takes into account the proportion to which each bin is filled. With this approach, they obtained very satisfactory results. The arguments presented for this new encoding are discussed by C. Reeves in [73]. In the same paper, a hybrid genetic algorithm is presented, where solutions are represented by permutations and decoded using heuristics like first fit and best fit. The results obtained are more or less similar to those in [30]. A problem size reduction heuristic, similar to the reduction process used in [16], has also been introduced in this genetic algorithm. According to Falkenauer [29], this reduction violates the search strategy of the genetic algorithm and he therefore prefers the GGA's crossover, which has the same goal of propagating promising bins. In the same paper, the GGA is improved by the introduction of local optimization inspired by the dominance criterion of [65]. The new algorithm is compared with an efficient branch and bound algorithm and gives excellent results.

Extensions of the standard bin packing problem, like the two-dimensional bin packing problem, have also been considered with evolutionary algorithms [77], [15], [69]. An overview of these variations is presented in [43].

Graph Coloring. The graph coloring problem is a well-known problem in graph theory; it consists in determining the smallest number of colors that must be used to color the vertices of a graph such that two adjacent vertices do not have the same color. L. Davis was the first author to propose an evolutionary algorithm for this problem [22]. In fact, he considered a graph with weights on the vertices and an integer k. He then designed a hybrid genetic algorithm for finding a partial k-coloring such that the colored vertices have maximum total weight. In this algorithm, individuals are represented as permutations of the vertices of the graph. This order-based encoding is not very efficient, as shown by Fleurent and Ferland in [34]. In this paper, they also present hybrid genetic algorithms that use string-based encodings of the solutions for finding a coloring in k colors with as few conflicting edges (edges with both ends of the same color) as possible. They consider different crossovers, including a graph-adapted one, and hybridize the genetic algorithm with a simple local search or with tabu search (a modified version of [46]). The results on random graphs G_{n,0.5} improve the previous best results. For graphs with up to 300 vertices, their tabu search-genetic hybrid and their tabu search give similar results, but in much less time for the latter. For larger graphs (500 or 1000 vertices), the running time becomes prohibitive, and both the evolutionary algorithm and the tabu search must be used within a different approach (determining large stable sets and coloring the residual graph). The tests on 450-vertex Leighton graphs (with known chromatic numbers) showed that the tabu search-genetic hybrid outperforms the tabu search on about half of the instances, while the opposite is true for the remaining instances. The hybrid algorithm was able to find an optimal solution for two instances (out of twelve) that could not be solved by the tabu search alone.

Another evolutionary algorithm has been proposed in [18], with a graph-adapted crossover that takes into account how 'close' a vertex is to conflicting edges. The improving algorithm applied to offspring is a steepest descent method, instead of a tabu search as in [34]. Despite this less sophisticated method, their algorithm gives results similar to those obtained by the hybrid algorithm in [34]. Moreover, the latter gives worse results when the tabu search is replaced by a simple descent method.

Concerning ant colony optimization, a first approach to graph coloring has been proposed in [17], but the results obtained need improvements.

Other Graph Problems.

Maximum Clique. The problem of determining the maximum clique (complete subgraph) in a graph is equivalent to the problem of determining the minimum vertex cover or the maximum stable set in the complementary graph. A first genetic algorithm, hybridized with a tabu search, has been proposed by Fleurent and Ferland in [35], but they show that their tabu search alone gives similar results in a shorter time. In these algorithms, a solution is a set of vertices of given size, and the objective function measures how many edges are missing for a set to be a clique. Improving an algorithm of [2], E. Balas and W. Niehaus [4] proposed a genetic algorithm (without an improving algorithm applied to the offspring) for both the maximum cardinality and maximum weight clique problems, where an individual is a clique. In this algorithm, the recombination operation ('crossover') used is designed specifically for this problem and taken from another heuristic. The results obtained on the DIMACS benchmark graphs are very good, similar to those obtained in [35] from the point of view of the solutions' quality. A different fitness function has been suggested in [8] and included in a hybrid genetic algorithm using a local optimization step. The fitness value associated to a set of vertices is a weighted combination of the size of the set and the number of edges missing to have a clique, but the weights are modified during the run of the algorithm according to a simple rule. Despite the introduction of a preprocessing step that determines the order of the vertices on the chromosome, this algorithm is less efficient (but this may be due to the use of the 2-point crossover).
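The set-of-vertices representation and the two clique objectives just described can be sketched as follows. This is a minimal illustration, not code from the cited papers: the names `missing_edges` and `adaptive_fitness`, the adjacency structure `adj`, and the penalty weight `alpha` are all assumptions, and the rule by which [8] adapts the weight during the run is not reproduced here.

```python
from itertools import combinations

def missing_edges(vertices, adj):
    """Count vertex pairs in `vertices` not joined by an edge.

    A set of vertices is a clique exactly when this count is zero;
    the fixed-size algorithms described above minimize this quantity.
    """
    return sum(1 for u, v in combinations(sorted(vertices), 2)
               if v not in adj[u])

def adaptive_fitness(vertices, adj, alpha):
    """Weighted combination of set size and clique infeasibility,
    in the spirit of the fitness of [8]; `alpha` would be adjusted
    during the run by the paper's (omitted) update rule."""
    return len(vertices) - alpha * missing_edges(vertices, adj)
```

For a triangle {0, 1, 2} plus a vertex 3 adjacent only to 0, `missing_edges({0, 1, 2}, adj)` is 0 (a clique), while `missing_edges({0, 1, 2, 3}, adj)` is 2 (the pairs (1, 3) and (2, 3) are absent).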


Graph Partitioning. Evolutionary algorithms are rather seldom used to tackle the k-way graph partitioning problem (partitioning a (weighted) graph into k equal-sized parts), even if the graph bisectioning problem (the case k = 2) is sometimes taken to illustrate various ingredients of genetic algorithms ([9], [54]). For the general k-way graph partitioning problem, different problem-oriented operators are introduced and studied in a parallel genetic algorithm in [58]. In this algorithm, the population is only composed of feasible solutions. Another approach has been proposed in [76], where the population is split into two halves: one containing only feasible solutions and the other only infeasible ones. This algorithm uses the same encoding scheme and crossover operator as [58], but has not been applied to similar instances of the problem. In a general way, genetic algorithms give good results on partitioning problems, but at a very high computational cost.

Miscellaneous.

Sequencing and Scheduling. The best-known sequencing and scheduling problems are the flow-shop, job-shop and open shop problems. The first paper applying an evolutionary algorithm to such a problem is [21]. Later, several other genetic algorithms have been proposed ([36], [80] for example). One of the first efficient evolutionary algorithms for job-shop problems has been presented in [68] and improved in [91], [20]. Comparisons done with other heuristics on benchmark problems show that sophisticated genetic algorithms (with the use of problem-adapted crossovers and hybridization) yield the best results for flow-shop and job-shop problems [1], [24], [42]. The open shop problems have attracted fewer researchers from the evolutionary algorithms field, but a genetic algorithm has been proposed in [32], [31]. An ant colony approach to job-shop problems has also been tested, in [14], but gave worse results than known genetic algorithms.

Steiner Trees. Only very few works deal with Steiner trees and evolutionary algorithms. Moreover, they consider different variants of this problem. The first paper [47] proposes a genetic algorithm with local optimization for determining minimum Steiner trees in the Euclidean plane. A solution is represented by the coordinates of the Steiner points. A comparison with simulated annealing and the Rayward-Smith-Clare algorithm shows no significant differences. The rectilinear Steiner problem has been addressed in [53] with a specific coding and an adapted crossover. The minimal Steiner tree problem in graphs has attracted a little more interest. A standard genetic algorithm (with bit strings as chromosomes) that gave good results on the sparse graphs tested has been proposed in [55]. Later, H. Esbensen and P. Mazumder [28] designed a genetic algorithm in which the encoding method is based on the distance network heuristic. Improvements have been brought in [26] and [27], where there is also a comparison between different algorithms. But this genetic algorithm is not competitive with an efficient tabu search as presented in [41].

Conclusion. In this paper, some references on the evolutionary approaches that have been proposed up to 1998 for different combinatorial problems have been given. A general remark that can be made on these solution methods is that evolutionary algorithms in general, and genetic algorithms in particular, are not efficient for such problems if implemented too naively. To obtain an algorithm with good performance, it is necessary to make adjustments to the basic method. Moreover, knowledge about the problem considered is very often also needed, in order to design adapted operators.

Another remark concerns their competitiveness compared to other heuristic methods. While evolutionary algorithms can quite easily be adapted to (almost) any problem, their running time is often quite high. Local search algorithms, like tabu search or simulated annealing, can also be adapted to the different combinatorial problems quite easily. If they are designed in an intelligent way, they are very often able to obtain better results than evolutionary algorithms. Moreover, they are usually faster. For some problems, specifically designed heuristics can use theoretical results about the problem, allowing them to obtain good results. In general, evolutionary algorithms are not competitive against (extended) local search or specific

algorithms for small to medium size instances of combinatorial problems.

But this does not mean that population-based algorithms are not useful. In fact, the different approaches have various (dis)advantages, and the efficient algorithms that will be developed in the future will probably mix these different approaches. Such algorithms are usually called 'hybrid algorithms' and have already been proposed, for example, for the traveling salesman problem [39] or the quadratic assignment problem [33], demonstrating their potential.

See also: Fractional combinatorial optimization; Replicator dynamics in combinatorial optimization; Neural networks for combinatorial optimization; Combinatorial matrix analysis; Multi-objective combinatorial optimization; Combinatorial optimization games.

References
[1] AARTS, E.H.L., LAARHOVEN, P.J.M. VAN, LENSTRA, J.K., AND ULDER, N.L.J.: 'A computational study of local search algorithms for job shop scheduling', ORSA J. Comput. 6 (1994), 118-125.
[2] AGGARWAL, C.C., ORLIN, J.B., AND TAI, R.P.: 'An optimized crossover for maximum independent set', Oper. Res. 45 (1995), 226-234.
[3] BÄCK, T., SCHÜTZ, M., AND KHURI, S.: 'A comparative study of a penalty function, a repair heuristic, and stochastic operators with the set-covering problem', in J.M. ALLIOT, E. LUTTON, E. RONALD, M. SCHOENAUER, AND D. SNYERS (eds.): Artificial Evolution: European Conf., Vol. 1063 of Lecture Notes Computer Sci., Springer, 1996, pp. 3-20.
[4] BALAS, E., AND NIEHAUS, W.: 'Optimized crossover-based genetic algorithms for the maximum cardinality and maximum weight clique problems', J. Heuristics 4 (1998), 107-122.
[5] BEASLEY, J., AND CHU, P.: 'A genetic algorithm for the set covering problem', Europ. J. Oper. Res. 94 (1996), 392-404.
[6] BERSINI, H., DORIGO, M., LANGERMAN, S., SERONT, G., AND GAMBARDELLA, L.M.: 'Results of the first international contest on evolutionary optimisation (1st ICEO)': Proc. 1996 IEEE Internat. Conf. Evolutionary Computation, IEEE Press, 1996, pp. 611-615.
[7] BRAUN, H.: 'On solving travelling salesman problems by genetic algorithms', in H.-P. SCHWEFEL AND R. MÄNNER (eds.): Parallel Problem Solving from Nature, Vol. 496 of Lecture Notes Computer Sci., Springer, 1991, pp. 129-133.
[8] BUI, T.N., AND EPPLEY, P.H.: 'A hybrid genetic algorithm for the maximum clique problem', in L.J. ESHELMAN (ed.): Proc. 6th Internat. Conf. Genetic Algorithms, Morgan Kaufmann, 1995, pp. 478-484.
[9] BUI, T.N., AND MOON, B.R.: 'On multi-dimensional encoding/crossover', in L.J. ESHELMAN (ed.): Proc. 6th Internat. Conf. Genetic Algorithms, Morgan Kaufmann, 1995, pp. 49-56.
[10] CHU, P.C., AND BEASLEY, J.E.: 'Constraint handling in genetic algorithms: the set partitioning problem', J. Heuristics 4 (1998), 323-357.
[11] CHU, P.C., AND BEASLEY, J.E.: 'A genetic algorithm for the multidimensional knapsack problem', J. Heuristics 4 (1998), 63-86.
[12] COLORNI, A., DORIGO, M., AND MANIEZZO, V.: 'Distributed optimization by ant colonies', in F. VARELA AND P. BOURGINE (eds.): Proc. ECAL91 - European Conf. Artificial Life, Elsevier, 1991, pp. 134-142.
[13] COLORNI, A., DORIGO, M., AND MANIEZZO, V.: 'Algodesk: An experimental comparison of eight evolutionary heuristics applied to the quadratic assignment problem', Europ. J. Oper. Res. 81 (1995), 188-205.
[14] COLORNI, A., DORIGO, M., MANIEZZO, V., AND TRUBIAN, M.: 'Ant system for job-shop scheduling', JORBEL - Belgian J. Oper. Res., Statist. and Computer Sci. 34, no. 1 (1994), 39-53.
[15] CORCORAN, A.L., AND WAINWRIGHT, R.L.: 'A genetic algorithm for packing in three dimensions': Proc. 1992 ACM/SIGAPP Symposium on Applied Computing SAC'92, ACM, 1992, pp. 1021-1030.
[16] CORCORAN, A.L., AND WAINWRIGHT, R.L.: 'A heuristic for improved genetic bin packing', Techn. Report UTULSA-MCS-93-08, Univ. Tulsa, USA (1993).
[17] COSTA, D., AND HERTZ, A.: 'Ants can color graphs', J. Oper. Res. Soc. 48 (1997), 295-305.
[18] COSTA, D., HERTZ, A., AND DUBUIS, O.: 'Embedding a sequential procedure within an evolutionary algorithm for coloring problems in graphs', J. Heuristics 1 (1995), 105-128.
[19] CUNG, V.-D., MAUTOR, TH., MICHELON, PH., AND TAVARES, A.: 'A scatter search based approach for the quadratic assignment problem': Proc. 1997 IEEE Internat. Conf. Evolutionary Computation, IEEE Press, 1997, pp. 190-206.
[20] DAVIDOR, Y., YAMADA, T., AND NAKANO, R.: 'The ecological framework II: Improving GA performance with virtually zero cost', in S. FORREST (ed.): Proc. 5th Internat. Conf. Genetic Algorithms, Morgan Kaufmann, 1993, pp. 171-176.
[21] DAVIS, L.: 'Job shop scheduling with genetic algorithms', in J.J. GREFENSTETTE (ed.): Proc. 1st Internat. Conf. on Genetic Algorithms, Lawrence Erlbaum Ass., 1985, pp. 136-140.
[22] DAVIS, L.: Handbook of genetic algorithms, Van Nostrand Reinhold, 1991.
[23] DORIGO, M., AND GAMBARDELLA, L.M.: 'Ant colony system: A cooperative learning approach to the traveling salesman problem', IEEE Trans. Evolutionary Computation 1 (1997), 53-66.
[24] DUVIVIER, D., PREUX, PH., AND TALBI, E.-G.: 'Stochastic algorithms for optimization and application to job-shop-scheduling', Techn. Report LIL-95-5, Univ. du Littoral, France (1995).
[25] EIBEN, A.E., AND HAUW, J.K. VAN DER: 'Solving 3-SAT with adaptive genetic algorithms': Proc. 4th IEEE Conf. Evolutionary Computation, IEEE Press, 1997, pp. 81-86.
[26] ESBENSEN, H.: 'Computing near-optimal solutions to the Steiner problem in a graph using a genetic algorithm', Networks 26 (1995), 173-185.
[27] ESBENSEN, H.: 'Finding (near-)optimal Steiner trees in large graphs', in L.J. ESHELMAN (ed.): Proc. 6th Internat. Conf. on Genetic Algorithms, Morgan Kaufmann, 1995, pp. 485-491.
[28] ESBENSEN, H., AND MAZUMDER, P.: 'A genetic algorithm for the Steiner problem in a graph', Techn. Report, Univ. Michigan, Ann Arbor (1993).
[29] FALKENAUER, E.: 'A hybrid grouping genetic algorithm for bin packing', J. Heuristics 2 (1996), 5-30.
[30] FALKENAUER, E., AND DELCHAMBRE, A.: 'A genetic algorithm for bin packing and line balancing': Proc. 1992 IEEE Internat. Conf. on Robotics and Automation, IEEE Computer Soc. Press, 1992, pp. 1186-1192.
[31] FANG, H.-L.: 'Genetic algorithms in timetabling and scheduling', PhD Thesis, Univ. Edinburgh (1994).
[32] FANG, H.-L., ROSS, P., AND CORNE, D.: 'A promising genetic algorithm approach to job-shop scheduling, re-scheduling, and open-shop scheduling problems', in S. FORREST (ed.): Proc. 5th Internat. Conf. Genetic Algorithms, Morgan Kaufmann, 1993, pp. 375-382.
[33] FLEURENT, C., AND FERLAND, J.A.: 'Genetic hybrids for the quadratic assignment problem', in P.M. PARDALOS AND H. WOLKOWICZ (eds.): Quadratic assignment and related problems, DIMACS 16, Amer. Math. Soc., 1994, pp. 190-206.
[34] FLEURENT, C., AND FERLAND, J.A.: 'Genetic and hybrid algorithms for graph coloring', in G. LAPORTE
nat. Conf. on Evolutionary Computation, IEEE Press, 1996, pp. 616-621.
[38] FREISLEBEN, B., AND MERZ, P.: 'New genetic local search operators for the traveling salesman problem', in H.-M. VOIGT, W. EBELING, I. RECHENBERG, AND H.-P. SCHWEFEL (eds.): Proc. 4th Conf. on Parallel Problem Solving from Nature, Vol. 1141 of Lecture Notes Computer Sci., Springer, 1996, pp. 890-899.
[39] FREISLEBEN, B., AND MERZ, P.: 'Genetic local search for the TSP: new results': Proc. 1997 IEEE Internat. Conf. on Evolutionary Computation, IEEE Press, 1997, pp. 159-164.
[40] GAMBARDELLA, L.-M., TAILLARD, E.D., AND DORIGO, M.: 'Ant colonies for the quadratic assignment problems', J. Oper. Res. Soc. 50 (1999), 167-176.
[41] GENDREAU, M., LAROCHELLE, J.-F., AND SANSÓ, B.: 'A tabu search heuristic for the Steiner tree problem', GERAD G-98-01, Univ. Montréal, Canada (1998).
[42] GLASS, C.A., AND POTTS, C.N.: 'A comparison of local search methods for flow shop', in G. LAPORTE AND I.H. OSMAN (eds.): Metaheuristics in combinatorial optimization, Vol. 63 of Ann. Oper. Res., Baltzer, 1996, pp. 489-509.
[43] GOODMAN, E.D., TETELBAUM, A.Y., AND KUREICHIK, V.M.: 'A genetic algorithm approach to compaction, bin packing and nesting problems', Techn. Report GARAGe94-4, Michigan State Univ. (1994).
[44] GORGES-SCHLEUTER, M.: 'Asparagos: An asynchronous parallel genetic optimization strategy', in J.D. SCHAFFER (ed.): Proc. 3rd Internat. Conf. on Genetic Algorithms, Morgan Kaufmann, 1989, pp. 422-427.
[45] HAO, J.K.: 'A clausal genetic representation and its related evolutionary procedures for satisfiability problems', in D.W. PEARSON, N.C. STEELE, AND R.F. ALBRECHT (eds.): Proc. 2nd Internat. Conf. on Artificial Neural Networks and Genetic Algorithms, Springer, 1995, pp. 289-292.
[46] HERTZ, A., AND WERRA, D. DE: 'Using tabu search techniques for graph coloring', Computing 39 (1987), 345-351.
AND I.H. OSMAN (eds.): Metaheuristics in combinato- [47] HESSER, J., M.~NNER, R., AND STUCKY, O.: 'On
rial optimization, Vol. 63 of Ann. Oper. Res., Baltzer, Steiner trees and genetic algorithms', in J.D. BECKER,
1996, pp. 437-461. I. EISELE, AND F.W. MUNDEMANN (eds.): Parallelism,
[35] FLEURENT, C., AND FERLAND, J.A.: 'Object-oriented Learning, Evolution, Vol. 565 of Lecture Notes Artifi-
implementation of heuristic search methods for graph cial Intelligence, Springer, 1991, pp. 509-525.
coloring, maximum clique, and satisfiability', in D.S. [4s] HOFF, A., LOKKETANGEN, A., AND MITTET, I." 'Ge-
JOHNSON AND M.A. TRICK (eds.): Cliques, coloring, netic algorithms for 0/1 multidimensional knapsack
and satisfiability, Amer. Math. Soc., 1996, p. 619. problems', Proc. Norsk Informatik Konferanse, NIK
[36] Fox, B.R., AND MCMAHON, M.B.: 'Genetic operators '96 (1996).
for sequencing problems', in G.J.E. RAWLINS (ed.): [49] HOFFMAN, K., AND PADBERG, M.: 'Solving airline
Foundations of Genetic Algorithms, Morgan Kauf- crew-scheduling problems by branch-and-cut', Man-
mann, 1991, pp. 284-300. agem. Sci. 39 (1993), 657-682.
[37] FREIsLEBEN, B., AND MERZ, P.: 'A genetic local [50] HUANG, W.-C., KAO, C.-Y., AND HORNG, J.-T.:
search algorithm for solving symmetric and asymmetric 'A genetic algorithm approach for set covering prob-
traveling salesman problems': Proc. 1996 IEEE Inter- lems': Proc. First IEEE Internat. Conf. on Evolution-

51
Evolutionary algorithms in combinatorial optimization

ary Computation, IEEE Press, 1994, pp. 569-574. Trans. Knowledge and Data Engin. (1998).
[51] JOHNSON, D.S., AND MCGEOCH, L.A.: 'The travel- [64] MANIEZZO, V., COLORNI, A., AND DORIGO, M.: 'The
ing salesman problem: A case study in local optimiza- ant system applied to the quadratic assignment prob-
tion', in E.H.L. AARTS AND J.K. LENSTRA (eds.): Lo- lem', Techn. Report IRIDIA/94-28, Univ. Libre de
cal Search in Combinatorial Optimization, Wiley, 1997, Bruxelles, Belgium (1994).
pp. 215-310. [65] MARTELLO, S., AND TOTH, P.: 'Lower bounds and re-
[52] JONG, K.A. DE, AND SPEARS, W.M.: 'Using genetic duction procedures for the bin packing problem', Dis-
algorithms to solve NP-complete problems', in J.D. crete Appl. Math. 22 (1990), 59-70.
SCHAFFER (ed.): Proc. 3rd Internat. Conf. on Genetic [66] MATHIAS, K., AND WHITLEY, D.: 'Genetic opera-
Algorithms, Morgan Kaufmann, 1989, pp. 123-132. tors, the fitness landscape and the traveling salesman
[53] JULSTROM, B.A.: 'A genetic algorithm for the rectilin- problem', in R. M~.NNER AND B. MANDERICK (eds.):
ear Steiner problem', in S. FORREST (ed.): Proc. 5th In- Parallel Problem Solving from Nature, Elsevier, 1992,
ternat. Conf. Genetic Algorithms, Morgan Kaufmann, pp. 219-228.
1993, pp. 474-480. [67] M~)HLENBEIN, H., GORGES-SCHLEUTER, M., AND
[54] KAHNG, A.B., AND MOON, B.R.: 'Toward more power- KR)i.MER, O.: 'Evolution algorithms in combinatorial
ful recombinations', in L.J. ESHELMAN (ed.): Proc. 6th optimization', Parallel Comput. 7 (1988), 65-85.
Internat. Conf. on Genetic Algorithms, Morgan Kauf- [6s] NAKANO, R., AND YAMADA, T.: 'Conventional genetic
mann, 1995, pp. 96-103. algorithm for job shop problems', in R. BELEW AND
[55] KAPSALIS, A., RAYWARD-SMITH, V.J., AND SMITH, L. BOOKER (eds.): Proc. 4th Internat. Conf. on Ge-
G.D.: 'Solving the graphical Steiner tree problem us- netic Algorithms, Morgan Kaufmann, 1991, pp. 474-
ing genetic algorithms', J. Oper. Res. Soc. 44 (1993), 479.
397-406. [69] PARGAS, R.P., AND JAIN, R.: 'A parallel stochastic op-
[56] KHURI, S., BACK, T., AND HEITK()TTER, J.: 'The timization algorithm for solving 2D bin packing prob-
zero/one multiple knapsack problem and genetic algo- lems': Proc. 9th Conf. on Artificial Intelligence for Ap-
rithms': Proc. 1994 ACM Symposium on Applied Com- plications, 1993, pp. 18-25.
puting, ACM, 1994, pp. 188-193. [70] POTVIN, J.-Y.: 'Genetic algorithms for the traveling
[57] KHURI, S., SCHLITZ, M., AND HEITK()TTER, J.: 'Evo- salesman problem', in G. LAPORTE AND I.H. OSMAN
lutionary heuristics for the bin packing problem', in (eds.): Metaheuristics in combinatorial optimization,
D.W. PEARSON, N.C. STEELE, AND R.F. ALBRECHT Vol. 63 of Ann. Oper. Res., Baltzer, 1996, pp. 339-370.
(eds.): Proc. 2nd Internat. Conf. on Artificial Neu- [71] POTVIN, J.-Y., AND BENGIO, S.: 'A genetic approach
ral Networks and Genetic Algorithms, Springer, 1995, to the vehicle routing problem with time windows',
pp. 285-288. Techn. Report CRT-953, Univ. Montrdal (1993).
[58] LASZEWSKI, G. VON: 'Intelligent structural opera- [72] REEVES, C.R. (ed.): Modern heuristic techniques for
tors for the k-way graph partitioning problem', in combinatorial problems, Blackwell, 1993.
R. BELEW AND L. BOOKER (eds.): Proc. 4th Inter- [73] REEVES, C.: 'Hybrid genetic algorithms for bin-
nat. Conf. on Genetic Algorithms, Morgan Kaufmann, packing and related problems', in G. LAPORTE AND
1991, pp. 45-52. I.H. OSMAN (eds.): Metaheuristics in combinatorial op-
[59] LEVINE, D.: 'A parallel genetic algorithm for the set timization, Vol. 63 of Ann. Oper. Res., Baltzer, 1996,
partitioning problem', PhD Thesis Illinois Inst. Techn. pp. 371-396.
(1994). [74] RICHARDSON, J.T., PALMER, M.R., LIEPINS, G.E.,
[60] LIEPINS, G.E., HILLIARD, M.R., PALMER, M.R., AND AND HILLIARD, M.: 'Some guidelines for genetic al-
MORROW, M.: 'Greedy genetics', in J.J. GREFEN- gorithms with penalty functions', in J.D. SCHAFFER
STETTE (ed.): Proc. 2nd Internat. Conf. on Genetic (ed.): Proc. 3rd Internat. Conf. on Genetic Algorithms,
Algorithms, Lawrence Erlbaum Ass., 1987. Morgan Kaufmann, 1989, pp. 191-197.
[61] LIEPINS, G.E., HILLIARD, M.R., RICHARDSON, J.T., [75] ROCHAT, Y., AND TAILLARD, E.D.: 'Probabilistic di-
AND PALMER, M.: 'Genetic algorithms applications to versification and intensification in local search for ve-
set covering and traveling salesman problems', in D.E. hicle routing', J. Heuristics 1 (1995), 147-167.
BROWN AND C.C. WHITE (eds.): Oper. Res. and Arti- [76] SEKHARAN, D.A., AND WAINWRIGHT, R.L.: 'Manip-
ficial Intelligence: The Integration of Problem-Solving ulating subpopulations in genetic algorithms for solv-
Strategies, Kluwer Acad. Publ., 1990, pp. 29-57. ing the k-way graph partitioning problem': Proc. 7th
[62] MANIEZZO, V.: 'Exact and approximate nondetermin- Oklahoma Symposium on Artificial Intelligence, 1993,
istic tree-search procedures for the quadratic assign- pp. 215-225.
ment problem', Techn. Report Univ. Bologna C S R 98- [77] SMITH, D.: 'Bin packing with adaptive search', in
1 (1998). J.J. GREFENsTETTE (ed.): Proc. 1st Internat. Conf.
[63] MANIEZZO, V., AND COLORNI, A.: 'The ant system on Genetic Algorithms, Lawrence Erlbaum Ass., 1985,
applied to the quadratic assignment problem', IEEE pp. 202-207.

52
Extended cutting plane algorithm

ITs] STARKWEATHER, T., MCDANIEL, S., MATHIAS, K., 290.


WHITLEY, D., AND WHITLEY, C.: 'A comparison
of genetic sequencing operators', in R. BELEW AND Daniel Kobler
L. BOOKER (eds.): Proc. 4th Internat. Conf. on Ge- Dept. Math. Swiss Federal Inst. Technol.
netic Algorithms, Morgan Kaufmann, 1991, pp. 69-76. CH-1015 Lausanne, Switzerland
[79] ST~ITZLE, T., AND HOOS, H.: 'The MAX-MIN ant sys- E-mail address: Daniel.Kobler~epfl.ch
tem and local search for the traveling salesman prob-
lem': Proc. 1997 IEEE Internat. Conf. on Evolutionary MSC2000: 90C27, 05-04
Computation, IEEE Press, 1997, pp. 308-313. Key words and phrases: evolutionary algorithm, combina-
[so] SYSWERDA, G.: 'Schedule optimization using genetic torial optimization, heuristics.
algorithms', in L. DAVIS (ed.): Handbook Genetic Al-
gorithms, v. Nostrand Reinhold, 1991, p. 333.
[Sl] TAILLARD, E.: 'Comparison of iterative searches for the
EXTENDED CUTTING PLANE ALGORITHM

The α-ECP (extended cutting plane) algorithm ([12], [14]) is an algorithm for solving quasiconvex MINLP (mixed integer nonlinear programming) problems. The algorithm approximates the feasible region with linear approximations and solves a sequence of MILP problems based on these approximations. There are several other similar methods, for instance the generalized Benders decomposition method ([6]), the outer approximation method ([3]), the generalized outer approximation method ([15]), the LP/NLP-based branch and bound method ([8]) and the linear outer approximation method ([4]). A good overview of MINLP algorithms and applications is given in [5]. All the other methods iteratively solve both NLP and MILP problems, while the α-ECP method only solves MILP problems. The size of the MILP problems grows in each iteration, so efficient algorithms of this type require efficient MILP solvers.

Most of the MINLP methods can only ensure global convergence for convex MINLP problems. The α-ECP method can also solve quasiconvex problems. Different heuristic procedures for some of the above algorithms have been introduced for the nonconvex case, e.g., [10], [13]. Although these methods perform quite well in different applications, convergence towards the optimal solution cannot generally be ensured by these algorithms for nonconvex problems.

There are also some recent MINLP global optimization methods ([1], [2], [9], [11]). In these algorithms the function space is separated into continuous and discrete variables, and the discrete variables can occur only linearly. The α-ECP method can solve quasiconvex problems where the discrete variables are involved in nonlinear equations as well.


Although a valid optimal solution is ensured only for quasiconvex problems, the algorithm also provides good approximations for the global optimal solution of general MINLP global optimization problems.

Formulation of the MINLP Problem. The α-ECP algorithm can be used to solve problems of the form

\[
\begin{aligned}
\min\ & c^T z \\
\text{s.t.}\ & g(z) \le 0, \\
& A z \le a, \qquad B z = b, \\
& z \in X \times Y,
\end{aligned}
\tag{1}
\]

where $c$ is a vector of constants, $z = (x, y)$ consists of a vector $x$ of continuous variables in $\mathbf{R}^n$ and a vector $y$ of integer variables in $\mathbf{Z}^m$, and $g(z)\colon \mathbf{R}^n \times \mathbf{Z}^m \to \mathbf{R}^p$ is a vector of continuously differentiable quasiconvex functions defined on the set $X \times Y$ having nonzero gradients in the infeasible region of (1). The feasible region of (1) is assumed to be nonempty. Furthermore, $X$ is a compact convex set $X \subset \mathbf{R}^n$ and $Y$ is a finite discrete set $Y \subset \mathbf{Z}^m$.

The matrices $A$ and $B$ and the vectors $a$ and $b$ define the linear constraints of the problem and are of suitable dimensions.

The α-ECP method guarantees global optimal solutions for MINLP problems having a linear objective function and differentiable quasiconvex constraints. The linear objective function is not too restrictive, since most optimization problems having a nonlinear objective $f(z)$ can be rewritten as a problem involving an additional variable $u$ and an additional constraint

\[
f(z) - u \le 0.
\tag{2}
\]

The new problem, then, will be to minimize $u$ subject to the original constraints and the additional constraint (2). Note, however, that this is not, in general, possible for quasiconvex objectives, since $f(z) - u$ is not necessarily quasiconvex when $f(z)$ is quasiconvex.

Definition of the Algorithm. The algorithm solves problem (1) by approximating the maximal violated nonlinear function with a linear function

\[
l(z) = g_i(z^k) + \alpha \cdot \nabla g_i(z^k)^T (z - z^k)
\tag{3}
\]

in the current iterate $z^k$, where $i = \arg\max_i \{ g_i(z^k) \}$. To simplify notation, let $\bar g_k = g_i(z^k)$. Furthermore, if the linearization added to the MILP problem is the $j$th linearization, let $\bar g_j = g_i(z^k)$, $\bar g_j(z) = g_i(z)$, $\nabla \bar g_j = \nabla g_i(z^k)$ and $\bar z^j = z^k$, where $i$ is defined as above. The α values change from iteration to iteration, so to be able to reference the value of the $j$th constant in iteration $k$ the α constants are replaced with $\alpha_j^{(k)}$. Thus the linearization (3) is redefined so that in iteration $k$ the $j$th linear approximation $l_j^{(k)}$ will be

\[
l_j^{(k)}(z) = \bar g_j + \alpha_j^{(k)} \cdot (\nabla \bar g_j)^T (z - \bar z^j)
\]

and the algorithm adds the linear constraint

\[
l_j^{(k)}(z) \le 0
\tag{4}
\]

to the MILP problem. The α constants initially have the value $\alpha_j^{(k)} = 1$, and they are either left unchanged or increased by a factor in each iteration. The algorithm then iteratively adds more and more constraints to a MILP problem originally consisting of only the linear constraints $A z \le a$ and $B z = b$ from (1). In iteration $k$ it thus solves the MILP problem

\[
\begin{aligned}
\min\ & c^T z \\
\text{s.t.}\ & l_j^{(k)}(z) \le 0, \quad j = 1, \ldots, L_k, \\
& A z \le a, \qquad B z = b, \\
& z \in X \times Y,
\end{aligned}
\tag{5}
\]

where $L_k$ is the number of linearizations in iteration $k$. The solution to this MILP problem will be the new iteration point. Using this point, a new linearization is added to the MILP problem or one or several of the α constants are updated. The procedure is then repeated until a feasible point of (1) is found. A point is considered feasible if

\[
g_i(z) \le \varepsilon_g, \quad i = 1, \ldots, p,
\tag{6}
\]

for some prespecified tolerance $\varepsilon_g$. Note that the constraints $A z \le a$ and $B z = b$ are automatically satisfied since the current iteration point is the solution to (5).


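For a pure-integer problem the iteration just described can be sketched in a few lines of Python. The sketch below is ours, not from the article: the MILP subproblem (5) is solved by brute-force enumeration over the finite set $Y$ (a real implementation would call a MILP solver), and only the feasibility-driven α update of the article (equations (17)–(19) below) is used, while the β-based local-underestimator update (11) is omitted.

```python
import itertools

def alpha_ecp(c, g, grad_g, Y, eps_g=1e-3, eps_h=1e-3, gamma=10.0, max_iter=100):
    """Simplified alpha-ECP loop for min c.y s.t. g_i(y) <= 0, y in the
    finite grid Y.  Each cut stores [gbar_j, grad_j, zbar_j, alpha_j]."""
    cuts = []

    def solve_milp():
        # Stand-in for the MILP subproblem (5): enumerate the grid and
        # keep the cheapest point satisfying every linear cut (4).
        best = None
        for y in Y:
            ok = all(gb + a * sum(d * (yi - zi) for d, yi, zi in zip(gr, y, zb)) <= 0
                     for gb, gr, zb, a in cuts)
            if ok:
                val = sum(ci * yi for ci, yi in zip(c, y))
                if best is None or val < best[0]:
                    best = (val, y)
        return best

    for _ in range(max_iter):
        sol = solve_milp()
        if sol is None:              # MILP infeasible: enlarge all alphas, cf. (19)
            for cut in cuts:
                cut[3] *= gamma
            continue
        _, z = sol
        viol = [gi(z) for gi in g]
        i = max(range(len(viol)), key=viol.__getitem__)
        if viol[i] > eps_g:          # infeasible point: add cut (4) with alpha = 1
            cuts.append([viol[i], grad_g[i](z), z, 1.0])
            continue
        # feasible point: enforce alpha_j >= gbar_j / eps_h, cf. (17)-(18)
        done = True
        for cut in cuts:
            if cut[3] < cut[0] / eps_h:
                cut[3] *= gamma
                done = False
        if done:
            return z
    raise RuntimeError("iteration limit reached")

# Problem (24) of Example 5: min 3*y1 + 2*y2, s.t. 3.5 - y1*y2 <= 0, y in {1,...,5}^2
Y = list(itertools.product(range(1, 6), repeat=2))
y_opt = alpha_ecp(c=(3, 2),
                  g=[lambda y: 3.5 - y[0] * y[1]],
                  grad_g=[lambda y: (-y[1], -y[0])],
                  Y=Y)
print(y_opt)   # (2, 2)
```

Run on problem (24) discussed later in the article, this sketch terminates at the global solution $y^* = (2, 2)$ with objective value 10.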
The idea of finding a feasible and optimal point by solving a sequence of MILP problems is the same as in the classical cutting plane method of Kelley for NLP problems. However, J.E. Kelley [7] considered only the continuous case using LP subproblems. Furthermore, Kelley's cutting plane algorithm assumes that the linearizations will always be valid underestimators of the corresponding nonlinear functions. This is true if the functions are convex, since for convex functions it holds that

\[
g_i(z^k) + \nabla g_i(z^k)^T (z - z^k) \le g_i(z)
\tag{7}
\]

for all $z, z^k \in X \times Y$. Thus $l_j^{(k)}(z) \le 0$ whenever $\bar g_j(z) \le 0$, even when $\alpha = 1$.

Unfortunately, (7) does not generally hold for quasiconvex functions. It is possible that the linear approximations are not valid underestimators of the corresponding nonlinear function, and thus the constraint $l_j^{(k)} \le 0$ may cut away parts of the feasible region. To avoid this problem the α constants have been introduced. By using sufficiently large values it is ensured that $l_j^{(k)} \le 0$ whenever $\bar g_j(z) \le 0$ holds; the linearizations will then be valid outer approximations of the feasible region of (1).

Generally it is not known how large the α constants should be. Instead, an updating strategy is used. The α values are checked and updated in each iteration if they turn out to be too small. The updated value is obtained by multiplying the current value with a constant greater than one. When the current MILP solution is a feasible solution of (1) and all α constants are large enough, the optimal solution to (1) has been found and the algorithm terminates.

Calculating Sufficiently Large α-Values. Since it is not known beforehand how large α values to use, it is shown below how to obtain large enough values to ensure ε-optimality. As previously mentioned, parts of the feasible region may be cut out when linearizing the quasiconvex functions if the value of the α constant is not increased.

If a sufficiently large α value can be found so that the linearization is a global underestimator of the corresponding nonlinear function in the entire feasible region, the linearization should satisfy

\[
\bar g_j + \alpha_j^{(k)} \cdot (\nabla \bar g_j)^T (z - \bar z^j) \le \bar g_j(z),
\tag{8}
\]
\[
\forall z \in \{ z \in X \times Y \colon \bar g_j(z) \le 0 \}.
\]

A weaker condition is that the inequality (8) is satisfied only for all current iteration points. If this condition is satisfied, the linearization is called a local underestimator. Thus the linearization is a local underestimator if it satisfies the following inequality in iteration $k$:

\[
\bar g_j + \alpha_j^{(k)} \cdot (\nabla \bar g_j)^T (z^k - \bar z^j) \le \bar g_j(z^k),
\tag{9}
\]
\[
j = 1, \ldots, L_k.
\tag{10}
\]

This inequality is easy to check in each iteration. If there is some α constant $\alpha_j^{(k)}$ that does not satisfy (9), then it is updated by multiplying the constant with β. The update formula is thus

\[
\alpha_j^{(k+1)} =
\begin{cases}
\beta \cdot \alpha_j^{(k)}, & l_j^{(k)}(z^k) > \bar g_j(z^k), \\
\alpha_j^{(k)} & \text{otherwise.}
\end{cases}
\tag{11}
\]

The β constant is a prespecified constant ($\beta > 1$). The concept of local underestimators is now extended to feasible underestimators. A linearization is called a feasible underestimator if it approximates the entire feasible region. Thus, for such linearizations, it holds that

\[
\bar g_j + \alpha_j^{(k)} \cdot (\nabla \bar g_j)^T (z - \bar z^j) \le 0,
\tag{12}
\]
\[
\forall z \in \{ z \in X \times Y \colon \bar g_j(z) \le 0 \}.
\tag{13}
\]

This is a much stricter requirement, since a local underestimator need only underestimate the nonlinear function in a finite set of infeasible points. But condition (12) is weaker than the condition for global underestimators (8), since a feasible underestimator does not necessarily have to underestimate all points in the feasible region of the corresponding nonlinear function. It is only required that $l_j^{(k)} \le 0$ in this region. In practice, a feasible underestimator needs to underestimate the entire boundary or, more precisely, the convex hull of the feasible region.

To see how to get a feasible underestimator, a new constant $h_j^{(k)}$ is introduced, where, as previously with the α constants, the constant is used in the $j$th linearization and $k$ stands for the value of the constant in the $k$th iteration. The constant is defined as

\[
h_j^{(k)} = \frac{\bar g_j}{\alpha_j^{(k)}}.
\tag{14}
\]


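The two α-update rules of the article, the β update (11) for local underestimators and the γ update driven by requirement (17) below, translate directly into code. The following helpers are a minimal sketch of ours (names and calling conventions are assumptions, not from the article):

```python
def h_value(gbar, alpha):
    # h_j^(k) = gbar_j / alpha_j^(k), eq. (14)
    return gbar / alpha

def beta_update(gbar, grad, zbar, alpha, z, g_at_z, beta=1.3):
    """beta update, eq. (11): enlarge alpha when the cut fails the
    local-underestimator test (9) at the current iterate z."""
    l_val = gbar + alpha * sum(d * (zi - zbi) for d, zi, zbi in zip(grad, z, zbar))
    return beta * alpha if l_val > g_at_z else alpha

def gamma_update(gbar, alpha, eps_h, gamma=10.0):
    """gamma update, eqs. (17)-(18): enlarge alpha until
    h_j^(k) = gbar/alpha <= eps_h, i.e. alpha >= gbar/eps_h."""
    return gamma * alpha if alpha < gbar / eps_h else alpha

# First cut of Example 5 below, built at (1,1): gbar = 2.5.  Repeated
# gamma updates drive alpha from 1 up to the first value >= 2.5/0.001.
a = 1.0
while True:
    new_a = gamma_update(2.5, a, eps_h=1e-3)
    if new_a == a:
        break
    a = new_a
print(a)   # 10000.0
```

With $\gamma = 10$ the stable value is $10000$, which matches the coefficient of this cut in the final MILP problem of the worked example at the end of the article.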
Since (12) can be divided by $\alpha_j^{(k)}$, the inequality becomes

\[
h_j^{(k)} + (\nabla \bar g_j)^T (z - \bar z^j) \le 0,
\tag{15}
\]

and moreover, because $\alpha_j^{(k)} \ge 1$, it holds that

\[
h_j^{(k)} \le \bar g_j.
\tag{16}
\]

The level sets of quasiconvex functions are convex, which means that if the constant parameter $h_j^{(k)}$ is replaced with zero, then the linearization (15) is always an outer approximation of the feasible region. In fact, the linearization is then an approximation of an even larger region containing the feasible region. Thus, if $h_j^{(k)}$ is sufficiently small, (15) is an approximation of the feasible region. In practice the $h$ constants should satisfy

\[
h_j^{(k)} \le \varepsilon_h, \quad \forall j = 1, \ldots, L_k.
\]

This is the same as requiring that

\[
\alpha_j^{(k)} \ge \frac{\bar g_j}{\varepsilon_h}, \quad \forall j = 1, \ldots, L_k,
\tag{17}
\]

which can easily be seen from (14). Equation (17) shows that there is an important connection between sufficiently large α values and the value of the nonlinear function at the linearization point ($\bar g_j$): the larger the term $\bar g_j$ is, the larger the constant α has to be in order to be sufficiently large. One could use the same updating scheme (11) as was used for obtaining a local underestimator, but to speed up the process a new updating factor $\gamma > 1$ (with $\gamma \ge \beta$) is introduced. This constant is used to update the α values if the corresponding linearizations are not feasible underestimators.

Whenever the algorithm finds a feasible point, it checks that all linearizations are feasible underestimators, i.e. that (17) holds. If there is some constant $\alpha_j^{(k)}$ that violates this inequality, the value of that constant is updated by multiplying it with γ. Thus the α constants will be updated according to

\[
\alpha_j^{(k+1)} =
\begin{cases}
\gamma \cdot \alpha_j^{(k)}, & \alpha_j^{(k)} < \bar g_j / \varepsilon_h, \\
\alpha_j^{(k)} & \text{otherwise.}
\end{cases}
\tag{18}
\]

In fact, it would be sufficient to require that the linear underestimators should not cut away the optimal point $z^*$, i.e. that $l_j^{(k)}(z^*) \le 0$. The algorithm would then terminate in considerably fewer iterations, but since the optimal solution $z^*$ is not known, it is very difficult to check this requirement. The same difficulty also appears if the algorithm were based on global underestimators of the type (8). However, as will follow, global convergence of the algorithm towards the optimal solution can be guaranteed by using local and feasible underestimators. That is why the concepts of local and feasible underestimators have been introduced.

Handling Infeasible MILP Problems. It is possible that the linearizations cut out enough of the feasible region of (1) to make the corresponding MILP problem infeasible. Then there would be no new iteration point and the algorithm would not be able to continue. The solution to this problem is to update all α values and solve the MILP problem again. If there is still no feasible point, this process is repeated until a feasible point is obtained. Large enough values to make the MILP problem feasible exist, since the nonlinear problem (1) was assumed to be feasible. Thus, if the MILP problem is infeasible, the α update will be

\[
\alpha_j^{(k+1)} = \gamma \cdot \alpha_j^{(k)}, \quad j = 1, \ldots, L_k.
\tag{19}
\]

To illustrate the algorithm, a flowchart of the algorithm is given below (Fig. 1).

Convergence of the proposed method. Convergence properties of the algorithm are now studied. Below it is proven that the algorithm converges towards the optimal solution for the quasiconvex problem (1). There are three important properties which are needed to prove convergence: first, the algorithm will never return to the same point if it is infeasible; secondly, the generated points will converge to a feasible solution; and finally, this feasible solution will be the global optimal solution to the original quasiconvex problem (1).

Cycling. First it is shown that the algorithm never returns to the same point if it is infeasible, i.e., that cycling is not possible.


Note that compactness or quasiconvexity of the constraint functions is unnecessary to prove this theorem.

[Fig. 1: Flowchart of the α-ECP algorithm: initialize $L_1 = 0$, $k = 0$; solve (5) and call the solution $z^k$; calculate $\bar g_k = \max_i \{ g_i(z^k) \}$; update the α values according to (11) or add a linearization according to (4), setting $L_{k+1} = L_k + 1$; at a feasible point, update the α values according to (18) until the point $z^k$ is optimal in (1).]

THEOREM 1 If, in iteration $k$, the current point $z^k$ is not feasible, then all new points generated by the algorithm will be different from $z^k$. □

PROOF. If $z^k$ is infeasible, then $\bar g_k > 0$ and a linearization is added to the MILP problem. If this linearization was the $j$th one added, then all new points $z^l$ generated by the algorithm will satisfy

\[
\bar g_j + \alpha_j^{(l)} \cdot (\nabla \bar g_j)^T (z^l - \bar z^j) \le 0, \quad l > k.
\tag{20}
\]

Since $z^l = z^k$ ($= \bar z^j$) does not satisfy the inequality (20), all new points will be different from $z^k$. □

It immediately follows that all previous points generated by the algorithm are different from $z^k$ as well.

COROLLARY 2 If the current point $z^k$ is infeasible, then $z^k$ is different from all previous points. □

PROOF. If there were a $z^j$, $j < k$, such that $z^j = z^k$, then $z^j$ would be a point not satisfying the previous theorem. □

Convergence To a Feasible Point. Convergence to a feasible point for discrete problems is directly ensured by the above cycling theorem. By assumption, there are only a finite number of points in $Y$, and there is at least one feasible point. Consequently, if the algorithm did not find any of the feasible points in finite time, it would have to repeat an infeasible point after generating at most $|Y|$ iteration points, which is not possible under the cycling theorem.

Convergence in the mixed integer case can be proven by utilizing the fact that the points $x^k$ are taken from a compact set $X$, and the set $Y$ is finite. This implies that any infinite sequence of points $\{ z^k = (x^k, y^k) \colon k \in \mathcal{K} \}$ taken from the set $X \times Y$ has a subsequence with a limit point. The following theorem shows that any limit point will be a feasible point, which is a property required for convergence. Note that the quasiconvexity of the nonlinear functions is not required to prove convergence of the algorithm. Quasiconvexity is only required to ensure a global optimal solution.

The algorithm ensures that $\alpha_j^{(k)} \ge \bar g_j / \varepsilon_h$, but for simplicity assume that equality holds for those $j$ where $\bar g_j \ge \varepsilon_h$. Then the constant $h_j^{(k)}$ satisfies

\[
\min(\varepsilon_h, \bar g_j) \le h_j^{(k)} \le \bar g_j.
\tag{21}
\]

This follows directly from (16) and the fact that (17) is already satisfied for $\alpha_j^{(k)} = 1$ if $\bar g_j < \varepsilon_h$.

Below it is proven that any limit point is a feasible point.

THEOREM 3 Suppose that the α-ECP algorithm generates an infinite sequence of points $\{ z^k \colon k \in \mathcal{K} \}$. Then the limit point of any convergent subsequence $\bar{\mathcal{K}} \subset \mathcal{K}$ is feasible. □

PROOF. Assume there is a convergent subsequence $\{ z^k \colon k \in \bar{\mathcal{K}} \}$ with a limit point that is not feasible. Then $\lim_{k \in \bar{\mathcal{K}}} \bar g_k = \varepsilon > 0$, and one can find a constant $M$ such that

\[
h_j^{(k)} > \min\left( \varepsilon_h, \frac{\varepsilon}{2} \right), \quad \forall j > L_M,\ \forall k > M,
\]


by (21). Since subsequent points $z^k$ are solutions to a linear program containing the linearization (15), it holds for all $k$ that

\[
0 \ge h_j^{(k)} + (\nabla \bar g_j)^T (z^k - \bar z^j)
\]

when $j = 1, \ldots, L_k$. Define $G$ as the maximal norm of the gradients of $g(z)$ in $X \times Y$, that is,

\[
G = \max \{ \| \nabla g_i(z) \| \colon z \in X \times Y,\ i = 1, \ldots, p \}.
\]

Then

\[
\| z^k - \bar z^j \| \ge \frac{h_j^{(k)}}{\| \nabla \bar g_j \|} \ge \frac{\min(\varepsilon_h, \varepsilon/2)}{G} > 0
\]

when $k > M$ and $j > L_M$. This implies that the sequence is not a Cauchy sequence and thus not convergent, which is a contradiction, since it was assumed that the sequence $\{ z^k \colon k \in \bar{\mathcal{K}} \}$ was convergent. □

Convergence To the Optimal Solution. Finally, convergence of the algorithm to the global optimal solution of (1) is shown.

First note that the algorithm will terminate in finite time at a point where all underestimators are ε-feasible underestimators, i.e. where equation (17) is satisfied. This follows from the convergence theorem. Since any convergent subsequence has a limit point that is feasible, the entire sequence of points will also converge to a feasible point. Thus there is a tail of the sequence, say $\{ \bar z^j \colon j = M, \ldots \}$, where the initial α values of the corresponding linearizations directly satisfy (17). This is true for those $M$ values that satisfy $\bar g_j \le \varepsilon_h$, $\forall j > M$. These α values will remain constant in subsequent iterations. On the other hand, after reaching a feasible point ($\bar g_j \le \varepsilon_g$), the old constants $\alpha_j$, $j = 1, \ldots, M$, can only be updated a finite number of times before being sufficiently large to satisfy (17). Therefore the algorithm will eventually reach a feasible point where all linearizations are ε-feasible underestimators, and the algorithm terminates. It remains to see whether this point is also the optimal solution.

To prove that the obtained solution is the optimal solution, one needs to assume that all linear constraints are feasible underestimators according to (12). This is in general true if $h_j^{(k)} = 0$. However, in the actual algorithm it was only required that $h_j^{(k)} \le \varepsilon_h$. Thus the actual solution obtained by the algorithm can only be ensured to be ε-optimal.

THEOREM 4 Assume that the α-ECP algorithm converges to a feasible solution $z^\infty$ and that all linearizations are feasible underestimators according to (12). Then $z^\infty$ is an optimal point of (1) and $Z(z^\infty)$, where $Z(z) = c^T z$, is the optimal solution of (1). □

PROOF. Denote the feasible region of (1) by $\Omega$, the feasible region of the MILP problem that was solved to obtain $z^\infty$ by $\Omega^\infty$, and an optimal point of (1) by $z^*$. By (12) it holds that $\Omega \subset \Omega^\infty$, and thus

\[
Z(z^\infty) \le Z(z^*).
\tag{22}
\]

On the other hand, $z^\infty$ is feasible in (1), and thus

\[
Z(z^*) \le Z(z^\infty).
\tag{23}
\]

From (22) and (23) one gets that $Z(z^*) = Z(z^\infty)$, and thus $Z(z^\infty)$ is the optimal solution to (1) and $z^\infty$ is an optimal point of (1). □

EXAMPLE 5 The algorithm is demonstrated on a quasiconvex integer problem. In this, as well as in other test runs, it has turned out that a suitable choice of β and γ is $\beta = 1.3$ and $\gamma = 10$. The ε-tolerances in these examples are $\varepsilon_g = \varepsilon_h = 0.001$.

Consider the problem

\[
\begin{aligned}
\min\ & 3 y_1 + 2 y_2 \\
\text{s.t.}\ & 3.5 - y_1 y_2 \le 0, \\
& y \in \{ 1, \ldots, 5 \}^2.
\end{aligned}
\tag{24}
\]

The optimal solution to this problem is $y = (2, 2)$, which can be seen from the figure below (Fig. 2). The steps executed by the α-ECP algorithm are:

Iteration 1. Solve the problem

\[
\min\ 3 y_1 + 2 y_2 \quad \text{s.t.}\ y \in \{ 1, \ldots, 5 \}^2.
\]

The solution is $y^1 = (1, 1)$. A linearization in this point,

\[
2.5 + \alpha_1^{(1)} \left( -(y_1 - 1) - (y_2 - 1) \right) \le 0,
\]

is added to the MILP problem according to (4). Set $\alpha_1^{(1)} = 1$. The linearization $l_1^{(1)}$ is shown in Fig. 2.


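The first linearization of Example 5 can be checked numerically. The short sketch below is ours: it evaluates the cut $l_1^{(1)}$ built at $y^1 = (1,1)$ from $g(y) = 3.5 - y_1 y_2$ and $\nabla g(y) = (-y_2, -y_1)$, first with $\alpha = 1$ and then after one γ update.

```python
# Constraint of problem (24) and its gradient.
g  = lambda y: 3.5 - y[0] * y[1]
dg = lambda y: (-y[1], -y[0])

zbar = (1, 1)                        # first iterate y^1
gbar, grad = g(zbar), dg(zbar)       # 2.5 and (-1, -1)

def l1(y, alpha=1.0):
    # l_1^(1)(y) = gbar + alpha * grad . (y - zbar), cf. eq. (3)
    return gbar + alpha * sum(d * (yi - zi) for d, yi, zi in zip(grad, y, zbar))

print(l1((2, 2)))              # 0.5 > 0: with alpha = 1 the cut excludes the optimum
print(l1((1, 4)))              # -0.5 <= 0: (1, 4) survives and becomes the next iterate
print(l1((2, 2), alpha=10.0))  # -17.5: after one gamma update the optimum is admitted again
```

This reproduces numerically what the text observes next: with $\alpha_1^{(1)} = 1$ the linearization cuts away $y^* = (2, 2)$, and enlarging α restores it to the approximated feasible region.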
As can be seen from this figure, the linearization cuts away the optimal solution to the problem.

Fig. 2: Feasible region of (24).

Iteration 2. The solution to the new MILP problem is y² = (1, 4). This point is a feasible solution to the INLP problem. The linearization satisfies the requirements of a local underestimator but is not a feasible underestimator. Observe that without the concept of feasible underestimators the algorithm would stop here at a nonoptimal point. However, in order to ensure that the linear function is a feasible underestimator, the α constant is updated according to (18) and α_1^(3) = 10. Since max_i{g_i(z^k)} < 0, no additional linearization is added.

Iteration 3. The solution to the new MILP problem is y³ = (1, 2). A new linearization at this point is added to the MILP problem (α_2^(3) = 1):

  1.5 + α_2^(3) (−2, −1)(y_1 − 1, y_2 − 2)^T ≤ 0.

Iteration 4. The MILP solution is y⁴ = (2, 2), which is feasible; however, neither linearization is a feasible underestimator, so the α values are updated using (18). The new values are α_1^(5) = 100 and α_2^(5) = 10.

Iteration 5. The solution of the modified MILP problem is y⁵ = (2, 1). Since it is infeasible, a new linearization

  1.5 + α_3^(5) (−1, −2)(y_1 − 2, y_2 − 1)^T ≤ 0

is added, where α_3^(5) = 1.

Iteration 6. The MILP solution is y⁶ = (1, 3), which is also infeasible. A new linearization

  0.5 + α_4^(6) (−3, −1)(y_1 − 1, y_2 − 3)^T ≤ 0

is added (α_4^(6) = 1).

Iteration 7. The MILP solution is again the feasible solution y⁷ = (2, 2). The linearizations are not feasible underestimators and thus the α values are updated. The new α values are α_1^(8) = 1000, α_2^(8) = 100 and α_3^(8) = α_4^(8) = 10.

Iterations 8-10. The new solutions to the MILP problems are still y⁸ = y⁹ = y¹⁰ = (2, 2), but the α values are not large enough to guarantee that the linearizations are feasible underestimators. Therefore the α constants are updated.

Iteration 11. The solution is y¹¹ = (2, 2) and all linearizations are feasible underestimators. The algorithm terminates with y* = (2, 2).

Result. The algorithm thus returns the global solution y* = (2, 2) to (24) with the optimal value f(2, 2) = 10. The final MILP problem solved in iteration 11 is

  min 3y_1 + 2y_2
  s.t. 2.5 + 10000(2 − y_1 − y_2) ≤ 0
       1.5 + 10000(4 − 2y_1 − y_2) ≤ 0
       1.5 + 10000(4 − y_1 − 2y_2) ≤ 0
       0.5 + 1000(6 − 3y_1 − y_2) ≤ 0
       y ∈ {1, ..., 5}².  □

Conclusions. The above algorithm has several advantages when compared to other similar algorithms for solving MINLP problems. At each iteration, the procedure only solves MILP subproblems and is thus a competitive alternative to algorithms where only NLP problems, or both NLP and MILP problems, are solved in each iteration.

One consequence is that since only MILP problems are solved in each iteration, the nonlinear constraints need not be calculated at relaxed values of the integer variables. It can be very difficult to calculate the value at a relaxed point if, for instance, there are binary variables that represent the existence of units in a process and the constraints are evaluated by simulating the result of having those units present or not. Then it may sometimes be impossible to evaluate the constraints if the integer variables are relaxed.

The α-ECP algorithm also solves MINLP problems that have general integer variables, not only binary variables. Also, no integer cuts are needed to ensure convergence. This is not the case with all outer approximation MINLP methods. In addition, the proposed algorithm ensures global convergence for quasiconvex MINLP problems.

Cutting plane methods are claimed to have slow convergence. This, generally, is not the case if the convergence rate is measured as the number of nonlinear function evaluations. Numerical experience with the algorithm indicates that there are many cases where the number of function evaluations is even magnitudes lower than for competing algorithms that solve both MINLP and NLP subproblems. This is a significant advantage if evaluation of the constraints is the most time-consuming part of the problem. The algorithm is especially suitable for solving INLP problems.

See also: Chemical process planning; Mixed integer linear programming: Mass and heat exchanger networks; Mixed integer nonlinear programming; MINLP: Outer approximation algorithm; Generalized outer approximation; MINLP: Generalized cross decomposition; Generalized Benders decomposition; MINLP: Logic-based methods; MINLP: Branch and bound methods; MINLP: Branch and bound global optimization algorithm; MINLP: Global optimization with αBB; MINLP: Heat exchanger network synthesis; MINLP: Reactive distillation column synthesis; MINLP: Design and scheduling of batch processes; MINLP: Applications in the interaction of design and control; MINLP: Application in facility location-allocation; MINLP: Applications in blending and pooling problems.

References
[1] ADJIMAN, C.S., ANDROULAKIS, I.P., AND FLOUDAS, C.A.: 'Global optimization of MINLP problems in process synthesis and design', Computers Chem. Engin. 21 (1997), 445-450.
[2] ANDROULAKIS, I.P., MARANAS, C.D., AND FLOUDAS, C.A.: 'αBB: A global optimization method for general constrained nonconvex problems', J. Global Optim. 7 (1995), 337-363.
[3] DURAN, M.A., AND GROSSMANN, I.E.: 'An outer approximation algorithm for a class of mixed-integer nonlinear programs', Math. Program. 36 (1986), 307-339.
[4] FLETCHER, R., AND LEYFFER, S.: 'Solving mixed-integer nonlinear programs by outer approximation', Math. Program. 66 (1994), 327-349.
[5] FLOUDAS, C.A.: Nonlinear and mixed-integer optimization, fundamentals and applications, Oxford Univ. Press, 1995.
[6] GEOFFRION, A.M.: 'Generalized Benders decomposition', J. Optim. Th. Appl. 10 (1972), 237-260.
[7] KELLEY, J.E.: 'The cutting plane method for solving convex programs', J. SIAM 8, no. 4 (1960), 703-712.
[8] QUESADA, I., AND GROSSMANN, I.E.: 'An LP/NLP based branch-and-bound algorithm for convex MINLP optimization problems', Computers Chem. Engin. 16 (1992), 937-947.
[9] RYOO, H.S., AND SAHINIDIS, N.V.: 'Global optimization of nonconvex NLPs and MINLPs with applications in process design', Computers Chem. Engin. 19 (1995), 551-566.
[10] VISWANATHAN, J., AND GROSSMANN, I.E.: 'A combined penalty function and outer approximation method for MINLP optimization', Computers Chem. Engin. 14 (1990), 769-782.
[11] VISWESWARAN, V., AND FLOUDAS, C.A.: 'New formulations and branching strategies for the GOP algorithm', in I.E. GROSSMANN (ed.): Global Optimization in Engineering Design, Kluwer Acad. Publ., 1996, pp. 75-109.
[12] WESTERLUND, T., AND PETTERSSON, F.: 'An extended cutting plane method for solving convex MINLP problems', Computers Chem. Engin. Suppl. 19 (1995), S131-S136.
[13] WESTERLUND, T., PETTERSSON, F., AND GROSSMANN, I.E.: 'Optimization of pump configurations as a MINLP problem', Computers Chem. Engin. 18 (1994), 845-858.
[14] WESTERLUND, T., SKRIFVARS, H., HARJUNKOSKI, I., AND PÖRN, R.: 'An extended cutting plane method for a class of non-convex MINLP problems', Computers Chem. Engin. 22 (1998), 357-365.
[15] YUAN, X., PIBOULEAU, L., AND DOMENECH, S.: 'Experiments in process synthesis via mixed-integer programming', Chem. Engin. and Processing 25 (1989), 99-116.


Claus Still
Dept. Math., Åbo Akademi Univ.
Fänriksgatan 3
FIN-20500 Åbo, Finland
E-mail address: cstill@abo.fi

Tapio Westerlund
Process Design Lab., Åbo Akademi Univ.
Biskopsgatan 8
FIN-20500 Åbo, Finland
E-mail address: twesterl@abo.fi

MSC2000: 90C11, 90C26
Key words and phrases: mixed integer nonlinear programming, extended cutting plane, quasiconvex function, feasible underestimators.

EXTREMUM PROBLEMS WITH PROBABILITY FUNCTIONS: KERNEL TYPE SOLUTION METHODS, KSM

Two types of stochastic programs are widely known: two-stage and chance constrained problems. The latter were introduced to stochastic programming by A. Charnes and W.W. Cooper in the 1950s [1] and are formally described by defining a nonlinear probability function v(x, t) of the form:

  v(x, t) = P {ξ : f(x, ξ) ≤ t}.        (1)

Here f(x, ξ) is a real valued function defined on R^r × R^v, t is a fixed level of reliability, ξ = ξ(ω) is a random parameter and P denotes probability. Note that for a fixed x the function v(x, t), as a function of t, is the distribution function of the random variable f(x, ξ).

Various examples of extremum problems with the probability function v(x, t) can be found in [3, Chap. 1], where among others also the so-called 'stock exchange' paradox is analyzed. To overcome a paradoxical situation caused by an unsuccessful choice of the objective expected return, the strategy which maximizes the expected growth of return (the Kelly strategy) was applied in [2]. In [3] it was demonstrated that a risky (i.e. probabilistic) strategy is better than the Kelly one.

In the approximate maximization of v(x, t) over the constraint set X ⊂ R^r we should apply some (quasi-)gradient type method. This in turn needs the presentation of v(x, t) as an integral, which we can realize via the Heaviside zero-one function χ(·):

  χ(t − f(x, ξ)) = { 1, if f(x, ξ) ≤ t,
                     0, if f(x, ξ) > t.

Then

  v(x, t) = ∫_S χ(t − f(x, ξ)) α(dξ),        (2)

where α(·) is the distribution function of a random vector ξ and the integral in (2) is understood in the Lebesgue-Stieltjes sense.

The integral representation (2) of the probability function v(x, t) clearly demonstrates the difficulties which arise in the approximate maximization of its value: the integrand χ(·) itself is a discontinuous zero-one function and the integral (2) over χ(·) is never convex. Only in some cases, e.g., if the function f(x, ξ) is jointly convex and continuous in (x, ξ) and α(·) as a measure is quasiconcave, is the function v(x, t) quasiconcave in x, see [12].

In this survey we will first solve iteratively, using stochastic analogues of the linearization and gradient projection methods, the following probability maximization problem:

  max_{x∈X} v(x, t) = max_{x∈X} P {ξ : f(x, ξ) ≤ t},        (3)

where the constraint set X is assumed to be simple, i.e. on X we can effectively solve auxiliary problems of maximization of linear or quadratic functions. Second, we will exploit the introduced technique for the minimization of a smooth function over probabilistic equality-inequality type constraints, using a stochastic analogue of the modified Lagrange method.

Gradient type methods require differentiability of the cost function. A lot of papers have been devoted to differentiability conditions of v(x, t) in x, starting from [13], where v'_x(x, t) was presented via a surface integral. The gradient of v(x, t) via a volume integral was presented in [16]; see also the survey paper [4]. All these formulas are quite uncomfortable to handle, especially for numerical methods. In the following we will assume differentiability of v(x, t) in x and in (x, t), i.e. there exist v'_x(x, t) and v''_xt(x, t).

Define solution sets X* for the problem (3) as follows:

  X* = {x* : (v'_x(x*, t), x − x*) ≤ 0, ∀x ∈ X},        (4)
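The representation (2) is just the expectation of the indicator χ(t − f(x, ξ)), so a plain Monte Carlo average already estimates v(x, t). A minimal sketch, with the hypothetical choice f(x, ξ) = (x − ξ)² and ξ standard normal (this example is not from the article):

```python
# Monte Carlo estimate of v(x, t) = P{xi : f(x, xi) <= t} via the
# indicator (Heaviside) representation (2).
# Assumed example: f(x, xi) = (x - xi)^2, xi ~ N(0, 1),
# so v(0, 1) = P{xi^2 <= 1} ~ 0.6827.
import math
import random

def f(x, xi):
    return (x - xi) ** 2

def chi(s):                       # Heaviside zero-one function
    return 1.0 if s >= 0 else 0.0

def v_mc(x, t, sample):           # sample average of chi(t - f(x, xi))
    return sum(chi(t - f(x, xi)) for xi in sample) / len(sample)

random.seed(1)
sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]
est = v_mc(0.0, 1.0, sample)
exact = math.erf(1 / math.sqrt(2))   # P{|xi| <= 1}, about 0.6827
# est is within about 0.01 of exact
```

The discontinuity of χ is exactly what blocks direct gradient methods here and motivates the smoothed kernel estimates discussed next.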


or

  X* = {x* : x* = π[x* + ρ v'_x(x*, t)], ∀ρ > 0},        (5)

where π[y] means the projection of a vector y onto the set X. Then we can interpret the linearization and gradient projection methods as iterative ways of testing conditions (4) and (5), respectively.

Following [17, Chap. IV], a method for the solution of a problem is said to be convergent if the limit points of the sequence {x_n}, generated by the algorithm, belong to the solution set X*.

Denote n independent realizations ξ_1, ..., ξ_n of a random vector ξ by ξ^n, i.e., ξ^n = (ξ_1, ..., ξ_n). Then, following [14] and [10], the smoothed approximation of v(x, t) looks as follows:

  v_n(x, t, ξ^n) = v_n(x, t, ξ_1, ..., ξ_n)
                 = (1/(n h_n)) Σ_{i=1}^n ∫_{−∞}^t K((τ − f(x, ξ_i))/h_n) dτ,        (6)

where the sequence {h_n} is connected with the sequence N = {1, 2, ...} as

  lim h_n = 0,  lim n h_n = ∞,  n ∈ N,        (7)

and the continuous kernel function K(y) satisfies the conditions [14]:

  ∫_{−∞}^∞ K(y) dy = 1,  sup_{−∞<y<∞} |K(y)| < ∞,        (8)

  ∫_{−∞}^∞ y K(y) dy = 0,  ∫_{−∞}^∞ |y K(y)| dy < ∞.        (9)

The gradient of the smoothed approximate probability function v_n(x, t, ξ^n) from (6) now looks as follows:

  v'_nx(x, t, ξ^n) = −(1/(n h_n)) Σ_{i=1}^n f'_x(x, ξ_i) K((t − f(x, ξ_i))/h_n).        (10)

Even though the estimates (6) and (10) are biased, i.e.,

  E v'_nx(x, t, ξ^n) ≠ v'_x(x, t),

we still have

  E v'_nx(x, t, ξ^n) = v'_x(x, t) − h_n ∫_{−∞}^∞ y K(y) v''_xt(x, t − θ h_n y) dy,

where 0 ≤ θ ≤ 1, see [15], and consequently,

  lim_{n→∞} sup_{x∈X} |E v'_nx(x, t, ξ^n) − v'_x(x, t)| = 0.

For the approximate solution of (3) consider the stochastic analogue of the linearization method:

  x_{n+1} = x_n + γ_n (x̄_n − x_n),        (11)

where x̄_n is a solution of the linear problem

  max_{x∈X} (v'_nx(x_n, t, ξ^n), x) = (v'_nx(x_n, t, ξ^n), x̄_n)

and x_0 ∈ X.

We explain the stochastic nature of the sequence {x_n} generated by the algorithm (11). For each n the random vector x_n is defined on the sigma-algebra F_{n−1}, generated by the random vectors ξ^1, ..., ξ^{n−1}. The union of the sequence of sub-sigma-algebras ∪_{i=1}^∞ F_i is equal to the sigma-algebra F of the initial probability space (Ω, F, P), where the random vector ξ was defined. Note that in each iteration step we should generate new (independent) realizations of the random vector ξ.

Assume that the function f(x, ξ) is differentiable in x and that for all t ∈ R^1 and all x ∈ X its gradient is bounded by an α-integrable function K(ξ):

  |f'_x(x, ξ)| ≤ K(ξ),  ∫ K(ξ) α(dξ) < ∞.        (12)

Let the sequence {γ_n} of steplengths satisfy the conditions:

  0 ≤ γ_n ≤ 1,  γ_n → 0,  Σ_{n=1}^∞ γ_n = ∞.        (13)

Then the following convergence theorem holds [5]:

THEOREM 1. Let the function f(x, ξ), differentiable in x, satisfy conditions (12), the smoothing continuous kernel K(y) conditions (8), (9), and the sequence {γ_n} of steplengths conditions (13), and let the solution set X* be finite. Then all limit points of the sequence {x_n}, generated by the algorithm (11), belong almost surely to the solution set X*. □

REMARK 2. The proof of the theorem relies on the stochastic analogue of [17, Thm. A], see [9, Chap. II, Thm. 8], and was verified in [5]. □
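For a Gaussian kernel, the inner integral in (6) is the kernel's own distribution function (the 1/h_n prefactor cancels after substitution), so (6) and (10) reduce to simple sample averages. A sketch under the hypothetical choice f(x, ξ) = (x − ξ)², ξ ~ N(0, 1), which is not from the article:

```python
# Smoothed estimate (6) and its gradient estimate (10) with a Gaussian
# kernel, which satisfies the kernel conditions (8) and (9).
import math
import random

def K(y):                          # standard normal density
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

def K_int(y):                      # integral of K from -inf to y
    return 0.5 * (1 + math.erf(y / math.sqrt(2)))

def f(x, xi):                      # assumed example loss
    return (x - xi) ** 2

def f_x(x, xi):                    # its derivative in x
    return 2 * (x - xi)

def v_n(x, t, sample, h):          # estimate (6): the substitution
    n = len(sample)                # u = (tau - f)/h gives K_int((t - f)/h)
    return sum(K_int((t - f(x, xi)) / h) for xi in sample) / n

def v_n_grad(x, t, sample, h):     # gradient estimate (10)
    n = len(sample)
    return -sum(f_x(x, xi) * K((t - f(x, xi)) / h)
                for xi in sample) / (n * h)

random.seed(2)
n = 20_000
h = n ** -0.2                      # h_n -> 0 and n*h_n -> inf, cf. (7)
sample = [random.gauss(0.0, 1.0) for _ in range(n)]
est = v_n(0.0, 1.0, sample, h)     # close to P{xi^2 <= 1} ~ 0.6827
grad = v_n_grad(0.0, 1.0, sample, h)   # near zero by symmetry at x = 0
```

The bias of these estimates is of the order described above and vanishes as h_n → 0, which is why condition (7) couples h_n to n.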


REMARK 3. The statements of the theorem are valid also for the stochastic analogue of the gradient projection method, see [5]:

  x_{n+1} = π[x_n + γ_n v'_nx(x_n, t, ξ^n)],  x_0 ∈ X.        (14)
□

As described earlier, algorithms (11) and (14) need in the nth iteration step n independent realizations of the random vector ξ. In [11] it was verified that in the asymptotic sense statistical estimation type methods, such as algorithms (11) and (14), have no advantages compared with methods of random search, but need more calculating effort.

As an example of the last statement consider the free maximum problem:

  max_{x∈R^r} v(x, t) = max_{x∈R^r} P {ξ : f(x, ξ) ≤ t}.        (15)

Let ξ_n be the nth realization of the random variable ξ. Consider the algorithm:

  x_{n+1} = x_n − (γ_n/h_n) f'_x(x_n, ξ_n) K((t − f(x_n, ξ_n))/h_n).        (16)

Assume, in addition to the assumptions (7)-(9) and (12), (13) on {h_n}, {γ_n}, K(y) and f(x, ξ), that

  Σ_{n=1}^∞ γ_n = ∞,  Σ_{n=1}^∞ γ_n h_n < ∞,  Σ_{n=1}^∞ γ_n²/h_n² < ∞.        (17)

Then, if ∫_{R^v} |f'_x(x, ξ)| α(dξ) is bounded for bounded x, the limit points of the sequence {x_n} belong almost surely to the set X* of stationary points, where

  X* = {x* : v'_x(x*, t) = 0},

see [7].

REMARK 4. Even though algorithms (11) and (14) take more calculating effort compared with the random search method (16), the latter is very unstable and converges only in the 'in probability' sense. □

Consider the following nonlinear programming problem with a smooth cost function f(x) and with probabilistic constraints of inequality type with a fixed level of reliability a, 0 < a < 1, i.e.,

  min_{x∈R^r} {f(x) : v(x, t) ≥ a}        (18)

(for the sake of simplicity consider only the case with one inequality constraint).

Define the solution set X* for the problem (18) as follows [8]:

  X* = F ∩ G,        (19)

where

  F = {x* : |f'_x(x*) + v'_x(x*, t) λ*|² = 0},        (20)

with

  λ* = argmin_{λ≥0} |f'_x(x*) + v'_x(x*, t) λ|²        (21)

and

  G = {x* : v(x*, t) ≥ a},        (22)

where λ* is the optimal Lagrange multiplier of the Lagrangian.

Replacing v(x, t) and v'_x(x, t) with their estimates (6) and (10), we should regularize the estimated analogue of (21), since the approximated subproblem (21) could be ill-posed.

Denote

  w_n(x, t, ξ^n) = min{0, v_n(x, t, ξ^n) − a}.

Then the stochastic analogue of the modified Lagrange method looks as follows:

  x_{n+1} = x_n − γ_n [f'_x(x_n) + v'_nx(x_n, t, ξ^n) λ_n(ξ^n)
            + M v'_nx(x_n, t, ξ^n) w_n(x_n, t, ξ^n)],        (23)

where λ_n(ξ^n) is a solution of the regularized auxiliary subproblem of quadratic programming

  min_{λ≥0} [|f'_x(x_n) + v'_nx(x_n, t, ξ^n) λ|² + ε_n λ²]

with ε_n > 0, ε_n → 0, n → ∞ and M > 0. The following convergence theorem is valid, see [6]:

THEOREM 5. Let the conditions of the previous theorem be satisfied, let the cost function f(x) be continuously differentiable and let

  Σ_{n=1}^∞ ...

Then the limit points of the sequence generated by the algorithm (23) belong almost surely to the solution set X*, defined by (19). □
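The random search iteration (16) needs only one fresh realization ξ_n per step. A sketch for the free problem (15), again with the hypothetical choice f(x, ξ) = (x − ξ)² and ξ ~ N(1, 1), so that v(x, 1) = Φ(x) − Φ(x − 2) and the stationary point is x* = 1; the steplengths γ_n = n^(−0.9), h_n = n^(−0.2) are chosen to satisfy (13) and (17):

```python
# Random search method (16) for max_x P{(x - xi)^2 <= t}, xi ~ N(1, 1).
# Assumed example, not from the article; x* = 1 is the stationary point.
import math
import random

def K(y):                           # Gaussian kernel, satisfies (8), (9)
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

def f(x, xi):
    return (x - xi) ** 2

def f_x(x, xi):
    return 2 * (x - xi)

random.seed(3)
t, x = 1.0, 3.0                     # start well away from x* = 1
for n in range(1, 100_001):
    gamma = n ** -0.9               # sum(gamma) diverges, cf. (13)
    h = n ** -0.2                   # sum(gamma*h), sum(gamma^2/h^2) converge
    xi = random.gauss(1.0, 1.0)     # one fresh realization per step
    x -= (gamma / h) * f_x(x, xi) * K((t - f(x, xi)) / h)
# x has drifted close to the stationary point x* = 1
```

As Remark 4 warns, individual runs of this scheme are noisy; the iterate only settles near x* after many steps and in the 'in probability' sense.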


See also: Stochastic programming with simple integer recourse; Two-stage stochastic programs with recourse; Stochastic vehicle routing problems; Stochastic integer programming: Continuity, stability, rates of convergence; Logconcave measures, logconvexity; Logconcavity of discrete distributions; General moment optimization problems; Approximation of multivariate probability integrals; Discretely distributed stochastic programs: Descent directions and efficient points; Static stochastic programming models; Static stochastic programming models: Conditional expectations; Stochastic programming models: Random objective; Stochastic programming: Minimax approach; Simple recourse problem: Primal method; Simple recourse problem: Dual method; Probabilistic constrained linear programming: Duality theory; Probabilistic constrained problems: Convexity theory; Approximation of extremum problems with probability functionals; Multistage stochastic programming: Barycentric approximation; Stochastic linear programs with recourse and arbitrary multivariate distributions; Stochastic programs with recourse: Upper bounds; Stochastic integer programs; L-shaped method for two-stage stochastic programs with recourse; Stochastic linear programming: Decomposition and cutting planes; Stabilization of cutting plane algorithms for stochastic linear programming problems; Two-stage stochastic programming: Quasigradient method; Stochastic quasigradient methods in minimax problems; Stochastic programming: Nonanticipativity and Lagrange multipliers; Preprocessing in stochastic programming; Stochastic network problems: Massively parallel solution.

References
[1] CHARNES, A., AND COOPER, W.W.: 'Chance-constrained programming', Managem. Sci. 5 (1959), 73-79.
[2] KELLY, J.: 'A new interpretation of information rate', Bell System Techn. J. 35 (1956), 917-926.
[3] KIBZUN, A.I., AND KAN, Y.S.: Stochastic programming problems with probability and quantile functions, Wiley, 1995.
[4] KIBZUN, A., AND URYASEV, S.: 'Differentiability of probability functions', Stochastic Anal. Appl. 16 (1998), 1101-1128.
[5] LEPP, R.: 'Maximization of a probability function over simple sets', Proc. Acad. Sci. Estonian SSR. Phys. Math. 28 (1979), 303-308. (In Russian.)
[6] LEPP, R.: 'Minimization of a smooth function over probabilistic constraints', Proc. Acad. Sci. Estonian SSR. Phys. Math. 29 (1980), 140-144. (In Russian.)
[7] LEPP, R.: 'Stochastic approximation type algorithm for the maximization of the probability function', Proc. Acad. Sci. Estonian SSR. Phys. Math. 32 (1983), 150-156.
[8] MIELE, A., CRAGG, E.E., IYER, R.R., AND LEVY, A.V.: 'Use of the augmented penalty function in mathematical programming problems. Part I', J. Optim. Th. Appl. 8 (1971), 115-130.
[9] NURMINSKII, E.A.: Numerical methods for solution of deterministic and stochastic minimax problems, Nauk. Dumka, 1979. (In Russian.)
[10] PARZEN, E.: 'On estimation of a probability density function and mode', Ann. Math. Statist. 33 (1962), 1065-1076.
[11] POLYAK, B.T., AND TSYPKIN, Y.Z.: 'Adaptive algorithms of estimation (convergence, optimality, stability)', Avtomatika i Telemekhanika (Automatics and Remote Control) (1979), 74-84. (In Russian.)
[12] PRÉKOPA, A.: 'Logarithmic concave measures and related topics', in M.A.H. DEMPSTER (ed.): Stochastic Programming, Acad. Press, 1980.
[13] RAIK, E.: 'Differentiability in the parameter of the probability function and optimization of the probability function via the stochastic pseudogradient method', Proc. Acad. Sci. Estonian SSR. Phys. Math. 24 (1975), 3-6. (In Russian.)
[14] ROSENBLATT, M.: 'Remarks on some nonparametric estimates of a density function', Ann. Math. Statist. 27 (1956), 832-837.
[15] TAMM, E.: 'On the minimization of the probability function', Proc. Acad. Sci. Estonian SSR. Phys. Math. 28 (1979), 17-24. (In Russian.)
[16] URYASEV, S.: 'A differentiation formula for integrals over sets given by inclusion', Numer. Funct. Anal. Optim. 10 (1989), 827-841.
[17] ZANGWILL, W.I.: Nonlinear programming: A unified approach, Prentice-Hall, 1969.

Riho Lepp
Tallinn Technical Univ.
Tallinn, Estonia
E-mail address: lprh@ioc.ee

MSC2000: 90C15
Key words and phrases: probability function, kernel estimates, stochastic approximation.

