Entropy Optimization Parameter Estimation
Eigenvalue enclosures for ordinary differential equations

The Rayleigh-Ritz bounds are monotonically decreasing in $n \in \mathbb{N}$.

The Lehmann-Goerisch procedure (see [6], [7], [4], [5], [3]) for calculating lower bounds can be understood as the discretization of a variational principle for characterizing the eigenvalues as well. This principle and a proof of the method are due to S. Zimmermann and U. Mertins [10].

Let $\rho \in \mathbb{R}$ be a spectral parameter such that for an $N \in \mathbb{N}$ the inequality
$$\lambda_N < \rho < \lambda_{N+1} \tag{5}$$
holds.
$$\cdots - 2\rho\,[u, u] + \rho^2 (u, u), \quad i = 1, \ldots, N. \tag{6}$$
A negative upper bound for $\sigma_i$ results in a lower bound for $\lambda_{N+1-i}$. In order to discretize (6), one determines $w_1, \ldots, w_n \in H$ such that
$$[v_i, v] = (w_i, v) \quad \text{for all } v \in V, \tag{7}$$
then one defines the matrix
$$A_2 := \big((w_i, w_k)\big)_{i,k=1,\ldots,n}, \tag{8}$$
and solves the matrix eigenvalue problem
$$(A_1 - \rho A_0)\,x = \tau\,(A_2 - 2\rho A_1 + \rho^2 A_0)\,x, \quad (\tau, x) \in \mathbb{R} \times \mathbb{R}^n. \tag{9}$$
If for $n \in \mathbb{N}$ the condition $\Lambda_N^{[n]} < \rho$ is fulfilled, then (9) has exactly $N$ negative eigenvalues $\tau_1 \le \cdots \le \tau_N < 0 \le \cdots \le \tau_n$. These $\tau_i$ are upper bounds for our $\sigma_i$ ($\sigma_i \le \tau_i$, $i = 1, \ldots, N$). One obtains the lower bounds
$$\Lambda_j^{\rho[n]} := \rho + \frac{1}{\tau_{N+1-j}} \le \lambda_j, \quad j = 1, \ldots, N. \tag{10}$$
This discretization (9), (10) is the Lehmann-Goerisch procedure. $\Lambda_j^{\rho[n]}$ is called a Lehmann-Goerisch bound for $\lambda_j$.

Numerical Example. The numerical example is the well-known Mathieu equation. This equation has been considered by several authors; bounds for eigenvalues of the Mathieu equation can be found in [1], [9] and [3]. The eigenvalue problem reads as follows:
$$-\varphi''(x) + s\cos^2(x)\,\varphi(x) = \lambda\,\varphi(x), \quad x \in I.$$
$$[f, g] := \int_I \big( f'(x)\,g'(x) + s\cos^2(x)\,f(x)\,g(x) \big)\,dx \quad \text{for all } f, g \in V.$$
With this definition the inner product $[\cdot,\cdot]$ and the usual $H^1$ inner product are equivalent; the embedding $(V, [\cdot,\cdot]) \hookrightarrow (H, (\cdot,\cdot))$ is compact. Now the eigenvalue problem

Find $\lambda \in \mathbb{R}$ and $\varphi \in V$, $\varphi \ne 0$, such that $[\varphi, v] = \lambda(\varphi, v)$ for all $v \in V$

is equivalent to the Mathieu equation. The trial functions $v_k \in V$ are defined by
$$v_1(x) := 1, \qquad v_k(x) := \cos(2(k-1)x) \quad \text{for } x \in I,\ k = 2, \ldots, n. \tag{11}$$
With these trial functions the Rayleigh-Ritz upper bounds $\Lambda_i^{[n]}$ (cf. (3), (4)) can be computed. For $n = 5$ one obtains

i    $\Lambda_i^{[5]}$
1    2.28404873592
2    8.4560567005
3    19.606719005
4    39.5439779
5    67.609198

The quality of these upper bounds can be improved by increasing $n$.

An application of the Lehmann-Goerisch procedure requires a spectral parameter $\rho$ which is a rough eigenvalue bound (cf. (5)). For this aim the Mathieu equation is considered for $s = 0$. This is a second order problem with constant coefficients and can be solved in closed form. Its eigenvalues are $\underline{\lambda}_i = 4(i-1)^2$, $i \in \mathbb{N}$. From the comparison theorem (see [3]) one can see that the $\underline{\lambda}_i$ are lower bounds for the eigenvalues of the Mathieu equation with $s > 0$; this can be used to verify the right-hand inequality of (5), while the left-hand inequality can be examined by means of the Rayleigh-Ritz bounds. For $N = 3$ one obtains
$$\lambda_3 \le \Lambda_3^{[n]} \le 19.607 < \rho := \underline{\lambda}_4 = 36 < \lambda_4.$$
If $s$ is increased dramatically, it may be impossible to satisfy (5). If this happens, one can link the eigenvalue problem under consideration and the comparison problem by a homotopy method (cf. [3]).

The next task is the determination of $w_i \in H$ such that (7) holds true. In general this is a problem, but for differential equations, where the right-hand side is the identity, one can proceed as follows: The operator on the left-hand side of the differential equation is denoted by $M$; then the trial functions $v_i$ are chosen from $\mathcal{D}(M)$ (that means sufficiently smooth) such that all essential and natural boundary conditions are satisfied. Now $w_i := M v_i$ fulfills (7). For the Mathieu equation one can define
$$(Mf)(x) := -f''(x) + s\cos^2(x)\,f(x)$$
together with its domain $\mathcal{D}(M)$; now it is easy to see that the $v_i$ from (11) fulfill $v_i \in \mathcal{D}(M)$ and $w_i := M v_i$ can be used in (7), (8).

From the eigenvalues of the matrix eigenvalue problem (9) one obtains the following bounds:

i    lower bound       upper bound
1    2.28404873561     2.28404873592
2    8.4560566942      8.4560567005
3    19.6067171        19.6067191

For an example with a system of ordinary differential equations see [2].

See also: Hemivariational inequalities: Eigenvalue problems; Interval analysis: Eigenvalue bounds of interval matrices; Semidefinite programming and determinant maximization; αBB algorithm.

References
[1] ALBRECHT, J.: 'Iterationsverfahren zur Berechnung der Eigenwerte der Mathieuschen Differentialgleichung', Z. Angew. Math. Mechanics 44 (1964), 453-458.
[2] BEHNKE, H.: 'A numerically rigorous proof of curve veering in an eigenvalue problem for differential equations', Z. Anal. Anwend. 15 (1996), 181-200.
[3] BEHNKE, H., AND GOERISCH, F.: 'Inclusions for eigenvalues of selfadjoint problems', in J. HERZBERGER (ed.): Topics in validated computations, Elsevier, 1994, pp. 277-322.
[4] GOERISCH, F.: 'Eine Verallgemeinerung eines Verfahrens von N.J. Lehmann zur Einschließung von Eigenwerten', Wiss. Z. Techn. Univ. Dresden 29 (1980), 429-431.
[5] GOERISCH, F., AND HAUNHORST, H.: 'Eigenwertschranken für Eigenwertaufgaben mit partiellen Differentialgleichungen', Z. Angew. Math. Mechanics 65, no. 3 (1985), 129-135.
[6] LEHMANN, N.J.: 'Beiträge zur Lösung linearer Eigenwertprobleme I', Z. Angew. Math. Mechanics 29 (1949), 341-356.
[7] LEHMANN, N.J.: 'Beiträge zur Lösung linearer Eigenwertprobleme II', Z. Angew. Math. Mechanics 30 (1950), 1-16.
[8] MAEHLY, H.J.: 'Ein neues Verfahren zur genäherten Berechnung der Eigenwerte hermitescher Operatoren', Helv. Phys. Acta 25 (1952), 547-568.
[9] WEINSTEIN, A., AND STENGER, W.: Methods of intermediate problems for eigenvalues, Acad. Press, 1972.
[10] ZIMMERMANN, S., AND MERTINS, U.: 'Variational bounds to eigenvalues of self-adjoint problems with arbitrary spectrum', Z. Anal. Anwend. 14 (1995), 327-345.

H. Behnke
Inst. Math. TU Clausthal
Erzstr. 1, 38678 Clausthal, Germany
E-mail address: behnke@math.tu-clausthal.de
MSC2000: 49R50, 65L15, 65L60, 65G20, 65G30, 65G40
Key words and phrases: upper and lower bounds to eigenvalues, Rayleigh-Ritz method, Lehmann-Maehly method.

$$e(x) = \begin{cases} x \ln x & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ \infty & \text{if } x < 0. \end{cases}$$
m tl
ii) There exists y E R rn such that, together
max
yER m with x, dj ln xj + cj + dj - ~im=l aijYi >_ 0
zER~ i=1 j--1
or V / ( x ) - A-ry > O. Similarly, this can be
m
+ ~
"=
uj ( dj lnxj + cj + dj -
i=1
aijy i °
j=l dj
... and $s^k = \nabla f(x^k) - A^\top y^k > 0$, to a new interior solution pair $(x^{k+1}, y^{k+1})$ such that the complementary slackness is reduced from $\delta_k = (x^k)^\top s^k$ to $\delta_{k+1} = (x^{k+1})^\top s^{k+1}$. The algorithm terminates when $\delta_k \le \epsilon$, for some given $\epsilon > 0$ (or when the difference between $f(x^k)$ and the optimum is sufficiently small).

To describe the algorithm, we use the boldface upper-case letters X, S, and W to denote the diagonal matrices formed by the components of vectors x, s, and w, respectively. We also denote the vectors of all ones of appropriate dimensions by e, the $\ell_2$ norm by $\|\cdot\|$, and the vector whose components are the $\ln(x_j)$'s, $j = 1, \ldots, n$, by $\ln x$.

Rather than dealing with the complementary slackness $\delta_k$ directly, the following primal-dual potential function [8]
$$\phi(x, s) = \rho \ln(x^\top s) - \sum_{j=1}^{n} \ln(x_j s_j),$$
where $\rho \ge n + \sqrt{n}$, can be used as a surrogate measure [5].

Given the initial solution pair, the potential of the associated complementary slackness can be calculated. Given the inaccuracy tolerance $\epsilon$, a target potential can be calculated. Therefore, the amount of required potential reduction can be calculated. The primal-dual interior point algorithm, under proper conditions, will reduce the potential by a constant amount in each iteration.

Note that two different pairs of $(x, s)$ that have the same complementary slackness measure may have different potentials. Therefore, to ensure that the target potential is sufficiently small, we need to find the minimum potential among all those $(x, s)$ pairs such that $x^\top s = \epsilon$, or a lower bound of this minimum potential.

Rewrite the potential function as
$$\phi(x, s) = (\rho - n) \ln(x^\top s) - \sum_{j=1}^{n} \ln\!\left(\frac{x_j s_j}{x^\top s}\right).$$
Applying the geometric-arithmetic inequality results in
$$\left(\prod_{j=1}^{n} \frac{x_j s_j}{x^\top s}\right)^{1/n} \le \frac{1}{n} \sum_{j=1}^{n} \frac{x_j s_j}{x^\top s} = \frac{1}{n}, \quad\text{that is,}\quad \frac{1}{n} \sum_{j=1}^{n} \ln\!\left(\frac{x_j s_j}{x^\top s}\right) \le \ln\frac{1}{n} = -\ln n.$$
Consequently,
$$\sum_{j=1}^{n} \ln\!\left(\frac{x_j s_j}{x^\top s}\right) \le -n \ln n.$$
Therefore, the target potential should be $(\rho - n) \ln \epsilon + n \ln n$. Given the potential associated with the initial solution, the exact amount of potential reduction is $\phi(x^0, s^0) - (\rho - n) \ln \epsilon - n \ln n$. Note that for a given inaccuracy tolerance $\epsilon$, the target potential is indeed the minimum of all the potentials associated with all $(x, s)$ pairs such that $x^\top s = \epsilon$. This is indicated by the tight geometric-arithmetic inequality.

Given the knowledge of how much potential reduction is needed, if an algorithm reduces the potential by a constant amount in each iteration, then the complexity of the algorithm is $O(\phi(x^0, s^0) - (\rho - n) \ln \epsilon - n \ln n)$.

Assume that, in iteration $k$, we have a primal-dual feasible solution pair $(x^k, y^k)$ and the slack vector $s^k = \nabla f(x^k) - A^\top y^k > 0$. Ideally, one would like to find $(x^{k+1}, y^{k+1})$ such that the KKT conditions are met, i.e.,
$$A x^{k+1} = b, \quad x^{k+1} \ge 0,$$
$$\nabla f(x^{k+1}) - A^\top y^{k+1} \ge 0,$$
$$X^{k+1} \big( \nabla f(x^{k+1}) - A^\top y^{k+1} \big) = 0.$$
Define
$$\Delta x = x^{k+1} - x^k, \quad \Delta y = y^{k+1} - y^k, \quad \Delta s = s^{k+1} - s^k, \quad \Delta X = X^{k+1} - X^k.$$
With these definitions, the conditions stated above become
$$A(x^k + \Delta x) = b, \quad x^k + \Delta x \ge 0,$$
$$\nabla f(x^k + \Delta x) - A^\top (y^k + \Delta y) \ge 0,$$
$$(X^k + \Delta X) \big[ \nabla f(x^k + \Delta x) - A^\top (y^k + \Delta y) \big] = 0. \tag{6}$$
Note that the quantity in the bracket of (6) is simply $s^{k+1} = s^k + \Delta s$, where
Given $0 < x^k \in F_p$, $s^k = \nabla f(x^k) - A^\top y^k > 0$, and $\delta_k = (x^k)^\top s^k$, the algorithm proposed in [4] solves the following system of nonlinear equations for $\Delta x$ and $\Delta y$:
$$X^k \Delta s + S^k \Delta x = \theta p^k, \tag{7}$$
$$A \Delta x = 0, \tag{8}$$
where $\theta > 0$ is a constant to be specified later and
$$p^k = \frac{\delta_k}{n}\, e - X^k S^k e,$$

... is in general difficult. Note that the vector $\nabla^2 f(x^k) \Delta x$ replaces $\nabla f(x^k + \Delta x) - \nabla f(x^k)$ of (7) and serves as a simple linear approximation. Equations (12) and (13) are key to the 'potential-reduction' primal-dual interior point algorithm.

Given an initial interior point solution, an interior point algorithm can be stated as follows.

Initialization:
Given an initial primal interior point solution $x^0$
path following interior point method has been established. However, to the best of our knowledge, possible polynomial time convergence behavior remains an open issue.

See also: Entropy optimization: Shannon measure of entropy and its properties; Jaynes' maximum entropy principle; Maximum entropy principle: Image reconstruction; Entropy optimization: Parameter estimation; Homogeneous selfdual methods for linear programming; Linear programming: Interior point methods; Linear programming: Karmarkar projective algorithm; Potential reduction methods for linear programming; Successive quadratic programming: Solution by active sets and interior point methods; Sequential quadratic programming: Interior point methods for distributed optimal control problems; Interior point methods for semidefinite programming.

References
[1] FANG, S.-C., AND PUTHENPURA, S.: Linear optimization and extensions: theory and algorithms, Prentice-

Shu-Cherng Fang
North Carolina State Univ.
North Carolina, USA
E-mail address: fang@eos.ncsu.edu
H.-S. Jacob Tsao
San Jose State Univ.
San Jose, California, USA
E-mail address: jtsao@email.sjsu.edu
MSC2000: 94A17, 90C51, 90C25
Key words and phrases: entropy optimization, interior point methods, primal-dual algorithm, polynomial time convergence.

ENTROPY OPTIMIZATION: PARAMETER ESTIMATION

Introduction. Entropy optimization has been applied to problems in various fields of interest from thermodynamics to financial planning. In this context 'entropy' refers to the amount of uncertainty in a system, rather than the amount of disorder. A detailed definition of entropy can be found in [4].

One area of application, which has not received much attention in recent years, is that of parameter estimation. The estimation of parameters in semi-empirical mathematical models is a process which is important in many disciplines in the sciences and engineering. This article will focus on a few different areas of the parameter estimation problem which have been approached from an entropy perspective. Jaynes' maximum entropy principle allows for the estimation of parameters in a statistical distribution function by specification of the characteristic moments. This method can also be used to derive the principle of maximum likelihood, one of the most widely used parameter estimation approaches. Entropy principles have also been used to derive theoretical 'best estimators' for recursive parameter estimation schemes. These results can then be used to gauge the performance of various nonoptimal approaches. A final application involves the development of a measure which not only allows for the estimation of model parameters, but also for simultaneously choosing the best mathematical form of the model.

$$I = \sum_{i=1}^{n} p_i \ln \frac{p_i}{q_i}. \tag{2}$$
It is assumed that when $q_i = 0$, the associated $p_i$ also is zero and $0 \ln(0/0) = 0$. This function is referred to as the Kullback-Leibler measure of cross-entropy.

Jaynes' Maximum Entropy for Continuous Distributions. Since most distributions encountered in practice are continuous in nature, Jaynes' principle of maximum entropy (MaxEnt) must first be extended to continuous distributions. This extension is straightforward and results in:
$$\begin{aligned} \max\ & -\int_a^b f(x) \ln f(x)\,dx \\ \text{s.t.}\ & \int_a^b f(x)\,dx = 1, \\ & \int_a^b f(x)\,g_r(x)\,dx = a_r, \quad r = 1, \ldots, m. \end{aligned} \tag{3}$$

2) Use MaxEnt to find $f(x)$, which is given by (5).
$$a_r = \frac{1}{n} \big[ g_r(x_1) + \cdots + g_r(x_n) \big]. \tag{6}$$

4) Determine estimates of the Lagrange multipliers, $\lambda_0, \ldots, \lambda_m$, from:
$$a_r = \frac{\int_a^b g_r(x)\, e^{-\lambda_1 g_1(x) - \cdots - \lambda_m g_m(x)}\,dx}{\int_a^b e^{-\lambda_1 g_1(x) - \cdots - \lambda_m g_m(x)}\,dx} \tag{7}$$

The knowledge which is given by the observation is:
$$F(x, \theta) = 0 \ \text{ when } x < x_1, \qquad F(x, \theta) = \frac{1}{n} \ \text{ when } x_1 \le x < x_2, \qquad \ldots, \qquad F(x, \theta) = \frac{r}{n} \ \text{ when } x_r \le x < x_{r+1}.$$
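The moment-matching steps (6) and (7) can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: it treats the case of a single constraint function ($m = 1$), approximates the integrals in (7) with the trapezoid rule, and finds the multiplier by bisection. The names `sample_moment`, `model_moment` and `solve_multiplier`, the grid size and the bracketing interval are hypothetical choices, not taken from the source.

```python
import math

def sample_moment(g, xs):
    """Equation (6): a_r = (1/n) * [g(x_1) + ... + g(x_n)]."""
    return sum(g(x) for x in xs) / len(xs)

def model_moment(g, lam, a, b, n_grid=2000):
    """Right-hand side of (7) for one multiplier (m = 1), trapezoid rule on [a, b]."""
    h = (b - a) / n_grid
    num = 0.0
    den = 0.0
    for i in range(n_grid + 1):
        x = a + i * h
        # Trapezoid weights: half weight at the two end points.
        w = (0.5 if i in (0, n_grid) else 1.0) * math.exp(-lam * g(x))
        num += w * g(x)
        den += w
    return num / den

def solve_multiplier(g, a_r, a, b, lo=-50.0, hi=50.0, tol=1e-10):
    """Solve model_moment(lam) = a_r by bisection (assumed bracket [lo, hi]).

    For a non-constant g the model moment decreases as lam grows, because its
    derivative is minus the variance of g under the tilted density.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if model_moment(g, mid, a, b) > a_r:
            lo = mid   # moment still too large: increase lam
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    g = lambda x: x                      # illustrative constraint function g_1(x) = x
    xs = [0.1, 0.2, 0.3, 0.4]            # illustrative observations on [0, 1]
    a1 = sample_moment(g, xs)
    lam1 = solve_multiplier(g, a1, 0.0, 1.0)
    print(a1, lam1, model_moment(g, lam1, 0.0, 1.0))
```

The ratio in (7) does not involve $\lambda_0$, which is why the sketch solves only for the remaining multiplier; with several constraint functions a multidimensional root finder would be needed instead of bisection.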
• None of the operators in the system are required to be linear.

Before continuing with the analysis, various measures need to be defined. The entropy of a $K$-dimensional random vector $X$ with the joint probability density function $p_X(x_1, \ldots, x_K)$ is defined as:
$$H(X) = -\int_{-\infty}^{\infty} p_X(X) \ln p_X(X)\,dX. \tag{15}$$
If $R_X$ is the covariance matrix of the vector $X$ then the following holds:
$$H(X) \le \frac{1}{2} \ln \big\{ (2\pi e)^K \det[R_X] \big\}. \tag{16}$$
When $X$ is a Gaussian random vector then (16) holds as an equality. Another quantity which will be used in the analysis is referred to as the mutual information between $X$ and $Y$.

Fig.: Typical parameter estimator (blocks: dynamical system with output $y_k$; sensor with output $z_k$; estimator).

The object is to estimate a vector $\Theta$ of unknown parameters with the joint probability density function $p_\Theta(\theta_1, \ldots, \theta_m)$. The output of the dynamical model as a function of these parameters is expressed as $y_k(\theta_1, \ldots, \theta_m, k)$. These outputs are then measured by a sensor to produce $\{z_k\}$. These measurements are then used by the data processor $F$ to produce an $r$-dimensional vector $V$ which is an estimate of $D(\Theta)$. The estimation error is given by:
$$X = D(\Theta) - V = D(\Theta) - F(Z) = U - V. \tag{18}$$
Also, under certain conditions, the transform $D(\Theta)$ will possess a property that for any given random vector $\xi$, the following holds:
$$I(\xi; \Theta) = I(\xi; D(\Theta)), \tag{19}$$
or, in words, that $D(\Theta)$ preserves energy. When this does not hold, $I(\xi; \Theta) > I(\xi; D(\Theta))$.

The problem now is to determine the function $F$ which will produce an optimal estimator. The theoretically best function results in a minimum of the error entropy, defined to be $H_0$. The only constraint on the approach is that the mutual information, $I(\Theta; Z)$, must be known. With that the following can be stated:

• The minimum entropy of the error vector is given by:
$$H_0 = H(U) - I(U; Z). \tag{20}$$
• Minimizing the mutual information, $I(X; Z)$, is equivalent to the minimization of the error vector. This is achieved by choosing $F(Z)$ such that $Z$ and $X$ are independent.
• Whether or not $D(\Theta)$ preserves energy, the ... and the equality holds when $D(\Theta)$ preserves energy and the optimal processor, $F$, is used.

These three statements now make it possible to determine the best possible performance an estimator can achieve for a given system. The proofs of these statements and a simple example can be found in [9]. The extension of the theorems to the continuous time case is given in [6], and to the similar problem of state estimation in [7].

Parameter Estimation and Model Selection. For most problems of any physical significance the form of the model equations is not known with absolute certainty. Herein lies the problem of not only estimating unknown parameters, but also determining the best fitting model. Given a set of $N$ independent observations, $x_1, \ldots, x_N$, of a random variable from an unknown true distribution $g(x)$, the objective is to estimate this true distribution by choosing a member of a family of distributions given by $f(x \mid \Theta)$ where $\Theta$ is a vector of parameters. In order to accomplish this, the distance between the two distributions needs to be minimized. The
entropy of the true distribution is given by:
$$S(g; g) = \int g(x) \ln g(x)\,dx \tag{22}$$
while a measure of the cross-entropy is given by:
$$S(g; f(x \mid \Theta)) = \int g(x) \ln f(x \mid \Theta)\,dx. \tag{23}$$
The Kullback-Leibler (K-L) measure is defined as:
$$I = S(g; g) - S(g; f(x \mid \Theta)) = \int g(x) \ln \frac{g(x)}{f(x \mid \Theta)}\,dx. \tag{24}$$
Therefore the solution involves the minimization of the K-L measure [3].

Take the example of a family of possible distributions each one having a different number, $k$, of unknown parameters, $\Theta_k$. These are denoted by $f(x \mid \Theta_k)$. The resulting form of the measure to choose the correct distribution is referred to as Akaike's information criterion (AIC) [1]:
$$\mathrm{AIC}(k) = -2 \ln L(\Theta_k) + 2k, \tag{25}$$
where $\ln L(\Theta_k)$ is the value of the log likelihood function with optimally determined parameters $\Theta_k$. It is proven in [3] that this result is obtained by the minimization of the K-L measure given by (24).

A secondary problem in the area of model selection is sequential design of experiments. The concept of entropy has been applied to this problem in [2]. A total entropy criterion is developed which includes the uncertainty in the model selected as well as the uncertainty in the parameter values in each model. The use of this measure leads to a choice of an experiment for which the outcome is the most uncertain.

See also: Entropy optimization: Shannon measure of entropy and its properties; Jaynes' maximum entropy principle; Maximum entropy principle: Image reconstruction; Entropy optimization: Interior point methods.

References
[1] AKAIKE, H.: 'A new look at the statistical model identification', IEEE Trans. Autom. Control 19, no. 6 (1974), 716-723.
[2] BORTH, D.M.: 'A total entropy criterion for the dual problem of model discrimination and parameter estimation', J. Royal Statist. Soc. B 37 (1975), 77-87.
[3] BOZDOGAN, H.: 'Model selection and Akaike's information criterion (AIC): The general theory and its analytical extensions', Psychometrika 52, no. 3 (1987), 345-370.
[4] KAPUR, J.N., AND KESAVAN, H.K.: Entropy optimization principles and applications, Acad. Press, 1992.
[5] KULLBACK, S., AND LEIBLER, R.A.: 'On information and sufficiency', Ann. Math. Statist. 22 (1951), 79-86.
[6] MINAMIDE, N.: 'An extension of the entropy theorem for parameter estimation', Inform. and Control 53 (1982), 81-90.
[7] MINAMIDE, N., AND NIKIFORUK, P.N.: 'Conditional entropy theorem for recursive parameter estimation and its application to state estimation problems', Internat. J. Syst. Sci. 24, no. 1 (1993), 53-63.
[8] SHANNON, C.E.: 'A mathematical theory of communication', Bell System Techn. J. 27 (1948), 379-423, 623-659.
[9] WEIDEMANN, H.L., AND STEAR, E.B.: 'Entropy analysis of parameter estimation', Inform. and Control 14 (1969), 493-506.

William R. Esposito
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: randy@titan.princeton.edu
Christodoulos A. Floudas
Dept. Chemical Engin. Princeton Univ.
Princeton, NJ 08544-5263, USA
E-mail address: floudas@titan.princeton.edu
MSC2000: 94A17, 62F10
Key words and phrases: maximum entropy, parameter estimation, model identification.

ENTROPY OPTIMIZATION: SHANNON MEASURE OF ENTROPY AND ITS PROPERTIES

The word entropy originated in the literature on thermodynamics around 1865 in Germany and was coined by R. Clausius [4] to represent a measure of the amount of energy in a thermodynamic system as a function of the temperature of the system and the heat that enters the system. Clausius wanted a word similar to the German word energie (i.e., energy) and found it in the Greek word ἡ τροπή, which means transformation [1]. The word entropy had belonged to the domain of physics until 1948 when C.E. Shannon, while developing his theory of communication at Bell Laboratories, used the term to represent a measure of information after a suggestion made by J. von Neumann. Shannon wanted a word to describe his newly found measure of uncertainty and sought Von Neumann's advice. Von Neumann's reasoning to Shannon [25] was that:

'No one really understands entropy. Therefore, if you know what you mean by it and you use it when you are in an argument, you will win every time.'

Whatever the reason for the name is, the concept of Shannon's entropy has penetrated a wide range of disciplines, including statistical mechanics [12], thermodynamics [12], statistical inference [24], business and finance [5], nonlinear spectral analysis [21], image reconstruction [3], transportation and regional planning [26], queueing theory [10], information theory [20], [9], statistics [17], econometrics [8], and linear and nonlinear programming [6], [7].

The concept of entropy is closely tied to the concept of uncertainty embedded in a probability distribution. In fact, entropy can be defined as a measure of probabilistic uncertainty. For example, suppose the probability distribution for the outcome of a coin-toss experiment is (0.0001, 0.9999), with 0.0001 being the probability of having a tail. One is likely to notice that there is much more 'certainty' than 'uncertainty' about the outcome of this experiment and hence about the probability distribution. In fact, one is almost certain that the outcome will be a head. If, on the other hand, the probability distribution governing that same experiment were (0.5, 0.5), one would realize that there is much less 'certainty' and much more 'uncertainty,' when compared to the previous distribution. Generalizing this observation to the case of $n$ possible outcomes, we conclude that the uniform distribution has the highest uncertainty out of all possible probability distributions. This implies that, if one had to choose a probability distribution for a chance experiment without any prior knowledge about that distribution, it would seem reasonable to pick the uniform distribution. This is because one would have no reason to choose any other and because that distribution maximizes the 'uncertainty' of the outcome. This is called Laplace's principle of insufficient reasoning [15]. Note that we are able to justify this principle without resorting to a rigorous definition of 'uncertainty.' However, this principle is inadequate when one has some prior knowledge about the distribution. Suppose, for example, that one knows some particular moments of the distribution, e.g., the expected value. In this case, a mathematical definition of 'uncertainty' is crucial. This is the case where Shannon's measure of uncertainty, or Shannon's entropy, plays an indispensable role [20].

To define entropy, Shannon proposed some axioms that he thought any measure of uncertainty should satisfy and deduced a unique function, up to a multiplicative constant, that satisfies them. It turned out that this function actually possesses many more desirable properties. In later years, many researchers modified and replaced some of his axioms in an attempt to simplify the reasoning. However, they all deduced that same function.

We first focus on finite-dimensional entropy, i.e., Shannon's entropy defined on discrete probability distributions that have a finite number of outcomes (or states). Let $p = (p_1, \ldots, p_n)^\top$ be a probability distribution associated with $n$ possible outcomes, denoted by $x = (x_1, \ldots, x_n)^\top$, of an experiment. Denote its entropy by $S_n(p)$. Among those defining axioms, J.N. Kapur and H.K. Kesavan stated the following [15]:

1) $S_n(p)$ should depend on all the $p_j$'s, $j = 1, \ldots, n$.
2) $S_n(p)$ should be a continuous function of $p_j$, $j = 1, \ldots, n$.
3) $S_n(p)$ should be permutationally symmetric. In other words, if the $p_j$'s are merely permuted, then $S_n(p)$ should remain the same.
4) $S_n(1/n, \ldots, 1/n)$ should be a monotonically increasing function of $n$.
5) $S_n(p_1, \ldots, p_n) = S_{n-1}(p_1 + p_2, p_3, \ldots, p_n) + (p_1 + p_2)\, S_2\big(p_1/(p_1 + p_2),\ p_2/(p_1 + p_2)\big)$.

Properties 1, 2 and 3 are obvious. Property 4 states that the maximum uncertainty of a probability distribution should increase as the number of possible outcomes increases. Property 5 is the least obvious but states that the uncertainty of a probability distribution is the sum of the uncertainty of the probability distribution that combines two of the outcomes and the uncertainty of the probability distribution consisting of only those two outcomes adjusted by the combined probabilities of the two outcomes.
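As a small numerical illustration of the coin-toss comparison and of property 5, the following Python sketch evaluates $-\sum_j p_j \ln p_j$ (with $0 \ln 0 = 0$), the form that these axioms single out when the multiplicative constant is set to 1. The function names `shannon_entropy` and `grouping_rhs`, and the test distribution, are illustrative choices and not part of the source.

```python
import math

def shannon_entropy(p, k=1.0):
    """Return -k * sum_j p_j ln p_j, using the convention 0 ln 0 = 0."""
    if any(pj < 0 for pj in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability distribution")
    return -k * sum(pj * math.log(pj) for pj in p if pj > 0)

def grouping_rhs(p):
    """Right-hand side of property 5: merge the first two outcomes, then add
    the entropy of the two-outcome split weighted by their combined probability."""
    s = p[0] + p[1]
    merged = [s] + list(p[2:])
    return shannon_entropy(merged) + s * shannon_entropy([p[0] / s, p[1] / s])

if __name__ == "__main__":
    # Nearly certain coin versus fair coin from the text.
    print(shannon_entropy([0.0001, 0.9999]), shannon_entropy([0.5, 0.5]))
    # Property 5 on an illustrative three-outcome distribution.
    p = [0.2, 0.3, 0.5]
    print(shannon_entropy(p), grouping_rhs(p))
    # Property 4: the entropy of the uniform distribution grows with n.
    print([shannon_entropy([1.0 / n] * n) for n in (2, 3, 4)])
```

The nearly certain coin gives a value close to zero while the fair coin gives $\ln 2$, which matches the qualitative discussion of 'certainty' and 'uncertainty' above.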
It turns out that the unique family of functions that satisfy the defining axioms has the form $S_n(p) = -k \sum_{j=1}^{n} p_j \ln p_j$, where $k$ is a positive constant, $\ln$ represents the natural logarithmic function, and $0 \ln 0 = 0$ [15]. Shannon chose $-\sum_{j=1}^{n} p_j \ln p_j$ to represent his concept of entropy [20]. Among its many other desirable properties, we state the following:

6) Shannon's measure is nonnegative and concave in $p_1, \ldots, p_n$.
7) The measure does not change with the inclusion of a zero-probability outcome.
8) The entropy of a probability distribution representing a completely certain outcome is 0, and the entropy of any probability distribution representing uncertain outcomes is positive.
9) Given any fixed number of outcomes, the maximum possible entropy is that of the uniform distribution.
10) The entropy of the joint distribution of two independent distributions is the sum of the individual entropies.
11) The entropy of the joint distribution of two dependent distributions is no greater than the sum of the two individual entropies.

Property 6 is desirable because it is much easier to maximize a concave function than a nonconcave one. Properties 7 and 8 are appealing because a zero-probability outcome contributes nothing to uncertainty, and neither does a completely certain outcome. Property 9 was discussed earlier. Properties 10 and 11 state that joining two distributions does not affect the entropy, if they are independent, and may actually reduce the entropy, if they are dependent.

Shannon's entropy was originally defined for a probability distribution over a finite sample space, considered necessary for any reasonable measure of uncertainty [19], [20], [16]. The concept of entropy, when extended for probability distributions defined on a countably infinite sample space, takes the form of $-\sum_{j=1}^{\infty} p_j \ln p_j$. It can still be viewed as a measure of uncertainty but such an interpretation does not enjoy the same degree of mathematical rigor as its finite-sample-space counterpart. When the concept is extended for continuous probability distributions, it is defined to be $-\int p(x) \ln p(x)\,dx$. However, it can no longer be interpreted as a measure of uncertainty at all [9], [11]. Rather, it can only be viewed as a measure of relative uncertainty [15].

Note that, with Shannon's entropy as the measure of uncertainty, in the absence of any prior information about the underlying probability distribution, the best course of action suggested by the principle of insufficient reasoning is to choose the uniform distribution because it possesses maximum uncertainty. Given the knowledge of some moments of the underlying distribution, the same reasoning leads to the following principle:

• Out of all possible distributions that are consistent with the moment constraints, choose the one that has maximum entropy.

This principle was proposed by E.T. Jaynes ([15, Chapter 2]), and has been known as the principle of maximum entropy or Jaynes' maximum entropy principle. It has often been abbreviated as MaxEnt in literature.

Let $X$ be a random variable with $n$ possible outcomes $\{x_1, \ldots, x_n\}$ and $p = (p_1, \ldots, p_n)^\top$ be a vector consisting of corresponding probabilities. Suppose that $g_1(X), \ldots, g_m(X)$ are $m$ functions of $X$ with known expected values $a_1, \ldots, a_m$, respectively. The principle of maximum entropy leads to the following mathematical optimization problem:
$$\begin{aligned} \max\ & -\sum_{j=1}^{n} p_j \ln p_j \\ \text{s.t.}\ & \sum_{j=1}^{n} p_j\, g_i(x_j) = a_i, \quad i = 1, \ldots, m, \\ & \sum_{j=1}^{n} p_j = 1, \\ & p_j \ge 0, \quad j = 1, \ldots, n. \end{aligned}$$
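A small worked instance may help make the maximum entropy problem concrete. The Python sketch below handles only the special case of a single moment constraint ($m = 1$) and uses the exponential form $p_j \propto e^{-\lambda g(x_j)}$ that follows from the Lagrangian of the problem; the multiplier is found by bisection, which is a convenience chosen here and not a method described in the source. The die example, the function names `maxent_one_moment` and `entropy`, and the bracketing interval are all illustrative assumptions.

```python
import math

def maxent_one_moment(g_values, a1, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum-entropy probabilities p_j subject to sum_j p_j * g_j = a1.

    Assumes min(g_values) < a1 < max(g_values) and that the multiplier
    lies in the (assumed) bracket [lo, hi].
    """
    def tilted(lam):
        # p_j proportional to exp(-lam * g_j); shift exponents for stability.
        exps = [-lam * g for g in g_values]
        shift = max(exps)
        w = [math.exp(e - shift) for e in exps]
        z = sum(w)
        return [wi / z for wi in w]

    def moment(lam):
        return sum(pj * g for pj, g in zip(tilted(lam), g_values))

    # The moment is decreasing in lam, so bisection applies.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if moment(mid) > a1:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return tilted(0.5 * (lo + hi))

def entropy(p):
    """Shannon entropy with the convention 0 ln 0 = 0."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0)

if __name__ == "__main__":
    faces = [1, 2, 3, 4, 5, 6]           # outcomes of a die, g(x) = x
    p = maxent_one_moment(faces, 4.5)    # prescribed expected value 4.5
    print(p, entropy(p))
    u = maxent_one_moment(faces, 3.5)    # expected value of a fair die
    print(u, entropy(u), math.log(6))
```

When the prescribed expected value equals that of the uniform distribution, the sketch returns (numerically) the uniform distribution itself, whose entropy is $\ln n$.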
The resulting maximum entropy problem is a convex programming problem with linear constraints. The nonnegativity constraints are not binding for the optimal solution $p^*$ because each $p_j$ can be expressed as an exponential function in terms of the Lagrange multipliers associated with the equality constraints. Note that, in the absence of the moment constraints, the solution to the problem is the uniform probability distribution, whose entropy is $\ln n$. As such, the maximum entropy principle can be viewed as an extension of Laplace's principle of insufficient reasoning. The distribution selected under the maximum entropy principle has also been interpreted as one that is the 'most probable' in the sense that the maximum entropy distribution coincides with the frequency distribution that can be realized in the greatest number of ways [13]. An explanation of this linkage in the context of the well-known application of entropy maximization in transportation planning can be found in [7].

Recall that the above discussion was originally motivated by the task of choosing a probability distribution among those that are consistent with some given moments. Now, in addition to the moment constraints, suppose that we have an a priori probability distribution $p^0$ that we think our probability distribution $p$ should be close to. In fact, in the absence of the moment constraints, we would like to choose $p^0$ for $p$ because it is clearly the closest to $p^0$. However, in the presence of some moment constraints which $p^0$ does not satisfy, we need a precise definition of 'closeness' or 'deviation'. In other words, we need to define some sort of deviation or, more precisely, 'directed divergence' [15] on the space of discrete probability distributions where the distribution is chosen from. Note that we deliberately avoid calling this measure a 'distance'. This is because a distance measure should be symmetric and should satisfy the triangular inequality, but these two properties are not important in this context. In fact, we can be content with a 'one-way (asymmetric) deviation measure', $D(p, p^0)$, from $p$ to $p^0$. If a 'one-way deviation measure' from $p$ to $p^0$ is not satisfactory, one can consider using a symmetric measure de-

1) $D(p, p^0)$ should be nonnegative for all $p$ and $p^0$.
2) $D(p, p^0) = 0$ if and only if $p = p^0$.
3) $D(p, p^0)$ should be a convex function of $p_1, \ldots, p_n$.
4) When $D(p, p^0)$ is minimized subject to moment constraints but without the explicit presence of the nonnegativity constraints, the resulting $p_j$'s should be nonnegative.

Property 1 is desirable for any such measure of deviation. If property 2 were not satisfied, then it would be possible to choose a vector $p$ that has a zero directed divergence from $p^0$, i.e., one that is as 'close' to $p^0$ as $p^0$ itself, but differs from $p^0$. Property 3 makes minimizing the measure much simpler, and property 4 spares us from explicitly considering $n$ nonnegativity constraints. Fortunately, there are many measures that satisfy these properties. We may even be able to find one that satisfies the triangular inequality. But, simplicity of the measure is also desirable. The simplest and most important of those measures is the Kullback-Leibler measure ([15, Chapt. 4]), defined as $D(p, p^0) = \sum_{j=1}^{n} p_j \ln(p_j / p_j^0)$, with the convention that, whenever $p_j^0$ is 0, $p_j$ is set to 0 and $0 \ln(0/0)$ is defined to be 0. This measure is also known as the cross-entropy, relative entropy, directed divergence or expected weight of evidence of $p$ with respect to $p^0$. A. Hobson [11] provided an axiomatic characterization of cross-entropy. He interpreted $D(p, p^0)$ as the 'information in $p$ relative to $p^0$', and showed that the only function $I(p, p^0)$ satisfying the following five properties has the form of $k \sum_{j=1}^{n} p_j \ln(p_j / p_j^0)$, where $k$ is a positive constant:

5) $I(p, p^0)$ is a continuous function of $p$ and $p^0$.
6) $I(p, p^0)$ is permutationally symmetric, i.e., the measure does not change if the pairs of $(p_j, p_j^0)$ are permuted among themselves.
7) $I(p, p) = 0$.
8) For any pair of integers $n$ and $n_0$ such that $n_0 > n > 0$, $I(1/n, \ldots, 1/n, 0, \ldots, 0;\ 1/n_0, \ldots, 1/n_0)$
fined as the sum of D(p, p0) and D ( p °, p). What is an increasing function of no and
is desirable for this 'directed divergence' measure a decreasing function of n, where
includes the following properties" I ( 1 / n , . . . , 1 / n , O, . . . , O; 1 ~ n o , . . . , l/n0) de-
15
Entropy optimization: Shannon measure of entropy and its properties
notes the information obtained when the sequently for distributions defined on countably
number of equally likely possibilities is re- infinite and continuous sample spaces. The cor-
duced from no to n. responding forms become ~-]j~l PJ ln(pj/p°) and
9) f p(x) l n ( p ( x ) / p ° ( x ) ) d x , respectively. It has also
been derived rigorously as the unique measure
I(pl, . . . ,pn;pO, . . . ,pO) _ I(ql,q2; qO, qO)
of deviation of one probability distribution from
+q ±dp pO pO, another that satisfies a set of axioms considered
\ ~ . . . ~ ~ ~ ~ . . .
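The Kullback-Leibler measure and properties 2 and 7 above can be illustrated with a short numerical sketch. The function names and the sample distributions below are hypothetical, chosen only for illustration. The last check uses the identity $D(p, u) = \ln n - H(p)$ for the uniform distribution $u$, where $H$ is the Shannon entropy, which is why minimizing cross-entropy against a uniform prior amounts to maximizing entropy.

```python
import math


def cross_entropy(p, p0):
    """Kullback-Leibler measure D(p, p0) = sum_j p_j * ln(p_j / p0_j)."""
    total = 0.0
    for pj, p0j in zip(p, p0):
        if pj == 0.0:
            # Convention from the text: terms with p_j = 0 contribute 0,
            # including the case 0 * ln(0/0).
            continue
        if p0j == 0.0:
            raise ValueError("p_j must be 0 wherever p0_j is 0")
        total += pj * math.log(pj / p0j)
    return total


def entropy(p):
    """Shannon entropy H(p) = -sum_j p_j * ln(p_j)."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0.0)


# Illustrative distributions (assumed values, not taken from the article).
p = [0.5, 0.3, 0.2]
p0 = [0.2, 0.5, 0.3]
uniform = [1.0 / 3] * 3

d_forward = cross_entropy(p, p0)    # D(p, p0)
d_backward = cross_entropy(p0, p)   # D(p0, p), generally different
d_self = cross_entropy(p, p)        # zero, as in properties 2 and 7
# D(p, uniform) should equal ln(n) - H(p).
gap = cross_entropy(p, uniform) - (math.log(3) - entropy(p))
```

The two directed values differ, which is the asymmetry that motivates the term 'directed divergence' rather than 'distance'.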
Entropy optimization: Shannon measure of entropy and its properties
alent to maximizing entropy and, therefore, MaxEnt is a special case of MinxEnt. These two principles can now be combined into a general principle:

Out of all probability distributions satisfying the given moment constraints, choose the distribution that minimizes the cross-entropy with respect to the given a priori distribution and, in the absence of it, choose the distribution that minimizes the cross-entropy with respect to the uniform distribution.

Both the MaxEnt and MinxEnt principles for selecting finite-sample-space probability distributions and the MinxEnt principle for selecting continuous probability distributions can be axiomatically derived [22]. Under four consistency axioms, it was shown that the two principles are uniquely correct methods for inductive inference when new information is given in the form of expected values. Many well-known and widely used distributions, including the normal, gamma and geometric distributions, can actually be derived as solutions to some MaxEnt or MinxEnt problems [15].

The maximum entropy principle has also been shown to be a dual principle of the maximum likelihood principle for the exponential family of probability distributions in the sense that a dual problem to the linearly constrained entropy maximization problem is equivalent to the problem of maximizing a likelihood function with respect to the parameters of an exponential family [2]. This principle has also been shown to be related to the Bayesian parameter estimation problem [7]. Duality theory and major mathematical algorithms for solving finite-dimensional MaxEnt or MinxEnt problems can be found in [7] and the references therein.

See also: Jaynes' maximum entropy principle; Maximum entropy principle: Image reconstruction; Entropy optimization: Parameter estimation; Entropy optimization: Interior point methods; Optimization in medical imaging.

References
[1] BAIERLEIN, R.: 'How entropy got its name', Amer. J. Phys. 60 (1992), 1151.
[2] BEN-TAL, A., TEBOULLE, M., AND CHARNES, A.: 'The role of duality in optimization problems involving entropy functionals with applications to information theory', J. Optim. Th. Appl. 58 (1988), 209-223.
[3] BURCH, S.F., GULL, S.F., AND SKILLING, J.K.: 'Image restoration by a powerful maximum entropy method', Computer Vision, Graphics, and Image Processing 23 (1983), 113-128.
[4] CLAUSIUS, R.: 'Ueber verschiedene für die Anwendung bequeme Formen der Hauptgleichungen der mechanischen Wärmetheorie', Ann. Physik und Chemie 125 (1865), 353-400.
[5] COZZOLINO, J.M., AND ZAHNER, M.J.: 'The maximum entropy distribution of the future market price of a stock', Oper. Res. 21 (1973), 1200-1211.
[6] ERLANDER, S.: 'Entropy in linear programming', Math. Program. 21 (1981), 137-151.
[7] FANG, S.-C., RAJASEKERA, J.R., AND TSAO, H.-S.J.: Entropy optimization and mathematical programming, Kluwer Acad. Publ., 1997.
[8] GOLAN, A., JUDGE, G., AND MILLER, D.: Maximum entropy econometrics: robust estimation with limited data, Wiley, 1996.
[9] GUIASU, S.: Information theory with applications, McGraw-Hill, 1977.
[10] GUIASU, S.: 'Maximum entropy condition in queueing theory', J. Oper. Res. Soc. 37 (1986), 293-301.
[11] HOBSON, A.: Concepts in statistical mechanics, Gordon and Breach, 1987.
[12] JAYNES, E.T.: 'Information theory and statistical mechanics II', Phys. Rev. 108 (1957), 171-190.
[13] JAYNES, E.T.: 'Prior probabilities', IEEE Trans. Syst., Sci. Cybern. SSC-4 (1968), 227-241.
[14] JOHNSON, R.W.: 'Axiomatic characterization of the directed divergence and their linear combinations', IEEE Trans. Inform. Theory 25 (1979), 709-716.
[15] KAPUR, J.N., AND KESAVAN, H.K.: Entropy optimization principles with applications, Acad. Press, 1992.
[16] KHINCHIN, A.I.: Mathematical foundations of information theory, Dover, 1957.
[17] KULLBACK, S.: Information theory and statistics, Dover, 1968.
[18] SCOTT, C.H., AND JEFFERSON, T.R.: 'Entropy maximizing models of residential location via geometric programming', Geographical Anal. 9 (1977), 181-187.
[19] SHANNON, C.E.: 'A mathematical theory of communication', Bell System Techn. J. 27 (1948), 379-423; 623-656.
[20] SHANNON, C.E., AND WEAVER, W.: The mathematical theory of communication, Univ. Illinois Press, 1962.
[21] SHORE, J.E.: 'Minimum cross-entropy spectral analysis', IEEE Trans. Acoustics, Speech and Signal Processing 29 (1981), 230-237.
[22] SHORE, J.E., AND JOHNSON, R.W.: 'Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy', IEEE Trans. Inform.
Equilibrium networks
functions, could be reformulated as the solution to an optimization problem. Samuelson [20], following [9], had made a similar connection but in the more specialized context of spatial price equilibrium problems on networks that were bipartite.

M.J. Smith [22] later proposed an alternative formulation of traffic network equilibrium conditions which were then identified by S.C. Dafermos [3] to satisfy a finite-dimensional variational inequality problem. This connection allowed for the relaxation of the symmetry assumption and, consequently, for the construction of more realistic models (cf. [17], [21], and the references therein).

Other network equilibrium applications whose study and understanding have benefited from this

$p, q, \ldots$, the paths. Assume that there are $J$ origin/destination (O/D) pairs, with a typical O/D pair denoted by $w$, and $n$ modes of transportation on the network with typical modes denoted by $i, j, \ldots$.

The flow on a link $a$ generated by mode $i$ is denoted by $f_a^i$, and the user cost associated with traveling by mode $i$ on link $a$ is denoted by $c_a^i$. Group the link flows into a column vector $f \in R^{nL}$, where $L$ is the number of links in the network. Group the link costs into a row vector $c \in R^{nL}$. Assume that the user cost on a link and a particular mode may, in general, depend upon the flows of every mode on every link in the network, that is,

$c = c(f)$
in other words, the cost on a path $p$ due to mode $i$ is equal to the sum of the link costs of links comprising that path and using that mode.

The traffic network equilibrium conditions are given below.

DEFINITION 1 (multimodal traffic network equilibrium) ([2], [3], [4]) A link load pattern $f^*$ satisfying the feasibility conditions is an equilibrium pattern, if, once established, no user has any incentive to alter his travel arrangements. This state is characterized by the following equilibrium conditions, which must hold for every mode $i$, every O/D pair $w$, and every path $p \in P_w$:

$C_p^i \begin{cases} = \lambda_w^i & \text{if } x_p^{i*} > 0, \\ \geq \lambda_w^i & \text{if } x_p^{i*} = 0, \end{cases}$

where $\lambda_w^i$ is the equilibrium travel disutility associated with the O/D pair $w$ and mode $i$. □

We now define the feasible set $K$ as

$K \equiv \{ f : \exists\, x \geq 0, \text{ the demand constraints and the link load constraints hold} \}$.

One can verify (see [3]) that the variational inequality governing equilibrium conditions for this model would be given as in the subsequent theorem.

THEOREM 2 (variational inequality formulation) A vector $f^* \in K$ is an equilibrium pattern, if and only if, it satisfies the variational inequality problem

$\langle c(f^*), f - f^* \rangle \geq 0, \quad \forall f \in K.$

□

Note that this variational inequality is in link loads. One can also derive a variational inequality problem in path flows (see also [1], [4], [17]).

Existence of an equilibrium $f^*$ follows from the standard theory of variational inequalities (cf. [14]) solely from the assumption that $c$ is continuous,

holds, then the variational inequality problem can be reformulated as the solution to an optimization problem. This symmetry assumption, however, is not expected to hold in most applications. Consequently, the variational inequality problem which is the more general problem formulation is needed. For example, the symmetry condition essentially says that the flow on link $b$ due to mode $j$ should affect the cost of mode $i$ on link $a$ in the same manner that the flow of mode $i$ on link $a$ affects the cost on link $b$ and mode $j$. In the case of a single mode problem, the symmetry condition would imply that the cost on link $a$ is affected by the flow on link $b$ in the same manner as the cost on link $b$ is affected by the flow on link $a$.

A Migration Network Equilibrium Model. Human migration is a topic that has been studied not only by economists, but also by demographers, sociologists, and geographers. Here a model of human migration is described, which is shown to have a simple, abstract network structure in which the links correspond to locations and the flows on the links to populations of a particular class at the particular location. Hence, the model is isomorphic to the traffic network equilibrium problem just described on a network with special structure. For additional details, see [16], [17], [18].

Assume a closed economy in which there are $n$ locations, typically denoted by $i$, and $J$ classes, typically denoted by $k$. Assume further that the attractiveness of any location $i$ as perceived by class $k$ is represented by a utility $u_i^k$. Let $\bar{p}^k$ denote the fixed and known population of class $k$ in the economy, and let $p_i^k$ denote the population of class $k$ at location $i$. Group the utilities into a row vector $u \in R^{Jn}$ and the populations into a column vector $p \in R^{Jn}$. Assume no births and no deaths in the economy.

The conservation of flow equation for each class $k$ is given by

$\sum_{i=1}^{n} p_i^k = \bar{p}^k.$
The conservation of flow equation expresses that the population of each class $k$ must be conserved in the economy.

DEFINITION 3 (migration equilibrium) Assume that the migrants are rational and that migration will continue until no individual of any class has any incentive to move since a unilateral decision will no longer yield an increase in the utility. Mathematically, hence, a multiclass population vector $p^* \in K$ is said to be in equilibrium if for each class $k$, $k = 1, \ldots, J$:

$u_i^k \begin{cases} = \lambda^k & \text{if } p_i^{k*} > 0, \\ \leq \lambda^k & \text{if } p_i^{k*} = 0. \end{cases}$

□

The equilibrium conditions express that for a given class $k$ only those locations $i$ with maximal utility will have a positive population volume of the class. Moreover, the utilities for a given class are equilibrated across the locations.

THEOREM 4 (variational inequality formulation) A population pattern $p^* \in K$ is in equilibrium, if and only if it satisfies the variational inequality problem:

with overpopulation, such as congestion, increased crime, competition for scarce resources, etc.

As illustrated in [17], the above migration model is equivalent to a network equilibrium model with a single origin/destination pair and fixed demands. Indeed, one can make the identification as follows. Construct a network consisting of two nodes, an origin node 0 and a destination node 1, and $n$ links connecting the origin node to the destination node. Associate with each link $i$, $J$ costs: $-u_i^1, \ldots, -u_i^J$ and link flows represented by $p_i^1, \ldots, p_i^J$. This model is, hence, equivalent to a multimodal traffic network equilibrium model with fixed demand for each mode, consisting of a single origin/destination pair, and $J$ paths connecting the O/D pair. Note that one can make $J$ copies of the network, in which case, each $i$th network will correspond to class $i$ with the cost functions on the links defined accordingly. This identification enables us to immediately write down the following:

[Figure: network links labeled with the costs $-u_1^1, \ldots, -u_1^J$ through $-u_n^1, \ldots, -u_n^J$]
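The migration equilibrium conditions of Definition 3 can be illustrated numerically for a single class. The sketch below is only an illustration under stated assumptions: it assumes linear utilities $u_i(p) = a_i - b_i p_i$ (hypothetical data), assumes that the variational inequality takes the standard form $\langle -u(p^*), p - p^* \rangle \geq 0$ for all $p \in K$, and uses a basic projection iteration $p \leftarrow P_K(p + \alpha u(p))$, which is a generic scheme and not necessarily the method used in the cited references. The helper names are hypothetical.

```python
import numpy as np


def project_to_simplex(v, total):
    """Euclidean projection of v onto {p >= 0, sum(p) = total}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - total
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


# Hypothetical data: three locations, one class, total population 4.
a = np.array([10.0, 8.0, 2.0])   # utility intercepts (assumed)
b = np.array([1.0, 1.0, 1.0])    # congestion slopes (assumed)
total_population = 4.0


def utility(p):
    """Assumed utility of each location: decreasing in its own population."""
    return a - b * p


# Projection iteration on the feasible set K (conservation of flow, p >= 0).
p_star = np.full(3, total_population / 3)
step = 0.5
for _ in range(200):
    p_star = project_to_simplex(p_star + step * utility(p_star), total_population)

u_star = utility(p_star)
lam = u_star[p_star > 1e-9].max()   # common utility of populated locations
```

With these numbers the populated locations share the same utility, and the empty location has a lower one, as Definition 3 requires.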
Equivalence between nonlinear complementarity problem and fixed point problem
flow problems; Network location: Covering problems; Maximum flow problem; Shortest path tree algorithms; Steiner tree problems; Survivable networks; Directed tree networks; Dynamic traffic networks; Auction algorithms; Piecewise linear network flow problems; Communication network assignment problem; Generalized networks; Evacuation networks; Network design problems; Stochastic network problems: Massively parallel solution.

References
[1] AASHTIANI, H.Z., AND MAGNANTI, T.L.: 'Equilibria on a congested transportation network', SIAM J. Alg. Discrete Meth. 2 (1981), 213-226.
[2] BECKMANN, M.J., MCGUIRE, C.B., AND WINSTEN, C.B.: Studies in the economics of transportation, Yale Univ. Press, 1956.
[3] DAFERMOS, S.: 'Traffic equilibrium and variational inequalities', Transport. Sci. 14 (1980), 43-54.
[4] DAFERMOS, S.: 'The general multimodal network equilibrium problem with elastic demand', Networks 14 (1982), 43-54.
[5] DAFERMOS, S.: 'Exchange price equilibria and variational inequalities', Math. Program. 46 (1990), 391-402.
[6] DAFERMOS, S., AND NAGURNEY, A.: 'Stability and sensitivity analysis for the general network equilibrium-travel choice model', in J. VOLMULLER AND R. HAMERSLAG (eds.): Proc. 9th Internat. Symp. Transportation and Traffic Theory, VNU Sci. Press, 1984, pp. 217-234.
[7] DAFERMOS, S., AND NAGURNEY, A.: 'Oligopolistic and competitive behavior of spatially separated markets', Regional Sci. and Urban Economics 17 (1987), 245-254.
[8] DAFERMOS, S., AND SPARROW, F.T.: 'The traffic assignment problem for a general network', J. Res. Nat. Bureau Standards 73B (1969), 91-118.
[9] ENKE, S.: 'Equilibrium among spatially separated markets: solution by electronic analogue', Econometrica 10 (1951), 40-47.
[10] FLORIAN, M., AND HEARN, D.: 'Network equilibrium models and algorithms', in M.O. BALL, T.L. MAGNANTI, C.L. MONMA, AND G.L. NEMHAUSER (eds.): Network Routing, Vol. 8 of Handbook Oper. Res. and Management Sci., Elsevier, 1995, pp. 485-550.
[11] FLORIAN, M., AND LOS, M.: 'A new look at static spatial price equilibrium models', Regional Sci. and Urban Economics 12 (1982), 579-597.
[12] GABAY, D., AND MOULIN, H.: 'On the uniqueness and stability of Nash-equilibria in noncooperative games', in A. BENSOUSSAN, P. KLEINDORFER, AND C.S. TAPIERO (eds.): Applied Stochastic Control in Econometrics and Management Sci., North-Holland, 1980, pp. 271-294.
[13] HAURIE, A., AND MARCOTTE, P.: 'On the relationship between Nash-Cournot and Wardrop equilibria', Networks 15 (1985), 295-308.
[14] KINDERLEHRER, D., AND STAMPACCHIA, G.: An introduction to variational inequalities and their applications, Acad. Press, 1980.
[15] NAGURNEY, A.: 'Computational comparisons of spatial price equilibrium methods', J. Reg. Sci. 27 (1987), 55-76.
[16] NAGURNEY, A.: 'Migration equilibrium and variational inequalities', Economics Lett. 31 (1989), 109-112.
[17] NAGURNEY, A.: Network economics: A variational inequality approach, second ed., Kluwer Acad. Publ., 1999.
[18] NAGURNEY, A., PAN, J., AND ZHAO, L.: 'Human migration networks', Europ. J. Oper. Res. (1991).
[19] PATRIKSSON, M.: The traffic assignment problem, VSP, 1994.
[20] SAMUELSON, P.A.: 'A spatial price equilibrium and linear programming', Amer. Economic Rev. 42 (1952), 283-303.
[21] SHEFFI, Y.: Urban transportation networks, Prentice-Hall, 1985.
[22] SMITH, M.J.: 'The existence, uniqueness, and stability of traffic equilibria', Transport. Res. 13B (1979), 259-304.
[23] WARDROP, J.G.: 'Some theoretical aspects of road traffic research', Proc. Inst. Civil Engineers II (1952), 325-378.

Anna Nagurney
Univ. Massachusetts
Amherst, Massachusetts 01003, USA
E-mail address: nagurney@gbfin.umass.edu
MSC 2000: 90C30
Key words and phrases: traffic network equilibrium, spatial price equilibrium, migration equilibrium, multimodal networks, multiclass migration.

EQUIVALENCE BETWEEN NONLINEAR COMPLEMENTARITY PROBLEM AND FIXED POINT PROBLEM

Complementarity theory is a new domain of applied mathematics strongly related to Linear Analysis, Nonlinear Analysis, Topology, Variational Inequalities Theory, Ordered Topological Vector Spaces, Numerical Analysis etc. The main goal in this theory is the study of complementarity problems. It is well known that complementarity problems encompass a variety of practical problems arising in: Optimization, Structural Mechanics, Elasticity, Economics etc. [8]. The relation between the general nonlinear complementarity problem and the fixed point problem seems to be remarkable. The main aim of this article is the study of this relation.

Preliminaries. Let $E$, $E^*$ be a pair of real locally convex spaces. The space $E^*$ can be the topological dual of $E$. Let $\langle \cdot, \cdot \rangle$ be a bilinear form on $E \times E^*$ satisfying the separation axioms:

s1) $\langle x_0, y \rangle = 0$ for all $y \in E^*$ implies $x_0 = 0$;
s2) $\langle x, y_0 \rangle = 0$ for all $x \in E$ implies $y_0 = 0$.

The triplet $(E, E^*, \langle \cdot, \cdot \rangle)$ is called a dual system or a duality (denoted by $\langle E, E^* \rangle$). In practical problems, the space $E$ can be a Banach space and $E^*$ its topological dual and $\langle x, y \rangle = y(x)$ for all $x \in E$ and $y \in E^*$. When $E$ is a Hilbert space $(H, \langle \cdot, \cdot \rangle)$ or the Euclidean space $(R^n, \langle \cdot, \cdot \rangle)$ we have that $H^*$ (respectively, $(R^n)^*$) is isomorphic to $H$ (respectively, to $R^n$). Let $\langle E, E^* \rangle$ be a dual system of locally convex spaces. Denote by $K$ a pointed convex cone in $E$, i.e., a subset of $E$ satisfying the following properties:

1) $K + K \subseteq K$;
2) $\lambda K \subseteq K$ for all $\lambda \in R_+$ (the set of nonnegative real numbers); and
3) $K \cap (-K) = \{0\}$.

The closed convex cone

$K^* = \{ y \in E^* : \langle x, y \rangle \geq 0 \text{ for all } x \in K \}$

THEOREM 1 For every $x \in H$, $P_K(x)$ is characterized by the following property:

1) $\langle P_K(x) - x, y \rangle \geq 0$ for all $y \in K$;
2) $\langle P_K(x) - x, P_K(x) \rangle = 0$.

□

PROOF. A proof of this theorem is in [20]. □

Very useful is also the following classical Moreau's theorem:

THEOREM 2 If $K \subset H$ is a closed convex cone and $x, y, z \in H$, then the following statements are equivalent:

i) $z = x + y$, $x \in K$, $y \in K^\circ$ and $\langle x, y \rangle = 0$;
ii) $x = P_K(z)$ and $y = P_{K^\circ}(z)$.

□

PROOF. For the proof the reader is referred to [16]. □

We say that the closed pointed convex cone $K \subset H$ is isotone projection if and only if, for every $x, y \in H$ such that $y - x \in K$ we have $P_K(y) - P_K(x) \in K$. This remarkable class of cones has been studied in several papers (see for example [13]). We say that a closed pointed convex cone $K \subset H$ is a Galerkin cone if there exists a family of convex subcones $\{K_n\}_{n \in N}$ of $K$ such that:

1) $K_n$ is a locally compact cone, for every $n \in N$;
2) if $n \leq m$, then $K_n \subseteq K_m$;
3) $K = \overline{\bigcup_{n \in N} K_n}$.
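Theorems 1 and 2 can be checked numerically in the simplest setting. The sketch below assumes $H = R^n$ with the usual inner product and $K = R^n_+$, and takes the polar cone to be $K^\circ = \{ y : \langle x, y \rangle \leq 0 \text{ for all } x \in K \} = -R^n_+$ (an assumed definition, since it is not stated in the passage above). For this cone the projections are componentwise maxima and minima with zero. The function names are hypothetical.

```python
import numpy as np


def project_nonneg(z):
    """Projection onto K = R^n_+ (componentwise positive part)."""
    return np.maximum(z, 0.0)


def project_polar(z):
    """Projection onto the assumed polar cone K° = -R^n_+."""
    return np.minimum(z, 0.0)


z = np.array([2.0, -1.5, 0.0, 3.0, -0.5])
x = project_nonneg(z)   # x = P_K(z)
y = project_polar(z)    # y = P_K°(z)

# Moreau decomposition: z = x + y with <x, y> = 0.
decomposition_ok = bool(np.allclose(x + y, z))
orthogonal = float(x @ y)

# Characterization of the projection:
# <P_K(z) - z, v> >= 0 for sampled v in K, and <P_K(z) - z, P_K(z)> = 0.
rng = np.random.default_rng(0)
samples = rng.random((100, z.size))          # random points of K
min_inner = float(np.min(samples @ (x - z)))
complementarity = float((x - z) @ x)
```

The sampled inequality is only a spot check over finitely many points of $K$, not a proof of the characterization.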
Given two mappings $f\colon K \to E^*$ and $g\colon K \to E$ the implicit complementarity problem is:

find $x_0 \in K$ …

PROOF. Suppose that $x_0$ is a fixed point for the mapping $\Phi$, i.e., $x_0 = P_K(x_0) - f$ …
ities. In the study of some economical problems, we are interested in finding a solution of the problem NLCP$(f, K)$ which is also the least element of the feasible set

$F = \{ x \in K : f(x) \in K^* \}$.

This particular problem can also be studied by the fixed point theory [5], [8]. If the cone $K$ is an isotone projection cone in a Hilbert space $H$ and if the mapping $f\colon H \to H$ satisfies some properties with respect to the ordering defined by $K$, we obtain that the mappings $T$ and $\Phi$ are monotone increasing or the difference of two monotone increasing mappings. In this case, we can apply some fixed point theorems based on the ordering to the study of the problem NLCP$(f, K)$. Several results in this sense are presented in [13].

The Nonlinear Complementarity Problem As a Mathematical Tool In Fixed Point Theory. The fixed point theorems on cones attracted the attention of many mathematicians. The applications of such kind of fixed point theorems are very important. We will show now how the problem NLCP$(f, K)$ can be used to obtain new fixed point theorems on cones.

Let $H$ be a Hilbert space, $K \subset H$ a closed pointed convex cone and $h\colon K \to K$ a mapping. The fixed point problem associated to $h$ and $K$ is:

FP$(h, K)$: find $x_* \in K$ such that $h(x_*) = x_*$.

Consider the mapping $f\colon K \to H$ defined by $f(x) = x - h(x)$ for all $x \in K$.

THEOREM 5 The problems NLCP$(f, K)$ and FP$(h, K)$ are equivalent. □

PROOF. Suppose that $x_*$ is a solution of the problem FP$(h, K)$. In this case we have $h(x_*) = x_*$, which implies that $f(x_*) = 0$. It is evident that $x_*$ is a solution of the problem NLCP$(f, K)$.

Conversely, if $x_*$ is a solution of the problem NLCP$(f, K)$ we have that $x_*$ is a solution of the problem VI$(f, K)$, i.e., $x_* \in K$ and $\langle f(x_*), y - x_* \rangle \geq 0$ for all $y \in K$. But $f(x_*) = x_* - h(x_*)$ and $h(x_*) \in K$ (by hypothesis). This means that

$0 \leq \langle x_* - h(x_*), x_* - h(x_*) \rangle \leq 0$,

which implies that $h(x_*) = x_*$. ■

We note that Theorem 5 was applied to obtain new fixed point theorems [7], [10], [11]. We cite only the following two fixed point theorems.

THEOREM 6 Let $(H, \langle \cdot, \cdot \rangle)$ be a Hilbert space ordered by a Galerkin cone $K(K_n)_{n \in N}$. Let $T\colon K \to K$ be a mapping satisfying the following assumptions:

1) $T(0) = 0$;
2) $T$ is a (ws)-compact operator;
3) $T$ is $\varphi$-asymptotically bounded, with $\lim_{t \to \infty} \varphi(t) \neq +\infty$.

Then, $T$ has a fixed point $x_* \in K \setminus \{0\}$. Moreover, $x_*$ is the limit of a sequence $\{x_m\}_{m \in N}$ where for every $m \in N$, $x_m$ is a solution of the problem NLCP$(T, K_m)$. □

PROOF. The terminology and the proof is in [7]. □

Recently, a new proof for this theorem was proposed in [14].

THEOREM 7 Let $(H, \langle \cdot, \cdot \rangle)$ be a Hilbert space ordered by a Galerkin cone $K(K_n)_{n \in N} \subset H$. Suppose, given two continuous operators $S, T\colon K \to H$ such that $S$ is bounded, $T$ is compact and $(S + T)(K) \subseteq K$. If the following assumptions are satisfied:

1) $I - S$ satisfies condition $(S)_+$;

PROOF. The terminology and the proof is in [11]. □

We note that Theorem 7 has several interesting corollaries. In [10] the reader can find other fixed point theorems for set-valued operators.

Conclusions. This interesting double relation between the nonlinear complementarity problem and the fixed point theory can be exploited to obtain new results in complementarity theory and also in fixed point theory.

See also: Principal pivoting methods for linear complementarity problems; Linear complementarity problem; Convex-simplex algorithm; Sequential simplex method; Parametric linear programming: Cost simplex algorithm; Linear programming; Lemke method; Integer linear complementary problem; LCP: Pardalos-Rosen mixed integer formulation; Order complementarity; Generalized nonlinear complementarity problem; Topological methods in complementarity theory.

References
[1] AHN, B.H.: 'Solution of nonsymmetric linear complementarity problems by iterative methods', J. Optim. Th. Appl. 33, no. 2 (1981), 175-185.
[2] COTTLE, R.W.: Complementarity and variational problems, Vol. 19, Amer. Math. Soc., 1976, pp. 177-208.
[3] HYERS, D.H., ISAC, G., AND RASSIAS, T.M.: Topics in non-linear analysis and applications, World Sci., 1997.
[4] ISAC, G.: 'On the implicit complementarity problem in Hilbert spaces', Bull. Austral. Math. Soc. 32, no. 2 (1985), 251-260.
[5] ISAC, G.: 'Complementarity problem and coincidence equations on convex cones', Boll. Unione Mat. Ital. Ser. B 6 (1986), 925-943.
[6] ISAC, G.: 'Fixed point theory and complementarity problems in Hilbert spaces', Bull. Austral. Math. Soc. 36, no. 2 (1987), 295-310.
[7] ISAC, G.: 'Fixed point theory, coincidence equations on convex cones and complementarity problem', Contemp. Math. 72 (1988), 139-155.
[8] ISAC, G.: Complementarity problems, Vol. 1528 of Lecture Notes Math., Springer, 1992.
[9] ISAC, G.: 'Tihonov's regularization and the complementarity problem in Hilbert spaces', J. Math. Anal. Appl. 174, no. 1 (1993), 53-66.
[10] ISAC, G.: 'Fixed point theorems on convex cones, generalized pseudo-contractive mappings and the complementarity problem', Bull. Inst. Math. Acad. Sinica 23, no. 1 (1995), 21-35.
[11] ISAC, G.: 'On an Altman type fixed point theorem on convex cones', Rocky Mountain J. Math. 25, no. 2 (1995), 701-714.
[12] ISAC, G., AND GOELEVEN, D.: 'Existence theorems for the implicit complementarity problem', Internat. J. Math. and Math. Sci. 16, no. 1 (1993), 67-74.
[13] ISAC, G., AND NÉMETH, A.B.: 'Projection methods, isotone projection cones and the complementarity problem', J. Math. Anal. Appl. 153, no. 1 (1990), 258-275.
[14] JACHYMSKI, J.: 'On Isac's fixed point theorem for selfmaps of a Galerkin cone', Ann. Sci. Math. Québec 18, no. 2 (1994), 169-171.
[15] KARAMARDIAN, S.: 'Generalized complementarity problem', J. Optim. Th. Appl. 8 (1971), 161-168.
[16] MOREAU, J.: 'Décomposition orthogonale d'un espace hilbertien selon deux cônes mutuellement polaires', C.R. Acad. Sci. Paris 225 (1962), 238-240.
[17] NOOR, M.A.: 'Fixed point approach for complementarity problems', J. Math. Anal. Appl. 133 (1988), 437-448.
[18] NOOR, M.A.: 'Iterative methods for a class of complementarity problems', J. Math. Anal. Appl. 133 (1988), 366-382.
[19] ROBINSON, S.M.: 'Normal maps induced by linear transformations', Math. Oper. Res. 17, no. 3 (1992), 691-714.
[20] ZARANTONELLO, E.H.: 'Projection on convex sets in Hilbert space and spectral theory', in E.H. ZARANTONELLO (ed.): Contributions to Nonlinear Functional Analysis, Acad. Press, 1971, pp. 237-424.

George Isac
Royal Military College of Canada
Kingston, Ontario, Canada
E-mail address: isac-g@rmc.ca
MSC 2000: 90C33
Key words and phrases: nonlinear complementarity problem, fixed point problem.

ESTIMATING DATA FOR MULTICRITERIA DECISION MAKING PROBLEMS: OPTIMIZATION TECHNIQUES

One of the most crucial steps in many multicriteria decision making methods (MCDM) is the accurate estimation of the pertinent data [18]. Very often these data cannot be known in terms of absolute values. For instance, what is the worth of the $i$th alternative in terms of a political impact criterion? Although information about questions like the previous one is vital in making the correct decision, it is very difficult, if not impossible, to quantify it correctly. Therefore, many decision making methods attempt to determine the relative importance, or weight, of the alternatives in terms of each criterion involved in a given decision making problem.

Consider the case of having a single decision criterion and a set of $n$ alternatives, denoted as $A_i$ (for $i = 1, \ldots, n$). The decision maker wants to determine the relative performance of these alternatives in terms of a single criterion. An approach based on pairwise comparisons which was proposed by T.L. Saaty [11], [12] has long attracted the interest of many researchers, because of both its easy applicability and interesting mathematical properties. Pairwise comparisons are used
Estimating data for multicriteria decision making problems: Optimization techniques
to determine the relative importance of each alter- The second problem in this article is how to esti-
native in terms of each criterion. mate missing comparisons. The third problem is
how to select the order for eliciting the compar-
In that approach the decision maker has to ex-
isons and determine whether all comparisons are
press his/her opinion about the value of one single
needed. These problems are examined in detail in
pairwise comparison at a time. Usually, the deci-
the following sections.
sion maker has to choose his/her answer among
10-17 discrete choices. Each choice is a linguistic
phrase. Some examples of such linguistic phrases, when two concepts A and B are considered, might be: 'A is more important than B', or 'A is of the same importance as B', or 'A is a little more important than B', and so on. When one focuses directly on the data elicitation issue one may use linguistic statements such as 'How much more does alternative A belong to the set S than alternative B'?

The main problem with the pairwise comparisons is how to quantify the linguistic choices selected by the decision maker during the evaluation of the pairwise comparisons. All the methods which use the pairwise comparisons approach eventually express the qualitative answers of a decision maker into some numbers.
Pairwise comparisons are quantified by using a scale. Such a scale is nothing but a one-to-one mapping between the set of discrete linguistic choices available to the decision maker and a discrete set of numbers which represent the importance, or weight, of the previous linguistic choices. There are two major approaches in developing such scales. The first approach is based on the linear scale proposed by Saaty [12] as part of the analytic hierarchy process (AHP). The second approach was proposed by F. Lootsma [8], [9], [10] and determines exponential scales. Both approaches depart from some psychological theories and develop the numbers to be used based on these psychological theories. For an extensive study of the scale issue, see [18] and [19].

Extraction of Relative Priorities from Complete Pairwise Matrices. Let $A_1, \ldots, A_n$ be n alternatives (or criteria or, in general, concepts) to be compared. We are interested in evaluating the relative preference values of the above concepts. Saaty [11], [12], [14] proposed to use a matrix A of rational numbers taken from the set $\{1/9, 1/8, 1/7, \ldots, 1, \ldots, 9\}$. Each entry of the above matrix A represents a pairwise judgment. Specifically, the entry $a_{ij}$ denotes the number that estimates the relative preference of element $A_i$ when it is compared with element $A_j$. Obviously, $a_{ij} = 1/a_{ji}$ and $a_{ii} = 1$. That is, the matrix is reciprocal.

The Eigenvalue Approach. Let us first examine the case in which it is possible to have perfect values $a_{ij}$. In this case it is $a_{ij} = W_i / W_j$ ($W_s$ denotes the actual value of element s) and the previous reciprocal matrix A is consistent. That is:

    $a_{ij} = a_{ik} \times a_{kj}$  for $i, j, k = 1, \ldots, n$,    (1)

where n is the number of elements in the comparison set. It can be proved [12] that the matrix A has rank 1 with n to be its nonzero eigenvalue. Thus, we have:

    $A x = n x$,    (2)

where x is an eigenvector. From the fact that $a_{ij} = W_i / W_j$, the following are obtained:

    $\sum_{j=1}^{n} a_{ij} W_j = \sum_{j=1}^{n} W_i = n W_i, \quad i = 1, \ldots, n,$    (3)
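Relations (1)-(3) can be checked numerically. The following is a minimal sketch, assuming a hypothetical weight vector `weights` and using plain power iteration (an illustrative choice, not a method prescribed by the text) to recover the principal eigenpair of a perfectly consistent matrix. All function names are introduced here for illustration only.

```python
def consistent_matrix(weights):
    """Build a perfectly consistent judgment matrix with a_ij = W_i / W_j."""
    return [[wi / wj for wj in weights] for wi in weights]


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def principal_eigenpair(matrix, iterations=100):
    """Power iteration for a positive matrix.

    Returns (eigenvalue estimate, eigenvector normalized to sum to 1).
    Because the iterate x always sums to 1, sum(A x) estimates the eigenvalue.
    """
    n = len(matrix)
    x = [1.0 / n] * n
    eigenvalue = float(n)
    for _ in range(iterations):
        y = mat_vec(matrix, x)
        eigenvalue = sum(y)
        x = [v / eigenvalue for v in y]
    return eigenvalue, x


# Hypothetical "actual" values W_1, W_2, W_3 (they sum to 1).
weights = [0.5, 0.3, 0.2]
A = consistent_matrix(weights)
lam, x = principal_eigenpair(A)
print(lam)  # close to n = 3, as in relation (2)
print(x)    # close to the weights, as in relation (3)
```

For a consistent matrix the iteration settles after the first step, because A has rank 1. For an inconsistent reciprocal matrix the same routine still converges, but the eigenvalue estimate then exceeds n.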
Estimating data for multicriteria decision making problems: Optimization techniques
i=j j=i
malize the previously found vector by the sum of n
n 1 2 3 4 5 6 7 8 9
RCI 0 0 0.58 0.90 1.12 1.24 1.32 1.41 1.45
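The RCI values tabulated above are the random consistency indices conventionally used in Saaty's consistency ratio. The defining formulas do not survive in the lines around the table, so the sketch below assumes the standard definitions $CI = (\lambda_{\max} - n)/(n - 1)$ and $CR = CI/RCI$, with $\lambda_{\max}$ estimated by power iteration. The judgment matrix `judgments` is hypothetical.

```python
# RCI values copied from the table above, indexed by matrix order n.
RCI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
       6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def max_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a positive matrix by power iteration."""
    n = len(matrix)
    x = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        y = [sum(a * v for a, v in zip(row, x)) for row in matrix]
        lam = sum(y)              # x sums to 1, so sum(A x) estimates lambda
        x = [v / lam for v in y]
    return lam


def consistency_ratio(matrix):
    """Assumed standard definition: CR = CI / RCI, CI = (lambda_max - n)/(n - 1)."""
    n = len(matrix)
    if RCI[n] == 0.0:
        return 0.0  # reciprocal matrices of order 1 or 2 are always consistent
    ci = (max_eigenvalue(matrix) - n) / (n - 1)
    return ci / RCI[n]


# Hypothetical, slightly inconsistent judgments: a_13 = 6 but a_12 * a_23 = 4.
judgments = [
    [1.0, 2.0, 6.0],
    [0.5, 1.0, 2.0],
    [1.0 / 6.0, 0.5, 1.0],
]
cr = consistency_ratio(judgments)

# A perfectly consistent matrix for comparison (a_13 = a_12 * a_23).
cr_consistent = consistency_ratio([
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
])
print(cr, cr_consistent)
```

A common rule of thumb, also an assumption here rather than something stated in the surviving text, is to accept a judgment matrix when CR is at most 0.10.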
an-l,n i
W2 Wn (12)
WI ""' WI ' 1 1 1 ... 1
least square method under the human rationality assumption (HR).
As it is shown in the last column of Table 3, the performance of each method is very different as far as the mean residual is concerned. The results also illustrate how critical is the role of the functions ¢1(X, Z) and ¢2(X, Z) in the method of [3]. The mean residual obtained by using the least squares method under the human rationality assumption is the smallest one by 16%.

Matrices with Missing Comparisons. For one to evaluate n concepts, normally all the required n(n-1)/2 pairwise comparisons are needed. However, for large numbers of concepts to be compared, the decision maker may become quite bored, tired and inattentive with assigning the values to the comparisons as time is going on, which may easily lead to erroneous judgments. Moreover, the time spent to elicit all the comparisons for a judgment matrix may be unaffordable. Also the decision maker may not be sure about the values of some comparisons and thus may not want to make a direct evaluation of them. In cases like the previous ones, the decision maker may wish to stop the process and then try to derive the relative preferences from an incomplete pairwise comparison (judgment) matrix.
Given an incomplete pairwise comparison matrix, there are two central and closely interrelated problems. The first problem is how to estimate the missing comparisons. The second problem is which comparison to evaluate next. In other words, if the decision maker wishes to estimate a few extra comparisons (from the remaining undetermined ones) how should the next comparison be selected? Should it be selected randomly or according to some rule (to be determined)? Next, we study the first of these two closely related problems.

Estimating Missing Comparisons.
Using Connecting Paths. Suppose that $X_{i,j}$ is a missing comparison to be estimated. Next, also assume that there are two known comparisons $a_{i,k}$ and $a_{k,j}$ for some index k. In the perfectly consistent case the following relationship should be true:

    $X_{i,j} = a_{i,k} \times a_{k,j}.$

In the more general inconsistent case, the $X_{i,j}$ value can be approximated by the product $a_{i,k} \times a_{k,j}$. In [5] and [6] the pair $a_{i,k}$ and $a_{k,j}$ is called an elementary connecting path connecting the missing comparison $X_{i,j}$. Obviously, given a missing comparison, more than one such connecting path may exist (i.e., if there is more than one index k which satisfies the above relationship). Moreover, it is also possible to have connecting paths comprised by more than two known comparisons (i.e., paths of size larger than 2). The general structure of a connecting path of size r, denoted as $CP_r$, has the following form:

    $CP_r : X_{i,j} = a_{i,k_1} \times a_{k_1,k_2} \times \cdots \times a_{k_r,j},$

for $i, j, k_1, \ldots, k_r = 1, \ldots, n$, $1 \le r \le n - 2$.
According to P.T. Harker [5], [6] the value of the missing comparison $X_{i,j}$ should be equal to the geometric mean of all connecting paths related to this missing comparison. That is, the following should be true:

    $X_{i,j} = \left( \prod_{r=1}^{q} CP_r \right)^{1/q}.$

In the previous expression it is assumed that there are q such connecting paths. For the above reasons, this method is known as the geometric mean method for estimating missing comparisons.
A method alternative to the geometric means method is to express the missing comparisons in terms of the arithmetic averages of all related connecting paths and some error terms. In this way, one can also introduce error terms on consistency relations which are defined on pairs of missing comparisons (for more details, please see [1]). A natural objective then could be to minimize the sum of the absolute values of all these error terms (which can be of any sign). That is, the above consideration leads to the formulation of a linear programming (LP) problem. A similar approach is presented in [17] (in which the path problem does not occur).
However, there is a serious drawback with any method which attempts to use connecting paths. The number of connecting paths may be astronomically large, rendering any such method computa-
                                              elements in set
method used                                   (1)    (2)    (3)    (4)    (5)    (6)    (7)    Ave. residual
Saaty eigenvector method                      0.429  0.231  0.021  0.053  0.053  0.119  0.095  0.134
Power method eigenvector                      0.427  0.230  0.021  0.052  0.052  0.123  0.094  0.135
Chu's method                                  0.487  0.175  0.030  0.059  0.059  0.104  0.085  0.097
Federov Model 1 with ¢1 = 1                   0.422  0.232  0.021  0.052  0.052  0.127  0.094  0.138
Federov Model 2 with ¢2 = 1                   0.386  0.287  0.042  0.061  0.061  0.088  0.075  0.161
Federov Model 2 with ¢2 = |Wi - Wj|           0.383  0.262  0.032  0.059  0.059  0.122  0.083  0.152
Federov Model 2 with ¢2 = Wi/Wj               0.047  0.229  0.021  0.051  0.051  0.120  0.081  0.130
Least squares method under the HR assumption  0.408  0.147  0.037  0.054  0.054  0.080  0.066  0.082
    $A_1 = \begin{bmatrix} 1 & 2 & w_1/w_3 \\ 1/2 & 1 & 2 \\ w_3/w_1 & 1/2 & 1 \end{bmatrix}.$

    $C_{i,i} = 1 + m_i$

and for $i \ne j$:
Next, the elements of the W vector can be determined by using one of the methods presented in the second section.

Least Squares Formulation. This formulation is a natural extension of the formulation discussed earlier in the section on the HR factor. The only difference is that in relations (12) one should only consider known comparisons. This, as a result, implies that the new matrix B (as defined earlier) should not have rows which would correspond to missing comparisons. Finally, observe that in order to solve the least squares problem given as (16), one has to calculate the vector W as follows:

    $W = (B^T B)^{-1} B^T b,$

where $B^T$ stands for the transpose of B.
In [1] the revised geometric means and the previous least squares method were tested on random problems. First, a complete judgment matrix was determined. These matrices, in general, were slightly inconsistent. They were derived according to the procedures used in [22], [20], and [19]. Then, some comparisons were randomly removed and set as missing. Then, the previous two methods were applied on the incomplete judgment matrix and the missing comparisons were estimated. The estimated matrix was used to derive a ranking of the compared entities. This ranking was compared with the ranking derived when the original complete judgment matrix is used. In these computational experiments it was found that the two estimation methods for missing comparisons performed almost in a similar manner. This manner was different for matrices of different order and various percentages of missing comparisons. More details on these issues can be found in [1].

Determining the Comparison to Elicit Next. Suppose that the decision maker has determined some of the n(n-1)/2 comparisons when a set of n entities is considered for extracting relative preferences. Next assume that the decision maker wishes to proceed with only a few additional comparisons and not determine the entire judgment matrix. The question we examine at this point is which ones the additional comparisons should be. To be more specific, the question we consider is best stated as follows: Given an incomplete judgment matrix, and the option to elicit just some additional comparisons, then which one should be the comparison to elicit next?
One obvious approach is to select the next comparison just randomly among the missing ones. This problem was examined by Harker in [5] and [6]. Harker focused his attention on how to determine which comparison, among the missing ones, is the most critical one. He determined as the most critical one the comparison which would have the largest impact (when the appropriate derivatives are considered) on the vector W.
He observed that the largest absolute gradient (i.e., the largest partial derivative) means that a unit change of the specific missing comparison brings out the biggest change on the vector W. Therefore, he asserted that the missing comparison related to the largest absolute gradient should be the most critical one and, therefore, the one to evaluate next. Then, the following formula calculating the largest absolute gradient can be used to choose the most critical comparison index (i, j):

    $(i, j) = \arg\max_{(k,l) \in Q} \left\| \frac{\partial x(A)}{\partial_{k,l}} \right\|_{\infty},$

where Q is the set of missing comparisons and $\| \cdot \|_{\infty}$ is the Tchebyshev norm. The most critical comparison index (i, j) is determined by the maximum norm of the vector of $\partial x(A) / \partial_{k,l}$ which corresponds to all missing comparisons.
The previous approach is intuitively plausible but computationally nontrivial. Moreover, its effectiveness had not been addressed until recently. In [1] Harker's derivatives approach was tested versus a method which randomly selects the next comparison to elicit. The test problems were generated similarly to the ones described at the end of the previous section. The two methods were also tested in a similar manner as before. To our surprise, the two methods performed in a similar manner. Therefore, the obvious conclusion is that one does not have to implement the more complex derivatives method. It is sufficient to select the next comparison just randomly. Of course, the more comparisons are selected, the better it is for the accuracy of the final results. Since the order of comparisons seems not to have an impact, the
best strategy is to select as the next comparison the one which is easier for the decision maker to elicit.

Conclusions. Deriving the data for MCDM problems is an approach which requires trade-offs. Thus, it should not come as a surprise that optimization can be used at various stages of this crucial phase in solving many MCDM problems. The previous analysis of some key problems signifies that optimization becomes more critical as the size of the decision problem increases.
Finally, it should be stated here that an in depth analysis of many key issues in multicriteria decision making theory and practice is provided in [18].
See also: Multi-objective optimization: Pareto optimal solutions, properties; Multi-objective optimization: Interactive methods for preference value functions; Multi-objective optimization: Lagrange duality; Multi-objective optimization: Interaction of design and control; Outranking methods; Preference disaggregation; Fuzzy multi-objective linear programming; Multi-objective optimization and decision support systems; Preference disaggregation approach: Basic features, examples from financial decision making; Preference modeling; Multiple objective programming support; Multi-objective integer linear programming; Multi-objective combinatorial optimization; Bi-objective assignment problem; Multicriteria sorting methods; Financial applications of multicriteria analysis; Portfolio selection and multicriteria analysis; Decision support systems with multiple criteria.

References
[1] CHEN, Q., TRIANTAPHYLLOU, E., AND ZANAKIS, S.: 'Estimating missing comparisons and selecting the next comparison to elicit in MCDM', Working Paper Dept. Industrial Engin. Louisiana State Univ. (2001), http://www.imse.lsu.edu/vangelis.
[2] CHU, A.T.W., KALABA, R.E., AND SPINGARN, K.: 'A comparison of two methods for determining the weights of belonging to fuzzy sets', J. Optim. Th. Appl. 27, no. 4 (1979), 321-538.
[3] FEDEROV, V.V., KUZMIN, V.B., AND VERESKOV, A.I.: 'Membership degrees determination from Saaty matrix totalities', in M.M. GUPTA AND E. SANCHEZ (eds.): Approximate Reasoning in Decision Analysis, North-Holland, 1982, pp. 23-30.
[4] HARKER, P.T.: 'Alternative modes of questioning in the analytic hierarchy process', Math. Model. 9, no. 3-5 (1987), 353-360.
[5] HARKER, P.T.: 'Derivatives of the Perron root of a positive reciprocal matrix: With application to the analytic hierarchy process', Appl. Math. Comput. 22 (1987), 217-232.
[6] HARKER, P.T.: 'Incomplete pairwise comparisons in the analytic hierarchy process', Math. Model. 9, no. 11 (1987), 837-848.
[7] KALABA, R., AND SPINGARN, K.: 'Numerical approaches to the eigenvalues of Saaty's matrices for fuzzy sets', Comput. Math. Appl. 4 (1979).
[8] LOOTSMA, F.A.: 'Numerical scaling of human judgment in pairwise-comparison methods for fuzzy multi-criteria decision analysis': Mathematical Models for Decision Support, Vol. 48 of NATO ASI F: Computer and System Sci., Springer, 1988, pp. 57-88.
[9] LOOTSMA, F.A.: 'The French and the American school in multi-criteria decision analysis', Rech. Oper./Operat. Res. 24, no. 3 (1990), 263-285.
[10] LOOTSMA, F.A.: 'Scale sensitivity and rank preservation in a multiplicative variant of the AHP and SMART', Techn. Report Fac. Techn. Math. and Informatics Delft Univ. Techn., no. 91-67 (1991).
[11] SAATY, T.L.: 'A scaling method for priorities in hierarchical structures', J. Math. Psych. 15, no. 3 (1977), 234-281.
[12] SAATY, T.L.: The analytic hierarchy process, McGraw-Hill, 1980.
[13] SAATY, T.L.: 'Priority setting in complex problems', IEEE Trans. Engin. Management EM-30, no. 3 (1983), 140-155.
[14] SAATY, T.L.: Fundamentals of decision making and priority theory with the analytic hierarchy process, Vol. VI, RWS Publ., 1994.
[15] SIMON, H.A.: Models of man, 2 ed., Wiley, 1961.
[16] STEWART, S.M.: Introduction to matrix computations, Acad. Press, 1973.
[17] TRIANTAPHYLLOU, E.: 'Linear programming based decomposition approach in evaluating priorities from pairwise comparisons and error analysis', J. Optim. Th. Appl. 84, no. 1 (1995), 207-234.
[18] TRIANTAPHYLLOU, E.: Multi-criteria decision making methods: A comparative study, Kluwer Acad. Publ., 2000.
[19] TRIANTAPHYLLOU, E., LOOTSMA, F.A., PARDALOS, P.M., AND MANN, S.H.: 'On the evaluation and application of different scales for quantifying pairwise comparisons in fuzzy sets', J. Multi-Criteria Decision Anal. 3 (1994), 133-155.
[20] TRIANTAPHYLLOU, E., AND MANN, S.H.: 'A computational evaluation of the AHP and the revised AHP
when the eigenvalue method is used under a continuity assumption', Computers and Industrial Engin. 26, no. 3 (1994), 609-618.
[21] TRIANTAPHYLLOU, E., PARDALOS, P.M., AND MANN, S.H.: 'A minimization approach to membership evaluation in fuzzy sets and error analysis', J. Optim. Th. Appl. 66, no. 2 (1990), 275-287.
[22] TRIANTAPHYLLOU, E., AND SANCHEZ, A.: 'A sensitivity analysis approach for some deterministic multi-criteria decision-making methods', Decision Sci. 28, no. 1 (1997), 151-194.
[23] VARGAS, L.G.: 'Reciprocal matrices with random coefficients', Math. Model. 3 (1982), 69-81.
[24] WRITE, C., AND TATE, M.D.: Economics and systems analysis: Introduction for public managers, Addison-Wesley, 1973.

Qing Chen
Dept. Industrial and Manufacturing Systems Engin.
3128 CEBA Building
Louisiana State Univ.
Baton Rouge, LA 70803-6409, USA
Evangelos Triantaphyllou
Dept. Industrial and Manufacturing Systems Engin.
3128 CEBA Building
Louisiana State Univ.
Baton Rouge, LA 70803-6409, USA
E-mail address: trianta@lsu.edu
Web address: www.imse.lsu.edu/vangelis

MSC 2000: 90C29
Key words and phrases: pairwise comparisons, data elicitation, multicriteria decision making, MCDM, scale, analytic hierarchy process, AHP, consistent judgment matrix, eigenvalue, eigenvector, least squares problem, incomplete judgments.

EVACUATION NETWORKS
Planning and design of evacuation networks is both a complex and critically important optimization problem for a number of emergency situations. One particularly critical class of examples concerns the emergency evacuation of chemical plants, high-rise buildings, and naval vessels due to fire, explosion or other emergencies. The problem is compounded because the solution must take into account the fact that human occupants may panic during the evacuation; therefore, there must be a well-defined set of evacuation routes in order to minimize the sense of panic and at the same time create safe, effective routes for evacuation. The problem is a highly transient, stochastic, nonlinear, combinatorial optimization programming problem. We focus on evacuation networks where congestion is a significant problem.

Introduction. Evacuation is one of the most perilous, pernicious, and persistent problems faced by humanity. Hurricanes, fires, earthquakes, explosions and other natural and man-made disasters happen on almost a daily basis throughout the world. How to safely evacuate a collection of occupants within an affected region or facility is the fundamental problem faced in evacuation.

Purpose. The purpose of this article is both to introduce to the reader the problem of evacuation and its manifest nature, and also to suggest some alternative approaches to optimize this process. That life-threatening evacuations happen as often as they do is somewhat surprising. That people often do not know how to safely evacuate in time of need is a sad reality. That people must help people plan for evacuation is one of the most important activities of a research scientist.

Phase I: Warning Siren or Alarm Goes Off
Phase II: Reaction to Warning Siren or Alarm
Phase III: Decision to Evacuate
Phase IV: Evacuate the region or the facility
Phase V: Verification Process
Fig. 1: Processes for an evacuation.

Outline. In this article we first introduce the problem in Section 1 and then describe our fundamental modeling 3-step methodology in Section 2.
Evacuation networks
tation and Analysis Steps is the complexity (i.e., number of nodes and arcs) of Gt, which governs the number of equations used in the mathematical model in the Analysis Step. The Representation Step presents an interesting and challenging problem because of the many possible ways of representing regions, facilities, ships, vehicles, and building components.

[Figure: site plan with Garage, MOB, Lot A and Lot B, showing Topology I (Critical Route 3), Topology II (Critical Route 8) and Topology III (Critical Route 9).]
Fig. 3: Route site plan.

Analysis Step. The Analysis Step is the point at which the methodology and mathematical models underlying the flow processes, and the algorithmic structure for computing the performance characteristics of Gt(Z, E) come together. Mathematically, we have a network G(V, E), with a finite set of nodes V and edges (arcs) E over which multiple classes of customers (occupants) flow from source(s) to sink(s) while a vector of objective functions $f(x) = \{ f_1(x), \ldots, f_p(x) \}$ is simultaneously extremized subject to a set of constraints on the occupants flowing through the network. Fig. 4 captures many of the recognized criteria appropriate in analyzing a network evacuation problem. In our studies, we have often used Minimum Total Evacuation Time and Minimum Total Distance Travelled to capture the evacuation problem. The Total Distance Travelled is a suitable surrogate objective for approaching the route complexity, since reducing the evacuation path length will often begin to capture the path complexity and, hopefully, minimizing this measure will abate the occupants' sense of panic. Other objectives might be appropriate given the particular context or decision situation.

Synthesis Step. Given the performance characteristics determined during the Analysis Step, we can begin to optimize the network topology itself, routing and resource allocation problems within:
• Topological Network Design (TND): Determination of the number, type, and subset of nodes and arcs as well as the particular node and arc topology to be used for the evacuation.
• Routing Network Design (RND): Determination of the routing scheme in both steady-state and real time.
• Capacitated Network Design (CND): Determination of the Network Resources: Number of highway lanes, corridor length, widths, areas, landing shape, reception center capacity, configuration etc.

Mathematical Models. There are many possible mathematical modeling approaches once our network is constructed and Fig. 5 represents the range of approaches many research scientists have followed. References are provided for further details. The boldface text along the morphological tree represents the approach suggested in this article which we have applied in many different contexts.
Many mathematical models have appeared in the literature for generating and evaluating evacuation paths for an occupant population [5], [2], [8].

Set Partitioning Model. The model which is presented below is a variation of one model appearing in [8]. It was one of the first to account for the critical features of the stochastic evacuation problem. Another class of models that one might utilize to formulate the problem is the class of multicommodity flow models. Unfortunately, these models will not control the Bernoulli splitting of the occupant population along the different evacuation paths, which is problematic since splitting the different source populations will engender confusion and create a potential sense of panic among the evacuating occupants. The integer set partitioning programming model presented below has the desired property to control splitting of the flows.
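The no-splitting property can be illustrated on a toy instance. The sketch below is not the article's model: it enumerates by brute force every assignment of exactly one route to each source population, discards assignments that overload an arc, and keeps the noninferior (time, distance) pairs. All data are hypothetical, and the per-person times are fixed numbers here, whereas in an actual study they would come from a congestion model.

```python
from itertools import product

# Hypothetical data: two source populations, each with two candidate routes.
# Each route is (set of arcs used, time per person, distance per person).
routes = {
    "A": [({"a1", "shared"}, 40.0, 100.0), ({"a2"}, 55.0, 90.0)],
    "B": [({"b1", "shared"}, 35.0, 80.0), ({"b2"}, 50.0, 120.0)],
}
population = {"A": 60, "B": 40}
arc_capacity = {"a1": 100, "a2": 100, "b1": 100, "b2": 100, "shared": 80}


def evaluate(assignment):
    """Return (total time, total distance), or None if an arc is overloaded."""
    load = {arc: 0 for arc in arc_capacity}
    total_time = 0.0
    total_distance = 0.0
    for source, k in assignment.items():
        arcs, time_pp, dist_pp = routes[source][k]
        for arc in arcs:
            load[arc] += population[source]  # whole population, no splitting
        total_time += time_pp * population[source]
        total_distance += dist_pp * population[source]
    if any(load[arc] > arc_capacity[arc] for arc in load):
        return None
    return total_time, total_distance


def noninferior(points):
    """Keep the points not dominated in both objectives (minimization)."""
    keep = []
    for name, p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for _, q in points
        )
        if not dominated:
            keep.append((name, p))
    return keep


sources = sorted(routes)
feasible = []
for choice in product(*(range(len(routes[s])) for s in sources)):
    result = evaluate(dict(zip(sources, choice)))
    if result is not None:
        feasible.append((choice, result))

ni_set = noninferior(feasible)
for choice, (time_total, dist_total) in ni_set:
    print(choice, time_total, dist_total)
```

Because each source is given a single route index, a population is never divided between paths. Enumeration is only workable for very small instances, which is one reason an integer programming formulation is of interest.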
[Figure: hierarchy of evacuation criteria with the labels Overall Safety; Minimize Shortest Routes; Minimize Total Distance Travelled; Minimize Maximum Path Lengths; Minimize Routing Complexity; Minimize Path Complexity; Minimize # of Turns; Minimize # Up-down Transitions.]
The multi-objective model of our routing problem is:

    minimize $\{ f_1(x);\ f_2(x) \}$

where the Evacuation Time, respectively the Distance Travelled are:

    $f_1(x) = \sum_i \sum_j \sum_k q_{ijk}\, p_{ijk}\, x_{ijk},$
    $f_2(x) = \sum_i \sum_j \sum_k$ [summand lost in the source]

subject to:
• $\forall \ell$ Arcs:
    $\sum_i \sum_j \sum_k \alpha_{\ell ijk}\, p_{ijk}\, x_{ijk} \le P_\ell,$
• $\forall q$ Sinks:
    $\sum_i \sum_j \sum_k$ [constraint lost in the source] $\quad \forall q,$
• Occupant Classes:
    $\sum_k x_{ijk} = 1, \quad \forall i, j,$
• Routes:
    $x_{ijk} = 0, 1, \quad \forall i, j, k,$

and where:
• $x_{ijk} = 1$ if the ith occupant class from the jth source is assigned the kth route alternative.
• $\alpha_{\ell ijk}$ is a data coefficient which equals 1 if the $\ell$th arc is included in the ijkth route assignment and equals 0 otherwise.
• $P_\ell$ is the maximum allowable traffic along arc $\ell$.
• $C_q$ is the capacity of sink (destination) node q.
• $p_{ijk}$ is the occupant population of source ij on the kth route alternative.
• $q_{ijk}$ is the expected evacuation (sojourn) time of the ijkth occupant class. These values must be calculated from the particular stochastic model used in the evacuation study, see the discussion below.
• $d_{ijk}$ is the average distance travelled for the ijkth occupant class.

Since we have two objectives in our model, it makes sense to talk of the NonInferior (NI) set of route alternatives, since the trade-offs between $f_1$ and $f_2$ naturally underlie the optimal set of solutions we seek. Because of the complexity of solving this model directly, an alternative approach which systematically generates feasible routing alternatives to a relaxed version of our mathematical model but at the same time measures the critical objectives of evacuation time and distance trav-
[Figure: morphological tree of modeling approaches with the labels Stochastic Networks; Transient Networks; Simulation Approaches [18], [8], [21]; Analytical Approaches; Steady State Networks; Mean Value Analysis (MVA) [17], [19], [14]; MVA with Finite Waiting Room [4]; Static Networks; Transshipment.]
elled is proposed and demonstrated in the next two sections.

Congestion Models. The real crux of the evacuation problem is to capture the congestion that naturally occurs when occupants choose the shortest routes to evacuate. There are some deterministic measures possible for measuring congestion, yet stochastic ones are the most accurate, because queueing is a nonlinear complex phenomenon.

    $C = 5LW,$

where L (length) and W (width) are given in meters.
Each circulation segment is a representative 'building block' for modeling pedestrian movements through the facility. Corridor segments, intersections, landings, stairwells, ramps, and so on represent a network of interconnected M/G/C/C queues. The separations of the circulation blocks are due to changes in flow direction, level, or merging and splitting decisions. Further, the cardinality of S depends on the configuration and complexity of movement patterns within the facility.

Fig. 6: Three-dimensional network models.

Flows through the nodes of S, the circulation system of a building, are largely state dependent, in that a customer receives service in the circu-
lation node $S_j$ and this service rate decays with increasing amounts of customer traffic.
Fig. 7 shows a family of curves which represent the variety of empirical studies that document the decay rate of the customer service rate as a function of population density in a corridor. Empirical models are also available showing distributions for stairs and other circulation elements with bi- and multidirectional pedestrian flows [6], [20].
Finally, there is a set of classical linear and exponential curves which relate vehicle speed and vehicle density, captured in Fig. 8. We have utilized these types of vehicular speed/density relations to develop state dependent models for vehicular traffic analysis [7].

[Figure: curves plotted against Crowd Density (p/m2), horizontal axis from 0 to 4, vertical axis from 0 to 1.5.]
Fig. 7: Empirical distributions of pedestrian traffic flows.

In general, the service rate $\mu$ is a function of velocity $v_i$, which is a constant for each individual in the corridor. Thus, it takes $t_i$ (seconds)

    $t_i = L / v_i$

for each person to traverse the corridor, where i is the number of occupants in the circulation system when an individual enters.

Because of the complexity of dynamically updating the service rate as a function of the number of customers within a corridor segment, it becomes extremely difficult to utilize digital simulation models in the design of circulation systems within buildings. Our computational experience in digital simulation of access and egress networks underscores this defect in simulation models. We must, therefore, look to analytical models to aid the network design process if state dependent models are to be effectively utilized. Also, since we are examining the pedestrian/vehicular network as a design problem rather than as a control problem, it makes most sense to look at steady state measures rather than transient ones.
We have recently developed a generalized model of the M/G/C/C Erlang loss queueing model for service rate decay which can model any service rate distribution (linear, exponential, etc.) [3], [4], [15]. It is a special case of an Erlang loss model. F.P. Kelly [9] has treated M/G/C/C state dependent models in his book, but only ones with a linear, increasing function of the number of customers in the queue, whereas we treat the queue with a nonlinear, decreasing service rate, see Fig. 3.

[Figure: curves plotted against Density (veh/mile/lane), horizontal axis from 0 to 200, vertical axis from 0 to 80.]
Fig. 8: Empirical distributions of pedestrian traffic flows.

Our M/G/C/C state dependent model dynamically models the flow rate of pedestrians within a corridor as a function of the population within the corridor. Suppose that G is a continuous distribution having density g and failure rate $\mu(t) = g(t) / \bar{G}(t)$, where $\bar{G}(t) = 1 - G(t)$. Loosely speaking, $\mu(t)$ is the instantaneous probability intensity that a service t units old will end. The service rate depends on the number of customers in the system: given that there are n people in the system, each server processes work at rate f(n). In other words, if there is an ar-
rival, the service rate will change to f(n + 1) and if there is a departure, the service rate will change to f(n - 1).
In particular, the probability distribution of the number of occupants in the corridor is given by:

    $P(n \text{ in system}) = \frac{[\lambda E(S)]^n P_0}{n!\, f(n) \cdots f(1)}, \quad n = 1, \ldots, C,$

where

    $P_0 = \frac{1}{1 + \sum_{i=1}^{C} \frac{[\lambda E(S)]^i}{i!\, f(i) \cdots f(1)}}, \quad E(S) = \frac{L}{1.5}, \quad f(n) = \frac{v_n}{v_1},$

and E(S) is the mean service time of a lone occupant flowing through a corridor of length L, with service rate 1.5 m/sec (see Fig. 3). The term $v_n$ is defined as the average walking speed when n people are in the corridor.
For the M/G/C/C state dependent model, we have also shown that the departure process (including customers completing service and those that are lost) is a Poisson process with rate $\lambda$ [3], [4].

Algorithms. The problem we face in our evacuation planning problem is that we do not know a priori which paths are NI without assessing the congestion in G(V, E). We must iteratively generate candidate paths, assess the congestion in G(V, E), and then iterate again until the desired trade-off between distance travelled and evacuation time is acceptable to the planner. This iterative process leads to the algorithm described below. For product form networks, where the estimate of time delays in the Expected Savings calculation for rerouting among the alternative Noninferior paths can be computed exactly, the algorithm will guarantee finding a Noninferior path for re-routing the occupant classes. For nonproduct form networks, which are typically the case, we can only approximate these time delays; therefore, the algorithm can only guarantee an approximate Noninferior solution. Considering the complexity of the underlying stochastic-integer programming problem, this is a reasonable and practical strategy.

k-Shortest Paths. The algorithm to facilitate the design methodology can be incorporated into any simulation (e.g. Q-GERT) or analytical model (e.g. QNET-C) to estimate $f_1$, $f_2$, and carry out the evacuation planning/routing analysis. To summarize and focus the efforts in this article, an algorithmic description of Steps 1-3 and its substeps is presented.

1) Representation Step: Represent the underlying facility or region as a network G(V, E) where V is a finite set of nodes and E is a finite set of arcs or nodal pairs.
2) Analysis Step: Analyze G(V, E) as a queueing network either with a transient or steady-state model and compute the total evacuation time of the occupant population along with total distance travelled to evacuate given a set of evacuation paths.
3) Synthesis Step:
   3.1) Analyze the queueing output from the evacuation model and compute the set of NonInferior evacuation paths which simultaneously minimize time and distance travelled in G(V, E) for each occupant population.
        3.1.1) If the set of NI paths are uniquely optimal, then

                   $E_{ijk} = q_{ijk} - \left( \frac{d_{ijk}}{w} + q^{k} \right) \le 0, \quad \forall i, j, k,$

               go to Step 3.2, where:
               a) $E_{ijk}$ is the net increase or decrease in the average egress time per person caused by rerouting occupants to the (k + 1)st NonInferior route.
               b) $q_{ijk}$ is the sum of the average queue times per person on the original route.
               c) $d_{ijk}$ is the increased distance travelled on the (k + 1)st NonInferior route (e.g. if the kth NonInferior route is 100 feet and the (k + 1)st NonInferior route is 120 feet, $d_{ijk}$ is equal to 20 feet, i.e. 120 - 100).
             d) w is the average travel speed for $d_{ij}^{k}$.
             e) $q^{k}$ is the sum of the expected queue times per person on the (k + 1)st NonInferior route.
             otherwise:
      3.1.2) Significant queueing (congestion) exists on one or more routes; then go to Step 3.3.
   3.2) STOP! The NI shortest time/distance routes are optimal and identical and total evacuation time, distance and congestion are minimized.
   3.3) Determine the total number of occupants who pass through the queueing area(s) and trace them back to their origins.
   3.4) Select the total number of occupants to be re-routed from each source node. The total number of occupants re-routed is correlated to both the size of the queues and the number of occupants on each route. In selecting the population, the analyst should strive to achieve uniformity of occupants and queues on each egress route.
   3.5) Re-route the population to the kth route of the NI set of paths where k is selected by employing the following formula:
        $$E_{ij}^{k} = q_{ij}^{k} - \frac{d_{ij}^{k}}{w} + q^{k}, \qquad \forall i, j, k.$$
   3.6) Select the largest positive $E^{*}$ for each set of populations to be re-routed, where:
        $$E^{*} = \max_{\forall i, j\ \text{sources}} \{E_{11}, \ldots, E_{ij}\}$$
        for all possible savings, and then re-run the computer evacuation planning model with the new set of routes, by returning to Step 2.0 of the General Algorithm. If all the $E_{ij}$'s are negative, stop! The current set of NI shortest routes used on the previous iteration is selected.

Other Algorithms. Besides the k-shortest path approach, one might utilize a turn-penalty algorithm to guide the process of determining the evacuation paths. This is probably very appropriate in vehicular evacuation schemes. Also, another approach which seems quite viable, would be to define the set of arc disjoint paths, since this would tend to completely separate the occupant congestion along the paths. We have not experimented with these approaches to define the evacuation routes, but their use might be quite appropriate in the future.

Summary and Conclusion. We have given some insights into the performance modeling and optimization problems associated with evacuation networks. As this application area matures and more research is devoted to it, more theoretical and algorithmic issues and progress will emerge.

See also: Minimum cost flow problem; Nonconvex network flow problems; Traffic network equilibrium; Network location: Covering problems; Maximum flow problem; Shortest path tree algorithms; Steiner tree problems; Equilibrium networks; Survivable networks; Directed tree networks; Dynamic traffic networks; Auction algorithms; Piecewise linear network flow problems; Communication network assignment problem; Generalized networks; Network design problems; Stochastic network problems: Massively parallel solution.

References
[1] BERLIN, G.N.: 'A simulation model for assessing building firesafety', Fire Techn. 18, no. 1 (1982), 66-76.
[2] CHALMET, L.G., FRANCIS, R.L., AND SAUNDERS, P.B.: 'Network models for building evacuation', Managem. Sci. 28, no. 1 (1982), 86-105.
[3] CHEAH, JENYENG: 'State dependent queueing models', Master's Thesis Dept. Industr. Engin. and Oper. Res. Univ. Massachusetts, Amherst MA 01003 (1990).
[4] CHEAH, JENYENG, AND MACGREGOR SMITH, J.: 'Generalized M/G/C/C state dependent queueing models and pedestrian traffic flows', Queueing Systems and Their Applications 15 (1994), 365-386.
[5] FRANCIS, R.L., AND CHALMET, L.G.: 'Network models for building evacuation: A prototype primer', Unpublished Paper Dept. Industr. Systems Engin. Univ. Florida, Gainesville, Florida (1980).
[6] FRUIN, J.J.: Pedestrian planning and design, Metropolitan Assoc. Urban Designers and Environmental Planners, 1971.
[7] JAIN, R., AND MACGREGOR SMITH, J.: 'Modeling vehicular traffic flow using M/G/C/C state dependent queueing models', Transportation Sci. 31, no. 4 (1997), 324-336.
[8] KARBOWICZ, C.J., AND MACGREGOR SMITH, J.: 'A k-shortest path routing heuristic for stochastic evacuation networks', Engin. Optim. 7 (1984), 253-280.
[9] KELLY, F.P.: Reversibility and stochastic networks, Wiley, 1979.
[10] KOSTREVA, M., AND WIECEK, M.W.: 'Time dependency in multiple objective dynamic programming', J. Math. Anal. Appl. 173 (1993), 289-307.
[11] MACGREGOR SMITH, J.: 'The use of queueing networks and mixed integer programming to optimally allocate resources within a library layout', JASIS 32, no. 1 (1981), 33-42.
[12] MACGREGOR SMITH, J.: 'An analytical queueing network computer program for the optimal egress problem', Fire Techn. 18, no. 1 (1982), 18-37.
[13] MACGREGOR SMITH, J.: 'Queueing networks and facility planning', Building and Environment 17, no. 1 (1982), 33-45.
[14] MACGREGOR SMITH, J.: 'QNET-C: An interactive graphics computer program for evacuation planning', in R. NEWKIRK (ed.): Proc. Soc. for Computer Simulation Emergency Planning Session, 1987, pp. 19-24.
[15] MACGREGOR SMITH, J.: 'State dependent queueing models in emergency evacuation networks', Transport. Sci. B 25B, no. 6 (1991), 373-389.
[16] MACGREGOR SMITH, J., AND ROUSE, W.B.: 'Application of queueing network models to optimization of resource allocation within libraries', JASIS 30, no. 5 (1979), 250-263.
[17] MACGREGOR SMITH, J., AND TOWSLEY, S.: 'The use of queueing networks in the evaluation of egress from buildings', Environment & Planning B 8 (1981), 125-139.
[18] STAHL, FRED I.: 'BFIRES-II: A behavior based computer simulation of emergency egress during fires', Fire Techn. 18, no. 1 (1982), 49-65.
[19] TALEBI, K., AND MACGREGOR SMITH, J.: 'Stochastic network evacuation models', Comput. Oper. Res. 12, no. 6 (1985), 559-577.
[20] TREGENZA, P.: The design of interior circulation, Van Nostrand Reinhold, 1976.
[21] WATTS, J.M.: 'Computer models for evacuation analysis. Paper presented at the SFPE Symposium': Quantitative Methods for Life Safety Analysis, College Park Maryland, 1986, Available from the Fire Safety Inst., Middlebury Vermont.
[22] WOODSIDE, C.M., AND HUNT, R.E.: 'Medical facilities planning using general queueing network analysis', IEEE Trans. SMC-7, no. 11 (1977), 793-799.
[23] YUHASKI, S., AND MACGREGOR SMITH, J.: 'Modeling circulation systems in buildings using state dependent queueing models', Queueing Systems and Their Applications 4 (1989), 319-338.

J. MacGregor Smith
Dept. Mechanical and Industrial Engin. Univ. Massachusetts
Amherst, Massachusetts 01003, USA
E-mail address: jmsmith@ecs.umass.edu
MSC 2000: 90-XX
Key words and phrases: combinatorial optimization, evacuation network, congestion.

EVOLUTIONARY ALGORITHMS IN COMBINATORIAL OPTIMIZATION, EACO

Most of the NP-hard combinatorial optimization problems cannot be solved to optimality in practice. Therefore heuristic techniques have to be used to obtain solutions of high quality. There exist different approaches to designing a heuristic algorithm, such as tabu search and genetic algorithms. The latter solution method belongs to a wider class of algorithms, called evolutionary algorithms, that handle a set of several solutions. Within this class, the best known algorithms that are applied to combinatorial optimization problems are genetic algorithms (cf. Genetic algorithms) and ant systems. For a general presentation, one can mention [22], [72] for genetic algorithms and [12], [23] for ant systems.

In this article, a review of the evolutionary algorithms used up to 1998 in combinatorial optimization is given. For a certain number of combinatorial problems, the main papers that present an evolutionary algorithm for that problem are referenced, and some short remarks are given. While it is difficult to provide a very precise definition of an evolutionary algorithm, this term will be used here as a synonym of population-based algorithm: an algorithm that evolves several solutions, in particular by exchanging some kind of information between them. Algorithms that iteratively modify a solution in order to obtain a good one (like tabu search or genetic algorithms with a 'population' of size 1) will not be considered as evolutionary algorithms.

The Traveling Salesman Problem. The traveling salesman problem (or TSP) is probably the problem to which the largest number of evolutionary algorithms have been applied. It consists in determining a shortest tour visiting all of the given cities exactly once. A very complete survey of local search approaches to this problem has been
provided by D.S. Johnson and L.A. McGeoch [51], while J.-Y. Potvin [70] compared several genetic algorithms for TSP. In [51], the authors recommend different solving techniques depending on the quality of the solution desired and the time available. Genetic algorithms or ant systems are a good choice if enough running time is allowed and good solutions are needed. With similar running times, the iterated Lin-Kernighan algorithm (or ILK) yields better results but is more complex to implement. In ILK, a single solution instead of a population of individuals is considered and this method will therefore not be referred to as an evolutionary algorithm. If there is no restriction on the running time, the best results can be obtained by genetic algorithms based on ILK.

An important breakthrough in the field of evolutionary algorithms for the TSP was the paper [67] by H. Mühlenbein, M. Gorges-Schleuter and O. Krämer. In their algorithm, implemented on a parallel machine, a solution was allowed to mate only with certain other solutions and some optimization technique was applied to the offspring. Indeed, the use of a local search algorithm to improve created offspring is a necessary condition for an evolutionary algorithm to be efficient. Moreover, they designed a crossover specific to the TSP, called MPX (maximum preservative crossover). It consists in copying a segment of a certain length from a first parent into the offspring and adding cities consecutively from the second parent according to some rules. This crossover is very suitable for the TSP, as shown in [66]. Further research studied the impact of the different elements on the results and improved the quality of the solutions obtained [44], [7], [89]. Several other crossovers, most of them using two parents, have been suggested by various authors. In particular, B. Freisleben and P. Merz proposed [37], [38] the distance preserving crossover (or DPX): an offspring is created by keeping the edges that are found in both parents, and greedily reconnecting the different pieces without using the edges contained in only one parent. They obtain a very efficient algorithm, that won both the ATSP (asymmetric TSP) and the TSP competitions at the First International Contest in Evolutionary Optimization [6]. They further improved their algorithm, in terms of speed and quality of solutions, in [39]. Their use of an edge-preserving crossover and of a hill-climbing algorithm illustrates important elements necessary to obtain an efficient genetic algorithm for TSP. These elements have been put forward in different comparisons between various genetic algorithms for TSP [78], [70], together with the necessity to split the population into several subpopulations for solving large instances (more than a few hundred cities).

The first presentation of ant colony optimization (ACO) [12] was made with the TSP as illustration and this problem remains the most often used application problem of works on ant colony optimization. The initial ACO system, named ant system, has been extended to what is called ant colony system (ACS). A description of this algorithm can be found in [23] by M. Dorigo and L.M. Gambardella. In the same paper, local search has been added to ACS and the resulting algorithm has been applied to ATSP and TSP. The results reported are better in [39] for TSP, but are better in [23] for ATSP. Another proposed extension of ant system, called MAX-MIN ant system [79], consists in introducing explicit maximum and minimum values for the trail factors on the arcs. Good results are obtained with such an algorithm when local search is added.

The Vehicle Routing Problem. The most studied extension of the vehicle routing problem (VRP) is the one with time windows (VRPTW). In order to solve this problem, a two-phase heuristic, called GIDEON, has been proposed in [84]. The first phase uses a genetic algorithm to cluster the customers, and the solutions obtained are improved by local optimization techniques in the second phase. This procedure has first been improved in [83], and then extended in [85]. In this last paper, S.R. Thangiah, I.H. Osman and T. Sun present several metaheuristics, all having a first phase similar to the one in GIDEON. These algorithms have been compared to several other heuristics and showed very good results on test problems taken from the literature. Some improvements are still needed for solving problems with large time windows. For such problems, a heuristic based on simulated annealing and a population-based algorithm called GENEROUS [71] are shown to be a little more efficient. The latter is not a standard genetic algorithm since it does not represent solutions by chromosomes, but it nevertheless handles several solutions and uses a recombination operator. An adaptive memory procedure, in conjunction with tabu search, has also been applied to this problem [75].

Improvements of the GIDEON approach with local post-optimization procedures have also been used for the VRP with time deadlines. A comparison done in [87], [86] with two other heuristics shows that the cluster-first route-second algorithm with a genetic algorithm in the first phase performs well for problems in which the customers are distributed uniformly and/or with short time deadlines.

The Quadratic Assignment Problem. The quadratic assignment problem (or QAP) allows the modeling of many practical problems in location science, but can be solved optimally only for very small instances. Therefore different heuristics have been proposed for this problem. Several of them are compared in [13], [81]. For real-world problems (irregular and structured), the genetic hybrid by C. Fleurent and J.A. Ferland in [33] appears to be one of the most efficient algorithms [81]. Based on a standard genetic algorithm with solutions encoded as permutations [82], this genetic hybrid applies a robust tabu search on the offspring and was able to find several new best solutions on some benchmark problems.

The ant colony optimization approach has also been considered, first in [64]. This ant system algorithm, hybridized with a local search, has been improved in [63, 62] and provides very good results. A different ACO approach, where at each iteration the solutions are modified instead of newly constructed, has been proposed in [40]. This algorithm, also hybridized with a local search procedure, yields better results on real-world problems than the genetic hybrid of [33], but is not competitive on random problems. A further promising method, based on scatter search, has been presented in [19].

The Satisfiability Problem (SAT). The problem of finding a truth assignment for variables to make a propositional formula true is probably the best known, and historically the first, NP-complete problem. But only a few evolutionary algorithms for SAT can be found in the literature. After a straightforward approach in [52], a rather different solution representation has been proposed in [45]. But the drawback of this method, despite adapted operators, is that it considerably increases the size of the individuals, compared to the coding 'one gene for one variable'. This last coding has been used in [35], together with a SAT-adapted crossover (the objective function being simply the number of satisfied clauses). But the evolutionary algorithm thus obtained was not able to compete with a tabu search (also presented in [35]). The tabu search-genetic hybrid (where some iterations of tabu search are used for mutation) is computationally expensive, but is able to solve large instances that a tabu search alone cannot solve. For smaller instances, the hybridization is not useful.

Another heuristic approach to SAT consists in assigning weights to the different clauses and minimizing the sum of the weights of the unsatisfied clauses. These weights are adapted during the algorithm depending on the 'difficulty' of each constraint. This mechanism has been used in evolutionary algorithms in [25] and [90], but in both cases it turned out that the best results are obtained with a 'population' of size 1. Such an algorithm is therefore no longer considered as an evolutionary algorithm.

The Set Covering and Set Partitioning Problems. The set covering problem (SCP) is a zero-one integer programming problem where the constraints are all of the type $\sum_j a_{ij} x_j \ge 1$ with zero-one coefficients. It is a well-known problem, that has also been used to study penalty functions in genetic algorithms [74], [3].

Different genetic algorithm approaches have been proposed in the literature (see for example [60], [61], [50]), and a very efficient one has been presented by J.E. Beasley and P.C. Chu in [5]. This algorithm uses binary representation of the solutions, and a repair operator to preserve the feasibility of the individuals and to improve the solutions. Moreover, a variable mutation rate has been introduced. Results on standard test problems up to 1000 constraints and 10,000 variables show the efficiency of this algorithm that was able to improve the best-known result on some of the larger instances. The same paper shows no significant difference between various crossovers.

The set partitioning problem (SPP) is also a zero-one integer programming problem, the difference with SCP being that the constraints are equalities instead of inequalities. Relatively few heuristics have been developed for this problem. D. Levine investigated sequential and parallel genetic algorithms for SPP [59]. His best algorithm was a genetic algorithm in an island model, hybridized with a local search heuristic. But this algorithm remained less efficient, both in terms of quality of the solutions and in terms of running time, than the branch and cut approach of [49]. Some problems met by his algorithm were due to the penalty term for infeasible solutions in the fitness function. In order to overcome these problems, other authors decomposed the single fitness measure into two distinct parts (the objective function and a measure of 'infeasibility') [10]. Adapting the parent selection method to this modification, and also using an improvement operator, they obtained a better genetic algorithm, which is still not able, for the problems they considered, to compete with a commercial mixed integer solver.

The Knapsack Problem. The multidimensional (zero-one) knapsack problem is equivalent to the zero-one integer programming problem with nonnegative coefficients. Only a few papers tried to solve this problem with evolutionary algorithms. While the first such algorithms did not give high-quality results and were not competitive with other heuristics [56], [88], the quality has improved. Genetic algorithms as presented in [11], [48], both working only with feasible solutions, are able to obtain optimal solutions on standard test problems (instances with at most 105 variables and 30 constraints). In [11], Chu and Beasley proposed some larger test problems (up to 500 variables and 30 constraints), without known optimal solution, and used them for a comparison with other heuristics. Their genetic algorithm uses a 'repair' operator specific to this problem to ensure good feasible offspring and obtained high-quality results, but also needed more computation time (on the same machine, about one hour for the genetic algorithm against a few seconds for the other heuristics).

The Bin Packing Problem. The standard one-dimensional bin packing problem consists in putting items of given sizes in bins of given capacity. Many evolutionary algorithms proposed for this problem (genetic algorithms and evolution strategy, see for example [77], [16], [57]) performed worse than a simple heuristic like first fit decreasing. E. Falkenauer and A. Delchambre then suggested in [30] a genetic algorithm designed for grouping problems: the grouping genetic algorithm (GGA). In this algorithm, solutions are represented by chromosomes having two parts: the item part encodes for each item its bin and the group part, of variable length, encodes the bin identifiers used. The crossover, mutation and inversion operators have been adapted to this encoding. Instead of simply using the number of bins, the authors designed a fitness function that also takes into account the proportion to which each bin is filled. With this approach, they obtained very satisfactory results. The arguments presented for this new encoding are discussed by C. Reeves in [73]. In the same paper, a hybrid genetic algorithm is presented, where solutions are represented by permutations and decoded using heuristics like first fit and best fit. The results obtained are more or less similar to those in [30]. A problem size reduction heuristic, similar to the reduction process used in [16], has also been introduced in this genetic algorithm. According to Falkenauer [29], this reduction violates the search strategy of the genetic algorithm and he therefore prefers the GGA's crossover, that has the same goal of propagating promising bins. In the same paper, the GGA is improved by the introduction of local optimization inspired by the dominance criterion of [65]. The new algorithm is compared with an efficient branch and bound algorithm and gives excellent results.

Extensions of the standard bin packing problem, like the two-dimensional bin packing problem, have also been considered with evolutionary algorithms [77], [15], [69]. An overview of these variations is presented in [43].

Graph Coloring. The graph coloring problem is a well-known problem in graph theory; it consists in determining the smallest number of colors that must be used to color the vertices of a graph such that two adjacent vertices do not have the same color. L. Davis is the first author who proposed an evolutionary algorithm for this problem [22]. In fact, he considered a graph with weights on the vertices and an integer k. He then designed a hybrid genetic algorithm for finding a partial k-coloring such that the colored vertices have maximum total weight. In this algorithm, individuals are represented as permutations of the vertices of the graph. This order-based encoding is not very efficient, as shown by Fleurent and Ferland in [34]. In this paper, they also present hybrid genetic algorithms that use string-based encodings of the solutions for finding a coloring in k colors with as few conflicting edges (edges with both ends of the same color) as possible. They consider different crossovers, including a graph-adapted one, and hybridize the genetic algorithm with a simple local search or with tabu search (a modified version of [46]). The results on random graphs $G_{n,0.5}$ improve the previous best results. For graphs up to 300 vertices, their tabu search-genetic hybrid and their tabu search give similar results, but in much less time for the latter. For larger graphs (500 or 1000 vertices), the running time becomes prohibitive, and both the evolutionary algorithm and the tabu search must be used within a different approach (determining large stable sets and coloring the residual graph). The tests on 450-vertex Leighton graphs (with known chromatic numbers) showed that the tabu search-genetic hybrid outperforms the tabu search on about half of the instances, while the opposite is true for the remaining instances. The hybrid algorithm was able to find an optimal solution for two instances (out of twelve) that could not be solved by the tabu search alone.

Another evolutionary algorithm has been proposed in [18], with a graph-adapted crossover that takes into account how 'close' a vertex is to conflicting edges. The improving algorithm applied to the offspring is a steepest descent method, instead of a tabu search like in [34]. Despite this less sophisticated method, their algorithm gives similar results to those obtained by the hybrid algorithm in [34]. Moreover, the latter gives worse results when the tabu search is replaced by a simple descent method.

Concerning ant colony optimization, a first approach to graph coloring has been proposed in [17], but the results obtained need improvements.

Other Graph Problems.

Maximum Clique. The problem of determining the maximum clique (complete subgraph) in a graph is equivalent to the problem of determining the minimum vertex cover or the maximum stable set in the complementary graph. A first genetic algorithm, hybridized with a tabu search, has been proposed by Fleurent and Ferland in [35], but they show that their tabu search alone gives similar results in a shorter time. In these algorithms, a solution is a set of vertices of given size and the objective function measures how many edges are missing for a set to be a clique. Improving an algorithm of [2], E. Balas and W. Niehaus [4] proposed a genetic algorithm (without improving algorithm applied to the offspring) for both the maximum cardinality and maximum weight clique problems where an individual is a clique. In this algorithm, the recombination operation ('crossover') used is designed specifically for this problem and taken from another heuristic. The results obtained on the DIMACS benchmark graphs are very good, similar to those obtained in [35] from the point of view of the solutions' quality. A different fitness function has been suggested in [8] and included in a hybrid genetic algorithm using a local optimization step. The fitness value associated to a set of vertices is a weighted combination of the size of the set and the number of edges missing to have a clique, but the weights are modified during the run of the algorithm according to a simple rule. Despite the introduction of a preprocessing step that determines the order of the vertices on the chromosome, this algorithm is less efficient (but this may be due to the use of the 2-point crossover).
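
The two clique objectives described above can be illustrated with a minimal Python sketch. It is not taken from [35] or [8]: the function names, the small example graph, and the fixed weights `size_weight` and `penalty_weight` are all assumptions made for illustration (in [8] the weights are adapted during the run by a rule that is not reproduced here).

```python
from itertools import combinations


def missing_edges(vertices, edges):
    """Count the vertex pairs of `vertices` that are NOT joined by an edge.

    A value of 0 means the vertex set is a clique.
    """
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for u, v in combinations(sorted(vertices), 2)
        if frozenset((u, v)) not in edge_set
    )


def weighted_fitness(vertices, edges, size_weight=1.0, penalty_weight=1.0):
    """Hypothetical fitness: reward the set size, penalise missing edges.

    The weights are fixed here for simplicity; this is an assumption,
    not the adaptive rule mentioned in the text.
    """
    return size_weight * len(vertices) - penalty_weight * missing_edges(vertices, edges)


# Illustrative graph (assumed): a triangle 0-1-2 with a pendant vertex 3.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3)]

if __name__ == "__main__":
    print(missing_edges({0, 1, 2}, EDGES))     # the triangle is a clique
    print(missing_edges({0, 1, 3}, EDGES))     # pairs (0,3) and (1,3) are missing
    print(weighted_fitness({0, 1, 3}, EDGES, penalty_weight=2.0))
```

With a fixed set size, minimizing `missing_edges` corresponds to the objective described for the algorithms of [35]; the weighted variant lets the set size vary, at the cost of having to choose the two weights.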
Graph Partitioning. Evolutionary algorithms are rather seldom used to tackle the k-way graph partitioning problem (partitioning a (weighted) graph in k equal-sized parts), even if the graph bisectioning problem (the case k = 2) is sometimes taken to illustrate various ingredients in genetic algorithms ([9], [54]). For the general k-way graph partitioning problem, different problem-oriented operators are introduced and studied in a parallel genetic algorithm in [58]. In this algorithm, the population is only composed of feasible solutions. Another approach has been proposed in [76] where the population is split in two halves: one containing only feasible solutions and the other only infeasible ones. This algorithm uses the same encoding scheme and crossover operator as [58], but has not been applied on similar instances of the problem. In a general way, genetic algorithms give good results on partitioning problems, but at a very high computational cost.

Miscellaneous.

Sequencing and Scheduling. The best-known sequencing and scheduling problems are the flow-shop, job-shop and open shop problems. The first paper applying an evolutionary algorithm to such a problem is [21]. Later, several other genetic algorithms have been proposed ([36], [80] for example). One of the first efficient evolutionary algorithms for job-shop problems has been presented in [68] and improved in [91], [20]. Comparisons done with other heuristics on benchmark problems show that sophisticated genetic algorithms (with the use of problem-adapted crossovers and hybridization) yield the best results for flow-shop and job-shop problems [1], [24], [42]. The open shop problems have attracted fewer researchers in the evolutionary algorithms field, but a genetic algorithm has been proposed in [32], [31]. An ant colony approach to job-shop problems has also been tested, in [14], but gave worse results than known genetic algorithms.

Steiner Trees. Only very few works deal with Steiner trees and evolutionary algorithms. Moreover, they consider different variants of this problem. The first paper [47] proposes a genetic algorithm with local optimization for determining minimum Steiner trees in the Euclidean plane. A solution is represented by the coordinates of the Steiner points. A comparison with simulated annealing and the Rayward-Smith-Clare algorithm shows no significant differences. The rectilinear Steiner problem has been addressed in [53] with a specific coding and an adapted crossover. The minimal Steiner tree problem in graphs has attracted a little more interest. A standard genetic algorithm (with bit strings as chromosomes) that gave good results on the sparse graphs tested has been proposed in [55]. Later, H. Esbensen and P. Mazumder [28] designed a genetic algorithm in which the encoding method is based on the distance network heuristic. Improvements have been brought in [26] and [27], where there is also a comparison between different algorithms. But this genetic algorithm is not competitive with an efficient tabu search as presented in [41].

Conclusion. In this paper, some references on the evolutionary approaches that have been proposed up to 1998 for different combinatorial problems have been given. A general remark that can be made on these solution methods is that evolutionary algorithms in general, and genetic algorithms in particular, are not efficient for such problems if implemented too naively. To obtain an algorithm with good performance, it is necessary to make adjustments of the basic method. Moreover, knowledge about the problem considered is very often also needed, in order to design adapted operators.

Another remark concerns their competitiveness compared to other heuristic methods. While evolutionary algorithms can quite easily be adapted to (almost) any problem, their running time is often quite high. Local search algorithms, like tabu search or simulated annealing, can also be adapted to the different combinatorial problems quite easily. If they are designed in an intelligent way, they are very often able to obtain better results than evolutionary algorithms. Moreover, they are usually faster. For some problems, specifically designed heuristics can use theoretical results about the problem, allowing them to obtain good results. In general, evolutionary algorithms are not competitive against (extended) local search or specific
algorithms for small to medium size instances of combinatorial problems.

But this does not mean that population-based algorithms are not useful. In fact, the different approaches have various (dis)advantages, and the efficient algorithms that will be developed in the future will probably mix these different approaches. Such algorithms are usually called 'hybrid algorithms' and have already been proposed for example for the traveling salesman problem [39] or the quadratic assignment problem [33], demonstrating their potential.

See also: Fractional combinatorial optimization; Replicator dynamics in combinatorial optimization; Neural networks for combinatorial optimization; Combinatorial matrix analysis; Multi-objective combinatorial optimization; Combinatorial optimization games.

References
[1] AARTS, E.H.L., LAARHOVEN, P.J.M. VAN, LENSTRA, J.K., AND ULDER, N.L.J.: 'A computational study of local search algorithms for job shop scheduling', ORSA J. Comput. 6 (1994), 118-125.
[2] AGGARWAL, C.C., ORLIN, J.B., AND TAI, R.P.: 'An optimized crossover for maximum independent set', Oper. Res. 45 (1995), 226-234.
[3] BÄCK, T., SCHÜTZ, M., AND KHURI, S.: 'A comparative study of a penalty function, a repair heuristic, and stochastic operators with the set-covering problem', in J.M. ALLIOT, E. LUTTON, E. RONALD, M. SCHOENAUER, AND D. SNYERS (eds.): Artificial Evolution: European Conf., Vol. 1063 of Lecture Notes Computer Sci., Springer, 1996, pp. 3-20.
[4] BALAS, E., AND NIEHAUS, W.: 'Optimized crossover-based genetic algorithms for the maximum cardinality and maximum weight clique problems', J. Heuristics 4 (1998), 107-122.
[5] BEASLEY, J., AND CHU, P.: 'A genetic algorithm for the set covering problem', Europ. J. Oper. Res. 94 (1996), 392-404.
[6] BERSINI, H., DORIGO, M., LANGERMAN, S., SERONT, G., AND GAMBARDELLA, L.M.: 'Results of the first international contest on evolutionary optimisation (1st ICEO)': Proc. 1996 IEEE Internat. Conf. Evolutionary Computation, IEEE Press, 1996, pp. 611-615.
[7] BRAUN, H.: 'On solving travelling salesman problems by genetic algorithms', in H.-P. SCHWEFEL AND R. MÄNNER (eds.): Parallel Problem Solving from Nature, Vol. 496 of Lecture Notes Computer Sci., Springer, 1991, pp. 129-133.
[8] BUI, T.N., AND EPPLEY, P.H.: 'A hybrid genetic algorithm for the maximum clique problem', in L.J. ESHELMAN (ed.): Proc. 6th Internat. Conf. Genetic Algorithms, Morgan Kaufmann, 1995, pp. 478-484.
[9] BUI, T.N., AND MOON, B.R.: 'On multi-dimensional encoding/crossover', in L.J. ESHELMAN (ed.): Proc. 6th Internat. Conf. Genetic Algorithms, Morgan Kaufmann, 1995, pp. 49-56.
[10] CHU, P.C., AND BEASLEY, J.E.: 'Constraint handling in genetic algorithms: the set partitioning problem', J. Heuristics 4 (1998), 323-357.
[11] CHU, P.C., AND BEASLEY, J.E.: 'A genetic algorithm for the multidimensional knapsack problem', J. Heuristics 4 (1998), 63-86.
[12] COLORNI, A., DORIGO, M., AND MANIEZZO, V.: 'Distributed optimization by ant colonies', in F. VARELA AND P. BOURGINE (eds.): Proc. ECAL91 - European Conf. Artificial Life, Elsevier, 1991, pp. 134-142.
[13] COLORNI, A., DORIGO, M., AND MANIEZZO, V.: 'Algodesk: An experimental comparison of eight evolutionary heuristics applied to the quadratic assignment problem', Europ. J. Oper. Res. 81 (1995), 188-205.
[14] COLORNI, A., DORIGO, M., MANIEZZO, V., AND TRUBIAN, M.: 'Ant system for job-shop scheduling', JORBEL - Belgian J. Oper. Res., Statist. and Computer Sci. 34, no. 1 (1994), 39-53.
[15] CORCORAN, A.L., AND WAINWRIGHT, R.L.: 'A genetic algorithm for packing in three dimensions': Proc. 1992 ACM/SIGAPP Symposium on Applied Computing SAC'92, ACM, 1992, pp. 1021-1030.
[16] CORCORAN, A.L., AND WAINWRIGHT, R.L.: 'A heuristic for improved genetic bin packing', Techn. Report UTULSA-MCS-93-08, Univ. Tulsa, USA (1993).
[17] COSTA, D., AND HERTZ, A.: 'Ants can color graphs', J. Oper. Res. Soc. 48 (1997), 295-305.
[18] COSTA, D., HERTZ, A., AND DUBUIS, O.: 'Embedding a sequential procedure within an evolutionary algorithm for coloring problems in graphs', J. Heuristics 1 (1995), 105-128.
[19] CUNG, V.-D., MAUTOR, TH., MICHELON, PH., AND TAVARES, A.: 'A scatter search based approach for the quadratic assignment problem': Proc. 1997 IEEE Internat. Conf. Evolutionary Computation, IEEE Press, 1997, pp. 190-206.
[20] DAVIDOR, Y., YAMADA, T., AND NAKANO, R.: 'The ecological framework II: Improving GA performance with virtually zero cost', in S. FORREST (ed.): Proc. 5th Internat. Conf. Genetic Algorithms, Morgan Kaufmann, 1993, pp. 171-176.
[21] DAVIS, L.: 'Job shop scheduling with genetic algorithms', in J.J. GREFENSTETTE (ed.): Proc. 1st Internat. Conf. on Genetic Algorithms, Lawrence Erlbaum Ass., 1985, pp. 136-140.
[22] DAVIS, L.: Handbook of genetic algorithms, Van Nostrand Reinhold, 1991.
[23] DORIGO, M., AND GAMBARDELLA, L.M.: 'Ant colony system: A cooperative learning approach to the traveling salesman problem', IEEE Trans. Evolutionary Computation 1 (1997), 53-66.
[24] DUVIVIER, D., PREUX, PH., AND TALBI, E.-G.: 'Stochastic algorithms for optimization and application to job-shop-scheduling', Techn. Report LIL-95-5, Univ. du Littoral, France (1995).
[25] EIBEN, A.E., AND HAUW, J.K. VAN DER: 'Solving 3-SAT with adaptive genetic algorithms': Proc. 4th IEEE Conf. Evolutionary Computation, IEEE Press, 1997, pp. 81-86.
[26] ESBENSEN, H.: 'Computing near-optimal solutions to the Steiner problem in a graph using a genetic algorithm', Networks 26 (1995), 173-185.
[27] ESBENSEN, H.: 'Finding (near-)optimal Steiner trees in large graphs', in L.J. ESHELMAN (ed.): Proc. 6th Internat. Conf. on Genetic Algorithms, Morgan Kaufmann, 1995, pp. 485-491.
[28] ESBENSEN, H., AND MAZUMDER, P.: 'A genetic algorithm for the Steiner problem in a graph', Techn. Report Univ. Michigan, Ann Arbor (1993).
[29] FALKENAUER, E.: 'A hybrid grouping genetic algorithm for bin packing', J. Heuristics 2 (1996), 5-30.
[30] FALKENAUER, E., AND DELCHAMBRE, A.: 'A genetic algorithm for bin packing and line balancing': Proc. 1992 IEEE Internat. Conf. on Robotics and Automation, IEEE Computer Soc. Press, 1992, pp. 1186-1192.
[31] FANG, H.-L.: 'Genetic algorithms in timetabling and scheduling', PhD Thesis, Univ. Edinburgh (1994).
[32] FANG, H.-L., ROSS, P., AND CORNE, D.: 'A promising genetic algorithm approach to job-shop scheduling, re-scheduling, and open-shop scheduling problems', in S. FORREST (ed.): Proc. 5th Internat. Conf. Genetic Algorithms, Morgan Kaufmann, 1993, pp. 375-382.
[33] FLEURENT, C., AND FERLAND, J.A.: 'Genetic hybrids for the quadratic assignment problem', in P.M. PARDALOS AND H. WOLKOWICZ (eds.): Quadratic assignment and related problems, DIMACS 16, Amer. Math. Soc., 1994, pp. 190-206.
[34] FLEURENT, C., AND FERLAND, J.A.: 'Genetic and hybrid algorithms for graph coloring', in G. LAPORTE AND I.H. OSMAN (eds.): Metaheuristics in combinatorial optimization, Vol. 63 of Ann. Oper. Res., Baltzer, 1996, pp. 437-461.
[35] FLEURENT, C., AND FERLAND, J.A.: 'Object-oriented implementation of heuristic search methods for graph coloring, maximum clique, and satisfiability', in D.S. JOHNSON AND M.A. TRICK (eds.): Cliques, coloring, and satisfiability, Amer. Math. Soc., 1996, p. 619.
[36] FOX, B.R., AND MCMAHON, M.B.: 'Genetic operators for sequencing problems', in G.J.E. RAWLINS (ed.): Foundations of Genetic Algorithms, Morgan Kaufmann, 1991, pp. 284-300.
[37] FREISLEBEN, B., AND MERZ, P.: 'A genetic local search algorithm for solving symmetric and asymmetric traveling salesman problems': Proc. 1996 IEEE Internat. Conf. on Evolutionary Computation, IEEE Press, 1996, pp. 616-621.
[38] FREISLEBEN, B., AND MERZ, P.: 'New genetic local search operators for the traveling salesman problem', in H.-M. VOIGT, W. EBELING, I. RECHENBERG, AND H.-P. SCHWEFEL (eds.): Proc. 4th Conf. on Parallel Problem Solving from Nature, Vol. 1141 of Lecture Notes Computer Sci., Springer, 1996, pp. 890-899.
[39] FREISLEBEN, B., AND MERZ, P.: 'Genetic local search for the TSP: new results': Proc. 1997 IEEE Internat. Conf. on Evolutionary Computation, IEEE Press, 1997, pp. 159-164.
[40] GAMBARDELLA, L.-M., TAILLARD, E.D., AND DORIGO, M.: 'Ant colonies for the quadratic assignment problems', J. Oper. Res. Soc. 50 (1999), 167-176.
[41] GENDREAU, M., LAROCHELLE, J.-F., AND SANSÓ, B.: 'A tabu search heuristic for the Steiner tree problem', GERAD G-98-01, Univ. Montréal, Canada (1998).
[42] GLASS, C.A., AND POTTS, C.N.: 'A comparison of local search methods for flow shop', in G. LAPORTE AND I.H. OSMAN (eds.): Metaheuristics in combinatorial optimization, Vol. 63 of Ann. Oper. Res., Baltzer, 1996, pp. 489-509.
[43] GOODMAN, E.D., TETELBAUM, A.Y., AND KUREICHIK, V.M.: 'A genetic algorithm approach to compaction, bin packing and nesting problems', Techn. Report GARAGe94-4, Michigan State Univ. (1994).
[44] GORGES-SCHLEUTER, M.: 'Asparagos: An asynchronous parallel genetic optimization strategy', in J.D. SCHAFFER (ed.): Proc. 3rd Internat. Conf. on Genetic Algorithms, Morgan Kaufmann, 1989, pp. 422-427.
[45] HAO, J.K.: 'A clausal genetic representation and its related evolutionary procedures for satisfiability problems', in D.W. PEARSON, N.C. STEELE, AND R.F. ALBRECHT (eds.): Proc. 2nd Internat. Conf. on Artificial Neural Networks and Genetic Algorithms, Springer, 1995, pp. 289-292.
[46] HERTZ, A., AND WERRA, D. DE: 'Using tabu search techniques for graph coloring', Computing 39 (1987), 345-351.
[47] HESSER, J., MÄNNER, R., AND STUCKY, O.: 'On Steiner trees and genetic algorithms', in J.D. BECKER, I. EISELE, AND F.W. MUNDEMANN (eds.): Parallelism, Learning, Evolution, Vol. 565 of Lecture Notes Artificial Intelligence, Springer, 1991, pp. 509-525.
[48] HOFF, A., LØKKETANGEN, A., AND MITTET, I.: 'Genetic algorithms for 0/1 multidimensional knapsack problems', Proc. Norsk Informatik Konferanse, NIK '96 (1996).
[49] HOFFMAN, K., AND PADBERG, M.: 'Solving airline crew-scheduling problems by branch-and-cut', Managem. Sci. 39 (1993), 657-682.
[50] HUANG, W.-C., KAO, C.-Y., AND HORNG, J.-T.: 'A genetic algorithm approach for set covering problems': Proc. First IEEE Internat. Conf. on Evolutionary Computation, IEEE Press, 1994, pp. 569-574.
[51] JOHNSON, D.S., AND MCGEOCH, L.A.: 'The traveling salesman problem: A case study in local optimization', in E.H.L. AARTS AND J.K. LENSTRA (eds.): Local Search in Combinatorial Optimization, Wiley, 1997, pp. 215-310.
[52] JONG, K.A. DE, AND SPEARS, W.M.: 'Using genetic algorithms to solve NP-complete problems', in J.D. SCHAFFER (ed.): Proc. 3rd Internat. Conf. on Genetic Algorithms, Morgan Kaufmann, 1989, pp. 123-132.
[53] JULSTROM, B.A.: 'A genetic algorithm for the rectilinear Steiner problem', in S. FORREST (ed.): Proc. 5th Internat. Conf. Genetic Algorithms, Morgan Kaufmann, 1993, pp. 474-480.
[54] KAHNG, A.B., AND MOON, B.R.: 'Toward more powerful recombinations', in L.J. ESHELMAN (ed.): Proc. 6th Internat. Conf. on Genetic Algorithms, Morgan Kaufmann, 1995, pp. 96-103.
[55] KAPSALIS, A., RAYWARD-SMITH, V.J., AND SMITH, G.D.: 'Solving the graphical Steiner tree problem using genetic algorithms', J. Oper. Res. Soc. 44 (1993), 397-406.
[56] KHURI, S., BÄCK, T., AND HEITKÖTTER, J.: 'The zero/one multiple knapsack problem and genetic algorithms': Proc. 1994 ACM Symposium on Applied Computing, ACM, 1994, pp. 188-193.
[57] KHURI, S., SCHÜTZ, M., AND HEITKÖTTER, J.: 'Evolutionary heuristics for the bin packing problem', in D.W. PEARSON, N.C. STEELE, AND R.F. ALBRECHT (eds.): Proc. 2nd Internat. Conf. on Artificial Neural Networks and Genetic Algorithms, Springer, 1995, pp. 285-288.
[58] LASZEWSKI, G. VON: 'Intelligent structural operators for the k-way graph partitioning problem', in R. BELEW AND L. BOOKER (eds.): Proc. 4th Internat. Conf. on Genetic Algorithms, Morgan Kaufmann, 1991, pp. 45-52.
[59] LEVINE, D.: 'A parallel genetic algorithm for the set partitioning problem', PhD Thesis Illinois Inst. Techn. (1994).
[60] LIEPINS, G.E., HILLIARD, M.R., PALMER, M.R., AND MORROW, M.: 'Greedy genetics', in J.J. GREFENSTETTE (ed.): Proc. 2nd Internat. Conf. on Genetic Algorithms, Lawrence Erlbaum Ass., 1987.
[61] LIEPINS, G.E., HILLIARD, M.R., RICHARDSON, J.T., AND PALMER, M.: 'Genetic algorithms applications to set covering and traveling salesman problems', in D.E. BROWN AND C.C. WHITE (eds.): Oper. Res. and Artificial Intelligence: The Integration of Problem-Solving Strategies, Kluwer Acad. Publ., 1990, pp. 29-57.
[62] MANIEZZO, V.: 'Exact and approximate nondeterministic tree-search procedures for the quadratic assignment problem', Techn. Report Univ. Bologna CSR 98-1 (1998).
[63] MANIEZZO, V., AND COLORNI, A.: 'The ant system applied to the quadratic assignment problem', IEEE Trans. Knowledge and Data Engin. (1998).
[64] MANIEZZO, V., COLORNI, A., AND DORIGO, M.: 'The ant system applied to the quadratic assignment problem', Techn. Report IRIDIA/94-28, Univ. Libre de Bruxelles, Belgium (1994).
[65] MARTELLO, S., AND TOTH, P.: 'Lower bounds and reduction procedures for the bin packing problem', Discrete Appl. Math. 22 (1990), 59-70.
[66] MATHIAS, K., AND WHITLEY, D.: 'Genetic operators, the fitness landscape and the traveling salesman problem', in R. MÄNNER AND B. MANDERICK (eds.): Parallel Problem Solving from Nature, Elsevier, 1992, pp. 219-228.
[67] MÜHLENBEIN, H., GORGES-SCHLEUTER, M., AND KRÄMER, O.: 'Evolution algorithms in combinatorial optimization', Parallel Comput. 7 (1988), 65-85.
[68] NAKANO, R., AND YAMADA, T.: 'Conventional genetic algorithm for job shop problems', in R. BELEW AND L. BOOKER (eds.): Proc. 4th Internat. Conf. on Genetic Algorithms, Morgan Kaufmann, 1991, pp. 474-479.
[69] PARGAS, R.P., AND JAIN, R.: 'A parallel stochastic optimization algorithm for solving 2D bin packing problems': Proc. 9th Conf. on Artificial Intelligence for Applications, 1993, pp. 18-25.
[70] POTVIN, J.-Y.: 'Genetic algorithms for the traveling salesman problem', in G. LAPORTE AND I.H. OSMAN (eds.): Metaheuristics in combinatorial optimization, Vol. 63 of Ann. Oper. Res., Baltzer, 1996, pp. 339-370.
[71] POTVIN, J.-Y., AND BENGIO, S.: 'A genetic approach to the vehicle routing problem with time windows', Techn. Report CRT-953, Univ. Montréal (1993).
[72] REEVES, C.R. (ed.): Modern heuristic techniques for combinatorial problems, Blackwell, 1993.
[73] REEVES, C.: 'Hybrid genetic algorithms for bin-packing and related problems', in G. LAPORTE AND I.H. OSMAN (eds.): Metaheuristics in combinatorial optimization, Vol. 63 of Ann. Oper. Res., Baltzer, 1996, pp. 371-396.
[74] RICHARDSON, J.T., PALMER, M.R., LIEPINS, G.E., AND HILLIARD, M.: 'Some guidelines for genetic algorithms with penalty functions', in J.D. SCHAFFER (ed.): Proc. 3rd Internat. Conf. on Genetic Algorithms, Morgan Kaufmann, 1989, pp. 191-197.
[75] ROCHAT, Y., AND TAILLARD, E.D.: 'Probabilistic diversification and intensification in local search for vehicle routing', J. Heuristics 1 (1995), 147-167.
[76] SEKHARAN, D.A., AND WAINWRIGHT, R.L.: 'Manipulating subpopulations in genetic algorithms for solving the k-way graph partitioning problem': Proc. 7th Oklahoma Symposium on Artificial Intelligence, 1993, pp. 215-225.
[77] SMITH, D.: 'Bin packing with adaptive search', in J.J. GREFENSTETTE (ed.): Proc. 1st Internat. Conf. on Genetic Algorithms, Lawrence Erlbaum Ass., 1985, pp. 202-207.
Extended cutting plane algorithm
the solution to (5). The idea of finding a feasible and optimal point by solving a sequence of MILP problems is the same as in the classical Kelley's cutting plane method for NLP problems. However, J.E. Kelley [7] considered only the continuous case using LP subsolutions. Furthermore, Kelley's cutting plane algorithm assumes that the linearizations will always be valid underestimators of the corresponding nonlinear functions. This is true if the functions are convex, since for convex functions it holds that

$$g_i(z^k) + \nabla g_i(z^k)^T (z - z^k) \le g_i(z) \tag{7}$$

for all $z, z^k \in X \times Y$. Thus $l_j^{(k)}(z) \le 0$ whenever

$$\bar g_j + \alpha_j^{(k)} (\nabla \bar g_j)^T (z - \bar z_j) \le \bar g_j(z), \quad \forall z \in \{z \in X \times Y : \bar g_j(z) \le 0\}. \tag{8}$$

A weaker condition is that the inequality (8) is satisfied only for all current iteration points. If this condition is satisfied, the linearization is called a local underestimator. Thus the linearization is a local underestimator if it satisfies the following inequality in iteration $k$

$$\bar g_j + \alpha_j^{(k)} (\nabla \bar g_j)^T (z^k - \bar z_j) \le \bar g_j(z^k), \tag{9}$$

$$j = 1, \ldots, L_k. \tag{10}$$

This inequality is easy to check in each iteration
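The underestimator conditions (7)-(9) can be checked numerically on a finite set of points. The following Python sketch is only an illustration under stated assumptions: the helper names `linearization` and `is_underestimator` and both test functions are hypothetical, and the factor `alpha` is assumed to multiply the gradient term as in (8). It suggests that a convex function is underestimated by its linearization, whereas the linearization of the non-convex function $3.5 - z_1 z_2$ with $\alpha = 1$ exceeds the function at a feasible point, and that a larger $\alpha$ repairs this on the feasible grid points.

```python
from itertools import product


def linearization(g, grad, z_bar, alpha=1.0):
    """Return l(z) = g(z_bar) + alpha * grad(z_bar)^T (z - z_bar)."""
    g_bar = g(z_bar)
    d_bar = grad(z_bar)

    def l(z):
        return g_bar + alpha * sum(
            d * (zi - zb) for d, zi, zb in zip(d_bar, z, z_bar)
        )

    return l


def is_underestimator(g, l, points, tol=1e-12):
    """Check l(z) <= g(z) at every point of a finite point set."""
    return all(l(z) <= g(z) + tol for z in points)


# Hypothetical convex constraint: g(z) = z1^2 + z2^2 - 4.
def g_convex(z):
    return z[0] ** 2 + z[1] ** 2 - 4.0


def grad_convex(z):
    return (2.0 * z[0], 2.0 * z[1])


# Non-convex constraint (quasiconvex for positive z): g(z) = 3.5 - z1*z2.
def g_qc(z):
    return 3.5 - z[0] * z[1]


def grad_qc(z):
    return (-z[1], -z[0])


grid = list(product(range(1, 6), repeat=2))
feasible = [z for z in grid if g_qc(z) <= 0]

l_convex = linearization(g_convex, grad_convex, (1, 1))
l_qc = linearization(g_qc, grad_qc, (1, 1), alpha=1.0)
l_qc_scaled = linearization(g_qc, grad_qc, (1, 1), alpha=10.0)

print(is_underestimator(g_convex, l_convex, grid))     # convex case, cf. (7)
print(is_underestimator(g_qc, l_qc, feasible))         # alpha = 1
print(is_underestimator(g_qc, l_qc_scaled, feasible))  # alpha = 10
```

With $\alpha = 1$ the linearization taken at $(1, 1)$ evaluates to $0.5$ at the feasible point $(2, 2)$, where the function itself is $-0.5$, so it would cut that point off; this is the situation the $\alpha$ coefficients are meant to handle.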
that cycling is not possible. Note that compactness or quasiconvexity of the constraint functions are unnecessary to prove this theorem.

Fig. 1. [Flowchart: set $L_1 = 0$, $k = 0$; set $k = k + 1$ and solve (5); call the solution $z^k$ and calculate $g^k = \max_i \{g_i(z^k)\}$; update according to (11); add a linearization according to (4), $L_{k+1} = L_k + 1$; update according to (18); the point $z^k$ is optimal in (1).]

THEOREM 1 If, in iteration $k$, the current point $z^k$ is not feasible, then all new points generated by the algorithm will be different from $z^k$. □

PROOF. If $z^k$ is infeasible, then $g^k > 0$ and a linearization is added to the MILP problem. If this linearization was the $j$th one added, then all new points $z^l$ generated by the algorithm will satisfy

$$\bar g_j + \alpha_j^{(l)} (\nabla \bar g_j)^T (z^l - \bar z_j) \le 0, \quad l > k. \tag{20}$$

Since $z^l = z^k$ $(= \bar z_j)$ does not satisfy the inequality (20), all new points will be different from $z^k$. □

It immediately follows that all previous points generated by the algorithm are different from $z^k$ as well.

COROLLARY 2 If the current point $z^k$ is infeasible, then $z^k$ is different from all previous points. □

PROOF. If there is a $z^j$, $j < k$, such that $z^j = z^k$, then $z^j$ would be a point not satisfying the previous theorem. □

Convergence To a Feasible Point. Convergence to a feasible point for discrete problems is directly ensured by the above cycling theorem. By assumption, there are only a finite number of points in $Y$, and there is at least one feasible point. Consequently, if the algorithm does not find any of the feasible points in finite time, it would have to repeat an infeasible point after generating at most $|Y|$ iteration points, which is not possible under the cycling theorem.

Convergence in the mixed integer case can be proven by utilizing the fact that the points $x^k$ are taken on a compact set $X$, and the set $Y$ is finite. This implies that any infinite sequence of points $\{z^k = (x^k, y^k) : k \in \mathcal{K}\}$ taken on the set $X \times Y$ has a subsequence with a limit point. The following theorem shows that any limit point will be a feasible point which is a property required for convergence. Note that the quasiconvex property of the nonlinear functions is not required to prove convergence of the algorithm. Quasiconvexity is only required to ensure a global optimal solution.

The algorithm ensures that $\alpha_j^{(k)} \ge \bar g_j / \varepsilon_h$, but for simplicity assume that equality holds for those $j$ where $\bar g_j \ge \varepsilon_h$. Then the constant $h_j^{(k)}$ satisfies

$$\min(\varepsilon_h, \bar g_j) \le h_j^{(k)} \le \bar g_j. \tag{21}$$

This follows directly from (16) and the fact that (17) is already satisfied for $\alpha_j^{(k)} = 1$ if $\bar g_j < \varepsilon_h$.

Below it is proven that any limit point is a feasible point.

THEOREM 3 Suppose that the $\alpha$-ECP algorithm generates an infinite sequence of points $\{z^k : k \in \mathcal{K}\}$. Then the limit point of any convergent subsequence $\tilde{\mathcal{K}} \subset \mathcal{K}$ is feasible. □

PROOF. Assume there is a convergent subsequence $\{z^k : k \in \tilde{\mathcal{K}}\}$ with a limit point that is not feasible. Then $\lim_{k \in \tilde{\mathcal{K}}} g^k = \varepsilon > 0$ and one can find a constant $M$ such that

$$h_j^{(k)} \ge \min(\varepsilon_h, \varepsilon/2) \quad \forall j > L_M,\ \forall k > M,$$
by (21). Since subsequent points $z^k$ are solutions to a linear program containing the linearization (15) it holds for all $k$ that

$$0 \ge h_j^{(k)} + (\nabla \bar g_j)^T (z^k - \bar z_j)$$

when $j = 1, \ldots, L_k$. Define $G$ as the maximal norm of the gradient of $g(z)$ in $X \times Y$. That is,

$$G = \max\{\|\nabla g_i(z)\| : z \in X \times Y,\ i = 1, \ldots, p\}.$$

Then

$$\|z^k - \bar z_j\| \ge \frac{h_j^{(k)}}{\|\nabla \bar g_j\|} \ge \frac{\min(\varepsilon_h, \varepsilon/2)}{G} > 0$$

when $k > M$ and $j > L_M$. This implies that the sequence is not a Cauchy sequence and thus not convergent, which is a contradiction since it was assumed that the sequence $\{z^k : k \in \tilde{\mathcal{K}}\}$ was convergent. □

Convergence To the Optimal Solution. Finally, convergence of the algorithm to the global optimal solution of (1) is shown.

First note that the algorithm will terminate in finite time at a point where all underestimators are $\varepsilon$-feasible underestimators, i.e. equation (17) is satisfied. This follows from the convergence theorem. Since any convergent subsequence has a limit point that is feasible, it means that the entire sequence of points will also converge to a feasible point. Thus there is a tail of the sequence, say $\{\bar z_j : j = M, \ldots\}$, where the initial $\alpha$ values of the corresponding linearizations directly satisfy (17). This is true for those $M$ values that satisfy $\bar g_j \le \varepsilon_h$, $\forall j > M$. These $\alpha$ values will remain constant in subsequent iterations. On the other hand, after reaching a feasible point ($\bar g_j \le \varepsilon_g$), the old constants $\alpha_j$, $j = 1, \ldots, M$, can only be updated a finite number of times before being sufficiently large to satisfy (17). Therefore the algorithm will eventually reach a feasible point where all linearizations are $\varepsilon$-feasible underestimators and the algorithm terminates. It remains to see if this point is also the optimal solution.

To prove that the obtained solution is the optimal solution one needs to assume that all linear constraints are feasible underestimators according to (12). This is in general true if $h_j^{(k)} = 0$. However, in the actual algorithm it was only required that $h_j^{(k)} \le \varepsilon_h$. Thus the actual solution obtained by the algorithm can only be ensured to be $\varepsilon$-optimal.

THEOREM 4 Assume that the $\alpha$-ECP algorithm converges to a feasible solution $z^\infty$ and that all linearizations are feasible underestimators according to (12). Then $z^\infty$ is an optimal point in (P) and $Z(z^\infty)$, where $Z(z) = c^T z$, is the optimal solution of (1). □

PROOF. Denote the feasible region of (1) with $\Omega$, the feasible region of the MILP problem that was solved to obtain $z^\infty$ with $\Omega^\infty$ and an optimal point of (1) with $z^*$. By (12) it holds that $\Omega \subset \Omega^\infty$ and thus

$$Z(z^\infty) \le Z(z^*). \tag{22}$$

On the other hand $z^\infty$ was feasible in (1) and thus

$$Z(z^*) \le Z(z^\infty). \tag{23}$$

From (22) and (23) one gets that $Z(z^*) = Z(z^\infty)$ and thus $Z(z^\infty)$ is the optimal solution to (1) and $z^\infty$ is an optimal point in (1). □

EXAMPLE 5 The algorithm is demonstrated on a quasiconvex integer problem. In these, as well as in other test runs, it has turned out that a suitable choice of $\beta$ and $\gamma$ is $\beta = 1.3$ and $\gamma = 10$. The $\varepsilon$-tolerances in these examples are $\varepsilon_g = \varepsilon_h = 0.001$. Consider the problem

$$\min\ 3y_1 + 2y_2 \quad \text{s.t.}\ 3.5 - y_1 y_2 \le 0, \quad y \in \{1, \ldots, 5\}^2. \tag{24}$$

The optimal solution to this problem is $y = (2, 2)$, which can be seen from the figure below. The steps executed by the $\alpha$-ECP algorithm are:

Iteration 1. Solve problem

$$\min\ 3y_1 + 2y_2 \quad \text{s.t.}\ y \in \{1, \ldots, 5\}^2.$$

The solution is $y^1 = (1, 1)$. A linearization in this point

$$2.5 + \alpha_1^{(1)} \begin{pmatrix} -1 & -1 \end{pmatrix} \begin{pmatrix} y_1 - 1 \\ y_2 - 1 \end{pmatrix} \le 0$$

is added to the MILP problem according to (4). Set $\alpha_1^{(1)} = 1$. The linearization $l_1^{(1)}$ is shown in Fig 2.
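Problem (24) is small enough to check by enumeration. The Python sketch below is an illustrative aid, not part of the $\alpha$-ECP algorithm itself; the function names `g`, `grad_g` and `cost` are hypothetical labels for the constraint, its gradient and the objective of (24). It enumerates $\{1, \ldots, 5\}^2$ to confirm the stated optimum and prints the constraint value and gradient that enter a linearization at a given point.

```python
from itertools import product


def g(y):
    """Constraint function of (24): feasible when g(y) <= 0."""
    return 3.5 - y[0] * y[1]


def grad_g(y):
    """Gradient of g: (dg/dy1, dg/dy2) = (-y2, -y1)."""
    return (-y[1], -y[0])


def cost(y):
    """Objective of (24)."""
    return 3 * y[0] + 2 * y[1]


Y = list(product(range(1, 6), repeat=2))

# Optimum of (24) by brute force over the 25 integer points.
feasible = [y for y in Y if g(y) <= 0]
best = min(feasible, key=cost)
print(best, cost(best))

# Iteration 1 ignores the nonlinear constraint, so it minimizes over all of Y.
y1 = min(Y, key=cost)
print(y1, g(y1), grad_g(y1))

# Data for a linearization at another infeasible point, here (2, 1).
print(g((2, 1)), grad_g((2, 1)))
```

The enumeration returns $(2, 2)$ with objective value 10, the unconstrained minimizer $(1, 1)$ has $g = 2.5$ and gradient $(-1, -1)$, and the point $(2, 1)$ has $g = 1.5$ and gradient $(-1, -2)$, matching the coefficients of the linearization added in Iteration 5.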
Fig. 2.

Iteration 4. The MILP solution is $y^4 = (2, 2)$ which is feasible, however, neither linearization is a feasible underestimator, so the $\alpha$ values are updated using (18). The new values are $\alpha_1^{(5)} = 100$ and $\alpha_2^{(5)} = 10$.

Iteration 5. The solution of the modified MILP problem is $y^5 = (2, 1)$. Since it is infeasible, a new linearization

$$1.5 + \alpha_3^{(5)} \begin{pmatrix} -1 & -2 \end{pmatrix} \begin{pmatrix} y_1 - 2 \\ y_2 - 1 \end{pmatrix} \le 0$$

$$0.5 + \alpha_4^{(6)} \begin{pmatrix} -3 & -1 \end{pmatrix} \begin{pmatrix} y_1 - 1 \\ y_2 - 3 \end{pmatrix} \le 0$$

is added ($\alpha_4^{(6)} = 1$).

Iteration 7. The MILP solution is again the feasible solution $y^7 = (2, 2)$. The linearizations are not feasible underestimators and thus the $\alpha$ values are updated. The new $\alpha$ values are $\alpha_1^{(8)} = 1000$, $\alpha_2^{(8)} = 100$ and $\alpha_3^{(8)} = \alpha_4^{(8)} = 10$.

Conclusions. The above algorithm has several advantages when compared to other similar algorithms for solving MINLP problems. At each iteration, the procedure only solves MILP subproblems and is thus a competitive alternative to algorithms where only NLP problems or both NLP and MILP problems are solved in each iteration.

One consequence is that since only MILP problems are solved in each iteration, the nonlinear constraints need not be calculated at relaxed values of the integer variables. It can be very difficult
Claus Still
Dept. Math. Åbo Akademi Univ.
Fänriksgatan 3
FIN-20500 Åbo, Finland
E-mail address: cstill@abo.fi
Tapio Westerlund
Process Design Lab. Åbo Akad. Univ.
Biskopsgatan 8
FIN-20500 Åbo, Finland
E-mail address: twesterl@abo.fi
MSC2000: 90C11, 90C26
Key words and phrases: mixed integer nonlinear programming, extended cutting plane, quasiconvex function, feasible underestimators.

EXTREMUM PROBLEMS WITH PROBABILITY FUNCTIONS: KERNEL TYPE SOLUTION METHODS, KSM

Two types of stochastic programs are widely known: two-stage and chance constrained problems. The latter were introduced to stochastic programming by A. Charnes and W.W. Cooper in the 1950s [1] and are formally described by defining a nonlinear probability function $v(x, t)$ of the form:

$$v(x, t) = P\{\xi : f(x, \xi) \le t\}. \tag{1}$$

Here $f(x, \xi)$ is a real valued function, defined on $R^r \times R^s$, $t$ is a fixed level of reliability, $\xi = \xi(\omega)$ is a random parameter and $P$ denotes probability. Note that for a fixed $x$ the function $v(x, t)$ as a function of $t$ is the distribution function of the random variable $f(x, \xi)$.

Various examples of extremum problems with probability function $v(x, t)$ can be found in [3, Chap. 1], where among others also the so-called 'stock exchange' paradox is analyzed. To overcome a paradoxical situation caused by an unsuccessful choice of the objective expected return, the strategy which maximizes the expected growth of return (Kelly strategy) was applied in [2]. In [3] it was demonstrated that a risky (i.e. probabilistic) strategy is better than the Kelly one.

$$\chi(t - f(x, \xi)) = \begin{cases} 1, & \text{if } f(x, \xi) \le t, \\ 0, & \text{if } f(x, \xi) > t. \end{cases}$$

Then

$$v(x, t) = \int_{R^s} \chi(t - f(x, \xi))\,\sigma(d\xi), \tag{2}$$

where $\sigma(\cdot)$ is the distribution function of a random vector $\xi$ and the integral in (2) is understood in the Lebesgue-Stieltjes sense.

The integral representation (2) of the probability function $v(x, t)$ clearly shows the difficulties that arise in approximate maximization of its value: the integrand $\chi(\cdot)$ itself is a discontinuous zero-one function and the integral (2) over $\chi(\cdot)$ is never convex. Only in some cases, e.g., if the function $f(x, \xi)$ is jointly convex and continuous in $(x, \xi)$ and $\sigma(\cdot)$ as a measure is quasiconcave, is the function $v(x, t)$ quasiconcave in $x$, see [12].

In this survey we first solve iteratively, using stochastic analogues of linearization and gradient projection methods, the following probability maximization problem:

$$\max_{x \in X} v(x, t) = \max_{x \in X} P\{\xi : f(x, \xi) \le t\}, \tag{3}$$

where the constraint set $X$ is assumed to be simple, i.e. on $X$ we can effectively solve auxiliary problems of maximization of linear or quadratic functions. Second, we exploit the introduced technique for minimization of a smooth function over probabilistic equality-inequality type constraints, using a stochastic analogue of the modified Lagrange method.

Gradient type methods require differentiability of a cost function. A lot of papers have been devoted to differentiability conditions of $v(x, t)$ in $x$, starting from [13] where $v'_x(x, t)$ was presented via a surface integral. The gradient of $v(x, t)$ via a volume integral was presented in [16]; see also the survey paper [4]. All these formulas are quite uncomfortable to handle, especially for numerical methods. In the following we will assume differentiability of
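The contrast between the zero-one integrand in (2) and a smoothed surrogate can be illustrated numerically. The Python sketch below is written under explicit assumptions and is not the article's own estimator: the test function $f(x, \xi) = x + \xi$ with standard normal $\xi$, the Gaussian-CDF smoothing of the indicator, the bandwidth `h` and all function names are hypothetical choices made for this illustration. For this choice of $f$ the exact value is $v(x, t) = \Phi(t - x)$, which allows a direct comparison.

```python
import math
import random


def normal_cdf(u):
    """Standard normal distribution function Phi(u)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def f(x, xi):
    """Hypothetical test function f(x, xi) = x + xi."""
    return x + xi


def empirical_v(x, t, sample):
    """Average of the zero-one indicator chi(t - f(x, xi)) over the sample."""
    return sum(1.0 for xi in sample if f(x, xi) <= t) / len(sample)


def smoothed_v(x, t, sample, h):
    """Same average with the indicator replaced by a smooth Gaussian CDF
    of bandwidth h (an assumed kernel-type smoothing)."""
    return sum(normal_cdf((t - f(x, xi)) / h) for xi in sample) / len(sample)


rng = random.Random(0)  # fixed seed for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]

x, t = 0.5, 1.0
exact = normal_cdf(t - x)  # P{x + xi <= t} for standard normal xi
print(exact)
print(empirical_v(x, t, sample))
print(smoothed_v(x, t, sample, h=0.1))
```

Both estimates approximate the same probability, but only the smoothed one is differentiable in $x$, which is what gradient type methods need; the price is a small bias that depends on the bandwidth.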
$$\int_{-\infty}^{\infty} vK(v)\,dv = 0, \qquad \int_{-\infty}^{\infty} |vK(v)|\,dv < \infty. \tag{9}$$

Let the sequence $\{\gamma_n\}$ of steplength satisfy conditions:
REMARK 3 Statements of the theorem are valid also for the stochastic analogue of the gradient projection method, see [5]:

$$x_{n+1} = \pi_X[x_n + \gamma_n v'_{nx}(x_n, t, \xi^n)], \quad x_0 \in X. \tag{14}$$

□

As it was described earlier, algorithms (11) and (14) need in $n$th iteration step $n$ independent realizations of the random vector $\xi$. In [11] it was verified that in asymptotic sense statistical estimation type methods, as algorithms (11) and (14)

Define the solution set $X^*$ for the problem (18) as follows [8]:

$$X^* = \{x^* : F \cap G\}, \tag{19}$$

where

$$F = \{x^* : |f'_x(x^*) + v'_x(x^*, t)\lambda^*|^2 = 0\},$$

with

$$\lambda^* = \arg\min_{\lambda \ge 0} |f'_x(x^*) + v'_x(x^*, t)\lambda|^2. \tag{21}$$
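The gradient projection iteration (14) of Remark 3 can be sketched in one dimension. The Python code below is a minimal illustration under stated assumptions, not the article's algorithm: the kernel estimate of the gradient, the steplength $\gamma_n = 1/n$, the bandwidth $h_n = n^{-1/5}$, the test function $f(x, \xi) = (x - \xi)^2$ with standard normal $\xi$ and all function names are hypothetical, because the corresponding formulas are not reproduced in this part of the text. It keeps two features that the text does state: step $n$ uses $n$ independent realizations of $\xi$, and every iterate is projected back onto the simple set $X$ (here an interval).

```python
import math
import random


def gaussian_kernel(u):
    """Gaussian kernel K(u), symmetric, so that the first moment vanishes."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def f(x, xi):
    """Hypothetical test function f(x, xi) = (x - xi)^2."""
    return (x - xi) ** 2


def df_dx(x, xi):
    """Partial derivative of f with respect to x."""
    return 2.0 * (x - xi)


def project(x, lo, hi):
    """Projection onto the simple set X = [lo, hi]."""
    return min(max(x, lo), hi)


def smoothed_gradient(x, t, sample, h):
    """Derivative in x of the smoothed estimate (1/n) sum Phi((t - f)/h),
    which equals (1/n) sum K((t - f)/h) * (-f'_x) / h."""
    total = 0.0
    for xi in sample:
        total += gaussian_kernel((t - f(x, xi)) / h) * (-df_dx(x, xi)) / h
    return total / len(sample)


def gradient_projection(x0, t, lo, hi, iterations, rng):
    """Projected stochastic gradient ascent on the probability function."""
    x = project(x0, lo, hi)
    path = [x]
    for n in range(1, iterations + 1):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # n realizations
        gamma = 1.0 / n      # assumed steplength
        h = n ** -0.2        # assumed bandwidth
        x = project(x + gamma * smoothed_gradient(x, t, sample, h), lo, hi)
        path.append(x)
    return path


path = gradient_projection(
    x0=2.0, t=1.0, lo=-3.0, hi=3.0, iterations=200, rng=random.Random(0)
)
print(path[0], path[-1])
```

For this test function the probability $P\{(x - \xi)^2 \le t\}$ is largest at $x = 0$, so the iterates are expected to drift from the starting point toward the origin, although a short run with the assumed parameters gives no guarantee on the final accuracy.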
See also: Stochastic programming with simple integer recourse; Two-stage stochastic programs with recourse; Stochastic vehicle routing problems; Stochastic integer programming: Continuity, stability, rates of convergence; Logconcave measures, logconvexity; Logconcavity of discrete distributions; General moment optimization problems; Approximation of multivariate probability integrals; Discretely distributed stochastic programs: Descent directions and efficient points; Static stochastic programming models; Static stochastic programming models: Conditional expectations; Stochastic programming models: Random objective; Stochastic programming: Minimax approach; Simple recourse problem: Primal method; Simple recourse problem: Dual method; Probabilistic constrained linear programming: Duality theory; Probabilistic constrained problems: Convexity theory; Approximation of extremum problems with probability functionals; Multistage stochastic programming: Barycentric approximation; Stochastic linear programs with recourse and arbitrary multivariate distributions; Stochastic programs with recourse: Upper bounds; Stochastic integer programs; L-shaped method for two-stage stochastic programs with recourse; Stochastic linear programming: Decomposition and cutting planes; Stabilization of cutting plane algorithms for stochastic linear programming problems; Two-stage stochastic programming: Quasigradient method; Stochastic quasigradient methods in minimax problems; Stochastic programming: Nonanticipativity and Lagrange multipliers; Preprocessing in stochastic programming; Stochastic network problems: Massively parallel solution.

References
[1] CHARNES, A., AND COOPER, W.W.: 'Chance-constrained programming', Managem. Sci. 5 (1959), 73-79.
[2] KELLY, J.: 'A new interpretation of information rate', Bell System Techn. J. 35 (1956), 917-926.
[3] KIBZUN, A.I., AND KAN, Y.S.: Stochastic programming problems with probability and quantile functions, Wiley, 1995.
[4] KIBZUN, A., AND URYASEV, S.: 'Differentiability of probability functions', Stochastic Anal. Appl. 16 (1998), 1101-1128.
[5] LEPP, R.: 'Maximization of a probability function over simple sets (in Russian)', Proc. Acad. Sci. Estonian SSR. Phys. Math. 28 (1979), 303-308.
[6] LEPP, R.: 'Minimization of a smooth function over probabilistic constraints (in Russian)', Proc. Acad. Sci. Estonian SSR. Phys. Math. 29 (1980), 140-144.
[7] LEPP, R.: 'Stochastic approximation type algorithm for the maximization of the probability function', Proc. Acad. Sci. Estonian SSR. Phys. Math. 32 (1983), 150-156.
[8] MIELE, A., CRAGG, E.G., IYER, R.R., AND LEVY, A.V.: 'Use of the augmented penalty functions in mathematical programming problems. Part I', J. Optim. Th. Appl. 8 (1971), 115-130.
[9] NURMINSKII, E.A.: Numerical methods for solution of deterministic and stochastic Minimax Problems, Nauk. Dumka, 1979. (In Russian.)
[10] PARZEN, E.: 'On the estimation of a probability density and the mode', Ann. Math. Statist. 33 (1962), 1065-1076.
[11] POLYAK, B.T., AND TSYPKIN, Y.Z.: 'Adaptive algorithms of estimation (convergence, optimality, stability) (in Russian)', Avtomatika i Telemekhanika (Automatics and Remote Control) (1979), 74-84.
[12] PRÉKOPA, A.: 'Logarithmic concave measures and related topics', in M.A.H. DEMPSTER (ed.): Stochastic Programming, Acad. Press, 1980.
[13] RAIK, E.: 'Differentiability in the parameter of the probability function and optimization of the probability function via the stochastic pseudogradient method', Proc. Acad. Sci. Estonian SSR. Phys. Math. 24 (1975), 3-6. (In Russian.)
[14] ROSENBLATT, M.: 'Remarks on some nonparametric estimates of a density function', Ann. Math. Statist. 27 (1957), 832-837.
[15] TAMM, E.: 'On the minimization of the probability function (in Russian)', Proc. Acad. Sci. Estonian SSR. Phys. Math. 28 (1979), 17-24.
[16] URYASEV, S.: 'A differentiation formula for integrals over sets given by inclusion', Numer. Funct. Anal. Optim. 10 (1989), 827-841.
[17] ZANGWILL, W.I.: Nonlinear programming. A unified approach, Prentice-Hall, 1969.

Riho Lepp
Tallinn Technical Univ.
Tallinn, Estonia
E-mail address: lprh@ioc.ee
MSC 2000: 90C15
Key words and phrases: probability function, kernel estimates, stochastic approximation.