Mathematical and Computer Modelling, 16(11) 1992, 87-100.
GENETIC ALGORITHMS AND VERY FAST SIMULATED REANNEALING: A COMPARISON
LESTER INGBER
Lester Ingber Research, P.O. Box 857, McLean, VA 22101, U.S.A.
ingber@alumni.caltech.edu
BRUCE ROSEN
Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, U.S.A.
brosen@cis.udel.edu
Abstract—We compare Genetic Algorithms (GA) with a functional search method, Very Fast Simulated
Reannealing (VFSR), that not only is efficient in its search strategy, but also is statistically guaranteed to
find the function optima. GA previously has been demonstrated to be competitive with other standard
Boltzmann-type simulated annealing techniques. Presenting a suite of six standard test functions to GA
and VFSR codes from previous studies, without any additional fine tuning, strongly suggests that VFSR
can be expected to be orders of magnitude more efficient than GA.
1. INTRODUCTION
This paper compares Genetic Algorithms (GA) with a functional search method, Very Fast Simulated Reannealing (VFSR), that not only is efficient in its search strategy, but also is statistically guaranteed to find the function optima. We first review Genetic Algorithms as evolutionary optimization methods, and illustrate their operation. We next introduce the VFSR algorithm, and show how it can use a
faster annealing schedule than either Boltzmann Annealing, or Fast Annealing. We then compare GA
with VFSR on a robust suite of 6 test problems involving relatively easy to difficult function optimization,
and comment on their relative performances. Our preliminary results show that VFSR is at least an order
of magnitude superior to Genetic Algorithms in convergence speed and is more likely to find the global
optima during a time limited search. We conclude with a discussion on the relative merits of parallelizing
the algorithms.
2. GENETIC ALGORITHMS
Genetic Algorithms are population based search strategies that maintain the locations and values of
a set of points in a function space. The criterion by which new points are generated and old points are discarded is a function of the existing population. In these GA, combinations of individual points in the population are used to derive new points.
2.1. Genetic Reproduction
Genetic algorithms are optimization algorithms based on Darwinian models of natural selection and
evolution. The basic idea behind GA is that an initial population of candidate states of size n are chosen
at random and each state is evaluated according to the optimization function. Iteratively, a small set of
elite individuals of the population are chosen by a selection procedure according to their fitness value f ,
derived from an individual's relative functional performance, a measure of how well it satisfies the optimization function. The selection procedure probabilistically selects an individual i to remain in the population and reproduce with probability

p_i = f_i / Σ_{j=1}^n f_j .

Those states not selected are culled from the population. Because the average fitness of the population is defined as f̄ = Σ_{j=1}^n f_j / n, if there are m_i(t) copies of an individual i at time t, the new population will have m_i(t + 1) = m_i(t) f_i / f̄ copies of individual i. The effect of
this reproduction scheme is that above average performing individuals reproduce, replacing poorly performing individuals. The test, select, and reproduce operators are applied each iteration (generation), and
replace the standard generate and test paradigm.
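The fitness-proportional selection rule p_i = f_i / Σ f_j can be sketched as follows; the function name and the use of Python's `random.choices` are illustrative, not drawn from any GA package discussed here.

```python
import random

def select(population, fitness, n):
    """Fitness-proportional (roulette-wheel) selection:
    individual i is drawn with probability p_i = f_i / sum_j f_j."""
    total = sum(fitness)
    weights = [f / total for f in fitness]
    # draw n survivors with replacement; fitter states dominate
    return random.choices(population, weights=weights, k=n)
```

Because selection is with replacement, an individual with dominant fitness quickly fills the next population, which is exactly the reproduction effect described above.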
2.2. Genetic Recombination
To interpolate between states with high fitness values, the crossover operator genetically recombines individual states. New individuals are created by encoding their values as bit strings of a given length, l, typically using Gray or binary coding. Two parent states from the new population are chosen randomly to create two new child states. A crossover position i between 1 and l is chosen at random, and the two parents exchange portions of their binary representations. The first child contains bits 1 through i from its first parent and bits i + 1 to l from its second parent, while the second child contains bits 1 through i from the second parent and bits i + 1 to l from the first parent. (These bit strings are analogous to chromosomes in biological systems, with the crossover function creating new states by sexual reproduction.) This process continues until enough new children are produced to replace the parent population, although some parents may be kept in the new population.
For diversity, mutations that alter individual states occasionally occur (with small probability) each
generation. The mutation operator randomly flips a bit in an individual’s bit string representation. The
resulting effect allows the population to sample states away from its mean, preventing the population from
converging and stagnating at any minima, but the mutation operator does not cause the population to sample the state space ergodically. The reproduction, crossover, and mutation cycle continues until an acceptable solution is found, until the average population fitness converges to a stable fixed point, or until a fixed number of generations has evolved.
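The crossover and mutation operators described above can be sketched on bit strings directly; these helper names are illustrative.

```python
import random

def crossover(a, b):
    """Single-point crossover: pick position i in 1..l-1 and swap tails."""
    i = random.randrange(1, len(a))
    return a[:i] + b[i:], b[:i] + a[i:]

def mutate(s, p=0.01):
    """Flip each bit of the string independently with small probability p."""
    return ''.join(c if random.random() >= p else '10'[int(c)] for c in s)
```

Note that single-point crossover conserves the multiset of bits across the two children, so only mutation introduces genuinely new bit values into the population.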
Consider the optimization task of finding an integer x that minimizes the function f(x) = x^2 (x − 1)^2 + 0.1 x^2, subject to the constraint 0 ≤ x ≤ 15. The function has a global minimum at x = 0, and a slightly higher local minimum at x = 1. For the purposes of this example, the population states are represented as bit strings of length 4. The strings are initialized with randomly generated 0 or 1 values, and the fitness function to maximize is −f(x). Using a simple (nonnegative) binary
decoding algorithm, these strings represent integer values from 0 to 15. Each individual is ranked according to the fitness of its decoded value. The top performing individuals reproduce replacing those with
poor fitness values. The new population are genetically recombined, although mutations occasionally
occur. Consider two individual bit strings, A = 0111 and B = 1001, that are chosen to breed. A crossover point between 1 and 3 is randomly chosen. If the crossover point is 1, the two newly created children C and D have values C = 0001 and D = 1111. While strings A and B represent positive integer
values of 7 and 9 respectively, their children bit strings (C and D) represent integer values of 1 and 15
respectively. Individuals, such as C, having relatively high fitness values (as determined by f ) will
increasingly dominate the population, while less significant individuals, such as D, die off. Each successive generation, this eugenic method increases the mean population fitness, until the majority of the population settles in the local minimum at x=1 (precision is lost by nature of the encoding). The mutation
operator, occasionally flipping random bits in the population, causes individuals to sample points that
span the space with precision defined by the encoding method. New states are typically suboptimal relative to the population mean, but can infrequently increase the mean fitness of the population. Should the
rightmost bit of C be mutated, the global minimum will be found, and successive populations will converge to f (0000) = 0.
The original concepts of Genetic Algorithms were developed by Holland [1], and have been shown
to provide near-optimal heuristics for information gathering in complex search spaces. They frequently
outperform other more direct methods such as gradient descent on difficult problems, i.e., those involving
highly nonlinear, high dimensional, discrete, multimodal or noisy functions. (However, gradient descent
methods are more efficient for finding solutions when searching convex function spaces with tight constraints e.g., continuous, low dimensional, unimodal spaces.) Many empirical simulations have demonstrated the efficiency and robustness of GA on different optimization tasks [2-4]; however, GA are not
guaranteed to find the global functional optima because: (1) the precision limits in the encoding process
can substantially reduce the solution accuracy, and (2) the search process does not ergodically cover and
search the state space. The first deficiency is the limited precision that is dependent on the length of the
population bit strings. A bit string of length four, when decoded, allows only 16 distinct values; gray coding is even more limited. Methods such as dynamic parameter encoding have addressed the former issue
and have achieved good results on a variety of function optimization problems [5]. However, in the GA
domain, the second deficiency has been largely unresolved.
3. VERY FAST SIMULATED REANNEALING
An algorithm of Very Fast Simulated Reannealing (VFSR) has been developed to fit empirical data
to a theoretical cost-function over a D-dimensional parameter-space [6]. This methodology has been
applied to several systems, ranging from combat analysis [7,8], to finance [9,10], to neuroscience [11,12].
This Section gives a self-contained description of VFSR. The general outline of presentation of simulated-annealing heuristics here closely follows that of Szu and Hartley [13].
3.1. Boltzmann Annealing (BA)
Boltzmann annealing was essentially introduced as a Monte Carlo importance-sampling technique
for doing large-dimensional path integrals arising in statistical physics problems [14]. This method was
generalized to apply more generally to fitting non-convex cost-functions arising in a variety of problems,
e.g., finding the optimal wiring for a densely wired computer chip [15].
The method of simulated annealing consists of three functional relationships.
1. g(x): Probability density of state-space of D parameters x = {x_i; i = 1, ..., D}.
2. h(x): Probability density for acceptance of new cost-function given the just-previous value.
3. T(k): schedule of "annealing" the "temperature" T in annealing-time steps k, i.e., of changing the volatility or fluctuations of the two previous probability densities.
Based on the functional form derived for many physical systems belonging to the class of Gaussian-Markovian systems, the algorithm chooses for g,

g(x) = (2πT)^{−D/2} exp[−Δx^2/(2T)] ,   (1)

where Δx = x − x_0 is the deviation of x from x_0 (usually taken to be the just-previously chosen point to test), and where T is clearly a measure of the fluctuations of the Boltzmann distribution g in the D-dimensional x-space.
The acceptance probability is based on the chances of obtaining a new state E_{k+1} relative to a previous state E_k,

h(x) = exp(−E_{k+1}/T) / [exp(−E_{k+1}/T) + exp(−E_k/T)] = 1 / [1 + exp(ΔE/T)] ,   (2)

where ΔE represents the "energy" difference between the present and previous values of the cost-function appropriate to the physical problem, i.e., ΔE = E_{k+1} − E_k. This essentially is the Boltzmann distribution contributing to the statistical mechanical partition function of the system [16].
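The acceptance test of Equation (2) is a one-liner in practice; this is a minimal sketch, where the clamp against floating-point overflow is a practical addition, not part of the formula.

```python
import math
import random

def h(dE, T):
    """Acceptance probability of Eq. (2): 1/(1 + exp(dE/T)), dE = E_new - E_old."""
    x = dE / T
    if x > 700.0:          # exp would overflow; acceptance is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(x))

def accept(dE, T):
    """Accept the new state with probability h(dE, T)."""
    return random.random() < h(dE, T)
```

Downhill moves (ΔE < 0) are accepted with probability above 1/2, and uphill moves are accepted with a probability that shrinks as T is annealed downward.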
Given g(x), it has been proven [17] that it suffices to obtain a global minimum of E(x) if T is selected to be not faster than

T(k) = T_0 / ln k .   (3)

For the purposes of this paper, a heuristic demonstration follows, to show that Equation (3) will suffice to give a global minimum of E(x) [13].
In order to statistically assure, i.e., requiring many trials, that any point in x-space can be sampled infinitely often in annealing-time (IOT), it suffices to prove that the product of the probabilities of not generating a state x IOT for all annealing-times successive to k_0 yields zero,

Π_{k_0}^∞ (1 − g_k) = 0 .   (4)

This is equivalent to

Σ_{k_0}^∞ g_k = ∞ .   (5)
The problem then reduces to finding T (k) to satisfy Equation (5).
For BA, if T(k) is selected to be Equation (3), then Equation (1) gives

Σ_{k_0}^∞ g_k ≥ Σ_{k_0}^∞ exp(−ln k) = Σ_{k_0}^∞ 1/k = ∞ .   (6)
Although there are sound physical principles underlying the choices of Equations (1) and (2) [14], it
was noted that this method of finding the global minimum in x-space was not limited to physics examples
requiring bona fide “temperatures” and “energies.” Rather, this methodology can be readily extended to
any problem for which a reasonable probability density h(x) can be formulated [15].
[The above Equations (1) and (6) correct Equations (1) and (6), resp., in the first VFSR paper [6].
Nothing else is affected in the rest of that paper.]
3.2. Fast Annealing (FA)
It was also noted that this methodology can be readily extended to use any reasonable generating
function g(x), without relying on the principles underlying the ergodic nature of statistical physics.
Specifically, it was noted that the Cauchy distribution has some definite advantages over the Boltzmann
form [13]. The Cauchy distribution,

g(x) = T / (Δx^2 + T^2)^{(D+1)/2} ,   (7)
has a “fatter” tail than the Gaussian form of the Boltzmann distribution, permitting easier access to test
local minima in the search for the desired global minimum.
It is instructive to note the similar corresponding heuristic demonstration, that the Cauchy g(x) statistically finds a global minimum. If Equation (3) is replaced by

T(k) = T_0 / k ,   (8)

then

Σ_{k_0}^∞ g_k ≈ (T_0 / Δx^{D+1}) Σ_{k_0}^∞ 1/k = ∞ .   (9)
Note that the “normalization” of g has introduced the annealing-time index k.
The method of FA is thus statistically seen to have an annealing schedule exponentially faster than
the method of BA. This method has been tested in a variety of problems [13].
3.3. Very Fast Annealing
In a variety of physical problems we have a D-dimensional parameter-space. Different parameters
have different finite ranges, fixed by physical considerations, and different annealing-time-dependent sensitivities, measured by the curvature of the cost-function at local minima. BA and FA have g distributions
which sample infinite ranges, and there is no provision for considering differences in each parameter dimension; e.g., different sensitivities might require different annealing schedules. Also, there is no quick algorithm for calculating a D-dimensional Cauchy random generator.
For example, one might choose a D-product of one-dimensional Cauchy distributions, because the one-dimensional Cauchy has a few quick algorithms. This would also permit different T_0's to account for different sensitivities,

g_k^i = T_i / (Δx_i^2 + T_i^2) .   (10)
But then we would require an annealing schedule going as

T_i(k) = T_0 / k^{1/D} ,   (11)

which, although faster than BA, is still quite slow; e.g., consider D = 6 and assume a final temperature T_f = 10^{−4} T_0 is desired.
The above problems provide motivation for the development of a new algorithm. Consider a parameter α_k^i in dimension i generated at annealing-time k with the range

α_k^i ∈ [A_i, B_i] ,   (12)

calculated with the random variable y^i,

α_{k+1}^i = α_k^i + y^i (B_i − A_i) ,   y^i ∈ [−1, 1] .   (13)
Define the generating function

g_T(y) = Π_{i=1}^D 1 / [2(|y^i| + T_i) ln(1 + 1/T_i)] ≡ Π_{i=1}^D g_T^i(y^i) .

Its cumulative probability distribution is

G_T(y) = ∫_{−1}^{y^1} ... ∫_{−1}^{y^D} dy'^1 ... dy'^D g_T(y') ≡ Π_{i=1}^D G_T^i(y^i) ,   (14)
G_T^i(y^i) = 1/2 + [sgn(y^i)/2] ln(1 + |y^i|/T_i) / ln(1 + 1/T_i) .   (15)
y^i is generated from a u^i from the uniform distribution,

u^i ∈ U[0, 1] ,
y^i = sgn(u^i − 1/2) T_i [(1 + 1/T_i)^{|2u^i − 1|} − 1] .   (16)
It is straightforward to calculate that for an annealing schedule for T_i,

T_i(k) = T_{0i} exp(−c_i k^{1/D}) ,   (17)

a global minimum statistically can be obtained; i.e.,

Σ_{k_0}^∞ g_k ≈ Σ_{k_0}^∞ [Π_{i=1}^D 1/(2|y^i| c_i)] (1/k) = ∞ .   (18)
It seems sensible to choose control over c_i, such that

T_{fi} = T_{0i} exp(−m_i) when k_f = exp(n_i) ,
c_i = m_i exp(−n_i/D) ,   (19)
where m i and ni can be considered “free” parameters to help tune VFSR for specific problems.
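Equations (17) and (19) combine into a two-line schedule; this is a sketch with illustrative names, where m and n play the role of the "free" tuning parameters just described.

```python
import math

def anneal_temperature(T0, k, D, m, n):
    """Eq. (17) with c chosen per Eq. (19): T(k) = T0 exp(-c k^(1/D)),
    c = m exp(-n/D), so that T reaches T0 exp(-m) at k_f = exp(n)."""
    c = m * math.exp(-n / D)
    return T0 * math.exp(-c * k ** (1.0 / D))
```

The test below confirms the designed endpoint: at k_f = exp(n) the temperature is exactly T0 exp(−m).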
It has proven fruitful to use the same type of annealing schedule for the acceptance function h as
used for the generating function g, i.e., Equations (17) and (19), but with the number of acceptance
points, instead of the number of generated points, used to determine the k for the acceptance temperature.
New parameters α_{k+1}^i are generated from old parameters α_k^i from

α_{k+1}^i = α_k^i + y^i (B_i − A_i) ,   (20)

constrained by

α_{k+1}^i ∈ [A_i, B_i] .   (21)

I.e., y^i's are generated until a set of D are obtained satisfying these constraints.
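The constrained generation loop of Equations (20)-(21) can be sketched as below; for brevity a uniform draw on [−1, 1] stands in for the Eq. (16) sampler, which would be substituted in practice.

```python
import random

def new_point(alpha, bounds, draw=lambda: random.uniform(-1.0, 1.0)):
    """Eqs. (20)-(21): alpha'_i = alpha_i + y_i (B_i - A_i); redraw the whole
    set of D y's until every coordinate lands back inside [A_i, B_i]."""
    while True:
        trial = [a + draw() * (B - A) for a, (A, B) in zip(alpha, bounds)]
        if all(A <= t <= B for t, (A, B) in zip(trial, bounds)):
            return trial
```

Because each y_i scales the full range B_i − A_i, a single draw can jump anywhere in a parameter's range, and rejection simply re-draws the whole set of D coordinates.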
3.4. Reannealing
Whenever doing a multi-dimensional search in the course of a real-world nonlinear physical problem, inevitably one must deal with different changing sensitivities of the α i in the search. At any given
annealing-time, it seems sensible to attempt to “stretch out” the range over which the relatively insensitive
parameters are being searched, relative to the ranges of the more sensitive parameters.
It has proven fruitful to accomplish this by periodically rescaling the annealing-time k, essentially
reannealing, every hundred or so acceptance-events, in terms of the sensitivities si calculated at the most
current minimum value of the cost function L,
s_i = ∂L/∂α^i .   (22)

In terms of the largest s_i = s_max, it has proven fruitful to reanneal by using a rescaling for each k_i of each parameter dimension,

k_i → k'_i ,
T'_{ik'} = T_{ik} (s_max/s_i) ,
k'_i = (ln(T_{i0}/T_{ik'}) / c_i)^D .   (23)
T i0 is set to unity to begin the search, which is ample to span each parameter dimension. [N.b. The
VFSR code used here was used previously to investigate the inverse weighting of smax /si in the above
equation. Since no code was changed for this project, the old ratio was used here. Using the equation
above as given here increases the efficiency of VFSR.]
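The rescaling of Eq. (23) can be sketched as below; the clamp keeping the rescaled temperature at or below its starting value is a practical guard (an assumption, not stated in the text) so the logarithm stays nonnegative.

```python
import math

def reanneal(T, T0, s, c, D):
    """Eq. (23): stretch insensitive dimensions by T'_i = T_i (s_max / s_i),
    then recover the annealing time k'_i = (ln(T0_i / T'_i) / c_i)^D."""
    smax = max(s)
    T_new, k_new = [], []
    for Ti, T0i, si, ci in zip(T, T0, s, c):
        Tp = min(Ti * smax / si, T0i)   # guard: do not exceed the initial temperature
        T_new.append(Tp)
        k_new.append((math.log(T0i / Tp) / ci) ** D)
    return T_new, k_new
```

A relatively insensitive parameter (small s_i) receives a hotter temperature and hence an earlier effective annealing time, stretching its search range as described above.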
The acceptance temperature is similarly rescaled. In addition, since the initial acceptance temperature is set equal to a trial value of L, this is typically very large relative to the global minimum. Therefore, when this rescaling is performed, the initial acceptance temperature is reset to the most current minimum of L, and the annealing-time associated with this temperature is set to give a new temperature equal
to the lowest value of the cost-function encountered to annealing-date.
Also generated are the “standard deviations” of the theoretical forms, calculated as [∂2 L/(∂α i )2 ]−1/2 ,
for each parameter α i . This gives an estimate of the “noise” that accompanies fits to stochastic data or
functions. During the runs, also calculated are the off-diagonal elements of the “covariance matrix” for
all parameters. This inverse curvature of the theoretical cost function can provide a quantitative assessment of the relative sensitivity of parameters to statistical errors in fits to stochastic systems.
A few other twists can be added, and such searches undoubtedly will never be strictly by rote. Physical systems are so different that some experience with each one is required to develop a truly efficient algorithm.
The above two sub-Sections define this method of Very Fast Simulated Reannealing (VFSR). The annealing schedule for the temperatures T_i decreases exponentially in annealing-time k, i.e., T_i = T_{i0} exp(−c_i k^{1/D}). Of course, the fatter the tail of the generating function, the smaller the ratio of acceptance to generated points in the fit. However, in practice, it is found that for a given generating function, this ratio is approximately constant as the fit finds a global minimum. Therefore, for a large parameter space, the efficiency of the fit is determined by the annealing schedule of the generating function.
4. SIMULATION COMPARISONS
4.1. Test Functions
We evaluated the two algorithms, VFSR and GA, on a set of six robust optimizing test functions. Functions 1-5, shown below, are De Jong's five-function test bed and are typically used for GA benchmarking [3]. The performances of GA and their variations on these functions are well documented [5]. For consistency with [5], our GA runs were also simulated on the University of California at San Diego GENESIS 1.2 Genetic Algorithms simulator [18]. We also chose function f_0, the objective function of Corana et al. [19], which contains a very large number of local minima and is very difficult to optimize.
The test suite was designed to test algorithms on functions that are:
Continuous or Discontinuous
Convex or Concave
Unimodal or Multimodal
Linear or Nonlinear
Low Dimensional or High Dimensional
Deterministic or Stochastic.
The test suite is composed of a set of six test functions {f_n; n = 0, ..., 5} that are optimized by their minimization:
f_0(x_1, ..., x_4) = Σ_{i=1}^N { (t_i sgn(z_i) + z_i)^2 c d_i   if |x_i − z_i| < |t_i| ,
                                 d_i x_i^2                      otherwise } ,

z_i = ⌊|x_i/s_i| + 0.49999⌋ sgn(x_i) s_i ,
s_i = 0.2 , t_i = 0.05 , i = 1, ..., 4 ,
d_i = {1.0, 1000.0, 10.0, 100.0} ,
c = 0.15 ,
−1000.0 ≤ x_i ≤ 1000.0 , i = 1, ..., 4 ,   (24)

where s_i, t_i, d_i, and c are coefficients defined such that f_0 defines a paraboloid with axis parallel to the coordinates, and a set of holes that increase in depth near the origin.
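A direct transcription of Eq. (24) follows (a sketch; the sgn helper and the treatment of the index i as 1, ..., 4 follow the definitions above).

```python
import math

def sgn(v):
    """Signum with sgn(0) = 0, as required at the origin."""
    return (v > 0) - (v < 0)

def f0(x):
    """Corana's function, Eq. (24): a paraboloid d_i x_i^2 pocked with
    flat-bottomed holes of value c d_i (t_i sgn(z_i) + z_i)^2 near each axis."""
    d = [1.0, 1000.0, 10.0, 100.0]
    s, t, c = 0.2, 0.05, 0.15
    total = 0.0
    for xi, di in zip(x, d):
        zi = math.floor(abs(xi / s) + 0.49999) * sgn(xi) * s
        if abs(xi - zi) < t:
            total += c * di * (t * sgn(zi) + zi) ** 2
        else:
            total += di * xi ** 2
    return total
```

The origin sits inside a hole of depth zero (since sgn(0) = 0), giving the global minimum f_0(0, 0, 0, 0) = 0, while every other hole bottom is only a factor c below the surrounding paraboloid.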
f_1(x_1, x_2, x_3) = Σ_{j=1}^3 x_j^2 ,
−5.12 ≤ x_i ≤ 5.12 , i = 1, ..., 3 .   (25)
f_2(x_1, x_2) = 100 (x_1^2 − x_2)^2 + (1 − x_1)^2 ,
−2.048 ≤ x_i ≤ 2.048 , i = 1, 2 .   (26)

f_3(x_1, ..., x_5) = 30.0 + Σ_{j=1}^5 ⌊x_j⌋ ,
−5.12 ≤ x_i ≤ 5.12 , i = 1, ..., 5 .   (27)
f_4(x_1, ..., x_30) = Σ_{j=1}^{30} x_j^4 + η ,
−1.28 ≤ x_i ≤ 1.28 , i = 1, ..., 30 ,   (28)

where η is (a) a Gaussian random number with distribution N(0, 1), or (b) a uniform random number bounded in [0, 1).
f_5(x_1, x_2) = 1 / [ 1/500 + Σ_{j=1}^{25} 1 / ( j + Σ_{i=1}^2 (x_i − a_{ji})^6 ) ] ,

a_{j1} = {−32, −16, 0, 16, 32, −32, −16, 0, 16, 32, −32, −16, 0, 16, 32, −32, −16, 0, 16, 32, −32, −16, 0, 16, 32} ,

a_{j2} = {−32, −32, −32, −32, −32, −16, −16, −16, −16, −16, 0, 0, 0, 0, 0, 16, 16, 16, 16, 16, 32, 32, 32, 32, 32} ,

−65.536 ≤ x_i ≤ 65.536 , i = 1, 2 .   (29)
Function f_0 is somewhat like function f_2. It is very difficult to minimize correctly when n = 4, with over 10^20 local minima to be trapped in. Optimization methods based on gradient descent are quite likely to be caught in one of the many local minima. Function f_1 tests a simple sum of squares with one minimum at x_i = 0. The function f_2 is the classical function of Rosenbrock and Chebyquad in two dimensions that is unimodal, yet difficult to minimize [19]. The next function, f_3, is the plateau function, generated as the sum of integer threshold values. The five-dimensional space has one minimum and is discontinuous. Function f_4 is a noisy quartic function of 30 variables originally defined as f_4(x_1, ..., x_30) = Σ_{j=1}^{30} x_j^4 + η, where η is produced by Gaussian noise with distribution N(0, 1) [3]. While the intent of this function is to determine an optimizer's performance in the presence of noise, this function is perhaps flawed, as no definite global minimum exists. Once an algorithm minimizes Σ_{j=1}^{30} x_j^4, it could continue executing, waiting for smaller values of η to be produced. By performing more function evaluations, an inefficient optimization technique might find better solutions than an efficient optimization (that terminates with fewer function evaluations). In response to this problem, we chose η to be a random variable with uniform distribution, bounded by [0, 1). This ensured that there could be only one functional minimum, located at 0. Lastly, f_5 spans a 2-dimensional space, with a global minimum ≈ 0.998004. It is similar to f_0 but has only 25 local minima. In our opinion, f_0 is a much better test of an optimizing algorithm than De Jong's f_5.
In each run, each algorithm performed a minimum of 1000 function calls unless it terminated prematurely by finding the function's global minimum. Global minimum values of 0 are shown as 10^{-12}. For those runs not converging to the optimum, limits were placed on the number of VFSR function calls and the number of GA generations. For example, for VFSR, a cutoff of f < 10^{-12} was one criterion for those functions with zero as a minimum (although zero was reached to within machine double-precision accuracy).
For all the Genetic Algorithm simulation runs, the fitness functions to maximize were − f i . Real
values were encoded as binary strings of various lengths. f 0 used 32 bits. For the others, we used the
defaults given in the GA simulator: f 1 used 10 bits, f 2 used 12 bits, f 3 used 10 bits, f 4 used 8 bits, and f 5
used 17 bits. (Although this entailed some prior tuning of GA for these specific systems, no such tuning
was done for the VFSR runs reported here.) Bit string lengths depend on the function being optimized; when states are encoded as short bit strings, they converge more quickly than those encoded as long strings. However, the decoded precision of a shorter string is less than that of a longer string. Factors such as the string length, population sizes, mutation rates, number of generations, and percent of population culled were heuristically determined, and were dependent on the function being optimized. These relatively minor procedures are considered to be part of the algorithm defining GA.
Both the GA and VFSR algorithms were taken from existing libraries previously prepared for other
problems. While this approach does not guarantee the best fitting procedures for GA or VFSR on these
specific problems, our results do suggest the relative merits of these algorithms that might be expected on
new problems. Population sizes for the GA runs were 30000, 100, 50, 2000, 30000, and 200 individuals
for functions f 0 through f 5 respectively. No determination was made to find optimal population sizes.
(Again, although this entailed some prior tuning of GA for these specific systems, no such tuning was
done for the VFSR runs reported here.) Additionally, we have not plotted each generation versus its best
function evaluation, because VFSR has no corresponding attribute and because the GA runs were simulated sequentially. However, a generation-performance plot would be a more accurate estimate of the
power of the genetic algorithms, especially if the GA simulations were run on parallel hardware such as
the Connection Machine [20]. Then all function evaluations (performed in one generation) could be evaluated in parallel. It should be noted that parallel processing of GA can lead to spurious correlations and
premature convergence [21].
It should be noted in this context that VFSR also lends itself well to parallelization. Blocks of random numbers can be generated in parallel, and then sequentially checked to determine a generating point
satisfying all boundary conditions. Also, advantage can be taken of the low ratio of acceptance to generated points typical in VFSR, to generate blocks of cost functions, and then sequentially check these to
determine the next best current minimum.
4.2. Results
FIGURES 1-6
Figures 1 through 6 illustrate the performances of the two algorithms on the corresponding functions f_0 through f_5. Solid and short-dashed lines each represent one VFSR run, and dashed and long-dashed lines each represent one GA run. The runs are log-log plotted to show relative convergence rates
of each algorithm. The abscissa indicates the number of function calls, while the ordinate shows the best
function evaluation found so far. The VFSR and GA optimizations of functions f_1 and f_2 were fairly competitive, which may be attributed to the fact that these functions are continuous and relatively concave within the boundary constraints.
Although stochastic, the VFSR performance variances on functions f_0, f_1, f_3, and f_4 were relatively small. VFSR's performance on f_0 is most illustrative. The GA variances were substantial on f_1 and f_2. A VFSR run also was performed for f_0 for the case of 10 parameters, and it converged to zero within 10^7 generations of that cost function; we considered it to be too much a strain on resources to use this number of variables with GA. Both algorithms found the minimum for function f_5 in about the same time, although VFSR had the quickest approach. Overall, the VFSR runs converged faster than the GA runs, and with smaller variances.
5. DISCUSSION
No rigorous proofs have been given for the convergence of VFSR, only a heuristic rationale for its success. It is expected that the obvious utility of this algorithm will motivate such proofs. However, actual fits to data are a finite process, and often even only heuristic guides to algorithms that obviously fit many classes of data are important. The annealing schedule for the temperatures T_i decreases exponentially in annealing-time k, i.e., T_i = T_{i0} exp(−c_i k^{1/D}). This algorithm is exponentially faster than the fast Cauchy annealing, where T_i = T_0/k, and even faster than Boltzmann annealing, where T_i = T_0/ln k.
With regard to the Genetic Algorithm simulations, Genetic Algorithms have the problem of parameter estimation. Currently only heuristic estimates of parameters such as population size and string length
were used in our simulations, and these parameter values were taken from [18].
When comparing the two algorithms, one should note that the GA and VFSR random number generator values differ almost immediately during the optimization. VFSR makes additional calls to the generator to incorporate noise into its decision process, making it difficult to rigidly compare performances.
Genetic algorithms are not designed to ergodically sample and cover the space in a maximally efficient way. There is some doubt as to whether they are ergodic at all, but the prime benefit of Genetic Algorithms occurs during each generation, when all individuals can be evaluated in parallel, making Genetic Algorithms excellent candidates for running on fine-grained parallel processing hardware such as the Connection Machine. However, parallelization for some systems can be somewhat thwarted. For
some applications, such as closed loop control, performance evaluation of the system can be delayed or
occur infrequently. Additionally, the evaluations may depend not only on the current state, but also on
previous states.
In contrast, VFSR is largely sequential in moving from one optimized value to the next. States
must be sampled sequentially, for acceptability and to permit identification of current local minima about
which new test parameters are chosen. However, we also have outlined how VFSR can be parallelized, at
the stage of preparing random numbers for generating points, and at the stage for preparing cost functions
for acceptance criteria. Additionally, when fitting dynamic systems, e.g., as performed for three physical
systems to date [7,9,11], parallelization is attained by independently calculating each time epoch’s contribution to the cost function.
An interesting variation of GA developed by Ackley [22], Stochastic Iterated Genetic Hillclimbing
(SIGH), combines simulated annealing, hillclimbing, and genetic algorithms, creating a coarse-to-fine
search strategy that is used as a function optimizer, which improves on the performances of Genetic Algorithms. While the SIGH algorithms and variations allow genetic algorithmic search to find states that are
suitable for hillclimbing, they still are not ergodic, and do not provide for importance sampling over the
state space.
Problems with the "curse of dimensionality" in large-dimensional spaces (as illustrated by function f_0) have yet to be solved. However, with the advent of fine-grain parallel computers to perform concurrent function evaluation, the optimization performances of VFSR and GA would undoubtedly improve. For Genetic Algorithms, larger population sizes could sample larger areas of the state space, allowing the algorithm to converge more quickly to local minima.
Our future investigations will examine and compare neural network function optimizers with VFSR, to more accurately characterize the performance benefits and deficits of the VFSR algorithm. However, from our current simulations, VFSR is a robust and powerfully efficient tool for function optimization.
ACKNOWLEDGMENT
Larry Eshelman and Dave Schaffer, Philips Laboratories, gave a careful reading to our initial draft
and pointed out several misprints. Mark Johnson, MicroUnity Systems Engineering, Inc., first brought the
function f0 to our attention by posting its code to a public electronic bulletin board.
Graphs were produced using XVGR (graphics for exploratory data analysis), a public domain software
package running under UNIX and X11, developed by Paul Turner at the Oregon Graduate Institute.
REFERENCES
1. J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI, (1975).
2. A.D. Bethke, Genetic Algorithms as Function Optimizers, Ph.D. Thesis, Department of Computer and Communication Sciences, University of Michigan, Ann Arbor, MI, (1981).
3. K.A. De Jong, An Analysis of the Behavior of a Class of Genetic Adaptive Systems, Ph.D. Thesis, Department of Computer and Communication Sciences, University of Michigan, Ann Arbor, MI, (1981).
4. D.E. Goldberg, Genetic Algorithms in Search, Optimization & Machine Learning, Addison-Wesley, Reading, MA, (1989).
5. N.N. Schraudolph and R.K. Belew, Dynamic parameter encoding for genetic algorithms, Technical Report LAUR 90-2795, Los Alamos National Laboratory, Los Alamos, NM, (1990).
6. L. Ingber, Very fast simulated re-annealing, Mathl. Comput. Modelling 12 (8), 967-973 (1989).
7. L. Ingber, H. Fujio, and M.F. Wehner, Mathematical comparison of combat computer models to exercise data, Mathl. Comput. Modelling 15 (1), 65-90 (1991).
8. L. Ingber and D.D. Sworder, Statistical mechanics of combat with human factors, Mathl. Comput. Modelling 15 (11), 99-127 (1991).
9. L. Ingber, Statistical mechanical aids to calculating term structure models, Phys. Rev. A 42 (12), 7057-7064 (1990).
10. L. Ingber, M.F. Wehner, G.M. Jabbour, and T.M. Barnhill, Application of statistical mechanics methodology to term-structure bond-pricing models, Mathl. Comput. Modelling 15 (11), 77-98 (1991).
11. L. Ingber, Statistical mechanics of neocortical interactions: A scaling paradigm applied to electroencephalography, Phys. Rev. A 44 (6), 4017-4060 (1991).
12. L. Ingber, Generic mesoscopic neural networks based on statistical mechanics of neocortical interactions, Phys. Rev. A 45 (4), R2183-R2186 (1992).
13. H. Szu and R. Hartley, Fast simulated annealing, Phys. Lett. A 122 (3-4), 157-162 (1987).
14. N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller, Equation of state calculations by fast computing machines, J. Chem. Phys. 21 (6), 1087-1092 (1953).
15. S. Kirkpatrick, C.D. Gelatt, Jr., and M.P. Vecchi, Optimization by simulated annealing, Science 220 (4598), 671-680 (1983).
16. K. Binder and D. Stauffer, A simple introduction to Monte Carlo simulations and some specialized topics, in Applications of the Monte Carlo Method in Statistical Physics, (Edited by K. Binder), pp. 1-36, Springer-Verlag, Berlin, (1985).
17. S. Geman and D. Geman, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Trans. Patt. Anal. Mach. Intell. 6 (6), 721-741 (1984).
18. N.N. Schraudolph and J.J. Grefenstette, A User's Guide to GAUCSD 1.2, Report, University of California at San Diego, La Jolla, CA, (1991).
19. A. Corana, M. Marchesi, C. Martini, and S. Ridella, Minimizing multimodal functions of continuous variables with the "simulated annealing" algorithm, ACM Trans. Mathl. Software 13 (3), 262-279 (1987).
20. W.D. Hillis, The Connection Machine, The MIT Press, Cambridge, MA, (1985).
21. J.D. Schaffer, L.J. Eshelman, and D. Offutt, Spurious correlations and premature convergence in genetic algorithms, in Foundations of Genetic Algorithms, (Edited by G. Rawlins), pp. 102-112, Morgan Kaufmann, San Mateo, CA, (1991).
22. D.H. Ackley, Stochastic Iterated Genetic Hillclimbing, Ph.D. Thesis, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, (1987).
FIGURE CAPTIONS
Figure 1. Comparison between GA and VFSR for function f0. Solid and short-dashed
lines each represent one VFSR run, and dashed and long-dashed lines each represent one GA run.
The runs are plotted on log-log axes to show the relative convergence rates of each algorithm. The abscissa indicates
the number of function calls, while the ordinate shows the best function evaluation found so far.
Figure 2. Comparison between GA and VFSR for function f1. See Figure 1 for legend.
Figure 3. Comparison between GA and VFSR for function f2. See Figure 1 for legend.
Figure 4. Comparison between GA and VFSR for function f3. See Figure 1 for legend.
Figure 5. Comparison between GA and VFSR for function f4. See Figure 1 for legend.
Figure 6. Comparison between GA and VFSR for function f5. See Figure 1 for legend.
[Figure 1: GA vs. VFSR, f0 — log-log plot of best cost versus generated function evaluations.]
[Figure 2: GA vs. VFSR, f1 — log-log plot of best cost versus generated function evaluations.]
[Figure 3: GA vs. VFSR, f2 — log-log plot of best cost versus generated function evaluations.]
[Figure 4: GA vs. VFSR, f3 — log-log plot of best cost versus generated function evaluations.]
[Figure 5: GA vs. VFSR, f4 — log-log plot of best cost versus generated function evaluations.]
[Figure 6: GA vs. VFSR, f5 — log-log plot of best cost versus generated function evaluations.]