Basic Algorithms and Operators
EDITORIAL BOARD
Peter J Angeline, USA
David Beasley, UK
Lashon B Booker, USA
Kalyanmoy Deb, India
Larry J Eshelman, USA
Hitoshi Iba, Japan
Kenneth E Kinnear Jr, USA
Raymond C Paton, UK
V William Porto, USA
Günter Rudolph, Germany
Robert E Smith, USA
William M Spears, USA
ADVISORY BOARD
Kenneth De Jong, USA
Lawrence J Fogel, USA
John R Koza, USA
Hans-Paul Schwefel, Germany
Stewart W Wilson, USA
Evolutionary Computation 1
Edited by
Thomas Bäck, David B Fogel
and Zbigniew Michalewicz
INSTITUTE OF PHYSICS PUBLISHING
Bristol and Philadelphia
Copyright © 2000 by IOP Publishing Ltd
PROJECT STAFF
Publisher: Nicki Dennis
Production Editor: Martin Beavis
Production Manager: Sharon Toop
Assistant Production Manager: Jenny Troyano
Production Controller: Sarah Plenty
Electronic Production Manager: Tony Cox
The paper used in this publication meets the minimum requirements
of American National Standard for Information Sciences – Permanence of Paper
for Printed Library Materials, ANSI Z39.48-1984
Contents
Preface
List of contributors
Glossary
PART 4 REPRESENTATIONS
14 Introduction to representations
Kalyanmoy Deb
14.1 Solutions and representations
14.2 Important representations
14.3 Combined representations
References
PART 5 SELECTION
22 Introduction to selection
Kalyanmoy Deb
Index
Preface
disciplines and their specific conferences and journals, thus reflecting the general
applicability and success of evolutionary computation methods.
The progress in the theory of evolutionary computation methods since
1990 impressively confirms the strengths of these algorithms as well as their
limitations. Research in this field has reached maturity concerning both
theoretical and application aspects, so it has become important to provide a
complete reference for practitioners, theorists, and teachers in a variety of
disciplines. The
original Handbook of Evolutionary Computation was designed to provide such
a reference work. It included complete, clear, and accessible information,
thoroughly describing state-of-the-art evolutionary computation research and
application in a comprehensive style.
These new volumes, based on the original Handbook but updated, are
designed to provide the material in units suitable for coursework as well as
for individual researchers. The first volume, Evolutionary Computation 1:
Basic Algorithms and Operators, provides the basic information on evolutionary
algorithms. In addition to covering all paradigms of evolutionary computation in
detail and giving an overview of the rationale of evolutionary computation and
of its biological background, this volume also offers an in-depth presentation
of basic elements of evolutionary computation models according to the types
of representations used for typical problem classes (e.g. binary, real-valued,
permutations, finite-state machines, parse trees). With this classification
based on representation, the search operators mutation and recombination
(and others) are straightforwardly grouped according to the semantics of the
data they manipulate. The second volume, Evolutionary Computation 2:
Advanced Algorithms and Operators, provides information on additional topics
of major importance for the design of an evolutionary algorithm, such as
the fitness evaluation, constraint-handling issues, and population structures
(including all aspects of the parallelization of evolutionary algorithms). This
volume also covers some advanced techniques (e.g. parameter control, meta-
evolutionary approaches, and coevolutionary algorithms) and discusses the
efficient implementation of evolutionary algorithms.
Organizational support provided by Institute of Physics Publishing made it
possible to prepare this second version of the Handbook. In particular, we would
like to express our gratitude to our project editor, Robin Rees, who worked with
us on editorial and organizational issues.
References
Bäck T, Fogel D B and Michalewicz Z 1997 Handbook of Evolutionary Computation
(Bristol: Institute of Physics Publishing and New York: Oxford University Press)
List of contributors
Larry J Eshelman (Chapter 8)
Principal Member of Research Staff, Philips Research, Briarcliff Manor, NY, USA
e-mail: lje@philabs.philips.com
David B Fogel (Chapters 1, 4, 6, 16, 18, 20, 21, 27, 32–34, Glossary)
Executive Vice President and Chief Scientist, Natural Selection Inc., La Jolla, CA,
USA
e-mail: dfogel@natural-selection.com
Glossary

Bold text within definitions indicates terms that are also listed elsewhere in
this glossary.
Adaptation: This denotes the general advantage in ecological or physiological
efficiency of an individual in contrast to other members of the population,
and it also denotes the process of attaining this state.
Adaptive behavior: The underlying mechanisms that allow living organisms,
and, potentially, robots, to adapt and survive in uncertain environments
(cf adaptation).
Adaptive surface: Possible biological trait combinations in a population of
individuals define points in a high-dimensional sequence space, where each
coordinate axis corresponds to one of these traits. An additional dimension
characterizes the fitness values for each possible trait combination, resulting
in a highly multimodal fitness landscape, the so-called adaptive surface or
adaptive topography.
Allele: An alternative form of a gene that occurs at a specified chromosomal
position (locus).
Artificial life: A term coined by C G Langton to denote the ‘…study
of simple computer generated hypothetical life forms, i.e. life-as-it-could-
be.’ Artificial life and evolutionary computation have a close relationship
because evolutionary algorithms are often used in artificial life research
to breed the survival strategies of individuals in a population of artificial
life forms.
Automatic programming: The task of finding a program which calculates a
certain input–output function; in automatic programming this task has to
be performed by another computer program (cf genetic programming).
Baldwin effect: Baldwin theorized that individual learning allows an organism
to exploit genetic variations that only partially determine a physiological
structure. Consequently, the ability to learn can guide evolutionary
processes by rewarding partial genetic successes. Over evolutionary
time, learning can guide evolution because individuals with useful genetic
variations are maintained by learning, such that useful genes are utilized
more widely in the subsequent generation. Over time, abilities that
previously required learning are replaced by genetically determinant
cycles between the mates. The cycle crossover operator preserves absolute
positions of the elements of permutations. (See also Section 33.3.)
Darwinism: The theory of evolution, proposed by Darwin, that evolution
comes about through random variation (mutation) of heritable characteristics,
coupled with natural selection, which favors for further survival and
evolution those species best adapted to their environmental conditions.
(See also Chapter 4.)
Deception: Objective functions are called deceptive if the combination of good
building blocks by means of recombination leads to a reduction of fitness
rather than an increase.
Deficiency: A form of mutation that involves a terminal segment loss of
chromosome regions.
Defining length: The defining length of a schema is the maximum distance
between specified positions within the schema. The larger the defining
length of a schema, the higher its probability of being disrupted by
crossover.
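
For concreteness, the defining length and the classical one-point-crossover
disruption bound d(H)/(l − 1) can be computed directly; the following is a
minimal Python sketch (the function names and the schema notation over
{0, 1, *} are illustrative, not from the text):

    def defining_length(schema):
        # indices of the specified (non-wildcard) positions
        fixed = [i for i, c in enumerate(schema) if c != '*']
        return max(fixed) - min(fixed) if len(fixed) > 1 else 0

    def disruption_bound(schema):
        # classical one-point-crossover disruption bound: d(H) / (l - 1)
        return defining_length(schema) / (len(schema) - 1)

    print(defining_length('1***0*'))    # -> 4
    print(disruption_bound('1***0*'))   # -> 0.8
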
Deletion: A form of mutation that involves an internal segment loss of a
chromosome region.
Deme: An independent subpopulation in the migration model of parallel
evolutionary algorithms.
Diffusion model: The diffusion model denotes a massively parallel
implementation of evolutionary algorithms, where each individual is
realized as a single process connected to neighboring individuals,
so that a spatial population structure is assumed. Recombination
and selection are restricted to the neighborhood of an individual, such
that information is locally preserved and spreads only slowly over the
population.
Diploid: In diploid organisms, each body cell carries two sets of chromosomes;
that is, each chromosome exists in two homologous forms, one of which
is phenotypically realized.
Discrete recombination: Discrete recombination works on two vectors of
object variables by performing an exchange of the corresponding object
variables with probability one half (other settings of the exchange
probability are in principle possible) (cf uniform crossover). (See also
Section 33.2.)
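
A minimal Python sketch of discrete recombination on two real-valued object
variable vectors, assuming an exchange probability of one half (function and
variable names are illustrative):

    import random

    def discrete_recombination(x, y, p_exchange=0.5):
        # take each object variable from y with probability p_exchange,
        # otherwise from x (cf uniform crossover on bitstrings)
        return [b if random.random() < p_exchange else a
                for a, b in zip(x, y)]

    child = discrete_recombination([1.0, 2.0, 3.0], [7.0, 8.0, 9.0])
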
DNA: Deoxyribonucleic acid, a double-stranded macromolecule of helical
structure (comparable to a spiral staircase). Both single strands are linear,
unbranched nucleic acid molecules built up from alternating deoxyribose
(sugar) and phosphate molecules. Each deoxyribose part is coupled to
a nucleotide base, which is responsible for establishing the connection
to the other strand of the DNA. The four nucleotide bases adenine (A),
thymine (T), cytosine (C) and guanine (G) are the alphabet of the genetic
information. The sequence of these bases in the DNA molecule determines
the building plan of any organism.
Gray code: A binary code for integer values which ensures that adjacent
integers are encoded by binary strings with Hamming distance one.
Gray codes play an important role in the application of canonical genetic
algorithms to parameter optimization problems, because there are certain
situations in which the use of Gray codes may improve the performance of
an evolutionary algorithm.
Hamming distance: For two binary vectors, the Hamming distance is the
number of positions in which they differ.
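
Both notions are easily made concrete in code. The following minimal sketch
uses the standard binary-reflected Gray code and checks the adjacency
property stated above (helper names are illustrative):

    def gray_encode(n):
        # binary-reflected Gray code of a nonnegative integer
        return n ^ (n >> 1)

    def hamming(a, b):
        # number of bit positions in which integers a and b differ
        return bin(a ^ b).count('1')

    # adjacent integers always map to codewords at Hamming distance one
    assert all(hamming(gray_encode(i), gray_encode(i + 1)) == 1
               for i in range(1000))
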
Haploid: Haploid organisms carry one set of genetic information.
Heterozygous: Diploid organisms having different alleles for a given trait.
Hillclimbing strategy: Hillclimbing methods owe their name to an analogy:
they search for a maximum much as a sightless climber might feel his way
from a valley up to the peak of a mountain
by steadily moving upwards. These strategies follow a nondecreasing path
to an optimum by a sequence of neighborhood moves. In the case of
multimodal landscapes, hillclimbing locates the optimum closest to the
starting point of its search.
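
A minimal sketch of such a strategy on bitstrings, assuming a single-bit-flip
neighborhood and the onemax objective (all names are illustrative):

    import random

    def hillclimb(f, x):
        # follow a nondecreasing path of single-bit-flip neighborhood moves
        while True:
            neighbors = [x[:i] + [1 - x[i]] + x[i + 1:] for i in range(len(x))]
            best = max(neighbors, key=f)
            if f(best) <= f(x):
                return x        # no improving neighbor: a local optimum
            x = best

    onemax = sum    # fitness of a bitstring: the number of ones
    print(hillclimb(onemax, [random.randint(0, 1) for _ in range(20)]))
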
Homologues: Chromosomes of identical structure, but with possibly different
genetic information contents.
Homozygous: Diploid organisms having identical alleles for a given trait.
Hybrid method: Evolutionary algorithms are often combined with classical
optimization techniques such as gradient methods to facilitate an efficient
local search in the final stage of the evolutionary optimization. The
resulting combinations of algorithms are often summarized by the term
hybrid methods.
Implicit parallelism: The concept that each individual solution offers partial
information about sampling from other solutions that contain similar
subsections. Although it was once believed that maximizing implicit
parallelism would increase the efficiency of an evolutionary algorithm,
this notion has been proved false in several different mathematical
developments (see no-free-lunch theorem).
Individual: A single member of a population. In evolutionary algorithms,
an individual contains a chromosome or genome, which usually contains at
least a representation of a possible solution to the problem being tackled
(a single point in the search space). Other information, such as certain
strategy parameters and the individual’s fitness value, is usually also
stored in each individual.
Intelligence: The definition of the term intelligence for the purpose of clarifying
what the essential properties of artificial or computational intelligence
should be turns out to be rather complicated. Rather than taking the usual
anthropocentric view on this, we adopt a definition by D Fogel which
states that intelligence is the capability of a system to adapt its behavior to
meet its goals in a range of environments. This definition also implies that
Migration model: The migration model (often also referred to as the island
model) is one of the basic models of parallelism exploited by evolutionary
algorithm implementations. The population is no longer panmictic,
but distributed into several independent subpopulations (so-called demes),
which coexist (typically on different processors, with one subpopulation
per processor) and may mutually exchange information by interdeme
migration. Each of the subpopulations corresponds to a conventional
(i.e. sequential) evolutionary algorithm. Since selection takes place
only locally inside each subpopulation, every deme is able to concentrate on
different promising regions of the search space, such that the global
search capabilities of migration models often exceed those of panmictic
populations. The fundamental parameters introduced by the migration
principle are the exchange frequency of information, the number of
individuals to exchange, the selection strategy for the emigrants, and the
replacement strategy for the immigrants.
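
The following minimal Python sketch shows how these four parameters might
appear in an island-model skeleton; the deme-level EA, the ring topology,
and all names are illustrative assumptions rather than a prescribed
implementation:

    import random

    def island_model(islands, evolve, fitness, steps=100,
                     migrate_every=10, n_migrants=2):
        # `islands` is a list of demes; `evolve` runs one generation of a
        # conventional (sequential) EA on a deme. The four parameters named
        # above appear explicitly: exchange frequency (migrate_every),
        # number of migrants (n_migrants), emigrant selection (the best)
        # and immigrant replacement (random slots), on a ring topology.
        for gen in range(1, steps + 1):
            islands = [evolve(deme) for deme in islands]
            if gen % migrate_every == 0:
                for i, deme in enumerate(islands):
                    emigrants = sorted(deme, key=fitness)[-n_migrants:]
                    destination = islands[(i + 1) % len(islands)]
                    for e in emigrants:
                        destination[random.randrange(len(destination))] = e
        return islands

    def toy_evolve(deme):
        # toy generation: flip one random bit per individual, keep the
        # better of parent and offspring
        out = []
        for ind in deme:
            child = ind[:]
            j = random.randrange(len(child))
            child[j] = 1 - child[j]
            out.append(max(ind, child, key=sum))
        return out

    demes = [[[random.randint(0, 1) for _ in range(16)] for _ in range(8)]
             for _ in range(4)]
    final = island_model(demes, toy_evolve, fitness=sum)
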
Monte Carlo algorithm: See uniform random search.
(µ, λ) strategy: See comma strategy.
(µ + λ) strategy: See plus strategy.
Multiarmed bandit: Classical analysis of schema processing relied on an
analogy to sampling from a number of slot machines (one-armed bandits)
in order to minimize expected losses.
Multimembered evolution strategy: All variants of evolution strategies that
use a parent population size of µ > 1 and therefore facilitate the utilization
of recombination are summarized under the term multimembered evolution
strategy.
Multiobjective optimization: In multiobjective optimization, the simultaneous
optimization of several, possibly competing, objective functions is required.
The family of solutions to a multiobjective optimization problem is
composed of all those elements of the search space for which the
corresponding objective vector cannot be improved in any dimension without
degradation in another. These solutions are called Pareto optimal.
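
Pareto optimality can be made concrete with a small dominance test; this
sketch assumes minimization of all objectives and illustrative names:

    def dominates(u, v):
        # u dominates v if u is no worse in every objective and strictly
        # better in at least one (minimization of all objectives assumed)
        return (all(a <= b for a, b in zip(u, v))
                and any(a < b for a, b in zip(u, v)))

    def pareto_set(vectors):
        # keep the objective vectors that no other vector dominates
        return [v for v in vectors
                if not any(dominates(u, v) for u in vectors if u is not v)]

    print(pareto_set([(1, 5), (2, 2), (4, 1), (3, 3)]))    # drops (3, 3)
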
Multipoint crossover: A crossover operator which uses a predefined number
of uniformly distributed crossover points and exchanges alternating
segments between pairs of crossover points between the parent individuals
(cf one-point crossover).
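
A minimal sketch of k-point crossover on two equal-length parent lists
(names are illustrative):

    import random

    def multipoint_crossover(p1, p2, k=2):
        # choose k distinct cut points, then exchange alternating segments
        cuts = sorted(random.sample(range(1, len(p1)), k)) + [len(p1)]
        c1, c2 = p1[:], p2[:]
        for i in range(len(cuts) - 1):
            if i % 2 == 0:      # swap every other inter-cut segment
                lo, hi = cuts[i], cuts[i + 1]
                c1[lo:hi], c2[lo:hi] = p2[lo:hi], p1[lo:hi]
        return c1, c2

    print(multipoint_crossover(list('AAAAAAAA'), list('BBBBBBBB'), k=2))
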
Mutation: A change of the genetic material, either occurring in the germ path
or in the gametes (generative) or in body cells (somatic). Only generative
mutations affect the offspring. A typical classification of mutations
distinguishes gene mutations (a particular gene is changed), chromosome
mutations (the gene order is changed by translocation or inversion,
or the chromosome number is changed by deficiencies, deletions, or
duplications), and genome mutations (the number of chromosomes or
genomes is changed). In evolutionary algorithms, mutations are either
modeled on the phenotypic level (e.g. by using normally distributed
bit with a certain probability between the two parent individuals. The
exchange probability typically has a value of one half, but other settings
are possible (cf discrete recombination). (See also Section 33.3.)
Uniform random search: A random search algorithm which samples the
search space by drawing points from a uniform distribution over the search
space. In contrast to evolutionary algorithms, uniform random search does
not update its sampling distribution according to the information gained
from past samples; each sample is drawn independently of the search history.
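
For contrast with the adaptive sampling of an evolutionary algorithm, a
minimal sketch of uniform random search over a box, minimizing an
illustrative sphere function:

    import random

    def uniform_random_search(f, low, high, dim, samples=10000):
        # every point is drawn i.i.d. from the same uniform distribution;
        # the sampling never adapts to information from past samples
        best = None
        for _ in range(samples):
            x = [random.uniform(low, high) for _ in range(dim)]
            if best is None or f(x) < f(best):
                best = x
        return best

    sphere = lambda x: sum(v * v for v in x)
    print(uniform_random_search(sphere, -5.0, 5.0, dim=3))
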
Zygote: A fertilized egg that is always diploid.
1
Introduction to evolutionary computation
David B Fogel
1.2 Optimization
Evolution is an optimization process (Mayr 1988, p 104). Darwin (1859, ch 6)
was struck by the ‘organs of extreme perfection’ that have been evolved, one
such example being the image-forming eye (Atmar 1976). Optimization does not
imply perfection, yet evolution can discover highly precise functional solutions
to particular problems posed by an organism’s environment, and even though
the mechanisms that are evolved are often overly elaborate from an engineering
perspective, function is the sole quality that is exposed to natural selection, and
functionality is what is optimized by iterative selection and mutation.
It is quite natural, therefore, to seek to describe evolution in terms of an
algorithm that can be used to solve difficult engineering optimization problems.
The classic techniques of gradient descent, deterministic hill climbing, and
purely random search (with no heredity) have been generally unsatisfactory when
applied to nonlinear optimization problems, especially those with stochastic,
temporal, or chaotic components. But these are the problems that nature has
seemingly solved so very well. Evolution provides inspiration for computing
the solutions to problems that have previously appeared intractable. This was a
key foundation for the efforts in evolution strategies (Rechenberg 1965, 1994,
Schwefel 1965, 1995).
1.5 Biology
Rather than attempt to use evolution as a tool to solve a particular engineering
problem, there is a desire to capture the essence of evolution in a computer
simulation and use the simulation to gain new insight into the physics of natural
evolutionary processes (Ray 1991) (see also Chapter 4). Success raises the
possibility of studying alternative biological systems that are merely plausible
images of what life might be like in some way. It also raises the question of what
properties such imagined systems might have in common with life as evolved on
Earth (Langton 1987). Although every model is incomplete, and assessing what
life might be like in other instantiations lies in the realm of pure speculation,
computer simulations under the rubric of artificial life have generated some
patterns that appear to correspond with naturally occurring phenomena.
1.6 Discussion
The ultimate answer to the question ‘why simulate evolution?’ lies in the lack
of good alternatives. We cannot easily germinate another planet, wait several
millions of years, and assess how life might develop elsewhere. We cannot
easily use classic optimization methods to find global minima in functions when
they are surrounded by local minima. We find that expert systems and other
attempts to mimic human intelligence are often brittle: they are not robust to
changes in the domain of application and are incapable of correctly predicting
future circumstances so as to take appropriate action. In contrast, by successfully
exploiting the use of randomness, or in other words the useful use of uncertainty,
‘all possible pathways are open’ for evolutionary computation (Hofstadter 1995,
p 115). Our challenge is, at least in some important respects, to not allow our
own biases to constrain the potential for evolutionary computation to discover
new solutions to new problems in fascinating and unpredictable ways. However,
as always, the ultimate advancement of the field will come from the careful
abstraction and interpretation of the natural processes that inspire it.
References
Atmar J W 1976 Speculation on the Evolution of Intelligence and its Possible Realization
in Machine Form Doctoral Dissertation, New Mexico State University
Atmar W 1994 Notes on the simulation of evolution IEEE Trans. Neural Networks NN-5
130–47
Darwin C R 1859 On the Origin of Species by Means of Natural Selection or the
Preservation of Favoured Races in the Struggle for Life (London: Murray)
Fogel D B 1995 Evolutionary Computation: Toward a New Philosophy of Machine
Intelligence (Piscataway, NJ: IEEE)
Fogel L J 1962 Autonomous automata Industr. Res. 4 14–9
Fogel L J, Owens A J and Walsh M J 1966 Artificial Intelligence through Simulated
Evolution (New York: Wiley)
Hofstadter D 1995 Fluid Concepts and Creative Analogies: Computer Models of the
Fundamental Mechanisms of Thought (New York: Basic Books)
Holland J H 1975 Adaptation in Natural and Artificial Systems (Ann Arbor, MI:
University of Michigan Press)
Langton C G 1987 Artificial life Artificial Life ed C G Langton (Reading, MA: Addison-
Wesley) pp 1–47
Mayr E 1988 Toward a New Philosophy of Biology: Observations of an Evolutionist
(Cambridge, MA: Belknap)
Ray T 1991 An approach to the synthesis of life Artificial Life II ed C G Langton, C
Taylor, J D Farmer and S Rasmussen (Reading, MA: Addison-Wesley) pp 371–408
Rechenberg I 1965 Cybernetic Solution Path of an Experimental Problem Royal Aircraft
Establishment Library Translation 1122, Farnborough, UK
——1994 Evolutionsstrategie ’94 (Stuttgart: Frommann-Holzboog)
Schwefel H-P 1965 Kybernetische Evolution als Strategie der Experimentellen Forschung
in der Strömungstechnik Diploma Thesis, Technical University of Berlin
——1995 Evolution and Optimum Seeking (New York: Wiley)
2
Possible applications of evolutionary
computation
David Beasley
2.1 Introduction
Applications of evolutionary computation (EC) fall into a wide continuum of
areas. For convenience, in this chapter they have been split into five broad
categories:
• planning
• design
• simulation and identification
• control
• classification.
These categories are by no means meant to be absolute or definitive. They
all overlap to some extent, and many applications could rightly appear in more
than one of the categories.
A number of bibliographies where more extensive information on EC
applications can be found are listed after the references at the end of this chapter.
all based at the same depot. A set of customers must each receive one delivery.
Which route should each vehicle take for minimum cost? There are constraints,
for example, on vehicle capacity and delivery times (Blanton and Wainwright
1993, Thangia et al 1993).
Closely related to this is the transportation problem, in which a single
commodity must be distributed to a number of customers from a number of
depots. Each customer may receive deliveries from one or more depots. What
is the minimum-cost solution? (Michalewicz 1992, 1993).
Planning the path which a robot should take is another route planning
problem. The path must be feasible and safe (i.e. it must be achievable within the
operational constraints of the robot) and there must be no collisions. Examples
include determining the joint motions required to move the gripper of a robot
arm between locations (Parker et al 1989, Davidor 1991, McDonnell et al 1992),
and autonomous vehicle routing (Jakob et al 1992, Page et al 1992). In unknown
areas or nonstatic environments, on-line planning/navigating is required, in
which the robot revises its plans as it travels.
2.2.2 Scheduling
2.2.3 Packing
Evolutionary algorithms (EAs) have been applied to many packing problems, the
simplest of which is the one-dimensional zero–one knapsack problem. Given a
knapsack of a certain capacity, and a set of items, each with a particular size and
value, find the set of items with maximum value which can be accommodated
in the knapsack. Various real-world problems are of this type: for example, the
allocation of communication channels to customers who are charged at different
rates.
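
To illustrate how directly such problems map onto an EA, the following
minimal sketch evolves bitstrings (bit i set means item i is packed) with a
zero-fitness penalty for overweight selections; the operators and parameter
values are illustrative, not tuned:

    import random

    def knapsack_fitness(bits, sizes, values, capacity):
        # total value of the packed items if they fit, otherwise zero
        size = sum(s for b, s in zip(bits, sizes) if b)
        value = sum(v for b, v in zip(bits, values) if b)
        return value if size <= capacity else 0

    def mutate(bits, p=0.05):
        # flip each bit independently with probability p
        return [1 - b if random.random() < p else b for b in bits]

    def knapsack_ea(sizes, values, capacity, pop_size=30, generations=200):
        fit = lambda b: knapsack_fitness(b, sizes, values, capacity)
        pop = [[random.randint(0, 1) for _ in sizes] for _ in range(pop_size)]
        for _ in range(generations):
            parents = sorted(pop, key=fit, reverse=True)[:pop_size // 2]
            pop = parents + [mutate(random.choice(parents)) for _ in parents]
        return max(pop, key=fit)

    best = knapsack_ea(sizes=[3, 4, 5, 8, 10], values=[2, 3, 6, 8, 10],
                       capacity=15)
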
There are various examples of two-dimensional packing problems. When
manufacturing items are cut from sheet materials (e.g. metal or cloth), it is
desirable to find the most compact arrangement of pieces, so as to minimize
the amount of scrap (Smith 1985, Fujita et al 1993). A similar problem arises
in the design of layouts for integrated circuits—how should the subcircuits be
arranged to minimize the total chip area required (Fourman 1985, Cohoon and
Paris 1987, Chan et al 1991)?
In three dimensions, there are obvious applications in which the best way of
packing objects into a restricted space is required. Juliff (1993) has considered
the problem of packing goods into a truck for delivery.
The design of filters has received considerable attention. EAs have been used
to design electronic or digital systems which implement a desired frequency
response. Both finite impulse response (FIR) and infinite impulse response
(IIR) filter structures have been employed (Etter et al 1982, Suckley 1991, Fogel
1991, Fonseca et al 1993, Ifeachor and Harris 1993, Namibar and Mars 1993,
Roberts and Wade 1993, Schaffer and Eshelman 1993, White and Flockton 1993,
Wicks and Lawson 1993, Wilson and Macleod 1993). EAs have also been used
to optimize the design of signal processing systems (San Martin and Knight
1993) and in integrated circuit design (Louis and Rawlins 1991, Rahmani and
Ono 1993). The unequal-area facility layout problem (Smith and Tate 1993)
is similar to integrated circuit design. It involves finding a two-dimensional
arrangement of ‘departments’ such that the distance which information has to
travel between departments is minimized.
EC techniques have been widely applied to artificial neural networks, both in
the design of network topologies and in the search for optimum sets of weights
(Miller et al 1989, Fogel et al 1990, Harp and Samad 1991, Baba 1992, Hancock
1992, Feldman 1993, Gruau 1993, Polani and Uthmann 1993, Romaniuk 1993,
Spittle and Horrocks 1993, Zhang and Mühlenbein 1993, Porto et al 1995). They
have also been applied to Kohonen feature map design (Polani and Uthmann
1992). Other types of network design problems have also been approached, for
example, in telecommunications (Cox et al 1991, Davis and Cox 1993).
Janikow and Cai (1992) similarly used EC to estimate statistical functions for
survival analysis in clinical trials. In a similar area, Manela et al (1993) used
EC to fit spline functions to noisy pharmaceutical fermentation process data.
EC may also be used to identify the sources of airborne pollution, given data
from a number of monitoring points in an urban area—the source apportionment
problem. In electromagnetics, Tanaka et al (1993) have applied EC to
determining the two-dimensional current distribution in a conductor, given its
external magnetic field. Away from conventional system identification, an EC
approach has been used to help with identifying criminal suspects. This system
helps witnesses to create a likeness of the suspect, without the need to give an
explicit description.
2.7 Summary
EC has been applied in a vast number of application areas. In some cases it has
advantages over existing computerized techniques. More interestingly, perhaps,
it is being applied to an increasing number of areas in which computers have
not been used before. We can expect to see the number of applications grow
considerably in the future. Comprehensive bibliographies in many different
application areas are listed after the References.
References
Abu Zitar R A and Hassoun M H 1993 Regulator control via genetic search and assisted
reinforcement Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL,
July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 254–62
Almássy N and Verschure P 1992 Optimizing self-organising control architectures with
genetic algorithms: the interaction between natural selection and ontogenesis
Parallel Problem Solving from Nature, 2 (Proc. 2nd Int. Conf. on Parallel Problem
Solving from Nature, Brussels, 1992) ed R Männer and B Manderick (Amsterdam:
Elsevier) pp 451–60
Altman E R, Agarwal V K and Gao G R 1993 A novel methodology using genetic
algorithms for the design of caches and cache replacement policy Proc. 5th Int.
Conf. on Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San
Mateo, CA: Morgan Kaufmann) pp 392–9
Axelrod R 1987 The evolution of strategies in the iterated prisoner’s dilemma Genetic
Algorithms and Simulated Annealing ed L Davis (Boston, MA: Pitman) ch 3, pp 32–
41
Baba N 1992 Utilization of stochastic automata and genetic algorithms for neural network
learning Parallel Problem Solving from Nature, 2 (Proc. 2nd Int. Conf. on Parallel
Problem Solving from Nature, Brussels, 1992) ed R Männer and B Manderick
(Amsterdam: Elsevier) pp 431–40
Bagchi S, Uckun S, Miyabe Y and Kawamura K 1991 Exploring problem-specific
recombination operators for job shop scheduling Proc. 4th Int. Conf. on Genetic
Algorithms (San Diego, CA, July 1991) ed R Belew and L Booker (San Mateo, CA:
Morgan Kaufmann) pp 10–7
Bala J W and Wechsler H 1993 Learning to detect targets using scale-space and genetic
search Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL, July
1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 516–22
Biegel J E and Davern J J 1990 Genetic algorithms and job shop scheduling Comput.
Indust. Eng. 19 81–91
Blanton J L and Wainwright R L 1993 Multiple vehicle routing with time and capacity
constraints Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL, July
1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 452–9
Booker L 1985 Improving the performance of genetic algorithms in classifier systems
Proc. 1st Int. Conf. on Genetic Algorithms (Pittsburgh, PA, July 1985) ed J J
Grefenstette (Hillsdale, NJ: Lawrence Erlbaum Associates) pp 80–92
Bramlette M F and Bouchard E E 1991 Genetic algorithms in parametric design of
aircraft Handbook of Genetic Algorithms ed L Davis (New York: Van Nostrand
Reinhold) ch 10, pp 109–23
Cartwright H M and Tuson A L 1994 Genetic algorithms and flowshop scheduling:
towards the development of a real-time process control system Evolutionary
Computing (AISB Workshop, Leeds, 1994, Selected Papers) (Lecture Notes in
Computer Science 865) ed T C Fogarty (Berlin: Springer) pp 277–90
Chan H, Mazumder P and Shahookar K 1991 Macro-cell and module placement by
genetic adaptive search with bitmap-represented chromosome Integration VLSI J.
12 49–77
Cohoon J P and Paris W D 1987 Genetic placement IEEE Trans. Computer-Aided Design
CAD-6 956–64
Corne D, Ross P and Fang H-L 1994 Fast practical evolutionary timetabling Evolutionary
Computing (AISB Workshop, Leeds, 1994, Selected Papers) (Lecture Notes in
Computer Science 865) ed T C Fogarty (Berlin: Springer) pp 250–63
Cox L A, Davis L and Qiu Y 1991 Dynamic anticipatory routing in circuit-switched
telecommunications networks Handbook of Genetic Algorithms ed L Davis (New
York: Van Nostrand Reinhold) ch 11, pp 124–43
Davidor Y 1991 A genetic algorithm applied to robot trajectory generation Handbook
of Genetic Algorithms ed L Davis (New York: Van Nostrand Reinhold) ch 12,
pp 144–65
Davis L 1985 Job shop scheduling with genetic algorithms Proc. 1st Int. Conf. on Genetic
Algorithms (Pittsburgh, PA, July 1985) ed J J Grefenstette (Hillsdale, NJ: Lawrence
Erlbaum Associates) pp 136–40
Davis L and Cox A 1993 A genetic algorithm for survivable network design Proc. 5th
Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest
(San Mateo, CA: Morgan Kaufmann) pp 408–15
DeJong K 1980 Adaptive system design: a genetic approach IEEE Trans. Systems, Man
Cybern. SMC-10 566–74
Easton F F and Mansour N 1993 A distributed genetic algorithm for employee staffing
and scheduling problems Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-
Champaign, IL, July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann)
pp 360–7
Etter D M, Hicks M J and Cho K H 1982 Recursive adaptive filter design using
an adaptive genetic algorithm IEEE Int. Conf. on Acoustics, Speech and Signal
Processing (Piscataway, NJ: IEEE) pp 635–8
Fairley A and Yates D F 1994 Inductive operators and rule repair in a hybrid genetic
learning system: some initial results Evolutionary Computing (AISB Workshop,
Leeds, 1994, Selected Papers) (Lecture Notes in Computer Science 865) ed T C
Fogarty (Berlin: Springer) pp 166–79
Fang H-L, Ross P and Corne D 1993 A promising genetic algorithm approach to job-
shop scheduling, rescheduling and open-shop scheduling problems Proc. 5th Int.
Conf. on Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San
Mateo, CA: Morgan Kaufmann) pp 375–82
Feldman D S 1993 Fuzzy network synthesis with genetic algorithms Proc. 5th Int. Conf.
on Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann) pp 312–7
Flockton S J and White M 1993 Pole-zero system identification using genetic algorithms
Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed
S Forrest (San Mateo, CA: Morgan Kaufmann) pp 531–5
Fogarty T C 1994 Co-evolving co-operative populations of rules in learning control
systems Evolutionary Computing (AISB Workshop, Leeds, 1994, Selected Papers)
(Lecture Notes in Computer Science 865) ed T C Fogarty (Berlin: Springer) pp 195–
209
Fogel D B 1988 An evolutionary approach to the traveling salesman problem Biol.
Cybernet. 6 139–44
——1990 A parallel processing approach to a multiple traveling salesman problem
using evolutionary programming Proc. 4th Ann. Symp. on Parallel Processing
(Piscataway, NJ: IEEE) pp 318–26
——1991 System Identification through Simulated Evolution (Needham, MA: Ginn)
——1993a Applying evolutionary programming to selected traveling salesman problems
Cybernet. Syst. 24 27–36
——1993b Evolving behaviors in the iterated prisoner’s dilemma Evolut. Comput. 1
77–97
——1995 Evolutionary Computation: Toward a New Philosophy of Machine Intelligence
(Piscataway, NJ: IEEE)
Fogel D B and Fogel L J 1996 Using evolutionary programming to schedule tasks on a
suite of heterogeneous computers Comput. Operat. Res. 23 527–34
Fogel D B, Fogel L J and Porto V W 1990 Evolving neural networks Biol. Cybern. 63
487–93
Fogel L J, Owens A J and Walsh M J 1966 Artificial Intelligence through Simulated
Evolution (New York: Wiley)
Fonseca C M and Fleming P J 1993 Genetic algorithms for multiobjective optimization:
formulation, discussion and generalization Proc. 5th Int. Conf. on Genetic
Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo, CA:
Morgan Kaufmann) pp 416–23
Fonseca C M, Mendes E M, Fleming P J and Billings S A 1993 Non-linear model
term selection with genetic algorithms Natural Algorithms in Signal Processing
(Workshop, Chelmsford, UK, November 1993) vol 2 (London: IEE) pp 27/1–27/8
Fourman M P 1985 Compaction of symbolic layout using genetic algorithms Proc. 1st
Int. Conf. on Genetic Algorithms (Pittsburgh, PA, July 1985) ed J J Grefenstette
(Hillsdale, NJ: Lawrence Erlbaum Associates) pp 141–53
Fujita K, Akagi S and Hirokawa N 1993 Hybrid approach for optimal nesting
using genetic algorithm and a local minimization algorithm Advances in Design
Automation vol 1, DE-65–1 (ASME) pp 477–84
Furuya H and Haftka R T 1993 Genetic algorithms for placing actuators on space
structures Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL, July
1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 536–42
Manela M, Thornhill N and Campbell J A 1993 Fitting spline functions to noisy data
using a genetic algorithm Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-
Champaign, IL, July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann)
pp 549–56
McDonnell J R, Andersen B L, Page W C and Pin F G 1992 Mobile manipulator
configuration optimization using evolutionary programming Proc. 1st Ann. Conf. on
Evolutionary Programming ed D B Fogel and W Atmar (La Jolla, CA: Evolutionary
Programming Society) pp 52–62
Melhuish C and Fogarty T C 1994 Applying a restricted mating policy to determine state
space niches using immediate and delayed reinforcement Evolutionary Computing
(AISB Workshop, Leeds, 1994, Selected Papers) (Lecture Notes in Computer Science
865) ed T C Fogarty (Berlin: Springer) pp 224–37
Michalewicz Z 1992 Genetic Algorithms + Data Structures = Evolution Programs (Berlin:
Springer)
——1993 A hierarchy of evolution programs: an experimental study Evolut. Comput. 1
51–76
Miller G F, Todd P M and Hegde S U 1989 Designing neural networks using genetic
algorithms Proc. 3rd Int. Conf. on Genetic Algorithms (Fairfax, VA, June 1989) ed
J D Schaffer (San Mateo, CA: Morgan Kaufmann) pp 379–84
Mühlenbein H 1989 Parallel genetic algorithms, population genetics and combinatorial
optimization Proc. 3rd Int. Conf. on Genetic Algorithms (Fairfax, VA, June 1989)
ed J D Schaffer (San Mateo, CA: Morgan Kaufmann) pp 416–21
Namibar R and Mars P 1993 Adaptive IIR filtering using natural algorithms Natural
Algorithms in Signal Processing (Workshop, Chelmsford, UK, November 1993) vol
2 (London: IEE) pp 20/1–20/10
Oliver J R 1993 Discovering individual decision rules: an application of genetic
algorithms Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL, July
1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 216–22
Oliver I M, Smith D J and Holland J R C 1987 A study of permutation crossover operators
on the travelling salesman problem Proc. 2nd Int. Conf. on Genetic Algorithms
(Cambridge, MA, 1987) ed J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 224–30
Page W C, McDonnell J R and Anderson B 1992 An evolutionary programming
approach to multi-dimensional path planning Proc. 1st Ann. Conf. on Evolutionary
Programming ed D B Fogel and W Atmar (La Jolla, CA: Evolutionary Programming
Society) pp 63–70
Parker J K, Goldberg D E and Khoogar A R 1989 Inverse kinematics of redundant robots
using genetic algorithms Proc. Int. Conf. on Robotics and Automation (Scottsdale,
AZ, 1989) vol 1 (Los Alamitos: IEEE Computer Society Press) pp 271–6
Patel M J and Dorigo M 1994 Adaptive learning of a robot arm Evolutionary Computing
(AISB Workshop, Leeds, 1994, Selected Papers) (Lecture Notes in Computer Science
865) ed T C Fogarty (Berlin: Springer) pp 180–94
Pipe A G and Carse B 1994 A comparison between two architectures for searching and
learning in maze problems Evolutionary Computing (AISB Workshop, Leeds, 1994,
Selected Papers) (Lecture Notes in Computer Science 865) ed T C Fogarty (Berlin:
Springer) pp 238–49
Polani D and Uthmann T 1992 Adaptation of Kohonen feature map topologies by genetic
algorithms Parallel Problem Solving from Nature, 2 (Proc. 2nd Int. Conf. on Parallel
Spencer G F 1993 Automatic generation of programs for crawling and walking Proc. 5th
Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest
(San Mateo, CA: Morgan Kaufmann) p 654
Spittle M C and Horrocks D H 1993 Genetic algorithms and reduced complexity artificial
neural networks Natural Algorithms in Signal Processing (Workshop, Chelmsford,
UK, November 1993) vol 1 (London: IEE) pp 8/1–8/9
Suckley D 1991 Genetic algorithm in the design of FIR filters IEE Proc. G 138 234–8
Syswerda G 1991 Schedule optimization using genetic algorithms Handbook of Genetic
Algorithms ed L Davis (New York: Van Nostrand Reinhold) ch 21, pp 332–49
Tackett W A 1993 Genetic programming for feature discovery and image discrimination
Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed
S Forrest (San Mateo, CA: Morgan Kaufmann) pp 303–9
Tanaka Y, Ishiguro A and Uchikawa Y 1993 A genetic algorithms application to inverse
problems in electromagnetics Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-
Champaign, IL, July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) p 656
Thangia S R, Vinayagamoorthy R and Gubbi A V 1993 Vehicle routing with time
deadlines using genetic and local algorithms Proc. 5th Int. Conf. on Genetic
Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo, CA:
Morgan Kaufmann) pp 506–13
Unger R and Moult J 1993 A genetic algorithm for 3D protein folding simulations
Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed
S Forrest (San Mateo, CA: Morgan Kaufmann) pp 581–8
Van Driessche R and Piessens R 1992 Load balancing with genetic algorithms Parallel
Problem Solving from Nature, 2 (Proc. 2nd Int. Conf. on Parallel Problem Solving
from Nature, Brussels, 1992) ed R Männer and B Manderick (Amsterdam: Elsevier)
pp 341–50
Verhoeven M G A, Aarts E H L, van de Sluis E and Vaessens R J M 1992 Parallel local
search and the travelling salesman problem Parallel Problem Solving from Nature,
2 (Proc. 2nd Int. Conf. on Parallel Problem Solving from Nature, Brussels, 1992)
ed R Männer and B Manderick (Amsterdam: Elsevier) pp 543–52
Watabe H and Okino N 1993 A study on genetic shape design Proc. 5th Int. Conf. on
Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann) pp 445–50
White M and Flockton S 1993 A comparative study of natural algorithms for adaptive
IIR filtering Natural Algorithms in Signal Processing (Workshop, Chelmsford, UK,
November 1993) vol 2 (London: IEE) pp 22/1–22/8
Whitley D, Starkweather T and Fuquay D 1989 Scheduling problems and travelling
salesmen: the genetic edge recombination operator Proc. 3rd Int. Conf. on Genetic
Algorithms (Fairfax, VA, June 1989) ed J D Schaffer (San Mateo, CA: Morgan
Kaufmann) pp 133–40
Wicks T and Lawson S 1993 Genetic algorithm design of wave digital filters with
a restricted coefficient set Natural Algorithms in Signal Processing (Workshop,
Chelmsford, UK, November 1993) vol 1 (London: IEE) pp 17/1–17/7
Wilson P B and Macleod M D 1993 Low implementation cost IIR digital filter
design using genetic algorithms Natural Algorithms in Signal Processing (Workshop,
Chelmsford, UK, November 1993) vol 1 (London: IEE) pp 4/1–4/8
Wilson S W 1987 Hierarchical credit allocation in a classifier system Genetic Algorithms
and Simulated Annealing ed L Davis (Boston, MA: Pitman) ch 8, pp 104–15
Further reading
This article has provided only a glimpse into the range of applications for
evolutionary computing. A series of comprehensive bibliographies has been
produced by J T Alander of the Department of Information Technology and
Production Economics, University of Vaasa, as listed below.
Art and Music: Indexed Bibliography of Genetic Algorithms in Art and Music
Report 94-1-ART (ftp.uwasa.fi/cs/report94-1/gaARTbib.ps.Z)
3
Advantages (and disadvantages) of evolutionary computation over other
approaches
Hans-Paul Schwefel

one(s) with respect to the additional test problem. This game could in principle
be played ad infinitum.
A better means of clarifying the scene ought to result from theory. This
should clearly define the domain of applicability of each algorithm by presenting
convergence proofs and efficiency results. Unfortunately, however, it is possible
to prove abilities of algorithms only by simplifying them as well as the situations
with which they are confronted. The huge remainder of questions must be
answered by means of (always limited) test series, and even that cannot tell
much about an actual real-world problem-solving situation with yet unanalyzed
features, that is, the normal case in applications.
Again unfortunately, there does not exist an agreed-upon test problem
catalogue to evaluate old as well as new algorithms in a concise way. It is
doubtful whether such a test bed will ever be agreed upon, but efforts in that
direction would be worthwhile.
3.2 Conclusions
Finally, what are the truths and consequences? First, there will always remain a
dichotomy between efficiency and general applicability, between reliability and
effort of problem-solving, especially optimum-seeking, algorithms. Any specific
knowledge about the situation at hand may be used to specify an adequate
specific solution algorithm, the optimal situation being that one knows the
solution in advance. On the other hand, there cannot exist one method that solves
all problems effectively as well as efficiently. These goals are contradictory.
If there is already a traditional method that solves a given problem, EAs
should not be used. They cannot do it better or with less computational effort.
In particular, they do not offer an escape from the curse of dimensionality—the
often quadratic, cubic, or otherwise polynomial increase in instructions used as
the number of decision variables is increased, arising, for example, from matrix
manipulation.
To develop a new solution method suitable for a problem at hand may be
a nice challenge to a theoretician, who will afterwards get some merit for his
effort, but from the application point of view the time for developing the new
technique has to be added to the computer time invested. In that respect, a
nonspecialized, robust procedure (and EAs belong to this class) may be, and
often proves to be, worthwhile.
A warning should be given about a common practice—the linearization or
other decomplexification of the situation in order to make a traditional method
applicable. Even a guaranteed globally optimal solution for the simplified task
may be a long way off and thus greatly inferior to an approximate solution to
the real problem.
The best one can say about EAs, therefore, is that they present a
methodological framework that is easy to understand and handle, and is either
usable as a black-box method or open to the incorporation of new or old
recipes for further sophistication, specialization or hybridization. They are
applicable even in dynamic situations where the goal or constraints are moving
over time or changing, either exogenously or self-induced, where parameter
adjustments and fitness measurements are disturbed, and where the landscape is
rough, discontinuous, multimodal, even fractal or cannot otherwise be handled
by traditional methods, especially those that need global prediction from local
surface analysis.
There exist EA versions for multiple criterion decision making (MCDM)
and many different parallel computing architectures. Almost forgotten today is
their applicability in experimental (non-computing) situations.
Sometimes striking is the fact that even obviously wrong parameter settings
do not prevent fairly good results: this certainly can be described as robustness.
Not yet well understood, but nevertheless very successful are those EAs which
self-adapt some of their internal parameters, a feature that can be described as
collective learning of the environmental conditions. Nevertheless, even self-
adaptation does not circumvent the NFL theorem.
In this sense, and only in this sense, EAs always present an intermediate
compromise; the enthusiasm of their inventors is not yet taken into account
here, nor the insights available from the analysis of the algorithms for natural
evolutionary processes which they try to mimic.
References
Schwefel H-P 1995 Evolution and Optimum Seeking (New York: Wiley)
Wolpert D H and Macready W G 1996 No Free Lunch Theorems for Search Technical
Report SFI-TR-95-02-010 Santa Fe Institute
4
Principles of evolutionary processes
David B Fogel
4.1 Overview
The most widely accepted collection of evolutionary theories is the neo-
Darwinian paradigm. These arguments assert that the vast majority of the
history of life can be fully accounted for by physical processes operating on
and within populations and species (Hoffman 1989, p 39). These processes
are reproduction, mutation, competition, and selection. Reproduction is an
obvious property of extant species. Further, species have such great reproductive
potential that their population size would increase at an exponential rate if
all individuals of the species were to reproduce successfully (Malthus 1826,
Mayr 1982, p 479). Reproduction is accomplished through the transfer of an
individual’s genetic program (either asexually or sexually) to progeny. Mutation,
in a positively entropic system, is guaranteed, in that replication errors during
information transfer will necessarily occur. Competition is a consequence of
expanding populations in a finite resource space. Selection is the inevitable
result of competitive replication as species fill the available space. Evolution
becomes the inescapable result of interacting basic physical statistical processes
(Huxley 1963, Wooldridge 1968, Atmar 1979).
Individuals and species can be viewed as a duality of their genetic program,
the genotype (Section 5.2), and their expressed behavioral traits, the phenotype.
The genotype provides a mechanism for the storage of experiential evidence,
of historically acquired information. Unfortunately, the results of genetic
variations are generally unpredictable due to the universal effects of pleiotropy
and polygeny (figure 4.1) (Mayr 1959, 1963, 1982, 1988, Wright 1931, 1960,
Simpson 1949, p 224, Dobzhansky 1970, Stanley 1975, Dawkins 1986).
Pleiotropy is the effect that a single gene may simultaneously affect several
phenotypic traits. Polygeny is the effect that a single phenotypic characteristic
may be determined by the simultaneous interaction of many genes. There are no
one-gene, one-trait relationships in naturally evolved systems. The phenotype
varies as a complex, nonlinear function of the interaction between underlying
genetic structures and current environmental conditions. Very different genetic
Figure 4.1. Pleiotropy is the effect that a single gene may simultaneously affect several
phenotypic traits. Polygeny is the effect that a single phenotypic characteristic may
be determined by the simultaneous interaction of many genes. These one-to-many and
many-to-one mappings are pervasive in natural systems. As a result, even small changes
to a single gene may induce a raft of behavioral changes in the individual (after Mayr
1963).
structures may code for equivalent behaviors, just as diverse computer programs
can generate similar functions.
Selection directly acts only on the expressed behaviors of individuals and
species (Mayr 1988, pp 477–8). Wright (1932) offered the concept of adaptive
topography to describe the fitness of individuals and species (minimally, isolated
reproductive populations termed demes). A population of genotypes maps to
respective phenotypes (sensu Lewontin 1974), which are in turn mapped onto
the adaptive topography (figure 4.2). Each peak corresponds to an optimized
collection of phenotypes, and thus to one or more sets of optimized genotypes.
Evolution probabilistically proceeds up the slopes of the topography toward
peaks as selection culls inappropriate phenotypic variants.
Others (Atmar 1979, Raven and Johnson 1986, pp 400–1) have suggested
that it is more appropriate to view the adaptive landscape from an inverted
position. The peaks become troughs, ‘minimized prediction error entropy wells’
(Atmar 1979). Searching for peaks depicts evolution as a slowly advancing,
tedious, uncertain process. Moreover, there appears to be a certain fragility to
an evolving phyletic line; an optimized population might be expected to quickly
fall off the peak under slight perturbations. The inverted topography leaves an
altogether different impression. Populations advance rapidly down the walls of
the error troughs until their cohesive set of interrelated behaviors is optimized,
References
Atmar W 1979 The inevitability of evolutionary invention, unpublished manuscript
Dawkins R 1986 The Blind Watchmaker (Oxford: Clarendon)
Dobzhansky T 1970 Genetics of the Evolutionary Processes (New York: Columbia
University Press)
Hoffman A 1989 Arguments on Evolution: a Paleontologist’s Perspective (New York:
Oxford University Press)
Huxley J 1963 The evolutionary process Evolution as a Process ed J Huxley, A C Hardy
and E B Ford (New York: Collier) pp 9–33
Lewontin R C 1974 The Genetic Basis of Evolutionary Change (New York: Columbia
University Press)
Malthus T R 1826 An Essay on the Principle of Population, as it Affects the Future
Improvement of Society 6th edn (London: Murray)
Mayr E 1959 Where are we? Cold Spring Harbor Symp. Quant. Biol. 24 409–40
——1963 Animal Species and Evolution (Cambridge, MA: Belknap)
——1982 The Growth of Biological Thought: Diversity, Evolution and Inheritance
(Cambridge, MA: Belknap)
——1988 Toward a New Philosophy of Biology: Observations of an Evolutionist
(Cambridge, MA: Belknap)
Raven P H and Johnson G B 1986 Biology (St Louis, MO: Times Mirror)
Simpson G G 1949 The Meaning of Evolution: a Study of the History of Life and its
Significance for Man (New Haven, CT: Yale University Press)
Stanley S M 1975 A theory of evolution above the species level Proc. Natl Acad. Sci.
USA 72 646–50
Wooldridge D E 1968 The Mechanical Man: the Physical Basis of Intelligent Life (New
York: McGraw-Hill)
Wright S 1931 Evolution in Mendelian populations Genetics 16 97–159
——1932 The roles of mutation, inbreeding, crossbreeding, and selection in evolution
Proc. 6th Int. Congr. on Genetics (Ithaca, NY) vol 1, pp 356–66
——1960 The evolution of life, panel discussion Evolution After Darwin: Issues in
Evolution vol 3, ed S Tax and C Callender (Chicago, IL: University of Chicago
Press)
5
Principles of genetics
Raymond C Paton
5.1 Introduction
The material covers a number of key areas which are necessary for understanding
the nature of the evolutionary process. We begin by looking at some basic ideas
of heredity and how variation occurs in interbreeding populations. From here
we look at the gene in more detail and then consider how it can undergo change.
The next section looks at aspects of population thinking needed to appreciate
selection. This is crucial to an appreciation of Darwinian mechanisms of
evolution. The chapter concludes with selected references to further information.
In order to keep this contribution within its size limits, the material is primarily
about the biology of higher plants and animals.
rule is not universally true when it comes to the distribution of sex chromosomes.
Human diploid cells contain 46 chromosomes: 22 homologous pairs plus one
pair of sex chromosomes. Sex is determined by this final pair, whose members
are designated X and Y; a female human has the sex chromosome genotype XX
and a male XY. The inheritance of sex is summarized in figure 5.2. The
members of a pair of nonsex chromosomes are said to be homologous (this is
also true for XX genotypes, whereas XY are not homologous).

of particular traits in peas. For example, he took plants that had wrinkled
seeds and plants that had round seeds and bred them with plants of the same
phenotype (i.e. observable appearance), so wrinkled were bred with wrinkled and
round were bred with round. He continued this over a number of generations
until round always produced round offspring and wrinkled, wrinkled. These
are called pure breeding plants. He then cross-fertilized the plants by breeding
rounds with wrinkles. The subsequent generation (called the F1 hybrids) was
all round. Then Mendel crossed the F1 hybrids with each other and found that
the next generation, the F2 hybrids, had round and wrinkled plants in the ratio
of 3 (round) : 1 (wrinkled).
Mendel did this kind of experiment with a number of pea characteristics,
such as seed color (yellow or green) and pod shape (inflated or constricted).
In each case he found that the F1 hybrids were always of one form and
the two forms reappeared in the F2. Mendel called the form which appeared in
the F1 generation dominant and the form which reappeared in the F2 recessive
(for the full text of Mendel’s experiments see an older genetics book, such as
that by Sinnott et al (1958)).
A modern interpretation of inheritance depends upon a proper understanding
of the nature of a gene and how the gene is expressed in the phenotype. The
nature of a gene is quite complex as we shall see later (see also Alberts et al
1989, Lewin 1990, Futuyma 1986). For now we shall take it to be the functional
unit of inheritance. An allele (allelomorph) is one of several forms of a gene
occupying a given locus (location) on a chromosome. Originally related to pairs
of contrasting characteristics (see examples above), the idea of observable unit
characters was introduced to genetics around the turn of this century by such
workers as Bateson, de Vries, and Correns (see Darden 1991). The concept of
a gene has tended to replace allele in general usage although the two terms are
not the same.
How can the results of Mendel’s experiments be interpreted? We know
that each parent plant provides half the chromosome complement found in its
offspring and that chromosomes in the diploid cells are in pairs of homologues.
In the pea experiments pure breeding parents had homologous chromosomes
which were identical for a particular gene; we say they are homozygous for
a particular gene. The pure breeding plants were produced through self-
fertilization and by selecting those offspring of the desired phenotype. As round
was dominant to wrinkled we say that the round form of the gene is R (‘big
r’) and the wrinkled r (‘little r’). Figure 5.3 summarizes the cross of a pure
breeding round (RR) with a pure breeding wrinkled (rr).
We see the appearance of the heterozygote (in this case Rr) in the F1
generation. This is phenotypically the same as the dominant phenotype but
genotypically contains both a dominant and a recessive form of the particular
gene under study. Thus when the heterozygotes are randomly crossed with
each other the phenotype ratio is three dominant : one recessive. This is called
the monohybrid ratio (i.e. for a single gene). We see in Mendel’s experiments
the independent segregation of alleles during breeding and their subsequent
independent assortment in offspring.
In the case of two genes we find more phenotypes and genotypes appearing.
Consider what happens when pure breeding homozygotes for round yellow seeds
(RRYY) are bred with pure breeding homozygotes for wrinkled green seeds
(rryy). On being crossed we end up with heterozygotes with a genotype of
RrYy and phenotype of round yellow seeds. We have seen that the genes
segregate independently during meiosis so we have the combinations shown in
figure 5.4.
Thus the gametes of the heterozygote can be of four kinds though we assume
that each form can occur with equal frequency. We may examine the possible
combinations of gametes for the next generation by producing a contingency
table for possible gamete combinations. These are shown in figure 5.5.
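The phenotype ratios implicit in such a contingency table can be checked mechanically. The following minimal Python sketch (not part of the original text) enumerates the sixteen gamete combinations of an RrYy × RrYy cross and tallies the phenotypes, recovering the classical 9:3:3:1 dihybrid ratio:

from itertools import product
from collections import Counter

# Each RrYy parent produces four gamete types with equal frequency.
gametes = [r + y for r, y in product('Rr', 'Yy')]   # RY, Ry, rY, ry

def phenotype(g1, g2):
    # A trait shows its dominant form if either gamete carries the
    # dominant (upper-case) allele.
    pair = g1 + g2
    return ('round' if 'R' in pair else 'wrinkled',
            'yellow' if 'Y' in pair else 'green')

counts = Counter(phenotype(g1, g2) for g1, g2 in product(gametes, repeat=2))
print(counts)   # 9 round/yellow : 3 round/green : 3 wrinkled/yellow : 1 wrinkled/green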
Within the cell, genes encoded in DNA are transcribed into messenger RNA
(mRNA), which is in turn translated into protein. The translation process converts the mRNA code into a
protein sequence via another form of RNA called transfer RNA (tRNA). In this
way, genes are transcribed so that mRNA may be produced, from which protein
molecules (typically the ‘workhorses’ and structural molecules of a cell) can be
formed. This flow of information is generally unidirectional. (For more details
on this topic the reader should consult a molecular biology text and look at the
central dogma of molecular biology, see e.g. Lewin 1990, Alberts et al 1989.)
Figure 5.11 provides a simplified view of the anatomy of a structural gene,
that is, one which codes for a protein or RNA.
That part of the gene which ultimately codes for protein or RNA is preceded
upstream by three stretches of code. The enhancer facilitates the operation of
the promoter region, which is where RNA polymerase is bound to the gene in
order to initiate transcription. The operator is the site where transcription can
be halted by the presence of a repressor protein. Exons are expressed in the
final gene product (e.g. the protein molecule) whereas introns are transcribed
but are removed from the transcript leaving the fragments of exon material to
be spliced. One stretch of DNA may consist of several overlapping genes. For
example, the introns in one gene may be the exons in another (Lewin 1990).
The terminator is the postexon region of the gene which causes transcription
to be terminated. Thus a biological gene contains not only code to be read
but also coded instructions on how it should be read and what should be read.
Genes are highly organized. An operon system is located on one chromosome
and consists of a regulator gene and a number of contiguous structural genes
which share the same promoter and terminator and code for enzymes which
are involved in specific metabolic pathways (the classical example is the Lac
operon, see figure 5.12).
Operons can be grouped together into higher-order (hierarchical) regulatory
genetic systems (Neidhardt et al 1990). For example, a number of operons
from different chromosomes may be regulated by a single gene; such a group of
coregulated operons is known as a regulon. These higher-order systems provide
great scope for change in a
genome. Modification of the higher-order gene can have profound effects on
the expression of structural genes that are under its influence.
The Hardy–Weinberg principle states that, in a large randomly mating population
free of disturbing factors, the genetic composition of the population
remains in equilibrium from one generation to the next. For a single gene with
two alleles, if the frequency of one form is p then that of the other (say q) is 1 − p.
The three genotypes that exist for this gene occur in the population proportions
p^2 + 2pq + q^2 = 1.
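For example, if p = 0.7 then q = 0.3, and the expected genotype proportions are p^2 = 0.49 (homozygous for the first form), 2pq = 0.42 (heterozygous), and q^2 = 0.09 (homozygous for the second form), which sum to one.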
This equation does not apply when a mixture of four factors changes the relative
frequencies of genes in a population: mutation, selection, gene flow, and random
genetic drift (drift). Drift is the effect of sampling error on gene frequencies:
each generation is a finite sample of its parents’ population, so a statistical
sampling error is associated with the gene frequencies it inherits. The effect
will be small in large populations because the relative
proportion of random changes will be a very small component of the large
numbers. However, drift in a small population will have a marked effect.
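This sampling effect is easy to simulate. The following minimal sketch (not from the original text; the population sizes and generation count are illustrative) tracks the frequency p of one allele when each generation draws its gene copies at random from the previous one:

import random

def drift(p, pop_size, generations):
    # Each generation draws 2*pop_size gene copies from its parents' pool.
    draws = 2 * pop_size
    for _ in range(generations):
        p = sum(random.random() < p for _ in range(draws)) / draws
    return p

random.seed(1)  # illustrative run
print(drift(0.5, 10, 100))     # small population: p usually fixes at 0.0 or 1.0
print(drift(0.5, 10000, 100))  # large population: p stays close to 0.5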
One factor which can counteract the effect of drift is differential migration
of individuals between populations which leads to gene flow. Several models of
gene flow exist. For example, migration which occurs at random among a group
of small populations is called the island model whereas in the stepping stone
model each population receives migrants only from neighboring populations.
Mutation, selection, and gene flow are deterministic factors so that if fitness,
mutation rate, and rate of gene flow are the same for a number of populations
that begin with the same gene frequencies, they will attain the same equilibrium
composition. Drift is a stochastic process because the sampling effect on the
parent population is random.
Sewall Wright introduced the idea of an adaptive landscape to explain how
a population’s allele frequencies might evolve over time. The peaks on the
landscape represent genetic compositions of a population for which the mean
fitness is high and troughs are possible compositions where the mean fitness
is low. As gene frequencies change and mean fitness increases the population
moves uphill. Indeed, selection will operate to increase mean fitness so, on
a multipeaked landscape, selection may operate to move populations to local
maxima. On a fixed landscape drift and selection can act together so that
populations may move uphill (through selection) or downhill (through drift).
This means that the global maximum for the landscape could be reached. These
ideas are formally encapsulated in Wright’s (1968–1978) shifting balance theory
of evolution. The relation of population genetics to evolutionary theory is
discussed further in the books by Wright (1968–1978), Crow and Kimura (1970),
and Maynard Smith (1989).
The change of gene frequencies coupled with changes in the genes
themselves can lead to the emergence of new species although the process
is far from simple and not fully understood (Futuyma 1986, Maynard Smith
1993). The nature of the species concept (or, for some, concepts), which is
central to Darwinism, is complicated and will not be discussed here (see e.g.
Futuyma 1986). Several mechanisms apply to promote speciation (Maynard
Smith 1993).
References
Alberts B, Bray D, Lewis J, Raff M, Roberts K and Watson J D 1989 Molecular Biology
of the Cell (New York: Garland)
Axelrod R 1984 The Evolution of Co-operation (Harmondsworth: Penguin)
Beaumont M A 1993 Evolution of optimal behaviour in networks of Boolean automata
J. Theor. Biol. 165 455–76
Changeux J-P and Dehaene S 1989 Neuronal models of cognitive functions Cognition
33 63–109
Clarke B, Mittenthal J E and Senn M 1993 A model for the evolution of networks of
genes J. Theor. Biol. 165 269–89
Collins R 1994 Artificial evolution and the paradox of sex Computing with Biological
Metaphors ed R C Paton (London: Chapman and Hall)
Crow J F and Kimura M 1970 An Introduction to Population Genetics Theory (New
York: Harper and Row)
Darden L 1991 Theory Change in Science (New York: Oxford University Press)
Darden L and Cain J A 1987 Selection type theories Phil. Sci. 56 106–29
Futuyma D J 1986 Evolutionary Biology (Sunderland, MA: Sinauer)
Goodwin B C and Saunders P T (eds) 1989 Theoretical Biology: Epigenetic and
Evolutionary Order from Complex Systems (Edinburgh: Edinburgh University
Press)
Hamilton W D, Axelrod R and Tanese R 1990 Sexual reproduction as an adaptation to
resist parasites Proc. Natl Acad. Sci. USA 87 3566–73
Hilario E and Gogarten J P 1993 Horizontal transfer of ATPase genes—the tree of life
becomes a net of life BioSystems 31 111–9
Kauffman S A 1993 The Origins of Order (New York: Oxford University Press)
Kimura M 1983 The Neutral Theory of Molecular Evolution (Cambridge: Cambridge
University Press)
Landman O E 1991 The inheritance of acquired characteristics Ann. Rev. Genet. 25 1–20
Lewin B 1990 Genes IV (Oxford: Oxford University Press)
Lima de Faria A 1988 Evolution without Selection (Amsterdam: Elsevier)
Margulis L and Fester R (eds) 1991 Symbiosis as a Source of Evolutionary Innovation:
Speciation and Morphogenesis (Cambridge, MA: MIT Press)
Manderick B 1994 The importance of selectionist systems for cognition Computing with
Biological Metaphors ed R C Paton (London: Chapman and Hall)
Maynard Smith J 1989 Evolutionary Genetics (Oxford: Oxford University Press)
——1993 The Theory of Evolution Canto edn (Cambridge: Cambridge University Press)
Neidhardt F C, Ingraham J L and Schaechter M 1990 Physiology of the Bacterial Cell
(Sunderland, MA: Sinauer)
Paton R C 1994 Enhancing evolutionary computation using analogues of biological
mechanisms Evolutionary Computing (Lecture Notes in Computer Science 865) ed
T C Fogarty (Berlin: Springer) pp 51–64
Sigmund K 1993 Games of Life (Oxford: Oxford University Press)
Sinnott E W, Dunn L C and Dobzhansky T 1958 Principles of Genetics (New York:
McGraw-Hill)
Sober E 1984 The Nature of Selection: Evolutionary Theory in Philosophical Focus
(Chicago, IL: University of Chicago Press)
Sumida B and Hamilton W D 1994 Both Wrightian and ‘parasite’ peak shifts enhance
genetic algorithm performance in the travelling salesman problem Computing with
Biological Metaphors ed R C Paton (London: Chapman and Hall)
Van Valen L 1973 A new evolutionary law Evolutionary Theory 1 1–30
Wright S 1968–1978 Evolution and the Genetics of Populations vols 1–4 (Chicago, IL:
Chicago University Press)
6
A history of evolutionary computation
6.1 Introduction
No one will ever produce a completely accurate account of a set of past events
since, as someone once pointed out, writing history is as difficult as forecasting.
Thus we dare to begin our historical summary of evolutionary computation
rather arbitrarily at a stage as recent as the mid-1950s.
At that time there was already evidence of the use of digital computer
models to better understand the natural process of evolution. One of the first
descriptions of the use of an evolutionary process for computer problem solving
appeared in the articles by Friedberg (1958) and Friedberg et al (1959). This
represented some of the early work in machine learning and described the use
of an evolutionary algorithm for automatic programming, i.e. the task of finding
a program that calculates a given input–output function. Other founders in the
field remember a paper of Fraser (1957) that influenced their early work, and
there may be many more such forerunners depending on whom one asks.
In the same time frame Bremermann presented some of the first attempts
to apply simulated evolution to numerical optimization problems involving both
linear and convex optimization as well as the solution of nonlinear simultaneous
equations (Bremermann 1962). Bremermann also developed some of the
early evolutionary algorithm (EA) theory, showing that the optimal mutation
probability for linearly separable problems should have the value 1/ℓ in the
case of ℓ bits encoding an individual (Bremermann et al 1965).
Also during this period Box developed his evolutionary operation (EVOP)
ideas which involved an evolutionary technique for the design and analysis of
(industrial) experiments (Box 1957, Box and Draper 1969). Box’s ideas were
never realized as a computer algorithm, although Spendley et al (1962) used
them as the basis for their so-called simplex design method. It is interesting to
note that the REVOP proposal (Satterthwaite 1959a, b) introducing randomness
into the EVOP operations was rejected at that time.
As is the case with many ground-breaking efforts, these early studies were
met with considerable skepticism. However, by the mid-1960s the bases for
what we today identify as the three main forms of EA were clearly established.
The roots of evolutionary programming (EP) (Chapter 10) were laid by Lawrence
Fogel in San Diego, California (Fogel et al 1966) and those of genetic algorithms
(GAs) (Chapter 8) were developed at the University of Michigan in Ann Arbor
by Holland (1967). On the other side of the Atlantic Ocean, evolution strategies
(ESs) (Chapter 9) were a joint development of a group of three students, Bienert,
Rechenberg, and Schwefel, in Berlin (Rechenberg 1965).
Over the next 25 years each of these branches developed quite independently
of each other, resulting in unique parallel histories which are described in more
detail in the following sections. However, in 1990 there was an organized effort
to provide a forum for interaction among the various EA research communities.
This took the form of an international workshop entitled Parallel Problem
Solving from Nature at Dortmund (Schwefel and Männer 1991).
Since that event the interaction and cooperation among EA researchers from
around the world have continued to grow. In the subsequent years special efforts
were made by the organizers of ICGA’91 (Belew and Booker 1991), EP’92
(Fogel and Atmar 1992), and PPSN’92 (Männer and Manderick 1992) to provide
additional opportunities for interaction.
This increased interaction led to a consensus for the name of this new field,
evolutionary computation (EC), and the establishment in 1993 of a journal by the
same name published by MIT Press. The increasing interest in EC was further
indicated by the IEEE World Congress on Computational Intelligence (WCCI)
at Orlando, Florida, in June 1994 (Michalewicz et al 1994), in which one of the
three simultaneous conferences was dedicated to EC along with conferences on
neural networks and fuzzy systems.
That brings us to the present in which the continued growth of the field is
reflected by the many EC events and related activities each year, and its growing
maturity reflected by the increasing number of books and articles about EC.
In order to keep this overview brief, we have deliberately suppressed many
of the details of the historical developments within each of the three main EC
streams. For the interested reader these details are presented in the following
sections.
6.2 Evolutionary programming
Evolutionary programming was devised by L J Fogel for the evolution of
finite-state machines as predictors: each machine in the population is exposed to
the sequence of symbols observed so far, and a prediction of the next symbol
is required. The best machine generates this prediction, the new symbol is added
to the experienced environment, and the process is repeated. Fogel (1964) (and
Fogel et al (1966)) used ‘nonregressive’ evolution. To be retained, a machine
had to rank in the best half of the population. Saving lesser-adapted machines
was discussed as a possibility (Fogel et al 1966, p 21) but not incorporated.
This general procedure was successfully applied to problems in prediction,
identification, and automatic control (Fogel et al 1964, 1966, Fogel 1968) and
was extended to simulate coevolving populations by Fogel and Burgin (1969).
Additional experiments evolving finite-state machines for sequence prediction,
pattern recognition, and gaming can be found in the work of Lutter and
Huntsinger (1969), Burgin (1969), Atmar (1976), Dearholt (1976), and
Takeuchi (1980).
In the mid-1980s the general EP procedure was extended to alternative
representations including ordered lists for the traveling salesman problem (Fogel
and Fogel 1986), and real-valued vectors for continuous function optimization
(Fogel and Fogel 1986). This led to other applications in route planning
(Fogel 1988, Fogel and Fogel 1988), optimal subset selection (Fogel 1989),
and training neural networks (Fogel et al 1990), as well as comparisons to other
methods of simulated evolution (Fogel and Atmar 1990). Methods for extending
evolutionary search to a two-step process including evolution of the mutation
variance were offered by Fogel et al (1991, 1992). Just as the proper choice of
step sizes is a crucial part of every numerical process, including optimization, the
internal adaptation of the mutation variance(s) is of utmost importance for the
algorithm’s efficiency. This process is called self-adaptation or autoadaptation
in the case of no explicit control mechanism, e.g. if the variances are part of
the individuals’ characteristics and undergo probabilistic variation in a similar
way to the ordinary decision variables.
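A common realization of this idea is the following sketch, written under the assumption of lognormal step-size variation as used in later EP and ES practice; the learning rate τ is a conventional choice, not one prescribed by the papers cited above:

import math
import random

def self_adaptive_mutation(x, sigma):
    # x: decision variables; sigma: their mutation step sizes.
    n = len(x)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))   # conventional learning rate
    # Mutate the step sizes first (lognormal perturbation), then use the
    # new step sizes to mutate the variables themselves.
    new_sigma = [s * math.exp(tau * random.gauss(0.0, 1.0)) for s in sigma]
    new_x = [xi + si * random.gauss(0.0, 1.0) for xi, si in zip(x, new_sigma)]
    return new_x, new_sigma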
In the early 1990s efforts were made to organize annual conferences on EP,
these leading to the first conference in 1992 (Fogel and Atmar 1992). This
conference offered a variety of optimization applications of EP in robotics
(McDonnell et al 1992, Andersen et al 1992), path planning (Larsen and
Herman 1992, Page et al 1992), neural network design and training (Sebald
and Fogel 1992, Porto 1992, McDonnell 1992), automatic control (Sebald et al
1992), and other fields.
First contacts were made between the EP and ES communities just
before this conference, and the similar but independent paths that these two
approaches had taken to simulating the process of evolution were clearly
apparent. Members of the ES community have participated in all successive
EP conferences (Bäck et al 1993, Sprave 1994, Bäck and Schütz 1995, Fogel et
al 1996). There is less similarity between EP and GAs, as the latter emphasize
simulating specific mechanisms that apply to natural genetic systems whereas
EP emphasizes the behavioral, rather than genetic, relationships between parents
and their offspring. Members of the GA and GP communities have, however,
also been invited to participate in the annual conferences, making for truly
interdisciplinary interaction (see e.g. Altenberg 1994, Land and Belew 1995,
Koza and Andre 1996).
Since the early 1990s, efforts in EP have diversified in many directions.
Applications in training neural networks have received considerable attention
(see e.g. English 1994, Angeline et al 1994, McDonnell and Waagen 1994,
Porto et al 1995), while relatively less attention has been devoted to evolving
fuzzy systems (Haffner and Sebald 1993, Kim and Jeon 1996). Image processing
applications can be found in the articles by Bhattacharjya and Roysam (1994),
Brotherton et al (1994), Rizki et al (1995), and others. Recent efforts to use
EP in medicine have been offered by Fogel et al (1995) and Gehlhaar et al
(1995). Efforts studying and comparing methods of self-adaptation can be
found in the articles by Saravanan et al (1995), Angeline et al (1996), and
others. Mathematical analyses of EP have been summarized by Fogel (1995).
To offer a summary, the initial efforts of L J Fogel indicate some of the
early attempts to (i) use simulated evolution to perform prediction, (ii) include
variable-length encodings, (iii) use representations that take the form of a
sequence of instructions, (iv) incorporate a population of candidate solutions, and
(v) coevolve evolutionary programs. Moreover, Fogel (1963, 1964) and Fogel
et al (1966) offered the early recognition that natural evolution and the human
endeavor of the scientific method are essentially similar processes, a notion
recently echoed by Gell-Mann (1994). The initial prescriptions for operating
on finite-state machines have been extended to arbitrary representations,
mutation operators, and selection methods, and techniques for self-adapting the
evolutionary search have been proposed and implemented. The population size
need not be kept constant and there can be a variable number of offspring
per parent, much like the (µ + λ) methods (Section 25.4) offered in ESs. In
contrast to these methods, selection is often made probabilistic in EP, giving
lesser-scoring solutions some probability of surviving as parents into the next
generation. In contrast to GAs, no effort is made in EP to support (some say
maximize) schema processing, nor is the use of random variation constrained
to emphasize specific mechanisms of genetic transfer, perhaps providing greater
versatility to tackle specific problem domains that are unsuitable for genetic
operators such as crossover.
6.3 Genetic algorithms
The first glimpses of the ideas underlying genetic algorithms (GAs) are found in
Holland’s papers in the early 1960s (see e.g. Holland 1962). In them Holland set
out a broad and ambitious agenda for understanding the underlying principles
of adaptive systems—systems that are capable of self-modification in response
to their interactions with the environments in which they must function. Such a
theory of adaptive systems should facilitate both the understanding of complex
forms of adaptation as they appear in natural systems and our ability to design
robust adaptive artifacts.
In Holland’s view the key feature of robust natural adaptive systems
was the successful use of competition and innovation to provide the ability
to dynamically respond to unanticipated events and changing environments.
Simple models of biological evolution were seen to capture these ideas nicely via
notions of survival of the fittest and the continuous production of new offspring.
This theme of using evolutionary models both to understand natural adaptive
systems and to design robust adaptive artifacts gave Holland’s work a somewhat
different focus than those of other contemporary groups that were exploring the
use of evolutionary models in the design of efficient experimental optimization
techniques (Rechenberg 1965) or for the evolution of intelligent agents (Fogel
et al 1966), as reported in the previous section.
By the mid-1960s Holland’s ideas began to take on various computational
forms as reflected by the PhD students working with Holland. From the outset
these systems had a distinct ‘genetic’ flavor to them in the sense that the
objects to be evolved over time were represented internally as ‘genomes’ and the
mechanisms of reproduction and inheritance were simple abstractions of familiar
population genetics operators such as mutation, crossover, and inversion.
Bagley’s thesis (Bagley 1967) involved tuning sets of weights used in the
evaluation functions of game-playing programs, and represents some of the
earliest experimental work in the use of diploid representations, the role of
inversion, and selection mechanisms. By contrast Rosenberg’s thesis (Rosenberg
1967) has a very distinct flavor of simulating the evolution of a simple
biochemical system in which single-celled organisms capable of producing
enzymes were represented in diploid fashion and were evolved over time to
produce appropriate chemical concentrations. Of interest here is some of the
earliest experimentation with adaptive crossover operators.
Cavicchio’s thesis (Cavicchio 1970) focused on viewing these ideas as a form
of adaptive search, and tested them experimentally on difficult search problems
involving subroutine selection and pattern recognition. In his work we see
some of the early studies on elitist (Section 28.4) forms of selection and ideas
for adapting the rates of crossover and mutation. Hollstien’s thesis (Hollstien
1971) took the first detailed look at alternate selection and mating schemes.
Using a test suite of two-dimensional fitness landscapes, he experimented with
a variety of breeding strategies drawn from techniques used by animal breeders.
Also of interest here is Hollstien’s use of binary string encodings of the genome
and early observations about the virtues of Gray codings.
In parallel with these experimental studies, Holland continued to work on
a general theory of adaptive systems (Holland 1967). During this period he
developed his now famous schema analysis of adaptive systems, relating it to
the optimal allocation of trials using k-armed bandit models (Holland 1969).
He used these ideas to develop a more theoretical analysis of his reproductive
plans (simple GAs) (Holland 1971, 1973). Holland then pulled all of these
ideas together in his pivotal book Adaptation in Natural and Artificial Systems
(Holland 1975).
Of interest was the fact that many of the desirable properties of these
algorithms that Holland had identified theoretically were frequently not
observed experimentally. It was not difficult to identify the reasons for this.
Hampered by a lack of computational resources and analysis tools, most of
the early experimental studies involved a relatively small number of runs using
small population sizes (generally less than 20). It became increasingly clear
that many of the observed deviations from expected behavior could be traced
to the well-known phenomenon in population genetics of genetic drift, the loss
of genetic diversity due to the stochastic aspects of selection, reproduction, and
the like in small populations.
By the early 1970s there was considerable interest in understanding better
the behavior of implementable GAs. In particular, it was clear that choices
of population size, representation issues, the choice of operators and operator
rates all had significant effects on the observed behavior of GAs. Frantz’s thesis
(Frantz 1972) reflected this new focus by studying in detail the roles of crossover
and inversion in populations of size 100. Of interest here is some of the earliest
experimental work on multipoint crossover operators.
De Jong’s thesis (De Jong 1975) broadened this line of study by analyzing
both theoretically and experimentally the interacting effects of population size,
crossover, and mutation on the behavior of a family of GAs being used to
optimize a fixed test suite of functions. Out of this study came a strong sense that
even these simple GAs had significant potential for solving difficult optimization
problems.
The mid-1970s also represented a branching out of the family tree of GAs
as other universities and research laboratories established research activities in
this area. This happened slowly at first since initial attempts to spread the word
about the progress being made in GAs were met with fairly negative perceptions
from the artificial intelligence (AI) community as a result of early overhyped
work in areas such as self-organizing systems and perceptrons.
Undaunted, groups from several universities including the University of
Michigan, the University of Pittsburgh, and the University of Alberta organized
an Adaptive Systems Workshop in the summer of 1976 in Ann Arbor, Michigan.
About 20 people attended and agreed to meet again the following summer. This
pattern repeated itself for several years, but by 1979 the organizers felt the
need to broaden the scope and make things a little more formal. Holland, De
Jong, and Sampson obtained NSF funding for An Interdisciplinary Workshop in
Adaptive Systems, which was held at the University of Michigan in the summer
of 1981 (Sampson 1981).
By this time there were several established research groups working on GAs.
At the University of Michigan, Bethke, Goldberg, and Booker were continuing
to develop GAs and explore Holland’s classifier systems (Chapter 12) as part
of their PhD research (Bethke 1981, Booker 1982, Goldberg 1983). At the
and a growing list of journal papers. New paradigms such as messy GAs
(Goldberg et al 1991) and genetic programming (Chapter 11) (Koza 1992)
were being developed. The interactions with other EC communities resulted
in considerable crossbreeding of ideas and many new hybrid EAs. New GA
applications continue to be developed, spanning a wide range of problem areas
from engineering design problems to operations research problems to automatic
programming.
6.4 Evolution strategies
Rudolph 1995, Bäck and Schwefel 1995, Schwefel and Bäck 1995), which on
the one hand define the actual standard ES algorithms and on the other hand
present some recent theoretical results.
References
Altenberg L 1994 Emergent phenomena in genetic programming Proc. 3rd Annu. Conf.
on Evolutionary Programming (San Diego, CA, 1994) ed A V Sebald and L J Fogel
(Singapore: World Scientific) pp 233–41
Andersen B, McDonnell J and Page W 1992 Configuration optimization of mobile
manipulators with equality constraints using evolutionary programming Proc. 1st
Ann. Conf. on Evolutionary Programming (La Jolla, CA, 1992) ed D B Fogel and
W Atmar (La Jolla, CA: Evolutionary Programming Society) pp 71–9
Angeline P J, Fogel D B and Fogel L J 1996 A comparison of self-adaptation methods
for finite state machines in a dynamic environment Evolutionary Programming
V—Proc. 5th Ann. Conf. on Evolutionary Programming (1996) ed L J Fogel,
P J Angeline and T Bäck (Cambridge, MA: MIT Press)
Angeline P J, Saunders G M and Pollack J B 1994 An evolutionary algorithm that
constructs recurrent neural networks IEEE Trans. Neural Networks NN-5 54–65
Atmar J W 1976 Speculation of the Evolution of Intelligence and Its Possible Realization
in Machine Form ScD Thesis, New Mexico State University
Bäck T 1996 Evolutionary Algorithms in Theory and Practice (New York: Oxford
University Press)
Bäck T, Hoffmeister F and Schwefel H-P 1991 A survey of evolution strategies Proc.
4th Int. Conf. on Genetic Algorithms (San Diego, CA, 1991) ed R K Belew and
L B Booker (San Mateo, CA: Morgan Kaufmann) pp 2–9
——1992 Applications of Evolutionary Algorithms Technical Report of the University
of Dortmund Department of Computer Science Systems Analysis Research Group
SYS-2/92
Bäck T, Rudolph G and Schwefel H-P 1993 Evolutionary programming and evolution
strategies: similarities and differences Proc. 2nd Ann. Conf. on Evolutionary
Programming (San Diego, CA, 1993) ed D B Fogel and W Atmar (La Jolla, CA:
Evolutionary Programming Society) pp 11–22
Bäck T and Schütz M 1995 Evolution strategies for mixed-integer optimization of
optical multilayer systems Evolutionary Programming IV—Proc. 4th Ann. Conf on
Evolutionary Programming (San Diego, CA, 1995) ed J R McDonnell, R G Reynolds
and D B Fogel (Cambridge, MA: MIT Press) pp 33–51
Bäck T and Schwefel H-P 1995 Evolution strategies I: variants and their computational
implementation Genetic Algorithms in Engineering and Computer Science, Proc. 1st
Short Course EUROGEN-95 ed G Winter, J Périaux, M Galán and P Cuesta (New
York: Wiley) pp 111–26
Bagley J D 1967 The Behavior of Adaptive Systems which Employ Genetic and
Correlation Algorithms PhD Thesis, University of Michigan
Belew R K and Booker L B (eds) 1991 Proc. 4th Int. Conf. on Genetic Algorithms (San
Diego, CA, 1991) (San Mateo, CA: Morgan Kaufmann)
Bethke A D 1981 Genetic Algorithms as Function Optimizers PhD Thesis, University of
Michigan
Beyer H-G 1995a How GAs do Not Work—Understanding GAs Without Schemata and
Building Blocks Technical Report of the University of Dortmund Department of
Computer Science Systems Analysis Research Group SYS-2/95
——1995b Toward a theory of evolution strategies: on the benefit of sex—the (µ/µ, λ)-
theory Evolutionary Comput. 3 81–111
Bhattacharjya A K and Roysam B 1994 Joint solution of low-, intermediate- and high-
level vision tasks by evolutionary optimization: application to computer vision at
low SNR IEEE Trans. Neural Networks NN-5 83–95
Bienert P 1967 Aufbau einer Optimierungsautomatik für drei Parameter Dipl.-Ing. Thesis,
Technical University of Berlin, Institute of Measurement and Control Technology
Booker L 1982 Intelligent Behavior as an Adaptation to the Task Environment PhD
Thesis, University of Michigan
Born J 1978 Evolutionsstrategien zur numerischen Lösung von Adaptationsaufgaben PhD
Thesis, Humboldt University at Berlin
Box G E P 1957 Evolutionary operation: a method for increasing industrial productivity
Appl. Stat. 6 81–101
Box G E P and Draper N P 1969 Evolutionary Operation. A Method for Increasing
Industrial Productivity (New York: Wiley)
Bremermann H J 1962 Optimization through evolution and recombination Self-
Organizing Systems ed M C Yovits et al (Washington, DC: Spartan)
Bremermann H J, Rogson M and Salaff S 1965 Search by evolution Biophysics
and Cybernetic Systems—Proc. 2nd Cybernetic Sciences Symp. ed M Maxfield,
A Callahan and L J Fogel (Washington, DC: Spartan) pp 157–67
Brindle A 1981 Genetic Algorithms for Function Optimization PhD Thesis, University
of Alberta
Brotherton T W, Simpson P K, Fogel D B and Pollard T 1994 Classifier design using
evolutionary programming Proc. 3rd Ann. Conf. on Evolutionary Programming (San
Diego, CA, 1994) ed A V Sebald and L J Fogel (Singapore: World Scientific)
pp 68–75
Burgin G H 1969 On playing two-person zero-sum games against nonminimax players
IEEE Trans. Syst. Sci. Cybernet. SSC-5 369–70
Cavicchio D J 1970 Adaptive Search Using Simulated Evolution PhD Thesis, University
of Michigan
Davis L 1987 Genetic Algorithms and Simulated Annealing (London: Pitman)
Dearholt D W 1976 Some experiments on generalization using evolving automata Proc.
9th Int. Conf. on System Sciences (Honolulu, HI) pp 131–3
De Jong K A 1975 Analysis of Behavior of a Class of Genetic Adaptive Systems PhD
Thesis, University of Michigan
English T M 1994 Generalization in populations of recurrent neural networks Proc. 3rd
Ann. Conf. on Evolutionary Programming (San Diego, CA, 1994) ed A V Sebald
and L J Fogel (Singapore: World Scientific) pp 26–33
Fogel D B 1988 An evolutionary approach to the traveling salesman problem Biol.
Cybernet. 60 139–44
——1989 Evolutionary programming for voice feature analysis Proc. 23rd Asilomar
Conf. on Signals, Systems and Computers (Pacific Grove, CA) pp 381–3
——1995 Evolutionary Computation: Toward a New Philosophy of Machine Intelligence
(New York: IEEE)
Fogel D B and Atmar J W 1990 Comparing genetic operators with Gaussian mutations
in simulated evolutionary processing using linear systems Biol. Cybernet. 63 111–4
——(eds) 1992 Proc. 1st Ann. Conf. on Evolutionary Programming (La Jolla, CA, 1992)
(La Jolla, CA: Evolutionary Programming Society)
Fogel D B and Fogel L J 1988 Route optimization through evolutionary programming
Proc. 22nd Asilomar Conf. on Signals, Systems and Computers (Pacific Grove, CA)
pp 679–80
Fogel D B, Fogel L J and Atmar J W 1991 Meta-evolutionary programming Proc. 25th
Asilomar Conf. on Signals, Systems and Computers (Pacific Grove, CA) ed R R Chen
pp 540–5
Fogel D B, Fogel L J, Atmar J W and Fogel G B 1992 Hierarchic methods of evolutionary
programming Proc. 1st Ann. Conf. on Evolutionary Programming (La Jolla, CA,
1992) ed D B Fogel and W Atmar (La Jolla, CA: Evolutionary Programming
Society) pp 175–82
Fogel D B, Fogel L J and Porto V W 1990 Evolving neural networks Biol. Cybernet. 63
487–93
Fogel D B, Wasson E C and Boughton E M 1995 Evolving neural networks for detecting
breast cancer Cancer Lett. 96 49–53
Fogel L J 1962 Autonomous automata Industrial Res. 4 14–9
——1963 Biotechnology: Concepts and Applications (Englewood Cliffs, NJ: Prentice-
Hall)
——1964 On the Organization of Intellect PhD Thesis, University of California at Los
Angeles
——1968 Extending communication and control through simulated evolution
Bioengineering—an Engineering View Proc. Symp. on Engineering Significance of
the Biological Sciences ed G Bugliarello (San Francisco, CA: San Francisco Press)
pp 286–304
Fogel L J, Angeline P J and Bäck T (eds) 1996 Evolutionary Programming V—Proc. 5th
Ann. Conf. on Evolutionary Programming (1996) (Cambridge, MA: MIT Press)
Fogel L J and Burgin G H 1969 Competitive Goal-seeking through Evolutionary
Programming Air Force Cambridge Research Laboratories Final Report Contract
AF 19(628)-5927
Fogel L J and Fogel D B 1986 Artificial Intelligence through Evolutionary Programming
US Army Research Institute Final Report Contract PO-9-X56-1102C-1
Fogel L J, Owens A J and Walsh M J 1964 On the evolution of artificial intelligence
Proc. 5th Natl Symp. on Human Factors in Electronics (San Diego, CA: IEEE)
——1965 Artificial intelligence through a simulation of evolution Biophysics and
Cybernetic Systems ed A Callahan, M Maxfield and L J Fogel (Washington, DC:
Spartan) pp 131–56
——1966 Artificial Intelligence through Simulated Evolution (New York: Wiley)
Frantz D R 1972 Non-linearities in Genetic Adaptive Search PhD Thesis, University of
Michigan
Fraser A S 1957 Simulation of genetic systems by automatic digital computers Aust. J.
Biol. Sci. 10 484–99
Friedberg R M 1958 A learning machine: part I IBM J. 2 2–13
Friedberg R M, Dunham B and North J H 1959 A learning machine: part II IBM J. 3
282–7
Fürst H, Müller P H and Nollau V 1968 Eine stochastische Methode zur Ermittlung
der Maximalstelle einer Funktion von mehreren Veränderlichen mit experimentell
ermittelbaren Funktionswerten und ihre Anwendung bei chemischen Prozessen
Chem.–Tech. 20 400–5
Gehlhaar D K et al 1995 Molecular recognition of the inhibitor
AG-1343 by HIV-1 protease: conformationally flexible docking by evolutionary
programming Chem. Biol. 2 317–24
Gell-Mann M 1994 The Quark and the Jaguar (New York: Freeman)
Goldberg D E 1983 Computer-Aided Gas Pipeline Operation using Genetic Algorithms
and Rule Learning PhD Thesis, University of Michigan
——1989 Genetic Algorithms in Search, Optimization and Machine Learning (Reading,
MA: Addison-Wesley)
Goldberg D E, Deb K and Korb B 1991 Don’t worry, be messy Proc. 4th Int. Conf. on
Genetic Algorithms (San Diego, CA, 1991) ed R K Belew and L B Booker (San
Mateo, CA: Morgan Kaufmann) pp 24–30
Grefenstette J J (ed) 1985 Proc. 1st Int. Conf. on Genetic Algorithms and Their
Applications (Pittsburgh, PA, 1985) (Hillsdale, NJ: Erlbaum)
——1987 Proc. 2nd Int. Conf. on Genetic Algorithms and Their Applications (Cambridge,
MA, 1987) (Hillsdale, NJ: Erlbaum)
Haffner S B and Sebald A V 1993 Computer-aided design of fuzzy HVAC
controllers using evolutionary programming Proc. 2nd Ann. Conf. on Evolutionary
Programming (San Diego, CA, 1993) ed D B Fogel and W Atmar (La Jolla, CA:
Evolutionary Programming Society) pp 98–107
Heydt G T 1970 Directed Random Search PhD Thesis, Purdue University
Hoffmeister F and Schwefel H-P 1990 A taxonomy of parallel evolutionary algorithms
Parcella ’90, Proc. 5th Int. Workshop on Parallel Processing by Cellular Automata
and Arrays vol 2, ed G Wolf, T Legendi and U Schendel (Berlin: Academic)
pp 97–107
Holland J H 1962 Outline for a logical theory of adaptive systems J. ACM 9 297–314
——1967 Nonlinear environments permitting efficient adaptation Computer and
Information Sciences II (New York: Academic)
——1969 Adaptive plans optimal for payoff-only environments Proc. 2nd Hawaii Int.
Conf. on System Sciences pp 917–20
——1971 Processing and processors for schemata Associative information processing ed
E L Jacks (New York: Elsevier) pp 127–46
——1973 Genetic algorithms and the optimal allocation of trials SIAM J. Comput. 2
88–105
——1975 Adaptation in Natural and Artificial Systems (Ann Arbor, MI: University of
Michigan Press)
Hollstien R B 1971 Artificial Genetic Adaptation in Computer Control Systems PhD
Thesis, University of Michigan
Kim J-H and Jeon J-Y 1996 Evolutionary programming-based high-precision controller
design Evolutionary Programming V—Proc. 5th Ann. Conf. on Evolutionary
Programming (1996) ed L J Fogel, P J Angeline and T Bäck (Cambridge, MA:
MIT Press)
Klockgether J and Schwefel H-P 1970 Two-phase nozzle and hollow core jet experiments
Proc. 11th Symp. on Engineering Aspects of Magnetohydrodynamics ed D G Elliott
(Pasadena, CA: California Institute of Technology) pp 141–8
7
Introduction to evolutionary algorithms
Thomas Bäck
References
Bäck T 1996 Evolutionary Algorithms in Theory and Practice (New York: Oxford
University Press)
Bäck T and Schwefel H-P 1993 An overview of evolutionary algorithms for parameter
optimization Evolutionary Computation 1(1) 1–23
Fogel D B 1992 Evolving Artificial Intelligence PhD Thesis, University of California,
San Diego
——1995 Evolutionary Computation: Toward a New Philosophy of Machine Intelligence
(Piscataway, NJ: IEEE)
Fogel L J 1962 Autonomous automata Industrial Res. 4 14–9
Fogel L J, Owens A J and Walsh M J 1966 Artificial Intelligence through Simulated
Evolution (New York: Wiley)
Holland J H 1962 Outline for a logical theory of adaptive systems J. ACM 9 297–314
——1975 Adaptation in Natural and Artificial Systems (Ann Arbor, MI: University of
Michigan Press)
Rechenberg I 1965 Cybernetic solution path of an experimental problem Library
Translation No 1122 Royal Aircraft Establishment, Farnborough, UK
——1973 Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der
biologischen Evolution (Stuttgart: Frommann-Holzboog)
Schwefel H-P 1965 Kybernetische Evolution als Strategie der experimentellen Forschung
in der Strömungstechnik Diplomarbeit, Technische Universität, Berlin
——1977 Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrate-
gie Interdisciplinary Systems Research, vol 26 (Basel: Birkhäuser)
Further reading
The introductory section to evolutionary algorithms certainly provides the right
place to mention the most important books on evolutionary computation and its
subdisciplines. The following list is not intended to be complete, but only to
guide the reader to the literature.
1. Bäck T 1996 Evolutionary Algorithms in Theory and Practice (New York: Oxford
University Press)
A presentation and comparison of evolution strategies, evolutionary programming,
and genetic algorithms with respect to their behavior as parameter optimization
methods. Furthermore, the role of mutation and selection in genetic algorithms is
discussed in detail, arguing that mutation is much more useful than usually claimed
in connection with genetic algorithms.
8
Genetic algorithms
Larry J Eshelman
8.1 Introduction
Genetic algorithms (GAs) are a class of evolutionary algorithms first proposed
and analyzed by John Holland (1975). There are three features which distinguish
GAs, as first proposed by Holland, from other evolutionary algorithms: (i)
the representation used—bitstrings (Chapter 15); (ii) the method of selection—
proportional selection (Chapter 23); and (iii) the primary method of producing
variations—crossover (Chapter 33). Of these three features, however, it
is the emphasis placed on crossover which makes GAs distinctive. Many
subsequent GA implementations have adopted alternative methods of selection,
and many have abandoned bitstring representations for other representations
more amenable to the problems being tackled. Although many alternative
methods of crossover have been proposed, in almost every case these variants
are inspired by the spirit which underlies Holland’s original analysis of GA
behavior in terms of the processing of schemata or building blocks. It should be
pointed out, however, that the evolution strategy paradigm (Chapter 9) has added
crossover to its repertoire, so that the distinction between classes of evolutionary
algorithms has become blurred (Bäck et al 1991).
We shall begin by outlining what might be called the canonical GA, similar
to that described and analyzed by Holland (1975) and Goldberg (1987). We
shall introduce a framework for describing GAs which is richer than needed
but which is convenient for describing some variations with regard to the
method of selection. First we shall introduce some terminology. The individual
structures are often referred to as chromosomes. They are the genotypes that
are manipulated by the GA. The evaluation routine decodes these structures
into some phenotypical structure and assigns a fitness value. Typically, but not
necessarily, the chromosomes are bitstrings. The value at each locus on the
bitstring is referred to as an allele. Sometimes the individual loci are also
called genes. At other times genes are combinations of alleles that have some
phenotypical meaning, such as parameters.
8.2 Genetic algorithm basics and some variations
The canonical GA, in outline, is as follows:
begin
    t = 0;
    initialize P(t);                           // P(t): population at generation t
    evaluate structures in P(t);
    while termination condition not satisfied do
    begin
        t = t + 1;
        select_repro C(t) from P(t-1);         // reproductive selection
        recombine and mutate structures in C(t) forming C'(t);
        evaluate structures in C'(t);
        select_replace P(t) from C'(t) and P(t-1);   // replacement selection
    end
end
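To make the outline concrete, here is a minimal runnable sketch of a canonical GA in Python (not from the original text): bitstring chromosomes, proportional selection, one-point crossover, bitwise mutation, and unbiased generational replacement. The ‘onemax’ fitness function and all parameter values are illustrative choices only.

import random

L, M, GENERATIONS = 20, 30, 50     # string length, population size, cycles
P_CROSS, P_MUT = 0.6, 1.0 / L      # crossover and per-bit mutation rates

def fitness(chrom):                # illustrative 'onemax' evaluation
    return sum(chrom)

def select(pop):                   # proportional (roulette-wheel) selection
    total = sum(fitness(c) for c in pop)
    r = random.uniform(0, total)
    acc = 0.0
    for c in pop:
        acc += fitness(c)
        if acc >= r:
            return c
    return pop[-1]

def crossover(a, b):               # one-point crossover
    if random.random() < P_CROSS:
        point = random.randrange(1, L)
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]

def mutate(c):                     # bitwise mutation
    return [bit ^ 1 if random.random() < P_MUT else bit for bit in c]

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(M)]
for t in range(GENERATIONS):
    children = []
    while len(children) < M:
        a, b = select(pop), select(pop)
        x, y = crossover(a, b)
        children += [mutate(x), mutate(y)]
    pop = children[:M]             # unbiased generational replacement
print(max(fitness(c) for c in pop))

Here select corresponds to select_repro in the outline, and the final line of the loop plays the role of select_replace; replacing the whole population with the children is the unbiased strategy discussed below.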
After the new offspring have been created via the genetic operators the two
populations of parents and children must be merged to create a new population.
Since most GAs maintain a fixed-sized population M, this means that a total
of M individuals need to be selected from the parent and child populations to
create a new population. One possibility is to use all the children generated
(assuming that the number is not greater than M) and randomly select (without
any bias) individuals from the old population to bring the new population up
to size M. If only one or two new offspring are produced, this in effect means
randomly replacing one or two individuals in the old population with the new
offspring. (This is what Holland’s original proposal did.) On the other hand, if
the number of offspring created is equal to M, then the old parent population is
completely replaced by the child population. A common bias is to retain the best
individual from the parent population, in which case only M − 1
of the M members of the child population are chosen. Depending upon the
implementation, the selection of the child to be replaced by the best individual
from the parent population may or may not be biased.
A number of GA variations make use of biased replacement selection.
Whitley’s GENITOR, for example, creates one child each cycle, selecting the
parents using ranked selection, and then replacing the worst member of the
population with the new child (Whitley 1989). Syswerda’s steady-state GA
creates two children each cycle, selecting parents using ranked selection, and
then stochastically choosing two individuals to be replaced, with a bias towards
the worst individuals in the parent population (Syswerda 1989). Eshelman’s
CHC uses unbiased reproductive selection by randomly pairing all the members
of the parent population, and then replacing the worst individuals of the parent
population with the better individuals of the child population. (In effect, the
offspring and parent populations are merged and the best M (population size)
individuals are chosen.) Since the new offspring are only chosen by CHC if
they are better than the members of the parent population, the selection of both
the offspring and parent populations is biased (Eshelman 1991).
These methods of replacement selection, and especially that of CHC,
resemble the (µ + λ) ES method of selection (Section 25.4) originally
used by evolution strategies (ESs) (Bäck et al 1991). From µ parents λ
offspring are produced; the µ parents and λ offspring are merged; and the
best µ individuals are chosen to form the new parent population. The other
ES selection method, (µ, λ) ES (Section 25.4), places all the bias in the child
selection stage. In this case, µ parents produce λ offspring (λ > µ), and the best
µ offspring are chosen to replace the parent population. Mühlenbein’s breeder
GA also uses this selection mechanism (Mühlenbein and Schlierkamp-Voosen
1993).
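Both replacement schemes are compact enough to state directly in code; the following is a minimal sketch (function and parameter names are illustrative):

def plus_selection(parents, children, fitness):
    # (mu + lambda): merge parents and offspring, keep the best mu.
    merged = parents + children
    return sorted(merged, key=fitness, reverse=True)[:len(parents)]

def comma_selection(parents, children, fitness):
    # (mu, lambda): the best mu offspring replace the parents entirely
    # (requires lambda > mu).
    return sorted(children, key=fitness, reverse=True)[:len(parents)]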
Often a distinction is made between generational and steady-state GAs
(Section 28.3). Unfortunately, this distinction tends to merge two properties that
are quite independent: whether the replacement strategy of the GA is biased
or not and whether the GA produces one (or two) versus many (usually M)
offspring each cycle. Syswerda’s steady-state GA, like Whitley’s GENITOR,
allows only one mating per cycle and uses a biased replacement selection,
but there are also GAs that combine multiple matings per cycle with biased
replacement selection (CHC) as well as a whole class of ESs ((µ + λ) ES).
Furthermore, the GA described by Holland (1975) combined a single mating per
cycle and unbiased replacement selection. Of these two features, it would seem
that the most significant is the replacement strategy. De Jong and Sarma (1993)
found that the main difference between GAs allowing many matings versus few
matings per cycle is that the latter have a higher variance in performance.
The choice between a biased and an unbiased replacement strategy, on the
other hand, is a major determinant of GA behavior. First, if biased replacement
is used in combination with biased reproduction, then the problem of premature
convergence is likely to be compounded. (Of course this will depend upon
other factors, such as the size of the population, whether ranked selection is
used, and, if so, the setting of the selection bias parameter.) Second, the obvious
shortcoming of unbiased replacement selection can turn out to be a strength. On
the negative side, replacing the parents by the children, with no mechanism for
keeping those parents that are better than any of the children, risks losing,
perhaps forever, very good individuals. On the other hand, replacing the
parents by the children can allow the algorithm to wander, and it may be
able to wander out of a local optimum that would trap a GA relying upon
biased replacement selection. Which is the better strategy cannot be answered
except in the context of the other mechanisms of the algorithm (as well as the
nature of the problem being solved). Both Syswerda’s steady-state GA and
Whitley’s GENITOR combine a biased replacement strategy with a mechanism
for eliminating children which are duplicates of any member in the parent
population. CHC uses unbiased reproductive selection, relying solely upon
biased replacement selection as its only source of selection pressure, and uses
several mechanisms for maintaining diversity (not mating similar individuals and
seeded restarts), which allow it to take advantage of the preserving properties
of a deterministic replacement strategy without suffering too severely from its
shortcomings.
8.3 Mutation and crossover
Viewed in terms of schema processing, a GA treats the features of good
individuals as building blocks scattered throughout the population and tries to
recombine them into better individuals via crossover. Sometimes crossover will
combine the worst features from the two parents, in which case these children
will not survive for long. But sometimes it will recombine the best features from
two good individuals, creating even better individuals, provided these features
are compatible.
Suppose that the representation is the classical bitstring representation:
individual solutions in our population are represented by binary strings of zeros
and ones of length L. A GA creates new individuals via crossover by choosing
two strings from the parent population, lining them up, and then creating two
new individuals by swapping the bits at random between the strings. (In some
GAs only one individual is created and evaluated, but the procedure is essentially
the same.) Holland originally proposed that the swapping be done in segments,
not bit by bit. In particular, he proposed that a single locus be chosen at random
and all bits after that point be swapped. This is known as one-point crossover.
Another common form of crossover is two-point crossover which involves
choosing two points at random and swapping the corresponding segments from
the two parents defined by the two points. There are of course many possible
variants. The best known alternative to one- and two-point crossover is uniform
crossover. Uniform crossover randomly swaps individual bits between the two
parents (i.e. exchanges between the parents the values at loci chosen at random).
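A minimal sketch of the three operators on list-encoded chromosomes (illustrative code, not from the original text):

import random

def one_point(a, b):
    point = random.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def two_point(a, b):
    p, q = sorted(random.sample(range(1, len(a)), 2))
    return a[:p] + b[p:q] + a[q:], b[:p] + a[p:q] + b[q:]

def uniform(a, b):
    # Swap the values at each locus independently with probability 0.5.
    c1, c2 = list(a), list(b)
    for i in range(len(a)):
        if random.random() < 0.5:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2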
Following Holland, GA behavior is typically analyzed in terms of schemata.
Given a space of structures represented by bitstrings of length L, schemata
represent partitions of the search space. If the bitstrings of length L are
interpreted as vectors in an L-dimensional hypercube, then schemata are
hyperplanes of the space. A schema can be represented by a string of L symbols
from the set {0, 1, #} where # is a ‘wildcard’ matching either 0 or 1. Each string
of length L may be considered a sample from the partition defined by a schema
if it matches the schema at each of the defined positions (i.e. the non-# loci).
For example, the string 011001 instantiates the schema 01##0#. Each string, in
fact, instantiates 2^L schemata.
Two important schema properties are order and defining length. The order of
a schema is the number of defined loci (i.e. the number of non-# symbols). For
example the schema #01##1### is an order 3 schema. The defining length is
the distance between the loci of the first and last defined positions. The defining
length of the above schema is four since the loci of the first and last defined
positions are 2 and 6.
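These definitions translate directly into code; a minimal sketch (illustrative only), using the document's own examples:

def matches(string, schema):
    # A string instantiates a schema if it agrees at every non-# locus.
    return all(h == '#' or h == s for s, h in zip(string, schema))

def order(schema):
    return sum(1 for h in schema if h != '#')

def defining_length(schema):
    loci = [i for i, h in enumerate(schema, start=1) if h != '#']
    return loci[-1] - loci[0]

print(matches('011001', '01##0#'))   # True
print(order('#01##1###'))            # 3
print(defining_length('#01##1###'))  # 4: first and last defined loci are 2 and 6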
From the hyperplane analysis point of view, a GA can be interpreted as
focusing its search via crossover upon those hyperplane partition elements that
have on average produced the best-performing individuals. Over time the search
becomes more and more focused as the population converges since the degree of
variation used to produce new offspring is constrained by the remaining variation
in the population. This is because crossover has the property that Radcliffe refers
to as respect—if two parents are instances of the same schema, the child will
also be an instance of that schema. Exploiting building blocks in this way relies
upon a crossover operator such as one- or two-point crossover and assumes
that the important building blocks
are of short defining length. Unfortunately, for the types of problem to which
GAs are supposedly ideally suited—those that are highly complex with no
tractable analytical solution—there is no a priori reason to assume that the
problem will, or even can, be represented so that important building blocks will
be those with short defining length. To handle this problem Holland proposed an
inversion operator that could reorder the loci on the string, and thus be capable
of finding a representation that had building blocks with short defining lengths.
The inversion operator, however, has not proven sufficiently effective in practice
at recoding strings on the fly. To overcome this linkage problem, Goldberg has
proposed what he calls messy GAs, but, before discussing messy GAs, it will
be helpful to describe a class of problems that illustrate these linkage issues:
deceptive problems.
Deception is a notion introduced by Goldberg (1987). Consider two
incompatible schemata, A and B. A problem is deceptive if the average fitness of
A is greater than B even though B includes a string that has a greater fitness than
any member of A. In practice this means that the lower-order building blocks
lead the GA away from the global optimum. For example, consider a problem
consisting of five-bit segments for which the fitness of each is determined as
follows (Liepins and Vose 1991). For each one it contains the segment receives
a point (thus five points for all ones), but for all zeros it receives a value greater
than five. For problems where the value of the optimum is between five and
eight the problem is fully deceptive (i.e. all relevant lower-order hyperplanes
lead toward the deceptive attractor). The total fitness is the sum of the fitness
of the segments.
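A sketch of this class of functions (the all-zeros score is left as a parameter; illustrative code, not from the original text):

def segment_fitness(bits, zeros_value=7):
    # Five-bit segment: one point per 1 (five points for all ones),
    # but the all-zeros segment scores zeros_value (> 5).
    return zeros_value if not any(bits) else sum(bits)

def total_fitness(chrom, zeros_value=7):
    # Total fitness is the sum of the segment fitnesses.
    return sum(segment_fitness(chrom[i:i + 5], zeros_value)
               for i in range(0, len(chrom), 5))

With zeros_value between five and eight, each segment is fully deceptive in the sense described above.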
It should be noted that it is probably a mistake to place too much emphasis on
the formal definition of deception (Grefenstette 1993). What is really important
is the concept of being misled by the lower-order building blocks. Whereas
the formal definition of deception stresses the average fitness of the hyperplanes
taken over the entire search space, selection only takes into account the observed
average fitness of hyperplanes (those in the actual population). The interesting
set of problems is those that are misleading in that manipulation of the lower-
order building blocks is likely to lead the search away from the middle-level
building blocks that constitute the optimum solution, whether these middle-level
building blocks are deceptive in the formal sense or not. In the above class of
functions, even when the value of the optimum is greater than eight (and so
not fully deceptive), but still not very large, e.g. ten, the problem is solvable
by a GA using segment-based crossover, very difficult for a GA using bitwise
uniform crossover, and all but impossible for a poolwise-based algorithm like
BSC.
As long as the deceptive problem is represented so that the loci of the
positions defining the building blocks are close together on the string, it meets
Holland’s original assumption that the important building blocks are of short
defining length. The GA will be able to exploit this information using one- or
two-point crossover, and recombining such
building blocks can provide an added value over CCV. It still is an open
question, however, as to how representative deceptive problems are of the types
of real-world problem that GAs might encounter. No doubt, many difficult
real-world problems have deceptive or misleading elements in them. If they did
not, they could be easily solved by local search methods. However, it does not
necessarily follow that such problems can be solved by a GA that is good at
solving deceptive problems. The SBBH assumes that the misleading building
blocks will exist in the initial population, that they can be identified early in the
search before they are lost, and that the problem can be solved incrementally
by combining these building blocks, but perhaps the building blocks that have
misleading alternatives have little meaning until late in the search and so cannot
be expected to survive in the population.
Even if the SBBH turns out not to be as useful an hypothesis as originally
supposed, the increased propagation capabilities of pairwise mating may give a
GA (using pairwise mating) an advantage over a poolwise CCV algorithm. To
see why this is the case it is useful to define the prototypical individual for a
given population: for each locus we assign a one or a zero depending upon which
value is most frequent in the population (randomly assigning a value if they are
equally frequent). Suppose the population contains some maverick individual
that is quite far from the prototypical individual although it is near the optimum
(as measured by Hamming distance) but is of only average fitness. Since an
algorithm using a poolwise method of producing offspring will tend to produce
individuals that are near the prototypical individual, such an algorithm is unlikely
to explore the region around the maverick individual. On the other hand, a GA
using pairwise mating is more likely to explore the region around the maverick
individual, and so more likely to discover the optimum. Ironically, pairwise
mating is, in this respect, more mutation-like than poolwise mating. While
pairwise mating retains the benefits of CCV, it is less subject to the majoritarian
tendencies of poolwise mating.
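The prototypical individual defined above is straightforward to compute; the following sketch (our own illustrative code, not from the chapter) builds it by majority vote at each locus and measures, via Hamming distance, how far a maverick individual lies from it.

import random

def prototypical_individual(population):
    # Majority value at each locus of a population of bitstrings,
    # with ties broken at random.
    length = len(population[0])
    proto = []
    for locus in range(length):
        ones = sum(ind[locus] for ind in population)
        zeros = len(population) - ones
        if ones != zeros:
            proto.append(1 if ones > zeros else 0)
        else:
            proto.append(random.randint(0, 1))
    return proto

def hamming(a, b):
    # Distance used to compare a 'maverick' with the prototype
    # or with the optimum.
    return sum(x != y for x, y in zip(a, b))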
8.4 Representation
Although GAs typically use a bitstring representation, GAs are not restricted
to bitstrings. A number of early proponents of GAs developed GAs that use
other representations, such as real-valued parameters (Davis 1991, Janikow
and Michalewicz 1991, Wright 1991; see Chapter 16), permutations (Davis
1985, Goldberg and Lingle 1985, Grefenstette et al 1985; see Chapter 17), and
treelike hierarchies (Antonisse and Keller 1987; see Chapter 19). Koza’s genetic
programming (GP) paradigm (Koza 1992; see Chapter 11) is a GA-based method
for evolving programs, where the data structures are LISP S-expressions, and
crossover creates new LISP S-expressions (offspring) by exchanging subtrees
from the two parents.
suggested methods for allowing the GA to adapt its own coding. We noted
earlier that Holland proposed the inversion operator for rearranging the loci
in the string. Another approach to adapting the representation is Shaefer’s
ARGOT system (Shaefer 1987). ARGOT contains an explicit parameterized
representation of the mappings from bitstrings to real numbers and heuristics
for triggering increases and decreases in resolution and for shifts in the ranges
of these mappings. A similar idea is employed by Schraudolph and Belew
(1992) who provide a heuristic for increasing the resolution triggered when the
population begins to converge. Mathias and Whitley (1994) have proposed
what they call delta coding. When the population converges, the numeric
representation is remapped so that the parameter ranges are centered around
the best value found so far, and the algorithm is restarted. There are also
heuristics for narrowing or extending the range.
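The remapping step at the heart of delta coding can be sketched as follows; the decoding shown (a signed offset around the best solution so far) illustrates the idea only, and the names and the linear offset scheme are our assumptions rather than Mathias and Whitley's exact formulation.

def decode_delta(bits, best_value, delta_range):
    # Decode a bitstring as best_value plus a signed offset
    # in [-delta_range, +delta_range]; after each restart the
    # range may be narrowed or extended heuristically.
    raw = int("".join(map(str, bits)), 2)
    max_raw = 2 ** len(bits) - 1
    offset = (2.0 * raw / max_raw - 1.0) * delta_range
    return best_value + offset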
There are also GAs with mechanisms for dynamically adapting the rate
at which GA operators are used or which operator is used. Davis, who has
developed a number of nontraditional operators, proposed a mechanism for
adapting the rate at which these operators are applied based on the past success
of these operators during a run of the algorithm (Davis 1987).
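The flavor of such operator rate adaptation can be conveyed by a small sketch; the update rule below (shift probability mass toward operators whose recent offspring improved on their parents, then renormalize) is an illustrative choice and not Davis's exact mechanism.

import random

def choose_operator(rates):
    # rates: dict mapping operator name to selection probability.
    r, acc = random.random(), 0.0
    for op, p in rates.items():
        acc += p
        if r <= acc:
            return op
    return op  # numerical safety for rounding error

def credit_operator(rates, op, improved, step=0.05):
    # Reward an operator that produced a fitter offspring,
    # then renormalize the probabilities.
    if improved:
        rates[op] += step
    total = sum(rates.values())
    for k in rates:
        rates[k] /= total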
8.6 Conclusion
Although the above discussion has been in the context of GAs as potential
function optimizers, it should be pointed out that Holland’s initial GA work was
in the broader context of exploring GAs as adaptive systems (De Jong 1993).
GAs were designed to be a simulation of evolution, not to solve problems. Of
course, evolution has come up with some wonderful designs, but one must not
lose sight of the fact that evolution is an opportunistic process operating in an
environment that is continuously changing. Simon has described evolution as
a process of searching where there is no goal (Simon 1983). This is not to
question the usefulness of GAs as function optimizers, but only to emphasize
that the perspective of function optimization is somewhat different from that of
adaptation, and that the requirements of the corresponding algorithms will be
somewhat different.
References
Eshelman L J and Schaffer J D 1993 Real-coded genetic algorithms and interval schemata
Foundations of Genetic Algorithms 2 ed D Whitley (San Mateo, CA: Morgan
Kaufmann) pp 187–202
——1995 Productive recombination and propagating and preserving schemata
Foundations of Genetic Algorithms 3 ed D Whitley (San Mateo, CA: Morgan
Kaufmann) pp 299–313
Goldberg D E 1987 Simple genetic algorithms and the minimal, deceptive problem
Genetic Algorithms and Simulated Annealing ed L Davis (San Mateo, CA: Morgan
Kaufmann) pp 74–88
——1989 Genetic Algorithms in Search, Optimization, and Machine Learning (Reading,
MA: Addison-Wesley)
Goldberg D E and Deb K 1991 A comparative analysis of selection schemes used in
genetic algorithms Foundations of Genetic Algorithms ed G J E Rawlins (San
Mateo, CA: Morgan Kaufmann) pp 69–93
Goldberg D E, Deb K, Kargupta H and Harik G 1993 Rapid, accurate optimization
of difficult problems using fast messy genetic algorithms Proc. 5th Int. Conf. on
Genetic Algorithms (Urbana-Champaign, IL, 1993) ed S Forrest (San Mateo, CA:
Morgan Kaufmann) pp 56–64
Goldberg D E, Deb K and Korb B 1991 Don’t worry, be messy Proc. 4th Int. Conf. on
Genetic Algorithms (San Diego, CA, 1991) ed R K Belew and L B Booker (San
Mateo, CA: Morgan Kaufmann) pp 24–30
Goldberg D E and Lingle R L 1985 Alleles, loci, and the traveling salesman problem
Proc. 1st Int. Conf. on Genetic Algorithms (Pittsburgh, PA, 1985) ed J J Grefenstette
(Hillsdale, NJ: Erlbaum) pp 154–9
Gordon V S and Whitley D 1993 Serial and parallel genetic algorithms as function
optimizers Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL,
1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 177–83
Grefenstette J J 1993 Deception considered harmful Foundations of Genetic Algorithms
2 ed D Whitley (San Mateo, CA: Morgan Kaufmann) pp 75–91
Grefenstette J J, Gopal R, Rosmaita B J and Van Gucht D 1985 Genetic algorithms for the
traveling salesman problem Proc. 1st Int. Conf. on Genetic Algorithms (Pittsburgh,
PA, 1985) ed J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 160–8
Holland J H 1975 Adaptation in Natural and Artificial Systems (Ann Arbor, MI:
University of Michigan Press)
Janikow C Z and Michalewicz Z 1991 An experimental comparison of binary and
floating point representations in genetic algorithms Proc. 4th Int. Conf. on Genetic
Algorithms (San Diego, CA, 1991) ed R K Belew and L B Booker (San Mateo, CA:
Morgan Kaufmann) pp 31–6
Koza J 1992 Genetic Programming: on the Programming of Computers by Means of
Natural Selection and Genetics (Cambridge, MA: MIT Press)
Liepins G E and Vose M D 1991 Representational issues in genetic optimization J. Exp.
Theor. AI 2 101–15
Mathias K E and Whitley L D 1994 Changing representations during search: a
comparative study of delta coding Evolutionary Comput. 2
Mühlenbein H and Schlierkamp-Voosen D 1993 The science of breeding and its application
to the breeder genetic algorithm Evolutionary Comput. 1
Radcliffe N J 1991 Forma analysis and random respectful recombination Proc. 4th Int.
Conf. on Genetic Algorithms (San Diego, CA, 1991) ed R K Belew and L B Booker
(San Mateo, CA: Morgan Kaufmann) pp 222–9
Schaffer J D, Eshelman L J and Offutt D 1991 Spurious correlations and premature
convergence in genetic algorithms Foundations of Genetic Algorithms ed
G J E Rawlins (San Mateo, CA: Morgan Kaufmann) pp 102–12
Schraudolph N N and Belew R K 1992 Dynamic parameter encoding for genetic
algorithms Machine Learning 9 9–21
Shaefer C G 1987 The ARGOT strategy: adaptive representation genetic optimizer
technique Genetic Algorithms and Their Applications: Proc. 2nd Int. Conf. on
Genetic Algorithms (Cambridge, MA, 1987) ed J J Grefenstette (Hillsdale, NJ:
Erlbaum) pp 50–8
Simon H A 1983 Reason in Human Affairs (Stanford, CA: Stanford University Press)
Spears W M and De Jong K A 1991 On the virtues of parameterized uniform crossover
Proc. 4th Int. Conf. on Genetic Algorithms (San Diego, CA, 1991) ed R K Belew
and L B Booker (San Mateo, CA: Morgan Kaufmann) pp 230–6
Syswerda G 1989 Uniform crossover in genetic algorithms Proc. 3rd Int. Conf. on
Genetic Algorithms (Fairfax, VA, 1989) ed J D Schaffer (San Mateo, CA: Morgan
Kaufmann) pp 2–9
——1991 Schedule optimization using genetic algorithms Handbook of Genetic
Algorithms ed L Davis (New York: Van Nostrand Reinhold) pp 332–49
——1993 Simulated crossover in genetic algorithms Foundations of Genetic Algorithms
2 ed D Whitley (San Mateo, CA: Morgan Kaufmann) pp 239–55
Whitley D 1989 The GENITOR algorithm and selection pressure: why rank-based
allocation of reproductive trials is best Proc. 3rd Int. Conf. on Genetic Algorithms
(Fairfax, VA, 1989) ed J D Schaffer (San Mateo, CA: Morgan Kaufmann) pp 116–21
Whitley D, Starkweather T and Fuquay D 1989 Scheduling problems and traveling
salesmen: the genetic edge recombination operator Proc. 3rd Int. Conf. on
Genetic Algorithms (Fairfax, VA, 1989) ed J D Schaffer (San Mateo, CA: Morgan
Kaufmann) pp 116–21
Wright A 1991 Genetic algorithms for real parameter optimization Foundations of Genetic
Algorithms ed G J E Rawlins (San Mateo, CA: Morgan Kaufmann) pp 205–18
9
Evolution strategies
Günter Rudolph
is probably due to the fact that Rechenberg (1973) succeeded in analyzing the
simple version in Euclidean space with continuous mutation for several test
problems.
Within this setting the archetype of ESs takes the following form. An
individual a consisting of an element X ∈ Rn is mutated by adding a normally
distributed random vector Z ∼ N (0, In ) that is multiplied by a scalar σ > 0 (In
denotes the n × n unit matrix). The new point is accepted if it is better than
or equal to the old one, otherwise the old point passes to the next iteration. The
selection decision is based on a simple comparison of the objective function
values of the old and the new point. Assuming that the objective function
f : Rn → R is to be minimized, the simple ES, starting at some point X0 ∈ Rn ,
is determined by the following iterative scheme:
         Xt + σt Zt    if f (Xt + σt Zt ) ≤ f (Xt )
Xt+1 =                                                        (9.1)
         Xt            otherwise
σt+1(i) = σt(i) exp(τ Zτ + η Zσ(i))        i = 1, . . . , n
Xt+1 = Xt + σt+1 Z
where (τ, η) ∈ R2+ and the independent random numbers Zσ(i) are standard
normally distributed. Note that Zτ is drawn only once.
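The two mechanisms above can be made concrete with a short Python sketch: the first function implements the acceptance rule of equation (9.1), and the second performs one offspring creation under the lognormal self-adaptation rule. The sphere objective and all parameter values are illustrative assumptions, not recommendations from this chapter.

import math
import random

def simple_es(f, x0, sigma=0.1, iterations=1000):
    # (1+1)-ES with the acceptance rule of equation (9.1).
    x = list(x0)
    for _ in range(iterations):
        y = [xi + sigma * random.gauss(0.0, 1.0) for xi in x]
        if f(y) <= f(x):  # accept if better than or equal to the old point
            x = y
    return x

def self_adaptive_step(x, sigmas, tau, eta):
    # One offspring under lognormal self-adaptation: a global factor
    # exp(tau * Z_tau) drawn once per offspring, and a local factor
    # exp(eta * Z_i) drawn independently per coordinate.
    z_tau = random.gauss(0.0, 1.0)
    new_sigmas = [s * math.exp(tau * z_tau + eta * random.gauss(0.0, 1.0))
                  for s in sigmas]
    new_x = [xi + s * random.gauss(0.0, 1.0)
             for xi, s in zip(x, new_sigmas)]
    return new_x, new_sigmas

def sphere(x):
    return sum(xi * xi for xi in x)

best = simple_es(sphere, [5.0] * 10)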
(iii) Let X ∈ Rn be the object variables and Z be a standard normal random
vector. The mutated object variable vector is given by
According to Schwefel (1995) a good heuristic for the choice of the constants
appearing in the above mutation operation is
but recent extensive simulation studies (Kursawe 1996) revealed that the above
recommendation is not the best choice—especially in the case of multimodal
objective functions it seems to be better to use weak selection pressure (µ/λ not
too small) and a parametrization obeying the relation τ > η. As a consequence,
a final recommendation cannot yet be given.
follows:
The evolutionary algorithm in the inner loop maximizes f (x, y) with fixed
parameters x, while the outer loop is responsible for minimizing g(x) over the
set X.
References
Herdy M 1992 Reproductive isolation as strategy parameter in hierarchically organized
evolution strategies Parallel Problem Solving from Nature, 2 (Proc. 2nd Int. Conf.
on Parallel Problem Solving from Nature, Brussels, 1992) ed R Männer and B
Manderick (Amsterdam: Elsevier) pp 207–17
Klockgether J and Schwefel H-P 1970 Two-phase nozzle and hollow core jet experiments
Proc. 11th Symp. on Engineering Aspects of Magnetohydrodynamics ed D Elliott
(Pasadena, CA: California Institute of Technology) pp 141–8
Kursawe F 1996 Breeding evolution strategies—first results, talk presented at Dagstuhl
lectures Applications of Evolutionary Algorithms (March 1996)
Lohmann R 1992 Structure evolution and incomplete induction Parallel Problem Solving
from Nature, 2 (Proc. 2nd Int. Conf. on Parallel Problem Solving from Nature,
Brussels, 1992) ed R Männer and B Manderick (Amsterdam: Elsevier) pp 175–85
Oren S and Luenberger D 1974 Self scaling variable metric (SSVM) algorithms, Part II:
criteria and sufficient conditions for scaling a class of algorithms Management Sci.
20 845–62
Ostermeier A, Gawelczyk A and Hansen N 1995 A derandomized approach to self-
adaptation of evolution strategies Evolutionary Comput. 2 369–80
Rechenberg I 1965 Cybernetic solution path of an experimental problem Library
Translation 1122, Royal Aircraft Establishment, Farnborough, UK
——1973 Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der
biologischen Evolution (Stuttgart: Frommann-Holzboog)
——1978 Evolutionsstrategien Simulationsmethoden in der Medizin und Biologie ed
B Schneider and U Ranft (Berlin: Springer) pp 83–114
——1994 Evolutionsstrategie ’94 (Stuttgart: Frommann-Holzboog)
Rudolph G 1992 On correlated mutations in evolution strategies Parallel Problem Solving
from Nature, 2 (Proc. 2nd Int. Conf. on Parallel Problem Solving from Nature,
Brussels, 1992) ed R Männer and B Manderick (Amsterdam: Elsevier) pp 105–14
Schwefel H-P 1965 Kybernetische Evolution als Strategie der experimentellen Forschung
in der Strömungstechnik Diplomarbeit, Technical University of Berlin
——1977 Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrate-
gie (Basel: Birkhäuser)
——1995 Evolution and Optimum Seeking (New York: Wiley)
Schwefel H-P and Rudolph G 1995 Contemporary evolution strategies Advances in
Artificial Life ed F Morán et al (Berlin: Springer) pp 893–907
Sebald A V and Schlenzig J 1994 Minimax design of neural net controllers for highly
uncertain plants IEEE Trans. Neural Networks NN-5 73–82
10
Evolutionary programming
V William Porto
10.1 Introduction
Evolutionary programming (EP) is one of a class of paradigms for simulating
evolution which utilizes the concepts of Darwinian evolution to iteratively
generate increasingly appropriate solutions (organisms) in light of a static or
dynamically changing environment. This is in sharp contrast to earlier artificial
intelligence research, which largely centered on the search for simple heuristics.
Instead of developing a (potentially) complex set of rules
which were derived from human experts, EP evolves a set of solutions which
exhibit optimal behavior with regard to an environment and desired payoff
function. In a most general framework, EP may be considered an optimization
technique wherein the algorithm iteratively optimizes behaviors, parameters, or
other constructs. As in all optimization algorithms, it is important to note that
the point of optimality is completely independent of the search algorithm, and
is solely determined by the adaptive topography (i.e. response surface) (Atmar
1992).
In its standard form, the basic evolutionary program utilizes the four main
components of all evolutionary computation (EC) algorithms: initialization,
variation, evaluation (scoring), and selection. At the basis of this, as well as
other EC algorithms, is the presumption that, at least in a statistical sense,
learning is encoded phylogenetically rather than ontogenetically in each member of
the population. ‘Learning’ is a byproduct of the evolutionary process as
successful individuals are retained through stochastic trial and error. Variation
(e.g. mutation) provides the means for moving solutions around on the search
space, preventing entrapment in local minima. The evaluation function directly
measures fitness, or equivalently the behavioral error, of each member in the
population with regard to the environment. Finally, the selection process
probabilistically culls suboptimal solutions from the population, providing an
efficient method for searching the topography.
The basic EP algorithm starts with a population of trial solutions which are
initialized by random, heuristic, or other appropriate means. The size of the
population, µ, may range over a broadly distributed set, but is in general larger
than one. Each of these trial solutions is evaluated with regard to the specified
fitness function. After the creation of a population of initial solutions, each
of the parent members is altered through application of a mutation process;
in strict EP, recombination is not utilized. Each parent member i generates
λi progeny which are replicated with a stochastic error mechanism (mutation).
The fitness or behavioral error is assessed for all offspring solutions with the
selection process performed by one of several general techniques, including: (i)
the best µ solutions are retained to become the parents for the next generation
(elitist, see Section 28.4); (ii) µ of the best solutions are statistically retained
(tournament, see Chapter 24); or (iii) proportional-based selection (Chapter 23).
In most applications, the size of the population remains constant, but there is no
restriction in the general case. The process is halted when the solution reaches
a predetermined quality, a specified number of iterations has been achieved, or
some other criterion (e.g. sufficient convergence) stops the algorithm.
EP differs philosophically from other evolutionary computational techniques
such as genetic algorithms (GAs) (Chapter 8) in a crucial manner. EP is a
top-down versus bottom-up approach to optimization. It is important to note
that (according to neo-Darwinism) selection operates only on the phenotypic
expressions of a genotype; the underlying coding of the phenotype is only
affected indirectly. The realization that a sum of optimal parts rarely leads
to an optimal overall solution is key to this philosophical difference. GAs
rely on the identification, combination, and survival of ‘good’ building blocks
(schemata) iteratively combining to form larger ‘better’ building blocks. In a
GA, the coding structure (genotype) is of primary importance as it contains
the set of optimal building blocks discovered through successive iterations.
The building block hypothesis is an implicit assumption that the fitness is a
separable function of the parts of the genome. This successively iterated local
optimization process is different from EP, which is an entirely global approach
to optimization. Solutions (or organisms) in an EP algorithm are judged solely
on their fitness with respect to the given environment. No attempt is made
to partition credit to individual components of the solutions. In EP (and in
evolution strategies (ESs), see Chapter 9), the variation operator allows for
simultaneous modification of all variables at the same time. Fitness, described
in terms of the behavior of each population member, is evaluated directly, and is
the sole basis for survival of an individual in the population. Thus, a crossover
operation designed to recombine building blocks is not utilized in the general
forms of EP.
10.2 History
The genesis of EP (Section 6.2) was motivated by the desire to generate an
alternative approach to artificial intelligence. Fogel (1962) conceived of using
the simulation of evolution to develop artificial intelligence which did not
Figure 10.1. A simple finite-state machine diagram. Input symbols are shown to the
left of the slash. Output symbols are to the right of the slash. The finite-state machine
is presumed to start in state A.
numbers, five were missed, but of the next 65 primes, none were missed. Fogel
et al (1966) indicated that the machines demonstrated the capability to quickly
recognize numbers which are divisible by two and three as being nonprime,
and that some capability to recognize divisibility by five as being indicative
of nonprimes was also evidenced. Thus, the machines generated evidence of
learning a definition of primeness without prior knowledge of the explicit nature
of a prime number, or any ability to explicitly divide.
Fogel and Burgin (1969) researched the use of EP in game theory. In
a number of experiments, EP was consistently able to discover the globally
optimal strategy in simple two-player, zero-sum games involving a small number
of possible plays. This research also showed the ability of the technique to
outperform human subjects in more complicated games. Several extensions were
made to the simulations to address nonzero-sum games (e.g. pursuit evasion).
A three-dimensional model was constructed where EP was used to guide an
interceptor towards a moving target. Since the target was, in most circumstances,
allowed a greater degree of maneuverability, the success or failure of the
interceptor was highly dependent upon the learned ability to predict the position
of the target without a priori knowledge of the target’s dynamics.
A different aspect of EP was researched by Walsh et al (1970) where EP
was used for prediction as a precursor to automatic control. This research
concentrated on decomposing a finite-state machine into submachines which
could be executed in parallel to obtain the overall output of the evolved system.
A primary goal of this research was to maximize parsimony in the evolving
machines. In these experiments, finite-state machines containing seven and
eight states were used as the generating function for three output symbols. The
performance of three human subjects was compared to the evolved models when
predicting the next symbol in the respective environments. In these experiments,
EP was consistently able to outperform the human subjects.
10.2.2 Extensions
The basic EP paradigm may be described by the following EP algorithm:
t := 0;
initialize P (0) := {a1 (0), a2 (0), . . . , aµ (0)};
evaluate P (0) : {Φ(a1 (0)), Φ(a2 (0)), . . . , Φ(aµ (0))};
iterate
{
    mutate: P′ (t) := mΘm (P (t));
    evaluate: P′ (t) : {Φ(a1 (t)), Φ(a2 (t)), . . . , Φ(aλ (t))};
    select: P (t + 1) := sΘs (P′ (t) ∪ Q);
    t := t + 1;
}
where:
a                is an individual member in the population
µ ≥ 1            is the size of the parent population
λ ≥ 1            is the size of the offspring population
P (t) := {a1 (t), a2 (t), . . . , aµ (t)}    is the population at time t
Φ : I → R        is the fitness mapping
mΘm              is the mutation operator with controlling parameters Θm
sΘs              is the selection operator, sΘs : I λ ∪ I µ+λ → I µ
Q ∈ {∅, P (t)}   is a set of individuals additionally accounted for in the
                 selection step, i.e. parent solutions.
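As an illustration, the following Python sketch instantiates the loop above for a real-valued minimization problem, with Gaussian mutation and elitist selection over the union of parents and offspring (i.e. Q = P(t)); the population size, mutation strength, initialization range, and objective are all illustrative choices, and practical EP more commonly uses stochastic tournament selection.

import random

def evolutionary_program(f, n, mu=20, generations=200, sigma=0.1):
    # initialize P(0) by random means
    population = [[random.uniform(-5, 5) for _ in range(n)]
                  for _ in range(mu)]
    for _ in range(generations):
        # mutate: each parent creates one offspring by Gaussian perturbation
        offspring = [[xi + random.gauss(0.0, sigma) for xi in parent]
                     for parent in population]
        # evaluate and select: best mu of P(t) ∪ Q survive
        union = population + offspring
        union.sort(key=f)
        population = union[:mu]
    return population[0]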
cull lower-scoring members from the population. Optimization of the tours was
quite rapid. In one such experiment with 1000 cities uniformly distributed, the
best tour (after only 4 × 107 function evaluations) was estimated to be within
5–7% of the optimal tour length. Thus, excellent solutions were obtained after
searching only an extremely small portion of the total potential search space.
EP has also been utilized in a number of medical applications. For
example, the issue of optimizing drug design was researched by Gehlhaar et
al (1995). EP was utilized to perform a conformational and position search
within the binding site of a protein. The search space of small molecules
which could potentially dock with the crystallographically determined binding
site was explored iteratively guided by a database of crystallographic protein–
ligand complexes. Geometries were constrained by known physical (in three
dimensions) and chemical bounds. Results demonstrated the efficacy of this
technique as it was orders of magnitude faster in finding suitable ligands than
previous hands-on methodologies. The probability of successfully predicting the
proper binding modes for these ligands was estimated at over 95% using nominal
values for the crystallographic binding mode and number of docks attempted.
These studies have permitted the rapid development of several candidate drugs
which are currently in clinical trials.
The issue of utilizing EP to control systems has been addressed widely
(Fogel and Fogel 1990, Fogel 1991a, Page et al 1992, and many others).
Automatic control of fuzzy heating, ventilation, and air conditioning (HVAC)
controllers was addressed by Haffner and Sebald (1993). In this study, a
nonlinear, multiple-input, multiple-output (MIMO) model of a HVAC system
was used and controlled by a fuzzy controller designed using EP. Typical fuzzy
controllers often use trial and error methods to determine parameters and transfer
functions, hence they can be quite time consuming with a complex MIMO
HVAC system. These experiments used EP to design the membership functions
and values (later studies were extended to find rules as well as responsibilities
of the primary controller) to automate the tuning procedure. EP worked in
an overall search space containing 76 parameters, 10 controller inputs, seven
controller outputs, and 80 rules. Simulation results demonstrated that EP was
quite effective at choosing the membership functions of the control laboratory
and corridor pressures in the model. The synergy of combining EP with fuzzy
set constructs proved quite fruitful in reducing the time required to design a
stable, functioning HVAC system.
Game theory has always been at the forefront of artificial intelligence
research. One interesting game, the iterated prisoner’s dilemma, has been
studied by numerous investigators (Axelrod 1987, Fogel 1991b, Harrald and
Fogel 1996, and others). In this two-person game, each player can choose
one of two possible behavioral policies: defection or cooperation. Defection
implies increasing one’s own reward at the expense of the opposing player,
while cooperation implies increasing the reward for both players. If the game
is played over a single iteration, the dominant move is defection. If the players’
The game is defined by a payoff matrix in which the first entry of each pair
is player A's payoff and the second is player B's (C denotes cooperation and
D defection):

                            Player B
                        C              D
    Player A    C   (γ1, γ1)       (γ2, γ3)
                D   (γ3, γ2)       (γ4, γ4)
In addition, the payoff matrix defining the game is subject to the following
constraints (Rapoport 1966):
2γ1 > γ2 + γ3
γ3 > γ1 > γ4 > γ2 .
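These constraints are easy to check mechanically; the sketch below does so using the customary prisoner's dilemma payoffs (reward 3, sucker's payoff 0, temptation 5, punishment 1), which are an illustrative choice rather than values prescribed here.

def is_prisoners_dilemma(g1, g2, g3, g4):
    # True if the payoffs satisfy the constraints above
    # (Rapoport 1966): 2*g1 > g2 + g3 and g3 > g1 > g4 > g2.
    return 2 * g1 > g2 + g3 and g3 > g1 > g4 > g2

assert is_prisoners_dilemma(3, 0, 5, 1)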
Both neural network approaches (Harrald and Fogel 1996) and finite-state
machine approaches (Fogel 1991b) have been applied to this problem. Finite-
state machines are typically used where there are discrete choices between
cooperation and defection. Neural networks allow for a continuous range of
choices between these two opposite strategies. Results of these preliminary
experiments using EP, in general, indicated that mutual cooperation is more
likely to occur when the behaviors are limited to the extremes (the finite-
state machine representation of the problem), whereas in the neural network
continuum behavioral representation of the problem, it is easier to slip into a
state of mutual defection.
Development of interactively intelligent behaviors was investigated by Fogel
et al (1996). EP was used to optimize computer-generated force (CGF)
behaviors such that they learned new courses of action adaptively as changes
in the environment (i.e. presence or absence of opposing side forces) were
encountered. The actions of the CGFs were created in response to an event
scheduler which recognized significant changes in the environment as perceived
by the forces under evolution. New plans of action were found during these
event periods by invoking an evolutionary program. The iterative EP process
was stopped when time or CPU limits were met, and relinquished control of the
simulated forces back to the CGF simulator after transmitting newly evolved
instruction sets for each simulated unit. This process proved quite successful
and offered a significant improvement over other rule-based systems.
References
Aho A V, Hopcroft J E and Ullman J D 1974 The Design and Analysis of Computer
Algorithms (Reading, MA: Addison-Wesley) pp 143–5, 318–26
Angeline P, Saunders G and Pollack J 1994 Complete induction of recurrent neural
networks Proc. 3rd Ann. Conf. on Evolutionary Programming (San Diego, CA, 1994)
ed A V Sebald and L J Fogel (Singapore: World Scientific) pp 1–8
Atmar W 1992 On the rules and nature of simulated evolutionary programming Proc.
1st Ann. Conf. on Evolutionary Programming (La Jolla, CA, 1992) ed D B Fogel
and W Atmar (San Diego, CA: Evolutionary Programming Society) pp 17–26
Axelrod R 1987 The evolution of strategies in the iterated prisoner’s dilemma Genetic
Algorithms and Simulated Annealing ed L Davis (London: Pitman) pp 32–42
Bäck T and Schwefel H-P 1993 An overview of evolutionary algorithms for parameter
optimization Evolutionary Comput. 1 1–23
Brotherton T W and Simpson P K 1995 Dynamic feature set training of neural
networks for classification Evolutionary Programming IV: Proc. 4th Ann. Conf. on
Evolutionary Programming (San Diego, CA, 1995) ed J R McDonnell, R G Reynolds
and D B Fogel (Cambridge, MA: MIT Press) pp 83–94
Burton D M 1976 Elementary Number Theory (Boston, MA: Allyn and Bacon) pp 136–52
Flood M M 1962 Stochastic learning theory applied to choice experiments with cats,
dogs and men Behavioral Sci. 7 289–314
Fogel D B 1988 An evolutionary approach to the traveling salesman problem Biol.
Cybernet. 60 139–44
——1991a System Identification through Simulated Evolution (Needham, MA: Ginn)
——1991b The evolution of intelligent decision making in gaming Cybernet. Syst. 22
223–36
——1992 Evolving Artificial Intelligence PhD Dissertation, University of California, San Diego
Further reading
There are several excellent general references available to the reader interested
in furthering his or her knowledge in this exciting area of EC. The following
books are a few well-written examples providing a good theoretical background
in EP as well as other evolutionary algorithms.
1. Bäck T 1996 Evolutionary Algorithms in Theory and Practice (New York: Oxford
University Press)
4. Schwefel H-P 1995 Evolution and Optimum Seeking (New York: Wiley)
11
Derivative methods in genetic
programming
Kenneth E Kinnear, Jr
11.1 Introduction
This chapter describes the fundamental concepts of genetic programming (GP)
(Koza 1989, 1992). Genetic programming is a form of evolutionary algorithm
which is distinguished by a particular set of choices as to representation,
genetic operator design, and fitness evaluation. When examined in isolation,
these choices define an approach to evolutionary computation (EC) which is
considered by some to be a specialization of the genetic algorithm (GA). When
considered together, however, these choices define a conceptually different
approach to evolutionary computation which leads researchers to explore new
and fruitful lines of research and practical applications.
are functions of two inputs, and one is a function of one input. Each produces
a single output and no side effect.
The fitness evaluation for this particular individual is determined by the
effectiveness with which it will produce the correct logical output for all of the
test cases against which it is tested.
One way to characterize the design of a representation for an application
of genetic programming to a particular problem is to view it as the design of
a language, and this can be a useful point of view. Perhaps it is more useful,
however, to view the design of a genetic programming representation as that
of the design of a virtual machine—since usually the execution engine must
be designed and constructed as well as the representation or language that is
executed.
The representation for the program (i.e. the definition of the functions and
terminals) must be designed along with the virtual machine that is to execute
them. Rarely are the programs evolved in genetic programming given direct
control of the central processor of a computer (although see the article by
Nordin (1994)). Usually, these programs are interpreted under control of a
virtual machine which defines the functions and terminals. This includes the
functions which process the data, the terminals that provide the inputs to the
programs, and any control functions whose purpose is to affect the execution
flow of the program.
As part of this virtual machine design task, it is important to note that the
output of any function or the value of any terminal may be used as the input to
any function. Initially, this often seems to be a trivial problem, but when actually
performing the design of the representation and virtual machine to execute that
representation, it frequently looms rather large. Two solutions are typically used
for this problem. One approach is to design the virtual machine, represented by
the choice of functions and terminals, to use only a single data type. In this way,
the output of any function or the value of any terminal is acceptable as input to
any function. A second approach is to allow more than one data type to exist
in the virtual machine. Each function must then be defined to operate on any of
the existing data types. Implicit coercions are performed by each function on
its input to convert the data type that it receives to one that it is more normally
defined to process. Even after handling the data type problem, functions must
be defined over the entire possible range of argument values. Simple arithmetic
division must be defined to return some value even when division by zero is
attempted.
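The closure conventions just discussed are simple to express in code; the following sketch shows a 'protected' division (returning 1.0 on division by zero, one common convention among several) and a control-like function defined over a single numeric type, both illustrative rather than canonical definitions.

def protected_div(a, b):
    # Division defined over the entire argument range: never raises,
    # returns 1.0 when the divisor is zero (an illustrative choice).
    return a / b if b != 0 else 1.0

def if_positive(cond, then_val, else_val):
    # A branching construct expressed over one data type, so its
    # output is acceptable as input to any other function.
    return then_val if cond > 0 else else_val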
It is important to note that the definition of functions and the virtual machine
that executes them is not restricted to functions whose only action is to provide
a single output value based on their inputs. Genetic programming functions
are often defined whose primary purpose is the actions they take by virtue of
their side-effects. These functions must return some value as well, but their real
purpose is interaction with an environment external to the genetic programming
system.
An additional type of side-effect producing function is one that implements
a control structure within the virtual machine defined to execute the genetically
evolved program. All of the common programming control constructs such as
if–then–else, while–do, for, and others have been implemented as evolvable
control constructs within genetic programming systems. Looping constructs
must be protected in such a way that they will never loop forever, and usually
have an arbitrary limit set on the number of loops which they will execute.
As part of the initialization of a genetic programming run, a large number of
individual programs are generated at random. This is relatively straightforward,
since the genetic programming system is supplied with information about the
number of arguments required by each function, as well as all of the available
terminals. Random program trees are generated using this information, typically
of a relatively small size. The program trees will tend to grow quickly to be
quite large in the absence of some explicit evolutionary pressure toward small
size or some simple hard-coded limits to growth.
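Random tree generation of this kind can be sketched briefly; the function and terminal sets below, the depth limit, and the early-termination probability are illustrative assumptions.

import random

FUNCTIONS = {"add": 2, "mul": 2, "neg": 1}   # function name -> arity
TERMINALS = ["x", "1.0"]

def random_tree(max_depth):
    # Return a nested-tuple program tree of depth <= max_depth:
    # choose a terminal at depth zero (or occasionally earlier),
    # otherwise a function with the right number of random subtrees.
    if max_depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    name = random.choice(list(FUNCTIONS))
    return (name,) + tuple(random_tree(max_depth - 1)
                           for _ in range(FUNCTIONS[name]))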
parent tree. In practice it yields an offspring tree whose fitness has enough
relationship to that of its parents to support the evolutionary search process.
Variations in this crossover approach are easy to imagine, and are currently the
subject of considerable active research in the genetic programming community
(D’haeseleer 1994, Teller 1996).
Mutation (Chapter 32) is a genetic operator which can be applied to a single
parent program tree to create an offspring tree. The typical mutation operator
used selects a point inside a parent tree, and generates a new random subtree
to replace the selected subtree. This random subtree is usually generated by the
same procedure used to generate the initial population of program trees.
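Both operators can be sketched over the nested-tuple trees produced by the generation sketch above (whose random_tree function is reused here); the node-addressing scheme is our own illustrative device, not a standard genetic programming API.

import random

def points(tree, path=()):
    # Yield the path (sequence of child indices) to every node.
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from points(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    # Return a copy of tree with the subtree at path replaced by new.
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(parent_a, parent_b):
    # Offspring of A with a randomly chosen subtree of B spliced in.
    pa = random.choice(list(points(parent_a)))
    pb = random.choice(list(points(parent_b)))
    return replace(parent_a, pa, get(parent_b, pb))

def mutate(parent, max_depth=3):
    # Replace a random subtree with a newly generated random subtree,
    # using the same random_tree procedure as for initialization.
    p = random.choice(list(points(parent)))
    return replace(parent, p, random_tree(max_depth))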
evolved, and are only considered executable when they are undergoing fitness
evaluation.
As genetic programming itself evolved in LISP, the programs that were
executed began to look less and less like LISP programs. They continued to be
tree structured but soon few if any of the functions used in the evolved programs
were standard LISP functions. Around 1992 many people implemented genetic
programming systems in C and C++, along with many other programming
languages. Today, other than a frequent habit of printing the representation
of tree-structured genetic programs in a LISP-like syntax, there is no particular
connection between genetic programming and LISP.
There are many public domain implementations of genetic programming in
a wide variety of programming languages. For further details, see the reading
list at the end of this section.
References
Angeline P J 1996 Two self-adaptive crossover operators for genetic programming
Advances in Genetic Programming 2 ed P J Angeline and K E Kinnear Jr
(Cambridge, MA: MIT Press)
Angeline P J and Pollack J B 1993 Competitive environments evolve better solutions for
complex tasks Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL,
July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann)
Cramer N L 1985 A representation of the adaptive generation of simple sequential
programs Proc. 1st Int. Conf. on Genetic Algorithms (Pittsburgh, PA, July 1985)
ed J J Grefenstette (Hillsdale, NJ: Erlbaum)
D’haeseleer P 1994 Context preserving crossover in genetic programming 1st IEEE Conf.
on Evolutionary Computation (Orlando, FL, June 1994) (Piscataway, NJ: IEEE)
Gruau F 1993 Genetic synthesis of modular neural networks Proc. 5th Int. Conf. on
Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann)
Holland J H 1975 Adaptation in Natural and Artificial Systems (Ann Arbor, MI: The
University of Michigan Press)
Iba H, de Garis H and Sato T 1994 Genetic programming using a minimum description
length principle Advances in Genetic Programming ed K E Kinnear Jr (Cambridge,
MA: MIT Press)
Kauffman S A 1993 The Origins of Order: Self-Organization and Selection in Evolution
(New York: Oxford University Press)
Kinnear K E Jr 1993 Generality and difficulty in genetic programming: evolving a sort
Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL, July, 1993) ed
S Forrest (San Mateo, CA: Morgan Kaufmann)
——1994 Alternatives in automatic function definition: a comparison of performance
Advances in Genetic Programming ed K E Kinnear Jr (Cambridge, MA: MIT Press)
Koza J R 1989 Hierarchical genetic algorithms operating on populations of computer
programs Proc. 11th Int. Joint Conf. on Artificial Intelligence (San Mateo, CA:
Morgan Kaufmann)
Nordin P 1994 A compiling genetic programming system that directly manipulates the
machine code Advances in Genetic Programming ed K E Kinnear Jr (Cambridge,
MA: MIT Press)
Perkis T 1994 Stack-based genetic programming Proc. 1st IEEE Int. Conf. on
Evolutionary Computation (Orlando, FL, June 1994) (Piscataway, NJ: IEEE)
Reynolds C R 1994 Competition, coevolution and the game of tag Artificial Life IV:
Proc. 4th Int. Workshop on the Synthesis and Simulation of Living Systems ed R A
Brooks and P Maes (Cambridge, MA: MIT Press)
Sims K 1994 Evolving 3D morphology and behavior by competition Artificial Life IV:
Proc. 4th Int. Workshop on the Synthesis and Simulation of Living Systems ed R A
Brooks and P Maes (Cambridge, MA: MIT Press)
Teller A 1996 Evolving programmers: the co-evolution of intelligent recombination
operators Advances in Genetic Programming 2 ed P J Angeline and K E Kinnear
Jr (Cambridge, MA: MIT Press)
Further reading
1. Koza J R 1992 Genetic Programming (Cambridge, MA: MIT Press)
The first book on the subject. Contains full instructions on the possible details of
carrying out genetic programming, as well as a complete explanation of genetic
algorithms (on which genetic programming is based). Also contains 11 chapters
showing applications of genetic programming to a wide variety of typical artificial
intelligence, machine learning, and sometimes practical problems. Gives many
examples of how to design a representation of a problem for genetic programming.
4. Forrest S (ed) 1993 Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign,
IL, July 1993) (San Mateo, CA: Morgan Kaufmann)
5. 1994 1st IEEE Conf. on Evolutionary Computation (Orlando, FL, June 1994)
(Piscataway, NJ: IEEE)
6. Eshelman L J (ed) 1995 Proc. 6th Int. Conf. on Genetic Algorithms (Pittsburgh, PA,
July 1995) (Cambridge, MA: MIT Press)
9. ftp.io.com pub/genetic-programming
12
Learning classifier systems
Robert E Smith
12.1 Introduction
The learning classifier system (LCS) (Goldberg 1989, Holland et al 1986) is
often referred to as the primary machine learning technique that employs genetic
algorithms (GAs). It is also often described as a production system framework
with a genetic algorithm as the primary rule discovery method. However, the
details of LCS operation vary widely from one implementation to another. In
fact, no standard version of the LCS exists. In many ways, the LCS is more
of a concept than an algorithm. To explain details of the LCS concept, this
article will begin by introducing the type of machine learning problem most
often associated with the LCS. This discussion will be followed by an overview
of the LCS, in its most common form. Final sections will introduce the more
complex issues involved in LCSs.
[Figures: block diagrams of the control problem, in which a plant receives control actions and disturbances and returns plant state information; in figure 12.3 a state evaluator (or 'critic') produces an error-in-state signal.]
action alters the probability of moving the plant from the current state to any
other state. Note that deterministic environments are a specific case. Although
this discussion will limit itself to discrete problems, most of the points made
can be related directly to continuous problems.
A characteristic of many reinforcement learning problems is that one may
need to consider a sequence of control actions and their results to determine
how to improve the controller. One can examine the implications of this by
associating a reward or cost with each control action. The error in state in
figure 12.3 can be thought of as a cost. One can consider the long-term effects
of an action formally as the expected, infinite-horizon discounted cost:
Σ_{t=0}^{∞} λ^t c_t

where ct is the cost incurred at time t and λ ∈ (0, 1) is a discount factor.
One widely used method, Q-learning, maintains an estimate Q(i, u) of the
discounted cost of taking action u in state i and updates it according to

Q(i, ut ) := (1 − α) Q(i, ut ) + α (ct + λ min Q(j, ut+1 ))

where min Q(j, ut+1 ), the minimum being taken over actions ut+1, is the minimum
Q-value available in state j, which is the state
arrived in after action ut is taken in state i (Barto et al 1991, Watkins 1989).
The parameter α is a learning rate parameter that is typically set to a small
value between zero and one. Arguments based on dynamic programming and
Bellman optimality show that if each state–action pair is tried an infinite number
of times, this procedure results in optimal Q-values. Certainly, it is impractical
to try every state–action pair an infinite number of times. With finite exploration,
Q-values can often be arrived at that are approximately optimal. Regardless of
the method employed to update a strategy in a reinforcement learning problem,
this exploration–exploitation dilemma always exists.
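For illustration, the tabular update discussed above can be written in a few lines of Python; the learning rate and discount values are illustrative, and the table is keyed by (state, action) pairs.

from collections import defaultdict

Q = defaultdict(float)   # (state, action) -> estimated discounted cost

def q_update(i, u, c, j, actions, alpha=0.1, lam=0.9):
    # After taking action u in state i, incurring cost c and landing
    # in state j, nudge Q(i, u) toward c + lam * min_u' Q(j, u').
    best_next = min(Q[(j, a)] for a in actions)
    Q[(i, u)] += alpha * (c + lam * best_next - Q[(i, u)])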
Another difficulty in the Q-value approach is that it requires storage of
a separate Q-value for each state–action pair. In a more practical approach,
one could store a Q-value for a group of state–action pairs that share the
same characteristics. However, it is not clear how state–action pairs should
be grouped. In many ways, the LCS can be thought of as a GA-based technique
for grouping state–action pairs.
[Figure: schematic of the LCS: detectors post external messages into a message space, where they are matched against the rule set; CA and CR mechanisms mediate rule activity and internal messages; a GA discovers new rules; effectors act on the environment.]
In the ‘Michigan’ approach, one need only evaluate a single rule set: the one
constituted by the entire population. However, one cannot use the usual genetic
algorithm procedures that will converge to a homogeneous population, since one
rule is not likely to solve the entire problem. Therefore, one must coevolve a
set of cooperative rules that jointly solve the problem. This requires a genetic
algorithm procedure that yields a diverse population at steady state, in a fashion
that is similar to sharing (Deb and Goldberg 1989, Goldberg and Richardson
1987), or other multimodal genetic algorithm procedures. In some cases simply
dividing reward between similar classifiers that fire can yield sharing-like effects
(Horn et al 1994).
12.7 Parasites
The possibility of rule chains introduced by internal messages, and by ‘payback’
credit allocation schemes such as the bucket brigade or Q-learning, also
introduces the possibility of rule parasites. Simply stated, parasites are rules that
obtain fitness through their participation in a rule chain or a sequence of LCS
actions, but serve no useful purpose in the problem environment. In some cases,
parasite rules can prosper, while actually degrading overall system performance.
A simple example of parasite rules in LCSs is given by Smith (1994). In this
study, a simple problem is constructed where the only performance objective is
to exploit internal messages as internal memory. Although fairly effective rule
sets were evolved in this problem, parasites evolved that exploited the bucket
brigade, and the existing rule chains, but that were incorrect for overall system
performance. This study speculates that such parasites may be an inevitable
consequence in systems that use temporal credit assignment (such as the bucket
brigade) and evolve internal memory processing.
References
Barto A G 1990 Some Learning Tasks from a Control Perspective COINS Technical
Report 90-122, University of Massachusetts
Barto A G, Bradtke S J and Singh S P 1991 Real-time Learning and Control using
Asynchronous Dynamic Programming COINS Technical Report 91-57, University
of Massachusetts
Booker L B 1982 Intelligent behavior as an adaptation to the task environment
Dissertation Abstracts Int. 43 469B; University Microfilms 8214966
——1985 Improving the performance of genetic algorithms in classifier systems Proc.
Int. Conf. on Genetic Algorithms and Their Applications pp 80–92
——1989 Triggered rule discovery in classifier systems Proc. 3rd Int. Conf. on Genetic
Algorithms (Fairfax, VA, June 1989) ed J D Schaffer (San Mateo, CA: Morgan
Kaufmann) pp 265–74
Deb K and Goldberg D E 1989 An investigation of niche and species formation in genetic
function optimization Proc. 3rd Int. Conf. on Genetic Algorithms (Fairfax, VA, June
1989) ed J D Schaffer (San Mateo, CA: Morgan Kaufmann) pp 42–50
Goldberg D E 1989 Genetic Algorithms in Search, Optimization, and Machine Learning
(Reading, MA: Addison-Wesley)
Goldberg D E and Richardson J 1987 Genetic algorithms with sharing for multimodal
function optimization Proc. 2nd Int. Conf. on Genetic Algorithms (Cambridge, MA,
1987) ed J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 41–9
Holland J H, Holyoak K J, Nisbett R E and Thagard P R 1986 Induction: Processes of
Inference, Learning, and Discovery (Cambridge, MA: MIT Press)
Holland J H and Reitman J S 1978 Cognitive systems based on adaptive algorithms
Pattern Directed Inference Systems ed D A Waterman and F Hayes-Roth (New
York: Academic) pp 313–29
Horn J, Goldberg D E and Deb K 1994 Implicit niching in a learning classifier system:
Nature’s way Evolutionary Comput. 2 37–66
Riolo R L 1986 CFS-C: a Package of Domain Independent Subroutines for Implementing
Classifier Systems in Arbitrary User-defined Environments University of Michigan,
Logic of Computers Group, Technical Report
Robertson G G and Riolo R 1988 A tale of two classifier systems Machine Learning 3
139–60
13
Hybrid methods
Zbigniew Michalewicz
There is some experimental evidence (Davis 1991, Michalewicz 1993) that the
enhancement of evolutionary methods by some additional (problem-specific)
heuristics, domain knowledge, or existing algorithms can result in a system
with outstanding performance. Such enhanced systems are often referred to as
hybrid evolutionary systems.
Several researchers have recognized the potential of such hybridization of
evolutionary systems. Davis (1991, p 56) wrote:
When I talk to the user, I explain that my plan is to hybridize the
genetic algorithm technique and the current algorithm by employing
the following three principles:
• Use the Current Encoding. Use the current algorithm’s encoding
technique in the hybrid algorithm.
• Hybridize Where Possible. Incorporate the positive features of the
current algorithm in the hybrid algorithm.
• Adapt the Genetic Operators. Create crossover and mutation
operators for the new type of encoding by analogy with bit
string crossover and mutation operators. Incorporate domain-
based heuristics as operators as well.
[...] I use the term hybrid genetic algorithm for algorithms created by
applying these three principles.
The above three principles emerged as a result of countless experiments of
many researchers, who tried to ‘tune’ their evolutionary algorithms to some
problem at hand, that is, to create ‘the best’ algorithm for a particular class
of problems. For example, during the last 15 years, various application-
specific variations of evolutionary algorithms have been reported (Michalewicz
1996); these variations included variable-length strings (including strings whose
elements were if–then–else rules), richer structures than binary strings, and
experiments with modified genetic operators to meet the needs of particular
applications. Some researchers (e.g. Grefenstette 1987) experimented with
incorporating problem-specific knowledge into the initialization routine of an
evolutionary system; if a (fast) heuristic algorithm provides individuals of the
References
Adler D 1993 Genetic algorithms and simulated annealing: a marriage proposal Proc.
IEEE Int. Conf. on Neural Networks pp 1104–9
Angeline P J 1995 Morphogenic evolutionary computation: introduction, issues, and
examples Proc. 4th Ann. Conf. on Evolutionary Programming (San Diego, CA,
March 1995) ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge, MA:
MIT Press) pp 387–401
Davis L 1991 Handbook of Genetic Algorithms (New York: Van Nostrand Reinhold)
Grefenstette J J 1987 Incorporating problem specific knowledge into genetic algorithms
Genetic Algorithms and Simulated Annealing ed L Davis (Los Altos, CA: Morgan
Kaufmann) pp 42–60
Michalewicz Z 1993 A hierarchy of evolution programs: an experimental study
Evolutionary Comput. 1 51–76
——1996 Genetic Algorithms + Data Structures = Evolution Programs 3rd edn (New
York: Springer)
Mühlenbein H, Gorges-Schleuter M and Krämer O 1988 Evolution algorithms in
combinatorial optimization Parallel Comput. 7 65–85
14
Introduction to representations
Kalyanmoy Deb
applications, decision variables are directly used and modified genetic operators
are used to make a successful search. A detailed discussion of the real-valued
vector representations is given in Chapter 16.
In evolution strategy (ES) and evolutionary programming (EP) studies, a
natural representation of the decision variables is used where a real-valued
solution vector is used. The numerical values of the decision variables are
immediately taken from the solution vector to compute the objective function
value. In both ES and EP studies, the crossover and mutation operators are used
variable by variable. Thus, the relative positioning of the decision variables in
the solution vector is not an important matter. However, in recent studies of
ES and EP, in addition to the decision variables, the solution vector includes a
set of strategy parameters specifying the variance of search mutation for each
variable and variable combinations. For n decision variables, both methods use
between one and n(n + 1)/2 additional strategy parameters, depending on the
degree of freedom the user wants to provide for the search
algorithm. These adaptive parameters control the search of each variable,
considering its own allowable variance and covariance with other decision
variables. We discuss these representations in Section 16.2.
In permutation problems, the solutions are usually a vector of node identifiers
representing a permutation. Depending on the problem specification, special care
is taken in creating valid solutions representing a valid permutation. In these
problems, the absolute positioning of the node identifiers is not as important as
the relative positioning of the node identifiers. The representation of permutation
problems is discussed further in Chapter 17.
In early EP work, finite-state machines were used to evolve intelligent
algorithms which operated on a sequence of symbols so as to produce an
output symbol that would maximize the algorithm’s performance. Finite-state
representations were used as solutions to the underlying problem. The input
and output symbols were taken from two different finite alphabets. A
solution is represented by specifying both an input and an output symbol for each
link connecting the finite states. The finite-state machine transforms a sequence of
input symbols into a sequence of output symbols. The finite-state representations
are discussed in Chapter 18.
In genetic programming studies, a solution is usually a LISP program
specifying a strategy or an algorithm for solving a particular task. Functions
and terminals are used to create a valid solution. The syntax and structure of
each function are maintained. Thus, if an OR function is used in the solution,
at least two arguments are assigned from the terminal set to make a valid OR
operation. Usually, the depth of nestings used in any solution is restricted to
a specified upper limit. In recent applications of genetic programming, many
special features are used in representing a solution. As the iterations progress,
a part of the solution is frozen and defined as a metafunction with specified
arguments. We shall discuss these features further in Chapter 19.
As mentioned earlier, the representation of a solution is important in the
References
Deb K 1995 Optimization for Engineering Design: Algorithms and Examples (New Delhi:
Prentice-Hall)
——1997 A robust optimal design technique for mechanical component design
Evolutionary Algorithms in Engineering Applications ed D Dasgupta and Z
Michalewicz (Berlin: Springer) in press
Deb K and Agrawal R 1995 Simulated binary crossover for continuous search space
Complex Syst. 9 115–48
Chaturvedi D, Deb K and Chakrabarty S K 1995 Structural optimization using real-coded
genetic algorithms Proc. Symp. on Genetic Algorithms (Dehradun) ed P K Roy and
S D Mehta (Dehradun: Mahendra Pal Singh) pp 73–82
Eshelman L J and Schaffer J D 1993 Real-coded genetic algorithms and interval schemata
Foundations of Genetic Algorithms II (Vail, CO) ed D Whitley (San Mateo, CA:
Morgan Kaufmann) pp 187–202
Kargupta H, Deb K and Goldberg D E 1992 Ordering genetic algorithms and deception
Parallel Problem Solving from Nature II (Brussels) ed R Männer and B Manderick
(Amsterdam: North-Holland) pp 47–56
Radcliffe N J 1993 Genetic set recombination Foundations of Genetic Algorithms II (Vail,
CO) ed D Whitley (San Mateo, CA: Morgan Kaufmann) pp 203–19
Reklaitis G V, Ravindran A and Ragsdell K M 1983 Engineering Optimization: Methods
and Applications (New York: Wiley)
Schaffer J D, Caruana R A, Eshelman L J and Das R 1989 A study of control
parameters affecting online performance of genetic algorithms Proc. 3rd Int. Conf.
on Genetic Algorithms (Fairfax, VA, 1989) ed J D Schaffer (San Mateo, CA:
Morgan Kaufmann) pp 51–60
Wright A 1991 Genetic algorithms for real parameter optimization Foundations of
Genetic Algorithms (Bloomington, IN) ed G J E Rawlins (San Mateo, CA: Morgan
Kaufmann) pp 205–20
15
Binary strings
Thomas Bäck
or by using a Gray code interpretation of the binary vectors, which ensures that
adjacent integer values are represented by binary vectors with Hamming distance
one (i.e. they differ by one entry only). For the Gray code, equation (15.1) is
extended by a conversion of the Gray code representation to the standard code,
which can be done for example according to
Γ′(a1 , . . . , aℓ ) = ui + (vi − ui)/(2^ℓx − 1) · Σ_{j=0}^{ℓx−1} ( ⊕_{k=1}^{ℓx−j} a_{(i−1)ℓx+k} ) 2^j        (15.2)

where ⊕ denotes addition modulo two.
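A minimal sketch of this decoding, under the assumptions above, is as follows: bit m of the standard code is the exclusive-or of the first m Gray bits, and the resulting integer is mapped linearly onto [ui, vi]; the function names are ours.

def gray_to_binary(gray_bits):
    # Standard-code bits from Gray-code bits: running XOR prefix.
    binary, acc = [], 0
    for g in gray_bits:
        acc ^= g
        binary.append(acc)
    return binary

def decode_segment(bits, u, v, gray=True):
    # Map an l_x-bit segment onto the interval [u, v].
    if gray:
        bits = gray_to_binary(bits)
    raw = 0
    for b in bits:
        raw = (raw << 1) | b
    return u + (v - u) * raw / (2 ** len(bits) - 1)

# Gray code 110 decodes to standard 100, i.e. the integer 4:
assert decode_segment([1, 1, 0], 0, 7) == 4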
are likely to be discrete because they aim at modeling the adaptive capabilities
of natural evolution on the genotype level.
Interpreting a genetic algorithm as an algorithm that processes schemata,
Holland (1975, p 71) then argues that the number of schemata available under
a certain representation is maximized by using binary variables; that is, the
maximum number of schemata is processed by the algorithm if ai ∈ {0, 1}.
This result can be derived by noticing that, when the cardinality of an alphabet A for the allele values is $k = |A|$, the number of different schemata is $(k+1)^{\ell}$ (i.e. $3^{\ell}$ in the case of binary variables). For binary alleles, $2^{\ell}$ different solutions can be represented by vectors of length $\ell$, and in order to encode the same number of solutions by a k-ary alphabet, a vector of length

$$\ell' = \ell\,\frac{\ln 2}{\ln k} \qquad (15.3)$$

is sufficient. For example, the $2^{16}$ solutions representable by a binary vector of length $\ell = 16$ can be encoded by a 4-ary vector of length $\ell' = 8$, but the number of schemata then shrinks from $3^{16}$ to $5^{8}$.
More generally, a suitable encoding should capture the structure of the problem and obey some structure preserving conditions that still need to be formulated as a guideline for finding a suitable encoding.
References
Bäck T 1993 Optimal mutation rates in genetic search Proc. 5th Int. Conf. on Genetic
Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo, CA:
Morgan Kaufmann) pp 2–8
——1996 Evolutionary Algorithms in Theory and Practice (New York: Oxford
University Press)
Bean J C 1993 Genetics and Random Keys for Sequences and Optimization Technical
Report 92–43, University of Michigan Department of Industrial and Operations
Engineering
Davis L 1991 Handbook of Genetic Algorithms (New York: Van Nostrand Reinhold)
De Jong K A 1975 An Analysis of the Behaviour of a Class of Genetic Adaptive Systems
PhD Thesis, University of Michigan
Goldberg D E 1989 Genetic Algorithms in Search, Optimization and Machine Learning
(Reading, MA: Addison Wesley)
——1991 The theory of virtual alphabets Parallel Problem Solving from Nature—Proc.
1st Workshop, PPSN I (Lecture Notes in Computer Science 496) ed H-P Schwefel
and R Männer (Berlin: Springer) pp 13–22
Holland J H 1975 Adaptation in Natural and Artificial Systems (Ann Arbor, MI:
University of Michigan Press)
Michalewicz Z 1996 Genetic Algorithms + Data Structures = Evolution Programs
(Berlin: Springer)
Nakano R and Yamada T 1991 Conventional genetic algorithms for job shop problems
Proc. 4th Int. Conf. on Genetic Algorithms (San Diego, CA, July 1991) ed R K Belew
and L B Booker (San Mateo, CA: Morgan Kaufmann) pp 474–9
Yamada T and Nakano R 1992 A genetic algorithm applicable to large-scale job-shop
problems Parallel Problem Solving from Nature, 2 (Proc. 2nd Int. Conf. on Parallel
Problem Solving from Nature, Brussels, 1992) ed R Männer and B Manderick
(Amsterdam: Elsevier) pp 281–90
16
Real-valued vectors
David B Fogel
With the belief that maximizing the number of schemata being processed
is not necessarily useful, or may even be harmful (Fogel and Stayton 1994),
there is no compelling reason in a real-valued optimization problem to act
on anything except the real values of the vector x themselves. Moreover,
there has been a general trend away from binary codings within genetic
algorithm research (see e.g. Davis 1991, Belew and Booker 1991, Forrest
1993, and others). Michalewicz (1992, p 82) indicated that for real-valued
numerical optimization problems, floating-point representations outperform
binary representations because they are more consistent and more precise and
lead to faster execution. This trend may reflect a growing rejection of the
building block hypothesis as an explanation for how genetic algorithms act as
optimization procedures.
With evolution strategies and evolutionary programming, the typical method
for searching a real-valued solution space is to add a multivariate zero-mean
Gaussian random variable to each parent involved in the creation of offspring
(see Section 32.2). In consequence, this necessitates the setting of the covariance
matrix for the Gaussian perturbation. If the covariances between parameters
are ignored, only a vector of standard deviations in each dimension is required.
There are a variety of methods for setting these standard deviations. Section 32.2
offers a variety of procedures for mutating real-valued vectors.
A complete individual under this representation is often written as the triple (x, σ, α), comprising the object variables x, the standard deviations σ, and (optionally) the rotation angles α of the Gaussian mutation (see Section 32.2).
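The following minimal Python sketch (an illustration, with covariances between parameters ignored as described above; all names and values are ours) creates an offspring from a parent vector and a vector of standard deviations:

import random

def gaussian_mutate(parent, sigmas):
    # Add a zero-mean Gaussian deviate, one standard deviation per dimension.
    return [x + random.gauss(0.0, s) for x, s in zip(parent, sigmas)]

parent = [1.0, -2.5, 0.3]
sigmas = [0.1, 0.1, 0.5]
offspring = gaussian_mutate(parent, sigmas)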
References
Bäck T and Schwefel H-P 1993 An overview of evolutionary algorithms for parameter
optimization Evolutionary Comput. 1 1–24
Belew R K and Booker L B (eds) 1991 Proc. 4th Int. Conf. on Genetic Algorithms (San
Diego, CA, July 1991) (San Mateo, CA: Morgan Kaufmann)
Davis L 1991 A genetic algorithms tutorial IV. Hybrid genetic algorithms Handbook of
Genetic Algorithms ed L Davis (New York: Van Nostrand Reinhold)
Fogel D B 1995 Evolutionary Computation: Toward a New Philosophy of Machine
Intelligence (Piscataway, NJ: IEEE)
Fogel D B, Fogel L J and Atmar J W 1991 Meta-evolutionary programming Proc. 25th
Asilomar Conf. on Signals, Systems, and Computers ed R R Chen (San Jose, CA:
Maple) pp 540–5
Fogel D B and Stayton L C 1994 On the effectiveness of crossover in simulated
evolutionary optimization BioSystems 32 171–82
Forrest S (ed) 1993 Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL,
July 1993) (San Mateo, CA: Morgan Kaufmann)
Goldberg D E 1989 Genetic Algorithms in Search, Optimization and Machine Learning
(Reading, MA: Addison-Wesley)
Michalewicz Z 1992 Genetic Algorithms + Data Structures = Evolution Programs
(Berlin: Springer)
Ostermeier A, Gawelczyk A and Hansen N 1994 A derandomized approach to self-
adaptation of evolution strategies Evolutionary Comput. 2 369–80
Rechenberg I 1994 Personal communication
Reed J, Toombs R and Barricelli N A 1967 Simulation of biological evolution and
machine learning J. Theor. Biol. 17 319–42
Schwefel H-P 1981 Numerical Optimization of Computer Models (Chichester: Wiley)
Spears W M 1995 Adapting crossover in evolutionary algorithms Evolutionary
Programming IV: Proc. 4th Ann. Conf. on Evolutionary Programming (San Diego,
CA, March 1995) ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge,
MA: MIT Press) pp 367–84
17
Permutations
Darrell Whitley
17.1 Introduction
To quote Knuth (1973), ‘A permutation of a finite set is an arrangement of its
elements into a row.’ Given n unique objects, n! permutations of the objects
exist. There are various properties of permutations that are relevant to the
manipulation of permutation representations by evolutionary algorithms, both
from a representation point of view and from an analytical perspective.
As researchers began to apply evolutionary algorithms to applications that are
naturally represented as permutations, it became clear that these problems pose
different coding challenges than traditional parameter optimization problems.
First, for some types of problem there are multiple equivalent solutions. When
a permutation is used to represent a cycle, as in the traveling salesman problem
(TSP), then all shifts of the permutation are equivalent solutions. Furthermore,
all reversals of a permutation are also equivalent solutions. Such symmetries
can pose problems for evolutionary algorithms that rely on recombination.
Another problem is that permutation problems cannot be processed using
the same general recombination and mutation operators which are applied to
parameter optimization problems. The use of a permutation representation may
in fact mask very real differences in the underlying combinatorial optimization
problems. An example of these differences is evident in the description of
classic problems such as the TSP and the problem of resource scheduling.
The traveling salesman problem is the problem of visiting each vertex (i.e.
city) in a fully connected graph exactly once while minimizing a cost function
defined with respect to the edges between adjacent vertices. In simple terms,
the problem is to minimize the total distance traveled while visiting all the cities
and returning to the point of origin. The TSP is closely related to the problem
of finding a Hamiltonian circuit in an arbitrary graph. The Hamiltonian circuit
is a set of edges that form a cycle which visits every vertex exactly once.
It is relatively easy to show that the problem of finding a set of Boolean values that yields an evaluation of ‘true’ for a Boolean expression in conjunctive normal form with three variables per clause (3-SAT) is directly polynomial-time reducible to the problem of finding a Hamiltonian circuit.
and in general

$$\frac{n!}{n_1!\,n_2!\,n_3!\cdots}$$

where n is the number of elements in the multiset and $n_i$ is the number of elements of type i (Knuth 1973). Radcliffe (1993) considers the application of
genetic and evolutionary operators when the solution is expressed as a set or
multiset (bag).
Before looking in more detail at the relationship between permutations and
evolutionary algorithms, some general properties of permutations are reviewed
that are both interesting and useful.
We can also relate this mapping operator to the process of finding an inverse.
The permutations in the expression

$$r_{3421,\,1342}(3124) = r_{1432,\,2143}(1234)$$
are included as rows in an array. To map the left-hand side of the preceding
expression to the terms in the right-hand side, first compute the inverses for
each of the terms in the left-hand side:
$$\begin{pmatrix} 3\,4\,2\,1 \\ 1\,2\,3\,4 \end{pmatrix} = \begin{pmatrix} 1\,2\,3\,4 \\ 4\,3\,1\,2 \end{pmatrix} \qquad \begin{pmatrix} 1\,3\,4\,2 \\ 1\,2\,3\,4 \end{pmatrix} = \begin{pmatrix} 1\,2\,3\,4 \\ 1\,4\,2\,3 \end{pmatrix} \qquad \begin{pmatrix} 3\,1\,2\,4 \\ 1\,2\,3\,4 \end{pmatrix} = \begin{pmatrix} 1\,2\,3\,4 \\ 2\,3\,1\,4 \end{pmatrix}.$$
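These inverses can be checked mechanically; the small Python sketch below (ours, for illustration, with permutations written in one-line notation over 1..n) reproduces them:

def inverse(perm):
    # inv[v] = position of value v in perm (both 1-based)
    inv = [0] * len(perm)
    for pos, val in enumerate(perm):
        inv[val - 1] = pos + 1
    return inv

print(inverse([3, 4, 2, 1]))   # [4, 3, 1, 2]
print(inverse([1, 3, 4, 2]))   # [1, 4, 2, 3]
print(inverse([3, 1, 2, 4]))   # [2, 3, 1, 4]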
Collect the three inverses into a single array. We also then add 1 2 3 4 to the
array and inverse the permutation 2 3 1 4, at the same time rearranging all the
other permutations in the array:
$$\begin{pmatrix} 4\,3\,1\,2 \\ 1\,4\,2\,3 \\ 2\,3\,1\,4 \\ 1\,2\,3\,4 \end{pmatrix} = \begin{pmatrix} 1\,4\,3\,2 \\ 2\,1\,4\,3 \\ 1\,2\,3\,4 \\ 3\,1\,2\,4 \end{pmatrix}.$$
This yields the permutations 1432, 2143, and 1234, which represent the desired canonical form as it relates to the notion of substitution into a symbolic canonical form. One can also reverse the process to find the permutations $p_i$ and $p_j$ in the following context:
A B C D E F A B C D E F
A 0 1 0 0 0 1 A 0 0 1 0 0 1
B 1 0 1 0 0 0 B 0 0 0 0 1 1
C 0 1 0 1 0 0 C 1 0 0 1 0 0
D 0 0 1 0 1 0 D 0 0 1 0 1 0
E 0 0 0 1 0 1 E 0 1 0 1 0 0
F 1 0 0 0 1 0 F 1 1 0 0 0 0
A B C D E F A B C D E F
A 0 1 1 1 1 1 A 0 0 0 0 0 0
B 0 0 1 1 1 1 B 1 0 0 0 0 1
C 0 0 0 1 1 1 C 1 1 0 1 1 1
D 0 0 0 0 1 1 D 1 1 0 0 1 1
E 0 0 0 0 0 1 E 1 1 0 0 0 1
F 0 0 0 0 0 0 F 1 0 0 0 0 0
In this case, the lower triangle of the matrix flags inversions, which should
not be confused with an inverse. If $a_1 a_2 a_3 \ldots a_n$ is a permutation of the canonically ordered set $1, 2, 3, \ldots, n$, then the pair $(a_i, a_j)$ is an inversion if $i < j$ and $a_i > a_j$ (Knuth 1973). Thus, the number of 1 bits in the lower triangles of the above matrices is also a count of the number of inversions
(which should also not be confused with the inversion operator used in simple
genetic algorithms, see Holland 1975, p 106, Goldberg 1989, p 166).
The common information can also be extracted as before. This produces the following matrix:
A B C D E F
A 0 # # # # #
B # 0 # # # 1
C # # 0 1 1 1
D # # 0 0 1 1
E # # 0 0 0 1
F # 0 0 0 0 0
Note that this binary matrix is again symmetric around the diagonal, except
that the lower triangle and upper triangle have complementary bit values. Thus
only N (N − 1)/2 elements are needed to represent relative order information.
There have been few studies of how recombination operators generate offspring in this particular representation space. Fox and McMahon (1991) offer some work of this kind and also define several operators that work directly on these binary matrices for relative order.
While matrices may not be the most efficient form of implementation, they
do provide a tool for better understanding sequence recombination operators
designed to exploit relative order. It is clear that adjacency and relative order
relationships are different and are best expressed by different binary matrices.
Likewise, absolute position information also has a different matrix representation
(for example, rows could represent cities and the columns represent positions).
Cycle crossover (Section 33.3.6; see Starkweather et al 1991, Oliver et al 1987)
appears to be a good absolute position operator, although it is hard to find
problems in the literature where absolute position is critical.
$$P(I_j) = C_j.$$

To illustrate:

C = a b c d e f g h
I = 6 2 5 3 8 7 1 4

which represents P = g b d h c a f e.
This may seem like a needless indirection, but consider that I can
be generalized to allow a larger number of possible values than there are
permutation elements. I can also be generalized to allow all real values
(although for computer implementations the distinction is somewhat artificial
since all digital representations of real values are discrete and finite). We
now have a parameter-based representation of the permutation such that we can
generate random vectors I representing permutations. If the number of values
for which elements in I are defined is dramatically larger than the number of
elements in the permutation, then duplicate values in randomly generated vectors
will occur with very small probability.
This representation allows a permutation problem to be treated as if it were
a more traditional parameter optimization problem with the constraint that no
two elements of vector I should be equal, or that there is a well defined way
to resolve ties. Evolutionary algorithm techniques normally used for parameter
optimization problems can thus be applied to permutation problems using this
representation.
This idea has been independently invented on a couple of occasions. The
first use of this coding method was by Steve Smith of Thinking Machines. A
version of this coding was used by the ARGOT Strategy (Shaefer 1987) and the
representation was picked up by Syswerda (1989) and by Schaffer et al (1989)
for the TSP. More recently, a similar idea was introduced by Bean (1994) under
the name random keys.
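A minimal Python sketch of this decoding (an illustration of the random-key idea, not code from any of the cited works) places element C_j at the position given by the rank of its key I_j, so real-valued key vectors decode exactly like the integer example above:

def decode_random_keys(keys, elements):
    # elements[j] goes to the position given by the rank of keys[j]
    order = sorted(range(len(keys)), key=lambda j: keys[j])
    return [elements[j] for j in order]

print("".join(decode_random_keys([6, 2, 5, 3, 8, 7, 1, 4], list("abcdefgh"))))
# -> gbdhcafe, matching P = g b d h c a f e above

# Real-valued keys decode the same way:
print(decode_random_keys([0.62, 0.21, 0.55, 0.30], list("abcd")))   # b, d, c, a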
Note that in this definition of the o-schemata, relative order is not accounted for.
In other words, if relative order is important then all of the following shifted
o-schemata,
1 ! ! 7 3 ! ! !
! 1 ! ! 7 3 ! !
! ! 1 ! ! 7 3 !
! ! ! 1 ! ! 7 3
could be viewed as equivalent. Such schemata may or may not ‘wrap around’.
Goldberg discusses o-schemata which have an absolute fixed position (o-
schemata, type a) and those with relative position which are shifts of a specified
template (o-schemata, type r).
This work on o-schemata predates the distinctions between relative
order permutation problems, absolute position problems, and adjacency-based
problems. Thus, o-schemata appear to be better for understanding resource
scheduling applications than for the TSP. In subsequent work, Kargupta et
al (1992) attempt to use ordering schemata to construct deceptive functions
for ordering problems—that is, problems where the average fitness values of
the o-schemata provide misleading information. Note that such problems are
constructed to mislead simple genetic algorithms and may or may not be difficult
with respect to other types of algorithm. (For a discussion of deception see
148 Permutations
the article by Goldberg (1987) and Whitley (1991) and for another perspective
see the article by Grefenstette (1993).) The analysis of Kargupta et al (1992)
considers PMX, a uniform ordering crossover operator (UOX), and a relative
ordering crossover operator (ROX).
An alternative way of constructing relative order problems and of
comparing the similarity of permutations is given by Whitley and Yoo
(1995). Recall that a relative order matrix has a 1 bit in position (X, Y ) if
row element X appears before column element Y in a permutation. Note
that the matrix representation yields a unique binary representation for each
permutation. Using this representation one can also define the Hamming
distance between two permutations P1 and P2; Hamming distance is denoted
by HD(index(P1), index(P2)), where the permutations are represented by their
integer index. In the following examples, the Hamming distance is computed
with respect to the lower triangle (i.e. it is a count of the number of 1 bits in
the lower triangle):
Permutation A B C D (index 0):

        A B C D
      ---------
    A | 0 1 1 1
    B | 0 0 1 1          HD(0,0) = 0
    C | 0 0 0 1
    D | 0 0 0 0

Permutation B D C A (index 11):

        A B C D
      ---------
    A | 0 0 0 0
    B | 1 0 1 1          HD(0,11) = 4
    C | 1 0 0 0
    D | 1 0 1 0

Permutation D C B A (index 23):

        A B C D
      ---------
    A | 0 0 0 0
    B | 1 0 0 0          HD(0,23) = 6
    C | 1 1 0 0
    D | 1 1 1 0
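These examples can be reproduced with a short Python sketch (ours, for illustration): build the relative-order matrix of a permutation and count the 1 bits in its lower triangle, which equals the Hamming distance to the canonical ordering A B C D:

def relative_order_matrix(perm, canon):
    # m[x][y] = 1 iff canon[x] appears before canon[y] in perm
    pos = {e: i for i, e in enumerate(perm)}
    n = len(canon)
    return [[1 if pos[canon[x]] < pos[canon[y]] else 0 for y in range(n)]
            for x in range(n)]

def lower_triangle_ones(m):
    return sum(m[x][y] for x in range(len(m)) for y in range(x))

canon = list("ABCD")
for perm in ("ABCD", "BDCA", "DCBA"):
    print(perm, lower_triangle_ones(relative_order_matrix(list(perm), canon)))
# prints 0, 4, and 6, as in the examples above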
Whitley and Yoo (1995) point out that this representation is not perfect.
Since $2^{N(N-1)/2} > N!$, certain binary strings are undefined. For example, consider
the following upper triangle:

    1 1 1
      0 1
        0
References
Bean J C 1994 Genetic algorithms and random keys for sequencing and optimization
ORSA J. Comput. 6
Cormen T, Leiserson C and Rivest R 1990 Introduction to Algorithms (Cambridge, MA:
MIT Press)
Deb K and Goldberg D 1993 Analyzing deception in trap functions Foundations of
Genetic Algorithms 2 ed D Whitley (San Mateo, CA: Morgan Kaufmann)
Fox B R and McMahon M B 1991 Genetic operators for sequencing problems
Foundations of Genetic Algorithms ed G J E Rawlins (San Mateo, CA: Morgan
Kaufmann) pp 284–300
18
Finite-state representations
David B Fogel
18.1 Introduction
A finite-state machine is a mathematical logic. It is essentially a computer
program: it represents a sequence of instructions to be executed, each depending
on a current state of the machine and the current stimulus. More formally, a
finite-state machine is a 5-tuple
M = (Q, τ, ρ, s, o)
where Q is a finite set, the set of states, τ is a finite set, the set of input symbols,
ρ is a finite set, the set of output symbols, s : Q × τ → Q is the next state
function, and o : Q × τ → ρ is the next output function.
Any 5-tuple of sets and functions satisfying this definition is to be interpreted
as the mathematical description of a machine that, if given an input symbol x
while it is in state q, will output the symbol o(q, x) and transition to state s(q, x).
Only the information contained in the current state describes the behavior of the
machine for a given stimulus. The entire set of states serves as the ‘memory’ of
the machine. Thus a finite-state machine is a transducer that can be stimulated
by a finite alphabet of input symbols, that can respond in a finite alphabet
of output symbols, and that possesses some finite number of different internal
states. The corresponding input–output symbol pairs and next-state transitions
for each input symbol, taken over every state, specify the behavior of any finite-
state machine, given any starting state. For example, a three-state machine is
shown in figure 18.1. The alphabet of input symbols are elements of the set
{0, 1}, whereas the alphabet of output symbols are elements of the set {α, β, γ }
(input symbols are shown to the left of the slash, output symbols are shown to
the right). The finite-state machine transforms a sequence of input symbols into
a sequence of output symbols. Table 18.1 indicates the response of the machine
to a given string of symbols, presuming that the machine is found in state C.
It is presumed that the machine acts when each input symbol is perceived and
the output takes place before the next input symbol arrives.
Figure 18.1. A three-state finite machine. Input symbols are shown to the left of the
slash. Output symbols are to the right of the slash. Unless otherwise specified, the
machine is presumed to start in state A. (After Fogel et al 1966, p 12).
Table 18.1. The response of the finite-state machine shown in figure 18.1 to a string of
symbols. In this example, the machine starts in state C.
Present state C B C A A B
Input symbol 0 1 1 1 0 1
Next state B C A A B C
Output symbol β α γ β β α
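The behavior in table 18.1 is easily replayed in code. The Python sketch below (ours, for illustration) stores the machine as a lookup table from (state, input) to (next state, output); only the transitions recoverable from table 18.1 are included, so the (B, 0) entry of figure 18.1 is omitted:

machine = {
    ('A', '0'): ('B', 'β'),
    ('A', '1'): ('A', 'β'),
    ('B', '1'): ('C', 'α'),
    ('C', '0'): ('B', 'β'),
    ('C', '1'): ('A', 'γ'),
}

def run(machine, state, inputs):
    # Transform an input symbol sequence into an output symbol sequence.
    outputs = []
    for symbol in inputs:
        state, out = machine[(state, symbol)]
        outputs.append(out)
    return ''.join(outputs)

print(run(machine, 'C', '011101'))   # -> βαγββα, as in table 18.1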
18.2 Applications
Finite-state representations are often convenient when the required solutions to
a particular problem of interest require the generation of a sequence of symbols
having specific meaning. For example, consider the problem offered by Fogel
et al (1966) of predicting the next symbol in a sequence of symbols taken from
some alphabet A (here, τ = ρ = A). A population of finite-state machines is
exposed to the environment, that is, the sequence of symbols that have been
observed up to the current time. For each parent machine, as each input symbol
is offered to the machine, each output symbol is compared with the next input
symbol. The worth of this prediction is then measured with respect to the
given payoff function (e.g. all–none, absolute error, squared error, or any other
expression of the meaning of the symbols). After the last prediction is made,
a function of the payoff for each symbol (e.g. average payoff per symbol)
indicates the fitness of the machine. Offspring machines are created through
mutation (Section 32.4) and/or recombination (Section 33.4). The machines that
provide the greatest payoff are retained to become parents of the next generation.
This process is iterated until an actual prediction of the next symbol (as yet unseen) is required.
References
Angeline P J and Pollack J B 1993 Evolutionary module acquisition Proc. 2nd Ann. Conf.
on Evolutionary Programming (San Diego, CA) ed D B Fogel and W Atmar (La
Jolla, CA: Evolutionary Programming Society) pp 154–63
Fogel D B 1991 The evolution of intelligent decision-making in gaming Cybernet. Syst.
22 223–36
——1993 Evolving behaviors in the iterated prisoner’s dilemma Evolut. Comput. 1 77–97
——1995a On the relationship between the duration of an encounter and the evolution
of cooperation in the iterated prisoner’s dilemma Evolut. Comput. 3 349–63
——1995b Evolutionary Computation: Toward a New Philosophy of Machine Intelligence
(Piscataway, NJ: IEEE)
Fogel L J, Owens A J and Walsh M J 1966 Artificial Intelligence Through Simulated
Evolution (New York: Wiley)
Jefferson D, Collins R, Cooper C, Dyer M, Flowers M, Korf R, Taylor C and Wang A
1991 Evolution as a theme in artificial life: the Genesys/Tracker system Artificial
Life II ed C G Langton, C Taylor, J D Farmer and S Rasmussen (Reading, MA:
Addison-Wesley) pp 549–77
19
Parse trees
Peter J Angeline
evaluated within an implied ‘repeat until done’ loop that reexecutes the evolved
function until some predetermined stopping criterion is satisfied. For instance,
Koza (1992) describes evolving a controller for an artificial ant for which the
fitness function repeatedly applies its program until a total of 400 commands
are executed or the ant completes the task. Numerous examples of such implied
loops can be found in the genetic programming literature (e.g. Koza 1992,
pp 147, 329, 346, Teller 1994, Reynolds 1994, Kinnear 1993).
Often it is necessary to include constants in the primitive language, especially
when mathematical expressions are being evolved. The general practice is to
include as a potential terminal of the language a special symbol that denotes a
constant. When a new individual is created and this symbol is selected to be a
terminal, rather than enter the symbol into the parse tree, a numerical constant
is inserted drawn uniformly from a user-defined range (Koza 1992). Figure 19.1
shows a number of numerical constants that would be inserted into the parse
tree in this manner.
Figure 19.1. An example parse tree representation for a complex numerical function. The
function if-lt-0 is a numerical conditional that returns the value of its second argument
if its first argument evaluates to a negative number and otherwise returns the value of
its third argument. The function % denotes a protected division operator that returns a
value of 1.0 if the second argument (the denominator) is zero.
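The two primitives described in the caption can be sketched directly (an illustration in Python, not Koza's implementation; note that this toy evaluator computes all arguments eagerly, so side-effect-free primitives are assumed). Parse trees are written as nested tuples of the form (function, arg1, arg2, ...):

def protected_div(a, b):
    # the % operator of figure 19.1
    return 1.0 if b == 0 else a / b

def if_lt_0(test, then_val, else_val):
    # numerical conditional of figure 19.1
    return then_val if test < 0 else else_val

FUNCS = {'+': lambda a, b: a + b,
         '*': lambda a, b: a * b,
         '%': protected_div,
         'if-lt-0': if_lt_0}

def evaluate(tree):
    if isinstance(tree, tuple):      # interior node: apply its function
        return FUNCS[tree[0]](*[evaluate(arg) for arg in tree[1:]])
    return tree                      # terminal: a numerical constant

print(evaluate(('if-lt-0', -2.0, ('%', 3.1, 0.0), 5.0)))   # -> 1.0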
References
Angeline P J 1996 Genetic programming’s continued evolution Advances in Genetic
Programming vol 2, ed P J Angeline and K Kinnear (Cambridge, MA: MIT Press)
pp 1–20
Angeline P J and Pollack J B 1994 Co-evolving high-level representations Artificial Life
III ed C G Langton (Reading, MA: Addison-Wesley) pp 55–71
Cramer N L 1985 A representation for the adaptive generation of simple sequential
programs Proc. 1st Int. Conf. on Genetic Algorithms (Pittsburgh, PA, July 1985) ed
J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 183–7
Kinnear K E 1993 Generality and difficulty in genetic programming: evolving a sort
Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed
S Forrest (San Mateo, CA: Morgan Kaufmann) pp 287–94
Koza J R 1992 Genetic Programming: on the Programming of Computers by Means of
Natural Selection (Cambridge, MA: MIT Press)
——1994 Genetic Programming II: Automatic Discovery of Reusable Programs
(Cambridge, MA: MIT Press)
Koza J R and Andre D 1996 Classifying protein segments as transmembrane domains
using architecture-altering operations in genetic programming Advances in Genetic
Programming vol 2, ed P J Angeline and K Kinnear (Cambridge, MA: MIT Press)
pp 155–76
Montana D J 1995 Strongly typed genetic programming Evolutionary Comput. 3 199–230
20
Guidelines for a suitable encoding
Figure 20.1. A quadratic bowl in two dimensions. The shape of the response surface
suggests a natural approach for optimization. The intuitive choice is to use real-valued
encodings and continuous variation operators. The shape of a response surface can be
useful in suggesting choices for suitable encodings.
References
Bäck T and Schwefel H-P 1993 An overview of evolutionary algorithms for parameter
optimization Evolutionary Comput. 1 1–24
Davis L (ed) 1991 Handbook of Genetic Algorithms (New York: Van Nostrand Reinhold)
Fogel D B and Stayton L C 1994 On the effectiveness of crossover in simulated
evolutionary optimization BioSystems 32 171–82
Goldberg D E 1989 Genetic Algorithms in Search, Optimization, and Machine Learning
(Reading, MA: Addison-Wesley)
Goldberg D E and Smith R E 1987 Nonstationary function optimization using genetic
algorithms with dominance and diploidy Proc. 2nd Int. Conf. on Genetic Algorithms
(Cambridge, MA, 1987) ed J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 59–68
Holland J H 1975 Adaptation in Natural and Artificial Systems (Ann Arbor, MI:
University of Michigan Press)
Michalewicz Z 1996 Genetic Algorithms + Data Structures = Evolution Programs 3rd
edn (Berlin: Springer)
Ng K P and Wong K C 1995 A new diploid scheme and dominance change mechanism
for non-stationary function optimization Proc. 6th Int. Conf. on Genetic Algorithms
(Pittsburgh, PA, July 1995) ed L J Eshelman (San Mateo, CA: Morgan Kaufmann)
pp 159–66
Schraudolph N N and Belew R K 1992 Dynamic parameter encoding for genetic
algorithms Machine Learning 9 9–21
21
Other representations
21.2 Introns
In contrast to the above hybridization of different forms of representation,
another ‘nontraditional’ approach has involved the inclusion of noncoding
regions (introns) within a solution (see e.g. Levenick 1991, Golden et al 1995,
Wu and Lindsay 1995). Solutions are represented with noncoding segments interspersed among the coding regions.
References
Angeline P J, Saunders G M and Pollack J B 1994 An evolutionary algorithm that
constructs recurrent neural networks IEEE Trans. Neural Networks NN-5 54–65
Bäck T and Schütz M 1995 Evolution strategies for mixed-integer optimization of optical
multilayer systems Proc. 4th Ann. Conf. on Evolutionary Programming (San Diego,
CA, March 1995) ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge,
MA: MIT Press) pp 33–51
Bagley J D 1967 The Behavior of Adaptive Systems which Employ Genetic and
Correlation Algorithms Doctoral Dissertation, University of Michigan; University
Microfilms 68-7556
Brindle A 1981 Genetic Algorithms for Function Optimization Doctoral Dissertation,
University of Alberta
Cobb H G and Grefenstette J J 1993 Genetic algorithms for tracking changing
environments Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL,
July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 523–30
Fogel D B 1995 Evolutionary Computation: Toward a New Philosophy of Machine
Intelligence (Piscataway, NJ: IEEE)
Goldberg D E, Korb B and Deb K 1989 Messy genetic algorithms: motivation, analysis,
and first results Complex Syst. 3 493–530
Goldberg D E and Smith R E 1987 Nonstationary function optimization using genetic
algorithms with dominance and diploidy Proc. 2nd Int. Conf. on Genetic Algorithms
(Cambridge, MA, July 1987) ed J J Grefenstette (Hillsdale, NJ: Erlbaum) pp 59–68
Golden J B, Garcia E and Tibbetts C 1995 Evolutionary optimization of a neural network-
based signal processor for photometric data from an automated DNA sequencer
Proc. 4th Ann. Conf. on Evolutionary Programming (San Diego, CA, March 1995)
ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge, MA: MIT Press)
pp 579–601
Haffner S B and Sebald A V 1993 Computer-aided design of fuzzy HVAC
controllers using evolutionary programming Proc. 2nd Ann. Conf. on Evolutionary
Programming (San Diego, CA, 1993) ed D B Fogel and W Atmar (La Jolla, CA:
Evolutionary Programming Society) pp 98–107
Hollstein R B 1971 Artificial Genetic Adaptation in Computer Control Systems Doctoral
Dissertation, University of Michigan; University Microfilms 71-23, 773
Levenick J R 1991 Inserting introns improves genetic algorithm success rate: taking
a cue from biology Proc. 4th Int. Conf. on Genetic Algorithms (San Diego, CA,
July 1991) ed R K Belew and L B Booker (San Mateo, CA: Morgan Kaufmann)
pp 123–27
McDonnell J R and Waagen D 1994 Evolving recurrent perceptrons for time-series
modeling IEEE Trans. Neural Networks NN-5 24–38
Ng K P and Wong K C 1995 A new diploid scheme and dominance change mechanism
for non-stationary function optimization Proc. 6th Int. Conf. on Genetic Algorithms
(Pittsburgh, PA, July 1995) ed L J Eshelman (San Mateo, CA: Morgan Kaufmann)
pp 159–66
Wu A S and Lindsay R K 1995 Empirical studies of the genetic algorithm with noncoding
segments Evolutionary Comput. 3 121–48
22
Introduction to selection
Kalyanmoy Deb
22.2 Pseudocode
Some EC algorithms (specifically, genetic algorithms (GAs) and genetic
programming (GP)) usually apply the selection operator first to select good
solutions and then apply the recombination and mutation operators on these
good solutions to create a hopefully better set of solutions. Other EC algorithms
(specifically, evolution strategies (ES) and evolutionary programming (EP))
prefer using the recombination and mutation operator first to create a set of
solutions and then use the selection operator to choose a good set of solutions.
The selection operator in (µ + λ) ES and EP techniques chooses the offspring
solutions from a combined population of parent solutions and solutions obtained
after recombination and mutation. In the case of EP, this is done statistically.
However, the selection operator in (µ, λ) ES chooses the offspring solutions
only from the solutions obtained after the recombination and mutation operators.
Since the selection operators are different in different EC studies, it is difficult
to present a common code for all selection operators. However, the following
pseudocode is generic to most of the selection operators used in EC studies.
The parameters µ and λ are the numbers of parent solutions and offspring
solutions after recombination and mutation operators, respectively. The
parameter q is a parameter related to the operator’s selective pressure, a matter
we discuss later in this section. The population at iteration t is denoted by
P(t) = {a_1, a_2, . . .} and the population obtained after the recombination and
mutation operators is denoted by P′(t) = {a′_1, a′_2, . . .}. Since GAs and GP
techniques use the selection operator first, the population P′(t) before the
selection operation is an empty set, with no solutions. The fitness function
is represented by F(t).
Input: µ, λ, q, P(t) ∈ I^µ, P′(t) ∈ I^λ, F(t)
1 for i ← 1 to µ do
      a_i(t) ← s_selection(P(t), P′(t), F(t), q);
  od
2 return({a_1(t), . . . , a_µ(t)});
Detailed discussions of some of the selection operators are presented in the
subsequent sections. Here, we outline a brief introduction to some of the popular
selection schemes, mentioned as sselection in the above pseudocode.
In the proportionate selection operator, the expected number of copies a
solution receives is assigned proportionally to its fitness. Thus, a solution having
twice the fitness of another solution receives twice as many copies. In the
selection scheme used in EP, each solution in the pool of parent and offspring
solutions is scored on the basis of competitions with a set of other solutions (of size
q) chosen from the pool.
of this score and the first µ solutions are chosen deterministically. Thus, this
selection scheme is similar to the (µ+µ) ES selection scheme with a tournament
selection of q tournament size. Bäck et al (1994) analyzed this selection scheme
as a combination of (µ + µ) ES and tournament selection schemes, and found
some convergence characteristics of this operator.
Goldberg and Deb (1991) have compared a number of popular selection
schemes in terms of their convergence properties, selective pressure, takeover
times, and growth factors, all of which are important in the understanding of
the power of different selection schemes used in GA and GP studies. Similar
studies have also been performed by Bäck et al (1994) for selection schemes
used in ES and EP studies. A detailed discussion of some analytical as well as
experimental comparisons of selection schemes is presented in Chapter 29. In
the following section, we briefly discuss the theory of selective pressure and its
importance in choosing a suitable selection operator for a particular application.
The early growth rate is calculated early in the simulation, when the proportion of the best solution in the population is negligible.
late growth rate is calculated later, when the proportion of the best solution in
the population is large (about 0.5). The early growth rate is important, especially
if a quick near-optimizer algorithm is desired, whereas the late growth rate can
be a useful measure if precision in the final solution is important. Goldberg
and Deb (1991) have calculated these growth rates for a number of selection
operators used in GAs. A comparison of different selection schemes based on
some of the above criteria is given in Chapter 29.
The above discussion suggests that, for a successful EC simulation, the
required selection pressure of a selection operator depends on the recombination
and mutation operators used. A selection scheme with a large selection pressure
can be used, but only with highly disruptive recombination and mutation
operators. Goldberg et al (1993) and later Thierens and Goldberg (1993) have
found functional relationships between the selective pressure and the probability
of crossover for successful working of selectorecombinative GAs. These studies
show that a large selection pressure can be used but only with a large probability
of crossover. However, if a reasonable selection pressure is used, GAs work
successfully for a wide variety of crossover probabilities. Similar studies can
also be performed with ES and EP algorithms.
References
Bäck T 1994 Selective pressure in evolutionary algorithms: a characterization of selection
mechanisms Proc. 1st IEEE Conf. on Evolutionary Computation (Orlando, FL,
1994) (Piscataway, NJ: IEEE) pp 57–62
Bäck T, Rudolph G and Schwefel H-P 1994 Evolutionary programming and evolution
strategies: similarities and differences Proc. 2nd Ann. Conf. on Evolutionary
Programming (San Diego, CA, July 1994) ed D B Fogel and W Atmar (La Jolla,
CA: Evolutionary Programming Society)
Goldberg D E 1989 Genetic Algorithms in Search, Optimization, and Machine Learning
(Reading, MA: Addison-Wesley)
Goldberg D E and Deb K 1991 A comparison of selection schemes used in genetic
algorithms Foundations of Genetic Algorithms (Bloomington, IN) ed G J E Rawlins
(San Mateo, CA: Morgan Kaufmann) pp 69–93
Goldberg D E, Deb K and Thierens D 1993 Toward a better understanding of mixing in
genetic algorithms J. SICE 32 10–6
Thierens D and Goldberg D E 1993 Mixing in genetic algorithms Proc. 5th Int. Conf. on
Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann) pp 38–45
23
Proportional selection and sampling
algorithms
John Grefenstette
23.1 Introduction
Selection (Chapter 22) is the process of choosing individuals for reproduction in
an evolutionary algorithm. One popular form of selection is called proportional
selection. As the name implies, this approach involves creating a number
of offspring in proportion to an individual’s fitness. This approach was
proposed and analyzed by Holland (1975) and has been used widely in many
implementations of evolutionary algorithms.
Besides having some interesting mathematical properties, proportional
selection provides a natural counterpart in artificial evolutionary systems to the
usual practice in population genetics of defining an individual’s fitness in terms
of its number of offspring.
For clarity of discussion, it is convenient to decompose the selection process
into distinct steps, namely:
(i) map the objective function to fitness,
(ii) create a probability distribution proportional to fitness, and
(iii) draw samples from this distribution.
The first three sections of this article discuss these steps. The final section
discusses some results in the theory of proportional selection, including the
schema theorem and the impact of the fitness function, and two characterizations
of selective pressure.
The objective function is a mapping

$$f : A_x \rightarrow \mathbb{R}$$

where $A_x$ is the object variable space.
some cost to be minimized or some reward to be maximized. The definition of
the objective function is, of course, application dependent. The characterization
of how well evolutionary algorithms perform on different classes of objective
functions is a topic of continuing research. However, a few general design
principles are clear when using an evolutionary algorithm.
(i) The objective function must reflect the relevant measures to be optimized.
Evolutionary algorithms are notoriously opportunistic, and there are several
known instances of an algorithm optimizing the stated objective function,
only to have the user realize that the objective function did not actually
represent the intended measure.
(ii) The objective function should exhibit some regularities over the space
defined by the selected representation.
(iii) The objective function should provide enough information to drive the
selective pressure of the evolutionary algorithm. For example, ‘needle-in-
a-haystack’ functions, i.e. functions that assign nearly equal value to every
candidate solution except the optimum, should be avoided.
The fitness function

$$\Phi : A_x \rightarrow \mathbb{R}^{+}$$

maps the raw scores of the objective function to a non-negative interval. The fitness function is often a composition of the objective function and a scaling function g:

$$\Phi(a_i(t)) = g(f(a_i(t)))$$

where $a_i(t) \in A_x$. Such a mapping is necessary if the goal is to minimize the objective function, since higher fitness values correspond to lower objective values in this case. For example, one fitness function that might be used when the goal is to minimize the objective function is

$$\Phi(a_i(t)) = f_{\max} - f(a_i(t))$$

where $f_{\max}$ is the maximum value of the objective function. If the global maximum value of the objective function is unknown, an alternative is

$$\Phi(a_i(t)) = f_{\max}(t) - f(a_i(t))$$

where $f_{\max}(t)$ is the maximum observed value of the objective function up to time t. There are many other plausible alternatives, such as

$$\Phi(a_i(t)) = \frac{1}{1 + f(a_i(t)) - f_{\min}(t)}$$

where $f_{\min}(t)$ is the minimum observed value of the objective function up to time t. For maximization problems, this becomes

$$\Phi(a_i(t)) = \frac{1}{1 + f_{\max}(t) - f(a_i(t))}.$$

Note that the latter two fitness functions yield a range of (0, 1].
where δ is an update rate of, say, 0.1, and $f_{\mathrm{worst}}(t)$ is the worst objective value in the population at time t.
Sigma scaling (Goldberg 1989) is based on the distribution of objective
values within the current population. It is defined as follows:

$$\Phi(a_i(t)) = \begin{cases} f(a_i(t)) - \bigl(\bar{f}(t) - c\,\sigma_f(t)\bigr) & \text{if } f(a_i(t)) > \bar{f}(t) - c\,\sigma_f(t) \\ 0 & \text{otherwise} \end{cases}$$
where f̄ (t) is the mean objective value of the current population, σf (t) is the
(sample) standard deviation of the objective values in the current population,
and c is a constant, say c = 2. The idea is that f̄ (t) − cσf (t) represents the least
acceptable objective value for any reproducing individual. As the population
improves, this statistic tracks the improvement, yielding a level of selective
pressure that is sensitive to the spread of performance values in the population.
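A direct Python transcription of this rule (the function name is ours) is:

import statistics

def sigma_scale(obj_values, c=2.0):
    # Truncate objective values below f_bar - c*sigma_f to a fitness of 0.
    f_bar = statistics.mean(obj_values)
    floor = f_bar - c * statistics.stdev(obj_values)
    return [f - floor if f > floor else 0.0 for f in obj_values]

print(sigma_scale([10.0, 12.0, 11.0, 30.0]))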
Fitness scaling methods based on power laws have also been proposed. A
fixed transformation of the form

$$\Phi(a_i(t)) = f(a_i(t))^{k}$$

for a problem-dependent exponent k may be used (Gillies 1985). In Boltzmann selection (de la Maza and Tidor 1993), the scaled fitness takes the form

$$\Phi(a_i(t)) = \exp\!\left(\frac{f(a_i(t))}{T}\right)$$

where the parameter T can be used to control the level of selective pressure
during the course of the evolution. It is suggested by de la Maza and Tidor
(1993) that, if T decreases with time as in a simulated annealing procedure,
then a higher level of selective pressure results than with proportional selection
without fitness scaling.
The probability of selecting individual i under proportional selection is then

$$\Pr\nolimits_{\mathrm{prop}}(i) = \frac{\Phi(i)}{\sum_{i=1}^{\mu} \Phi(i)}.$$
23.4 Sampling
In an incremental, or steady-state, algorithm, the probability distribution can
be used to select one parent at a time. This procedure is commonly called
the roulette wheel sampling algorithm, since one can think of the probability
distribution as defining a roulette wheel on which each slice has a width
corresponding to the individual’s selection probability, and the sampling can
be envisioned as spinning the roulette wheel and testing which slice ends up at
the top. An alternative that reduces the variance of repeated roulette wheel spins is Baker's (1987) stochastic universal sampling (SUS), which uses a single random draw to assign offspring to the entire population. The pseudocode for SUS is shown below:
1 SUS(Pr, λ):
2    sample u ∼ U(0, 1/λ);
3    sum ← 0.0;
4    for i = 1 to µ do
5        c_i ← 0;
6        sum ← sum + Pr(i);
7        while u < sum do
8            c_i ← c_i + 1;
9            u ← u + 1/λ;
         od
     od
10   return c;
Note that the pseudocode allows for any number λ > 0 of children to
be specified. If λ = 1, SUS behaves like the roulette wheel function. For
generational algorithms, SUS is usually invoked with λ = µ.
It can be shown that the expected number of offspring that SUS assigns to
individual i is λ Pr(i), and that on each invocation of the procedure, SUS assigns
either ⌊λ Pr(i)⌋ or ⌈λ Pr(i)⌉ offspring to individual i. Finally, SUS is optimally
efficient, making a single pass over the individuals to assign all offspring.
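The pseudocode translates directly into Python (an illustration; pr is the list of selection probabilities for the µ individuals, and the returned list c gives the number of offspring assigned to each):

import random

def sus(pr, lam):
    u = random.uniform(0.0, 1.0 / lam)   # the single random draw
    total = 0.0
    c = [0] * len(pr)
    for i in range(len(pr)):
        total += pr[i]
        while u < total:
            c[i] += 1
            u += 1.0 / lam
    return c

print(sus([0.5, 0.3, 0.2], 1))   # behaves like one roulette wheel spin
print(sus([0.5, 0.3, 0.2], 3))   # e.g. [2, 1, 0] or [1, 1, 1]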
23.5 Theory
This section presents some results from the theory of proportional selection. First, the schema theorem is described, followed by a discussion of the effects of the fitness function.
Let m(H, t) denote the number of individuals in the population P(t) that are representatives of the hyperplane (schema) H. The target sampling rate of H at time t is defined as the average of the target sampling rates of its representatives:

$$\mathrm{tsr}(H, t) \;\stackrel{\mathrm{def}}{=}\; \frac{1}{m(H,t)} \sum_{i=1}^{m(H,t)} \mathrm{tsr}(a_i, t).$$

Under proportional selection, the target sampling rate of an individual is

$$\mathrm{tsr}(a_i, t) = \frac{\Phi(a_i)}{\bar{\Phi}(t)}$$

where Φ is the fitness function and $\bar{\Phi}(t)$ denotes the average fitness of the
individuals in P(t). The most important feature of proportional selection is
that it induces the following target sampling rates for all hyperplanes in the
population:

$$\mathrm{tsr}(H, t) = \frac{1}{m(H,t)} \sum_{i=1}^{m(H,t)} \mathrm{tsr}(a_i, t) = \frac{1}{m(H,t)} \sum_{i=1}^{m(H,t)} \frac{\Phi(a_i)}{\bar{\Phi}(t)} = \frac{\Phi(H, t)}{\bar{\Phi}(t)} \qquad (23.1)$$

where $\Phi(H, t)$ denotes the average fitness of the representatives of H in P(t).
Now consider a linearly scaled fitness function

$$\Phi_1(x) = \alpha f(x) + \beta$$

and a shifted variant

$$\Phi_2(x) = \Phi_1(x) + \gamma.$$

Under $\Phi_1$ the target sampling rate of a hyperplane H is

$$\mathrm{tsr}_1(H, t) = \frac{\Phi_1(H, t)}{\bar{\Phi}_1(t)}.$$
Under proportional selection, the selection differential (Mühlenbein and Schlierkamp-Voosen 1993) satisfies

$$S(t) = \frac{\sigma_p^2(t)}{\bar{\Phi}(t)}$$

where $\sigma_p^2(t)$ is the fitness variance of the population at time t. From this
formula, it is easy to see that, without dynamic fitness scaling, an evolutionary
algorithm tends to stagnate over time, since $\sigma_p^2(t)$ tends to decrease and $\bar{\Phi}(t)$
tends to increase. The fitness scaling techniques described above are intended
to mitigate this effect. In addition, operators which produce random variation
(e.g. mutation) can also be used to reduce stagnation in the population.
References
Bäck T 1994 Selective pressure in evolutionary algorithms: a characterization of selection
mechanisms Proc. 1st IEEE Int. Conf. on Evolutionary Computation (Orlando, FL,
June 1994) (Piscataway, NJ: IEEE) pp 57–62
Baker J E 1987 Reducing bias and inefficiency in the selection algorithm Proc. 2nd Int.
Conf. on Genetic Algorithms (Cambridge, MA, 1987) ed J Grefenstette (Hillsdale,
NJ: Erlbaum) pp 14–21
de la Maza M and Tidor B 1993 An analysis of selection procedures with particular
attention paid to proportional and Boltzmann selection Proc. 5th Int. Conf. on
Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann) pp 124–31
Gillies A M 1985 Machine Learning Procedures for Generating Image Domain Feature
Detectors Doctoral Dissertation, University of Michigan, Ann Arbor
Goldberg D E 1989 Genetic Algorithms in Search, Optimization, and Machine Learning
(Reading, MA: Addison-Wesley)
Goldberg D and Deb K 1991 A comparative analysis of selection schemes used in
genetic algorithms Foundations of Genetic Algorithms ed G Rawlins (San Mateo,
CA: Morgan Kaufmann) pp 69–93
Grefenstette J 1986 Optimization of control parameters for genetic algorithms IEEE
Trans. Syst. Man Cybernet. SMC-16 122–8
——1991 Conditions for implicit parallelism Foundations of Genetic Algorithms ed G
Rawlins (San Mateo, CA: Morgan Kaufmann) pp 252–61
Holland J H 1975 Adaptation in Natural and Artificial Systems (Ann Arbor, MI:
University of Michigan Press)
Mühlenbein H and Schlierkamp-Voosen D 1993 Predictive models for the breeder genetic
algorithm Evolut. Comput. 1 25–49
24
Tournament selection
Tobias Blickle
24.4 Properties
24.4.1 Concatenation of tournaments
An interesting property of tournament selection is the concatenation of several
selection phases. Assuming an arbitrary population with a fitness distribution
ρ̄, tournament selection with tournament size q1 is applied followed by
tournament selection with tournament size q2 on the resulting population and
no recombination in between. The obtained expected fitness distribution is the
same as if only a single tournament selection with tournament size q1 q2 were
applied to the initial distribution ρ̄ (Blickle and Thiele 1995b).

The takeover time of tournament selection is approximately

$$\tau^{*}_{\mathrm{tour}}(q) \approx \frac{1}{\ln q}\bigl(\ln \lambda + \ln(\ln \lambda)\bigr). \qquad (24.4)$$

Figure 24.1 shows the dependence of the takeover time on the tournament size
q. For scaling purposes an artificial population size of λ = e is assumed, such
that (24.4) simplifies to $\tau^{*}_{\mathrm{tour}}(q) \approx 1/\ln q$.
Figure 24.1. The selection intensity S, the loss of diversity θ, and the takeover time τ ∗
(for λ = e) of tournament selection in dependence on the tournament size q.
The known exact solutions of the integral equation (24.5) are given in
table 24.1. These values can also be obtained using the results of order
statistics theory (Bäck 1995). The following formula was derived by Blickle
and Thiele (1995b) and approximates the selection intensity with a relative error
of less than 1% for tournament sizes of q > 5:

$$S_{\mathrm{tour}}(q) \approx \sqrt{2\left(\ln q - \ln\sqrt{4.14\,\ln q}\right)}.$$

Table 24.1. Known exact values for the selection intensity of tournament selection.

q             1    2                   3                            4                                          5
S_tour(q)     0    $\pi^{-1/2}$        $\frac{3}{2}\,\pi^{-1/2}$    $\frac{6}{\pi^{3/2}}\tan^{-1} 2^{1/2}$     $\frac{10}{\pi^{1/2}}\left(\frac{3}{2\pi}\tan^{-1} 2^{1/2} - \frac{1}{4}\right)$
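A minimal Python sketch of the basic operator (ours, for illustration; maximization assumed) fills each of λ slots with the best of q individuals drawn uniformly at random, with replacement, from the population:

import random

def tournament_selection(population, fitness, q, lam):
    selected = []
    for _ in range(lam):
        competitors = [random.choice(population) for _ in range(q)]
        selected.append(max(competitors, key=fitness))   # best of q wins
    return selected

pop = list(range(10))
winners = tournament_selection(pop, fitness=lambda x: x, q=3, lam=10)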
References
Bäck T 1994 Selective pressure in evolutionary algorithms: a characterization of selection
mechanisms Proc. 1st IEEE Conf. on Evolutionary Computation (Orlando, FL, June
1994) (Piscataway, NJ: IEEE) pp 57–62
——1995 Generalized convergence models for tournament- and (µ, λ)-selection Proc.
6th Int. Conf. on Genetic Algorithms (Pittsburgh, PA, July 1995) ed L J Eshelman
(San Mateo, CA: Morgan Kaufmann) pp 2–8
Baker J E 1987 Reducing bias and inefficiency in the selection algorithm Proc. 2nd Int.
Conf. on Genetic Algorithms (Cambridge, MA, 1987) ed J J Grefenstette (Hillsdale,
NJ: Erlbaum) pp 14–21
——1989 An Analysis of the Effects of Selection in Genetic Algorithms PhD Thesis,
Graduate School of Vanderbilt University, Nashville, TN
Blickle T and Thiele L 1995a A Comparison of Selection Schemes used in Genetic
Algorithms Technical Report 11, Computer Engineering and Communication
Networks Lab (TIK), Swiss Federal Institute of Technology (ETH) Zurich
——1995b A mathematical analysis of tournament selection Proc. 6th Int. Conf. on
Genetic Algorithms (Pittsburgh, PA, July 1995) ed L J Eshelman (San Mateo, CA:
Morgan Kaufmann) pp 9–16
de la Maza M and Tidor B 1993 An analysis of selection procedures with particular
attention paid to proportional and Boltzmann selection Proc. 5th Int. Conf. on
Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann) pp 124–31
25
Rank-based selection
John Grefenstette
25.1 Introduction
Selection is the process of choosing individuals for reproduction or survival in
an evolutionary algorithm. Rank-based selection or ranking means that only
the rank ordering of the fitness of the individuals within the current population
determines the probability of selection.
As discussed in Chapter 23, the selection process may be decomposed into
distinct steps:
(i) Map the objective function to fitness.
(ii) Create a probability distribution based on fitness.
(iii) Draw samples from this distribution.
Ranking simplifies step (i), the mapping from the objective function f to
the fitness function . All that is needed is
$$\Phi(a_i) = \delta(f(a_i))$$

where δ is a strictly monotonic scaling function.
It follows that $\alpha_{\mathrm{rank}} = 2 - \beta_{\mathrm{rank}}$, and $1 \le \beta_{\mathrm{rank}} \le 2$. That is, the expected number
of offspring of the best individual is no more than twice that of the population
average. This shows how ranking can avoid premature convergence caused by
‘super’ individuals.
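As an illustration of these bounds, the following sketch (ours, using Baker's (1985) style of linear ranking; the exact expression used in the omitted portion of the text may differ) assigns probabilities that interpolate linearly between α_rank/µ for the worst individual and β_rank/µ for the best, and that sum to unity whenever α_rank = 2 − β_rank:

def linear_ranking_probs(mu, beta=2.0):
    alpha = 2.0 - beta
    return [(alpha + (beta - alpha) * (i - 1) / (mu - 1)) / mu
            for i in range(1, mu + 1)]   # rank 1 = worst, rank mu = best

probs = linear_ranking_probs(5, beta=2.0)
print(probs)        # best individual receives beta/mu = 0.4
print(sum(probs))   # 1.0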
25.5 Theory
The theory of rank-based selection has received less attention than the
proportional selection method, due in part to the difficulties in applying the
schema theorem to ranking. The next subsection describes the issues that arise
in the schema analysis of ranking, and shows that ranking does exhibit a form
of implicit parallelism. Characterizations of the selective pressure of ranking
are also described, including its fertility rate, selective differential, and takeover
time. Finally, a simple substitution result is mentioned.
A strictly monotonic fitness function preserves the relative ranking of any two
individuals in the search space with distinct objective function values. Since
$\Phi(a_i) = \delta(f(a_i))$, ranking uses a strictly monotonic fitness function by definition.
Likewise, a selection algorithm is called monotonic if

$$\Phi(a_i) \le \Phi(a_j) \implies \mathrm{tsr}(a_i) \le \mathrm{tsr}(a_j)$$
where tsr(a) is the target sampling rate, or expected number of offspring, for
individual a. That is, a monotonic selection algorithm is one that respects
the survival-of-the-fittest principle. A selection algorithm is called strictly
monotonic if it is monotonic and the above implication holds with strict inequalities.
The selection differential can be expressed in terms of a selection intensity I:

$$S(t) \approx I\,\sigma_p$$

where $\sigma_p$ is the standard deviation of the fitness values in the population, and
I is a value called the selection intensity. Bäck (1995) quantifies the selection
intensity for general (µ, λ) selection as follows:

$$I = \frac{1}{\mu} \sum_{i=\lambda-\mu+1}^{\lambda} E(Z_{i:\lambda})$$
where $Z_{i:\lambda}$ are order statistics based on the fitness of individuals in the current
population. That is, I is the average of the expectations of the µ best samples
taken from λ iid normally distributed random variables Z. This analysis shows
that I is approximately proportional to λ/µ, and experimental studies confirm
this relationship (Bäck 1995, Mühlenbein and Schlierkamp-Voosen 1993).
The takeover time is approximately

$$\tau \approx \frac{2}{\mu - 1}\,\ln(\mu - 1).$$
References
Bäck T 1995 Generalized convergence models for tournament- and (µ, λ)-selection Proc.
6th Int. Conf. on Genetic Algorithms (Pittsburgh, PA, July 1995) ed L J Eshelman
(San Mateo, CA: Morgan Kaufmann) pp 2–8
Bäck T and Schwefel H-P 1993 An overview of evolutionary algorithms for parameter
optimization Evolut. Comput. 1 1–23
Baker J 1985 Adaptive selection methods for genetic algorithms Proc. 1st Int. Conf. on
Genetic Algorithms (Pittsburgh, PA, July 1985) ed J J Grefenstette (Hillsdale, NJ:
Lawrence Erlbaum) pp 101–11
——1987 Reducing bias and inefficiency in the selection algorithm Proc. 2nd Int. Conf.
on Genetic Algorithms (Cambridge, MA, 1987) ed J J Grefenstette (Hillsdale, NJ:
Erlbaum) pp 14–21
——1989 Analysis of the Effects of Selection in Genetic Algorithms Doctoral Dissertation,
Department of Computer Science, Vanderbilt University
Blickle T and Thiele L 1995 A mathematical analysis of tournament selection Proc. 6th
Int. Conf. on Genetic Algorithms (Pittsburgh, PA, July 1995) ed L Eshelman (San
Mateo, CA: Morgan Kaufmann) pp 9–16
Goldberg D and Deb K 1991 A comparative analysis of selection schemes used in
genetic algorithms Foundations of Genetic Algorithms ed G Rawlins (San Mateo,
CA: Morgan Kaufmann) pp 69–93
Grefenstette J 1991 Conditions for implicit parallelism Foundations of Genetic Algorithms
ed G Rawlins (San Mateo, CA: Morgan Kaufmann) pp 252–61
Mühlenbein H and Schlierkamp-Voosen D 1993 Predictive models for the breeder genetic
algorithm Evolut. Comput. 1 25–49
Schwefel H-P 1977 Numerische Optimierung von Computer-Modellen mittels der
Evolutionsstrategie (Interdisciplinary System Research 26) (Basel: Birkhäuser)
——1987 Collective phenomena in evolutionary systems Preprints of the 31st Ann.
Meeting International Society for General Systems Research (Budapest) vol 2,
pp 1025–33
Shapiro S C (ed) 1990 Encyclopedia of Artificial Intelligence vol 1 (New York: Wiley)
Whitley D 1989 The GENITOR algorithm and selective pressure: why rank-based
allocation of reproductive trials is best Proc. 3rd Int. Conf. on Genetic Algorithms
(Fairfax, VA, June 1989) ed J Schaffer (San Mateo, CA: Morgan Kaufmann) pp 116–
21
Whitley D and Kauth J 1988 GENITOR: a different genetic algorithm Proc. Rocky
Mountain Conf. on Artificial Intelligence (Denver, CO) pp 118–30
26
Boltzmann selection
Samir W Mahfoud
26.1 Introduction
Boltzmann selection mechanisms thermodynamically control the selection
pressure in an evolutionary algorithm (EA), using principles from simulated
annealing (SA) (Kirkpatrick et al 1983). Boltzmann selection mechanisms can
be used to indefinitely prolong an EA’s search, in order to locate better final
solutions.
In EAs that employ Boltzmann selection mechanisms, it is often impossible
to separate the selection mechanism from the rest of the EA. In fact, the
mechanics of the recombination and neighborhood operators are critical to the
generation of the proper temporal population distributions. Therefore, most
of the following discusses Boltzmann EAs rather than Boltzmann selection
mechanisms in isolation.
Boltzmann EAs represent parallel extensions of the inherently serial SA. In
addition, theoretical proofs of asymptotic, global convergence for SA carry over
to certain Boltzmann selection EAs (Mahfoud and Goldberg 1995).
The heart of Boltzmann selection mechanisms is the Boltzmann trial, a
competition between current solution i and alternative solution j , in which
i wins with logistic probability
$$\frac{1}{1 + \mathrm{e}^{(f_i - f_j)/T}} \qquad (26.1)$$
where T is temperature and fi is the energy, cost, or objective function value
(assuming minimization) of solution i. Slight variations of the Boltzmann trial
exist, but all variations essentially accomplish the same thing when iterated (the
winner of a trial becomes solution i for the next trial): at fixed T , given a
sufficient number of Boltzmann trials, a Boltzmann distribution arises among
the winning solutions (over time). The intent of the Boltzmann trial is that at
high T , i and j win with nearly equal probabilities, making the system fluctuate
wildly from solution to solution; at low T , the better of the two solutions nearly
always wins, resulting in a relatively stable system.
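A Boltzmann trial is a one-line computation; the Python sketch below (ours, assuming minimization as in the text) makes the temperature dependence explicit:

import math
import random

def boltzmann_trial(f_i, f_j, temperature):
    # Solution i wins with the logistic probability of equation (26.1).
    p_win = 1.0 / (1.0 + math.exp((f_i - f_j) / temperature))
    return random.random() < p_win

# High T: nearly a coin flip.  Low T: the lower-cost solution nearly always wins.
print(boltzmann_trial(1.0, 2.0, temperature=100.0))
print(boltzmann_trial(1.0, 2.0, temperature=0.01))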
Figure 26.1. The population, after application of crossover and mutation (step 1),
transitions from superstring i to superstring j . After a Boltzmann trial (step 2), either
i or j becomes the current population. Individual population elements are represented
as rectangles within the superstrings. Blocks A, B, C, and D represent portions of
individual population elements, prior to crossover and mutation. Crossover points are
shown as dashed lines. Blocks A′, B′, C′, and D′ result from applying mutation to A, B,
C, and D.
operator meets certain conditions. According to Aarts and Korst (1989), two
conditions on the neighborhood generation mechanism are sufficient to guarantee
asymptotic global convergence. The first condition is that the neighborhood
operator must be able to move from any state to a globally optimal state in a finite
number of transitions. The presence of mutation satisfies this requirement. The
second condition is symmetry. It requires that the probability at any temperature
of generating state y from state x is the same as the probability of generating
state x from state y. Symmetry holds for common crossover operators such as
single-point, multipoint, and uniform crossover (Mahfoud and Goldberg 1995).
References
Aarts E and Korst J 1989 Simulated Annealing and Boltzmann Machines: a Stochastic
Approach to Combinatorial Optimization and Neural Computing (Chichester:
Wiley)
Azencott R (ed) 1992 Simulated Annealing: Parallelization Techniques (New York:
Wiley)
de la Maza M and Tidor B 1993 An analysis of selection procedures with particular
attention paid to proportional and Boltzmann selection Proc. 5th Int. Conf. on
Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann) pp 124–31
Goldberg D E 1990 A note on Boltzmann tournament selection for genetic algorithms
and population-oriented simulated annealing Complex Syst. 4 445–60
Ingber L and Rosen B 1992 Genetic algorithms and very fast simulated re-annealing: a
comparison Math. Comput. Modelling 16 87–100
Kirkpatrick S, Gelatt C D Jr and Vecchi M P 1983 Optimization by simulated annealing
Science 220 671–80
Mahfoud S W 1993 Finite Markov chain models of an alternative selection strategy for
the genetic algorithm Complex Syst. 7 155–70
——1995 Niching Methods for Genetic Algorithms Doctoral Dissertation and IlliGAL
Report 95001, University of Illinois at Urbana-Champaign, Illinois Genetic
Algorithms Laboratory; Dissertation Abstracts Int. 56(9) p 49878 (University
Microfilms 9543663)
Mahfoud S W and Goldberg D E 1992 A genetic algorithm for parallel simulated
annealing Proc. 2nd Int. Conf. on Parallel Problem Solving from Nature (Brussels,
1992) ed R Männer and B Manderick (Amsterdam: Elsevier) pp 301–10
——1995 Parallel recombinative simulated annealing: a genetic algorithm Parallel
Comput. 21 1–28
Romeo F and Sangiovanni-Vincentelli A 1991 A theoretical framework for simulated
annealing Algorithmica 6 302–45
27
Other selection methods
David B Fogel
27.1 Introduction
In addition to the methods of selection presented in other sections of this
chapter, other procedures for selecting parents of successive generations are of
interest. These include the tournament selection typically used in evolutionary
programming (Fogel 1995, p 137), soft brood selection offered within research
in genetic programming (Altenberg 1994a, b), disruptive selection (Kuo and
Hwang 1993), Boltzmann selection (de la Maza and Tidor 1993), nonlinear
ranking selection (Michalewicz 1996), competitive selection (Hillis 1992,
Angeline and Pollack 1993, Sebald and Schlenzig 1994), and the use of lifespan
(Bäck 1996).
Michalewicz (1996) describes a nonlinear ranking method in which the probability of
selecting the individual of rank i (with i = 1 the best) is P(i) = q(1 − q)^(i−1),
where q ∈ (0, 1) and does not depend on popsize; larger values of q imply
stronger selective pressure. Bäck (1994) notes that this nonlinear ranking method
fails to sum to unity and can be made practically identical to tournament selection
under a suitable choice of q.
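A short sketch (assuming the commonly cited geometric form P(i) = q(1 − q)^(i−1) for rank i, with i = 1 the best) makes this observation concrete:

    def nonlinear_rank_probs(popsize, q):
        # geometric assignment q(1-q)**(i-1) for ranks i = 1..popsize
        return [q * (1.0 - q) ** (i - 1) for i in range(1, popsize + 1)]

    probs = nonlinear_rank_probs(100, 0.05)
    print(sum(probs))                      # about 0.9941: does not sum to unity
    norm = [p / sum(probs) for p in probs] # renormalize before sampling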
References
Altenberg L 1994a Emergent phenomena in genetic programming Proc. 3rd Ann. Conf.
on Evolutionary Programming (San Diego, CA, February 1994) ed A V Sebald and
L J Fogel (Singapore: World Scientific) pp 233–41
——1994b The evolution of evolvability in genetic programming Advances in Genetic
Programming ed K Kinnear (Cambridge, MA: MIT Press) pp 47–74
Axelrod R 1987 The evolution of strategies in the iterated prisoner’s dilemma Genetic
Algorithms and Simulated Annealing ed L Davis (Los Altos, CA: Morgan
Kaufmann) pp 32–41
Angeline P J and Pollack J B 1993 Competitive environments evolve better solutions for
complex tasks Proc. 5th Int. Conf. on Genetic Algorithms (Urbana-Champaign, IL,
July 1993) ed S Forrest (San Mateo, CA: Morgan Kaufmann) pp 264–70
Bäck T 1994 Selective pressure in evolutionary algorithms: a characterization of selection
mechanisms Proc. 1st IEEE Conf. on Evolutionary Computation (Orlando, FL, June
1994) (Piscataway, NJ: IEEE) pp 57–62
——1996 Evolutionary Algorithms in Theory and Practice (New York: Oxford
University Press)
de la Maza M and Tidor B 1993 An analysis of selection procedures with particular
attention paid to proportional and Boltzmann selection Proc. 5th Int. Conf. on
Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann) pp 124–31
Fogel D B 1995 Evolutionary Computation: Toward a New Philosophy of Machine
Intelligence (New York: IEEE)
Fogel L J and Burgin G H 1969 Competitive Goal-Seeking Through Evolutionary
Programming Final Report, Contract No AF 19(628)-5927, Air Force Cambridge
Research Laboratories
Hillis W D 1992 Co-evolving parasites improves simulated evolution as an optimization
procedure Artificial Life II ed C Langton, C Taylor, J Farmer and S Rasmussen
(Reading, MA: Addison-Wesley) pp 313–24
Kuo T and Hwang S-Y 1993 A genetic algorithm with disruptive selection Proc. 5th Int.
Conf. on Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San
Mateo, CA: Morgan Kaufmann) pp 65–9
Michalewicz Z 1996 Genetic Algorithms + Data Structures = Evolution Programs 3rd
edn (Berlin: Springer)
Schwefel H-P and Rudolph G 1995 Contemporary evolution strategies Advances in
Artificial Life (Proc. 3rd Int. Conf. on Artificial Life, Granada, Spain) (Lecture Notes
in Artificial Intelligence 929) ed F Morán et al (Berlin: Springer) pp 893–907
Sebald A V and Schlenzig J 1994 Minimax design of neural net controllers for highly
uncertain plants IEEE Trans. Neural Networks NN-5 73–82
28
Generation gap methods
28.1 Introduction
(identified empirically for the test suite used) meant that approximately 40% of
the offspring were clones of their parents, even for G = 1. A later empirical
study by Grefenstette (1986) confirmed the earlier results that a larger generation
gap value improved performance.
However, early experience with classifier systems (e.g. Holland and Reitman
1978) yielded quite the opposite behavior. In classifier systems only a subset of
the population is replaced each time step. Replacing a small number of classifiers
was generally more beneficial than replacing a large number or possibly all of
them. Here the poor performance observed as the generation gap value increased
was attributed to the fact that the population as a whole represented a single
solution and thus could not tolerate large changes in its content.
In recent years computing equipment with increased capacity has become easily
available, and this effectively removes the reason for preferring the Rd approach.
The desire to solve more complex problems using genetic algorithms has
prompted researchers to develop an alternative to the generational system called
the ‘steady state’ approach, in which typically parents and offspring do coexist
(see e.g. Syswerda 1989, Whitley and Kauth 1988).
Figure 28.1. The mean and variance of the growth curves of the best in an overlapping
system (population size, 50; G = 1/50). (Axes: best ratio against number of individuals
generated, 0–5000.)
Figure 28.2. The mean and variance of the growth curves of the best in a nonoverlapping
system (population size, 50; G = 1). (Axes: best ratio against number of individuals
generated, 0–5000.)
only about 80% of the time and the growth curves exhibit much higher variance
when compared to the nonoverlapping population (figure 28.2).
This high variance for small generation gap values causes more genetic
drift (allele loss). Hence, with smaller population sizes, the higher variance in
a steady state system makes it easier for alleles to disappear. Increasing the
population size is one way to reduce the variance (see figure 28.3) and thus
offset the allele loss. In summary, the main difference between the generational
and steady state systems is higher genetic drift in the latter especially when
small population sizes are used with low generation gap values. (See the article
by De Jong and Sarma (1993) for more details.)
Figure 28.3. The mean and variance of the growth curves of the best in an overlapping
system (population size, 200; G = 1/200). (Axes: best ratio against number of individuals
generated, 0–20000.)
References
Bäck T, Hoffmeister F and Schwefel H-P 1991 A survey of evolution strategies Proc.
4th Int. Conf. on Genetic Algorithms (San Diego, CA, July 1991) ed R K Belew and
L B Booker (San Mateo, CA: Morgan Kaufmann) pp 2–9
De Jong K A 1975 An Analysis of the Behavior of a Class of Genetic Adaptive Systems
PhD Dissertation, University of Michigan
——1993 Genetic algorithms are NOT function optimizers Foundations of Genetic
Algorithms 2 ed L D Whitley (San Mateo, CA: Morgan Kaufmann) pp 5–17
De Jong K A and Sarma J 1993 Generation gaps revisited Foundations of Genetic
Algorithms 2 ed L D Whitley (San Mateo, CA: Morgan Kaufmann) pp 19–28
Fogel G B and Fogel D B 1995 Continuous evolutionary programming: analysis and
experiments Cybernet. Syst. 26 79–90
Fogel L J, Owens A J and Walsh M J 1966 Artificial Intelligence through Simulated
Evolution (New York: Wiley)
Goldberg D E and Deb K 1991 A comparative analysis of selection schemes used in
genetic algorithms Foundations of Genetic Algorithms 1 ed G J E Rawlins (San
Mateo, CA: Morgan Kaufmann) pp 69–93
Grefenstette J J 1986 Optimization of control parameters for genetic algorithms IEEE
Trans. Syst. Man Cybernet. SMC-16 122–8
Holland J H 1975 Adaptation in Natural and Artificial Systems (Ann Arbor, MI:
University of Michigan Press)
Holland J H and Reitman J S 1978 Cognitive systems based on adaptive algorithms
Pattern-Directed Inference Systems ed D A Waterman and F Hayes-Roth (New
York: Academic)
Peck C C and Dhawan A P 1995 Genetic algorithms as global random search methods:
an alternative perspective Evolutionary Comput. 3 39–80
Rechenberg I 1973 Evolutionsstrategie: Optimierung technischer Systeme nach
Prinzipien der biologischen Evolution (Stuttgart: Frommann-Holzboog)
Schwefel H-P 1981 Numerical Optimization of Computer Models (Chichester: Wiley)
Syswerda G 1989 Uniform crossover in genetic algorithms Proc. 3rd Int. Conf. on Genetic
Algorithms (Fairfax, VA, June 1989) ed J D Schaffer (San Mateo, CA: Morgan
Kaufmann) pp 2–9
——1991 A study of reproduction in generational and steady-state genetic algorithms
Foundations of Genetic Algorithms 1 ed G J E Rawlins (San Mateo, CA: Morgan
Kaufmann) pp 94–101
Whitley D 1989 The GENITOR algorithm and selection pressure: why rank-based
allocation of reproductive trials is best Proc. 3rd Int. Conf. on Genetic Algorithms
(Fairfax, VA, June 1989) ed J D Schaffer (San Mateo, CA: Morgan Kaufmann)
pp 116–21
Whitley D and Kauth J 1988 GENITOR: a Different Genetic Algorithm Colorado State
University Technical Report CS-88-101
29
A comparison of selection mechanisms
Peter J B Hancock
29.1 Introduction
Selection provides the driving force behind an evolutionary algorithm. Without
it, the search would be no better than random. This section explores the pros
and cons of a variety of different methods of performing selection. Selection
methods differ in two main ways: the way they aim to distribute reproductive
opportunities across members of the population, and the accuracy with which
they achieve their aim. The accuracy may differ because of sampling noise
inherent in some selection algorithms. There are also other differences that may
be significant, such as time complexity and suitability for parallel processing.
Crucially for some applications, they also differ in their ability to deal with
evaluation noise.
There have been a number of comparisons of different selection methods
by a mixture of analysis and simulation, usually on deliberately simplified
tasks. Goldberg and Deb (1991) considered a system with just two fitness
levels, and studied the time taken for the fitter individuals to take over the
population under the action of selection only, verifying their analysis with
simulations. Hancock (1994) extended the simulations to a wider range of
selection algorithms, and added mutation as a source of variation, to compare
effective growth rates. The effects of adding noise to the evaluation function
were also considered. Syswerda (1991) compared generational and incremental
models on a ten-level takeover problem. Thierens and Goldberg (1994) derived
analytical results for rates of growth for a bit counting problem, where the
approximately normal distribution of fitness values allowed them to include
recombination in their analysis. Bäck (1994) compared takeover times for all
the major selection methods analytically and reported an experiment on a 30-
dimensional sphere problem. Bäck (1995) compared tournament and (µ, λ)
selection more closely. Blickle and Thiele (1995a, b) undertook a detailed
analytical comparison of a number of selection methods (note that the second
paper corrects an error in the first). Other studies include those of Bäck and
Hoffmeister (1991), de la Maza and Tidor (1993) and Pál (1994).
A useful summary statistic for comparing selection methods is the selection intensity:
the standardized difference between the mean fitness of the selected parents and the
mean fitness of the population,

    I = (f̄_sel − f̄)/σ

where σ is the standard deviation of the population fitness.
This captures the notion that it is harder to produce a given step in average
fitness between the population and those selected when the fitness variance is
low. However, both takeover time and selection intensity depend on the fitness
functions, and so theoretical results may not always transfer to a real problem.
There is an additional difficulty because the fitness variance itself depends on
the selection method, so different methods configured to have the same selection
intensity may actually grow at different rates.
Most of the selection schemes have a parameter that controls either the
proportion of the population that reproduces or the distribution of reproductive
opportunities, or both. One aim in what follows will be to identify some
equivalent parameter settings for different selection methods.
29.2 Simulations
A number of graphs from simulations similar to those reported by
Hancock (1994) are shown here, along with some analytical and experimental
results from elsewhere. The takeover simulation initializes a population of 100
randomly, with rectangular distribution, in the range 0–1, with the exception
that one individual is set to 1. The rate of takeover of individuals with the value
1 under the action of selection alone is plotted. Results reported are averaged
over 100 different runs. The simulation is thus similar to that used by Goldberg
and Deb (1991), but the greater range of fitness values allows investigation of
the diversity maintained by the different selection methods. Since some of them
produce exponential takeover in such conditions, a second set of simulations
makes the problem slightly more realistic by adding mutation as a source of
variation to be exploited by the selection procedures. This growth simulation
initializes the population in the range 0–0.1. During reproduction, mutation with
a Gaussian distribution, mean 0, standard deviation 0.02, is added to produce the
offspring, subject to remaining in the range 0–1. Some plots show the value of
the best member of the population after various numbers of evaluations, again
averaged over 100 different runs. Other plots show the growth of the worst
value in the population, which gives an indication of the diversity maintained in
the population. Some selection methods are better at preserving such diversity:
other things being equal, this seems likely to improve the quality of the overall
search (Mühlenbein and Schlierkamp-Voosen 1995, Blickle and Thiele 1995b).
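The growth simulation described above is straightforward to reproduce; the sketch below follows the stated setup (population 100, initialization in 0–0.1, Gaussian mutation with standard deviation 0.02, offspring clipped to 0–1) and uses binary tournament selection as an illustrative choice, not the original procedure:

    import random

    POP, SIGMA = 100, 0.02

    def growth_run(evaluations=5000):
        # initialize in the range 0-0.1, as in the growth simulation
        pop = [random.uniform(0.0, 0.1) for _ in range(POP)]
        history = []
        for _ in range(evaluations // POP):
            new = []
            for _ in range(POP):
                # binary tournament: the better of two picks (with replacement)
                parent = max(random.choice(pop), random.choice(pop))
                # Gaussian mutation, sd 0.02, offspring clipped to the range 0-1
                new.append(min(1.0, max(0.0, parent + random.gauss(0.0, SIGMA))))
            pop = new
            history.append((max(pop), min(pop)))   # best and worst members
        return history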
It should be emphasized that fast convergence on these tasks is not
necessarily good: they are deliberately simple in an effort to illustrate some
of the differences between selection methods and the reasons underlying them.
Good selection methods need to balance exploration and exploitation. Before
reporting results, we shall consider a number of more theoretical points of
similarities and differences.
Goldberg and Deb (1991) showed that simple binary tournament selection (TS)
(see Chapter 24) is equivalent to linear ranking (Section 25.2) when set to give
two offspring to the top-ranked string (βrank = 2). However, this is only in
expectation: when implemented the obvious way, picking each fresh pair of
potential parents from the population with replacement, tournament selection
suffers from sampling errors like those produced by roulette wheel sampling,
precisely because each tournament is performed separately. A way to reduce
this noise is to take a copy of the population and choose pairs for tournament
from it without replacement. When the copy population is exhausted, another
copy is made to select the second half of the new population (Goldberg et al
1989). This method ensures that each individual participates in exactly two
tournaments, and will not fight itself. It does not eliminate the problem, since,
for example, an average individual that ought to win once may pick better or
worse opponents both times, but it will at least stop several copies of any one
individual being chosen.
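A sketch of this noise-reduced variant (assuming an even population size and precomputed fitness values aligned with the population):

    import random

    def low_noise_binary_tournament(pop, fit):
        # two passes over a shuffled copy: each individual enters exactly two
        # tournaments and never fights itself; assumes len(pop) is even
        selected = []
        for _ in range(2):
            order = list(range(len(pop)))
            random.shuffle(order)                 # pair off without replacement
            for a, b in zip(order[::2], order[1::2]):
                selected.append(pop[a] if fit[a] >= fit[b] else pop[b])
        return selected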
The selection pressure generated by tournament selection may be decreased
by making the tournaments stochastic. The equivalence, apart from sampling
errors, with linear ranking remains. Thus TS with a probability of the better
string winning of 0.75 is equivalent to linear ranking with βrank = 1.5. The
selection pressure may be increased by holding tournaments among more than
two individuals. For three, the best will expect three offspring, while an
average member can expect 0.75 (it should win one quarter of its expected
three tournaments). The assignment is therefore nonlinear and Bäck (1994)
shows that, to a first approximation, the results are equivalent to exponential
nonlinear ranking, where the probability of selection of each rank i, starting at
i = 1 for the best, is given by

    (s − 1)s^(i−1) / (s^µ − 1)

where s is typically in the range 0.9–1 (Blickle and Thiele 1995b). (Note that the
probabilities as specified by Michalewicz (1992) do not sum to unity (Bäck 1994).)
More precisely,
they differ in that TS gives the worst members of the population no chance to
reproduce. Figure 29.1 compares the expected number of offspring for each rank
in a population of 100. The difference results in a somewhat lower population
diversity for TS when run at the same growth rate.
Goldberg and Deb (1991) prefer TS to linear ranking on account of its
lower time complexity (since ranking requires a sort of the population), and
Bäck (1994) argues similarly for TS over nonlinear ranking. However, time
complexity is unlikely to be an issue in serious applications, where the evaluation
time usually dominates all other parts of the algorithm. The difference is in any
case reduced if the noise-reduced version of TS is implemented, since this
also requires shuffling the population. For global population models, therefore,
ranking, with Baker’s sampling procedure (Baker 1987), is usually preferable.
TS may be appropriate in incremental models, where only one individual is
to be evaluated at a time, and in parallel population models. It may also be
appropriate in, for instance, game playing applications, where the evaluation
itself consists of individuals playing each other.
Figure 29.1. Expected number of offspring against rank for tournament selection with
tournament size 3 and exponential rank selection with s = 0.972.
Freisleben and Härtfelder (1993) compared a number of selection schemes
using a meta-level GA that adjusted the parameters of the GA used to tackle
their problem. Tournament selection was chosen in preference to rank selection,
which at first sight seems odd, since the only difference is added noise. A
possible explanation lies in the nature of their task, which was learning the
weights for a neural net simulation. This is plagued with symmetry problems
(e.g. Hancock 1992). The GA has to break the symmetries and decide on
just one to make progress. It seems possible that the inaccuracies inherent in
tournament selection facilitated this symmetry breaking, with one individual
having an undue advantage, and thereby taking over the population. Noise is
not always undesirable, though there may be more controlled ways to achieve
the same result.
error. Incremental models also suffer in the presence of evaluation noise (see
Section 29.6).
Figure 29.2. The growth rate in the presence of mutation of the best and worst in the
population for the incremental model with random deletion and the generational model,
both with linear rank selection for reproduction, βrank = 1.2.
1988). Although intuitively appealing, this has the effect of reducing selection
pressure as the population converges and can produce growth curves remarkably
similar to unscaled fitness proportional selection (FPS; Hancock 1994).
29.5.2 Ranking
Goldberg and Deb (1991) show that the expected growth rate for linear ranking
is proportional to the value of βrank , the number of offspring given to the best
individual. For exponential scaling, the selection pressure is proportional to
1 − s. This makes available a wide range of selection pressures, defined by the
value of s, illustrated in figure 29.4. The highest takeover rate available with
linear ranking (βrank = 2) is also shown. Exponential ranking can go faster with
smaller values of s (see table 29.1). Note the logarithmic x-axis on this plot.
With exponential ranking, because of the exponential assignment curve, poor
individuals do rather better than with linear ranking, at the expense of those more
in the middle of the range. One result of this is that, for parameter settings that
give similar takeover times, exponential ranking loses the worse values in the
population more slowly, which may help preserve diversity in practice.
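A short sketch comparing the two assignment curves of figure 29.1; the exponential rank probability is the formula quoted earlier, while the linear-ranking expected-offspring expression is the standard form (an assumption here, not quoted from this section):

    def linear_rank_offspring(i, mu, beta):
        # expected offspring of rank i (i = 1 best): beta down to 2 - beta
        return beta - 2.0 * (beta - 1.0) * (i - 1) / (mu - 1)

    def exp_rank_prob(i, mu, s):
        # exponential ranking probability: (s - 1) * s**(i-1) / (s**mu - 1)
        return (s - 1.0) * s ** (i - 1) / (s ** mu - 1.0)

    mu = 100
    for i in (1, 25, 50, 75, 100):
        print(i, linear_rank_offspring(i, mu, 2.0), mu * exp_rank_prob(i, mu, 0.972))
    # under s = 0.972 the best rank receives about three expected offspring,
    # matching a tournament of size 3, while the worst ranks keep a small chance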
Figure 29.3. (a) The takeover rate for FPS, with windowing, sigma, and linear scaling.
(b) Growth rates in the presence of mutation.
Figure 29.4. The takeover rate for exponential rank selection for a number of values of
s, together with that for linear ranking, βrank = 2.
table 29.1). One simulation result is shown in figure 29.5 to make clear
the selection pressure achievable by (µ, λ) selection, and to indicate its potential
susceptibility to evaluation noise, discussed further below.
Figure 29.5. The growth rate in the presence of mutation for ES (µ, λ) selection with
and without evaluation noise, for λ = 100 and µ = 1, 10, and 25.
Figure 29.6. The takeover rates for the generational model and the kill-oldest incremental
model, both using linear ranking for selection.
Figure 29.7. Growth rates in the presence of mutation for incremental kill-by-inverse-
rank (kr) and generational linear ranking (rl) for various values of βrank .
growth rate changes more rapidly than βrank . This is because an increase in βrank
has two effects: increasing the probability of picking one of the better members
of the population at each step, and increasing the number of steps for which
they are likely to remain in the population, by decreasing their probability of
deletion. Figure 29.7 compares growth rates in the presence of mutation for
kill-by-rank incremental and equivalent generational models. It may be seen
that the generational model with βrank = 1.4 and the incremental model with
βrank = 1.2 produce very similar results. Another matched pair at lower growth
rates is generational with βrank = 1.2 and incremental with βrank = 1.13 (not
shown).
One of the arguments in favor of incremental models is that they allow
good new individuals to be exploited at once, rather than having to wait a
generation. It might be thought that any such gain would be rather slight, since
although a good new member could be picked at once, it is more likely to
have to wait several iterations at normal selection pressures. There is also the
inevitable sampling noise to be overcome. De Jong and Sarma (1993) claim
that there is actually no net benefit, since adding new fit members has the
effect of increasing the average fitness, thus reducing the likelihood of them
being selected. However, this argument applies only to takeover problems:
when reproduction operators are included the incremental approach can generate
higher growth rates. Figure 29.8 compares the growth of an incremental kill-
oldest model with a generational model using the same selection scheme. The
graph also shows one of the main drawbacks of the incremental models: their
sensitivity to evaluation noise, to be discussed in the following section.
Figure 29.8. Growth in the presence of mutation, with and without evaluation noise, for
the generational model with linear ranking and incremental models with kill-worst and
kill-oldest, all using βrank = 1.2 for selection.
Figure 29.9. Growth in the presence of mutation, with and without evaluation noise, for
the generational model with linear ranking, βrank = 1.8, and sigma-scaled FPS, s = 4.
29.8 Conclusions
The choice of a selection mechanism cannot be made independently of
other aspects of the evolutionary algorithm. For instance, Eshelman (1991)
deliberately combines a conservative selection mechanism with an explorative
recombination operator in his CHC algorithm. Where search is largely driven by
mutation, it may be possible to use much higher selection pressures, typical of
the ES approach. If the evaluation function is noisy, then most incremental
models and others that may retain parents are likely to suffer. Certainly,
selection pressures need to be lower in the presence of noise, and, of the
incremental models, kill-oldest fares best. Without noise, incremental methods
can provide a useful increase in exploitation of good new individuals. Care
is needed in the choice of method of deletion: killing the worst provides high
growth rates with little means of control. Killing by inverse rank or killing
the oldest offers more control. Amongst generational models, the ES (µ, λ)
and exponential rank selection methods give the biggest and most controllable
range of selection pressures, with the ES method probably most suited to
mutation-driven, high-growth-rate systems, and ranking better for slower, more
explorative searches, where maintenance of diversity is important.
References
Bäck T 1994 Selective pressure in evolutionary algorithms: a characterization of selection
mechanisms Proc. 1st IEEE Conf. on Evolutionary Computation (Orlando, FL, June
1994) (Piscataway, NJ: IEEE) pp 57–62
——1995 Generalized convergence models for tournament and (µ, λ)-selection Proc. 6th
Int. Conf. on Genetic Algorithms (Pittsburgh, PA, July 1995) ed L J Eshelman (San
Mateo, CA: Morgan Kaufmann) pp 2–8
30
Interactive evolution
Wolfgang Banzhaf
30.1 Introduction
The basic idea of interactive evolution (IE) is to involve a human user on-line
in the variation–selection loop of the evolutionary algorithm (EA). This is to be
seen in contrast to the conventional participation of the user prior to running
the EA by defining a suitable representation of the problem (Chapters 14–21),
the fitness criterion for evaluation of individual solutions, and corresponding
operators (Chapters 31–34) to improve fitness quality. In the latter case, the
user’s role is restricted to passive observation during the EA run.
The minimum requirement for IE is the definition of a problem
representation, together with a choice of population parameters.
Search operators of arbitrary kind as well as selection according to arbitrary
criteria might be applied to the representation by the user. The process is much
more comparable to the creation of a piece of art, for example, a painting, than
to the automatic evolution of an optimized problem solution. In IE, the user
assumes an active role in the search process. At the minimum level, the IE
system must hold the present solutions together with the variants currently being
generated or considered.
Usually, however, automatic means of variation (i.e. evolutionary search
operators using random events) are provided with an IE system. In the present
context we shall require the existence of automatic means of variation by
operators for mutation (Chapter 32) and recombination (Chapter 33) of solutions
which are to be defined prior to running the EA.
30.2 History
Dawkins (1986) was the first to consider an elaborate IE system. The evolution
of biomorphs, as he called them, by IE in a system that he had originally intended
to be useful for the design of treelike graphical forms has served as a prototype
for many systems developed subsequently. Starting with the contributions of
Sims (1991) and the book of Todd and Latham (1992), computer art developed
into the present major application area of IE.
Input: µ, λ, θι, θm, θr, θs
Output: a*, the individual last selected during the run, or
P*, the population last selected during the run.

1  t ← 0;
2  P(t) ← initialize(µ);
3  while (ι(P(t), θι) ≠ true) do
4      Input: θr, θm
5      P′(t) ← recombine(P(t), θr);
6      P″(t) ← mutate(P′(t), θm);
7      Output: P″(t)
8      Input: θs
9      P(t + 1) ← select(P″(t), µ, θs);
10     t ← t + 1;
   od
30.5 Difficulties
The second, more complicated version of IE requires a predefined fitness
criterion, in addition to user action. This trades one advantage of IE systems for
another: the absence of any requirement to quantify fitness for a small number
of variants to be evaluated interactively by the user.
Interactive systems have one serious difficulty, especially in connection
with the automatic means of variation that are usually provided: whereas the
generation of variants does not necessarily require human intervention, the
selection of variants does demand the attention of the user. Owing to psychological
constraints, humans can normally select only from a small set of choices. IE
systems are thus constrained to present only of the order of ten choices at each
point in time. Likewise, only a limited number of generations can practically be
inspected in sequence before the user tires.
This limitation need not mean that the generation of variants has to be
restricted to small numbers. Rather, the variants must at least be ordered
appropriately, so that a subset that can be handled interactively is presented
to the user.
Figure 30.1. Samples of evolved objects: (a) dynamical system, cell structure (Sims
1992, © MIT Press); (b) artwork by Mutator (Todd and Latham 1992, with permission
of the authors); (c) hybrid car model (Graf and Banzhaf 1995b, © IEEE Press).
References
Caldwell C and Johnston V S 1991 Tracking a criminal suspect through ‘face-space’ with
a genetic algorithm Proc. 4th Int. Conf. on Genetic Algorithms (San Diego, CA, July
1991) ed R K Belew and L B Booker (San Mateo, CA: Morgan Kaufmann) pp 416–
21
Further reading
This section is intended to give an overview of presently available work in IE
and modeling methods which might be interesting to use.
1. Prusinkiewicz P and Lindenmayer A 1991 The Algorithmic Beauty of Plants (Berlin:
Springer)
An informative introduction to L-systems and their use in computer graphics.
4. Baker E 1993 Evolving line drawings Proc. Int. Conf. on Genetic Algorithms
(Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo, CA: Morgan
Kaufmann) p 627
This contribution discusses new ideas on design using simple style elements for IE.
31
Introduction to search operators
Zbigniew Michalewicz
32
Mutation operators
where u ∼ U([0, 1)) denotes a uniform random variable sampled anew for each
i ∈ {1, . . . , ℓ}.
From a computational point of view, the straightforward implementation
of equation (32.1) as a loop calling the random number generator for each
position i is extremely inefficient. Since the random variable T describing the
distances between two positions to be mutated has a geometrical distribution with
P{T = t} = p_m(1 − p_m)^(t−1) and expectation E[T] = 1/p_m, and a geometrical
random number can be generated according to

    t = 1 + ⌊ln(1 − u)/ln(1 − p_m)⌋                    (32.2)
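Equation (32.2) leads directly to an implementation that touches only the positions actually mutated (a sketch, with the bit string held as a Python list):

    import math
    import random

    def mutate_bits(bits, pm):
        # flip each bit with probability pm by jumping geometrically between
        # mutated positions instead of testing every position, equation (32.2)
        i = -1
        while True:
            u = random.random()
            i += 1 + int(math.log(1.0 - u) / math.log(1.0 - pm))
            if i >= len(bits):
                return bits
            bits[i] = 1 - bits[i]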
Mutation generally refers to the creation of a new solution from one and only
one parent (otherwise the creation is referred to as recombination; see Chapter
33). Given a real-valued representation where each element in a population is an
n-dimensional vector x ∈ Rn , there are many methods for creating new elements
(offspring) using mutation. These methods have a long history, extending back
at least to Bremermann (1962), Bremermann et al (1965), and others. A variety
of methods will be considered here.
The general form of mutation can be written as
    x′ = m(x)                                          (32.3)

where x is the parent vector, m is the mutation function, and x′ is the resulting
offspring vector. Although there have been some attempts to include mutation
operators that do not operate on the specific values of the parents but instead
simply choose x from a fixed probability density function (PDF) (Montana and
Davis 1989), such methods lose the inheritance from parent to offspring that
can facilitate evolutionary optimization on a variety of response surfaces. The
more common form of mutation generates an offspring vector:
    x′ = x + M                                         (32.4)

where the mutation M is a random variable. M is often zero mean such that
E(x′) = x; the expected difference between a parent and its offspring is zero.
M can take different forms. For example, M could be the uniform random
variable U(a, b)^n, where a and b are the lower and upper limits respectively. In
this case, a is often set equal to −b. The result of applying this operator as M in
equation (32.4) yields an offspring within a hyperbox x + U(−b, b)^n. Although
such a mutation is unbiased with respect to the position of the offspring within
the hyperbox, the method suffers from easy entrapment when the parent vector
x resides in a locally optimal well that is wider than the available step size.
Davis (1989, 1991b) offered a similar operator (known as creep) that has a
fixed probability of altering each component of x up or down by a bounded
small random amount. The only method for alleviating entrapment in such cases
relies on probabilistic selection, that is, maintaining a probability for choosing
lesser-valued solutions to become parents of the subsequent generations (see
Chapter 27). In contrast, unbounded mutation operators do not require such
selection methods to guarantee asymptotic global convergence (Fogel 1994,
Rudolph 1994).
The primary unbounded mutation PDF for real-valued vectors has been the
Gaussian (or ‘normal’) (Rechenberg 1973, Schwefel 1981, Fogel et al 1990,
Fogel and Atmar 1990, Bäck and Schwefel 1993, Fogel and Stayton 1994, and others).
When µ = 0, the parameter σ offers the single control on the scaling of the
PDF. It effectively generates a typical step size for a mutation. The use of zero-
mean Gaussian mutations generates offspring that are (i) on average no different
from their parents and (ii) increasingly less likely to be increasingly different
from their parents. Saltations are not completely avoided, so any local optimum
can in principle be escaped in a single iteration, yet they are not so common
as to lose all inheritance from parent to offspring.
Other density functions with similar characteristics have also been
implemented. Yao and Liu (1996) proposed using Cauchy distributions to aid
in escaping from local minima (the Cauchy distribution has a fatter tail than the
Gaussian) and demonstrated that Cauchy mutations may offer some advantages
across a wide testbed of problems. Montana and Davis (1989) examined the
use of Laplace-distributed mutations but there is no evidence that the Laplace
distribution is particularly better suited than Gaussian or Cauchy mutations for
typical real-valued optimization problems.
In the simplest version of evolution strategies or evolutionary programming,
described as a (1 + 1) evolutionary algorithm, a single parent x creates a single
offspring x′ by imposing a multivariate Gaussian perturbation with mean zero
and standard deviation σ on the parent, then selects the better of the two trial
solutions as the parent for the next iteration. The same standard deviation is
applied to each component of the vector x during mutation. For some problems,
the variation of σ (i.e. the step size control parameter in each dimension) can
be computed to yield an optimal rate of convergence.
Let the convergence rate be defined as the ratio of the Euclidean distance
covered toward the optimum solution to the number of trials required to achieve
the improvement. Rechenberg (1973) calculated the convergence rates for two
functions:
    σ = (π^(1/2)/2)(b/n)

on the corridor model (b being the width of the corridor) and

    σ = 1.224x/n

on the sphere model. That is, only a single step size control is needed for
optimum convergence. Given these optimum standard deviations for mutation,
the optimum probabilities of generating a successful mutation can be calculated
as
    p1,opt = (2e)^(−1) ≈ 0.184
    p2,opt ≈ 0.270.
Noting the similarity of these two values, Rechenberg (1973) proposed the
following rule:
The ratio of successful mutations to all mutations should be 1/5. If this ratio is
greater than 1/5, increase the variance; if it is less, decrease the variance.
Schwefel (1981) suggested measuring the success probability on-line over 10n
trials (where there are n dimensions) and adjusting σ at iteration t by
    σ(t) = σ(t − n)·δ    if ps < 0.2
    σ(t) = σ(t − n)/δ    if ps > 0.2
    σ(t) = σ(t − n)      if ps = 0.2
with δ = 0.85 and ps equaling the number of successes in 10n trials divided
by 10n, which yields convergence rates of geometric order for both f1 and f2
(Bäck et al 1993; see the book by Bäck (1996) for corrections to the update
rule offered by Bäck et al (1993)).
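A minimal (1 + 1) evolution strategy with this success rule might be sketched as follows (assuming a minimization objective f; the adjustment is applied every 10n trials, as suggested by Schwefel):

    import random

    def one_plus_one_es(f, x, sigma=1.0, delta=0.85, trials=10000):
        n = len(x)
        fx, successes, window = f(x), 0, 10 * n
        for t in range(1, trials + 1):
            y = [xi + random.gauss(0.0, sigma) for xi in x]
            fy = f(y)
            if fy < fx:                   # success: offspring replaces parent
                x, fx = y, fy
                successes += 1
            if t % window == 0:           # adjust sigma every 10n trials
                ps = successes / window
                if ps < 0.2:
                    sigma *= delta        # too few successes: smaller steps
                elif ps > 0.2:
                    sigma /= delta        # too many successes: larger steps
                successes = 0
        return x, fx

    best, value = one_plus_one_es(lambda v: sum(vi * vi for vi in v), [5.0] * 10)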
The use of a single step size control parameter covering all dimensions
simultaneously is of limited robustness. The optimization performance can be
improved by using appropriate step sizes in each dimension. This is particularly
evident when consideration is given to optimizing a vector of parameters each
of different units of dimension (e.g. temperature and pressure). Determining
appropriate settings for each of n step sizes poses a significant challenge to the
human operator; as such, methods have been proposed for self-adapting the step
sizes concurrent to the evolutionary search.
The first efforts in self-adaptation date back at least to the article by Reed et al
(1967), but the two most common implementations in use currently derive from
the work of Schwefel (1981) and Fogel et al (1991). In each case, the vector of
objective variables x is accompanied by a vector of strategy parameters σ where
σi denotes the standard deviation to use when applying a zero-mean Gaussian
mutation to that component in the parent vector. The strategy parameters are
updated by slightly different methods according to Schwefel (1981) and Fogel
et al (1991).
In the method of Fogel et al (1991), for example, new solutions are created by

    x′_i = x_i + N(0, σ_i)
    σ′_i = σ_i + χ·N(0, σ_i)
where the parents’ strategy parameters are used to create the offspring’s
objective values before being mutated themselves, and the mutation of the
strategy parameters is achieved using a Gaussian distribution scaled by χ
and the standard deviation for each dimension. This procedure also requires
incorporating a rule such that if any component σ_i becomes negative it is reset
to an arbitrarily small value ε.
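Both update styles can be sketched side by side. The additive form follows the equations above; the log-normal rule for Schwefel's method is not reproduced in this section, so the commonly published form with parameters τ and τ′ is assumed here:

    import math
    import random

    def mutate_meta_ep(x, sigma, chi=0.2, eps=1e-6):
        # Fogel et al (1991) style: offspring from parent sigmas, then additive
        # sigma update, with negative values reset to a small eps
        x_child = [xi + random.gauss(0.0, si) for xi, si in zip(x, sigma)]
        s_child = [max(eps, si + chi * random.gauss(0.0, si)) for si in sigma]
        return x_child, s_child

    def mutate_lognormal(x, sigma):
        # Schwefel (1981) style: log-normal sigma update first, then offspring
        n = len(x)
        tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
        tau_p = 1.0 / math.sqrt(2.0 * n)
        g = random.gauss(0.0, 1.0)        # one global draw shared by all sigmas
        s_child = [si * math.exp(tau_p * g + tau * random.gauss(0.0, 1.0))
                   for si in sigma]
        x_child = [xi + random.gauss(0.0, si) for xi, si in zip(x, s_child)]
        return x_child, s_child

Note that the log-normal variant mutates the strategy parameters before creating the offspring objective values, the ordering that Gehlhaar and Fogel (1996) found more generally useful.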
Several comparisons have been conducted between these methods.
Saravanan and Fogel (1994) and Saravanan et al (1995) indicated that the
log-normal procedure offered by Schwefel (1981) generated generally superior
optimization performance (statistically significant) across a series of standard
test functions. Angeline (1996a), in contrast, found that the use of Gaussian
mutations on the strategy parameters generated better optimization performance
when the objective function was made noisy. Gehlhaar and Fogel (1996)
indicated that mutating the strategy parameters before creating the offspring
objective values appears to be more generally useful both in optimizing a set of
test functions and in molecular docking applications.
Both of the above methods for self-adaptation have been extended to
include possible correlation across the dimensions. That is, rather than use n
independent Gaussian random perturbations, a multivariate Gaussian mutation
with arbitrary covariance can be applied. Schwefel (1981) described a method
for incorporating rotation angles α such that correlated, rather than independent,
Gaussian perturbations are applied across the dimensions.
A related bounded-mutation scheme is Michalewicz's nonuniform mutation, in
which a new solution is created componentwise by

    x′_i(t) = x_i(t) + Δ(t, ub_i − x_i(t))   with probability 0.5
    x′_i(t) = x_i(t) − Δ(t, x_i(t) − lb_i)   otherwise

where x_i(t) is the ith parameter of the vector x at generation t, x_i ∈ [lb_i, ub_i],
the lower and upper bounds respectively, u is a random uniform U(0, 1), and
the function Δ(t, y) returns a value in the range [0, y] such that the probability
of Δ(t, y) being close to zero increases as t increases, essentially taking smaller
steps on average. Michalewicz et al (1994) used a particular parametrized form
of this function.
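That particular function is not reproduced here; a commonly cited choice, stated as an assumption, is Δ(t, y) = y(1 − u^((1−t/T)^b)), where T is the maximum generation number and b a shape parameter. A sketch:

    import random

    def nonuniform_mutate(x, t, T, lb, ub, b=5.0):
        # mutate one randomly chosen component of x at generation t of T
        def delta(y):
            u = random.random()
            return y * (1.0 - u ** ((1.0 - t / T) ** b))
        q = list(x)
        i = random.randrange(len(q))
        if random.random() < 0.5:
            q[i] += delta(ub[i] - q[i])   # step toward the upper bound
        else:
            q[i] -= delta(q[i] - lb[i])   # step toward the lower bound
        return q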
32.3 Permutations
Darrell Whitley
32.3.1 Introduction
Mutation operators can be used in a number of ways. Random mutation
hillclimbing (Forrest and Mitchell 1993) is a search algorithm which applies
a mutation operator to a single string and accepts any improving moves. Some
forms of evolutionary algorithms apply mutation operators to a population of
strings without using recombination, while other algorithms may combine the
use of mutation with recombination.
Any form of mutation which is to be applied to a permutation must yield
a string which also represents a permutation. Most mutation operators for
permutations are related to operators which have also been used in neighborhood
local search strategies. Many of these operators thus can be applied in such as
way that they reach a well-defined neighborhood of adjacent states.
The most common form of mutation is 2-opt (Lin and Kernighan 1973). Given
a sequence of elements
A B C D E F G H
the 2-opt operator selects two points along the string, then reverses the segment
between the points. Note that if the permutation is viewed as a circuit as in the
traveling salesman problem (TSP), then all shifts of a sequence of N elements
are equivalent. It follows that once two cut points have been selected in this
circular string, it does not matter which segment is reversed; the effect is the
same.
The 2-opt operator can be applied to all pairs of edges in N (N − 1)/2 steps.
This is analogous to one iteration of local search over all variables in a parameter
optimization problem. If a full iteration of 2-opt to all pairs of edges fails to
find an improving move, then a local optimum has been reached.
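One full improving pass of 2-opt can be sketched as follows (tour as a list of city indices, dist as a symmetric distance matrix; a sketch rather than the classical Lin–Kernighan implementation):

    def two_opt_pass(tour, dist):
        # try every segment reversal once, keeping improving moves;
        # returns False when a local optimum has been reached
        n, improved = len(tour), False
        for i in range(n - 1):
            for j in range(i + 1, n):
                a, b = tour[i - 1], tour[i]        # edge entering the segment
                c, d = tour[j], tour[(j + 1) % n]  # edge leaving the segment
                if a == c or b == d:
                    continue
                # reversing tour[i:j+1] replaces edges (a,b),(c,d) by (a,c),(b,d)
                if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d]:
                    tour[i:j + 1] = reversed(tour[i:j + 1])
                    improved = True
        return improved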
Figure 32.1. A tour over the vertices A–H drawn in the plane, containing crossed edges;
reversing the segment [C D E F] (or equivalently [G H A B]) removes the crossing.
2-opt is classically associated with the Euclidean TSP. Consider the graph
in figure 32.1. If this is interpreted as a Euclidean TSP, then reversing the
segment [C D E F] or the segment [G H A B] results in a graph where none of
the edges cross and which has lower cost than the graph where the edges cross.
Let {A, B, . . . , Z} be a set of vertices and (a, b) be the edge between vertices A
and B. If vertices {B, C, F, G} in figure 32.1 are connected by the set of edges
((b, c), (b, f), (b, g), (c, f), (c, g), (f, g)), then two triangles are formed when B
is connected to F and C is connected to G. To illustrate, create a new graph
by placing a new vertex X at the point where the edges (b, f) and (c, g) cross.
In the new graph in Euclidean space, the distance represented by edge (b, c)
must be less than edges (b, x) + (x, c), assuming B, C, and X are not on a line;
likewise, the distance represented by edge (f, g) must be less than edge (f, x) +
(x, g). Thus, reversing the segment [C D E F] will always reduce the cost of
the tour due to this triangle inequality. For the TSP this leads to the general
principle that multiple applications of 2-opt will always yield a tour that has no
crossed edges.
One can also look at reversing more than two segments at a time. The
3-opt operator cuts the permutation into three segments and then looks at all
possible ways of reordering these segments. There are 3! = 6 ways to order the
segments and each segment can be placed in a forward or reverse order. This
yields up to 2^3 × 6 = 48 possible new reorderings of the original permutation.
For the symmetric TSP, however, all shifted arrangements of the three segments
are equal and all reversed arrangements of the three segments are equal. Thus,
the 3! orderings are all equivalent. (By analogy, note that there is only one
possible Hamiltonian circuit tour between three cities.) This leaves only 2^3 = 8
ways of placing each of the segments in a forward or reverse direction, each
of which yields a unique tour. Thus, for the symmetric TSP, the cost to test
one 3-opt move is eight times greater than the cost of testing one 2-opt move.
For other types of scheduling problem, such as resource allocation, reversals
and shifts of the complete permutation are not necessarily equivalent and the
cost of a 3-opt move may be up to 48 times greater than that of a 2-opt move.
Also note that there are (N choose 3) ways to break a permutation up into
combinations of three segments, compared to (N choose 2) ways of breaking the
permutation into two
segments. Thus, the set of all possible 3-opt moves is much larger than the set
of possible 2-opt moves. This further increases the cost of performing one pass
of 3-opt over all possible ways of partitioning a permutation into three segments
compared to a pass of 2-opt over all pairs of possible segments.
One can also use k-opt, where the permutation is broken into k segments,
but such an operator will obviously be very costly.
suggest that a single mutation should represent a minimal change and look at
different types of mutation operator for different representations of the TSP.
For resource allocation problems, a more modest change than 2-opt is
to merely select one element and to insert it at some other position in the
permutation. Syswerda (1991) refers to a variant of this as position-based
mutation and describes it as selecting two elements and then moving the second
element before the first element. Position-based mutation appears to be less
general than the insert operator, since elements can only be moved forward in
position-based mutation.
Similarly, one can select two elements and swap the positions of the two
elements. Syswerda denotes this as order-based mutation. Note that if an
element is moved forward or backward one position, this is equivalent to a
swap of adjacent elements. One way in which swap can be used as a local
search operator is to swap all adjacent elements, or perhaps also all pairs of
elements. Finally, Syswerda also defines a scramble mutation operator that
selects a sublist of permutation elements and randomly reorders (i.e. scrambles)
the order of the subset while leaving the other elements in the permutation in
the same absolute position. Davis (1991a) also reports on a scramble sublist
mutation operator, except that the sublist is explicitly composed of contiguous
elements of a permutation. (It is unclear whether Syswerda’s scramble operator
is also meant to work on contiguous elements or not; an operator that selects
a sublist of elements over random positions of the permutation is certainly
possible.)
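Sketches of the insert, swap, and scramble operators just described (the scramble variant below works on a contiguous sublist, following Davis; each function returns a new permutation):

    import random

    def insert_mutation(p):
        # remove one element and reinsert it at another position
        q = list(p)
        elem = q.pop(random.randrange(len(q)))
        q.insert(random.randrange(len(q) + 1), elem)
        return q

    def swap_mutation(p):
        # exchange the positions of two randomly chosen elements
        q = list(p)
        i, j = random.sample(range(len(q)), 2)
        q[i], q[j] = q[j], q[i]
        return q

    def scramble_mutation(p, max_len=4):
        # randomly reorder a contiguous sublist, leaving the rest in place
        q = list(p)
        i = random.randrange(len(q))
        j = min(len(q), i + random.randint(2, max_len))
        sub = q[i:j]
        random.shuffle(sub)
        q[i:j] = sub
        return q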
For a problem that involved scheduling a limited number of flight simulators,
Syswerda (1991, p 342) reported that when applied individually, the order-based
swap mutation operator yielded the best results when compared to position-
based mutation and scramble mutation. In this case the swaps were selected
randomly rather than being performed over a fixed well-defined neighborhood.
Davis (1991a, p 81) on the other hand reports that the scramble sublist mutation
operator proved to be better than the swap operator on a number of applications.
In conclusion, one cannot make a priori statements about the usefulness of
a particular mutation operator without knowing something about the type of
problem that is to be solved and the representation that is being used for that
problem, but in general it is useful to distinguish between permutation problems
that are sensitive to adjacency (e.g. the TSP) versus relative order (e.g. resource
scheduling) or absolute position, which appears to be the least common.
Given a finite-state machine defined as a 5-tuple (Q, T, P, s, o), where Q is a
finite set, the set of states, T is a finite set, the set of input symbols, P is a
finite set, the set of output symbols, s : Q × T → Q is the next state function,
and o : Q × T → P is the next output function,
there are various methods for mutating parents to create offspring. Following
directly from the definition, five obvious modes of mutation present themselves:
(i) change an output symbol, (ii) change a state transition, (iii) add a new state,
(iv) delete a state, and (v) change the start state. Each of these will be discussed
in turn.
(i) Changing an output symbol consists of determining a particular state q ∈ Q,
and then determining a particular symbol τ ∈ T . For this pair (q, τ ),
identify the associated output symbol ρ ∈ P and change it to a symbol
chosen at random over the set P . The probability mass function for
selecting a new symbol is typically uniform over the possible symbols in
P , but can be chosen to reflect nearness between symbols or other known
relationships between the symbols.
(ii) Changing a state transition consists of determining a particular state q1 ∈ Q,
and then determining a particular symbol τ ∈ T . For this pair (q1 , τ ),
identify the associated next state q2 and change it to a state chosen at
random over the set Q. The probability mass function for selecting a new
symbol is typically uniform over the possible states in Q.
(iii) Adding a state can only be performed when the maximum size of the
machine has not been exceeded. The operation is accomplished by
increasing the set Q by one element. This new state must be properly
defined by generating an associated output symbol ρi and next state
transition qi for all input symbols i = 1, . . . , |T |. The generation is
typically performed by selecting output symbols and next state transitions
with equal probability across their respective sets. Optionally, the new state
may also be forced to be connected to the preexisting states by redirecting
a randomly selected state transition of a randomly chosen preexisting state
to the new state.
(iv) Deleting a state can be performed when the machine has at least two states.
The operation is accomplished by decreasing the set Q by one element
chosen at random (uniformly). All state transitions from other states that
point to the deleted state must be redirected to the remaining states. This
is often performed at random, with the new states selected with equal
probability.
(v) Changing the start state can be performed when the machine has at least
two states. The operation is accomplished by selecting a state q ∈ Q to
be the new starting state. Again, the selection is typically made uniformly
over the available states.
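A sketch of the five modes over a simple table-based representation (the data layout, with integer states and a dictionary mapping (state, input) pairs to (output, next state), is illustrative rather than taken from the references):

    import random

    def mutate_fsm(m, T, P, max_states):
        # m = {'start': q0, 'Q': set of int states, 'table': {(q, tau): (rho, q_next)}}
        mode = random.choice(['output', 'transition', 'add', 'delete', 'start'])
        Q, table = m['Q'], m['table']
        if mode == 'output':                          # (i) change an output symbol
            q, tau = random.choice(list(table))
            table[(q, tau)] = (random.choice(P), table[(q, tau)][1])
        elif mode == 'transition':                    # (ii) change a state transition
            q, tau = random.choice(list(table))
            table[(q, tau)] = (table[(q, tau)][0], random.choice(sorted(Q)))
        elif mode == 'add' and len(Q) < max_states:   # (iii) add a fully defined state
            q_new = max(Q) + 1
            Q.add(q_new)
            for tau in T:
                table[(q_new, tau)] = (random.choice(P), random.choice(sorted(Q)))
        elif mode == 'delete' and len(Q) > 1:         # (iv) delete a state
            q_del = random.choice(sorted(Q))
            Q.remove(q_del)
            for tau in T:
                del table[(q_del, tau)]
            for key, (rho, nxt) in list(table.items()):
                if nxt == q_del:                      # redirect dangling transitions
                    table[key] = (rho, random.choice(sorted(Q)))
            if m['start'] == q_del:
                m['start'] = random.choice(sorted(Q))
        elif mode == 'start' and len(Q) > 1:          # (v) change the start state
            m['start'] = random.choice(sorted(Q))
        return m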
The mutation operation can be implemented with various probabilities
assigned to each mode of mutation (Fogel and Fogel 1986), although many
of the initial experiments in evolutionary programming used equal probabilities
Figure 32.2. An illustration of the grow mutation operator applied to a Boolean parse
tree. Given a parent tree to mutate, a terminal node is selected at random (highlighted)
and replaced by a randomly generated subtree to produce the child tree.
Figure 32.3. An illustration of the shrink mutation operator applied to a Boolean parse
tree. Given a parent tree to mutate, an internal function node is selected at random
(highlighted) and replaced by a randomly selected terminal to produce the child tree.
Figure 32.4. An illustration of the switch mutation operator applied to a Boolean parse
tree. Given a parent tree to mutate, an internal function node is selected, two of the
subtrees below it are selected (highlighted in the figure) and their positions switched to
produce the child tree.
Figure 32.5. An illustration of the cycle mutation operator applied to a Boolean parse
tree. Given a parent tree to mutate, a single node, either a terminal or function, is selected
at random (highlighted in the parent) and replaced by a randomly selected node with the
same number of arguments to produce the child tree.
Many real-world applications suggest the use of representations that are hybrids
of the canonical representations. One common instance is the simultaneous use
of discrete and continuous object variables, with a general formulation of the
global optimization problem given by Bäck and Schütz (1995).
Bäck and Schütz (1995) approach the general problem by including a vector
of mutation strategy parameters p_j ∈ (0, 1), j = 1, 2, . . . , d, where d is the
number of discrete components to be mutated. (Alternatively, fewer strategy parameters could
be used.) These strategy parameters are adapted along with the usual step size
control strategy parameters for Gaussian mutation of the real-valued vector x.
The discrete strategy parameters are updated by the formula

    p′_j = [1 + ((1 − p_j)/p_j) · exp(−γ N_j(0, 1))]^(−1)
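A sketch of this update and its use as a per-component resampling probability (γ is a learning-rate parameter; the helper names are illustrative):

    import math
    import random

    def update_flip_probs(p, gamma=0.3):
        # logistic update keeps each p_j in (0, 1)
        return [1.0 / (1.0 + ((1.0 - pj) / pj)
                       * math.exp(-gamma * random.gauss(0.0, 1.0)))
                for pj in p]

    def mutate_discrete(z, p, choices):
        # resample component j of the discrete vector with probability p_j
        return [random.choice(choices[j]) if random.random() < pj else zj
                for j, (zj, pj) in enumerate(zip(z, p))]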
References
Angeline P J 1996a The effects of noise on self-adaptive evolutionary optimization Proc.
5th Ann. Conf. on Evolutionary Programming ed L J Fogel, P J Angeline and T
Bäck (Cambridge, MA: MIT Press) pp 443–50
——1996b Genetic programming’s continued evolution Advances in Genetic Program-
ming vol 2, ed P J Angeline and K Kinnear (Cambridge, MA: MIT Press) pp 89–110
Angeline P J, Fogel D B and Fogel L J 1996 A comparison of self-adaptation methods for
finite state machines in a dynamic environment Proc. 5th Ann. Conf. on Evolutionary
Programming ed L J Fogel, P J Angeline and T Bäck (Cambridge, MA: MIT Press)
pp 441–50
Bäck T 1993 Optimal mutation rates in genetic search Proc. 5th Int. Conf. on Genetic
Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo, CA:
Morgan Kaufmann) pp 2–8
——1996 Evolutionary Algorithms in Theory and Practice (New York: Oxford
University Press)
Bäck T, Rudolph G and Schwefel H-P 1993 Evolutionary programming and evolution
strategies: similarities and differences Proc. 2nd Ann. Conf. on Evolutionary
Programming (San Diego, CA) ed D B Fogel and W Atmar (La Jolla, CA:
Evolutionary Programming Society) pp 11–22
Bäck T and Schütz M 1995 Evolution strategies for mixed-integer optimization of optical
multilayer systems Proc. 4th Ann. Conf. on Evolutionary Programming (San Diego,
CA, March 1995) ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge,
MA: MIT Press) pp 33–51
Bäck T and Schwefel H-P 1993 An overview of evolutionary algorithms for parameter
optimization Evolutionary Comput. 1 1–24
Bagley J D 1967 The Behavior of Adaptive Systems which Employ Genetic and
Correlation Algorithms Doctoral Dissertation, University of Michigan; University
Microfilms 68-7556
Bremermann H J 1962 Optimization through evolution and recombination Self-
Organizing Systems ed M C Yovits, G T Jacobi and G D Goldstine (Washington,
DC: Spartan) pp 93–106
Bremermann H J and Rogson M 1964 An Evolution-type Search Method for Convex Sets
ONR Technical Report, contracts 222(85) and 3656(58)
Bremermann H J, Rogson M and Salaff S 1965 Search by evolution Biophysics and
Cybernetic Systems ed M Maxfield, A Callahan and L J Fogel (Washington, DC:
Spartan) pp 157–67
(Pittsburgh, PA, July 1995) ed L J Eshelman (San Mateo, CA: Morgan Kaufmann)
pp 159–66
Ostermeier A 1992 An evolution strategy with momentum adaptation of the random
number distribution Parallel Problem Solving from Nature, 2 (Proc. 2nd Int. Conf.
on Parallel Problem Solving from Nature, Brussels, 1992) ed R Männer and B
Manderick pp 197–206
Radcliffe N and Surry P D 1995 Fitness variance of formae and performance prediction
Foundations of Genetic Algorithms 3 ed D Whitley and M Vose (San Mateo, CA:
Morgan Kaufmann) pp 51–72
Rechenberg I 1973 Evolutionsstrategie: Optimierung technischer Systeme nach
Prinzipien der biologischen Evolution (Stuttgart: Frommann-Holzboog)
Reed J, Toombs R and Barricelli N A 1967 Simulation of biological evolution and
machine learning J. Theor. Biol. 17 319–42
Rudolph G 1994 Convergence properties of canonical genetic algorithms IEEE Trans.
Neural Networks 5 96–101
Saravanan N and Fogel D B 1994 Learning strategy parameters in evolutionary
programming: an empirical study Proc. 3rd Ann. Conf. on Evolutionary
Programming (San Diego, CA, February 1994) ed A V Sebald and L J Fogel
(Singapore: World Scientific) pp 269–80
Saravanan N, Fogel D B and Nelson K M 1995 A comparison of methods for self-
adaptation in evolutionary algorithms BioSystems 36 157–66
Schwefel H-P 1981 Numerical Optimization of Computer Models (Chichester: Wiley)
——1995 Evolution and Optimum Seeking (New York: Wiley)
Schaffer J D, Caruana R A, Eshelman L J and Das R 1989 A study of control parameters
affecting online performance of genetic algorithms for function optimization Proc.
3rd Int. Conf. on Genetic Algorithms (Fairfax, VA, June 1989) ed J D Schaffer (San
Mateo, CA: Morgan Kaufmann) pp 51–60
Syswerda G 1991 Schedule optimization using genetic algorithms Handbook of Genetic
Algorithms ed L Davis (New York: Van Nostrand Reinhold) pp 332–49
Yao X and Liu Y 1996 Fast evolutionary programming Proc. 5th Ann. Conf. on
Evolutionary Programming ed L J Fogel, P J Angeline and T Bäck (Cambridge,
MA: MIT Press) at press
33
Recombination
33.1 Binary strings
33.1.1 Introduction
In biological systems (see section 5.4), crossing-over is a complex process
that occurs between pairs of chromosomes. Two chromosomes are physically
aligned, breakage occurs at one or more corresponding locations on each
chromosome, and homologous chromosome fragments are exchanged before
the breaks are repaired. This results in a recombination of genetic material
that contributes to variability in the population. In evolutionary algorithms, this
process has been abstracted into syntactic crossing-over (or crossover) operators
that exchange substrings between chromosomes represented as linear strings
of symbols. In this section we describe various approaches to implementing
these computational recombination techniques. Note that, while binary strings
(Chapter 15) are the canonical representation of chromosomes most often
associated with evolutionary algorithms, crossover operators work the same
way on all linear strings regardless of the cardinality of the symbol alphabet.
Accordingly, the discussion in this section applies to both binary and nonbinary
string representations. The obvious caveat is that the syntactic manipulations by
crossover must yield semantically valid results. When this becomes a problem—
for example, when the chromosomes represent permutations (see Chapter 17)—
then other syntactic operations must be used.
Two new resultant strings are formed by exchanging the parent substrings to the
right of position k. Holland points out that when the overall algorithm is limited
to producing only one new individual per generation, one of the resultant strings
generated by this crossover operator must be discarded. The discarded string is
usually chosen at random.
Holland’s general procedure defines a family of operators that can be
described more formally as follows. Given a space I of individual strings,
a crossover operator is a mapping
    r_m : I × I −→ I × I        r_m(a, b) = (c, d)

where the mask m ∈ B^ℓ = {0, 1}^ℓ and

    c_i = a_i if m_i = 0 and c_i = b_i if m_i = 1;    d_i = b_i if m_i = 0 and d_i = a_i if m_i = 1.
crossover(a, b):
    sample u ∈ U(0, 1)
    if (u > pc)
        then return (a, b)
    fi
    c := a;
    d := b;
    m := compute mask();
    for i := 1 to ℓ do
        if (mi = 1)
            then
                ci := bi;
                di := ai;
        fi
    od
    return (c, d);
Empirical studies have shown that the best setting for the crossover rate pc
depends on the choices made regarding other aspects of the overall algorithm,
such as the settings for population size and mutation rate, and the selection
operator used. Some commonly used crossover rates are pc = 0.6 (De Jong 1975),
pc ∈ [0.45, 0.95] (Grefenstette 1986), and pc ∈ [0.75, 0.95] (Schaffer et al
1989). Techniques for adaptively modifying the crossover rate have also proven
to be useful (Booker 1987, Davis 1989, Srinivas and Patnaik 1994, Julstrom
1995). The pseudocode shown above makes it clear that the differences between
crossover operators lie mainly in the implementation of the compute mask()
procedure. The following examples of pseudocode characterize the way
compute mask() is implemented for the most commonly cited crossover operators.
compute mask() for one-point crossover:
    sample u ∈ U(1, ℓ − 1)
    m := 0;
    for i := u + 1 to ℓ do
        mi := 1;
    od
    return m;
compute mask() for uniform crossover:
    m := 0;
    for i := 1 to ℓ do
        sample u ∈ U(0, 1)
        if (u ≤ px)
            then mi := 1;
        fi
    od
    return m;
The value px = 0.5 first used by Ackley remains the standard setting for
the crossover probability at each position, though it may be advantageous to
use smaller values (Spears and De Jong 1991b). When px = 0.5, every binary
string of length ℓ is equally likely to be generated as a mask. In this case,
it is often more efficient to implement the operator by using a random integer
sampled from U(0, 2^ℓ − 1) as the mask instead of constructing the mask one bit
at a time.
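As a concrete illustration, the following Python sketch (ours, not part of the
original text; the function names are our own) implements the mask-driven
exchange together with compute mask() variants for one-point and parametrized
uniform crossover, including the integer shortcut for px = 0.5:

    import random

    def one_point_mask(length):
        # one-point crossover: u ∈ U(1, length − 1); positions u+1, ..., length swap
        u = random.randint(1, length - 1)
        return [0] * u + [1] * (length - u)

    def uniform_mask(length, px=0.5):
        # uniform crossover: each position is exchanged independently with probability px
        return [1 if random.random() <= px else 0 for _ in range(length)]

    def uniform_mask_int(length):
        # for px = 0.5, a random integer from U(0, 2**length − 1) encodes the whole mask
        r = random.getrandbits(length)
        return [(r >> i) & 1 for i in range(length)]

    def crossover(a, b, compute_mask, pc=0.6):
        # with probability 1 − pc the parents are returned unchanged
        if random.random() > pc:
            return list(a), list(b)
        m = compute_mask(len(a))
        c = [bi if mi else ai for ai, bi, mi in zip(a, b, m)]
        d = [ai if mi else bi for ai, bi, mi in zip(a, b, m)]
        return c, d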
compute mask(a, b) for punctuated crossover:
    j := 0;
    for i := 1 to ℓ/2 do
        mi := j;
        if ((ai = 1) or (bi = 1))
            then j := 1 − j;
        fi
    od
    return (m);
Note that the symbol and punctuation mark associated with a chromosome
position are transmitted together by the punctuated crossover operator. While
the idea behind this operator is appealing, empirical tests of punctuated crossover
were not conclusive and the operator is not widely used.
offspring can substantially reduce the loss of diversity in the population. Another
widespread practice is to restrict the crossover points to those locations where
the parent strings have different symbols. This so-called reduced surrogate
technique (Booker 1987) improves the ability of crossover to produce offspring
that are different from their parents.
An implementation technique called shuffle crossover was introduced by
Eshelman et al (1989). The symbols in the parent strings are ‘shuffled’ by a
permutation operator before crossover is invoked. The inverse permutation is
applied to the offspring produced by crossover to restore the original symbol
ordering. This method can be used to counteract the tendency in n-point
crossover (n ≥ 1) to disrupt sets of symbols that are widely dispersed on the
chromosome more than it disrupts symbols which are close together (see the
discussion of bias in Section 33.1.4).
The crossover mechanisms described so far are all consistent with the
simplest principle of Mendelian inheritance: the requirement that every gene
carried by an offspring is a copy of a gene inherited from one of its parents.
Radcliffe (1991) points out that this conservation of genetic material during
recombination is not a necessary restriction for artificial recombination operators.
From the standpoint of conducting a robust exploration of the opportunities
represented by the parent strings, it is reasonable to ask whether a crossover
operator can generate all possible offspring having some combination of genes
found in the parents. Given a binary string representation, the answer for one-
point and n-point crossover is no while the answer for shuffle crossover and
uniform crossover is yes. (To see this, simply consider the set of possible
resultant strings for the parents 0 · · · 0 and 1 · · · 1.) For nonbinary strings, however, the
only way to achieve this capability is to allow the offspring to have genes
that are not carried by either parent. Radcliffe used this idea as the basis for
designing the random respectful recombination operator. This operator generates
a resultant string by copying the symbols at positions where the parents are
identical, then choosing random values to fill the remaining positions. Note that
for binary strings, random respectful recombination is equivalent to uniform
crossover with px = 0.5.
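A minimal sketch of random respectful recombination (ours; the alphabet
argument is an assumption of the illustration):

    import random

    def random_respectful_recombination(a, b, alphabet):
        # copy the symbols on which the parents agree and fill the
        # remaining positions with random symbols from the alphabet
        return [ai if ai == bi else random.choice(alphabet)
                for ai, bi in zip(a, b)]

For the binary alphabet ('0', '1') the random fill chooses either parent's
symbol with equal probability, which is why the operator then behaves like
uniform crossover with px = 0.5.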
    R_A(B) = Σ_{C⊆Ā} R(B ∪ C)        B ⊆ A

    lim_{t→∞} p^{(t)}(z) = ∏_{i=1}^{ℓ} p^{(0)}(z_i)
which is the product of the marginal distributions of alleles from the initial
population.
This theorem tells us that, in the limit, random mating and recombination
without selection lead to chromosome frequencies corresponding to the simple
product of initial allele frequencies. A population in this state is said to be
in linkage equilibrium or Robbins’ equilibrium (Robbins 1918). This result
holds for all recombination operators that allow any two loci to be separated by
recombination.
Note that Holland (1975) sketched a proof of a similar result for schema
frequencies and one-point crossover. Geiringer’s theorem applied to schemata
gives us a much more general result. Together with the recurrence equations,
this work paints a picture of ‘search pressure’ from recombination acting to
reduce departures from linkage equilibrium for all schemata.
Subsequent work has carefully analyzed the dynamics of this convergence
to linkage equilibrium (Christiansen 1989). It has been proven, for example,
that the convergence rate for any particular schema is given by the probability
of the recombination event specified by the schema’s defining loci. In this
view, an important difference between crossover operators is the rate at which,
undisturbed by selective pressures, they drive schemata to their equilibrium
proportions. These results from mathematical population genetics have only
recently been applied to evolutionary algorithms (Booker 1993, Altenberg 1995).
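The decay of linkage disequilibrium is easy to observe numerically. The
following sketch (ours, not from the text) applies random mating and uniform
crossover, without selection, to a two-locus binary population that starts in
maximal disequilibrium, and compares the frequency of the chromosome 11 with
the product of the initial allele frequencies:

    import random

    def generation(pop):
        # random mating with uniform crossover and no selection
        new = []
        for _ in range(len(pop)):
            a, b = random.choice(pop), random.choice(pop)
            new.append(tuple(random.choice(pair) for pair in zip(a, b)))
        return new

    random.seed(1)
    pop = [(1, 1)] * 500 + [(0, 0)] * 500      # chromosomes 11 and 00 only
    p1 = sum(x[0] for x in pop) / len(pop)     # initial marginal frequencies
    p2 = sum(x[1] for x in pop) / len(pop)
    for _ in range(20):
        pop = generation(pop)
    freq11 = sum(1 for x in pop if x == (1, 1)) / len(pop)
    print(freq11, 'versus', p1 * p2)           # approaches 0.25, up to genetic drift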
makes it clear that pA (B) involves strings having the allele values given by B
at the loci designated by A. Note that p∅ (B) = 1 and pS (B) = p(B).
With this notation we can also succinctly relate recombination distributions
and schemata. If A designates the defining loci of a schema ξ and B ⊆ A
specifies the alleles at those loci, then the frequency of ξ is given by pA (B) and
the marginal distribution RA describes the transmission of the defining loci of
ξ . In what follows we will assume, without loss of generality, that the elements
of the index set A for a schema ξ are in increasing order so that the kth element
A(k) is the locus of the kth defining position of ξ . This means, in particular,
that the outermost defining loci of ξ are given by the elements A(1) and A(O(ξ ))
where O(ξ ) is the order of ξ . It will be convenient to define the following
property relating the order of a schema to its defining length δ(ξ ).
Definition. The kth component of defining length for schema ξ, δ_k(ξ), is the
distance between the kth and (k + 1)st defining loci, 1 ≤ k < O(ξ), with the
convention that δ_0(ξ) ≡ ℓ − δ(ξ).
Note that the defining length of a schema is equal to the sum of its defining
length components:
    δ(ξ) = Σ_{k=1}^{O(ξ)−1} δ_k(ξ) = A(O(ξ)) − A(1).
Putting all the pieces together, we can now give an expression for the
complete marginal distribution.
    R_A(B) = C(ℓ, n)^{−1} Σ_{j=0}^{n} C(δ_0(ξ), j) N_A(B, n − j, O(ξ))            if n is even or n = ℓ

    R_A(B) = C(ℓ − 1, n)^{−1} Σ_{j=0}^{n} C(δ_0(ξ) − 1, j) N_A(B, n − j, O(ξ))    otherwise

where C(·, ·) denotes a binomial coefficient.
Uniform crossover. The marginal distribution R_A^{u(p)} for parametrized uniform
crossover with parameter p is easily derived from previous analyses (Spears and
De Jong 1991b). Figure 33.1 shows how the marginal probability of transmission
for second-order schemata—2R_A^{n}(A) and 2R_A^{u(0.5)}(A), |A| = 2—varies as a
function of
defining length. The shape of the curves depends on whether n is odd or
even. Since the curves indicate the probability of transmitting schemata, the
area above each curve can be interpreted as a measure of potential schema
disruption. This interpretation makes it clear that two-point crossover is the best
choice for minimizing disruption. Spears and De Jong (1991a) have shown that
this property of two-point crossover remains valid for higher-order schemata.
Note that these curves are not identical to the family of curves for
nondisruptive crossovers given by Spears and De Jong. The difference is
that Spears and De Jong assume crossover points are selected randomly with
Figure 33.1. Marginal probability of transmission for second-order schemata as
a function of defining length (1 to L), for one-, two-, three-, four-, and
five-point crossover and uniform crossover (U).
search is the only search technique that has no bias. It has long been recognized
that an appropriate inductive bias is necessary in order for inductive search to
proceed efficiently and effectively (Mitchell 1980). Two types of bias have
been attributed to crossover operators in genetic search: distributional bias and
positional bias (Eshelman et al 1989).
Distributional bias refers to the number of symbols transmitted during a
recombination event and the extent to which some quantities might be more
likely to occur than others. This bias is significant because it is correlated with
the potential number of schemata from each parent that can be recombined by
the crossover operator. An operator has distributional bias if the probability
distribution for the number of symbols transmitted from a parent is not uniform.
Both one-point and two-point crossover are free of distributional bias. The
n-point (n > 2) crossover operators have a distributional bias that is well
approximated by a binomial distribution with mean ℓ/2 for large n. Uniform
crossover has a strong distributional bias, with the expected number of symbols
transmitted given by a binomial distribution with expected value px ℓ. More
recently, Eshelman and Schaffer (1993) have emphasized the expected value of
the number of symbols transmitted rather than the distribution of those numbers.
The bias defined by this criterion, though clearly similar to distributional bias,
is referred to as recombinative bias.
Positional bias characterizes how much the probability that a set of symbols
will be transmitted intact during a recombination event depends on the relative
positions of those symbols on the chromosome.
Figure 33.2. One view of the crossover bias ‘landscape’ generated using quantitative
measures derived from recombination distributions.
Offspring
x1 = x1,1 x1,2 . . . x1,k x2,k+1 . . . x2,d
x2 = x2,1 x2,2 . . . x2,k x1,k+1 . . . x1,d
Figure 33.3. For one-point crossover, two parents are chosen and a crossover point k
is selected, typically uniformly across the components. Two offspring are created by
interchanging the segments of the parents that occur from the crossover point to the ends
of the string.
Offspring
x1 = x1,1 x1,2 . . . x1,k1 x2,k1 +1 . . . x2,k2 x1,k2 +1 . . . x1,d
x2 = x2,1 x2,2 . . . x2,k1 x1,k1 +1 . . . x1,k2 x2,k2 +1 . . . x2,d
Figure 33.5. For two-point crossover, two parents are chosen and two crossover points,
k1 and k2 , are selected, typically uniformly across the components. Two offspring are
created by interchanging the segments defined by the points k1 and k2 .
Parents
x1 = x1,1 x1,2 . . . x1,d
x2 = x2,1 x2,2 . . . x2,d
Offspring
x1 = x1,1 x2,2 . . . x1,d
x2 = x1,1 x1,2 . . . x2,d
Figure 33.6. For uniform crossover, each element of an offspring (here two offspring
are depicted) is selected from either parent. The example shows that the first element
in both offspring was selected from the first parent. In some applications such
duplication is not allowed. Typically each parent has an equal chance of contributing
each element to an offspring.
    xi = α x1,i + (1 − α) x2,i

and, more generally,

    xi = Σ_{j=1}^{k} αj xj,i        subject to        Σ_{j=1}^{k} αj = 1
where there are k individuals involved in the multirecombination. This general
procedure is also known as arithmetic crossover (Michalewicz 1996, p 112) and
has been described in various other terms in the literature.
In a more generalized manner, recombination operators can take the
following forms (Bäck et al 1993, Fogel 1995, pp 146–7):
    xi = xS,i                                    (33.4)
    xi = xS,i or xT,i                            (33.5)
    xi = xS,i + u(xT,i − xS,i)                   (33.6)
    xi = xSj,i or xTj,i                          (33.7)
    xi = xSj,i + ui(xTj,i − xSj,i)               (33.8)
where S and T denote two arbitrary parents, u is a uniform random variable
over [0, 1], and i and j index the components of a vector and the vector itself,
respectively. The versions are no recombination (33.4), discrete recombination
(or uniform crossover) (33.5), intermediate recombination (33.6), and (33.7)
and (33.8) are the global versions of (33.5) and (33.6), respectively, extended
to include more than two parents (up to as many as the entire population size).
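In Python, (33.5), (33.6), and their global versions could be sketched as
follows (our illustration; (33.4) is the identity and is omitted, and parents
is assumed to be a list of real-valued vectors):

    import random

    def discrete(s, t):
        # (33.5): each component is taken from one of the two parents
        return [random.choice(pair) for pair in zip(s, t)]

    def intermediate(s, t):
        # (33.6): u is sampled once for the whole vector
        u = random.random()
        return [si + u * (ti - si) for si, ti in zip(s, t)]

    def global_discrete(parents):
        # (33.7): the pair of parents is redrawn for every component
        x = []
        for i in range(len(parents[0])):
            s, t = random.choice(parents), random.choice(parents)
            x.append(random.choice((s[i], t[i])))
        return x

    def global_intermediate(parents):
        # (33.8): both the parents and u are redrawn for every component
        x = []
        for i in range(len(parents[0])):
            s, t = random.choice(parents), random.choice(parents)
            x.append(s[i] + random.random() * (t[i] - s[i]))
        return x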
There are several other variations of crossover operators that have been
applied to real-valued vectors.
(i) The heuristic crossover of Wright (1994) takes the form
x = u(x2 − x1 ) + x2
where u is a uniform random variable over [0, 1] and x1 and x2 are the
two parent vectors subject to the condition that x2 is not worse than x1 .
Michalewicz (1996, p 129) noted that this operator uses values of the
objective function to determine a direction to search.
(ii) The simplex crossover of Renders and Bersini (1994), sketched in code after
this list, selects k > 2 parents
(say the set J ), determines the best and worst individuals within the selected
group (say x1 and x2 , respectively), computes the centroid of the group
without x2 (say c) and computes the reflected vector x (the offspring)
obtained from the vector x2 as
x = c + (c − x2 ).
(iv) The fitness-based scan of Eiben et al (1994) takes multiple parents and
generates an offspring where each component is selected from one of the
parents with a probability corresponding to the parent’s relative fitness. If
a parent has fitness f (i), then the likelihood of selecting each individual
component from that parent is f (i)/Σ_{j=1}^{k} f (j), where there are k parents
involved in the operator.
(v) The diagonal multiparent crossover of Eiben et al (1994) operates much
like n-point crossover, except that in creating k offspring from k parents,
c ≥ 1 crossover points are chosen and the first offspring is constructed to
contain the first segment from parent 1, the second segment from parent 2,
and so forth. Subsequent offspring are similarly constructed from a rotation
of segments from the parents.
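As announced above, a sketch of the simplex crossover step (ours; the objective
function f is assumed to be minimized):

    def simplex_crossover(parents, f):
        # parents: a list of k > 2 real-valued vectors; f: objective function
        ranked = sorted(parents, key=f)
        worst = ranked[-1]
        rest = ranked[:-1]                       # the group without the worst
        d = len(worst)
        c = [sum(p[i] for p in rest) / len(rest) for i in range(d)]  # centroid
        return [2 * c[i] - worst[i] for i in range(d)]  # x = c + (c - x2)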
33.3 Permutations
Darrell Whitley
33.3.1 Introduction
An obvious attribute of permutation problems (see Chapter 17) is that simple
crossover operators fail to generate offspring that are permutations. Consider the
following example of simple one-point crossover, when one parent is denoted
with capital letters and the other with lower-case letters:
String 1: A B C D E F G H I
\/
/\
String 2: h d a e i c f b g
Offspring 1: A B C e i c f b g
Offspring 2: h d a D E F G H I.
Davis’s order crossover. Pick two permutations for recombination. Denote the
first parent as the cut string and the other the filler string. Select two crossover
points. Copy the sublist of permutation elements between the crossover points
from the cut string directly to the offspring, placing them in the same absolute
position. This will be referred to as the crossover section. Next, starting at the
second crossover point, find the next element in the filler string that does not
appear in the offspring. Starting at the second crossover point, place the element
from the filler string into the next available slot in the offspring. Continue
moving the next unused element from the filler string to the offspring. When
the end of the filler string (or the offspring) is reached, wrap around to the
beginning of the string. When done in this way, Davis’s order crossover has
the property that Radcliffe (1994) describes as pure recombination: when two
identical parents are recombined the offspring will also be identical with the
parents. If one does not start copying elements from the filler string starting at
the second crossover point, the recombination may not be pure.
The following is an example of Davis’s order crossover, where dots
represent the crossover points. The underscore symbol in the crossover section
corresponds to empty slots in the offspring.
Parent 1: A B . C D E F . G H I
Crossover-section: _ _ C D E F _ _ _
Parent 2: h d . a e i c . f b g
Available elements in order: b g h a i
Offspring: a i C D E F b g h.
Note that the elements in the crossover section preserve relative order,
absolute position, and adjacency from parent 1. The elements that are copied
from the filler string preserve only the relative order information from the second
parent.
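A sketch of Davis's order crossover (ours; cut and filler are assumed to be
permutations of the same element set):

    import random

    def davis_order_crossover(cut, filler, x1=None, x2=None):
        n = len(cut)
        if x1 is None:
            x1, x2 = sorted(random.sample(range(n + 1), 2))  # crossover points
        child = [None] * n
        child[x1:x2] = cut[x1:x2]                # the crossover section
        used = set(cut[x1:x2])
        # scan the filler string starting at the second crossover point
        fill = [filler[(x2 + i) % n] for i in range(n)]
        fill = [e for e in fill if e not in used]
        # place the unused elements, also starting at the second point
        slots = [(x2 + i) % n for i in range(n) if child[(x2 + i) % n] is None]
        for s, e in zip(slots, fill):
            child[s] = e
        return child

With both parents written in one case, davis_order_crossover(list('ABCDEFGHI'),
list('HDAEICFBG'), 2, 6) returns A I C D E F B G H, the offspring of the
example above.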
Partially mapped crossover (PMX). Goldberg and Lingle (1985) introduced the
partially mapped crossover operator (PMX). PMX shares the following attributes
with Davis’s order crossover. One parent string is designated as parent 1, the
other as parent 2. Two crossover sites are selected and all of the elements in
parent 1 between the crossover sites are directly copied to the offspring. This
means that PMX also defines a crossover section in the same manner as order
crossover.
Parent 1: A B . C D E . F G
Crossover-section: _ _ C D E _ _
Parent 2: c f . e b a . d g.
The difference between the two operators is in how PMX copies elements
from parent 2 into the open slots in the offspring after a crossover section has
been defined. Denote the parents as P1 and P2 and the offspring as OS; let
P1i denote the ith element of permutation P1. The following description of
selecting elements from P2 to place in the offspring is based on the article by
Whitley and Yoo (1995).
For those elements between the crossover points in parent 2, if element P2i
has already been copied to the offspring, take no action. In the example given
here, element e in parent 2 requires no processing. We will consider the rest of
the elements by considering the positions in which they appear in the crossover
section. If the next element at position P2i in parent 2 has not already been
copied to the offspring, then find P1i = P2j ; if position j has not been filled
in the offspring then assign OSj = P2i . In the example given here, the next
element in the crossover section of parent 2 is b which is in the same position as
D in parent 1. Element D is located in parent 2 with index 6 and the offspring at
OS6 has not been filled. Copy b to the offspring in the corresponding position.
This yields
Offspring: _ _ C D E b _.
All of the elements in parent 1 and parent 2 that fall within the crossover
section have now been placed in the offspring. The remaining elements can be
placed by directly copying their positions from parent 2. This yields
Offspring: a f C D E b g.
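A sketch of PMX consistent with the worked example (ours; the mapping step is
followed repeatedly when the target slot falls inside the crossover section):

    def pmx(p1, p2, x1, x2):
        n = len(p1)
        child = [None] * n
        child[x1:x2] = p1[x1:x2]                 # the crossover section
        for i in range(x1, x2):
            e = p2[i]
            if e in child:                       # already copied from parent 1
                continue
            j = i                                # follow the mapping to an open slot
            while x1 <= j < x2:
                j = p2.index(p1[j])
            child[j] = e
        for i in range(n):                       # remaining positions from parent 2
            if child[i] is None:
                child[i] = p2[i]
        return child

For the example, pmx(list('ABCDEFG'), list('CFEBADG'), 2, 5) yields the
offspring A F C D E B G.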
The selected elements in parent 2 are F, B, and A. Thus, the relevant elements
are reordered in parent 1.
Reorder A B _ _ _ F _ from parent 1 which yields f b _ _ _ a _.
Next scan parent 2 from left to right and place each element which
does not yet appear in the offspring in the next available position. This yields
the following progression:
# # C D E # G => f # C D E # G
=> f b C D E # G
=> f b C D E a G.
Obviously, in this case the two operators generate exactly the same offspring.
Jim Van Zant first pointed out the similarity of these two operators in the
electronic newsgroup The Genetic Algorithm Digest. Whitley and Yoo (1995)
show the two operators to be identical using the following argument.
Assume there is one way to produce a target string S by recombining two
parents. Given a pair of strings which can be recombined to produce string S, the
probability of selecting the K key positions using order crossover-2 required to
generate a specific string S is C(L, K)^{−1}, while for position crossover the
probability of picking the L − K key elements that will produce exactly the
same effect is C(L, L − K)^{−1}. Since C(L, K) = C(L, L − K), the
probabilities are identical.
Now assume there are R unique ways to recombine two strings to generate a
target string S. The probabilities for each unique recombination event are equal
as shown by the argument in the preceding paragraph. Thus the sum of the
probabilities for the various ways of generating S are equivalent for
order crossover-2 and position crossover. Since the probabilities of generating
any string S are identical, the operators are identical in expectation.
This also means that in practice there is no difference between using order
crossover-2 and position crossover as long as the parameters of the operators
are adjusted to reflect their complementary nature. If position crossover is used
so that X % of the positions are initially copied to the offspring, then order
crossover is identical if (100 − X )% positions are selected as relative order
positions.
one was used to reach the current city.) Otherwise from the cities on the
current adjacency list pick the next city to be the one whose own adjacency
list is shortest. Ties are broken randomly. Once a city is visited, references
to the city are removed from the adjacency list of other cities and it is no
longer reachable from other cities.
(iii) Repeat step 2 until the tour is complete or a city has been reached that has
no entries in its adjacency list. If not all cities have been visited, randomly
pick a new city to start a new partial tour.
Using the edge table in figure 33.8, city a is randomly chosen as the first
city in the tour. City k is chosen as the second city in the tour since the edge
(a, k) occurs in both parent tours. City e is chosen from the edge list of city k
as the next city in the tour since this is the only city remaining in k’s edge list.
This procedure is repeated until the partial tour contains the sequence [a k e c].
At this point there is no deterministic choice for the fifth city in the
tour. City c has edges to cities d and g, which both have two unused edges
remaining. Therefore city d is randomly chosen to continue the tour. The
normal deterministic construction of the tour then continues until position 7. At
position 7 another random choice is made between cities f and h. City h is
selected and the normal deterministic construction continues until we arrive at
the following partial tour: [a k e c d m h b g].
In this situation, a failure occurs since there are no edges remaining in the
edge list for city g. When a potential failure occurs during edge-3 recombination,
we attempt to continue construction at a previously unexplored terminal point
in the tour.
A terminal is a city which occurs at either end of a partial tour, where all
edges in the partial tour are inherited from the parents. The terminal is said to
be live if that city still has entries in its edge list; otherwise it is said to be a dead
terminal. Because city a was randomly chosen to start the tour in the previous
example, it serves as a new terminal in the event of a failure. Conceptually this
is the same as inverting the partial tour to build from the other end.
When a failure occurs, there is at most one live terminal in reserve at the
opposite end of the current partial tour. In fact, it is not guaranteed to be live,
since the construction of the partial tour could isolate this terminal city. Once
both terminals of the current partial tour are found to be dead, a new partial
tour must be initiated. Note that no local information is employed.
We now continue construction of the partial tour [a k e c d m h b g]. The
tour segment is reversed (i.e. [g b h m d c e k a]). Then city i is added to the
tour after city a. The tour is then constructed in the normal fashion. In this
case, there are no further failures. The final offspring tour is [g b h m d c e k
a i f j]. The offspring produced has a single foreign edge ([j–g]).
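A compact sketch of the basic edge recombination scheme (ours; for brevity a
failure simply starts a new partial tour at a random unvisited city, without
the terminal-reuse repair of edge-3):

    import random

    def edge_recombination(p1, p2):
        n = len(p1)
        # edge table: the neighbours of each city in either parent tour
        edges = {c: set() for c in p1}
        for tour in (p1, p2):
            for i, c in enumerate(tour):
                edges[c].update((tour[i - 1], tour[(i + 1) % n]))
        current = random.choice(p1)
        result = [current]
        while len(result) < n:
            for adj in edges.values():
                adj.discard(current)             # visited cities are unreachable
            if edges[current]:
                # prefer the neighbour whose own edge list is shortest
                k = min(len(edges[c]) for c in edges[current])
                current = random.choice([c for c in edges[current]
                                         if len(edges[c]) == k])
            else:
                # failure: restart from a random unvisited city
                current = random.choice([c for c in p1 if c not in result])
            result.append(current)
        return result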
When a failure occurs at both ends of the subtour, edge-3 recombination
starts a new partial tour. However, there is one other possibility, which has
been described as part of the edge-4 operator (Dzubera and Whitley 1994) but
which has not been widely tested.
Assume that the first partial tour has been constructed such that both ends
of the construction lack a live terminal by which to continue. Since only one
partial tour has been constructed and since initially every city has at least two
edges in the edge table, there must be edges internal to the current partial tour
that represent possible edges to the terminal cities of the partial tour. The edge-4
operator attempts to exploit this fact by inverting part of the partial tour so that a
terminal city is reconnected to an edge which is both internal to the partial tour
and which appeared in the original edge list of the terminal city. This will cause
a previously visited city in the partial tour to move to a terminal position. If this
newly created terminal has cities remaining in its (old) edge list, the offspring
construction can continue. If it does not, one can look for other internal edges
that will allow an inversion. Details on the edge-4 recombination operator are
given by Dzubera and Whitley (1994).
If one is using just a recombination operator and a mutation operator,
then edge recombination works very well as an operator for the TSP, at least
compared to other recombination operators, but if one is hybridizing such that
tours are being produced by recombination, then improved using 2-opt, then
both the empirical and the theoretical evidence suggests that Mühlenbein’s MPX
operator may be more effective (Dzubera and Whitley 1994).
(a) IF an edge from the receiver parent starting at the last city in the
offspring is possible (does not violate a valid tour)
(b) THEN add this edge from the receiver
(c) ELSE IF an edge from the donor starting at the last city in the
offspring is possible
(d) THEN add this edge from the donor
(e) ELSE add that city from the receiver which comes next in the string;
this adds a new edge, which we will mark as an implicit mutation.
The following example illustrates the MPX operator.
Receiver: G D M H B J F I A K E C
Donor: c e k a g b h i j f m d
Initial segment: _ _ k a g _ _ _ _ _ _ _.
Note that MPX does not transmit adjacency information from parents to
offspring as effectively as the various edge recombination operators, since it
uses less lookahead to avoid a break in the tour construction. At the same time,
when it must introduce a new edge that does not appear in either parent, it skips
to a nearby city in the tour rather than picking a random edge. Assuming that
the tour is partially optimized (for example, if the tour has been improved via
2-opt) then a city nearby in the tour should also be a city nearby in Euclidean
space. This, coupled with the fact that an initial segment is copied from one of
the parents, appears to give MPX an advantage when combined with an
operator such as 2-opt. Gorges-Schleuter (1989) implemented a variant of MPX
that has some notable features that are somewhat like Davis’s order crossover
operator. A full description of Gorges-Schleuter’s MPX is given by Dzubera
and Whitley (1994).
two parent strings. Consider the following example from Oliver et al (1987)
where the permutation elements correspond to the alphabetic characters with
numbers to indicate position:
Parent 1: h k c e f d b l a i g j
Parent 2: a b c d e f g h i j k l
Positions: 1 2 3 4 5 6 7 8 9 10 11 12.
To find a cycle, pick a position from either parent. Starting with position 1,
elements (h, a) belong to cycle 1. The elements (h, a) also appear in positions
8 and 9. Thus the cycle is expanded to include positions (1, 8, 9) and the new
elements i and l are added to the corresponding subset. Elements i and l appear
in positions 10 and 12, which also causes j to be added to the subset of elements
in the cycle. Note that adding j adds no new elements, so the cycle terminates.
Cycle 1 includes elements (h, a, i, l, j) in positions (1, 8, 9, 10, 12).
Note that element (c) in position 3 forms a unary cycle of one element.
Aside from the unary cycle at element c (denoted U), Oliver et al note that
there are three cycles between this set of parents:
Parent 1: h k c e f d b l a i g j
Parent 2: a b c d e f g h i j k l
Cycle Label: 1 2 U 3 3 3 2 1 1 1 2 1.
Recombination can occur by picking some cycles from one parent and
the remaining cycles from the alternate parent. Note that all elements in the
offspring occupy the same positions as in one of the two parents. However, few
applications seem to be position sensitive and cycle crossover is less effective at
preserving adjacency information (as in the TSP) or relative order information
(as in resource scheduling) compared to other operators.
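A sketch of cycle detection and offspring construction (ours; here the
even-numbered cycles are taken from parent 1 and the odd-numbered cycles from
parent 2):

    def cycle_crossover(p1, p2):
        n = len(p1)
        cycle_of = [None] * n
        label = 0
        for start in range(n):
            if cycle_of[start] is not None:
                continue
            i = start
            while cycle_of[i] is None:   # follow positions until the cycle closes
                cycle_of[i] = label
                i = p1.index(p2[i])
            label += 1
        return [p1[i] if cycle_of[i] % 2 == 0 else p2[i] for i in range(n)]

Applied to the parents of the example, the loop assigns the first label to
positions (1, 8, 9, 10, 12) (counting from one), matching the cycle found above.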
placed in the offspring. Because C has already been allocated a position in the
offspring, the C which appears later in parent 2 is exchanged with the E in the
initial position of parent 2. This yields
Parent 1: C F G B A H D I E J
Parent 2: C B G J D I <E> A F H
Precedence: A B C D E F G H I J
Note that one need not actually build a separate offspring, since both parents
are in effect transformed into copies of the same offspring. The resulting
offspring in the above example is
Offspring: C B G F A H D E I J.
The MX-2 operator is similar, except that when an element is added to the
offspring it is deleted from both parents instead of being swapped. Thus, the
process works as follows:
Parent 1: C F G B A H D I E J
Parent 2: E B G J D I C A F H
Precedence: A B C D E F G H I J.
Instead of now moving to the second element of each permutation, the first
remaining elements in the parents are compared: in this case, E and F are the
first elements and E is chosen and deleted. The parents are now represented as
follows:
Parent 1: _ F G B A H D I _ J
Parent 2: _ B G J D I _ A F H
Offspring: C E.
Note that, over time, this class of operator will produce offspring that are
closer to the precedence vector—even if no selection is applied.
M = (Q, T , P , s, o)
where Q is a finite set, the set of states, T is a finite set, the set of input
symbols, P is a finite set, the set of output symbols, s : Q × T → Q, the
next state function, and o : Q × T → P , the next output function. Perhaps the
earliest proposal to recombine finite-state machines in simulated evolution can
be found in the work of Fogel (1964) and Fogel et al (1966, pp 21–3). The
following extended quotation (Fogel et al 1966, p 21) may be insightful:
The recombination of individuals of opposite sex appears to benefit
natural evolution. By analogy, why not retain worthwhile traits that
have survived separate evolution by combining the best surviving
machines through some genetic rule; mutating the product to yield
offspring? Note that there is no need to restrict this mating to the
best two surviving ‘individuals’. In fact, the most obvious genetic
rule, majority logic, only becomes meaningful with the combination
of more than two machines.
Fogel et al (1966) suggested drawing a single state diagram which expresses
the majority logic of an array of finite-state machines. Each state of the majority
logic machine is the composite of a state from each of the original machines.
Figure 33.9. Three parent machines (top) are joined by a majority logic operator to
form another machine (bottom). The initial state of each machine is indicated by a short
arrow pointing to that state. Each state in the majority logic machine is a combination
of the states of the three parent machines with the output symbol being chosen as the
majority decision of two of the three parent machines. For example, the state BDF in
the majority logic machine is determined by examining the states B, D, and F in each
of the individual machines. For an input symbol of 0, all three states respond with a
0, therefore this symbol is chosen for the output to an input of 0 in state BDF. For an
input symbol of 1, two of the three states respond with a 0, thus, this being the majority
decision, this symbol is chosen for the output to an input of 1 in state BDF. Note that
several states of the majority logic machine are isolated from the start state and could
never be expressed.
Thus the majority machine may have a number of states as great as the product
of the number of states in the original machines. Each transition of the majority
machine is described by that input symbol which caused the respective transition
in the original machines, and by that output symbol which results from the
majority element logic being applied to the output symbols from each of the
original machines (figure 33.9). If there are only two parents to recombine in
this manner, the majority logic machine reduces to the better of the two parents.
Zhou and Grefenstette (1986) used recombination on finite-state automata
applied to binary sequence induction problems. The finite-state automata were
represented as five-tuples

    (Q, S, δ, q0, F)
of the parse tree must have the correct number of subtrees below it, one for
each argument that the function requires.
Often in genetic programming, a simplification is made so that all functions
and terminals in the primitive language return the same data type. This is
referred to as the closure principle (Koza 1992). The effect is to reduce the
number of syntactic constraints on the programs so that the complexity of the
crossover operation is minimized.
The recursive structure of parse tree representations makes the definition of
crossover for tree representations that adhere to the above caveats surprisingly
simple. Cramer (1985) initially defined the now standard subtree crossover
for parse trees shown in figure 33.10. First, a random subtree is selected
and removed from one of the parents. Note that this leaves a hole in the
parent such that there exists a function that has a null value for one of its
parameters. Next, a random subtree is extracted from the second parent and
inserted at the point in the first parent where its subtree was removed. Now
the hole in the first parent is again filled. The process is completed by
inserting the subtree extracted from the first parent into the position in the
second parent where its subtree was removed. As long as only complete
subtrees are swapped between parents and the closure principle holds, this simple
crossover operation is guaranteed to produce syntactically valid offspring every
execution.
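With parse trees encoded as nested Python lists, subtree crossover can be
sketched as follows (our illustration; the closure principle is assumed, so any
subtree may replace any other):

    import copy, random

    def subtrees(tree, path=()):
        # yield the path of every subtree; the root has the empty path
        yield path
        if isinstance(tree, list):
            for i in range(1, len(tree)):    # tree[0] is the function symbol
                yield from subtrees(tree[i], path + (i,))

    def get(tree, path):
        for i in path:
            tree = tree[i]
        return tree

    def put(tree, path, sub):
        for i in path[:-1]:
            tree = tree[i]
        tree[path[-1]] = sub

    def subtree_crossover(p1, p2):
        # trees are nested lists, e.g. ['+', 'x', ['*', 'y', '2']]
        c1, c2 = copy.deepcopy(p1), copy.deepcopy(p2)
        path1 = random.choice(list(subtrees(c1)))
        path2 = random.choice(list(subtrees(c2)))
        s1 = copy.deepcopy(get(c1, path1))
        s2 = copy.deepcopy(get(c2, path2))
        if path1 == ():
            c1 = s2                          # the whole tree was selected
        else:
            put(c1, path1, s2)
        if path2 == ():
            c2 = s1
        else:
            put(c2, path2, s1)
        return c1, c2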
Typically, when evolving parse tree representations, a user-defined limit on
the maximum size of any tree in the population is provided. Subtree crossover
will often increase the size of a given parent such that, over a number of
generations, individuals in an unrestricted population may grow to swamp the
available computational resources. Given a user-defined restriction on subtree
size, expressed as a limit according to either the depth of a tree or the number of
nodes it contains, crossover must enforce this limit. When a crossover operation
is executed that creates one or more offspring that violate the size limitation, the
crossover operation is invalidated and the offspring are restored to their original
forms. What happens next is a matter of choice. Some systems will reject both
children and revert back to selecting two new parents. Other systems attempt
crossover repeatedly either until both offspring fall within the size limit or until
a specified number of attempts is reached. Given the nature of the crossover
operation, the likelihood of performing a valid crossover operation in a small
number of attempts, say five, is fairly good.
Koza (1992) popularized the use of subtree crossover for manipulating
parse tree representations in genetic programming. The subtree swapping
crossover of Koza (1992) shares much with the subtree crossover defined
by Cramer (1985) with a few minor differences. The foremost difference is
a bias introduced by Koza (1992) to limit the probability that a leaf node
is selected as the subtree from a parent during crossover. The reasoning
for this bias according to Koza (1992) is that, in most trees, the number
of leaf nodes will be roughly equivalent to the number of nonleaf nodes.
Figure 33.10. An illustration of the crossover operator for parse trees. A subtree is
selected at random from each parent, extracted, and exchanged to create two offspring
trees.
33.7.1 Introduction
To make the following survey unambiguous we have to start with setting some
conventions on terminology. The term population will be used for a multiset
of individuals that undergoes selection and reproduction. This terminology
is maintained in genetic algorithms, evolutionary programming, and genetic
programming, but in evolution strategies all µ individuals in a (µ, λ) or (µ + λ)
strategy are called parents. We, however, use the term parents only for those
individuals that are selected to undergo recombination. In other words, parents
are those individuals that are actually used as inputs for a recombination
operator; the arity of a recombination operator is the number of parents it uses.
The next notion is that of a donor, being a parent that actually contributes to (at
least one of) the alleles of the child(ren) created by the recombination operator.
This contribution can be for instance the delivery of an allele, as in uniform
where the two parents x Si , x Ti (Si , Ti ∈ {1, . . . , µ}) are redrawn for each i
anew and so is the contraction factor χi . The above definition applies to the
object variables as well as the strategy parameter part; that is, for the mutation
stepsizes (σ ) and the rotation angles (α). Observe that the multiparent character
of global recombination is the consequence of redrawing the parents x Si , x Ti for
each coordinate i. Therefore, probably more than two individuals contribute
to the offspring x, but their number is not defined in advance. It is clear
that investigations on the effects of different numbers of parents on algorithm
performance could not be performed in the traditional ES framework. The option
of using multiple parents can be turned on or off, that is, global recombination
can be used or not, but the arity of the recombination operator is not tunable.
Experimental studies on global versus two-parent recombination are possible,
but so far there are almost no experimental results available on this subject.
Schwefel (1995) notes that ‘appreciable acceleration’ is obtained by changing to
a bisexual from an asexual scheme (i.e., adding recombination using two parents
to the mutation-only algorithm), but only a ‘slight further increase’ is obtained
when changing from bisexual to multisexual recombination (i.e., using global
recombination instead of the two-parent variant). Recall the remark on the name
p-sexual voting. The terms bisexual and multisexual are not appropriate either
for the same reason: individuals have no gender or sex, and recombination can
be applied to any combination of individuals.
Gene-pool recombination (GPR) was introduced by Mühlenbein and Voigt
(1996) as a multiparent recombination mechanism for discrete domains. It is
defined as a generalization of two-parent recombination (TPR). Applying GPR
is preceded by selecting a gene pool consisting of would-be parents. Applying
GPR the two parent alleles of an offspring are randomly chosen for each locus
with replacement from the gene pool and the offspring allele is computed ‘using
any of the standard recombination schemes for TPR’. Theoretical analysis on
infinite populations shows that GPR is mathematically more tractable than TPR.
If n stands for the number of variables (loci), then the evolution with proportional
selection and GPR is fully described by n equations, while TPR needs 2n
equations for the genotypic frequencies. In practice GPR converges about 25%
faster than TPR for Onemax. The authors conclude that GPR better separates
the identification and the search of promising areas of the search space, and
that it searches more reasonably than TPR. Voigt and Mühlenbein
(1995) extend GPR to continuous domains by combining it with uniform fuzzy
two-parent recombination (UFTPR) from Voigt et al (1995). The resulting
uniform fuzzy gene-pool recombination (UFGPR) outperforms UFTPR on the
spherical function in terms of realized heritability, giving it a higher convergence
speed. The convergence of UFGPR is shown to be about 25% faster than that
of UFTPR.
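A sketch of the basic discrete GPR described above (ours; a uniform choice
between the two drawn alleles serves as the underlying TPR scheme):

    import random

    def gene_pool_recombination(pool):
        # pool: the selected would-be parents (equal-length sequences)
        child = []
        for locus in range(len(pool[0])):
            # draw the two parent alleles for this locus with replacement
            a = random.choice(pool)[locus]
            b = random.choice(pool)[locus]
            child.append(random.choice((a, b)))   # the TPR step
        return child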
A very particular mechanism is the linkage evolving genetic operator
(LEGO) as defined by Smith and Fogarty (1996). The mechanism is designed
to detect and propagate blocks of corresponding genes of potentially varying
length during the evolution. Punctuation marks in the chromosomes denote
the beginning and the end of each block and more chromosomes with the
appropriately positioned punctuation marks are considered as donors of a whole
block during the creation of a child. Although the multiparent feature is only a
side-effect, LEGO is a mechanism where more than two parents can contribute
to an offspring.
Observe that, because of the term k ≥ ij above, a marker can remain at the
same position after an UPDATE, and will only be shifted if the allele standing
at that position is included in the child. This guarantees that each offspring will
be a permutation.
Depending on the mechanism of choosing a parent (and thereby an allele)
there are three different versions of scanning. The choice can be deterministic,
choosing a parent containing the allele with the highest number of occurrences
and breaking ties randomly (occurrence-based scanning). Alternatively it can
be random, either unbiased, following a uniform distribution thus giving each
parent an equal chance to deliver its allele (uniform scanning), or biased by the
fitness of the parents, where the chance of being chosen is fitness proportional
(fitness-based scanning). Uniform scanning for r = 2 is the same as uniform
crossover, although creating only one child, and it also coincides with discrete
recombination in evolution strategies. The occurrence-based version is very
much like the voting or majority mating mechanism discussed before, but
without the threshold v or with v = m/2 respectively. The effect of the
number of parents in scanning crossover has been studied in several papers. An
overview of these studies is given in the next subsection.
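The three versions of scanning can be sketched in a few lines (ours; the marker
bookkeeping needed for permutation representations is omitted):

    import random
    from collections import Counter

    def scanning_crossover(parents, mode='uniform', fitnesses=None):
        # one child; the allele at each position comes from one of the r parents
        child = []
        for i in range(len(parents[0])):
            alleles = [p[i] for p in parents]
            if mode == 'occurrence':
                counts = Counter(alleles)
                top = counts.most_common(1)[0][1]
                child.append(random.choice([a for a, c in counts.items()
                                            if c == top]))
            elif mode == 'fitness':
                child.append(random.choices(alleles, weights=fitnesses)[0])
            else:                                  # uniform scanning
                child.append(random.choice(alleles))
        return child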
Diagonal crossover has been introduced as a generalization of one-point
crossover in GAs (Eiben et al 1994). In its original form diagonal crossover
creates r children from r parents by selecting r −1 crossover points in the parents
and composing the children by taking the resulting r chromosome segments from
the parents ‘along the diagonals’. Later on, a one-child version was introduced
(van Kemenade et al 1995). Figure 33.11 illustrates both variants. It is easy to
see that for r = 2 diagonal crossover coincides with one-point crossover, and in
some sense it also generalizes traditional two-parent n-point crossover. To be
precise, if we define (r, s) segmentation crossover as working on r parents with
s crossover points, diagonal crossover becomes its (r, r − 1) version, its (2, n)
variant coincides with n-point crossover, and one-point crossover is an instance
of both schemes for (r, s) = (2, 1) as parameters. The effect of operator arity
for diagonal crossover will also be discussed in the next subsection.
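A sketch of the multiple-child form of diagonal crossover (ours):

    import random

    def diagonal_crossover(parents):
        # r parents, r − 1 crossover points, r children built along the diagonals
        r, n = len(parents), len(parents[0])
        points = sorted(random.sample(range(1, n), r - 1))
        cuts = [0] + points + [n]
        children = []
        for c in range(r):
            child = []
            for s in range(r):
                donor = parents[(c + s) % r]   # rotate through the parents
                child.extend(donor[cuts[s]:cuts[s + 1]])
            children.append(child)
        return children

For r = 2 this reduces to one-point crossover, as noted above.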
A recombination mechanism with tunable arity in ES is proposed by
Schwefel and Rudolph (1995). The (µ, κ, λ, ρ) ES provides the possibility
of freely adjusting the number of parents (called ancestors by the authors).
The parameter ρ stands for the number of parents and global recombination is
redefined for any given set {x 1 , . . . , x ρ } of parents as
    xi = xi^j                            ρary discrete recombination
    xi = (1/ρ) Σ_{k=1}^{ρ} xi^k          ρ/ρ intermediate recombination
where j ∈ {1, . . . , ρ} is uniform randomly chosen for each i independently.
Let us note that, in the original paper, the above operators are called uniform
crossover and global intermediate recombination respectively. We introduce
the names ρary discrete recombination and ρ/ρ intermediate recombination
respectively here for the sake of consistent terminology. (A reason for
using the term ρ/ρ intermediate recombination instead of ρary intermediate
recombination is given below, in the paragraph discussing a paper by Eiben and
Bäck (1997).) Observe that ρary discrete recombination coincides with uniform
scanning crossover, while ρ/ρ intermediate recombination is a special case of
mating by averaging. At this time there are no experimental results available
on the effect of ρ within this framework.
Related work in ESs also uses ρ as the number of parents as an independent
parameter for recombination (Beyer 1995). For purposes of a theoretical analysis
it is assumed that all parents are different, uniform randomly chosen from the
population of µ individuals. Beyer defines the ρ/ρ intermediate recombination
Figure 33.11. Diagonal crossover (top) and its one-child version (bottom) for three
parents.
[The two panels plot the error at termination against the number of parents
(2–16) for diagonal crossover, its one-child version, and uniform scanning.]
Figure 33.12. Illustration of the effect of the number of parents (horizontal axis) on the
error at termination (vertical axis) on NK landscapes with NNI, N = 100, K = 1 (top),
K = 25 (bottom).
recombination. It seems that if and when crossover is useful, that is, on mildly
epistatic problems, then multiparent crossover can be more useful than the two-
parent variants.
The results of an extensive study of diagonal crossover for numerical
optimization in GAs are reported by Eiben and van Kemenade (1997). Diagonal
crossover is compared to its one-offspring version and n-point crossover on a test
suite consisting of eight functions, monitoring the speed, that is, the total number
of evaluations, the accuracy, that is, the median of the best objective function
value found (all functions have an optimum of zero), and the success rate, that
is, the percentage of runs where the global optimum is found. In most cases an
increase of performance can be achieved by increasing the disruptivity of the
crossover operator (using higher values of n for n-point crossover), and even
more improvement is achieved if the disruptivity of the crossover operator and
the number of parents is increased (using more parents for diagonal crossover).
This study gives a strong indication that for diagonal crossover an advantageous
multiparent effect does exist, that is, (i) using this operator with more than two
parents increases GA performance and (ii) this improvement is not only the
consequence of the increased number of crossover points.
A recent investigation of Eiben and Bäck (1997) addresses the working of
multiparent recombination operators in continuous search spaces, in particular
within ESs. This study compares ρ/2 intermediate recombination, ρary discrete
recombination, which is identical to uniform scanning crossover, and diagonal
crossover with one child. Experiments are performed on unimodal landscapes
(sphere model and Schwefel’s double sum), multimodal functions with regularly
arranged optima and a superimposed unimodal topology (Ackley, Griewangk,
and Rastrigin functions) and on the Fletcher–Powell and the Langermann
functions that have an irregular, random arrangement of local optima. On
the Fletcher–Powell function multiparent recombination does not increase
evolutionary algorithm (EA) performance; moreover, for the unimodal double
sum, increasing operator arity decreases performance. Other conclusions seem to
depend on the operator in question; the most consistent improvement from
raising the number of parents is obtained for diagonal crossover.
33.7.6 Conclusions
The idea of applying more than two parents for recombination in an evolutionary
problem solver occurred as early as the 1960s (Bremermann et al 1966). Several
authors have designed and applied recombination operators with higher arities
for a specific task, or used an existing operator with an arity higher than two
(Kaufman 1967, Mühlenbein 1989, Bersini and Seront 1992, Furuya and Haftka
1993, Aizawa 1994, Pál 1994). Nevertheless, investigations explicitly devoted
to the effect of operator arity on EA performance are still scarce; the study of the
phenomenon of multiparent recombination has just begun. What would such a
study mean? Similarly to the question of whether binary reproduction operators
(crossover with two parents) have advantages over unary ones (using mutation
only), it can be investigated whether or not using more than two parents is
advantageous. In the case of operators with tunable arity this question can be
refined and the relationship between operator arity and algorithm performance
can be studied. It is, of course, questionable whether multiparent recombination
can be considered as one single phenomenon showing one behavioral pattern.
The survey presented here discloses that there are (at least) three different types
of multiparent mechanism with tunable arity:
(i) operators based on allele frequencies among the parents, such as majority
mating, voting recombination, ρary discrete recombination, or scanning
crossover;
(ii) operators based on segmenting and recombining the parents, such as mating
by crossing over, diagonal crossover, or (r, s) segmentation crossover;
(iii) operators based on numerical operations, in particular averaging of
(real-valued) alleles, such as mating by averaging, ρ/ρ intermediate
recombination, ρ/2 intermediate recombination, and geometrical and
spherical crossover.
A priori it cannot be expected that these different schemes show the same
response to raising operator arities. There are also experimental results
supporting differentiation among various multiparent mechanisms. For instance,
there seems to be no clear relationship between the number of parents and
the performance of uniform scanning crossover, while the opposite is true for
diagonal crossover (Eiben and Schippers 1996).
Another aspect multiparent studies have to take into consideration is the
expected different behavior on different types of fitness landscape. As no
single technique would work on every problem, multiparent mechanisms will
have their limitations too. Some studies indicate that on irregular landscapes,
such as NK landscapes with relatively high K values (Eiben and Schippers
1996), or the Fletcher–Powell function (Eiben and Bäck 1997), they do not
work. On the other hand, on the same Fletcher–Powell function Eiben and van
Kemenade (1997) observed an advantage of increasing the number of parents for
diagonal crossover in a GA framework using bit coding of variables, although
they also found indications that this can be an artifact, caused simply by the
increased disruptiveness of the operator for higher arities. Investigations on
multiparent effects related to fitness landscape characteristics smoothly fit into
the tradition of studying the (dis)advantages of two-parent crossovers under
different circumstances (Schaffer and Eshelman 1991, Eshelman and Schaffer
1993, Spears 1993, Hordijk and Manderick 1995).
Let us also touch on the issue of practical difficulties when using multiparent
recombination operators. Introducing operator arity as a new parameter implies
an obligation of setting its value. Since so far there are no reliable heuristics for
setting this parameter, finding good values may require numerous tests, prior
to ‘real’ application of the EA. A solution may be based on previous work on
adapting (Davis 1989) or self-adapting (Spears 1995) the frequency of applying
different operators. Alternatively, a number of competing subpopulations could
be used in the spirit of Schlierkamp-Voosen and Mühlenbein (1996). According
to the latter approach each different arity is used within one subpopulation
and subpopulations with greater progress, that is, with more powerful operators,
become larger. A first assessment of this technique can be found in an article by
Eiben et al (1998a). Another recent result indicates the advantage of using more
parents in the context of constraint satisfaction problems (Eiben et al 1998b).
Concluding this survey we can note the following. Even though there are
no biological analogies of recombination mechanisms where more than two
parent genotypes are mixed in one single recombination act, formally there is
no necessity to restrict the arity of reproduction mechanisms to one (mutation)
or two (crossover) in computer simulations. Studying the phenomenon of
multiparent recombination has just begun, but there is already substantial
evidence that applying more than two parents can increase the performance
of EAs. Considering multiparent recombination mechanisms is thus a sound
design heuristic for practitioners and a challenge for theoretical analysis.
References
Ackley D H 1987a A Connectionist Machine for Genetic Hillclimbing (Boston, MA:
Kluwer)
——1987b An empirical study of bit vector function optimization Genetic Algorithms and
Simulated Annealing ed L Davis (San Mateo, CA: Morgan Kaufmann) pp 170–215
Aizawa A N 1994 Evolving SSE: a stochastic schemata exploiter Proc. 1st IEEE Conf. on
Evolutionary Computation (Orlando, FL, 1994) (Piscataway, NJ: IEEE) pp 525–9
Altenberg L 1995 The schema theorem and Price’s theorem Foundations of Genetic
Algorithms 3 ed D Whitley and M Vose (San Mateo, CA: Morgan Kaufmann)
Bäck T 1996 Evolutionary Algorithms in Theory and Practice (New York: Oxford
University Press)
Bäck T, Rudolph G and Schwefel H-P 1993 Evolutionary programming and evolution
strategies: similarities and differences Proc. 2nd Ann. Conf. on Evolutionary
Programming (San Diego, CA) ed D B Fogel and W Atmar (La Jolla, CA:
Evolutionary Programming Society) pp 11–22
Bäck T and Schütz M 1995 Evolution strategies for mixed-integer optimization of optical
multilayer systems Proc. 4th Ann. Conf. on Evolutionary Programming (San Diego,
CA, March 1995) ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge,
MA: MIT Press) pp 33–51
Bersini H and Seront G 1992 In search of a good evolution–optimization crossover
Parallel Problem Solving from Nature, 2 (Proc. 2nd Int. Conf. on Parallel Problem
Solving from Nature, Brussels, 1992) ed R Männer and B Manderick (Amsterdam:
Elsevier–North-Holland) pp 479–88
Beyer H-G 1995 Toward a theory of evolution strategies: on the benefits of sex—the
(µ/µ, λ) theory Evolutionary Comput. 3 81–111
——1996 Basic principles for a unified EA-theory Evolutionary Algorithms and their
Applications Workshop (Dagstuhl, 1996)
Birgmeier M 1996 Evolutionary programming for the optimization of trellis-coded
modulation schemes Proc. 5th Ann. Conf. on Evolutionary Programming ed L J
Fogel, P J Angeline and T Bäck (Cambridge, MA: MIT Press)
Blanton J and Wainwright R 1993 Multiple vehicle routing with time and capacity
constraints using genetic algorithms Proc. 5th Int. Conf. on Genetic Algorithms
(Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo, CA: Morgan
Kaufmann) pp 452–9
Syswerda G 1989 Uniform crossover in genetic algorithms Proc. 3rd Int. Conf. on Genetic
Algorithms (Fairfax, VA, June 1989) ed J D Schaffer (San Mateo, CA: Morgan
Kaufmann) pp 2–9
——1991 Schedule optimization using genetic algorithms Handbook of Genetic
Algorithms ed L Davis (New York: Van Nostrand Reinhold) pp 332–49
Thierens D and Goldberg D E 1993 Mixing in genetic algorithms Proc. 5th Int. Conf. on
Genetic Algorithms (Urbana-Champaign, IL, July 1993) ed S Forrest (San Mateo,
CA: Morgan Kaufmann) pp 38–45
van Kemenade C, Kok J and Eiben A 1995 Raising GA performance by simultaneous
tuning of selective pressure and recombination disruptiveness Proc. 2nd IEEE Conf.
on Evolutionary Computation (Perth, 1995) (Piscataway, NJ: IEEE) pp 346–51
Voigt H-M and Mühlenbein H 1995 Gene pool recombination and utilization of
covariances for the breeder genetic algorithm Proc. 2nd IEEE Int. Conf. on
Evolutionary Computation (Perth, 1995) (Piscataway, NJ: IEEE) pp 172–7
Voigt H-M, Mühlenbein H and Cvetković D 1995 Fuzzy recombination for the breeder
genetic algorithm Proc. 6th Int. Conf. on Genetic Algorithms (Pittsburgh, PA, 1995)
ed L J Eshelman (San Mateo, CA: Morgan Kaufmann) pp 104–11
Whitley D, Starkweather T and Shaner D 1991 Traveling salesman and sequence
scheduling: quality solutions using genetic edge recombination Handbook of
Genetic Algorithms ed L Davis (New York: Van Nostrand Reinhold) pp 350–72
Whitley D and Yoo N-W 1995 Modeling simple genetic algorithms for permutation
problems Foundations of Genetic Algorithms 3 ed D Whitley and M Vose (San
Mateo, CA: Morgan Kaufmann) pp 163–84
Wright A H 1994 Genetic algorithms for real parameter optimization Foundations of
Genetic Algorithms ed G Rawlins (San Mateo, CA: Morgan Kaufmann) pp 205–18
Wu A S and Lindsay R K 1995 Empirical studies of the genetic algorithm with noncoding
segments Evolutionary Comput. 3 121–48
Zhou H and Grefenstette J J 1986 Induction of finite automata by genetic algorithms Proc.
1986 IEEE Int. Conf. on Systems, Man, and Cybernetics (Atlanta, GA) pp 170–4
34
Other operators
34.1 The Baldwin effect
The model of Hinton and Nowlan. The first quantitative model demonstrating
the Baldwin effect was constructed by Hinton and Nowlan (1987). They used a
computer simulation to study the effects of individual learning on the evolution
of a population of neural networks. They considered an extremely difficult
problem, where a network conferred a fitness advantage only if it was fully
functioning (all connections wired correctly). Each network was given 20
possible connections, specified by 20 genes.
Briefly consider the difficulty of finding this solution using a pure genetic
algorithm. Under a binary genetic coding scheme (allelic values of either
‘correct’ or ‘incorrect’), the probability of randomly generating a functional
net is $2^{-20}$. Note that a net with 19 out of 20 correct connections is no better off
than one with no correct connections. The corresponding fitness landscape has a
singularity at the correct solution with no useful gradient information, analogous
to a putting green (figure 34.1). Finding this solution by a pure genetic algorithm,
then, is the evolutionary equivalent of a ‘hole in one’. Of course, given a large
enough random population, an evolutionary algorithm could theoretically find
this solution in one generation.
Hinton and Nowlan modeled a modified version of this problem, where
genes were allowed three alternative forms (alleles): present (1), absent (0), or
‘plastic’ (?). Connections specified by plastic alleles could be varied by random
trials during the individual’s life span. This allowed an individual to complete
and exploit a partially hard-wired network. Hence, genetic near misses (e.g.
19 out of 20 correct genes) could quickly learn the remaining connection(s)
and differentially survive. The presence of plastic alleles, therefore, softened
the fitness landscape (figure 34.1). Hinton and Nowlan described the effect
of learning ability in their simulation as follows: ‘[learning] alters the shape
of the search space in which evolution operates and thereby provides good
evolutionary paths towards sets of co-adapted alleles’. The second aspect of the
Baldwin effect (genetic assimilation) was manifested in the mutation of plastic
alleles into genetically fixed alleles.
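The setup is easily restated in code. The Python sketch below reconstructs the
fitness evaluation of the Hinton–Nowlan model under stated assumptions: the
1000 lifetime trials, the allele frequencies, and the reward of
1 + 19(remaining trials)/1000 follow our reading of their paper, and the
details may differ from the original simulation.

import random

GENES = 20
TRIALS = 1000  # lifetime learning trials, as in Hinton and Nowlan (1987)

def lifetime_fitness(genotype, target):
    """Alleles are 1, 0, or '?'. A wrongly fixed gene makes the target
    network unreachable; '?' positions are guessed anew on each trial."""
    if any(g != '?' and g != t for g, t in zip(genotype, target)):
        return 1.0
    plastic = [i for i, g in enumerate(genotype) if g == '?']
    for trial in range(TRIALS):
        # every plastic connection is guessed correctly with probability 1/2
        if all(random.random() < 0.5 for _ in plastic):
            # earlier success leaves more of the lifetime for exploitation
            return 1.0 + 19.0 * (TRIALS - trial) / TRIALS
    return 1.0  # never learned the full network

target = [1] * GENES
# allele frequencies 0.25 / 0.25 / 0.5, as in the original model
genotype = [random.choice([0, 1, '?', '?']) for _ in range(GENES)]
print(lifetime_fitness(genotype, target))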
Figure 34.1. Schematic representation of the fitness landscape in the model of Hinton
and Nowlan. A two-dimensional representation of genome space in the problem
considered by Hinton and Nowlan (1987). The horizontal axis represents all possible
gene combinations, and the vertical axis represents relative fitness. Without learning, only
one combination of alleles correctly completes the network; hence only one genotype has
higher fitness, and no gradient exists. The presence of plastic alleles radically alters this
fitness landscape. Assume a correct mutation occurs in one of the 20 genes. The advent
of a new correct gene only partially solves the problem. Learning allows individuals
close (in Hamming space) to the solution to complete it. Thus, these individuals will be
slightly more fit than individuals with no correct genes, and useful genes will thereby
increase in frequency in subsequent generations. Over time, a large number of correct
genes will accumulate in the gene pool, leading to a completely genetically determined
structure.
1991, 1994, Whitley and Gruau 1993, Whitley et al 1994, Balakrishnan and
Honavar 1995, Turney 1995, 1996, Turney et al 1996). Considering the rather
specific assumptions of their model, it is useful to contemplate which aspects
of their results are general properties. Among the issues raised by this and
subsequent studies are the degree of biological realism, the nature of the fitness
landscape, the computational cost of learning, and the role of learning in static
fitness landscapes.
First, the model’s assumption of plastic alleles that can mutate into
permanent alleles seems biologically spurious. However, the Baldwin effect
can be manifested in the evolution of a biological structure regardless of
the genetic basis of that structure or the mechanisms underlying the learning
process (Anderson 1995a). The Baldwin effect is simply a consequence of the
influence of individual learning on genetic evolution. Subsequent studies have demonstrated
the Baldwin effect using a variety of learning algorithms. Turney (1995, 1996)
has observed a Baldwin effect in a class of hybrid algorithms, combining a
genetic algorithm (GENESIS) and an inductive learning algorithm, where the
Baldwin effect was manifested in shifting biases in the inductive learner. French
and Messinger (1994) investigated the Baldwin effect under various forms of
phenotypic plasticity. Cecconi et al (1995) observed the Baldwin effect in a
1989, Scheiner 1993, Via 1993) as well as for sexual versus asexual reproduction
(Maynard Smith 1978).
The population mean and variance after selection ($m^*$, $V_p^*$) can now be
expressed in the form of dynamic equations:

$$m^*(t) = \frac{m(t)V_s^*}{V_p(t) + V_s^*} = m(t) - \frac{m(t)V_p(t)}{V_p(t) + V_s^*} \qquad (34.5)$$

$$V_p^*(t) = \frac{V_p(t)V_s^*}{V_p(t) + V_s^*} = V_p(t) - \frac{V_p^2(t)}{V_p(t) + V_s^*}. \qquad (34.6)$$
Lastly, mutations are introduced in the production of the next generation of
trials. To model this process, assume a Gaussian mutation function with mean
zero and variance $V_\mu$. Convolving the population distribution with the
mutation distribution increases the population variance:

$$f_p^{**}(g) = \int_{-\infty}^{+\infty} f_p^*(s)\, f_m(g - s)\, \mathrm{d}s = N\big(m^*(t),\, V_p^{**}(t)\big) \qquad (34.7)$$

where

$$V_p^{**}(t) = V_p(t) - \frac{V_p^2(t)}{V_p(t) + V_s^*} + V_\mu. \qquad (34.8)$$
Hence, in a fixed environment the population mean $m(t)$ will converge on
the optimal genotype (Bulmer 1985), while a mutation–selection equilibrium
variance occurs at

$$V_{p,\mathrm{eq}}^{**} = \frac{V_\mu + \left(V_\mu^2 + 4 V_\mu V_s^*\right)^{1/2}}{2}. \qquad (34.9)$$
Inspection of equations (34.5), (34.6), and (34.8) illustrates two important points.
First, learning slows the convergence of both $m^*(t)$ and $V_p^*(t)$. Second, once
convergence in the mean is complete, the utility of learning is lost, and learning
only reduces fitness.
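The behavior of these equations is easy to examine numerically. The short
Python loop below iterates the mean and variance updates of equations (34.5),
(34.6), and (34.8); the starting values and the settings of $V_s^*$ and
$V_\mu$ are arbitrary choices for illustration.

def iterate_population(m, Vp, Vs_star, Vmu, generations):
    """Iterate the per-generation dynamics: selection shifts the mean
    toward the optimum at 0 and shrinks the variance (equations (34.5)
    and (34.6)); Gaussian mutation adds Vmu back (equation (34.8))."""
    history = []
    for _ in range(generations):
        m = m - m * Vp / (Vp + Vs_star)            # equation (34.5)
        Vp = Vp - Vp ** 2 / (Vp + Vs_star) + Vmu   # equations (34.6), (34.8)
        history.append((m, Vp))
    return history

# Arbitrary illustrative values; learning corresponds to a larger Vs_star.
trajectory = iterate_population(m=5.0, Vp=2.0, Vs_star=1.0, Vmu=0.05,
                                generations=200)
print(trajectory[-1])  # mean near 0; variance near the equilibrium (34.9)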
In a more elaborate version of this model, called the critical learning period
model (Anderson 1995a), a second gene is introduced to regulate the fraction of
an individual’s life span devoted to learning (duration of the learning period).
Specification of a critical learning period implicitly assigns a cost to
learning (the fraction of the life span not devoted to reproduction). Individuals are
then selected for the optimal combination of genotype and learning investment.
It is easily demonstrated that under these assumptions, learning ability is selected
out of a population subject to fixed selection.
34.1.4 Conclusions
Baldwin’s essential insight was that if an organism has the ability to learn,
it can exploit genes that only partially determine a structure—increasing the
frequencies of useful genes in subsequent generations. The Baldwin effect has
also been demonstrated to be operative in hybrid evolutionary algorithms. These
empirical investigations can be used to quantify the benefits of incorporating
34.2 Knowledge-augmented operators
Evolutionary computation methods are broadly useful because they are general
search procedures. The canonical forms of the evolutionary algorithms do not
take advantage of knowledge concerning the problem at hand. For example, in
the canonical genetic algorithm (Holland 1975), a one-point crossover operator
is suggested with a crossover point to be chosen randomly across the parents’
chromosomes. However, it is generally accepted that the effectiveness of a
particular search operator depends on at least three interrelated factors: (i) the
chosen representation, (ii) the selection criterion, and (iii) the objective function
to be minimized or maximized, subject to the given constraints if applicable.
There is no single best search operator for all problems.
Rather than relying on simple operators that may yield unacceptably
inefficient performance on the particular problem at hand, the search operators
can be tailored to individual applications. For example, in evolution strategies
and evolutionary programming, when searching for the minimum of a quadratic
surface, Rechenberg (1973) showed that the best choice for the standard
deviation σ when using a zero mean Gaussian mutation operator was
$$\sigma = 1.224\, f(\mathbf{x})^{1/2} / n$$

where $f(\mathbf{x})$ is the quadratic function evaluated at the parent vector
$\mathbf{x}$, and $n$ is the dimensionality of the function. This choice of
$\sigma$ incorporates knowledge
about the function being searched in order to provide the greatest expected rate
of convergence. In this particular case, however, knowledge that the function is
a quadratic surface suggests the use of search algorithms that can take greater
advantage of the available gradient information (e.g. Gauss–Newton).
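As a sketch, the following Python fragment applies Rechenberg’s step-size rule
inside a simple (1+1)-style loop on the sphere function; only the formula for
$\sigma$ comes from the text above, while the loop and the acceptance rule are
illustrative scaffolding.

import random

def sphere(x):
    """Quadratic objective f(x) = sum of squares (minimum at the origin)."""
    return sum(xi * xi for xi in x)

def rechenberg_mutation(parent):
    """Zero-mean Gaussian mutation with sigma = 1.224 f(x)^(1/2) / n."""
    n = len(parent)
    sigma = 1.224 * sphere(parent) ** 0.5 / n
    return [xi + random.gauss(0.0, sigma) for xi in parent]

x = [5.0] * 10
for _ in range(1000):
    y = rechenberg_mutation(x)
    if sphere(y) < sphere(x):   # simple (1+1)-style acceptance
        x = y
print(sphere(x))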
There are other instances where incorporating domain-specific knowledge
into a search operator can improve the performance of an evolutionary algorithm.
In the traveling salesman problem, under the objective function of minimizing
the Euclidean distance of the circuit of cities, and a representation of simply an
ordered listing of cities to be visited, Fogel (1988) offered a mutation operator
which selected a city at random and placed it in the list at another randomly
chosen position. This operator was not based on any knowledge about the
nature of the problem. In contrast, Fogel (1993) offered an operator that instead
inverted a segment of the listing (i.e. akin to the 2-opt move of Lin and Kernighan (1973)).
The inversion operator in the traveling salesman problem is a knowledge-
augmented operator because it was devised to take advantage of the Euclidean
geometry present in the problem. In the case of a traveling salesman’s tour, if
the tour crosses over itself it is always possible to improve the tour by undoing
the crossover (i.e. the diagonals of a quadrangle are always longer in sum than
any two opposite sides). When the two cities just before and after the crossing
point are selected and the listing of cities in between reversed, the crossing is
removed and the tour is improved. Note that this use of inversion is appropriate
in light of the traveling salesman problem, and no broader generality of its
effectiveness as an operator is suggested, or can be defended.
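A minimal Python sketch of such a segment-inversion mutation follows;
representing the tour as a list of city indices is our assumption.

import random

def invert_segment(tour):
    """Reverse the cities between two randomly chosen cut points; if the
    cut points bracket a self-crossing, the crossing is removed and the
    tour shortened."""
    i, j = sorted(random.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

print(invert_segment(list(range(10))))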
Domain knowledge can also be applied in the use of recombination. For
example, again when considering the traveling salesman problem, Grefenstette et
al (1985) suggested a heuristic crossover operator that could perform a degree
of local search. The operator constructed an offspring from two parents by
(i) picking a random city as the starting point, (ii) comparing the two edges
leaving the starting cities in the parents and choosing the shorter edge, then (iii)
continuing to extend the partial tour by choosing the shorter of the two edges
in the parents which extend the tour. If a cycle were introduced, a random
edge would be selected. Grefenstette et al (1985) noted that offspring were
on average about 10% better than the better parent when implementing this
operator.
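The following Python sketch reconstructs the greedy-edge idea behind this
heuristic crossover. The handling of ties, the direction of tour traversal,
and the fallback to a random unvisited city (where the text specifies a random
edge) are simplifying assumptions of ours.

import math, random

def heuristic_crossover(parent1, parent2, dist):
    """Greedy-edge offspring construction in the spirit of Grefenstette
    et al: start at a random city, then repeatedly take the shorter of
    the two parental edges leaving the current city; if both lead to an
    already-visited city, continue with a random unvisited one."""
    n = len(parent1)
    succ = lambda tour, c: tour[(tour.index(c) + 1) % n]
    current = random.choice(parent1)
    child, visited = [current], {current}
    while len(child) < n:
        candidates = [succ(parent1, current), succ(parent2, current)]
        candidates = [c for c in candidates if c not in visited]
        if candidates:
            current = min(candidates, key=lambda c: dist(child[-1], c))
        else:  # both parental edges would close a cycle prematurely
            current = random.choice([c for c in parent1 if c not in visited])
        child.append(current)
        visited.add(current)
    return child

# Illustrative usage with random planar cities.
pts = {i: (random.random(), random.random()) for i in range(8)}
d = lambda a, b: math.dist(pts[a], pts[b])
p1, p2 = list(range(8)), random.sample(range(8), 8)
print(heuristic_crossover(p1, p2, d))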
In many real-world applications, the physics governing the problem suggests
settings for search parameters. For example, in the problem of docking
small molecules into protein binding sites, the intermolecular potential can be
precalculated on a grid. Gehlhaar et al (1995) used a grid of 0.2 Å, with each
grid point containing the summed interaction energy between an atom at that
point and all protein atoms within 6 Å. This suggests that under Gaussian
perturbations following an evolutionary programming or evolution strategy
approach, a standard deviation of several ångströms would be inappropriate
(i.e. too large).
Whenever evolutionary algorithms are applied to specific problems with the
34.3 Gene duplication and deletion
34.3.2 Basic motivations for the use of gene duplication and deletion
Since these first attempts with variable-length genotypes, many researchers
have made use of gene duplication and deletion. Four different motivations may
be distinguished.
The underlying variable-dimensional optimization problem may be formalized as
follows. Given $f : D \subseteq X = \bigcup_{i=1}^{\infty} G^i \to \mathbb{R}$,
minimize $f(\mathbf{x})$ subject to

$$g_i(\mathbf{x}) \ge 0 \quad \forall i \in \{1, \ldots, m\}$$
$$h_j(\mathbf{x}) = 0 \quad \forall j \in \{1, \ldots, l\}$$

where $\mathbf{x} = (x_1, \ldots, x_{n_x}) \in D$, with $n_x \in \mathbb{N}$
arbitrary but not fixed, and $g_i, h_j : X \to \mathbb{R}$.
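In this notation, duplication and deletion are precisely the operators that
change $n_x$. A minimal, representation-agnostic Python sketch follows; since,
as discussed below, published operators are strongly application dependent,
this is only an illustration on a plain list genotype.

import random

def duplicate_gene(genotype):
    """Copy a randomly chosen gene and insert the copy directly after it,
    increasing the genotype length by one."""
    i = random.randrange(len(genotype))
    return genotype[:i + 1] + [genotype[i]] + genotype[i + 1:]

def delete_gene(genotype, min_length=1):
    """Remove a randomly chosen gene, never shrinking below min_length."""
    if len(genotype) <= min_length:
        return genotype[:]
    i = random.randrange(len(genotype))
    return genotype[:i] + genotype[i + 1:]

g = [0.3, 1.7, 0.9]
print(duplicate_gene(g), delete_gene(g))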
34.3.5 Solutions
The evolution program approach of Michalewicz (1992), i.e. combining
the concept of evolutionary computation with problem-specific chromosome
structures and genetic operators, may be seen as one main concept used to
overcome the problems mentioned above. Although this concept is useful in
practice, it prevents the conception of a more general and formalized view
of variable-length EAs because there no longer exists ‘the EA’ using ‘the
representation’ and ‘the set of operators’. Instead, for each application problem
a specialized EA exists. According to Lohmann (1992) and Kost (1993),
for example, the formulation of operators such as gene duplication and
deletion, used in their framework of structural evolution, is strongly application
dependent, thus inhibiting a more formal, general concept of these operators.
Davidor (1991a, b) expressed the need for revised and new genetic operators
for his variable-length robot trajectory optimization problem. In contrast to
the evolution program approach, Schütz (1994) formulated an application-
independent, variable-dimensional mixed-integer evolution strategy (ES), thus
following the course of constructing a more general sort of ES. This offered
Schütz the possibility to be more formal than other researchers. Unfortunately,
this approach is restricted to a class of problems which can easily be mapped
onto the mixed-integer representation he used.
Because most work concerning variable-length genotypes uses the evolution
program approach, a formal analysis of gene duplication and deletion is rarely
found in the literature and is therefore omitted here. As a consequence,
almost no theoretical knowledge about the behavior of gene duplication and
deletion exists. Harvey (1993), for example, points out ‘that gene-duplication,
followed by mutation of one of the copies, is potentially a powerful method for
evolutionary progress’. Most statements concerning nonstandard operators such
as duplication and deletion have the same quality as Harvey’s: they are far from
being provable.
Because of the lack of theoretical knowledge we proceed by discussing
some solutions used to circumvent the problems which arise when introducing
variable-length genotypes. In the first place, we question how other researchers
have solved the problem of noncomparable loci, i.e. the problem of respecting
the semantics of loci. Mostly this ‘gene assignment’ problem is solved by
explicitly marking semantical entities on the genotype. The form of the tagging
varies from application to application and is carried out with the help of different
representations.
• Davidor (1991a, b) used a binary encoded non-fixed-length vector of arm
configurations, i.e. a vector of triples (three angles), for representing a robot
trajectory, thus defining semantical blocks.
• The path of a mobile robot may be a variable-dimensional list of path nodes
(triples consisting of the two Cartesian coordinates and a flag indicating
whether a node is feasible or not).
• Harp and Samad (1991) implemented the tagging with the help of a special
and more complex data structure representing the structure and actual
weights of any feedforward net consisting of a variable number of hidden
layers and a variable number of units.
• Goldberg et al (1989, 1990) extended the usual string representation of
GAs by using a list of ordered pairs, the first component of each pair
denoting the position in the string and the second the actual bit value.
Although the genotypes had fixed length, a variable dimension in the
resulting messy GA was achieved by allowing strings not to contain the
full gene complement (underspecification) or to contain redundant or even
contradictory genes (overspecification); a decoding sketch is given after
this list.
• Koza (1992, 1994) used rooted point-labeled trees with ordered branches
(LISP expressions), a genotype that represents semantics very well.
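To make the messy-GA tagging concrete, the Python sketch below decodes a list
of (position, bit) pairs into a fixed-length string, resolving
overspecification by first-come precedence and filling underspecified
positions from a template. Goldberg et al used competitive templates, so the
all-zero template here is purely illustrative.

def decode_messy(pairs, template):
    """Decode a messy-GA chromosome given as (position, bit) pairs.
    Overspecification: the first pair naming a position wins.
    Underspecification: unnamed positions are filled from the template."""
    bits = list(template)
    seen = set()
    for pos, bit in pairs:
        if pos not in seen:
            bits[pos] = bit
            seen.add(pos)
    return bits

chromosome = [(0, 1), (3, 0), (0, 0), (3, 1)]  # positions 0, 3 overspecified
print(decode_messy(chromosome, [0] * 6))       # positions 1, 2, 4, 5 from template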
Lohmann (1992) circumvented the ‘assignment problem’ using so-called
structural evolution. The basic idea of structural evolution is the separation
of structural and nonstructural parameters, thus leading to a ‘two-level’ ES:
a multipopulation ES using isolation. While a parameter optimization for a
fixed structure is carried out within each population, several isolated
structures compete with each other at the population level. In this way
Lohmann was able to handle structural optimization problems with
variable dimension: the dimension of the structural parameter space does
not have to be constant. Since each ES itself worked on a fixed number
of nonstructural parameters (here a vector of reals) no problem occurred on
this level. On the structural level (population level) special genetic operators
and a special selection criterion were formulated. The main criticism of
structural evolution concerns its basic assumption that structural and
nonstructural parameters can always be separated; many mixed-integer
variable-dimensional problems are certainly not separable. Secondly, on the
structural level the well-known semantical problem persists, although it was
not discussed. Schütz (1994) entirely omitted a discussion of the semantical
problems arising from variable-length genotypes.
If the genotype is sufficiently prepared, problems (especially) concerning
recombination disappear, because the genetic operators may directly use the
tagging in order to construct interpretable individuals. Another important idea
when designing recombination operators for variable-length genotypes is pointed
out by Davidor (1991a). He suggests a matching of parameters according to
their genotypic character instead of to their genotypic position. Essentially, this
leads to a matching on the phenotypic, instead of the genotypic level. Generally,
Davidor points out:
‘… occurs between sites that control the same, or at least the most similar,
function in the phenotypic space.’
In case of the (two-point) segregation crossover used in his robot trajectory
optimization problem, crossing sites were specified according to the proximity
of the end effector positions.
34.3.6 Conclusions
Many ideas concerning the use of gene duplication and deletion exist.
Unfortunately, most of them have been extremely application oriented, that is,
not formulated generally enough. The construction of a formal framework will
probably prove very difficult in the face of this diversity of problems and
solutions.
References
Ackley D and Littman M 1991 Interactions between learning and evolution Artificial
Life II (Santa Fe, NM, February 1990) ed C Langton, C Taylor, D Farmer and S
Rasmussen (Redwood City, CA: Addison-Wesley) pp 487–509
——1994 A case for Lamarckian evolution Artificial Life III ed C Langton (Redwood
City, CA: Addison-Wesley) pp 3–10
Anderson R W 1995a Learning and evolution: a quantitative genetics approach J. Theor.
Biol. 175 89–101
——1995b Genetic mechanisms underlying the Baldwin effect are evident in natural
antibodies Proc. 4th Ann. Conf. on Evolutionary Programming (San Diego, CA,
March 1995) ed J R McDonnell, R G Reynolds and D B Fogel (Cambridge, MA:
MIT Press) pp 547–63
——1996a How adaptive antibodies facilitate the evolution of natural antibodies
Immunol. Cell Biology 74 286–91
——1996b Random-walk learning: a neurobiological correlate to trial-and-error Prog.
Neural Networks at press
Bäck T 1996 Evolutionary Algorithms in Theory and Practice (New York: Oxford
University Press)
Balakrishnan K and Honavar V 1995 Evolutionary Design of Neural Architectures:
a Preliminary Taxonomy and Guide to Literature Technical Report CS TR 95-01,
Artificial Intelligence Research Group, Department of Computer Science, Iowa
State University
Baldwin J M 1896 A new factor in evolution Am. Naturalist 30 441–51
Belew R K 1989 When both individuals and populations search: adding simple learning
to the genetic algorithm Proc. 3rd Int. Conf. on Genetic Algorithms (Fairfax, VA,
June 1989) ed J D Schaffer (San Mateo, CA: Morgan Kaufmann) pp 34–41
——1990 Evolution, learning and culture: computational metaphors for adaptive search
Complex Syst. 4 11–49
Bremermann H J and Anderson R W 1991 How the brain adjusts synapses—maybe
Automated Reasoning: Essays in Honor of Woody Bledsoe ed R S Boyer (New
York: Kluwer) pp 119–47
Hightower R, Forrest S and Perelson A 1996 The Baldwin effect in the immune system:
learning by somatic hypermutation Adaptive Individuals in Evolving Populations:
Models and Algorithms ed R K Belew and M Mitchell (Reading, MA: Addison-
Wesley) at press
Hinton G E and Nowlan S J 1987 How learning can guide evolution Complex Syst. 1
495–502
Holland J H 1975 Adaptation in Natural and Artificial Systems (Ann Arbor, MI:
University of Michigan Press)
Kost B 1993 Structural Design via Evolution Strategies Internal Report, Department of
Bionics and Evolution Technique, Technical University of Berlin
Koza J R 1992 Genetic Programming (Cambridge, MA: MIT Press)
——1994 Genetic Programming II (Cambridge, MA: MIT Press)
Lin S and Kernighan B W 1973 An effective heuristic for the traveling salesman problem
Operat. Res. 21 498–516
Lohmann R 1992 Structure evolution and incomplete induction Parallel Problem Solving
from Nature, 2 (Proc. 2nd Int. Conf. on Parallel Problem Solving from Nature,
Brussels, 1992) ed R Männer and B Manderick (Amsterdam: Elsevier) pp 175–85
Männer R and Manderick B (eds) 1992 Parallel Problem Solving from Nature, 2 (Proc.
2nd Int. Conf. on Parallel Problem Solving from Nature, Brussels, 1992) (Amsterdam:
Elsevier)
Maynard Smith J 1978 The Evolution of Sex (Cambridge: Cambridge University Press)
——1987 When learning guides evolution Nature 329 761–2
Michalewicz Z 1992 Genetic Algorithms + Data Structures = Evolution Programs
(Berlin: Springer)
Milstein C 1990 The Croonian lecture 1989 Antibodies: a paradigm for the biology of
molecular recognition Proc. R. Soc. B 239 1–16
Mitchell M and Belew R K 1995 Preface to G E Hinton and S J Nowlan How learning
can guide evolution Adaptive Individuals in Evolving Populations: Models and
Algorithms ed R K Belew and M Mitchell (Reading, MA: Addison-Wesley)
Morgan C L 1896 On modification and variation Science 4 733–40
Osborn H F 1896 Ontogenic and phylogenic variation Science 4 786–9
Nolfi S, Elman J L and Parisi D 1994 Learning and evolution in neural networks Adaptive
Behavior 3 5–28
Paechter B, Cumming A, Norman M and Luchian H 1995 Extensions to a memetic
timetabling system Proc. 1st Int. Conf. on the Practice and Theory of Automated
Timetabling (ICPTAT 95) (Edinburgh, 1995)
Parisi D, Nolfi S and Cecconi F 1991 Learning, behavior, and evolution Toward a Practice
of Autonomous Systems (Proc. 1st Eur. Conf. on Artificial Life (Paris, 1991)) ed F
J Varela and P Bourgine (Cambridge, MA: MIT Press)
Rechenberg I 1973 Evolutionsstrategie: Optimierung Technischer Systeme nach
Prinzipien der Biologischen Evolution (Stuttgart: Fromman-Holzboog)
Saravanan N, Fogel D B and Nelson K M 1995 A comparison of methods for self-
adaptation in evolutionary algorithms BioSystems 36 157–66
Scheiner S M 1993 Genetics and evolution of phenotypic plasticity Ann. Rev. Ecol.
Systemat. 24 35–68
Schütz M 1994 Eine Evolutionsstrategie für gemischt-ganzzahlige Optimierungsprobleme
mit variabler Dimension Diploma Thesis, University of Dortmund
Further reading
More extensive treatments of issues related to the Baldwin effect can be found
in the literature cited in section C3.4.1. The following are notable foundation
and review papers.
4. Belew R K 1989 When both individuals and populations search: adding simple
learning to the genetic algorithm Proc. 3rd Int. Conf. on Genetic Algorithms (Fairfax,
VA, June 1989) ed J D Schaffer (San Mateo, CA: Morgan Kaufmann) pp 34–41
5. Hinton G E and Nowlan S J 1987 How learning can guide evolution Complex Syst. 1
495–502
7. Sober E 1994 The adaptive advantage of learning and a priori prejudice From a
Biological Point of View: Essays in Evolutionary Philosophy (a collection of essays
by E Sober) (Cambridge: Cambridge University Press) pp 50–70
11. Wcislo W T 1989 Behavioral environments and evolutionary change Ann. Rev.
Ecol. Systemat. 20 137–69
12. Whitley D and Gruau F 1993 Adding learning to the cellular development of neural
networks: evolution and the Baldwin effect Evolutionary Comput. 1 213–33
Index, Volume 1
  discussion 3
  history 40–58
  use of term 1
Evolutionary Computation (journal) 47
Evolutionary game theory 37
Evolutionary operation (EVOP) 40
Evolutionary processes 37
  overview 23–6
  principles of 23–6
Evolutionary programming (EP) 1, 60, 136, 163, 167, 217, 218
  basic concepts 89–102
  basic paradigm 94
  continuous 95
  convergence properties 100
  current directions 97–100
  diversification 44
  early foundations 92–4
  early versions 95
  extensions 94–7
  future research 100
  genesis 90
  history 40, 41, 90–7
  main components 89
  main variants of basic paradigm 95
  medical applications 98
  original 95, 96
  original definition 91
  overview 40, 41
  self-adaptive 95, 96
  standard form 89
  v. GAs 90
Evolutionary robotics, see also Robots
Evolutionary strategies (ESs) 1, 48–51, 60, 64, 81–8, 136, 163
  (1 + 1) 48, 83
  (1 + λ) 48
  (µ + 1) 83
  (µ + λ) 48, 67, 83, 86, 167, 169, 189, 206, 210, 217, 220, 224, 230
  (µ, λ) 189, 206, 210, 220, 222, 231
  (µ + µ) 170
  alternative method to control internal parameters 86
  archetype 81, 82
  contemporary 83–6, 85
  development 40
  multimembered (µ > 1) 48
  nested 86, 87
  overview 48–51
  population-based 83
  steady-state 83
  two-membered 48, 50
Exons 34
Expected, infinite-horizon discounted cost 116
Expression process 231
Extradimensional bypass thesis 320
F
Fault diagnosis 7
Feedback networks 97
Feedforward networks 97
Fertility factor 192
Fertility rate 192
Filters, design 6
Financial decision making 9
Finite impulse response (FIR) filters 6
Finite-length alphabet 91
Finite-state machines 44, 60, 91, 92, 95, 129, 134, 152, 153, 162, 246–8
Finite-state representations 151–4
  applications 152–4
Fitness-based scan 273
Fitness criterion 228
Fitness evaluation 108
Fitness function 178
Fitness functions 172–5
  monotonic 190, 191
  scaling 66
  strictly monotonic 190, 191
Fitness landscapes 229, 308, 311
Fitness measure 235
Fitness proportional selection (FPS) 218
Fitness scaling 174, 175, 187
Fitness values 59, 64, 66
Fixed selection 314, 315
Flat plate 48
Foundations of Genetic Algorithms (FOGA) (workshop) 47
Functions 104, 105
Fundamental theorem of genetic algorithms 177
Fuzzy logic systems 163
Fuzzy neural networks 97
Fuzzy systems 44