Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms

Dr. Thomas Bäck
Informatik Centrum Dortmund, Germany

New York Oxford
OXFORD UNIVERSITY PRESS
1996

Oxford University Press
Oxford New York
Athens Auckland Bangkok Bombay Calcutta Cape Town Dar es Salaam Delhi Florence Hong Kong Istanbul Karachi Kuala Lumpur Madras Madrid Melbourne Mexico City Nairobi Paris Singapore Taipei Tokyo Toronto
and associated companies in Berlin Ibadan

Copyright © 1996 by Oxford University Press, Inc.
Published by Oxford University Press, Inc., 198 Madison Avenue, New York, New York 10016

Oxford is a registered trademark of Oxford University Press

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.

Library of Congress Cataloging-in-Publication Data
Bäck, Thomas, 1963-
Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms / Thomas Bäck.
p. cm.
Includes bibliographical references and index.
ISBN 0-19-509971-0 (hard cover)
1. Genetic algorithms. 2. Evolution (Biology)—Mathematical models. I. Title.
QA402.5.B333 1995
006.3—dc20 96-13506

1 3 5 7 9 8 6 4 2

Printed in the United States of America on acid-free paper

To Christa
("Du bist das Beste, was mir passieren konnte." / "You are the best thing that could have happened to me.")

Abstract

Evolutionary Algorithms (EAs) are a class of direct, probabilistic search and optimization algorithms gleaned from the model of organic evolution. The main representatives of this computational paradigm, Genetic Algorithms (GAs), Evolution Strategies (ESs), and Evolutionary Programming (EP), which were developed independently of each other, are presented in this work as instances of a generalized Evolutionary Algorithm.
Based on this generalization and a formal framework for Evolutionary Algorithms, a detailed comparison of these three instances with respect to their operators, working principles, and existing theoretical background is performed. Some new theoretical results concerning recombination in Evolution Strategies, the convergence velocity and selection algorithm of Evolutionary Programming, and convergence properties of Genetic Algorithms are presented.

Besides the algorithmic aspect, the Evolutionary Algorithms are also compared experimentally by running them on a number of artificial parameter optimization problems (sphere model, step function, a generalized function after Ackley, a function after Fletcher and Powell, and a fractal function). On these problems, concerning both convergence velocity and convergence reliability, an Evolution Strategy outperforms Evolutionary Programming, which is still slightly better than a Genetic Algorithm.

The second part of the thesis puts special emphasis on analyzing the behavior of simple Genetic Algorithms that work by mutation and (extinctive) selection. For such a (μ+λ)-GA, a general convergence velocity theory is presented on the basis of order statistics and transition probabilities under mutation for the corresponding Markov chain. Closed expressions for these transition probabilities are presented for a particular objective function called "counting ones". Using these transition probabilities, the convergence velocity and optimal mutation rate for a (1+λ)-GA are calculated numerically. It turns out that the optimal mutation rate depends mainly on 1/ℓ (the reciprocal of the search space dimension) and on the distance to the optimum as well as the selective pressure. As λ increases (i.e., as selective pressure increases), both the optimal mutation rate and convergence velocity increase.
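For a concrete picture of the setting described above, the counting ones problem and an elitist GA with a single parent and λ offspring can be sketched in a few lines of Python. This is only an illustration under assumed, arbitrary parameter values (string length, mutation rate, λ), not the algorithm or analysis of the thesis itself:

```python
import random

def counting_ones(x):
    """Objective function: the number of 1-bits in the bit string x."""
    return sum(x)

def one_plus_lambda_ga(length=20, lam=10, rate=0.05, generations=200, seed=1):
    """Sketch of a (1+lambda)-GA: one parent produces lam offspring by
    bitwise mutation; the best individual (parent included) survives."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(length)]
    for _ in range(generations):
        offspring = [[1 - b if rng.random() < rate else b for b in parent]
                     for _ in range(lam)]
        best = max(offspring, key=counting_ones)
        if counting_ones(best) >= counting_ones(parent):  # elitist '+' selection
            parent = best
    return parent

result = one_plus_lambda_ga()
print(counting_ones(result), "/", len(result))
```

Because selection is elitist, the number of ones never decreases, and the run climbs steadily toward the all-ones optimum.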
The informal notion of selective pressure admits a quantification by using takeover times (following earlier work by Goldberg) and selection probabilities. This quantification shows that selective pressure grows in the order proportional selection, linear ranking, tournament selection, (μ,λ)-selection, (μ+λ)-selection. As clarified by a taxonomy and detailed comparison of selection methods, these five mechanisms represent all principal possibilities to perform selection in an Evolutionary Algorithm.

In addition to the counting ones problem, optimal mutation rates are also analyzed for a (1+1)-GA applied to a simple continuous parameter optimization problem. These investigations clearly demonstrate the superiority of a Gray code in comparison to the standard binary code, since the former does not create local optima at the coding level by itself, as is likely to happen when the latter is used. Concerning the optimal mutation rate, however, both codes yield highly irregular schedules that prevent one from drawing any conclusion towards a heuristic for an optimal mutation rate control.

The results concerning optimal mutation rates and selective pressure are confirmed by a parallel meta-evolution experiment. The meta-algorithm, a hybrid of Genetic Algorithm and Evolution Strategy, evolves generalized Genetic Algorithms for an optimal convergence velocity. The results of this experiment confirm that the optimal mutation rate and convergence velocity increase as selective pressure increases. On the other hand, the meta-algorithm clarifies that a combination of techniques from Evolution Strategies and Genetic Algorithms can be used to develop algorithms for handling nonlinear mixed-integer optimization problems.
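The coding argument above has a simple concrete core: in the binary-reflected Gray code, consecutive integers always differ in exactly one bit, whereas the standard binary code can require many simultaneous bit flips (a "Hamming cliff"). The following small sketch, illustrative only, checks this property:

```python
def to_gray(n):
    """Binary-reflected Gray code of a nonnegative integer n."""
    return n ^ (n >> 1)

def from_gray(g):
    """Decode a Gray-coded integer back to the ordinary integer."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# Under the standard binary code, the step 7 -> 8 flips four bits
# (0111 -> 1000); under Gray coding every unit step flips exactly one.
print(hamming(7, 8), hamming(to_gray(7), to_gray(8)))  # 4 1

for i in range(100):
    assert hamming(to_gray(i), to_gray(i + 1)) == 1
    assert from_gray(to_gray(i)) == i
```

Since every unit step in parameter space is a single bit flip, the Gray code cannot by itself turn a unimodal integer objective into a multimodal one at the coding level.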
Contents

Introduction 1

PART I: A COMPARISON OF EVOLUTIONARY ALGORITHMS

1 Organic Evolution and Problem Solving
  1.1 Biological Background
    1.1.1 Life and Information Processing
    1.1.2 Meiotic Heredity
    1.1.3 Mutations
    1.1.4 Molecular Darwinism
  1.2 Evolutionary Algorithms and Artificial Intelligence
  1.3 Evolutionary Algorithms and Global Optimization
    1.3.1 Some Traditional Methods
    1.3.2 Computational Complexity of Global Optimization
  1.4 Early Approaches
  1.5 Summary

2 Specific Evolutionary Algorithms
  2.1 Evolution Strategies
    2.1.1 Representation and Fitness Evaluation
    2.1.2 Mutation
    2.1.3 Recombination
    2.1.4 Selection
    2.1.5 Other Components
    2.1.6 Conceptual Standard Algorithm
    2.1.7 Theory
  2.2 Evolutionary Programming
    2.2.1 Representation and Fitness Evaluation
    2.2.2 Mutation
    2.2.3 Recombination
    2.2.4 Selection
    2.2.5 Other Components
    2.2.6 Conceptual Standard Algorithm
    2.2.7 Theory
  2.3 Genetic Algorithms
    2.3.1 Representation and Fitness Evaluation
    2.3.2 Mutation
    2.3.3 Recombination
    2.3.4 Selection
    2.3.5 Other Components
    2.3.6 Conceptual Standard Algorithm
    2.3.7 Theory
  2.4 Summary

3 Artificial Landscapes
  3.1 Sphere Model
  3.2 Step Function
  3.3 Ackley's Function
  3.4 Function after Fletcher and Powell
  3.5 Fractal Function
  3.6 Summary

4 An Empirical Comparison
  4.1 Convergence Velocity: f1, f2
  4.2 Convergence Reliability: f3, f4, and f5
  4.3 Summary 159

PART II: EXTENDING GENETIC ALGORITHMS

5 Selection 163
  5.1 Selection Mechanisms 165
    5.1.1 Proportional Selection 167
    5.1.2 Ranking 169
    5.1.3 Tournament Selection 172
    5.1.4 (μ+λ)- and (μ,λ)-Selection 174
    5.1.5 Comparison of Takeover Times 179
    5.1.6 A Taxonomy of Selection Mechanisms 180
  5.2 Experimental Investigation of Selection 183
    5.2.1 Clear Improvement of Average Solutions: f1, f3 185
    5.2.2 Ambiguous Results: f2, f4, f5 188
    5.2.3 A Note on Scaling 192
  5.3 Summary 193

6 Mutation 197
  6.1 Simplified Genetic Algorithms 199
    6.1.1 The Counting Ones Problem 201
    6.1.2 Reflections on Convergence Velocity 210
    6.1.3 The Role of the Binary Code 221
  6.2 Summary 228

7 An Experiment in Meta-Evolution 233
  7.1 Parallel Evolutionary Algorithms 236
  7.2 The Algorithm 238
  7.3 Evolving Convergence Velocity 243
  7.4 Summary 253

Summary and Outlook 257

APPENDICES
A Data for the Fletcher-Powell Function
B Data from Selection Experiments
C Software
  C.1 Overview
  C.2 Usage
    C.2.1 The Graphical User Interface Evos 1.0
    C.2.2 Stand-alone Usage
    C.2.3 Visualization of Runs
  C.3 Data Collection
D The Multiprocessor Environment
  D.1 The Transputer System
  D.2 The Helios Operating System

Mathematical Symbols
Bibliography
Index

Evolutionary Algorithms in Theory and Practice

'Would you tell me, please, which way I ought to go from here?'
'That depends a good deal on where you want to get to,' said the Cat.
'I don't much care where ...,' said Alice.
'Then it doesn't matter which way you go,' said the Cat.
'So long as I get somewhere,' Alice added as an explanation.
'Oh, you're sure to do that,' said the Cat, 'if you only walk long enough.'
Lewis Carroll: Alice in Wonderland, p. 33

Introduction

The conversation between Alice and the Cat gives a perfect characterization of the meandering path full of dead ends, sharp curves, and hurdles one has to follow when doing research. After three and a half years, my first section of this path through wonderland ends up with the work presented here. In its final form, it deals with Evolutionary Algorithms (for parameter optimization purposes) and puts particular emphasis on extensions and analysis of Genetic Algorithms, a special instance of this class of algorithms. The structure of this research, however, has grown over the years and is just slightly related to Classifier Systems, the original starting point of my work.
These contain Genetic Algorithms as a component for rule-discovery, and as Classifier Systems turned out to lack theoretical understanding almost completely, the concentration of interest on Genetic Algorithms was a natural step and provided the basis of this work.

The book is divided into two parts that reflect the emphasis on Genetic Algorithms (part II) and the general framework of Evolutionary Algorithms that Genetic Algorithms fit into (part I).

Part I concentrates on the development of a general description of Evolutionary Algorithms, i.e., search algorithms gleaned from organic evolution. These algorithms were developed more than thirty years ago in the "ancient" times of computer science, when researchers came up with the ideas to solve problems by trying to imitate the intelligent capabilities of individual brains and populations. The former approach, emphasizing an individual's intelligence, led to the development of research topics such as artificial neural networks and knowledge-based symbolic artificial intelligence. The latter emphasized the collective learning properties exhibited by populations of individuals, which benefit from a high diversity of their genetic material. Modeling organic evolution provides the basis for a variety of concepts such as genotype, genetic code, phenotype, self-adaptation, etc., which are incorporated into Evolutionary Algorithms. Consequently, the necessary prerequisites to understand the relations between algorithmic realizations and biological reality are provided in chapter 1.

In addition to this, chapter 1 clarifies the relationship between global random search algorithms and Evolutionary Algorithms, Artificial Intelligence and Evolutionary Algorithms, and computational complexity and Evolutionary Algorithms. Provided with this background, the aim of chapter 2 consists in the presentation of
Evolution Strategies, Evolutionary Programming, and Genetic Algorithms as specializations of a unifying formalization of a general Evolutionary Algorithm. This approach allows for a comparison of all components of the algorithms, for an identification of similarities and differences concerning their perception and algorithmic realization, and for the transfer of concepts and sometimes even of theoretical results between the algorithms. In addition to these advantages, chapter 2 may also serve as a detailed overview of the three mainstream representatives of Evolutionary Algorithms for solving parameter optimization problems.

The comparison of these three representatives is also performed in a practical way by running experiments on a number of artificial test functions, which are presented in chapter 3. The basic criteria according to which an assessment of the Evolutionary Algorithms is performed are the speed of the search on the one hand and the reliability of the search (in terms of the chance to get "good" results even if the problem is very complex) on the other hand. These informally stated criteria are clarified in chapters 1 and especially 4, where the experimental results are reported.

Part II focuses the research on Genetic Algorithms and reports on how the transfer of some concepts from Evolution Strategies (and Evolutionary Programming) to Genetic Algorithms was achieved. These concepts are related to the selection and mutation operators, such that chapters 5 and 6 deal with an investigation of the impact of selection operators and the mutation rate on the behavior of Genetic Algorithms. The selection operators are characterized in terms of "selective pressure," which allows for assessing the properties of selection operators in a reliable manner. With respect to mutation, the critical problem turns out to be that of an optimal setting of the mutation rate, considering the actual optimization problem and the influence of selection.
This fundamental interaction between mutation and selection is identified to be of more importance than assumed so far, and part II sheds some light on this open question. The results from chapter 6 are also confirmed by a special experiment reported in chapter 7, where a new kind of Evolutionary Algorithm is presented. This meta-Evolutionary Algorithm performs a search within the space of parameter settings of (generalized, according to chapter 5) Genetic Algorithms and yields results that perfectly come up to expectations derived from the theoretical results reported in chapter 6. In this way, part II clarifies by theoretical as well as experimental investigations that the role of mutation in Genetic Algorithms is generally underestimated and demonstrates how it might be effectively exploited. The work concludes by summarizing some of the important results and pointing toward a number of open questions for further work.

Before releasing the reader to the book's inner world, special thanks are due to all colleagues at the Chair of Systems Analysis for creating a cooperative and stimulating working atmosphere. The basis for this atmosphere is provided by Hans-Paul Schwefel, the head of the chair, to whom I am very grateful for giving me optimal working conditions and for his patience, understanding, and encouragement. Furthermore, I would like to thank Reiner Männer for his kindness to be my second examiner, Ray Paton for his generous help in polishing up my English, and David B. Fogel for his detailed and helpful technical comments. I always enjoyed the discussions with these experts in evolutionary computation.

Finally, a reviewer of the Deutsche Forschungsgemeinschaft (DFG) who recommended my initial project proposal to be worth financial support by DFG deserves my gratitude.
Though experiencing a strong cut in research funding in general and basic research funding in particular, I have some hopes raised by receiving this support from DFG that basic research funding will not disappear from the German research scene.

Thomas Bäck
Dortmund, January 1995

Part I

A Comparison of Evolutionary Algorithms

Gott würfelt nicht. (God does not play dice.)
Albert Einstein: Creator and Rebel, 1973, p. 193

God not only plays dice. He also sometimes throws the dice where they cannot be seen.
Stephen W. Hawking: Nature 1975, 257, p. 362

1 Organic Evolution and Problem Solving

Evolutionary Algorithms (EAs), the topic of this work, form an interdisciplinary research field with relationships to biology, Artificial Intelligence, numerical optimization, and decision support in almost any engineering discipline. Therefore, an attempt to cover at least some of these relations must necessarily result in several introductory pages, always keeping in mind that it can hardly be complete. This is the reason for a rather voluminous introduction to the fundamentals of Evolutionary Algorithms in section 1.1, without giving any practically useful description of the algorithms at this point. At the moment, it is sufficient to know that these algorithms are based on models of organic evolution, i.e., nature is the source of inspiration. They model the collective learning process within a population of individuals, each of which represents not only a search point in the space of potential solutions to a given problem, but also may be a temporal container of current knowledge about the "laws" of the environment. The starting population is initialized by an algorithm-dependent method, and evolves towards successively better regions of the search space by means of (more or less) randomized processes of recombination, mutation, and selection.
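This generate-and-select loop can be rendered as a generic skeleton. The sketch below is one possible illustration only; the operator choices (intermediate recombination, Gaussian mutation), the parameter values, and the toy sphere objective are assumptions of this sketch, not the formal framework developed later in the book:

```python
import random

def evolutionary_algorithm(fitness, init, recombine, mutate,
                           mu=20, lam=40, generations=100, seed=0):
    """Skeleton of an EA: initialize a population, then repeatedly
    recombine, mutate, evaluate, and keep the mu best of lam offspring."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            p1, p2 = rng.sample(population, 2)            # mating selection
            offspring.append(mutate(recombine(p1, p2, rng), rng))
        # (mu, lam)-style environmental selection: the mu best offspring survive
        population = sorted(offspring, key=fitness, reverse=True)[:mu]
    return max(population, key=fitness)

# Toy instance: maximize -sum(x_i^2), i.e., minimize the sphere model.
best = evolutionary_algorithm(
    fitness=lambda x: -sum(v * v for v in x),
    init=lambda rng: [rng.uniform(-5, 5) for _ in range(3)],
    recombine=lambda a, b, rng: [(u + v) / 2 for u, v in zip(a, b)],
    mutate=lambda x, rng: [v + rng.gauss(0, 0.1) for v in x],
)
print([round(v, 2) for v in best])
```

Swapping in different `init`, `recombine`, `mutate`, and selection rules yields the different instances (ESs, EP, GAs) discussed in chapter 2.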
The environment delivers quality information (a fitness value) for new search points, and the selection process favors those individuals of higher quality to reproduce more often than worse individuals. The recombination mechanism allows for the mixing of parental information while passing it to their descendants, and mutation introduces innovation into the population. This process is currently used by three different mainstreams of Evolutionary Algorithms, i.e., Evolution Strategies (ESs), Genetic Algorithms (GAs), and Evolutionary Programming (EP), details of which are presented in chapter 2.

This chapter presents their biological background in order to provide the necessary understanding of the basic natural processes (section 1.1). Evolutionary Algorithms are then discussed with respect to their impact on Artificial Intelligence and, at the same time, their interpretation as a technique for machine learning (section 1.2). Furthermore, their interpretation as a global optimization technique and the basic mathematical terminology, as well as some convergence results on random search algorithms as far as they are useful for Evolutionary Algorithms, are presented in section 1.3. Finally, the chapter is concluded by a (surely incomplete) overview of the history of Evolutionary Algorithms, which now are well over thirty years old and are actually in an impressive period of revival with respect to empirical, theoretical, and application-oriented research.

1.1 Biological Background

Evolutionary Algorithms are based on a model of natural, biological evolution, which was formulated for the first time by Charles Darwin¹.
The Darwinian theory of evolution explains the adaptive change of species by the principle of natural selection, which favors those species for survival and further evolution that are best adapted to their environmental conditions (the saying "survival of the fittest," however, was coined by one of the protagonists of Darwin, H. Spencer, who wanted to clarify the seemingly tautological nature of Darwin's theory). In addition to selection, the other important factor for evolution recognized by Darwin is the occurrence of small, apparently random and undirected variations between the phenotypes, i.e., the manner of response and physical embodiment of parents and their offspring. These mutations prevail through selection if they prove their worth in light of the current environment; otherwise, they perish. The basic driving force for selection is given by the natural phenomenon of overproduction of offspring. Under advantageous environmental conditions, population size grows exponentially, a process which is generally limited by finite resources. When resources are no longer sufficient to support all the individuals of a population, those organisms are at a selective advantage which exploit resources most effectively.

This point of view is presently generally accepted as the correct macroscopic explanation of evolution. However, modern biochemistry and genetics have extended the Darwinian theory by microscopic findings concerning the mechanisms of heredity. The resulting theory is called the synthetic theory of evolution or, sometimes, neodarwinism. This theory is based on genes² as transfer units of heredity. Genes are occasionally changed by mutations. Selection acts on the individual (the individual is the unit of selection), which expresses in its phenotype the

¹Independently of Darwin, A. R. Wallace came to the same conclusions.
²The definition of a gene is postponed here to section 1.1.2.
complex interactions within its genotype, i.e., its total genetic information, as well as the interaction of the genotype with the environment in determining the phenotype. The evolving unit is the population, which consists of a common gene pool included in the genotypes of the individuals. The last three sentences characterize this theory in brief form, but in order to have the basic terminology of genetics available and to understand similarities and differences between the biological reality and the algorithmic models, the genetic background will be explained in more detail. The additional advantage is seen in the possibility to encounter further natural principles which may be useful but are not yet incorporated into the algorithms.

In the evolutionary framework, the fitness of an individual is measured only indirectly by its growth rate in comparison to others, i.e., its propensity to survive and reproduce in a particular environment. Furthermore, natural selection is no active driving force; rather, differential survival and reproduction within a population makes up selection. Selection is simply a name for the ability of those individuals that have outlasted the struggle for existence to bring their genetic information to the next generation. This point of view, however, reflects just our missing knowledge about the mapping from genotype to phenotype, a mapping which, if it were known, would allow us to evaluate fitness in terms of a variety of physical properties.

In a biological context, the term adaptation³ denotes a general advantage in ecological or physiological efficiency of an individual over that achieved by other members of the population, and at the same time it denotes the process of attaining this state (see [May88], pp. 134-135). Adaptation is a rather general term that includes nongenetic adaptation of an individual (somatic adaptation) as well as the genetic sense of genotypic changes over many generations.
The overall meaning of adaptation is often used synonymously with fitness, i.e., adaptation is the propensity to grow up (and reproduce) (see [May88], p. 128). Furthermore, the term "adaptation" bears the question "to what?" Basically, the answer is to any major kind of environment (adaptive zone) or, in a broader sense, an ecological niche (the set of possible environments that permit survival of a species).

A very popular metaphor for evolutionary change is given by Wright's model of an adaptive surface. Possible biological trait combinations⁴ in a population of individuals define points in a high-dimensional sequence space, where each coordinate axis corresponds to one of these traits. An additional dimension is used in this model to plot fitness values for each point in the space, reflecting the selective advantage (or disadvantage) of the corresponding individuals. In this way, a fitness landscape or adaptive surface (topography) is defined, which in its simplified, three-dimensional (two trait dimensions, one fitness dimension) form looks just like a mountainous region, equipped with valleys, peaks, and ridges (see figure 1.1). When trait combinations in a population change and average fitness increases, the population moves uphill and simultaneously climbs some of the peaks. This way, a natural analogy to the optimization problem emerges by interpreting evolution as a process of fitness maximization in trait space.

³From "ad" and "aptare" (to fit on).
⁴Intentionally, a fuzzy terminology is used here. In section 1.1.1 it will become clear how this space can be understood in terms of nucleotide base sequences.

Fig. 1.1: Schematic diagram of an adaptive surface.

However, this is still a picture much too simple to be adequate.
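One inadequacy of this simple picture is easy to demonstrate in code: a greedy climber on a two-peaked, one-dimensional landscape (a hypothetical toy function, not one of the test functions used later in the book) stagnates on whichever peak is nearest to its starting point:

```python
import math

def fitness(x):
    """Toy 'adaptive surface': a local peak at x = -2 (height ~1)
    and the global peak at x = 3 (height ~2)."""
    return math.exp(-(x + 2) ** 2) + 2 * math.exp(-(x - 3) ** 2)

def hill_climb(x, step=0.1, iterations=1000):
    """Greedy hill climber: move to a neighbor only if fitness improves."""
    for _ in range(iterations):
        for candidate in (x - step, x + step):
            if fitness(candidate) > fitness(x):
                x = candidate
    return x

# The climber ends on whichever peak is nearest to its start point.
print(round(hill_climb(-4.0)), round(hill_climb(1.0)))  # -2 3
```

Starting at x = -4.0 the climber is trapped on the lower peak near -2; only a start in the basin of the global peak reaches x = 3, which is exactly the early-stagnation problem discussed next.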
It suggests that evolution starts somewhere in the lower regions of sequence space and then steadily and gradually goes uphill, climbing the hill nearest to the starting point⁵, indicating the possibility of early stagnation on a hill which is by far not the highest peak. Instead, there are some further mechanisms which may allow a population to cross a region of lower fitness between two peaks on the adaptive surface, the most obvious one being environmental changes, which can result in transformation to a completely differently shaped surface. Furthermore, in small populations, fitness decreases can be caused by the phenomenon of genetic drift, therefore allowing for a valley-crossing, as Wright points out in the shifting balance theory ([Fut90], pp. 196-197). Genetic drift is simply a random decrease or increase of biological trait frequencies, leading to a probability distribution of these frequencies around the equilibrium values when several subpopulations (demes) under equal conditions are considered. It is also worthwhile to mention the view, summarized well in the work of Schull, that individuals themselves, by means of an adaptation during their development and by trying to cope as best as they can with the situation, are also changing the structure of the adaptive surface [Sch91]. On the whole, it is more adequate to think of the adaptive surface as dynamically changing by means of environment-population interactions, a process which is currently not understood completely.

The following excursion to biochemistry and genetics is based on [Got89] and [Küp90], two books (in German), which give excellent introductions even to the reader who is not so experienced with the material.

⁵That is why deterministic optimization strategies following this metaphor are often called hillclimbing strategies.
Furthermore, the book by Futuyma [Fut90]⁶, a complete and instructive textbook on evolutionary biology, provides more information for the interested reader.

1.1.1 Life and Information Processing

Modern molecular genetics led to relatively detailed knowledge about the building plan of living beings, which is encoded in the deoxyribonucleic acid (DNA), a double-stranded macromolecule of helical structure (comparable to two intertwined spiral staircases). Both single strands are linear, unbranched nucleic acid molecules built up from alternating deoxyribose (sugar) and phosphate molecules. Each deoxyribose part is coupled to a nucleotide base, which is responsible for establishing the connection to the other strand of the DNA. The four nucleotide bases Adenine (A), Thymine (T), Cytosine (C), and Guanine (G) are the alphabet of the genetic information. The sequence of these bases in the DNA molecule determines the building plan of the organism.

The connection between both nucleotide strands is established by hydrogen bonds between the pairs Adenine and Thymine (two hydrogen bonds) and Guanine and Cytosine (three hydrogen bonds). Most importantly, both strands have complementary nucleotide base structures, since the nucleotide bases are always arranged such that a purine base (Adenine or Guanine) in one strand is connected to a pyrimidine base (Thymine or Cytosine) in the other strand and vice versa. This way, the information encoded in the DNA is redundant and allows for a complete replication (identical doubling) during cell division. Then, the double helix of the DNA is split up into single strands. Free hydrogen bond places at the nucleotide bases provide positions of chemical reaction where new (complementary) nucleotide bases which stem from cell metabolism can be taken up.

⁶The German translation of [Fut86].
Usually only the special pairings (Adenine + Thymine and Guanine + Cytosine) of nucleotide bases are possible during this synthesis step, and under the influence of the enzyme DNA-polymerase both single strands are completed to form new copies of the original genetic information.

After having identified the DNA to be the information carrier of life (and evolution), the next important question concerns the mechanism which creates an organism (i.e., the phenotype) from its building plan (i.e., the genotype). On the single-cell level, this connection is established by the mechanism of protein biosynthesis. Proteins are multiply folded biological macromolecules which consist of a long chain of amino acids. The metabolic effects of proteins are mainly caused by their three-dimensional folded structure (tertiary structure) as well as their symmetrical structure (secondary structure), which results from the amino acid order in the chain (primary structure). Typically, a gene on the DNA is defined to be a part of the DNA which includes the information for the synthesis of one protein.

The impact of genes on phenotypical features of an organism, however, is far more complicated than simply representing a one-to-one correspondence between genes and features. Polygeny, i.e., the combined influence of several genes on a single phenotypical characteristic, and pleiotropy, i.e., the influence of a single gene on several phenotypical features, represent the normal rather than the exceptional case (see e.g. [Got89], pp. 149-151, pp. 156-159). Related to these mechanisms, the term epistasis is used to denote the impact of one gene (the epistatic one) on the expression of another (the hypostatic) gene (see [Got89], p. 161)⁷.

The alphabet of amino acids is finite, consisting of twenty different acids⁸.
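The complementary base pairing described above can be sketched in a few lines of Python; the example sequence is arbitrary, and 5'/3' strand orientation is ignored for simplicity:

```python
# Watson-Crick pairing: A bonds with T, G bonds with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(strand):
    """Return the complementary DNA strand, base by base."""
    return "".join(COMPLEMENT[base] for base in strand)

coding = "TACGGTCCC"
template = complement_strand(coding)
print(template)  # ATGCCAGGG

# Complementing twice restores the original strand: this redundancy is
# what makes replication (identical doubling) of the DNA possible.
assert complement_strand(template) == coding
```

Each strand thus fully determines the other, which is the redundancy exploited during cell division.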
The process by which a protein (a string from a twenty-letter alphabet) is obtained from the corresponding gene (a string from a four-letter alphabet) is divided into two physically separated mechanisms called transcription and translation. In case of eukaryotes, the organism's building plan is located in the nucleus of each cell, while biosynthesis of proteins is performed by the ribosomes. The information is transferred from nucleus to ribosome by a messenger ribonucleic acid (mRNA) molecule, which is synthesized in the nucleus by means of the transcription process. Similar to nucleic acid, mRNA is a single-stranded molecule which has ribose as its sugar component and carries information by means of the nucleotide bases Adenine, Cytosine, Guanine, and Uracil (U) replacing Thymine. The process of transcription proceeds as follows:

• The enzyme RNA-polymerase loosens the DNA strand.
• Only one, the so-called coding strand of the DNA, serves as a matrix for the copy process.
• Recognition regions (promoters) on the strands determine the beginning of a gene as well as the coding strand.
• Stop sequences determine the termination point of transcription.

⁷Davidor transferred the concept of epistasis to Genetic Algorithms as a measure for the nonlinearity and interdependency among the elements representing a genotype. Besides defining a mathematical measure of epistasis for Genetic Algorithms, he also identified the epistasis range for which application of Genetic Algorithms seems recommendable [Dav90, Dav91a].
⁸These are: Alanine (ALA), Arginine (ARG), Aspartic acid (ASP), Cysteine (CYS), Glutamine (GLN), Glutamic acid (GLU), Glycine (GLY), Histidine (HIS), Isoleucine (ILE), Leucine (LEU), Lysine (LYS), Methionine (MET), Phenylalanine (PHE), Proline (PRO), Serine (SER), Threonine (THR), Tryptophan (TRP), Tyrosine (TYR), and Valine (VAL).
                                 Second base
  First base                  U/T    C      A      G       Third base
  --------------------------------------------------------------------
  U/T (RNA: Uracil (U),       PHE    SER    TYR    CYS     U/T
       DNA: Thymine (T))      PHE    SER    TYR    CYS     C
                              LEU    SER    Stop   Stop    A
                              LEU    SER    Stop   TRP     G
  C (Cytosine)                LEU    PRO    HIS    ARG     U/T
                              LEU    PRO    HIS    ARG     C
                              LEU    PRO    GLN    ARG     A
                              LEU    PRO    GLN    ARG     G
  A (Adenine)                 ILE    THR    ASN    SER     U/T
                              ILE    THR    ASN    SER     C
                              ILE    THR    LYS    ARG     A
                              Start/MET    THR    LYS    ARG     G
  G (Guanine)                 VAL    ALA    ASP    GLY     U/T
                              VAL    ALA    ASP    GLY     C
                              VAL    ALA    GLU    GLY     A
                              VAL    ALA    GLU    GLY     G

Table 1.1. The genetic code.

In this way, transcription creates an mRNA molecule which reflects the structure of the coding strand, except that the nucleotide base Thymine is substituted by Uracil.

14 Organic Evolution and Problem Solving

The mRNA molecules are translocated to the ribosome, where their information is used to synthesize the corresponding protein. The ribosome performs a mapping from triplets of nucleotide bases to amino acids. Each triplet (codon) encodes exactly one amino acid or serves as an indicator for starting and stopping the synthesis. The genetic code, which is the same for all living beings and has remained unchanged, is given explicitly in table 1.1. Since three positions having four occupation possibilities each are sufficient for encoding 4³ = 64 different symbols, the genetic code is redundant. However, only two positions would not suffice to encode twenty amino acids.

Translation from nucleotide base code to amino acid sequence is performed biochemically by adapter-molecules of transfer-RNA (tRNA). These molecules are responsible for transport and application of amino acids. At one end, the tRNA-molecules carry an anticodon which is complementary to a codon and is bonded to it by hydrogen bonds, such that the codon at the mRNA can be recognized by the tRNA.
At the other end, the tRNA molecules carry the amino acid which the codon on the mRNA is mapped to and which is easily passed to the protein chain. For a small segment of DNA the mechanisms of transcription and translation are schematically shown in figure 1.2.

[Figure: a DNA double strand is transcribed in the nucleus into mRNA, which is processed codon by codon at the ribosome; the example yields the protein chain THR-ARG-SER-LEU-LYS-ALA.]

Fig. 1.2: Simplified scheme of protein biosynthesis in living cells.

The processes described so far are basically biochemical information processing mechanisms. They imply the central dogma of molecular genetics, which states that information is passed from genotype to phenotype, i.e.:

    DNA → RNA → Protein

The dogma implies the proof of the incorrectness of Lamarckism, i.e., the theory which states that behaviorally acquired characteristics of an individual can be passed to its offspring. This is impossible, because the phenotype does not change the genotype; there is no information flow backwards⁹.

To complete this section on the biochemical background of neodarwinism, the structure of the genetic information contained in the DNA sequence is briefly explained. On the level of single symbols, four different nucleotide bases were identified to form the alphabet of the genetic language. Groups consisting of three nucleotide bases are the units of translation, i.e., symbol groups which encode amino acids or start and stop information for translation. These groups are called codons. A gene is a unit which encodes a protein and consists of up to about one thousand codons.
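The two-step mapping of figure 1.2 can be sketched in a few lines of code. The DNA string below is a made-up example, and only a small excerpt of the 64-entry genetic code of table 1.1 is included; the function and variable names are ours, not the book's.

```python
# Transcription: the coding strand is copied, with Thymine replaced by Uracil.
def transcribe(coding_strand):
    return coding_strand.replace("T", "U")

# Translation: the ribosome maps each codon (triplet) to an amino acid.
CODE = {  # small excerpt of table 1.1 (the full code has 64 entries)
    "AUG": "Met",  # also serves as the start signal
    "ACU": "Thr", "CGU": "Arg", "UCU": "Ser",
    "CUU": "Leu", "AAA": "Lys", "GCU": "Ala",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

def translate(mrna):
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "Stop":  # stop codons terminate the synthesis
            break
        protein.append(amino_acid)
    return protein

dna = "ATGACTCGTTCTCTTAAAGCTTAA"  # made-up coding strand, ends in a stop codon
print(translate(transcribe(dna)))  # ['Met', 'Thr', 'Arg', 'Ser', 'Leu', 'Lys', 'Ala']
```

The redundancy of the code (several codons per amino acid) is visible even in this excerpt: all three stop triplets map to the same signal.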
However, often not all information of a gene is used for translation; rather, the gene subdivides into exons, i.e., sequences which are translated, and introns, i.e., sequences between exons which do not bear genetic information for the phenotype of the organism¹⁰. Both the units of transcription (scriptons), consisting of up to a few genes each, and the units of replication (replicons), consisting of up to several hundred scriptons, are hierarchical organization structures which are mentioned here only for completeness. On the next higher level of organization, the chromosomes are structures consisting of some replicons. Chromosomes provide an important logical unit of information transmission to the next generation, as will become clearer in the following section. All genes of an organism, distributed over several chromosomes, are summarized under the term genome. Finally, the genotype of an organism includes its complete genetic material, i.e., the building plan of the organism. Conceptually, it differs from the genome by a more abstract interpretation, while the term genome is also used to denote the sum of all chromosomes from a cytological¹¹ point of view. To provide an overview of the hierarchy discussed so far, table 1.2 summarizes the previous terms¹². In chapter 2, the analogies to terminology and information structure in Evolutionary Algorithms will be clarified by referring to this table again.

  Information unit | Function                      | Number
  nucleotide base  | single symbol                 | 4
  codon            | unit of translation           | 64
  gene             | unit which encodes a protein  | several thousand
  scripton         | unit of transcription         | several thousand
  replicon         | unit of reproduction          | some
  chromosome       | meiotic unit                  | few
  genome           | mitotic unit                  | 1
  genotype         | total information             | 1

Table 1.2. Hierarchy of the genetic information.

1.1.2 Meiotic Heredity

The processes discussed in the previous section are in the first place mechanisms responsible for the ontogenesis of an organism, i.e., its development from the fertilized zygote until its death. A fundamental basis of ontogenesis is mitosis, the process of cell division by which identical¹³ genetic material is distributed to two emerging new cells. After mitosis both nuclei have the same number of chromosomes as the initial cell had. In full-grown organisms, the organs consist of differentiated cells which do not undergo cell divisions. However, most organisms also possess some tissues which continually produce new cell material, e.g., the skin, the bone marrow, and, most importantly for evolution, tissues within the sexual organs. In Evolutionary Algorithms, mitosis is generally neglected because the factor of interest is to model a sequence of generations, i.e., phylogeny (evolution) instead of ontogeny. Furthermore, the algorithmic model is based on one set of genetic information per individual, i.e., from a biological point of view unicellular, haploid organisms are modeled. In the diploid case (e.g., in the case of humans) each body cell includes two sets of chromosomes, such that for each chromosome two homologous¹⁴ forms exist.

⁹This is true with the exception of some RNA-viruses that show the phenomenon of reverse transcription: The virus-RNA is used as a blueprint for a DNA-molecule, which is then inserted into the host genome (see [Fut90], p. 79).
¹⁰When the gene includes both exons and introns, by means of transcription a precursor-mRNA containing both introns and exons is created, which is afterwards, by a process called splicing, reduced to mRNA consisting solely of exons.
¹¹Cytology: Science of cell and cell division.
¹²The terms meiosis and mitosis will be explained in section 1.1.2.
In contrast, haploid organisms possess only one set of chromosomes per body cell. For diploid organisms, corresponding genes in homologous chromosomes are called alleles, which make up a much more complicated hereditary situation (summarized by the Mendelian laws) than in the haploid case. The details are omitted here, because Evolutionary Algorithms have by far not advanced to a level of development at which they would benefit from complete Mendelian genetics. However, to see the basic mechanism, the obvious fact that alleles of diploid organisms may be identical (homozygotic) or different (heterozygotic) is worthy of mention here. For the former case, it is at least clear which allele is passed to the offspring. In most cases, only one of two different alleles included in the offspring's genome is phenotypically expressed¹⁵. This allele is called the dominant one, in contrast to the other, recessive one.

Important for the process of reproduction is not the mitotic but the meiotic cell division, which works differently in diploid compared to haploid organisms. First, the diploid mechanism of meiosis is discussed, since although it is more complicated than the haploid case it provides a deeper understanding of the process and can easily be transferred and reduced to the haploid case. Mainly, meiosis in diploid organisms is the process through which germ cells (gametes) are created. In order to form a new diploid zygote by fusion of a gamete from a male and a female parent organism, gametes must be haploid cells. Thus, during meiosis the double genome of a cell is reduced to create two cells containing only one genome each.
Practically, four daughter nuclei with a reduced number of chromosomes emerge from the original cell by means of two subsequent cell divisions. What is genetically more important are reordering processes of the genomes as well as reorganization processes of chromosomes during meiosis, which both cause the emergence of gametes consisting of recombined¹⁶ haploid genetic material.

The functional units during the first meiotic metaphase and former phases of meiosis are homologous pairs of chromosomes, so-called bivalents. The bivalent forms orientate randomly, such that in the following division phase the assignment of chromosomes to one or the other haploid group of chromosomes is stochastic. This completes the first meiotic division, and the second division simply reproduces the new haploid genetic material once again, leading to four haploid cells (gones), precursors of the gametes. Before genomes are reordered in the first metaphase, chromosomes undergo reorganization processes in the bivalents during a meiotic stage called pachytene. Every chromosome consists of two structurally identical parallel elements, chromatids, which are connected in a special region called centromere¹⁷. It is important to see that corresponding regions of chromatids are duplicates of one single allele; different alleles can only occur on different homologous chromosomes. This way, a bivalent really consists of four parallel chromatids. Chromatids at the same chromosome are usually called sister-chromatids. Reorganization processes between adjacent non-sister-chromatids occurring in the pachytene phase are called crossing-over¹⁸ (or intrachromosomal recombination). Their effect is a segment exchange of chromosome parts. For the diploid case, heterozygotic allele pairs Aa and Bb, and one crossover point, the mechanism is shown schematically in figure 1.3.

[Figure: two homologous chromosomes of a bivalent, each consisting of two sister-chromatids joined at the centromere; a crossover point between adjacent non-sister-chromatids exchanges the segments carrying the allele pairs Aa and Bb.]

Fig. 1.3: Scheme of one-point crossover between non-sister-chromatids.

The location of the crossover position(s) is completely at random. Furthermore, the number of crossover points depends on the length of the chromosome. In nature, between one and eight crossover points have been observed ([Got89], p. 118). Caused by these effects, the haploid genetic material of the gametes is usually not identical to either of the two genomes of their parent cell.

However, further changes in the genetic material may occur. Before discussing these mutations, we return to haploid organisms, where meiosis must necessarily be different from the diploid analogue. In haploid organisms, gametes are of course haploid¹⁹, such that meiosis is a process which follows cell fusion in the zygote and reduces the diploid zygote to a normal haploid cell. Recombination and crossover processes occur in the zygote. Evolutionary Algorithms, as we will see in chapter 2, are generally models of the haploid case.

Concluding this section, table 1.3 summarizes the characteristics of meiosis as discussed above.

  Occurrence   | Diploid organisms: In sexual organs.
               | Haploid organisms: In zygotes.
  Purpose      | Diploid organisms: Formation of gametes.
               | Haploid organisms: Recreation of haploid phase.
  Development  | Two subsequent cell divisions.
               | • First Prophase: Formation of bivalents.
               |   Crossover of chromosomes.
               | • First Metaphase: Reordering of genomes.
               | • First Anaphase: Halving of the chromosome number.
               | • Second meiotic division.
               | Four haploid daughter cells having unequal genetic material.

Table 1.3. Characteristics of meiosis.

¹⁵An important exception is known as the effect of incomplete dominance, where individuals that are heterozygotic for a certain gene can generate a different phenotypic characteristic than either of the two alternative homozygotic conditions.
¹⁶Recombinants are organisms having redistributed alleles in some genes when comparing them to their ancestor forms.
¹⁷The human genome can in this phase best be analyzed, such that usually chromosomes look like an "X" or "Y" when looking at colored microscopic pictures. However, this phase is only obtained by replication processes during the interphase, a phase preceding meiosis.
¹⁸Crossing-over is often abbreviated as crossover, a terminology which will be used from now on.
¹⁹This implies that gametes are in this case formed by mitosis rather than meiosis.

1.1.3 Mutations

Although DNA-replication is a copying process of overwhelming exactness, it is (fortunately) not perfect. Mutation processes of the genetic material can be caused by a variety of factors, the simplest one being a replication error switching a specific base pair in the original DNA strand to the other possible base pair in the copy. According to data given by Futuyma ([Fut90], pp. 82–83), the bacterium Escherichia coli has a genome consisting of 3.8·10⁶ base pairs and a probability of spontaneous base pair mutation in the order of 4·10⁻¹⁰. This amounts to a mutation probability per genome and per generation in the order of 2·10⁻³. For the human genome, these values are a factor of one thousand greater. Gottschalk gives a probability of 6·10⁻⁸ up to 8·10⁻⁸ for the occurrence of a spontaneous mutation of a specific gene ([Got89], p. 197), which is in good agreement with Futuyma's data giving values of 0.5 up to 14.3 mutations per 10⁸ cells or gametes, respectively, depending on the specific gene ([Fut90], p. 83).

Such small error rates for the genetic information of mammalian genomes are achieved by providing repair mechanisms for DNA, which have emerged during evolution and are also encoded in the DNA (see [Got89], pp. 269–271). Special enzymes are encoded in the DNA, which have been identified as repair enzymes for a variety of damages of the double strand²⁰.
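As a quick arithmetic check of the Escherichia coli figures quoted above (read here as a genome of 3.8·10⁶ base pairs and a spontaneous per-base-pair mutation probability of 4·10⁻¹⁰ per replication), the per-genome probability follows directly:

```python
p_base = 4e-10         # per-base-pair mutation probability (as read from the text)
genome_length = 3.8e6  # E. coli genome size in base pairs (as read from the text)

# Probability of at least one base-pair mutation per genome and generation.
p_genome = 1.0 - (1.0 - p_base) ** genome_length
print(p_genome)  # ~1.5e-3, i.e., on the order quoted in the text

# For such rare events the simple product is an excellent approximation.
print(p_base * genome_length)
```

The near-agreement of the two printed values illustrates why per-genome rates for rare mutations are usually quoted as the product of per-base rate and genome length.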
In addition to repair enzymes, mutator genes that increase the mutation rate of other genes within the genome have also been identified ([Got89], p. 182). Altogether, the total mutation rate of an organism is in part regulated by its own genotype (we will return to this biological fact as a model for algorithmic improvements of Evolutionary Algorithms in several chapters of the book).

Besides replication errors on a normal level (implicit in the replication process), exogenous factors, so-called mutagens, can drastically increase mutation probabilities. Examples are radiation (X-ray, ultraviolet, cosmic radiation, gamma rays) and some chemical substances.

Mutations can be subdivided according to different criteria. In the following, a brief look at the classification based on the location of mutations within the organism and a more detailed one at the classification with respect to the kind (and amount) of mutational deviation are given. According to the location, somatic and generative mutations are distinguished. Generative mutations are those which take place in the germ path or in gametes and are therefore mutations which are passed to the offspring, while somatic mutations occur in body cells and can therefore not be passed to the offspring. Somatic mutations may be tragic for the single individual, e.g. when a carcinogenic oncogene is activated by a single base-pair mutation. However, these mutations do not influence the development of the species. Therefore, generative mutations are of main interest in Evolutionary Algorithms²¹. When looking at the kind of mutation, three different deviations of the copy from the original are usually distinguished:

• Gene mutations: A particular gene is changed and may cause a deviating effect to the organism.
• Chromosome mutations: The gene ordering within a chromosome is changed, leading to a new arrangement of genes within the chromosome.
Furthermore, the number of genes within the chromosome may be increased or decreased.
• Genome mutations: Either the number of genomes or the number of chromosomes is increased or decreased.

Gene mutations can be further divided into a number of groups according to the amount of information changed in the gene. Small mutations are typically neutral with respect to their selective effect, i.e., they do not negatively affect the viability of the organism. In contrast, large mutations cause clearly recognizable deviations in the phenotype. They provide the basis of racial differences, but they are also the cause of defect and lethal mutations. Progressive (constructive) mutations are in principle macromutations, which can cause crossings of boundaries between species. Gene mutations in the form of small mutations (especially point mutations) are common models for mutation in Evolutionary Algorithms.

²⁰Some of these mechanisms are e.g. excision-repairing (damaged parts are cut out and substituted by normal ones), information transfer corrections (by enzymes during replication), and postreplicative repairing.
²¹With the exception of an early study presented by Schwefel, who tried to model the process of somatic mutations to apply Evolution Strategies to discrete optimization problems [Sch75a]. More general processes of ontogeny are the subject of so-called Haeckel strategies [BE91].

Chromosome mutations can be divided into the following three main groups according to their quantitative or qualitative effect on the genes:

• Losses of chromosome regions (deficiencies and deletions).
• Doubling of chromosome regions (duplications).
• Reorganization of chromosomes (translocations and inversions).

A necessary prerequisite for chromosome mutations is always the occurrence of breaks in the chromosome.
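The point mutations just mentioned are the standard mutation model of Evolutionary Algorithms: each position of a (here binary) genome flips independently with a small probability. A minimal sketch, with arbitrary illustrative rates rather than the biological ones:

```python
import random

def point_mutation(bits, rate=0.01, rng=random):
    # Each position flips independently with probability `rate`
    # (the default rate is an arbitrary illustrative value).
    return [1 - b if rng.random() < rate else b for b in bits]

rng = random.Random(42)          # seeded generator for reproducible runs
genome = [0] * 20
print(point_mutation(genome, rate=0.1, rng=rng))
```

With `rate=0.0` the copy is exact, and with `rate=1.0` every bit flips; realistic EA settings keep the rate small, mirroring the small biological error rates discussed above.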
A deficiency is a terminal segment loss, caused by the occurrence of one break, while an internal segment loss is called deletion and requires the occurrence of two breaks in one chromosome. Both events occur in a single chromosome and are often summarized under the term deficiency. Chromosome mutations can best be investigated (and are sometimes only possible) in the so-called pachytene phase, between non-sister-chromatids. In the following graphics, however, we restrict attention to as many chromatids or chromosomes as needed for demonstrating the working mechanisms of mutation. Figure 1.4 demonstrates both the effect of a deficiency and a deletion²².

A duplication process requires two homologous starting chromosomes and leads to the doubling of a certain region on one chromosome at the expense of a corresponding deficiency on the other chromosome. After completion of two breaks and restitutional events both resulting chromosomes differ in their structure. Generally, when such an event occurs during meiosis, the deficient gamete is not capable of fertilization. On the contrary, the gamete containing the duplication often shows normal functionality. The schematic mechanism of a duplication event is shown in figure 1.5.

The effects of duplications and deficiencies on the chromosome structure are of a quantitative nature. Inversion is a mutation which instead only qualitatively changes the chromosome by rotating an internal segment of the chromosome by 180° and refitting it into the chromosome. A necessary requirement for inversion is the occurrence of two breaks, as it is the case for a deletion. When the chromosome is doubly broken, both deletion and inversion have an equal probability of occurring. The principal mechanism of inversion is shown in figure 1.6. Finally, translocation denotes a process by which genetic material is exchanged between nonhomologous chromosomes, without any loss of

²²Numbers in the graphics denote chromosomal regions.
[Figure: a single break near a chromosome end leads to the loss of the terminal segment; two breaks lead to the loss of an internal segment.]

Fig. 1.4: Scheme of terminal (deficiency, left) and internal (deletion, right) segment losses.

[Figure: two breaks in two homologous chromosomes followed by restitution double a region on one chromosome at the expense of the corresponding region on the other.]

Fig. 1.5: Scheme of a duplication event.

genetic material. Translocation results in two newly combined chromosomes. In principle, with respect to its effect the mechanism is comparable to a one-point crossover, the main difference being that crossover is defined to operate on homologous chromatids. Therefore, we refer to figure 1.3, where the process working on the two inner non-sister-chromatids is schematically identical to the process acting on nonhomologous chromosomes in case of translocation.

[Figure: an internal segment between two breaks is rotated by 180° and refitted into the chromosome.]

Fig. 1.6: Scheme of an inversion event.

We conclude this section by giving some information on genome mutations. Generally, the phenomenon of a change in the number of chromosomes is called polyploidy, further subdivided into euploidy and aneuploidy. Euploidy means a change of the genome by complete additional sets of chromosomes, while in the case of aneuploidy a deviation by single chromosomes or chromosome groups is denoted. For aneuploidy, both increases (hyperploidy²³) and decreases (hypoploidy²⁴) of chromosome number are observed in nature.

So far, genome mutations have not been tested as an extension of evolutionary algorithms. However, some chromosome mutations like duplications, deletions, and inversions were investigated, and we will give some references when discussing the algorithms in chapter 2.

1.1.4 Molecular Darwinism

The complete human genome consists of approximately one billion nucleotide bases (e.g., see [Got89], p. 86). For each position within this information chain one out of four different nucleotide bases can theoretically be placed in the location.
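The chromosome mutations of figures 1.4–1.6, and the translocation just compared to one-point crossover, translate directly into list operations. The region numbers and cut positions below are arbitrary illustrations (as are the function names):

```python
def deletion(chrom, start, stop):
    # Internal segment loss (two breaks); start = 0 gives a terminal
    # segment loss, i.e., a deficiency.
    return chrom[:start] + chrom[stop:]

def duplication(chrom, start, stop):
    # The region [start, stop) is doubled.
    return chrom[:stop] + chrom[start:stop] + chrom[stop:]

def inversion(chrom, start, stop):
    # The internal segment is rotated by 180 degrees and refitted.
    return chrom[:start] + list(reversed(chrom[start:stop])) + chrom[stop:]

def translocation(chrom_a, chrom_b, point):
    # Segment exchange between (nonhomologous) chromosomes; structurally
    # this is the one-point crossover the text compares it to.
    return chrom_a[:point] + chrom_b[point:], chrom_b[:point] + chrom_a[point:]

c = [1, 2, 3, 4, 5, 6]  # numbered chromosomal regions, as in the figures
print(deletion(c, 2, 4))                     # [1, 2, 5, 6]
print(duplication(c, 2, 4))                  # [1, 2, 3, 4, 3, 4, 5, 6]
print(inversion(c, 2, 4))                    # [1, 2, 4, 3, 5, 6]
print(translocation([1, 2, 3], [7, 8, 9], 1))  # ([1, 8, 9], [7, 2, 3])
```

In an Evolutionary Algorithm, `translocation` applied to homologous parents is exactly the one-point crossover operator, while the first three functions correspond to the duplication, deletion, and inversion operators mentioned in chapter 2.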
This simple consideration leads to a number of 4^1,000,000,000 combinatorial sequence possibilities, a number which is more than huge: It is in fact far beyond any imagination²⁵. Even for very much shorter sequences of several hundred nucleotide bases encoding single proteins the probability of creating a special sequence at random is vanishingly small, such that, from a non-teleological point of view, the random emergence of self-reproducing units can be called impossible. Throughout this book, we will discuss a more adequate point of view which is able to explain the efficiency of biological evolution much better than the pure random search interpretation. It is thanks to Eigen that the evolution of macromolecules and the emergence of a unique genetic code are in principle understood, and we will briefly present an overview of his theory of molecular Darwinism, based on [Eig71, Eig76, Küp90].

²³Important examples of polysomy are trisomics such as those concerning the X-chromosome of human females. Normally, this does not create a striking abnormality ([Got89], p. 334).
²⁴A hypoploidy that concerns a sexual chromosome of humans is normally lethal ([Got89], p. 334).
²⁵In order to give an impression of the magnitude of this search space, we would like to mention that currently the number 10¹²⁰ is accepted as a kind of universal complexity limit, since it is an upper bound on the number of possible events in the universe up until now. The number results as a product of the 10⁸⁰ stable elementary particles in the universe and the age of the universe, counted by elementary time units, which amounts to about 10⁴⁰.
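The combinatorial argument is easy to verify numerically: a chain of length l over a four-letter alphabet admits 4^l variants. The length of 300 bases below is a made-up example of a short protein-coding stretch:

```python
import math

def log10_variants(l):
    # log10 of the number of sequences of length l over a 4-letter alphabet
    return l * math.log10(4)

print(log10_variants(300))            # ~180.6: already beyond the 10**120 limit
print(log10_variants(1_000_000_000))  # the full genome: about 10**600,000,000
```

Working with logarithms sidesteps the fact that the numbers themselves are far too large to represent directly.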
At its core, the theory explains the chemical part of evolution, starting from very short and simple molecules capable of replication up to the first cells. To do so, the idea of Darwinian selection is transferred to any kind of evolving system in general and macromolecules in particular by identifying the necessary conditions for Darwinian selection:

• Metabolism: Any individual species must be built up from energy-rich matter which is transformed into energetically lower states, i.e., the system must be open and far from equilibrium.
• Self-reproduction: Only by means of self-reproduction can concurrent behavior and selection emerge.
• Mutation: New information can only be generated by a process of self-replication with errors.

Considering s different species i ∈ {1,...,s} and their concentrations x_i, under the assumptions of metabolism, self-reproduction, and mutability Eigen formulated rate equations for the dynamical behavior of species (see [Eig71], p. 476):

    \dot{x}_i = (A_i Q_i - D_i)\, x_i + \sum_{k \neq i} \psi_{ik}\, x_k + \frac{x_i}{\sum_{k=1}^{s} x_k}\, \Phi_0 .     (1.1)

Here Q_i denotes the quality factor and is equal to the fraction of error-free copies created by replication. A_i is a functional term summarizing concentrations of energy-rich fundamental substances, and the whole term A_i Q_i x_i is interpreted as the build-up term resulting from self-replication. On the contrary, D_i denotes a limiting effect on the lifetime of states, such that the term -D_i x_i incorporates processes of destruction. Altogether, both terms describe the metabolism, proportionality to x_i includes self-reproduction, and Q_i describes mutability of the species. Additionally, x_i may grow by errors that occur during reproduction processes of members of another class k ≠ i. For all such classes different from i this effect is summarized in the term \sum_{k \neq i} \psi_{ik} x_k, where \psi_{ik} can be interpreted as transition probability from class k to class i.
Finally, growth and shrinking processes of the total number of individuals are included in a flow \Phi_0, which enters the equation in proportion to the relative concentration of species i.

Under the assumption of a constant overall organization of the system described by the set (1.1) of s differential equations, it can be transformed further. Constant overall organization means buffering the concentrations A_k of energy-rich substances, such that A_k = const. Furthermore, the total size of the system is limited²⁶, i.e.:

    \sum_{k=1}^{s} x_k = c = const .     (1.3)

Then, in order to remain at a constant population size and defining E_k = A_k - D_k, the excess productivity \sum_{k=1}^{s} E_k x_k must be compensated by transportation through the flow \Phi_0, such that

    \Phi_0 = - \sum_{k=1}^{s} E_k x_k .     (1.4)

By defining an average excess productivity

    \bar{E}(t) = \frac{\sum_{k=1}^{s} E_k x_k}{\sum_{k=1}^{s} x_k} ,     (1.5)

equations (1.1) can be transferred to

    \dot{x}_i = (W_i - \bar{E}(t))\, x_i + \sum_{k \neq i} \psi_{ik}\, x_k ,     (1.6)

where W_i = A_i Q_i - D_i characterizes the selective value of a species i. The equations (1.6) are inherently nonlinear, since \bar{E}(t) includes the variables x_i. Explicit solutions of (1.6) have been obtained by Eigen under certain conditions, the most important one being constant overall organization. He presented solutions for the case of completely neglected information flow into and out of mutant copies (i.e., \psi_{ik} x_k = 0) as well as for the case of an approximate consideration of error production, i.e., single-digit defects.

The average productivity \bar{E}(t) provides a self-adjusting threshold value which reflects the self-organization of the system. Only those species having selective values W_i above the threshold \bar{E}(t) will grow, consequently shifting \bar{E}(t) to higher values until an optimum of \bar{E}(t) is reached, representing the maximum selective value of all species. This is characterized as an extremum principle of the form

    \lim_{t \to \infty} \bar{E}(t) = \lambda_{\max} ,     (1.7)

where \lambda_{\max} denotes the maximum eigenvalue of the s × s matrix W = (\psi_{ik}) of transition probabilities between the different classes (\psi_{ii} = W_i)²⁷. The largest eigenvalue \lambda_{\max} will in most cases equal the maximum diagonal coefficient W_m, up to a second-order perturbation term of the form \sum_{k \neq m} \psi_{mk} \psi_{km} / (W_m - W_k). The selection criterion, which would allow growing of a new species m to become the dominant one, is given by

    W_m > \bar{E}_{k \neq m} = \frac{\sum_{k \neq m} E_k x_k}{\sum_{k \neq m} x_k} ,     (1.9)

or, including the more exact terms from perturbation theory:

    W_m > \bar{E}_{k \neq m} - \sum_{k \neq m} \frac{\psi_{mk} \psi_{km}}{W_m - W_k} .     (1.10)

The currently dominant species together with its stationary distribution of mutants emerging from this species are called quasi-species. The concentration of the dominant species (wild-type) itself may be relatively low, but in combination with the mutant distribution the emergent quasi-species is dominant. Altogether, the dynamical shift of \bar{E}(t) toward the largest eigenvalue allows for coexistence of species as well as growth of new mutants.

An important question concerning the selective creation of information asks for the correlation between parameters of the system and the amount of information which can be created. To investigate this, Eigen assumes a number of l single symbols in the genome and a mean copying accuracy \bar{q} per symbol. Then only a fraction \bar{q}^l of correct copies is produced in each generation, i.e., Q_m = \bar{q}^l for the wild-type. For Q_m > \sigma_m^{-1} as a lower bound, using |\ln \bar{q}| \approx 1 - \bar{q} for 1 - \bar{q} \ll 1 (\bar{q} \approx 1), one obtains a maximum length l_{\max} such that the information can be preserved by reproduction:

    l_{\max} = \frac{\ln \sigma_m}{1 - \bar{q}} .     (1.11)

In the simplest case \sigma_m is the ratio of the wild-type reproduction rate to the average reproduction rate of the rest.

²⁶This condition also implies a conservation law for the error copies, since any error copy created from one species must enter another species of the system, i.e.:

    \sum_{i=1}^{s} A_i (1 - Q_i)\, x_i = \sum_{i=1}^{s} \sum_{k \neq i} \psi_{ik}\, x_k .     (1.2)
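The dynamics of equations (1.6) and the extremum principle (1.7) can be illustrated by direct numerical integration. All rate constants below are invented for illustration, and E_k is identified with W_k (i.e., Q_k ≈ 1) when computing the average productivity; the average should climb toward the largest selective value:

```python
def simulate(W, psi, x, dt=0.001, steps=20000):
    # Euler integration of dx_i/dt = (W_i - E(t)) x_i + sum_{k!=i} psi_ik x_k,
    # with E(t) taken as the x-weighted average of the selective values W_i.
    s = len(x)
    E = 0.0
    for _ in range(steps):
        E = sum(W[i] * x[i] for i in range(s)) / sum(x)
        dx = [(W[i] - E) * x[i]
              + sum(psi[i][k] * x[k] for k in range(s) if k != i)
              for i in range(s)]
        x = [max(x[i] + dt * dx[i], 0.0) for i in range(s)]
    return x, E

W = [1.0, 2.0, 4.0]              # selective values W_i (made-up)
psi = [[0, 0.01, 0.01],          # small mutational couplings psi_ik (made-up)
       [0.01, 0, 0.01],
       [0.01, 0.01, 0]]
x, E = simulate(W, psi, [1.0, 1.0, 1.0])
print(E)  # close to max(W) = 4: the third species dominates the quasi-species
print(x)  # the inferior species survive only at low mutational equilibrium levels
```

The small residual concentrations of the inferior species illustrate the quasi-species notion: the wild-type dominates, but its mutant cloud never vanishes as long as the couplings \psi_{ik} are nonzero.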
The exact form of \sigma_m can be calculated from (1.10). Calculation of an approximation from (1.9) yields

    \sigma_m = \frac{A_m}{D_m + \bar{E}_{k \neq m}} .     (1.12)

Equality (1.11) defines an absolute upper bound for the stable reproduction of information and is valid for any kind of information transfer process. Experimental results clearly demonstrated that this form of macromolecular evolution is not able to facilitate the development of self-reproducing entities longer than at the most hundred nucleotide bases. Furthermore, concerning the error rate, Eigen summarized the following observations:

• Error rates too small cause a very small rate of progress of the evolution process.
• Error rates too large cause destruction of information; it "melts" away.
• Optimal evolution conditions are found just below the error rate which causes information to be destroyed.

Up to now we have discussed systems capable of Darwinian selection in its most general case. Self-reproduction implies a rate equation of the principal form²⁸ dN/dt = kN and therefore exponential population growth, which in a system of limited population size leads to the emergence of Darwinian selection. Having a sufficiently large selective advantage, any new species created by mutation can in principle grow and become the dominant species at any time²⁹.

In contrast to this, Eigen's concept of a hypercycle leads to a mechanism of selection which is in fact a "winner-take-all" selection. A hypercycle denotes a cyclically coupled arrangement of reproductive units, e.g. a catalytic cycle³⁰. Then, population growth follows the rate equation dN/dt = kN² and is therefore hyperbolic. Under such circumstances, selection happens only once in case of a limited system.

²⁷Starting from equation (1.6), a transformation of variables is possible, which yields a new system of equations:

    \dot{y}_i(t) = (\lambda_i - \bar{E}(t))\, y_i(t) .     (1.8)

Here \bar{E}(t) = c^{-1} \sum_{k=1}^{s} \lambda_k y_k(t), c = \sum_{k=1}^{s} x_k = \sum_{k=1}^{s} y_k = const, and \lambda_i denotes the ith eigenvalue of the matrix W.
As soon as one species has grown up, no growth of new, advantageous mutants is possible. This is not a Darwinian selection, and it does not allow for diversity of species.

Altogether, we can very briefly summarize the statements of the theory concerning the emergence of life starting from biological macromolecules, confirmed by a variety of experiments (see [Eig71], p. 511 ff.), as follows:

(1) The first phase is characterized by coexistent evolution of molecular units capable of self-reproduction according to the principle of Darwinian selection. This is limited by the error rates to the emergence of nucleotide segments having lengths below one hundred bases.

(2) Only by means of a hypercyclic composition can the system consisting of self-replicating subunits be integrated and stabilized as a functional unit. The integrated system is capable of self-replication of molecules which are by large factors longer than the chains of each subunit.

(3) Hypercyclic selection optimizes the system which has emerged first, thereby excluding all alternatives. As a result of non-Darwinian, hypercyclic selection, one universal genetic code is produced.

(4) A compartmentalization process of the hypercycle (i.e., the hypercycle escapes into a system which is saved from any "pollution" caused by unfavorable mutations) allows for the exploitation of genotypic advantages, leading to a joining together of all replicative units the hypercycle consists of. Finally, the first biological cells emerge.

²⁸ Here N denotes the number of individuals in the population.
²⁹ Stable coexistence is possible within a size-limited system if population growth is linear, i.e., Ṅ = k. Under certain conditions, this is compatible with Darwinian selection [Eig76].
³⁰ A similar catalytic cycle, the so-called carbon cycle, presumably constitutes the driving force of the chemical evolution of stars (see e.g. [Jan84], pp. 135–137).
(5) Darwinian evolution leads to the development of the known variety of species.

Though this theory can never explain the precise route of one particular evolution (e.g. our own), it provides insight into the general principles of selection and evolution at the molecular level. Computer experiments were performed by Nowak and Schuster in order to investigate the error threshold above which information dissolves [NS89]. They used a population of strings formed over an alphabet of two symbols instead of the quaternary nucleotide base alphabet, and assigned a fixed replication constant larger than one to the master sequence. Each other possible sequence was assigned a replication rate of one. Under these conditions, which describe a simple single-peak fitness landscape, Nowak and Schuster performed simulations for every value of the copying accuracy q, which was chosen to be identical for both symbols and each location in the sequence (i.e., a single parameter q). Sequences were grouped into error-classes according to the number of bits in which they differed from the master sequence, and each simulation was run until a stationary state representing the quasi-species was reached. The error threshold can then be identified as a sharp transition (comparable to a phase transition) from a localized quasi-species to a random drift through sequence space, reducing the concentration of the wild-type to its expectation value under the assumption of a uniform random search. Using a birth and death model, they obtained an approximate analytical expression for the dependence of the error threshold on population size c_N, sequence length l, and selective advantage σ_m of the master sequence, which in the limit of large populations amounts to

q_min = σ_m^{-1/l} · (1 + ε) ,  (1.13)

where ε > 0 is a small correction term. This expression is more exact than the lowest-order approximation

q_min = σ_m^{-1/l}  (1.14)

already derived by Eigen [Eig71].
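The single-peak experiment just described can be reproduced in a few lines. The following sketch (all parameter values and names are illustrative, not taken from [NS89]) evolves a finite population of bitstrings with fitness-proportional selection and per-symbol copying accuracy q, i.e., a Genetic Algorithm without crossover:

```python
import random

def master_fraction(l=20, sigma=5.0, q=0.99, pop_size=200, gens=100, seed=1):
    """Quasi-species toy model: the master sequence (all zeros) replicates
    sigma times faster than every other string; each symbol is copied
    correctly with probability q. Returns the final master concentration."""
    rng = random.Random(seed)
    master = (0,) * l
    pop = [master] * pop_size
    for _ in range(gens):
        # fitness-proportional (single-peak) selection
        weights = [sigma if ind == master else 1.0 for ind in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # per-symbol mutation with rate 1 - q
        pop = [tuple(b if rng.random() < q else 1 - b for b in p)
               for p in parents]
    return sum(ind == master for ind in pop) / pop_size
```

For l = 20 and σ = 5, the lowest-order threshold (1.14) lies near q_min ≈ 0.92: running the sketch with q = 0.99 keeps the wild-type concentrated, while q = 0.85 lets it drift away, illustrating the sharp transition.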
As we will see more clearly in chapter 2 when discussing Genetic Algorithms, the findings from Eigen's theory as well as empirical and theoretical results on error thresholds have much in common with Genetic Algorithms. Essentially, the experiment performed by Nowak and Schuster can be interpreted as a Genetic Algorithm without crossover, working on a very simple fitness landscape. Findings on the importance of the error threshold turn out to be valid in principle also in other Evolutionary Algorithms. This can be summarized in the intuitive idea that mutation should not destroy structures more quickly than they reproduce (but if this occasionally happens, the whole process may be started once again).

1.2 Evolutionary Algorithms and Artificial Intelligence

In Computer Science, Artificial Intelligence (AI) has from the very beginning always been one of the most fascinating and, rather quickly, most frustrating research branches. To give a definition of Artificial Intelligence is not as easy as one might expect, but my favorite is still the short sentence by Rich (see [Ric88], p. 1):

Artificial Intelligence is the study of how to make computers do things at which, at the moment, people are better.³¹

The definition captures the attempt to make computers "intelligent" in a sense coming very close to human intelligence³². Rapid progress was expected at the beginning of Artificial Intelligence in about 1950, but the research results did not come up to expectations in many cases. Nowadays, some researchers in Artificial Intelligence propose to orient toward imitation of the much more restricted capabilities of less complex animals such as flies or ants. That is why the definition of Artificial Intelligence cited above can today possibly be improved by replacing the word "people" by "living beings." The modern subfield of

³¹ Reprinted by permission of McGraw-Hill, Inc. from E. Rich and K.
Knight: Artificial Intelligence, copyright © McGraw-Hill, Inc., 1991.
³² More specifically, individual intelligence is meant here in contrast to collective intelligence.

Artificial Intelligence which has emerged mainly from such a shift of demands is called Artificial Life (AL). Although the term "life" indicates a high-level approach including intelligence, it is essentially a bottom-up approach based upon models of simple living entities. The relation between Evolutionary Algorithms and Artificial Life will be discussed later on.

According to the definition given above, Artificial Intelligence is a many-faceted research field, working on a variety of different tasks such as game playing, theorem proving, forecasting, general problem solving, perception (mainly vision and speech, closely related to natural language understanding), automatic programming, and machine learning in general. The latter topic, machine learning, is one of the most important subfields of Artificial Intelligence. From the beginning of research on Artificial Intelligence until the present, the idea that something called "intelligent" should be able to improve its behavior based upon the growing experience it gathers when performing the same (or a similar) action repeatedly, and also to develop a concept of what a mistake is and how to avoid it during repeated actions, has been central to any efforts towards machine learning. The characterization given by Michalski is again only one of several possibilities, but it includes the most important ideas, especially when relations between machine learning and Evolutionary Algorithms are discussed (see [Mic86], p. 10):

Learning is constructing or modifying representations of what is being experienced.
The definition concentrates on an internal representation which the learning system (a human being or a computer program) constructs and modifies of its environment (typically, what is experienced by the learning system is the environment or at least a certain part of it). Classical AI research concentrates on the use of symbolic representations based upon a finite number of representation primitives and rules for the manipulation of symbols. Together, representation primitives and manipulation rules form a formal system and therefore provide a universal model of computation. However, the problem of whether or not thinking in general is identical to computation and hence restricted to partially recursive calculations opens a more philosophical discussion far beyond the scope of this work.

The symbolic period of Artificial Intelligence can be dated approximately from 1962 until 1975, followed by a knowledge-intensive period from 1976 until about 1988 which emphasized a large amount of knowledge incorporated into the learning systems. However, symbolic representations, e.g. using predicate logic, semantic nets, or frames, were one central characteristic also of this period. Currently, the field of Artificial Intelligence is starting to spread research into a variety of directions and tries to integrate different methods into large-scale systems, thus combining their advantages as far as possible.

But the earliest, so-called subsymbolic period of Artificial Intelligence dates from about 1950 until 1965. This period did not rely on symbolic but on subsymbolic (numerical) representations of knowledge. Evolutionary Algorithms make use of a subsymbolic representation of knowledge encoded in the genotypes of individuals, and in fact some early approaches of Evolutionary Algorithms from the subsymbolic period are known and will be discussed in section 1.4.
Here, only the other early subsymbolic knowledge representation example of Artificial Neural Networks (ANN), based upon a simple model of the central nervous systems of higher animals, is mentioned. Knowledge in Neural Networks is distributed over connection weights of the edges between vertices (called units) in a graph structure. Units loosely correspond to nerve cells (neurons), edges to dendrites and the axon. Both make up the connections to other nerve cells, and weights can be seen as an analogy to synaptic connection strengths. During learning the weight values are incrementally updated.

The early approaches by Rosenblatt (Perceptron, [Ros58]), Selfridge (Pandemonium, [Sel59]), and Widrow (Adaline, [Wid62]) are the starting points of the modern research field of Artificial Neural Networks. However, it took a long time to overcome the damage caused by the result of Minsky and Papert [MP69] demonstrating the extremely restricted computational capabilities of simplified perceptron-like network structures. They showed that a perceptron with only one layer of output units and one layer of input units is unable to calculate the logical exclusive-or function. The result stopped research in this field for nearly twenty years, until it was sufficiently well known that the problem can be solved by introducing an additional layer of units (the utility of adding layers in a neural network was already known in the 1970s). Today, Artificial Neural Networks are the subject of many studies and have been applied successfully to a variety of problems in Artificial Intelligence, e.g. pattern recognition, natural language understanding, and data classification. For an overview of applications, the book by Dayhoff [Day90] is instructive.
Returning to the representational difference between subsymbolic and symbolic Artificial Intelligence, some remarks on both techniques are important in order to understand why the representation turns out to be a critical choice within programs in Artificial Intelligence. Artificial Intelligence had its main success when the problem domain could be treated as an abstract microworld which is disconnected from the world at large. That is why game playing has gained much attention from the beginning of research in this field³³. On the contrary, classical Artificial Intelligence systems suffer from a very narrow application domain and their brittleness in case of slight modifications of the tasks they are built for. Examples include the problems with generalization and inductive inferences in rule-based systems and unforeseen modifications of the environment in case of blocks-world programs.

³³ Samuel's checkers playing program is a famous and successful approach to this task, which was written in the early days of Artificial Intelligence research, i.e. in the subsymbolic period [Sam59]. He used a scoring polynomial for calculating a quality measure for moves and board positions. The scoring function was dynamic insofar as the weight factors within the scoring polynomial were updated in a learning process, based upon success or failure of the moves generated by the program. Of course, assignment of responsibility for a successful game to certain moves is by no means trivial; this credit assignment problem is still a critical point. Furthermore, from a set of possible terms in the scoring polynomial only half of the total number was
The syntactical details are important reasons for the brittleness problem, and the representational bias of symbolic Artificial Intelligence systems can hardly be overestimated.

In the field of Artificial Intelligence the concept of adaptation is closely related and sometimes used synonymously to learning. Remembering the discussion from a biological point of view presented in section 1.1, it can be recognized that adaptation has a more general meaning than learning: Learning refers to a property of individuals that is experienced during their lifetime, while adaptation can take place both in individuals when confronted with unknown environments during their lifetime and in a population of individuals over several generations on the basis of genotype changes. Furthermore, it is coupled with an improvement of the performance of individuals in their environment, which is a more ambitious property than some simple forms of learning have to fulfil.

As an abstraction from the biological background, adaptation can be interpreted as a goal-oriented successive progress of improvement of structures in order to give better performance in their environment. This is well in accord with the fitness interpretation of phenotypic adaptation (section 1.1) and gives evidence to handle adaptation and optimization as denoting identical concepts from different disciplines.

To clarify the position of Evolutionary Algorithms within the variety of machine learning strategies, we use a classification which is directly based upon the underlying learning strategy and its complexity with respect to the inference process. The classification is essentially a combination of those presented by Michalski, Carbonell, and Mitchell [CMM84] and the later variant by Michalski [Mic86]:

• Rote learning: No inference processes take place. Instead, direct implantation of knowledge is performed.

in use for evaluation at the same time.
In order to test also the inactive terms, random exchange of scoring terms between the active and the inactive set took place sometimes. We explain some details here because Samuel's work cannot only be seen in the light of game playing, but can also be interpreted as an early Evolutionary Algorithm. Changes in weights and polynomial term structure can be interpreted as mutation processes, and evaluation of moves on the basis of success or failure within the total game provides the selection criterion. This way, a scoring polynomial of high checkers playing fitness is evolved. However, in a second paper on the checkers playing problem, Samuel no longer used these "evolutionary" techniques [Sam67]. Instead, he improved performance by completely relying on tree pruning techniques (alpha-beta pruning) and used a "book learning" procedure (parameter adjustments based on replaying a large number of games played by checkers masters).

• Learning by instruction: This term denotes knowledge acquisition from a teacher or from an organized source and integration with existing knowledge. Mainly selection and reformulation of information are performed.

• Learning by deduction: Deductive, truth-preserving inferences and memorization of useful conclusions are summarized by this term.

• Learning by analogy: The transformation of existing knowledge that bears strong similarity to the desired new concept into a form effectively useful in the new situation.

• Learning by induction: Inductive inferences.

• Learning from examples (concept acquisition): Based upon a set of examples and counterexamples, the task is to induce a general concept description explaining all positive examples and excluding all negative examples.
• Learning by observation and discovery (descriptive generalization, unsupervised learning): Search for regularities and general rules explaining all or at least most observations in absence of any teacher who provides feedback.

The learning strategies are ordered with respect to an increasing complexity of the inference mechanisms used. The classification includes Evolutionary Algorithms as an example of an unsupervised learning technique, i.e. inductive learning by observation and discovery. The following reasons can be identified for this characterization of Evolutionary Algorithms as learning algorithms:

• No teacher exists who presents examples, counterexamples or even knowledge to the learning system. Instead, the algorithm generates examples on its own.

• The creation of new examples (search points) by the algorithm is an inductive guess on the basis of existing knowledge. If the guess proves its worth, it is kept in the knowledge base (the population); otherwise it is discarded by means of selection.

So far, we have discussed Evolutionary Algorithms with respect to their learning characteristics and therefore related them to Artificial Intelligence as an example of AI programs. Indeed, in many cases Artificial Intelligence tasks can be reduced to the problem of performing a heuristic search within a search space of vast size. The structures within this space can be relatively complex, e.g. permutations, graphs, or game-playing strategies. The search is guided by a heuristic function defined by the researcher who develops the AI program, and by the heuristics that are incorporated into the search algorithm. The purpose of the heuristic function is to determine the desirability of the structures tested by the algorithm and thereby to prune the search tree. It is obvious to see the possibility for using Evolutionary Algorithms as a heuristic search procedure in AI programs.
Many such applications of Evolutionary Algorithms have been reported in the literature, searching spaces of different complexity, e.g. production rule spaces (classifier systems), game strategy spaces, and program spaces. The clear advantage of Evolutionary Algorithms is their universal applicability to almost any kind of structure space, in contrast to the specificity of classical AI search strategies (e.g. the A*-algorithm, which is essentially a best-first search on graphs).

Finally, concluding this brief survey of the relations between Artificial Intelligence and Evolutionary Algorithms, we return to the modern Artificial Life [Lan89] subfield of AI. Artificial Life research concentrates on computer simulations of simple hypothetical life forms (life-as-it-could-be, or the synthetical approach, to use the terminology of Langton) and the problem of how to make their behavior adaptive (e.g. adaptability of simple robots to unforeseen situations is desired). Furthermore, self-organizing properties emerging from local interactions within a large number of simple basic agents are investigated. These agents usually at the same time cooperate to solve a problem and compete for a set of limited resources. Basically, the agents' actions can be executed asynchronously, in parallel. Analogies to natural systems can be drawn on a variety of different levels, including particles, cells, organs, individuals, and populations. In many cases the agents are equipped with internal rules or strategies determining their behavior, and an Evolutionary Algorithm is used for evolving these strategies. Artificial Life research is still in its starting period, and no clearly defined boundaries can be identified. However, further development of AL will possibly lead to a better understanding of the nature of life and of emergent, self-organizing behavior in general.
1.3 Evolutionary Algorithms and Global Optimization

Referring again to the adaptive surface metaphor, we can imagine an extremely complex, unknown functional dependence which maps genomes (i.e., words consisting of letters from the alphabet of nucleotide bases) to fitness measures judging phenotypical expressions of genotypes. During evolution, genotypes producing phenotypes of increasing biological fitness are created by means of the processes mutation and recombination on the genotype and selection on the phenotype. This way, an optimization of fitness takes place, and even if we assume a constant adaptive surface that is not changed according to the positions of individuals themselves, the combination of mutation and recombination allows in principle for leaving a smaller hill of the landscape and therefore prevents evolution from getting stuck on suboptimal hills.

This very simplified point of view provides the basis for the idea of using a simulated evolutionary process for the purpose of solving an optimization problem, where the goal is to find a set of parameters (which might be interpreted as a "genotype" as well as a "phenotype") such that a certain quality criterion is maximized or minimized. Problems of this type have an enormous significance in many fields of research and industrial production, e.g. in computer-aided design and construction, biological, chemical, electrical, and medical engineering, production planning, and Artificial Intelligence (see section 1.2)³⁴.

In a very general form, the main goal of the global optimization problem is summarized in the following definition ([TZ89], p. 1 f.):

Definition 1.1 (Global minimum) Given a function f : M ⊆ ℝⁿ → ℝ, M ≠ ∅, for x* ∈ M the value f* := f(x*) > −∞ is called a global minimum, iff

∀x ∈ M : f(x*) ≤ f(x) .  (1.15)

Then, x* is a global minimum point, f is called objective function, and the set M is called the feasible region.
The problem of determining a global minimum point is called the global optimization problem.

Recently, Kursawe has demonstrated that Evolutionary Algorithms can in principle be extended to solve multiple criteria decision making (MCDM) problems, where f is of the more general form f : M ⊆ ℝⁿ → ℝᵏ, M ≠ ∅, k > 1 [Kur91, Kur92]. For such problems, a set of non-dominated solutions (the Pareto set) exists, such that the quality of a solution can be improved with respect to a single criterion only by becoming worse with respect to at least one other criterion. By incorporating biological concepts like diploidy, dominance, and recessivity into an Evolution Strategy, Kursawe's algorithm is able to generate solutions covering the Pareto set, extending the capabilities of traditional methods which yield just one point of the Pareto set. In the following, we will restrict attention to the single-criterion case, i.e., k = 1.

Furthermore, we will concentrate on minimization problems for providing a standardized point of view for algorithmic test runs. This does not restrict generality of the optimization problem, since the identity

max{f(x) | x ∈ M} = −min{−f(x) | x ∈ M}  (1.16)

holds. The feasible region M is specified more closely in the next definition ([GMSW89], p. 197):

Definition 1.2 (Constraints) Let M := {x ∈ ℝⁿ | g_j(x) ≥ 0 ∀j ∈ {1,...,q}} be the feasible region of the objective function f : M → ℝ. The functions g_j : ℝⁿ → ℝ are called constraints, and at a point x ∈ ℝⁿ a constraint g_j is called

satisfied  :⇔ g_j(x) ≥ 0 ,
active     :⇔ g_j(x) = 0 ,
inactive   :⇔ g_j(x) > 0 , and   (1.17)
violated   :⇔ g_j(x) < 0 .

The global optimization problem is called unconstrained, iff M = ℝⁿ; otherwise, constrained.

³⁴ An annotated bibliography of applications of Evolutionary Algorithms provides an impressive overview of their potential [BHS92].
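The four cases of definition 1.2 translate directly into a small predicate. The sketch below is my own illustration (the names and the floating-point tolerance are not from the text):

```python
def constraint_status(g, x, tol=1e-9):
    """Classify the constraint g(x) >= 0 at point x (definition 1.2).
    A small tolerance stands in for exact equality in floating point."""
    v = g(x)
    if v < -tol:
        return "violated"
    if v <= tol:
        return "active"    # satisfied, and binding: g(x) = 0
    return "inactive"      # satisfied with strict inequality: g(x) > 0

def feasible(constraints, x, tol=1e-9):
    """x lies in M iff every constraint is satisfied (active or inactive)."""
    return all(constraint_status(g, x, tol) != "violated" for g in constraints)
```

For example, with g(x) = 1 − x₁² − x₂² (the unit disc), interior points make g inactive, boundary points make it active, and exterior points violate it.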
In section 2.1.5 we will outline that a simple method to handle inequality constraints (it is easy to see that inequality constraints of the form g_j(x) ≤ 0 can be transformed into the form used in definition 1.2) in Evolutionary Algorithms works by repeating the creation of a new solution x until all constraints are satisfied. Except for a first attempt by Michalewicz and Janikow for linear constraints, general equality constraints g_j(x) = 0 are currently not taken into account by Evolutionary Algorithms [MJ91]. The method indicated for inequality constraints cannot be extended to this case, because the new solution will surely be infeasible if it is not generated in a very clever way (which, in general, may require solving an additional optimization problem per constraint).

In general, the objective function topology shows not only one, but several minima of different depths, divided by higher regions. Most optimization methods, starting by chance in the region of attraction³⁵ of one of the minima, are able to approach just this minimum, in spite of the fact that it might not be the deepest of all these local minima, i.e., a global one. To formalize the notion of a local minimum, a distance measure or metrics is needed for an arbitrary vector space.

Definition 1.3 (Metrics) Let V be a vector space. A metrics on V is a mapping ρ : V² → ℝ, such that ∀u, w, z ∈ V:

ρ(u, w) = 0 ⇔ u = w ,
ρ(u, w) = ρ(w, u) ,   (1.18)
ρ(u, w) ≤ ρ(u, z) + ρ(z, w) .

However, it is also sufficient if instead of a metrics only a norm is defined in case of a vector space V over the real numbers (or, more generally, the complex numbers). Informally, the norm of a vector can be seen as a measure of its length.

Definition 1.4 (Norm) Let V be a vector space over ℝ. A norm on V is a mapping ‖·‖ : V → ℝ₊, such that ∀u, w ∈ V, r ∈ ℝ:

‖u‖ = 0 ⇔ u = 0 ,
‖r·u‖ = |r| · ‖u‖ ,   (1.19)
‖u + w‖ ≤ ‖u‖ + ‖w‖ .

³⁵ By which we mean, informally, the set of all points from which a monotonic sequence of down-hill steps leads to the minimum point.
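The norm axioms of definition 1.4, and the distance a norm induces, can be spot-checked numerically. A minimal sketch (helper names are mine):

```python
import math

def norm(u):
    """Euclidean norm of a real vector."""
    return math.sqrt(sum(ui * ui for ui in u))

def dist(u, w):
    """Metrics induced by the norm: rho(u, w) = ||u - w||."""
    return norm([a - b for a, b in zip(u, w)])

# Spot-check the triangle inequality ||u + w|| <= ||u|| + ||w||:
u, w = (3.0, 4.0), (-1.0, 2.0)
triangle_ok = norm([a + b for a, b in zip(u, w)]) <= norm(u) + norm(w)
```

Any such norm makes V a metric space via `dist`, which is exactly the implication stated next in the text.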
It is easy to verify that any normed vector space is also a metric space by setting

ρ(u, w) = ‖u − w‖ .  (1.20)

Of course, the inversion of this implication does not hold. Usually, when topologies of the space ℝⁿ are considered, the Euclidean norm is presupposed, i.e., ∀x = (x₁,...,xₙ) ∈ ℝⁿ:

‖x‖ = ( Σ_{i=1}^{n} x_i² )^{1/2} .  (1.21)

Then, the metrics ‖x − y‖ yields the common definition of Euclidean distance. Based on these definitions, the meaning of a local minimum is put in concrete terms as follows:

Definition 1.5 (Local minimum) For x̂ ∈ M the value f̂ := f(x̂) is called a local minimum, iff

∃ε > 0 : ∀x ∈ M : ‖x − x̂‖ < ε ⇒ f(x̂) ≤ f(x) .

In a very natural way the discussion of global optimization turned from continuous variables to discrete ones by introducing the idea of a grid search technique. Generally, optimization problems with discrete object variables are called combinatorial optimization problems. The definition of a global optimum does not of course undergo changes when combinatorial optimization problems are discussed, but for local minima the notion of an ε-environment turns into the concept of neighborhoods, based on suitably defined metrics. Both ε-environments and neighborhoods serve to characterize sets of points which are "close" (according to the metrics) in search space to a given point, but the neighborhood is more general. Following Papadimitriou and Steiglitz ([PS82], p. 7), we define a neighborhood as follows:

Definition 1.6 (Neighborhood) A neighborhood, defined on a set S, is a mapping N : S → 2^S, where 2^S denotes the power set of S.

Based on neighborhood structures, the definition of a local minimum can now be carried over to discrete spaces.
Definition 1.7 (Local minimum in discrete spaces) For x̂ ∈ M, where M ≠ ∅ denotes an arbitrarily defined feasible region, and a neighborhood N : M → 2^M, the value f̂ := f(x̂) is called a local minimum with respect to N, iff

∀x ∈ N(x̂) : f(x̂) ≤ f(x) .

[Fig. 1.7: Autocorrelation function of binary sequences (n = 12).]

The task is to find a permutation π ∈ S_n = {σ : {1,...,n} → {1,...,n} | σ bijective} such that the objective function (the tour length) f : S_n → ℝ, where

f(π) = Σ_{i=1}^{n−1} ρ_{π(i),π(i+1)} + ρ_{π(n),π(1)} ,  (1.29)

attains its minimum. Again, the size of the search space grows exponentially depending on n, the number of cities, since there are

(n − 1)!/2 ≈ (1/2) · (2π/n)^{1/2} · (n/e)ⁿ  (1.30)

possible tours (the start position is arbitrary, and the tour order may be inverted). Although good approximation algorithms for the TSP have been developed (see [PS82], p. 410 ff.), the problem still offers high attractiveness for applying new algorithms such as evolutionary ones. This is mostly due to the following reasons:

• The problematic of the TSP can be understood easily, since it comes very close to a popular real-world problem.³⁹

• The TSP serves as the simplest case of a variety of ordering problems which are of enormous relevance to industrial scheduling processes, e.g. flow-shop scheduling and job-shop scheduling.

• Several "standard" TSP data sets are available from the literature, e.g. Krolak's 100-city problem [KFM71], such that results are comparable even if the global optimum is not yet known definitely.

[Fig. 1.8: TSP visualization (n = 30).]

³⁹ This is why David E. Goldberg in his lecture held during the First International
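Objective (1.29) is a one-liner once city coordinates and the Euclidean distance are given. A small sketch (the city data are invented for illustration):

```python
import math

def tour_length(perm, cities):
    """Tour length (1.29): distances along the permutation, plus the
    closing leg from the last city back to the first (index i+1 mod n)."""
    n = len(perm)
    legs = (math.dist(cities[perm[i]], cities[perm[(i + 1) % n]])
            for i in range(n))
    return sum(legs)

# Four cities on a unit square: visiting them in corner order gives length 4.
square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
```

Crossing the square diagonally instead (e.g. the order 0, 2, 1, 3) lengthens the tour to 2 + 2·√2, illustrating how f discriminates between tours.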
Concerning computational complexity, the TSP as an NP-complete problem is known to be a representative of a large class of problems for which (as is widely believed, but not proved until now) no deterministic polynomial-time algorithm exists which solves the problem. This informal notion will be put in concrete terms in section 1.3.2. Following an idea of Schwefel⁴⁰ for mapping permutation problems like the TSP to a continuous representation x ∈ [u,v]ⁿ; u,v ∈ ℝ, the resulting topology of a TSP can be visualized. From x = (x₁,...,xₙ) ∈ [u,v]ⁿ a permutation π can be obtained by simply sorting the vector components, which yields a new vector x' = (x_{π(1)},...,x_{π(n)}) such that x_{π(1)} ≤ ... ≤ x_{π(n)}.

Since r_i(a)·r_j(a) ≥ 0 ∀a ∈ ℝⁿ, the second term of the equation always yields a positive contribution to the Hessian ∇²f(a)⁴¹. However, the first term may have positive or negative sign; thus in total the entries of the Hessian may be positive, zero, or negative, indicating convex or concave⁴² bendings of f(a). In other words, f(a) might be multimodal. Only in case of a model linear in the a_i, i.e.

g(z) = Σ_{i=1}^{n} a_i r_i(z) ,  (1.33)

the Hessian is positive, implying a unimodal function f(a). Then, ∇f(a) = 0 is a necessary and sufficient criterion for the optimum. Otherwise, in the more general case, global optimization algorithms are needed. Figure 1.9 shows a representative three-dimensional topology plot of an objective function emerging from a nonlinear parameter estimation problem⁴³. The multimodal topology is smooth, reflecting the continuous differentiability of the problem.

⁴¹ The gradient ∇f(a) = (∂f/∂a₁, ..., ∂f/∂aₙ)ᵀ is the vector of partial derivatives of f (∇ is called the nabla operator). The Hessian ∇²f(a) = (∂²f/(∂a_i ∂a_j)) ∀i,j ∈ {1,...,n} is a matrix consisting of all second-order partial derivatives. (·)ᵀ denotes matrix transposition, and I is a unit matrix of appropriate size (here: n × n).
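Schwefel's sorting-based mapping from a real vector to a permutation, described above, can be sketched in one line (the function name and sample values are mine):

```python
def decode_permutation(x):
    """Map a real vector x in [u, v]^n to a permutation pi by ranking its
    components: pi lists the indices of x in ascending order of value, so
    that x_{pi(1)} <= x_{pi(2)} <= ... <= x_{pi(n)}."""
    return tuple(sorted(range(len(x)), key=lambda i: x[i]))

# Any real-valued variation of x now induces a move in permutation space:
tour = decode_permutation([0.31, 0.07, 0.92, 0.54])   # indices ranked by value
```

With this decoding, an Evolutionary Algorithm can mutate and recombine ordinary real vectors while the objective function is evaluated on the induced tours.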
Several applications of Evolutionary Algorithms to the parameter estimation problem are reported in the literature. Johnson and Husbands applied a Genetic Algorithm to adjust parameters of a model describing the inflow-outflow dependence of a circular water tank with corrugated sides [JH91]. Frankhauser used an Evolution Strategy for parameter estimation of a master-equation based nonlinear model of interurban migration processes ([Fra92], p. 39). Fogel demonstrated the application of Evolutionary Programming to system identification problems including parameter estimation and finding a reasonable model as well. The latter requires the incorporation of a measurement for model complexity into the objective function in order to search for the simplest (so-called ARMA) model fitting the data ([Fog91], pp. 157–190).

This brief survey of the many-faceted world of instantiations of global optimization problems was intended to give an impression of the general problem complexity on the one hand and the relevance of global optimization on the other hand. Caused by the existence of a huge number of still unsolved practical problems, researchers have developed a variety of different algorithmic methods to tackle them, Evolutionary Algorithms being only a small group of them. More traditional approaches to global optimization will be a (brief) topic of the next section.

1.3.1 Some Traditional Methods

The large number of existing global optimization methods makes it difficult to classify them adequately. This is clearly demonstrated by Törn

⁴² Details on quadratic forms, convex and concave functions, and positive definiteness (positive semi-definiteness) are omitted, since they are not relevant for the global optimization methods described here and may be found in any textbook of linear algebra.

⁴³ For reasons of clarity the graphic shows a plot of −f, i.e., the objective function has to be maximized then.
and Žilinskas, who discuss six existing classifications before presenting their own (see [TZ89], pp. 16-19). [Footnote 44: The location of the global optimum is indicated in figure 1.9 by the broken lines.]

Organic Evolution and Problem Solving

Fig. 1.9: Typical topology of a nonlinear parameter estimation problem (n = 2).

Of course, neither their classification nor that of Zhigljavsky ([Zhi92], pp. 10-13) is used here. Instead, the following classification, taken from [Rud90], is more appropriate to the properties of Evolutionary Algorithms. The main distinction is made between volume-oriented and path-oriented methods, the latter group being subdivided further into prediction methods and exploration methods. Volume-oriented methods are based on the idea that the whole feasible region must be scanned, implying the requirement of a restricted search space of finite volume, while the concept of a path-oriented method is to follow a path in the feasible region, starting from an arbitrary point or from the best point known so far. Prediction methods use an explicit internal model of the objective function to predict the next steps, while exploration methods do not possess an explicit internal model. Usually, the latter test different paths without requiring each trial to be successful, but discard a path if its inappropriateness is confirmed. Some classical representatives of these classes are grid search (see e.g. [Sch77], pp. 32-33), Monte-Carlo strategies (see e.g. [Sch77], pp. 108-110; [Zhi92], pp. 77-80), and cluster algorithms (see e.g. [TZ89]) as volume-oriented representatives, tunneling methods (see e.g. [TZ89], pp. 61-62) as path-oriented prediction methods, and pattern search (see e.g. [Sch77], pp. 54-58) or the method of rotating coordinates (see e.g. [Sch77], pp. 58-63) as path-oriented exploration methods.
As we will see later, Evolutionary Algorithms combine all these features and cannot be assigned specifically to one of the categories given here. They fit best into the group of path-oriented exploration methods, but to a lesser degree they are also predicting as well as volume-oriented methods. The character of the algorithms changes during the course of the optimization process and can also be controlled by exogenous parameters.

Turning to theoretical results about global optimization methods, a common misunderstanding is discussed first which was previously indicated at the beginning of the discussion on molecular Darwinism (section 1.1.4). It concerns the Monte-Carlo method or uniform random search, which is often misinterpreted as an evolutionary method. But evolution has a memory in parent genomes, and offspring are never generated independently from parents, as is the case in uniform random search. To formulate and analyze the algorithm, a volume measure for bodies in n-dimensional Euclidean space is required. The Lebesgue-measure provides such a generalized volume concept.

Definition 1.9 (Lebesgue-measure)
Let A_i ⊂ IR^n denote a system of pairwise disjoint subsets of IR^n (A_i ∩ A_j = ∅ ∀ i ≠ j). The Lebesgue-measure μ is the σ-additive measure on IR^n, i.e. μ(∪_i A_i) = Σ_i μ(A_i) for any such system, which assigns to each n-dimensional interval its elementary volume.

Definition 1.10 (Convergence with probability one)
A sequence x(1), x(2), ... of random vectors converges to the global optimum point x* with probability one iff

    P{ lim_{k→∞} x(k) = x* } = 1 .    (1.35)

Algorithm 1, uniform random search, simply samples the points x(1), x(2), ... independently from the uniform distribution on the feasible region M. The following convergence theorem for uniform random search, including the proof, is cited from Zhigljavsky's book (see [Zhi92], pp. 78-79).

Theorem 2
Let f : M ⊆ IR^n → IR for a Lebesgue-measurable feasible region M. Then, the sequence x(1), x(2), ... of random vectors generated by algorithm 1 converges to x* with probability one.

Proof: For arbitrary ε > 0, it is

    P{x(k) ∈ U_ε(x*)} = 1 − (1 − μ(U_ε(x*))/μ(M))^k ,    (1.36)

i.e., P{lim_{k→∞} x(k) = x*} = 1. QED.

To put this into a more concrete form, we assume the feasible region M to be a hypersphere of radius R. Furthermore, U_ε(x*) is a hypersphere of radius ε, centered around x*. The Lebesgue-measure of these sets amounts to
    μ(M) = π^{n/2} R^n / Γ(n/2 + 1)    (1.37)

and μ(U_ε(x*)) = π^{n/2} ε^n / Γ(n/2 + 1), respectively (where Γ denotes the Gamma function), i.e., their volume ratio is

    μ(U_ε(x*)) / μ(M) = (ε/R)^n .    (1.38)

Thus, in order to reach at least a probability p* to hit U_ε(x*),

    p* = 1 − (1 − (ε/R)^n)^k    (1.39)

results, or, after solving for the number k of trials:

    k = ln(1 − p*) / ln(1 − (ε/R)^n) .    (1.40)

Using the approximation ln(1 + x) ≈ x for x ≪ 1, we obtain

    k ≈ −ln(1 − p*) · (R/ε)^n ,    (1.41)

which again clearly demonstrates the exponential growth of computation time depending on n. Indeed, it can be shown for p* > 0.63 that uniform random search performs worse than grid search (see [Sch77], pp. 108-109). This result is only due to not preventing repeated sampling of the same points.

The main problem with the uniform random search algorithm is given by the restriction to a constant probability distribution. A more general form of the algorithm allows for the construction of a new probability distribution at each iteration. The new distribution may or may not depend on previous trial results generated by the algorithm. This generalized algorithm includes, as we will demonstrate later, certain versions of Evolutionary Algorithms as well as the uniform random search algorithm. Furthermore, it incorporates the random search algorithm by Solis and Wets and provides the basis for the corresponding global convergence result [SW81] and the result presented by Pintér [Pin84]. The description of the global random search algorithm is based again on Zhigljavsky's book ([Zhi92], p. 85).

[Footnote 44: Γ(x) = ∫_0^∞ t^{x−1} exp(−t) dt.] [Footnote 45: Brooks misinterpreted uniform random search by overlooking the fact that the volume ratio depends on n, consequently arriving at the result that the number of search steps is independent of n [Bro58]. This was recognized and corrected by Hooke and Jeeves [HJ58].]
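A short numerical check of estimate (1.41) makes this exponential growth concrete (the choice R/ε = 10 and the function name are our own illustration):

```python
import math

def trials_needed(p_star, R, eps, n):
    """Estimate (1.41): number k of uniform random trials needed so that
    an eps-ball inside a radius-R hypersphere is hit with probability
    p_star, using k = -ln(1 - p_star) * (R / eps)^n."""
    return -math.log(1.0 - p_star) * (R / eps) ** n

# With R/eps = 10 and p* = 0.9, k grows tenfold with every added dimension:
for n in (2, 5, 10):
    print(n, round(trials_needed(0.9, R=10.0, eps=1.0, n=n)))
```

Already for n = 10 the estimate exceeds 10^10 trials, which is why uniform random search is hopeless beyond toy dimensions.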
Algorithm 2 (Global random search)
    t := 1;
    choose a probability distribution p_t on M;
    while not terminate do
        sample {x_1(t), ..., x_{N_t}(t)} from p_t;
        evaluate {f(x_1(t)), ..., f(x_{N_t}(t))};
        construct p_{t+1} according to a fixed rule;
        t := t + 1;
    od

Without loss of generality, ∀t : N_t = 1 can be assumed. Then, the following convergence theorem taken from [Zhi92] (p. 88) holds [Footnote 46: The notation used by Zhigljavsky in his book in English is not always clear. It is sometimes more useful to have a look at the mathematical notation used in a book written by Zhigljavsky and Žilinskas, although the book is in Russian [ZZ91]. For the theorem discussed here, p. 124 of the latter book was very helpful.]:

Theorem 3
Let f be continuous in the vicinity of x* and assume that

    ∀ε > 0 : Σ_{t=1}^∞ q_t(ε) = ∞ ,    (1.42)

where

    q_t(ε) = inf P{x(t) ∈ U_ε(x*)} ,    (1.43)

the infimum being taken over all possible realizations of the preceding sampling process. Then for any δ > 0 the sequence of random vectors x(1), x(2), ... generated by algorithm 2 with ∀t : N_t = 1 falls infinitely often into the set L_{f*+δ} with probability one.

Proof: Fix δ > 0 and find ε = ε(δ) > 0 such that U_ε(x*) ⊆ L_{f*+δ}. Determine the sequence of independent random variables {χ_t} on the two-point set {0,1} such that

    P{χ_t = 1} = 1 − P{χ_t = 0} = q_t(ε) .    (1.44)

Then, P{x(t) ∈ U_ε(x*)} ≥ P{χ_t = 1}, and the theorem is proved if one can show that {χ_t} infinitely often takes the value one. The latter follows from equation (1.42) and Borel's zero-one law (see [Rén77], pp. 326-328, p. 342), which completes the proof. QED.

Theorem 3 does not in general imply global convergence with probability one according to definition 1.10. Although the set L_{f*+δ} is sampled infinitely often, the global minimum may also be "lost" again by the algorithm as long as no countermeasure to prevent such losses is added. The simplest countermeasure is to accept x(t) only if it improves the objective function value monotonously (i.e., f(x(t)) < f(x(t − 1))),
but more complicated methods are possible. Nevertheless, theorem 3 is useful and will be referred to in chapter 2, as it provides the basis of some known proofs of global convergence with probability one. Due to its generality, the theorem no longer allows for an estimation of the computational complexity of the algorithm, as was possible for uniform random search. In general, we have to assume exponential complexity in n.

These algorithms together with their convergence theorems conclude this section on global optimization and its mathematical background as far as related to Evolutionary Algorithms. Disregarding the emphasis on a similarity to the biological model, Evolutionary Algorithms can be interpreted as global random search techniques and therefore may be categorized as an instance within a variety of techniques proposed and analyzed by researchers. Overviews of these techniques are given in both books extensively referenced in this section, i.e., [TZ89] and [Zhi92]. But the reverse interpretation does not hold; there is no reason to agree with Törn and Žilinskas, who reduce Evolutionary Algorithms to nothing more than a notational convention, writing (see [TZ89], p. 74):

    Even the simplest random search algorithm may be interpreted in terms of biological evolution. Generating a random trial point is analogous to mutation and the step towards the minimum after a successful trial is a selection.

Evolutionary Algorithms, on the contrary, try to benefit from looking at nature and modeling concepts gleaned from biological evolution, as these have obviously proven to be useful, but this depends somehow on one's "world model." It is this interpretation of the life surrounding us which helps to develop better stochastic search algorithms which are widely applicable and at the same time relatively efficient.
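Algorithm 2 with N_t = 1 and the elitist acceptance rule just described can be sketched as follows (a minimal illustration; the Gaussian proposal distribution, the test function, and all names are our own choices, not from the text):

```python
import random

def global_random_search(f, x0, sigma=1.0, steps=1000, seed=1):
    """Global random search with N_t = 1 and elitist acceptance: sample
    one point per iteration from p_t (here, illustratively, a Gaussian
    centered on the best point so far) and keep it only if it improves f."""
    rng = random.Random(seed)
    best, f_best = list(x0), f(x0)
    for _ in range(steps):
        x = [xi + rng.gauss(0.0, sigma) for xi in best]   # sample from p_t
        f_x = f(x)
        if f_x < f_best:               # accept only monotone improvements
            best, f_best = x, f_x      # p_{t+1} is recentered on the new best
    return best, f_best

# Minimizing the sphere function f(x) = sum of x_i^2:
x, f_x = global_random_search(lambda v: sum(t * t for t in v), [5.0, -3.0])
```

Rebuilding p_{t+1} from the current best point is exactly the "fixed rule" of algorithm 2; the acceptance test is the countermeasure that turns "infinitely often in L_{f*+δ}" into monotone progress.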
Those who interpret uniform random search in terms of biological evolution do wrong by nature in assuming her to use disproportionately silly mechanisms.

1.3.2 Computational Complexity of Global Optimization

Within this section some important results on the computational complexity of global optimization problems will be summarized. These results will be helpful as they allow estimation of the limits of what can be achieved by trying to solve global optimization problems using a random search technique or, more specifically, an Evolutionary Algorithm.

For a more formal treatment of an algorithm, the concept of a Turing machine is used. A Turing machine is the abstract machine capable of computing any function f : A* → A* for which an algorithmic description can be given. The alphabet A is a finite set of symbols, and A* denotes the set of all finite strings of symbols from A, including the empty string.

[Footnote 47: Reprinted by permission of Springer-Verlag and the authors from A. Törn, A. Žilinskas: Global Optimization, Lecture Notes in Computer Science 350, p. 74, copyright © Springer-Verlag Berlin Heidelberg 1989.]

This equivalence of Turing machines (and, consequently, a number of different formal systems which are known to be computationally equivalent with Turing machines) and algorithms, known as Church's thesis, is the basis of modern theoretical computer science. A Turing machine is surprisingly simple, shown schematically in figure 1.10. It consists of a two-way infinite tape, made up of tape squares each of which is allowed to contain just one of a finite set T of tape symbols and which are labeled by integer numbers. A read-write head is able to look up and manipulate the tape, one tape square at each time step. The read-write head is controlled by the "program" of the Turing machine, the finite state control, which is defined by a transition function δ.

Fig.
1.10: Schematic diagram of a Turing machine.

Definition 1.11 (Deterministic Turing machine)
A deterministic Turing machine (DTM) is an eight-tuple

    DTM = (T, A, b, Q, q_0, q_Y, q_N, δ)    (1.45)

where T is a finite set of tape symbols, A ⊂ T is a subset of input symbols, b ∈ T − A is a distinguished blank symbol, and Q denotes a finite set of states, including a start state q_0 ∈ Q and two different halt states q_Y ∈ Q and q_N ∈ Q.

    δ : (Q − {q_Y, q_N}) × T → Q × T × {−1, +1}    (1.46)

is the transition function.

Informally, the DTM works as follows: The machine receives an input string s ∈ A* on certain subsequent fields of the tape, say fields 1, ..., |s|, the rest of the tape containing blank symbols. The computation starts in state q_0 with the read-write head placed on field number 1. A state transition of the form

    δ(q, s_1) = (q', s_2, Δ)    (1.47)

indicates that, being in state q and reading symbol s_1 ∈ T with its read-write head, the machine operates by overwriting s_1 by s_2 ∈ T, moving the read-write head one field to the left (right) if Δ = −1 (Δ = +1), and going into state q' ∈ Q. If the new state q' = q_Y or q' = q_N, computation ends, the answer being "yes" in the first, "no" in the second case. Usually notated in the form of a transition table, δ provides the finite control ("program") of the machine.

Based on this working mechanism, all input strings s ∈ A* to a deterministic Turing machine can be classified according to the acceptance criterion provided by the distinguished halt state q_Y.

Definition 1.12 (Recognized language of a DTM)
A DTM D with input alphabet A accepts s ∈ A* iff D halts in state q_Y when applied to s. The language L_D recognized by D is

    L_D = {s ∈ A* | D accepts s} .    (1.48)

Most critical for assessing the efficiency of an algorithm is the running time needed for performing the calculation. Very naturally, the basic time unit of a Turing machine is one execution step (1.47) of the transition function.
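The transition mechanism of definition 1.11 is easy to simulate directly. The following sketch (the machine and all names are our own illustration, not from the text) runs a DTM over A = {0, 1} that accepts exactly the strings containing at least one 1:

```python
def run_dtm(delta, s, q0="q0", qY="qY", qN="qN", blank="b"):
    """Execute a deterministic Turing machine given by a transition table
    delta: (state, symbol) -> (new state, written symbol, head move)."""
    tape = {i + 1: c for i, c in enumerate(s)}   # fields 1..|s| hold input s
    q, head = q0, 1
    while q not in (qY, qN):
        sym = tape.get(head, blank)              # unvisited fields are blank
        q, tape[head], move = delta[(q, sym)]
        head += move
    return q == qY                               # halting in qY means "yes"

# Scan right; halt in qY on the first 1, in qN on the blank after the input:
delta = {("q0", "0"): ("q0", "0", +1),
         ("q0", "1"): ("qY", "1", +1),
         ("q0", "b"): ("qN", "b", +1)}
```

Here run_dtm(delta, "0010") yields True and run_dtm(delta, "000") yields False; the number of loop iterations is exactly the running time measured in steps (1.47).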
Then, an informal notion of the time complexity of a Turing machine is as follows:

Definition 1.13 (Time complexity function of a DTM)
For a DTM D that halts for all inputs s ∈ A*, the time complexity function t_D : Z^+ → Z^+ is

    t_D(n) = max{m | ∃ s ∈ A*, |s| = n : the time to accept s by D is m} .    (1.49)

D is called a polynomial time DTM iff there exists a polynomial p such that ∀n ∈ Z^+ : t_D(n) ≤ p(n).

The basic idea is to distinguish between algorithms requiring at most polynomial computation time and those requiring more than polynomial (i.e., exponential) time. While the former are generally interpreted to be tractable, a problem is interpreted as intractable if all deterministic algorithms to solve it are of at least exponential time complexity [Footnote 48: But bear in mind that for small problem sizes (i.e., small values of n), an exponentially time bounded algorithm can be more efficient than a polynomial one.]. Therefore, the class P of languages formally captures the idea of polynomial time algorithms or tractable problems.

Definition 1.14 (P)

    P = {L | There is a polynomial time DTM D : L = L_D}    (1.50)

In contrast to a deterministic Turing machine, a nondeterministic one (NDTM) captures the intractable problems in the sense explained above. An NDTM is almost identical to a DTM, except that the transition function now allows for an arbitrary choice out of a set of possible transitions from a current state and input symbol.

Definition 1.15 (Nondeterministic Turing machine)
A nondeterministic Turing machine (NDTM) is an eight-tuple

    NDTM = (T, A, b, Q, q_0, q_Y, q_N, δ)    (1.51)

where T, A, b, Q, q_0, q_Y, q_N are defined as in the deterministic case and

    δ : (Q − {q_Y, q_N}) × T → 2^{Q × T × {−1,+1}}    (1.52)

maps a state/symbol pair to a set of possible actions.

An alternative but completely equivalent formulation of the idea is presented in [GJ79] (pp.
28-31): An NDTM is a DTM which is extended by a "guessing" module, which writes a string s ∈ A* to the tape in an arbitrary, nondeterministic manner, possibly never halting. Then the normal DTM part runs on the guessed string, either accepting it or not accepting it (i.e., performing a check whether the guess was right). This way, whenever a solution to the problem is guessed, it is assured that the NDTM can check the solution in polynomial time.

Definition 1.16 (Recognized language of an NDTM)
An NDTM D with input alphabet A accepts s ∈ A* iff at least one of the infinite number of possible computations of D on s halts in state q_Y. The language L_D recognized by D is

    L_D = {s ∈ A* | D accepts s} .    (1.53)

Now, since there may be several distinct computations of D on input s leading to acceptance, the time to accept s is defined as the minimum time over all accepting computations of D on s.

Definition 1.17 (Time complexity function of an NDTM)
The time complexity function t_D : Z^+ → Z^+ of an NDTM D is given by

    t_D(n) = max{m | ∃ s ∈ L_D, |s| = n : the time to accept s by D is m} .    (1.54)

Finally, the class NP is defined as follows:

Definition 1.18 (NP)

    NP = {L | There is a polynomial time NDTM D : L = L_D}    (1.55)

It is obvious from the definitions that any decision problem solvable by a deterministic polynomial time algorithm is also solvable by a nondeterministic polynomial time algorithm, i.e., P ⊆ NP. Furthermore, it is widely believed, but not proved, that P ≠ NP, because nobody has been able to find a deterministic polynomial time algorithm for the hardest problems in NP.

What is still needed is to establish the connection between optimization problems, decision problems, and accepted languages. Rather than going too much into detail about the second connection, it is sufficient to recognize that an encoding of a problem instance as a word taken from a finite alphabet is always possible (see [GJ79], pp. 18-21, for further details). Concerning the formulation of optimization problems as decision problems, the pseudoboolean optimization problem f : IB^n → Z (restricting here the possible range of values to Z) is cited here from the work of Hart and Belew as a decision problem [HB91]:

Definition 1.19 (Pseudoboolean optimization problem)
Given a string encoding integers n and c and a DTM D_f which computes a function f : IB^n → Z in polynomial time, the decision problem of pseudoboolean optimization is: Does there exist an x ∈ IB^n such that f(x) > c?

In any case, a decision problem can be no harder to solve than the corresponding optimization problem. Hart and Belew also show the NP-completeness of this decision problem, a term which, to express it informally, denotes the hardest problems in NP. If only one NP-complete problem could be solved in polynomial time, then all problems in NP could be solved in polynomial time. The reason is that for an NP-complete problem L ∈ NP all other problems in NP can be mapped to L by a polynomial transformation.

Definition 1.20 (Polynomial transformation)
A polynomial transformation from a language L_1 ⊆ A_1^* to a language L_2 ⊆ A_2^* is a function h : A_1^* → A_2^* such that:
(1) There is a polynomial time DTM program that computes h.
(2) ∀ s_1 ∈ A_1^* : s_1 ∈ L_1 ⇔ h(s_1) ∈ L_2.
The notation L_1 ∝ L_2 denotes the existence of a polynomial transformation from L_1 to L_2.

Definition 1.21 (NP-completeness)
A language L is NP-complete iff L ∈ NP and

    ∀ L' ∈ NP : L' ∝ L .    (1.56)

Polynomial transformation is a transitive relation, such that for demonstrating that a language L_2 is NP-complete it suffices to show L_2 ∈ NP and to find a polynomial transformation from a known NP-complete language L_1 ∈ NP to L_2. Of course, there must exist a first NP-complete problem for which NP-completeness has to be shown explicitly. This is the satisfiability problem for boolean expressions, and its NP-completeness is the subject of Cook's theorem (e.g. [AHU74], pp.
379-383).

According to the exponential growth of the search space as discussed in section 1.3.1, it is not surprising that the decision problem to check whether a given feasible solution of a smooth, nonconvex nonlinear optimization problem is not a local minimum is also NP-complete. This result implies that any global optimization problem that goes beyond a very low complexity is an NP-complete problem [MK87].

In addition to NP, Hart and Belew need the class RP for deriving their results. RP is a class of languages defined almost identically to NP, but with the exception that for acceptance of s ∈ A* at least half of all computations halt in state q_Y (compare definition 1.16). It is known that P ⊆ RP ⊆ NP, and it is widely believed that RP ≠ NP. The main result presented by Hart and Belew states that no single nondeterministic algorithm exists which is able to approach the global optimum of arbitrary functions f : IB^n → Z to a certain accuracy ε in an efficient way, i.e., in time polynomial in n [HB91]:

Theorem 4
If RP ≠ NP, then no nondeterministic polynomial time algorithm A can guarantee that ∀ f : IB^n → Z:

    f* − A(f) < ε    (1.57)

for a constant ε ∈ IR, ε > 0. Here f* denotes the globally optimal objective function value, and A(f) is used to denote the result algorithm A yields when applied to problem f.

Practically, this is a big disappointment. Though it was known in advance that there does not exist a deterministic polynomial time algorithm capable of exactly solving arbitrary pseudoboolean optimization problems, there was some hope that efficient nondeterministic algorithms, e.g. Evolutionary Algorithms, might exist to do so. This is not the case, however. Even worse, no efficient nondeterministic approximation algorithm for arbitrary pseudoboolean optimization problems exists!

Intuitively, such a result can be explained by considering an objective function that consists of a flat plateau that contains a very sharp hole where the minimum is located (e.g.
a pseudoboolean function that yields identical function values for all but one argument vector, which is sought after). Assuming such a flat plateau, no method other than those requiring exponential time is able to locate the minimum. Since theorem 4 in principle also includes problems of this kind, the result is quite understandable. In realistic applications, however, it is a reasonable assumption that the structure of the optimization problem provides more useful topological information for the algorithm and that we are not interested in locating solutions that are sharply isolated in flat regions ("needle in a haystack").

1.4 Early Approaches

More than thirty years ago, during the subsymbolic period of Artificial Intelligence, some researchers presented first attempts to model natural evolution as a method for searching for good solutions of problems defined on vast search spaces. These spaces are clearly too large for complete enumeration, particularly as only very restricted computer power was available at that time. The approaches reported in the literature are briefly discussed here, because the application domains of automatic programming, sequence prediction, numerical optimization, and optimal control are even more interesting currently than at that time. Additionally, the basic algorithmic concepts used thirty years ago are still the basic techniques used in the modern algorithms (though currently extended by many other useful techniques and confirmed by some theoretical results).

The term automatic programming denotes the task of finding a program which calculates a certain input-output function. However, this task has to be performed by a computer program, i.e. an artificially intelligent program which is able to "write" and test computer programs. We will not discuss here all problems related to automatic programming, but an attempt towards evolving computer programs as performed by Friedberg et al.
in 1958 was surely much too early to have real success [Fri58, FDN59] (due to the restricted performance of the computers available at that time). The computer program to be evolved by Friedberg's Evolutionary Algorithm was binary encoded and was modified by instruction interchange and random changes of instructions. To perform these random changes, whenever a mutation occurred, a completely new instruction was chosen to substitute the mutated one. The simple program generation tasks were to find, starting from a random program, the identical mapping and the complement mapping on a single bit, a program calculating the sum of two input bits, and a program having an internal counter where the output should depend on the number of program executions. Since the modification methods of instruction interchange and random instruction changes were undirected, Friedberg introduced a "success number" for instructions. The success numbers indicated how well the instructions had served over thousands of previous trials, and the mutation rate depended on the success numbers of the instructions. This way, instructions which were successful in previous trials [Footnote 49: As we will see later on, such an evaluation scheme as used by Friedberg would nowadays be called a (1,1)-strategy.] were less often mutated than worse instructions. Not surprisingly, the success numbers proved to be useful for the evolution process. However, the complete evolution mechanism was found to perform worse than a pure random search, which was mainly due to the absence of an effective selection mechanism. The success numbers could not provide sufficient selective pressure because they needed many trials before representing the instructions' usefulness at least approximately correctly.
Furthermore, the approach measured the quality of a program by combining the binary feedback information from execution of the program (output bit is correct or incorrect) and the local quality information associated with each instruction. This is a questionable method for judging the quality of the program, since the fitness function is extremely discontinuous: small changes in program syntax usually cause large changes in the input-output behaviour of the program.

Concerning selective pressure, Friedberg proposed to use a mechanism to test several different programs created by random instruction changes and instruction interchanges from the current one and to choose the best of the new programs as the next starting point. Such a selection mechanism comes close to the modern ones discussed in chapter 2, where it will be denoted a (1,λ)-strategy (assuming λ descendants are created from the ancestor). Friedberg could not implement it due to memory restrictions of the IBM 704 computer he used for the experiments.

Bremermann's work was more oriented towards optimization [Bre62]. He used discrete as well as continuous object variables, and a discrete mutation mechanism also for continuous variables, since the work was initially closely oriented to the knowledge about the discrete nature of the genetic code. Since he knew about Friedberg's disappointing results, Bremermann assumed the necessity of choosing relatively simple optimization problems and restricted the experiments to linear and convex programming. For binary object variables he correctly argued that multiple mutations, extremely unlikely events, are necessary to overcome "points of stagnation". Furthermore, he presented an estimation of the optimal mutation probability, which was the value 1/l (l being the number of bits an individual consists of) [BRS65]. This value is still in use as a general heuristic, although most people who use it are not aware of its originator.
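Bremermann's 1/l heuristic is easily made concrete; the following sketch flips each bit of an l-bit individual independently (a minimal illustration, with the function name and example our own):

```python
import random

def mutate(bits, rate=None):
    """Flip each bit independently with probability rate; the default
    rate 1/l is Bremermann's heuristic for an l-bit individual."""
    if rate is None:
        rate = 1.0 / len(bits)
    return [b ^ (random.random() < rate) for b in bits]

# With the default rate, one bit of the offspring is flipped on average:
offspring = mutate([0, 1, 1, 0, 1, 0, 0, 1])
```

With rate 1/l the expected number of flipped bits per offspring is exactly one, while multiple simultaneous flips, which Bremermann identified as necessary to escape stagnation points, remain possible but rare.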
In order to overcome the problem of stagnation points, Bremermann invented the idea of creating a number of descendants from an ancestor by means of mutation and then reducing a subset of these descendants to a new, single ancestor of the next generation by mating techniques. The subset consists of the best solutions occurring among the descendants. In particular, he used an averaging process in case of continuous object variables, but he also tried exchanges of information between individuals in more complicated forms of the algorithm. However, most of the results were disappointing, and the algorithms had to incorporate additional heuristics to converge toward a solution. [Footnote 50: Every programmer will confirm the statement from personal experience.] From a lot of practical experience with applications of Evolutionary Algorithms it can be concluded that Bremermann's application to linear and convex programming used a much too simple problem domain, where Evolutionary Algorithms cannot compete with the variety of specialized optimization techniques.

While both approaches presented so far have, due to reasons indicated in the discussion given above, failed to provide an efficient instrument for solving a problem from the actual domain chosen by the author, Fogel presented successful results from another approach to evolutionary computation in 1964 [Fog64, FOW66]. This approach is called Evolutionary Programming. In contrast to Bremermann, Fogel, Owens, and Walsh used a more complicated application domain, and in contrast to Friedberg they provided a real selective pressure as well as a reasonably graded quality function, which was able to reward small improvements of the quality of the evolving objects. For measuring the quality of an individual in the framework of a sequence prediction problem, the sequence generated by the individual was compared to a target sequence, and the percentage of correct letters determined the individual's fitness.
An individual was a symbolic representation of the transition table of a finite-state machine (FSM), and by means of mutation the single offspring individual could differ from its ancestor either by an output symbol, a state transition, the number of states, or the initial state. After testing the performance of the offspring FSM, the better individual of ancestor and offspring became the ancestor of the next step. The worse one was discarded [Footnote 51: Such a selection scheme would presently be called a (1+1)-strategy.]. Similar to Bremermann, Fogel, Owens, and Walsh argued that such a search mechanism might become trapped in a local optimum. Therefore, they used a population in order to solve this problem, and they realized a population-based algorithm that created an offspring population from a parent population of the same size by mutating each individual. Selection then determined the best half of parents and offspring to form the next parent population [Footnote 52: This can be denoted a (μ+λ)-strategy, where μ is the population size.]. In addition, they also mentioned the ideas to permit multiple mutations and to use recombination as well as mutation.

Although their work avoided the mistakes made by Friedberg and Bremermann, it did not find a suitable acknowledgement in the following years. Instead, the idea of Evolutionary Algorithms received little attention until the beginning of the seventies, when Genetic Algorithms in the United States and Evolution Strategies in Germany were fully developed independently of each other. An overview of the history of these modern Evolutionary Algorithms will be presented in chapter 2.

It seems natural to assume that the disappointing results by Friedberg and Bremermann in combination with insufficient hardware power (which made planning and execution of programming projects a difficult task) caused both the ignorance of the work of Fogel, Owens, and
Walsh and the stagnation of development in the field of Evolutionary Algorithms. Goldberg presents a misjudgement when he gives his opinion on the work of Fogel, Owens, and Walsh (see [Gol89a], p. 106) [Footnote 53: David E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning (p. 106), © 1989 by Addison-Wesley Publishing Company, Inc. Reprinted by permission of the publisher.]:

    The evolutionary programming of Fogel, Owens, and Walsh with its random alteration of a finite-state machine and save-the-best selection was insufficiently powerful to search other than small problem spaces quickly.

He strongly emphasizes the importance of recombination over mutation, and especially the interaction of both operators and the selection technique will be one of the major topics to be discussed and analyzed in the remaining chapters of this book.

Finally, we mention the Evolutionary Operation (EVOP) approach as presented by Box for the first time in the late fifties [Box57, BD69]. Though this method emphasized the natural model of organic evolution by performing a mutation-selection process in the sense of a (1+λ)-strategy (where λ = 4 or λ = 8, the so-called 2² and 2³ factorial design method, respectively), it was basically intended to serve as an industrial management technique. This management technique provided a systematic way to test alternative production processes that result from small modifications of the standard parameter settings, this way leading to its stepwise improvement. The method was not intended to be realized as a computer algorithm, but if it were, it would probably resemble a (1+λ)-Evolution Strategy without a step-size adaptation mechanism (see section 2.1).

1.5 Summary

Within this first, introductory chapter we have discussed Evolutionary Algorithms from a wide range of different points of view, thus reflecting the enormous degree of interdisciplinarity related to this fascinating field of research.
Four important aspects of Evolutionary Algorithms were discussed in detail: the biological background (section 1.1), the relation to Artificial Intelligence (section 1.2), the relation to global optimization (section 1.3), and the computational complexity of the global optimization problem (section 1.3.2). A lot of background information and theoretical results collected together highlights the basic design of Evolutionary Algorithms and some of their general limitations imposed by the global optimization problem per se.

The connection between optimization and evolution turned out to be established by the adaptive landscape metaphor relating fitness to trait combinations. Though we did not discuss the biological background in full detail, the basic processes of transcription and translation, the genetic code, and the hierarchical structure of the genetic information were explained in section 1.1.1. In connection with meiotic heredity, the crossover mechanism occurring during the formation of gametes as well as the various forms of mutation events observable in organic evolution were presented in sections 1.1.2 and 1.1.3, thereby introducing the mechanisms responsible for genetic variation.

Section 1.1.4 served as an excursion to evolution processes on the lower level of biological macromolecules, explaining the existence of a unique genetic code for all forms of life on earth by the concept of hypercycles. Identifying mutation, self-reproduction, and metabolism as necessary conditions for selection to occur, Eigen's mathematical model yields important insights into the nature of the evolution process and the optimal error rate, which is just below the critical value that prevents stable reproduction of information.

⁵³David E. Goldberg, GENETIC ALGORITHMS IN SEARCH, OPTIMIZATION, AND MACHINE LEARNING (p. 106), ©1989 by Addison-Wesley Publishing Company, Inc. Reprinted by permission of the publisher.
A transfer of these results to simplified instances of Genetic Algorithms will become obvious in subsequent chapters.

Historically, Artificial Intelligence was the research field in which the first efforts towards problem solving with Evolutionary Algorithms were dealt with, i.e. automatic programming and sequence prediction. The impact of Evolutionary Algorithms on this important field of computer science was discussed in section 1.2. Evolutionary Algorithms are inductive learning algorithms that can serve as a powerful search method in many fields of Artificial Intelligence research, including Neural Networks, Classifier Systems, game playing, and Artificial Life.

In section 1.3 Evolutionary Algorithms were approached from the global optimization point of view, i.e. emphasizing the application problem and traditional methods for its solution. Three examples of global optimization problems representing different search spaces were presented in order to clarify the complexity and generality of the global optimization problem: the autocorrelation of binary sequences, the traveling salesman problem, and the general nonlinear parameter estimation problem. Besides introducing the basic terminology from global optimization, the uniform random search algorithm and a general global random search algorithm were discussed. While global convergence with probability one holds for both these algorithms, the former suffers from exponential time complexity. The latter is too general for time complexity analysis, but allows the transfer of the global convergence property to special variants of Evolutionary Algorithms.

Closely related, but from a computer science point of view, section 1.3.2 summarized some results on the computational complexity of global optimization problems. In general, global optimization is NP-complete, which is surely not surprising. More disappointing is a result for pseudoboolean problems indicating that even an efficient (i.e.
polynomial time) approximation algorithm for such problems does not exist. This may exclude only pathological problems from efficient approximation, but actually there is no theory available that supports such a classification of problems. Empirical results, however, confirm the assumption that practical applications allow for good approximations to be located by domain-dependent, specialized Evolutionary Algorithms.

Section 1.4 concludes the first chapter by mentioning the early approaches to use algorithms gleaned from the model of organic evolution for problem solving, pointing at the difficulties and mistakes that led to a relatively long period of stagnating research in this field.

    With my two algorithms, one can solve all problems — without error, if God will!
    Al-Khorezmi, 780–850 (in Science Focus, IBM, no. 1, 1981)

    Some problems are just too complicated for rational logical solutions. They admit of insights, not answers.
    Jerome Bert Wiesner (in D. Lang: Profiles; A Scientist's Advice II. New Yorker, 26th January 1963)

2 Specific Evolutionary Algorithms

In this chapter, an outline of an Evolutionary Algorithm is formulated that is sufficiently general to cover at least the three different mainstream algorithms mentioned before, namely, Evolution Strategies, Genetic Algorithms, and Evolutionary Programming. As in the previous chapter, algorithms are formulated in a language obtained by mixing pseudocode and mathematical notations, thus allowing for a high-level description which concentrates on the main components. These are: a population of individuals which is manipulated by genetic operators — especially mutation and recombination, but others may also be incorporated — and undergoes a fitness-based selection process, where the fitness of an individual depends on its quality with respect to the optimization task.
This is captured by the following definition:

Definition 2.1 (General Evolutionary Algorithm) An Evolutionary Algorithm (EA) is defined as an 8-tuple

    EA = (I, Φ, Ω, Ψ, s, ι, μ, λ)                                    (2.1)

where I = A_x × A_s is the space of individuals, and A_x, A_s denote arbitrary sets. Φ : I → ℝ denotes a fitness function assigning real values to individuals.

    Ω = {ω_{Θ1}, ..., ω_{Θz} | ω_{Θi} : I^λ → I^λ} ∪ {ω_{Θ0} : I^μ → I^λ}    (2.2)

is a set of probabilistic genetic operators ω_{Θi}, each of which is controlled by specific parameters summarized in the sets Θi ⊂ ℝ.

    s_{Θs} : (I^λ ∪ I^{μ+λ}) → I^μ                                   (2.3)

denotes the selection operator, which may change the number of individuals from λ or λ+μ to μ, where μ, λ ∈ ℕ and μ = λ is permitted. An additional set Θs of parameters may be used by the selection operator. μ is the number of parent individuals, while λ denotes the number of offspring individuals. Finally, ι : I^μ → {true, false} is a termination criterion for the EA, and the generation transition function Ψ : I^μ → I^μ describes the complete process of transforming a population P into a subsequent one by applying genetic operators and selection:

    Ψ    = s ∘ ω_{Θi1} ∘ ... ∘ ω_{Θij} ∘ ω_{Θ0}                      (2.4)
    Ψ(P) = s_{Θs}(Q ∪ ω_{Θi1}(...(ω_{Θij}(ω_{Θ0}(P)))...))

Here {i1, ..., ij} ⊆ {1, ..., z}, and Q ∈ {∅, P}.

The space of individuals may be arbitrarily complex, i.e. there are no restrictions on the structure of the sets A_x and A_s, though they will usually be relatively simple in the following. Even the fitness function Φ may include some intermediate calculation steps, one of those always being the evaluation of the objective function value which provides the basis of the fitness value. Whenever μ ≠ λ, the operator set includes a distinguished operator ω_{Θ0} : I^μ → I^λ which serves to change the population size by forming λ offspring individuals from μ parents. This change is taken back by selection, which performs a fitness-based change of population size back to μ individuals.
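Read operationally, definition 2.1 says that one generation is a composition of a size-changing operator, further genetic operators, and selection. The following toy instantiation in Python is a minimal sketch of Ψ(P) = s(Q ∪ ω1(ω0(P))); all names, the one-dimensional individuals, and the fixed mutation step are our illustrative choices, not part of the text:

```python
import random

random.seed(42)
MU, LAM = 3, 12
f = lambda a: a * a                       # objective function, to be minimized

def omega0(P):                            # size-changing operator, I^mu -> I^lam
    return [random.choice(P) for _ in range(LAM)]

def omega1(P):                            # one probabilistic genetic operator (mutation)
    return [a + random.gauss(0, 0.5) for a in P]

def s(pool):                              # selection: keep the best mu individuals
    return sorted(pool, key=f)[:MU]

def psi(P, use_parents=True):             # one generation: Psi = s o omega1 o omega0
    Q = P if use_parents else []          # Q in {empty set, P}
    return s(Q + omega1(omega0(P)))

P = [random.uniform(-10.0, 10.0) for _ in range(MU)]
for t in range(100):                      # iterate the generation transition
    P = psi(P)
```

With Q = P the sketch behaves like plus-selection, so the best objective function value never deteriorates over the population sequence P(0), P(1), ....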
While genetic operators are always probabilistic, selection may be probabilistic or completely deterministic. Both selection and genetic operators may be controlled by some exogenous parameters. The termination criterion ι may range from arbitrarily complicated criteria — e.g. genotypic or phenotypic diversity of the population, relative improvement of the best objective function value over subsequent generations — to rather simple ones, e.g. testing whether a prespecified number of generations is completed. A complete generation step, i.e. the transition from the actual parent population to the subsequent one, consists of the application of the genetic operators in a defined order, followed by selection. This is captured in the generation transition function Ψ, iterated application of which generates a population sequence:

Definition 2.2 (Population sequence) Given an Evolutionary Algorithm with generation transition function Ψ : I^μ → I^μ and an initial population P(0) ∈ I^μ, the sequence P(0), P(1), P(2), ... is called a population sequence or evolution of P(0)

    :⟺  ∀t ≥ 0 : P(t+1) = Ψ(P(t)) .                                  (2.5)

Creation of the initial population P(0) is discussed later in connection with the explanation of particular instances of Evolutionary Algorithms. Usually, P(0) is initialized at random, but it may also be generated from one (known) starting point. The stopping criterion ι characterizes the end of this artificial evolution process, and the result of a run of an Evolutionary Algorithm is in most cases the individual of minimal objective function value encountered during the complete evolution¹. This individual is not necessarily identical to the best one contained in the final population.

¹This is typical for problem solving applications where emphasis is put on finding best solutions.
The following definition summarizes this informal description of the running time and result of an Evolutionary Algorithm:

Definition 2.3 Given an initial population P(0) ∈ I^μ for an Evolutionary Algorithm with generation transition function Ψ, the running time t_EA is given by

    t_EA = min{t ∈ ℕ | ι(Ψ^t(P(0))) = true} .                        (2.6)

An individual

    â ∈ ⋃_{t=0}^{t_EA} Ψ^t(P(0))                                     (2.7)

is called the result of an Evolutionary Algorithm when applied to the initial population P(0), iff

    f(â) = min{f(ā) | ā ∈ ⋃_{t=0}^{t_EA} Ψ^t(P(0))} .                (2.8)

The genetic operators are characterized here as macro-operators that transform a complete population into another complete population. This high-level description is put into more concrete terms by specifying the mechanisms which lead to the creation of a new individual from one or more ancestors. Basically, the high-level operators are reduced to a description by low-level operators as indicated in the following definition:

Definition 2.4 (Asexual, sexual, panmictic genetic operators) A genetic operator ω_Θ : I^p → I^q is called

    asexual    :⟺  ∃ω′_Θ : I → I :
                   ω_Θ(ā1, ..., āp) = (ω′_Θ(ā1), ..., ω′_Θ(āp)) ∧ p = q ,

    sexual     :⟺  ∃ω′_Θ : I² → I :
                   ω_Θ(ā1, ..., āp) = (ω′_Θ(ā_{i1}, ā_{j1}), ..., ω′_Θ(ā_{iq}, ā_{jq})) ,
                   where ∀k ∈ {1, ..., q} : ik, jk ∈ {1, ..., p}     (2.9)
                   are chosen at random,

    panmictic  :⟺  ∃ω′_Θ : I^p → I :
                   ω_Θ(ā1, ..., āp) = (ω′_Θ(ā1, ..., āp), ..., ω′_Θ(ā1, ..., āp)) .

The definition characterizes the essential number of individuals taken into account by the operators. Mutation is an example of an asexual operator, while recombination is typically sexual (i.e., it involves two parent individuals), but may also be extended in some variants of Evolutionary Algorithms to a panmictic form (without any biological basis, see section 1.1.2). In the following, the symbols m and r are used to denote the high-level description of mutation and recombination, respectively, while m′ and r′ denote their asexual, sexual, or panmictic form.
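The three arities of definition 2.4 differ only in how many parents the low-level operator ω′ sees; each high-level operator can be obtained by "lifting" a low-level one. A minimal sketch, with hypothetical helper names of our own choosing and individuals as arbitrary Python objects:

```python
import random

def lift_asexual(op1, parents):
    """omega(a1,...,ap) = (omega'(a1), ..., omega'(ap)); here p = q."""
    return [op1(a) for a in parents]

def lift_sexual(op2, parents, q):
    """Each of the q offspring stems from two parents chosen at random."""
    return [op2(random.choice(parents), random.choice(parents))
            for _ in range(q)]

def lift_panmictic(opp, parents, q):
    """Each offspring may be constructed from the whole parent population."""
    return [opp(parents) for _ in range(q)]
```

For example, a low-level mutation m′(ā) lifted by `lift_asexual` yields the high-level operator m, while a two-parent recombination r′ lifted by `lift_sexual` yields r.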
The description given so far can be directly translated into a general algorithmic outline of an Evolutionary Algorithm. In algorithm 3, t denotes the generation counter, P(t) = {ā1(t), ..., āμ(t)} is the population at generation t, consisting of individuals āi ∈ I, and μ denotes the parent population size. Q ∈ {∅, P(t)} denotes an additional set of individuals (e.g. the parent population) that may be taken into account by selection.

Algorithm 3 (Outline of an Evolutionary Algorithm)

    t := 0;
    initialize P(0) := {ā1(0), ..., āμ(0)} ∈ I^μ;
    evaluate P(0) : {Φ(ā1(0)), ..., Φ(āμ(0))};
    while (ι(P(t)) ≠ true) do
        recombine:  P′(t) := r_{Θr}(P(t));
        mutate:     P″(t) := m_{Θm}(P′(t));
        evaluate P″(t) : {Φ(ā″1(t)), ..., Φ(ā″λ(t))};
        select:     P(t+1) := s_{Θs}(P″(t) ∪ Q);
        t := t + 1;
    od

In the following sections, Evolution Strategies, Evolutionary Programming, and Genetic Algorithms are presented within the general framework introduced so far. Since a unified notation has been introduced, similarities and differences of the algorithms can easily be identified. After giving a short overview of the history of each of these algorithms, their mechanisms of initialization, fitness evaluation, mutation, recombination, and selection are described. Furthermore, the basic theoretical results are summarized, stressing such results which are important in subsequent chapters. The chapter is concluded by a summarizing comparison of the algorithms with respect to their similarities and differences as well as their relation to the underlying model of organic evolution.

2.1 Evolution Strategies

Evolution Strategies are a joint development of Bienert, Rechenberg, and Schwefel, who did preliminary work in this area in the 1960s at the Technical University of Berlin (TUB) in Germany.
First applications were experimental and dealt with hydrodynamical problems like shape optimization of a bent pipe [Lic65], drag minimization of a joint plate [Rec65], and structure optimization of a two-phase flashing nozzle². Due to the impossibility to describe and solve such optimization problems analytically or by using traditional methods, a simple algorithmic method based on random changes of experimental setups was developed. In these experiments, adjustments were possible in discrete steps only, in the first two cases (pipe and plate) by changing certain joint positions and in the latter case (nozzle) by exchanging, adding, or deleting nozzle segments. Following observations from nature that smaller mutations occur more often than larger ones, the discrete changes were sampled from a binomial distribution with prefixed variance. The basic working mechanism of the experiments was to create a mutation, adjust the joints or nozzle segments accordingly, perform the experiment, and measure the quality criterion of the adjusted construction. If the new construction happened to be better than its predecessor, it served as the basis for the next trial. Otherwise, it was discarded and the predecessor was retained. No information about the amount of improvements or deteriorations was necessary. This experimental strategy led to unexpectedly good results both for the bent pipe and the nozzle. Schwefel was the first who simulated different versions of the strategy on the first available computer at TUB, a Zuse Z23 [Sch65], later on followed by several others who applied the simple Evolution Strategy to solve numerical optimization problems. Due to the theoretical results of Schwefel's diploma thesis, the discrete mutation mechanism was substituted by normally distributed mutations with expectation zero and given variance [Sch65].
The resulting two-membered ES works by creating one n-dimensional real-valued vector of object variables from its parent by applying mutation with identical standard deviations to each object variable. The resulting individual is evaluated and compared to its parent, and the better of both individuals survives to become the parent of the next generation, while the other one is discarded. This simple selection mechanism is fully characterized by the term (1+1)-selection.

For this algorithm, Rechenberg developed a convergence rate theory for n ≫ 1 for two characteristic model functions, and he proposed a theoretically confirmed rule for changing the standard deviation of mutations (the 1/5-success rule) [Rec73]. This strategy and the corresponding theory are discussed in section 2.1.7.

Obviously, the (1+1)-ES did not incorporate the principle of a population. A first multimembered Evolution Strategy or (μ+1)-ES having μ > 1 was also designed by Rechenberg to introduce a population concept. In a (μ+1)-ES, μ parent individuals recombine to form one offspring, which after being mutated eventually replaces the worst parent individual — if it is better (extinction of the worst). Mutation and adjustment of the standard deviation were realized as in a (1+1)-ES, and the recombination mechanisms will be explained in section 2.1.3. This strategy, discussed in more detail in [BHS91], was never widely used but provided the basis to facilitate the transition to the (μ+λ)-ES and (μ,λ)-ES as introduced by Schwefel³ [Sch75b, Sch77, Sch81a].

²This experiment is one of the first known examples of using operators like gene deletion and gene duplication, i.e. the number of segments the nozzle consisted of was allowed to vary during optimization.
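The (1+1)-ES with the 1/5-success rule described at the beginning of this section can be sketched in executable form. This is a minimal illustrative version, not the exact historical algorithm: the adaptation factor 0.85 is a commonly recommended choice, and the success-counting window and all function names are our own assumptions:

```python
import random

def one_plus_one_es(f, x, sigma=1.0, trials=3000, window=50, a=0.85):
    """(1+1)-ES: one parent, one offspring per trial, plus-selection.

    After every `window` trials the step size sigma is decreased by the
    factor a if fewer than 1/5 of those trials were successful, and
    increased by 1/a otherwise (the 1/5-success rule)."""
    fx = f(x)
    successes = 0
    for t in range(1, trials + 1):
        y = [xi + random.gauss(0, sigma) for xi in x]   # mutate all variables
        fy = f(y)
        if fy < fx:                                     # offspring better: it
            x, fx, successes = y, fy, successes + 1     # becomes the ancestor
        if t % window == 0:                             # apply 1/5-success rule
            sigma *= a if successes / window < 0.2 else 1.0 / a
            successes = 0
    return x, fx
```

On a smooth unimodal function such as the sphere model, the rule keeps the success probability near 1/5 and the step size shrinks as the optimum is approached.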
Again the notation characterizes the selection mechanism, in the first case indicating that the best μ individuals out of the union of parents and offspring survive, while in the latter case only the best μ offspring individuals form the next parent generation (consequently, λ > μ is necessary). Currently, the (μ,λ)-ES characterizes the state of the art in Evolution Strategy research and is therefore the strategy of our main interest, to be explained in the following. As an introductory remark it should be noted that the major quality of this strategy is seen in its ability to incorporate the most important parameters of the strategy (standard deviations and correlation coefficients of normally distributed mutations) into the search process, such that optimization not only takes place on object variables, but also on strategy parameters according to the actual local topology of the objective function. This capability is termed self-adaptation by Schwefel [Sch87] and will be a major point of interest in discussing the Evolution Strategy.

2.1.1 Representation and Fitness Evaluation

As indicated previously, search points in Evolution Strategies are n-dimensional object parameter vectors x ∈ ℝ^n, so that application to optimization problems as introduced in definition 1.1 of section 1.3 is an easy task. Given the objective function f : ℝ^n → ℝ, the fitness function Φ is in principle identical to f, i.e. given an individual ā ∈ I, we have

    Φ(ā) = f(x) .
(2.10)

Here x is the object variable component⁴ of ā = (x, σ, α) ∈ I = ℝ^n × A_s, where

    A_s = ℝ₊^{n_σ} × [−π, π]^{n_α} ,
    n_σ ∈ {1, ..., n} ,                                              (2.11)
    n_α ∈ {0, (2n − n_σ)(n_σ − 1)/2} .

Besides representing the object variable vector x, each individual may additionally include one up to n different standard deviations σ_i as well as up to n·(n−1)/2 (namely, when n_σ = n) rotation angles α_ij ∈ [−π, π] (i ∈ {1, ..., n−1}, j ∈ {i+1, ..., n})⁵, such that the maximum number of strategy parameters amounts to w = n·(n+1)/2.

2.1.2 Mutation

Mutation in Evolution Strategies is realized by an asexual operator m : I → I which yields a triple (x′, σ′, α′) when applied to a particular individual (x, σ, α). The notation N(0,1) is used here to denote a realization of a normally distributed one-dimensional random variable having expectation zero and standard deviation one, while N_i(0,1) indicates that the random variable is sampled anew for each possible value of the counter i. Using this notation, mutation is formalized as follows (∀i ∈ {1, ..., n}, ∀j ∈ {1, ..., n·(n−1)/2}):

    σ_i′ = σ_i · exp(τ′ · N(0,1) + τ · N_i(0,1))
    α_j′ = α_j + β · N_j(0,1)                                        (2.15)
    x′   = x + N(0, σ′, α′)

First, the standard deviations and rotation angles are mutated using a multiplicative, logarithmic normally distributed process in the case of the standard deviations and an additive, normally distributed variation in the case of the rotation angles. Finally, for mutation of the object variable vector x the resulting vectors σ′ and α′ are used to create the correlated random vector N(0, σ′, α′) for modifying x.

³The material presented here is based on [Sch81a] and a number of research articles, but in the meantime an updated and extended edition of Schwefel's book was published (i.e., [Sch95]).
⁴We could be even more formal here by defining projections which yield different components of ā. However, this would unnecessarily complicate notations and is omitted here, because the meaning of components can be identified by the symbols used for notation, i.e., x, σ, α.
The global factor exp(τ′ · N(0,1)) allows for an overall change of the mutability and guarantees the preservation of all degrees of freedom⁸, whereas exp(τ · N_i(0,1)) allows for individual changes of the "mean step sizes" σ_i. A logarithmic normal distribution for the variations of the standard deviations σ_i is motivated as follows⁹ (see [Sch77], p. 168):

• A multiplicative modification process for the σ_i guarantees positive values of the standard deviations.
• The median (1/2-quantile) of a multiplicative modification must be one (this implies the next condition to be fulfilled).
• To guarantee an average neutrality of the process in the absence of selective pressure, a multiplication by a certain value must occur with the same probability as multiplication by the reciprocal value.
• Smaller modifications must occur more often than larger ones.

The factors τ, τ′, and β in equation (2.15) are rather robust parameters, which Schwefel suggests to set as follows (see [Sch77], pp. 167–168):

    τ  ∝ (√(2·√n))⁻¹
    τ′ ∝ (√(2·n))⁻¹                                                  (2.18)
    β  ≈ 0.0873

Usually, the proportionality constants for τ and τ′ have the value one, and the value suggested for β (in radians) corresponds to 5°. τ and τ′ can be interpreted in the sense of "learning rates" as in artificial neural networks, and preliminary experiments with different proportionality factors indicate that the search process can be tuned for particular objective functions by modifying these factors.

However, it is still possible for the standard deviations to become practically zero by the multiplicative process and for the rotation angles to leave the range [−π, π] of feasible values. To prevent both events, the algorithm in the first case forces all standard deviations to remain larger than a minimal value¹⁰ ε_σ, and in the second case angles are circularly

⁸Notice that for τ′ = 0 and n_σ > 1 the total step size would have almost no chance to undergo substantial changes.
⁹In case n_α = 0, equations (2.15) reduce to the mutation rule

    σ_i′ = σ_i · exp(τ′ · N(0,1) + τ · N_i(0,1)) ,
    x_i′ = x_i + σ_i′ · N_i(0,1) ,                                   (2.16)

which can also be applied if 1 ≤ n_σ < n.
¹⁰E.g. ε_σ = ε_i · |x_i|, where 1 + ε_i > 1, in order to assure that σ_i remains sufficiently large to cause a modification of x_i (see [Sch77], p. 132).
¢,, and in the second case angles are circularly In case na = 0, equations (2.15) reduce to the mutation rule % ox: exp(r’ » N(0,1) +7» Ne(O,1)) a i tot-Ni(0,3) , (2.16) fou which can also be applied if 1 ¢!,{2;| where 1 + 4, > 1 in order to assure that o; remains sufficiently large to cause @ modification of z; (see [Sch77], p. 132). 21 Evolution Strategies 73 mapped to the feasibie range, ie. whenever an angle would become an amount Cq larger (smaller) than 7 (—7), it is mapped to —t+cq (m~Ca): laf > => al; = a — 2n- sign(a’)) ol a (2.19) Altogether, this special mutation mechanism enables an Evolution Strategy to evolve its own strategy parameters, i.e. standard deviations and covariances (represented by rotation angles) during the search, ex- ploiting an implicit link between appropriate internal model and good fit- ness values, The resulting evolution and adaptation of strategy paramet- ers according to the topological requirements has been termed collective self-adaptation by Schwefel [Sch87]. See also [HB92] for a demonstra- tion of the self-adaptation mechanism in case of a simple, time-varying objective function. 2.1.3 Recombination A variety of different recombination mechanisms are currently used in Evolution Strategies, and the operators are sexual as well as panmictic. In the sexual form, recombination operators act on two individuals ran- domly chosen from the parent population, where choosing the same in- dividual twice for creation of one offspring individual is not suppressed (but this could be introduced easily)". Conversely, for the panmictic variants of recombination one parent is randomly chosen and held fixed while for each component of its vectors the second parent is randomly chosen anew from the complete population. 
In other words, the creation of a single offspring individual may involve up to all parent individuals (this method for recombination emphasizes the point of view that the parent population as a whole forms a gene pool from which new individuals are constructed).

Recombination is always used in Evolution Strategies for the creation of all offspring individuals when μ > 1. Furthermore, not only the object variables but also the strategy parameters are subject to recombination, and the recombination operator may be different for object variables, standard deviations, and rotation angles. This implies that recombination of these groups of information proceeds independently of each other, i.e. there is no restriction for strategy parameters to originate from the same parent as the object variables do. The utilization of independent recombination on object variables and strategy parameters (standard deviations and rotation angles) is justified by experimental observations concerning the performance of the resulting variants of Evolution Strategies. Theoretical investigations of this topic are still an open field of research.

¹¹As an aside, it is a characteristic property of all recombination operators that incest can never create anything new, i.e. ∀ā ∈ I : r′(ā, ā) = ā.
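The variation operators of sections 2.1.2 and 2.1.3 can be summarized in code. The sketch below implements the uncorrelated mutation rule of equation (2.16) together with two common recombination variants in their sexual form (discrete and intermediate); the function names and the lower bound `eps` (playing the role of ε_σ) are our own choices:

```python
import math
import random

def mutate(x, sigma, eps=1e-10):
    """Self-adaptive mutation without correlations, as in eq. (2.16):
    sigma_i' = sigma_i * exp(tau' * N(0,1) + tau * N_i(0,1)),
    x_i'     = x_i + sigma_i' * N_i(0,1),
    with tau and tau' set as in eq. (2.18), proportionality constants one."""
    n = len(x)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    tau_prime = 1.0 / math.sqrt(2.0 * n)
    common = random.gauss(0, 1)           # the global N(0,1), drawn once
    new_sigma = [max(eps, s * math.exp(tau_prime * common
                                       + tau * random.gauss(0, 1)))
                 for s in sigma]
    new_x = [xi + s * random.gauss(0, 1) for xi, s in zip(x, new_sigma)]
    return new_x, new_sigma

def recombine_discrete(p1, p2):
    """Sexual discrete recombination: each component from either parent."""
    return [random.choice((a, b)) for a, b in zip(p1, p2)]

def recombine_intermediate(p1, p2):
    """Sexual intermediate recombination: componentwise mean."""
    return [(a + b) / 2.0 for a, b in zip(p1, p2)]
```

Note that footnote 11's incest property holds for both recombination sketches: applying either of them to two copies of the same parent returns that parent unchanged.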
