GENETIC ALGORITHMS

__________________________________________________________________________________

Chapter 1
Genetic Algorithms

1.1 THE GENETIC ALGORITHMS


Genetic Algorithms (GAs) were developed to model natural adaptive systems.
(Holland 1975) They are based on natural genetics and the Darwinian principle of
‘survival of the fittest’. The algorithm described below is the Traditional GA (TGA).

Start
  -> Initialise Population of Individuals
  -> Select Individuals
  -> Crossover pairs of Individuals
  -> Mutate Individuals
  -> Terminating Criteria Satisfied?
       No  : return to Select Individuals
       Yes : End

Diagram 1-1: Flowchart showing the operations of Traditional Genetic Algorithms

TGA first generates a population of individuals. (Appendix B) Each individual is
modelled as a sequence of genes. The individuals are made to differ from one another
by ‘random’ assignment of different values (alleles) to the genes. To determine how
fit an individual is, an objective function assigns it a fitness value by examining
its alleles. Once the whole population is initialised, a selection mechanism picks
the fitter individuals for crossover and mutation. This generation process is
repeated until some terminating criterion is satisfied.
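
The loop described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation used in this work: the binary encoding, the OneMax objective (fitness is the count of 1-alleles) and all function and parameter names are hypothetical choices made for the example.

```python
import random

def one_max(individual):
    """Hypothetical objective function: fitness is the number of 1-alleles."""
    return sum(individual)

def run_tga(n_genes=20, pop_size=30, crossover_rate=0.7,
            mutation_rate=0.01, max_generations=50, seed=0):
    """Minimal TGA loop; pop_size is assumed to be even."""
    rng = random.Random(seed)
    # Initialise: random alleles for every gene of every individual.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(max_generations):
        fitness = [one_max(ind) for ind in population]
        if max(fitness) == n_genes:          # terminating criterion satisfied
            break
        # Select: fitness-proportional choice (+1 avoids all-zero weights).
        weights = [f + 1 for f in fitness]
        parents = rng.choices(population, weights=weights, k=pop_size)
        # Crossover: one-point crossover on consecutive pairs of parents.
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < crossover_rate:
                point = rng.randint(1, n_genes - 1)
                a[point:], b[point:] = b[point:], a[point:]
            children += [a, b]
        # Mutate: flip each gene with a small probability.
        for child in children:
            for i in range(n_genes):
                if rng.random() < mutation_rate:
                    child[i] = 1 - child[i]
        population = children
    return max(population, key=one_max)
```

Calling `run_tga()` returns the fittest individual of the final population; changing `seed` gives a different but reproducible run.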


1.2 REPRESENTATIONS OF GA
The early GA systems primarily involved binary strings. Binary strings are
popular because of their theoretical convenience. The range of representations has
widened considerably in recent years and individuals are now commonly represented
as sequences of real numbers, structures, etc. These different forms maintain
precision, allow non-binary genetic operations to be performed and eliminate
the need to convert numbers to strings. (Wright 1991; Davis 1991)

1.3 GA PARAMETERS
There are several important parameters to consider when working with GA.
These include the population size, crossover rate and mutation rate. The
following lists the currently popular parameter ranges:
population size : 10 to 100 [1]
crossover rate  : 0.4 to 1.0
mutation rate   : 0.005 to 0.01

Unfortunately, the same set of parametric values cannot be applied consistently to
all problems. It is often necessary to experiment with these parameters so as to
avoid premature convergence by GA. (Grefenstette 1986)

1.4 GA POPULATION
1.4.1 Population Size
It has been reported that binary representation needs only a small population to
work efficiently while other representations require bigger populations. (Reeves 1993)
To obtain answers quickly, that is, to obtain good online performance, a small
population, high crossover rate and high mutation rate are preferred. However, a small
population may not cover the search space completely and readily loses useful
schemata. It has also been observed (on a uniformly scaled linear problem and a
massively multimodal deceptive problem) that computational cost tends to increase
quickly when the population size becomes smaller than a critical size.
A large population is preferred if little is known about the problem. (Goldberg
et al. 1995) On the other hand, a large population can result in suboptimal solutions
too, especially if computing resources are limited and do not permit sufficient
generations.
From the above discussion, it can be gathered that GA attempts to achieve two
conflicting ‘goals’. GA tries to locate the optimal solution (accuracy) and also tries to
converge as fast as possible to save computing resources (efficiency). It is difficult to
balance the parameters such that both goals can be satisfied.

[1] Some researchers suggested a population size of 20 - 30 and a crossover rate of
0.75 - 0.95 instead. (Schaffer, Caruana, Eshelman & Das 1989)

1.4.2 Population Initialisation


There are a number of approaches to generate the initial pool of individuals:
a. Random generation - all the individuals are randomly generated.
b. Select n best - the best of a number of randomly generated individuals
are selected. (Bramlette 1991)
c. Domain directed - this uses domain specific knowledge to seed the
population. (Grefenstette 1987) A systematic way to generate the individuals
helps to focus on the fitter regions of the search space. (Reeves 1993)
d. Case-Based Reasoning Tools (CBR) - these propose suitable molecules via
Case-Retrieval and then perform Case-Adaptation to generate the pool.
(Ramsey & Grefenstette 1993) Some form of cataclysmic mutation can be
performed on these individuals: portions of the structures are repeatedly and
randomly mutated to generate the population. In this way, a population which
is biased towards certain good structures is created with a certain degree of
diversity. (Mathias & Whitley 1995)
e. Inductive Logic Programs (ILP) - suitable structures may be induced
from databases and/or the population of individuals. Analysis of molecular
structures by ILP can help to guide the seeding of the populations.

Some of the above approaches can be applied repeatedly in every generation.
For example, methods (b) and (c) can help to apply selective pressure on the
population, method (d) can maintain diversity and prevent premature
convergence, and method (e) can guide the GA search.
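
Approaches (a) and (b), and the seeding idea shared by (c) and (d), can be illustrated with a short sketch. It assumes binary individuals; the function names, the `oversample` factor and the per-gene `mutation_rate` are hypothetical, and the CBR and ILP machinery itself is not modelled.

```python
import random

def random_population(pop_size, n_genes, rng):
    """Approach (a): every individual is generated at random."""
    return [[rng.randint(0, 1) for _ in range(n_genes)]
            for _ in range(pop_size)]

def select_n_best(pop_size, n_genes, fitness, rng, oversample=5):
    """Approach (b): generate a larger random pool, keep the fittest pop_size."""
    pool = random_population(pop_size * oversample, n_genes, rng)
    return sorted(pool, key=fitness, reverse=True)[:pop_size]

def seeded_population(seeds, pop_size, rng, mutation_rate=0.2):
    """Approaches (c)/(d): copy known good structures and mutate the copies
    so that the pool is biased towards them but still diverse."""
    population = []
    while len(population) < pop_size:
        child = list(rng.choice(seeds))
        for i in range(len(child)):
            if rng.random() < mutation_rate:
                child[i] = 1 - child[i]
        population.append(child)
    return population
```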

1.5 THE SIMILARITY TEMPLATE THEORY


1.5.1 Exploration and Exploitation


The diversity of a population can be maintained by exploration of the search space
while selective pressure can be maintained by exploitation. Diversity is important
because it allows new and possibly fitter individuals to be found. Selection is also
necessary because it keeps the fitter individuals and improves overall population
fitness. However, selection tends to duplicate individuals and results in a homogeneous
population. Thus, exploration and exploitation are inversely related factors that
affect the search performance. (Whitley 1989; Chapter 4)

1.5.2 Implicit Parallelism


Implicit Parallelism has been proposed to account for how GA searches large
problem spaces effectively. (Holland 1975) It is suggested that each
individual is an instance of many schemata. A schema is a similarity template of a
group of individuals. When genetic operators are applied to a population of
individuals, a large number of different schemata are sampled in parallel. [2]
According to the theory, GA favours short, low-order schemata (building blocks) and
recombines them into larger schemata of higher fitness. [3]
TGA involves Fitness-Proportional Selection, One-Point Crossover and
Mutation. (Goldberg 1989a) These mechanisms are described in more detail in the
next few sections.
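
The notion of a schema used in this section can be made concrete with a small sketch. It assumes individuals and schemata are written as strings over 0, 1 and the wildcard `*`; the helper names are hypothetical. Enumerating the schemata of one binary individual of length l yields 2^l templates, since each position is either kept or replaced by `*`.

```python
from itertools import product

def matches(schema, individual):
    """True if the individual is an instance of the schema ('*' = don't care)."""
    return all(s == '*' or s == g for s, g in zip(schema, individual))

def schemata_of(individual):
    """Enumerate every schema that the given individual is an instance of."""
    options = [(gene, '*') for gene in individual]
    return [''.join(choice) for choice in product(*options)]

def order(schema):
    """Order of a schema: the number of fixed (non-'*') positions."""
    return sum(c != '*' for c in schema)

def defining_length(schema):
    """Distance between the first and last fixed positions of a schema."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0
```

Short, low-order schemata are those for which `defining_length` and `order` are both small, which is why one-point crossover is less likely to cut through them.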

1.6 SELECTION
Each individual is ‘reproduced’ according to its fitness and the number of instances
of the individual in the population. The following sections list a few of the selection
methods which have been designed to affect the population in different ways. Most
methods involve selecting individuals for the next generation. Others address
how the individuals in a population are replaced.

[2] Assume each individual has 10 genes and there are 3 possible symbols (0, 1, *) for
each gene position. There are then at most 3^10 = 59,049 different schemata for the
individual. For a population of 100 individuals, this gives an upper bound of almost
6 million (100 x 3^10 = 5,904,900) schemata.
[3] The Schema Theorem probably still needs further ‘enhancement’. (Grefenstette & Baker
1989) Implicit parallelism describes the searching of many schemata in parallel but does
not describe the effort required for various search regions. The theorem does not attempt
to explain why GA is able to generate better individuals via crossover. (Mühlenbein 1991)
Besides, the theorem assumes that GA knows beforehand which schemata are relevant so that
higher fitness can be allocated.

1.6.1 Fitness-proportional Selection

According to the Theory of Evolution, nature favours fitter individuals and
reproduces them with higher probability. It has been noted that reproduction increases
the instances of better schemata, and decreases those of worse schemata, at an
exponential rate. (Goldberg 1989a) However, this form of selection is unable to
maintain steady selection pressure toward convergence.
The selection process has O(n^2) time complexity for the weighted roulette
wheel method (Goldberg 1989a) and O(n log n) for the binary search method
(Goldberg & Deb 1991).
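
The binary search variant mentioned above can be sketched as follows: cumulative fitness sums are built once, and each of the n selections is then an O(log n) look-up. The function name and signature are hypothetical, and the sketch assumes non-negative fitness values that are not all zero.

```python
import bisect
import itertools
import random

def roulette_select(population, fitness_values, k, rng):
    """Pick k individuals with probability proportional to fitness.

    Assumes every fitness value is non-negative and at least one is positive.
    """
    cumulative = list(itertools.accumulate(fitness_values))   # O(n) set-up
    total = cumulative[-1]
    chosen = []
    for _ in range(k):
        spin = rng.random() * total                            # point on the wheel
        index = bisect.bisect_right(cumulative, spin)          # O(log n) look-up
        index = min(index, len(population) - 1)                # guard against rounding
        chosen.append(population[index])
    return chosen
```

Selecting n individuals this way costs O(n log n) in total, matching the figure quoted for the binary search method.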

1.6.2 Rank-proportional Selection


The population is ranked by fitness and selection is based on rank. As such, a
constant selection pressure is maintained between the best and worst individuals in the
population. This can help to slow down convergence, solve the scaling problem, achieve
better optimisation and allow better control of selective pressure. (Whitley 1989) This
method has time complexity O(n log n). It is suitable for the nondominated sorting
procedure which is used for multiobjective problems. (section 2.6.1)
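
A minimal sketch of rank-based selection is given below. It assumes a linear weighting in which the worst individual has weight 1 and the best has weight n; this weighting, like the function name, is an assumption made for illustration and other rank-to-probability mappings are possible.

```python
import random

def rank_select(population, fitness, k, rng):
    """Select k individuals with probability proportional to rank, not raw fitness."""
    ranked = sorted(population, key=fitness)            # worst first, O(n log n)
    ranks = list(range(1, len(ranked) + 1))             # worst gets 1, best gets n
    return rng.choices(ranked, weights=ranks, k=k)
```

Because only the ordering matters, an individual that is a thousand times fitter than its neighbour is still only one rank ahead of it, which is how the scaling problem is avoided.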

1.6.3 Population-elitist Selection


This strategy favours the best or discards the worst. Members that are worse
than the offspring are replaced. This helps to accumulate useful schemata. Hence,
selection is able to preserve good schemata while other operators (example:
crossover) can then concentrate on maintaining variety in the population. (Eshelman
& Schaffer 1991)
This method results in better individuals having longer lifespans and thus
more reproductions. (section 2.6.3) As such, fitness proportional reproduction occurs
in this form of selection too. Some researchers suggest that this selection models the
natural process more closely than Fitness-proportional Selection. (Eshelman 1991)
It has been found that a combination of uniform crossover with some form of
elitist selection may result in better exploration, due to disruption of spurious
correlations and preservation of fitter schemata. (Schaffer, Eshelman & Offutt 1991)
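
One simple way to realise this kind of elitism is to pool parents and offspring and keep only the best. The sketch below assumes a fixed population size and a maximised fitness; the function name is hypothetical and this is only one of several possible elitist schemes.

```python
def elitist_replace(parents, offspring, fitness):
    """Keep the best len(parents) individuals from parents and offspring combined."""
    combined = parents + offspring
    # The sort is stable, so a parent is kept ahead of an equally fit offspring.
    combined.sort(key=fitness, reverse=True)
    return combined[:len(parents)]
```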

1.6.4 Random Replacement Selection


An individual is randomly selected from the population. As such, all
individuals have an ‘equal’ chance of being selected. Replacement occurs if the new
individuals (obtained via genetic operations) are fitter than the worst individuals in the
population. This process has O(n) time complexity.
This selection mechanism is used in subsequent studies on drug design.
Unlike in most TGAs (Diagram 1-1), this selection is performed after the genetic
operators. This allows the worst individuals to be replaced by new individuals
during the first generation.
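
The replacement step can be sketched as below, assuming a maximised fitness. The function names are hypothetical; the O(n) cost quoted above corresponds to the linear scan for the worst individual.

```python
import random

def pick_parents(population, rng):
    """Every individual has an equal chance of being picked as a parent."""
    return rng.sample(population, 2)

def replace_worst(population, newcomers, fitness):
    """Insert each newcomer in place of the current worst individual if it is fitter."""
    population = list(population)
    for candidate in newcomers:
        worst_index = min(range(len(population)),
                          key=lambda i: fitness(population[i]))   # O(n) scan
        if fitness(candidate) > fitness(population[worst_index]):
            population[worst_index] = candidate
    return population
```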

1.7 CROSSOVER
A pair of children is obtained by crossing over a pair of individuals
(parents). Crossover is applied so that schemata from the two parents can be
combined to give better children.

1.7.1 One-Point Crossover


This form of crossover is commonly observed among chromosomes in nature. A
crossover point is first randomly selected. The sequences of genes ‘after’ the
crossover point are then exchanged between the two parents (target and complement).
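
As a minimal sketch, assuming both parents are lists of equal length with at least two genes (the function name is hypothetical):

```python
import random

def one_point_crossover(parent_a, parent_b, rng):
    """Exchange the genes after a random crossover point; returns two children."""
    # The point is drawn so that each child keeps at least one gene from each parent.
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b
```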

1.7.2 Template or Uniform Crossover


A pair of parents is randomly selected. A template which marks the exchange
sites for amino acids is randomly generated. Alleles of both parents are then swapped
according to the template.
Uniform Crossover is described as destructive because it breaks up large
schemata readily and results in a lower schema survival rate. On the other hand,
Uniform Crossover also results in more effective combination of schemata than
One-Point or Two-Point Crossover. (Syswerda 1989) In this sense, Uniform Crossover
is said to be more constructive. Uniform Crossover has also been found to be more
exploratory and has low positional bias and high distributional bias. (Eshelman,
Caruana & Schaffer 1989)
Uniform Crossover is used in the TGA implemented in the ligand studies.
(section 2.3)
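
The template mechanism can be sketched as follows. The function name and the `swap_probability` parameter (0.5 by default) are assumptions for illustration; the template is returned as well so that the exchange sites can be inspected.

```python
import random

def uniform_crossover(parent_a, parent_b, rng, swap_probability=0.5):
    """Swap alleles between the parents wherever the random template is True."""
    template = [rng.random() < swap_probability for _ in parent_a]
    child_a = [b if swap else a
               for a, b, swap in zip(parent_a, parent_b, template)]
    child_b = [a if swap else b
               for a, b, swap in zip(parent_a, parent_b, template)]
    return child_a, child_b, template
```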

1.7.3 Why OnePointCrossover in Nature?


If Uniform Crossover is more ‘effective’, then there should be reasons why it is
not observed in nature:
- Uniform Crossover is applicable in cases where epistasis is absent. In natural
chromosomes, many genes are positionally related and this discourages
mechanisms which break up the chromosomes ‘randomly’.
- Uniform Crossover is more suited to smaller populations, which have a
tendency towards homogeneity. (Spears & DeJong 1991)
- TGA is only a simplified model of nature (D. B. Fogel, personal
communication) that is geared towards functional optimisation. This is one
reason why a more comprehensive scheme is proposed for GA. (Chapter 4)
It is found that crossover does not always occur in nature. Crossover is
‘conspicuously’ absent in female silkworms and wax moths. (Atmar 1994) This
suggests the presence of other operations which are probably just as effective and
which deserve more attention. Indeed, it has been found in some experiments
that crossover can produce better results if combined with operators like mutation.
(Schaffer & Eshelman 1991)

1.8 MUTATION
A gene of an individual is randomly selected. The allele is then randomly
changed to another value. In this way, Mutation helps to introduce new schemata and
insures against loss of useful schemata by crossover and selection.
It is suggested that mutation is capable of higher levels of disruption than
crossover and that mutation can perform whatever exploration is done by
crossover. (Spears 1993) Crossover tends to preserve alleles that are common to the
pair of parents and this may limit the amount of exploration possible. Mutation has
been given more importance in recent work, as described in the following sections.
(Back 1995)
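
The basic operator described at the start of this section can be sketched as below. The function name and the explicit `alleles` argument are hypothetical; the sketch assumes at least two possible alleles so that a different value always exists.

```python
import random

def mutate(individual, alleles, rng):
    """Pick one gene at random and change its allele to a different value."""
    mutant = list(individual)                        # leave the original untouched
    site = rng.randrange(len(mutant))
    choices = [a for a in alleles if a != mutant[site]]
    mutant[site] = rng.choice(choices)
    return mutant
```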

1.9 SELECTION, CROSSOVER AND MUTATION


Selection and Crossover help individuals to climb peaks while Mutation helps
them to jump from peak to peak. Selection tends to result in population uniformity as
only the better individuals are duplicated. Crossover and Mutation help to offset this
by introducing new alleles and new combinations of alleles. (DeJong 1993)


Crossover is able to maintain high levels of both schema construction and
survival while Mutation is only capable of maintaining one or the other. (Spears 1993)
Hence, Crossover is more useful than Mutation when the population is initially diverse.
Mutation becomes more important when the population loses diversity. (Davis 1989)
To optimise the performance of GA, it is necessary to balance the
climbing of peaks (most of which are suboptimal) by crossover against the exploration
of new areas by mutation. (Chapter 4)

1.10 TERMINATING GA EXECUTION


GA may be stopped if the best fitness of the population fails to improve for a
certain number of generations. Alternatively, GA can also terminate if a certain
number of generations has been reached or if the population has converged.
It has been found that the number of generations needed for convergence
depends on the selection scheme chosen. The number of generations ranges from
O(log n) to O(n log n), where n is the population size. (Goldberg & Deb, 1990)
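
The first two stopping rules can be combined in a small helper. This is a sketch under assumptions: fitness is maximised, `best_history` records the best fitness of each generation so far, and `patience` (at least 1) is a hypothetical name for the number of generations allowed without improvement.

```python
def should_stop(best_history, max_generations, patience):
    """Decide whether the GA run should terminate."""
    if len(best_history) >= max_generations:
        return True                                   # generation budget used up
    if len(best_history) > patience:
        recent_best = max(best_history[-patience:])
        earlier_best = max(best_history[:-patience])
        # Stop when the last `patience` generations brought no improvement.
        return recent_best <= earlier_best
    return False
```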

1.11 SEGA
Self Adaptive GA (SEGA) (Hiroshi 1994) involves Reorder, HillCrossover and
HillMutation. SEGA is different from TGA in several respects. SEGA does not have
any explicit selection mechanism and involves a Reorder mechanism instead (section
1.11.1). Besides, SEGA also makes use of HillClimbing operators, which are absent
in TGA.

1.11.1 Reorder
The population of individuals is reshuffled so that different pairs of individuals
can be crossed over. There is no selection in this process. The reshuffling process
has O(n) time complexity.
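
A minimal sketch of Reorder, assuming the population is a list and that consecutive individuals are paired after shuffling (the function name is hypothetical; with an odd population size the last individual is simply left unpaired):

```python
import random

def reorder(population, rng):
    """Shuffle the population in O(n) and return consecutive pairs for crossover."""
    shuffled = list(population)
    rng.shuffle(shuffled)
    return list(zip(shuffled[0::2], shuffled[1::2]))
```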

1.11.2 HillClimbers
The HillClimbing operation searches by moving to a fitter neighbour and
replacing the original individual by the neighbour. It is able to obtain solutions faster
than genetic operators. However, it often locates only a local optimum and needs to be
started in a good region of the search space to reach the global one. This is especially
so if the search space has many peaks. There are three main hillclimbing strategies: [4]
- Random-mutation (RMHC) : mutates a site randomly until improvement is
obtained. TGA has been found to perform worse than Random-Mutation-Hill-Climbing
(RMHC) for a number of test functions. (Forrest & Mitchell 1993)
- Next Ascent (NAHC) : changes bits from left to right of the chromosome
until improvement is obtained.
- Steepest Ascent (SAHC) : takes only the best possible improvement. (Forrest
& Mitchell 1993)
The hill operators confer a number of advantages on SEGA. In SEGA, every
individual controls its own rate of adaptation [5]. This results in a self-adaptive
population. Because each individual has its own variable mutation rate, convergence
may occur only after many generations. (Back 1995)
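
A single step of each of the three hillclimbing strategies can be sketched as follows. The sketch assumes binary individuals and a maximised fitness; the function names and the `max_tries` limit are hypothetical, and only strict improvements are accepted here, which is a simplification.

```python
import random

def flip(individual, site):
    """Return a copy of the individual with one bit flipped."""
    neighbour = list(individual)
    neighbour[site] = 1 - neighbour[site]
    return neighbour

def rmhc_step(individual, fitness, rng, max_tries=100):
    """Random-mutation: flip random sites until one of them improves fitness."""
    for _ in range(max_tries):
        neighbour = flip(individual, rng.randrange(len(individual)))
        if fitness(neighbour) > fitness(individual):
            return neighbour
    return individual

def nahc_step(individual, fitness):
    """Next ascent: scan left to right and take the first improving flip."""
    for site in range(len(individual)):
        neighbour = flip(individual, site)
        if fitness(neighbour) > fitness(individual):
            return neighbour
    return individual

def sahc_step(individual, fitness):
    """Steepest ascent: try every single-bit flip and take the best improvement."""
    best = max((flip(individual, s) for s in range(len(individual))),
               key=fitness)
    return best if fitness(best) > fitness(individual) else individual
```

Repeating any of these steps until the individual stops changing gives a complete hillclimb to a local optimum.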

1.11.3 HillCrossover
The HillCrossover operator is a hybrid of RMHC and One-Point Crossover
and it has been adopted by SEGA. HillCrossover only succeeds if one of the
children has better fitness. If neither child has better fitness, another crossover
point is randomly selected and One-Point Crossover is performed again. If none of the
crossover points results in fitter children, no HillCrossover is performed for the pair
of parents.
HillCrossover is found to perform better than either GA or HillClimber alone, for the
following reasons:
- HillCrossover integrates the benefits of GA and HillClimber. GA is good at
locating fit regions quickly but tends to spend a lot of time hunting for the
optimal solutions. On the other hand, HillClimber is useful for finding the global
optimum rapidly but may get stuck in a local optimum if it is started in an unfit
region. (Baluja 1993) Hence, HillClimbers are often used in the late stages of
search. (Mühlenbein et al. 1991)
- Positional bias is lowered when Crossover is repeatedly performed. More
exploration of the search space occurs and results in less spurious correlation by
bad schemata. (Eshelman, Caruana & Schaffer 1989)
- HillCrossover performs both Crossover and Macromutation concurrently.
(Jones 1995; Dawkins 1986)
- Repeated crossover to produce non-identical children is a good way to
combine the benefits of Two-Point Crossover and Uniform Crossover. (Spears
& DeJong 1991)

[4] Different hillclimbing methods seem to work well for different problems. RMHC is
able to locate the optimum faster than the rest when applied to the Royal Road
functions. (Forrest & Mitchell 1993)
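
The retry behaviour of HillCrossover can be sketched as below. The text does not state what a child must be fitter than, so this sketch assumes it must beat the fitter of its two parents; that criterion, the function name and the returned success flag are all assumptions made for illustration.

```python
import random

def hill_crossover(parent_a, parent_b, fitness, rng):
    """Try one-point crossover at each point, in random order, until a child improves."""
    target = max(fitness(parent_a), fitness(parent_b))   # assumed success criterion
    points = list(range(1, len(parent_a)))
    rng.shuffle(points)
    for point in points:
        child_a = parent_a[:point] + parent_b[point:]
        child_b = parent_b[:point] + parent_a[point:]
        if max(fitness(child_a), fitness(child_b)) > target:
            return child_a, child_b, True
    # No crossover point produced a fitter child: the parents are kept unchanged.
    return parent_a, parent_b, False
```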

1.11.4 HillMutation
In SEGA, the mutation operator has been designed to succeed only if
hillclimbing occurs. No new individual is produced if it is weaker than the original
parental individual.
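
A minimal sketch of HillMutation for binary individuals, assuming a maximised fitness (the function name is hypothetical). Following the wording above, a mutant is rejected only if it is weaker than its parent, so an equally fit mutant is accepted.

```python
import random

def hill_mutation(individual, fitness, rng):
    """Flip one random site; keep the mutant only if it is not weaker than the parent."""
    mutant = list(individual)
    site = rng.randrange(len(mutant))
    mutant[site] = 1 - mutant[site]
    return mutant if fitness(mutant) >= fitness(individual) else individual
```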

1.11.5 SEGA and CHC


Strategy/Mechanism   SEGA            CHC
Selection            ReOrder         CrossGenerational
Crossover            HillCrossover   HUX (UniformCrossover)
Mutation             HillMutation    -
Balance              -               Incest Prevention

Table 1-1: A comparison between SEGA and CHC

SEGA has some similarity with the CHC adaptive GA (CHC stands for Cross-generational
selection, Heterogeneous recombination, Cataclysmic mutation). (Eshelman 1991) CHC
applies cross-generational selection which gives every individual an equal chance of
reproduction. This helps the population to preserve fit schemata and is rather similar
to the ReOrder operation of SEGA. CHC's modified version of Uniform Crossover also has
similar effects to SEGA’s HillCrossover. However, this Uniform Crossover is so
disruptive that the Mutation operator is no longer ‘needed’ in CHC. This is probably
also the reason why the selection of CHC has to be more conservative in order to
preserve more schemata.
In CHC, Crossover and Selection perform exploration and exploitation
respectively. CHC is able to achieve a balance between both operations by the
implementation of the Incest Prevention mechanism. Disruption is only possible
among ‘different’ pairs and elitism is applied to preserve only the fitter children
produced. Disruptive Crossover and Elitism become cooperative operations
because of this mechanism. Test results show that CHC outperforms TGA on a suite
of 10 test functions. (Eshelman & Schaffer 1991)
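
The interplay of HUX and Incest Prevention can be sketched as below. The details are not given in this chapter, so they are assumptions here: HUX is taken to swap exactly half of the differing genes, and a pair is taken to mate only if half its Hamming distance exceeds a threshold. The function names and the `threshold` parameter are hypothetical.

```python
import random

def hamming(a, b):
    """Number of gene positions at which the two individuals differ."""
    return sum(x != y for x, y in zip(a, b))

def hux(parent_a, parent_b, rng):
    """Swap exactly half of the differing genes, chosen at random."""
    differing = [i for i, (x, y) in enumerate(zip(parent_a, parent_b)) if x != y]
    swap_sites = rng.sample(differing, len(differing) // 2)
    child_a, child_b = list(parent_a), list(parent_b)
    for i in swap_sites:
        child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b

def mate_if_different(parent_a, parent_b, threshold, rng):
    """Incest prevention: only pairs that differ enough are crossed."""
    if hamming(parent_a, parent_b) / 2 <= threshold:
        return None                      # too similar: no crossover for this pair
    return hux(parent_a, parent_b, rng)
```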

SEGA does not have any explicit balancing mechanism. Instead,
HillCrossover and HillMutation are tasked to explore. Because HillClimbing is
involved, both operators also exploit whatever good schemata are found during
exploration.

[5] There are other forms of adaptation. These include representation adaptation
(Shaefer 1987), fitness adaptation (Whitley 1987) and
probability-of-operator-application adaptation (Davis 1989).
