Self-adaptive Differential Evolution Algorithm for Numerical Optimization

A. K. Qin                                          P. N. Suganthan
School of Electrical and Electronic Engineering
Nanyang Technological University
50 Nanyang Ave., Singapore 639798
qinkai@pmail.ntu.edu.sg                            epnsugan@ntu.edu.sg

Abstract- In this paper, we propose a novel Self-adaptive Differential Evolution algorithm (SaDE), where the choice of learning strategy and the two control parameters F and CR are not required to be pre-specified. During evolution, the suitable learning strategy and parameter settings are gradually self-adapted according to the learning experience. The performance of the SaDE is reported on the set of 25 benchmark functions provided by the CEC2005 special session on real parameter optimization.

1 Introduction

The differential evolution (DE) algorithm, proposed by Storn and Price [1], is a simple but powerful population-based stochastic search technique for solving global optimization problems. Its effectiveness and efficiency have been successfully demonstrated in many application fields such as pattern recognition [2], communication [3] and mechanical engineering [4]. However, the control parameters and learning strategies involved in DE are highly dependent on the problem under consideration. For a specific task, we may have to spend a huge amount of time trying various strategies and fine-tuning the corresponding control parameters.

2 Differential Evolution

... of the problem under consideration. DE evolves a population of NP n-dimensional individual vectors, i.e. solution candidates, X_i = (x_{i,1}, ..., x_{i,n}) in S, i = 1, ..., NP, from one generation to the next. The initial population should ideally cover the entire parameter space by randomly distributing each parameter of an individual vector with uniform distribution between the prescribed upper and lower parameter bounds.

At each generation G, DE employs the mutation and crossover operations to produce a trial vector U_{i,G} for each individual vector X_{i,G}, also called the target vector, in the current population.

a) Mutation operation

For each target vector X_{i,G} at generation G, an associated mutant vector V_{i,G} = (v_{1i,G}, v_{2i,G}, ..., v_{ni,G}) can usually be generated by using one of the following strategies, as shown in the online available codes [ ]:

    "DE/rand/1":            V_{i,G} = X_{r1,G} + F * (X_{r2,G} - X_{r3,G})
    "DE/best/1":            V_{i,G} = X_{best,G} + F * (X_{r1,G} - X_{r2,G})
    "DE/current to best/1": V_{i,G} = X_{i,G} + F * (X_{best,G} - X_{i,G}) + F * (X_{r1,G} - X_{r2,G})
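For illustration, the mutation strategies above can be sketched in Python as follows. This is an illustrative sketch only, not the implementation used in our experiments; the array layout (population as an NP x n array) and the function name are our own assumptions.

```python
import numpy as np

def mutate(pop, best_idx, F, strategy="rand/1"):
    """Generate mutant vectors V for a population array of shape (NP, n).

    r1, r2, r3 are mutually distinct random indices, all different from
    the target index i, as required by the strategies above.
    """
    NP, n = pop.shape
    V = np.empty_like(pop)
    for i in range(NP):
        # pick three distinct random indices, none equal to i
        r1, r2, r3 = np.random.choice([j for j in range(NP) if j != i],
                                      size=3, replace=False)
        if strategy == "rand/1":
            V[i] = pop[r1] + F * (pop[r2] - pop[r3])
        elif strategy == "best/1":
            V[i] = pop[best_idx] + F * (pop[r1] - pop[r2])
        elif strategy == "current to best/1":
            V[i] = pop[i] + F * (pop[best_idx] - pop[i]) + F * (pop[r1] - pop[r2])
        else:
            raise ValueError(strategy)
    return V
```

Note that with F = 0 the "current to best/1" rule degenerates to the target vector itself, which is a convenient sanity check on an implementation.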
Authorized licensed use limited to: HUNAN UNIVERSITY. Downloaded on March 24,2025 at 09:01:46 UTC from IEEE Xplore. Restrictions apply.
where CR is a user-specified crossover constant in the range [0, 1) and irand is a randomly chosen integer in the range [1, NP] to ensure that the trial vector U_{i,G} will differ from its corresponding target vector X_{i,G} by at least one parameter.

c) Selection operation

If the values of some parameters of a newly generated trial vector exceed the corresponding upper and lower bounds, we randomly and uniformly reinitialize them within the search range. Then the fitness values of all trial vectors are evaluated. After that, a selection operation is performed. The fitness value of each trial vector f(U_{i,G}) is compared to that of its corresponding target vector f(X_{i,G}) in the current population. If the trial vector has a smaller or equal fitness value (for a minimization problem) than the corresponding target vector, the trial vector will replace the target vector and enter the population of the next generation. Otherwise, the target vector will remain in the population for the next generation. The operation is expressed as follows:

    X_{i,G+1} = U_{i,G}   if f(U_{i,G}) <= f(X_{i,G})
                X_{i,G}   otherwise

The above 3 steps are repeated generation after generation until some specific stopping criterion is satisfied.

3 SaDE: Strategy and Parameter Adaptation

To achieve good performance on a specific problem by using the original DE algorithm, we need to try all available (usually 5) learning strategies in the mutation phase and fine-tune the corresponding critical control parameters CR, F and NP. Many works [4], [6] have pointed out that the performance of the original DE algorithm is highly dependent on the strategies and parameter settings. Although we may find the most suitable strategy and the corresponding control parameters for a specific problem, it may require a huge amount of computation time. Also, during different evolution stages, different strategies and corresponding parameter settings with different global and local search capabilities might be preferred. Therefore, we attempt to develop a new DE algorithm that can automatically adapt the learning strategies and the parameter settings during evolution. Some related work on parameter or strategy adaptation in evolutionary algorithms has been done in [7], [8].

The idea behind our proposed learning strategy adaptation is to probabilistically select one out of several available learning strategies and apply it to the current population. Hence, we should have several candidate learning strategies available to be chosen, and we also need to develop a procedure to determine the probability of applying each learning strategy. In our current implementation, we select two learning strategies as candidates, "rand/1/bin" and "current to best/2/bin", which are respectively expressed as:

    V_{i,G} = X_{r1,G} + F * (X_{r2,G} - X_{r3,G})
    V_{i,G} = X_{i,G} + F * (X_{best,G} - X_{i,G}) + F * (X_{r1,G} - X_{r2,G})

The reason for our choice is that these two strategies have been commonly used in the DE literature [ ] and reported to perform well on problems with distinct characteristics. Among them, the "rand/1/bin" strategy usually demonstrates good diversity while the "current to best/2/bin" strategy shows a good convergence property, which we also observe in our trial experiments.

Since here we have two candidate strategies, assuming that the probability of applying the strategy "rand/1/bin" to each individual in the current population is p1, the probability of applying the other strategy should be p2 = 1 - p1. The initial probabilities are set equal to 0.5, i.e., p1 = p2 = 0.5. Therefore, both strategies have equal probability to be applied to each individual in the initial population. For a population of size NP, we can randomly generate a vector of size NP with uniform distribution in the range [0, 1] for each element. If the jth element value of the vector is smaller than or equal to p1, the strategy "rand/1/bin" will be applied to the jth individual in the current population. Otherwise the strategy "current to best/2/bin" will be applied. After evaluation of all newly generated trial vectors, the numbers of trial vectors successfully entering the next generation while generated by the strategy "rand/1/bin" and the strategy "current to best/2/bin" are recorded as ns1 and ns2, respectively, and the numbers of trial vectors discarded while generated by the strategy "rand/1/bin" and the strategy "current to best/2/bin" are recorded as nf1 and nf2. These numbers are accumulated within a specified number of generations (50 in our experiments), called the "learning period". Then, the probability p1 is updated as:

    p1 = ns1 * (ns2 + nf2) / (ns2 * (ns1 + nf1) + ns1 * (ns2 + nf2)),    p2 = 1 - p1

The above expression represents the percentage of the success rate of trial vectors generated by the strategy "rand/1/bin" in the summation of it and the success rate of trial vectors generated by the strategy "current to best/2/bin" during the learning period. Therefore, the probability of applying those two strategies is updated after the learning period. Also, we reset all the counters ns1, ns2, nf1 and nf2 once the update is done, to avoid possible side-effects accumulated in the previous learning stage. This adaptation procedure can gradually
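The crossover, selection and strategy-probability update described above can be sketched in Python as follows. The binomial crossover equation itself falls on a missing portion of the scan, so the rule below is the standard DE "bin" formulation consistent with the description of CR and irand; note also that the text gives the range of irand as [1, NP], whereas standard DE implementations draw the forced position over the n parameter indices, which is what this sketch assumes. Function names are our own.

```python
import numpy as np

def crossover_bin(X, V, CR):
    """Standard DE binomial crossover for one target vector X (length n).

    The forced position jrand guarantees the trial vector differs from
    its target in at least one parameter.
    """
    n = len(X)
    jrand = np.random.randint(n)
    mask = np.random.rand(n) <= CR
    mask[jrand] = True
    return np.where(mask, V, X)

def select(X, U, f):
    """Greedy DE selection: keep the trial vector if it is no worse."""
    return (U, True) if f(U) <= f(X) else (X, False)

def update_p1(ns1, nf1, ns2, nf2):
    """Probability update for strategy 'rand/1/bin' after a learning period:
    p1 = ns1*(ns2+nf2) / (ns2*(ns1+nf1) + ns1*(ns2+nf2))."""
    num = ns1 * (ns2 + nf2)
    return num / (ns2 * (ns1 + nf1) + num)
```

Dividing numerator and denominator of update_p1 by (ns1+nf1)*(ns2+nf2) shows that p1 is exactly the success rate of "rand/1/bin" normalized by the sum of the two strategies' success rates, as stated above.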
evolve the most suitable learning strategy at different learning stages for the problem under consideration.

In the original DE, the 3 critical control parameters CR, F and NP are closely related to the problem under consideration. Here, we keep NP as a user-specified value as in the original DE, so as to deal with problems of different dimensionalities. Between the two parameters CR and F, CR is much more sensitive to the problem's property and complexity, such as multi-modality, while F is more related to the convergence speed. According to our initial experiments, the choice of F has larger flexibility, although most of the time values between (0, 1] are preferred. Here, we consider allowing F to take different random values in the range (0, 2] with a normal distribution of mean 0.5 and standard deviation 0.3 for different individuals in the current population. This scheme can keep both local (with small F values) and global (with large F values) search ability to generate potentially good mutant vectors throughout the evolution process.

The control parameter CR plays an essential role in the original DE algorithm. The proper choice of CR may lead to good performance under several learning strategies, while a wrong choice may result in performance deterioration under any learning strategy. Also, a good CR parameter value usually falls within a small range, with which the algorithm can perform consistently well on a complex problem. Therefore, we consider accumulating the previous learning experience within a certain generation interval so as to dynamically adapt the value of CR to a suitable range. We assume CR to be normally distributed with mean CRm and standard deviation 0.1. Initially, CRm is set at 0.5 and different CR values conforming to this normal distribution are generated for each individual in the current population. These CR values for all individuals remain unchanged for several generations (5 in our experiments) and then a new set of CR values is generated under the same normal distribution. During every generation, the CR values associated with trial vectors successfully entering the next generation are recorded. After a specified number of generations (25 in our experiments), CR has been changed several times (25/5 = 5 times in our experiments) under the same normal distribution with center CRm and standard deviation 0.1, and we recalculate the mean of the normal distribution of CR according to all the recorded CR values corresponding to successful trial vectors during this period. With this new normal distribution's mean and the standard deviation 0.1, we repeat the above procedure. As a result, the proper CR value range for the current problem can be learned to suit the particular problem. Note that we empty the record of the successful CR values once we recalculate the normal distribution mean, to avoid possible inappropriate long-term accumulation effects.

We introduce the above learning strategy and parameter adaptation schemes into the original DE algorithm and develop a new Self-adaptive Differential Evolution algorithm (SaDE). The SaDE does not require the choice of a certain learning strategy and the setting of specific values for the critical control parameters CR and F. The learning strategy and the control parameter CR, which are highly dependent on the problem's characteristics and complexity, are self-adapted by using the previous learning experience. Therefore, the SaDE algorithm can demonstrate consistently good performance on problems with different properties, such as unimodal and multimodal problems. The influence of the number of generations during which previous learning information is collected on the performance of SaDE is not significant; we will investigate this further.

To speed up the convergence of the SaDE algorithm, we apply a local search procedure after a specified number of generations (200 generations in our experiments) on 5% of the individuals, including the best individual found so far and individuals randomly selected from the best 50% of the current population. Here, we employ the Quasi-Newton method as the local search method. A local search operator is required because the prespecified MAX_FES is too small to reach the required accuracy level.

4 Experimental Results

We evaluate the performance of the proposed SaDE algorithm on a new set of test problems including 25 functions of different complexity, where 5 of them are unimodal problems and the other 20 are multimodal problems. Experiments are conducted on all 25 10-D functions and the first 15 30-D problems. We choose the population size to be 50 and 100 for 10-D and 30-D problems, respectively.

For each function, the SaDE is run 25 times. Best function error values achieved when FES = 1e+3, FES = 1e+4 and FES = 1e+5 for the 25 test functions are listed in Tables 1-5 for 10-D and Tables 6-8 for 30-D, respectively. Successful FES & Success Performance are listed in Tables 9 and 10 for 10-D and 30-D, respectively.

Table 1. Error Values Achieved for Functions 1-5 (10D)

FES=1e+3
           1            2            3            4            5
    1st    814.1681     3.1353e+003  6.0649e+006  2.7817e+003  6.6495e+003
    7th    1.4865e+003  6.0024e+003  2.2955e+007  6.2917e+003  8.4444e+003
    13th   2.0310e+003  7.3835e+003  3.4010e+007  7.8418e+003  9.1522e+003
    19th   2.4178e+003  9.1189e+003  5.3783e+007  9.5946e+003  9.4916e+003
    25th   3.2049e+003  1.1484e+004  8.4690e+007  1.5253e+004  1.0831e+004
    Mean   1.9758e+003  7.3545e+003  3.9124e+007  8.0915e+003  8.9202e+003
    Std    651.2718     2.4077e+003  2.1059e+007  3.1272e+003  999.5368

FES=1e+4
    1st    1.1915e-005  7.9389       2.3266e+005  29.7687      126.9805
    7th    2.6208e-005  14.1250      7.7086e+005  57.3773      165.4529
    13th   3.2409e-005  19.6960      1.0878e+006  70.3737      184.6404
    19th   4.9557e-005  30.4271      1.7304e+006  91.9872      228.7035
    25th   9.9352e-005  45.1573      2.9366e+006  187.8363     437.7502
    Mean   3.8254e-005  23.2716      1.2350e+006  83.1323      203.5592
    Std    2.0194e-005  10.7838      6.8592e+005  43.7055      66.1114

FES=1e+5
    1st    0            0            0            0            1.1133e-006
    7th    0            0            0            0            0.0028
    13th   0            0            0            0            0.0073
    19th   0            0            9.9142e-006  0            0.0168
    25th   0            2.5580e-012  1.0309e-004  3.5456e-004  0.0626
    Mean   0            1.0459e-013  1.6720e-005  1.4182e-005  0.0123
    Std    0            5.1124e-013  3.1196e-005  7.0912e-005  0.0146
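The F and CR self-adaptation described in Section 3 can be sketched as follows. This is an illustrative skeleton under stated assumptions rather than the code used in our experiments: the class and method names are our own, and keeping F inside (0, 2] by resampling and clipping the sampled CR values to [0, 1] are assumptions not specified in the text.

```python
import numpy as np

def sample_F(size, rng):
    """Draw F ~ N(0.5, 0.3) per individual, resampling values outside (0, 2]
    (the bound-handling rule is our assumption)."""
    F = rng.normal(0.5, 0.3, size)
    bad = (F <= 0) | (F > 2)
    while bad.any():
        F[bad] = rng.normal(0.5, 0.3, bad.sum())
        bad = (F <= 0) | (F > 2)
    return F

class CRAdapter:
    """CR values ~ N(CRm, 0.1): regenerated every 5 generations, with CRm
    recentred every 25 generations on the recorded successful CR values."""
    def __init__(self, NP, rng, regen_every=5, relearn_every=25):
        self.NP, self.rng = NP, rng
        self.regen_every, self.relearn_every = regen_every, relearn_every
        self.CRm = 0.5
        self.success_CRs = []
        self.CR = self._draw()

    def _draw(self):
        # clipping to [0, 1] is our assumption
        return np.clip(self.rng.normal(self.CRm, 0.1, self.NP), 0.0, 1.0)

    def record_success(self, i):
        # call when the trial vector of individual i enters the next generation
        self.success_CRs.append(self.CR[i])

    def end_generation(self, gen):
        if (gen + 1) % self.relearn_every == 0 and self.success_CRs:
            self.CRm = float(np.mean(self.success_CRs))
            self.success_CRs = []   # empty the record, as described above
        if (gen + 1) % self.regen_every == 0:
            self.CR = self._draw()
```

A driver would call record_success whenever a trial vector survives selection and end_generation once per generation, so the CR distribution drifts toward the range that actually produces successful trial vectors.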
Table 9. Successful FES & Success Performance (10D)

F   1st(Min)  7th      13th(Med)  19th     25th(Max)  Mean     Std   Success rate  Success Perf.
1   10126     10126    10126      10126    10126      10126    0     1.00          1.0126e+004

Table 10. Successful FES & Success Performance (30D)

F   1st(Min)     7th          13th(Med)    19th         25th(Max)    Mean         Std          Success rate  Success Perf.
1   2.0234e+004  2.0234e+004  2.0234e+004  2.0234e+004  2.0234e+004  2.0234e+004  5.0662e-001  1.00          2.0234e+004
2   1.2175e+005  1.3344e+005  1.4174e+005  1.4648e+005  -            -            -            0.96          1.4883e+005
3   -            -            -            -            -            -            -            0             -
4   2.4482e+005  2.8434e+005  2.9639e+005  -            -            -            -            0.52          5.3816e+005
5   -            -            -            -            -            -            -            0             -
6   -            -            -            -            -            -            -            0             -
7   6.9648e+004  8.3422e+004  1.0162e+005  1.6748e+005  -            -            -            0.80          1.3477e+005
8   -            -            -            -            -            -            -            0             -
9   8.2995e+004  1.0351e+005  1.0389e+005  1.0395e+005  1.0396e+005  9.8934e+004  9.0090e+003  1.00          9.8934e+004

Figure 1. Convergence Graph for Functions 1-5

Figure 2. Convergence Graph for Functions 6-10
Figure 8. Convergence Graph for Functions 11-15
System Configurations
Intel Pentium 4 CPU, 3.00 GHz
1 GB of memory
Windows XP Professional Version 2002
Language: Matlab

Table 11. Algorithm Complexity

         T0        T1        T2        (T2-T1)/T0
D=10     40.0710   35.6860   68.8004   0.8264
D=30     40.0710   38.9190   74.2050   0.8806
D=50     40.0710   47.1940   85.4300   0.9542

5 Conclusions

In this paper, we proposed a Self-adaptive Differential Evolution algorithm (SaDE), which can automatically adapt its learning strategies and the associated parameters during the evolution procedure. The performance of the proposed SaDE algorithm is evaluated on the newly proposed testbed for the CEC2005 special session on real parameter optimization.

Bibliography

[1] R. Storn and K. V. Price, "Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces," Journal of Global Optimization, vol. 11, pp. 341-359, 1997.
[2] J. Ilonen, J.-K. Kamarainen and J. Lampinen, "Differential Evolution Training Algorithm for Feed-Forward Neural Networks," Neural Processing Letters, vol. 17, no. 1, pp. 93-105, 2003.
[3] R. Storn, "Differential Evolution Design of an IIR-Filter," in Proc. of the IEEE International Conference on Evolutionary Computation (ICEC'96), IEEE Press, New York, pp. 268-273, 1996.
[4] T. Rogalsky, R. W. Derksen and S. Kocabiyik, "Differential Evolution in Aerodynamic Optimization," in Proc. of the 46th Annual Conference of the Canadian Aeronautics and Space Institute, pp. 29-36, 1999.
[5] K. V. Price, "Differential Evolution vs. the Functions of the 2nd ICEO," in Proc. of the 1997 IEEE International Conference on Evolutionary Computation (ICEC'97), Indianapolis, IN, USA, pp. 153-157, April 1997.
[6] R. Gaemperle, S. D. Mueller and P. Koumoutsakos, "A Parameter Study for Differential Evolution," in A. Grmela and N. E. Mastorakis, editors, Advances in Intelligent Systems, Fuzzy Systems, Evolutionary Computation, WSEAS Press, pp. 293-298, 2002.
[7] J. Gomez, D. Dasgupta and F. Gonzalez, "Using Adaptive Operators in Genetic Search," in Proc. of the Genetic and Evolutionary Computation Conference (GECCO), pp. 1580-1581, 2003.
[8] B. A. Julstrom, "What Have You Done for Me Lately? Adapting Operator Probabilities in a Steady-State Genetic Algorithm," in Proc. of the 6th International Conference on Genetic Algorithms, pp. 81-87, 1995.