[Figure 1: Results for 3-deceptive Function. Number of fitness evaluations versus size of the problem for the BOA (k=2, K2 metric) and the simple GA (one-point crossover).]

[Figure 2: Results for trap-5 Function. Number of fitness evaluations versus size of the problem for the BOA (k=4, K2 metric) and the simple GA (one-point crossover).]

[Figure 3: Results for 6-bipolar Function. Number of fitness evaluations versus size of the problem for the BOA (k=5, K2 metric) and the simple GA (one-point crossover).]
been set to 1%. In the BOA, no prior information except for the maximal order of interactions to be considered has been incorporated into the algorithm.

In Figure 1, the results for the 3-deceptive function are presented. In this function, the deceptive building blocks are of order 3. The building blocks are non-overlapping and mapped tightly onto strings. Therefore, one-point crossover is not likely to disrupt them. The looser the building blocks, the worse the simple GA would perform. Since the building blocks are deceptive, the computational requirements of the simple GA with uniform crossover and of the BOA with k = 0 (i.e., the UMDA) grow exponentially, and therefore we do not present the results for these algorithms. Some results for the BMDA can be found in Pelikan and Mühlenbein (1999). The BOA with k = 2 and the K2 metric performs the best of the compared algorithms in terms of the number of fitness evaluations until successful convergence. The simple GA with one-point crossover performs worse than the BOA with k = 2 as the problem size grows. For loose building blocks, the simple GA with one-point crossover would require a number of fitness evaluations growing exponentially with the problem size; the BOA, on the other hand, would perform the same, since it is independent of the variable ordering in a string. The population sizes for the GA ranged from N = 400 for n = 30 to N = 7700 for n = 180. The population sizes for the BOA ranged from N = 1000 for n = 30 to N = 7700 for n = 180.
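The K2 metric referred to above is the Cooper-Herskovits Bayesian-Dirichlet score with uniform priors (see Heckerman, Geiger, & Chickering, 1994), and the parameter k bounds the number of parents a variable may have in the network. As a minimal illustrative sketch only (not code from the paper; the function name and the parent-set representation are ours), the log K2 metric of a candidate network over binary strings can be computed as follows:

    import itertools
    import math

    import numpy as np

    def log_k2_score(data, parents):
        """Log of the K2 metric (Cooper-Herskovits, uniform priors) of a
        Bayesian network over binary variables.  `data` holds the selected
        strings as rows of a 2-D 0/1 array; `parents` maps a variable index
        to a tuple of its parent indices (at most k of them in the BOA)."""
        n = data.shape[1]
        score = 0.0
        for i in range(n):
            pa = parents.get(i, ())
            # Visit every joint assignment of X_i's parents.
            for config in itertools.product((0, 1), repeat=len(pa)):
                mask = np.ones(len(data), dtype=bool)
                for p, v in zip(pa, config):
                    mask &= data[:, p] == v
                m1 = int(data[mask, i].sum())  # cases with X_i = 1
                m0 = int(mask.sum()) - m1      # cases with X_i = 0
                # log of (r-1)!/(m0+m1+r-1)! * m0! * m1!, with r = 2 values
                score += (math.lgamma(2) - math.lgamma(m0 + m1 + 2)
                          + math.lgamma(m0 + 1) + math.lgamma(m1 + 1))
        return score

    # Example: compare an edgeless model (k = 0, UMDA-like) with a chain.
    rng = np.random.default_rng(0)
    data = rng.integers(0, 2, size=(100, 3))
    print(log_k2_score(data, {}))
    print(log_k2_score(data, {1: (0,), 2: (1,)}))

A greedy network construction would add the edge (and keep within the bound of k parents per node) whenever it increases this score.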
In Figure 2, the results for the trap-5 function are presented. The building blocks are non-overlapping and again mapped tightly onto a string. The results for this function are similar to those for the 3-deceptive function. The population sizes for the GA ranged from N = 600 for n = 30 to N = 8100 for n = 180. The population sizes for the BOA ranged from N = 1300 for n = 30 to N = 11800 for n = 180.

In Figure 3, the results for the 6-bipolar function are presented. The results for this function are similar to those for the 3-deceptive function. In addition to converging faster, the BOA discovers a number of the 2^(n/6) global optima of the 6-bipolar function instead of converging to a single solution. This effect could be further magnified by using niching methods. The population sizes for the GA ranged from N = 360 for n = 30 to N = 4800 for n = 180. The population sizes for the BOA ranged from N = 900 for n = 30 to N = 5000 for n = 180.
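For concreteness, the following sketch implements the three test functions under their standard definitions in the deceptive-function literature (the paper defines them in an earlier section not reproduced here; the names used are illustrative) and verifies the 2^(n/6) count of global optima of the 6-bipolar function for a small n:

    import itertools

    # Deceptive function of order 3, defined on u = number of ones in a block.
    def f_3deceptive(u):
        return (0.9, 0.8, 0.0, 1.0)[u]

    # Trap function of order 5: optimum at u = 5, deceptive slope toward u = 0.
    def f_trap5(u):
        return 5.0 if u == 5 else 4.0 - u

    # Bipolar deceptive function of order 6: 3-deceptive applied to |3 - u|,
    # so u = 0 and u = 6 are both global optima and u = 3 is the deceptive
    # attractor.
    def f_6bipolar(u):
        return f_3deceptive(abs(3 - u))

    def decomposed(x, sub, order):
        """Sum a subfunction over consecutive (tightly mapped) blocks."""
        return sum(sub(sum(x[i:i + order])) for i in range(0, len(x), order))

    # Each 6-bit block is solved by both 000000 and 111111, so an n-bit
    # 6-bipolar function has 2^(n/6) global optima; check this for n = 12.
    n = 12
    strings = list(itertools.product((0, 1), repeat=n))
    best = max(decomposed(x, f_6bipolar, 6) for x in strings)
    print(sum(decomposed(x, f_6bipolar, 6) == best for x in strings))  # 4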
7 CONCLUSIONS

The experiments have shown that the proposed algorithm outperforms the simple GA even on decomposable problems with tight building blocks as the problem size grows.
The gap between the proposed algorithm and the simple GA would enlarge significantly for problems with loose building blocks. For a loose mapping, the time requirements of the simple GA grow exponentially with the problem size. On the other hand, the BOA is independent of the ordering of the variables in a string, and therefore changing this ordering would not affect the performance of the algorithm. The proposed algorithm also works very well for other problems with highly overlapping building blocks, e.g., spin glasses, which are not discussed in this paper.

Acknowledgments

The authors would like to thank Heinz Mühlenbein, David Heckerman, and Ole J. Mengshoel for valuable discussions and useful comments. Martin Pelikan was supported by grants number 1/4209/97 and 1/5229/98 of the Scientific Grant Agency of the Slovak Republic.

The work was sponsored by the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grant number F49620-97-1-0050. Research funding for this project was also provided by a grant from the U.S. Army Research Laboratory under the Federated Laboratory Program, Cooperative Agreement DAAL01-96-2-0003. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies and endorsements, either expressed or implied, of the Air Force Office of Scientific Research or the U.S. Government.
References

Baluja, S. (1994). Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning (Tech. Rep. No. CMU-CS-94-163). Pittsburgh, PA: Carnegie Mellon University.

Baluja, S., & Davies, S. (1997). Using optimal dependency-trees for combinatorial optimization: Learning the structure of the search space. In Proceedings of the 14th International Conference on Machine Learning (pp. 30–38). Morgan Kaufmann.

De Bonet, J. S., Isbell, C. L., & Viola, P. (1997). MIMIC: Finding optima by estimating probability densities. In Mozer, M. C., Jordan, M. I., & Petsche, T. (Eds.), Advances in Neural Information Processing Systems, Volume 9 (p. 424). Cambridge, MA: The MIT Press.

Edmonds, J. (1967). Optimum branching. J. Res. NBS, 71B, 233–240.

Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.

Goldberg, D. E., Korb, B., & Deb, K. (1989). Messy genetic algorithms: Motivation, analysis, and first results. Complex Systems, 3(5), 493–530.

Harik, G. R., & Goldberg, D. E. (1996). Learning linkage. Foundations of Genetic Algorithms 4, 247–262.

Harik, G. R., Lobo, F. G., & Goldberg, D. E. (1997). The compact genetic algorithm (IlliGAL Report No. 97006). Urbana: University of Illinois at Urbana-Champaign.

Heckerman, D., Geiger, D., & Chickering, M. (1994). Learning Bayesian networks: The combination of knowledge and statistical data (Technical Report MSR-TR-94-09). Redmond, WA: Microsoft Research.

Howard, R. A., & Matheson, J. E. (1981). Influence diagrams. In Howard, R. A., & Matheson, J. E. (Eds.), Readings on the Principles and Applications of Decision Analysis, Volume II (pp. 721–762). Menlo Park, CA: Strategic Decisions Group.

Kargupta, H. (1998). Revisiting the GEMGA: Scalable evolutionary optimization through linkage learning. In Proceedings of 1998 IEEE International Conference on Evolutionary Computation (pp. 603–608). IEEE Press.

Kvasnicka, V., Pelikan, M., & Pospichal, J. (1996). Hill climbing with learning (An abstraction of genetic algorithm). Neural Network World, 6, 773–796.

Mühlenbein, H. (1997). The equation for response to selection and its use for prediction. Evolutionary Computation, 5(3), 303–346.

Mühlenbein, H., Mahnig, T., & Rodriguez, A. O. (1998). Schemata, distributions and graphical models in evolutionary optimization. Submitted for publication.

Mühlenbein, H., & Paaß, G. (1996). From recombination of genes to the estimation of distributions I. Binary parameters. Parallel Problem Solving from Nature, PPSN IV, 178–187.

Munetomo, M., & Goldberg, D. E. (1998). Designing a genetic algorithm using the linkage identification by nonlinearity check (Technical Report 98014). Urbana, IL: University of Illinois at Urbana-Champaign.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo, CA: Morgan Kaufmann.

Pelikan, M., & Mühlenbein, H. (1999). The bivariate marginal distribution algorithm. In Roy, R., Furuhashi, T., & Chawdhry, P. K. (Eds.), Advances in Soft Computing - Engineering Design and Manufacturing (pp. 521–535). London: Springer-Verlag.

Thierens, D. (1995). Analysis and design of genetic algorithms. Leuven, Belgium: Katholieke Universiteit Leuven.