research-article

On sampling error in genetic programming

Published: 01 June 2022

Abstract

The initial population in genetic programming (GP) should form a representative sample of all possible solutions (the search space). While large populations accurately approximate the distribution of possible solutions, small populations tend to incorporate a sampling error. This paper analyzes how the size of a GP population affects the sampling error and contributes to answering the question of how to size initial GP populations. First, we present a probabilistic model of the expected number of subtrees for GP populations initialized with full, grow, or ramped half-and-half. Second, based on our frequency model, we present a model that estimates the sampling error for a given GP population size. We validate our models empirically and show that, compared to smaller population sizes, our recommended population sizes largely reduce the sampling error of measured fitness values. Increasing the population sizes even more, however, does not considerably reduce the sampling error of fitness values. Last, we recommend population sizes for some widely used benchmark problem instances that result in a low sampling error. A low sampling error at initialization is necessary (but not sufficient) for a reliable search since lowering the sampling error means that the overall random variations in a random sample are reduced. Our results indicate that sampling error is a severe problem for GP, making large initial population sizes necessary to obtain a low sampling error. Our model allows practitioners of GP to determine a minimum initial population size so that the sampling error is lower than a threshold, given a confidence level.
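The idea of choosing a minimum sample size so that the sampling error stays below a threshold at a given confidence level can be illustrated with the classical sample-size formula for a proportion from survey sampling, n = z² p(1 − p) / e². This is a minimal sketch of that textbook formula only. It is not the subtree-frequency model described in the abstract, and the function name `min_sample_size` and its parameters are hypothetical names chosen for illustration.

```python
import math
from statistics import NormalDist


def min_sample_size(error_threshold, confidence=0.95, proportion=0.5):
    """Textbook minimum sample size for estimating a proportion.

    Assumption: this is the classical normal-approximation formula
    n = z^2 * p * (1 - p) / e^2 for an infinite population, NOT the
    GP-specific model from the paper.
    """
    if not 0 < error_threshold < 1:
        raise ValueError("error_threshold must lie in (0, 1)")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie in (0, 1)")
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must lie in [0, 1]")

    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    # proportion = 0.5 maximizes p * (1 - p), giving the most conservative size.
    n = z ** 2 * proportion * (1 - proportion) / error_threshold ** 2
    return math.ceil(n)


if __name__ == "__main__":
    for e in (0.05, 0.01):
        print(e, min_sample_size(e))
```

Under these assumptions, halving the error threshold roughly quadruples the required sample size, which is consistent in spirit with the abstract's remark that a low sampling error requires large initial populations.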

Cited By

  • (2023) Denoising autoencoder genetic programming: strategies to control exploration and exploitation in search. Genetic Programming and Evolvable Machines 24:2. 10.1007/s10710-023-09462-2. Online publication date: 8-Nov-2023
  • (2021) Improving estimation of distribution genetic programming with novelty initialization. Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp 261-262. 10.1145/3449726.3459410. Online publication date: 7-Jul-2021

Published In

Natural Computing: an international journal  Volume 21, Issue 2
Jun 2022
225 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 June 2022
Accepted: 25 November 2020

Author Tags

  1. Sampling error
  2. Initial supply
  3. Genetic programming
  4. Building blocks
  5. Initial population
  6. Ramped half-and-half
  7. Full
  8. Grow
  9. n-Grams

Qualifiers

  • Research-article

Funding Sources

  • Projekt DEAL