Research Article · DOI: 10.1145/3638529.3654075

Bias-Variance Decomposition: An Effective Tool to Improve Generalization of Genetic Programming-based Evolutionary Feature Construction for Regression

Published: 14 July 2024

Abstract

Evolutionary feature construction is a technique that has been widely studied in automated machine learning. A key challenge in feature construction is its tendency to overfit the training data. Instead of the traditional approach of controlling overfitting by reducing model complexity, this paper proposes controlling overfitting based on bias-variance decomposition. Specifically, the paper reduces a model's variance, i.e., the variance of its predictions on data with injected noise, to improve generalization within a multi-objective optimization framework. Experiments on 42 datasets demonstrate that the proposed method effectively controls overfitting and outperforms six model-complexity measures for overfitting control. Further analysis reveals that controlling overfitting in adherence to bias-variance decomposition outperforms several plausible variants, highlighting the importance of grounding overfitting control in solid machine learning theory.
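The variance term targeted here comes from the classical decomposition of expected squared error, E[(y − f̂(x))²] = Bias[f̂(x)]² + Var[f̂(x)] + σ², where σ² is the irreducible noise. As a rough illustration of how prediction variance under input-noise injection could be estimated and paired with training error as a second objective, consider the following minimal sketch. This is not the authors' implementation: the function names, the sklearn-style predict interface, the Gaussian noise scale, and the number of noise repeats are all illustrative assumptions.

```python
import numpy as np

def variance_under_noise(model, X, noise_scale=0.1, n_repeats=10, rng=None):
    """Estimate the variance of a model's predictions when Gaussian noise
    is injected into the inputs. Illustrative sketch only; the paper's
    exact noise scheme and estimator may differ."""
    rng = np.random.default_rng(rng)
    # Predict on several independently perturbed copies of the inputs.
    preds = np.stack([
        model.predict(X + rng.normal(0.0, noise_scale, size=X.shape))
        for _ in range(n_repeats)
    ])  # shape: (n_repeats, n_samples)
    # Variance across noise realizations, averaged over samples.
    return preds.var(axis=0).mean()

def objectives(model, X, y):
    """Two objectives for a multi-objective EA such as NSGA-II:
    minimize training error and minimize prediction variance."""
    mse = np.mean((y - model.predict(X)) ** 2)
    return mse, variance_under_noise(model, X)
```

In a multi-objective framework such as NSGA-II, minimizing these two objectives jointly puts selection pressure on constructed features whose predictions are both accurate and stable under perturbation, which is the sense in which variance reduction serves as an overfitting control.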

Supplemental Material

Supplementary material is available as a PDF file.



    Published In

    GECCO '24: Proceedings of the Genetic and Evolutionary Computation Conference
    July 2024, 1657 pages
    ISBN: 9798400704949
    DOI: 10.1145/3638529

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. genetic programming
    2. bias-variance decomposition
    3. automated machine learning
    4. evolutionary feature construction


    Conference

    GECCO '24: Genetic and Evolutionary Computation Conference
    July 14-18, 2024
    Melbourne, VIC, Australia

    Acceptance Rates

    Overall acceptance rate: 1,669 of 4,410 submissions (38%)
