DOI: 10.1145/3377930.3390244
research-article

Adaptive weighted splines: a new representation to genetic programming for symbolic regression

Published: 26 June 2020

Abstract

Genetic Programming for Symbolic Regression is prone to overfitting the training data, which results in poor generalization on unseen data. To address this issue, much research has been devoted to regularization by controlling model complexity. However, because individuals use an unstructured tree-based representation, model complexity cannot be computed directly and must instead be approximated. This paper proposes a novel representation called Adaptive Weighted Splines, which enables explicit control over the complexity of individuals using splines. The experimental results confirm that the new representation is significantly better than the tree-based representation at avoiding overfitting and generalizing to unseen data, with notably better and far more consistent generalization performance on all the benchmark problems. Further analysis also shows that, in most cases, the new Genetic Programming method outperforms classical regression techniques such as Linear Regression, Support Vector Regression, K-Nearest Neighbour and Decision Tree Regression, and performs competitively with the state-of-the-art ensemble regression methods Random Forests and Gradient Boosting.
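
To make the idea of explicit complexity control concrete, the following is a minimal sketch and not the method from the paper. It assumes a piecewise-linear spline whose knot count acts as a directly readable complexity measure, and it sets knot heights by local averaging as a simple stand-in for a real fitting procedure. The names `fit_linear_spline`, `predict` and `mse` are hypothetical helpers introduced only for this illustration.

```python
import math
import random


def fit_linear_spline(xs, ys, n_knots):
    """Return (knots, heights) for a piecewise-linear spline (n_knots >= 2).

    Assumption for this sketch: knots are equally spaced over the data range
    and each height is the mean target of the samples nearest to that knot.
    """
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / (n_knots - 1)
    knots = [lo + i * step for i in range(n_knots)]
    sums = [0.0] * n_knots
    counts = [0] * n_knots
    for x, y in zip(xs, ys):
        nearest = min(n_knots - 1, int(round((x - lo) / step)))
        sums[nearest] += y
        counts[nearest] += 1
    overall = sum(ys) / len(ys)  # fallback height for a knot with no samples
    heights = [s / c if c else overall for s, c in zip(sums, counts)]
    return knots, heights


def predict(knots, heights, x):
    """Evaluate the spline at x by linear interpolation between knots."""
    if x <= knots[0]:
        return heights[0]
    if x >= knots[-1]:
        return heights[-1]
    step = knots[1] - knots[0]
    i = min(len(knots) - 2, int((x - knots[0]) / step))
    t = (x - knots[i]) / step
    return (1 - t) * heights[i] + t * heights[i + 1]


def mse(knots, heights, xs, ys):
    """Mean squared error of the spline on the given samples."""
    errors = ((predict(knots, heights, x) - y) ** 2 for x, y in zip(xs, ys))
    return sum(errors) / len(xs)


# Synthetic data (an assumption of this sketch): a noisy sine wave.
rng = random.Random(0)
xs = [2 * math.pi * i / 199 for i in range(200)]
ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]

# The knot count is the explicit complexity knob.
simple = fit_linear_spline(xs, ys, n_knots=2)
flexible = fit_linear_spline(xs, ys, n_knots=20)
mse_simple = mse(*simple, xs, ys)
mse_flexible = mse(*flexible, xs, ys)

print(f"complexity={len(simple[0])}: training MSE={mse_simple:.3f}")
print(f"complexity={len(flexible[0])}: training MSE={mse_flexible:.3f}")
```

Here the complexity of each model is read directly from its parameters (the number of knots), whereas for an expression tree it would have to be estimated. Raising the knot count lowers training error, which is the trade-off a regularized search would need to balance against generalization.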

    Published In

    GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference
    June 2020
    1349 pages
    ISBN:9781450371285
    DOI:10.1145/3377930
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. generalization
    2. genetic programming
    3. representation
    4. spline
    5. symbolic regression

    Qualifiers

    • Research-article

    Conference

    GECCO '20

    Acceptance Rates

    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

    Article Metrics

    • Downloads (Last 12 months): 35
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 12 Jan 2025

    Cited By

    • (2023) Explainable Artificial Intelligence by Genetic Programming: A Survey. IEEE Transactions on Evolutionary Computation 27:3 (621-641). DOI: 10.1109/TEVC.2022.3225509. Online publication date: Jun-2023
    • (2023) Dynamic Grammar Pruning for Program Size Reduction in Symbolic Regression. SN Computer Science 4:4. DOI: 10.1007/s42979-023-01840-y. Online publication date: 17-May-2023
    • (2023) Evolutionary Regression and Modelling. Handbook of Evolutionary Machine Learning (121-149). DOI: 10.1007/978-981-99-3814-8_5. Online publication date: 2-Nov-2023
    • (2022) Multi-objective Genetic Programming with the Adaptive Weighted Splines Representation for Symbolic Regression. Genetic Programming (51-67). DOI: 10.1007/978-3-031-02056-8_4. Online publication date: 13-Apr-2022
    • (2022) Generalisation in Genetic Programming for Symbolic Regression: Challenges and Future Directions. Women in Computational Intelligence (281-302). DOI: 10.1007/978-3-030-79092-9_13. Online publication date: 14-Apr-2022
    • (2021) Estimation of COVID-19 Epidemiology Curve of the United States Using Genetic Programming Algorithm. International Journal of Environmental Research and Public Health 18:3 (959). DOI: 10.3390/ijerph18030959. Online publication date: 22-Jan-2021
    • (2021) Multi-objective genetic programming for symbolic regression with the adaptive weighted splines representation. Proceedings of the Genetic and Evolutionary Computation Conference Companion (165-166). DOI: 10.1145/3449726.3459461. Online publication date: 7-Jul-2021
    • (2021) Automated Behavior-based Malice Scoring of Ransomware Using Genetic Programming. 2021 IEEE Symposium Series on Computational Intelligence (SSCI) (01-08). DOI: 10.1109/SSCI50451.2021.9660009. Online publication date: 5-Dec-2021
