research-article

Symbolic regression driven by training data and prior knowledge

Authors:

Jiří Kubalík,

Robert Babuška

GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference

Pages 958 - 966

https://doi.org/10.1145/3377930.3390152

Published: 26 June 2020

Abstract

In symbolic regression, the search for analytic models is typically driven purely by the prediction error observed on the training data samples. However, when the data samples do not sufficiently cover the input space, the prediction error does not provide sufficient guidance toward desired models. Standard symbolic regression techniques then yield models that are partially incorrect, for instance, in terms of their steady-state characteristics or local behavior. If these properties were considered already during the search process, more accurate and relevant models could be produced. We propose a multi-objective symbolic regression approach that is driven by both the training data and the prior knowledge of the properties the desired model should manifest. The properties given in the form of formal constraints are internally represented by a set of discrete data samples on which candidate models are exactly checked. The proposed approach was experimentally evaluated on three test problems with results clearly demonstrating its capability to evolve realistic models that fit the training data well while complying with the prior knowledge of the desired model characteristics at the same time. It outperforms standard symbolic regression by several orders of magnitude in terms of the mean squared deviation from a reference model.
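
The idea of scoring a candidate model on both the training data and a discrete set of constraint samples can be illustrated with a short sketch. This is not the authors' implementation: the reference function (`tanh`), the two candidate models, the chosen properties (monotonicity and boundedness), and every function name below are assumptions made purely for illustration. Each candidate receives two objectives to minimise, and Pareto dominance, a standard ingredient of multi-objective search, compares them.

```python
import math

def mse(model, samples):
    """Mean squared error of `model` on (x, y) training pairs."""
    return sum((model(x) - y) ** 2 for x, y in samples) / len(samples)

def bound_check(x, model):
    """Violation of the assumed prior |f(x)| <= 1 (0.0 if satisfied)."""
    return max(0.0, abs(model(x)) - 1.0)

def monotone_check(x, model, h=0.1):
    """Violation of the assumed prior f(x) <= f(x + h) (0.0 if satisfied)."""
    return max(0.0, model(x) - model(x + h))

def constraint_violation(model, constraint_samples):
    """Mean violation over discrete constraint samples (x, check)."""
    total = sum(check(x, model) for x, check in constraint_samples)
    return total / len(constraint_samples)

def dominates(a, b):
    """True if objective tuple `a` Pareto-dominates `b` (both minimised)."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))

# Training data cover only [-1, 1]; the reference model is assumed to be tanh.
train = [(-1.0 + 0.1 * i, math.tanh(-1.0 + 0.1 * i)) for i in range(21)]

# Constraint samples cover the wider range [-5, 5], where no data exist.
grid = [-5.0 + 0.5 * i for i in range(21)]
constraint_samples = [(x, bound_check) for x in grid] + [(x, monotone_check) for x in grid]

# Two hypothetical candidates that both fit the data region reasonably well.
def candidate_a(x):
    return x - x ** 3 / 3.0            # diverges outside the data region

def candidate_b(x):
    return x / math.sqrt(1.0 + x * x)  # bounded and monotone everywhere

objectives_a = (mse(candidate_a, train), constraint_violation(candidate_a, constraint_samples))
objectives_b = (mse(candidate_b, train), constraint_violation(candidate_b, constraint_samples))
print("candidate_a:", objectives_a)
print("candidate_b:", objectives_b)
```

Both candidates have a small training error, so the error alone barely separates them; the second objective is what exposes that `candidate_a` violates the assumed properties away from the data, while `candidate_b` satisfies them at every constraint sample.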

Published In

GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference

June 2020

1349 pages

ISBN:9781450371285

DOI:10.1145/3377930

General Chair:
Carlos Artemio Coello Coello
CINVESTAV-IPN

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

European Regional Development Fund

Conference

GECCO '20

Sponsor:

SIGEVO

GECCO '20: Genetic and Evolutionary Computation Conference

July 8 - 12, 2020

Cancún, Mexico

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Bibliometrics

Total Citations: 16

Total Downloads: 456

Downloads (Last 12 months): 119

Downloads (Last 6 weeks): 8

Reflects downloads up to 03 Oct 2024

Cited By

Gupt, K., Kshirsagar, M., Dias, D., Sullivan, J., and Ryan, C. A novel ML-driven test case selection approach for enhancing the performance of grammatical evolution. Frontiers in Computer Science 6 (2024). Online publication date: 27-Jun-2024.
https://doi.org/10.3389/fcomp.2024.1346149

Vastl, M., Kulhánek, J., Kubalík, J., Derner, E., and Babuška, R. SymFormer: End-to-End Symbolic Regression Using Transformer-Based Architecture. IEEE Access 12 (2024), 37840--37849. Online publication date: 2024.
https://doi.org/10.1109/ACCESS.2024.3374649

Cory-Wright, R., Cornelio, C., Dash, S., El Khadir, B., and Horesh, L. Evolving scientific discovery by unifying data and background knowledge with AI Hilbert. Nature Communications 15, 1 (2024). Online publication date: 14-Jul-2024.
https://doi.org/10.1038/s41467-024-50074-w

Guo, J., and Yin, W. Harnessing data using symbolic regression methods for discovering novel paradigms in physics. Science China Physics, Mechanics & Astronomy 67, 6 (2024). Online publication date: 30-Apr-2024.
https://doi.org/10.1007/s11433-023-2346-2

Haider, C., de Franca, F., Burlacu, B., Bachinger, F., Kronberger, G., and Affenzeller, M. Shape-constrained Symbolic Regression: Real-World Applications in Magnetization, Extrusion and Data Validation. In Genetic Programming Theory and Practice XX (2024), pp. 225--240. Online publication date: 18-Feb-2024.
https://doi.org/10.1007/978-981-99-8413-8_12

Kubalík, J., Derner, E., and Babuška, R. Toward Physically Plausible Data-Driven Models: A Novel Neural Network Approach to Symbolic Regression. IEEE Access 11 (2023), 61481--61501. Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3287397

Cornelio, C., Dash, S., Austel, V., Josephson, T., Goncalves, J., Clarkson, K., Megiddo, N., El Khadir, B., and Horesh, L. Combining data and theory for derivable scientific discovery with AI-Descartes. Nature Communications 14, 1 (2023). Online publication date: 12-Apr-2023.
https://doi.org/10.1038/s41467-023-37236-y

Haider, C., de Franca, F., Burlacu, B., and Kronberger, G. Shape-constrained multi-objective genetic programming for symbolic regression. Applied Soft Computing 132 (2023), 109855. Online publication date: Jan-2023.
https://doi.org/10.1016/j.asoc.2022.109855

Haider, C., and Kronberger, G. Shape-Constrained Symbolic Regression with NSGA-III. In Computer Aided Systems Theory – EUROCAST 2022 (2023), pp. 164--172. Online publication date: 10-Feb-2023.
https://doi.org/10.1007/978-3-031-25312-6_19

Haider, C., de França, F., Kronberger, G., Burlacu, B., and Wagner, M. Comparing optimistic and pessimistic constraint evaluation in shape-constrained symbolic regression. In Proceedings of the Genetic and Evolutionary Computation Conference (2022), pp. 938--945. Online publication date: 8-Jul-2022.
https://dl.acm.org/doi/10.1145/3512290.3528714
