research-article

PSB2: the second program synthesis benchmark suite

Authors: Thomas Helmuth, Peter Kelly

GECCO '21: Proceedings of the Genetic and Evolutionary Computation Conference

Pages 785 - 794

https://doi.org/10.1145/3449639.3459285

Published: 26 June 2021

Abstract

For the past six years, researchers in genetic programming and other program synthesis disciplines have used the General Program Synthesis Benchmark Suite to benchmark many aspects of automatic program synthesis systems. These problems have been used to make notable progress toward the goal of general program synthesis: automatically creating the types of software that human programmers code. Many of the systems that have attempted the problems in the original benchmark suite have used it to demonstrate performance improvements granted through new techniques. Over time, the suite has gradually become outdated, hindering the accurate measurement of further improvements. The field needs a new set of more difficult benchmark problems to move beyond what was previously possible.

In this paper, we describe the 25 new general program synthesis benchmark problems that make up PSB2, a new benchmark suite. These problems are curated from a variety of sources, including programming katas and college courses. We selected these problems to be more difficult than those in the original suite, and give results using PushGP showing this increase in difficulty. These new problems give plenty of room for improvement, pointing the way for the next six or more years of general program synthesis research.

Index Terms

PSB2: the second program synthesis benchmark suite
1. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Automatic programming

Published In

GECCO '21: Proceedings of the Genetic and Evolutionary Computation Conference

June 2021

1219 pages

ISBN:9781450383509

DOI:10.1145/3449639

Editor: Francisco Chicano (University of Malaga)

General Chair: Krzysztof Krawiec (Poznan University of Technology)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

GECCO '21

Sponsor:

SIGEVO

GECCO '21: Genetic and Evolutionary Computation Conference

July 10 - 14, 2021

Lille, France

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
