DOI: 10.1145/3449639.3459285

PSB2: the second program synthesis benchmark suite

Published: 26 June 2021

Abstract

For the past six years, researchers in genetic programming and other program synthesis disciplines have used the General Program Synthesis Benchmark Suite to benchmark many aspects of automatic program synthesis systems. These problems have been used to make notable progress toward the goal of general program synthesis: automatically creating the types of software that human programmers code. Many of the systems that have attempted the problems in the original benchmark suite have used it to demonstrate performance improvements granted through new techniques. Over time, the suite has gradually become outdated, hindering the accurate measurement of further improvements. The field needs a new set of more difficult benchmark problems to move beyond what was previously possible.
In this paper, we describe the 25 new general program synthesis benchmark problems that make up PSB2, a new benchmark suite. These problems are curated from a variety of sources, including programming katas and college courses. We selected these problems to be more difficult than those in the original suite, and give results using PushGP showing this increase in difficulty. These new problems give plenty of room for improvement, pointing the way for the next six or more years of general program synthesis research.

Published In

GECCO '21: Proceedings of the Genetic and Evolutionary Computation Conference
June 2021
1219 pages
ISBN: 9781450383509
DOI: 10.1145/3449639
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automatic program synthesis
  2. benchmarking
  3. genetic programming

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Cited By

  • (2024) Multimodal Adaptive Graph Evolution. Proceedings of the Genetic and Evolutionary Computation Conference Companion, 499-502. DOI: 10.1145/3638530.3654347. Online publication date: 14-Jul-2024
  • (2024) Runtime phylogenetic analysis enables extreme subsampling for test-based problems. Proceedings of the Genetic and Evolutionary Computation Conference Companion, 511-514. DOI: 10.1145/3638530.3654208. Online publication date: 14-Jul-2024
  • (2024) A Comprehensive Analysis of Down-sampling for Genetic Programming-based Program Synthesis. Proceedings of the Genetic and Evolutionary Computation Conference Companion, 487-490. DOI: 10.1145/3638530.3654134. Online publication date: 14-Jul-2024
  • (2024) Effective Adaptive Mutation Rates for Program Synthesis. Proceedings of the Genetic and Evolutionary Computation Conference, 952-960. DOI: 10.1145/3638529.3654135. Online publication date: 14-Jul-2024
  • (2024) Program Synthesis on Single-Layer Loop Behavior in Pure Functional Programming. 2024 IEEE Congress on Evolutionary Computation (CEC), 1-8. DOI: 10.1109/CEC60901.2024.10612128. Online publication date: 30-Jun-2024
  • (2024) Methodology for Code Synthesis Evaluation of LLMs Presented by a Case Study of ChatGPT and Copilot. IEEE Access 12, 72303-72316. DOI: 10.1109/ACCESS.2024.3403858. Online publication date: 2024
  • (2024) Particularity. Genetic Programming Theory and Practice XX, 159-176. DOI: 10.1007/978-981-99-8413-8_9. Online publication date: 18-Feb-2024
  • (2024) The Impact of Step Limits on Generalization and Stability in Software Synthesis. Genetic Programming Theory and Practice XX, 87-104. DOI: 10.1007/978-981-99-8413-8_5. Online publication date: 18-Feb-2024
  • (2024) Reachability Analysis for Lexicase Selection via Community Assembly Graphs. Genetic Programming Theory and Practice XX, 283-301. DOI: 10.1007/978-981-99-8413-8_15. Online publication date: 18-Feb-2024
  • (2024) Phylogeny-Informed Fitness Estimation for Test-Based Parent Selection. Genetic Programming Theory and Practice XX, 241-261. DOI: 10.1007/978-981-99-8413-8_13. Online publication date: 18-Feb-2024
