Open access

Top-Down Synthesis for Library Learning

Authors:

Matthew Bowers,

Theo X. Olausson,

Joshua B. Tenenbaum,

Armando Solar-Lezama

Proceedings of the ACM on Programming Languages, Volume 7, Issue POPL

Article No.: 41, Pages 1182 - 1213

https://doi.org/10.1145/3571234

Published: 11 January 2023

Abstract

This paper introduces corpus-guided top-down synthesis as a mechanism for synthesizing library functions that capture common functionality from a corpus of programs in a domain-specific language (DSL). The algorithm builds abstractions directly from initial DSL primitives, using syntactic pattern matching of intermediate abstractions to intelligently prune the search space and guide the algorithm towards abstractions that maximally capture shared structures in the corpus. We present an implementation of the approach in a tool called Stitch and evaluate it against the state-of-the-art deductive library learning algorithm from DreamCoder. Our evaluation shows that Stitch is 3-4 orders of magnitude faster and uses 2 orders of magnitude less memory while maintaining comparable or better library quality (as measured by compressivity). We also demonstrate Stitch’s scalability on corpora containing hundreds of complex programs that are intractable with prior deductive approaches and show empirically that it is robust to terminating the search procedure early, which further allows it to scale to challenging datasets by means of early stopping.
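To make the notion of compressivity concrete, the following is a toy sketch in Python. It is not the Stitch algorithm and not its cost function. It assumes programs are nested tuples of strings, that an abstraction is a pattern whose holes are written `#0`, `#1`, and so on, and that compression is measured as corpus size before rewriting divided by corpus size after. All names here (`size`, `match`, `rewrite`, `compression_ratio`, `sq`) are hypothetical and chosen only for illustration.

```python
# Toy illustration only: programs are nested tuples, leaves are strings.

def size(expr):
    """Count the leaves (primitive symbols) in an expression."""
    return 1 if isinstance(expr, str) else sum(size(c) for c in expr)

def is_hole(p):
    return isinstance(p, str) and p.startswith("#")

def match(pattern, expr, binds):
    """Syntactically match pattern against expr; return hole bindings or None."""
    if is_hole(pattern):
        if pattern in binds:  # a repeated hole must bind the same subtree
            return binds if binds[pattern] == expr else None
        return {**binds, pattern: expr}
    if isinstance(pattern, str) or isinstance(expr, str):
        return binds if pattern == expr else None
    if len(pattern) != len(expr):
        return None
    for p, e in zip(pattern, expr):
        binds = match(p, e, binds)
        if binds is None:
            return None
    return binds

def rewrite(expr, name, pattern):
    """Replace every match of pattern with a call (name, arg0, arg1, ...).

    Assumes pattern is not a bare hole, which would match every subtree.
    """
    binds = match(pattern, expr, {})
    if binds is not None:
        args = tuple(rewrite(binds[h], name, pattern) for h in sorted(binds))
        return (name, *args) if args else name
    if isinstance(expr, str):
        return expr
    return tuple(rewrite(c, name, pattern) for c in expr)

def compression_ratio(corpus, name, pattern):
    """Corpus size before rewriting divided by corpus size after (assumed metric)."""
    before = sum(size(p) for p in corpus)
    after = sum(size(rewrite(p, name, pattern)) for p in corpus)
    return before / after

corpus = [
    ("+", ("*", "x", "x"), "1"),
    ("+", ("*", "y", "y"), "1"),
    ("-", ("*", "z", "z"), "2"),
]
square = ("*", "#0", "#0")  # hypothetical abstraction: sq(a) = a * a
print(compression_ratio(corpus, "sq", square))  # 15 / 12 = 1.25
```

In this toy corpus every program contains one instance of the squaring pattern, so each rewrite saves one symbol. A library learner in the sense of the abstract would search over candidate patterns for those that yield the largest such savings; the sketch only scores one given candidate.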

Supplementary Material

Auxiliary Archive (popl23main-p278-p-archive.zip)

Supplement for the paper Top-Down Synthesis for Library Learning (POPL 2023).

stitch_appendix.pdf: Appendix for the paper.

stitch.v: Coq proof of the completeness of LambdaUnify from the paper. Proven with CoqIDE 8.15.2.


Index Terms

Top-Down Synthesis for Library Learning
1. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Automatic programming
2. Theory of computation
  1. Design and analysis of algorithms
    1. Algorithm design techniques
      1. Branch-and-bound

Published In

Proceedings of the ACM on Programming Languages Volume 7, Issue POPL

January 2023

2196 pages

EISSN: 2475-1421

DOI: 10.1145/3554308

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States
