research-article
Open access

Top-Down Synthesis for Library Learning

Published: 11 January 2023

Abstract

This paper introduces corpus-guided top-down synthesis as a mechanism for synthesizing library functions that capture common functionality from a corpus of programs in a domain-specific language (DSL). The algorithm builds abstractions directly from initial DSL primitives, using syntactic pattern matching of intermediate abstractions to prune the search space and guide the algorithm towards abstractions that maximally capture shared structures in the corpus. We present an implementation of the approach in a tool called Stitch and evaluate it against the state-of-the-art deductive library learning algorithm from DreamCoder. Our evaluation shows that Stitch is 3 to 4 orders of magnitude faster and uses 2 orders of magnitude less memory while maintaining comparable or better library quality (as measured by compressivity). We also demonstrate Stitch’s scalability on corpora containing hundreds of complex programs that are intractable with prior deductive approaches, and show empirically that it is robust to terminating the search procedure early, which allows it to scale to challenging datasets through early stopping.
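To make the two ideas in the abstract concrete (syntactic pattern matching of a candidate abstraction against the corpus, and compressivity as a quality measure), here is a minimal Python sketch. It is a toy illustration under stated assumptions, not Stitch's implementation: programs are nested tuples, holes are strings beginning with `?`, and the names `match`, `rewrite`, `size`, and `count_matches` are hypothetical. It performs no search; it only scores one given candidate, and it ignores the cost of the abstraction's own definition.

```python
from typing import Dict, Iterator, List, Tuple, Union

# Assumption: a program is a nested tuple of strings, e.g. ("+", ("*", "x", "2"), "1").
Expr = Union[str, Tuple["Expr", ...]]


def is_hole(e: Expr) -> bool:
    """Holes in a candidate abstraction are written "?0", "?1", ..."""
    return isinstance(e, str) and e.startswith("?")


def match(pattern: Expr, expr: Expr, binds: Dict[str, Expr]) -> bool:
    """Purely syntactic match; a repeated hole must bind the same subtree."""
    if is_hole(pattern):
        if pattern in binds:
            return binds[pattern] == expr
        binds[pattern] = expr
        return True
    if isinstance(pattern, str) or isinstance(expr, str):
        return pattern == expr
    return len(pattern) == len(expr) and all(
        match(p, e, binds) for p, e in zip(pattern, expr)
    )


def subtrees(expr: Expr) -> Iterator[Expr]:
    """Yield every subtree of expr, including expr itself."""
    yield expr
    if isinstance(expr, tuple):
        for child in expr:
            yield from subtrees(child)


def size(expr: Expr) -> int:
    """Program size measured as the number of leaf symbols."""
    return 1 if isinstance(expr, str) else sum(size(c) for c in expr)


def count_matches(pattern: Expr, corpus: List[Expr]) -> int:
    """Number of corpus subtrees the pattern matches."""
    return sum(match(pattern, t, {}) for prog in corpus for t in subtrees(prog))


def rewrite(pattern: Expr, name: str, expr: Expr) -> Expr:
    """Replace each match, top-down, with a call to the new library function."""
    binds: Dict[str, Expr] = {}
    if match(pattern, expr, binds):
        args = tuple(rewrite(pattern, name, binds[h]) for h in sorted(binds))
        return (name,) + args if args else name
    if isinstance(expr, tuple):
        return tuple(rewrite(pattern, name, c) for c in expr)
    return expr


corpus: List[Expr] = [
    ("+", ("*", "x", "2"), "1"),
    ("-", ("*", ("f", "y"), "2"), ("*", "z", "2")),
]
candidate: Expr = ("*", "?0", "2")  # hypothetical abstraction: "double"

before = sum(size(p) for p in corpus)
after = sum(size(rewrite(candidate, "double", p)) for p in corpus)
print(count_matches(candidate, corpus), before, after)
```

A candidate that matches nowhere cannot compress the corpus, so a search over partially filled patterns can discard it along with all of its refinements; that is the intuition behind using match information to prune, though the bounds the paper actually uses are not reproduced here.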

Supplementary Material

Auxiliary Archive (popl23main-p278-p-archive.zip)
Supplement for the paper Top-Down Synthesis for Library Learning (POPL 2023).
  • stitch_appendix.pdf: Appendix for the paper.
  • stitch.v: Coq proof of the completeness of LambdaUnify from the paper. Proven with CoqIDE 8.15.2.


Published In

Proceedings of the ACM on Programming Languages  Volume 7, Issue POPL
January 2023
2196 pages
EISSN:2475-1421
DOI:10.1145/3554308
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Abstraction Learning
  2. Library Learning
  3. Program Synthesis

