research-article

Open access

Inductive Program Synthesis Guided by Observational Program Similarity

Authors:

Armando Solar-LezamaAuthors Info & Claims

Proceedings of the ACM on Programming Languages, Volume 7, Issue OOPSLA2

Article No.: 254, Pages 912 - 940

https://doi.org/10.1145/3622830

Published: 16 October 2023 Publication History

Abstract

We present a new general-purpose synthesis technique for generating programs from input-output examples. Our method, called metric program synthesis, relaxes the observational equivalence idea (used widely in bottom-up enumerative synthesis) into a weaker notion of observational similarity, with the goal of reducing the search space that the synthesizer needs to explore. Our method clusters programs into equivalence classes based on an expert-provided distance metric and constructs a version space that compactly represents “approximately correct” programs. Then, given a “close enough” program sampled from this version space, our approach uses a distance-guided repair algorithm to find a program that exactly matches the given input-output examples. We have implemented our proposed metric program synthesis technique in a tool called SyMetric and evaluate it in three different domains considered in prior work. Our evaluation shows that SyMetric outperforms other domain-agnostic synthesizers that use observational equivalence and that it achieves results competitive with domain-specific synthesizers that are either designed for or trained on those domains.

References

[1]

Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katherine Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob L Menick, Sebastian Borgeaud, Andy Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikoł aj Bińkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, and Karén Simonyan. 2022. Flamingo: a Visual Language Model for Few-Shot Learning. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.). 35, Curran Associates, Inc., 23716–23736. https://proceedings.neurips.cc/paper_files/paper/2022/file/960a172bc7fbf0177ccccbb411a7d800-Paper-Conference.pdf

[2]

Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. 2013. Recursive Program Synthesis. In Computer Aided Verification - 25th International Conference, CAV 2013, Saint Petersburg, Russia, July 13-19, 2013. Proceedings. 934–950. https://doi.org/10.1007/978-3-642-39799-8_67

[3]

Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2017. DeepCoder: Learning to Write Programs. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. https://openreview.net/forum?id=ByldLrqlx

[4]

E.K. Burke, S. Gustafson, and G. Kendall. 2004. Diversity in genetic programming: an analysis of measures and correlation with fitness. IEEE Transactions on Evolutionary Computation, 8, 1 (2004), 47–62. https://doi.org/10.1109/TEVC.2003.819263

Digital Library

[5]

William G. La Cava, Lee Spector, and Kourosh Danai. 2016. Epsilon-Lexicase Selection for Regression. In Proceedings of the 2016 on Genetic and Evolutionary Computation Conference, Denver, CO, USA, July 20 - 24, 2016, Tobias Friedrich, Frank Neumann, and Andrew M. Sutton (Eds.). ACM, 741–748. https://doi.org/10.1145/2908812.2908898

Digital Library

[6]

Pavol Cerný and Thomas A. Henzinger. 2011. From boolean to quantitative synthesis. In Proceedings of the 11th International Conference on Embedded Software, EMSOFT 2011, part of the Seventh Embedded Systems Week, ESWeek 2011, Taipei, Taiwan, October 9-14, 2011, Samarjit Chakraborty, Ahmed Jerraya, Sanjoy K. Baruah, and Sebastian Fischmeister (Eds.). ACM, 149–154. https://doi.org/10.1145/2038642.2038666

Digital Library

[7]

Qiaochu Chen, Aaron Lamoreaux, Xinyu Wang, Greg Durrett, Osbert Bastani, and Isil Dillig. 2021. Web question answering with neurosymbolic program synthesis. In PLDI ’21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Virtual Event, Canada, June 20-25, 2021. 328–343. https://doi.org/10.1145/3453483.3454047

Digital Library

[8]

Qiaochu Chen, Xinyu Wang, Xi Ye, Greg Durrett, and Isil Dillig. 2020. Multi-modal synthesis of regular expressions. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020. 487–502. https://doi.org/10.1145/3385412.3385988

Digital Library

[9]

Xinyun Chen, Chang Liu, and Dawn Song. 2019. Execution-Guided Neural Program Synthesis. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net. https://openreview.net/forum?id=H1gfOiAqYm

[10]

Paolo Ciaccia, Marco Patella, and Pavel Zezula. 1997. M-tree: An efficient access method for similarity search in metric spaces. In Vldb. 97, 426–435.

[11]

2006. Genetic Programming, 9th European Conference, EuroGP 2006, Budapest, Hungary, April 10-12, 2006, Proceedings, Pierre Collet, Marco Tomassini, Marc Ebner, Steven M. Gustafson, and Anikó Ekárt (Eds.) (Lecture Notes in Computer Science, Vol. 3905). Springer. isbn:3-540-33143-3 https://doi.org/10.1007/11729976

[12]

Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and Pushmeet Kohli. 2017. RobustFill: Neural Program Learning under Noisy I/O. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. 990–998. http://proceedings.mlr.press/v70/devlin17a.html

[13]

Tao Du, Jeevana Priya Inala, Yewen Pu, Andrew Spielberg, Adriana Schulz, Daniela Rus, Armando Solar-Lezama, and Wojciech Matusik. 2018. Inversecsg: Automatic Conversion of 3d Models To CSG Trees. ACM Trans. Graph., 37, 6 (2018), 213:1–213:16. https://doi.org/10.1145/3272127.3275006

Digital Library

[14]

Kevin Ellis, Maxwell I. Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, and Armando Solar-Lezama. 2019. Write, Execute, Assess: Program Synthesis with a REPL. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett (Eds.). 9165–9174. https://proceedings.neurips.cc/paper/2019/hash/50d2d2262762648589b1943078712aa6-Abstract.html

[15]

Kevin Ellis, Catherine Wong, Maxwell I. Nye, Mathias Sablé-Meyer, Lucas Morales, Luke B. Hewitt, Luc Cary, Armando Solar-Lezama, and Joshua B. Tenenbaum. 2021. DreamCoder: bootstrapping inductive program synthesis with wake-sleep library learning. In PLDI ’21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Virtual Event, Canada, June 20-25, 2021. 835–850. https://doi.org/10.1145/3453483.3454080

Digital Library

[16]

Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. 2018. Program synthesis using conflict-driven learning. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018. 420–435. https://doi.org/10.1145/3192366.3192382

Digital Library

[17]

Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based synthesis of table consolidation and transformation tasks from examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. 422–436. https://doi.org/10.1145/3062341.3062351

Digital Library

[18]

John Feser, Isil Dillig, and Armando Solar-lezama. 2023. Inductive Program Synthesis Guided by Observational Program Similarity (Artifact). https://doi.org/10.5281/zenodo.8200210

Digital Library

[19]

John K. Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing data structure transformations from input-output examples. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015. 229–239. https://doi.org/10.1145/2737924.2737977

Digital Library

[20]

Fred Glover and Manuel Laguna. 1998. Tabu Search. In Handbook of Combinatorial Optimization: Volume1–3, Ding-Zhu Du and Panos M. Pardalos (Eds.). Springer US, 2093–2229. isbn:978-1-4613-0303-9 https://doi.org/10.1007/978-1-4613-0303-9_33

[21]

Sumit Gulwani. 2011. Automating string processing in spreadsheets using input-output examples. In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, January 26-28, 2011. 317–330. https://doi.org/10.1145/1926385.1926423

Digital Library

[22]

Shivam Handa and Martin C. Rinard. 2020. Inductive program synthesis over noisy data. In ESEC/FSE ’20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event, USA, November 8-13, 2020. 87–98. https://doi.org/10.1145/3368089.3409732

Digital Library

[23]

Thomas Helmuth, Lee Spector, and James Matheson. 2015. Solving Uncompromising Problems With Lexicase Selection. IEEE Trans. Evol. Comput., 19, 5 (2015), 630–643. https://doi.org/10.1109/TEVC.2014.2362729

Digital Library

[24]

R. Kenny Jones, David Charatan, Paul Guerrero, Niloy J. Mitra, and Daniel Ritchie. 2021. Shapemod: Macro Operation Discovery for 3d Shape Programs. ACM Trans. Graph., 40, 4 (2021), 153:1–153:16. https://doi.org/10.1145/3450626.3459821

Digital Library

[25]

Tessa A. Lau, Steven A. Wolfman, Pedro M. Domingos, and Daniel S. Weld. 2003. Programming by Demonstration Using Version Space Algebra. Mach. Learn., 53, 1-2 (2003), 111–156. https://doi.org/10.1023/A:1025671410623

Digital Library

[26]

Mina Lee, Sunbeom So, and Hakjoo Oh. 2016. Synthesizing regular expressions from examples for introductory automata assignments. In Proceedings of the 2016 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, GPCE 2016, Amsterdam, The Netherlands, October 31 - November 1, 2016. 70–80. https://doi.org/10.1145/2993236.2993244

Digital Library

[27]

Woosuk Lee, Kihong Heo, Rajeev Alur, and Mayur Naik. 2018. Accelerating search-based program synthesis using learned probabilistic models. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018. 436–449. https://doi.org/10.1145/3192366.3192410

Digital Library

[28]

Anders Miltner, Adrian Trejo Nuñez, Ana Brendel, Swarat Chaudhuri, and Isil Dillig. 2022. Bottom-up synthesis of recursive functional programs using angelic execution. Proc. ACM Program. Lang., 6, POPL (2022), 1–29. https://doi.org/10.1145/3498682

Digital Library

[29]

Alberto Moraglio, Krzysztof Krawiec, and Colin G. Johnson. 2012. Geometric Semantic Genetic Programming. In Parallel Problem Solving from Nature - PPSN XII - 12th International Conference, Taormina, Italy, September 1-5, 2012, Proceedings, Part I, Carlos A. Coello Coello, Vincenzo Cutello, Kalyanmoy Deb, Stephanie Forrest, Giuseppe Nicosia, and Mario Pavone (Eds.) (Lecture Notes in Computer Science, Vol. 7491). Springer, 21–31. https://doi.org/10.1007/978-3-642-32937-1_3

Digital Library

[30]

Chandrakana Nandi, James R. Wilcox, Pavel Panchekha, Taylor Blau, Dan Grossman, and Zachary Tatlock. 2018. Functional programming for compiling and decompiling computer-aided design. Proc. ACM Program. Lang., 2, ICFP (2018), 99:1–99:31. https://doi.org/10.1145/3236794

Digital Library

[31]

Chandrakana Nandi, Max Willsey, Adam Anderson, James R. Wilcox, Eva Darulova, Dan Grossman, and Zachary Tatlock. 2020. Synthesizing structured CAD models with equality saturation and inverse transformations. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, Alastair F. Donaldson and Emina Torlak (Eds.). ACM, 31–44. https://doi.org/10.1145/3385412.3386012

Digital Library

[32]

Maxwell I. Nye, Yewen Pu, Matthew Bowers, Jacob Andreas, Joshua B. Tenenbaum, and Armando Solar-Lezama. 2021. Representing Partial Programs with Blended Abstract Semantics. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. https://openreview.net/forum?id=mCtadqIxOJ

[33]

OpenAI. 2022. Introducing ChatGPT. https://openai.com/blog/chatgpt

[34]

OpenAI. 2023. GPT-4 Technical Report. arxiv:2303.08774.

[35]

Rong Pan, Qinheping Hu, Gaowei Xu, and Loris D’Antoni. 2019. Automatic Repair of Regular Expressions. Proc. ACM Program. Lang., 3, OOPSLA (2019), 139:1–139:29. https://doi.org/10.1145/3360565

Digital Library

[36]

Matt Pharr and William R Mark. 2012. ispc: A SPMD compiler for high-performance CPU programming. In 2012 Innovative Parallel Computing (InPar). 1–13.

[37]

Veselin Raychev, Pavol Bielik, Martin T. Vechev, and Andreas Krause. 2016. Learning programs from noisy data. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22, 2016. 761–774. https://doi.org/10.1145/2837614.2837671

Digital Library

[38]

Thomas Rebele, Katerina Tzompanaki, and Fabian M. Suchanek. 2018. Adding Missing Words to Regular Expressions. In Advances in Knowledge Discovery and Data Mining - 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part II. 67–79. https://doi.org/10.1007/978-3-319-93037-4_6

Digital Library

[39]

Mark Santolucito, Kate Rogers, Aedan Lombardo, and Ruzica Piskac. 2018. Programming-by-example for audio: synthesizing digital signal processing programs. In Proceedings of the 6th ACM SIGPLAN International Workshop on Functional Art, Music, Modeling, and Design, FARM@ICFP 2018, St. Louis, MO, USA, September 29, 2018, Brent Yorgey and Donya Quick (Eds.). ACM, 18–25. https://doi.org/10.1145/3242903.3242906

Digital Library

[40]

Eric Schkufza, Rahul Sharma, and Alex Aiken. 2013. Stochastic superoptimization. In Architectural Support for Programming Languages and Operating Systems, ASPLOS 2013, Houston, TX, USA, March 16-20, 2013. 305–316. https://doi.org/10.1145/2451116.2451150

Digital Library

[41]

Eric Schkufza, Rahul Sharma, and Alex Aiken. 2014. Stochastic optimization of floating-point programs with tunable precision. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, Edinburgh, United Kingdom - June 09 - 11, 2014, Michael F. P. O’Boyle and Keshav Pingali (Eds.). ACM, 53–64. https://doi.org/10.1145/2594291.2594302

Digital Library

[42]

Vadim Shapiro and Donald L. Vossler. 1991. Construction and optimization of CSG representations. Comput. Aided Des., 23, 1 (1991), 4–20. https://doi.org/10.1016/0010-4485(91)90077-A

Digital Library

[43]

Gopal Sharma, Rishabh Goyal, Difan Liu, Evangelos Kalogerakis, and Subhransu Maji. 2018. CSGNet: Neural Shape Parser for Constructive Solid Geometry. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018. 5515–5523. https://doi.org/10.1109/CVPR.2018.00578

[44]

Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, Jianfeng Wang, Jordan Lee Boyd-Graber, and Lijuan Wang. 2023. Prompting GPT-3 To Be Reliable. In The Eleventh International Conference on Learning Representations. https://openreview.net/forum?id=98p5x51L5af

[45]

Armando Solar-Lezama, Liviu Tancau, Rastislav Bodík, Sanjit A. Seshia, and Vijay A. Saraswat. 2006. Combinatorial sketching for finite programs. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2006, San Jose, CA, USA, October 21-25, 2006. 404–415. https://doi.org/10.1145/1168857.1168907

Digital Library

[46]

Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, and Jiajun Wu. 2019. Learning to Infer and Execute 3D Shape Programs. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. https://openreview.net/forum?id=rylNH20qFQ

[47]

Abhishek Udupa, Arun Raghavan, Jyotirmoy V. Deshmukh, Sela Mador-Haim, Milo M.K. Martin, and Rajeev Alur. 2013. TRANSIT: Specifying Protocols with Concolic Snippets. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’13). ACM, New York, NY, USA. 287–296. isbn:978-1-4503-2014-6 https://doi.org/10.1145/2491956.2462174

Digital Library

[48]

Ashwin K. Vijayakumar, Michael Cogswell, Ramprasaath R. Selvaraju, Qing Sun, Stefan Lee, David J. Crandall, and Dhruv Batra. 2018. Diverse Beam Search for Improved Description of Complex Scenes. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, Sheila A. McIlraith and Kilian Q. Weinberger (Eds.). AAAI Press, 7371–7379. https://doi.org/10.1609/aaai.v32i1.12340

[49]

Chenglong Wang, Alvin Cheung, and Rastislav Bodík. 2017. Synthesizing highly expressive SQL queries from input-output examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. 452–466. https://doi.org/10.1145/3062341.3062365

Digital Library

[50]

Xinyu Wang, Isil Dillig, and Rishabh Singh. 2017. Synthesis of data completion scripts using finite tree automata. Proc. ACM Program. Lang., 1, OOPSLA (2017), 62:1–62:26. https://doi.org/10.1145/3133886

Digital Library

[51]

Xinyu Wang, Isil Dillig, and Rishabh Singh. 2018. Program Synthesis Using Abstraction Refinement. Proc. ACM Program. Lang., 2, POPL (2018), 63:1–63:30. https://doi.org/10.1145/3158151

Digital Library

[52]

Karl D. D. Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G. Lambourne, Armando Solar-Lezama, and Wojciech Matusik. 2021. Fusion 360 gallery: a dataset and environment for programmatic CAD construction from human design sequences. ACM Trans. Graph., 40, 4 (2021), 54:1–54:24. https://doi.org/10.1145/3450626.3459818

Digital Library

Index Terms

Inductive Program Synthesis Guided by Observational Program Similarity
1. Computing methodologies
  1. Artificial intelligence
    1. Search methodologies
      1. Discrete space search
2. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Automatic programming

Recommendations

Algorithmic program synthesis: introduction

Program synthesis is a process of producing an executable program from a specification. Algorithmic synthesis produces the program automatically, without an intervention from an expert. While classical compilation falls under the definition of ...
Can reactive synthesis and syntax-guided synthesis be friends?
SPLASH Companion 2021: Companion Proceedings of the 2021 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity

While reactive synthesis and syntax-guided synthesis (SyGuS) have seen enormous progress in recent years, combining the two approaches has remained a challenge. In this work, we present the synthesis of reactive programs from Temporal Stream Logic ...
Can reactive synthesis and syntax-guided synthesis be friends?
PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation

While reactive synthesis and syntax-guided synthesis (SyGuS) have seen enormous progress in recent years, combining the two approaches has remained a challenge. In this work, we present the synthesis of reactive programs from Temporal Stream Logic ...

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Programming Languages

Proceedings of the ACM on Programming Languages Volume 7, Issue OOPSLA2

October 2023

2250 pages

EISSN:2475-1421

DOI:10.1145/3554312

Editor:
Michael Hicks
Amazon, USA

Issue’s Table of Contents

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 October 2023

Published in PACMPL Volume 7, Issue OOPSLA2

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Badges

Author Tags

Qualifiers

Research-article

Funding Sources

NSF

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
374
Total Downloads

Downloads (Last 12 months)374
Downloads (Last 6 weeks)31

Reflects downloads up to 27 Jul 2024

Other Metrics

View Author Metrics

Citations

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents