DOI: 10.1145/3338906.3338951
research-article
Open access

Maximal multi-layer specification synthesis

Published: 12 August 2019

Abstract

There has been significant interest in applying programming-by-example to automate repetitive and tedious tasks. However, because input-output examples are inherently incomplete, a synthesizer may generate programs that pass the examples but do not match the user intent. In this paper, we propose MARS, a novel synthesis framework that takes as input a multi-layer specification composed of input-output examples, a textual description, and partial code snippets that capture the user intent. To accurately capture the user intent from the noisy and ambiguous description, we propose a hybrid model that combines the power of an LSTM-based sequence-to-sequence model with the Apriori algorithm for mining association rules through unsupervised learning. We reduce the problem of multi-layer specification synthesis to a Max-SMT problem, where hard constraints encode well-typed concrete programs and soft constraints encode the user intent learned by the hybrid model. We instantiate our hybrid model in the data wrangling domain and compare its performance against Morpheus, a state-of-the-art synthesizer for data wrangling tasks. Our experiments demonstrate that our approach outperforms Morpheus in terms of running time and solved benchmarks. For challenging benchmarks, our approach can suggest candidates with rankings that are an order of magnitude better than those of Morpheus, which leads to running times that are 15x faster.
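
To make the hard/soft constraint split concrete, the following is a minimal sketch of the Max-SMT idea, not the MARS encoding itself. It enumerates candidate programs by brute force instead of calling an SMT solver. The component names, the hard constraint, and the soft-constraint weights are all assumptions invented for illustration; in the framework described above, the weights would come from the learned hybrid model and the hard constraints would encode well-typedness.

```python
from itertools import product

# Hypothetical component library; the names are illustrative only.
COMPONENTS = ["gather", "spread", "filter", "select"]


def satisfies_hard(program):
    """Hard constraint (assumed for this sketch): no component is used twice.

    A real encoding would instead require the program to be well-typed.
    """
    return len(set(program)) == len(program)


# Soft constraints as (predicate, weight) pairs. The weights are made-up
# stand-ins for scores that a learned model might assign to user intent.
SOFT_CONSTRAINTS = [
    (lambda p: "gather" in p, 5),
    (lambda p: "spread" in p, 3),
    (lambda p: p[0] == "filter", 2),
]


def soft_score(program):
    """Total weight of the soft constraints that the program satisfies."""
    return sum(weight for pred, weight in SOFT_CONSTRAINTS if pred(program))


def rank_programs(length=2):
    """Keep programs that meet the hard constraint, best soft score first."""
    candidates = [
        p for p in product(COMPONENTS, repeat=length) if satisfies_hard(p)
    ]
    return sorted(candidates, key=soft_score, reverse=True)


ranked = rank_programs()
for program in ranked[:3]:
    print(program, soft_score(program))
```

Hard constraints prune candidates outright, while soft constraints only reorder the survivors, so a candidate that violates every soft constraint is still returned, just ranked last. A Max-SMT solver reaches the same optimum without enumerating every candidate.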

Published In

ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
August 2019
1264 pages
ISBN:9781450355728
DOI:10.1145/3338906
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Max-SMT
  2. machine learning
  3. neural networks
  4. program synthesis

Qualifiers

  • Research-article

Conference

ESEC/FSE '19

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Cited By

  • (2024) Maximal Quantified Precondition Synthesis for Linear Array Loops. Programming Languages and Systems, 245-274. DOI: 10.1007/978-3-031-57267-8_10. Online publication date: 6-Apr-2024
  • (2023) Fast and Reliable Program Synthesis via User Interaction. 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 963-975. DOI: 10.1109/ASE56229.2023.00129. Online publication date: 11-Sep-2023
  • (2022) Visualization question answering using introspective program synthesis. Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 137-151. DOI: 10.1145/3519939.3523709. Online publication date: 9-Jun-2022
  • (2022) Jigsaw. Proceedings of the 44th International Conference on Software Engineering, 1219-1231. DOI: 10.1145/3510003.3510203. Online publication date: 21-May-2022
  • (2022) L2S: A Framework for Synthesizing the Most Probable Program under a Specification. ACM Transactions on Software Engineering and Methodology 31:3, 1-45. DOI: 10.1145/3487570. Online publication date: 7-Mar-2022
  • (2022) Interactive NLU-Powered Ontology-Based Workflow Synthesis for FAIR Support of HPC. 2022 IEEE/ACM International Workshop on HPC User Support Tools (HUST), 29-40. DOI: 10.1109/HUST56722.2022.00009. Online publication date: Nov-2022
  • (2022) Enabling near real-time NLU-driven natural language programming through dynamic grammar graph-based translation. Proceedings of the 20th IEEE/ACM International Symposium on Code Generation and Optimization, 278-289. DOI: 10.1109/CGO53902.2022.9741262. Online publication date: 2-Apr-2022
  • (2021) Generalizable synthesis through unification. Proceedings of the ACM on Programming Languages 5:OOPSLA, 1-28. DOI: 10.1145/3485544. Online publication date: 15-Oct-2021
  • (2021) Multi-modal program inference: a marriage of pre-trained language models and component-based synthesis. Proceedings of the ACM on Programming Languages 5:OOPSLA, 1-29. DOI: 10.1145/3485535. Online publication date: 15-Oct-2021
  • (2021) Gauss: program synthesis by reasoning over graphs. Proceedings of the ACM on Programming Languages 5:OOPSLA, 1-29. DOI: 10.1145/3485511. Online publication date: 15-Oct-2021
