DOI: 10.1145/3385412.3385988

Multi-modal synthesis of regular expressions

Published: 11 June 2020

Abstract

In this paper, we propose a multi-modal synthesis technique for automatically constructing regular expressions (regexes) from a combination of examples and natural language. Using multiple modalities is useful in this context because natural language alone is often highly ambiguous, whereas examples in isolation are often not sufficient for conveying user intent. Our proposed technique first parses the English description into a so-called hierarchical sketch that guides our programming-by-example (PBE) engine. Since the hierarchical sketch captures crucial hints, the PBE engine can leverage this information both to prioritize the search and to make useful deductions for pruning the search space.
We have implemented the proposed technique in a tool called Regel and evaluated it on over three hundred regexes. Our evaluation shows that Regel achieves 80% accuracy, whereas the NLP-only and PBE-only baselines achieve 43% and 26%, respectively. We also compare our proposed PBE engine against an adaptation of AlphaRegex, a state-of-the-art regex synthesis tool, and show that our proposed PBE engine is an order of magnitude faster, even if we adapt the search algorithm of AlphaRegex to leverage the sketch. Finally, we conduct a user study involving 20 participants and show that users are twice as likely to come up with the desired regex with Regel as without it.
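To make the interplay between the two modalities concrete, the following is a minimal sketch of hint-guided search over examples. It is an illustration only and not Regel's algorithm: the flat list `hints` is a simplified, hypothetical stand-in for a hierarchical sketch, and the names `consistent` and `synthesize` as well as the fixed set of repetition wrappers are assumptions made for this example. Candidates are built only from the hinted fragments, and each one is checked against the positive and negative examples.

```python
import re
from itertools import product


def consistent(regex, positives, negatives):
    """True if regex fully matches every positive and no negative example."""
    pattern = re.compile(regex)
    return (all(pattern.fullmatch(s) for s in positives)
            and not any(pattern.fullmatch(s) for s in negatives))


def synthesize(hints, positives, negatives, max_parts=3):
    """Enumerate concatenations of hinted fragments, shortest first.

    `hints` plays the role of the sketch: only fragments derived from the
    natural-language description are explored (assumed simplification).
    """
    # Each hint may appear bare or wrapped in a repetition operator.
    wrappers = [
        lambda h: h,
        lambda h: f"({h})+",
        lambda h: f"({h})*",
        lambda h: f"({h})?",
    ]
    fragments = [wrap(h) for h in hints for wrap in wrappers]
    for size in range(1, max_parts + 1):
        for parts in product(fragments, repeat=size):
            candidate = "".join(parts)
            if consistent(candidate, positives, negatives):
                return candidate
    return None  # no consistent regex within the size bound


# Hypothetical input: the description mentions "digits" and "letters".
hints = ["[0-9]", "[a-z]"]
positives = ["12ab", "7x"]
negatives = ["ab12", "12", "ab"]
result = synthesize(hints, positives, negatives)
print(result)
```

Restricting the candidates to hinted fragments is what keeps this toy search small. The examples then resolve what the description leaves open, such as the order of the fragments and how often each may repeat.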

PLDI ’20, June 15–20, 2020, London, UK
Qiaochu Chen, Xinyu Wang, Xi Ye, Greg Durrett, and Isil Dillig

Published In

PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2020
1174 pages
ISBN: 9781450376136
DOI: 10.1145/3385412
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Program Synthesis
  2. Programming by Example
  3. Programming by Natural Languages
  4. Regular Expression

Qualifiers

  • Research-article

Conference

PLDI '20

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Cited By

  • (2024) Combining Regular Expressions and Machine Learning for SQL Injection Detection in Urban Computing. Journal of Internet Services and Applications 15:1 (103-111). DOI: 10.5753/jisa.2024.3799. Online publication date: 2-Jul-2024
  • (2024) Refinement Types for Visualization. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (1871-1881). DOI: 10.1145/3691620.3695550. Online publication date: 27-Oct-2024
  • (2024) Repairing Regex-Dependent String Functions. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (294-305). DOI: 10.1145/3691620.3695005. Online publication date: 27-Oct-2024
  • (2024) ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles. Proceedings of the 29th International Conference on Intelligent User Interfaces (853-868). DOI: 10.1145/3640543.3645144. Online publication date: 18-Mar-2024
  • (2024) Efficient Bottom-Up Synthesis for Programs with Local Variables. Proceedings of the ACM on Programming Languages 8:POPL (1540-1568). DOI: 10.1145/3632894. Online publication date: 5-Jan-2024
  • (2024) Structure and design of multimodal dataset for automatic regex synthesis methods in Roman Urdu. International Journal of Data Science and Analytics. DOI: 10.1007/s41060-024-00612-y. Online publication date: 23-Jul-2024
  • (2024) Automatic regex synthesis methods for english: a comparative analysis. Knowledge and Information Systems. DOI: 10.1007/s10115-024-02232-1. Online publication date: 3-Oct-2024
  • (2024) Enhancing Multi-modal Regular Expression Synthesis via Large Language Models and Semantic Manipulations of Sub-expressions. Dependable Software Engineering. Theories, Tools, and Applications (122-141). DOI: 10.1007/978-981-96-0602-3_7. Online publication date: 25-Nov-2024
  • (2024) Relational Synthesis of Recursive Programs via Constraint Annotated Tree Automata. Computer Aided Verification (41-63). DOI: 10.1007/978-3-031-65633-0_3. Online publication date: 26-Jul-2024
  • (2023) News Classification and Categorization with Smart Function Sentiment Analysis. International Journal of Intelligent Systems 2023:1. DOI: 10.1155/2023/1784394. Online publication date: 13-Nov-2023
