DOI: 10.1145/2737924.2738002
research-article

Interactive parser synthesis by example

Published: 03 June 2015

Abstract

Despite decades of research on parsing, the construction of parsers remains a painstaking, manual process prone to subtle bugs and pitfalls. We present a programming-by-example framework called Parsify that is able to synthesize a parser from input/output examples. The user does not write a single line of code. To achieve this, Parsify provides: (a) an iterative algorithm for synthesizing and refining a grammar one example at a time, (b) an interface that provides immediate visual feedback in response to changes in the grammar being refined, and (c) a graphical mechanism for specifying example parse trees using only textual selections. We empirically demonstrate the viability of our approach by using Parsify to construct parsers for source code drawn from Verilog, SQL, Apache, and Tiger.

Published In

PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2015
630 pages
ISBN:9781450334686
DOI:10.1145/2737924
  • ACM SIGPLAN Notices, Volume 50, Issue 6
    PLDI '15
    June 2015
    630 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/2813885
    Editor: Andy Gill

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Parsing
  2. Program Synthesis
  3. Programming by Example

Qualifiers

  • Research-article

Conference

PLDI '15

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 30
  • Downloads (last 6 weeks): 5
Reflects downloads up to 27 Jan 2025.

Cited By

  • (2023) Automated Ambiguity Detection in Layout-Sensitive Grammars. Proceedings of the ACM on Programming Languages 7:OOPSLA2 (1150-1175). DOI: 10.1145/3622838. Online publication date: 16-Oct-2023
  • (2023) Programming by Example Made Easy. ACM Transactions on Software Engineering and Methodology 33:1 (1-36). DOI: 10.1145/3607185. Online publication date: 7-Jul-2023
  • (2023) Improving Oracle-Guided Inductive Synthesis by Efficient Question Selection. Proceedings of the ACM on Programming Languages 7:OOPSLA1 (819-847). DOI: 10.1145/3586055. Online publication date: 6-Apr-2023
  • (2023) Grammar-Based String Refinement Types. Proceedings of the 45th International Conference on Software Engineering: Companion Proceedings (267-269). DOI: 10.1109/ICSE-Companion58688.2023.00072. Online publication date: 14-May-2023
  • (2023) Symbolic encoding of LL(1) parsing and its applications. Formal Methods in System Design 61:2-3 (338-379). DOI: 10.1007/s10703-023-00420-3. Online publication date: 22-Jun-2023
  • (2022) Almost correct invariants: synthesizing inductive invariants by fuzzing proofs. Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (352-364). DOI: 10.1145/3533767.3534381. Online publication date: 18-Jul-2022
  • (2020) Question selection for interactive program synthesis. Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (1143-1158). DOI: 10.1145/3385412.3386025. Online publication date: 11-Jun-2020
  • (2019) On the fly synthesis of edit suggestions. Proceedings of the ACM on Programming Languages 3:OOPSLA (1-29). DOI: 10.1145/3360569. Online publication date: 10-Oct-2019
  • (2019) Phoenix: automated data-driven synthesis of repairs for static analysis violations. Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (613-624). DOI: 10.1145/3338906.3338952. Online publication date: 12-Aug-2019
  • (2018) Syntax-guided synthesis of Datalog programs. Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (515-527). DOI: 10.1145/3236024.3236034. Online publication date: 26-Oct-2018
