research-article

Automating grammar comparison

Authors:

Ravichandhran Madhavan,

Viktor Kuncak

OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications

Pages 183 - 200

https://doi.org/10.1145/2814270.2814304

Published: 23 October 2015

Abstract

We consider from a practical perspective the problem of checking equivalence of context-free grammars. We present techniques for proving equivalence, as well as techniques for finding counter-examples that establish non-equivalence. Among the key building blocks of our approach is a novel algorithm for efficiently enumerating and sampling words and parse trees from arbitrary context-free grammars; the algorithm supports polynomial-time random access to words belonging to the grammar. Furthermore, we propose an algorithm for proving equivalence of context-free grammars that is complete for LL grammars, yet can be invoked on any context-free grammar, including ambiguous grammars. Our techniques successfully find discrepancies between different syntax specifications of several real-world languages, and are capable of detecting fine-grained incremental modifications performed on grammars. Our evaluation shows that our tool improves significantly on existing state-of-the-art tools. In addition, we used these algorithms to develop an online tutoring system for grammars that we then used in an undergraduate course on computer language processing. On questions involving grammar constructions, our system was able to automatically evaluate the correctness of 95% of the solutions submitted by students: it disproved 74% of cases and proved 21% of them.
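To make the idea of random access to the words of a grammar concrete, the following is a minimal sketch of a standard counting-based scheme. It is not the algorithm from the paper. It assumes a hypothetical toy grammar for non-empty balanced parentheses that is unambiguous and has no epsilon or unit rules, and the names GRAMMAR, count, count_nt, unrank and unrank_nt are introduced here purely for illustration. Memoized counts of the derivations of each length are used to jump directly to the i-th word of a given length without listing the earlier ones.

```python
from functools import lru_cache

# Hypothetical toy grammar (an assumption, not taken from the paper):
# S -> ( ) | ( S ) | ( ) S | ( S ) S
# It is unambiguous and has no epsilon or unit rules, so every symbol
# yields at least one token and parse trees correspond one-to-one to words.
GRAMMAR = {
    "S": [("(", ")"), ("(", "S", ")"), ("(", ")", "S"), ("(", "S", ")", "S")],
}


@lru_cache(maxsize=None)
def count(symbols, n):
    """Number of parse trees deriving a word of length n from a symbol sequence."""
    if not symbols:
        return 1 if n == 0 else 0
    head, rest = symbols[0], symbols[1:]
    if head not in GRAMMAR:  # terminal: consumes exactly one token
        return count(rest, n - 1) if n >= 1 else 0
    total = 0
    # Reserve at least one token for each remaining symbol.
    for k in range(1, n - len(rest) + 1):
        total += count_nt(head, k) * count(rest, n - k)
    return total


@lru_cache(maxsize=None)
def count_nt(nt, n):
    """Number of parse trees of nonterminal nt that yield words of length n."""
    return sum(count(rule, n) for rule in GRAMMAR[nt])


def unrank(symbols, n, i):
    """Return the i-th (0-based) word of length n derivable from the sequence."""
    if not symbols:
        return ""
    head, rest = symbols[0], symbols[1:]
    if head not in GRAMMAR:
        return head + unrank(rest, n - 1, i)
    for k in range(1, n - len(rest) + 1):
        tail_count = count(rest, n - k)
        block = count_nt(head, k) * tail_count
        if i < block:
            # Split the index into a choice for the head and one for the rest.
            return unrank_nt(head, k, i // tail_count) + unrank(rest, n - k, i % tail_count)
        i -= block
    raise IndexError("index out of range")


def unrank_nt(nt, n, i):
    """Return the i-th (0-based) word of length n derivable from nonterminal nt."""
    for rule in GRAMMAR[nt]:
        c = count(rule, n)
        if i < c:
            return unrank(rule, n, i)
        i -= c
    raise IndexError("index out of range")


if __name__ == "__main__":
    total = count_nt("S", 6)
    print(total, [unrank_nt("S", 6, i) for i in range(total)])
```

For an ambiguous grammar the same scheme would index parse trees rather than words, so a word with several parse trees would appear under several indices. Sampling uniformly at random then amounts to drawing an index below count_nt(nt, n) and unranking it.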

Supplementary Material

Auxiliary Archive (p183-madhavan-s.zip)

A VM containing the executable implementation of the system described in the paper Automating Grammar Comparison, and the benchmarks used in the experimental study.

Size: 2056.11 MB

Published In

OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications

October 2015

953 pages

ISBN:9781450336895

DOI:10.1145/2814270

General Chair: Jonathan Aldrich, Carnegie Mellon University, USA
Program Chair: Patrick Eugster, Purdue University, USA

ACM SIGPLAN Notices Volume 50, Issue 10
OOPSLA '15
October 2015
953 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2858965
Editor:
Andy Gill
University of Kansas, Lawrence, KS

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SPLASH '15

Sponsor:

SIGPLAN

SPLASH '15: Conference on Systems, Programming, Languages, and Applications: Software for Humanity

October 25 - 30, 2015

Pittsburgh, PA, USA

Acceptance Rates

Overall Acceptance Rate 268 of 1,244 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 16
Total Downloads: 302
Downloads (Last 12 months): 16
Downloads (Last 6 weeks): 3

Reflects downloads up to 06 Feb 2025

Citations

Cited By

Liu J, Zhu F, He F (2023). Automated Ambiguity Detection in Layout-Sensitive Grammars. Proceedings of the ACM on Programming Languages 7:OOPSLA2 (1150-1175). Online publication date: 16-Oct-2023
https://dl.acm.org/doi/10.1145/3622838
Baumann P, Ganardi M, Majumdar R, Thinniyam R, Zetzsche G (2023). Context-Bounded Verification of Context-Free Specifications. Proceedings of the ACM on Programming Languages 7:POPL (2141-2170). Online publication date: 11-Jan-2023
https://dl.acm.org/doi/10.1145/3571266
Kalita P, Singal D, Agarwal P, Jhunjhunwala S, Roy S (2023). Symbolic encoding of LL(1) parsing and its applications. Formal Methods in System Design 61:2-3 (338-379). Online publication date: 22-Jun-2023
https://doi.org/10.1007/s10703-023-00420-3
Schröder M, Potanin A (2022). Grammar Inference for Ad Hoc Parsers. Companion Proceedings of the 2022 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity (38-42). Online publication date: 29-Nov-2022
https://dl.acm.org/doi/10.1145/3563768.3565550
Schröder M, Cito J, Pasquale L, Treude C (2022). Grammars for free. Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results (41-45). Online publication date: 21-May-2022
https://dl.acm.org/doi/10.1145/3510455.3512787
Schroder M, Cito J (2022). Grammars for Free: Toward Grammar Inference for Ad Hoc Parsers. 2022 IEEE/ACM 44th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER) (41-45). Online publication date: May-2022
https://doi.org/10.1109/ICSE-NIER55298.2022.9793523
Zakharov V (2021). Efficient Equivalence Checking Technique for Some Classes of Finite-State Machines. Automatic Control and Computer Sciences 55:7 (670-701). Online publication date: 1-Dec-2021
https://dl.acm.org/doi/10.3103/S014641162107018X
Raselimo M, Fischer B, Visser E, Kolovos D, Söderberg E (2021). Automatic grammar repair. Proceedings of the 14th ACM SIGPLAN International Conference on Software Language Engineering (126-142). Online publication date: 17-Oct-2021
https://dl.acm.org/doi/10.1145/3486608.3486910
Raselimo M, Fischer B, Nierstrasz O, Gray J, Oliveira B (2019). Spectrum-based fault localization for context-free grammars. Proceedings of the 12th ACM SIGPLAN International Conference on Software Language Engineering (15-28). Online publication date: 20-Oct-2019
https://dl.acm.org/doi/10.1145/3357766.3359538
Abdulla P, Atig M, Chen Y, Diep B, Holík L, Rezine A, Rümmer P (2017). Flatten and conquer: a framework for efficient analysis of string constraints. ACM SIGPLAN Notices 52:6 (602-617). Online publication date: 14-Jun-2017
https://dl.acm.org/doi/10.1145/3140587.3062384
