
On the complexity and performance of parsing with derivatives

Published: 02 June 2016

Abstract

Current algorithms for context-free parsing inflict a trade-off between ease of understanding, ease of implementation, theoretical complexity, and practical performance. No algorithm achieves all of these properties simultaneously. Might et al. introduced parsing with derivatives, which handles arbitrary context-free grammars while being both easy to understand and simple to implement. Despite much initial enthusiasm and a multitude of independent implementations, its worst-case complexity has never been proven to be better than exponential. In fact, high-level arguments claiming it is fundamentally exponential have been advanced and even accepted as part of the folklore. Performance ended up being sluggish in practice, and this sluggishness was taken as informal evidence of exponentiality. In this paper, we reexamine the performance of parsing with derivatives. We have discovered that it is not exponential but, in fact, cubic. Moreover, simple (though perhaps not obvious) modifications to the implementation by Might et al. lead to an implementation that is not only easy to understand but also highly performant in practice.
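To make the idea concrete, here is a minimal recognizer sketch in Python in the spirit of parsing with derivatives. It is an illustration under our own assumptions, not the implementation by Might et al. and not the optimized version studied in this paper.

How it works:

- Each input token is consumed by taking the Brzozowski derivative of a (possibly cyclic) grammar graph.
- Derivatives are memoized per token, so recursive and left-recursive rules terminate.
- The input is accepted when the final grammar is nullable, i.e. accepts the empty string. Nullability is computed as a least fixed point.

Design choices and assumptions:

- All names are hypothetical: the node classes, `derive`, `nullable`, `seq` and `recognize`.
- Nullability of already-built nodes is computed eagerly and cached, instead of relying on lazy evaluation.
- There are no parse trees and only trivial compaction. The sketch therefore shows the technique, not the performance characteristics the paper reports.

```python
class Node:
    """Grammar node; hashed by identity so cyclic grammars can be memoized."""

class Empty(Node):  # the empty language
    pass

class Eps(Node):  # the language containing only the empty string
    pass

class Tok(Node):  # a single token
    def __init__(self, c):
        self.c = c

class Alt(Node):  # union; children may be filled in later to build cycles
    def __init__(self, a=None, b=None):
        self.a, self.b = a, b

class Seq(Node):  # concatenation
    def __init__(self, a=None, b=None):
        self.a, self.b = a, b

EMPTY, EPS = Empty(), Eps()

def children(n):
    return (n.a, n.b) if isinstance(n, (Alt, Seq)) else ()

def nullable(root):
    """Does root accept ""? Least fixed point computed by Kleene iteration."""
    if getattr(root, "_nullable", None) is not None:
        return root._nullable
    nodes, seen, stack = [], set(), [root]
    while stack:  # collect the part of the graph not solved by earlier calls
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        nodes.append(n)
        if getattr(n, "_nullable", None) is None:
            stack.extend(children(n))
    value = {n: getattr(n, "_nullable", False) for n in nodes}
    changed = True
    while changed:  # values only move from False to True, so this terminates
        changed = False
        for n in nodes:
            if getattr(n, "_nullable", None) is not None:
                continue  # exact value cached by an earlier call
            if isinstance(n, Eps):
                new = True
            elif isinstance(n, Alt):
                new = value[n.a] or value[n.b]
            elif isinstance(n, Seq):
                new = value[n.a] and value[n.b]
            else:  # Empty or Tok
                new = False
            if new != value[n]:
                value[n], changed = new, True
    for n in nodes:  # safe to cache: finished nodes are never mutated again
        n._nullable = value[n]
    return value[root]

def seq(a, b):
    """Concatenation with two trivial simplifications."""
    if a is EMPTY or b is EMPTY:
        return EMPTY
    return b if a is EPS else Seq(a, b)

def derive(n, c, memo):
    """Derivative of n with respect to token c; memo is fresh for each token."""
    if isinstance(n, (Empty, Eps)):
        return EMPTY
    if isinstance(n, Tok):
        return EPS if n.c == c else EMPTY
    if n in memo:  # a cycle: reuse the (possibly unfinished) result node
        return memo[n]
    if isinstance(n, Alt):
        out = memo[n] = Alt()
        out.a, out.b = derive(n.a, c, memo), derive(n.b, c, memo)
    elif nullable(n.a):  # D(AB) = D(A)B | D(B) when A accepts ""
        out = memo[n] = Alt()
        out.a, out.b = seq(derive(n.a, c, memo), n.b), derive(n.b, c, memo)
    else:  # D(AB) = D(A)B otherwise
        out = memo[n] = Seq(None, n.b)
        out.a = derive(n.a, c, memo)
    return out

def recognize(grammar, text):
    g = grammar
    for ch in text:
        g = derive(g, ch, {})
    return nullable(g)

if __name__ == "__main__":
    # S -> "(" S ")" S | ε : balanced parentheses, built as a cyclic graph
    S = Alt()
    S.a = Seq(Tok("("), Seq(S, Seq(Tok(")"), S)))
    S.b = EPS
    for text in ["", "()", "(())()", "(()"]:
        print(repr(text), recognize(S, text))
```

Memoizing on node identity is what lets a rule such as `L -> L "a" | "a"` be handled without special treatment of left recursion. Without compaction, the grammar graph keeps growing as input is consumed. Compaction is one of the practical concerns this paper addresses.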

References

[1]
Bison. URL https://www.gnu.org/software/bison/.
[2]
Janusz A. Brzozowski. Derivatives of regular expressions. Journal of the ACM (JACM), 11(4):481–494, October 1964. ISSN 0004-5411.
[3]
William Byrd. relational-parsing-with-derivatives, 2013. URL https://github.com/webyrd/relational-parsing-with-derivatives.
[4]
Russ Cox. Yacc is not dead. Blog, December 2010. URL http://research.swtch.com/yaccalive.
[5]
Jay Earley. An Efficient Context-Free Parsing Algorithm. PhD thesis, Carnegie Mellon University, 1968. URL http://reports-archive.adm.cs.cmu.edu/anon/anon/usr/ftp/scan/CMU-CS-68-earley.pdf.
[6]
Jay Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102, February 1970. ISSN 0001-0782.
[7]
Mark Engelberg. instaparse, 2015. URL https://github.com/Engelberg/instaparse.
[8]
Abraham Flaxman, Aram W. Harrow, and Gregory B. Sorkin. Strings with maximally many distinct subsequences and substrings. The Electronic Journal of Combinatorics, 11(1):R8, 2004. ISSN 1077-8926. URL http://www.combinatorics.org/ojs/index.php/eljc/article/view/v11i1r8.
[9]
Bryan Ford. Parsing expression grammars: a recognition-based syntactic foundation. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’04, pages 111–122, New York, NY, USA, January 2004. ACM. ISBN 1-58113-729-X. 964001.964011.
[10]
Gary A. Kildall. A unified approach to global program optimization. In Proceedings of the 1st Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL ’73, pages 194–206, New York, NY, USA, October 1973. ACM.
[11]
Dexter Kozen. A completeness theorem for Kleene algebras and the algebra of regular events. Information and Computation, 110(2):366–390, May 1994. ISSN 0890-5401.
[12]
Bernard Lang. Deterministic techniques for efficient nondeterministic parsers. In Prof. Dr.-Ing. J. Loeckx, editor, Automata, Languages and Programming, volume 14 of Lecture Notes in Computer Science, pages 255–269. Springer Berlin Heidelberg, 1974. ISBN 978-3-540-06841-9. 3-540-06841-4_65.
[13]
Tommy McGuire. Java-Parser-Derivatives, 2012. URL https://github.com/tmmcguire/Java-Parser-Derivatives.
[14]
Gary H. Merrill. Parsing non-LR(k) grammars with yacc. Software: Practice and Experience, 23(8):829–850, August 1993. ISSN 1097-024X.
[15]
Matthew Might. derp documentation, 2013. URL http://matt.might.net/teaching/compilers/spring-2013/derp.html.
[16]
Matthew Might, David Darais, and Daniel Spiewak. Parsing with derivatives: a functional pearl. In Proceedings of the 16th ACM SIGPLAN International Conference on Functional Programming, ICFP ’11, pages 189–195, New York, NY, USA, September 2011. ACM. ISBN 978-1-4503-0865-6. 1145/2034773.2034801.
[17]
Russell Mull. parsing-with-derivatives, 2013. URL https://github.com/mullr/parsing-with-derivatives.
[18]
Scott Owens, John Reppy, and Aaron Turon. Regular-expression derivatives re-examined. Journal of Functional Programming, 19(02):173–190, March 2009. ISSN 1469-7653. S0956796808007090.
[19]
Per Vognsen. parser, 2012. URL https://gist.github.com/pervognsen/815b208b86066f6d7a00.

Cited By

  • (2020) Staged selective parser combinators. Proceedings of the ACM on Programming Languages 4(ICFP):1–30. doi:10.1145/3409002. Online publication date: 3-Aug-2020
  • (2017) Derivatives of Parsing Expression Grammars. Electronic Proceedings in Theoretical Computer Science 252:180–194. doi:10.4204/EPTCS.252.18. Online publication date: 21-Aug-2017
  • (2023) Derivatives of Context-free Grammars with Lookahead. Journal of Information Processing 31:421–431. doi:10.2197/ipsjjip.31.421. Online publication date: 2023

    Information

    Published In

    ACM SIGPLAN Notices  Volume 51, Issue 6
    PLDI '16
    June 2016
    726 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2980983
    Editor: Andy Gill
    ACM Conferences
      PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation
      June 2016
      726 pages
      ISBN:9781450342612
      DOI:10.1145/2908080
      General Chair: Chandra Krintz
      Program Chair: Emery Berger
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 June 2016
    Published in SIGPLAN Volume 51, Issue 6

    Author Tags

    1. Parsing
    2. Parsing with derivatives
    3. Performance

    Qualifiers

    • Article

    Bibliometrics & Citations

    Article Metrics

    • Downloads (last 12 months): 253
    • Downloads (last 6 weeks): 34
    Reflects downloads up to 29 Jan 2025

    Citations

    • (2022) Oregano: staging regular expressions with Moore Cayley fusion. Proceedings of the 15th ACM SIGPLAN International Haskell Symposium, 66–80. doi:10.1145/3546189.3549916. Online publication date: 6-Sep-2022
    • (2022) Manipulation of Regular Expressions Using Derivatives: An Overview. Implementation and Application of Automata, 19–33. doi:10.1007/978-3-031-07469-1_2. Online publication date: 28-Jun-2022
    • (2021) CoStar: a verified ALL(*) parser. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 420–434. doi:10.1145/3453483.3454053. Online publication date: 19-Jun-2021
    • (2021) Bohemia – A Validator for Parser Frameworks. 2021 IEEE Security and Privacy Workshops (SPW), 162–170. doi:10.1109/SPW53761.2021.00030. Online publication date: May-2021
    • (2021) On the size of partial derivatives and the word membership problem. Acta Informatica 58(4):357–375. doi:10.1007/s00236-021-00399-6. Online publication date: 19-Jul-2021
    • (2020) Parsing with zippers (functional pearl). Proceedings of the ACM on Programming Languages 4(ICFP):1–28. doi:10.1145/3408990. Online publication date: 3-Aug-2020
    • (2020) Parsing a markup language that supports overlap and discontinuity. Proceedings of the ACM Symposium on Document Engineering 2020, 1–4. doi:10.1145/3395027.3419590. Online publication date: 29-Sep-2020