DOI: 10.1145/3385412.3385992
research-article

Zippy LL(1) parsing with derivatives

Published: 11 June 2020

Abstract

In this paper, we present an efficient, functional, and formally verified parsing algorithm for LL(1) context-free expressions based on the concept of derivatives of formal languages. Parsing with derivatives is an elegant parsing technique, which, in the general case, suffers from cubic worst-case time complexity and slow performance in practice. We specialise the parsing with derivatives algorithm to LL(1) context-free expressions, where alternatives can be chosen given a single token of lookahead. We formalise the notion of LL(1) expressions and show how to efficiently check the LL(1) property. Next, we present a novel linear-time parsing with derivatives algorithm for LL(1) expressions operating on a zipper-inspired data structure. We prove the algorithm correct in Coq and present an implementation as a part of Scallion, a parser combinators framework in Scala with enumeration and pretty printing capabilities.
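
The idea of choosing an alternative from a single token of lookahead while taking derivatives can be illustrated with a small sketch. The Python below is not the paper's algorithm and not Scallion's API: it is a naive recogniser over a hypothetical, non-recursive expression type, and every name in it (`Eps`, `Tok`, `Alt`, `Seq`, `nullable`, `first`, `derive`, `accepts`, `value`, `pair`) is assumed for illustration. It also assumes the expression is already LL(1) and performs no check of that property.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eps:
    """Matches the empty token sequence."""


@dataclass(frozen=True)
class Tok:
    """Matches exactly one token of the given kind."""
    kind: str


@dataclass(frozen=True)
class Alt:
    """Matches either alternative."""
    left: object
    right: object


@dataclass(frozen=True)
class Seq:
    """Matches left followed by right."""
    left: object
    right: object


def nullable(e):
    """True if e accepts the empty sequence."""
    if isinstance(e, Eps):
        return True
    if isinstance(e, Tok):
        return False
    if isinstance(e, Alt):
        return nullable(e.left) or nullable(e.right)
    return nullable(e.left) and nullable(e.right)  # Seq


def first(e):
    """Token kinds that can start a sequence accepted by e."""
    if isinstance(e, Eps):
        return frozenset()
    if isinstance(e, Tok):
        return frozenset({e.kind})
    if isinstance(e, Alt):
        return first(e.left) | first(e.right)
    tail = first(e.right) if nullable(e.left) else frozenset()
    return first(e.left) | tail  # Seq


def derive(e, tok):
    """Expression for what may follow tok, or None if tok cannot start e."""
    if isinstance(e, Eps):
        return None
    if isinstance(e, Tok):
        return Eps() if e.kind == tok else None
    if isinstance(e, Alt):
        # One token of lookahead selects the branch (assumes e is LL(1)).
        if tok in first(e.left):
            return derive(e.left, tok)
        if tok in first(e.right):
            return derive(e.right, tok)
        return None
    # Seq: consume from the left part, or skip it if it is nullable.
    if tok in first(e.left):
        d = derive(e.left, tok)
        return None if d is None else Seq(d, e.right)
    if nullable(e.left):
        return derive(e.right, tok)
    return None


def accepts(e, tokens):
    """Derive by each token in turn, then test nullability."""
    for tok in tokens:
        e = derive(e, tok)
        if e is None:
            return False
    return nullable(e)


# value ::= num | "(" num ")"      pair ::= value ("," value)?
value = Alt(Tok("num"), Seq(Tok("("), Seq(Tok("num"), Tok(")"))))
pair = Seq(value, Alt(Eps(), Seq(Tok(","), value)))

print(accepts(pair, ["(", "num", ")", ",", "num"]))  # True
print(accepts(pair, ["num", ","]))                   # False
```

Because `derive` recomputes `first` and re-walks nested `Seq` nodes for every token, this sketch is not linear time. The abstract attributes the linear-time bound to the zipper-inspired data structure, which this example does not attempt to reproduce.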

Romain Edelmann, Jad Hamza, and Viktor Kunčak. PLDI ’20, June 15–20, 2020, London, UK.

Information & Contributors

Information

Published In

PLDI 2020: Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2020
1174 pages
ISBN: 9781450376136
DOI: 10.1145/3385412
This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Derivatives
  2. Formal proof
  3. LL(1)
  4. Parsing
  5. Zipper

Qualifiers

  • Research-article

Funding Sources

  • Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung

Conference

PLDI '20

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 36
  • Downloads (last 6 weeks): 3
Reflects downloads up to 31 Dec 2024.

Citations

Cited By

  • (2024) Daedalus: Safer Document Parsing. Proceedings of the ACM on Programming Languages 8:PLDI (816-840). DOI: 10.1145/3656410. Online publication date: 20-Jun-2024
  • (2023) A Haskell Library for Adaptable Parsing Expression Grammars. Proceedings of the XXVII Brazilian Symposium on Programming Languages (73-81). DOI: 10.1145/3624309.3624313. Online publication date: 25-Sep-2023
  • (2023) A Derivative-based Parser Generator for Visibly Pushdown Grammars. ACM Transactions on Programming Languages and Systems 45:2 (1-68). DOI: 10.1145/3591472. Online publication date: 15-May-2023
  • (2023) Type-based Termination Analysis for Parsing Expression Grammars. Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing (1372-1379). DOI: 10.1145/3555776.3577620. Online publication date: 27-Mar-2023
  • (2023) Verified ALL(*) Parsing with Semantic Actions and Dynamic Input Validation. NASA Formal Methods (414-429). DOI: 10.1007/978-3-031-33170-1_25. Online publication date: 3-Jun-2023
  • (2021) A derivative-based parser generator for visibly Pushdown grammars. Proceedings of the ACM on Programming Languages 5:OOPSLA (1-24). DOI: 10.1145/3485528. Online publication date: 15-Oct-2021
  • (2021) CoStar: a verified ALL(*) parser. Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (420-434). DOI: 10.1145/3453483.3454053. Online publication date: 19-Jun-2021
  • (2020) Parsing with zippers (functional pearl). Proceedings of the ACM on Programming Languages 4:ICFP (1-28). DOI: 10.1145/3408990. Online publication date: 3-Aug-2020
