research-article
Open access

flap: A Deterministic Parser with Fused Lexing

Published: 06 June 2023
    Abstract

    Lexers and parsers are typically defined separately and connected by a token stream. This separate definition is important for modularity and reduces the potential for parsing ambiguity. However, materializing tokens as data structures and case-switching on tokens comes with a cost.
    We show how to fuse separately-defined lexers and parsers, drastically improving performance without compromising modularity or increasing ambiguity. We propose a deterministic variant of Greibach Normal Form that ensures deterministic parsing with a single token of lookahead and makes fusion strikingly simple, and prove that normalizing context-free expressions into the deterministic normal form is semantics-preserving. Our staged parser combinator library, flap, provides a standard interface, but generates specialized token-free code that runs two to six times faster than ocamlyacc on a range of benchmarks.
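
To make the idea of fusion concrete, here is a minimal hand-written sketch in Python. It is not flap's interface or its generated code (flap is a staged OCaml library). The toy grammar `E -> INT R`, `R -> PLUS INT R | epsilon` and the names `lex`, `parse_tokens`, and `parse_fused` are all assumptions made for illustration. Every production starts with a terminal, so one token of lookahead picks the production. The unfused pair builds a token list and branches on token tags, while the fused version branches directly on characters and never builds a token.

```python
DIGITS = "0123456789"


def lex(text):
    """Unfused stage 1: materialize every token as a (tag, lexeme) tuple."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch == " ":
            i += 1
        elif ch == "+":
            tokens.append(("PLUS", ch))
            i += 1
        elif ch in DIGITS:
            start = i
            while i < len(text) and text[i] in DIGITS:
                i += 1
            tokens.append(("INT", text[start:i]))
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    return tokens


def parse_tokens(tokens):
    """Unfused stage 2: branch on token tags with one token of lookahead."""
    pos = 0

    def expect_int():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] == "INT":
            pos += 1
            return int(tokens[pos - 1][1])
        raise SyntaxError("expected INT")

    total = expect_int()              # E -> INT R
    while pos < len(tokens):          # R -> epsilon once the input is exhausted
        if tokens[pos][0] != "PLUS":  # R -> PLUS INT R
            raise SyntaxError("expected PLUS")
        pos += 1
        total += expect_int()
    return total


def parse_fused(text):
    """Fused: the same grammar, recognized on characters with no token values."""
    i, n = 0, len(text)

    def skip_spaces():
        nonlocal i
        while i < n and text[i] == " ":
            i += 1

    def read_int():
        nonlocal i
        skip_spaces()
        start = i
        while i < n and text[i] in DIGITS:
            i += 1
        if start == i:
            raise SyntaxError(f"expected digit at offset {i}")
        return int(text[start:i])

    total = read_int()                # E -> INT R
    while True:
        skip_spaces()
        if i == n:                    # R -> epsilon
            return total
        if text[i] != "+":            # R -> PLUS INT R
            raise SyntaxError(f"expected '+' at offset {i}")
        i += 1
        total += read_int()
```

Both routes accept the same inputs and compute the same sum, for example `parse_tokens(lex("1 + 20+3"))` and `parse_fused("1 + 20+3")`. The sketch fuses the two stages by hand, whereas the abstract describes deriving the fused code automatically from separately defined lexers and parsers.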



    Information & Contributors

    Information

    Published In

    Proceedings of the ACM on Programming Languages  Volume 7, Issue PLDI
    June 2023
    2020 pages
    EISSN:2475-1421
    DOI:10.1145/3554310
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 June 2023
    Published in PACMPL Volume 7, Issue PLDI


    Author Tags

    1. fusion
    2. lexing
    3. multi-stage programming
    4. optimization
    5. parsing

    Qualifiers

    • Research-article

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 418
    • Downloads (Last 12 months): 294
    • Downloads (Last 6 weeks): 25
    Reflects downloads up to 11 Aug 2024
