DOI: 10.1145/3486608.3486900
research-article

Fast incremental PEG parsing

Published: 22 November 2021

Abstract

Incremental parsing is an integral part of code analysis performed by text editors and integrated development environments. This paper presents new methods to significantly improve the efficiency of incremental parsing for Parsing Expression Grammars (PEGs). We build on Incremental Packrat Parsing, an algorithm that adapts packrat parsing to an incremental setting, by implementing the memoization table as an interval tree with special support for shifting intervals, and modifying the memoization strategy to create tree structures in the table. Our approach enables reparsing in time logarithmic in the size of the input for typical edits, compared with linear-time reparsing for Incremental Packrat Parsing. We implement our methods in a prototype called GPeg, a parsing machine for PEGs with support for dynamic parsers (an important feature for extensibility in editors). Experiments show that GPeg has strong performance (sub-5ms reparse times) across a variety of input sizes (tens to hundreds of megabytes) and grammar types (from full language grammars to minimal grammars), and compares well with existing incremental parsers. As a complete example, we implement a syntax highlighting library and prototype editor using GPeg, with optimizations for these applications.

Supplementary Material

Auxiliary Presentation Video (splashws21slemain-p5-p-video.mp4)
This is a video of my talk at SLE 2021 on our research paper titled "Fast Incremental PEG Parsing." I explain our three primary improvements to the Incremental Packrat Parsing algorithm that enable logarithmic-time rather than linear-time reparsing for typical edits. In particular, we use an interval tree with lazy shifts for the memoization table, and we enforce a new memoization strategy for the Kleene star operator that replaces linear structures in the memoization table with logarithmic ones. I also explain some space optimizations we apply and show how our evaluation demonstrates the expected scaling behavior as input size increases and over a large number of edits. Our techniques are implemented in a library called GPeg, which is open-source and available online.
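
The lazy-shift idea mentioned above can be illustrated with a small sketch. This is not GPeg's implementation: it is a plain, unbalanced binary search tree keyed on the start position of each memoized match, and the names (Node, insert, shift_from, entries) are hypothetical. A real interval tree would also be kept balanced and would track interval end points so that entries overlapping an edit can be found and invalidated. The point shown here is only that an edit can move every entry at or after a position by visiting one root-to-leaf path, recording the remaining work as a pending shift on subtrees.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    start: int                      # start position of the memoized match
    length: int                     # number of characters the match covers
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    pending: int = 0                # shift already applied to this node's
                                    # start but not yet to its descendants


def _push(node: Node) -> None:
    """Apply a node's pending shift to its children, then clear it."""
    if node.pending:
        for child in (node.left, node.right):
            if child is not None:
                child.start += node.pending
                child.pending += node.pending
        node.pending = 0


def insert(root: Optional[Node], start: int, length: int) -> Node:
    """Ordinary BST insert; pending shifts are pushed down on the way."""
    if root is None:
        return Node(start, length)
    _push(root)
    if start < root.start:
        root.left = insert(root.left, start, length)
    else:
        root.right = insert(root.right, start, length)
    return root


def shift_from(root: Optional[Node], pos: int, delta: int) -> None:
    """Shift every entry with start >= pos by delta.

    Only one root-to-leaf path is visited. Assumes the shift preserves
    the ordering of entries (for a deletion, the caller is assumed to
    have removed the entries inside the deleted range first).
    """
    node = root
    while node is not None:
        _push(node)
        if node.start >= pos:
            node.start += delta
            if node.right is not None:
                # Whole right subtree lies after pos: shift it lazily.
                node.right.start += delta
                node.right.pending += delta
            node = node.left
        else:
            node = node.right


def entries(root: Optional[Node]) -> List[Tuple[int, int]]:
    """In-order list of (start, length), resolving pending shifts."""
    if root is None:
        return []
    _push(root)
    return entries(root.left) + [(root.start, root.length)] + entries(root.right)
```

With a balanced tree, the path visited by shift_from has logarithmic length, which is the property the lazy shift is meant to provide; the unbalanced tree in this sketch gives no such guarantee.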


Published In

SLE 2021: Proceedings of the 14th ACM SIGPLAN International Conference on Software Language Engineering
October 2021
176 pages
ISBN:9781450391115
DOI:10.1145/3486608

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. PEG
  2. incremental parsing
  3. packrat parsing

Conference

SLE '21