CoStar: a verified ALL(*) parser

Authors:

Chris Casinghino,

Kathleen Fisher,

Cody Roux

PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation

Pages 420 - 434

https://doi.org/10.1145/3453483.3454053

Published: 18 June 2021

Abstract

Parsers are security-critical components of many software systems, and verified parsing therefore has a key role to play in secure software design. However, existing verified parsers for context-free grammars are limited in their expressiveness, termination properties, or performance characteristics. They are only compatible with a restricted class of grammars, they are not guaranteed to terminate on all inputs, or they are not designed to be performant on grammars for real-world programming languages and data formats.

In this work, we present CoStar, a verified parser that addresses these limitations. The parser is implemented with the Coq Proof Assistant and is based on the ALL(*) parsing algorithm. CoStar is sound and complete for all non-left-recursive grammars; it produces a correct parse tree for its input whenever such a tree exists, and it correctly detects ambiguous inputs. CoStar also provides strong termination guarantees; it terminates without error on all inputs when applied to a non-left-recursive grammar. Finally, CoStar achieves linear-time performance on a range of unambiguous grammars for commonly used languages and data formats.
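
The guarantees above are stated for non-left-recursive grammars. As a rough illustration of that precondition, the following sketch checks a grammar for left recursion, including the hidden kind that arises behind a nullable prefix. It is not taken from CoStar: the dictionary-based grammar encoding and the function names `nullable_nonterminals` and `left_recursive_nonterminals` are assumptions made for this example only.

```python
from typing import Dict, List, Set

# Hypothetical grammar encoding (not CoStar's): each nonterminal maps to a list
# of right-hand sides; a symbol is a nonterminal iff it is a key of the dict.
Grammar = Dict[str, List[List[str]]]


def nullable_nonterminals(g: Grammar) -> Set[str]:
    """Nonterminals that can derive the empty string (fixed-point iteration)."""
    nullable: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for nt, rhss in g.items():
            if nt in nullable:
                continue
            # An empty right-hand side is trivially nullable.
            if any(all(s in nullable for s in rhs) for rhs in rhss):
                nullable.add(nt)
                changed = True
    return nullable


def left_recursive_nonterminals(g: Grammar) -> Set[str]:
    """Nonterminals X that can reach X again in leftmost position
    after one or more derivation steps, without consuming any input."""
    nullable = nullable_nonterminals(g)
    # edges[X] = nonterminals that may appear leftmost after one step from X.
    edges: Dict[str, Set[str]] = {nt: set() for nt in g}
    for nt, rhss in g.items():
        for rhs in rhss:
            for sym in rhs:
                if sym not in g:          # terminal: input would be consumed
                    break
                edges[nt].add(sym)
                if sym not in nullable:   # later symbols cannot be leftmost
                    break
    result: Set[str] = set()
    for start in g:
        seen: Set[str] = set()
        stack = list(edges[start])
        while stack:                      # depth-first search for a cycle to start
            cur = stack.pop()
            if cur == start:
                result.add(start)
                break
            if cur not in seen:
                seen.add(cur)
                stack.extend(edges[cur])
    return result


direct = {"E": [["E", "+", "T"], ["T"]], "T": [["int"]]}
hidden = {"S": [["A", "S", "b"], ["c"]], "A": [[]]}
factored = {
    "E": [["T", "Etail"]],
    "Etail": [["+", "T", "Etail"], []],
    "T": [["int"]],
}
```

Under this encoding, `direct` and `hidden` are both reported as left-recursive (in `E` and `S` respectively), while `factored`, the usual right-recursive rewrite of `direct`, is not.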

Index Terms

CoStar: a verified ALL(*) parser
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Formal software verification
  2. Software notations and tools
    1. Compilers
      1. Parsers

Published In

PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation

June 2021

1341 pages

ISBN:9781450383912

DOI:10.1145/3453483

General Chair: Stephen N. Freund (Williams College, USA)

Program Chair: Eran Yahav (Technion, Israel)

Copyright © 2021 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PLDI '21

Sponsor:

SIGPLAN

PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation

June 20 - 25, 2021

Virtual, Canada

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%
