DOI: 10.1145/3453483.3454053
research-article
Open access

CoStar: a verified ALL(*) parser

Published: 18 June 2021

Abstract

Parsers are security-critical components of many software systems, and verified parsing therefore has a key role to play in secure software design. However, existing verified parsers for context-free grammars are limited in their expressiveness, termination properties, or performance characteristics. They are only compatible with a restricted class of grammars, they are not guaranteed to terminate on all inputs, or they are not designed to be performant on grammars for real-world programming languages and data formats.
In this work, we present CoStar, a verified parser that addresses these limitations. The parser is implemented with the Coq Proof Assistant and is based on the ALL(*) parsing algorithm. CoStar is sound and complete for all non-left-recursive grammars; it produces a correct parse tree for its input whenever such a tree exists, and it correctly detects ambiguous inputs. CoStar also provides strong termination guarantees; it terminates without error on all inputs when applied to a non-left-recursive grammar. Finally, CoStar achieves linear-time performance on a range of unambiguous grammars for commonly used languages and data formats.
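
The guarantees above are conditional on the grammar being non-left-recursive. As an illustration only, and not CoStar's verified Coq implementation, the following Python sketch shows one standard way to test that condition: compute the nullable nonterminals, then look for a cycle among "left-corner" edges. The grammar encoding (a dict from each nonterminal to a list of right-hand sides) and both function names are assumptions made for this example.

```python
from typing import Dict, List, Set

# Hypothetical encoding: nonterminal -> list of right-hand sides.
# Any symbol that is not a key of the dict is treated as a terminal.
Grammar = Dict[str, List[List[str]]]


def nullable_nonterminals(grammar: Grammar) -> Set[str]:
    """Return the nonterminals that can derive the empty string."""
    nullable: Set[str] = set()
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for lhs, rhss in grammar.items():
            if lhs in nullable:
                continue
            # An empty right-hand side makes all(...) trivially true.
            if any(all(sym in nullable for sym in rhs) for rhs in rhss):
                nullable.add(lhs)
                changed = True
    return nullable


def left_recursive_nonterminals(grammar: Grammar) -> Set[str]:
    """Return every nonterminal X such that X derives X followed by anything
    in one or more steps (direct, indirect, or hidden left recursion)."""
    nullable = nullable_nonterminals(grammar)

    # Left-corner edges: X -> Y if Y can appear leftmost in some expansion
    # of X, i.e. every symbol before Y in that right-hand side is nullable.
    edges: Dict[str, Set[str]] = {nt: set() for nt in grammar}
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            for sym in rhs:
                if sym in grammar:
                    edges[lhs].add(sym)
                if sym not in nullable:
                    break  # later symbols cannot be leftmost

    # X is left-recursive iff X reaches itself through at least one edge.
    result: Set[str] = set()
    for start in grammar:
        seen: Set[str] = set()
        stack = list(edges[start])
        while stack:
            nt = stack.pop()
            if nt == start:
                result.add(start)
                break
            if nt not in seen:
                seen.add(nt)
                stack.extend(edges[nt])
    return result
```

Under this sketch, a grammar for which `left_recursive_nonterminals` returns the empty set falls in the class the abstract describes; the check itself is unverified and is shown only to make the side condition concrete.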

Published In

PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation
June 2021
1341 pages
ISBN:9781450383912
DOI:10.1145/3453483
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. interactive theorem proving
  2. parsing

Qualifiers

  • Research-article

Conference

PLDI '21

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 253
  • Downloads (last 6 weeks): 34
Reflects downloads up to 01 Jan 2025.

Cited By

  • (2024) Fast and Verified UNSAT Certificate Checking. Automated Reasoning (439-457). DOI: 10.1007/978-3-031-63498-7_26. Online publication date: 3-Jul-2024
  • (2023) A Derivative-based Parser Generator for Visibly Pushdown Grammars. ACM Transactions on Programming Languages and Systems 45:2 (1-68). DOI: 10.1145/3591472. Online publication date: 15-May-2023
  • (2023) Verified ALL(*) Parsing with Semantic Actions and Dynamic Input Validation. NASA Formal Methods (414-429). DOI: 10.1007/978-3-031-33170-1_25. Online publication date: 16-May-2023
  • (2021) A derivative-based parser generator for visibly Pushdown grammars. Proceedings of the ACM on Programming Languages 5:OOPSLA (1-24). DOI: 10.1145/3485528. Online publication date: 15-Oct-2021
