Research article
DOI: 10.1145/2997364.2997370

Taming context-sensitive languages with principled stateful parsing

Published: 20 October 2016

Abstract

Historically, true context-sensitive parsing has seldom been applied to programming languages, due to its inherent complexity. However, many mainstream programming and markup languages (C, Haskell, Python, XML, and more) possess context-sensitive features. These features are traditionally handled with ad-hoc code (e.g., custom lexers), outside of the scope of parsing theory.
Current grammar formalisms struggle to express context-sensitive features. Most solutions lack context transparency: they make grammars hard to write, maintain and compose by hardwiring context through the entire grammar. Instead, we approach context-sensitive parsing through the idea that parsers may recall previously matched input (or data derived therefrom) in order to make parsing decisions. We make use of mutable parse state to enable this form of recall.
We introduce principled stateful parsing as a new transactional discipline that makes state changes transparent to parsing mechanisms such as backtracking and memoization. To enforce this discipline, users specify parsers using formally specified primitive state manipulation operations.
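The transactional idea can be illustrated with a minimal Python sketch. It is not Autumn's API: the names `Ctx`, `transactional`, `literal`, `record`, `seq`, and `choice` are hypothetical, and memoization is left out. The sketch only shows how a snapshot taken before a parser runs lets backtracking undo both the input position and any state changes made by a failed alternative.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Ctx:
    """Hypothetical parse context: input text, position, and mutable state."""
    text: str
    pos: int = 0
    names: List[str] = field(default_factory=list)  # state recalled by later parsers

    def snapshot(self):
        # Capture everything a failing parser could have changed.
        return (self.pos, tuple(self.names))

    def restore(self, snap) -> None:
        self.pos, names = snap
        self.names = list(names)


Parser = Callable[[Ctx], bool]


def transactional(p: Parser) -> Parser:
    """Run p; on failure, roll back position and state."""
    def run(ctx: Ctx) -> bool:
        snap = ctx.snapshot()
        if p(ctx):
            return True
        ctx.restore(snap)
        return False
    return run


def literal(s: str) -> Parser:
    """Match the exact string s at the current position."""
    def run(ctx: Ctx) -> bool:
        if ctx.text.startswith(s, ctx.pos):
            ctx.pos += len(s)
            return True
        return False
    return run


def record(name: str) -> Parser:
    """State-changing parser: always succeeds and appends name to the state."""
    def run(ctx: Ctx) -> bool:
        ctx.names.append(name)
        return True
    return run


def seq(*ps: Parser) -> Parser:
    """Sequence; wrapped in a transaction so a partial match leaves no trace."""
    return transactional(lambda ctx: all(p(ctx) for p in ps))


def choice(*ps: Parser) -> Parser:
    """Ordered choice; each alternative starts from the same position and state."""
    return lambda ctx: any(p(ctx) for p in ps)


# Both alternatives change the state before they can fail.
grammar = choice(
    seq(literal("a"), record("first"), literal("x")),
    seq(literal("a"), record("second"), literal("y")),
)
```

On the input `"ay"`, the first alternative records `"first"` and then fails on `"x"`. The rollback discards that record, so after the second alternative succeeds the state holds only `"second"`.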
Our solution is available as a parsing library named Autumn. We illustrate our solution by implementing some practical context-sensitive grammar features such as significant whitespace handling and namespace classification.
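As a rough illustration of significant whitespace handling, the following Python sketch parses an indentation-structured text into a tree by recalling the indentation widths of enclosing blocks from a stack. It is an assumption-laden toy, not the Autumn implementation: `parse_blocks` and its helpers are hypothetical names, only spaces count as indentation, and blank lines are ignored.

```python
from typing import List, Tuple

Node = Tuple[str, list]  # (line content, child nodes)


def parse_blocks(src: str) -> List[Node]:
    """Parse indented lines into nested (name, children) nodes."""
    lines = [line for line in src.splitlines() if line.strip()]
    indents = [0]  # parse state: indentation widths of the enclosing blocks
    i = 0          # index of the next unread line

    def width(line: str) -> int:
        return len(line) - len(line.lstrip(" "))

    def block() -> List[Node]:
        nonlocal i
        nodes: List[Node] = []
        # A block is a run of lines at exactly the recalled indentation.
        while i < len(lines) and width(lines[i]) == indents[-1]:
            name = lines[i].strip()
            i += 1
            children: List[Node] = []
            if i < len(lines) and width(lines[i]) > indents[-1]:
                indents.append(width(lines[i]))  # enter a nested block
                children = block()
                indents.pop()                    # leave the nested block
            nodes.append((name, children))
        return nodes

    result = block()
    if i != len(lines):
        # A line matched no enclosing indentation level.
        raise ValueError(f"inconsistent indentation at non-blank line {i + 1}")
    return result
```

For example, `parse_blocks("a\n  b\n  c\nd")` yields two top-level nodes, `a` (with children `b` and `c`) and `d`. A dedent to a width that matches no enclosing block raises `ValueError`.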

Supplementary Material

Auxiliary Archive (p15-laurent-s.zip)
The supplementary materials include two artifacts:
  • The code for the Autumn parsing library (autumn)
  • The Z specification presented in the paper (zspec)

Cited By

  • (2023) On Parsing Programming Languages with Turing-Complete Parser. Mathematics 11:7 (1594). DOI: 10.3390/math11071594. Online publication date: 25-Mar-2023
  • (2022) Context-sensitive parsing for programming languages. Journal of Computer Languages 73 (101172). DOI: 10.1016/j.cola.2022.101172. Online publication date: Dec-2022
  • (2020) Mining input grammars from dynamic control flow. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (172-183). DOI: 10.1145/3368089.3409679. Online publication date: 8-Nov-2020

    Published In

    SLE 2016: Proceedings of the 2016 ACM SIGPLAN International Conference on Software Language Engineering
    October 2016
    257 pages
    ISBN: 9781450344470
    DOI: 10.1145/2997364
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. context sensitivity
    2. data dependence
    3. grammars
    4. parsing expressions
    5. stateful parsing

    Qualifiers

    • Research-article

    Conference

    SLE '16
    SLE '16: Software Language Engineering
    October 31 - November 1, 2016
    Amsterdam, Netherlands

    Article Metrics

    • Downloads (last 12 months): 10
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 20 Jan 2025.

    Cited By

    • (2017) Red Shift: procedural shift-reduce parsing (vision paper). Proceedings of the 10th ACM SIGPLAN International Conference on Software Language Engineering (38-42). DOI: 10.1145/3136014.3136036. Online publication date: 23-Oct-2017
    • (2017) A symbol-based extension of parsing expression grammars and context-sensitive packrat parsing. Proceedings of the 10th ACM SIGPLAN International Conference on Software Language Engineering (26-37). DOI: 10.1145/3136014.3136025. Online publication date: 23-Oct-2017
    • (2017) Context Sensitive and Secure Parser Generation for Deep Packet Inspection of Binary Protocols. 2017 15th Annual Conference on Privacy, Security and Trust (PST) (77-7709). DOI: 10.1109/PST.2017.00019. Online publication date: Aug-2017
    • (2016) Taming context-sensitive languages with principled stateful parsing. Proceedings of the 2016 ACM SIGPLAN International Conference on Software Language Engineering (15-27). DOI: 10.1145/2997364.2997370. Online publication date: 20-Oct-2016
