DOI: 10.1145/1993498.1993548
research-article

LL(*): the foundation of the ANTLR parser generator

Published: 04 June 2011

Abstract

Despite the power of Parsing Expression Grammars (PEGs) and GLR, parsing is not a solved problem. Adding nondeterminism (parser speculation) to traditional LL and LR parsers can lead to unexpected parse-time behavior and introduces practical issues with error handling, single-step debugging, and side-effecting embedded grammar actions. This paper introduces the LL(*) parsing strategy and an associated grammar analysis algorithm that constructs LL(*) parsing decisions from ANTLR grammars. At parse-time, decisions gracefully throttle up from conventional fixed k ≥ 1 lookahead to arbitrary lookahead and, finally, fail over to backtracking depending on the complexity of the parsing decision and the input symbols. LL(*) parsing strength reaches into the context-sensitive languages, in some cases beyond what GLR and PEGs can express. By statically removing as much speculation as possible, LL(*) provides the expressivity of PEGs while retaining LL's good error handling and unrestricted grammar actions. Widespread use of ANTLR (over 70,000 downloads/year) shows that it is effective for a wide variety of applications.
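
To make the idea of a decision that "throttles up" from fixed to arbitrary lookahead more concrete, the following is a minimal hand-written sketch in Python. It is not ANTLR's generated code or the paper's analysis algorithm. The grammar rule in the comment, the token-type strings, and the function name `predict` are all assumptions chosen for illustration. Some inputs are resolved with one or two tokens of lookahead, while inputs that begin with `unsigned` require scanning an unbounded number of tokens, which a cyclic lookahead DFA can do without backtracking.

```python
from typing import List, Optional

# Hypothetical rule used only for illustration (token types as plain strings):
#
#   s : ID                      // alternative 1
#     | ID '=' expr             // alternative 2
#     | 'unsigned'* 'int' ID    // alternative 3
#     | 'unsigned'* ID ID       // alternative 4
#     ;

EOF = "<EOF>"


def predict(tokens: List[str], start: int = 0) -> Optional[int]:
    """Predict which alternative of rule `s` matches, by lookahead only.

    Returns the alternative number (1-4), or None if no alternative is
    viable. No input is consumed and no backtracking is performed.
    """

    def la(i: int) -> str:
        # Token type at offset i from the decision point, or EOF past the end.
        idx = start + i
        return tokens[idx] if idx < len(tokens) else EOF

    first = la(0)
    if first == "int":
        return 3  # one token of lookahead is enough
    if first == "ID":
        # Two tokens of lookahead separate alternatives 1, 2 and 4.
        second = la(1)
        if second == EOF:
            return 1
        if second == "=":
            return 2
        if second == "ID":
            return 4
        return None
    if first == "unsigned":
        # Cyclic DFA state: skip any number of 'unsigned' tokens, then
        # decide on the first token that is not 'unsigned'.
        i = 1
        while la(i) == "unsigned":
            i += 1
        if la(i) == "int":
            return 3
        if la(i) == "ID":
            return 4
        return None
    return None


if __name__ == "__main__":
    print(predict(["ID", "=", "INT"]))
    print(predict(["unsigned", "unsigned", "int", "ID"]))
    print(predict(["unsigned", "ID", "ID"]))
```

The sketch covers only the lookahead-DFA part of the strategy. The final step described in the abstract, failing over to backtracking when a decision cannot be resolved this way, is not shown.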

Published In

PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2011
668 pages
ISBN: 9781450306638
DOI: 10.1145/1993498
General Chair: Mary Hall
Program Chair: David Padua

ACM SIGPLAN Notices, Volume 46, Issue 6
PLDI '11
June 2011
652 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1993316

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. augmented transition networks
  2. backtracking
  3. context-sensitive parsing
  4. deterministic finite automata
  5. GLR
  6. memoization
  7. nondeterministic parsing
  8. PEG
  9. semantic predicates
  10. subset construction
  11. syntactic predicates

Qualifiers

  • Research-article

Conference

PLDI '11

Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 125
  • Downloads (last 6 weeks): 14
Reflects downloads up to 30 Aug 2024.

Cited By

  • (2024) Zombie cheminformatics: extraction and conversion of Wiswesser Line Notation (WLN) from chemical documents. Journal of Cheminformatics 16:1. DOI: 10.1186/s13321-024-00831-2. Online publication date: 15-Apr-2024
  • (2024) Jasay: Towards Voice Commands in Projectional Editors. Proceedings of the 1st ACM/IEEE Workshop on Integrated Development Environments, pp. 30-34. DOI: 10.1145/3643796.3648449. Online publication date: 20-Apr-2024
  • (2024) Atomic Condition Coverage Analysis for Structured Text Based Programmable Logic Controller (PLC). Proceedings of the 17th Innovations in Software Engineering Conference, pp. 1-5. DOI: 10.1145/3641399.3641427. Online publication date: 22-Feb-2024
  • (2024) Ranked Syntax Completion With LR Parsing. Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, pp. 1242-1251. DOI: 10.1145/3605098.3635944. Online publication date: 8-Apr-2024
  • (2023) Ordered Context-Free Grammars Revisited. Electronic Proceedings in Theoretical Computer Science 388, pp. 140-153. DOI: 10.4204/EPTCS.388.13. Online publication date: 15-Sep-2023
  • (2023) Derivatives of Context-free Grammars with Lookahead. Journal of Information Processing 31, pp. 421-431. DOI: 10.2197/ipsjjip.31.421. Online publication date: 2023
  • (2023) A Survey of Learning-based Automated Program Repair. ACM Transactions on Software Engineering and Methodology 33:2, pp. 1-69. DOI: 10.1145/3631974. Online publication date: 23-Dec-2023
  • (2023) Multiple Input Parsing and Lexical Analysis. ACM Transactions on Programming Languages and Systems 45:3, pp. 1-44. DOI: 10.1145/3594734. Online publication date: 19-Jul-2023
  • (2023) Associative Operator Precedence Parsing: A Method To Increase Data Parsing Parallelism. Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 75-87. DOI: 10.1145/3578178.3578233. Online publication date: 27-Feb-2023
  • (2023) Statically Resolvable Ambiguity. Proceedings of the ACM on Programming Languages 7:POPL, pp. 1686-1712. DOI: 10.1145/3571251. Online publication date: 11-Jan-2023
