Antlr PDF
Homepage: http://www.antlr4.org
This pops up a dialog box showing that rule r matched the keyword hello followed by
the identifier brink.
THE BIG PICTURE
Tokenizing:
assign : ID '=' expr ';' ; // match an assignment statement like "sp = 100;"
void stat() {
    switch ( _input.LA(1) ) {   // peek at the type of the current input token
        case ID    : assign();    break;
        case IF    : ifstat();    break;   // IF is token type for keyword 'if'
        case WHILE : whilestat(); break;
        // ... one case per remaining alternative of stat
        default    : throw new NoViableAltException(this);   // no viable alternative
    }
}
In the example on the previous slide:
Method stat() has to make a parsing decision or
prediction by examining the next input token.
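A self-contained sketch of that decision, assuming the lexer has already produced a list of token types. StatParserSketch, its token constants, and the strings it returns are hypothetical stand-ins for generated code, and ifstat and whilestat are stubs. The point is that a single token of lookahead is enough to pick the alternative.

```java
import java.util.List;

// Hypothetical hand-written recursive-descent parser shaped like ANTLR-generated code.
class StatParserSketch {
    // Hypothetical token types (ANTLR assigns its own numbers in generated code).
    static final int ID = 1, IF = 2, WHILE = 3, EQUALS = 4, INT = 5, SEMI = 6;

    private final List<Integer> tokens;
    private int pos = 0;

    StatParserSketch(List<Integer> tokens) { this.tokens = tokens; }

    // Peek at the type of the current input token without consuming it (-1 at end of input).
    private int lookahead() { return pos < tokens.size() ? tokens.get(pos) : -1; }

    // Consume the current token if it has the expected type, otherwise report a syntax error.
    private void match(int type) {
        if (lookahead() != type) throw new IllegalStateException("expected " + type + " at " + pos);
        pos++;
    }

    // stat : assign | ifstat | whilestat ;  the next token alone predicts the alternative.
    String stat() {
        switch (lookahead()) {
            case ID:    return assign();
            case IF:    return "ifstat";     // stub: body omitted in this sketch
            case WHILE: return "whilestat";  // stub: body omitted in this sketch
            default:    throw new IllegalStateException("no viable alternative at token " + pos);
        }
    }

    // assign : ID '=' expr ';' ;  with expr simplified to a single INT
    private String assign() {
        match(ID); match(EQUALS); match(INT); match(SEMI);
        return "assign";
    }

    public static void main(String[] args) {
        // Token types for the input "sp = 100;"
        System.out.println(new StatParserSketch(List.of(ID, EQUALS, INT, SEMI)).stat()); // prints assign
    }
}
```

When the first token starts no alternative, control reaches default, which plays the role of the "no viable alternative" error.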
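When more than one alternative can match the same input, ANTLR resolves the ambiguity by choosing the first alternative involved in the decision.
Ambiguities can occur in the lexer as well as the parser.
ANTLR's lexer matches the longest possible token; when two lexer rules match the same input string (for example, a keyword rule and an identifier rule both matching "if"), ANTLR resolves the ambiguity in favor of the rule specified first in the grammar.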
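To make the tie-breaking concrete, here is a toy lexer in plain Java; it is not ANTLR's implementation. It assumes three hypothetical rules declared in this order: a keyword BEGIN : 'begin', an identifier ID : [a-z]+, and whitespace that is skipped. At each position it keeps the longest match, and on a tie it keeps the rule declared first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy lexer: longest match wins; among equal-length matches, the earlier rule wins.
class FirstRuleLexerSketch {
    // Hypothetical grammar order: keyword rule first, identifier rule second.
    static final String[] NAMES = {"BEGIN", "ID", "WS"};
    static final Pattern[] RULES = {
        Pattern.compile("begin"),
        Pattern.compile("[a-z]+"),
        Pattern.compile("[ \\t\\r\\n]+")
    };

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestRule = -1, bestLen = 0;
            for (int r = 0; r < RULES.length; r++) {
                Matcher m = RULES[r].matcher(input).region(pos, input.length());
                // Only a strictly longer match replaces the current best, so ties keep the earlier rule.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestRule = r;
                    bestLen = m.end() - pos;
                }
            }
            if (bestRule < 0) throw new IllegalStateException("token recognition error at " + pos);
            if (!NAMES[bestRule].equals("WS")) {   // like '-> skip': whitespace is dropped
                out.add(NAMES[bestRule] + ":" + input.substring(pos, pos + bestLen));
            }
            pos += bestLen;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("begin beginner"));  // prints [BEGIN:begin, ID:beginner]
    }
}
```

"begin" matches both rules with the same length, so BEGIN wins by order; "beginner" is longer as an ID, so the keyword rule loses.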
Lexers process characters and pass tokens to the parser, which in turn checks syntax
and creates a parse tree.
The corresponding ANTLR classes are CharStream, Lexer, Token, Parser, and ParseTree.
The “pipe” connecting the lexer and parser is called a TokenStream.
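The same data flow can be mimicked without the ANTLR runtime. This sketch uses none of the real CharStream, Lexer, TokenStream, or Parser classes; all names in it are hypothetical. A lexer turns characters into a list of tokens, that list plays the role of the token stream, and a parser for assign : ID '=' expr ';' (with expr reduced to an integer) checks the syntax and builds a small parse tree rendered as a LISP-style string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the chars -> lexer -> token stream -> parser -> parse tree pipeline.
class PipelineSketch {
    // A token carries a type and the matched text.
    static final class Token {
        final String type, text;
        Token(String type, String text) { this.type = type; this.text = text; }
    }

    // Lexer: characters in, tokens out (whitespace is skipped).
    static List<Token> lex(String chars) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < chars.length()) {
            char c = chars.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (Character.isLetter(c)) {
                while (i < chars.length() && Character.isLetterOrDigit(chars.charAt(i))) i++;
                tokens.add(new Token("ID", chars.substring(start, i)));
            } else if (Character.isDigit(c)) {
                while (i < chars.length() && Character.isDigit(chars.charAt(i))) i++;
                tokens.add(new Token("INT", chars.substring(start, i)));
            } else {
                tokens.add(new Token("'" + c + "'", String.valueOf(c)));  // single-char literal token
                i++;
            }
        }
        return tokens;
    }

    // Parser: consumes the token stream, checks syntax, and returns the tree as a string.
    static String parseAssign(List<Token> stream) {
        int[] pos = {0};  // shared cursor into the token stream
        String id = expect(stream, pos, "ID");
        String eq = expect(stream, pos, "'='");
        String expr = "(expr " + expect(stream, pos, "INT") + ")";
        String semi = expect(stream, pos, "';'");
        return "(assign " + id + " " + eq + " " + expr + " " + semi + ")";
    }

    // Match one token of the given type or report a syntax error.
    private static String expect(List<Token> stream, int[] pos, String type) {
        if (pos[0] >= stream.size() || !stream.get(pos[0]).type.equals(type))
            throw new IllegalStateException("syntax error: expected " + type);
        return stream.get(pos[0]++).text;
    }

    public static void main(String[] args) {
        System.out.println(parseAssign(lex("sp = 100;")));  // prints (assign sp = (expr 100) ;)
    }
}
```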
ANTLR uses context objects, each of which knows the start and stop tokens for a
recognized phrase and provides access to all of the elements of that phrase.
For example, AssignContext provides methods ID() and expr() to access the
identifier node and expression subtree.
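A rough model of such a context object, written as plain Java classes rather than ANTLR's generated context classes. The hypothetical AssignCtx records the start and stop tokens of the phrase and exposes ID() and expr() accessors, mirroring the AssignContext methods named above; tokens are simplified to strings.

```java
import java.util.List;

// Hypothetical model of a rule context: the same idea as a generated context, not ANTLR's class.
class ContextSketch {
    static class Ctx {
        final String rule;
        final List<Object> children;   // tokens (String) or sub-contexts (Ctx)
        final String start, stop;      // first and last token of the matched phrase
        Ctx(String rule, String start, String stop, List<Object> children) {
            this.rule = rule; this.start = start; this.stop = stop; this.children = children;
        }
    }

    // Context for assign : ID '=' expr ';' with typed accessors for its elements.
    static final class AssignCtx extends Ctx {
        AssignCtx(String id, Ctx expr) {
            super("assign", id, ";", List.of(id, "=", expr, ";"));
        }
        String ID()  { return (String) children.get(0); }  // identifier token
        Ctx expr()   { return (Ctx) children.get(2); }     // expression subtree
    }

    public static void main(String[] args) {
        Ctx expr = new Ctx("expr", "100", "100", List.of("100"));
        AssignCtx ctx = new AssignCtx("sp", expr);
        System.out.println(ctx.ID() + " " + ctx.expr().rule + " " + ctx.start + ".." + ctx.stop);
        // prints sp expr sp..;
    }
}
```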
Parse-Tree Listeners and Visitors
By default, ANTLR generates a parse-tree listener interface that responds to
events triggered by the built-in tree walker.
To walk a tree and trigger calls into a listener, ANTLR’s runtime provides the class
ParseTreeWalker.
The beauty of the listener mechanism is that it’s all automatic. We don’t have to
write a parse-tree walker, and our listener methods don’t have to explicitly visit
their children.
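The division of labor can be shown with a hand-rolled walker. Node, Listener, walk, and trace are hypothetical names and much simpler than ParseTreeWalker and a generated listener interface. The walker performs the depth-first traversal and fires enter/exit events; the listener only reacts, here by recording the call sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature of the listener mechanism (not ANTLR's ParseTreeWalker).
class ListenerSketch {
    static final class Node {
        final String rule;
        final List<Node> children = new ArrayList<>();
        Node(String rule, Node... kids) { this.rule = rule; children.addAll(Arrays.asList(kids)); }
    }

    interface Listener {
        void enter(Node n);
        void exit(Node n);
    }

    // The walker, not the listener, visits children: enter parent, walk kids, exit parent.
    static void walk(Listener l, Node n) {
        l.enter(n);
        for (Node child : n.children) walk(l, child);
        l.exit(n);
    }

    // A listener that records the events it receives, in order.
    static List<String> trace(Node root) {
        List<String> events = new ArrayList<>();
        walk(new Listener() {
            public void enter(Node n) { events.add("enter " + n.rule); }
            public void exit(Node n)  { events.add("exit " + n.rule); }
        }, root);
        return events;
    }

    public static void main(String[] args) {
        Node tree = new Node("assign", new Node("expr"));
        System.out.println(trace(tree));  // prints [enter assign, enter expr, exit expr, exit assign]
    }
}
```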
ParseTreeWalker call sequence
Parse-Tree Visitors
There are situations, however, where we want to control the walk itself, explicitly
calling methods to visit children.
The -visitor option asks ANTLR to generate a visitor interface from a grammar, with one visit
method per rule.
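A minimal sketch of the visitor style, assuming hypothetical node classes for integer literals and additions instead of ANTLR-generated contexts. Each visit method decides when, and whether, to visit children, and returns a value to its caller; the same tree is then visited a second time to produce a string.

```java
// Hypothetical miniature of the visitor pattern with explicit child visits (not ANTLR's API).
class VisitorSketch {
    interface Visitor<T> {
        T visitInt(IntNode n);
        T visitAdd(AddNode n);
    }
    interface Node { <T> T accept(Visitor<T> v); }

    static final class IntNode implements Node {
        final int value;
        IntNode(int value) { this.value = value; }
        public <T> T accept(Visitor<T> v) { return v.visitInt(this); }
    }
    static final class AddNode implements Node {
        final Node left, right;
        AddNode(Node left, Node right) { this.left = left; this.right = right; }
        public <T> T accept(Visitor<T> v) { return v.visitAdd(this); }
    }

    // Evaluator: visitAdd explicitly visits both children and combines their results.
    static final class Eval implements Visitor<Integer> {
        public Integer visitInt(IntNode n) { return n.value; }
        public Integer visitAdd(AddNode n) { return n.left.accept(this) + n.right.accept(this); }
    }

    // Printer: the same tree, visited again with a different return type.
    static final class Show implements Visitor<String> {
        public String visitInt(IntNode n) { return Integer.toString(n.value); }
        public String visitAdd(AddNode n) {
            return "(" + n.left.accept(this) + " + " + n.right.accept(this) + ")";
        }
    }

    public static void main(String[] args) {
        Node tree = new AddNode(new IntNode(1), new AddNode(new IntNode(2), new IntNode(3)));
        System.out.println(tree.accept(new Show()) + " = " + tree.accept(new Eval()));
        // prints (1 + (2 + 3)) = 6
    }
}
```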
grammar ArrayInit;
/** A rule called init that matches comma-separated values between {...}. */
init : '{' value (',' value)* '}' ; // must match at least one value
/** A value is either a nested initializer or a simple integer (INT). */
value : init
      | INT
      ;
// parser rules start with lowercase letters, lexer rules with uppercase
INT : [0-9]+ ; // Define token INT as one or more digits
WS : [ \t\r\n]+ -> skip ; // Define whitespace rule, toss it out
From the grammar ArrayInit.g4, ANTLR generates several files, including:
ArrayInitParser.java: the parser class definition specific to grammar ArrayInit, which
recognizes our array language syntax.
ArrayInit.tokens: ANTLR assigns a token type number to each token we define and
stores these values in this file.
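import org.antlr.v4.runtime.tree.*;
// ... (class declaration and other listener methods omitted)
        System.out.printf("\\u%04x", value);  // print value as 4-digit hex, prefixed with backslash-u
    }
}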
Driver program for ArrayInitBaseListener
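The translation performed by the printf call above can be sketched without generated code. This sketch assumes a flat initializer of integers (no nesting) and fires the three kinds of events by hand: entering init emits an opening quote, each value emits a backslash-u escape with four hex digits, and exiting init emits the closing quote. ShortToUnicodeSketch and its methods are hypothetical; with the ANTLR runtime, a ParseTreeWalker would fire these events from the parse tree instead.

```java
// Hypothetical sketch of the ArrayInit translation: integer initializer -> quoted escape string.
class ShortToUnicodeSketch {
    private final StringBuilder out = new StringBuilder();

    // Event handlers named after the grammar rules init and value.
    void enterInit()       { out.append('"'); }                            // '{' becomes a quote
    void exitInit()        { out.append('"'); }                            // '}' becomes a quote
    void enterValue(int v) { out.append(String.format("\\u%04x", v)); }   // 4-digit hex escape

    // Stand-in for lexing, parsing, and walking: assumes a flat list such as "{1, 2, 3}".
    static String translate(String input) {
        String body = input.trim();
        if (!body.startsWith("{") || !body.endsWith("}"))
            throw new IllegalArgumentException("expected {...}");
        ShortToUnicodeSketch listener = new ShortToUnicodeSketch();
        listener.enterInit();
        for (String item : body.substring(1, body.length() - 1).split(",")) {
            listener.enterValue(Integer.parseInt(item.trim()));
        }
        listener.exitInit();
        return listener.out.toString();
    }

    public static void main(String[] args) {
        // prints the quoted escapes for 0x63, 0x03 and 0x1c3
        System.out.println(translate("{99, 3, 451}"));
    }
}
```

For the input {99, 3, 451} this prints "\u0063\u0003\u01c3", since 99, 3, and 451 are 0x63, 0x03, and 0x1c3.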
grammar LabeledExpr;
prog: stat+ ;
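With -visitor, ANTLR generates a visitor interface with one method per labeled alternative:
public interface LabeledExprVisitor<T> extends ParseTreeVisitor<T> {
    T visitId(LabeledExprParser.IdContext ctx);
    T visitAssign(LabeledExprParser.AssignContext ctx);
    T visitMulDiv(LabeledExprParser.MulDivContext ctx);
    // ...
}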
To implement the calculator, we override the methods associated with statement and expression alternatives.
import java.util.HashMap;
import java.util.Map;
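A compact sketch of such an evaluator, assuming hypothetical node classes for four kinds of alternatives (assignment, identifier, integer, and multiply/divide) instead of the generated LabeledExprParser contexts. As the imports above suggest, variable values live in a map: visitAssign stores a value and visitId looks it up. Treating an unknown variable as 0 is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical calculator evaluator: one visit method per alternative.
class CalcSketch {
    interface Node { int accept(Eval v); }

    static final class Int implements Node {
        final int value;
        Int(int value) { this.value = value; }
        public int accept(Eval v) { return v.visitInt(this); }
    }
    static final class Id implements Node {
        final String name;
        Id(String name) { this.name = name; }
        public int accept(Eval v) { return v.visitId(this); }
    }
    static final class MulDiv implements Node {
        final Node left, right;
        final char op;   // '*' or '/'
        MulDiv(Node left, char op, Node right) { this.left = left; this.op = op; this.right = right; }
        public int accept(Eval v) { return v.visitMulDiv(this); }
    }
    static final class Assign implements Node {
        final String name;
        final Node expr;
        Assign(String name, Node expr) { this.name = name; this.expr = expr; }
        public int accept(Eval v) { return v.visitAssign(this); }
    }

    static final class Eval {
        // "Memory" for the calculator: variable name -> value.
        final Map<String, Integer> memory = new HashMap<>();

        int visitInt(Int n) { return n.value; }
        int visitId(Id n) { return memory.getOrDefault(n.name, 0); }  // unknown variables read as 0
        int visitMulDiv(MulDiv n) {
            int l = n.left.accept(this), r = n.right.accept(this);    // visit children explicitly
            return n.op == '*' ? l * r : l / r;
        }
        int visitAssign(Assign n) {
            int value = n.expr.accept(this);
            memory.put(n.name, value);
            return value;
        }
    }

    // Evaluates statements in order and returns the value of the last one.
    static int run(List<Node> prog) {
        Eval eval = new Eval();
        int last = 0;
        for (Node stat : prog) last = stat.accept(eval);
        return last;
    }

    public static void main(String[] args) {
        // a = 6; b = a * 7; b / 2
        System.out.println(run(List.of(
            new Assign("a", new Int(6)),
            new Assign("b", new MulDiv(new Id("a"), '*', new Int(7))),
            new MulDiv(new Id("b"), '/', new Int(2)))));   // prints 21
    }
}
```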
Imagine you want to build a tool that generates a Java interface file from the methods in a Java class definition.
Sample Output:
import java.util.List;
import java.util.Map;
interface IDemo {
    int[] g();
    List<Map<String, Integer>>[] h();
}
The key “interface” between the grammar and our listener object is called
JavaListener, and ANTLR automatically generates it for us.
It defines all of the methods that the class ParseTreeWalker from ANTLR’s runtime
can trigger as it traverses the parse tree.
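A stripped-down sketch of such a listener, assuming the parser delivers class and method declarations as simple events. The event methods (enterClass, exitClass, enterMethod), the Method holder, the hard-coded walk in extract, and the class name Demo are hypothetical stand-ins, not the real JavaListener interface. The listener never walks anything itself; it only reacts to events by emitting interface text.

```java
import java.util.List;

// Hypothetical sketch of an interface extractor driven by listener-style events.
class ExtractInterfaceSketch {
    // Minimal description of a method header; bodies are irrelevant to the extractor.
    static final class Method {
        final String returnType, name, params;
        Method(String returnType, String name, String params) {
            this.returnType = returnType; this.name = name; this.params = params;
        }
    }

    private final StringBuilder out = new StringBuilder();

    void enterClass(String name) { out.append("interface I").append(name).append(" {\n"); }
    void exitClass()             { out.append("}\n"); }
    void enterMethod(Method m) {
        // Keep the signature, drop the body.
        out.append("    ").append(m.returnType).append(' ').append(m.name)
           .append('(').append(m.params).append(");\n");
    }

    // Stand-in for the tree walker: fires events for one class and its methods, in order.
    static String extract(String className, List<Method> methods) {
        ExtractInterfaceSketch listener = new ExtractInterfaceSketch();
        listener.enterClass(className);
        for (Method m : methods) listener.enterMethod(m);
        listener.exitClass();
        return listener.out.toString();
    }

    public static void main(String[] args) {
        System.out.print(extract("Demo", List.of(
            new Method("int[]", "g", ""),
            new Method("List<Map<String, Integer>>[]", "h", ""))));
    }
}
```

Run as is, main prints an interface IDemo containing the two signatures int[] g(); and List<Map<String, Integer>>[] h();.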
Here are the relevant methods from the generated listener interface:
Forgetting to invoke visit() on a node’s children means those subtrees don’t get
visited.
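The pitfall is easy to reproduce with a hypothetical recording visitor in plain Java (not ANTLR's visitor classes). The base version visits children by default; the subclass overrides the visit for stat nodes and forgets to recurse, so nothing below a stat node is ever seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: a visitor must visit children itself, or those subtrees are skipped.
class ForgetChildrenSketch {
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label, Node... kids) { this.label = label; children.addAll(Arrays.asList(kids)); }
    }

    // Base visitor: records each node, then visits all children by default.
    static class CountingVisitor {
        final List<String> seen = new ArrayList<>();
        void visit(Node n) {
            seen.add(n.label);
            visitChildren(n);
        }
        void visitChildren(Node n) { for (Node c : n.children) visit(c); }
    }

    // Overrides visit but forgets to call visitChildren for stat nodes.
    static final class ForgetfulVisitor extends CountingVisitor {
        @Override void visit(Node n) {
            seen.add(n.label);
            if (!n.label.equals("stat")) visitChildren(n);  // bug: a stat's subtree is never visited
        }
    }

    public static void main(String[] args) {
        Node tree = new Node("prog", new Node("stat", new Node("expr")), new Node("stat"));
        CountingVisitor ok = new CountingVisitor();
        ok.visit(tree);
        CountingVisitor bad = new ForgetfulVisitor();
        bad.visit(tree);
        System.out.println(ok.seen + " vs " + bad.seen);  // prints [prog, stat, expr, stat] vs [prog, stat, stat]
    }
}
```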
Due: 31 May