1 Syntax Analyzer
Syntax Analysis
The parser (syntax analyzer) receives the source code in
the form of tokens from the lexical analyzer and performs
syntax analysis, which creates a tree-like intermediate
representation that depicts the grammatical structure of the
token stream.
Parser
Checks the stream of words and their parts of speech
(produced by the scanner) for grammatical correctness
Determines if the input is syntactically well formed
Guides checking at deeper levels than syntax (static
semantics checking)
Builds an intermediate representation (IR) of the code
Study of Parsing
Parser
The parser
Needs the syntax of programming language constructs, which can be
specified by context-free grammars or BNF (Backus-Naur Form)
Needs an algorithm for testing membership in the language of the grammar.
Roadmap
The roadmap for the study of parsing
Context-free grammars and derivations
Top-down parsing
Recursive descent (predictive parsing)
LL (Left-to-right, Leftmost derivation) methods
Bottom-up parsing
Operator precedence parsing
LR (Left-to-right, Rightmost derivation in reverse) methods
SLR, canonical LR, LALR
Expressive Power of Different Parsing Techniques
Benefits Offered by Grammar
Advantages of RE/DFA
Limits of RE/DFA
Definition
A context-free grammar (CFG) has four components:
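A set of terminal symbols, sometimes referred to as "tokens."
A set of nonterminal symbols, sometimes called "syntactic variables."
A set of productions, each consisting of a nonterminal (the head), an arrow, and a body made of a sequence of terminals and/or nonterminals.
One nonterminal is distinguished as the start symbol.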
The terminals are the elementary symbols of the language defined by the
grammar.
Nonterminals impose a hierarchical structure on the language that is key to
syntax analysis and translation.
Conventionally, the productions for the start symbol are listed first.
The productions specify the manner in which the terminals and
nonterminals can be combined to form strings.
CFG Example
A CFG Grammar
1 Expr → Expr Op Expr
2 Expr → number
3 Expr → id
4 Op → +
5 Op → -
6 Op → *
7 Op → /
where
Expr and Op are nonterminals
number, id, +, -, *, and / are terminals
Expr is the start symbol
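To make the four components concrete, the sketch below encodes the example grammar in Python. The `Grammar` class and the `grammar_from_productions` helper are hypothetical names for this illustration. Inferring the symbol sets from the productions assumes every nonterminal is the head of at least one production; any other symbol in a body is then treated as a terminal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """The four components of a CFG (an illustrative representation)."""
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple  # (head, body) pairs; body is a tuple of symbols
    start: str


def grammar_from_productions(productions, start):
    """Infer the symbol sets: production heads are nonterminals and every
    other symbol that appears in some body is a terminal."""
    nonterminals = frozenset(head for head, _ in productions)
    symbols = {sym for _, body in productions for sym in body}
    terminals = frozenset(symbols - nonterminals)
    if start not in nonterminals:
        raise ValueError(f"start symbol {start!r} has no productions")
    return Grammar(terminals, nonterminals, tuple(productions), start)


# Productions 1-7 of the example grammar, in the order listed above.
EXPR_GRAMMAR = grammar_from_productions(
    [
        ("Expr", ("Expr", "Op", "Expr")),
        ("Expr", ("number",)),
        ("Expr", ("id",)),
        ("Op", ("+",)),
        ("Op", ("-",)),
        ("Op", ("*",)),
        ("Op", ("/",)),
    ],
    start="Expr",
)

print(sorted(EXPR_GRAMMAR.nonterminals))  # ['Expr', 'Op']
print(sorted(EXPR_GRAMMAR.terminals))     # ['*', '+', '-', '/', 'id', 'number']
```

The start symbol is stored explicitly here, so this representation does not depend on the convention of listing the start symbol's productions first.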
Derivations
A grammar derives strings by beginning with the start symbol and repeatedly
replacing a nonterminal by the body of a production for that nonterminal. This
sequence of replacements is called a derivation.
Derivation Example
Given the grammar:
1 exp → exp op exp | ( exp ) | number
2 op → + | - | *
The following is a derivation for an expression. At each step the grammar rule
choice used for the replacement is given on the right.
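The sketch below mechanizes such a derivation for the grammar above: it always rewrites the leftmost nonterminal and records the rule used at each step, mirroring the layout of a derivation with the rule choice on the right. The target string `( number - number ) * number` and the helper name `leftmost_derivation` are illustrative assumptions, not necessarily the derivation this example originally showed.

```python
# Nonterminals of the grammar above; every other symbol is a terminal.
NONTERMINALS = {"exp", "op"}

# The grammar's productions, keyed by a readable label.
RULES = {
    "exp → exp op exp": ("exp", ["exp", "op", "exp"]),
    "exp → ( exp )": ("exp", ["(", "exp", ")"]),
    "exp → number": ("exp", ["number"]),
    "op → +": ("op", ["+"]),
    "op → -": ("op", ["-"]),
    "op → *": ("op", ["*"]),
}


def leftmost_derivation(start, rule_labels):
    """Rewrite the leftmost nonterminal with each rule in turn.

    Returns a list of (sentential form, rule used) pairs, one per step.
    """
    form = [start]
    steps = []
    for label in rule_labels:
        head, body = RULES[label]
        # Index of the leftmost nonterminal in the current sentential form.
        pos = next((i for i, sym in enumerate(form) if sym in NONTERMINALS), None)
        if pos is None:
            raise ValueError("no nonterminal left to replace")
        if form[pos] != head:
            raise ValueError(f"rule {label!r} cannot replace {form[pos]!r}")
        form = form[:pos] + body + form[pos + 1:]
        steps.append((" ".join(form), label))
    return steps


steps = leftmost_derivation("exp", [
    "exp → exp op exp",
    "exp → ( exp )",
    "exp → exp op exp",
    "exp → number",
    "op → -",
    "exp → number",
    "op → *",
    "exp → number",
])

print("exp")
for form, label in steps:
    print(f"⇒ {form:<30} [{label}]")
```

Choosing a different nonterminal to rewrite at some step (for example, always the rightmost) gives a different derivation of the same string.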
Context-Free Language
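New notations: $\overset{*}{\Rightarrow}$ and $\overset{+}{\Rightarrow}$
$\alpha_1 \overset{*}{\Rightarrow} \alpha_n$ means $\alpha_1$ derives $\alpha_n$ in zero or more steps.
$\alpha_1 \overset{+}{\Rightarrow} \alpha_n$ means $\alpha_1$ derives $\alpha_n$ in one or more steps.
With this notation, the language generated by a grammar $G$ with start symbol $S$ is $L(G) = \{\, w \mid S \overset{*}{\Rightarrow} w \text{ and } w \text{ is a string of terminals} \,\}$; a language that can be generated by a context-free grammar is called a context-free language.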
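The two relations can be decided by brute force for small inputs, as in the sketch below. It assumes no production has an empty body (true of the expression grammar used here), so a derivation step never shortens a sentential form; forms longer than the target can be pruned, which leaves a finite search space. The function name `derives` is hypothetical, and the search is exponential, so it only illustrates the definitions and is not a practical membership test.

```python
from collections import deque

# The expression grammar from the derivation example: head -> list of bodies.
GRAMMAR = {
    "exp": [("exp", "op", "exp"), ("(", "exp", ")"), ("number",)],
    "op": [("+",), ("-",), ("*",)],
}


def derives(alpha, beta, grammar=GRAMMAR, plus=False):
    """Decide alpha =>* beta, or alpha =>+ beta when plus=True.

    Assumes no empty production bodies, so forms never shrink and any form
    longer than beta can be discarded.
    """
    alpha, beta = tuple(alpha), tuple(beta)
    if not plus and alpha == beta:
        return True  # derivation in zero steps
    seen = set()
    queue = deque([alpha])
    while queue:
        form = queue.popleft()
        # Try every production on every nonterminal occurrence (one step).
        for i, sym in enumerate(form):
            for body in grammar.get(sym, ()):
                nxt = form[:i] + body + form[i + 1:]
                if nxt == beta:
                    return True
                if len(nxt) <= len(beta) and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


print(derives(["exp"], ["number", "+", "number"]))
print(derives(["exp"], ["exp"], plus=True))
```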
Definition
A parse tree is a labeled tree representation of a derivation
that filters out the order in which productions are applied to
replace nonterminals.
The interior nodes are labeled by nonterminals
The leaf nodes are labeled by terminals
The children of each internal node A are labeled, from
left to right, by the symbols in the body of the production
by which this A was replaced during the derivation.
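A minimal sketch of these properties, assuming a hypothetical `Node` class: interior nodes hold a nonterminal and their children, leaves hold terminals, and reading the leaves from left to right (the tree's yield) recovers the derived string. The example tree is for `number + number` under the expression grammar `exp → exp op exp | ( exp ) | number`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                    # nonterminal or terminal
    children: list = field(default_factory=list)  # empty for a leaf

    def is_leaf(self):
        return not self.children


def tree_yield(node):
    """Concatenate the leaf labels from left to right."""
    if node.is_leaf():
        return [node.label]
    return [sym for child in node.children for sym in tree_yield(child)]


def production_at(node):
    """The production applied at an interior node: head → child labels."""
    return f"{node.label} → {' '.join(c.label for c in node.children)}"


# Parse tree for: number + number
tree = Node("exp", [
    Node("exp", [Node("number")]),
    Node("op", [Node("+")]),
    Node("exp", [Node("number")]),
])

print(tree_yield(tree))
print(production_at(tree))
```

Expanding the left `exp` before the right one, or the other way around, builds this same tree, which is the sense in which a parse tree ignores the order of replacements.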
Definition
A grammar that produces more than one parse tree for
some sentence is said to be ambiguous.
The grammar:
1 exp → exp op exp | id | number
2 op → + | - | * | /
is ambiguous because there are two different parse trees for the sentence
id - number * id.
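The claim can be checked by counting parse trees. The sketch below assumes the grammar `exp → exp op exp | id | number` with `op → + | - | * | /`; for `exp → exp op exp`, a tree is fixed by choosing which operator sits at the root and then picking subtrees for each side, so the counts multiply. The function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*", "/"}
OPERANDS = {"id", "number"}


def count_parse_trees(tokens):
    """Count distinct parse trees for exp → exp op exp | id | number."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees whose root exp spans tokens[i..j] inclusive.
        if i == j:
            return 1 if tokens[i] in OPERANDS else 0
        total = 0
        # exp → exp op exp: try each operator as the root's op.
        for k in range(i + 1, j):
            if tokens[k] in OPERATORS:
                total += count(i, k - 1) * count(k + 1, j)
        return total

    return count(0, len(tokens) - 1) if tokens else 0


print(count_parse_trees("id - number * id".split()))  # 2
```

The two trees correspond to grouping the sentence as (id - number) * id or as id - (number * id).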
Solving Ambiguity
Ambiguous Grammar
Using Approach 1 to remove the ambiguity from the ambiguous grammar above, we
define the following disambiguating rules:
All operators (+, -, *, /) are left associative.
+ and - have the same precedence
* and / have the same precedence
* and / have higher precedence than + and -.
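One way a parser can enforce these rules is sketched below; this is an illustrative technique, not necessarily how the rules are applied later in the course. A lower-level loop groups `*` and `/`, an upper-level loop groups `+` and `-` over those groups, and each loop folds its operands from left to right, which makes every operator left associative. The function name `parse` and the nested-tuple tree format `(op, left, right)` are assumptions of this sketch.

```python
def parse(tokens):
    """Parse id/number operands with + - * / under the rules above.

    Returns nested tuples (op, left, right); operands are plain strings.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        if peek() not in ("id", "number"):
            raise SyntaxError(f"expected id or number, got {peek()!r}")
        return advance()

    def term():
        # Higher precedence: * and /, folded left to right.
        tree = operand()
        while peek() in ("*", "/"):
            op = advance()
            tree = (op, tree, operand())
        return tree

    def expr():
        # Lower precedence: + and -, whose operands are whole terms.
        tree = term()
        while peek() in ("+", "-"):
            op = advance()
            tree = (op, tree, term())
        return tree

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("id - number * id".split()))  # ('-', 'id', ('*', 'number', 'id'))
print(parse("id - id - id".split()))      # ('-', ('-', 'id', 'id'), 'id')
```

Of the two trees the ambiguous grammar allows for id - number * id, only the one that applies * first survives.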
Ambiguous Grammar
The following explains the idea behind rewriting the dangling-else grammar to
remove the ambiguity.
A statement appearing between a then and an else must be "matched"; that
is, the interior statement must not end with an unmatched, or open, then.
A matched statement is either an if-then-else statement containing no open
statements or any other kind of unconditional statement.
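For reference, a common form of the rewritten grammar is `stmt → matched_stmt | open_stmt`, `matched_stmt → if expr then matched_stmt else matched_stmt | other`, `open_stmt → if expr then stmt | if expr then matched_stmt else open_stmt`. The sketch below is a hand-written recursive-descent parser, not one derived mechanically from that grammar: it lets each `else` be taken by the innermost `if` still waiting for one, which yields the tree the matched/open grammar permits. Treating the condition as the single placeholder token `expr`, and the names `parse_stmt` and `parse_program`, are assumptions for illustration.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos).

    Trees are "other", ("if", then_part) or ("if-else", then_part, else_part).
    """
    tok = tokens[pos] if pos < len(tokens) else None
    if tok == "other":
        return "other", pos + 1
    if tok != "if":
        raise SyntaxError(f"expected 'if' or 'other', got {tok!r}")
    if tokens[pos + 1:pos + 3] != ["expr", "then"]:
        raise SyntaxError("expected 'expr then' after 'if'")
    then_part, pos = parse_stmt(tokens, pos + 3)
    # The innermost if that is still open claims the next else, so the
    # statement between then and else is always matched.
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = parse_stmt(tokens, pos + 1)
        return ("if-else", then_part, else_part), pos
    return ("if", then_part), pos


def parse_program(source):
    tokens = source.split()
    tree, pos = parse_stmt(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_program("if expr then if expr then other else other"))
# ('if', ('if-else', 'other', 'other'))
```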
Ambiguous Grammar
S → S + S
| S - S
| S * S
| S / S
| ( S )
| - S
| S ^ S
| number