Chapter 3 Syntax Analysis
3: Mr. Yirga K.
Parser / Syntax Analysis
OUTLINE
1. Introduction to Parsing
2. Types of Parsing
   1. Top-down Parsing
      Top-down Parsing Implementation:
      - Recursive Descent Parsing
      - Predictive (Non-Recursive) Parsing
      - LL(1) Grammar
   2. Bottom-Up Parsing
      - Handles and Handle Pruning
      - Stack Implementation of Shift-Reduce Parsing
      - LR Parser Implementation
      - SLR, CLR and LALR Parsers
3. Error Recovery in Parsing
4. Yacc Automatic Parser Generator
Introduction to Parsing
Chapter Content
In this chapter, the process of parsing, the role of the parser, the various types of parsers, and their implementation strategies will be addressed.
Objective: Students should be able to:
• Design parsers using the concept of CFG.
• Compare different parsing strategies.
• Implement LL(1) and LR(1) parsers.
• Use YACC as a parser generator.
What is parsing?
• Syntax analysis, or parsing, is the second phase of a compiler architecture.
• It is the transformation of a sequence of tokens into an abstract syntax tree.
• The parser obtains a set of tokens from the lexical analyzer.
• Syntax Analyzer creates the syntactic structure of the
given source program.
• It uses BNF (Backus-Naur Form) notation in the description of
CFGs.
• Token arrangements are checked against the source code grammar.
The syntax of a programming language is well described by a context-free grammar.
It checks whether a given source program satisfies the rules
implied by a context-free grammar or not.
• If it does, the parser creates the parse tree (syntax tree), which is then used for intermediate code generation (ICG).
• Otherwise, the parser reports error messages.
[Figure: Context-Free Grammar and Pushdown Automata; Regular Expressions and Finite State Automata]
Syntax is the rule governing the formation of
statements in a programming language.
It is the way tokens are put together to form expressions,
statements, or blocks of statements.
Syntax analysis is the task concerned with fitting a sequence of tokens into a specified syntax.
Due to the limitations of regular expressions (RE), a scanner cannot check the syntax of a sentence. E.g., an RE cannot check that every { has a matching }.
The syntax of a programming language is specified by the rules of a context-free grammar (CFG).
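A minimal Python sketch of why brace matching is beyond a regular expression: the checker has to remember how many braces are currently open, and the nesting depth is unbounded. The function name `braces_balanced` is hypothetical and the check ignores every character except { and }.

```python
def braces_balanced(text: str) -> bool:
    """Return True if every '{' has a matching '}' in the right order."""
    depth = 0  # number of braces that are currently open
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a '}' appeared before its '{'
                return False
    return depth == 0  # every opened brace was closed


print(braces_balanced("{ a { b } }"))  # True
print(braces_balanced("{ a { b }"))    # False
```

The counter plays the role of the stack in a pushdown automaton, which is the extra memory a finite automaton lacks.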
A grammar describes the hierarchical structure
of most programming languages.
Example: if ( expression ) statement else statement
Using expr for expression and stmt for statement, it can be expressed as: stmt → if ( expr ) stmt else stmt
This is called a production.
In a production, lexical elements like the keywords if and else and the parentheses are called terminals.
Variables like expr and stmt represent sequences of terminals and are called non-terminals.
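One possible way to hold such a production as data, shown as a small Python sketch. The tuple layout and the variable names are illustrative assumptions, not a fixed convention.

```python
# The production  stmt -> if ( expr ) stmt else stmt  as plain data:
# one non-terminal on the left, a list of grammar symbols on the right.
production = ("stmt", ["if", "(", "expr", ")", "stmt", "else", "stmt"])

non_terminals = {"stmt", "expr"}  # symbols that can be expanded further

lhs, rhs = production
terminals = [sym for sym in rhs if sym not in non_terminals]
variables = [sym for sym in rhs if sym in non_terminals]

print(terminals)  # ['if', '(', ')', 'else']
print(variables)  # ['expr', 'stmt', 'stmt']
```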
CFGs give a precise syntactic specification of a programming language.
The design of the grammar is an initial phase of the design of a compiler.
We categorize the parsers into two groups:
Top-Down Parser
The parse tree is created top to bottom starting from the
root.
Bottom-Up Parser
The parse tree is created bottom to top starting from the
leaves.
In general, the parser accomplishes the following tasks:
Parsing the code,
Looking for (syntactic) errors, and
Generating a parse tree (syntax tree) as the output for the next phase.
1) Top-down Parsing
It parses the input and constructs a parse tree starting from the root node, gradually moving down to the leaf nodes.
1) Recursive Descent Parsing:
It is a common form of top-down parsing.
It uses recursive procedures to process the input.
In its general form, recursive descent parsing may require backtracking.
It constructs the parse tree from the top, and the input is read from left to right.
Backtracking:
It is a technique in top-down parsing that allows parsers to explore alternative choices when more than one production could apply.
In backtracking, the parser restores the input pointer to its earlier position when an attempt fails.
It goes back to an earlier state to find another route once it knows that the current state is a dead end.
If one alternative of a production fails, the syntax analyzer restarts the process using a different alternative of that production.
This technique may process the input string more than once to determine the right production.
Consider a CFG:
S → rXd | rZd
X → oa | ea
Z → ai
Parse an input string: “read”.
Try: P → A B, A → xy | x, B → y, and parse the input “xy”.
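A minimal backtracking recognizer for the grammar S → rXd | rZd, sketched in Python. The names `derive` and `accepts` are hypothetical, each terminal is assumed to be a single character, and the sketch only answers yes or no instead of building a parse tree.

```python
GRAMMAR = {
    "S": [["r", "X", "d"], ["r", "Z", "d"]],
    "X": [["o", "a"], ["e", "a"]],
    "Z": [["a", "i"]],
}


def derive(symbols, text, pos, grammar):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in grammar:  # non-terminal: try each alternative in order
        for alternative in grammar[head]:
            # If this alternative (or what follows it) fails, the loop moves on
            # to the next alternative from the same `pos`: that is the backtrack.
            for mid in derive(alternative, text, pos, grammar):
                yield from derive(rest, text, mid, grammar)
    elif pos < len(text) and text[pos] == head:  # terminal: must match input
        yield from derive(rest, text, pos + 1, grammar)


def accepts(text, grammar=GRAMMAR, start="S"):
    """True if some derivation from `start` consumes the whole input."""
    return any(end == len(text) for end in derive([start], text, 0, grammar))


print(accepts("read"))  # True: X -> oa fails on 'e', then X -> ea succeeds
print(accepts("raid"))  # True: S -> rXd fails, the parser falls back to S -> rZd
print(accepts("rad"))   # False
```

The same function handles the exercise grammar when it is passed as `{"P": [["A", "B"]], "A": [["x", "y"], ["x"]], "B": [["y"]]}` with start symbol `"P"`. A left-recursive grammar would make this sketch recurse forever.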
2) Predictive Parser
A form of recursive-descent parsing that does not require any backtracking is known as predictive parsing.
A predictive parser can predict which production is to be used to expand the current non-terminal.
It does not suffer from backtracking.
A predictive parser uses a look-ahead pointer, which points to the next input symbol.
To make the parser backtracking-free, the predictive parser puts some constraints on the grammar.
It accepts only a class of grammars known as LL(1) grammars.
The first ‘L’ stands for scanning the input from left to right,
the second ‘L’ stands for leftmost derivation, and
‘1’ refers to using only one input symbol of lookahead at each step to make parsing action decisions.
Note: LL(1) is a top-down parsing algorithm that reads input from left to right and builds a leftmost derivation.
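A sketch of a predictive recursive-descent recognizer in Python, assuming the expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id. The class name `PredictiveParser`, the helper `accepts`, and the token spelling (a list of strings such as "id" and "+") are assumptions made for illustration.

```python
class PredictiveParser:
    """One procedure per non-terminal; one token of lookahead; no backtracking."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse(self):
        self.E()
        self.match("$")
        return True

    def E(self):            # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):      # E' -> + T E' | ε
        if self.lookahead == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # any other lookahead selects E' -> ε, which consumes nothing

    def T(self):            # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):      # T' -> * F T' | ε
        if self.lookahead == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):            # F -> ( E ) | id
        if self.lookahead == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    try:
        return PredictiveParser(tokens).parse()
    except SyntaxError:
        return False


print(accepts(["id", "+", "id", "*", "id"]))  # True
print(accepts(["id", "+"]))                   # False
```

Every decision is made by inspecting the single lookahead token, so the input is never re-read.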
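Two common problems in top-down parsing are left recursion and common prefixes (productions that are not left factored).
1) Left Recursion
Expression grammar with left recursion removed:
E → TE’
E’ → ε | +TE’
T → FT’
T’ → *FT’ | ε
F → (E) | id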
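Removing immediate left recursion rewrites A → Aα | β as A → βA’, A’ → αA’ | ε. The Python sketch below applies that rewrite to one non-terminal. It assumes the usual left-recursive starting rules E → E + T | T and T → T * F | F, which these notes do not show, and the function name is hypothetical.

```python
def remove_immediate_left_recursion(nt, alternatives):
    """Rewrite A -> A a | b  as  A -> b A',  A' -> a A' | ε.

    `alternatives` is a list of right-hand sides, each a list of symbols.
    Returns a dict of the resulting productions; the empty list stands for ε.
    """
    # Tails of the left-recursive alternatives (the "a" parts).
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # Alternatives that do not start with the non-terminal itself (the "b" parts).
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}  # nothing to rewrite
    new_nt = nt + "'"
    return {
        nt: [alt + [new_nt] for alt in others],
        new_nt: [alt + [new_nt] for alt in recursive] + [[]],
    }


result = remove_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
for head, bodies in result.items():
    print(head, "->", " | ".join(" ".join(b) if b else "ε" for b in bodies))
```

For the assumed input this produces E → T E’ and E’ → + T E’ | ε, the same shape as the expression grammar in this section. Indirect left recursion (through another non-terminal) is not handled by this sketch.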
2) Left Factoring
If more than one production rule for a non-terminal has a common prefix string, then the top-down parser cannot choose between them by looking at the next input symbol alone.
Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive or top-down parsing.
A grammar needs left factoring when it has the form:
A → αβ1 | αβ2 | αβ3 | … | αβn | γ, i.e. several productions start with the same prefix α (a string of terminals and/or non-terminals).
From the above example, by seeing the input ‘α’ we cannot immediately tell which production to choose to expand A.
Left factoring transforms the grammar to make it useful for top-down parsers.
For the grammar A → αβ1 | αβ2 | αβ3 | … | αβn | γ
the equivalent left-factored grammar is obtained by rewriting:
A → αA’ | γ
A’ → β1 | β2 | β3 | … | βn
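The rewriting above can be sketched in Python as a single left-factoring step on one non-terminal. The function names `common_prefix` and `left_factor` are hypothetical; right-hand sides are lists of symbols and the empty list stands for ε.

```python
def common_prefix(sequences):
    """Longest list of symbols that every sequence starts with."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(nt, alternatives):
    """Factor out the prefix shared by alternatives that begin with the same symbol."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    for first, group in groups.items():
        if first is not None and len(group) > 1:
            alpha = common_prefix(group)          # the shared prefix α
            new_nt = nt + "'"
            rest = [alt for alt in alternatives if alt not in group]  # the γ part
            return {
                nt: [alpha + [new_nt]] + rest,                 # A  -> α A' | γ
                new_nt: [alt[len(alpha):] for alt in group],   # A' -> β1 | β2 | ...
            }
    return {nt: alternatives}  # no shared prefix: nothing to factor


factored = left_factor("S", [["r", "X", "d"], ["r", "Z", "d"]])
print(factored)
```

Applied to S → rXd | rZd, the step gives S → r S’ and S’ → X d | Z d. Only one group is factored per call, so a grammar with several shared prefixes would need repeated calls.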
3) FIRST and FOLLOW
FIRST and FOLLOW sets are used in LL(1) or predictive parsers.
The parsing table is constructed from the FIRST and FOLLOW sets.
These sets indicate which terminals can appear at a given position in a derivation.
They allow the parser to apply the needed rule at the correct position.
They are used to fill each parsing-table entry T[A, t] = α, which records the production A → α to apply when the non-terminal is A and the lookahead terminal is t.
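To show how a table of entries T[A, t] = α drives a non-recursive predictive parser, here is a Python sketch. The table is written out by hand for the expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id, which is an assumption of this example, as are the names `TABLE` and `ll1_accepts`.

```python
# T[A, t] = right-hand side to use when non-terminal A is on top of the stack
# and t is the lookahead token. The empty list stands for ε.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NON_TERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_accepts(tokens, start="E"):
    tokens = list(tokens) + ["$"]   # "$" marks the end of the input
    stack = ["$", start]            # the top of the stack is the end of the list
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NON_TERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:         # empty table cell: syntax error
                return False
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == lookahead:      # terminal (or "$") matches the input
            pos += 1
        else:
            return False
    return pos == len(tokens)


print(ll1_accepts(["id", "+", "id", "*", "id"]))  # True
print(ll1_accepts(["id", "+", "*"]))              # False
```

The explicit stack replaces the recursive procedure calls, which is why this style is called non-recursive predictive parsing.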
FIRST Set: If we derive all possible strings from a variable, the terminal symbols that begin those strings form its FIRST set.
FIRST(X) for a grammar symbol X is the set of terminals that begin the strings derivable from X.
Rules to compute the FIRST set:
If x is a terminal, then FIRST(x) = { x }.
If X → ε is a production rule, then ε is in FIRST(X).
If X → Y1 Y2 Y3 … Yn is a production, then FIRST(X) includes FIRST(Y1) – {ε}.
If FIRST(Y1) contains ε, then FIRST(X) also includes FIRST(Y2) – {ε}, and so on for Y3, …, Yn while every preceding Yi can derive ε.
If FIRST(Yi) contains ε for all i = 1 to n, then add ε to FIRST(X).
Example 1: Production rules of the grammar:
E → T E’ ; E’ → + T E’ | ε
T → F T’ ; T’ → * F T’ | ε
F → ( E ) | id
FIRST sets:
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, ε }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, ε }
FIRST(F) = { ( , id }
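The FIRST rules can be applied mechanically by repeating them until no set grows any further. The Python sketch below does this for the grammar of Example 1; the dictionary layout, the name `compute_first`, and the use of the string "ε" as a marker are assumptions made for illustration.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] is the ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no FIRST set grows (fixed point)
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[nt])
                for sym in rhs:
                    if sym not in grammar:      # terminal: it begins the string
                        first[nt].add(sym)
                        break
                    first[nt] |= first[sym] - {EPS}
                    if EPS not in first[sym]:   # sym cannot vanish: stop here
                        break
                else:
                    # no break: every symbol can derive ε, or rhs is empty
                    first[nt].add(EPS)
                changed |= len(first[nt]) != before
    return first


for nt, symbols in compute_first(GRAMMAR).items():
    print(f"FIRST({nt}) = {sorted(symbols)}")
```

The computed sets agree with the FIRST sets listed in Example 1.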
FOLLOW Set: the set of terminal symbols that can appear immediately after a variable in the process of derivation.
Rules to compute the FOLLOW set:
FOLLOW(S) contains $, where S is the start non-terminal.
If A → pBq is a production, where p and q are any strings of grammar symbols and B is a non-terminal, then everything in FIRST(q) except ε is in FOLLOW(B).
If A → pB is a production, then everything in FOLLOW(A) is in FOLLOW(B).
If A → pBq is a production and FIRST(q) contains ε, then FOLLOW(B) contains { FIRST(q) – ε } ∪ FOLLOW(A).
Example: Consider the production rules:
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | id
The FOLLOW sets will be:
FOLLOW(E) = { $ , ) } // Note: ')' is there because of the 5th rule
FOLLOW(E’) =
Example 2: Consider the production rules of the grammar:
S → ACB | Cbb | Ba
A → da | BC
B → g | ε
C → h | ε
FIRST sets:
FIRST(S) = FIRST(ACB) ∪ FIRST(Cbb) ∪ FIRST(Ba) = { d, g, h, b, a, ε }
FIRST(A) = { d } ∪ FIRST(BC) = { d, g, h, ε }
FIRST(B) = { g , ε }
FIRST(C) = { h , ε }
Production rules:
S → aBDh
B → cC
C → bC | ε
D → EF
E → g | ε
F → f | ε
FIRST sets:
FIRST(S) = { a }
FIRST(B) = { c }
FIRST(C) = { b , ε }
FIRST(D) = FIRST(E) ∪ FIRST(F) = { g, f, ε }
FIRST(E) = { g , ε }
FIRST(F) = { f , ε }
FOLLOW sets:
FOLLOW(S) = { $ }
FOLLOW(B) = { FIRST(D) – ε } ∪ FIRST(h) = { g , f , h }
FOLLOW(C) = FOLLOW(B) = { g , f , h }
FOLLOW(D) = FIRST(h) = { h }
FOLLOW(E) = { FIRST(F) – ε } ∪ FOLLOW(D) = { f , h }
FOLLOW(F) = FOLLOW(D) = { h }
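The FOLLOW rules can be checked against this example with a short Python sketch that iterates them until nothing changes. The grammar S → aBDh, B → cC, C → bC | ε, D → EF, E → g | ε, F → f | ε is taken from the example; the function names, the dictionary layout, and the "ε" and "$" markers are assumptions.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "S": [["a", "B", "D", "h"]],
    "B": [["c", "C"]],
    "C": [["b", "C"], []],   # [] is the ε-production
    "D": [["E", "F"]],
    "E": [["g"], []],
    "F": [["f"], []],
}


def first_of(symbols, first, grammar):
    """FIRST of a sequence of symbols, given FIRST sets of the non-terminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # the whole sequence can derive ε
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                 # rule: $ is in FOLLOW(start)
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue           # FOLLOW is for non-terminals only
                    tail = first_of(rhs[i + 1:], first, grammar)
                    new = tail - {EPS}     # rule: FIRST(q) except ε
                    if EPS in tail:        # rule: q is empty or can vanish
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


for nt, symbols in compute_follow(GRAMMAR, "S").items():
    print(f"FOLLOW({nt}) = {sorted(symbols)}")
```

The result matches the FOLLOW sets worked out by hand above, for example FOLLOW(E) = { f , h }.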
Limitations of Syntax Analyzers
It cannot determine if a token is valid.
It cannot determine if a token is declared before it is being used.
It cannot determine if a token is initialized before it is being used.
It cannot determine if an operation performed on a token type is valid or not.
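As an illustration of these limits, Python's own parser (the standard `ast` module) accepts a statement that uses a name which was never assigned. Python is used here only as a convenient example language; the variable names are arbitrary.

```python
import ast

# Syntactically valid, yet `total` is used before it is ever assigned.
source = "print(total + 1)"

tree = ast.parse(source)       # parsing succeeds: the grammar is satisfied
print(type(tree).__name__)     # Module

error = None
try:
    # Run the parsed program in an empty namespace.
    exec(compile(tree, "<example>", "exec"), {})
except NameError as err:       # the problem only surfaces after parsing
    error = err
print("caught:", error)
```

Checks of this kind are left to later phases, such as semantic analysis in a compiler.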
Implementation of LL(1)
Step 1: First check for left recursion in the grammar; if there is left recursion in the grammar, remove it and go to step 2.
Left factor ? ?
Step 2: Calculate First() and Follow() for all non-terminals.
Step 3: For each production A → α (A derives alpha)