Chapter 4 Syntax Analysis
• The parser obtains a string of tokens from the lexical analyzer and reports a syntax error if there is one; otherwise it generates a syntax tree.
The Role of the Parser
Major tasks conducted during parsing (syntax analysis):
– the parser obtains a stream of tokens and verifies that the string of token names can be generated by the grammar of the source language.
• A context-free grammar is a phrase-structure (formal) grammar G = (V, T, S, P), where:
  (i) V is a finite set of variables (non-terminals),
  (ii) T is a finite set of terminal symbols,
  (iii) S ∈ V is the start variable, and
  (iv) P is a finite set of rules, with each rule mapping a variable to a string of variables and terminals.
Example #1 - Context-Free Grammars
• G:  S → AB
      A → aAA
      A → aA
      A → a
      B → bB
      B → b
Q1. Identify the start variable, terminal symbols, non-terminals and production rules.
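Answer: the start variable is S; the non-terminals are {S, A, B}; the terminals are {a, b}; and the production rules are the six rules listed above.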
• Algorithm: converting a left-linear grammar into a right-linear grammar
• If the left-linear grammar has a rule with the start symbol S on the right-hand side, simply add the rule S0 → S and make S0 the new start symbol.
1) If the left-linear grammar has a rule S → p (p a string of terminals), make that a rule of the right-linear grammar.
2) If the left-linear grammar has a rule A → p, add the rule S → pA to the right-linear grammar.
3) If the left-linear grammar has a rule A → Bp, add the rule B → pA to the right-linear grammar.
4) If the left-linear grammar has a rule S → Ap, add the rule A → p to the right-linear grammar.
5) Repeat rules 1)–4) until every production of the left-linear grammar has been converted.
(Here S denotes the start symbol of the left-linear grammar, A and B other non-terminals, and p a string of terminals; a sketch of these rules in code follows this list.)
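A small Python sketch of these conversion rules (illustrative only; representing a grammar as (head, body) pairs is an assumption, not part of the slides):

# Sketch of the left-linear to right-linear conversion rules listed above.
# A production is a (head, body) pair; body is a tuple of symbols and
# non-terminals start with an upper-case letter.

def is_nonterminal(sym):
    return sym[0].isupper()          # e.g. "S", "A", "S0"

def left_to_right_linear(start, productions):
    """Convert a left-linear grammar into an equivalent right-linear one."""
    # Step 0: if the start symbol occurs on a right-hand side, add S0 -> S.
    if any(start in body for _, body in productions):
        new_start = start + "0"
        productions = productions + [(new_start, (start,))]
        start = new_start
    result = []
    for head, body in productions:
        if body and is_nonterminal(body[0]):          # head -> B p  (left-linear)
            B, p = body[0], body[1:]
            if head == start:
                result.append((B, p))                 # rule 4: S -> Bp  gives  B -> p
            else:
                result.append((B, p + (head,)))       # rule 3: A -> Bp  gives  B -> pA
        else:                                         # head -> p  (terminals only)
            if head == start:
                result.append((head, body))           # rule 1: S -> p stays
            else:
                result.append((start, body + (head,)))  # rule 2: A -> p gives S -> pA
    return start, result

# Example #2 from the slides: S -> Ab | Sb, A -> Aa | a
start, rules = left_to_right_linear("S", [
    ("S", ("A", "b")), ("S", ("S", "b")),
    ("A", ("A", "a")), ("A", ("a",)),
])
for head, body in rules:
    print(head, "->", "".join(body) or "ε")

Running this on Example #2 below reproduces the right-linear grammar shown on the slide (S0 → aA, A → bS, A → aA, S → bS, S → ε).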
Conversion of Left-linear Grammar into Right-Linear Grammar
Example #1: convert the left-linear grammar
  S → Aa
  A → ab
Rule 2: the left-linear grammar has the rule A → ab (terminals only), so add S → abA to the right-linear grammar.
  Left Linear          Right Linear
  S → Aa               S → abA
  A → ab
Rule 4: the right-hand side of S → Aa starts with a non-terminal, so add A → a to the right-linear grammar.
  Left Linear          Right Linear
  S → Aa               S → abA
  A → ab               A → a
Convert this left-linear grammar:
  S → Ab | Sb
  A → Aa | a
Since the start symbol S appears on a right-hand side, add S0 → S; S0 becomes the new start symbol.
Right-hand side has only terminals: A → a gives S0 → aA (rule 2).
Right-hand side has a non-terminal: S → Ab gives A → bS, S → Sb gives S → bS, and A → Aa gives A → aA (rule 3).
Right-hand side of the start symbol has a non-terminal: S0 → S gives S → ε (rule 4).
  Left Linear          Right Linear
  S0 → S               S0 → aA
  S → Ab               A → bS
  S → Sb               A → aA
  A → Aa               S → bS
  A → a                S → ε
Derivation & Ambiguity
• Derivation: a derivation is used to check whether a string belongs to the language of a given grammar or not. There are two kinds:
  1. Leftmost derivation
  2. Rightmost derivation
Leftmost Derivation
Example:
Input: –(id + id )
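For instance, assuming the usual expression grammar E → E + E | E * E | ( E ) | - E | id (the grammar itself is not preserved in this extract), a leftmost derivation of the input is:
  E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
where the leftmost non-terminal is replaced at every step.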
DERIVATION TREES
• Example -1: A grammar G which is context-free has the productions
S → aAB
A → Bba
B → bB
B→c
• The word w = acbabc is derived as follows:
S ⇒ aAB
⇒ a(Bba)B
⇒ acbaB
⇒ acba(bB)
⇒ acbabc.
• Obtain the derivation tree.
DERIVATION TREES (the derivation tree for w = acbabc is shown on the slide)
Exercise - Derivation
1. Perform a leftmost derivation and draw the parse tree.
   S → A1B
   A → 0A | ε
   B → 0B | 1B | ε
   Output string: 1001
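One possible leftmost derivation of 1001:
  S ⇒ A1B ⇒ 1B ⇒ 10B ⇒ 100B ⇒ 1001B ⇒ 1001
using A → ε, then B → 0B twice, B → 1B, and finally B → ε; the parse tree follows this derivation.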
Ambiguous grammar
An ambiguous grammar is one that produces more than one leftmost or rightmost derivation for the same sentence.
Example
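For example (a standard illustration; the slide's own example is not preserved in this text), with the grammar E → E + E | E * E | id the sentence id + id * id has two distinct leftmost derivations:
  E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id, and
  E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id,
so the grammar is ambiguous.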
Left Recursion
• In other words, if during a derivation starting from a non-terminal A the sentential form again begins with the same non-terminal A, we say the grammar is left recursive.
• A left-recursive pair of productions A → Aα | β is replaced by:
  A → βA'
  A' → αA' | ε
Example #1:
Eliminate the left recursion from the following grammar:
E → E+T | T
T → T* F | F
F → (E) | id
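Solution (this is the grammar used again in the LL(1) example later in the chapter):
  E → TE'      E' → +TE' | ε
  T → FT'      T' → *FT' | ε
  F → (E) | id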
Left Recursion Elimination (indirect left recursion)
• Grammar:  S → Ab | a
            A → Ab | Sa
• Solution: substituting for S in the A-productions eliminates the indirect left recursion through S:
            S → Ab | a
            A → Ab | Aba | aa
  Eliminating the remaining direct left recursion from A then gives:
            S → Ab | a
            A → aaA1
            A1 → bA1 | baA1 | ε
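A minimal Python sketch of immediate left-recursion elimination (illustrative; the grammar representation is an assumption, not the slides' code):

# A -> Aα | β  becomes  A -> βA',  A' -> αA' | ε
def eliminate_immediate_left_recursion(head, bodies):
    """bodies is a list of tuples of symbols; returns a dict of new productions."""
    recursive = [b[1:] for b in bodies if b and b[0] == head]      # the α parts
    non_recursive = [b for b in bodies if not b or b[0] != head]   # the β parts
    if not recursive:
        return {head: bodies}            # nothing to do
    new = head + "'"                     # fresh non-terminal A'
    return {
        head: [beta + (new,) for beta in non_recursive],           # A  -> βA'
        new:  [alpha + (new,) for alpha in recursive] + [()],      # A' -> αA' | ε
    }

# E -> E + T | T   from Example #1
print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
# {'E': [('T', "E'")], "E'": [('+', 'T', "E'"), ()]}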
Example #2: eliminate left recursion
Eliminating Ambiguity - Left Factoring
• If A → αβ1 | αβ2 | …… | αβn | γ1 | γ2 | …… | γm , where the γi ∈ (V ∪ T)* do not start with α, then replace the A-productions by:
  A → αA' | γ1 | γ2 | …… | γm
  A' → β1 | β2 | …… | βn
Left Factoring - Elimination
Example #1:
Example #2: consider the grammar S → aSa | aa, and remove the
left factoring(if any).
Solution – S → aSa | aa has α = a as a common left factor, so after left factoring we get the productions:
  S → aS'
  S' → Sa | a
Basic Parsing Techniques
• Example grammar (its parse tree is shown on the slide):
  S → id = E ;
  E → E + T | T
  T → T * F | F
  F → id
Types of parsing:
1) Top down parsing – the parser builds the parse tree from the root (top) down to the leaves.
2) Bottom up parsing – the parser builds the parse tree from the leaves up to the root.
Definition of parser
Classification of parsers
Top down Parsing - Backtracking
• Backtracking is a top-down parsing method that involves repeated scans of the input.
• If any mismatch occurs then we try another alternative.
• Backtracking provides flexibility to handle ambiguous grammars or
situations where the parser encounters uncertainty in choosing the correct
production rule.
• Grammar: S → cAd
           A → ab | a
  Input string: cad (a backtracking parse of this input is sketched below)
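A small Python sketch of top-down parsing with backtracking for this grammar and input (illustrative, not the slides' algorithm):

GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],   # alternatives tried in order; on failure we backtrack
}

def parse(symbol, s, pos):
    """Try to derive a prefix of s[pos:] from symbol; return new position or None."""
    if symbol not in GRAMMAR:               # terminal: must match the input
        return pos + 1 if pos < len(s) and s[pos] == symbol else None
    for alternative in GRAMMAR[symbol]:     # try each production for the non-terminal
        p = pos
        for sym in alternative:
            p = parse(sym, s, p)
            if p is None:
                break                        # mismatch: backtrack, try next alternative
        else:
            return p                         # whole alternative matched
    return None

def accepts(s):
    return parse("S", s, 0) == len(s)

print(accepts("cad"))    # True  (S => cAd => cad, after backtracking from A -> ab)
print(accepts("cabd"))   # True
print(accepts("cbd"))    # False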
Top down Parsing - LL(1) parser (predictive parser)
• LL(1) is a non-recursive top-down parser.
  1. The first L indicates that the input is scanned from left to right.
  2. The second L means it uses a leftmost derivation for the input string.
  3. The 1 means it uses one input symbol of lookahead to predict the parsing action.
Top down Parsing - LL(1) parsing (predictive parsing)
Rules to compute FIRST of a non-terminal
• FIRST(X) is the set of terminals that appear at the beginning of some string derived from X.
1. If X is a terminal, then FIRST(X) = {X}.
2. If X → ε, add ε to FIRST(X).
3. If X → Y1Y2……Yk, then FIRST(X) = FIRST(Y1).
   – If Y1 derives ε, then FIRST(X) = (FIRST(Y1) − {ε}) ∪ FIRST(Y2).
   – If Y1 and Y2 derive ε, then FIRST(X) = (FIRST(Y1) − {ε}) ∪ (FIRST(Y2) − {ε}) ∪ FIRST(Y3).
Top down Parsing - LL(1) parsing (predictive parsing)
Simplification of Rule 3
• If X → Y1Y2……Yk:
  – If Y1, Y2 and Y3 derive ε, then
    FIRST(X) = (FIRST(Y1) − {ε}) ∪ (FIRST(Y2) − {ε}) ∪ (FIRST(Y3) − {ε}) ∪ FIRST(Y4)
  – In general,
    FIRST(X) = (FIRST(Y1) − {ε}) ∪ (FIRST(Y2) − {ε}) ∪ …… ∪ FIRST(Yk)
    (note: if all of Y1 … Yk derive ε, then add ε to FIRST(X))
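A Python sketch of this FIRST computation, applied to the grammar used in the LL(1) example later in this chapter (illustrative; the fixed-point formulation is an assumption):

EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                for sym in body:
                    if sym == EPS:                       # rule 2: X -> ε
                        first[head].add(EPS)
                        break
                    if sym not in grammar:               # rule 1: terminal
                        first[head].add(sym)
                        break
                    first[head] |= first[sym] - {EPS}    # rule 3: FIRST(Y1) - {ε}, ...
                    if EPS not in first[sym]:
                        break
                else:
                    first[head].add(EPS)                 # all Yi derive ε
                if len(first[head]) != before:
                    changed = True
    return first

for nt, s in compute_first(GRAMMAR).items():
    print(nt, "=", s)
# Expected (as computed on the slides):
# FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
# FIRST(E') = {+, ε},  FIRST(T') = {*, ε}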
Top down Parsing - LL(1) parsing (predictive parsing)
Rules to compute FOLLOW of non terminal
• The FOLLOW set of a non-terminal is the set of terminals that can appear immediately to the right of that non-terminal in some sentential form.
1. Place $ in FOLLOW(S), where S is the start symbol.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
Constructing the predictive parsing table M: for each production A → α of the grammar,
1. For each terminal a in FIRST(α), add A → α to M[A, a].
2. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A).
3. If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
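For example, for the production E' → ε we have FIRST(ε) = {ε} and FOLLOW(E') = { ), $ }, so E' → ε is placed in M[E', )] and M[E', $] in the table constructed below.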
Example - predictive parsing - LL(1) parsing
Grammar G:
  E → E + T | T
  T → T * F | F
  F → (E) | id
1) Eliminate left recursion. In general, the A-productions A → Aα1 | Aα2 | …… | Aαn | β1 | β2 | …… | βm (where the βi do not start with A) are replaced by:
  A → β1A' | β2A' | …… | βmA'
  A' → α1A' | α2A' | …… | αnA' | ε
Example - predictive parsing - LL(1) parsing
After eliminating left recursion, grammar G becomes:
  E → TE'      E' → +TE' | ε
  T → FT'      T' → *FT' | ε
  F → (E) | id
2) Compute FIRST:
  – Terminal: if A → aα, then a ∈ FIRST(A); if A → ε, then FIRST(A) = {ε}.
  – If A → XYZ, then FIRST(A) = FIRST(X) if FIRST(X) does not contain ε; if FIRST(X) contains ε, then FIRST(A) = (FIRST(X) − {ε}) ∪ FIRST(Y), and so on.
  FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
  FIRST(E') = { +, ε }
  FIRST(T') = { *, ε }
3) Compute FOLLOW. For the symbol following the variable in a production:
  – Terminal: write it as it is.
  – Non-terminal: write its FIRST elements (without ε).
  – Last element of the production: write the FOLLOW of the left-hand side.
  FOLLOW(E)  = { ), $ }
  FOLLOW(E') = { ), $ }
  FOLLOW(T)  = { +, ), $ }
  FOLLOW(T') = { +, ), $ }
  FOLLOW(F)  = { *, +, ), $ }
Example - predictive parsing - LL(1) parsing
Parsing table M (rows: non-terminals, columns: input symbols):
        +            *            (            )            id           $
  E                               E→TE'                     E→TE'
  E'    E'→+TE'                                E'→ε                      E'→ε
  T                               T→FT'                     T→FT'
  T'    T'→ε         T'→*FT'                   T'→ε                      T'→ε
  F                               F→(E)                     F→id
Explanation: FIRST(E), FIRST(T) and FIRST(F) contain { (, id }, hence the E, T and F productions are placed under those terminals; FIRST(E') and FIRST(T') contain ε, so E' → ε and T' → ε are placed under the terminals in FOLLOW(E') and FOLLOW(T').
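A Python sketch of the table-driven LL(1) parsing loop using the table above (illustrative; the token-list representation is an assumption):

TABLE = {
    ("E", "("): ["T", "E'"],   ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "("): ["F", "T'"],   ("T", "id"): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "("): ["(", "E", ")"], ("F", "id"): ["id"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    tokens = tokens + ["$"]
    stack = ["$", "E"]                      # start symbol on top of the stack
    i = 0
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            body = TABLE.get((top, tokens[i]))
            if body is None:
                return False                 # no table entry: syntax error
            stack.extend(reversed(body))     # push the production right-to-left
        else:
            if top != tokens[i]:
                return False                 # terminal mismatch
            i += 1                           # match terminal (including the final $)
    return i == len(tokens)

print(ll1_parse(["id", "+", "id", "*", "id"]))   # True
print(ll1_parse(["id", "+", "*", "id"]))         # False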
Top down Parsing - Recursive Descent Parsing
• A recursive descent parser executes a set of recursive procedures to process the input without backtracking.
  – There is a procedure for each non-terminal in the grammar, as sketched below.
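A Python sketch of a recursive descent parser for the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id used in the LL(1) example (illustrative):

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]
        self.i = 0

    def look(self):
        return self.tokens[self.i]

    def match(self, t):
        if self.look() != t:
            raise SyntaxError(f"expected {t}, found {self.look()}")
        self.i += 1

    def E(self):          # E -> T E'
        self.T(); self.Eprime()

    def Eprime(self):     # E' -> + T E' | ε
        if self.look() == "+":
            self.match("+"); self.T(); self.Eprime()

    def T(self):          # T -> F T'
        self.F(); self.Tprime()

    def Tprime(self):     # T' -> * F T' | ε
        if self.look() == "*":
            self.match("*"); self.F(); self.Tprime()

    def F(self):          # F -> ( E ) | id
        if self.look() == "(":
            self.match("("); self.E(); self.match(")")
        else:
            self.match("id")

p = Parser(["id", "+", "id", "*", "id"])
p.E(); p.match("$")
print("accepted")         # reached only if no SyntaxError was raised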
Bottom up Parsing
• Here, we start from the sentence and apply production rules in reverse in order to reach the start symbol.
• Bottom-up parsing methods:
  – Shift-Reduce parsing
  – LR Parsing
  – SLR Parsing
  – CLR Parsing
  – LALR Parsing
Bottom up parsing - Shift-Reduce Parsers
Shift-reduce parsers are a type of bottom-up parser used in syntax analysis of
programming languages.
They operate by shifting input symbols onto a stack and reducing them according to grammar rules whenever the right-hand side of a rule appears on top of the stack.
Ex.- Consider the grammar:
S → aABe, A → Abc | b, B → d
The sentence to be recognized is abbcde.
Reduction                          Rightmost Derivation (in reverse)
abbcde   (A → b)                   S ⇒ aABe
aAbcde   (A → Abc)                   ⇒ aAde
aAde     (B → d)                     ⇒ aAbcde
aABe     (S → aABe)                  ⇒ abbcde
S
The reductions trace out the rightmost derivation in reverse.
Bottom up parsing - Shift-Reduce Parsers
Handle - a substring that matches the right side of a production, and whose reduction to the non-terminal on the left side of the production represents one step along the reverse of a rightmost derivation.
Example:
– Consider the grammar:
E → E+E
E → E*E
E →(E)
E → id
Input string id1+id2*id3
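For this input the first handle is id1, which is reduced to E using E → id; the parser then keeps shifting and reducing handles until only E remains on the stack.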
Bottom up parsing - SHIFT-REDUCE PARSING
      Stack         Input           Action
  1   $             id*(id+id)$     Shift
  2   $id           *(id+id)$       Reduce (E → id)
  3   $E            *(id+id)$       Shift
  4   $E*           (id+id)$        Shift
  5   $E*(          id+id)$         Shift
  6   $E*(id        +id)$           Reduce (E → id)
  7   $E*(E         +id)$           Shift
  8   $E*(E+        id)$            Shift
  9   $E*(E+id      )$              Reduce (E → id)
 10   $E*(E+E       )$              Reduce (E → E+E)
 11   $E*(E         )$              Shift
 12   $E*(E)        $               Reduce (E → (E))
 13   $E*E          $               Reduce (E → E*E)
 14   $E            $               Accept
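A Python sketch of the shift-reduce loop with a deliberately naive "reduce whenever a right-hand side is on top of the stack" strategy (illustrative only; real shift-reduce parsers decide between shift and reduce using an LR table or precedence relations):

PRODUCTIONS = [                      # grammar E -> E+E | E*E | (E) | id
    ("E", ("E", "+", "E")),
    ("E", ("E", "*", "E")),
    ("E", ("(", "E", ")")),
    ("E", ("id",)),
]

def shift_reduce(tokens):
    stack, rest = ["$"], tokens + ["$"]
    while True:
        # reduce greedily while a handle is on top of the stack
        reduced = True
        while reduced:
            reduced = False
            for head, body in PRODUCTIONS:
                if tuple(stack[-len(body):]) == body:
                    del stack[-len(body):]
                    stack.append(head)
                    print("reduce", head, "->", " ".join(body), "  stack:", "".join(stack))
                    reduced = True
                    break
        if rest[0] == "$":
            return stack == ["$", "E"]       # accept if only the start symbol remains
        stack.append(rest.pop(0))            # shift the next input symbol
        print("shift   stack:", "".join(stack), "  input:", "".join(rest))

print(shift_reduce(["id", "*", "(", "id", "+", "id", ")"]))   # True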
LR Parser
• LR parsers are also known as LR(k) parsers, where L stands for left-to-right scanning of the input stream, R stands for the construction of a rightmost derivation in reverse, and k denotes the number of lookahead symbols used to make parsing decisions.
LR(0) items for the grammar S → AA, A → aA | b (augmented with S1 → S):
  I0: S1 → .S,  S → .AA,  A → .aA,  A → .b
  I1: S1 → S.
  I2: S → A.A,  A → .aA,  A → .b
  I3: A → a.A,  A → .aA,  A → .b
  I4: A → b.
  I5: S → AA.
  I6: A → aA.
(The slide shows these item sets as a GOTO graph with transitions on S, A, a and b.)
Types of LR Parser
– Example #1: LR(0) parsing for the grammar G:
  (1) S → AA
  (2) A → aA
  (3) A → b
– Prepare the LR(0) parsing table. (In the table, S denotes a shift action and the number after it is a state number, e.g. S4 means "shift and go to state 4".)
  State        Action                      Goto
               a        b        $         A       S
  0            S3       S4                 2       1
  1                              Accept
  2            S3       S4                 5
  3            S3       S4                 6
  4            r3       r3       r3
  5            r1       r1       r1
  6            r2       r2       r2
Another example: consider the grammar S → A | a, A → a (augmented with S1 → S).
  I0: S1 → .S,  S → .A,  S → .a,  A → .a
  I1: S1 → S.
  I2: S → A.
  I3: S → a.,  A → a.

  State        a          $            S       A
  0            s3                      1       2
  1                       accept
  2                       r1
  3                       r1/r2
If there are two entries in the same cell, we call it a conflict (here state 3 has a reduce/reduce conflict on $), meaning the parser is not an SLR parser.
Types of LR Parser – CLR(1) & LALR(1)
– Example #1: CLR(1) parsing for the grammar:
  S → AA
  A → aA | b
– Find the augmented grammar.
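The augmented grammar (obtained by adding a new start production, as in the earlier LR(0) example): S' → S, S → AA, A → aA | b. The CLR(1) item sets are then constructed starting from the item [S' → .S, $].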
Operator-precedence Parsing
The operator-precedence parser is a shift –reduce parser that can be easily
constructed by hand.
• Precedence relations between terminals:
  – a <· b : a yields precedence to b
  – a =· b : a has the same precedence as b
  – a ·> b : a takes precedence over b
• Leading - refers to the set of terminals that can appear at the beginning of strings derived from a non-terminal.
• Trailing - refers to the set of terminals that can appear at the end of strings derived from a non-terminal.
Example #1: given the production rules
  1. E → E + T | T
  2. T → T * F | F
  3. F → ( E ) | id
  Leading(E) = { +, *, (, id }      Trailing(E) = { +, *, ), id }
  Leading(T) = { *, (, id }         Trailing(T) = { *, ), id }
  Leading(F) = { (, id }            Trailing(F) = { ), id }
Bottom up parsing - Operator-precedence Parsing
Example #2: operator-precedence relations for the grammar
  S → a | ↑ | (T)
  T → T,S | S
are given in the following table.
Step 01: Compute LEADING
  – LEADING(S) = { a, ↑, ( }
  – LEADING(T) = { ",", a, ↑, ( }
Step 02: Compute TRAILING
  – TRAILING(S) = { a, ↑, ) }
Step 03: Operator Precedence Relation Table – built over the terminals a, ↑, (, ), "," and $ using the LEADING and TRAILING sets; for example, the row for a contains a ·> ), a ·> "," and a ·> $ (the remaining rows are as shown on the slide).
Syntax Error Handling
• Planning the error handling right from the start can both simplify the structure of a compiler and improve its handling of errors.
1) Lexical errors: misspelled identifiers, keywords, or operators.
2) Syntax errors: misplaced semicolons, extra or missing braces, i.e. " { " or " } ".
   • Typical syntax errors are:
     – Errors in structure
     – Missing operator
     – Misspelled keywords
     – Unbalanced parentheses
   • Example: int 2;
   • Example:
       swich(ch)
       {
         .......
         .......
       }
     The keyword switch is incorrectly written as swich, hence an "unidentified keyword/identifier" error occurs.
Syntax Error Handling
3) Semantic errors: type mismatches between operators and operands.
   – Undeclared variables
4) Logical errors: e.g., use of the assignment operator = instead of the comparison operator ==.
Error Recovery Strategies
1) Panic mode
   • Example: given an error like
       a = b + c    // no semi-colon
       d = e + f ;
     the parser discards input symbols until a synchronizing token (such as the ";") is found and then resumes parsing from there.
2) Phrase-level recovery
   – However, these methods are in general too costly in terms of time and space.
Exercises
Question: Consider the following statements about the context-free grammar G = { S → SS, S → ab, S → ba, S → ε }:
I. G is ambiguous
II. G produces all strings with an equal number of a's and b's
III. G can be accepted by a deterministic PDA
Which combination below expresses all the true statements about G?
A. I only
B. I and III only
C. II and III only
D. I, II and III
Solution: There are different leftmost derivations for the string abab, for example
  S ⇒ SS ⇒ abS ⇒ abab  and  S ⇒ SS ⇒ abS ⇒ abSS ⇒ ababS ⇒ abab,
so the grammar is ambiguous and statement I is true.
Statement II states that G produces all strings with an equal number of a's and b's, but it cannot generate the string aabb, so statement II is incorrect.
Statement III is also correct, as the language can be accepted by a deterministic PDA. So the correct option is (B).
Question: Which of the following statements is/are true?
B. An unambiguous context-free grammar always has a unique parse tree for each string of the language generated by it.
C. Both deterministic and non-deterministic pushdown automata always accept the same set of languages.
Solution: (B) is correct, as an unambiguous CFG has a unique parse tree for each string of the language generated by it.
(C) is false, as some languages are accepted by non-deterministic PDAs but not by any deterministic PDA.