Chapter 4 Syntax Analysis
• Classification of parsing
– Bottom up parsing
Syntax Analysis
Syntax analyzer receives the source code in the
terminals.
G: E → E O E | (E) | -E | id
O → + | - | * | / | ↑
• Write the terminals, non terminals, start symbol, and productions for the
following grammar.
– Terminals: id + - * / ↑ ( )
– Non terminals: E, O
– Start symbol: E
– Productions: E → E O E | (E) | -E | id
O → + | - | * | / | ↑
Example #2 - Context-Free Grammars
• G: S → AB
A → aAA
A → aA
A → a
B → bB
B → b
Q1. Identify the start variable, terminal symbols, non terminals, and
production rules.
Q2. Check whether each of the following input strings is accepted by the given G.
Input strings: ab, aab, aaab, aabba.
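Q2 can be checked mechanically. The following is a minimal sketch, not a general parser: it searches leftmost derivations breadth-first and prunes sentential forms longer than the input, which is only safe because this G has no ε-productions. The function name `accepts` and the dictionary encoding of G are assumptions made for illustration.

```python
from collections import deque

# Grammar G from Example #2 (uppercase = non terminals, lowercase = terminals).
GRAMMAR = {
    "S": ["AB"],
    "A": ["aAA", "aA", "a"],
    "B": ["bB", "b"],
}

def accepts(grammar, start, target):
    """Return True if `target` can be derived from `start`.

    Assumes the grammar has no ε-productions, so a sentential form
    never shrinks and forms longer than the target can be discarded.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        # Position of the leftmost non terminal, if any.
        idx = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if idx is None:
            continue  # all terminals, but not the target
        if form[:idx] != target[:idx]:
            continue  # terminal prefix already disagrees with the input
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return False

for s in ["ab", "aab", "aaab", "aabba"]:
    print(s, accepts(GRAMMAR, "S", s))
```

Under this sketch ab, aab and aaab are accepted and aabba is rejected, because A derives only a's and B derives only b's.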
Context-Free Grammars
• A context-free grammar -
– Gives a precise syntactic specification of a programming
language.
– The design of the grammar is an initial phase of the design
of a compiler.
– A grammar can be directly converted into a parser by
some tools.
• Parser: a program that takes tokens and a grammar (CFG) as
input and validates the tokens against the grammar.
• Algorithm (p denotes a string of terminals, S the start symbol)
• If the left linear grammar has a rule with the start symbol S on the
right hand side, first add this rule: S0 → S, and treat S0 as the start
symbol in the steps below.
1) If the left linear grammar has a rule S → p, then make that a rule in
the right linear grammar
2) If the left linear grammar has a rule A → p, then add the following
rule to the right linear grammar: S → pA
3) If the left linear grammar has a rule B → Ap, add the following rule
to the right linear grammar: A → pB
4) If the left linear grammar has a rule S → Ap, then add the following
rule to the right linear grammar: A → p
5) If the left linear grammar has a rule S → A, then add the following
rule to the right linear grammar: A → ε
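The five rules can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the tuple encoding of rules, the function name `left_to_right_linear`, and naming the new start symbol by appending "0" are all choices made here.

```python
def left_to_right_linear(rules, start):
    """Convert a left linear grammar to a right linear one.

    Input rule  (lhs, nt, p) means  lhs -> nt p   (nt is None for lhs -> p).
    Output rule (lhs, p, nt) means  lhs -> p nt   (nt is None for lhs -> p).
    p is a string of terminals; "" stands for ε.
    """
    rules = list(rules)
    # Preliminary step: the start symbol occurs on a right hand side.
    if any(nt == start for _, nt, _ in rules):
        new_start = start + "0"
        rules.append((new_start, start, ""))
        start = new_start
    result = []
    for lhs, nt, p in rules:
        if nt is None and lhs == start:
            result.append((start, p, None))   # rule 1: S -> p stays
        elif nt is None:
            result.append((start, p, lhs))    # rule 2: A -> p gives S -> pA
        elif lhs != start:
            result.append((nt, p, lhs))       # rule 3: B -> Ap gives A -> pB
        else:
            result.append((nt, p, None))      # rules 4, 5: S -> Ap gives A -> p
    return start, result

# Left linear grammar: S -> Aa, A -> ab
start, right = left_to_right_linear([("S", "A", "a"), ("A", None, "ab")], "S")
for lhs, p, nt in right:
    print(lhs, "->", (p or "ε") + (nt or ""))
```

For the grammar S → Aa, A → ab this prints A → a and S → abA.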
Conversion of Left-linear Grammar into Right-Linear Grammar
Convert this left linear grammar
Left Linear
S → Aa
A → ab
Right hand side of A has only terminals
2) If the left linear grammar has this rule A → p, then add the
following rule to the right linear grammar: S → pA
Left Linear      Right Linear
S → Aa           S → abA
A → ab
Right hand side of S has non-terminal
4) If the left linear grammar has S → Ap, then add the following rule to
the right linear grammar: A → p
Left Linear      Right Linear
S → Aa           S → abA
A → ab           A → a
Both grammars generate this language: {aba}
Convert this left linear grammar
Left Linear
S → Ab
S → Sb
A → Aa
A → a
The start symbol S appears on a right hand side, so add S0 → S; S0 is now the start symbol.
Right hand side has terminals
2) If the left linear grammar has this rule A → p, then add the
following rule to the right linear grammar: S → pA (here the start symbol is S0)
Left Linear      Right Linear
S0 → S           S0 → aA
S → Ab
S → Sb
A → Aa
A → a
Right hand side has non-terminal
Left Linear      Right Linear
S0 → S           S0 → aA
S → Ab           A → bS
S → Sb           A → aA
A → Aa           S → bS
A → a            S → ε
Derivation & Ambiguity
• Derivation: a derivation is used to determine whether a string belongs to
the language of a given grammar.
– A derivation is a sequence of production rule applications.
• Types of derivation:
1. Leftmost derivation
2. Rightmost derivation
Leftmost Derivation
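• At every step, the leftmost non terminal of the sentential form is replaced.
Example:
Rules: E → E+E | E*E | -E | (E) | id
Input: -(id + id)
E ⇒ -E
⇒ -(E)
⇒ -(E+E)
⇒ -(id+E)
⇒ -(id+id)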
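A leftmost derivation can be replayed mechanically: at each step the leftmost non terminal is replaced by the chosen right hand side. The sketch below assumes a list-of-symbols encoding; the function name `leftmost_derivation` and the explicit `choices` list are illustrative, and the code does not search for the derivation itself.

```python
GRAMMAR = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["-", "E"], ["(", "E", ")"], ["id"]],
}

def leftmost_derivation(grammar, start, choices):
    """Apply each chosen right hand side to the leftmost non terminal."""
    form = [start]
    steps = [form]
    for rhs in choices:
        # Index of the leftmost non terminal in the current sentential form.
        idx = next(i for i, sym in enumerate(form) if sym in grammar)
        if rhs not in grammar[form[idx]]:
            raise ValueError(f"{form[idx]} -> {' '.join(rhs)} is not a production")
        form = form[:idx] + rhs + form[idx + 1:]
        steps.append(form)
    return steps

# Productions used, in order, to derive -(id + id).
choices = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
for step in leftmost_derivation(GRAMMAR, "E", choices):
    print(" ".join(step))
```

A rightmost derivation would differ only in picking the last non terminal instead of the first.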
DERIVATION TREES
• Example #1: A context-free grammar G has the productions
S → aAB
A → Bba
B → bB
B → c
• The word w = acbabc is derived as follows:
S ⇒ aAB
⇒ a(Bba)B
⇒ acbaB
⇒ acba(bB)
⇒ acbabc.
• Obtain the derivation tree.
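One way to write the derivation tree down is as nested (label, children) pairs. The sketch below encodes the tree that corresponds to the derivation of acbabc, checks that every internal node uses a production of G, and prints the tree with indentation. The encoding and the helper names `is_valid`, `tree_yield` and `render` are assumptions made for illustration.

```python
PRODUCTIONS = {("S", "aAB"), ("A", "Bba"), ("B", "bB"), ("B", "c")}

# Internal node: (label, [children]); leaf: a terminal string.
TREE = ("S", [
    "a",
    ("A", [("B", ["c"]), "b", "a"]),
    ("B", ["b", ("B", ["c"])]),
])

def label(node):
    return node if isinstance(node, str) else node[0]

def is_valid(node):
    """Every internal node must expand by one of the productions of G."""
    if isinstance(node, str):
        return True
    head, children = node
    rhs = "".join(label(child) for child in children)
    return (head, rhs) in PRODUCTIONS and all(is_valid(c) for c in children)

def tree_yield(node):
    """Concatenate the leaves from left to right."""
    if isinstance(node, str):
        return node
    return "".join(tree_yield(child) for child in node[1])

def render(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * depth + label(node)]
    if not isinstance(node, str):
        for child in node[1]:
            lines.extend(render(child, depth + 1))
    return lines

print("\n".join(render(TREE)))
print("yield:", tree_yield(TREE))
```

The yield of the tree, read from its leaves left to right, is the word acbabc.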
Exercise- Derivation
1. Perform a leftmost derivation and draw the parse tree.
S → A1B
A → 0A | ε
B → 0B | 1B | ε
Output string: id + id * id
Ambiguity
• Ambiguity: a word, phrase, or statement that has more than one meaning.
• A grammar that produces more than one parse tree for some sentence is
said to be ambiguous.
• Equivalently, an ambiguous grammar is one that produces more than one leftmost
or more than one rightmost derivation for the same sentence.
Ambiguous grammar
Grammar: S → S+S | (S) | a      Output string: a+a+a
Here, two leftmost derivations of the string a+a+a are possible because the
grammar does not enforce the associativity of +:
S ⇒ S+S ⇒ a+S ⇒ a+S+S ⇒ a+a+S ⇒ a+a+a
S ⇒ S+S ⇒ S+S+S ⇒ a+S+S ⇒ a+a+S ⇒ a+a+a
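The ambiguity can be confirmed by counting parse trees, since each parse tree corresponds to exactly one leftmost derivation. The sketch below is hard-wired to the grammar S → S+S | (S) | a and is not a general ambiguity test; the function name `count_parse_trees` is an assumption.

```python
from functools import lru_cache

def count_parse_trees(s):
    """Count parse trees of s under S -> S+S | (S) | a."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees that derive s[i:j] from S.
        total = 0
        if j - i == 1 and s[i] == "a":
            total += 1                       # S -> a
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)     # S -> (S)
        for k in range(i + 1, j - 1):
            if s[k] == "+":
                total += count(i, k) * count(k + 1, j)  # S -> S+S
        return total

    return count(0, len(s))

print(count_parse_trees("a+a+a"))
```

A count greater than 1 for any string shows that the grammar is ambiguous; a+a+a yields 2, one tree grouping to the left and one grouping to the right.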
Left Recursion
• In the derivation process starting from any non-terminal A, if the
sentential form starts with the same non-terminal A, then we say that
the grammar is left recursive.
Example #1:
A → Aα | β      becomes      A → βA’
                             A’ → αA’ | ε
Example #2: Eliminate the left recursion from the following grammar:
E → E+T | T
T → T* F | F
F → (E) | id
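The rewrite A → Aα | β into A → βA’, A’ → αA’ | ε can be sketched for one non terminal at a time. The code below handles immediate left recursion only; the list-of-symbols encoding, the function name, and writing the new non terminal with a trailing apostrophe are assumptions made for illustration.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ε.

    Each alternative is a list of symbols; the empty list stands for ε.
    """
    # α parts: alternatives that start with the non terminal itself.
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # β parts: all other alternatives.
    betas = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not alphas:
        return {nonterminal: alternatives}   # nothing to do
    new = nonterminal + "'"
    return {
        nonterminal: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],
    }

grammar = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
for nt, alts in grammar.items():
    for lhs, new_alts in eliminate_immediate_left_recursion(nt, alts).items():
        print(lhs, "->", " | ".join(" ".join(a) or "ε" for a in new_alts))
```

For the expression grammar this gives E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, and leaves F unchanged.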
Left Recursion Elimination
Example #3: Eliminate the left recursion
S → Ab | a
A → Ab | Aba | aa
Eliminating Ambiguity - Left Factoring
A → αβ1 | αβ2 | αβ3 | … | αβn | γ1 | γ2 | … | γm, where α, βi, γj ∈ (V ∪ T)* and each γj does not
start (prefix) with α. All the productions A → αβi have the common left factor α.
Left Factoring - Elimination
A → αA’ | γ1 | γ2 | … | γm, where
A’ → β1 | β2 | β3 | … | βn
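A single left factoring step can be sketched as follows: group the alternatives by first symbol, take the longest common prefix α of the first group with two or more members, and move the remainders into a new non terminal. The example grammar A → a b B | a b c | d, the function names, and the list-of-symbols encoding are illustrative assumptions, not taken from the slides.

```python
def common_prefix(sequences):
    """Longest prefix shared by all symbol lists."""
    prefix = []
    for symbols in zip(*sequences):
        if all(sym == symbols[0] for sym in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix

def left_factor_once(nonterminal, alternatives):
    """Factor out one common prefix α: A -> α A' | γ..., A' -> β..."""
    groups = {}
    for alt in alternatives:
        if alt:
            groups.setdefault(alt[0], []).append(alt)
    for group in groups.values():
        if len(group) >= 2:
            alpha = common_prefix(group)
            new = nonterminal + "'"
            gammas = [alt for alt in alternatives if alt not in group]
            return {
                nonterminal: [alpha + [new]] + gammas,
                new: [alt[len(alpha):] for alt in group],  # [] stands for ε
            }
    return {nonterminal: alternatives}       # no common left factor

# Hypothetical grammar: A -> a b B | a b c | d
print(left_factor_once("A", [["a", "b", "B"], ["a", "b", "c"], ["d"]]))
```

Here α = a b, so the result is A → a b A' | d and A' → B | c. Repeating the step on the new productions handles nested common prefixes.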
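• If A → Y1Y2…YK:
– If Y1 does not derive ε, then FIRST(A) = FIRST(Y1)
– If Y1 derives ε, then FIRST(A) also includes FIRST(Y2); if Y2 derives ε as
well, it also includes FIRST(Y3), and so on. For example, if Y1, Y2 and Y3 all derive ε:
FIRST(A) = (FIRST(Y1) − {ε}) ∪ (FIRST(Y2) − {ε}) ∪ (FIRST(Y3) − {ε}) ∪ FIRST(Y4)
– If every Yi derives ε, then ε is in FIRST(A)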
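The FIRST rule can be sketched as a fixed-point computation: keep applying it to every production until no set grows. The sketch uses the grammar S → A1B, A → 0A | ε, B → 0B | 1B | ε from the derivation exercise; the function name `first_sets` and the encoding (an empty list for ε) are assumptions.

```python
EPS = "ε"

def first_sets(grammar):
    """Compute FIRST for every non terminal by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[nt])
                all_derive_eps = True
                for sym in alt:
                    # FIRST of a terminal is the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[nt] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        all_derive_eps = False
                        break                # later symbols cannot contribute
                if all_derive_eps:
                    first[nt].add(EPS)       # every Yi derives ε (or alt is ε)
                if len(first[nt]) != before:
                    changed = True
    return first

GRAMMAR = {
    "S": [["A", "1", "B"]],
    "A": [["0", "A"], []],
    "B": [["0", "B"], ["1", "B"], []],
}
print(first_sets(GRAMMAR))
```

Because A derives ε, FIRST(S) picks up the terminal 1 that follows A, giving FIRST(S) = {0, 1}, FIRST(A) = {0, ε} and FIRST(B) = {0, 1, ε}.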
• Failure when building the tables? Some entry has multiple actions!
– The grammar is not LR
1) Lexical errors: occur when the compiler does not recognize a sequence of characters
as a proper lexical token.
– Example: printf("Geeksforgeeks");$
2) Syntax errors: misplaced semicolons, extra or missing braces; that is, "{" or "}"
• Typical syntax errors are:
– Errors in structure
– Missing operator
– Misspelled keywords
– Unbalanced parenthesis
• Example - int 2;
• Example: swich(ch)
{
.......
.......
}
The keyword switch is incorrectly written as swich. Hence, an “Unidentified keyword/
identifier” error occurs.
Syntax Error Handling
3) Semantic errors: type mismatches between operators and
operands.
– Undeclared variables
3) Error productions
– The parser is constructed using augmented grammar with error
productions.
– If an error production is used by the parser, appropriate error diagnostics
can be generated to indicate the erroneous construct recognized in the
input.
– Example: writing 5X instead of 5*X.
Error Recovery Strategies
4) Global correction
– Choose a minimal sequence of changes to obtain a global least-cost
correction.
– Given an incorrect input string x and grammar G, certain algorithms
can be used to find a parse tree for a related string y, such that the number of
insertions, deletions and changes of tokens required to transform x into y
is as small as possible.
– However, these methods are in general too costly in terms of time and
space.
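The cost that global correction minimizes can be illustrated with a token-level edit distance. The sketch below only measures the number of insertions, deletions and changes between two given token sequences; it does not search for the best correct string y, which is the expensive part. The function name `token_edit_distance` is an assumption.

```python
def token_edit_distance(x, y):
    """Minimum number of token insertions, deletions and changes turning x into y."""
    prev = list(range(len(y) + 1))           # distance from empty prefix of x
    for i, tx in enumerate(x, start=1):
        curr = [i]
        for j, ty in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,                 # delete tx
                curr[j - 1] + 1,             # insert ty
                prev[j - 1] + (tx != ty),    # change tx into ty (free if equal)
            ))
        prev = curr
    return prev[-1]

# Erroneous input 5X versus the intended 5*X: one token must be inserted.
print(token_edit_distance(["5", "X"], ["5", "*", "X"]))
```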
Exercises
Question : Consider the following statements about the context free grammar
G = {S → SS, S → ab, S → ba, S → ε}
I. G is ambiguous
II. G produces all strings with equal number of a’s and b’s
III. G can be accepted by a deterministic PDA
Which combination below expresses all the true statements about G?
A. I only
B. I and III only
D. I, II and III
Solution: There are two different leftmost derivations (LMDs) for the string abab:
S ⇒ SS ⇒ SSS ⇒ abSS ⇒ ababS ⇒ abab
S ⇒ SS ⇒ abS ⇒ abab
So the grammar is ambiguous, and statement I is true.
Statement II claims that G produces all strings with an equal number of a’s and b’s, but G
cannot generate the string aabb. So statement II is false.
Statement III is true: G generates the regular language (ab + ba)*, which can be accepted by a
deterministic PDA. So the correct option is (B).
Solution: (A) is correct because for an (inherently) ambiguous CFL, every CFG that generates it is
ambiguous.
(B) is also correct, as an unambiguous CFG has a unique parse tree for each string of the
language it generates.
(C) is false, as some languages are accepted by a non-deterministic PDA but not by any
deterministic PDA.