AT&CD Unit 2
UNIT - II
Context Free Grammars and Parsing: Context free grammars, derivation, parse trees,
ambiguity, LL(K) grammars and LL(1) parsing.
Bottom-up parsing, handle pruning, LR grammar parsing, LALR parsing, parsing ambiguous
grammars, YACC programming specification.
Derivation
A derivation is a sequence of production rule applications. It is used to obtain the input string
from the start symbol through these production rules. During parsing we have to take two
decisions. These are as follows:
o We have to decide which non-terminal is to be replaced.
o We have to decide the production rule by which the non-terminal will be replaced.
We have two options for deciding which non-terminal to replace at each step.
Left-most Derivation
In the left-most derivation, the left-most non-terminal in the sentential form is replaced
first at every step. So in a left-most derivation the input string is generated from left to right.
Example:
Production rules:
S=S+S
S=S-S
S = a | b |c
Input:
a–b+c
The left-most derivation is:
S=S+S
S=S-S+S
S=a-S+S
S=a-b+S
S=a-b+c
Right-most Derivation
In the right-most derivation, the right-most non-terminal in the sentential form is replaced
first at every step. So in a right-most derivation the input string is generated from right to left.
Example:
S=S+S
S=S-S
S = a | b |c
Input:
a-b+c
The right-most derivation is:
S=S-S
S=S-S+S
S=S-S+c
S=S-b+c
S=a-b+c
Parse tree
o A parse tree is a graphical representation of a derivation. Each node of the tree is labelled
with a symbol, which can be a terminal or a non-terminal.
o In parsing, the string is derived from the start symbol. The root of the parse tree is that
start symbol.
Production rules:
T= T + T | T * T
T = a|b|c
Input:
a*b+c
The parse tree is constructed step by step (Steps 1 to 5); the intermediate tree figures are omitted here.
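For reference, one of the two possible final parse trees for a*b+c is sketched below (the grammar above is ambiguous, so a second tree that groups b+c first also exists):

          T
       /  |  \
      T   +   T
    / | \     |
   T  *  T    c
   |     |
   a     b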
Ambiguity
A grammar is said to be ambiguous if there exists more than one left-most derivation,
more than one right-most derivation, or more than one parse tree for some input string. If the
grammar is not ambiguous then it is called unambiguous.
Example:
S → aSb | SS
S → ε
For the string aabb, the above grammar generates two parse trees:
If the grammar has ambiguity, then it is not good for compiler construction. No method
can automatically detect and remove the ambiguity, but you can remove ambiguity by re-writing
the whole grammar without ambiguity.
A grammar G is said to be ambiguous if it has more than one parse tree (left or right
derivation) for at least one string.
Example
E→E+E
E→E–E
E → id
For the string id + id – id, the above grammar generates two parse trees:
A context-free language is said to be inherently ambiguous if every grammar that generates it is
ambiguous. Ambiguity in a grammar is not good for compiler construction. No method can detect and
remove ambiguity automatically, but it can be removed either by re-writing the whole grammar
without ambiguity, or by setting and following associativity and precedence constraints.
Associativity
If an operand has operators on both sides, the side on which the operator takes this
operand is decided by the associativity of those operators. If the operation is left-associative, then
the operand will be taken by the left operator; if the operation is right-associative, the right
operator will take the operand.
Example
Operations such as Addition, Multiplication, Subtraction, and Division are left associative. If the
expression contains:
id op id op id
it will be evaluated as:
(id op id) op id
For example, (id + id) + id
Operations like Exponentiation are right associative, i.e., the order of evaluation in the same
expression will be:
id op (id op id)
For example, id ^ (id ^ id)
Precedence
If two different operators share a common operand, the precedence of operators decides
which will take the operand. That is, 2+3*4 can have two different parse trees, one
corresponding to (2+3)*4 and another corresponding to 2+(3*4). By setting precedence among
operators, this problem can be easily removed. As in the previous example, mathematically *
(multiplication) has precedence over + (addition), so the expression 2+3*4 will always be
interpreted as:
2 + (3 * 4)
These methods decrease the chances of ambiguity in a language or its grammar.
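As used later in this unit, precedence and associativity can be built directly into the grammar by introducing one non-terminal per precedence level. The resulting expression grammar is unambiguous:
E → E + T | T
T → T * F | F
F → id
Because the recursion is on the left, + and * are left-associative; because * is generated at a lower level (T) than + (E), it binds more tightly.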
Left Recursion
A grammar becomes left-recursive if it has any non-terminal ‘A’ whose derivation
contains ‘A’ itself as the left-most symbol. A left-recursive grammar is considered a
problematic situation for top-down parsers. Top-down parsers start parsing from the start
symbol, which is itself a non-terminal. So, when the parser encounters the same non-terminal in
its derivation, it cannot judge when to stop expanding that non-terminal and it may go into an
infinite loop.
Example:
(1) A => Aα | β
(2) S => Aα | β
A => Sd
(1) is an example of immediate left recursion, where A is a non-terminal symbol and α, β
represent strings of terminals and non-terminals (with β not beginning with A).
(2) is an example of indirect left recursion: expanding S gives Aα, and expanding A then gives
Sdα, so S reappears as the left-most symbol and a top-down parser may go into a loop forever.
Removal of Left Recursion
One way to remove left recursion is to use the following technique:
The production
A => Aα | β
is converted into following productions
A => βA'
A'=> αA' | ε
This does not impact the strings derived from the grammar, but it removes immediate left
recursion.
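To illustrate, here is a minimal Python sketch of this transformation (the dictionary representation of the grammar and the primed name A' are just conventions chosen for this example):

def remove_immediate_left_recursion(grammar):
    # grammar: dict mapping a non-terminal to a list of alternatives,
    # each alternative being a list of symbols, e.g. {"A": [["A", "alpha"], ["beta"]]}
    new_grammar = {}
    for nt, alternatives in grammar.items():
        recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]      # A -> A alpha
        non_recursive = [alt for alt in alternatives if not alt or alt[0] != nt]   # A -> beta
        if not recursive:
            new_grammar[nt] = alternatives
            continue
        new_nt = nt + "'"
        # A  -> beta A'
        new_grammar[nt] = [alt + [new_nt] for alt in non_recursive]
        # A' -> alpha A' | epsilon (epsilon shown as the empty list)
        new_grammar[new_nt] = [alt + [new_nt] for alt in recursive] + [[]]
    return new_grammar

# Example: A -> A alpha | beta   becomes   A -> beta A',  A' -> alpha A' | epsilon
print(remove_immediate_left_recursion({"A": [["A", "alpha"], ["beta"]]}))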
The second method is to use the following algorithm, which eliminates all direct and
indirect left recursion (it assumes the grammar has no ε-productions and no cycles).
START
Arrange non-terminals in some order like A1, A2, A3,…, An
for each i from 1 to n
{
for each j from 1 to i-1
{
replace each production of the form Ai ⟹ Aj𝜸
with Ai ⟹ δ1𝜸 | δ2𝜸 | δ3𝜸 | … | δn𝜸
where Aj ⟹ δ1 | δ2 | … | δn are the current Aj productions
}
}
eliminate immediate left-recursion
END
Example
The production set
S => Aα | β
A => Sd
after applying the above algorithm, should become
S => Aα | β
A => Aαd | βd
and then, remove immediate left recursion using the first technique.
A => βdA'
A' => αdA' | ε
Now none of the production has either direct or indirect left recursion.
Left Factoring
If more than one production rule of a non-terminal has a common prefix string, then the top-
down parser cannot decide which of the productions it should take to parse the string
in hand.
Example
If a top-down parser encounters a production like
A ⟹ αβ | α𝜸 | …
Then it cannot determine which production to follow to parse the string, as both
alternatives start with the same prefix α. To remove this confusion, we
use a technique called left factoring.
Left factoring transforms the grammar to make it suitable for top-down parsers. In this
technique, we make one production for each common prefix, and the rest of the alternatives are
moved into new productions.
Example
The above productions can be written as
A => αA'
A'=> β | 𝜸 | …
Now the parser has only one production per prefix which makes it easier to take decisions.
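A minimal Python sketch of left factoring for a single non-terminal is given below. It factors only on the first symbol, which matches the A ⟹ αβ | α𝜸 case above; a full implementation would factor the longest common prefix, and the names used are only illustrative:

from collections import defaultdict

def left_factor(nt, alternatives):
    # alternatives: list of alternatives for nt, each a list of symbols
    groups = defaultdict(list)
    for alt in alternatives:
        key = alt[0] if alt else None            # group the alternatives by their first symbol
        groups[key].append(alt)
    result = {nt: []}
    for head, alts in groups.items():
        if head is None or len(alts) == 1:
            result[nt].extend(alts)              # nothing to factor for this group
        else:
            new_nt = nt + "'"                    # A  -> head A'
            # (if several groups needed factoring, distinct primed names would be required)
            result[nt].append([head, new_nt])
            result[new_nt] = [alt[1:] for alt in alts]   # A' -> rest of each alternative ([] = epsilon)
    return result

# Example: A -> alpha beta | alpha gamma   becomes   A -> alpha A',  A' -> beta | gamma
print(left_factor("A", [["alpha", "beta"], ["alpha", "gamma"]]))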
First and Follow Sets
An important part of parser table construction is to create first and follow sets. These sets
can provide the actual position of any terminal in the derivation. This is done to create the
parsing table where the decision of replacing T[A, t] = α with some production rule.
First Set
This set is created to know what terminal symbol is derived in the first position by a non-
terminal. For example,
α→tβ
That is α derives t (terminal) in the very first position. So, t ∈ FIRST(α).
Algorithm for calculating First set
Look at the definition of the FIRST(α) set:
if α is a terminal, then FIRST(α) = { α }.
if α is a non-terminal and α → ε is a production, then add ε to FIRST(α).
if α is a non-terminal and α → 𝜸1 𝜸2 𝜸3 … 𝜸n is a production, then add FIRST(𝜸1) (without ε) to
FIRST(α); if 𝜸1 can derive ε, also add FIRST(𝜸2), and so on; if every 𝜸i can derive ε, then add ε
to FIRST(α).
Follow Set
This set is created to know which terminal symbols can appear immediately to the right of a
non-terminal in some sentential form. FOLLOW(α) is computed by the following rules:
if α is the start symbol, then $ is in FOLLOW(α).
if there is a production A → βα𝜸, then everything in FIRST(𝜸) except ε is in FOLLOW(α).
if there is a production A → βα, or a production A → βα𝜸 where FIRST(𝜸) contains ε, then
everything in FOLLOW(A) is in FOLLOW(α).
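A compact Python sketch of the usual fixed-point computation of FIRST and FOLLOW is shown below. The grammar encoding (a dictionary of alternatives, with "eps" standing for ε and "$" for the end marker) is an assumption made for this example:

EPS = "eps"

def compute_first(grammar, terminals):
    # grammar: dict mapping each non-terminal to a list of alternatives,
    # each alternative being a list of symbols; [] stands for an eps-production.
    first = {t: {t} for t in terminals}
    for nt in grammar:
        first[nt] = set()
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                before = len(first[nt])
                nullable_prefix = True
                for sym in alt:
                    first[nt] |= first[sym] - {EPS}
                    if EPS not in first[sym]:
                        nullable_prefix = False
                        break
                if nullable_prefix:              # every symbol of the alternative derives eps
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                       # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                trailer = set(follow[nt])        # what can follow the end of this alternative
                for sym in reversed(alt):
                    if sym in grammar:           # sym is a non-terminal
                        before = len(follow[sym])
                        follow[sym] |= trailer   # rules 2 and 3
                        changed = changed or len(follow[sym]) != before
                        if EPS in first[sym]:
                            trailer = trailer | (first[sym] - {EPS})
                        else:
                            trailer = first[sym] - {EPS}
                    else:
                        trailer = {sym}          # sym is a terminal
    return follow

# Example: E -> E + T | T, T -> T * F | F, F -> id
grammar = {"E": [["E", "+", "T"], ["T"]],
           "T": [["T", "*", "F"], ["F"]],
           "F": [["id"]]}
first = compute_first(grammar, {"+", "*", "id"})
print(first)                                     # FIRST(E) = FIRST(T) = FIRST(F) = {id}
print(compute_follow(grammar, first, "E"))       # FOLLOW(E) = {+, $}; FOLLOW(T) = FOLLOW(F) = {+, *, $}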
Parser
The parser is the compiler phase that takes the sequence of tokens coming from the lexical
analysis phase and groups them into syntactic structures.
A parser takes input in the form of a sequence of tokens and produces output in the form of a parse
tree.
Parsing is of two types: top down parsing and bottom up parsing.
Bottom up parsing
o Bottom up parsing is also known as shift-reduce parsing.
o Bottom up parsing is used to construct a parse tree for an input string.
o In bottom-up parsing, parsing starts with the input symbols and constructs the parse
tree up to the start symbol by tracing out the right-most derivation of the string in reverse.
Example
Production
E→T
T→T*F
T → id
F→T
F → id
Parse Tree representation of input string "id * id" is as follows:
o Shift-reduce parsing performs two actions: shift and reduce. That is why it is known as
shift-reduce parsing.
o In a shift action, the current symbol of the input string is pushed onto a stack.
o In each reduction, the symbols on top of the stack that match the right side of a production
are replaced by the non-terminal on the left side of that production.
Example:
Grammar:
S → S+S
S → S-S
S → (S)
S→a
Input string:
a1-(a2+a3)
Parsing table:
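The table of parsing steps is not reproduced here. The following minimal Python sketch (an illustration, not the parser assumed by these notes) prints a stack / remaining-input / action trace for the string above, treating a1, a2 and a3 simply as the token a. A real shift-reduce parser consults a parsing table to choose between shift and reduce; this sketch greedily reduces whenever the top of the stack matches a right-hand side, which happens to be sufficient for this example:

# Grammar from the example above: S -> S+S | S-S | (S) | a
RULES = [("S", ["S", "+", "S"]), ("S", ["S", "-", "S"]), ("S", ["(", "S", ")"]), ("S", ["a"])]

def shift_reduce(tokens):
    stack, i = [], 0
    def try_reduce():
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:          # a handle is on top of the stack
                del stack[-len(rhs):]
                stack.append(lhs)
                print(f"{''.join(stack):<12} {''.join(tokens[i:]):<12} reduce {lhs} -> {''.join(rhs)}")
                return True
        return False
    while i < len(tokens) or try_reduce():
        if i < len(tokens) and not try_reduce():
            stack.append(tokens[i])               # shift the next input symbol onto the stack
            i += 1
            print(f"{''.join(stack):<12} {''.join(tokens[i:]):<12} shift")
    return stack == ["S"]                         # accept if only the start symbol remains

print(shift_reduce(list("a-(a+a)")))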
Parsing Action
The following steps describe how an operator precedence parser finds handles using the
precedence relations ⋖ and ⋗:
o Add the $ symbol at both ends of the given input string.
o Now scan the input string from left to right until a ⋗ is encountered.
o Scan towards the left over all equal-precedence relations until the first (left-most) ⋖ is encountered.
o Everything between the left-most ⋖ and the right-most ⋗ is a handle.
o $ on $ means parsing is successful.
Example
Grammar:
E → E + T | T
T → T * F | F
F → id
Given string:
w = id + id * id
Let us consider a parse tree for it as follows:
On the basis of above tree, we can design following operator precedence table:
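The table figure is not reproduced here; the relations usually given for this grammar are reconstructed below (rows show the terminal on top of the stack, columns the current input terminal):

        +       *       id      $
+       ⋗       ⋖       ⋖       ⋗
*       ⋗       ⋗       ⋖       ⋗
id      ⋗       ⋗               ⋗
$       ⋖       ⋖       ⋖       accept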
Now let us process the string with the help of the above precedence table:
LR Parser
LR parsing is one type of bottom-up parsing. It is used to parse a large class of
grammars.
In the LR parsing, "L" stands for left-to-right scanning of the input.
"R" stands for constructing a right most derivation in reverse.
"K" is the number of input symbols of the look ahead used to make number of parsing
decision.
LR parsing is divided into four parts: LR (0) parsing, SLR parsing, CLR parsing and
LALR parsing.
LR algorithm:
The LR algorithm requires a stack, an input buffer, an output and a parsing table. In all types of
LR parsing, the input, output and stack are the same, but the parsing table is different.
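The driver loop is the same for every LR parser; only the table changes. The following Python sketch shows the loop, using hand-written ACTION and GOTO tables that correspond to the LR(0) automaton for the grammar S → AA, A → aA | b discussed below (the table encoding is an assumption made for this example; the state numbering follows the item sets I0 to I6):

# Productions, numbered as in these notes: (1) S -> AA, (2) A -> aA, (3) A -> b
PRODUCTIONS = {1: ("S", ["A", "A"]), 2: ("A", ["a", "A"]), 3: ("A", ["b"])}

# ACTION[state][terminal] is ("s", j) for shift to state j, ("r", k) for reduce by
# production k, or "acc" for accept; missing entries are errors.
ACTION = {
    0: {"a": ("s", 3), "b": ("s", 4)},
    1: {"$": "acc"},
    2: {"a": ("s", 3), "b": ("s", 4)},
    3: {"a": ("s", 3), "b": ("s", 4)},
    4: {"a": ("r", 3), "b": ("r", 3), "$": ("r", 3)},
    5: {"a": ("r", 1), "b": ("r", 1), "$": ("r", 1)},
    6: {"a": ("r", 2), "b": ("r", 2), "$": ("r", 2)},
}
GOTO = {0: {"S": 1, "A": 2}, 2: {"A": 5}, 3: {"A": 6}}

def lr_parse(tokens):
    stack = [0]                                  # stack of states
    tokens = tokens + ["$"]
    pos = 0
    while True:
        state, lookahead = stack[-1], tokens[pos]
        action = ACTION.get(state, {}).get(lookahead)
        if action == "acc":
            return True
        if action is None:
            return False                         # error entry
        kind, target = action
        if kind == "s":                          # shift: push the next state, consume one token
            stack.append(target)
            pos += 1
        else:                                    # reduce by production number `target`
            lhs, rhs = PRODUCTIONS[target]
            del stack[len(stack) - len(rhs):]    # pop one state per right-hand-side symbol
            stack.append(GOTO[stack[-1]][lhs])   # go to the state for the left-hand side
            print("reduce by", lhs, "->", " ".join(rhs))

print(lr_parse(list("bab")))                     # reductions: A->b, A->b, A->aA, S->AA; then True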
LR(0) Table
o If a state goes to some other state on a terminal, then it corresponds to a shift move.
o If a state goes to some other state on a variable (non-terminal), then it corresponds to a go to move.
o If a state contains a final item (an item with • at the right-most end), then write the reduce entry
in the complete row for that state.
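The explanation below refers to the LR(0) item sets of the grammar S → AA, A → aA | b (productions (1) to (3) further down). Since the item-set figure is not reproduced here, the canonical collection is reconstructed as follows:
I0 = S` → •S
     S → •AA
     A → •aA
     A → •b
I1 = Go to (I0, S) = S` → S•
I2 = Go to (I0, A) = S → A•A
     A → •aA
     A → •b
I3 = Go to (I0, a) = Go to (I2, a) = Go to (I3, a) = A → a•A
     A → •aA
     A → •b
I4 = Go to (I0, b) = Go to (I2, b) = Go to (I3, b) = A → b•
I5 = Go to (I2, A) = S → AA•
I6 = Go to (I3, A) = A → aA•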
Explanation:
o I0 on S is going to I1 so write it as 1.
o I0 on A is going to I2 so write it as 2.
o I2 on A is going to I5 so write it as 5.
o I3 on A is going to I6 so write it as 6.
o I0, I2 and I3 on a are going to I3, so write it as S3, which means shift 3.
o I0, I2 and I3 on b are going to I4, so write it as S4, which means shift 4.
o I4, I5 and I6 all contain a final item, because the • is at the right-most end. So write the
reduce entry with the corresponding production number.
Productions are numbered as follows:
S → AA ... (1)
A → aA ... (2)
A → b ... (3)
o I1 contains the final item (S` → S•), so action {I1, $} = Accept.
o I4 contains the final item A → b•, which corresponds to production number 3, so write it
as r3 in the entire row.
o I5 contains the final item S → AA•, which corresponds to production number 1, so write it
as r1 in the entire row.
o I6 contains the final item A → aA•, which corresponds to production number 2, so write it
as r2 in the entire row.
If a state (Ii) goes to some other state (Ij) on a variable, then it corresponds to a go to
move in the Go to part.
If a state (Ii) contains a final item like A → ab•, which has no transition to a next
state, then the production is called a reduce production. In SLR (1) parsing, for all terminals X in
FOLLOW (A), write the reduce entry along with its production number.
Example
S → •Aa
A → αβ•
Follow (S) = {$}
Follow (A) = {a}
SLR (1) Grammar
S→E
E→E+T|T
T→T*F|F
F → id
Add Augment Production and insert '•' symbol at the first position for every production in G
S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •E)
Add all productions starting with E into the I0 State because "•" is followed by the non-terminal. So,
the I0 State becomes
I0 = S` → •E
E → •E + T
E → •T
Add all productions starting with T and F in modified I0 State because "." is followed by the
non-terminal. So, the I0 State becomes.
I0= S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id
I1= Go to (I0, E) = closure (S` → E•, E → E• + T)
I2 = Go to (I0, T) = Closure (E → T•, T → T• * F)
I3= Go to (I0, F) = Closure ( T → F• ) = T → F•
I4= Go to (I0, id) = closure ( F → id•) = F → id•
I5= Go to (I1, +) = Closure (E → E +•T)
Add all productions starting with T and F in I5 State because "." is followed by the non-terminal.
So, the I5 State becomes
I5 = E → E +•T
T → •T * F
T → •F
F → •id
Go to (I5, F) = Closure (T → F•) = (same as I3)
Go to (I5, id) = Closure (F → id•) = (same as I4)
I6= Go to (I2, *) = Closure (T → T * •F)
Add all productions starting with F in I6 State because "." is followed by the non-terminal. So,
the I6 State becomes
I6 = T → T * •F
F → •id
Go to (I6, id) = Closure (F → id•) = (same as I4)
I7= Go to (I5, T) = Closure (E → E + T•) = E → E + T•
I8= Go to (I6, F) = Closure (T → T * F•) = T → T * F•
Drawing DFA:
Explanation:
First (E) = First (E + T) ∪ First (T)
First (T) = First (T * F) ∪ First (F)
First (F) = {id}
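The FIRST sets describe the shift structure; for the SLR (1) reduce entries we also need the FOLLOW sets, which for this grammar work out to:
Follow (E) = {+, $}
Follow (T) = {+, *, $}
Follow (F) = {+, *, $}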
CLR (1) Parsing
In CLR (1) parsing we use LR (1) items, which carry a lookahead symbol. Consider the grammar
S → AA
A → aA | b
Add Augment Production, insert '•' symbol at the first position for every production in G and
also add the lookahead.
S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •S)
Add all productions starting with S into the I0 State because "•" is followed by the non-terminal. So,
the I0 State becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A in modified I0 State because "." is followed by the non-
terminal. So, the I0 State becomes.
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )
Add all productions starting with A in I2 State because "." is followed by the non-terminal. So,
the I2 State becomes
I2= S → A•A, $
A → •aA, $
A → •b, $
I3= Go to (I0, a) = Closure ( A → a•A, a/b )
Add all productions starting with A in I3 State because "." is followed by the non-terminal. So,
the I3 State becomes
I3= A → a•A, a/b
A → •aA, a/b
A → •b, a/b
Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)
I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b
I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $
I6= Go to (I2, a) = Closure (A → a•A, $)
Add all productions starting with A in I6 State because "." is followed by the non-terminal. So,
the I6 State becomes
I6 = A → a•A, $
A → •aA, $
A → •b, $
Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)
I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $
I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b
I9= Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $
Drawing DFA:
LALR (1) Parsing
In LALR (1) parsing, states of the CLR (1) collection that have the same items and differ only in
their lookaheads are merged. The states I8 and I9 above are such a pair, so we can combine them
into a single state called I89.
I89 = {A → aA•, a/b/$}
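By the same reasoning, I3 and I6 (items A → a•A, A → •aA, A → •b) can be merged into I36, and I4 and I7 (item A → b•) can be merged into I47, each with the combined lookaheads a/b/$.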
Drawing DFA: