Compiler Design Unit 2
The parser collects a sufficient number of tokens and builds a parse tree. While building the parse tree,
the parser detects syntactic errors, if any. It is also necessary that the parser recover from
commonly occurring errors so that processing of the remaining input can continue.
The role of the parser (source code → scanner → tokens → parser → IR, with errors reported along the way). The parser:
• performs context-free syntax analysis
• guides context-sensitive analysis
• constructs an intermediate representation
• produces meaningful error messages
• attempts error correction
In the compiler model, the parser obtains a string of tokens from the lexical analyser, and verifies that the
string can be generated by the grammar for the source language.
The parser reports any syntax errors in the source program.
Error-Recovery Strategies:
o There are many different general strategies that a parser can employ to recover from a syntactic error.
1. Panic mode
2. Phrase level
3. Error production
4. Global correction
o Panic mode:
• This is used by most parsing methods.
• On discovering an error, the parser discards input symbols one at a time until one of a designated set
of synchronizing tokens (delimiters, such as semicolon or end) is found.
• Panic mode correction often skips a considerable amount of input without checking it for additional
errors.
• It is simple.
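The skip-to-delimiter step above can be sketched as follows (the token list and the synchronizing set {';', 'end'} are illustrative assumptions):

```python
# Panic-mode recovery sketch: on detecting an error, discard input
# symbols one at a time until a synchronizing token is found, then
# resume parsing just after it. The token representation and the
# synchronizing set are assumptions for illustration.

SYNC_TOKENS = {";", "end"}

def panic_mode_recover(tokens, pos):
    """Return the position at which parsing should resume."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1                       # discard one symbol at a time
    return min(pos + 1, len(tokens))   # step past the delimiter

# Error detected at position 1: skip '@', '+', 'y', resume after ';'
print(panic_mode_recover(["x", "@", "+", "y", ";", "z"], 1))  # 5
```

Note how everything between the error point and the semicolon is skipped without being checked for further errors, which is exactly the drawback mentioned above.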
o Phrase-level recovery:
• On discovering an error, the parser may perform local correction on the remaining input; i.e., it may
replace a prefix of the remaining input by some string that allows the parser to continue.
• e.g., a local correction would be replacing a comma by a semicolon, deleting an extraneous
semicolon, or inserting a missing semicolon.
• Its major drawback is the difficulty it has in coping with situations in which the actual error has
occurred before the point of detection.
o Error productions:
• If an error production is used, the parser can generate appropriate error diagnostics to indicate the
erroneous construct that has been recognized in the input.
2
Unit 2 Prof(Dr) Anil Kumar
o Global correction:
• Given an incorrect input string x and grammar G, the algorithm will find a parse tree for a related
string y, such that the number of insertions, deletions and changes of tokens required to transform x
into y is as small as possible.
Context-Free Grammars.
A context-free grammar (CFG) is a set of recursive rewriting rules (or productions) used to generate
patterns of strings. A CFG consists of the following components:
• a set of terminal symbols, which are the characters of the alphabet that appear in the strings
generated by the grammar
• a set of nonterminal symbols, which are placeholders for patterns of terminal symbols that can
be generated by the nonterminal symbols.
• a set of productions, which are rules for replacing (or rewriting) nonterminal symbols (on the
left side of the production) in a string with other nonterminal or terminal symbols (on the right
side of the production).
• a start symbol, which is a special nonterminal symbol that appears in the initial string generated
by the grammar.
To generate a string of terminal symbols from a CFG, we:
• Begin with a string consisting of the start symbol;
• Apply one of the productions with the start symbol on the left hand side, replacing the start
symbol with the right hand side of the production;
• Repeat the process of selecting nonterminal symbols in the string, and replacing them with the
right hand side of some corresponding production, until all nonterminals have been replaced by
terminal symbols.
For example: Simple Arithmetic Expression
• An integer is an arithmetic expression
• If Expr1 and Expr2 are arithmetic expressions, then so are the following:
Expr 1 – Expr 2
Expr 1 / Expr 2
(Expr1)
The corresponding CFG:
Expr → INTLITERAL              E → intlit
Expr → Expr MINUS Expr         E → E – E
Expr → Expr DIVIDE Expr        E → E / E
Expr → LPAREN Expr RPAREN      E → (E)
A more compact way to write above grammar:
E → intlit
|E–E
|E/E
| (E)
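The derivation process described above can be sketched with a small rewriting loop (the dict encoding and the terminal name `intlit` are assumptions for illustration):

```python
# A small rewriting loop for the compact E grammar above: repeatedly
# replace the leftmost nonterminal until only terminals remain.

GRAMMAR = {
    "E": [["intlit"], ["E", "-", "E"], ["E", "/", "E"], ["(", "E", ")"]],
}

def derive(choices):
    """Apply production choices (indices into GRAMMAR["E"]) to the
    leftmost nonterminal until only terminals remain."""
    form = ["E"]                                # start with the start symbol
    for c in choices:
        i = next(k for k, s in enumerate(form) if s in GRAMMAR)
        form[i:i+1] = GRAMMAR[form[i]][c]       # rewrite leftmost nonterminal
    return " ".join(form)

# E => E - E => intlit - E => intlit - (E) => intlit - (intlit)
print(derive([1, 0, 3, 0]))  # intlit - ( intlit )
```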
Simplification of CFG
Simplifying a CFG is important because a simplified grammar is easier to process. Simplifying a
CFG involves 3 steps:
1. Removal of Null Variable
2. Removal of Unit Production
3. Removal of Useless Variable
1. Removal of Nullable Variables
A variable that can derive λ (ε) is classified as a nullable variable. E.g. A→ε
If a production, after any number of steps, cannot generate a string of terminals, that production is
useless. Therefore, it needs to be eliminated from the grammar.
For example:
S→a|b [These productions directly generate strings, so they are not useless]
A→B [There is a loop here; no string can ever be generated from these variables,
B→C [therefore they are useless variables and need to be eliminated]
C→A
For Example
S→aS|A|C
A→a
B→aa
C→ aCb
Find all variables that can produce a string of only terminals. Here we can see that A and B are the
only variables that directly generate terminal strings.
So make a set of them: {A, B}.
Now check each production for occurrences of variables from this set.
We find S→A in the first production, so S also generates a terminal string; add it to the set: {A, B, S}.
Keep only the variables that produce terminal strings, {A, B, S}; the rest of the variables are
useless.
In the above grammar, we can see that C is the only variable not present in the set. Hence it is a
useless variable, so eliminate it.
S→aS|A
A→a
B→aa [B is not connected to the first production, so it needs to be eliminated]
Even though B→aa derives a terminal string, B is not reachable from S, so it is a useless variable; eliminate it also.
Keep only those productions in the grammar whose variables are reachable from S. So the final
grammar is:
S→aS|A
A→a
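The two checks used above — which variables generate terminal strings, and which are reachable from S — can be sketched as a small fixpoint computation (the encoding, with uppercase letters as variables and RHS alternatives as strings, is an assumption for illustration):

```python
# Useless-symbol elimination in two passes:
# 1) keep variables that can generate a string of terminals,
# 2) keep variables reachable from the start symbol.

def generating(grammar):
    """Variables that derive a terminal string, by fixpoint iteration."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for var, rhss in grammar.items():
            if var not in gen and any(
                all(not s.isupper() or s in gen for s in rhs)
                for rhs in rhss):
                gen.add(var)
                changed = True
    return gen

def reachable(grammar, start):
    """Variables reachable from the start symbol."""
    seen, work = set(), [start]
    while work:
        v = work.pop()
        if v in seen or v not in grammar:
            continue
        seen.add(v)
        for rhs in grammar[v]:
            work.extend(s for s in rhs if s.isupper())
    return seen

def simplify(grammar, start):
    gen = generating(grammar)
    # drop non-generating variables and any rule mentioning them
    g = {v: [r for r in rhss if all(not s.isupper() or s in gen for s in r)]
         for v, rhss in grammar.items() if v in gen}
    keep = reachable(g, start)
    return {v: rhss for v, rhss in g.items() if v in keep}

# The example grammar: S→aS|A|C, A→a, B→aa, C→aCb
g = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
print(simplify(g, "S"))   # {'S': ['aS', 'A'], 'A': ['a']}
```

The result matches the hand derivation: C is dropped as non-generating, B as unreachable.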
For example:
S→AB
A→a|B [A→a; A→B (Unit Production)]
B→b|C [B→b; B→C(unit production)]
C→aC
D→b
In the above grammar, A→a, B→b and D→b derive terminal strings, but C→aC never derives a string
of terminals, so C is a useless variable and needs to be eliminated.
S→AB
A→a
B→b
D→b [This production is not reachable from S, so it needs to be eliminated
as well]
So the final grammar, after eliminating useless productions, is:
S→AB
A→a
B→b
For Example
S→AB|a
A→a|BC
B→aC|bB
C→aB|bC
D→b
In the above grammar, B and C do not derive any terminal string, so B→aC|bB and C→aB|bC are useless
productions and need to be eliminated.
S→AB|a
A→a|BC
D→b [This production is unreachable to S, so it is useless, eliminate it also]
Also, we need to discard every production containing B or C from the above grammar:
S→a
A→a
Since the start symbol S now derives the terminal a directly and no production uses A, A→a is discarded as well.
So final simplified grammar is:
S→a
Eliminating ambiguity
If a grammar is not ambiguous, we call it an unambiguous grammar. If a grammar has ambiguity, it is not good for
compiler construction. No method can automatically detect and remove the ambiguity, but we can remove the
ambiguity by re-writing the whole grammar without ambiguity.
If the RHS of more than one production starts with the same symbol, then such a grammar is called
a Grammar With Common Prefixes.
Example-
• This kind of grammar creates a problematic situation for Top down parsers.
• Top down parsers cannot decide which production must be chosen to parse the string in hand.
To remove this confusion, we use left factoring.
Left factoring is a process by which the grammar with common prefixes is transformed to make
it useful for Top down parsers.
In left factoring,
• We make one production for each common prefix.
• The common prefix may be a terminal or a non-terminal or a combination of both.
• Rest of the derivation is added by new productions.
The grammar obtained after the process of left factoring is called as Left Factored Grammar.
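A single left-factoring step can be sketched as follows (the string-based encoding, the primed-name convention, and `eps` for an empty tail are assumptions for illustration):

```python
# One left-factoring step: pull out the longest common prefix of all
# alternatives and introduce a primed nonterminal for the tails.

from os.path import commonprefix   # character-wise common prefix

def left_factor_once(head, alternatives):
    """Return the productions after factoring one common prefix."""
    prefix = commonprefix(alternatives)
    if not prefix:
        return {head: alternatives}         # nothing to factor
    new = head + "'"
    tails = [alt[len(prefix):] or "eps" for alt in alternatives]
    return {head: [prefix + new], new: tails}

# Problem-02, step 1:  A → aAB / aBc / aAc
print(left_factor_once("A", ["aAB", "aBc", "aAc"]))
# {'A': ["aA'"], "A'": ['AB', 'Bc', 'Ac']}
```

As in Problem-02, the result may itself still contain common prefixes, so the step is repeated until none remain.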
Problem-01:
Problem-02:
Do left factoring in the following grammar-
A → aAB / aBc / aAc
Solution-
Step-01:
A → aA’
A’ → AB / Bc / Ac
Again, this is a grammar with common prefixes.
Step-02:
A → aA’
A’ → AD / Bc
D→B/c
This is a left factored grammar.
Problem-03:
Solution-
Step-01:
S → bSS’ / a
S’ → SaaS / SaSb / b
Again, this is a grammar with common prefixes.
Step-02:
S → bSS’ / a
S’ → SaA / b
A → aS / Sb
This is a left factored grammar.
Problem-04:
Solution-
Step-01:
S → aS’ / b
S’ → SSbS / SaSb / bb
Again, this is a grammar with common prefixes.
Step-02:
S → aS’ / b
S’ → SA / bb
A → SbS / aSb
This is a left factored grammar.
Problem-05:
Solution-
Step-01:
S → aS’
S’ → b / bc / bcd / ∈
Again, this is a grammar with common prefixes.
Step-02:
S → aS’
S’ → bA / ∈
A → c / cd / ∈
Again, this is a grammar with common prefixes.
Step-03:
S → aS’
S’ → bA / ∈
A → cB / ∈
B→d/∈
This is a left factored grammar.
Problem-06:
Solution-
• A production of a grammar is said to have left recursion if the leftmost symbol of its RHS is the
same as the variable on its LHS.
• A grammar containing a production having left recursion is called as Left Recursive Grammar.
Example-
S → Sa / ∈
(Left Recursive Grammar)
Left recursion is eliminated by converting the grammar into a right recursive grammar.
For a left-recursive pair of productions A → Aα / β, we can eliminate left recursion by replacing the pair with:
A → βA’
A’ → αA’ / ∈
(Right Recursive Grammar)
E→E+T/T
T→TxF/F
F → id
Solution-
E → TE’
E’ → + TE’ / ∈
T → FT’
T’ → x FT’ / ∈
F → id
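The A → Aα / β transformation can be sketched as follows (the list-of-symbols encoding and the `eps` marker for ∈ are assumptions for illustration):

```python
# Immediate left recursion  A → Aα / β  becomes  A → βA',  A' → αA' / eps.

def eliminate_left_recursion(head, alternatives):
    recursive = [alt[1:] for alt in alternatives if alt[0] == head]  # α parts
    others = [alt for alt in alternatives if alt[0] != head]         # β parts
    if not recursive:
        return {head: alternatives}          # no left recursion to remove
    new = head + "'"
    return {
        head: [beta + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [["eps"]],
    }

# E → E + T / T  becomes  E → T E',  E' → + T E' / eps
print(eliminate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Applying the same function to T → T x F / F yields the second half of the solution.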
Predictive Parsing
It is a top-down parsing method of syntax analysis in which a set of recursive procedures is used to
process the input string, with one procedure associated with each non-terminal of the grammar.
It uses an explicit stack and parsing table to do deterministic top down parsing.
Consider how we would like to parse a program in the little programming language.
• stmt → if-stmt | while-stmt | begin-stmt | assg-stmt
• if-stmt → if bool-expr then stmt else stmt
• while-stmt → while bool-expr do stmt
• begin-stmt → begin stmt-list end
• stmt-list → stmt | stmt ; stmt-list
• bool-expr → arith-expr compare-op arith-expr
• compare-op → < | > | <= | >= | = | !=
• We read the lexed program token by token from the start.
The start symbol of the grammar is stmt.
• Suppose the first lexical class in the program is begin.
• From this information, we can tell that the first production in the syntax tree must be
• stmt → begin-stmt
• We thus have to parse the program as begin-stmt.
• We now see that the next production in the syntax tree has to be
• begin-stmt → begin stmt-list end
• We thus have to parse the full program begin …………. as begin stmt-list end.
• We can thus step over begin, and proceed to parse the remaining program ………………….. as
stmt-list end, etc.
Algorithm
Let X be the symbol on top of the stack and a the current input symbol (the lookahead).
• If X == a == $ [the top of the stack and the lookahead are both $], the string has been
successfully parsed.
• Parsing is complete.
• If X == a ≠ $ [the terminal on top of the stack matches the lookahead, meaning we have already
derived a; to derive the next terminal, we discard a]:
• POP the top-of-stack symbol
• and increment the input pointer.
• If X is a variable, consult the predictive table entry T[X, a]:
• if T[X, a] holds a production X→uvw,
• replace X on the stack by uvw in reverse order (so that u ends up on top).
• If T[X, a] is blank [there is no entry for variable X and terminal a], then there is a
parsing error:
• ERROR
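The algorithm above can be sketched as a table-driven loop (the toy LL(1) grammar S → aSb / c and its predictive table are assumptions for illustration):

```python
# Table-driven predictive parsing loop, following the steps above.

TABLE = {("S", "a"): ["a", "S", "b"], ("S", "c"): ["c"]}

def predictive_parse(inp):
    tokens = list(inp) + ["$"]
    stack = ["$", "S"]                      # start symbol on top of $
    i = 0
    while True:
        X, a = stack[-1], tokens[i]
        if X == a == "$":
            return True                     # string successfully parsed
        if X == a:
            stack.pop()                     # matched terminal:
            i += 1                          # pop and advance input pointer
        elif (X, a) in TABLE:
            stack.pop()
            stack.extend(reversed(TABLE[(X, a)]))  # push RHS in reverse
        else:
            return False                    # blank table entry: ERROR

print(predictive_parse("aacbb"))  # True
print(predictive_parse("aacb"))   # False
```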
To check whether a grammar is LL(1), First is computed with three rules, depending on the production:
• First(x) = { x }, if x is a terminal
• For example: First(a) = {a}; First(abc) = {a} [for a string, the first terminal of the string becomes its First]
• If x → λ or x ⇒* λ, then add λ to FIRST(x)
• If X → Y1 Y2 Y3 … Yk with k ≥ 1, then First(X) depends on First(Y1): add First(Y1) − {λ};
if λ ∈ First(Y1), continue with First(Y2), and so on.
First and Follow
First(A) gives the set of all terminals that may begin strings derived from A.
For example:
A→abc | def | ghr
Written production by production:
A→abc [First(A) = {a}]
A→def [First(A) = {a, d}]
A→ghr [First(A) = {a, d, g}]
To determine First(A), take the first terminal of each string derivable from the above productions.
For Example
S→AB | b | c
A→a
Determine First of A and First of S
First (S) = {
First (A) = {a}
While determining First(S): in the above grammar, S→AB is the first production and its first symbol
is the non-terminal A, so substitute A's production A→a:
S→AB → aB
First(S) = {a
The next production is S→b; its first symbol is a terminal, so add it directly: First(S) = {a, b
The next production is S→c; its first symbol is a terminal, so add it directly: First(S) = {a, b, c}
First (S) = {a, b, c}
First (A) = {a}
The problem here is that both B and C have λ-productions, so B→λ or C→λ is possible. In that case
A→BC can derive λ directly, so λ must also be included in First(A):
First(A) = {d, g, h, λ}
The next production set is S→AaB | CbB | Ba.
S→AaB [the first symbol of this production is the non-terminal A, so we need First(A) = {d, g, h, λ}]
Assign it to First(S):
First(S) = {d, g, h [we cannot add λ directly, because A is not the whole
right-hand side; λ in First(A) means A can vanish, so we
substitute the next symbol instead]
S→AaB → aB [so there is the possibility that ‘a’ comes first in a string derived from S]
First(S) = {d, g, h, a
S→CbB [the first symbol of this production is the non-terminal C, so we need
First(C) = {h, λ};
h already exists in First(S) = {d, g, h, a,
and we cannot add λ because C is followed by more symbols, so
substitute the next symbol]
S→CbB → bB [so add ‘b’ as the next element of First(S)]
First(S) = {d, g, h, a, b}
After finding the First sets of the above grammar, we next need to find the Follow sets.
Whenever you are finding the Follow sets of a grammar, always start from the first production.
S→AaB | CbB | Ba
A→da | BC
B→g | λ
C→h | λ
Initially, we place $ (a special end-marker symbol) in the Follow of the start symbol.
S→AaB | CbB | Ba
Check whether S occurs on the right-hand side of any production in the grammar; it does not,
so stop here.
Follow(S) = {$}
A→da | BC
A occurs on the right-hand side of only one production, S→AaB, where the symbol after A is the
terminal a, so:
Follow(A) = {a}
B→g | λ [Find the Follow of B in the whole grammar; B occurs in four places. In S→AaB and S→CbB,
B occurs at the end, so Follow(B) includes Follow(S): Follow(B) = {$
S→Ba [the next symbol is a: Follow(B) = {$, a
In A→BC, the symbol after B is C, so add First(C) − {λ} = {h}: Follow(B) = {$, a, h
Since λ ∈ First(C), C can vanish and B then stands at the end of A→BC, so we also add
Follow(A) = {a};
‘a’ already exists in Follow(B), so there is nothing more to add; stop.]
Follow(B) = {$, a, h}
C→h | λ [In the grammar, C occurs in two places. In S→CbB, C is followed by the terminal b
itself, so add it to Follow(C):
Follow(C) = {b
In A→BC, C occurs at the end, so add Follow(A) = {a}]
Follow(C) = {b, a}
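The First and Follow sets computed by hand above can be checked with a small fixpoint computation (the encoding, with `""` standing for λ, is an assumption for illustration):

```python
# Fixpoint computation of First and Follow for the worked grammar.
# Uppercase symbols are nonterminals; "" (the empty string) encodes λ.

GRAMMAR = {
    "S": [["A", "a", "B"], ["C", "b", "B"], ["B", "a"]],
    "A": [["d", "a"], ["B", "C"]],
    "B": [["g"], [""]],
    "C": [["h"], [""]],
}
START = "S"

def first_of_seq(seq, first):
    """First set of a sequence of symbols, with "" meaning λ."""
    out = set()
    for s in seq:
        f = first[s] if s in GRAMMAR else {s}
        out |= f - {""}
        if "" not in f:
            return out
    out.add("")                      # every symbol was nullable
    return out

def compute_first_follow():
    first = {v: set() for v in GRAMMAR}
    follow = {v: set() for v in GRAMMAR}
    follow[START].add("$")           # $ goes into Follow of the start symbol
    changed = True
    while changed:                   # iterate until no set grows
        changed = False
        for v, rhss in GRAMMAR.items():
            for rhs in rhss:
                body = [s for s in rhs if s != ""]
                f = first_of_seq(body, first) if body else {""}
                if not f <= first[v]:
                    first[v] |= f
                    changed = True
                for i, s in enumerate(body):
                    if s not in GRAMMAR:
                        continue
                    rest = first_of_seq(body[i+1:], first) if body[i+1:] else {""}
                    add = (rest - {""}) | (follow[v] if "" in rest else set())
                    if not add <= follow[s]:
                        follow[s] |= add
                        changed = True
    return first, follow

first, follow = compute_first_follow()
print(sorted(first["S"]))   # ['a', 'b', 'd', 'g', 'h']
print(sorted(follow["B"]))  # ['$', 'a', 'h']
```

The output agrees with the hand-computed First(S) = {d, g, h, a, b} and Follow(B) = {$, a, h}.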
To verify whether the grammar is LL(1) or not, we need to design a Predictive Table (Predictive Parser
Table).
To fill the table, take each production individually and add it to the table.
     a              b       d       g              h              $
S    S→AaB, S→Ba    S→CbB   S→AaB   S→AaB, S→Ba    S→AaB, S→CbB
A    A→BC                   A→da    A→BC           A→BC
B    B→λ                            B→g            B→λ            B→λ
C    C→λ            C→λ                            C→h
The next production is S→CbB. Its first symbol is C with First(C) = {h, λ}, so add S→CbB in row S
under column h.
This cell (S, h) already has one production, so a second production is added under it.
Since λ ∈ First(C), substitute the next symbol: S→CbB → bB, so also add S→CbB under column b.
The next production is S→Ba. First(B) = {g, λ}, so add S→Ba under column g.
Since λ ∈ First(B), substitute the next symbol: S→Ba → a, so also add S→Ba under column a.
Finally, the λ-productions are left, i.e. B→λ and C→λ.
As you already know, when you have λ you need to check the Follow of the variable.
Follow(B) = {$, a, h} [so include B→λ in the predictive table in cells (B, $), (B, a) and (B, h)]
Follow(C) = {b, a} [so include C→λ in the predictive table in cells (C, b) and (C, a)]
So the table is complete, but there is a problem: in several places there are two productions in the same
cell.
So the above grammar is not an LL(1) grammar.
For an LL(1) grammar, there must be only a single production in each cell of the predictive table.
Operator-Precedence Parser
• It is a bottom-up parsing method:
• we construct the parse tree from the bottom to the top.
• It can parse only operator precedence grammars.
• What is Operator Precedence Grammar?
• A grammar G is said to be operator precedence if it possesses the following two properties:
• I. No production has ε (λ) on its right side.
• II. No production rule has two adjacent non-terminals on its right-hand side.
• It is a small but important class of grammars,
• because it is used to define mathematical expressions.
• An operator precedence parser interprets an operator precedence grammar.
• For Example:
• E → E A E /( E ) / - E / id
• A→+/-/*///^
• In this grammar, we can see that the first condition is fulfilled,
• but the 2nd condition is not: in E A E there are 3 adjacent non-terminals, which violates the 2nd condition.
• Therefore the above grammar is not an operator precedence grammar.
• But if we substitute the productions A → + / - / * / / / ^, then we get an operator precedence
grammar:
• E → E + E / E – E / E * E / E / E / E ^ E / id
• The above grammar satisfies both conditions: no RHS is ε and no RHS has two adjacent non-terminals,
• so the grammar is an operator precedence grammar.
PRECEDENCE TABLE
⚫ $ id1 + id2 * id3 $ => $ <· id1 ·> + <· id2 ·> * <· id3 ·> $
⚫ To find the handle:
⚫ Scan the input left to right until the first ·>
⚫ Then scan right to left until the first <·
⚫ The handle is between <· and ·>
⚫ In the example above, the handle is id1
⚫ Then substitute the LHS non-terminal for the handle => $ E + id2 * id3 $
⚫ And so on ..
⚫ If the string is reduced to E1 + E2 * E3, we have two candidate handles:
⚫ E1 + E2 and E2 * E3
⚫ Which of them do we choose?
⚫ Reduce the string to $ + * $ by removing the non-terminals
⚫ Insert precedence relations: $ <· + <· * ·> $
⚫ The handle is: *
⚫ Insert the non-terminals around it: E2 * E3
For example
Construct Operator Precedence Parser for the following Grammar:
E → E A E / id
A → + / *
Then parse the following string:
id + id * id
First, we need to check whether the given grammar is an operator precedence grammar.
The above grammar is not an operator precedence grammar, because there are 3 adjacent non-terminal
symbols (E A E).
Solution
• Step 1: Convert it to an operator precedence grammar by substituting the productions of A → + / *:
• E → E + E / E * E / id
• Now construct the operator precedence table for the above grammar; first find the terminal symbols
of the grammar.
• Here there are 3 terminal symbols: +, *, id
• In addition to these terminals, we add one extra terminal symbol, $.
BASIC PRINCIPLE
• Scan the input string left to right, try to detect ·> and put a pointer on its location.
• Now scan backwards till reaching <·.
• The string between <· and ·> is our handle.
• Replace the handle by the head of the respective production.
• REPEAT until reaching the start symbol.
• Apply the above rules to parse the given string:
• id + id * id
Step 1:
• Insert the $ symbol at the start and end of the input string.
• Insert precedence relations between every two symbols of the string by referring to the designed
precedence table.
Step 2:
• Start scanning the string from the left until ·> and put a pointer on its location.
• Now scan the string backwards, from right to left, until <·.
• Everything between the two relations <· (left) and ·> (right) forms the handle.
• Replace the handle with the head of the respective production.
• Repeat these steps until you reach the start symbol.
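The steps above can be sketched as a driver loop (the precedence table below is an assumed hand-built table for E → E + E / E * E / id, with the usual precedence of * over +; "<" and ">" stand for <· and ·>):

```python
# Operator-precedence parsing sketch: shift while the relation is <· or =,
# reduce the handle when the relation is ·>.

PREC = {
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
}

def topmost_terminal(stack):
    """Precedence relations are defined between terminals only."""
    for s in reversed(stack):
        if s != "E":
            return s

def op_prec_parse(tokens):
    """Parse, returning the handles in the order they were reduced."""
    stack, inp, i, handles = ["$"], tokens + ["$"], 0, []
    while True:
        t, a = topmost_terminal(stack), inp[i]
        if t == "$" and a == "$":
            return handles                    # accept
        rel = PREC.get((t, a))
        if rel in ("<", "="):
            stack.append(a)
            i += 1                            # shift
        elif rel == ">":                      # reduce: pop the handle
            handle = [stack.pop()]
            while handle[-1] == "E" or \
                  PREC.get((topmost_terminal(stack), handle[-1])) != "<":
                handle.append(stack.pop())
            if stack[-1] == "E":              # non-terminal left of the handle
                handle.append(stack.pop())
            handles.append(" ".join(reversed(handle)))
            stack.append("E")                 # replace handle by E
        else:
            raise SyntaxError(f"no relation between {t} and {a}")

print(op_prec_parse(["id", "+", "id", "*", "id"]))
# ['id', 'id', 'id', 'E * E', 'E + E']
```

Note that E * E is reduced before E + E, exactly as the precedence relations dictate.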
First you need to build the operator precedence table for these operators: a, (, ), , (comma) and $.
LR Parser
It is a bottom-up parsing technique that efficiently handles deterministic context-free languages in
guaranteed linear time.
LR parsers are used to parse a large class of CFGs.
This technique is called LR(k) parsing:
• “L” stands for the Left-to-right scanning of the input
• “R” is for constructing a Rightmost derivation in reverse
• “k” is the number of input symbols of lookahead used in making parsing decisions
Principle Behind LR Parsing
• Does a right most derivation in reverse
• End with the root non-terminal on the stack
• Start with empty stack
• Uses the stack for designating what is already seen
• Build the parse tree bottom-up
• Tries to recognize a right-hand side on the stack, pops it, and pushes the corresponding
non-terminal
• Reduces to the non-terminal
• Reads the terminals while it pushes them on the stack
• Performs a post-order traversal of the parse tree
WHY LR PARSING:
The disadvantage is that it takes too much work to construct an LR parser by hand for a typical
programming-language grammar, but there are lots of LR parser generators available to make this task
easy. However, if the grammar contains ambiguities, it is difficult to parse deterministically in a
left-to-right scan of the input.
Ambiguity
If a grammar has more than one derivation for a single sentential form, then it is ambiguous.
<stmt> :: if <expr> then < stmt >
| if <expr> then <stmt> else <stmt>
|…………
Consider: if E1 then if E2 then S1 else S2
• This has two derivations (to which if does the else belong?)
• The ambiguity is purely grammatical
• It is an ambiguous CFG
Ambiguity may be eliminated by rearranging the grammar:
<stmt> :: <matched>
| <unmatched>
<matched> :: if <expr> then <matched> else <matched>
|………
<unmatched> :: if <expr> then <stmt>
| if <expr> then <matched> else <unmatched>
Ambiguity is often due to confusion in the context-free specification; such confusion arises from overloading.
LR Parsing Algorithm
token = next_token( )
repeat forever
    S := top of stack
    if action[S, token] = “shift Si” then
        PUSH token
        PUSH Si
        token = next_token( )
    else if action[S, token] = “reduce A ::= β” then
        POP 2 * |β| symbols
        S = top of stack
        PUSH A
        PUSH goto[S, A]
    else if action[S, token] = “accept” then
        return
    else
        error( )
end
LR(0) Items
• For any production of a grammar, placing a dot (.) at different positions creates different items.
• A→xyz
• We get the different items of this production by moving the dot:
• A→.xyz [nothing after the dot has been seen yet; the whole RHS is still invisible]
• A→x.yz [in this item x is visible (already seen); yz is invisible]
• A→xyz. [the entire RHS has been seen]
• A→ε has the single item A→. [an ε on the RHS contributes nothing]
• Likewise, A→wε is the same as A→w, with items:
• A→.wε
• A→w.ε
• A→w.
Transitions
There will be a transition from one state to another for each grammar symbol that
immediately follows the marker • in an item of that state.
If an item in the state is [A → • X], then
the transition out of that state occurs when the symbol X is processed.
The transition is to the state that is the closure of the item [A → X •].
LR-Parsing model:
The LR parser is an important algorithm. It requires one input buffer, one stack, and an LR parsing
table, driven by the LR parsing program.
To construct the LR parsing table, we need the canonical collection of LR(0) items.
Construct the canonical collection of LR(0) items and the LR(0) parsing table of the following
grammar:
S→AA
A→aA | b
Just as an LL(1) parsing table needs the First and Follow of the given grammar, constructing an LR(0)
parsing table needs the closure and goto functions.
Whenever you proceed with LR(0), you need to add one more production to the existing grammar, i.e.
S’→S, to make the items canonical.
Augmented grammar
S’→S
S→AA
A→aA | b
The table is indexed by state and symbol. The states have already been created and the symbols are
given by the grammar; now the actions need to be created within the cells. The goto function defines
the transitions between closures.
• If the dot is at the end of an item, the action is a reduction
• If the symbol is a non-terminal, the action is goto
• If the symbol is a terminal, the action is shift
The ACTION and GOTO structure is common to all LR parse tables; only the entries differ, depending
on which items are final in each state.
LR(0) Parse Table
State        ACTION                 GOTO
           a      b      $        A     S
0          S3     S4              2     1
1                        Accept
2          S3     S4              5
3          S3     S4              6
4          r3     r3     r3
5          r1     r1     r1
6          r2     r2     r2
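The table above can be driven by the LR parsing algorithm from the earlier section. A sketch (here only state numbers are pushed on the stack, so a reduce pops |β| entries rather than 2·|β| — a simplification of the pseudocode):

```python
# The LR(0) table above, encoded and driven by the LR parsing loop.
# Rules: 1: S→AA, 2: A→aA, 3: A→b.

ACTION = {
    (0, "a"): ("s", 3), (0, "b"): ("s", 4),
    (1, "$"): ("acc",),
    (2, "a"): ("s", 3), (2, "b"): ("s", 4),
    (3, "a"): ("s", 3), (3, "b"): ("s", 4),
}
for state, rule in [(4, 3), (5, 1), (6, 2)]:   # rows r3, r1, r2
    for sym in ("a", "b", "$"):
        ACTION[(state, sym)] = ("r", rule)

GOTO = {(0, "A"): 2, (0, "S"): 1, (2, "A"): 5, (3, "A"): 6}
RULES = {1: ("S", 2), 2: ("A", 2), 3: ("A", 1)}  # rule no. -> (head, |RHS|)

def lr_parse(inp):
    tokens = list(inp) + ["$"]
    stack = [0]                       # state stack
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False              # blank entry: error
        if act[0] == "acc":
            return True
        if act[0] == "s":             # shift: push state, advance input
            stack.append(act[1])
            i += 1
        else:                         # reduce A → β: pop |β|, then goto
            head, size = RULES[act[1]]
            del stack[-size:]
            stack.append(GOTO[(stack[-1], head)])

print(lr_parse("abb"))  # True: abb = (ab)(b) is in L(S→AA, A→aA|b)
print(lr_parse("ab"))   # False
```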
Shift-Reduce Parser
A shift-reduce parser is a bottom-up parsing technique that uses a stack. It shifts input symbols onto the
stack and reduces them based on grammar rules until the input is completely parsed. It continuously
reduces or shifts symbols until a valid parse is achieved.
Constructs parse tree for an input string beginning at the leaves (the bottom) and working towards the
root (the top)
Example: id*id, with the grammar
T → T * F | F
F → (E) | id
[The parse tree is built from the leaves upward: each id is reduced to F, F to T, and finally T * F to T.]
The general idea is to shift some symbols of input to the stack until a reduction can be applied
At each reduction step, a specific substring matching the body of a production is replaced by the
nonterminal at the head of the production
The key decisions during bottom-up parsing are about when to reduce and about what production to
apply
A reduction is a reverse of a step in a derivation
The goal of a bottom-up parser is to construct a derivation in reverse:
E=>T=>T*F=>T*id=>F*id=>id*id
Shift Reduce parser attempts for the construction of parse in a similar manner as done in bottom-up
parsing i.e. the parse tree is constructed from leaves(bottom) to the root(up). A more general form of
the shift-reduce parser is the LR parser.
This parser requires some data structures i.e.
• An input buffer for storing the input string.
• A stack for holding the grammar symbols being shifted and reduced.
Basic Operations –
• Shift: This involves moving symbols from the input buffer onto the stack.
• Reduce: If the handle appears on top of the stack then, its reduction by using appropriate
production rule is done i.e. RHS of a production rule is popped out of a stack and LHS of a
production rule is pushed onto the stack.
• Accept: If only the start symbol is present in the stack and the input buffer is empty, the
parsing action is called accept. Obtaining the accept action means parsing has completed
successfully.
• Error: This is the situation in which the parser can neither perform shift action nor reduce
action and not even accept action.
For the grammar S → (L) / a, L → L , S / S, the complete trace for the input (a,(a,a)) is:
$                (a,(a,a))$    Shift
$ (              a,(a,a))$     Shift
$ ( a            ,(a,a))$      Reduce S → a
$ ( S            ,(a,a))$      Reduce L → S
$ ( L            ,(a,a))$      Shift
$ ( L ,          (a,a))$       Shift
$ ( L , (        a,a))$        Shift
$ ( L , ( a      ,a))$         Reduce S → a
$ ( L , ( S      ,a))$         Reduce L → S
$ ( L , ( L      ,a))$         Shift
$ ( L , ( L ,    a))$          Shift
$ ( L , ( L , a  ))$           Reduce S → a
$ ( L , ( L , S  ))$           Reduce L → L , S
$ ( L , ( L      ))$           Shift
$ ( L , ( L )    )$            Reduce S → (L)
$ ( L , S        )$            Reduce L → L , S
$ ( L            )$            Shift
$ ( L )          $             Reduce S → (L)
$ S              $             Accept
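The trace above can be reproduced by a small shift-reduce sketch (the greedy reduction order and the one-token lookahead for L → S are implementation assumptions):

```python
# Shift-reduce parser sketch for S → (L) / a,  L → L , S / S.
# It reduces whenever a handle is on top of the stack, else shifts.

def reduce_top(stack, lookahead):
    """Try one reduction at the top of the stack; return the rule used."""
    if stack[-1:] == ["a"]:
        stack[-1:] = ["S"]
        return "S -> a"
    if stack[-3:] == ["(", "L", ")"]:
        stack[-3:] = ["S"]
        return "S -> (L)"
    if stack[-3:] == ["L", ",", "S"]:
        stack[-3:] = ["L"]
        return "L -> L,S"
    if stack[-1:] == ["S"] and lookahead in (",", ")"):
        stack[-1:] = ["L"]            # only inside a list context
        return "L -> S"
    return None

def shift_reduce_parse(tokens):
    stack, i = [], 0
    toks = tokens + ["$"]
    while True:
        if reduce_top(stack, toks[i]):
            continue                  # keep reducing while handles appear
        if toks[i] == "$":
            return stack == ["S"]     # accept iff only the start symbol remains
        stack.append(toks[i])         # shift
        i += 1

print(shift_reduce_parse(list("(a,(a,a))")))  # True
print(shift_reduce_parse(list("(a,)")))       # False
```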