CD Unit 3
UNIT 3 NOTES
UNIT 3
What is the bottom-up parsing approach; Types of bottom-up approaches; Introduction to simple LR;
Why LR parsers; Model of an LR parser; Operator precedence; Shift-reduce parsing;
Difference between LR and LL parsers; Construction of SLR tables; More powerful LR parsers;
Construction of CLR(1) and LALR parsing tables; Dangling-else ambiguity; Error recovery in LR
parsing; Comparison of all bottom-up approaches with all top-down approaches.
INTRODUCTION
Bottom-Up Parsing
A bottom-up parse corresponds to the construction of a parse tree for an input string beginning at
the leaves (the bottom) and working up towards the root (the top). It is convenient to describe
parsing as the process of building parse trees, although a front end may in fact carry out a
translation directly without building an explicit tree. The sequence of tree snapshots in Fig. 4.25
illustrates a bottom-up parse with respect to grammar (4.1). This section introduces a general style
of bottom-up parsing known as shift-reduce parsing. The largest class of grammars for which
shift-reduce parsers can be built is the class of LR grammars.
Bottom-up parsers build parse trees from the leaves and work up to the root.
Bottom-up syntax analysis is also known as shift-reduce parsing.
An easy-to-implement form of shift-reduce parsing is operator-precedence parsing.
The general method of shift-reduce parsing is LR parsing.
Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves
(the bottom) and working up towards the root (the top).
At each reduction step a particular substring matching the right side of a production is replaced
by the symbol on the left of that production, and if the substring is chosen correctly at each step,
a rightmost derivation is traced out in reverse.
Consider the grammar
S → aABe
A → Abc | b
B → d
The sentence abbcde can be reduced to S by the following steps.
abbcde
aAbcde
aAde
aABe
S
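The reduction sequence above can be replayed mechanically. The short Python sketch below applies each chosen handle in turn. The names PRODUCTIONS, STEPS and reduce_at are illustrative, not from the notes. Positions are counted from 1, the convention the handle examples in these notes use. The sketch does not find handles itself; it only checks and applies the ones supplied.

```python
# Productions of the example grammar: number -> (head, body).
PRODUCTIONS = {
    1: ("S", "aABe"),
    2: ("A", "Abc"),
    3: ("A", "b"),
    4: ("B", "d"),
}

def reduce_at(form, prod, pos):
    """Replace the body of production `prod`, found at 1-based position `pos`
    of the sentential form `form`, by the production's head."""
    head, body = PRODUCTIONS[prod]
    start = pos - 1
    if form[start:start + len(body)] != body:
        raise ValueError(f"{body!r} does not occur at position {pos} of {form!r}")
    return form[:start] + head + form[start + len(body):]

# (production, position) of the handle reduced at each step.
STEPS = [(3, 2), (2, 2), (4, 3), (1, 1)]

forms = ["abbcde"]
for prod, pos in STEPS:
    forms.append(reduce_at(forms[-1], prod, pos))

for f in forms:
    print(f)
```

Read from bottom to top, the printed forms are a rightmost derivation of abbcde.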
COMPILER DESIGN
Handles:
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the nonterminal on the left side of the production represents one step along the
reverse of a rightmost derivation.
Precise definition of a handle:
A handle of a right-sentential form γ is a production A → β and a position of γ where the string β
may be found and replaced by A to produce the previous right-sentential form in a rightmost
derivation of γ. That is, if S ⇒*rm αAw ⇒rm αβw, then A → β in the position following α is a
handle of αβw.
The string w to the right of the handle contains only terminal symbols.
In the example above, abbcde is a right-sentential form whose handle is A → b at position 2.
Likewise, aAbcde is a right-sentential form whose handle is A → Abc at position 2.
Handle Pruning:
A rightmost derivation in reverse can be obtained by handle pruning.
That is, we start with a string of terminals w that is to be parsed. If w is a sentence of the
grammar at hand, then w = γn, where γn is the nth right-sentential form of some as yet unknown
rightmost derivation S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn = w.
Example of right-sentential forms and handles for the grammar
E → E + E
E → E * E
E → (E)
E → id
LR(k) parsing: the "L" is for left-to-right scanning of the input, the "R" for constructing a
rightmost derivation in reverse, and the k for the number of input symbols of lookahead that are
used in making parsing decisions. The cases k = 0 and k = 1 are of practical interest, and we shall
only consider LR parsers with k ≤ 1 here. When (k) is omitted, k is assumed to be 1. This
section introduces the basic concepts of LR parsing and the easiest method for constructing
shift-reduce parsers, called "simple LR" (or SLR, for short). Some familiarity with the basic concepts
is helpful even if the LR parser itself is constructed using an automatic parser generator. We
begin with "items" and "parser states"; the diagnostic output from an LR parser generator
typically includes parser states, which can be used to isolate the sources of parsing conflicts.
Why LR Parsers?
LR parsers are table-driven, much like the nonrecursive LL parsers. A grammar for which we
can construct a parsing table using one of the methods in this section and the next is said to be an
LR grammar. Intuitively, for a grammar to be LR it is sufficient that a left-to-right shift-reduce
parser be able to recognize handles of right-sentential forms when they appear on top of the
stack. LR parsing is attractive for a variety of reasons:
LR parsers can be constructed to recognize virtually all programming-language constructs for
which context-free grammars can be written. Non-LR context-free grammars exist, but these can
generally be avoided for typical programming-language constructs.
The LR-parsing method is the most general nonbacktracking shift-reduce parsing method known,
yet it can be implemented as efficiently as other, more primitive shift-reduce methods (see the
bibliographic notes).
An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan
of the input.
The class of grammars that can be parsed using LR methods is a proper superset of the class of
grammars that can be parsed with predictive or LL methods. For a grammar to be LR(k), we must
be able to recognize the occurrence of the right side of a production in a right-sentential form,
with k input symbols of lookahead. This requirement is far less stringent than that for LL(k)
grammars, where we must be able to recognize the use of a production seeing only the first k
symbols of what its right side derives. Thus, it should not be surprising that LR grammars can
describe more languages than LL grammars.
Items and the LR(0) Automaton
How does a shift-reduce parser know when to shift and when to reduce? For example, with stack
contents $ T and next input symbol * in Fig. 4.28, how does the parser know that T on the top of
the stack is not a handle, so the appropriate action is to shift and not to reduce T to E? An LR
parser makes shift-reduce decisions by maintaining states to keep track of where we are in a
parse. States represent sets of "items." An LR(0) item (item for short) of a grammar G is a
production of G with a dot at some position of the body. Thus, production A → XYZ yields the
four items A → ·XYZ, A → X·YZ, A → XY·Z, and A → XYZ·.
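As a small illustration of the definition, the sketch below enumerates the items of a single production. Representing an item as a (head, symbols before the dot, symbols after the dot) triple is an assumption made for this example only.

```python
def lr0_items(head, body):
    """All LR(0) items of the production head -> body.

    An item is a (head, symbols before the dot, symbols after the dot)
    triple, so a body of n symbols yields n + 1 items."""
    return [(head, body[:i], body[i:]) for i in range(len(body) + 1)]

def show(item):
    head, before, after = item
    return f"{head} -> {' '.join(before + ('.',) + after)}"

for item in lr0_items("A", ("X", "Y", "Z")):
    print(show(item))
```

An ε-production A → ε yields the single item A → ·, which is what the function returns for an empty body.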
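Intuitively, an item indicates how much of a production we have seen at a given point in the
parsing process. For example, the item A → ·XYZ indicates that we hope to see a string
derivable from XYZ next on the input. Item A → X·YZ indicates that we have just seen on the input a
string derivable from X and that we hope next to see a string derivable from YZ. Item A → XYZ·
indicates that we have seen the body XYZ and that it may be time to reduce XYZ to A.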
Closure of Item Sets: If I is a set of items for a grammar G, then CLOSURE(I) is the set of items
constructed from I by the two rules:
1. Initially, add every item in I to CLOSURE(I).
2. If A → α·Bβ is in CLOSURE(I) and B → γ is a production, then add the item B → ·γ to
CLOSURE(I), if it is not already there. Apply this rule until no more new items can be added to
CLOSURE(I).
Intuitively, A → α·Bβ in CLOSURE(I) indicates that, at some point in the parsing process, we think we might
next see a substring derivable from Bβ as input. The substring derivable from Bβ will have a prefix derivable
from B by applying one of the B-productions. We therefore add items for all the B-productions; that is, if B → γ
is a production, we also include B → ·γ in CLOSURE(I).
Example:
Consider the augmented expression grammar E' → E, E → E + T | T, T → T * F | F, F → (E) | id, and
let I be the set containing the single item E' → ·E.
To see how the closure is computed, E' → ·E is put in CLOSURE(I) by rule (1). Since there is an E immediately
to the right of a dot, we add the E-productions with dots at the left ends: E → ·E + T and E → ·T. Now there is a T
immediately to the right of a dot in the latter item, so we add T → ·T * F and T → ·F. Next, the F to the right of a
dot forces us to add F → ·(E) and F → ·id, but no other items need to be added.
The closure can be computed as in Fig. 4.32. A convenient way to implement the function closure is to keep a
boolean array added, indexed by the nonterminals of G, such that added[B] is set to true if and when we add
the item B → ·γ for each B-production B → γ.
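Here is a minimal Python sketch of CLOSURE for the augmented expression grammar of the example. It assumes items are encoded as (production index, dot position) pairs. The `added` dictionary plays the role of the boolean array described above: once a nonterminal's productions have been added with the dot at the left end, that nonterminal is never expanded again.

```python
# Augmented expression grammar: production index -> (head, body).
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def closure(items):
    """CLOSURE(I) for LR(0) items encoded as (production index, dot) pairs."""
    result = list(dict.fromkeys(items))         # rule 1, keeping order, no duplicates
    added = {nt: False for nt in NONTERMINALS}  # added[B]: B-productions already in
    i = 0
    while i < len(result):                      # result grows while it is scanned
        prod, dot = result[i]
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS and not added[body[dot]]:
            b = body[dot]
            added[b] = True
            for j, (head, _) in enumerate(GRAMMAR):   # rule 2: add B -> .y
                if head == b and (j, 0) not in result:
                    result.append((j, 0))
        i += 1
    return result

def show(item):
    head, body = GRAMMAR[item[0]]
    return f"{head} -> {' '.join(body[:item[1]] + ('.',) + body[item[1]:])}"

for item in closure([(0, 0)]):
    print(show(item))
```

The printed items are, in order, the seven items derived step by step in the example.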
The items of interest can be divided into two classes:
1. Kernel items: the initial item S' → ·S and all items whose dots are not at the left end.
2. Nonkernel items: all items with their dots at the left end, except for S' → ·S.
The Function GOTO
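The definition of GOTO did not survive in these notes. In the standard formulation, GOTO(I, X) is the closure of the set of all items [A → αX·β] such that [A → α·Xβ] is in I. The following is a hedged Python sketch for the augmented expression grammar. It encodes each item as a (production index, dot position) pair, and the function names are illustrative.

```python
# Augmented expression grammar: production index -> (head, body).
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def closure(items):
    """CLOSURE(I): add B -> .y for every item with B right after the dot."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for j, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (j, 0) not in result:
                    result.add((j, 0))
                    work.append((j, 0))
    return frozenset(result)

def goto(items, symbol):
    """GOTO(I, X): advance the dot over X in every item of I with X after
    the dot, then close the resulting set."""
    moved = {(prod, dot + 1) for prod, dot in items
             if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol}
    return closure(moved)

I0 = closure({(0, 0)})
I1 = goto(I0, "E")   # E' -> E.  and  E -> E. + T
I2 = goto(I0, "T")   # E -> T.   and  T -> T. * F
I4 = goto(I0, "(")   # F -> ( . E ) plus its closure
```

An empty result, such as GOTO(I0, +), means the symbol cannot follow in that state.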
OPERATOR-PRECEDENCE PARSING
An efficient way of constructing a shift-reduce parser is called operator-precedence parsing.
An operator-precedence parser can be constructed from a grammar called an operator grammar. These
grammars have the property that no production right side is ε or has two adjacent nonterminals.
Example:
Consider the grammar:
E → EAE | (E) | -E | id
A → + | - | * | / | ↑
Since the right side EAE has three consecutive nonterminals, the grammar can be written as
follows:
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id
Operator precedence relations:
There are three disjoint precedence relations, namely:
<· - less than
= - equal to
·> - greater than
The relations give the following meaning:
a <· b - a yields precedence to b
a = b - a has the same precedence as b
a ·> b - a takes precedence over b
Rules for binary operations:
1. If operator θ1 has higher precedence than operator θ2, then make
θ1 ·> θ2 and θ2 <· θ1
2. If operators θ1 and θ2 are of equal precedence, then make
θ1 ·> θ2 and θ2 ·> θ1 if the operators are left associative
θ1 <· θ2 and θ2 <· θ1 if they are right associative
3. Make the following for all operators θ:
θ <· id, id ·> θ
θ <· (, ( <· θ
) ·> θ, θ ·> )
θ ·> $, $ <· θ
Also make ( = ), ( <· (, ) ·> ), ( <· id, id ·> ), $ <· id, id ·> $, $ <· ( and ) ·> $.
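To make the relations concrete, here is a minimal operator-precedence parsing loop in Python. It assumes a small table for id, + and * only, with * binding tighter than + and both left associative, as rules 1 and 2 prescribe. The relations involving id and $ follow rule 3. Only terminals are kept on the stack, and the function returns the handles in the order they are reduced. The table and function names are illustrative. The = relation is not needed here because parentheses are left out.

```python
# Precedence relations for id, +, * and the end marker $, assuming
# * binds tighter than + and both are left associative.
LT, GT = "<.", ".>"
REL = {
    "id": {"+": GT, "*": GT, "$": GT},
    "+": {"id": LT, "+": GT, "*": LT, "$": GT},
    "*": {"id": LT, "+": GT, "*": GT, "$": GT},
    "$": {"id": LT, "+": LT, "*": LT},
}

def op_precedence_parse(tokens):
    """Return the handles (as terminal strings) in the order they are reduced."""
    stack, inp, i, handles = ["$"], list(tokens) + ["$"], 0, []
    while True:
        a, b = stack[-1], inp[i]          # topmost stack terminal, current input
        if a == "$" and b == "$":
            return handles                # accept
        rel = REL[a].get(b)
        if rel == LT:                     # a <. b: shift b
            stack.append(b)
            i += 1
        elif rel == GT:                   # a .> b: pop the handle
            handle = [stack.pop()]
            # Keep popping until the new top yields precedence to the
            # terminal popped most recently.
            while REL[stack[-1]].get(handle[0]) != LT:
                handle.insert(0, stack.pop())
            handles.append(" ".join(handle))
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")

print(op_precedence_parse(["id", "+", "id", "*", "id"]))
```

For id + id * id, the * handle is reduced before the + handle, as the precedence rules require.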
Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree
for an input string beginning at the leaves (the bottom) and working up towards the root (the
top).
Example:
Consider the grammar: S → aABe
A → Abc | b
B → d
The sentence to be recognized is abbcde.
Initially, the stack holds only the bottom marker $ and the input holds w$:
Stack        Input
$            w$
The parser operates by shifting zero or more input symbols onto the stack until a handle
is on top of the stack.
The parser then reduces to the left side of the appropriate production.
The parser repeats this cycle until it has detected an error or until the stack contains the
start symbol and the input is empty:
Stack        Input
$S           $
After entering this configuration, the parser halts and announces successful completion of
parsing. There are four possible actions that a shift-reduce parser can make: 1) shift 2) reduce 3)
accept 4) error.
1. In a shift action, the next symbol is shifted onto the top of the stack.
2. In a reduce action, the parser knows the right end of the handle is at the top of the stack. It
must then locate the left end of the handle within the stack and decide with what nonterminal to
replace the handle.
3. In an accept action, the parser announces successful completion of parsing.
4. In an error action, the parser discovers that a syntax error has occurred and calls an error
recovery routine.
Note: An important fact justifies the use of a stack in shift-reduce parsing: the handle will
always eventually appear on top of the stack, never inside.
Example
Consider the grammar
E → E + E
E → E * E
E → (E)
E → id
and the input string id1 + id2 * id3. Use the shift-reduce parser to check whether the input string
is accepted by the grammar.
Viable Prefixes:
The prefixes of right-sentential forms that can appear on the stack of a shift-reduce parser
are called viable prefixes.
Conflicts during shift-reduce parsing:
There are CFGs for which shift-reduce parsing cannot be used.
Every shift-reduce parser for such a grammar can reach a configuration in which the
parser, knowing the entire stack contents and the next input symbol, cannot decide whether
to shift or to reduce (a shift/reduce conflict), or cannot decide which of several reductions
to make (a reduce/reduce conflict).
For this grammar, a shift/reduce conflict occurs on some input strings, so this grammar (Grammar
2.8.2) is not an LR(1) grammar.
LR PARSERS
An efficient bottom-up syntax analysis technique that can be used to parse a large class of
CFGs is called LR(k) parsing. The L is for left-to-right scanning of the input, the R for
constructing a rightmost derivation in reverse, and the k for the number of input symbols of
lookahead used in making parsing decisions. When k is omitted, it is assumed to be 1.
Advantages of LR parsing:
It recognizes virtually all programming language constructs for which CFG can be
written.
It is an efficient non-backtracking shift-reduce parsing method.
The class of grammars that can be parsed using the LR method is a proper superset of the class
of grammars that can be parsed with a predictive parser.
It detects a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
Drawbacks of LR method:
It is too much work to construct an LR parser by hand for a typical programming-language
grammar. A specialized tool, called an LR parser generator, is needed. Example: YACC.
Types of LR parsing method:
1. SLR- Simple LR
Easiest to implement, least powerful.
2. CLR- Canonical LR
Most powerful, most expensive.
3. LALR- Look-Ahead LR
Intermediate in size and cost between the other two methods.
An LR parser consists of: an input, an output, a stack, a driver program, and a parsing table that has
two parts (action and goto).
The driver program is the same for all LR parsers.
The parsing program reads characters from an input buffer one at a time.
The program uses a stack to store a string of the form s0 X1 s1 X2 s2 ... Xm sm, where sm is on
top. Each Xi is a grammar symbol and each si is a state.
The parsing table consists of two parts : action and goto functions.
Action : The parsing program determines sm, the state currently on top of stack, and ai, the
current input symbol. It then consults action[sm,ai] in the action table which can have one of four
values :
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept, and
4. error.
Goto : The function goto takes a state and grammar symbol as arguments and produces a state.
LR Parsing algorithm:
Input: An input string w and an LR parsing table with functions action and goto for grammar G.
Output: If w is in L(G), a bottom-up-parse for w; otherwise, an error indication.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input
buffer. The parser then executes the following program :
set ip to point to the first input symbol of w$;
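The rest of the driver program is cut off in these notes. The sketch below shows the usual LR driver loop in Python. The ACTION and GOTO entries are the standard SLR table for the expression grammar E → E + T | T, T → T * F | F, F → (E) | id. Productions are numbered (1) to (6), and states are numbered as the item sets I0 to I11 in the SLR example of these notes. The stack holds only states, since each grammar symbol Xi can be recovered from the state above it. The helper names sh, rd and lr_parse are illustrative.

```python
# Productions (1)-(6): number -> (head, length of body).
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

def sh(j):
    return ("shift", j)

def rd(k):
    return ("reduce", k)

ACC = ("accept",)
ACTION = {
    0: {"id": sh(5), "(": sh(4)},
    1: {"+": sh(6), "$": ACC},
    2: {"+": rd(2), "*": sh(7), ")": rd(2), "$": rd(2)},
    3: {"+": rd(4), "*": rd(4), ")": rd(4), "$": rd(4)},
    4: {"id": sh(5), "(": sh(4)},
    5: {"+": rd(6), "*": rd(6), ")": rd(6), "$": rd(6)},
    6: {"id": sh(5), "(": sh(4)},
    7: {"id": sh(5), "(": sh(4)},
    8: {"+": sh(6), ")": sh(11)},
    9: {"+": rd(1), "*": sh(7), ")": rd(1), "$": rd(1)},
    10: {"+": rd(3), "*": rd(3), ")": rd(3), "$": rd(3)},
    11: {"+": rd(5), "*": rd(5), ")": rd(5), "$": rd(5)},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}

def lr_parse(tokens):
    """Return the production numbers used, i.e. a rightmost derivation in reverse."""
    stack = [0]                          # s0; only states are stored
    inp = list(tokens) + ["$"]
    ip, output = 0, []                   # ip points to the first symbol of w$
    while True:
        state, a = stack[-1], inp[ip]
        act = ACTION[state].get(a)       # blank entries mean error
        if act is None:
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        if act[0] == "shift":
            stack.append(act[1])
            ip += 1
        elif act[0] == "reduce":
            head, length = PRODS[act[1]]
            del stack[len(stack) - length:]        # pop |body| states
            stack.append(GOTO[stack[-1]][head])    # push goto[s, A]
            output.append(act[1])
        else:
            return output                          # accept

print(lr_parse(["id", "*", "id", "+", "id"]))  # [6, 4, 6, 3, 2, 6, 4, 1]
```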
3. Construct the parsing action function action and the goto function using the following algorithm,
which requires FOLLOW(A) for each nonterminal A of the grammar.
Algorithm for construction of SLR parsing table:
Input: An augmented grammar G
Output: The SLR parsing table functions action and goto for G
Method:
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
(a) If [A → α·aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here a must be a
terminal.
(b) If [A → α·] is in Ii, then set action[i, a] to "reduce A → α" for all a in FOLLOW(A); here A may
not be S'.
(c) If [S' → S·] is in Ii, then set action[i, $] to "accept".
If any conflicting actions are generated by the above rules, we say the grammar is not SLR(1).
3. The goto transitions for state i are constructed for all nonterminals A using the rule:
if goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made "error".
5. The initial state of the parser is the one constructed from the set of items containing [S' → ·S].
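Steps 1 to 5 can be sketched in Python as below. This is an illustration, not the notes' own code. Items are (production index, dot) pairs, and the FOLLOW sets are the ones listed for this grammar in the example that follows. States are numbered in the order they are discovered, so apart from I0 the numbers need not match I1 to I11. Conflicting entries are collected rather than silently overwritten, which is how a grammar is detected as not SLR(1).

```python
# Augmented expression grammar; production 0 is E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMS = {head for head, _ in GRAMMAR}
# FOLLOW sets as listed for this grammar in these notes.
FOLLOW = {"E": {"$", "+", ")"}, "T": {"$", "+", ")", "*"}, "F": {"$", "+", ")", "*"}}

def closure(items):
    result, work = set(items), list(items)
    while work:
        p, d = work.pop()
        body = GRAMMAR[p][1]
        if d < len(body) and body[d] in NONTERMS:
            for q, (head, _) in enumerate(GRAMMAR):
                if head == body[d] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)

def goto(items, x):
    return closure({(p, d + 1) for p, d in items
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x})

def lr0_collection():
    """Step 1: C = {I0, ..., In}, numbered in order of discovery."""
    states = [closure({(0, 0)})]
    index, edges = {states[0]: 0}, {}
    symbols = sorted({x for _, body in GRAMMAR for x in body})
    i = 0
    while i < len(states):
        for x in symbols:
            target = goto(states[i], x)
            if target:
                if target not in index:
                    index[target] = len(states)
                    states.append(target)
                edges[i, x] = index[target]
        i += 1
    return states, edges

def slr_table():
    """Steps 2-4: fill action and goto; missing entries are errors."""
    states, edges = lr0_collection()
    action = {i: {} for i in range(len(states))}
    goto_table, conflicts = {}, []

    def set_action(i, a, act):
        old = action[i].get(a)
        if old is not None and old != act:
            conflicts.append((i, a, old, act))
        action[i][a] = act

    for i, items in enumerate(states):
        for p, d in items:
            head, body = GRAMMAR[p]
            if d < len(body) and body[d] not in NONTERMS:      # rule (a)
                set_action(i, body[d], ("shift", edges[i, body[d]]))
            elif d == len(body) and p == 0:                     # rule (c)
                set_action(i, "$", ("accept",))
            elif d == len(body):                                # rule (b)
                for a in FOLLOW[head]:
                    set_action(i, a, ("reduce", p))
        goto_table[i] = {nt: edges[i, nt] for nt in NONTERMS if (i, nt) in edges}
    return states, action, goto_table, conflicts

states, action, goto_table, conflicts = slr_table()
print(len(states), conflicts)   # 12 []
```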
Example for SLR parsing:
Construct the SLR parsing table for the following grammar:
G: E → E + T | T
T → T * F | F
F → (E) | id
The given grammar, with its productions numbered, is:
E → E + T   ------ (1)
E → T       ------ (2)
T → T * F   ------ (3)
T → F       ------ (4)
F → (E)     ------ (5)
F → id      ------ (6)
GOTO ( I4 , id )
I5 : F id .
GOTO ( I6 , T )
I9 : E E + T .
GOTO ( I0 , T)
I2 : E T .
TT.*F
TT.*F
GOTO ( I6 , F )
I3 : T F .
GOTO ( I0 , F)
I3 : T F .
GOTO ( I6 , ( )
I4 : F ( . E )
GOTO ( I0 , ( )
I4 : F ( . E)
I5 : F id .
E .E + T
E .T
T .T * F
T .F
F .(E)
F .id
GOTO ( I6 , id)
GOTO ( I7 , F )
I10 : T T * F .
GOTO ( I7 , ( )
I4 : F ( . E )
E .E + T
E .T
T .T * F
T .F
F .(E)
F .id
GOTO ( I0 , id )
I5 : F id .
GOTO ( I1 , + )
I6 : E E + . T
T .T * F
T .F
F .(E)
F .id
GOTO ( I7 , id )
I5 : F id .
15
COMPILER DESIGN
UNIT 3 NOTES
GOTO ( I8 , ) )
I11 : F ( E ) .
GOTO ( I2 , * )
I7 : T T * . F
F .(E)
F .id
GOTO ( I8 , + )
I6 : E E + . T
T.T * F
T.F
F.( E )
F.id
GOTO ( I4 , E )
I8 : F ( E . )
EE.+T
GOTO ( I4 , T)
I2 : E T .
TT.*F
GOTO ( I9 , *)
I7 : T T * . F
F.( E )
F.id
GOTO ( I4 , ( )
I4 : F ( . E)
E .E + T
E .T
T .T * F
T .F
F .(E)
F id
FOLLOW(E) = { $, ), + }
FOLLOW(T) = { $, +, ), * }
FOLLOW(F) = { *, +, ), $ }
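FOLLOW sets like these can be computed by applying the FIRST/FOLLOW rules repeatedly until no set changes. Here is a minimal Python sketch, assuming E is the start symbol and $ is the end marker. It also handles ε-productions (empty bodies), although this grammar has none. The function name is illustrative.

```python
# Expression grammar, productions (1)-(6); E is the start symbol.
GRAMMAR = [
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
START = "E"
NONTERMS = {head for head, _ in GRAMMAR}

def first_and_follow():
    """Apply the FIRST/FOLLOW rules repeatedly until no set changes.
    An empty body (an epsilon-production) makes its head nullable."""
    nullable = set()
    first = {n: set() for n in NONTERMS}
    follow = {n: set() for n in NONTERMS}
    follow[START].add("$")                  # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            # FIRST(head) gets FIRST of each body symbol up to and
            # including the first symbol that is not nullable.
            for sym in body:
                new = first[sym] if sym in NONTERMS else {sym}
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
                if sym not in nullable:
                    break
            else:                           # every body symbol is nullable
                if head not in nullable:
                    nullable.add(head)
                    changed = True
            # Scan right to left; `trailer` is what can follow the current symbol.
            trailer = set(follow[head])
            for sym in reversed(body):
                if sym in NONTERMS:
                    if not trailer <= follow[sym]:
                        follow[sym] |= trailer
                        changed = True
                    trailer = (trailer | first[sym]) if sym in nullable else set(first[sym])
                else:
                    trailer = {sym}
    return first, follow

first, follow = first_and_follow()
for nt in ("E", "T", "F"):
    print(f"FOLLOW({nt}) =", sorted(follow[nt]))
```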
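To appreciate the new definition of the CLOSURE operation, in particular, why b must be in
FIRST(βa), consider an item of the form [A → α·Bβ, a] in the set of items valid for some viable prefix
γ. Then there is a rightmost derivation S ⇒*rm δAax ⇒rm δαBβax, where γ = δα.
Suppose βax derives the terminal string by by a rightmost derivation. Then for each production of the
form B → η for some η, we have the derivation S ⇒*rm γBby ⇒rm γηby, so [B → ·η, b] is valid for γ.
Here b is either the first terminal derived from β or, if β derives ε, the lookahead a. Since x cannot
contain the first terminal of by, b can be any terminal in FIRST(βax) = FIRST(βa).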
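This argument justifies the LR(1) CLOSURE rule: for [A → α·Bβ, a] and each production B → γ, add [B → ·γ, b] for every terminal b in FIRST(βa). The rule can be sketched in Python for the grammar S → CC, C → cC | d used in these notes. In this sketch, items are (production index, dot, lookahead) triples. FIRST is hard-coded because this grammar has no ε-productions, which is an assumption that would not hold for every grammar.

```python
# Augmented grammar: production index -> (head, body).
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMS = {head for head, _ in GRAMMAR}
# No nonterminal here derives the empty string, so FIRST of a string is
# FIRST of its first symbol.
FIRST = {"S'": {"c", "d"}, "S": {"c", "d"}, "C": {"c", "d"}}

def first_of(symbols):
    x = symbols[0]
    return FIRST[x] if x in NONTERMS else {x}

def closure1(items):
    """LR(1) CLOSURE: for [A -> a.Bb, x] and each B -> g, add [B -> .g, y]
    for every terminal y in FIRST(b x)."""
    result, work = set(items), list(items)
    while work:
        p, d, la = work.pop()
        body = GRAMMAR[p][1]
        if d < len(body) and body[d] in NONTERMS:
            for y in first_of(body[d + 1:] + (la,)):
                for q, (head, _) in enumerate(GRAMMAR):
                    if head == body[d] and (q, 0, y) not in result:
                        result.add((q, 0, y))
                        work.append((q, 0, y))
    return frozenset(result)

I0 = closure1({(0, 0, "$")})
for p, d, la in sorted(I0):
    head, body = GRAMMAR[p]
    print(f"[{head} -> {' '.join(body[:d] + ('.',) + body[d:])}, {la}]")
```

Starting from [S' → ·S, $], the closure contains the six items of I0, with lookaheads c and d on both C-items.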
Note that I6 differs from I3 only in second components. We shall see that it is common for
several sets of LR(1) items for a grammar to have the same first components and differ in their
second components. When we construct the collection of sets of LR(0) items for the same
grammar, each set of LR(0) items will coincide with the set of first components of one or more
sets of LR(1) items. We shall have more to say about this phenomenon when we discuss LALR
parsing.
Continuing with the GOTO function for I2, GOTO(I2, d) is seen to be I7: [C → d·, $].
Example
The canonical parsing table for the grammar is shown below. Productions 1, 2 and 3 are S → CC,
C → cC, and C → d, respectively.
generates a reduce/reduce conflict, since reductions by both A → c and B → c are called for on
inputs d and e.
We are now prepared to give the first of two LALR table-construction algorithms. The general
idea is to construct the sets of LR(1) items, and if no conflicts arise, merge sets with common
cores. We then construct the parsing table from the collection of merged sets of items. The
method we are about to describe serves primarily as a definition of LALR(1) grammars.
Constructing the entire collection of LR(1) sets of items requires too much space and time to be
useful in practice.
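For real grammars, building every LR(1) set first is too costly, as noted above. For a tiny grammar, though, the definition can be followed literally. The sketch below is an illustration for the grammar S → CC, C → cC | d, with items as (production, dot, lookahead) triples. It builds the canonical LR(1) collection, then unions the sets whose cores (first components) coincide. The function names are hypothetical, and FIRST is hard-coded because the grammar has no ε-productions.

```python
from collections import defaultdict

# Augmented grammar: production index -> (head, body).
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMS = {head for head, _ in GRAMMAR}
FIRST = {"S'": {"c", "d"}, "S": {"c", "d"}, "C": {"c", "d"}}  # no epsilon-productions

def first_of(symbols):
    x = symbols[0]
    return FIRST[x] if x in NONTERMS else {x}

def closure1(items):
    """LR(1) closure of (production, dot, lookahead) items."""
    result, work = set(items), list(items)
    while work:
        p, d, la = work.pop()
        body = GRAMMAR[p][1]
        if d < len(body) and body[d] in NONTERMS:
            for b in first_of(body[d + 1:] + (la,)):
                for q, (head, _) in enumerate(GRAMMAR):
                    if head == body[d] and (q, 0, b) not in result:
                        result.add((q, 0, b))
                        work.append((q, 0, b))
    return frozenset(result)

def goto1(items, x):
    return closure1({(p, d + 1, la) for p, d, la in items
                     if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x})

def canonical_lr1():
    """All sets of LR(1) items reachable from [S' -> .S, $]."""
    states = [closure1({(0, 0, "$")})]
    symbols = sorted({x for _, body in GRAMMAR for x in body})
    i = 0
    while i < len(states):
        for x in symbols:
            target = goto1(states[i], x)
            if target and target not in states:
                states.append(target)
        i += 1
    return states

def core(items):
    """First components only: the LR(0) core of a set of LR(1) items."""
    return frozenset((p, d) for p, d, _ in items)

def merge_by_core(states):
    """Union the LR(1) sets that share a core; each union is one LALR(1) state."""
    merged = defaultdict(set)
    for items in states:
        merged[core(items)] |= items
    return [frozenset(items) for items in merged.values()]

lr1 = canonical_lr1()
lalr = merge_by_core(lr1)
print(len(lr1), len(lalr))   # 10 7
```

The ten canonical sets shrink to seven, because I3/I6, I4/I7 and I8/I9 share cores.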
To see how the GOTOs are computed, consider GOTO(I36, C). In the original set of LR(1)
items, GOTO(I3, C) = I8, and I8 is now part of I89, so we make GOTO(I36, C) be I89.
When presented with erroneous input, the LALR parser may proceed to do some reductions after
the LR parser has declared an error. However, the LALR parser will never shift another symbol
after the LR parser declares an error. For example, on input ccd followed by $, the LR parser of
Fig. 4.42 will put
0 3 3 4
on the stack, and in state 4 will discover an error, because $ is the next input symbol and state 4
has action error on $. In contrast, the LALR parser of Fig. 4.43 will make the corresponding
moves, putting
0 36 36 47
on the stack. But state 47 on input $ has action reduce C → d. The LALR parser will thus change
its stack to
0 36 36 89
Now the action of state 89 on input $ is reduce C → cC. The stack becomes
0 36 89
whereupon a similar reduction is called for, obtaining stack
0 2
Finally, state 2 has action error on input $, so the error is now discovered.
Efficient Construction of LALR Parsing Tables
There are several modifications we can make to Algorithm 4.59 to avoid constructing the full
collection of sets of LR(1) items in the process of creating an LALR(1) parsing table.
We shall use as an example of the efficient LALR(1) table construction method the non-SLR
grammar from Example 4.48, which we reproduce below in its augmented form:
S' → S
S → L = R | R
L → *R | id
R → L
The complete sets of LR(0) items for this grammar were shown in Fig. 4.39. The kernels of these
items are shown in Fig. 4.44.
Now we must attach the proper lookaheads to the LR(0) items in the kernels, to create the kernels
of the sets of LALR(1) items. There are two ways a lookahead b can get attached to an LR(0)
item B → γ·δ in some set of LALR(1) items J:
Among the items in the closure, we see two where the lookahead = has been generated
spontaneously. The first of these is [L → ·*R, =].
This item, with * to the right of the dot, gives rise to [L → *·R, =].
That is, = is a spontaneously generated lookahead for L → *·R,
which is in set of items I4. Similarly, [L → ·id, =] tells us that = is a spontaneously generated
lookahead for L → id· in I5.
As # is a lookahead for all six items in the closure, we determine that the item S' → ·S in I0
propagates lookaheads to the following six items:
S' → S· in I1, S → L·=R in I2, R → L· in I2, S → R· in I3, L → *·R in I4, L → id· in I5.
Consider the grammar for conditional statements
stmt → if expr then stmt else stmt | if expr then stmt | other
This grammar is ambiguous because it does not resolve the dangling-else ambiguity. To simplify
the discussion, let us consider an abstraction of this grammar, where i stands for if expr then, e
stands for else, and a stands for "all other productions." We can then write the grammar, with
augmenting production S' → S, as
S' → S
S → iSeS | iS | a        (4.67)
The sets of LR(0) items for grammar (4.67) are shown in Fig. 4.50. The ambiguity in (4.67)
gives rise to a shift/reduce conflict in I4. There, S → iS·eS calls for a shift of e and, since
FOLLOW(S) = {e, $}, item S → iS· calls for reduction by S → iS on input e. Translating back to
the if-then-else terminology, given
if expr then stmt
on the stack and else as the first input symbol, should we shift else onto the stack (i.e., shift e) or
reduce if expr then stmt (i.e, reduce by S -> iS)? The answer is that we should shift else, because
it is "associated" with the previous then. In the terminology of grammar (4.67), the e on the
input, standing for else, can only form part of the body beginning with the iS now on the top of
the stack. If what follows e on the input cannot be parsed as an S, completing body iSeS, then it
can be shown that there is no other parse possible.
We conclude that the shift/reduce conflict in I4 should be resolved in favor of shift on input e.
The SLR parsing table constructed from the sets of items of Fig. 4.48, using this resolution of the
parsing-action conflict in I4 on input e, is shown in Fig. 4.51. Productions 1 through 3 are
S → iSeS, S → iS, and S → a.
For example, on input iiaea, the parser makes the moves shown in Fig. 4.52, corresponding to the
correct resolution of the "dangling-else." At line (5), state 4 selects the shift action on input e,
whereas at line (9), state 4 calls for reduction by S ->iS on input $.
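The moves for iiaea can be checked by running them. The Python sketch below hard-codes an SLR table for grammar (4.67), with the shift/reduce conflict in state 4 resolved in favour of shift on e. States are numbered I0 to I6 as in the standard LR(0) item sets for this grammar. Because Fig. 4.51 is not reproduced in these notes, the table here is rebuilt from those item sets and FOLLOW(S) = {e, $}, not copied from the figure. The names ACTION, GOTO_S and parse are illustrative.

```python
# Productions (1) S -> iSeS, (2) S -> iS, (3) S -> a: number -> (head, body length).
PRODS = {1: ("S", 4), 2: ("S", 2), 3: ("S", 1)}
ACTION = {
    0: {"i": ("s", 2), "a": ("s", 3)},
    1: {"$": ("acc",)},
    2: {"i": ("s", 2), "a": ("s", 3)},
    3: {"e": ("r", 3), "$": ("r", 3)},
    4: {"e": ("s", 5), "$": ("r", 2)},   # conflict on e resolved in favour of shift
    5: {"i": ("s", 2), "a": ("s", 3)},
    6: {"e": ("r", 1), "$": ("r", 1)},
}
GOTO_S = {0: 1, 2: 4, 5: 6}              # goto[state, S]; S is the only nonterminal

def parse(word):
    """Return the moves as (state on top, current input, action) triples."""
    stack, inp, ip, moves = [0], list(word) + ["$"], 0, []
    while True:
        state, a = stack[-1], inp[ip]
        act = ACTION[state].get(a)
        if act is None:
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        moves.append((state, a, act))
        if act[0] == "s":
            stack.append(act[1])
            ip += 1
        elif act[0] == "r":
            _, length = PRODS[act[1]]
            del stack[len(stack) - length:]
            stack.append(GOTO_S[stack[-1]])
        else:
            return moves

for n, move in enumerate(parse("iiaea"), start=1):
    print(n, move)
```

Move 5 is (4, 'e', ('s', 5)) and move 9 is (4, '$', ('r', 2)). These match the two uses of state 4 described above.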
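In panic-mode error recovery, we scan down the stack until a state s with a goto on a particular
nonterminal A is found. Zero or more input symbols are then discarded until a symbol a is found that
can legitimately follow A. The parser then stacks the state GOTO(s, A) and resumes normal parsing.
There might be more than one choice for the nonterminal A. Normally these would be nonterminals
representing major program pieces, such as an expression, statement, or block. For example,
if A is the nonterminal stmt, a might be semicolon or }, which marks the end of a statement
sequence.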
This method of recovery attempts to eliminate the phrase containing the syntactic error. The
parser determines that a string derivable from A contains an error. Part of that string has already
been processed, and the result of this processing is a sequence of states on top of the stack. The
remainder of the string is still in the input, and the parser attempts to skip over the remainder of
this string by looking for a symbol on the input that can legitimately follow A. By removing
states from the stack, skipping over the input, and pushing GOTO(s, A) on the stack, the parser
pretends that it has found an instance of A and resumes normal parsing.
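Below is a hedged sketch of one panic-mode recovery step in Python. The interface is hypothetical: states are integers, GOTO is a dictionary keyed by (state, nonterminal), and the caller supplies FOLLOW sets and the candidate nonterminals in order of preference. If no candidate works in the top state, that state is popped and the search continues further down the stack.

```python
def panic_mode_recover(stack, tokens, ip, goto, follow, candidates):
    """One panic-mode recovery step (a sketch with a hypothetical interface).

    stack      -- list of LR states, top of stack last
    tokens     -- input symbols ending in "$"; tokens[ip] caused the error
    goto       -- dict (state, nonterminal) -> state
    follow     -- dict nonterminal -> terminals that can legitimately follow it
    candidates -- nonterminals A to try, e.g. statement or expression

    Returns (new_stack, new_ip) or raises SyntaxError if nothing works."""
    stack = list(stack)
    while stack:
        s = stack[-1]
        for nt in candidates:
            if (s, nt) not in goto:
                continue
            j = ip
            # Discard input symbols until one that can follow A (or the end).
            while tokens[j] != "$" and tokens[j] not in follow[nt]:
                j += 1
            if tokens[j] in follow[nt]:
                # Pretend an A was found: push GOTO(s, A) and resume.
                return stack + [goto[s, nt]], j
        stack.pop()                     # no usable goto in state s: pop it
    raise SyntaxError("panic-mode recovery failed")

# States of the SLR expression table: after "( id +" the stack is 0 4 8 6,
# and "*" (index 3) is an error in state 6. Only goto entries on E are needed.
tokens = ["(", "id", "+", "*", "id", ")", "$"]
print(panic_mode_recover([0, 4, 8, 6], tokens, 3,
                         goto={(0, "E"): 1, (4, "E"): 8},
                         follow={"E": {"$", "+", ")"}},
                         candidates=["E"]))
# ([0, 4, 8], 5)
```

In this example, the partly parsed phrase after "(" is abandoned. The input "*" and "id" are skipped, and parsing resumes in state 8 with ")" as the next symbol.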
Phrase-level recovery is implemented by examining each error entry in the LR parsing table and
deciding on the basis of language usage the most likely programmer error that would give rise to
that error. An appropriate recovery procedure can then be constructed; presumably the top of the
stack and/or first input symbols would be modified in a way deemed appropriate for each error
entry.
In designing specific error-handling routines for an LR parser, we can fill in each blank entry in
the action field with a pointer to an error routine that will take the appropriate action selected by
the compiler designer. The actions may include insertion or deletion of symbols from the stack or
the input or both, or alteration and transposition of input symbols. We must make our choices so
that the LR parser will not get into an infinite loop. A safe strategy will assure that at least one
input symbol will be removed or shifted eventually, or that the stack will eventually shrink if the
end of the input has been reached. Popping a stack state that covers a nonterminal should be
avoided, because this modification eliminates from the stack a construct that has already been
successfully parsed.