
COMPILER DESIGN

UNIT 3 NOTES

UNIT 3
What is the bottom-up parsing approach; types of bottom-up approaches; introduction to simple LR;
why LR parsers; model of an LR parser; operator precedence and shift-reduce parsing;
difference between LR and LL parsers; construction of SLR tables; more powerful LR parsers;
construction of CLR(1) and LALR parsing tables; dangling-else ambiguity; error recovery in LR
parsing; comparison of all bottom-up approaches with all top-down approaches.
INTRODUCTION
Bottom-Up Parsing
A bottom-up parse corresponds to the construction of a parse tree for an input string beginning at
the leaves (the bottom) and working up towards the root (the top). It is convenient to describe
parsing as the process of building parse trees, although a front end may in fact carry out a
translation directly without building an explicit tree. The sequence of tree snapshots in Fig. 4.25
illustrates a bottom-up parse for grammar (4.1). This section introduces a general style of
bottom-up parsing known as shift-reduce parsing, along with the largest class of grammars for
which shift-reduce parsers can be built, the LR grammars.






Bottom-up parsers build parse trees from the leaves and work up to the root.
Bottom-up syntax analysis is known as shift-reduce parsing.
An easy-to-implement shift-reduce parser is the operator-precedence parser.
The general method of shift-reduce parsing is called LR parsing.
Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves
(the bottom) and working up towards the root (the top).
At each reduction step, a particular substring matching the right side of a production is replaced
by the symbol on the left of that production; if the substring is chosen correctly at each step,
a rightmost derivation is traced out in reverse.
Consider the grammar
S → aABe
A → Abc | b
B → d
The sentence abbcde can be reduced to S by the following steps:
abbcde
aAbcde
aAde
aABe
S
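The reduction sequence above can be replayed mechanically. A minimal sketch (the helper name reduce_once is ours), assuming we already know which substring is the handle at each step; a real parser must discover the handle itself:

```python
def reduce_once(sentential, handle, lhs):
    """Replace the first occurrence of the handle with its left-hand side."""
    return sentential.replace(handle, lhs, 1)

# Handles chosen in the order used above for S -> aABe, A -> Abc | b, B -> d.
steps = [("b", "A"), ("Abc", "A"), ("d", "B"), ("aABe", "S")]
form, trace = "abbcde", ["abbcde"]
for handle, lhs in steps:
    form = reduce_once(form, handle, lhs)
    trace.append(form)
print(trace)  # ['abbcde', 'aAbcde', 'aAde', 'aABe', 'S']
```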


These reductions trace out the following rightmost derivation in reverse:
S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde

Handles:
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the nonterminal on the left side of the production represents one step along the
reverse of a rightmost derivation.
Precise definition of a handle:
A handle of a right-sentential form γ is a production A → β and a position of γ where the string β
may be found and replaced by A to produce the previous right-sentential form in a rightmost
derivation of γ.

The string w to the right of the handle contains only terminal symbols.
In the example above, abbcde is a right-sentential form whose handle is A → b at position 2.
Likewise, aAbcde is a right-sentential form whose handle is A → Abc at position 2.
Handle Pruning:
A rightmost derivation in reverse can be obtained by handle pruning,
i.e., start with a string of terminals w that is to be parsed. If w is a sentence of the grammar at hand,
then w = γn, where γn is the nth right-sentential form of some as yet unknown rightmost
derivation S = γ0 ⇒ γ1 ⇒ … ⇒ γn = w.
Example of right-sentential forms and handles for the grammar
E → E + E
E → E * E
E → (E)
E → id

Introduction to LR Parsing: Simple LR


The most prevalent type of bottom-up parser today is based on a concept called


LR(k) parsing; the "L" is for left-to-right scanning of the input, the "R" for constructing a
rightmost derivation in reverse, and the k for the number of input symbols of look ahead that are
used in making parsing decisions. The cases k = 0 or k = 1 are of practical interest, and we shall
only consider LR parsers with k <= 1 here. When (k) is omitted, k is assumed to be 1. This
section introduces the basic concepts of LR parsing and the easiest method for constructing shift-reduce parsers, called "simple LR" (or SLR, for short). Some familiarity with the basic concepts
is helpful even if the LR parser itself is constructed using an automatic parser generator. We
begin with "items" and "parser states;" the diagnostic output from an LR parser generator
typically includes parser states, which can be used to isolate the sources of parsing conflicts.
Why LR Parsers?
LR parsers are table-driven, much like the nonrecursive LL parsers. A grammar for which we
can construct a parsing table using one of the methods in this section and the next is said to be an
LR grammar. Intuitively, for a grammar to be LR it is sufficient that a left-to-right shift-reduce
parser be able to recognize handles of right-sentential forms when they appear
on top of the stack. LR parsing is attractive for a variety of reasons:
LR parsers can be constructed to recognize virtually all programming-language constructs for
which context-free grammars can be written. Non-LR context-free grammars exist, but these can
generally be avoided for typical programming-language constructs.
The LR-parsing method is the most general nonbacktracking shift-reduce parsing method known,
yet it can be implemented as efficiently as other, more primitive shift-reduce methods (see the
bibliographic notes). An LR parser can detect a syntactic error as soon as it is possible to do so on
a left-to-right scan of the input. The class of grammars that can be parsed using LR methods is a
proper superset of the class of grammars that can be parsed with predictive or LL methods. For a
grammar to be LR(k), we must be able to recognize the occurrence of the right side of a
production in a right-sentential form, with k input symbols of lookahead. This requirement is far
less stringent than that for LL(k) grammars, where we must be able to recognize the use of a
production seeing only the first k symbols of what its right side derives. Thus, it should not be
surprising that LR grammars can describe more languages than LL grammars.
Items and the LR(0) Automaton
How does a shift-reduce parser know when to shift and when to reduce? For example, with stack
contents $ T and next input symbol * in Fig. 4.28, how does the parser know that T on the top of
the stack is not a handle, so the appropriate action is to shift and not to reduce T to E? An LR
parser makes shift-reduce decisions by maintaining states to keep track of where we are in a
parse. States represent sets of "items." An LR(0) item (item for short) of a grammar G is a
production of G with a dot at some position of the body. Thus, production A → XYZ yields the
four items:
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
The production A → ε generates only one item, A → . .
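Enumerating the items of a production amounts to sliding a dot across the body; a small sketch (the function name items_of is ours):

```python
def items_of(head, body):
    """All LR(0) items for head -> body: one item per dot position.
    An empty body (epsilon) yields the single item 'head -> .'."""
    return [f"{head} -> {body[:i]}.{body[i:]}" for i in range(len(body) + 1)]

print(items_of("A", "XYZ"))  # ['A -> .XYZ', 'A -> X.YZ', 'A -> XY.Z', 'A -> XYZ.']
print(items_of("A", ""))     # ['A -> .']
```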

COMPILER DESIGN

UNIT 3 NOTES

Intuitively, an item indicates how much of a production we have seen at a given point in the
parsing process. For example, the item A -> .XYZ indicates that we hope to see a string
derivable from XYZ next on the input.

Closure of Item Sets: If I is a set of items for a grammar G, then CLOSURE(I) is the set of items
constructed from I by the two rules:
1. Initially, add every item in I to CLOSURE(I).
2. If A → α.Bβ is in CLOSURE(I) and B → γ is a production, then add the item B → .γ to
CLOSURE(I), if it is not already there. Apply this rule until no more new items can be added to
CLOSURE(I).

Intuitively, A → α.Bβ in CLOSURE(I) indicates that, at some point in the parsing process, we think we might
next see a substring derivable from Bβ as input. The substring derivable from Bβ will have a prefix derivable
from B by applying one of the B-productions. We therefore add items for all the B-productions; that is, if
B → γ is a production, we also include B → .γ in CLOSURE(I).
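The two rules can be sketched directly as a worklist loop. This is a sketch under simplifying assumptions: grammar symbols are single characters (so `i` stands for id), and items are (head, body, dot) triples:

```python
def closure(items, prods):
    """CLOSURE(I): whenever a dot stands before a nonterminal B,
    add B -> .gamma for every B-production, until nothing new appears."""
    I, work = set(items), list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in prods:   # nonterminal after the dot
            for gamma in prods[body[dot]]:
                item = (body[dot], gamma, 0)
                if item not in I:
                    I.add(item)
                    work.append(item)
    return I

# The augmented expression grammar, with 'i' standing for id:
prods = {"E'": ["E"], "E": ["E+T", "T"], "T": ["T*F", "F"], "F": ["(E)", "i"]}
I0 = closure({("E'", "E", 0)}, prods)
print(len(I0))  # 7 items: E'->.E plus all six productions with the dot at the left end
```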


Example:

To see how the closure is computed, E' → .E is put in CLOSURE(I) by rule (1). Since there is an E immediately
to the right of a dot, we add the E-productions with dots at the left ends: E → .E+T and E → .T. Now there is a T
immediately to the right of a dot in the latter item, so we add T → .T*F and T → .F. Next, the F to the right of a
dot forces us to add F → .(E) and F → .id, but no other items need to be added.
The closure can be computed as in Fig. 4.32. A convenient way to implement the function closure is to keep a
boolean array added, indexed by the nonterminals of G, such that added[B] is set to true if and when we add
the item B → .γ for each B-production B → γ.

1. Kernel items: the initial item, S' → .S, and all items whose dots are not at the left end.
2. Nonkernel items: all items with their dots at the left end, except for S' → .S.
The Function GOTO

COMPILER DESIGN

UNIT 3 NOTES

Computation of the canonical collection of sets of LR(0) items
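The canonical collection can be computed by repeatedly applying GOTO to every state on a worklist; a sketch with the same single-character conventions as before (`i` for id, names are ours):

```python
def closure(items, prods):
    I, work = set(items), list(items)
    while work:
        h, b, d = work.pop()
        if d < len(b) and b[d] in prods:
            for g in prods[b[d]]:
                if (b[d], g, 0) not in I:
                    I.add((b[d], g, 0)); work.append((b[d], g, 0))
    return frozenset(I)

def goto(I, X, prods):
    """GOTO(I, X): move the dot past X in every item where X follows the dot."""
    return closure({(h, b, d + 1) for h, b, d in I if d < len(b) and b[d] == X}, prods)

def canonical_collection(prods, start):
    """All sets of LR(0) items reachable from CLOSURE({start -> .body})."""
    I0 = closure({(start, prods[start][0], 0)}, prods)
    C, work = {I0}, [I0]
    symbols = {s for bodies in prods.values() for b in bodies for s in b}
    while work:
        I = work.pop()
        for X in symbols:
            J = goto(I, X, prods)
            if J and J not in C:
                C.add(J); work.append(J)
    return C

prods = {"E'": ["E"], "E": ["E+T", "T"], "T": ["T*F", "F"], "F": ["(E)", "i"]}
print(len(canonical_collection(prods, "E'")))  # 12 sets of items, I0 through I11
```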

OPERATOR-PRECEDENCE PARSING
An efficient way of constructing a shift-reduce parser is called operator-precedence parsing.
An operator-precedence parser can be constructed from a grammar called an operator grammar. These
grammars have the property that no production right side is ε or has two adjacent nonterminals.
Example:
Consider the grammar:
E → EAE | (E) | -E | id
A → + | - | * | / | ↑
Since the right side EAE has three consecutive nonterminals, the grammar is rewritten as
follows:
E → E+E | E-E | E*E | E/E | E↑E | -E | id
Operator precedence relations:
There are three disjoint precedence relations, namely
<.   less than
=    equal to
.>   greater than
The relations give the following meanings:
a <. b   a yields precedence to b
a = b    a has the same precedence as b
a .> b   a takes precedence over b
Rules for binary operations:
1. If operator θ1 has higher precedence than operator θ2, then make
   θ1 .> θ2 and θ2 <. θ1
2. If operators θ1 and θ2 are of equal precedence, then make
   θ1 .> θ2 and θ2 .> θ1 if the operators are left-associative
   θ1 <. θ2 and θ2 <. θ1 if they are right-associative
3. Make the following for all operators θ:
   θ <. id , id .> θ
   θ <. ( , ( <. θ
   ) .> θ , θ .> )
   θ .> $ , $ <. θ
Also make
( = ) , ( <. ( , ) .> ) , ( <. id , id .> ) , $ <. id , id .> $ , $ <. ( , ) .> $


Example:
The operator-precedence relations for the grammar
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id
are given in the following table, assuming
1. ↑ is of highest precedence and right-associative,
2. * and / are of next higher precedence and left-associative, and
3. + and - are of lowest precedence and left-associative.
Note that the blanks in the table denote error entries.
TABLE: Operator-precedence relations
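The precedence table itself did not survive conversion, so here is a sketch that rebuilds a fragment of it (only +, *, id and $) from the rules above; the function name and encoding are ours:

```python
LT, EQ, GT = "<.", "=", ".>"
PREC = {"+": 1, "*": 2}          # * binds tighter than +; both left-associative

def relation(a, b):
    """Precedence relation between terminals a (on the stack) and b (on the input)."""
    if a == "$": return LT
    if b == "$": return GT
    if a == "id": return GT      # id takes precedence over any operator
    if b == "id": return LT
    if PREC[a] < PREC[b]: return LT
    if PREC[a] > PREC[b]: return GT
    return GT                    # equal precedence, left-associative: a .> b

print(relation("+", "*"))    # '<.'
print(relation("*", "+"))    # '.>'
print(relation("+", "+"))    # '.>'  (left-associative)
print(relation("$", "id"))   # '<.'
```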

Operator precedence parsing algorithm:

Input : An input string w and a table of precedence relations.
Output : If w is well formed, a skeletal parse tree, with a placeholder nonterminal E labeling all
interior nodes; otherwise, an error indication.
Method : Initially the stack contains $ and the input buffer the string w$. To parse, we execute the
following program :
(1) set ip to point to the first symbol of w$;
(2) repeat forever
(3)   if $ is on top of the stack and ip points to $ then
(4)     return
      else begin
(5)     let a be the topmost terminal symbol on the stack
        and let b be the symbol pointed to by ip;
(6)     if a <. b or a = b then begin
(7)       push b onto the stack;
(8)       advance ip to the next input symbol;
        end;

(9)     else if a .> b then   /* reduce */
(10)      repeat
(11)        pop the stack
(12)      until the top stack terminal is related by <. to the terminal most recently popped
(13)    else error( )
      end
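Lines (1)-(13) translate almost directly into code. A sketch for the +/* fragment of the relation table (all names are ours); nonterminals are not tracked, since the skeletal parse only needs the terminal stack:

```python
LT, EQ, GT = "<.", "=", ".>"
PREC = {"+": 1, "*": 2}

def relation(a, b):
    """Precedence relation between the topmost stack terminal a and input b."""
    if a == "$": return LT
    if b == "$": return GT
    if a == "id": return GT
    if b == "id": return LT
    if PREC[a] < PREC[b]: return LT
    if PREC[a] > PREC[b]: return GT
    return GT                      # equal precedence, left-associative

def op_precedence_parse(tokens):
    """Shift while a <. b or a = b; otherwise reduce by popping until the
    top terminal is related by <. to the terminal most recently popped."""
    stack, toks, actions, ip = ["$"], tokens + ["$"], [], 0
    while True:
        a, b = stack[-1], toks[ip]
        if a == "$" and b == "$":
            actions.append("accept")
            return actions
        if relation(a, b) in (LT, EQ):
            stack.append(b); ip += 1
            actions.append("shift " + b)
        else:
            popped = stack.pop()
            while relation(stack[-1], popped) != LT:
                popped = stack.pop()
            actions.append("reduce")

acts = op_precedence_parse(["id", "+", "id", "*", "id"])
print(acts)  # ends with 'accept' after 5 reductions (three E->id, then E->E*E, E->E+E)
```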
Stack implementation of operator precedence parsing:
Operator precedence parsing uses a stack and a precedence relation table to implement
the above algorithm. It is a shift-reduce parser containing all four actions: shift,
reduce, accept and error.
The initial configuration of an operator precedence parser is
STACK: $    INPUT: w$
where w is the input string to be parsed.
Example:
Consider the grammar E → E+E | E-E | E*E | E/E | E↑E | (E) | id. The input string is
id+id*id. The implementation is as follows:

Advantages of operator precedence parsing:


1. It is easy to implement.
2. Once the operator precedence relations are made between all pairs of terminals of a grammar,
the grammar can be ignored; the grammar is not referred to anymore during implementation.

Disadvantages of operator precedence parsing:


1. It is hard to handle tokens like the minus sign (-), which has two different precedences (unary and binary).
2. Only a small class of grammars can be parsed using an operator-precedence parser.
SHIFT-REDUCE PARSING

Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree
for an input string beginning at the leaves (the bottom) and working up towards the root (the
top).
Example:
Consider the grammar: S → aABe
A → Abc | b
B → d
The sentence to be recognized is abbcde.

Stack implementation of Shift-reduce parsing:


There are two problems that must be solved to parse by handle pruning.
1. The first is to locate the substring to be reduced in a right-sentential form.
2. The second is to determine what production to choose in case there is more than one production
with that substring on the right side.
3. The type of data structure to use in a shift-reduce parser is a stack.
Implementation of Shift-Reduce Parser:
 To implement shift-reduce parser, use a stack to hold grammar symbols and an input
buffer to hold the string w to be parsed.
 Use $ to mark the bottom of the stack and also the right end of the input.
Initially the stack is empty, and the string w is on the input, as follows:

Stack
$

Input
w$

 The parser operates by shifting zero or more input symbols onto the stack until a handle β
is on top of the stack.
 The parser then reduces β to the left side of the appropriate production.
 The parser repeats this cycle until it has detected an error or until the stack contains the
start symbol and the input is empty:
Stack
$S

Input
$

After entering this configuration, the parser halts and announces successful completion of
parsing. There are four possible actions that a shift-reduce parser can make: 1) shift 2) reduce 3)
accept 4) error.

1. In a shift action, the next symbol is shifted onto the top of the stack.
2. In a reduce action, the parser knows the right end of the handle is at the top of the stack. It
must then locate the left end of the handle within the stack and decide with what nonterminal to
replace the handle.
3. In an accept action, the parser announces successful completion of parsing.
4. In an error action, the parser discovers that a syntax error has occurred and calls an error
recovery routine.
Note: an important fact that justifies the use of a stack in shift-reduce parsing: the handle will
always appear on top of the stack, and never inside.
Example
Consider the grammar
E → E + E
E → E * E
E → (E)
E → id
and the input string id1 + id2 * id3. Use the shift-reduce parser to check whether the input string
is accepted by the grammar.
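Because this grammar is ambiguous, the shift/reduce choices below are supplied by hand (a real parser takes them from a table); the sketch just replays a valid action script to show the stack contents:

```python
def run(script, tokens):
    """Replay a given action script: 'shift' pushes the next token,
    ('reduce', n, A) pops n grammar symbols and pushes A."""
    stack, ip, log = ["$"], 0, []
    for act in script:
        if act == "shift":
            stack.append(tokens[ip]); ip += 1
        else:
            _, n, lhs = act
            del stack[-n:]
            stack.append(lhs)
        log.append(" ".join(stack))
    return log

tokens = ["id", "+", "id", "*", "id"]
script = ["shift", ("reduce", 1, "E"),           # id        => E
          "shift", "shift", ("reduce", 1, "E"),  # E + id    => E + E
          "shift", "shift", ("reduce", 1, "E"),  # ... * id  => E + E * E
          ("reduce", 3, "E"),                    # E * E     => E
          ("reduce", 3, "E")]                    # E + E     => E
log = run(script, tokens)
print(log[-1])  # '$ E' -- the stack holds the start symbol: accept
```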

Viable Prefixes:
The set of prefixes of right sentential forms that can appear on the stack of a shift-reduce parser
are called viable prefixes.
Conflicts during shift-reduce parsing:
 There are CFGs for which shift-reduce parsing cannot be used.
 Every shift-reduce parser for such a grammar can reach a configuration in which the
parser cannot decide whether to shift or to reduce (a shift/reduce conflict), or cannot
decide which of several reductions to make (a reduce/reduce conflict), even knowing the
entire stack contents and the next input symbol.


Example of such grammars:


 These grammars are not in the LR(k) class; we refer to them as non-LR grammars.
 The k in LR(k) refers to the number of symbols of lookahead on the input.
 Grammars used in compiling usually fall in the LR(1) class, with one symbol of lookahead.
 An ambiguous grammar can never be LR.

In this grammar, a shift/reduce conflict occurs for some input string, so this grammar (Grammar
2.8.2) is not an LR(1) grammar.
LR PARSERS
An efficient bottom-up syntax-analysis technique that can be used to parse a large class of
CFGs is called LR(k) parsing. The L is for left-to-right scanning of the input, the R for
constructing a rightmost derivation in reverse, and the k for the number of input symbols of lookahead.
When k is omitted, it is assumed to be 1.
Advantages of LR parsing:
It recognizes virtually all programming language constructs for which CFG can be
written.
It is an efficient non-backtracking shift-reduce parsing method.
A grammar that can be parsed using LR method is a proper superset of a grammar that
can be parsed with predictive parser.
It detects a syntactic error as soon as possible.
Drawbacks of the LR method:
It is too much work to construct an LR parser by hand for a programming-language
grammar. A specialized tool, called an LR parser generator, is needed. Example: YACC.
Types of LR parsing method:
1. SLR- Simple LR
Easiest to implement, least powerful.
2. CLR- Canonical LR
Most powerful, most expensive.
3. LALR- Look-Ahead LR
Intermediate in size and cost between the other two methods.


The LR parsing algorithm:


The schematic form of an LR parser is as follows:

It consists of an input, an output, a stack, a driver program, and a parsing table with two
parts (action and goto).
The driver program is the same for all LR parsers.
The parsing program reads characters from an input buffer one at a time.
The program uses a stack to store a string of the form s0X1s1X2s2…Xmsm, where sm is on
top. Each Xi is a grammar symbol and each si is a state.
The parsing table consists of two parts : action and goto functions.
Action : The parsing program determines sm, the state currently on top of the stack, and ai, the
current input symbol. It then consults action[sm, ai] in the action table, which can have one of four
values :
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept, and
4. error.
Goto : The function goto takes a state and grammar symbol as arguments and produces a state.
LR Parsing algorithm:
Input: An input string w and an LR parsing table with functions action and goto for grammar G.
Output: If w is in L(G), a bottom-up-parse for w; otherwise, an error indication.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input
buffer. The parser then executes the following program :
set ip to point to the first input symbol of w$;

repeat forever begin
    let s be the state on top of the stack and
        a the symbol pointed to by ip;
    if action[s, a] = shift s' then begin
        push a then s' on top of the stack;
        advance ip to the next input symbol
    end
    else if action[s, a] = reduce A → β then begin
        pop 2*|β| symbols off the stack;
        let s' be the state now on top of the stack;
        push A then goto[s', A] on top of the stack;
        output the production A → β
    end
    else if action[s, a] = accept then
        return
    else error( )
end
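The driver loop above can be sketched as follows. One simplification: the stack here holds states only, so a reduction pops |β| entries rather than 2*|β|; the toy grammar and its three-state table are hand-built for illustration:

```python
def lr_parse(tokens, action, goto_, prods):
    """LR driver: a stack of states; consult action[s, a] and either shift a
    new state, reduce by a production, accept, or report an error."""
    stack, ip, output = [0], 0, []
    while True:
        s, a = stack[-1], tokens[ip]
        act = action.get((s, a), ("error",))
        if act[0] == "shift":
            stack.append(act[1]); ip += 1
        elif act[0] == "reduce":
            lhs, body = prods[act[1]]
            del stack[len(stack) - len(body):]     # pop |body| states
            stack.append(goto_[(stack[-1], lhs)])
            output.append(f"{lhs} -> {body}")      # emit the production used
        elif act[0] == "accept":
            return output
        else:
            raise SyntaxError(f"error in state {s} on input {a!r}")

# A hand-built table for the toy grammar S' -> S, S -> x (states 0..2 are ours):
action = {(0, "x"): ("shift", 2), (2, "$"): ("reduce", 1), (1, "$"): ("accept",)}
goto_  = {(0, "S"): 1}
prods  = {1: ("S", "x")}
print(lr_parse(["x", "$"], action, goto_, prods))  # ['S -> x']
```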
CONSTRUCTING SLR(1) PARSING TABLE:
To perform SLR parsing, take grammar as input and do the following:
1. Find LR(0) items.
2. Completing the closure.
3. Compute goto(I,X), where, I is set of items and X is grammar symbol.
LR(0) items:
An LR(0) item of a grammar G is a production of G with a dot at some position of the right side. For
example, production A → XYZ yields the four items :
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
Closure operation:
If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by the two
rules:
1. Initially, every item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B → γ is a production, then add the item B → .γ to I, if it
is not already there. We apply this rule until no more new items can be added to closure(I).
Goto operation:
goto(I, X) is defined to be the closure of the set of all items [A → αX.β] such that [A → α.Xβ] is in
I.
Steps to construct the SLR parsing table for grammar G:
1. Augment G and produce G'.
2. Construct the canonical collection of sets of items C for G'.

3. Construct the parsing action function action and goto using the following algorithm, which
requires FOLLOW(A) for each nonterminal of the grammar.
Algorithm for construction of the SLR parsing table:
Input : An augmented grammar G'
Output : The SLR parsing table functions action and goto for G'
Method :
1. Construct C = {I0, I1, …, In}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Ii. The parsing actions for state i are determined as follows:
(a) If [A → α.aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here a must be a
terminal.
(b) If [A → α.] is in Ii, then set action[i, a] to "reduce A → α" for all a in FOLLOW(A).
(c) If [S' → S.] is in Ii, then set action[i, $] to "accept".
If any conflicting actions are generated by the above rules, we say the grammar is not SLR(1).

3. The goto transitions for state i are constructed for all nonterminals A:
If goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made "error".
5. The initial state of the parser is the one constructed from the set of items containing [S' → .S].
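Steps 1-5 can be sketched end to end. To keep it short this uses a toy grammar S' → S, S → (S) | x rather than the expression grammar, with FOLLOW(S) = {$, )} supplied precomputed; all names are ours:

```python
def closure(I, prods):
    I, work = set(I), list(I)
    while work:
        h, b, d = work.pop()
        if d < len(b) and b[d] in prods:
            for body in prods[b[d]]:
                if (b[d], body, 0) not in I:
                    I.add((b[d], body, 0)); work.append((b[d], body, 0))
    return frozenset(I)

def goto(I, X, prods):
    return closure({(h, b, d + 1) for h, b, d in I if d < len(b) and b[d] == X}, prods)

def slr_table(prods, start, follow):
    """Rules 2(a)-2(c) and 3: shift/goto from GOTO, reduce from FOLLOW,
    accept on the completed start production. A conflict means 'not SLR(1)'."""
    I0 = closure({(start, prods[start][0], 0)}, prods)
    states, work, action, goto_tab = [I0], [I0], {}, {}
    symbols = {s for bodies in prods.values() for b in bodies for s in b}
    while work:
        I = work.pop()
        i = states.index(I)
        for X in symbols:
            J = goto(I, X, prods)
            if not J:
                continue
            if J not in states:
                states.append(J); work.append(J)
            j = states.index(J)
            if X in prods:
                goto_tab[(i, X)] = j                   # rule 3
            else:
                action[(i, X)] = ("shift", j)          # rule 2(a)
        for h, b, d in I:
            if d == len(b):
                if h == start:
                    action[(i, "$")] = ("accept",)     # rule 2(c)
                else:
                    for a in follow[h]:
                        if (i, a) in action:
                            raise ValueError("grammar is not SLR(1)")
                        action[(i, a)] = ("reduce", h, b)  # rule 2(b)
    return action, goto_tab, states

# Toy grammar (ours, for brevity): S' -> S, S -> (S) | x
prods = {"S'": [("S",)], "S": [("(", "S", ")"), ("x",)]}
action, goto_tab, states = slr_table(prods, "S'", {"S": {"$", ")"}})
print(len(states))  # 6 states
```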
Example for SLR parsing:
Construct the SLR parsing table for the following grammar:
G : E → E + T | T
    T → T * F | F
    F → (E) | id
The given grammar is :
G : E → E + T      ------ (1)
    E → T          ------ (2)
    T → T * F      ------ (3)
    T → F          ------ (4)
    F → (E)        ------ (5)
    F → id         ------ (6)

Step 1 : Convert the given grammar into an augmented grammar.
Augmented grammar :
E' → E
E → E + T
E → T
T → T * F
T → F
F → (E)
F → id

Step 2 : Find the LR(0) items.

I0 : E' → .E
     E → .E + T
     E → .T
     T → .T * F
     T → .F
     F → .(E)
     F → .id

GOTO ( I0 , E ) :
I1 : E' → E .
     E → E . + T

GOTO ( I0 , T ) :
I2 : E → T .
     T → T . * F

GOTO ( I0 , F ) :
I3 : T → F .

GOTO ( I0 , ( ) :
I4 : F → ( . E )
     E → .E + T
     E → .T
     T → .T * F
     T → .F
     F → .(E)
     F → .id

GOTO ( I0 , id ) :
I5 : F → id .

GOTO ( I1 , + ) :
I6 : E → E + . T
     T → .T * F
     T → .F
     F → .(E)
     F → .id

GOTO ( I2 , * ) :
I7 : T → T * . F
     F → .(E)
     F → .id

GOTO ( I4 , E ) :
I8 : F → ( E . )
     E → E . + T

GOTO ( I4 , T ) = I2 , GOTO ( I4 , F ) = I3 , GOTO ( I4 , ( ) = I4 , GOTO ( I4 , id ) = I5

GOTO ( I6 , T ) :
I9 : E → E + T .
     T → T . * F

GOTO ( I6 , F ) = I3 , GOTO ( I6 , ( ) = I4 , GOTO ( I6 , id ) = I5

GOTO ( I7 , F ) :
I10 : T → T * F .

GOTO ( I7 , ( ) = I4 , GOTO ( I7 , id ) = I5

GOTO ( I8 , ) ) :
I11 : F → ( E ) .

GOTO ( I8 , + ) = I6
GOTO ( I9 , * ) = I7

FOLLOW (E) = { $ , ) , + }
FOLLOW (T) = { $ , + , ) , * }
FOLLOW (F) = { $ , + , ) , * }
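The FOLLOW sets above can be checked mechanically. A fixpoint sketch that assumes no ε-productions (true of this grammar); bodies are tuples of symbols and all names are ours:

```python
def compute_follow(prods, start):
    """FIRST and FOLLOW by fixpoint iteration (no epsilon-productions)."""
    nts = set(prods)
    first = {A: set() for A in nts}
    changed = True
    while changed:
        changed = False
        for A, bodies in prods.items():
            for body in bodies:
                add = first[body[0]] if body[0] in nts else {body[0]}
                if not add <= first[A]:
                    first[A] |= add; changed = True
    follow = {A: set() for A in nts}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for A, bodies in prods.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in nts:
                        continue
                    nxt = body[i + 1] if i + 1 < len(body) else None
                    add = follow[A] if nxt is None else (first[nxt] if nxt in nts else {nxt})
                    if not add <= follow[B]:
                        follow[B] |= add; changed = True
    return follow

prods = {"E": [("E", "+", "T"), ("T",)],
         "T": [("T", "*", "F"), ("F",)],
         "F": [("(", "E", ")"), ("id",)]}
f = compute_follow(prods, "E")
print(sorted(f["E"]))  # ['$', ')', '+']
```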


SLR parsing table:

Blank entries are error entries.


Stack implementation:
Check whether the input id + id * id is valid or not.


More Powerful LR Parsers


The previous LR parsing techniques can be extended to use one symbol of lookahead on the
input. There are two different methods:
1. The "canonical-LR" or just "LR" method, which makes full use of
the lookahead symbol(s). This method uses a large set of items, called the LR(1) items.
2. The "lookahead-LR" or "LALR" method, which is based on the LR(0) sets of items and has
many fewer states than typical parsers based on the LR(1) items. By carefully introducing
lookaheads into the LR(0) items,
we can handle many more grammars with the LALR method than with the SLR method, and
build parsing tables that are no bigger than the SLR tables. LALR is the method of choice in
most situations.
Canonical LR(1) Items
We shall now present the most general technique for constructing an LR parsing table from a
grammar. Recall that in the SLR method, state i calls for reduction by A → α if the set of items Ii
contains item [A → α.] and a is in FOLLOW(A).
In some situations, however, when state i appears on top of the stack, the viable prefix βα on the
stack is such that βA cannot be followed by a in any right-sentential form. Thus, the reduction by
A → α should be invalid on input a.
Constructing LR(1) Sets of Items
The method for building the collection of sets of valid LR(1) items is essentially the same as the
one for building the canonical collection of sets of LR(0) items. We need only to modify the two
procedures CLOSURE and GOTO.
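Only CLOSURE changes substantially: each new item carries a lookahead drawn from FIRST(βa). A sketch for the grammar of Example 4.54 (S' → S, S → CC, C → cC | d), with FIRST supplied precomputed and no ε-productions assumed; items are (head, body, dot, lookahead) tuples:

```python
def lr1_closure(items, prods, first):
    """LR(1) CLOSURE: for [A -> alpha . B beta, a], add [B -> .gamma, b] for
    every b in FIRST(beta a); with no epsilon-productions, FIRST(beta a) is
    FIRST of beta's first symbol, or {a} when beta is empty."""
    I, work = set(items), list(items)
    while work:
        h, body, dot, a = work.pop()
        if dot < len(body) and body[dot] in prods:
            beta = body[dot + 1:]
            lookaheads = first[beta[0]] if beta else {a}
            for gamma in prods[body[dot]]:
                for b in lookaheads:
                    item = (body[dot], gamma, 0, b)
                    if item not in I:
                        I.add(item); work.append(item)
    return I

# Grammar of Example 4.54: S' -> S, S -> CC, C -> cC | d
prods = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}
first = {"S": {"c", "d"}, "C": {"c", "d"}, "c": {"c"}, "d": {"d"}}
I0 = lr1_closure({("S'", ("S",), 0, "$")}, prods, first)
print(len(I0))  # 6 items: the kernel, S -> .CC with $, and C-items with lookaheads c and d
```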


To appreciate the new definition of the CLOSURE operation, in particular why b must be in
FIRST(βa), consider an item of the form [A → α.Bβ, a] in the set of items valid for some viable prefix
γ. Then there is a rightmost derivation S ⇒* δAax ⇒ δαBβax, where γ = δα.
Suppose βax derives the terminal string by. Then, for each production of the
form B → η for some η, we have the derivation γBby ⇒* γηby, so [B → .η, b] is valid for γ.
Algorithm 4.53 : Construction of the sets of LR(1) items.


INPUT: An augmented grammar G'.
OUTPUT: The sets of LR(1) items that are the set of items valid for one or more viable prefixes
of G'.
METHOD: The procedures CLOSURE and GOTO and the main routine for constructing
the sets of items are as shown in Fig. 4.40.


Example 4.54 : Consider the following augmented grammar:
S' → S
S → CC
C → cC | d

Note that I6 differs from I3 only in second components. We shall see that it is common for
several sets of LR(1) items for a grammar to have the same first components and differ in their
second components. When we construct the collection of sets of LR(0) items for the same
grammar, each set of LR(0) items will coincide with the set of first components of one or more
sets of LR(1) items. We shall have more to say about this phenomenon when we discuss LALR
parsing.
Continuing with the GOTO function for I2 , GOTO(I2, d) is seen to be

Canonical LR(1) parsing tables


We now give the rules for constructing the LR(1) ACTION and GOTO functions from the sets of
LR(1) items. These functions are represented by a table, as before. The only difference is in the
values of the entries.


Example
The canonical parsing table for the grammar is shown as given below. Productions 1, 2 and 3 are S → CC,
C → cC, and C → d respectively.

Constructing LALR Parsing Tables


We now introduce our last parser construction method, the LALR (lookahead-LR) technique.
This method is often used in practice, because the tables obtained by it are considerably smaller
than the canonical LR tables, yet most common syntactic constructs of programming languages
can be expressed conveniently by an LALR grammar. The same is almost true for SLR
grammars, but there are a few constructs that cannot be conveniently handled by SLR techniques.
For a comparison of parser size, the SLR and LALR tables for a grammar always have the same
number of states, and this number is typically several hundred states for a language like C. The
canonical LR table would typically have several thousand states for the same-size language.
Thus, it is much easier and more economical to construct SLR and LALR tables than the
canonical LR tables.
Example
Consider the grammar
S → aAd | bBd | aBe | bAe
A → c
B → c
which generates the four strings acd, ace, bcd, and bce. The reader can check that the grammar is
LR(1) by constructing the sets of items. Upon doing so, merging the sets of items with common cores
generates a reduce/reduce conflict, since reductions by both A → c and B → c are called for on
inputs d and e.


We are now prepared to give the first of two LALR table-construction algorithms. The general
idea is to construct the sets of LR(1) items, and if no conflicts arise, merge sets with common
cores. We then construct the parsing table from the collection of merged sets of items. The
method we are about to describe serves primarily as a definition of LALR(1) grammars.
Constructing the entire collection of LR(1) sets of items requires too much space and time to be
useful in practice.
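Merging sets with common cores is a small operation on item sets. A sketch, with two hypothetical LR(1) states shaped like I3 and I6 of the c/d grammar (single lookaheads only, for brevity):

```python
def lalr_merge(lr1_sets):
    """Merge LR(1) item sets that share a core (the items minus lookaheads).
    Each set is a frozenset of (head, body, dot, lookahead) items."""
    merged = {}
    for I in lr1_sets:
        core = frozenset((h, b, d) for (h, b, d, _) in I)
        merged.setdefault(core, set()).update(I)
    return list(merged.values())

# Hypothetical states with the same core but lookaheads c vs $:
I3 = frozenset({("C", ("c", "C"), 1, "c"), ("C", ("c", "C"), 0, "c"), ("C", ("d",), 0, "c")})
I6 = frozenset({("C", ("c", "C"), 1, "$"), ("C", ("c", "C"), 0, "$"), ("C", ("d",), 0, "$")})
I36 = lalr_merge([I3, I6])
print(len(I36))     # 1 merged state
print(len(I36[0]))  # 6 items: the 3 core items, each with lookaheads c and $
```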


To see how the GOTOs are computed, consider GOTO(I36, C). In the original set of LR(1)
items, GOTO(I3, C) = I8, and I8 is now part of I89, so we make GOTO(I36, C) be I89.
When presented with erroneous input, the LALR parser may proceed to do some reductions after
the LR parser has declared an error. However, the LALR parser will never shift another symbol
after the LR parser declares an error. For example, on input ccd followed by $, the LR parser of
Fig. 4.42 will put
0 3 3 4
on the stack, and in state 4 will discover an error, because $ is the next input symbol and state 4
has action error on $. In contrast, the LALR parser of Fig. 4.43 will make the corresponding
moves, putting
0 36 36 47
on the stack. But state 47 on input $ has action reduce C → d. The LALR parser will thus change
its stack to
0 36 36 89
Now the action of state 89 on input $ is reduce C → cC. The stack becomes
0 36 89
whereupon a similar reduction is called for, obtaining stack
0 2
Finally, state 2 has action error on input $, so the error is now discovered.
Efficient Construction of LALR Parsing Tables
There are several modifications we can make to Algorithm 4.59 to avoid constructing the full
collection of sets of LR(1) items in the process of creating an LALR(1) parsing table

We shall use as an example of the efficient LALR(1) table-construction method the non-SLR
grammar from Example 4.48, which we reproduce below in its augmented form:
S' → S
S → L = R | R
L → * R | id
R → L
The complete sets of LR(0) items for this grammar were shown in Fig. 4.39. The kernels of these
items are shown in Fig. 4.44.


Now we must attach the proper lookaheads to the LR(0) items in the kernels, to create the kernels
of the sets of LALR(1) items. There are two ways a lookahead b can get attached to an LR(0)
item in some set of LALR(1) items J:


Efficient computation of the kernels of the LALR(1) collection of sets of items.

INPUT: An augmented grammar G'.
OUTPUT: The kernels of the LALR(1) collection of sets of items for G'.
METHOD:
1. Construct the kernels of the sets of LR(0) items for G'. If space is not at a premium, the
simplest way is to construct the LR(0) sets of items, as in Section 4.6.2, and then remove the
nonkernel items. If space is severely constrained, we may wish instead to store only the kernel items
for each set, and compute GOTO for a set of items I by first computing the closure
of I.
2. Apply Algorithm 4.62 to the kernel of each set of LR(0) items and grammar symbol X to
determine which lookaheads are spontaneously generated for kernel items in GOTO(I, X), and
from which items in I lookaheads are propagated to kernel items in GOTO(I, X).
3. Initialize a table that gives, for each kernel item in each set of items, the associated
lookaheads. Initially, each item has associated with it only those lookaheads that we determined in
step (2) were generated spontaneously.
4. Make repeated passes over the kernel items in all sets. When we visit an item i, we look up the
kernel items to which i propagates its lookaheads, using information tabulated in step (2). The
current set of lookaheads for i is added to those already associated with each of the items to
which i propagates its lookaheads. We continue making passes over the kernel items until no
more new lookaheads are propagated.
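Step (4) is a classic fixpoint pass. A sketch over a tiny hypothetical propagation graph (the item names A, B, C and the edges are invented for illustration):

```python
def propagate(spontaneous, edges, kernels):
    """Repeatedly pass each item's lookaheads along its propagation edges
    until no lookahead set grows (step 4 above)."""
    la = {k: set(spontaneous.get(k, ())) for k in kernels}
    changed = True
    while changed:
        changed = False
        for src, dests in edges.items():
            for dst in dests:
                if not la[src] <= la[dst]:
                    la[dst] |= la[src]
                    changed = True
    return la

kernels = ["A", "B", "C"]
spontaneous = {"A": {"$"}, "C": {"="}}   # lookaheads generated spontaneously
edges = {"A": ["B"], "B": ["C"]}         # A propagates to B, B to C
la = propagate(spontaneous, edges, kernels)
print(sorted(la["C"]))  # ['$', '=']
```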
Let us construct the kernels of the LALR(1) items for the grammar of Example 4.61. The kernels
of the LR(0) items were shown in Fig. 4.44. When we apply Algorithm 4.62 to the kernel of set
of items I0, we first compute CLOSURE({[S' → .S, #]}).
Among the items in the closure, we see two where the lookahead = has been generated
spontaneously. The first of these is [L → .*R, =]. This item, with * to the right of the dot, gives rise to
[L → *.R, =]. That is, = is a spontaneously generated lookahead for L → *.R,
which is in set of items I4. Similarly, [L → .id, =] tells us that = is a spontaneously generated
lookahead for L → id. in I5.
As # is a lookahead for all six items in the closure, we determine that the item S' → .S in I0
propagates lookaheads to the following six items:
S' → S.     in I1
S → L.=R    in I2
R → L.      in I2
S → R.      in I3
L → *.R     in I4
L → id.     in I5


The "Dangling-Else" Ambiguity


Consider again the following grammar for conditional statements:
stmt → if expr then stmt else stmt | if expr then stmt | other
This grammar is ambiguous because it does not resolve the dangling-else ambiguity. To simplify
the discussion, let us consider an abstraction of this grammar, where i stands for if expr then, e
stands for else, and a stands for "all other productions." We can then write the grammar, with
augmenting production S' → S, as
S → iSeS | iS | a          (4.67)
The sets of LR(0) items for grammar (4.67) are shown in Fig. 4.50. The ambiguity in (4.67)
gives rise to a shift/reduce conflict in I4. There, item S -> iS.eS calls for a shift of e and, since
FOLLOW(S) = {e, $}, item S -> iS. calls for reduction by S -> iS on input e. Translating back to
the if-then-else terminology, given

if expr then stmt

on the stack and else as the first input symbol, should we shift else onto the stack (i.e., shift e) or
reduce if expr then stmt (i.e., reduce by S -> iS)? The answer is that we should shift else, because
it is "associated" with the previous then. In the terminology of grammar (4.67), the e on the
input, standing for else, can only form part of the body beginning with the iS now on top of
the stack. If what follows e on the input cannot be parsed as an S, completing body iSeS, then it
can be shown that there is no other parse possible.
We conclude that the shift/reduce conflict in I4 should be resolved in favor of shift on input e.
The SLR parsing table constructed from the sets of items of Fig. 4.50, using this resolution of the
parsing-action conflict in I4 on input e, is shown in Fig. 4.51. Productions 1 through 3 are
S -> iSeS, S -> iS, and S -> a, respectively.

For example, on input iiaea, the parser makes the moves shown in Fig. 4.52, corresponding to the
correct resolution of the "dangling-else." At line (5), state 4 selects the shift action on input e,
whereas at line (9), state 4 calls for reduction by S -> iS on input $.
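The effect of resolving the conflict in favor of shift can be mimicked in a small hand-written shift-reduce recognizer for the abstract grammar S -> iSeS | iS | a. This is only an illustrative sketch, not the table-driven parser of Fig. 4.51; the key point is that it refuses to reduce by S -> iS when the lookahead is e, so an else is always shifted and matched with the nearest then:

```python
def parse(tokens):
    """Shift-reduce recognizer for S -> iSeS | iS | a over tokens i, e, a."""
    stack, i = [], 0
    toks = list(tokens) + ['$']          # '$' marks the end of input
    while True:
        la = toks[i]                     # current lookahead symbol
        if stack and stack[-1] == 'a':   # reduce by S -> a
            stack[-1] = 'S'
        elif stack[-4:] == ['i', 'S', 'e', 'S']:   # reduce by S -> iSeS
            del stack[-4:]
            stack.append('S')
        elif stack[-2:] == ['i', 'S'] and la != 'e':
            # Reduce by S -> iS only when the lookahead is NOT e: on input e
            # we prefer the shift, associating the else with the nearest then.
            del stack[-2:]
            stack.append('S')
        elif la == '$':
            return stack == ['S']        # accept iff the stack is exactly S
        else:
            stack.append(la)             # shift the input symbol
            i += 1
```

On input iiaea, the recognizer shifts the e with iS on top of the stack (rather than reducing), then later reduces iSeS and finally the outer iS, just as the table of Fig. 4.51 does.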

By way of comparison, if we were unable to use an ambiguous grammar to specify conditional
statements, then we would have to use a bulkier grammar along the lines of Example 4.16.
Error Recovery in LR Parsing
An LR parser will detect an error when it consults the parsing action table and finds an error
entry. Errors are never detected by consulting the goto table. An LR parser will announce an
error as soon as there is no valid continuation for the portion of the input thus far scanned. A
canonical LR parser will not make even a single reduction before announcing an error. SLR and
LALR parsers may make several reductions before announcing an error, but they will never shift
an erroneous input symbol onto the stack.
In LR parsing, we can implement panic-mode error recovery as follows. We scan down the stack
until a state s with a goto on a particular nonterminal A is found. Zero or more input symbols are
then discarded until a symbol a is found that can legitimately follow A. The parser then stacks
the state GOTO(s, A) and resumes normal parsing. There might be more than one choice for the
nonterminal A. Normally these would be nonterminals representing major program pieces, such
as an expression, statement, or block. For example,
if A is the nonterminal stmt, a might be a semicolon or ), which marks the end of a statement
sequence.
This method of recovery attempts to eliminate the phrase containing the syntactic error. The
parser determines that a string derivable from A contains an error. Part of that string has already
been processed, and the result of this processing is a sequence of states on top of the stack. The
remainder of the string is still in the input, and the parser attempts to skip over the remainder of
this string by looking for a symbol on the input that can legitimately follow A. By removing
states from the stack, skipping over the input, and pushing GOTO(s, A) on the stack, the parser
pretends that it has found an instance of A and resumes normal parsing.
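The panic-mode procedure just described can be sketched as follows. The GOTO table, FOLLOW sets, state numbers, and the list of recovery nonterminals are all hypothetical placeholders that a real parser would supply:

```python
def panic_mode_recover(state_stack, tokens, pos, goto, follow, recovery_nts):
    """Sketch of panic-mode recovery for an LR parser.

    Pops states until one has a goto on some recovery nonterminal A, discards
    input until a symbol in FOLLOW(A), then pushes GOTO(s, A) and returns the
    repaired stack and input position.
    """
    while state_stack:
        s = state_stack[-1]
        for A in recovery_nts:               # e.g. ['stmt', 'expr', 'block']
            if (s, A) in goto:
                # Discard input symbols until one that can follow A.
                while pos < len(tokens) and tokens[pos] not in follow[A]:
                    pos += 1
                if pos < len(tokens):
                    # Pretend an instance of A was found: push GOTO(s, A).
                    state_stack.append(goto[(s, A)])
                    return state_stack, pos
        state_stack.pop()                    # remove states covering the error
    raise SyntaxError("panic-mode recovery failed: no recovery state found")
```

For instance, with A = stmt and FOLLOW(stmt) containing the semicolon, the routine pops back to a state expecting a statement, skips input up to the next semicolon, and pushes GOTO(s, stmt) so that parsing resumes as if a statement had been seen.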
Phrase-level recovery is implemented by examining each error entry in the LR parsing table and
deciding on the basis of language usage the most likely programmer error that would give rise to
that error. An appropriate recovery procedure can then be constructed; presumably the top of the
stack and/or first input symbols would be modified in a way deemed appropriate for each error
entry.
In designing specific error-handling routines for an LR parser, we can fill in each blank entry in
the action field with a pointer to an error routine that will take the appropriate action selected by
the compiler designer. The actions may include insertion or deletion of symbols from the stack or
the input or both, or alteration and transposition of input symbols. We must make our choices so
that the LR parser will not get into an infinite loop. A safe strategy will assure that at least one
input symbol will be removed or shifted eventually, or that the stack will eventually shrink if the
end of the input has been reached. Popping a stack state that covers a nonterminal should be
avoided, because this modification eliminates from the stack a construct that has already been
successfully parsed.
