CS1601 COMPILER DESIGN - Department of CSE

UNIT II SYNTAX ANALYSIS

Need and Role of the Parser - Context Free Grammars - Top Down Parsing - General Strategies - Recursive Descent Parser - Predictive Parser - LL(1) Parser - Shift Reduce Parser - LR Parser - LR(0) Item - Construction of SLR Parsing Table - Introduction to LALR Parser - Error Handling and Recovery in Syntax Analyzer - YACC - Design of a Syntax Analyzer for a Sample Language.

NEED AND ROLE OF THE PARSER


SYNTAX ANALYSIS
Syntax analysis is the second phase of the compiler. It takes the stream of tokens produced by
the lexical analyzer as input and generates a syntax tree or parse tree.
Advantages of grammar for syntactic specification :

A grammar gives a precise and easy-to-understand syntactic specification of a programming
language.

An efficient parser can be constructed automatically from a properly designed grammar.

A grammar imparts a structure to a source program that is useful for its translation into
object code and for the detection of errors.

New constructs can be added to a language more easily when there is a grammatical
description of the language.

THE ROLE OF PARSER


The parser or syntactic analyzer obtains a string of tokens from the lexical analyzer and verifies
that the string can be generated by the grammar for the source language. It reports any syntax
errors in the program. It also recovers from commonly occurring errors so that it can continue
processing its input.

Functions of the parser :



It verifies the structure generated by the tokens based on the grammar.

It constructs the parse tree.

It reports the errors.

It performs error recovery.

Issues handled by the Semantic Analysis phase are:

Variable re-declaration

Variable initialization before use

Data type mismatch for an operation

Syntax error handling :
Programs can contain errors at many different levels. For example :

Lexical, such as misspelling a keyword.

Syntactic, such as an arithmetic expression with unbalanced parentheses.

Semantic, such as an operator applied to an incompatible operand.

Logical, such as an infinitely recursive call.
Functions of error handler :

It should report the presence of errors clearly and accurately.

It should recover from each error quickly enough to be able to detect subsequent errors.

It should not significantly slow down the processing of correct programs.

CONTEXT-FREE GRAMMARS

A Context-Free Grammar is a quadruple that consists of terminals, non-terminals, a start symbol
and productions.
Terminals : These are the basic symbols from which strings are formed.
Non-Terminals : These are the syntactic variables that denote sets of strings. They help to define the language generated by the grammar.
Start Symbol : One non-terminal in the grammar is denoted as the "start symbol", and the set of
strings it denotes is the language defined by the grammar.
Productions : It specifies the manner in which terminals and non-terminals can be combined to
form strings. Each production consists of a non-terminal, followed by an arrow, followed by a
string of non-terminals and terminals.

Example of context-free grammar: The following grammar defines simple arithmetic


expressions:
E → E+E | E*E | (E) | -E | id
In this grammar,
id, +, -, *, ( and ) are terminals.
E is the only non-terminal.
E is the start symbol.
Each alternative is a production, so there are 5 productions:
E → E+E
E →E*E
E→(E)
E→-E
E → id
Derivations:
Two basic requirements for a grammar are :
1. To generate a valid string.


2. To recognize a valid string.


Derivation is a process that generates a valid string with the help of grammar by replacing the
nonterminals on the left with the string on the right side of the production.

Example : Consider the following grammar for arithmetic expressions :


E → E+E | E*E | (E) | -E | id
The string to be derived is -(id+id).
To generate the valid string -(id+id) from the grammar, the derivation proceeds as:
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

E is the start symbol.

-(id+id) is the required sentence (only terminals).

Strings such as E, -E, -(E), . . . that appear during the derivation are called sentential forms.
REGULAR EXPRESSION vs. CONTEXT-FREE GRAMMAR

Regular expression: used to describe the tokens of programming languages.
Context-free grammar: a quadruple (V, T, P, S), where S is the start symbol, P the productions, T the terminals and V the variables (non-terminals).

Regular expression: checks whether the given input is valid or not using a transition diagram.
Context-free grammar: checks whether the given input is valid or not using a derivation.

Regular expression: the transition diagram has a set of states and edges.
Context-free grammar: the grammar has a set of productions.

Regular expression: has no start symbol.
Context-free grammar: has a start symbol.

Regular expression: useful for describing the structure of lexical constructs such as identifiers, constants, keywords, and so forth.
Context-free grammar: useful for describing nested structures such as balanced parentheses, matching begin-end, and so on.

Types of derivations
The two types of derivation are:
1. Leftmost derivation - In leftmost derivations, the leftmost non-terminal in each
sentential form is always chosen first for replacement.
2. Rightmost derivation - In rightmost derivations, the rightmost non-terminal in each sentential
form is always chosen first for replacement.

Example: Given grammar G : E → E+E | E*E | (E) | -E | id
Sentence to be derived : -(id+id)

Leftmost derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)

Rightmost derivation:
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)



Strings that appear in a leftmost derivation are called left-sentential forms.

Strings that appear in a rightmost derivation are called right-sentential forms.

Sentential forms:
Given a grammar G with start symbol S, if S ⇒* α, where α may contain non-terminals or
terminals, then α is called a sentential form of G.
Yield or frontier of tree:
Each interior node of a parse tree is a non-terminal; the children of a node may be terminals or
non-terminals. The leaves of the parse tree, read from left to right, constitute a sentential form
called the yield or frontier of the tree.

Ambiguity:
A grammar that produces more than one parse tree for some sentence is said to be an ambiguous
grammar.

Example 1 : Given grammar G : E → E+E | E*E | ( E ) | - E | id


The sentence id+id*id has the following two distinct leftmost derivations:

E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id

E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id

Each derivation corresponds to a different parse tree, so the grammar is ambiguous.


Example 2:
Consider the grammar:
S → aABe
A → Abc | b
B → d
The sentence to be recognized is abbcde. It is reduced to S as follows:

abbcde    (reduce by A → b)
aAbcde    (reduce by A → Abc)
aAde      (reduce by B → d)
aABe      (reduce by S → aABe)
S

The reductions trace out the rightmost derivation in reverse.

Handles:
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the non-terminal on the left side of the production represents one step along the
reverse of a rightmost derivation.


Example:
Consider the grammar:
E → E+E
E → E*E
E → (E)
E → id
And the input string id1+id2*id3

The rightmost derivation is :

E ⇒ E+E
  ⇒ E+E*E
  ⇒ E+E*id3
  ⇒ E+id2*id3
  ⇒ id1+id2*id3

Reading this derivation in reverse, the substring reduced at each step (id1, id2, id3, then E*E,
then E+E) is the handle of the corresponding right-sentential form.

Handle pruning:
A rightmost derivation in reverse can be obtained by “handle pruning”.
(i.e.) if w is a sentence or string of the grammar at hand, then w = γn, where γn is the nth right-sentential
form of some rightmost derivation.

Eliminating ambiguity:

Ambiguity of the grammar that produces more than one parse tree for leftmost or rightmost
derivation can be eliminated by re-writing the grammar.

Consider this example, G: stmt → if expr then stmt | if expr then stmt else stmt | other

This grammar is ambiguous since the string if E1 then if E2 then S1 else S2 has the following
two parse trees for leftmost derivation :


To eliminate ambiguity, the following grammar may be used:


stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
unmatched_stmt → if expr then stmt | if expr then matched_stmt else unmatched_stmt

Eliminating Left Recursion:


A grammar is said to be left recursive if it has a non-terminal A such that there is a derivation
A=>Aα for some string α. Top-down parsing methods cannot handle left-recursive grammars.
Hence, left recursion can be eliminated as follows:
If there is a production A → Aα |β it can be replaced with a sequence of two productions
A → βA’
A’→ αA’ | ε without changing the set of strings derivable from A.

Example : Consider the following grammar for arithmetic expressions:


E → E+T |T
T → T*F |F
F→ (E) |id

First eliminate the left recursion for E as


E → TE'
E' → +TE' | ε

Then eliminate the left recursion for T as

T → FT'
T' → *FT' | ε

Thus the grammar obtained after eliminating left recursion is
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id


Algorithm to eliminate left recursion:


1. Arrange the non-terminals in some order A1, A2, . . ., An.
2. for i := 1 to n do begin
       for j := 1 to i-1 do begin
           replace each production of the form Ai → Aj γ by
           the productions Ai → δ1 γ | δ2 γ | . . . | δk γ,
           where Aj → δ1 | δ2 | . . . | δk are all the current Aj-productions;
       end
       eliminate the immediate left recursion among the Ai-productions
   end

Left factoring:
Left factoring is a grammar transformation that is useful for producing a grammar suitable for
predictive parsing. When it is not clear which of two alternative productions to use to expand a
non-terminal A, we can rewrite the A-productions to defer the decision until we have seen
enough of the input to make the right choice.

If there is any production A → αβ1 | αβ2 , it can be rewritten as


A → αA’
A’→ β1 | β2

Consider the grammar , G : S → iEtS | iEtSeS | a


E→b

Left factored, this grammar becomes


S → iEtSS' | a
S' → eS | ε
E → b

TOP-DOWN PARSING
It can be viewed as an attempt to find a left-most derivation for an input string or an
attempt to construct a parse tree for the input starting from the root to the leaves.
Types of top-down parsing :
1. Recursive descent parsing
2. Predictive parsing

1. RECURSIVE DESCENT PARSING



Recursive descent parsing is one of the top-down parsing techniques that uses a set of
recursive procedures to scan its input.

This parsing method may involve backtracking, that is, making repeated scans of the
input.
Example for backtracking :
Consider the grammar G : S → cAd
A → ab |a
and the input string w=cad.


The parse tree can be constructed using the following top-down approach :
Step1:
Initially create a tree with a single node labeled S. An input pointer points to 'c', the first symbol
of w. Expand the tree with the production of S.

Step2:
The leftmost leaf 'c' matches the first symbol of w, so advance the input pointer to the second
symbol of w, 'a', and consider the next leaf 'A'. Expand A using the first alternative.

Step3:
The second symbol 'a' of w also matches the second leaf of the tree. So advance the input pointer
to the third symbol of w, 'd'. But the third leaf of the tree is b, which does not match the input
symbol d.
Hence discard the chosen production and reset the pointer to the second position. This is called
backtracking.
Step4:
Now try the second alternative for A.
Step4:
Now try the second alternative for A.

Now we can halt and announce the successful completion of parsing.


Example for recursive descent parsing:
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop.
Hence, elimination of left-recursion must be done before parsing.
Consider the grammar for arithmetic
expressions E → E+T |T
T → T*F |F
F→ (E) |id


After eliminating the left recursion, the grammar becomes:

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
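
A recursive descent parser for this grammar can be written with one procedure per non-terminal. The following C sketch illustrates the idea; the token codes, the hard-coded token stream and the helper names (getToken, match) are assumptions introduced only for illustration and are not part of these notes.

/* Recursive descent parser sketch for:
   E -> T E'    E' -> + T E' | eps
   T -> F T'    T' -> * F T' | eps
   F -> ( E ) | id                                        */
#include <stdio.h>
#include <stdlib.h>

enum { ID, PLUS, STAR, LPAREN, RPAREN, END };   /* hypothetical token codes */

/* a hard-coded token stream for the input "id + id * id" */
static int toks[] = { ID, PLUS, ID, STAR, ID, END };
static int pos = 0;
static int lookahead;

static int getToken(void) { return toks[pos++]; }
static void error(const char *msg) { printf("syntax error: %s\n", msg); exit(1); }
static void match(int t) { if (lookahead == t) lookahead = getToken(); else error("unexpected token"); }

static void E(void); static void Eprime(void);
static void T(void); static void Tprime(void);
static void F(void);

static void E(void)      { T(); Eprime(); }
static void Eprime(void) { if (lookahead == PLUS) { match(PLUS); T(); Eprime(); } /* else E' -> eps */ }
static void T(void)      { F(); Tprime(); }
static void Tprime(void) { if (lookahead == STAR) { match(STAR); F(); Tprime(); } /* else T' -> eps */ }
static void F(void)
{
    if (lookahead == LPAREN) { match(LPAREN); E(); match(RPAREN); }
    else if (lookahead == ID) match(ID);
    else error("expected id or (");
}

int main(void)
{
    lookahead = getToken();
    E();                          /* start symbol */
    if (lookahead == END) printf("parsing completed successfully\n");
    else error("extra input after expression");
    return 0;
}

Because the left recursion has already been removed, each procedure consumes input strictly left to right and no backtracking is needed.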
PREDICTIVE PARSING
Predictive parsing is a special case of recursive descent parsing where no backtracking is
required.
The key problem of predictive parsing is to determine the production to be applied for a
non- terminal in case of alternatives.
Non-recursive predictive parser

The table-driven predictive parser has an input buffer, stack, a parsing table and an
output stream.

Input buffer:

It consists of strings to be parsed, followed by $ to indicate the end of the input string.

Stack:

It contains a sequence of grammar symbols preceded by $ to indicate the bottom of the


stack. Initially, the stack contains the start symbol on top of $.

Parsing table:

It is a two-dimensional array M[A, a], where ‘A’ is a non-terminal and ‘a’ is a terminal.

Predictive parsing program:

The parser is controlled by a program that considers X, the symbol on top of stack, and


a, the current input symbol. These two symbols determine the parser action. There are
three possibilities:

1. If X = a = $, the parser halts and announces successful completion of parsing.


2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the
next input symbol.
3. If X is a non-terminal, the program consults entry M[X, a] of the parsing table M.
This entry will either be an X-production of the grammar or an error entry.
If M[X , a] = {X → UVW},the parser replaces X on top of the stack by
WVU. If M[X , a] = error, the parser calls an error recovery routine.

The behaviour of the parser can be described by the following algorithm:

set ip to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a non-terminal */
        if M[X, a] = X → Y1Y2...Yk then begin
            pop X from the stack;
            push Yk, Yk-1, ... , Y1 onto the stack, with Y1 on top;
            output the production X → Y1Y2...Yk
        end
        else error()
until X = $ /* stack is empty */

Predictive parsing table construction:


The construction of a predictive parser is aided by two functions associated with a
grammar G :

FIRST

FOLLOW

RULES FOR FIRST( ):


1. If X is terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is non- terminal and X → aα is a production then add a to FIRST(X).
4. If X is a non-terminal and X → Y1Y2…Yk is a production, then place a in FIRST(X) if
for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi-1); that is,
Y1…Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, .., k, then add ε to FIRST(X).
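
These rules amount to a fixed-point computation: the FIRST sets are grown until nothing changes. The following C sketch computes FIRST (and nullability) for the left-recursion-free expression grammar used later in this section; the symbol encoding and data layout are illustrative assumptions only.

/* FIRST-set computation by fixed-point iteration for:
   E -> T E'    E' -> + T E' | eps
   T -> F T'    T' -> * F T' | eps
   F -> ( E ) | id                                        */
#include <stdio.h>

/* non-terminals 0..4, terminals 5..9; EPS marks the end of a right side */
enum { E, Ep, T, Tp, F, PLUS, STAR, LP, RP, ID, NSYM };
#define NT  5
#define EPS -1

static const char *name[] = { "E", "E'", "T", "T'", "F", "+", "*", "(", ")", "id" };

static int lhs[]    = { E, Ep, Ep, T, Tp, Tp, F, F };
static int rhs[][4] = {
    { T, Ep, EPS }, { PLUS, T, Ep, EPS }, { EPS },
    { F, Tp, EPS }, { STAR, F, Tp, EPS }, { EPS },
    { LP, E, RP, EPS }, { ID, EPS }
};
#define NPROD 8

static unsigned first[NT];   /* bit (t - NT) set  =>  terminal t is in FIRST */
static int nullable[NT];     /* 1 if the non-terminal can derive epsilon     */

int main(void)
{
    int changed = 1;
    while (changed) {                                        /* iterate to a fixed point */
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            int A = lhs[p], allNull = 1;
            for (int i = 0; rhs[p][i] != EPS && allNull; i++) {
                int X = rhs[p][i];
                unsigned add;
                if (X >= NT) { add = 1u << (X - NT); allNull = 0; }     /* terminal      */
                else         { add = first[X]; allNull = nullable[X]; } /* non-terminal  */
                if ((first[A] | add) != first[A]) { first[A] |= add; changed = 1; }
            }
            if (allNull && !nullable[A]) { nullable[A] = 1; changed = 1; }
        }
    }
    for (int A = 0; A < NT; A++) {
        printf("FIRST(%s) = {", name[A]);
        for (int t = NT; t < NSYM; t++)
            if (first[A] & (1u << (t - NT))) printf(" %s", name[t]);
        if (nullable[A]) printf(" eps");
        printf(" }\n");
    }
    return 0;
}

Running this prints FIRST(E) = { ( id }, FIRST(E') = { + eps }, FIRST(T) = { ( id }, FIRST(T') = { * eps } and FIRST(F) = { ( id }, matching the sets computed by hand below.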

RULES FOR FOLLOW( ):


1. If S is a start symbol, then FOLLOW(S) contains $.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in
follow(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains
ε, then everything in FOLLOW(A) is in FOLLOW(B).

Algorithm for construction of predictive parsing table:


Input : Grammar G
Output : Parsing table M


Method :
1. For each production A →α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in
FIRST(α) and $ is in FOLLOW(A) , add A → α to M[A, $].
4. Make each undefined entry of M be error.

Example:
Consider the following grammar :
E → E+T |T
T → T*F |F
F→ (E) |id

After eliminating left-recursion the grammar is

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST( ) :

FIRST(E) = { (, id }

FIRST(E') = { +, ε }

FIRST(T) = { (, id }

FIRST(T') = { *, ε }

FIRST(F) = { (, id }

FOLLOW( ):

FOLLOW(E) = { $, ) }

FOLLOW(E') = { $, ) }

FOLLOW(T) = { +, $, ) }

FOLLOW(T') = { +, $, ) }

FOLLOW(F) = { +, *, $, ) }


Predictive parsing table M:

M[E, id]  = E → TE'       M[E, (]  = E → TE'
M[E', +]  = E' → +TE'     M[E', )] = E' → ε       M[E', $] = E' → ε
M[T, id]  = T → FT'       M[T, (]  = T → FT'
M[T', +]  = T' → ε        M[T', *] = T' → *FT'    M[T', )] = T' → ε    M[T', $] = T' → ε
M[F, id]  = F → id        M[F, (]  = F → (E)

All remaining entries are error.

Stack implementation:

Stack          Input           Output
$E             id+id*id $
$E'T           id+id*id $      E → TE'
$E'T'F         id+id*id $      T → FT'
$E'T'id        id+id*id $      F → id
$E'T'          +id*id $
$E'            +id*id $        T' → ε
$E'T+          +id*id $        E' → +TE'
$E'T           id*id $
$E'T'F         id*id $         T → FT'
$E'T'id        id*id $         F → id
$E'T'          *id $
$E'T'F*        *id $           T' → *FT'
$E'T'F         id $
$E'T'id        id $            F → id
$E'T'          $
$E'            $               T' → ε
$              $               E' → ε
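
The table-driven parser itself is a small loop over the stack and the table. The following C sketch drives the parse of id+id*id using the table above; the symbol encoding, the production numbering and the hard-coded token stream are illustrative assumptions.

/* Table-driven predictive (LL(1)) parser sketch for the grammar above. */
#include <stdio.h>

/* terminals 0..5: id + * ( ) $      non-terminals 6..10: E E' T T' F  */
enum { TID, PLUS, STAR, LP, RP, DOLLAR, E, Ep, T, Tp, F };
static const char *sym[] = { "id", "+", "*", "(", ")", "$", "E", "E'", "T", "T'", "F" };

/* right-hand sides of the productions, -1 terminated */
static int prodrhs[][4] = {
    { T, Ep, -1 },        /* 0: E  -> T E'   */
    { PLUS, T, Ep, -1 },  /* 1: E' -> + T E' */
    { -1 },               /* 2: E' -> eps    */
    { F, Tp, -1 },        /* 3: T  -> F T'   */
    { STAR, F, Tp, -1 },  /* 4: T' -> * F T' */
    { -1 },               /* 5: T' -> eps    */
    { LP, E, RP, -1 },    /* 6: F  -> ( E )  */
    { TID, -1 }           /* 7: F  -> id     */
};
static const char *ptext[] = { "E -> TE'", "E' -> +TE'", "E' -> eps", "T -> FT'",
                               "T' -> *FT'", "T' -> eps", "F -> (E)", "F -> id" };

/* M[non-terminal][terminal] = production number, -1 = error */
static int M[5][6] = {
    /*           id   +   *   (   )   $  */
    /* E  */  {   0, -1, -1,  0, -1, -1 },
    /* E' */  {  -1,  1, -1, -1,  2,  2 },
    /* T  */  {   3, -1, -1,  3, -1, -1 },
    /* T' */  {  -1,  5,  4, -1,  5,  5 },
    /* F  */  {   7, -1, -1,  6, -1, -1 }
};

int main(void)
{
    int input[] = { TID, PLUS, TID, STAR, TID, DOLLAR };   /* "id + id * id $" */
    int stack[64], top = 0, ip = 0;
    stack[top++] = DOLLAR;                                  /* bottom marker */
    stack[top++] = E;                                       /* start symbol  */

    while (top > 0) {
        int X = stack[top - 1], a = input[ip];
        if (X == DOLLAR && a == DOLLAR) { printf("accept\n"); return 0; }
        if (X < E) {                                        /* X is a terminal      */
            if (X == a) { top--; ip++; }                    /* match and advance    */
            else { printf("error: unexpected %s\n", sym[a]); return 1; }
        } else {                                            /* X is a non-terminal  */
            int p = M[X - E][a];
            if (p < 0) { printf("error at %s\n", sym[a]); return 1; }
            top--;                                          /* pop X                */
            int len = 0;
            while (prodrhs[p][len] != -1) len++;
            for (int i = len - 1; i >= 0; i--)              /* push RHS in reverse  */
                stack[top++] = prodrhs[p][i];
            printf("%s\n", ptext[p]);                       /* output the production */
        }
    }
    return 0;
}

The productions printed by this loop reproduce the Output column of the trace above.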
LL(1) GRAMMAR
If each entry of the predictive parsing table is uniquely defined (no location M[A, a] holds more
than one production), the grammar is called an LL(1) grammar. Consider the following
grammar:
S → iEtS | iEtSeS | a


E→b
After left factoring, we have
S → iEtSS' | a
S' → eS | ε
E → b
To construct a parsing table, we need FIRST() and FOLLOW() for all the non-terminals.
FIRST(S) = { i, a }
FIRST(S') = { e, ε }
FIRST(E) = { b }

FOLLOW(S) = { $, e }
FOLLOW(S') = { $, e }
FOLLOW(E) = { t }

Parsing table:

M[S, a]  = S → a
M[S, i]  = S → iEtSS'
M[S', e] = S' → eS  and  S' → ε    (two entries)
M[S', $] = S' → ε
M[E, b]  = E → b

Since the entry M[S', e] contains more than one production, the grammar is not an LL(1) grammar.

Actions performed in predictive parsing:


1. Shift
2. Reduce
3. Accept
4. Error

Steps for constructing a predictive parser:

1. Eliminate ambiguity and left recursion from the grammar, and perform left factoring.


2. Construct FIRST() and FOLLOW() for all non-terminals.
3. Construct predictive parsing table.
4. Parse the given input string using stack and parsing table.


BOTTOM-UP PARSING
Constructing a parse tree for an input string beginning at the leaves and going towards
the root is called bottom-up parsing.
A general type of bottom-up parser is a shift-reduce parser.

SHIFT-REDUCE PARSING
Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse
tree for an input string beginning at the leaves (the bottom) and working up towards
the root (the top).

Example:
Consider the grammar:
S → aABe
A → Abc | b
B→ d
The sentence to be recognized is abbcde.
REDUCTION (LEFTMOST)            RIGHTMOST DERIVATION

abbcde   (A → b)                S ⇒ aABe
aAbcde   (A → Abc)                ⇒ aAde
aAde     (B → d)                  ⇒ aAbcde
aABe     (S → aABe)               ⇒ abbcde
S

The reductions trace out the rightmost derivation in reverse.
Handles:
A handle of a string is a substring that matches the right side of a production, and
whose reduction to the non-terminal on the left side of the production represents one
step along the reverse of a rightmost derivation.

Example:
Consider the grammar:
E → E+E
E → E*E
E → (E)
E → id
And the input string id1+id2*id3

The rightmost derivation is :

E ⇒ E+E
  ⇒ E+E*E
  ⇒ E+E*id3
  ⇒ E+id2*id3
  ⇒ id1+id2*id3

Reading this derivation in reverse, the substring reduced at each step (id1, id2, id3, then E*E,
then E+E) is the handle of the corresponding right-sentential form.

Handle pruning:
A rightmost derivation in reverse can be obtained by “handle pruning”.
(i.e.) if w is a sentence or string of the grammar at hand, then w = γn, where γn is the nth
right-sentential form of some rightmost derivation.

Stack implementation of shift-reduce parsing :

Stack Input Action


$ id1+id2*id3 $ shift
$ id1 +id2*id3 $ reduce by E→id
$E +id2*id3 $ shift
$ E+ id2*id3 $ shift
$ E+id2 *id3 $ reduce by E→id
$ E+E *id3 $ shift
$ E+E* id3 $ shift

$ E+E*id3 $ reduce by E→id

$ E+E*E $ reduce by E→ E *E
$ E+E $ reduce by E→ E+E
$E $ accept

Actions in shift -reduce parser:


shift – The next input symbol is shifted onto the top of the stack.
reduce – The parser replaces the handle within a stack with a non-terminal.
accept – The parser announces successful completion of parsing.
error – The parser discovers that a syntax error has occurred and calls an error recovery
routine.

Conflicts in shift-reduce parsing:


There are two kinds of conflicts that can occur in shift-reduce parsing:

Shift-reduce conflict: The parser cannot decide whether to shift or to reduce.

Reduce-reduce conflict: The parser cannot decide which of several reductions to make.

Shift-reduce conflict:
Example:
Consider the grammar:
E→E+E |E*E |id and input id+id*id


Consider the configuration with stack $ E+E and remaining input *id $. The parser may either
reduce first or shift first:

Choice 1 - reduce first:
Stack      Input     Action
$ E+E      *id $     reduce by E → E+E
$ E        *id $     shift
$ E*       id $      shift
$ E*id     $         reduce by E → id
$ E*E      $         reduce by E → E*E
$ E        $

Choice 2 - shift first:
Stack      Input     Action
$ E+E      *id $     shift
$ E+E*     id $      shift
$ E+E*id   $         reduce by E → id
$ E+E*E    $         reduce by E → E*E
$ E+E      $         reduce by E → E+E
$ E        $

Since both choices are possible, the parser faces a shift-reduce conflict.

Reduce-reduce conflict:
Example:
Consider the grammar:
M → R+R | R+c | R
R → c
and input c+c

Choice 1 - reduce by R → c:
Stack    Input    Action
$        c+c $    shift
$ c      +c $     reduce by R → c
$ R      +c $     shift
$ R+     c $      shift
$ R+c    $        reduce by R → c
$ R+R    $        reduce by M → R+R
$ M      $

Choice 2 - reduce by M → R+c:
Stack    Input    Action
$        c+c $    shift
$ c      +c $     reduce by R → c
$ R      +c $     shift
$ R+     c $      shift
$ R+c    $        reduce by M → R+c
$ M      $

With R+c on the stack, the parser cannot decide whether to reduce by R → c or by M → R+c;
this is a reduce-reduce conflict.

LR PARSERS
An efficient bottom-up syntax analysis technique that can be used to parse a
large class of CFGs is called LR(k) parsing. The 'L' is for left-to-right scanning of the
input, the 'R' for constructing a rightmost derivation in reverse, and the 'k' for the
number of input symbols of lookahead. When 'k' is omitted, it is assumed to be 1.
Advantages of LR parsing:

It recognizes virtually all programming language constructs for which a CFG can be written.


It is an efficient non-backtracking shift-reduce parsing method.

The class of grammars that can be parsed using LR methods is a proper superset of the class of
grammars that can be parsed with predictive parsers.

It detects a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.

Drawbacks of the LR method:
It is too much work to construct an LR parser by hand for a typical programming-language
grammar. A specialized tool, called an LR parser generator, is needed. Example: YACC.

Types of LR parsing method:


SLR - Simple LR - Easiest to implement, least powerful.
CLR - Canonical LR - Most powerful, most expensive.
LALR - Look-Ahead LR - Intermediate in size and cost between the other two methods.
The LR parsing algorithm:
The schematic form of an LR parser is as follows:

It consists of : an input, an output, a stack, a driver program, and a parsing table that
has two parts (action and goto).

The driver program is the same for all LR parsers.

The parsing program reads characters from an input buffer one at a time.

The program uses a stack to store a string of the form s0X1s1X2s2…Xmsm, where sm is on
top. Each Xi is a grammar symbol and each si is a state.

The parsing table consists of two parts : action and goto functions.

Action : The parsing program determines sm, the state currently on top of stack, and ai,
the current input symbol. It then consults action[sm,ai] in the action table which can
have one of four values :

shift s, where s is a state,
reduce by a grammar production A → β,


accept, and

error.
Goto : The function goto takes a state and grammar symbol as arguments and produces
a state.

Input: An input string w and an LR parsing table with functions action and goto for
grammar G.
Output: If w is in L(G), a bottom-up-parse for w; otherwise, an error indication.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in
the input buffer. The parser then executes the following program :

set ip to point to the first input symbol of w$;
repeat forever begin
    let s be the state on top of the stack and
        a the symbol pointed to by ip;
    if action[s, a] = shift s' then begin
        push a then s' on top of the stack;
        advance ip to the next input symbol
    end
    else if action[s, a] = reduce A → β then begin
        pop 2*|β| symbols off the stack;
        let s' be the state now on top of the stack;
        push A then goto[s', A] on top of the stack;
        output the production A → β
    end
    else if action[s, a] = accept then
        return
    else error()
end

CONSTRUCTING SLR(1) PARSING TABLE:


To construct an SLR parser, take the grammar as input and do the following:
1. Find the LR(0) items.
2. Compute the closure of the item sets.
3. Compute goto(I, X), where I is a set of items and X is a grammar symbol.


LR(0) items:
An LR(0) item of a grammar G is a production of G with a dot at some position
of the right side. For example, production A → XYZ yields the four items : A → . XYZ

A → X . YZ
A → XY . Z
A → XYZ .

Closure operation:
If I is a set of items for a grammar G, then closure(I) is the set of items
constructed from I by the two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α . Bβ is in closure(I) and B → γ is a production, then add the item B → . γ
to closure(I), if it is not already there. We apply this rule until no more new items can be
added to closure(I).
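
As a concrete illustration, the closure rule can be implemented as a worklist over (production, dot-position) pairs. The following C sketch computes closure({E' → .E}) for the augmented expression grammar used in the SLR example below; the symbol encoding and item representation are assumptions made only for illustration.

/* closure() sketch over LR(0) items represented as (production, dot) pairs */
#include <stdio.h>

enum { Ea, E, T, F, PLUS, STAR, LP, RP, ID };   /* Ea stands for E' */
static const char *sym[] = { "E'", "E", "T", "F", "+", "*", "(", ")", "id" };

static int lhs[]    = { Ea, E, E, T, T, F, F };
static int rhs[][4] = {
    { E, -1 }, { E, PLUS, T, -1 }, { T, -1 },
    { T, STAR, F, -1 }, { F, -1 }, { LP, E, RP, -1 }, { ID, -1 }
};
#define NPROD 7

typedef struct { int prod, dot; } Item;

/* add an item to the set if it is not already present */
static void add(Item *set, int *n, Item it)
{
    for (int i = 0; i < *n; i++)
        if (set[i].prod == it.prod && set[i].dot == it.dot) return;
    set[(*n)++] = it;
}

/* rule 2: for every item A -> alpha . B beta, add B -> . gamma */
static int closure(Item *set, int n)
{
    for (int i = 0; i < n; i++) {               /* n grows while new items appear */
        int B = rhs[set[i].prod][set[i].dot];   /* symbol right after the dot     */
        if (B >= Ea && B <= F)                  /* is it a non-terminal?          */
            for (int p = 0; p < NPROD; p++)
                if (lhs[p] == B) { Item it = { p, 0 }; add(set, &n, it); }
    }
    return n;
}

int main(void)
{
    Item I0[32];
    int n = 0;
    Item start = { 0, 0 };                      /* kernel item E' -> . E */
    add(I0, &n, start);
    n = closure(I0, n);

    printf("closure({E' -> .E}):\n");
    for (int i = 0; i < n; i++) {
        printf("  %s ->", sym[lhs[I0[i].prod]]);
        for (int j = 0; ; j++) {
            if (j == I0[i].dot) printf(" .");
            if (rhs[I0[i].prod][j] == -1) break;
            printf(" %s", sym[rhs[I0[i].prod][j]]);
        }
        printf("\n");
    }
    return 0;
}

The seven items printed correspond exactly to the set I0 constructed by hand in the SLR example below; the goto operation described next can be computed by first advancing the dot over X in every item and then taking the closure of the result.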

Goto operation:
Goto(I, X) is defined to be the closure of the set of all items [A→ αX . β] such that
[A→ α . Xβ] is in I.

Steps to construct SLR parsing table for grammar G are:


1. Augment G and produce G'.
2. Construct the canonical collection of sets of LR(0) items, C, for G'.
3. Construct the parsing action function action and the goto function using the following
algorithm, which requires FOLLOW(A) for each non-terminal A of the grammar.

Algorithm for construction of SLR parsing table:


Input : An augmented grammar G'
Output : The SLR parsing table functions action and goto for G'
Method :
1. Construct C = {I0, I1, …, In}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Ii. The parsing actions for state i are determined as
follows:
(a) If [A → α.aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here a must
be a terminal.
(b) If [A → α.] is in Ii, then set action[i, a] to "reduce A → α" for all a in FOLLOW(A).
(c) If [S' → S.] is in Ii, then set action[i, $] to "accept".
If any conflicting actions are generated by the above rules, we say the grammar is
not SLR(1).
3. The goto transitions for state i are constructed for all non-terminals A using the rule: if
goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made "error".
5. The initial state of the parser is the one constructed from the set of items containing
[S' → .S].

Example for SLR parsing:


Construct the SLR parsing table for the following grammar :
G:E→E+T|T
T→T*F|F
F→ (E) | id

The given grammar is :


G:E→E+T ------ (1)
E →T ------ (2)
T→T*F ------ (3)
T→F ------ (4)
F→ (E) ------ (5)
F→ id ------ (6)
Step 1 : Convert given grammar into
augmented grammar.
Augmented grammar :
E’ → E
E→E+T
E→T
T→T*F
T→F
F→ (E)
F→ id

Step 2 : Find LR (0) items.


I0 : E’ → . E
E→.E+T
E→.T
T→.T*F
T→.F
F → . (E)
F → . id

The remaining sets of items are obtained by computing goto:

Goto(I0, E):
I1 : E' → E .
     E → E . + T

Goto(I0, T):
I2 : E → T .
     T → T . * F

Goto(I0, F):
I3 : T → F .

Goto(I0, ( ):
I4 : F → ( . E )
     E → . E + T
     E → . T
     T → . T * F
     T → . F
     F → . ( E )
     F → . id

Goto(I0, id):
I5 : F → id .

Goto(I1, +):
I6 : E → E + . T
     T → . T * F
     T → . F
     F → . ( E )
     F → . id

Goto(I2, *):
I7 : T → T * . F
     F → . ( E )
     F → . id

Goto(I4, E):
I8 : F → ( E . )
     E → E . + T

Goto(I6, T):
I9 : E → E + T .
     T → T . * F

Goto(I7, F):
I10 : T → T * F .

Goto(I8, )):
I11 : F → ( E ) .

SLR parsing table:
(columns id, +, *, (, ), $ form the ACTION part; columns E, T, F form the GOTO part)

State | id    | +     | *     | (     | )     | $     | E   | T   | F
I0    | s5    |       |       | s4    |       |       | 1   | 2   | 3
I1    |       | s6    |       |       |       | acc   |     |     |
I2    |       | r2    | s7    |       | r2    | r2    |     |     |
I3    |       | r4    | r4    |       | r4    | r4    |     |     |
I4    | s5    |       |       | s4    |       |       | 8   | 2   | 3
I5    |       | r6    | r6    |       | r6    | r6    |     |     |
I6    | s5    |       |       | s4    |       |       |     | 9   | 3
I7    | s5    |       |       | s4    |       |       |     |     | 10
I8    |       | s6    |       |       |       | s11   |     |     |
I9    |       | r1    | s7    |       | r1    | r1    |     |     |
I10   |       | r3    | r3    |       | r3    | r3    |     |     |
I11   |       | r5    | r5    |       | r5    | r5    |     |     |

Blank entries are error entries.

Stack implementation:

Check whether the input id + id * id is valid or not.

Stack                    Input          Action
0                        id+id*id $     shift 5
0 id 5                   +id*id $       reduce by F → id
0 F 3                    +id*id $       reduce by T → F
0 T 2                    +id*id $       reduce by E → T
0 E 1                    +id*id $       shift 6
0 E 1 + 6                id*id $        shift 5
0 E 1 + 6 id 5           *id $          reduce by F → id
0 E 1 + 6 F 3            *id $          reduce by T → F
0 E 1 + 6 T 9            *id $          shift 7
0 E 1 + 6 T 9 * 7        id $           shift 5
0 E 1 + 6 T 9 * 7 id 5   $              reduce by F → id
0 E 1 + 6 T 9 * 7 F 10   $              reduce by T → T*F
0 E 1 + 6 T 9            $              reduce by E → E+T
0 E 1                    $              accept

Since the parse reaches the accept action, the input id + id * id is valid.
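
The LR driver loop itself can be coded directly from the action and goto tables above. The following C sketch checks id + id * id against those tables; the table encoding (positive entry = shift to that state, negative = reduce by that production, 99 = accept, 0 = error) and the hard-coded token stream are assumptions made only for illustration.

/* SLR(1) driver sketch for the expression grammar, using the table above. */
#include <stdio.h>

enum { T_id, T_plus, T_star, T_lp, T_rp, T_end };           /* terminal columns */
enum { N_E, N_T, N_F };                                      /* goto columns     */

static int action[12][6] = {
/*          id   +    *    (    )    $   */
/* 0  */ {   5,  0,   0,   4,   0,   0 },
/* 1  */ {   0,  6,   0,   0,   0,  99 },
/* 2  */ {   0, -2,   7,   0,  -2,  -2 },
/* 3  */ {   0, -4,  -4,   0,  -4,  -4 },
/* 4  */ {   5,  0,   0,   4,   0,   0 },
/* 5  */ {   0, -6,  -6,   0,  -6,  -6 },
/* 6  */ {   5,  0,   0,   4,   0,   0 },
/* 7  */ {   5,  0,   0,   4,   0,   0 },
/* 8  */ {   0,  6,   0,   0,  11,   0 },
/* 9  */ {   0, -1,   7,   0,  -1,  -1 },
/* 10 */ {   0, -3,  -3,   0,  -3,  -3 },
/* 11 */ {   0, -5,  -5,   0,  -5,  -5 }
};

static int go_to[12][3] = {                                  /* goto[state][E,T,F] */
    { 1, 2, 3 }, { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 },
    { 8, 2, 3 }, { 0, 0, 0 }, { 0, 9, 3 }, { 0, 0, 10 },
    { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 }
};

/* productions 1..6: left-hand side and length of right-hand side */
static int plhs[] = { 0, N_E, N_E, N_T, N_T, N_F, N_F };
static int plen[] = { 0, 3,   1,   3,   1,   3,   1   };
static const char *ptext[] = { "", "E -> E+T", "E -> T", "T -> T*F",
                               "T -> F", "F -> (E)", "F -> id" };

int main(void)
{
    int input[] = { T_id, T_plus, T_id, T_star, T_id, T_end };   /* id + id * id $ */
    int stack[64], top = 0, ip = 0;
    stack[top] = 0;                                              /* initial state 0 */

    for (;;) {
        int s = stack[top], a = input[ip], act = action[s][a];
        if (act == 99) { printf("input is valid (accept)\n"); return 0; }
        else if (act > 0) { stack[++top] = act; ip++; }          /* shift            */
        else if (act < 0) {                                      /* reduce by -act   */
            int p = -act;
            top -= plen[p];                                      /* pop |rhs| states */
            stack[top + 1] = go_to[stack[top]][plhs[p]];         /* push goto state  */
            top++;
            printf("reduce by %s\n", ptext[p]);
        }
        else { printf("syntax error\n"); return 1; }
    }
}

The sequence of reductions printed by this loop matches the Action column of the stack trace above, and the same driver works unchanged for the canonical LR and LALR tables that follow, since only the tables differ.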

CANONICAL LR PARSING
Algorithm for construction of the canonical LR parsing table:
Input: An augmented grammar G'
Output: The canonical LR parsing table functions action and goto for G'
Method :
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items for G'. State i is
constructed from Ii.
2. If [A → α.aβ, b] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here a must
be a terminal.
3. If [A → α., a] is in Ii, then set action[i, a] to "reduce A → α". Here A may not be S'.
4. If [S' → S., $] is in Ii, then set action[i, $] to "accept".
5. If any conflicting actions are generated by these rules, the grammar is not LR(1) and
the algorithm fails to produce a parser.
6. The goto transitions for state i are constructed for all non-terminals A using the rule: if
goto(Ii, A) = Ij, then goto[i, A] = j.
7. All entries not defined by rules 2 and 3 are made "error".
8. The initial state of the parser is the one constructed from the set of items containing
[S' → .S, $].

For example, let us consider the following grammar:

S → CC
C → cC | d

Augmented grammar :
S' → S
S → CC    ---- (1)
C → cC    ---- (2)
C → d     ---- (3)

Sets of LR(1) items

I0 : S' → .S, $
     S → .CC, $
     C → .cC, c/d
     C → .d, c/d

Go to (I0, S):
I1 : S' → S., $

Go to (I0, C):
I2 : S → C.C, $
     C → .cC, $
     C → .d, $

Go to (I0, c):
I3 : C → c.C, c/d
     C → .cC, c/d
     C → .d, c/d

Go to (I0, d):
I4 : C → d., c/d

Go to (I2, C):
I5 : S → CC., $

Go to (I2, c):
I6 : C → c.C, $
     C → .cC, $
     C → .d, $

Go to (I2, d):
I7 : C → d., $

Go to (I3, C):
I8 : C → cC., c/d

Go to (I3, c):
I3 : C → c.C, c/d
     C → .cC, c/d
     C → .d, c/d

Go to (I3, d):
I4 : C → d., c/d

Go to (I6, C):
I9 : C → cC., $

Go to (I6, c):
I6 : C → c.C, $
     C → .cC, $
     C → .d, $

Go to (I6, d):
I7 : C → d., $

The canonical LR(1) parsing table constructed from these item sets is:

State | c     | d     | $     | S   | C
I0    | s3    | s4    |       | 1   | 2
I1    |       |       | acc   |     |
I2    | s6    | s7    |       |     | 5
I3    | s3    | s4    |       |     | 8
I4    | r3    | r3    |       |     |
I5    |       |       | r1    |     |
I6    | s6    | s7    |       |     | 9
I7    |       |       | r3    |     |
I8    | r2    | r2    |       |     |
I9    |       |       | r2    |     |

Blank entries are error entries.

LALR PARSER
Algorithm for construction of LALR parsing table:
Input : An augmented grammar G'
Output : The LALR parsing table functions action and goto for G'
Method :
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items for G'.
2. For each core present among the sets of LR(1) items, find all sets having that core and
replace these sets by their union.
3. Let C' = {J0, J1, ..., Jm} be the resulting sets of LR(1) items. The parsing actions for
state i are constructed from Ji in the same manner as in the construction of the canonical
LR parsing table.
4. If there is a conflict, the grammar is not LALR(1) and the algorithm fails.
5. The goto table is constructed as follows: if J is the union of one or more sets of LR(1)
items, that is, J = I1 ∪ I2 ∪ ... ∪ Ik, then the cores of goto(I1, X), goto(I2, X), ..., goto(Ik, X)
are the same, since I1, I2, ..., Ik all have the same core. Let K be the union of all sets of
items having the same core as goto(I1, X).
6. Then goto(J, X) = K.

Construct the LALR parser for the following grammar:

S → CC
C → cC | d


(The LR(1) item sets I0 to I9 are the same as those derived for the CLR parser above.)

I3 and I6 have the same core; only their lookaheads differ. They are therefore replaced by their union I36:

I3 : C → c.C, c/d
     C → .cC, c/d
     C → .d, c/d

I6 : C → c.C, $
     C → .cC, $
     C → .d, $

I36 : C → c.C, c/d/$
      C → .cC, c/d/$
      C → .d, c/d/$

Similarly, I4 and I7 are merged into I47:

I4 : C → d., c/d
I7 : C → d., $
I47 : C → d., c/d/$

and I8 and I9 are merged into I89:

I8 : C → cC., c/d
I9 : C → cC., $
I89 : C → cC., c/d/$

The resulting LALR parsing table is:

State | c     | d     | $     | S   | C
I0    | s36   | s47   |       | 1   | 2
I1    |       |       | acc   |     |
I2    | s36   | s47   |       |     | 5
I36   | s36   | s47   |       |     | 89
I47   | r3    | r3    | r3    |     |
I5    |       |       | r1    |     |
I89   | r2    | r2    | r2    |     |

Blank entries are error entries.

LR ERROR RECOVERY
An LR parser will detect an error when it consults the parsing action table and finds a
blank (error) entry. Errors are never detected by consulting the goto table. An LR
parser will detect an error as soon as there is no valid continuation for the portion of the
input scanned so far. A canonical LR parser will not make even a single reduction
before announcing the error. SLR and LALR parsers may make several reductions
before detecting an error, but they will never shift an erroneous input symbol onto the
stack.
GENERATE YACC SPECIFICATION TO VALIDATE ARITHMETIC
EXPRESSIONS
AIM:
To create Lex and Yacc programs to recognize and evaluate arithmetic expressions.
ALGORITHM:
Step 1: Start the program.
Step 2: Create a Lex program that returns tokens to the parser and includes the header (y.tab.h) generated by Yacc.
Step 3: Define the rule that the operands of the operations must be numbers.
Step 4: Define the Yacc rules for the expressions and the conditions to be validated.
Step 5: Terminate the programs.
Program:
4a.y
%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
int yyerror(const char *msg);
%}
%token NUM
%left '+' '-'
%left '*' '/'
%left '(' ')'
%%
expr : e   {
             printf("result:%d\n", $1);   /* value of the whole expression */
             return 0;
           }
     ;
e : e '+' e     { $$ = $1 + $3; }
  | e '-' e     { $$ = $1 - $3; }
  | e '*' e     { $$ = $1 * $3; }
  | e '/' e     { $$ = $1 / $3; }
  | '(' e ')'   { $$ = $2; }
  | NUM         { $$ = $1; }
  ;
%%
int main(void)
{
    printf("\nenter the arithmetic expression:\n");
    yyparse();
    printf("\nvalid expression\n");
    return 0;
}
int yyerror(const char *msg)
{
    printf("\ninvalid expression\n");
    exit(0);
}
4a.l
%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+    { yylval = atoi(yytext); return NUM; }   /* operands must be numbers */
[ \t]     ;                                        /* skip blanks and tabs */
\n        return 0;                                /* end of the expression */
.         return yytext[0];                        /* operators and parentheses */
%%
OUTPUT:
To compile : yacc -d 4a.y, lex 4a.l, cc -c lex.yy.c y.tab.c, cc -o a.out lex.yy.o y.tab.o -lfl
enter the arithmetic expression:
10+10
result:20
valid expression

enter the arithmetic expression:


10-10
result:0

valid expression
