1601602 / Compiler Design
UNIT II SYNTAX ANALYSIS
Need and Role of the Parser - Context-Free Grammars - Top-Down Parsing - General Strategies - Recursive Descent Parser - Predictive Parser - LL(1) Parser - Shift-Reduce Parser - LR Parser - LR(0) Item - Construction of SLR Parsing Table - Introduction to LALR Parser - Error Handling and Recovery in Syntax Analyzer - YACC - Design of a Syntax Analyzer for a Sample Language.
CONTEXT-FREE GRAMMARS
Types of derivations
The two types of derivation are:
1. Leftmost derivation - In a leftmost derivation, the leftmost non-terminal in each sentential form is always chosen first for replacement.
2. Rightmost derivation - In a rightmost derivation, the rightmost non-terminal in each sentential form is always chosen first for replacement.
Example grammar:
E → E*E
E → (E)
E → -E
E → id
➢ Strings that appear in a leftmost derivation are called left-sentential forms.
➢ Strings that appear in a rightmost derivation are called right-sentential forms.
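The difference between the two derivation orders can be seen with a small sketch. It uses the grammar E → E*E | (E) | -E | id; the production numbers and the function name derive are illustrative assumptions, not notation from the notes.

```python
# Grammar: E -> E*E | (E) | -E | id. The numbering is an assumption.
PRODUCTIONS = {
    1: ["E", "*", "E"],
    2: ["(", "E", ")"],
    3: ["-", "E"],
    4: ["id"],
}

def derive(choices, leftmost=True, start="E"):
    """Apply the numbered productions in order, always expanding the
    leftmost (or rightmost) occurrence of the non-terminal E."""
    form = [start]
    steps = [" ".join(form)]
    for number in choices:
        positions = [i for i, sym in enumerate(form) if sym == "E"]
        pos = positions[0] if leftmost else positions[-1]
        form[pos:pos + 1] = PRODUCTIONS[number]
        steps.append(" ".join(form))   # each entry is one sentential form
    return steps

print(" => ".join(derive([1, 4, 4], leftmost=True)))
print(" => ".join(derive([1, 4, 4], leftmost=False)))
```

Both calls use the same productions and end in the same sentence; only the intermediate sentential forms differ.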
Sentential forms:
Given a grammar G with start symbol S, if S ⇒* α, where α may contain non-terminals and terminals, then α is called a sentential form of G. A sentential form that contains only terminals is a sentence of G.
Yield or frontier of tree:
Each interior node of a parse tree is labeled by a non-terminal, and its children are labeled, from left to right, by the symbols on the right side of the production used at that node. The leaves of the tree, read from left to right, form a sentential form called the yield or frontier of the tree.
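Ambiguity:
A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Equivalently, an ambiguous grammar has more than one leftmost derivation, or more than one rightmost derivation, for the same sentence.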
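Ambiguity can be checked for a particular sentence by counting its parse trees. The sketch below does this for a cut-down grammar E → E*E | id, chosen here only for illustration; the function name count_parse_trees is an assumption.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under the grammar E -> E*E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] == "*":       # use E -> E * E with the * at position k
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "*", "id", "*", "id"]))
```

A count greater than one for any sentence is enough to show that the grammar is ambiguous.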
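Eliminating ambiguity:
An ambiguous grammar, one that produces more than one parse tree (equivalently, more than one leftmost or rightmost derivation) for some sentence, can sometimes be made unambiguous by rewriting the grammar.
Consider this example, G: stmt → if expr then stmt | if expr then stmt else stmt | other
This grammar is ambiguous since the string if E1 then if E2 then S1 else S2 has two parse trees: in one the else is attached to the inner if (if E2), in the other to the outer if (if E1). Programming languages normally prefer the first, matching each else with the closest unmatched then, and the grammar is rewritten so that only that tree is possible.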
Left factoring:
Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing. When it is not clear which of two alternative productions to use to expand a non-terminal A, we can rewrite the A-productions to defer the decision until we have seen enough of the input to make the right choice.
If A → αβ1 | αβ2 are two A-productions with the common prefix α, they are replaced by A → αA' and A' → β1 | β2.
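One round of left factoring can be sketched as follows. The function names and the list-of-symbols representation are assumptions made for this illustration; the test grammar S → iEtS | iEtSeS | a is the if-then-else grammar whose left-factored form appears later in these notes.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor(head, alternatives):
    """One round of left factoring for the productions head -> alternatives.

    Returns a dict mapping each non-terminal to its list of alternatives.
    An empty list [] stands for epsilon.
    """
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None      # group alternatives by first symbol
        groups.setdefault(key, []).append(alt)

    result = {head: []}
    new_name = head + "'"
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[head].extend(group)     # nothing to factor here
            continue
        prefix = common_prefix(group)
        result[head].append(prefix + [new_name])
        result[new_name] = [alt[len(prefix):] for alt in group]
        new_name += "'"                    # fresh name for a further group
    return result

print(left_factor("S", [list("iEtS"), list("iEtSeS"), ["a"]]))
```

The result corresponds to S → iEtSS' | a and S' → ε | eS. A full implementation would repeat the step until no two alternatives share a prefix.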
TOP-DOWN PARSING
It can be viewed as an attempt to find a left-most derivation for an input string or an
attempt to construct a parse tree for the input starting from the root to the leaves.
Types of top-down parsing :
1. Recursive descent parsing
2. Predictive parsing
Example of recursive descent parsing with backtracking:
Consider the grammar
S → cAd
A → ab | a
and the input string w = cad. The parse tree can be constructed using the following top-down approach:
Step 1:
Initially create a tree with a single node labeled S. An input pointer points to 'c', the first symbol of w. Expand the tree with the production for S.
Step 2:
The leftmost leaf 'c' matches the first symbol of w, so advance the input pointer to 'a', the second symbol of w, and consider the next leaf 'A'. Expand A using its first alternative, A → ab.
Step 3:
The second symbol 'a' of w also matches the second leaf of the tree, so advance the input pointer to 'd', the third symbol of w. But the third leaf of the tree is 'b', which does not match the input symbol 'd'.
Hence discard the chosen production and reset the input pointer to the second position. This is called backtracking.
Step 4:
Now try the second alternative for A, A → a. The leaf 'a' matches the second symbol of w and the leaf 'd' matches the third symbol, so parsing completes successfully.
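The steps can be reproduced with a small backtracking recognizer. The grammar S → cAd, A → ab | a and the input cad are assumptions inferred from the symbols the steps mention; the function names are illustrative.

```python
# Assumed grammar: S -> cAd, A -> ab | a (inferred from the steps above).
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}

def parse(symbols, text, pos=0):
    """Try to match the list `symbols` against text[pos:].

    Yields every input position at which the match can end, so that a
    failure further right makes the caller fall back to the next alternative.
    """
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                             # non-terminal: try each alternative
        for alternative in GRAMMAR[first]:
            yield from parse(alternative + rest, text, pos)
    elif pos < len(text) and text[pos] == first:     # terminal: must match the input
        yield from parse(rest, text, pos + 1)

def accepts(text):
    return any(end == len(text) for end in parse(["S"], text))

print(accepts("cad"))
```

For cad the alternative A → ab fails on d, and the generator then resumes with A → a, which is the backtracking step. A left-recursive grammar would make this sketch recurse forever.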
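The table-driven predictive parser has an input buffer, a stack, a parsing table and an output stream.
Input buffer:
It holds the string to be parsed, followed by $ to indicate the end of the input string.
Stack:
It holds a sequence of grammar symbols with $ at the bottom. Initially the stack contains the start symbol of the grammar on top of $.
Parsing table:
It is a two-dimensional array M[A, a], where 'A' is a non-terminal and 'a' is a terminal or $.
The parser is controlled by a program that considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the parser action. There are three possibilities:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a non-terminal, the parser consults the entry M[X, a]. If the entry is a production X → UVW, the parser replaces X on top of the stack by WVU (with U on top). If the entry is error, the parser calls an error recovery routine.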
Construction of the predictive parsing table:
Method:
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M an error entry.
Example:
Consider the following grammar:
E → E+T | T
T → T*F | F
F → (E) | id
After eliminating left recursion, the grammar is:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST( ):
FIRST(E) = { (, id }
FIRST(E') = { +, ε }
FIRST(T) = { (, id }
FIRST(T') = { *, ε }
FIRST(F) = { (, id }
FOLLOW( ):
FOLLOW(E) = { $, ) }
FOLLOW(E') = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T') = { +, $, ) }
FOLLOW(F) = { +, *, $, ) }
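The sets above can be recomputed mechanically. The following is a minimal sketch of the usual fixed-point computation for the left-recursion-free expression grammar; the data layout and function names are assumptions for illustration.

```python
EPS, END = "ε", "$"

# Left-recursion-free expression grammar; [] is the body of an ε-production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each non-terminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)          # every symbol can vanish, or the string is empty
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:           # repeat until no set grows any more
        changed = False
        for head, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of(alt, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def compute_follow(first, start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of(alt[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:          # what follows head also follows sym
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
print(FIRST["E'"], FOLLOW["F"])
```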
Predictive parsing table:
NON-TERMINAL  id          +            *            (           )         $
E             E → TE'                               E → TE'
E'                        E' → +TE'                             E' → ε    E' → ε
T             T → FT'                               T → FT'
T'                        T' → ε       T' → *FT'                T' → ε    T' → ε
F             F → id                                F → (E)
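The table entries follow from the construction method applied to each production. A minimal sketch, with the FIRST and FOLLOW sets copied from the example and with illustrative names (build_table, is_ll1) that are assumptions:

```python
EPS = "ε"
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", "$", ")"},
          "T'": {"+", "$", ")"}, "F": {"+", "*", "$", ")"}}
PRODUCTIONS = [
    ("E", ["T", "E'"]), ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]), ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]

def first_of(symbols):
    """FIRST of a production body; [] gives {ε}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}

def build_table():
    """Return M as {(non_terminal, terminal): [productions]}; a missing key is an error entry."""
    table = {}
    for head, body in PRODUCTIONS:
        first = first_of(body)
        targets = first - {EPS}             # step 2 of the method
        if EPS in first:
            targets |= FOLLOW[head]         # step 3; FOLLOW already holds $ where needed
        for terminal in targets:
            table.setdefault((head, terminal), []).append((head, body))
    return table

def is_ll1(table):
    """A grammar is LL(1) when no entry holds more than one production."""
    return all(len(entries) == 1 for entries in table.values())

M = build_table()
print(len(M), is_ll1(M))
```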
Stack implementation:
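The moves of the table-driven parser on an input such as id+id*id can be sketched as below. The table literal copies the entries of M for the expression grammar; the function name predictive_parse and the format of the output are illustrative assumptions.

```python
# Parsing table M for the expression grammar; [] is the body of an ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}
NON_TERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    """Return the list of productions used, or raise SyntaxError."""
    buffer = list(tokens) + ["$"]
    stack = ["$", start]               # top of stack is the end of the list
    output, ip = [], 0
    while True:
        top, a = stack[-1], buffer[ip]
        if top == "$" and a == "$":
            return output              # X = a = $: success
        if top not in NON_TERMINALS:
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()                # X = a: pop and advance the input pointer
            ip += 1
        else:
            body = TABLE.get((top, a))
            if body is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            stack.pop()
            stack.extend(reversed(body))   # leftmost body symbol ends up on top
            output.append(f"{top} -> {' '.join(body) or 'ε'}")

for production in predictive_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

The productions are printed in the order of a leftmost derivation of the input.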
Example:
Consider the following grammar:
S → iEtS | iEtSeS | a
E → b
After left factoring, we have
S → iEtSS' | a
S' → eS | ε
E → b
To construct a parsing table, we need FIRST( ) and FOLLOW( ) for all the non-terminals.
FIRST(S) = { i, a }
FIRST(S') = { e, ε }
FIRST(E) = { b }
FOLLOW(S) = { $, e }
FOLLOW(S') = { $, e }
FOLLOW(E) = { t }
Parsing table:
NON-TERMINAL  a        b        e          i              t     $
S             S → a                        S → iEtSS'
S'                              S' → eS                         S' → ε
                                S' → ε
E                      E → b
Since the entry M[S', e] contains more than one production, the grammar is not an LL(1) grammar.
BOTTOM-UP PARSING
Constructing a parse tree for an input string beginning at the leaves and going towards
the root is called bottom-up parsing.
A general type of bottom-up parser is a shift-reduce parser.
SHIFT-REDUCE PARSING
Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse
tree for an input string beginning at the leaves (the bottom) and working up towards
the root (the top).
Example:
Consider the grammar:
S → aABe
A → Abc | b
B → d
The sentence to be recognized is abbcde.
REDUCTION (LEFTMOST)        RIGHTMOST DERIVATION
abbcde   (A → b)            S ⇒ aABe
aAbcde   (A → Abc)            ⇒ aAde
aAde     (B → d)              ⇒ aAbcde
aABe     (S → aABe)           ⇒ abbcde
S
The reductions trace out the rightmost derivation in reverse.
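The claim can be checked by replaying the four reductions and reversing the resulting list. This sketch replays the reductions in the order given rather than discovering them; the function names are illustrative assumptions.

```python
# Reductions in the order used above: (production head, production body).
REDUCTIONS = [("A", "b"), ("A", "Abc"), ("B", "d"), ("S", "aABe")]

def reduce_leftmost(form, head, body):
    """Replace the leftmost occurrence of `body` in `form` by `head`."""
    index = form.find(body)
    if index < 0:
        raise ValueError(f"{body!r} does not occur in {form!r}")
    return form[:index] + head + form[index + len(body):]

def bottom_up(sentence):
    """Return the sentence followed by the result of each reduction."""
    forms = [sentence]
    for head, body in REDUCTIONS:
        forms.append(reduce_leftmost(forms[-1], head, body))
    return forms

forms = bottom_up("abbcde")
print(" <= ".join(forms))              # bottom-up order
print(" => ".join(reversed(forms)))    # the same steps as a rightmost derivation
```

In general the leftmost substring that matches a right side is not necessarily the handle; it happens to be in each step of this example.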
Handles:
A handle of a string is a substring that matches the right side of a production, and
whose reduction to the non-terminal on the left side of the production represents one
step along the reverse of a rightmost derivation.
Example:
Consider the grammar:
E → E+E
E → E*E
E → (E)
E → id
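and the input string id1+id2*id3. A rightmost derivation is:
E ⇒ E+E
⇒ E+E*E
⇒ E+E*id3
⇒ E+id2*id3
⇒ id1+id2*id3
Read in reverse, the handle of each right-sentential form is the substring reduced in that step: id1, id2, id3, E*E and E+E.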
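Handle pruning:
A rightmost derivation in reverse can be obtained by "handle pruning": start from the sentence and repeatedly replace the handle by the non-terminal on the left side of its production.
(i.e.) if w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some rightmost derivation S = γ0 ⇒ γ1 ⇒ ... ⇒ γn = w.
The last moves of a shift-reduce parser on the input id1+id2*id3 are:
STACK      INPUT   ACTION
$ E+E*E    $       reduce by E → E*E
$ E+E      $       reduce by E → E+E
$ E        $       accept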
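A complete sequence of shift-reduce moves for id+id*id can be generated with the sketch below. The grammar E → E+E | E*E | (E) | id is ambiguous, so the sketch assumes that * binds tighter than + and that both operators are left-associative; this precedence rule and all names are assumptions, not part of the notes. Subscripts on id are dropped.

```python
PRECEDENCE = {"+": 1, "*": 2}    # assumption: * binds tighter, both left-associative

def try_reduce(stack, lookahead):
    """Return the production to reduce by, or None if the parser should shift."""
    if stack[-1:] == ["id"]:
        return ("E", ["id"])
    if stack[-3:] == ["(", "E", ")"]:
        return ("E", ["(", "E", ")"])
    if len(stack) >= 3 and stack[-3] == "E" and stack[-1] == "E" and stack[-2] in PRECEDENCE:
        op = stack[-2]
        # Possible shift-reduce conflict: reduce unless the incoming operator binds tighter.
        if PRECEDENCE.get(lookahead, 0) <= PRECEDENCE[op]:
            return ("E", ["E", op, "E"])
    return None

def shift_reduce(tokens):
    """Return the trace as a list of (stack, input, action) triples."""
    stack, rest, trace = [], list(tokens) + ["$"], []
    while True:
        config = ("$" + "".join(stack), "".join(rest))
        if stack == ["E"] and rest == ["$"]:
            trace.append(config + ("accept",))
            return trace
        production = try_reduce(stack, rest[0])
        if production:
            head, body = production
            del stack[len(stack) - len(body):]     # pop the handle
            stack.append(head)                     # push the left side
            trace.append(config + (f"reduce by {head} -> {''.join(body)}",))
        elif rest[0] != "$":
            stack.append(rest.pop(0))
            trace.append(config + ("shift",))
        else:
            raise SyntaxError(f"cannot shift or reduce: {config}")

for row in shift_reduce(["id", "+", "id", "*", "id"]):
    print("%-10s %-12s %s" % row)
```

The last three rows of the trace are the reductions by E → E*E and E → E+E followed by accept.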
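Shift-reduce conflict:
The parser cannot decide whether to shift the next input symbol or to reduce the handle on top of the stack.
Example:
Consider the grammar:
E → E+E | E*E | id and input id+id*id
With E+E on the stack and * as the next input symbol, the parser can either reduce by E → E+E or shift *.
Reduce-reduce conflict:
The parser cannot decide which of several productions to use for a reduction.
Example:
Consider the grammar:
M → R+R | R+c | R
R → c
and input c+c
With R+c on the stack and $ as the next input symbol, the parser can reduce either by R → c or by M → R+c.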
LR PARSERS
An efficient bottom-up syntax analysis technique that can be used to parse a
large class of CFG is called LR(k) parsing. The „L‟ is for left-to-right scanning of the
input, the „R‟ for constructing a rightmost derivation in reverse, and the „k‟ for the
number of input symbols. When „k‟ is omitted, it is assumed to be 1.
Advantages of LR parsing:
➢ It recognizes virtually all programming language constructs for which a CFG can be written.
Drawbacks of LR method:
➢ It is too much work to construct an LR parser by hand for a typical programming language grammar. A specialized tool, called an LR parser generator, is needed. Example: YACC.
An LR parser consists of: an input, an output, a stack, a driver program, and a parsing table that has two parts (action and goto).
➢ The driver program is the same for all LR parsers.
➢ The parsing program reads characters from an input buffer one at a time.
➢ The program uses a stack to store a string of the form s0X1s1X2s2…Xmsm, where sm is on top. Each Xi is a grammar symbol and each si is a state.
➢ The parsing table consists of two parts: action and goto functions.
Action: The parsing program determines sm, the state currently on top of the stack, and ai, the current input symbol. It then consults action[sm, ai] in the action table, which can have one of four values:
➢ shift s, where s is a state,
➢ reduce by a grammar production A → β,
➢ accept, and
➢ error.
Goto: The function goto takes a state and a grammar symbol as arguments and produces a state.
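LR parsing algorithm:
Input: An input string w and an LR parsing table with functions action and goto for grammar G.
Output: If w is in L(G), a bottom-up parse for w; otherwise, an error indication.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer. The parser then repeats the following steps until it accepts or reports an error, where s is the state on top of the stack and a is the current input symbol:
1. If action[s, a] = shift s', push a and then s' onto the stack, and advance to the next input symbol.
2. If action[s, a] = reduce A → β, pop 2*|β| symbols off the stack, let s' be the state now on top, push A and then goto[s', A], and output the production A → β.
3. If action[s, a] = accept, parsing is complete.
4. Otherwise, call an error recovery routine.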
LR(0) items:
An LR(0) item of a grammar G is a production of G with a dot at some position of the right side. For example, production A → XYZ yields the four items:
A → . XYZ
A → X . YZ
A → XY . Z
A → XYZ .
Closure operation:
If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by the two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α . Bβ is in closure(I) and B → γ is a production, then add the item B → . γ to closure(I), if it is not already there. We apply this rule until no more new items can be added to closure(I).
Goto operation:
Goto(I, X) is defined to be the closure of the set of all items [A→ αX . β] such that
[A→ α . Xβ] is in I.
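The closure and goto operations can be sketched as follows for the augmented expression grammar E' → E, E → E+T | T, T → T*F | F, F → (E) | id. Choosing this grammar, representing an item as a (production index, dot position) pair, and the function names are all assumptions made for illustration.

```python
# An item is (production index, dot position). The grammar choice is an assumption.
PRODUCTIONS = [
    ("E'", ["E"]),
    ("E", ["E", "+", "T"]), ("E", ["T"]),
    ("T", ["T", "*", "F"]), ("T", ["F"]),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
NON_TERMINALS = {head for head, _ in PRODUCTIONS}

def closure(items):
    result = set(items)
    pending = list(items)
    while pending:
        prod, dot = pending.pop()
        body = PRODUCTIONS[prod][1]
        if dot < len(body) and body[dot] in NON_TERMINALS:
            # Dot stands before non-terminal B: add B -> .γ for every B-production.
            for index, (head, _) in enumerate(PRODUCTIONS):
                if head == body[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    pending.append((index, 0))
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(prod, dot + 1) for prod, dot in items
             if dot < len(PRODUCTIONS[prod][1]) and PRODUCTIONS[prod][1][dot] == symbol}
    return closure(moved)

def canonical_collection():
    start = closure({(0, 0)})
    states, pending = [start], [start]
    symbols = {sym for _, body in PRODUCTIONS for sym in body}
    while pending:
        state = pending.pop(0)
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target and target not in states:
                states.append(target)
                pending.append(target)
    return states

print(len(canonical_collection()))
```

The states are numbered in the order this sketch discovers them, which need not match the numbering used in a hand-built table.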
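The initial state of the parser is the one constructed from the set of items containing [S' → .S].
SLR parsing table for the expression grammar, with productions numbered (1) E → E+T, (2) E → T, (3) T → T*F, (4) T → F, (5) F → (E), (6) F → id:
STATE   ACTION                              GOTO
        id    +     *     (     )     $     E     T     F
I0      s5                s4                1     2     3
I1            s6                      ACC
I2            r2    s7          r2    r2
I3            r4    r4          r4    r4
I4      s5                s4                8     2     3
I5            r6    r6          r6    r6
I6      s5                s4                      9     3
I7      s5                s4                            10
I8            s6                s11
I9            r1    s7          r1    r1
I10           r3    r3          r3    r3
I11           r5    r5          r5    r5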
Stack implementation:
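The driver program can be sketched with the SLR table encoded as dictionaries. The production numbering (1) E → E+T to (6) F → id is assumed from the usual convention for this table, the helper row is hypothetical, and the stack keeps only states because the grammar symbols between them are implied.

```python
# (left side, length of right side) for productions 1..6; numbering is assumed.
PRODUCTIONS = {
    1: ("E", 3), 2: ("E", 1),      # E -> E+T | T
    3: ("T", 3), 4: ("T", 1),      # T -> T*F | F
    5: ("F", 3), 6: ("F", 1),      # F -> (E) | id
}

def row(entries):
    """Expand 'id:s5 (:s4' into {'id': 's5', '(': 's4'}."""
    return dict(entry.split(":") for entry in entries.split())

ACTION = {
    0: row("id:s5 (:s4"),            1: row("+:s6 $:acc"),
    2: row("+:r2 *:s7 ):r2 $:r2"),   3: row("+:r4 *:r4 ):r4 $:r4"),
    4: row("id:s5 (:s4"),            5: row("+:r6 *:r6 ):r6 $:r6"),
    6: row("id:s5 (:s4"),            7: row("id:s5 (:s4"),
    8: row("+:s6 ):s11"),            9: row("+:r1 *:s7 ):r1 $:r1"),
    10: row("+:r3 *:r3 ):r3 $:r3"),  11: row("+:r5 *:r5 ):r5 $:r5"),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def lr_parse(tokens):
    """Return the production numbers used in the reductions, in order."""
    stack = [0]                       # states only
    buffer = list(tokens) + ["$"]
    ip, reductions = 0, []
    while True:
        state, a = stack[-1], buffer[ip]
        act = ACTION[state].get(a)
        if act is None:               # blank entry in the action table
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        if act == "acc":
            return reductions
        if act[0] == "s":             # shift: push the new state, advance input
            stack.append(int(act[1:]))
            ip += 1
        else:                         # reduce by production number act[1:]
            number = int(act[1:])
            head, length = PRODUCTIONS[number]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(number)

print(lr_parse(["id", "*", "id", "+", "id"]))
```

The reductions come out in the order of a rightmost derivation in reverse.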
CANONICAL LR PARSING
Algorithm for construction of the canonical LR parsing table:
Input: An augmented grammar G'
Output: The canonical LR parsing table functions action and goto for G'
Method:
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items for G'. State i is constructed from Ii.
2. If [A → α.aβ, b] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here a must be a terminal.
3. If [A → α., a] is in Ii and A ≠ S', then set action[i, a] to "reduce A → α". The reduction is entered only for the lookahead a of the item, not for every symbol in FOLLOW(A) as in the SLR method.
4. If [S' → S., $] is in Ii, then set action[i, $] to "accept".
5. If any conflicting actions are generated by these rules, the grammar is not LR(1) and the algorithm fails to produce a parser.
6. The goto transitions for state i are constructed for all non-terminals A using the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.
7. All entries not defined by rules 2, 3, 4 and 6 are made "error".
8. The initial state of the parser is the one constructed from the set of items containing [S' → .S, $].
Example:
Augmented grammar:
S' → S
S → CC ---- 1
C → cC ---- 2
C → d ---- 3
I0: S' → .S, $
S → .CC, $
C → .cC, c/d
C → .d, c/d
goto(I0, S)
I1: S' → S., $
goto(I0, C)
I2: S → C.C, $
C → .cC, $
C → .d, $
goto(I0, c)
I3: C → c.C, c/d
C → .cC, c/d
C → .d, c/d
goto(I0, d)
I4: C → d., c/d
goto(I2, C)
I5: S → CC., $
goto(I2, c)
I6: C → c.C, $
C → .cC, $
C → .d, $
goto(I2, d)
I7: C → d., $
goto(I3, C)
I8: C → cC., c/d
goto(I3, c)
I3: C → c.C, c/d
C → .cC, c/d
C → .d, c/d
goto(I3, d)
I4: C → d., c/d
goto(I6, C)
I9: C → cC., $
goto(I6, c)
I6: C → c.C, $
C → .cC, $
C → .d, $
goto(I6, d)
I7: C → d., $
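The ten sets can be regenerated with a small LR(1) closure and goto sketch. An item is represented as (production index, dot position, lookahead); the FIRST sets are written out by hand, and the shortcut in first_of is only valid because this grammar has no ε-productions. All names are illustrative assumptions.

```python
# An item is (production index, dot position, lookahead).
PRODUCTIONS = [("S'", ["S"]), ("S", ["C", "C"]), ("C", ["c", "C"]), ("C", ["d"])]
NON_TERMINALS = {"S'", "S", "C"}
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}}   # this grammar has no ε-productions

def first_of(symbols, lookahead):
    """FIRST(βa) for the string β followed by the lookahead a."""
    if not symbols:
        return {lookahead}
    head = symbols[0]
    return FIRST[head] if head in NON_TERMINALS else {head}

def closure(items):
    result, pending = set(items), list(items)
    while pending:
        prod, dot, look = pending.pop()
        body = PRODUCTIONS[prod][1]
        if dot < len(body) and body[dot] in NON_TERMINALS:
            for index, (head, _) in enumerate(PRODUCTIONS):
                if head != body[dot]:
                    continue
                # [A -> α.Bβ, a] adds [B -> .γ, b] for every b in FIRST(βa).
                for b in first_of(body[dot + 1:], look):
                    item = (index, 0, b)
                    if item not in result:
                        result.add(item)
                        pending.append(item)
    return frozenset(result)

def goto(items, symbol):
    moved = {(p, d + 1, a) for p, d, a in items
             if d < len(PRODUCTIONS[p][1]) and PRODUCTIONS[p][1][d] == symbol}
    return closure(moved)

def lr1_collection():
    start = closure({(0, 0, "$")})
    states, pending = [start], [start]
    while pending:
        state = pending.pop(0)
        for sym in ["S", "C", "c", "d"]:
            target = goto(state, sym)
            if target and target not in states:
                states.append(target)
                pending.append(target)
    return states

states = lr1_collection()
print(len(states))
```

With this exploration order the list indices coincide with the numbering I0 to I9 used above.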
LALR PARSER
Algorithm for construction of LALR parsing table:
Input: An augmented grammar G'
Output: The LALR parsing table functions action and goto for G'
Method:
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items for G'.
2. For each core present among the sets of LR(1) items, find all sets having that core and replace these sets by their union.
3. Let C' = {J0, J1, ..., Jm} be the resulting sets of LR(1) items. The parsing actions for state i are constructed from Ji in the same manner as in the construction of the canonical LR parsing table.
4. If there is a conflict, the grammar is not LALR(1) and the algorithm fails.
5. The goto table is constructed as follows: If J is the union of one or more sets of LR(1) items, that is, J = I1 ∪ I2 ∪ ... ∪ Ik, then the cores of goto(I1, X), goto(I2, X), ..., goto(Ik, X) are the same, since I1, I2, ..., Ik all have the same core. Let K be the union of all sets of items having the same core as goto(I1, X).
6. Then goto(J, X) = K.
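Example:
In the collection of sets of LR(1) items for the grammar S' → S, S → CC, C → cC | d, three pairs of sets have the same core and are merged:
I3 and I6 are replaced by their union
I36: C → c.C, c/d/$
C → .cC, c/d/$
C → .d, c/d/$
I4 and I7 are replaced by their union
I47: C → d., c/d/$
I8 and I9 are replaced by their union
I89: C → cC., c/d/$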
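The merging step can be sketched by grouping the LR(1) sets on their cores. The item sets I0 to I9 for S' → S, S → CC, C → cC | d are written out as strings here; this encoding and the function name merge_by_core are assumptions for illustration.

```python
# LR(1) item sets I0..I9, each item written as (core item, lookahead characters).
LR1_SETS = {
    0: [("S'->.S", "$"), ("S->.CC", "$"), ("C->.cC", "cd"), ("C->.d", "cd")],
    1: [("S'->S.", "$")],
    2: [("S->C.C", "$"), ("C->.cC", "$"), ("C->.d", "$")],
    3: [("C->c.C", "cd"), ("C->.cC", "cd"), ("C->.d", "cd")],
    4: [("C->d.", "cd")],
    5: [("S->CC.", "$")],
    6: [("C->c.C", "$"), ("C->.cC", "$"), ("C->.d", "$")],
    7: [("C->d.", "$")],
    8: [("C->cC.", "cd")],
    9: [("C->cC.", "$")],
}

def merge_by_core(lr1_sets):
    """Union all LR(1) sets that share a core (the items without lookaheads)."""
    merged = {}                       # core -> (state numbers, lookaheads per item)
    for number, items in lr1_sets.items():
        core = frozenset(item for item, _ in items)
        numbers, lookaheads = merged.setdefault(core, ([], {}))
        numbers.append(number)
        for item, looks in items:
            lookaheads.setdefault(item, set()).update(looks)
    # Name each merged state after the sets it came from, e.g. "36".
    return {"".join(map(str, numbers)): lookaheads
            for numbers, lookaheads in merged.values()}

lalr = merge_by_core(LR1_SETS)
print(sorted(lalr))
print(lalr["89"])
```

The ten canonical LR(1) states collapse to seven LALR states for this grammar.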
LR ERROR RECOVERY
An LR parser detects an error when it consults the parsing action table and finds a blank or error entry. Errors are never detected by consulting the goto table. An LR parser detects an error as soon as there is no valid continuation for the portion of the input scanned so far. A canonical LR parser will not make even a single reduction before announcing the error. SLR and LALR parsers may make several reductions before detecting an error, but they will never shift an erroneous input symbol onto the stack.
GENERATE YACC SPECIFICATION TO VALIDATE ARITHMETIC
EXPRESSIONS
AIM:
To create LEX and YACC programs to recognize the arithmetic expressions.
ALGORITHM:
Step 1: Start the program.
Step 2: Create a lex program that reads the input, includes the header file generated by yacc, and returns tokens to the yacc parser.
Step 3: Define the rule for the operands for operations to be only numbers.
Step 4: Define the Yacc rules for the expressions and conditions to be validated.
Step 5: Terminate the programs.
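Program:
4a.y
%{
#include <stdio.h>
#include <stdlib.h>   /* exit */
int yylex(void);
void yyerror(const char *msg);
%}
%token NUM
/* later lines bind tighter: * and / have higher precedence than + and - */
%left '+' '-'
%left '*' '/'
%%
expr: e {
          printf("result:%d\n", $1);
          return 0;   /* stop after one expression */
        }
    ;
e: e '+' e   { $$ = $1 + $3; }
 | e '-' e   { $$ = $1 - $3; }
 | e '*' e   { $$ = $1 * $3; }
 | e '/' e   { if ($3 == 0) yyerror("division by zero"); $$ = $1 / $3; }
 | '(' e ')' { $$ = $2; }
 | NUM       { $$ = $1; }
 ;
%%
int main(void)
{
    printf("\n enter the arithmetic expression:\n");
    yyparse();
    printf("\nvalid expression\n");
    return 0;
}
/* called by yyparse on a syntax error; the message text is not used */
void yyerror(const char *msg)
{
    (void)msg;
    printf("\n invalid expression\n");
    exit(0);
}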
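4a.l
%{
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+  {
          yylval = atoi(yytext);   /* pass the number's value to the parser */
          return NUM;
        }
[ \t]   ;                          /* skip blanks and tabs */
\n      return 0;                  /* end of line ends the input */
.       return yytext[0];          /* operators and parentheses */
%%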
OUTPUT:
To compile:
yacc -d 4a.y
lex 4a.l
cc -c lex.yy.c y.tab.c
cc -o a.out lex.yy.o y.tab.o -lfl
 enter the arithmetic expression:
10+10
result:20
valid expression