Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
11 views

Chapter 3 Syntax Analysis

Chapter 3 discusses parsing in compiler design, detailing its role, types (top-down and bottom-up), and implementation strategies. It covers error recovery, the use of context-free grammar (CFG), and the Yacc automatic parser generator. Key concepts include recursive descent parsing, predictive parsing, left recursion, left factoring, and the construction of FIRST and FOLLOW sets.

Uploaded by

Betelhem
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views

Chapter 3 Syntax Analysis

Chapter 3 discusses parsing in compiler design, detailing its role, types (top-down and bottom-up), and implementation strategies. It covers error recovery, the use of context-free grammar (CFG), and the Yacc automatic parser generator. Key concepts include recursive descent parsing, predictive parsing, left recursion, left factoring, and the construction of FIRST and FOLLOW sets.

Uploaded by

Betelhem
Copyright
© © All Rights Reserved
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 54

CHAPTER

3: Mr. Yirga K.

Parser /
Syntax
Analysis
OUTLIN
1. Introduction to Parsing
E
2. Types of Parsing
1. Top-down Parsing
Top-down Parsing Implementation:
- Recursive Decent parsing
- Predictive (Non-Recursive)
Parsing
- LL (1) Grammar
OUTLIN
2. Bottom-Up Parsing
E
- Handles a n d Handles pruning
- Stack Implementation of Shift Re d u ce
Parsing
- LR Parser Implementation
- SLR, CLR a n d LALR parser
3. Error Recovery in Parsing
4. Ya c c Automatic Parser Generator
Introduction to Parsing

Chapter Content
 In this chapter, the process of parsing, the role of parser,
various types of parses, a n d their implementation
strategies will b e addressed.
Objective: Students should:
 Design parsers using the c o n c e p t of CFG.
 C o m p a re different parsing strategies.
 Implement LL(1) a n d LR(1) parsers.
 Students b e able to use YA C C as a syntax generator.
Introduction to Parsing

What is
parsing? or parsing is the second phase of a
• Synta x compiler
• a
It na lysistransformation of a sequence of tokens to a n
is the
architecture.
abstract syntax tree.
• The parser obtains a set of tokens from the lexical analyzer.
• Syntax Analyzer creates the syntactic structure of the
given source program.
• It uses BNF (Backus-Naur Form) notation in the description of
CFGs.
Introduction to Parsing
• Tokens arrangements are c h e c ke d against source c o d e grammar.
 The synta x o fa p ro g ra mming la ng ua g e is
we ll d e sc rib e d by using
c ontext-free grammar.
 It checks whether a given source program satisfies the rules
implied by a context-free grammar or not.
• If it satisfies, the parser creates the parse tree/syntax
tree/, which further used for Generating Intermediate
code(ICG).
• Otherwise the parser gives the error messages.
Context Free C onti
Grammar
Pushdown …
Automata

Regular
Expressions Finite
State Automata
Introduction to Parsing
 Syntax is the rule governing the formation of
statements in a programming language.
 It is the way tokens are put together to form expressions,
statements, or blocks of statements.
 Syntax analysis is the task con ce rn e d with fitting a
sequence of tokens into a specifi ed syntax.
 Due to the limitation of RE, a scanner cannot c h e c k the
syntax of a se nte nc e . E.g . RE c a nno t c he c k { a nd }.
 Syntax of a programming language is the grammar rules of
context- free grammar (CFG).
Introduction to Parsing
 A grammar describes the hierarchical structure
of most programming languages.
 Exa mp le : if ( e xp re ssio n ) sta te me nt else
sta te me nt
 Using expr for expression a n d stmt for statement, it c a n b e
expressed a s, stmt: if ( expr ) stmt else stmt
 This is c a lle d produc tion.
 In production, lexical elements like the keyword if, else a n d
parentheses are called Terminals.
 Variables like expr a n d stmt represent sequences of
terminals a n d are called non-terminals.
Introduction to Parsing
 sp e c ific a ti
C FG s G ive s a p re c ise synta c tic of a
p ro g ra mming language. on
 The d e sig no f the g ra mma r is a n initia l p ha se o f
the d e sig no f a compiler.
 We categorize the parsers into two groups:
 Top-Down Parser
 The parse tree is created top to bottom starting from the
root.
 Bottom-Up Parser
 The parse tree is created bottom to top starting from the
leaves.
Introduction to Parsing
In general,
 The parser accomplishes the following tasks, i.e.,
 Parsing the c o d e ,
 Looking for errors (syntactical) a n d
 G e ne ra ting a p a rse tre e /synta x tre e / as the
o utp ut fo r the ne xt phase.
1) Top-down
Parsing
 It Parses the input, a n d starts constructing a parse tree from
the root n o d e gradually moving down to the leaf nodes.
Top-down
1)Recursive Descent Parsing: Parsing
 It is a c o mmo n fo rm o f to p -d o wn p a rsing .
 It use s re c ursive p ro c e d ure s to p ro c e ss the inp ut.
 Recursive descent parsing suffers from backtracking.
 It constructs the parse tree from the top a n d the input
is read from left to right.
Top-down
Backtracking: Parsing
 It is a te c hniq ue in to p -d o wn p a rsing tha t a llo ws
p a rse rs to explore alternative choices a n d handle
ambiguity.
 In backtracking, the parser restores the input to its
original state when it fails.
 It goes b a c k to earlier state to find another route after
knowing that current state is a d e a d end.
Top-down
Backtracking: Parsing
 If one derivation of a production fails, the syntax analyzer
restarts the process using different rules of that
production.
 This technique may process the input string more than
o n c e to determine the right production.
Top-down
Parsing
Top-down
 Consider a Parsing
CFG:
 S → rXd | rZd
 X → oa | ea
 Z → ai
 Parse a n
input string:
 “read”,

 Try: P  A B, A
 xy | x, By,
a nd p a rse
inp ut “ xy”?

Top-down
2) Predic tive Parser Parsing
 A form of recursive-descent parsing that does not require
any back- tracking is known as predictive parsing.
 Predictive parser has the capability to predict which
production is to b e used to replace the input string.
 It d o e s no t suffe r fro m b a c ktra c king .
 Predictive parser uses a look-ahead pointer, which points to
the next input symbols.
 To ma ke the p a rse r b a c k-tra c king fre e ,
the p re d ic tive p a rse r p uts some constraints on
the grammar.
Top-down

Parsing
It a c c e p t s only a class of grammar known as LL (1) grammar.
 The first ‘L’ stands for scanning the input from left to right,
and
 the second “L” stands for left most derivations, a n d
 ‘1’ refers to using only one input symbol (lookahead) at
e a c h step to make parsing action decisions.
 Note: LL(1) is a top-down parsing algorithm that reads input
from left to right a n d builds leftmost derivations.
Top-down
 Parsing
Two c ommon problems in top down parsing.

1) Left Rec ursion

 A g ra mma r bec omes le ft-re c ursive

if it ha s a ny no n-te rmina l ‘ A’ whose derivation

contains ‘ A ’ itself as the left-most symbol.

 Example: S  Sα | β, where α (V+T)* a n d β (V+T)*

 (β is a sequenc e of terminals a n d non terminals that d o not


start with S).
Top-down
 Parsing
Left-recursive grammars are considered as problematic as
the parser goes into a n infinite loop.
 Left recursion c a n b e immediate or indirect recursion.
 Example:
 A  Aα | β
 S  A α | β, A  Sd
 (1) is a n example of immediate left recursion, where A is
any non- terminal symbol a n d ‘ α ’ represents a string of non-
terminals.
 (2) is a n e xa mp le o f indirec t-left re c ursio n.
Top-down
 A top-down parser
Parsing
will first parse the ‘A’, which in-turn will
yield a string consisting of ‘ A ’ itself a n d the parser may
goes into a loop infinitely.
Top-down
Removal of Left Rec ursion Parsing
 O n e way to remove left recursion is to use the following
technique:
 The production, A  A α | β , which is a type of immediate
recursion
is converted or rewriting into following productions.
 A  βA'
 A ’  αA' | ε,

 This d o e s not imp a c t the string s d e rive d fro m


the g ra mma r, b ut it removes immediate left
recursion.
 Se c o nd me tho d is to use the fo llo wing a lg o rithm,
whic h sho uld
eliminate all direct and indirect left recursions.
Top-down
 More Generally
Parsing
 A  Aα 1 | Aα 2 | Aα 3 | ….. | A α m | β1 | β2 | …. | β n Where
no βi begins with a n A .
 Then w e replace the “A” productions by
 A  β1 A ’ | β2 A ’ | ….. | β n A ’
 A ’  α 1 A ’ | α 2 A ’ | α 3 A ’ | ….. | α m A ’ | ε.
 Exercise: G1: SSab | Scd | Sef | g | h, G2: S  (L)|a a n d L LS|S?
Top-down
Parsing
Top-down

Parsing
Exercise: Eliminate left recursion from the following
grammar?
 EE+T| Solution:
T
 TT*F|F ETE’

 F(E)| id E’ε|+TE’

TFT’

T’*FT’/ ε

F(E)|id
Top-down
2) Left Fac toring
Parsing
 If more than one grammar production rules has a
c o m m o n prefix string, then the top-down parser cannot
make a choice.
 Le ft fa c to ring is a g ra mma r tra nsfo rma tio n tha t
is use ful fo r producing a grammar suitable for
predictive or top down parsing.
 A grammar is said to b e Left Factored when it has the form:
 A  αβ 1 | αβ 2 | αβ 3 | … … | αβ n | γ, i.e. the productions start
with the same terminal (or set of terminals).
Top-down
 Fro m the a b o ve e xa mp le , b y Parsing
se e ing the inp ut
‘α’ we c a nno t immediately tell which production to
choose to expand A.
 Left factoring transforms the grammar to make it useful for
top-down parsers.
 For the grammar A  αβ 1 | αβ 2 | αβ 3 | … … | αβ n | γ
 The equivalent Left Factored grammar will b e by rewriting:
 A  αA’ | γ
 A ’  β 1 | β 2 | β 3 | … … | βn.
Top-down
3) FIRST and FOLLOW Parsing
 First a nd Fo llo w Se ts use d in LL (1) o r p re d ic tive p a rse r.
 Parser table construction is applying to create first a n d
follow sets.
 The se se ts c a n p ro vid e the a c tua l p o sitio n of
a ny te rmina lin the derivation.
 They allow the parser c a n properly apply the n e e d e d rule
at the correct position.
 This is d o ne to c re a te the p a rsing ta b le whe re
the d e c isio n o f replacing T[A, t] = α, with some
production rule.
Top-down
 Parsing
First Set: If there is a variable, a n d from that variable, if w e
try to drive all the strings then the beginning Terminal
Symbol is called the First.
 FIRST(X) for a grammar symbol X is the set of terminals that
begin the strings derivable from X.
 Rules to c ompute FIRST set:
 If x is a te rmina l, the n FIRST(x) = { ‘ x’ }.
 If Xɛ, is a production rule, then FIRST(X) = {ɛ}.
 If XY1Y2Y3… .Yn is a p ro d uc tio n, the n
 FIRST(X) = FIRST(Y1).
 If FIRST(Y1) contains ɛ, then FIRST(X) = { FIRST(Y1) – ɛ } U
{ FIRST(Y2) }.
 If FIRST (Yi) contains ɛ for all i = 1 to n, then a d d Є to FIRST(X).
Top-down
Example 1: Pro d uc tio n Rule s o f Parsing
G ra mma r:
 E  TE’ ; E’  +T E’ | Є
 T  F T’ ; T’  *F T’ | Є
 F  (E) | id

FIRST sets:
 FIRST(E) = FIRST(T) = { ( , id }
 FIRST(E’ ) = { +, Є }
 FIRST(T) = FIRST(F) = { ( , id }
 FIRST(T’ ) = { *, Є }
 FIRST(F) = { ( , id }
Top-down
Parsing
Follow Set: What is the Terminal Symbol which follows a
variable in the process of derivation.
 Rules to c ompute FOLLOW set
 Follow (S) = { $}, where S is the starting Non-Terminal.
 If A pBq is a production, where p, B a n d q are any
grammar symbols, then everything in FIRST(q) except Є is
in FOLLOW(B).
 If A p B is a p ro d uc tio n, the n e ve rything
in FO LLO W(A ) is in
FOLLOW(B).
 If A p Bq is a p ro d uc tio n a nd FIRST(q ) c o nta ins
Є, the n FO LLO W(B) c o nta ins { FIRST(q ) – Є } U
FO LLO W(A ).
Top-down
Example: Consider a production
rules: E  TE’
Parsing
E’  +T E’ | Є
T  F T’
T’  *F T’ | Є
F  (E) | id
FOLLOW Set
will be:
 FO LLO
W(E) = {
$ , ) } //
No te ')'
is the re
b e c a us
e o f 5th
rule
 FO LLO
W(E’ ) =
Top-down
Example 2: Consider production rules of Parsing
grammar:
S  AC B | C bb |
Ba A  d a | BC
Bg | Є
C h | Є
FIRST
Sets:
FIRST(S) =
FIRST(A) U
FIRST(B) U
FIRST(C) =
{ d, g, h,
Є}
FIRST(A ) = { d } U FIRST(B) = { d , g , Є } FIRST(B)
= { g , Є } FIRST(C ) = { h , Є }
Top-down
Production FIRST set:
FIRST(S) = {
Parsing
a }
Rules:
FIRST(B) = { c }
S ->
FIRST(C) = { b , Є }
aBDh B FIRST(D) = FIRST(E) U FIRST(F) = { g, f,
-> cC Є } FIRST(E) = { g , Є } FIRST(F) = { f
C -> bC , Є }
| D ->
F -> f |
EF
Є FOLLOW
FOLLOW(S = { $ }
Set:
E -> g | )
Є FOLLOW(B = { FIRST(D) – Є } U FIRST(h) = { g , f ,
) h }
FOLLOW(C = FOLLOW(B) = { g , f , h }
)
FOLLOW(D = FIRST(h) = { h }
)
FOLLOW(E = { FIRST(F) – Є } U FOLLOW(D) = { f ,
) h }
FOLLOW(F = FOLLOW(D) = { h }
)
Top-down
Limitations of Syntax Analyzers Parsing
 It c a nno t d e te rmine if a to ke n is valid.
 It c a nno t d e te rmine if a to ke n is dec lared b e fo re it is b e ing
use d .
 It cannot determine if a token is initialized before it is being
used,
 It c a nno t d e te rmine if a n o p e ra tio n p e rfo rme d o n
a to ke n type is valid or not.
LL(1)
Implementation of LL(1) Implementation
Step1: First c he c k fo r le ft re c ursio n in the g ra mma r,
if the re is le ft recursion in the grammar remove that
a n d g o to step 2.
Step
 Left2: Calculate First() a n d Follow() for all non-terminals.
factor ? ?
Top-down
Parsing
Step 3: For e a c h production A α. (A tends to alpha)

1. Find First(α) a n d for e a c h terminal in First(α), make entry A


 α in the table.

2. If First(α) contains ε (epsilon) as terminal then, find the


Follow(A) a n d for e a c h terminal in Follow(A), make entry
A  α in the table.

3. If the First(α) contains ε a n d Follow(A) contains $ as


terminal, then make entry A  α in the table for the $.
Top-down
 In the table, rows will contain Parsing
the Non-Terminals a n d the
column will contain the Terminal Symbols.
 All the Null Productions of the Grammars will g o under the
Follow elements a n d the remaining productions will lie
under the elements of the First set.
 Example-1: Consider the Grammar:
 E  TE’
 E'  +TE' | ε
 T  FT’
 T'  *FT' | ε
 F  id | (E)  ε denotes epsilon
Top-down
Parsing

LL(1) or predictive Parsing


Table.
2) Bottom-Up
 Parsing
Build the parse tree from leaves to root.
 Bottom-up parsing reduces the input string ‘ w ’ to the start
symbol of a grammar by tracing out the rightmost
derivations of ‘ w ’ in reverse.
 Example 1: Consider the following rules, a n d apply bottom-up
appro a ch.
2) Bottom-Up
 Parsing
Bottom-up parsing starts with the input symbols a n d tries to
construct the parse tree up to the start symbol.
 Example 2: Pro d uc tio n rule s:
 S→E
 E→E+T
 E→E*T
 E→T
 T → id
 Le t us sta rt b o tto m-up p a rsing fo r inp ut, a + b * c
 Read the input a n d c h e c k if any production matches with
the input.
2) Bottom-Up
 a +b * Parsing
c
 T+b *
c
 E+b *
c
 E+T*
c
 E*c
 E*T
 E
 S
2) Bottom-Up
 Classifications of Bottom-Up Parsing
Parsing.
2) Bottom-Up
Benefits of LR parsing: Parsing
 Man y programming languages using some variations of a n
LR parser.
 It should b e noted that C++ a n d Perl are exceptions to it.
 LR Parser c a n b e implemented very effi ciently.
 Of all the Parsers that scan their symbols from left to right, LR
Parsers detect syntactic errors, as soon as possible.
2) Bottom-Up
Parsing
Shift Reduce Parsing Implementation
 Shift Re du ce parser attempts for the construction of parse
tree in a similar manner as d o n e in bottom up parsing(i.e.
the parse tree is constructed from leaves to the root).
 A mo re g e ne ra l fo rm o f shift re d uc e p a rse r is LR(0) p a rse r.
 The L stands for scanning the input from left to right a n d R
stands for constructing a rightmost derivation in reverse.
 This p a rse r re q uire s so me d a ta struc ture s, i.e .
 A input buff er for storing the input string.
 A stack for storing a n d accessing the production rules.
2) Bottom-Up
 Parsing
Shift-reduce parsing uses two operations(steps).
a) Shift step: The shift step refers to the a d v a n c e me n t of the
input pointer to the next input symbol, which is called the
shifted symbol. This symbol is pushed onto the stack. The
shifted symbol is treated as a single n o d e of the parse tree.
b) Reduce step : When the parser finds a complete grammar
rule (RHS) a n d then replaces it to (LHS), it is known as
reduce step. This occurs when the top of the stack
contains a handle. To reduce, a POP function is performed
on the stack which pops off the handle a n d replaces it
with LHS non-terminal symbol.
2) Bottom-Up
 Accept: If only start symbol isParsing
present in the stack a n d
the
input buffer is empty then, the parsing action is called
a c c e p t . When a c c e p t action is obtained, it means
successful parsing is done.
 Error: This is the situation in which the parser c a n neither
perform shift action nor reduce action a n d not even a c c e p t
action.
 Example 1 – C o nsid e r the
g ra mma r: S  S + S
S  S *
S S  id
 Apply Shift
Reduce parsing
for input string, “id
2) Bottom-Up
Parsing
2) Bottom-Up
 Parsing
Example 2 – Consider the grammar. E  2E2, E  3E3, a n d E
 4, Apply Shift Reduce Parsing for input string “32423”?
 Solution:
2) Bottom-Up
 Parsing
Example 3 – C o nsid e r the g ra mma r, S  (L)| a ,
L L , S | S, a nd apply
Shift Re du ce parsing for input string “( a , ( a , a ) ) “?
 Project Work:
 Implement Simple LR Parser that recognizes inputs from certain grammar?
2) Bottom-Up
Handles and Handles pruning Parsing
 In compiler design, handle pruning is a technique that
improves the grammar's parsing process.
 It involves removing the children of the left-hand side non-
terminal from the parse tree.
 A handle is a substring that matches the body of a
production.
 Reducing a handle represents one step along with the
reverse of a Rightmost derivation.
 https://www.geeksforgeeks.org/what-is-handle-pruning/
 SLR, CLR a n d LALR parser
 https://www.geeksforgeeks.org/slr-clr-and-lalr-parsers-set-3/
 https://www.geeksforgeeks.org/lalr-parser-with-examples/
2) Bottom-Up
3. Error Recovery in Parsing Parsing
4. Ya c c Automatic Parser
Generator
Thank
you!!

You might also like