Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Chapter 3 Compiler Design

Download as pdf or txt
Download as pdf or txt
You are on page 1of 42

Chapter three

parsing/Syntax Analysis
COMPILER DESIGN – PHASES OF COMPILER
The compilation process is a sequence of various phases. Each phase takes
input from its previous stage, has its own representation of source program,
and feeds its output to the next phase of the compiler. Let us understand the
phases of a compiler
Parser
The parser is the phase of the compiler which takes a
token string as input and with the help of existing
grammar, converts it into the corresponding Intermediate
Representation.
Comparison with Lexical Analysis
The role of the Parser
Cont….
 Error handling
Common programming errors
Lexical errors
parser errors/ Syntactic errors
Semantic errors
Error handler goals
 Report the presence of errors clearly and accurately
 Recover from each error quickly enough to detect
subsequent errors.
 Add minimal overhead to the processing of correct
programs.
 Syntax error/parser error can be detected
at this level if the input is not in
accordance with the grammar.
 Parsing or syntactic analysis is the process of analyzing a
string of symbols, either in natural language or in
computer languages, conforming to the rules of a formal
grammar
 It is aided by using techniques based on formal grammar
of the programming language
 It is analyze (a string or text) into logical syntactic
components, typically in order to test conformability to a
logical grammar.
 It takes the token produced by lexical analysis as input
and generates a parse tree (or syntax tree). In this phase,
token arrangements are checked against the source code
grammar, i.e., the parser checks if the expression made by
the tokens is syntactically correct.
 If the lexical analyzer finds a token invalid, it
generates an error.
 The lexical analyzer works closely with the syntax
analyzer. It reads character streams from the source
code, checks for legal tokens, and passes the data to
the syntax analyzer when it demands.
 The parser analyzes the source code token stream
against the production rules to detect any errors in
the code. The output of this phase is a parse tree.

This way, the parser accomplishes two tasks, i.e., parsing the
code, looking for errors and generating a parse tree as the output
of the phase.
Types of Parser:
Types of Parser:
The parser is mainly classified into two categories, i.e.
Top-down Parser, and Bottom-up Parser. These are
explained below:
1- Top-Down Parser: The top-down parser is the
parser that generates parse for the given input string
with the help of grammar productions by expanding the
non-terminals i.e. it starts from the start symbol and
ends on the terminals. It uses left most derivation.
Further Top-down parser is classified into 2 types:
Recursive descent parser, and Non-recursive descent
parser.
Leftmost Derivation (LMD): If the sentential
form of an input is scanned and replaced from left
to right, it is called left-most derivation.
• The sentential form derived by the left-most
derivation is called the left-sentential form.
Example: Consider the G,
E → E + E | E * E | (E ) | - E | id
Derive the string id + id * id using leftmost
derivation
Recursive descent parser is also known as the
Brute force parser or the backtracking parser.
 It basically generates the parse tree by using
brute force and backtracking.
Non-recursive descent parser is also known as
LL(1) parser or predictive parser or without
backtracking parser or dynamic parser.
 It uses a parsing table to generate the parse
tree instead of backtracking.
2- Bottom-up Parser: Bottom-up Parser is the parser that
generates the parse tree for the given input string with the help of
grammar productions by compressing the nonterminal i.e. it starts
from non-terminals and ends on the start symbol.
It uses the reverse of the rightmost derivation. Further Bottom-
up parser is classified into two types: LR parser, and Operator
precedence parser, Shift Reduce Parsers
LR parser is the bottom-up parser that generates the parse tree
for the given string by using unambiguous grammar. It follows the
reverse of the rightmost derivation.
types of Bottom-up Parser
 LR parser ( a- LR(0) b- SLR(1) c-LALR(1)
d-CLR(1))
 Operator precedence parser.
 Shift Reduce Parsers
LR parser is the bottom-up parser that generates the parse tree for the given
string by using unambiguous grammar. It follows the reverse of the
rightmost derivation.
Why LR parsing:
• LR parsers can be constructed to recognize virtually all programming-
language constructs for which context-free grammars can be written.
• The LR parsing method is the most general non-backtracking shift-reduce
parsing method known, yet it can be implemented as efficiently as other shift-
reduce methods.
• The class of grammars that can be parsed using LR methods is a proper
subset of the class of grammars that can be parsed with predictive parsers.
• An LR parser can detect a syntactic error as soon as it is possible to do so
on a left-toright scan of the input.
• The disadvantage is that it takes too much work to construct an LR parser
by hand for a typical programming-language grammar. But there are lots of
LR parser generators available to make this task easy.
Cont…

LR(0) parser
Also know as LR(k) parsing where "L" stands for left most derivation
scanning of the input. "R" stands for the construction of the right most
derivation in reverse. "K" stands for the number of input symbols of the look
ahead used to make number of parsing decisions.
This is non-recursive shift-reduce, bottom-up parsing.
It is used to parse a large class of grammars therefore making it the most
efficient syntax analysis technique.
SLR(1) (Simple LR Parsing)
It is similar to LR(0) parser but works on a reduced entry. It has few number
of states hence very small table.
It is has simple and fast construction
Cont….

LALR(1) Parsing(Look-Ahead LR Parsing)


It attempts to reduce the number of states in LR parser by merging similar
states. This reduces the number of states to the same as SLR but retains
some power of LR look- aheads. It is the Look ahead LR parser. Works on
an intermediate size of grammar. It can has the same number of states as
the SLR(1).
CLR(1)(Canonical Look-Ahead LR Parsing)
Parsing Refers to canonical look ahead. It uses the canonical collection of
LR(1) to construct its parsing table. It has more states compared to SLR
parsing. In this type of parsing the reduced node is replaced only in the look
ahead symbols
Operator precedence parser generates the parse tree form given grammar
and string but the only condition is two consecutive non-terminals and
epsilon never appear on the left and right-hand side of any production.
It is a kind of shift-reduce parsing applied to a small class of operator
grammars. A grammar is said to be operator precedence if;
No two terminals are adjacent at the right side of production.
Production does not contain E on its right side.
Operator precedence relations.
p < q --> p has less precedence than q
p > q --> p has more precedence than q
p = q --> p has equal precedence than q Limitations.
Applies only to small class of grammars.
An operator like minus can be unary or binary and therefore will have
different precedence in different statements.
Rightmost Derivation (RMD): If we scan and replace the input
with production rules, from right to left, it is known as right-most
derivation.
• The sentential form derived from the right-most derivation is
called the right sentential form.
Example 1: Consider the G, E → E + E | E * E | (E ) | - E | id
Derive the string id + id * id using rightmost derivation.
Cont….

 In bottom-up parsing, Constructing a parse tree for


an input string beginning at the leaves and going
towards the root. Bottom up parsers start from the
sequence of terminal symbols and work their way
back up to the start symbol by repeatedly replacing
grammar rules' right hand sides by the
corresponding non-terminal. This is the reverse of
the derivation process, and is called "reduction". A
general type of bottom-up parser is a shift-reduce
parser.
Cont……

Shift Reduce Parsers


Shift-reduce parsing is a type of bottom -up parsing
that attempts to construct a parse tree for an input
string beginning at the leaves (the bottom) and
working up towards the root (the top).
 parser is sometimes called as Syntax
Analyze . It constructs the parse tree. It takes
all the tokens one by one and uses Context
Free Grammar to construct the parse tree.
Why Grammar ?
 The rules of programming can be entirely
represented in some few productions. Using
these productions we can represent what the
program actually is. The input has to be
checked whether it is in the desired format or
not.
Context-Free Grammars
Syntax analysis or parsing is the second phase of a compiler. In this chapter,
we shall learn the basic concepts used in the construction of a parser. We
have seen that a lexical analyzer can identify tokens with the help of regular
expressions and pattern rules. But a lexical analyzer cannot check the syntax
of a given sentence due to the limitations of the regular expressions. Regular
expressions cannot check balancing tokens, such as parenthesis. Therefore,
this phase uses context-free grammar CFG, which is recognized by
pushdown automata.

It implies that every Regular Grammar is also context-free, but there exists some
problems, which are beyond the scope of Regular Grammar. CFG is a helpful tool in
describing the syntax of programming languages.
Cont…..

A grammar is a set of rules for putting strings together and so


corresponds to a language.
A context-free grammar is a quadruple (V, Z, P, S) where:
V is a set of variables (also called nonterminals), one of which is
designated the start variable; It is customary to use upper-case
letters for variables;
Z (the alphabet) is a finite set of terminal symbols.
P is a finite set of rules( like production rules) ( A X). Where x
is string of variables and terminals S is a distinguished element
of V called the start symbol. The sets V and Z are assumed to
be disjoint.
Cont…..
Context-Free Languages
A language L is context-free IF AND ONLY IF there is a grammar G with L=L(G)
Classes of Grammars
In 1959 Noam Chomsky, a linguist, suggested a way of classifying grammars
according to complexity . The convention used below, and in the remaining chapters,
is that the term string includes the null string and that, in referring to grammars, the
following symbols will have particular meanings
Generally, A context-free grammar (CFG) consists of a
set of productions that you use to replace a variable by
a string of variables and terminals. The language of a
grammar is the set of strings it generates. A language
is context-free if there is a CFG for it.
Context free grammars
Example:
Derivation Order
Derivation example
Derivation Trees /parser tree
Derivation Trees /parser tree
Derivation Trees /parser tree
Derivation Trees /parser tree
Derivation Trees /parser tree
In a parse tree:
All leaf nodes are terminals.
All interior nodes are non-terminals.
In-order traversal gives original input string. A parse tree depicts associativity and
precedence of operators. The deepest sub-tree is traversed first, therefore the
operator in that sub-tree gets precedence over the operator which is in the parent
nodes
Ambiguity:
A grammar that produces more than one parse for some sentence is said to be
ambiguous grammar. i.e. An ambiguous grammar is one that produce more than one
leftmost or more than one rightmost derivation for the same sentence.
Ambiguous and un ambiguous grammar
thank you

You might also like