Chapter 3 Compiler Design
Parsing / Syntax Analysis
COMPILER DESIGN – PHASES OF COMPILER
The compilation process is a sequence of phases. Each phase takes its
input from the previous phase, has its own representation of the source program,
and feeds its output to the next phase of the compiler. Let us understand the
phases of a compiler.
Parser
The parser is the phase of the compiler that takes a
token stream as input and, with the help of the grammar
of the language, converts it into the corresponding
intermediate representation (typically a parse tree).
Comparison with Lexical Analysis
The role of the Parser
Error handling
Common programming errors
Lexical errors
Parser errors / syntactic errors
Semantic errors
Error handler goals
Report the presence of errors clearly and accurately
Recover from each error quickly enough to detect
subsequent errors.
Add minimal overhead to the processing of correct
programs.
A syntax error (parser error) is detected
at this level when the input does not
conform to the grammar.
Parsing, or syntactic analysis, is the process of analyzing a
string of symbols, in either a natural language or a
computer language, according to the rules of a formal
grammar.
It relies on techniques based on the formal grammar
of the programming language.
To parse is to analyze a string or text into logical syntactic
components, typically in order to test conformity to a
grammar.
The parser takes the tokens produced by lexical analysis as input
and generates a parse tree (or syntax tree). In this phase,
token arrangements are checked against the grammar of the
source language, i.e., the parser checks whether the expression
formed by the tokens is syntactically correct.
If the lexical analyzer finds a token invalid, it
generates an error.
The lexical analyzer works closely with the syntax
analyzer. It reads the character stream of the source
code, checks for legal tokens, and passes tokens to
the syntax analyzer on demand.
The parser analyzes the source code token stream
against the production rules to detect any errors in
the code. The output of this phase is a parse tree.
This way, the parser accomplishes two tasks: parsing the code
while looking for errors, and generating a parse tree as the output
of the phase.
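The "on demand" hand-off between the lexical analyzer and the parser can be sketched with a Python generator. This is only an illustration: the token names, the patterns in TOKEN_SPEC, and the function name tokens are assumptions made for this sketch, not part of the slides.

```python
import re

# Hypothetical token specification for a tiny expression language.
TOKEN_SPEC = [
    ("ID",     r"[A-Za-z_]\w*"),
    ("PLUS",   r"\+"),
    ("STAR",   r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokens(source):
    """Yield (kind, text) pairs one at a time, only when the caller asks."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # Lexical error: no token pattern matches at this position.
            raise SyntaxError(f"illegal character {source[pos]!r} at offset {pos}")
        pos = match.end()
        if match.lastgroup != "SKIP":  # whitespace is dropped, not passed on
            yield match.lastgroup, match.group()
    yield "EOF", ""
```

A parser would call next() on the generator each time it needs another token, so characters are read only as far as the parser has asked. An illegal character raises the lexical error at the moment the parser requests the offending token.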
Types of Parser:
The parser is mainly classified into two categories, i.e.
Top-down Parser, and Bottom-up Parser. These are
explained below:
1- Top-Down Parser: The top-down parser generates the
parse tree for the given input string with the help of
grammar productions by expanding the non-terminals,
i.e., it starts from the start symbol and ends at the
terminals. It uses leftmost derivation.
Top-down parsers are further classified into two types:
recursive descent parsers and non-recursive descent
parsers.
Leftmost Derivation (LMD): If, at every step, the
leftmost non-terminal of the sentential form is the
one that is replaced, the derivation is called a
leftmost derivation.
• A sentential form derived by a leftmost
derivation is called a left-sentential form.
Example: Consider the grammar G,
E → E + E | E * E | (E ) | - E | id
Derive the string id + id * id using leftmost
derivation.
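One possible leftmost derivation of id + id * id can be traced with a few lines of Python. The production numbering and the function name leftmost_derivation are assumptions made for this sketch; the grammar is G as given above.

```python
# Productions of G, numbered for this sketch only (the numbering is an assumption).
PRODUCTIONS = {
    1: ["E", "+", "E"],
    2: ["E", "*", "E"],
    3: ["(", "E", ")"],
    4: ["-", "E"],
    5: ["id"],
}


def leftmost_derivation(choices, start="E"):
    """Apply the chosen productions, always rewriting the leftmost E."""
    form = [start]
    steps = [" ".join(form)]
    for number in choices:
        index = form.index("E")  # position of the leftmost non-terminal
        form[index:index + 1] = PRODUCTIONS[number]
        steps.append(" ".join(form))
    return steps


for step in leftmost_derivation([1, 5, 2, 5, 5]):
    print(step)
# E
# E + E
# id + E
# id + E * E
# id + id * E
# id + id * id
```

Because G is ambiguous, this is not the only leftmost derivation of the string: the choice sequence [2, 1, 5, 5, 5], which starts with E → E * E, reaches the same string through a different parse tree.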
The recursive descent parser, in its general form, is
also known as the brute-force parser or the
backtracking parser. It generates the parse tree by
trying productions in turn and backtracking when a
choice fails.
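A minimal sketch of the backtracking idea follows. It is a recognizer only (it answers yes or no and builds no parse tree), and it uses a small hypothetical grammar, S → c A d with A → a b | a, rather than G, because G is left-recursive and a recursive descent procedure would recurse forever on it. The names GRAMMAR, parse, parse_sequence and accepts are assumptions of this sketch.

```python
# Hypothetical grammar chosen for this sketch (not the grammar G above):
#   S -> c A d
#   A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, text, pos):
    """Yield every input position reachable after matching `symbol` at `pos`."""
    if symbol not in GRAMMAR:  # terminal: must match the next character(s)
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:  # try each production in order
        yield from parse_sequence(alternative, text, pos)


def parse_sequence(symbols, text, pos):
    """Match a sequence of symbols, falling back to other choices on failure."""
    if not symbols:
        yield pos
        return
    for middle in parse(symbols[0], text, pos):
        yield from parse_sequence(symbols[1:], text, middle)


def accepts(text):
    """True if the whole text can be derived from S."""
    return any(end == len(text) for end in parse("S", text, 0))
```

On the input cad the parser first tries A → a b, fails to find b, backtracks, and succeeds with A → a. In the worst case this strategy re-scans the same input many times, which is the cost that table-driven predictive parsing avoids.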
The non-recursive descent parser is also known as the
LL(1) parser, the predictive parser, the parser
without backtracking, or the dynamic parser.
It uses a parsing table to generate the parse
tree instead of backtracking.
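The table-driven approach can be sketched as a stack machine. The grammar below is an assumption of this sketch: it is the usual expression grammar with left recursion removed, and the table entries were filled in by hand for it. The function only accepts or rejects its input; it does not build the tree.

```python
# Predictive parsing table for a hypothetical expression grammar with left
# recursion removed (an assumption of this sketch; the slides give no table):
#   E -> T E'    E' -> + T E' | ε    T -> F T'    T' -> * F T' | ε    F -> ( E ) | id
# An empty list stands for an ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens):
    """Return True if the token list is derivable from E, else False."""
    tokens = list(tokens) + ["$"]  # end-of-input marker
    stack = ["$", "E"]
    position = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[position]
        if top in NONTERMINALS:
            body = TABLE.get((top, lookahead))
            if body is None:  # blank table entry: syntax error
                return False
            stack.extend(reversed(body))  # leftmost symbol ends up on top
        elif top == lookahead:  # terminal on the stack matches the input
            position += 1
        else:
            return False
    return position == len(tokens)
```

Each step consults exactly one table entry, selected by the non-terminal on top of the stack and a single lookahead token, so no choice is ever undone.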
2- Bottom-up Parser: The bottom-up parser generates the parse
tree for the given input string with the help of grammar productions
by reducing substrings of the input to non-terminals, i.e., it starts
from the terminals (the input string) and ends at the start symbol.
It uses the reverse of the rightmost derivation. Bottom-up parsers
are further classified into three types: LR parsers, operator
precedence parsers, and shift-reduce parsers.
The LR parser is a bottom-up parser that generates the parse tree
for the given string by using an unambiguous grammar. It follows the
reverse of the rightmost derivation.
Types of Bottom-up Parser
LR parser: (a) LR(0), (b) SLR(1), (c) LALR(1), (d) CLR(1)
Operator precedence parser
Shift-reduce parsers
Why LR parsing:
• LR parsers can be constructed to recognize virtually all programming-
language constructs for which context-free grammars can be written.
• The LR parsing method is the most general non-backtracking shift-reduce
parsing method known, yet it can be implemented as efficiently as other shift-
reduce methods.
• The class of grammars that can be parsed using LR methods is a proper
superset of the class of grammars that can be parsed with predictive parsers.
• An LR parser can detect a syntactic error as soon as it is possible to do so
on a left-to-right scan of the input.
• The disadvantage is that it takes too much work to construct an LR parser
by hand for a typical programming-language grammar. But there are lots of
LR parser generators available to make this task easy.
LR(0) parser
LR(0) is the simplest case of LR(k) parsing, where "L" stands for left-to-right
scanning of the input, "R" stands for the construction of a rightmost
derivation in reverse, and "k" stands for the number of lookahead input
symbols used to make parsing decisions.
It is a non-recursive, shift-reduce, bottom-up parsing method.
LR parsing handles a large class of grammars, which makes it one of the most
general and efficient syntax analysis techniques.
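The shift-reduce loop of an LR(0) parser can be sketched as follows. The grammar (S → ( S ) | x), the state numbering, and the ACTION and GOTO tables are assumptions of this sketch, built by hand because the grammar is small enough to check. In an LR(0) table a reduce state reduces on every input symbol, since no lookahead is consulted.

```python
# Hand-built LR(0) table for a hypothetical grammar (an assumption of this
# sketch, chosen because it is small enough to verify by hand):
#   (1) S -> ( S )      (2) S -> x
PRODUCTIONS = {1: ("S", 3), 2: ("S", 1)}  # number -> (head, length of body)

ACTION = {
    0: {"(": ("shift", 2), "x": ("shift", 3)},
    1: {"$": ("accept", None)},
    2: {"(": ("shift", 2), "x": ("shift", 3)},
    3: {t: ("reduce", 2) for t in "()x$"},  # item S -> x .
    4: {")": ("shift", 5)},
    5: {t: ("reduce", 1) for t in "()x$"},  # item S -> ( S ) .
}
GOTO = {(0, "S"): 1, (2, "S"): 4}


def lr_parse(text):
    """Return the list of reductions performed, or None on a syntax error."""
    symbols = list(text) + ["$"]  # end-of-input marker
    states = [0]                  # the parser stack holds state numbers
    reductions = []
    position = 0
    while True:
        action = ACTION[states[-1]].get(symbols[position])
        if action is None:        # blank table entry: syntax error
            return None
        kind, argument = action
        if kind == "shift":
            states.append(argument)
            position += 1
        elif kind == "reduce":
            head, length = PRODUCTIONS[argument]
            del states[-length:]  # pop one state per symbol of the body
            states.append(GOTO[(states[-1], head)])
            reductions.append(argument)
        else:                     # accept
            return reductions


print(lr_parse("(x)"))  # [2, 1]
```

For the input (x) the reductions come out as production 2 followed by production 1, which is the rightmost derivation S ⇒ ( S ) ⇒ ( x ) read in reverse.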
SLR(1) (Simple LR Parsing)
It is similar to the LR(0) parser, but it places a reduce entry only under the
symbols in the FOLLOW set of the non-terminal being reduced. It has a small
number of states and hence a small table.
It is simple and fast to construct.
This implies that every regular grammar is also context-free, but there exist some
problems that are beyond the scope of regular grammars. CFG is a helpful tool for
describing the syntax of programming languages.