Compiler Design - Syntax Analysis
We have seen that a lexical analyzer can identify tokens with the help of regular expressions and pattern rules. But a lexical analyzer cannot check the syntax of a given sentence, because of the limitations of regular expressions. Regular expressions cannot check balancing tokens, such as parentheses nested to an arbitrary depth. Therefore, this phase uses context-free grammar (CFG), which is recognized by push-down automata.
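For instance, recognizing strings with balanced parentheses requires an unbounded counter or stack, which is exactly the extra power a push-down automaton adds over a finite automaton. A minimal Python sketch of such a check (the function name is ours, not part of any tool mentioned here):

def is_balanced(s: str) -> bool:
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1        # push
        elif ch == ')':
            depth -= 1        # pop
            if depth < 0:     # a closing parenthesis with no matching opening one
                return False
    return depth == 0         # every opening parenthesis must be closed

print(is_balanced("((a + b) * c)"))   # True
print(is_balanced("((a + b) * c"))    # False

Since the counter can grow without bound, no fixed amount of memory (and hence no finite automaton or regular expression) suffices for inputs of arbitrary nesting depth.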
Context-Free Grammar
In this section, we will first see the definition of context-free
grammar and introduce terminologies used in parsing technology.
Example
G = ( V, Σ, P, S )
Where:
V = { Q, Z, N }
Σ = { 0, 1 }
P = { Q → Z | Q → N | Q → ε | Z → 0Q0 | N → 1Q1 }
S = Q
This grammar describes the language of even-length binary palindromes, such as: 1001, 11100111, 0110, etc.
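The productions can be turned almost directly into a recognizer. Here is a minimal Python sketch, with Z and N folded into the single rule Q → 0Q0 | 1Q1 | ε for brevity (the function name is ours):

def in_language(s: str) -> bool:
    if s == "":
        return True                    # Q → ε
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "01":
        return in_language(s[1:-1])    # Q → 0Q0 or Q → 1Q1
    return False

for w in ["1001", "11100111", "0110", "10"]:
    print(w, in_language(w))   # True, True, True, False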
Syntax Analyzers
A syntax analyzer or parser takes the input from a lexical analyzer
in the form of token streams. The parser analyzes the source code
(token stream) against the production rules to detect any errors in
the code. The output of this phase is a parse tree.
In this way, the parser accomplishes two tasks: it parses the code looking for errors, and it generates a parse tree as the output of the phase.
Parsers are expected to parse the whole code even if some errors exist in the program. Parsers use error-recovery strategies, which we will learn later in this chapter.
Derivation
A derivation is a sequence of production rule applications that obtains the input string from the start symbol. During parsing, we take two decisions for some sentential form of the input:
Deciding the non-terminal which is to be replaced.
Deciding the production rule by which the non-terminal will be replaced.
To decide which non-terminal to replace, we can use one of two conventions:
Left-most Derivation
If the sentential form of an input is scanned and replaced from left to right, it is called a left-most derivation. The sentential form derived by a left-most derivation is called the left-sentential form.
Right-most Derivation
If we scan and replace the input with production rules from right to left, it is known as a right-most derivation. The sentential form derived from a right-most derivation is called the right-sentential form.
Example
Production rules:
E → E + E
E → E * E
E → id
Input string: id + id * id
Left-most derivation:
E → E * E
E → E + E * E
E → id + E * E
E → id + id * E
E → id + id * id
Notice that the left-most non-terminal is always rewritten first.
Right-most derivation:
E → E + E
E → E + E * E
E → E + E * id
E → E + id * id
E → id + id * id
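The two derivations differ only in which occurrence of E is rewritten at each step. The following Python sketch (helper names are ours) replays both by always substituting into the left-most or the right-most E of the current sentential form:

def derive(choices, leftmost=True):
    form = "E"
    print(form)
    for rhs in choices:
        i = form.index("E") if leftmost else form.rindex("E")
        form = form[:i] + rhs + form[i + 1:]   # rewrite that occurrence of E
        print(form)

# Left-most derivation of id + id * id
derive(["E * E", "E + E", "id", "id", "id"], leftmost=True)

# Right-most derivation of id + id * id
derive(["E + E", "E * E", "id", "id", "id"], leftmost=False)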
Parse Tree
A parse tree is a graphical depiction of a derivation. It is convenient
to see how strings are derived from the start symbol. The start
symbol of the derivation becomes the root of the parse tree. Let us
see this by an example from the last topic.
E → E * E
E → E + E * E
E → id + E * E
E → id + id * E
E → id + id * id
Step 1: E → E * E
Step 2: E → E + E * E
Step 3: E → id + E * E
Step 4: E → id + id * E
Step 5: E → id + id * id
In a parse tree:
All leaf nodes are terminals.
All interior nodes are non-terminals.
In-order traversal gives the original input string.
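These properties can be checked directly on the tree built above. Below is a minimal Python sketch (the Node class is ours) of the final parse tree for id + id * id; collecting its leaves from left to right reproduces the input string:

class Node:
    def __init__(self, symbol, children=()):
        self.symbol = symbol
        self.children = list(children)

def leaves(node):
    if not node.children:          # a leaf: a terminal symbol
        return [node.symbol]
    out = []                       # an interior node: a non-terminal
    for child in node.children:
        out.extend(leaves(child))
    return out

# E → E * E, then the left E → E + E, then each remaining E → id
tree = Node("E", [
    Node("E", [Node("E", [Node("id")]),
               Node("+"),
               Node("E", [Node("id")])]),
    Node("*"),
    Node("E", [Node("id")]),
])
print(" ".join(leaves(tree)))   # id + id * id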
Ambiguity
A grammar G is said to be ambiguous if it has more than one parse
tree (left or right derivation) for at least one string.
Example
E → E + E
E → E – E
E → id
For the string id + id – id, the above grammar generates two parse trees: one that groups the expression as (id + id) – id and another that groups it as id + (id – id).
A language that can only be generated by ambiguous grammars is said to be inherently ambiguous. Ambiguity in a grammar is not good for compiler construction. No method can detect and remove ambiguity automatically, but it can be removed either by re-writing the whole grammar without ambiguity, or by setting and following associativity and precedence constraints.
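To make the two parse trees concrete, the following Python sketch (the tuple encoding is ours) builds both trees for id + id – id and prints their fully parenthesized forms:

def show(t):
    if t == "id":
        return "id"
    op, left, right = t            # an interior node: (operator, left, right)
    return f"({show(left)} {op} {show(right)})"

tree1 = ("-", ("+", "id", "id"), "id")   # '-' at the root
tree2 = ("+", "id", ("-", "id", "id"))   # '+' at the root

print(show(tree1))   # ((id + id) - id)
print(show(tree2))   # (id + (id - id))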
Associativity
If an operand has operators on both sides, the side on which the operator takes this operand is decided by the associativity of those operators. If the operation is left-associative, the operand will be taken by the operator on its left; if the operation is right-associative, the operator on its right will take the operand. Operations such as addition, multiplication, subtraction, and division are left-associative; exponentiation is usually right-associative.
Example
The expression id op id op id is grouped as (id op id) op id if op is left-associative, and as id op (id op id) if op is right-associative.
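The difference matters for operators such as subtraction, where the two groupings produce different values. A minimal Python sketch (helper names are ours) of the two folding orders:

def fold_left(op, xs):
    acc = xs[0]
    for x in xs[1:]:
        acc = op(acc, x)      # ((x1 op x2) op x3) ...
    return acc

def fold_right(op, xs):
    acc = xs[-1]
    for x in reversed(xs[:-1]):
        acc = op(x, acc)      # x1 op (x2 op (x3 ...))
    return acc

sub = lambda a, b: a - b
print(fold_left(sub, [8, 4, 2]))    # (8 - 4) - 2 = 2
print(fold_right(sub, [8, 4, 2]))   # 8 - (4 - 2) = 6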
Precedence
If two different operators share a common operand, the precedence of the operators decides which will take the operand. That is, 2+3*4 can have two different parse trees, one corresponding to (2+3)*4 and another corresponding to 2+(3*4). By setting precedence among operators, this problem can be easily removed. Mathematically, * (multiplication) has precedence over + (addition), so the expression 2+3*4 will always be interpreted as:
2 + (3 * 4)
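One common way to encode precedence is to give each level its own grammar rule, so that the tighter-binding operator ends up lower in the parse tree. The following Python sketch (a toy evaluator of our own, not a quoted implementation) uses the unambiguous layered grammar expr → term (+ term)*, term → factor (* factor)*, factor → number, and therefore evaluates 2+3*4 as 2+(3*4):

import re

def evaluate(src):
    tokens = re.findall(r"\d+|[+*]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():                 # factor → number
        nonlocal pos
        tok = tokens[pos]; pos += 1
        return int(tok)

    def term():                   # term → factor ('*' factor)*
        nonlocal pos
        value = factor()
        while peek() == "*":
            pos += 1
            value *= factor()
        return value

    def expr():                   # expr → term ('+' term)*
        nonlocal pos
        value = term()
        while peek() == "+":
            pos += 1
            value += term()
        return value

    return expr()

print(evaluate("2+3*4"))   # 14, i.e. 2 + (3 * 4)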
Left Recursion
A grammar becomes left-recursive if it has any non-terminal ‘A’ whose derivation contains ‘A’ itself as the left-most symbol. A left-recursive grammar is considered a problematic situation for top-down parsers. Top-down parsers start parsing from the start symbol, which is itself a non-terminal. So, when the parser encounters the same non-terminal in its derivation, it keeps expanding that non-terminal without consuming any input, and goes into an infinite loop.
Example:
(1) A => Aα | β
(2) S => Aα | β
A => Sd
(1) is an example of immediate left recursion, where A is a non-terminal symbol and α represents a string of terminals and non-terminals. A top-down parser will first try to expand A, which in turn yields a string beginning with A itself, so the parser may go into a loop forever. (2) is an example of indirect left recursion: expanding S leads back to S through A (S => Aα => Sdα).
The left-recursive production
A => Aα | β
is re-written as:
A => βA'
A' => αA' | ε
This does not change the strings derived from the grammar, but it removes the immediate left recursion.
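This transformation is mechanical enough to automate. Below is a minimal Python sketch (our own grammar encoding: each right-hand side is a list of symbols, and ε is the empty list) of removing immediate left recursion from the productions of one non-terminal:

def remove_immediate_left_recursion(head, productions):
    recursive = [p[1:] for p in productions if p and p[0] == head]   # the α parts
    others = [p for p in productions if not p or p[0] != head]       # the β parts
    if not recursive:
        return {head: productions}       # nothing to do
    new = head + "'"                     # the fresh non-terminal A'
    return {
        head: [beta + [new] for beta in others],               # A  → βA'
        new:  [alpha + [new] for alpha in recursive] + [[]],   # A' → αA' | ε
    }

# A => Aα | β
print(remove_immediate_left_recursion("A", [["A", "α"], ["β"]]))
# {'A': [['β', "A'"]], "A'": [['α', "A'"], []]}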
Example
The production set
S => Aα | β
A => Sd
exhibits indirect left recursion. Substituting the productions of S into A => Sd gives:
S => Aα | β
A => Aαd | βd
and then, the immediate left recursion in A is removed using the first technique:
A => βdA'
A' => αdA' | ε
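Running the sketch above on the substituted production set A => Aαd | βd produces the same result:

print(remove_immediate_left_recursion("A", [["A", "α", "d"], ["β", "d"]]))
# {'A': [['β', 'd', "A'"]], "A'": [['α', 'd', "A'"], []]}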
Left Factoring
If more than one grammar production rule has a common prefix string, then the top-down parser cannot make a choice as to which of the productions it should take to parse the string in hand.
Example
A => αβ | αγ | …
Then the parser cannot determine which production to follow to parse the string, as both productions start with the same string α. To remove this confusion, we use a technique called left factoring.
After left factoring, the grammar becomes:
A => αA'
A' => β | γ | …
Now the parser has only one production per common prefix, which makes it easier to make decisions.
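Left factoring can also be automated. The following Python sketch uses the same grammar encoding as before; for brevity it factors only a common first symbol rather than the longest common prefix:

from collections import defaultdict

def left_factor(head, productions):
    groups = defaultdict(list)
    for p in productions:
        groups[p[0] if p else ""].append(p)   # group by the first symbol
    result = {head: []}
    primes = 0
    for prefix, group in groups.items():
        if len(group) == 1 or not prefix:
            result[head].extend(group)            # nothing to factor
        else:
            primes += 1
            new = head + "'" * primes             # fresh non-terminal A', A'', ...
            result[head].append([prefix, new])    # A  → αA'
            result[new] = [p[1:] for p in group]  # A' → β | γ | ...
    return result

# A => αβ | αγ
print(left_factor("A", [["α", "β"], ["α", "γ"]]))
# {'A': [['α', "A'"]], "A'": [['β'], ['γ']]}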
First Set
The FIRST set of a grammar symbol (or string of symbols) α is the set of terminals that can appear at the beginning of some string derived from α. That is, a terminal t belongs to FIRST(α) whenever
α → tβ
for some string of symbols β, where → here denotes derivation in zero or more steps. If α can derive the empty string ε, then ε is also included in FIRST(α).
Follow Set
The FOLLOW set of a non-terminal A is the set of terminals that can appear immediately to the right of A in some sentential form derived from the start symbol. That is, a terminal t belongs to FOLLOW(A) whenever the start symbol derives a sentential form of the shape αAtβ.
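Both sets are usually computed together by a fixed-point iteration over the productions. Below is a minimal Python sketch (our own grammar encoding; the ε and $ markers are conventions of the sketch, with $ marking the end of input) that computes FIRST and FOLLOW for a small grammar:

EPS, END = "ε", "$"

def first_of(seq, FIRST):
    # FIRST of a sequence of symbols, given FIRST for single symbols.
    out = set()
    for sym in seq:
        out |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return out
    out.add(EPS)              # every symbol in seq can derive ε
    return out

def compute_first_follow(grammar, start, terminals):
    FIRST = {t: {t} for t in terminals}
    FIRST.update({nt: set() for nt in grammar})
    FOLLOW = {nt: set() for nt in grammar}
    FOLLOW[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                f = first_of(body, FIRST)          # FIRST(body) feeds FIRST(head)
                if not f <= FIRST[head]:
                    FIRST[head] |= f; changed = True
                for i, sym in enumerate(body):     # FOLLOW for non-terminals in body
                    if sym not in grammar:
                        continue
                    rest = first_of(body[i + 1:], FIRST)
                    add = (rest - {EPS}) | (FOLLOW[head] if EPS in rest else set())
                    if not add <= FOLLOW[sym]:
                        FOLLOW[sym] |= add; changed = True
    return FIRST, FOLLOW

# E → T E',  E' → + T E' | ε,  T → id
grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["id"]],
}
FIRST, FOLLOW = compute_first_follow(grammar, "E", {"+", "id"})
print(FIRST["E"], FOLLOW["E'"])   # {'id'} {'$'}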