Compiler Design Notes Unit-1 & Unit-2

This document discusses the introduction to language processing and compilers. It covers: 1) Language translators like compilers and interpreters that translate programs between languages. Compilers translate to an executable target program, while interpreters directly execute operations. 2) The typical phases of a compiler including lexical analysis, syntax analysis, semantic analysis, intermediate code generation, optimization, and code generation. 3) Types of compilers like traditional compilers, cross-compilers, just-in-time compilers, and more. 4) The typical steps in a language processing system including preprocessing, compiling, linking/loading to produce executable code.


Department of Computer Science & Engineering
Compiler Design
by Dr. Rashmi Ranjan Sahoo

UNIT-I
INTRODUCTION TO LANGUAGE PROCESSING:
As computers became an inevitable part of human life, and as languages with more advanced features evolved to make communicating with the machine comfortable, the development of translator (mediator) software became essential to bridge the large gap between human and machine understanding. This activity is called Language Processing, a name that reflects its goal and intent. To understand the process better, we must first become familiar with some key terms and concepts explained in the following sections.

LANGUAGE TRANSLATORS :

A language translator is a computer program that translates a program written in one (source) language into its equivalent program in another (target) language. The source program is usually in a high-level language, whereas the target language can be anything from the machine language of a target machine (from microprocessor to supercomputer) to another high-level language.

• Two commonly used translators are the Compiler and the Interpreter.


1. Compiler: A compiler is a program that reads a program in one language, called the source language, and translates it into its equivalent program in another language, called the target language. In addition, it reports error information to the user.

• If the target program is an executable machine-language program, it can then be called by the user to process inputs and produce outputs.

Input → Target Program → Output

Figure 1.1: Running the target program


2. Interpreter: An interpreter is another commonly used language processor. Instead of producing a target program as a single translation unit, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user.

Source Program + Input → Interpreter → Output

Figure 1.2: Running an interpreter

LANGUAGE PROCESSING SYSTEM:


Based on the input the translator takes and the output it produces, a language translator can be called any one of the following.
Preprocessor: A preprocessor takes the skeletal source program as input and produces an extended version of it, which is the result of expanding macros and manifest constants (if any) and including header files in the source file. For example, the C preprocessor is a macro processor that is used automatically by the C compiler to transform the source before actual compilation. In addition, a preprocessor performs the following activities:
• Collects all the modules and files, in case the source program is divided into different modules stored in different files.
• Expands shorthands / macros into source-language statements.
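As a small, hypothetical illustration of macro expansion (the names BUFFER_SIZE and SQUARE are invented for this sketch and are not part of any standard header), consider:

```c
#include <assert.h>

/* Manifest constant and short-hand macro: the preprocessor replaces
   these textually before the compiler proper sees the code. */
#define BUFFER_SIZE 128          /* manifest constant */
#define SQUARE(x)   ((x) * (x))  /* short-hand macro  */

int area_of_square(int side) {
    /* The compiler never sees the name SQUARE here; after
       preprocessing this line reads: return ((side) * (side)); */
    return SQUARE(side);
}
```

Running the preprocessor alone (e.g., with cc -E) on such a file shows the expanded text that the compiler actually receives.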
Compiler: A translator that takes as input a source program written in a high-level language and converts it into its equivalent target program in machine language. In addition to the above, the compiler also:
• Reports to its user the presence of errors in the source program.
• Facilitates the user in rectifying the errors and executing the code.
Assembler: A program that takes as input an assembly-language program and converts it into its equivalent machine-language code.
Loader / Linker: A program that takes as input relocatable code, collects the library functions and relocatable object files, and produces the equivalent absolute machine code. Specifically:
• Loading consists of taking the relocatable machine code, altering the relocatable addresses, and placing the altered instructions and data in memory at the proper locations.
• Linking allows us to make a single program from several files of relocatable machine code. These files may be the result of several different compilations, and one or more may be library routines provided by the system, available to any program that needs them.


In addition to these translators, programs like interpreters, text formatters, etc., may be used in a language processing system. To translate a high-level language program into an executable one, the compiler performs the compile and link functions by default.

Normally, the steps in a language processing system include preprocessing the skeletal source program, which produces an extended (expanded) source program, i.e., a ready-to-compile unit of the source program; compiling the result; then linking/loading; and finally producing its equivalent executable code. As noted earlier, not all these steps are mandatory; in some cases, the compiler performs the linking and loading functions implicitly.

The steps involved in a typical language processing system can be understood with the following diagram.
Source Program [Example: filename.C]
        ↓
   Preprocessor
        ↓
Modified Source Program [Example: filename.C]
        ↓
    Compiler
        ↓
Target Assembly Program
        ↓
    Assembler
        ↓
Relocatable Machine Code [Example: filename.obj]
        ↓
  Loader/Linker  ←  Library files, Relocatable object files
        ↓
Target Machine Code [Example: filename.exe]

Figure 1.3: Context of a Compiler in a Language Processing System

TYPES OF COMPILERS:
Based on the specific input it takes and the output it produces, compilers can be classified into the following types:

Traditional Compilers (C, C++, Pascal): These compilers convert a source program in a high-level language into its equivalent in native machine code or object code.


Interpreters (LISP, SNOBOL, Java 1.0): These first convert the source code into an intermediate code, and then interpret (emulate) that intermediate code instead of translating it fully to machine code.

Cross-Compilers: These are the compilers that run on one machine and produce code for
another machine.

Incremental Compilers: These compilers separate the source into user-defined steps, compiling/recompiling step by step and interpreting the steps in a given order.

Converters (e.g., COBOL to C++): These programs compile from one high-level language to another.

Just-In-Time (JIT) Compilers (Java, Microsoft .NET): These are runtime compilers from an intermediate language (byte code, MSIL) to executable or native machine code. They perform type-based verification, which makes the executable code more trustworthy.

Ahead-of-Time (AOT) Compilers (e.g., .NET ngen): These are pre-compilers to native code for Java and .NET.

Binary Compilation: These compilers compile object code of one platform into object code of another platform.

PHASES OF A COMPILER:

Due to the complexity of the compilation task, a compiler typically proceeds in a sequence of compilation phases. The phases communicate with each other via clearly defined interfaces. Generally, an interface contains a data structure (e.g., a tree) and a set of exported functions. Each phase works on an abstract intermediate representation of the source program, not the source program text itself (except the first phase).

Compiler phases are the individual modules which are executed in order to perform their respective sub-activities, and which finally integrate their solutions to give the target code.

It is desirable to have relatively few phases, since it takes time to read and write intermediate files. The following diagram (Figure 1.4) depicts the phases a compiler goes through during compilation. A typical compiler therefore has the following phases:

1. Lexical Analyzer (Scanner), 2. Syntax Analyzer (Parser), 3. Semantic Analyzer, 4. Intermediate Code Generator (ICG), 5. Code Optimizer (CO), and 6. Code Generator (CG)

In addition to these, a compiler also has symbol-table management and error-handler phases. Not all the phases are mandatory in every compiler; e.g., the Code Optimizer phase is optional in some cases. The description is given in the next section.

The phases of a compiler are divided into two parts: the first three phases are called the Analysis part, and the remaining three are called the Synthesis part.

Figure 1.4: Phases of a Compiler

PHASES AND PASSES OF A COMPILER:

In some applications we can have a compiler that is organized into what are called passes, where a pass is a collection of phases that convert the input from one representation to a completely different representation. Each pass makes a complete scan of the input and produces output to be processed by the subsequent pass; a two-pass assembler is a common example.

THE FRONT-END & BACK-END OF A COMPILER


All of these phases of a general compiler are conceptually divided into the Front-end and the Back-end. This division is based on their dependence on either the source language or the target machine. This model is called the Analysis & Synthesis model of a compiler.

The Front-end of the compiler consists of the phases that depend primarily on the source language and are largely independent of the target machine. For example, the front-end of the compiler includes the scanner, the parser, creation of the symbol table, the semantic analyzer, and the intermediate code generator.

The Back-end of the compiler consists of the phases that depend on the target machine; those portions do not depend on the source language, just on the intermediate language. Here we have the various aspects of the code-optimization phase and code generation, along with the necessary error handling and symbol-table operations.

LEXICAL ANALYZER (SCANNER): The scanner is the first phase; it works as the interface between the compiler and the source-language program and performs the following functions:

• Reads the characters of the source program and groups them into a stream of tokens, where each token represents a logically cohesive sequence of characters, such as an identifier, a keyword, a punctuation mark, or a multi-character operator like :=.

• The character sequence forming a token is called a lexeme of the token.

• The scanner generates a token id, and also enters the identifier's name in the symbol table if it does not already exist there.

• It also removes comments and unnecessary spaces.

The format of a token is <Token name, Attribute value>.
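As a rough sketch of this idea (the token names, the struct layout, and the classify function below are assumptions invented for illustration, not any real scanner's interface), classifying an already-isolated lexeme into a <token name, attribute> pair could look like:

```c
#include <ctype.h>

/* Hypothetical token names for the sketch. */
enum token_name { TOK_ID, TOK_NUMBER, TOK_PUNCT };

/* A token pairs a name with its attribute (here, the lexeme itself). */
struct token {
    enum token_name name;
    const char *lexeme;   /* attribute value */
};

/* Classify a single, already-isolated lexeme by its first character. */
struct token classify(const char *lexeme) {
    struct token t;
    t.lexeme = lexeme;
    if (isdigit((unsigned char)lexeme[0]))
        t.name = TOK_NUMBER;
    else if (isalpha((unsigned char)lexeme[0]) || lexeme[0] == '_')
        t.name = TOK_ID;
    else
        t.name = TOK_PUNCT;
    return t;
}
```

A real scanner would additionally consult a keyword table and enter identifiers into the symbol table, as described above.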

SYNTAX ANALYZER (PARSER): The parser interacts with the scanner and with its subsequent phase, the semantic analyzer, and performs the following functions:

• Groups the received and recorded token stream into syntactic structures, usually into a structure called a parse tree, whose leaves are tokens.

• The interior nodes of this tree represent streams of tokens that logically belong together.

• In short, it checks the syntax of the program elements.

SEMANTIC ANALYZER: This phase receives the syntax tree as input and checks the semantic correctness of the program. Even though the tokens are valid and syntactically correct, it may happen that they are not correct semantically. Therefore the semantic analyzer checks the semantics (meaning) of the statements formed.

• The syntactically and semantically correct structures are produced here in the form of a syntax tree, a DAG, or some other sequential representation such as a matrix.

INTERMEDIATE CODE GENERATOR (ICG): This phase takes the syntactically and semantically correct structure as input and produces its equivalent intermediate notation of the source program. The intermediate code should have two important properties: it should be easy to produce, and easy to translate into the target program. Example intermediate code forms are:

• Three-address code,

• Polish notation, etc.
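As a sketch, the expression b + c * d could be written in these two forms roughly as follows (the temporary names t1 and t2 are illustrative):

```
Three-address code:        Postfix (reverse Polish) notation:
    t1 = c * d                 b c d * +
    t2 = b + t1
```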

CODE OPTIMIZER: This phase is optional in some compilers, but it is very useful and beneficial in terms of saving development time, effort, and cost. This phase performs the following specific functions:

• Attempts to improve the intermediate code so as to obtain faster machine code. Typical functions include loop optimization, removal of redundant computations, strength reduction, frequency reduction, etc.

• Sometimes the data structures used to represent the intermediate forms may also be changed.

CODE GENERATOR: This is the final phase of the compiler; it generates the target code, normally consisting of relocatable machine code, assembly code, or absolute machine code.

• Memory locations are selected for each variable used, and variables are assigned to registers.

• Intermediate instructions are translated into a sequence of machine instructions.

The compiler also performs symbol-table management and error handling throughout the compilation process. A symbol table is a data structure that stores the different source-language constructs and tokens generated during compilation. These two components interact with all phases of the compiler.


For example, suppose the source program is an assignment statement; the following figure shows how the phases of the compiler process it.

The input source program is: position = initial + rate * 60

Figure 1.5: Translation of an assignment statement
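In outline, the phases transform this statement roughly as follows (the symbol-table indices id1, id2, id3, the temporaries, and the register names are illustrative):

```
Lexical analysis:     <id,1> <=> <id,2> <+> <id,3> <*> <60>
Syntax analysis:      parse tree for  id1 = id2 + (id3 * 60)
Semantic analysis:    id1 = id2 + (id3 * inttofloat(60))
Intermediate code:    t1 = inttofloat(60)
                      t2 = id3 * t1
                      t3 = id2 + t2
                      id1 = t3
Code optimization:    t1 = id3 * 60.0
                      id1 = id2 + t1
Code generation:      LDF  R2, id3
                      MULF R2, R2, #60.0
                      LDF  R1, id2
                      ADDF R1, R1, R2
                      STF  id1, R1
```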


LEXICAL ANALYSIS:
As the first phase of a compiler, the main task of the lexical analyzer is to read the input characters of the source program, group them into lexemes, and produce as output a token for each lexeme in the source program. This stream of tokens is sent to the parser for syntax analysis. It is common for the lexical analyzer to interact with the symbol table as well: when the lexical analyzer discovers a lexeme constituting an identifier, it needs to enter that lexeme into the symbol table. This process is shown in the following figure.

Figure 1.6: Lexical Analyzer

When the lexical analyzer identifies the first token, it sends it to the parser; the parser receives the token and asks the lexical analyzer for the next one by issuing the getNextToken() command. This process continues until the lexical analyzer has identified all the tokens. During this process the lexical analyzer discards white space and comment lines.

TOKENS, PATTERNS AND LEXEMES:

A token is a pair consisting of a token name and an optional attribute value. The token
name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a
sequence of input characters denoting an identifier. The token names are the input symbols that
the parser processes. In what follows, we shall generally write the name of a token in boldface.
We will often refer to a token by its token name.

A pattern is a description of the form that the lexemes of a token may take [ or match]. In the
case of a keyword as a token, the pattern is just the sequence of characters that form the keyword.
For identifiers and some other tokens, the pattern is a more complex structure that is matched by
many strings.


A lexeme is a sequence of characters in the source program that matches the pattern for a
token and is identified by the lexical analyzer as an instance of that token.

Example: In the following C language statement,

printf("Total = %d\n", score);

both printf and score are lexemes matching the pattern for token id, and "Total = %d\n" is a lexeme matching the pattern for token literal (or string).

Figure 1.7: Examples of Tokens

LEXICAL ANALYSIS Vs PARSING:

There are a number of reasons why the analysis portion of a compiler is normally separated into
lexical analysis and parsing (syntax analysis) phases.

• 1. Simplicity of design is the most important consideration. The separation of lexical and syntactic analysis often allows us to simplify at least one of these tasks. For example, a parser that had to deal with comments and whitespace as syntactic units would be considerably more complex than one that can assume comments and whitespace have already been removed by the lexical analyzer.

• 2. Compiler efficiency is improved. A separate lexical analyzer allows us to apply specialized techniques that serve only the lexical task, not the job of parsing. In addition, specialized buffering techniques for reading input characters can speed up the compiler significantly.

• 3. Compiler portability is enhanced. Input-device-specific peculiarities can be restricted to the lexical analyzer.


INPUT BUFFERING:

Before discussing the problem of recognizing lexemes in the input, let us examine some ways in which the simple but important task of reading the source program can be sped up. This task is made difficult by the fact that we often have to look one or more characters beyond the next lexeme before we can be sure we have the right lexeme. There are many situations where we need to look at least one additional character ahead. For instance, we cannot be sure we have seen the end of an identifier until we see a character that is not a letter or digit, and therefore is not part of the lexeme for id. In C, single-character operators like -, =, or < could also be the beginning of a two-character operator like ->, ==, or <=. Thus, we shall introduce a two-buffer scheme that handles large look-aheads safely. We then consider an improvement involving "sentinels" that saves time checking for the ends of buffers.

Buffer Pairs

Because of the amount of time taken to process characters and the large number of characters
that must be processed during the compilation of a large source program, specialized buffering
techniques have been developed to reduce the amount of overhead required to process a single
input character. An important scheme involves two buffers that are alternately reloaded.

Figure 1.8: Using a Pair of Input Buffers

Each buffer is of the same size N, and N is usually the size of a disk block, e.g., 4096 bytes. Using one system read command we can read N characters into a buffer, rather than using one system call per character. If fewer than N characters remain in the input file, then a special character, represented by eof, marks the end of the source file; it is different from any possible character of the source program.

• Two pointers into the input are maintained:

1. The pointer lexemeBegin marks the beginning of the current lexeme, whose extent we are attempting to determine.

2. The pointer forward scans ahead until a pattern match is found; the exact strategy whereby this determination is made will be covered in the balance of this chapter.


Once the next lexeme is determined, forward is set to the character at its right end. Then, after the lexeme is recorded as an attribute value of a token returned to the parser, lexemeBegin is set to the character immediately after the lexeme just found. In the figure, we see that forward has passed the end of the next lexeme, ** (the Fortran exponentiation operator), and must be retracted one position to its left.

Advancing forward requires that we first test whether we have reached the end of one
of the buffers, and if so, we must reload the other buffer from the input, and move forward to
the beginning of the newly loaded buffer. As long as we never need to look so far ahead of the
actual lexeme that the sum of the lexeme's length plus the distance we look ahead is greater
than N, we shall never overwrite the lexeme in its buffer before determining it.

Sentinels to Improve the Scanner's Performance:

If we use the scheme described above, we must check each time we advance forward that we have not moved off one of the buffers; if we have, then we must reload the other buffer. Thus, for each character read, we make two tests: one for the end of the buffer, and one to determine what character was read (the latter may be a multiway branch). We can combine the buffer-end test with the test for the current character if we extend each buffer to hold a sentinel character at the end. The sentinel is a special character that cannot be part of the source program, and a natural choice is the character eof. The figure below shows the same arrangement as Figure 1.8, but with the sentinels added. Note that eof retains its use as a marker for the end of the entire input.

Figure: Sentinels at the end of each buffer

Any eof that appears other than at the end of a buffer means that the input is at an end. Figure 1.9 summarizes the algorithm for advancing forward. Notice how the first test, which can be part of a multiway branch based on the character pointed to by forward, is the only test we make, except in the case where we actually are at the end of a buffer or the end of the input.

switch ( *forward++ ) {
    case eof:
        if ( forward is at end of first buffer ) {
            reload second buffer;
            forward = beginning of second buffer;
        }
        else if ( forward is at end of second buffer ) {
            reload first buffer;
            forward = beginning of first buffer;
        }
        else /* eof within a buffer marks the end of input */
            terminate lexical analysis;
        break;
    /* cases for the other characters */
}

Figure 1.9: Use of switch-case for the sentinel

SPECIFICATION OF TOKENS:

Regular expressions are an important notation for specifying lexeme patterns. While they cannot express
all possible patterns, they are very effective in specifying those types of patterns that we actually need for
tokens.
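For instance, regular definitions for identifiers and unsigned numbers, in the style used in the Lex example later in this section, might read:

```
letter → A | B | ... | Z | a | b | ... | z
digit  → 0 | 1 | ... | 9
id     → letter ( letter | digit )*
number → digit+ ( . digit+ )? ( E ( + | - )? digit+ )?
```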

LEX the Lexical Analyzer generator

Lex is a tool used to generate a lexical analyzer. The input notation for the Lex tool is referred to as the Lex language, and the tool itself is the Lex compiler. Behind the scenes, the Lex compiler transforms the input patterns into a transition diagram and generates code in a file called lex.yy.c; this is a C program which, when given to the C compiler, yields the object code. Here we need to know how to write the Lex language. The structure of a Lex program is given below.


Structure of LEX Program : A Lex program has the following form:

Declarations
%%
Translation rules
%%

Auxiliary functions definitions


The declarations section includes declarations of variables, manifest constants (identifiers declared to stand for a constant, e.g., the name of a token), and regular definitions. C declarations appear between %{ ... %}.

In the translation rules section we place pattern-action pairs, where each pair has the form

Pattern {Action}

The auxiliary function definitions section includes the definitions of functions used to install identifiers and numbers in the symbol table.

LEX Program Example:


%{
/* definitions of manifest constants
   LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ID, NUMBER, RELOP */
%}

/* regular definitions */
delim   [ \t\n]
ws      {delim}+
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

%%

{ws}      {/* no action and no return */}
if        {return(IF);}
then      {return(THEN);}
else      {return(ELSE);}
{id}      {yylval = (int) installID(); return(ID);}
{number}  {yylval = (int) installNum(); return(NUMBER);}
"<"       {yylval = LT; return(RELOP);}
"<="      {yylval = LE; return(RELOP);}
"="       {yylval = EQ; return(RELOP);}
"<>"      {yylval = NE; return(RELOP);}
">"       {yylval = GT; return(RELOP);}
">="      {yylval = GE; return(RELOP);}

%%

int installID() {/* function to install the lexeme, whose first character is
                    pointed to by yytext and whose length is yyleng, into the
                    symbol table, and return a pointer thereto */}

int installNum() {/* similar to installID, but puts numerical constants into
                     a separate table */}

Figure 1.10: Lex program for common tokens

SYNTAX ANALYSIS (PARSER)


THE ROLE OF THE PARSER:

In our compiler model, the parser obtains a string of tokens from the lexical analyzer, as shown in the figure below, and verifies that the string of token names can be generated by the grammar for the source language. We expect the parser to report any syntax errors in an intelligible fashion and to recover from commonly occurring errors so as to continue processing the remainder of the program. Conceptually, for well-formed programs, the parser constructs a parse tree and passes it to the rest of the compiler for further processing.


Figure 2.1: Parser in the Compiler

During the process of parsing, the parser may encounter errors and present the error information back to the user.

Syntactic errors include misplaced semicolons or extra or missing braces, that is, "{" or "}". As another example, in C or Java, the appearance of a case statement without an enclosing switch is a syntactic error (however, this situation is usually allowed by the parser and caught later in processing, as the compiler attempts to generate code).

Based on the way/order in which the parse tree is constructed, parsing is classified into the following two types:

1. Top-Down Parsing: parse-tree construction starts at the root node and moves towards the children nodes (i.e., in top-down order).

2. Bottom-Up Parsing: parse-tree construction begins from the leaf nodes and proceeds towards the root node (the bottom-up order).

IMPORTANT (OR) EXPECTED QUESTIONS

1. What is a Compiler? Explain the working of a Compiler with your own example?
2. What is the Lexical analyzer? Discuss the Functions of Lexical Analyzer.
3. Write short notes on tokens, pattern and lexemes?
4. Write short notes on Input buffering scheme? How do you change the basic input
buffering algorithm to achieve better performance?
5. What do you mean by a Lexical analyzer generator? Explain LEX tool.

ASSIGNMENT QUESTIONS:
1. Write the differences between compilers and interpreters?

2. Write short notes on token recognition?

3. Write the Applications of the Finite Automata?

4. Explain How Finite automata are useful in the lexical analysis?

5. Explain DFA and NFA with an Example?


MODULE-II
TOP DOWN PARSING:
• Top-down parsing can be viewed as the problem of constructing a parse tree for the given input string, starting from the root and creating the nodes of the parse tree in preorder (depth-first, left to right).

• Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.

It is classified into two variants: one which uses backtracking, and the other which is non-backtracking in nature.

Non-Backtracking Parsing: There are two variants of this parser, as given below.
1. Table-Driven Predictive Parsing:
   i. LL(1) Parsing
2. Recursive Descent Parsing

Backtracking Parsing:
1. Brute-force method

NON-BACKTRACKING PARSING:

LL(1) Parsing or Predictive Parsing

LL(1) stands for: Left-to-right scan of the input, using a Leftmost derivation, with 1 look-ahead symbol taken from the input when making each parsing decision.

A non-recursive predictive parser can be built by maintaining a stack explicitly, rather than implicitly via recursive calls. The parser mimics a leftmost derivation. If w is the input that has been matched so far, then the stack holds a sequence of grammar symbols α such that S ⇒* wα by a leftmost derivation.

The table-driven parser in the figure has:

• An input buffer that contains the string to be parsed, followed by a $ symbol used to indicate the end of input.

• A stack containing a sequence of grammar symbols, with a $ at the bottom; initially the stack contains the start symbol of the grammar on top of $.

• A parsing table containing the production rules to be applied. This is a two-dimensional array M[Non terminal, Terminal].

• A parsing algorithm that takes the input string and determines whether it conforms to the grammar, using the parsing table and the stack to make that decision.

Figure 2.2: Model for table-driven parsing

The steps involved in constructing an LL(1) parser are:

1. Write the context-free grammar for the given input string.
2. Check for ambiguity. If the grammar is ambiguous, remove the ambiguity.
3. Check for left recursion. Remove left recursion if it exists.
4. Check for left factoring. Perform left factoring if common prefixes occur in more than one alternative.
5. Compute the FIRST and FOLLOW sets.
6. Construct the LL(1) table.
7. Using the LL(1) algorithm, generate the parse tree as the output.
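As a minimal sketch of steps 6 and 7 together, the C function below parses the toy grammar S → a S b | c with an explicit stack; the table entries M[S, a] are hard-coded into the if/else branches, and the function name and conventions are assumptions made for illustration only:

```c
/* Returns 1 if the input string (terminated by '\0') is derivable
   from S in the toy grammar  S -> a S b | c,  and 0 otherwise. */
int ll1_parse(const char *input) {
    char stack[256];
    int top = 0;
    stack[top++] = '$';      /* bottom-of-stack marker       */
    stack[top++] = 'S';      /* start symbol on top of '$'   */

    const char *ip = input;  /* look-ahead pointer           */
    while (top > 0) {
        char X = stack[--top];              /* pop top of stack   */
        char a = (*ip != '\0') ? *ip : '$'; /* current look-ahead */
        if (X == '$')
            return a == '$';   /* accept iff input fully consumed */
        if (X != 'S') {        /* X is a terminal */
            if (X != a) return 0;  /* mismatch: reject           */
            ip++;                  /* match: advance look-ahead  */
        } else {               /* non terminal: consult M[S, a]   */
            if (a == 'a') {        /* S -> a S b: push reversed  */
                stack[top++] = 'b';
                stack[top++] = 'S';
                stack[top++] = 'a';
            } else if (a == 'c') { /* S -> c */
                stack[top++] = 'c';
            } else {
                return 0;          /* empty table entry: error   */
            }
        }
    }
    return 0;
}
```

Pushing the body of a production in reverse order keeps its leftmost symbol on top of the stack, which is exactly how the parser mimics a leftmost derivation.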
Context-Free Grammar (CFG): A CFG is used to describe or denote the syntax of programming-language constructs. The CFG is denoted G, and is defined using a four-tuple notation.

Let G be a CFG; then G is written as G = (V, T, P, S), where

• V is a finite set of non terminals. Non terminals are syntactic variables that denote sets of strings. The sets of strings denoted by non terminals help define the language generated by the grammar. Non terminals impose a hierarchical structure on the language that is key to syntax analysis and translation.

• T is a finite set of terminals. Terminals are the basic symbols from which strings are formed. The term "token name" is a synonym for "terminal", and frequently we will use the word "token" for terminal when it is clear that we are talking about just the token name. We assume that the terminals are the first components of the tokens output by the lexical analyzer.

 S is the starting symbol of the grammar; one non terminal is distinguished as the start symbol, and the set of strings it denotes is the language generated by the grammar.

 P is a finite set of productions; the productions of a grammar specify the manner in which the

terminals and non terminals can be combined to form strings. Each production is of the form α → β, where α is a single non terminal and β is a string in (V U T)*. Each production consists of:
(a) A non terminal called the head or left side of the production; the production defines some of the strings denoted by the head.
(b) The symbol →. Sometimes ::= has been used in place of the arrow.
(c) A body or right side consisting of zero or more terminals and non terminals. The components of the body describe one way in which strings of the non terminal at the head can be constructed.

 Conventionally, the productions for the start symbol are listed first.
Example: Context Free Grammar to accept Arithmetic expressions.
The terminals are +, *, -, (,), id.
The Non terminal symbols are expression, term, factor and expression is the starting symbol.

expression → expression + term
expression → expression – term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id
Figure 2.3 : Grammar for Simple Arithmetic Expressions

Notational Conventions Used In Writing CFGs:


To avoid always having to state that "these are the terminals," "these are the non terminals," and so on, the following notational conventions for grammars will be used throughout our discussions.

1. These symbols are terminals:

(a) Lowercase letters early in the alphabet, such as a, b, c.
(b) Operator symbols such as +, *, and so on.
(c) Punctuation symbols such as parentheses, comma, and so on.
(d) The digits 0, 1, …, 9.
(e) Boldface strings such as id or if, each of which represents a single terminal symbol.


2. These symbols are non terminals:


(a) Uppercase letters early in the alphabet, such as A, B, C.
(b) The letter S, which, when it appears, is usually the start symbol.
(c) Lowercase, italic names such as expr or stmt.
(d) When discussing programming constructs, uppercase letters may be used to represent non terminals for the constructs. For example, the non terminals for expressions, terms, and factors are often represented by E, T, and F, respectively.
Using these conventions the grammar for the arithmetic expressions can be written as
E → E + T | E – T | T
T → T * F | T / F | F
F → ( E ) | id

DERIVATIONS:
The construction of a parse tree can be made precise by taking a derivational view, in which productions are treated as rewriting rules. Beginning with the start symbol, each rewriting step replaces a non terminal by the body of one of its productions. This derivational view corresponds to the top-down construction of a parse tree as well as the bottom-up construction of the parse tree.

 Derivations are classified into leftmost derivations and rightmost derivations.

Left Most Derivation (LMD):


It is the process of constructing the parse tree or accepting the given input string in which, every time we need to rewrite a production rule, it is done with the leftmost non terminal only.
Ex:- If the grammar is E → E+E | E*E | -E | (E) | id and the input string is id + id * id.
The production E → -E signifies that if E denotes an expression, then -E must also denote an expression. The replacement of a single E by -E will be described by writing
E => -E, which is read as "E derives -E".
For a general definition of derivation, consider a non terminal A in the middle of a sequence of grammar symbols, as in αAβ, where α and β are arbitrary strings of grammar symbols. Suppose A → γ is a production. Then we write αAβ => αγβ. The symbol => means "derives in one step". Often we wish to say "derives in zero or more steps"; for this purpose we can use the symbol =>*. If we wish to say "derives in one or more steps", we can use the symbol =>+. If S =>* α, where S is the start symbol of a grammar G, we say that α is a sentential form of G.
The Leftmost Derivation for the given input string id + id* id is
E => E +E


=> id + E
=> id + E * E
=> id + id * E
=> id + id * id

NOTE: Every time we need to start from the root production only; the underline at a non terminal indicates that it is the (leftmost) non terminal we are choosing when rewriting the productions to accept the string.

Right Most Derivation (RMD):


It is the process of constructing the parse tree or accepting the given input string in which, every time we need to rewrite a production rule, it is done with the rightmost non terminal only.
The Right most derivation for the given input string id + id* id is

E => E + E
=> E + E * E
=> E + E * id
=> E + id * id
=> id + id * id

NOTE: Every time we need to start from the root production only; the underline at a non terminal indicates that it is the (rightmost) non terminal we are choosing when rewriting the productions to accept the string.
What is a Parse Tree?
A parse tree is a graphical representation of a derivation that filters out the order in which
productions are applied to replace non terminals.
 Each interior node of a parse tree represents the application of a production.
 All the interior nodes are non terminals and all the leaf nodes are terminals.
 All the leaf nodes read from left to right form the output (yield) of the parse tree.
 If a node n is labeled A and has children n1, n2, n3, …, nk with labels X1, X2, …, Xk respectively, then there must be a production A → X1X2…Xk in the grammar.

Example1:- Parse tree for the input string - (id + id) using the above Context free Grammar is


Figure 2.4 : Parse Tree for the input string - (id + id)

The following figure shows the step by step construction of the parse tree using the CFG for the input string -(id + id).

Figure 2.5 : Sequence outputs of the Parse Tree construction process for the input string –(id+id)

Example2:- Parse tree for the input string id+id*id using the above Context free Grammar is

Figure 2.6: Parse tree for the input string id+ id*id


AMBIGUITY in CFGs:
Definition: A grammar that produces more than one parse tree for some sentence (input string)
is said to be ambiguous.
In other words, an ambiguous grammar is one that produces more than one leftmost derivation or more than one rightmost derivation for the same sentence.
Similarly, if the right hand side of a production contains two occurrences of the same non terminal as its left hand side (as in E → E+E), the grammar is usually ambiguous.
Example : If the Grammar is E-> E+E | E*E | -E| (E) | id and the Input String is id + id* id
Two parse trees for given input string are

(a)
(b)
Two leftmost derivations for the given input string are:

(a) E => E + E          (b) E => E * E
      => id + E               => E + E * E
      => id + E * E           => id + E * E
      => id + id * E          => id + id * E
      => id + id * id         => id + id * id

The above grammar gives two parse trees (two derivations) for the given input string, so it is an ambiguous grammar.
Note: An LL(1) parser will not accept ambiguous grammars; we cannot construct an LL(1) parser for an ambiguous grammar, because such grammars may cause the top-down parser to go into an infinite loop or make it consume more time for parsing. If necessary, we must remove all types of ambiguity from the grammar and then construct the parser.
ELIMINATING AMBIGUITY: Ambiguous grammars may cause the top-down parser to go into an infinite loop or consume more time during parsing. Therefore, an ambiguous grammar can sometimes be rewritten to eliminate the ambiguity. The general form of the productions that cause ambiguity in grammars is


A → Aα | β

This can be rewritten (introducing one new non terminal) as

A → βA′
A′ → αA′ | ε
Example: Let the grammar be E → E+E | E*E | -E | (E) | id. It was shown that it is ambiguous. It can be written as
E → E+E
E → E*E
E → -E
E → (E)
E → id
In the above grammar the 1st and 2nd productions cause the ambiguity, so they can be written as
E → E+E | E*E, and this production again can be written as
E → E+E | β, where β is E*E.
The above production is the same as the general form, so it can be written as
E → E+T | T
T → β

The value of β is E*E, so the above grammar can be written as

1) E → E+T | T
2) T → E*E

The first production is now free from ambiguity. Substituting E → T in the 2nd production, it can be written as
T → T*T | -E | (E) | id, and this production again can be written as
T → T*T | β, where β is -E | (E) | id. Introducing a new non terminal on the right hand side, it becomes
T → T*F | F
F → -E | (E) | id
Now the entire grammar has been turned into its equivalent unambiguous form. The unambiguous grammar equivalent to the given ambiguous one is
1) E → E+T | T
2) T → T*F | F
3) F → -E | (E) | id

LEFT RECURSION:
Another feature of CFGs that is not desirable in top-down parsers is left recursion. A grammar is left recursive if it has a non terminal A such that there is a derivation A =>+ Aα for some string α in (V U T)*. LL(1) or top-down parsers cannot handle left recursive grammars, so we need to remove the left recursion from a grammar before it is used in top-down parsing.


The general form of left recursion is

A → Aα | β

The above left recursive production can be written as the non left recursive equivalent:

A → βA′
A′ → αA′ | ε
Example:- Is the following grammar left recursive? If so, find a non left recursive grammar equivalent to it.

E → E+T | T
T → T*F | F
F → -E | (E) | id

Yes, the grammar is left recursive: the first two productions satisfy the general form of left recursion. Rewriting them after removing the left recursion from E → E + T and T → T * F gives

E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → -E | (E) | id
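The transformation above is mechanical, so it can be sketched in code. The sketch below is illustrative (the helper name and the use of "e" for ε are my own conventions, not from the notes); it removes immediate left recursion from one non terminal's alternatives.

```python
def remove_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | b1 | ...  as  A -> b A' ; A' -> a A' | e.
    Bodies are lists of symbols; "e" stands for the epsilon production."""
    new_head = head + "'"
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # left recursive bodies
    betas = [b for b in bodies if not b or b[0] != head]     # the remaining bodies
    if not alphas:                                           # no left recursion
        return {head: bodies}
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [["e"]],
    }
```

For example, `remove_left_recursion("E", [["E", "+", "T"], ["T"]])` rewrites E → E+T | T as E → TE′ ; E′ → +TE′ | ε, matching the result derived above.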

LEFT FACTORING:
Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive or top-down parsing. A grammar in which more than one production has a common prefix is rewritten by factoring out the prefix.
For example, in the following grammar there are n A-productions that have the common prefix α, which should be factored out without changing the language defined for A.

A → αA1 | αA2 | αA3 | αA4 | … | αAn

We can factor out the α from all n productions by adding a new A-production A → αA′ and rewriting the A′-productions, so the grammar becomes

A → αA′
A′ → A1 | A2 | A3 | A4 | … | An
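This transformation can also be sketched in code. The fragment below is illustrative (symbol encoding and the "e" marker for ε are my own conventions); it factors the longest prefix shared by all of a non terminal's alternatives.

```python
def left_factor(head, bodies):
    """Rewrite A -> aB1 | aB2 | ...  as  A -> aA' ; A' -> B1 | B2 | ...
    Bodies are lists of symbols; "e" stands for epsilon."""
    prefix = []                              # longest prefix shared by all alternatives
    for column in zip(*bodies):
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    if not prefix:
        return {head: bodies}                # nothing to factor out
    new_head = head + "'"
    suffixes = [b[len(prefix):] or ["e"] for b in bodies]
    return {head: [prefix + [new_head]], new_head: suffixes}
```

For example, `left_factor("A", [["a", "A1"], ["a", "A2"]])` yields A → αA′ ; A′ → A1 | A2, as in the general form above.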
FIRST and FOLLOW:


The construction of both top-down and bottom-up parsers is aided by two functions, FIRST and FOLLOW, associated with a grammar G. During top-down parsing, FIRST and FOLLOW allow us to choose which production to apply, based on the next input (lookahead) symbol.

Computation of FIRST:
The FIRST function computes the set of terminal symbols with which strings derived from a grammar symbol can begin. To compute FIRST(A) for all grammar symbols, apply the following rules until no more terminals or ε can be added to any FIRST set:
1. If A is a terminal, then FIRST(A) = {A}.
2. If A is a non terminal and A → X1X2…Xn is a production, add FIRST(X1) − {ε} to FIRST(A); if X1 => ε, also add FIRST(X2) − {ε}; if X2 => ε, add FIRST(X3) − {ε}; and so on. If every Xi for i = 1..n derives ε, add ε to FIRST(A).
3. If A → ε is a production, then add ε to FIRST(A).

Computation Of FOLLOW:
FOLLOW(A) is the set of terminal symbols of the grammar that can appear immediately to the right of the non terminal A in some sentential form. If a is to the immediate right of non terminal A, then a is in FOLLOW(A). To compute FOLLOW(A) for all non terminals A, apply the following rules until no more symbols can be added to any FOLLOW set:

1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right end marker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).

Example: - Compute the FIRST and FOLLOW values of the expression grammar
1. E → TE′
2. E′ → +TE′ | ε
3. T → FT′
4. T′ → *FT′ | ε
5. F → (E) | id

Computing FIRST Values:


FIRST (E) = FIRST (T) = FIRST (F) = { (, id }
FIRST (E′) = { +, ε }
FIRST (T′) = { *, ε }


Computing FOLLOW Values:


FOLLOW (E) = { $, ) }  $ because E is the start symbol of the grammar, and ) from the production F → (E).
FOLLOW (E′) = FOLLOW (E) = { $, ) }  satisfying the 3rd rule of FOLLOW.
FOLLOW (T) = ( FIRST (E′) − {ε} ) U FOLLOW (E′)  satisfying the 2nd and 3rd rules
           = { +, $, ) }
FOLLOW (T′) = FOLLOW (T) = { +, $, ) }  satisfying the 3rd rule.
FOLLOW (F) = ( FIRST (T′) − {ε} ) U FOLLOW (T)  satisfying the 2nd and 3rd rules
           = { *, +, $, ) }

NON TERMINAL    FIRST         FOLLOW
E               { (, id }     { $, ) }
E′              { +, ε }      { $, ) }
T               { (, id }     { +, $, ) }
T′              { *, ε }      { +, $, ) }
F               { (, id }     { *, +, $, ) }
Table 2.1: FIRST and FOLLOW values
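Both computations above are fixed-point iterations: keep applying the rules until no set grows. The sketch below expresses them in code (the grammar encoding and the "e" marker for ε are my own conventions); running it on the expression grammar reproduces Table 2.1.

```python
EPS = "e"                                    # stands for the epsilon symbol
grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
nonterms = set(grammar)

def first_of(seq, FIRST):
    """FIRST of a sequence of grammar symbols."""
    out = set()
    for X in seq:
        fx = FIRST[X] if X in nonterms else {X}
        out |= fx - {EPS}
        if EPS not in fx:                    # X cannot vanish: stop here
            return out
    out.add(EPS)                             # the whole sequence can vanish
    return out

def compute_first():
    FIRST = {A: set() for A in grammar}
    changed = True
    while changed:                           # iterate until no FIRST set grows
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                f = first_of(body, FIRST)
                if not f <= FIRST[A]:
                    FIRST[A] |= f
                    changed = True
    return FIRST

def compute_follow(FIRST, start="E"):
    FOLLOW = {A: set() for A in grammar}
    FOLLOW[start].add("$")                   # rule 1: $ into FOLLOW(start)
    changed = True
    while changed:                           # iterate until no FOLLOW set grows
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in nonterms:
                        continue
                    tail = body[i + 1:]
                    f = first_of(tail, FIRST) if tail else {EPS}
                    # rule 2: FIRST(tail) minus epsilon; rule 3: FOLLOW(A)
                    add = (f - {EPS}) | (FOLLOW[A] if EPS in f else set())
                    if not add <= FOLLOW[B]:
                        FOLLOW[B] |= add
                        changed = True
    return FOLLOW
```

`compute_first()` gives FIRST(E) = {(, id} and FIRST(E′) = {+, ε}; `compute_follow(compute_first())` gives FOLLOW(F) = {*, +, $, )}, as in the table.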
Constructing the Predictive or LL(1) Parse Table:
It is the process of placing all the productions of the grammar in the parse table based on the FIRST and FOLLOW values of the productions.
The rules to be followed to construct the parsing table M are:
1. For each production A → α of the grammar, do the steps below.
2. For each terminal symbol 'a' in FIRST(α), add the production A → α to M[A, a].
3. i. If ε is in FIRST(α), add the production A → α to M[A, b], where b is each terminal in FOLLOW(A).
   ii. If ε is in FIRST(α) and $ is in FOLLOW(A), then add the production A → α to M[A, $].
4. Mark all other entries in the parsing table as error.

                                    INPUT SYMBOLS
NON-TERMINALS   +            *            (          )          id         $
E                                         E → TE′               E → TE′
E′              E′ → +TE′                            E′ → ε                E′ → ε
T                                         T → FT′               T → FT′
T′              T′ → ε       T′ → *FT′               T′ → ε                T′ → ε
F                                         F → (E)               F → id
Table 2.2: LL (1) Parsing Table for the Expressions Grammar
Note: if no cell of the table contains multiple entries for a single terminal, then the grammar is accepted by the LL(1) parser.
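Rules 1-4 can be sketched directly in code. The fragment below is illustrative (encodings are mine, and the FIRST/FOLLOW sets are hard-coded from Table 2.1 for brevity); the assert flags any multiple entry, i.e. a grammar that is not LL(1).

```python
EPS = "e"                                    # stands for epsilon
productions = [
    ("E",  ["T", "E'"]),
    ("E'", ["+", "T", "E'"]),
    ("E'", [EPS]),
    ("T",  ["F", "T'"]),
    ("T'", ["*", "F", "T'"]),
    ("T'", [EPS]),
    ("F",  ["(", "E", ")"]),
    ("F",  ["id"]),
]
# FIRST and FOLLOW sets transcribed from Table 2.1
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"}}

def first_of_body(body):
    out = set()
    for X in body:
        f = FIRST.get(X, {X})                # a terminal is its own FIRST
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}

def build_table():
    M = {}
    for A, body in productions:              # rule 1: each production A -> alpha
        f = first_of_body(body)
        lookaheads = f - {EPS}               # rule 2: terminals in FIRST(alpha)
        if EPS in f:
            lookaheads |= FOLLOW[A]          # rule 3: FOLLOW(A), including $
        for a in lookaheads:
            assert (A, a) not in M, "multiple entries: grammar is not LL(1)"
            M[A, a] = body
    return M                                 # rule 4: absent entries mean error
```

Running `build_table()` fills exactly the cells shown in Table 2.2, e.g. M[E′, )] = ε and M[T′, *] = *FT′.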
LL (1) Parsing Algorithm:
The parser acts on the basis of two symbols:
i. A, the symbol on the top of the stack
ii. a, the current input symbol
There are three conditions for A and 'a' that are used by the parsing program:
1. If A = a = $, then parsing is successful.
2. If A = a ≠ $, then the parser pops A off the stack and advances the current input pointer to the next symbol.
3. If A is a non terminal, the parser consults the entry M[A, a] in the parsing table. If M[A, a] is a production A → X1X2..Xn, then the program replaces the A on the top of the stack by X1X2..Xn in such a way that X1 comes on top.

STRING ACCEPTANCE BY PARSER:


If the input string for the parser is id + id * id, the table below shows how the parser accepts the string with the help of the stack.

Stack        Input         Action               Comments
$E           id+id*id$     E → TE′              E on top of the stack is replaced by TE′
$E′T         id+id*id$     T → FT′              T on top of the stack is replaced by FT′
$E′T′F       id+id*id$     F → id               F on top of the stack is replaced by id
$E′T′id      id+id*id$     pop and remove id    Condition 2 is satisfied
$E′T′        +id*id$       T′ → ε               T′ on top of the stack is replaced by ε
$E′          +id*id$       E′ → +TE′            E′ on top of the stack is replaced by +TE′
$E′T+        +id*id$       pop and remove +     Condition 2 is satisfied
$E′T         id*id$        T → FT′              T on top of the stack is replaced by FT′
$E′T′F       id*id$        F → id               F on top of the stack is replaced by id
$E′T′id      id*id$        pop and remove id    Condition 2 is satisfied
$E′T′        *id$          T′ → *FT′            T′ on top of the stack is replaced by *FT′
$E′T′F*      *id$          pop and remove *     Condition 2 is satisfied
$E′T′F       id$           F → id               F on top of the stack is replaced by id
$E′T′id      id$           pop and remove id    Condition 2 is satisfied
$E′T′        $             T′ → ε               T′ on top of the stack is replaced by ε
$E′          $             E′ → ε               E′ on top of the stack is replaced by ε
$            $             Parsing successful   Condition 1 is satisfied
Table 2.3: Sequence of steps taken by the parser in parsing the input token stream id+id*id
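The three conditions of the parsing algorithm translate almost line for line into code. The sketch below (token and table encodings are my own) hard-codes the table of Table 2.2 and replays the trace of Table 2.3.

```python
EPS = "e"                                    # stands for epsilon
# LL(1) table transcribed from Table 2.2
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [EPS], ("E'", "$"): [EPS],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [EPS], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [EPS], ("T'", "$"): [EPS],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
nonterms = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    """Return the list of productions applied (a leftmost derivation)."""
    tokens = tokens + ["$"]
    stack = ["$", "E"]                       # start symbol on top of $
    i, trace = 0, []
    while True:
        A, a = stack[-1], tokens[i]
        if A == a == "$":                    # condition 1: success
            return trace
        if A == a:                           # condition 2: match, pop and advance
            stack.pop()
            i += 1
        elif A in nonterms and (A, a) in M:  # condition 3: consult M[A, a]
            stack.pop()
            body = M[A, a]
            trace.append((A, body))
            for X in reversed(body):         # push the body so X1 ends up on top
                if X != EPS:
                    stack.append(X)
        else:
            raise SyntaxError(f"unexpected {a!r} with {A!r} on the stack")
```

`ll1_parse(["id", "+", "id", "*", "id"])` applies the eleven productions listed in Table 2.3, in the same order.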

Figure 2.7: Parse tree for the input id + id* id

ERROR HANDLING (RECOVERY) IN PREDICTIVE PARSING:


In table driven predictive parsing, it is clear which terminals and non terminals the parser expects from the rest of the input. An error can be detected in the following situations:
1. When the terminal on top of the stack does not match the current input symbol.
2. When a non terminal A is on top of the stack, a is the current input symbol, and M[A, a] is empty or error.
The parser recovers from the error and continues its process. The following error recovery schemes are used in predictive parsing:
Panic Mode Error Recovery:
It is based on the idea that when an error is detected, the parser skips the remaining input until a synchronizing token is encountered. Some examples are listed below:
1. For a non terminal A, all symbols in FOLLOW(A) are added into the synchronizing set of the non terminal A. For example, consider the assignment statement "c=;". Here, the expression on the right hand side is missing, so the FOLLOW of the expression is considered. It is ";", which is taken as the synchronizing token. On encountering it, the parser emits the error message "Missing Expression".
2. For a non terminal A, all symbols in FIRST(A) can also be added into the synchronizing set of the non terminal A. For example, consider the assignment statement "22c = a + b;". Here, 22 cannot begin a statement, so the input is skipped until a symbol in FIRST of the statement is encountered, and the parser then reports the error "extraneous token".


Phrase Level Recovery :


It can be implemented in the predictive parsing by filling up the blank entries in
the predictive parsing table with pointers to error Handling routines. These routines can
insert, modify or delete symbols in the input.
RECURSIVE DESCENT PARSING:
A recursive-descent parsing program consists of a set of recursive procedures, one for each non terminal. Each procedure is responsible for parsing the constructs defined by its non terminal. Execution begins with the procedure for the start symbol, which halts and announces success if its procedure body scans the entire input string.
If the given grammar is

E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
Recursive procedures for the recursive descent parser for the given grammar are given below. (Note: when the lookahead is not '+' in E′ or '*' in T′, the procedures simply return, applying the ε-production; reporting an error there would wrongly reject valid inputs.)
procedure E( )
{
T( );
E′( );
}
procedure T( )
{
F( );
T′( );
}
procedure E′( )
{
if input = '+'
{
advance( );
T( );
E′( );
}
return true;  // E′ → ε when the input is not '+'
}
procedure T′( )
{
if input = '*'
{
advance( );
F( );
T′( );
}
return true;  // T′ → ε when the input is not '*'
}
procedure F( )
{
if input = '('
{
advance( );
E( );
if input = ')'
{
advance( );
return true;
}
else return error;
}
else if input = "id"
{
advance( );
return true;
}
else return error;
}
procedure advance( )
{
input = next token;
}
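The pseudocode above can be made runnable. The Python sketch below (the class name and token encoding are my own) mirrors the procedures one method per non terminal, with `advance` replaced by an index into the token list.

```python
class RDParser:
    """Recursive-descent parser for  E -> TE' ; E' -> +TE' | e ;
    T -> FT' ; T' -> *FT' | e ; F -> (E) | id."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens + ["$"], 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def parse(self):
        self.E()
        return self.peek() == "$"            # success iff all input is consumed

    def E(self):
        self.T(); self.Eprime()

    def Eprime(self):                        # E' -> +TE' | epsilon
        if self.peek() == "+":
            self.advance(); self.T(); self.Eprime()

    def T(self):
        self.F(); self.Tprime()

    def Tprime(self):                        # T' -> *FT' | epsilon
        if self.peek() == "*":
            self.advance(); self.F(); self.Tprime()

    def F(self):                             # F -> (E) | id
        if self.peek() == "(":
            self.advance(); self.E()
            if self.peek() != ")":
                raise SyntaxError("missing ')'")
            self.advance()
        elif self.peek() == "id":
            self.advance()
        else:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
```

For example, `RDParser(["id", "+", "id", "*", "id"]).parse()` scans the whole input and reports success, while an input such as `["id", "+", "*"]` raises an error inside F.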

BACK TRACKING: This parsing method uses the technique called the brute force method during the parse tree construction process. It allows the process to go back (backtrack) and redo the steps by undoing the work done so far at the point of processing.
Brute force method: It is a top-down parsing technique that comes into play when there is more than one alternative in the productions to be tried while parsing the input string. It selects the alternatives in the order they appear, and when it realizes that something has gone wrong it tries the next alternative.
For example, consider the grammar below.

S → cAd
A → ab | a

To generate the input string "cad", initially the first parse tree given below is generated. As the string generated is not "cad", the input pointer is backtracked to the position of "A" to examine the next alternative of "A". Now a match to the input string occurs, as shown in the 2nd parse tree given below.


(1)                    (2)
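A brute force parser can be sketched with generators, so that a failed alternative is abandoned and the next one is tried automatically (the encoding is my own; the grammar is the one above).

```python
grammar = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}

def derive(symbol, s, pos):
    """Yield every position where a derivation of symbol can end when
    matching against s starting at pos.  Trying the alternatives in the
    order they appear, and falling through when one fails, is exactly
    the brute force backtracking described above."""
    if symbol not in grammar:                # terminal: must match one character
        if pos < len(s) and s[pos] == symbol:
            yield pos + 1
        return
    for body in grammar[symbol]:             # alternatives in order of appearance
        ends = [pos]
        for sym in body:                     # thread positions through the body
            ends = [e2 for e in ends for e2 in derive(sym, s, e)]
        yield from ends

def accepts(s):
    """The string is accepted if some derivation of S consumes all of it."""
    return len(s) in derive("S", s, 0)
```

On "cad", the alternative A → ab fails at 'd', so the parser backtracks and succeeds with A → a, just as in the two parse trees above.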

IMPORTANT AND EXPECTED QUESTIONS


1. Explain the components of working of a Predictive Parser with an example?
2. What do the FIRST and FOLLOW values represent? Give the algorithm for computing FIRST and FOLLOW of grammar symbols with an example?
3. Construct the LL (1) Parsing table for the following grammar?
E → E+T | T
T → T*F
F → (E) | id
4. For the above grammar construct, and explain the Recursive Descent Parser?
5. What happens if multiple entries occur in your LL(1) parsing table? Justify your answer. How does the parser
ASSIGNMENT QUESTIONS

1. Eliminate the Left recursion from the below grammar?


A->Aab|AcB|b
B-> Ba | d
2. Explain the procedure to remove the ambiguity from the given grammar with your own
example?
3. Write the grammar for the if-else statement in the C programming and check for the left
factoring?

4. Will the Predictive parser accept the ambiguous Grammar justify your answer?

5. Is the grammar G = { S->L=R, S->R, R->L, L->*R | id } an LL(1) grammar?


BOTTOM-UP PARSING
Bottom-up parsing corresponds to the construction of a parse tree for an input string beginning at the leaves (the bottom nodes) and working up towards the root (the top node). It involves "reducing" an input string w to the start symbol of the grammar. In each reduction step, a particular substring matching the right side of a production is replaced by the symbol on the left of that production; this traces out a rightmost derivation in reverse. For example, consider the following grammar:
E → E+T | T
T → T*F | F
F → (E) | id
Bottom-up parsing of the input string "id * id" is as follows:

INPUT STRING    SUBSTRING    REDUCING PRODUCTION
id*id           id           F → id
F*id            F            T → F
T*id            id           F → id
T*F             T*F          T → T*F
T               T            E → T
E                            E is the start symbol; hence, the input string is accepted
Parse Tree representation is as follows:

Figure 3.1 : A Bottom-up Parse tree for the input String “id*id”

Bottom-up parsing is classified into 1. Shift-Reduce parsing, 2. Operator Precedence parsing, and 3. [Table Driven] LR parsing:
i. SLR(1)
ii. CLR(1)
iii. LALR(1)
SHIFT-REDUCE PARSING:
Shift-reduce parsing is a form of bottom-up parsing in which a stack holds grammar symbols and an input buffer holds the rest of the string to be parsed. We use $ to mark the bottom of the stack and also the right end of the input. The parser makes use of shift and reduce actions to accept the input string. Here, the parse tree is constructed bottom-up from the leaf nodes towards the root node.
When parsing the given input string, if a handle appears on top of the stack the parser takes the reduce action; otherwise it goes for the shift action. It can handle ambiguous grammars also.
For example, consider the grammar below to accept the input string "id * id" using the S-R parser:
E → E+T | T
T → T*F | F
F → (E) | id
Actions of the Shift-reduce parser using Stack implementation

STACK    INPUT     ACTION
$        id*id$    Shift
$id      *id$      Reduce with F → id
$F       *id$      Reduce with T → F
$T       *id$      Shift
$T*      id$       Shift
$T*id    $         Reduce with F → id
$T*F     $         Reduce with T → T*F
$T       $         Reduce with E → T
$E       $         Accept


Consider the following grammar:

S → aAcBe
A → Ab | b
B → d

Let the input string be "abbcde". The series of shifts and reductions to the start symbol is as follows:
abbcde → aAbcde → aAcde → aAcBe → S
Note: in the above example there are two actions possible in the second step, as follows:
1. Shift action, going to the 3rd step.
2. Reduce action, that is A → b.
If the parser takes the 1st action it successfully accepts the given input string; if it goes for the second action it cannot accept the given input string. This is called a shift-reduce conflict, where the S-R parser is not able to take the proper decision, so it is not recommended for parsing.
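One mechanical way to settle such shift/reduce choices is to reduce A → β only when the lookahead may follow A, which is the idea behind SLR parsing (covered later). The sketch below (encodings are my own, with FOLLOW sets hard-coded) applies that idea to the expression grammar of the worked example and reproduces the reduction sequence of the table above.

```python
# E -> E+T | T ; T -> T*F | F ; F -> (E) | id, longest bodies listed first
productions = [
    ("E", ["E", "+", "T"]),
    ("T", ["T", "*", "F"]),
    ("F", ["(", "E", ")"]),
    ("E", ["T"]),
    ("T", ["F"]),
    ("F", ["id"]),
]
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"},
          "F": {"+", "*", ")", "$"}}

def shift_reduce(tokens):
    """Return the sequence of reductions, or raise SyntaxError."""
    stack, trace = [], []
    rest = list(tokens) + ["$"]
    while True:
        look = rest[0]
        for head, body in productions:
            # reduce only when the handle is on top of the stack AND the
            # lookahead may follow the head -- this settles shift/reduce choices
            if len(stack) >= len(body) and stack[-len(body):] == body \
                    and look in FOLLOW[head]:
                stack[len(stack) - len(body):] = [head]
                trace.append((head, body))
                break
        else:                                # no reduction applies
            if look == "$":
                if stack == ["E"]:
                    return trace             # reduced to the start symbol
                raise SyntaxError("input rejected")
            stack.append(rest.pop(0))        # shift
```

On `["id", "*", "id"]` this performs the same five reductions as the table above; note how on stack $T with lookahead *, the reduction E → T is blocked because * is not in FOLLOW(E), so the parser shifts instead.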
OPERATOR PRECEDENCE PARSING:
Operator precedence parsing is a kind of shift-reduce parsing method that can be applied to a small class of grammars called operator grammars. It can process ambiguous grammars also.
 An operator grammar has two important characteristics:
1. There are no ε productions.
2. No production has two adjacent non terminals.
 The operator grammar to accept expressions is given below:
E → E+E | E-E | E*E | E/E | E^E | -E | (E) | id
Two main Challenges in the operator precedence parsing are:
1. Identification of Correct handles in the reduction step, such that the given input should be
reduced to starting symbol of the grammar.
2. Identification of which production to use for reducing in the reduction steps, such that we
should correctly reduce the given input to the starting symbol of the grammar.
Operator precedence parser consists of:
1. An input buffer that contains the string to be parsed followed by a $, a symbol used to indicate the end of the input.
2. A stack containing a sequence of grammar symbols with a $ at the bottom of the stack.
3. An operator precedence relation table O, containing the precedence relations between pairs of terminals. Three kinds of precedence relations can exist between a pair of terminals 'a' and 'b', as follows:
4. The relation a <• b implies that the terminal 'a' has lower precedence than the terminal 'b'.
5. The relation a •> b implies that the terminal 'a' has higher precedence than the terminal 'b'.
6. The relation a =• b implies that the terminal 'a' has the same precedence as the terminal 'b'.


7. An operator precedence parsing program that takes an input string and determines whether it conforms to the grammar specification. It uses the operator precedence parse table and the stack to arrive at the decision.
Figure 3.2: Components of the operator precedence parser (input buffer a1 a2 a3 … $, stack with $ at the bottom, operator precedence table, and the parsing algorithm producing the output)

Example: If the grammar is

E → E+E
E → E-E
E → E*E
E → E/E
E → E^E
E → -E
E → (E)
E → id

construct the operator precedence table and accept the input string "id+id*id".

The precedence relations between the operators are


( id ) > ( ^ ) > ( * / ) > ( + - ) > $ ; the '^' operator is right associative and all the remaining operators are left associative.
+ - * / ^ id ( ) $
+ •> •> <• <• <• <• <• •> •>
- •> •> <• <• <• <• <• •> •>
* •> •> •> •> <• <• <• •> •>
/ •> •> •> •> <• <• <• •> •>
^ •> •> •> •> <• <• <• •> •>
Id •> •> •> •> •> Err Err •> •>
( <• <• <• <• <• <• <• = Err
) •> •> •> •> •> Err Err •> •>
$ <• <• <• <• <• <• <• Err Err

The intention of the precedence relations is to delimit the handle of the given input string, with <• marking the left end of the handle and •> marking the right end of the handle.
Parsing Action:
To locate the handle the following steps are followed:
1. Add the $ symbol at both ends of the given input string.
2. Scan the input string from left to right until the first •> is encountered.
3. Scan towards the left over all the equal precedences until the first <• is encountered.
4. Everything between <• and •> is a handle.
5. $ on $ means parsing is successful.
Example: Explain the parsing actions of the OP parser when the input string is "id*id" and the grammar is:
E → E+E
E → E*E
E → id
1. $ <• id •> * <• id •> $
The first handle is 'id' and the match for 'id' in the grammar is E → id. So, id is replaced with the non terminal E, and the given input string can be written as
2. $ <• E •> * <• id •> $
The parser does not consider non terminals as input, so they are not considered in the input string. So, the string becomes
3. $ <• * <• id •> $
The next handle is 'id' and the match for 'id' in the grammar is E → id. So, id is replaced with the non terminal E, and the given input string can be written as
4. $ <• * <• E •> $
The parser does not consider non terminals as input, so they are not considered in the input string. So, the string becomes
5. $ <• * •> $
The next handle is E*E and the match in the grammar is E → E*E. So, E*E is replaced with the non terminal E, and the given input string can be written as
6. $ E $
The parser does not consider non terminals as input, so they are not considered in the input string. So, the string becomes
7. $ $
$ on $ means parsing is successful.
Operator Precedence Parsing Algorithm:
The operator precedence parsing program determines the action of the parser depending on
1. 'a', the topmost terminal symbol on the stack
2. 'b', the current input symbol
There are 3 conditions for 'a' and 'b' that are important for the parsing program:
1. a = b = $: the parsing is successful.
2. a <• b or a =• b: the parser shifts the input symbol onto the stack and advances the input pointer to the next input symbol.
3. a •> b: the parser performs the reduce action. The parser pops out elements one by one from the stack until the terminal now on top of the stack has lower precedence than the most recently popped out terminal.
Example: the sequence of actions taken by the parser using the stack for the input string "id * id" and the corresponding parse tree are as under.

STACK    INPUT      OPERATIONS
$        id * id $  $ <• id, shift 'id' onto the stack
$id      * id $     id •> *, reduce 'id' using E → id
$E       * id $     $ <• *, shift '*' onto the stack
$E*      id $       * <• id, shift 'id' onto the stack
$E*id    $          id •> $, reduce 'id' using E → id
$E*E     $          * •> $, reduce '*' using E → E*E
$E       $          $ = $ = $, so parsing is successful
E

E * E

id id
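The algorithm can be sketched compactly in code. The fragment below is illustrative (encodings are my own, and only the +, *, id, $ slice of the relation table is included); it returns the handles in the order they are reduced, matching the stack trace above.

```python
# slice of the precedence table: '<' stands for <. , '>' for .>
prec = {
    ("+", "+"): ">", ("+", "*"): "<", ("+", "id"): "<", ("+", "$"): ">",
    ("*", "+"): ">", ("*", "*"): ">", ("*", "id"): "<", ("*", "$"): ">",
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("$", "+"): "<", ("$", "*"): "<", ("$", "id"): "<",
}

def top_terminal(stack):
    """Index of the topmost terminal (non terminals are stored as 'E')."""
    for i in range(len(stack) - 1, -1, -1):
        if stack[i] != "E":
            return i

def op_parse(tokens):
    """Shift on <. or =. , reduce on .> ; returns the handles reduced."""
    stack, rest, handles = ["$"], list(tokens) + ["$"], []
    while True:
        i = top_terminal(stack)
        a, b = stack[i], rest[0]
        if a == b == "$":
            return handles                   # $ on $ : parsing is successful
        rel = prec.get((a, b))
        if rel is None:
            raise SyntaxError(f"no relation between {a!r} and {b!r}")
        if rel in ("<", "="):
            stack.append(rest.pop(0))        # shift the input terminal
        else:                                # a .> b : reduce
            j = i                            # walk left to the nearest <.
            while prec[(stack[top_terminal(stack[:j])], stack[j])] != "<":
                j = top_terminal(stack[:j])
            k = top_terminal(stack[:j])
            handles.append(stack[k + 1:])    # the handle lies right of the <.
            stack[k + 1:] = ["E"]
```

On `["id", "*", "id"]` the handles come out as id, id, E*E, the same reductions performed in the table above.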
Advantages and Disadvantages of Operator Precedence Parsing:
The following are the advantages of operator precedence parsing:
1. It is a simple and easy to implement parsing technique.
2. The operator precedence parser can be constructed by hand after understanding the grammar. It is simple to debug.
The following are the disadvantages of operator precedence parsing:
1. It is difficult to handle operators like '-' which can be either unary or binary and hence have different precedences and associativities.
2. It can parse only a small class of grammars.
3. Addition or deletion of rules requires the parser to be rewritten.
4. There are too many error entries in the parsing tables.

LR Parsing:
The most prevalent type of bottom-up parsing is LR(k) parsing, where L stands for a left to right scan of the given input string, R for a rightmost derivation in reverse, and k for the number of input symbols used as lookahead.

 It is the most general non backtracking shift-reduce parsing method.

 The class of grammars that can be parsed using the LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.

 An LR parser can detect a syntactic error as soon as it is possible to do so on a left to right scan of the input.

Figure 3.3: Components of LR parsing (input buffer a1 a2 a3 … $, stack, LR parsing table with ACTION [shift] and GOTO parts, and the LR parsing algorithm producing the output)


An LR parser consists of:
 An input buffer that contains the string to be parsed followed by a $ symbol, used to indicate the end of the input.
 A stack containing a sequence of grammar symbols with a $ at the bottom of the stack, which initially contains the initial state of the parsing table on top of $.
 A parsing table (M); it is a two-dimensional array M[state, terminal or non terminal] and it contains two parts:


1. ACTION Part
The ACTION part of the table is a two-dimensional array indexed by state and input symbol, i.e. ACTION[state][input]. An ACTION entry can hold one of the following four kinds of values:
1. Shift X, where X is a state number.
2. Reduce X, where X is a production number.
3. Accept, signifying the completion of a successful parse.
4. Error.
2. GOTO Part
The GOTO part of the table is a two-dimensional array indexed by state and nonterminal, i.e. GOTO[state][NonTerminal]. A GOTO entry holds a state number.
 The parsing algorithm uses the current state X and the next input symbol 'a' to consult the entry at action[X][a]. It performs one of the following four actions:
1. If action[X][a] = shift Y, the parser shifts Y onto the top of the stack and advances the input pointer.
2. If action[X][a] = reduce Y (Y is the number of the production reduced in state X), and the production is Y → β, then the parser pops 2*|β| symbols from the stack and pushes Y onto the stack, followed by the state given by the GOTO entry.
3. If action[X][a] = accept, parsing is successful and the input string is accepted.
4. If action[X][a] = error, the parser has discovered an error and calls the error routine.
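The four actions above can be put together in a small table-driven driver. The following Python sketch assumes a hypothetical table layout (ACTION entries as tuples such as ('shift', j), GOTO entries as plain state numbers); it keeps only states on the stack, so a reduce by Y → β pops |β| states rather than the 2*|β| symbols of the symbol-and-state formulation:

```python
def lr_parse(table, productions, tokens):
    """Table-driven LR driver (sketch). table[state][terminal] is
    ('shift', j), ('reduce', p) or ('accept',); table[state][nonterminal]
    is the GOTO state. productions[p] = (lhs, length_of_rhs)."""
    stack = [0]                          # state stack; state 0 sits on $
    tokens = list(tokens) + ['$']
    pos = 0
    while True:
        state, a = stack[-1], tokens[pos]
        action = table.get(state, {}).get(a)
        if action is None:               # empty entry: syntax error
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        if action[0] == 'shift':         # push the next state, advance input
            stack.append(action[1])
            pos += 1
        elif action[0] == 'reduce':      # pop |rhs| states, then GOTO on lhs
            lhs, n = productions[action[1]]
            del stack[len(stack) - n:]
            stack.append(table[stack[-1]][lhs])
        else:                            # ('accept',)
            return True
```

Fed the SLR table for the grammar S → aB, B → bB | b constructed later in this section, this driver accepts strings such as ab and abb.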
LR parsing is classified into:
1. LR (0)
2. Simple LR (1)
3. Canonical LR (1)
4. Lookahead LR (1)
LR (0) Parsing: Various steps involved in LR(0) parsing:

1. Write the context-free grammar for the given input string.
2. Check for ambiguity.
3. Add the augment production.
4. Create the canonical collection of LR(0) items.
5. Draw the DFA.
6. Construct the LR(0) parsing table.
7. Based on the information from the table, with the help of the stack and the parsing algorithm, generate the output.

Augment Grammar
The augment grammar G' is G with a new start symbol S' and an additional production S' → S. This helps the parser identify when to stop parsing and announce acceptance of the input: the input string is accepted if and only if the parser is about to reduce by S' → S. For example, consider the grammar below:

E → E+T | T
T → T*F
F → (E) | id

The augment grammar G' is represented by:

E' → E
E → E+T | T
T → T*F
F → (E) | id

NOTE: augmenting a grammar simply adds one extra production while preserving the meaning of the given grammar G.
Canonical collection of LR (0) items

LR (0) items
An LR(0) item of a grammar G is a production of G with a dot at some position in the right-hand side. An item indicates how much of the input has been scanned up to a given point in the process of parsing. For example, if the production is X → YZ, then the LR(0) items are:
1. X → •YZ, indicating that the parser expects a string derivable from YZ.
2. X → Y•Z, indicating that the parser has scanned a string derivable from Y and expects a string derivable from Z.
3. X → YZ•, indicating that the parser has scanned a string derivable from YZ.
If the production is X → ε, the LR(0) item is
X → •, indicating that the production is ready to be reduced.
Canonical collection of LR(0) items:
This is the process of grouping the LR(0) items together based on the closure and goto operations.

Closure operation
If I is an initial state, then Closure(I) is constructed as follows:
1. Initially, add the augment production to the state and check for the • symbol in the right-hand side: if the • is followed by a nonterminal, add the productions starting with that nonterminal to state I.
2. If a production X → α•Aβ is in I, then add the productions starting with A to state I. Rule 2 is applied until no more productions can be added to state I (that is, until every • is followed by a terminal symbol or is at the end of a production).
Example:
Grammar:                LR(0) items:
0. E' → E               E' → •E
1. E → E+T              E → •E+T
2. T → F                T → •F
3. T → T*F              T → •T*F
4. F → (E)              F → •(E)
5. F → id               F → •id

Closure (I0) state:
Add E' → •E to the I0 state.
Since the '•' in the right-hand side is followed by the nonterminal E, add the productions starting with E to the I0 state. So the state becomes:
E' → •E
E → •E+T
T → •F
These productions satisfy rule 2, so add the productions starting with E and T to I0.
Note: once a production has been added to a state, the same production should not be added a second time to the same state. So the state becomes:
0. E' → •E
1. E → •E+T
2. T → •F
3. T → •T*F
4. F → •(E)
5. F → •id
GOTO Operation
Goto(I0, X), where I0 is a set of items and X is the grammar symbol over which we move the '•'. It is like finding the next state of an NFA for a given state I0 and input symbol X. For example, for the items E' → •E and E → •E+T:

Goto(I0, E) = {E' → E•, E → E•+T}

Note: once we complete the goto operation, we need to compute the closure operation on the resulting items.
Goto(I0, E) = Closure({E' → E•, E → E•+T})

I0: E' → •E, E → •E+T, T → •T*F   --E-->   I1: E' → E•, E → E•+T
Construction of the LR (0) parsing table:

Once we have created the canonical collection of LR(0) items, we need to follow the steps mentioned below:

If there is a transition from one state (Ii) to another state (Ij) on a terminal 'a', then we should write the shift entry Sj in the ACTION part: ACTION[Ii][a] = Sj.

If there is a transition from one state (Ii) to another state (Ij) on a nonterminal A, then we should write the state number j in the GOTO part: GOTO[Ii][A] = j.

If there is a state (Ii) containing a production with the dot at the end (A → αβ•), i.e. a production with no further transitions, then the production is said to be a reduced production. Write the reduce entry, along with the production number, in every ACTION column of that row. If the augment production is being reduced, write accept in the ACTION part.
For example, construct the LR(0) parsing table for the given grammar (G):
S → aB
B → bB | b
Sol: 1. Add the augment production and insert the '•' symbol at the first position of every production in G:
0. S' → •S
1. S → •aB
2. B → •bB
3. B → •b
I0 State:
1. Add the augment production to the I0 state and compute the closure:

I0 = Closure(S' → •S)
Since '•' is followed by the nonterminal S, add all productions starting with S to the I0 state. So the I0 state becomes:
I0 = S' → •S
     S → •aB
Here, in the S production the '•' is followed by a terminal, so close the state.

I1 = Goto(I0, S) = Closure(S' → S•) = S' → S•
Here, the production is reduced, so close the state.

I1 = S' → S•

I2 = Goto(I0, a) = Closure(S → a•B)

Here, the '•' is followed by the nonterminal B, so add the productions starting with B:
B → •bB
B → •b
Here, the '•' in the B productions is followed by a terminal, so close the state.

I2 = S → a•B
     B → •bB
     B → •b
I3 = Goto(I2, B) = Closure(S → aB•) = S → aB•

I4 = Goto(I2, b) = Closure({B → b•B, B → b•})

Add the productions starting with B to I4:
B → •bB
B → •b
The dot symbol is followed by a terminal, so close the state.

I4 = B → b•B
     B → •bB
     B → •b
     B → b•

I5 = Goto(I4, B) = Closure(B → bB•) = B → bB•

Goto(I4, b) = I4

Drawing the finite state diagram (DFA): the following DFA gives the state transitions of the parser and is useful in constructing the LR parsing table.
[DFA: I0 (S' → •S, S → •aB) --S--> I1 (S' → S•); I0 --a--> I2 (S → a•B, B → •bB, B → •b); I2 --B--> I3 (S → aB•); I2 --b--> I4 (B → b•B, B → b•, B → •bB, B → •b); I4 --B--> I5 (B → bB•); I4 --b--> I4]
LR (0) Parsing Table:

States |       ACTION        | GOTO
       | a     b       $     | S   B
I0     | S2                  | 1
I1     |               ACC   |
I2     |       S4            |     3
I3     | R1    R1      R1    |
I4     | R3    S4/R3   R3    |     5
I5     | R2    R2      R2    |

Note: if there are multiple entries in the LR(0) parsing table, the grammar is not accepted by the LR(0) parser. In the table above, the I4 row has two entries for the single terminal value 'b'; this is called a shift-reduce conflict.
Shift-Reduce Conflict in LR (0) Parsing: a shift-reduce conflict in LR(0) parsing occurs when a state has both
1. a reduced item of the form A → α•, and
2. an incomplete item of the form B → β•aγ,
so that the ACTION entry for state Ii on terminal 'a' holds both a shift and a reduce (Sj/r2).
Reduce-Reduce Conflict in LR (0) Parsing:

A reduce-reduce conflict in LR(0) parsing occurs when a state has two or more reduced items of the form
1. A → α•
2. B → β•
so that the ACTION entries of state Ii hold r1/r2 in every terminal column.
SLR PARSER CONSTRUCTION: What is SLR (1) Parsing?

Various steps involved in SLR(1) parsing are:

1. Write the context-free grammar for the given input string.
2. Check for ambiguity.
3. Add the augment production.
4. Create the canonical collection of LR(0) items.
5. Draw the DFA.
6. Construct the SLR(1) parsing table.
7. Based on the information from the table, with the help of the stack and the parsing algorithm, generate the output.

SLR (1) Parsing Table Construction

Once we have created the canonical collection of LR(0) items, we need to follow the steps mentioned below:

If there is a transition from one state (Ii) to another state (Ij) on a terminal 'a', then we should write the shift entry in the ACTION part: ACTION[Ii][a] = Sj.

If there is a transition from one state (Ii) to another state (Ij) on a nonterminal A, then we should write the state number j in the GOTO part: GOTO[Ii][A] = j.
If there is a state (Ii) containing a production (A → αβ•) with no further transitions, the production is said to be a reduced production. For every terminal X in FOLLOW(A) only, write the reduce entry, along with the production number, in ACTION[Ii][X]. If the augment production is being reduced, write accept.

For example, given
1. S → aAb
2. A → αβ•
with Follow(S) = {$} and Follow(A) = {b}, the state Ii containing A → αβ• gets the entry r2 only in the 'b' column of the ACTION part.
SLR (1) table for the grammar:

S → aB
B → bB | b

Follow (S) = {$}, Follow (B) = {$}

States |      ACTION       | GOTO
       | a    b     $      | S   B
I0     | S2                | 1
I1     |            ACCEPT |
I2     |      S4           |     3
I3     |            R1     |
I4     |      S4    R3     |     5
I5     |            R2     |

Note: when multiple entries occur in the SLR table, the grammar is not accepted by the SLR(1) parser. (Here there are none, so this grammar is SLR(1).)
Conflicts in SLR (1) Parsing:
When multiple entries occur in the table, the situation is said to be a conflict.

Shift-Reduce Conflict in SLR (1) Parsing: a shift-reduce conflict in SLR(1) parsing occurs when a state has both
1. a reduced item of the form A → α•, where FOLLOW(A) includes the terminal 'a', and
2. an incomplete item of the form B → β•aγ,
so that ACTION[Ii][a] holds both a shift and a reduce (Sj/r2).
Reduce-Reduce Conflict in SLR (1) Parsing

A reduce-reduce conflict in SLR(1) parsing occurs when a state has two or more reduced items of the form
1. A → α•
2. B → β•
with FOLLOW(A) ∩ FOLLOW(B) ≠ ∅. For example, if the grammar is
S → AaBa
A → α
B → β
then Follow(S) = {$}, Follow(A) = {a} and Follow(B) = {a}, so a state containing both A → α• and B → β• gets the entry r1/r2 in the 'a' column of the ACTION part.
Canonical LR (1) Parsing: various steps involved in CLR(1) parsing:
1. Write the context-free grammar for the given input string.
2. Check for ambiguity.
3. Add the augment production.
4. Create the canonical collection of LR(1) items.
5. Draw the DFA.
6. Construct the CLR(1) parsing table.
7. Based on the information from the table, with the help of the stack and the parsing algorithm, generate the output.
LR (1) items:
An LR(1) item is defined by a production, the position of the dot, and a terminal symbol. The terminal is called the lookahead symbol.
The general form of an LR(1) item is [S → α•Aβ, $]; closing it adds items of the form [A → •γ, FIRST(β$)].

Rules to create the canonical collection:

1. Every element of I is added to Closure(I).
2. If an LR(1) item [X → α•Bβ, a] exists in I and there is a production B → γ, then add the item [B → •γ, z] for every terminal z in FIRST(βa), if it is not already in Closure(I). Keep applying this rule until no more elements can be added.
For example, if the grammar is
S → CC
C → cC
C → d
the canonical collection of LR(1) items can be created as follows:

0. S' → •S (augment production)
1. S → •CC
2. C → •cC
3. C → •d

I0 State: add the augment production and compute the closure; the lookahead symbol for the augment production is $.

I0 = Closure(S' → •S, $)

The dot is followed by the nonterminal S, so add the productions starting with S to the I0 state:

S → •CC, FIRST($)   (using rule 2)
S → •CC, $

The dot is followed by the nonterminal C, so add the productions starting with C to the I0 state:

C → •cC, FIRST(C$)
C → •d, FIRST(C$)

FIRST(C) = {c, d}, so the items are:

C → •cC, c/d
C → •d, c/d

The dot is now followed by a terminal in every new item, so close the I0 state. The productions in I0 are:

S' → •S, $
S → •CC, $
C → •cC, c/d
C → •d, c/d
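The lookahead bookkeeping in the I0 computation above can be reproduced mechanically. A sketch, assuming no ε-productions, so that FIRST(βa) is either FIRST of the first symbol of β or, when β is empty, the inherited lookahead a; the item encoding and helper name are illustrative:

```python
def first_sym(sym, grammar, seen=frozenset()):
    """FIRST of one symbol, assuming no epsilon productions."""
    if sym not in grammar:               # a terminal is its own FIRST set
        return {sym}
    out = set()
    for rhs in grammar[sym]:
        if rhs[0] not in seen:           # avoid looping on left recursion
            out |= first_sym(rhs[0], grammar, seen | {sym})
    return out


def closure1(items, grammar):
    """Closure of LR(1) items; an item is (lhs, rhs, dot, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            beta = rhs[dot + 1:]
            # FIRST(beta la): first symbol of beta, or la when beta is empty
            lookaheads = first_sym(beta[0], grammar) if beta else {la}
            for alt in grammar[rhs[dot]]:
                for b in lookaheads:
                    item = (rhs[dot], tuple(alt), 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)
```

Applied to the augmented S → CC, C → cC | d grammar, closure1 of {[S' → •S, $]} yields exactly the six items of I0 listed above, with lookaheads c/d on the C productions.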
I1 = Goto(I0, S) = S' → S•, $

I2 = Goto(I0, C) = Closure(S → C•C, $)
C → •cC, $
C → •d, $
So the I2 state is:
S → C•C, $
C → •cC, $
C → •d, $

I3 = Goto(I0, c) = Closure(C → c•C, c/d)
C → •cC, c/d
C → •d, c/d
So the I3 state is:
C → c•C, c/d
C → •cC, c/d
C → •d, c/d

I4 = Goto(I0, d) = Closure(C → d•, c/d) = C → d•, c/d

I5 = Goto(I2, C) = Closure(S → CC•, $) = S → CC•, $

I6 = Goto(I2, c) = Closure(C → c•C, $)
C → •cC, $
C → •d, $
So the I6 state is:
C → c•C, $
C → •cC, $
C → •d, $

I7 = Goto(I2, d) = Closure(C → d•, $) = C → d•, $

Goto(I3, c) = Closure(C → c•C, c/d) = I3

I8 = Goto(I3, C) = Closure(C → cC•, c/d) = C → cC•, c/d

Goto(I3, d) = Closure(C → d•, c/d) = I4

I9 = Goto(I6, C) = Closure(C → cC•, $) = C → cC•, $

Goto(I6, c) = Closure(C → c•C, $) = I6

Goto(I6, d) = Closure(C → d•, $) = I7
Drawing the finite state machine (DFA) for the above LR(1) items:

[DFA: I0 --S--> I1 (S' → S•, $); I0 --C--> I2; I0 --c--> I3; I0 --d--> I4 (C → d•, c/d); I2 --C--> I5 (S → CC•, $); I2 --c--> I6; I2 --d--> I7 (C → d•, $); I3 --C--> I8 (C → cC•, c/d); I3 --c--> I3; I3 --d--> I4; I6 --C--> I9 (C → cC•, $); I6 --c--> I6; I6 --d--> I7]
Construction of the CLR (1) Table

Rule 1: if there is an item [A → α•Xβ, b] in Ii and Goto(Ii, X) = Ij, then ACTION[Ii][X] = Shift j, where X is a terminal.
Rule 2: if there is an item [A → α•, b] in Ii (A ≠ S'), set ACTION[Ii][b] = reduce, along with the production number.
Rule 3: if there is an item [S' → S•, $] in Ii, then set ACTION[Ii][$] = Accept.
Rule 4: if there is an item [A → α•Xβ, b] in Ii and Goto(Ii, X) = Ij, then GOTO[Ii][X] = j, where X is a nonterminal.

States |      ACTION       | GOTO
       | c    d     $      | S   C
I0     | S3   S4           | 1   2
I1     |            ACCEPT |
I2     | S6   S7           |     5
I3     | S3   S4           |     8
I4     | R3   R3           |
I5     |            R1     |
I6     | S6   S7           |     9
I7     |            R3     |
I8     | R2   R2           |
I9     |            R2     |
Table: CLR (1) Table
LALR (1) Parsing

The CLR parser avoids the conflicts in the parse table, but it produces more states than the SLR parser, so its table occupies more space in memory. LALR parsing can be used instead: the tables obtained are smaller than the CLR parse table, yet the parser is almost as efficient as the CLR parser. Here, LR(1) items that have the same productions but different lookaheads are combined to form a single set of items.
For example, consider the grammar in the previous example, and the states I4 and I7 as given below:
I4 = Goto(I0, d) = Closure(C → d•, c/d) = C → d•, c/d
I7 = Goto(I2, d) = Closure(C → d•, $) = C → d•, $
These states differ only in their lookaheads; they have the same productions. Hence they are combined to form a single state called I47.

Similarly, the states I3 and I6 differ only in their lookaheads, as given below:
I3 = Goto(I0, c) =
C → c•C, c/d
C → •cC, c/d
C → •d, c/d

I6 = Goto(I2, c) =
C → c•C, $
C → •cC, $
C → •d, $

These states differ only in their lookaheads; they have the same productions. Hence they are combined to form a single state called I36.
Similarly, the states I8 and I9 differ only in their lookaheads, and are combined to form the state I89.

States |      ACTION       | GOTO
       | c    d     $      | S   C
I0     | S36  S47          | 1   2
I1     |            ACCEPT |
I2     | S36  S47          |     5
I36    | S36  S47          |     89
I47    | R3   R3    R3     |
I5     |            R1     |
I89    | R2   R2    R2     |

Table: LALR Table
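The I47, I36 and I89 merges above amount to grouping LR(1) states by their core, i.e. their items with the lookaheads stripped, and unioning the lookaheads within each group. An illustrative sketch (the state naming and item encoding are assumptions, not from the notes):

```python
def lalr_merge(states):
    """Merge LR(1) states whose items differ only in lookaheads.
    states: name -> set of (lhs, rhs, dot, lookahead) items."""
    groups = {}
    for name, items in states.items():
        # The core ignores the lookahead component of every item.
        core = frozenset((lhs, rhs, dot) for (lhs, rhs, dot, la) in items)
        groups.setdefault(core, []).append(name)
    merged = {}
    for names in groups.values():
        label = ''.join(sorted(names))          # e.g. '4' and '7' -> '47'
        merged[label] = set().union(*(states[n] for n in names))
    return merged
```

Feeding it I4 = {C → d•, c/d} and I7 = {C → d•, $} produces the single state I47 = {C → d•, c/d/$} used in the LALR table above.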
Conflicts in CLR (1) Parsing: when multiple entries occur in the table, the situation is said to be a conflict.

Shift-Reduce Conflict in CLR (1) Parsing

A shift-reduce conflict in CLR(1) parsing occurs when a state has both
1. a reduced item of the form A → α•, a, and
2. an incomplete item of the form B → β•aγ, $,
so that ACTION[Ii][a] holds both a shift and a reduce (Sj/r2).

Reduce-Reduce Conflict in CLR (1) Parsing

A reduce-reduce conflict in CLR(1) parsing occurs when a state has two or more reduced items that reduce on the same lookahead symbol:
1. A → α•, a
2. B → β•, a
so that ACTION[Ii][a] holds r1/r2.
String Acceptance using LR Parsing:
Consider the above example with the input string cdd.

States |      ACTION       | GOTO
       | c    d     $      | S   C
I0     | S3   S4           | 1   2
I1     |            ACCEPT |
I2     | S6   S7           |     5
I3     | S3   S4           |     8
I4     | R3   R3           |
I5     |            R1     |
I6     | S6   S7           |     9
I7     |            R3     |
I8     | R2   R2           |
I9     |            R2     |

0. S' → S (augment production)
1. S → CC
2. C → cC
3. C → d
STACK      INPUT   ACTION
$0         cdd$    Shift S3
$0c3       dd$     Shift S4
$0c3d4     d$      Reduce by R3 (C → d): pop 2*|β| symbols; Goto(I3, C) = 8
$0c3C8     d$      Reduce by R2 (C → cC): pop 2*|β| symbols; Goto(I0, C) = 2
$0C2       d$      Shift S7
$0C2d7     $       Reduce by R3 (C → d): pop 2*|β| symbols; Goto(I2, C) = 5
$0C2C5     $       Reduce by R1 (S → CC): pop 2*|β| symbols; Goto(I0, S) = 1
$0S1       $       Accept
Handling Ambiguous Grammars

Ambiguity: a grammar can have more than one parse tree for a string. For example, consider the grammar:

string → string + string
       | string - string
       | 0 | 1 | … | 9

The string 9-5+2 has two parse trees.

A grammar is said to be ambiguous if there is some string that it can generate in more than one way (i.e., the string has more than one parse tree, or more than one leftmost derivation). A language is inherently ambiguous if it can only be generated by ambiguous grammars.

The two parse trees for 9-5+2 correspond to the two ways of parenthesizing the expression: (9-5)+2 and 9-(5+2). The second parenthesization gives the expression the value 2 instead of 6.
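The two interpretations really do disagree, which is easy to check directly:

```python
# The two parenthesizations of 9-5+2 from the two parse trees:
left_first = (9 - 5) + 2    # the conventional, left-associative reading
right_first = 9 - (5 + 2)   # the other parse tree's reading

assert left_first == 6
assert right_first == 2
```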
 Ambiguity is problematic because the meaning of programs can be incorrect.
 Ambiguity can be handled in several ways:
- Enforce associativity and precedence
- Rewrite the grammar (the cleanest way)

There is no general technique for handling ambiguity: it is impossible to automatically convert an arbitrary ambiguous grammar to an unambiguous one.

Ambiguity is harmful to the intent of the program: the input might be interpreted in a way that was not the intention of the programmer, as shown above in the 9-5+2 example. Although there is no general technique to handle ambiguity, i.e., no feature that automatically identifies and removes ambiguity from any grammar, it can be removed, broadly speaking, in the following ways:

1) Rewriting the whole grammar unambiguously.
2) Implementing precedence and associativity rules in the grammar.
If an operand has operators on both sides, the side whose operator takes the operand determines the associativity of that operator.

. In a+b+c, b is taken by the left +
. +, -, *, / are left associative
. ^, = are right associative

Grammar to generate strings with right-associative operators:
right → letter = right | letter
letter → a | b | … | z
A binary operation * on a set S that does not satisfy the associative law is called non-associative. A left-associative operation is a non-associative operation that is conventionally evaluated from left to right, i.e., the operand is taken by the operator on its left side.
For example,
6*5*4 = (6*5)*4 and not 6*(5*4)
6/5/4 = (6/5)/4 and not 6/(5/4)
A right-associative operation is a non-associative operation that is conventionally evaluated from right to left, i.e., the operand is taken by the operator on its right side.

For example,

6^5^4 => 6^(5^4) and not (6^5)^4
x=y=z=5 => x=(y=(z=5))
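These conventions can be observed in any language whose grammar encodes them, for instance in Python (which writes exponentiation as ** rather than ^):

```python
# Division is left associative: 6/5/4 groups as (6/5)/4.
assert 6 / 5 / 4 == (6 / 5) / 4
assert 6 / 5 / 4 != 6 / (5 / 4)

# Exponentiation is right associative: 2**3**2 groups as 2**(3**2) = 512.
assert 2 ** 3 ** 2 == 2 ** (3 ** 2) == 512
assert (2 ** 3) ** 2 == 64
```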
Following is the grammar to generate strings with left-associative operators. (Note that this grammar is left recursive and would send a top-down parser into an infinite loop; that problem is handled by making it right recursive.)

left → left + letter | letter
letter → a | b | … | z
IMPORTANT QUESTIONS
1. Discuss the working of bottom-up parsing, and specifically operator precedence parsing, with an example.
2. What is an LR parser? Explain the LR(1) parsing technique.
3. Write the differences between the canonical collections of LR(0) items and LR(1) items.
4. Write the differences between CLR(1) and LALR(1) parsing.
5. What is YACC? Explain how it is used in constructing a parser.
ASSIGNMENT QUESTIONS

1. Explain the conflicts in shift-reduce parsing with an example.

2. E → E+T | T
   T → T*F
   F → (E) | id. Construct the LR(1) parsing table and explain the conflicts.

3. E → E+T | T
   T → T*F
   F → (E) | id. Construct the SLR(1) parsing table and explain the conflicts.

4. E → E+T | T
   T → T*F
   F → (E) | id. Construct the CLR(1) parsing table and explain the conflicts.

5. E → E+T | T
   T → T*F
   F → (E) | id. Construct the LALR(1) parsing table and explain the conflicts.