Unit-I Introduction To Compilers: CS6660-Compiler Design Department of CSE &IT 2016-2017
St. Josephs College of Engineering & St. Josephs Institute of Technology Page 1 of 95
generators, Data flow engines.
12 Difference between phase and pass.
Phase: Compilation is carried out as a series of sub-processes; each sub-process is called a
phase.
Pass: One complete traversal of the source program (or of an intermediate representation);
a pass may combine several phases.
13 What is Linear analysis?
The stream of characters making up the source program is read from left-to-right and
grouped into tokens that are sequence of characters having a collective meaning.
14 What is Cross compiler? (May 2014)
There may be a compiler which runs on one machine and produces the target code for
another machine. Such a compiler is called a cross compiler. Thus, by using the
cross-compiler technique, platform independence can be achieved.
15 What is Symbol Table or Write the purpose of symbol table. (May 2014)
Symbol table is a data structure containing a record for each identifier, with fields for the
attributes of the identifier. The data structure allows us to find the record for each identifier
quickly and to store or retrieve data from that record quickly.
16 Define Passes.
In an implementation of a compiler, portions of one or more phases are combined into a
module called pass. A pass reads the source program or the output of the previous pass,
makes the transformation specified by its phases and writes output into an intermediate
file, which is read by subsequent pass.
17 State some software tools that manipulate source programs. (May 2006)
Structure editors, Pretty printers, Static checkers, Interpreters
18 Difference Between Assembler, Compiler and Interpreter.
Assembler:
It is a computer program which takes assembly instructions and converts
them into the bits that the computer can understand and perform as
operations.
Compiler:
It converts a high-level language to a low-level language.
It considers the entire code, converts it into executable code, and runs the
code.
C and C++ are compiler-based languages.
Interpreter:
It converts a high-level language to a low-level or assembly-level
language, i.e. the language of 0's and 1's.
It considers a single line of code, converts it to the binary language, and runs the
code on the machine.
If it finds an error, the program needs to be run from the beginning.
BASIC is an interpreter-based language.
19 Define compiler-compiler.
Systems that help with the compiler-writing process are often referred to as compiler-
compilers, compiler-generators or translator-writing systems.
Largely they are oriented around a particular model of languages, and they are suitable for
generating compilers for languages of a similar model.
20 What is Front-end and Back-end of the compiler?
Often the phases of a compiler are collected into a front end and a back end.
1. The front end consists of those phases that depend primarily on the source language
and are largely independent of the target machine.
2. The back end consists of those phases that depend on the target machine language and
generally do not depend on the source language, just the intermediate language. In the back
end we find aspects of code optimization and code generation, along with error handling
and symbol-table operations.
21 Draw the diagram of the language processing system. (Nov 2006)
Illustrate diagrammatically how a language is processed. (May 2016)
22 What are the two parts of a compilation? Explain briefly. (May 2016)
There are two parts to compilation:
1. Analysis determines the operations implied by the source program which are
recorded in a tree structure.
2. Synthesis takes the tree structure and translates the operations therein into the
target program.
PART-B
1 What is a compiler? State various phases of a compiler and explain them in detail.(16)
What is a compiler? State various phases of a compiler and explain them in detail. (16)
COMPILER: A compiler is a program that reads a program written in one language (the
source language) and translates it into an equivalent program in another language (the
target language).
In this translation process, the compiler reports to its user the presence of errors in the
source program.
The classifications of compilers: single-pass compiler, multi-pass compiler, load-and-go
compiler, debugging compiler, optimizing compiler.
INTERPRETER: Interpreter is a language processor program that translates and executes
source code directly, without compiling it to machine code.
Each phase transforms the source program from one representation into another
representation.
They communicate with error handlers.
They communicate with the symbol table.
Lexical Analyzer
1. Lexical Analyzer reads the source program character by character and returns the
tokens of the source program.
2. A token describes a pattern of characters having the same meaning in the source
program (such as identifiers, operators, keywords, numbers, delimiters and so on).
3. It puts information about identifiers into the symbol table.
Regular expressions are used to describe tokens (lexical constructs).
A (Deterministic) Finite State Automaton can be used in the implementation of a lexical
analyzer.
Syntax Analyzer
1. A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the
given program.
2. A syntax analyzer is also called a parser.
3. A parse tree describes a syntactic structure.
Example:
In a parse tree, all terminals are at leaves.
All inner nodes are non-terminals in a context free grammar.
Semantic Analyzer
1. It checks the source program for semantic errors and collects the type information
for the subsequent code-generation phase.
2. It uses the hierarchical structure determined by the syntax-analysis phase to identify
the operators and operands of expressions and statements.
3. An important component of semantic analysis is type checking.
4. Normally, semantic information cannot be represented by a context-free language
used in syntax analyzers.
5. Context-free grammars used in syntax analysis are therefore augmented with attributes
(semantic rules); the result is a syntax-directed translation (an attribute grammar).
Intermediate Code Generation
1. A compiler may produce an explicit intermediate code representing the source
program.
2. These intermediate codes are generally machine-independent (architecture-independent),
but the level of intermediate codes is close to the level of machine codes.
3. An intermediate form called "three-address code" is like the assembly language
for a machine in which every memory location can act like a register.
4. Three-address code consists of a sequence of instructions, each of which has at most
three operands.
Code Optimizer
1. The code optimizer optimizes the code produced by the intermediate code generator
in the terms of time and space.
2. The code optimization phase attempts to improve the intermediate code, so that
faster running machine code will result.
Optimization may involve:
Detection and removal of dead (unreachable) code
Calculation of constant expressions and terms
Collapsing of repeated expressions into temporary storage
Loop controlling
Moving code outside of loops
Removal of unnecessary temporary variables.
Code Generator
Produces the target language in a specific architecture.
The target program is normally a re-locatable object file containing the machine codes.
This phase involves:
Allocation of registers and memory
Generation of correct references
Generation of correct types
Generation of machine code.
Symbol table
1. An essential function of a compiler is to record the identifiers used in the source
program and collect information about various attributes of each identifier.
2. A symbol table is a data structure containing a record for each identifier, with fields for
the attributes of the identifier.
3. The data structure allows us to find the record for each identifier quickly and to store or
retrieve data from that record quickly.
4. When an identifier in the source program is detected by the lexical analyzer, the
identifier is entered into the symbol table.
5. The attributes of an identifier cannot normally be determined during lexical analysis.
For example, in a Pascal declaration like var position, initial, rate : real ;
the type real is not known when position, initial, and rate are seen by the lexical
analyzer.
6. The remaining phases enter information about identifiers into the symbol table and then
use this information in various ways.
Error Detection and Reporting
1. Each compiler phase can encounter errors. However, after detecting an error, a
phase must somehow deal with that error, so that compilation can proceed, allowing
further errors in the source program to be detected.
2. The lexical phase can detect errors where the characters remaining in the input do
not form any token of the language.
3. Errors where the token stream violates the structure rules (syntax) of the language
are determined by the syntax analysis phase.
4. During semantic analysis, the compiler tries to detect constructs that have the right
syntactic structure but no meaning to the operation involved.
2 Explain the various phases of a compiler in detail. Also write down the output for the
following expression after each phase: a := b*c - d. (16)
3 Write in detail about the cousins of the compiler. (16) (May 2013)
The cousins of a compiler are
1. Preprocessor
2. Assembler
3. Loader and Link-editor
PREPROCESSOR
A preprocessor is a program that processes its input data to produce output that is used as
input to another program. The output is said to be a preprocessed form of the input data,
which is often used by some subsequent programs like compilers. The preprocessor is
executed before the actual compilation of code begins, therefore the preprocessor digests all
these directives before any code is generated by the statements.
They may perform the following functions
1. Macro processing
2. File Inclusion
3. Rational Preprocessors
4. Language extension
1. Macro processing:
A macro is a rule or pattern that specifies how a certain input sequence (often a
sequence of characters) should be mapped to an output sequence (also often a
sequence of characters) according to a defined procedure.
The mapping process that instantiates (transforms) a macro into a specific output
sequence is known as macro expansion. Macro definitions use #define (and #undef).
FORMAT:
#define identifier replacement
Example (assuming a TABLE_SIZE macro for the constant 100):
#define TABLE_SIZE 100
int table2[TABLE_SIZE];
2.File Inclusion:
Preprocessor includes header files into the program text. When the preprocessor
finds an #include directive it replaces it by the entire content of the specified file.
There are two ways to specify a file to be included:
#include "file"
#include <file>
3. Rational Preprocessors:
These processors augment older languages with more modern flow of control and data
structuring facilities.
For example, such a preprocessor might provide the user with built-in macros for constructs
like while-statements or if-statements, where none exist in the programming language itself.
4. Language extension:
These processors attempt to add capabilities to the language by what amounts to
built-in macros.
For example, the language Equel is a database query language embedded in C.
Statements beginning with ## are taken by the preprocessor to be database access
statements unrelated to C and are translated into procedure calls on routines that
perform the database access.
ASSEMBLER
There are two types of assemblers based on how many passes through the source are
needed to produce the executable program.
One-pass assemblers go through the source code once and assume that all symbols
will be defined before any instruction that references them.
Two-pass assemblers create a table with all symbols and their values in the first pass,
then use the table in a second pass to generate code. The assembler must at least be
able to determine the length of each instruction on the first pass so that the addresses
of symbols can be calculated.
LOADER AND LINK-EDITOR
A loader is the part of an operating system that is responsible for loading programs, one of
the essential stages in the process of starting a program.
Loading a program involves reading the contents of executable file, the file containing the
program text, into memory, and then carrying out other required preparatory tasks to
prepare the executable for running.
Once loading is complete, the operating system starts the program by passing control to the
loaded program code.
4 (i)Describe how various phases could be combined as a pass in a compiler?(8)(May 2008)
Logically each phase is viewed as a separate program that reads input and produces
output for the next phase. In practice, some phases are combined.
Front and Back Ends
Modern compilers contain two parts, each of which is often subdivided. These two parts are
the front end and back end.
The front end consists of those phases, or parts of phases, that depend primarily on
the source language and are largely independent of the target machine.
These normally include lexical and syntactic analysis, the creation of the symbol
table, semantic analysis, and the generation of intermediate code.
A certain amount of code optimization can be done by the front end as well. The
front end also includes the error handling that goes along with each of these phases.
The back end includes those portions of the compiler that depend on the target
machine, and generally, these portions do not depend on the source language, just
the intermediate language.
In the back end, we find aspects of the code optimization phase, and we find code
generation, along with the necessary error handling and symbol-table operations.
Passes
Several phases of compilation are usually implemented in a single pass consisting of
reading an input file and writing an output file.
In practice, there is great variation in the way the phases of a compiler are grouped
into passes, so we prefer to organize our discussion of compiling around phases
rather than passes.
It is common for several phases to be grouped into one pass, and for the activity of
these phases to be interleaved during the pass.
For example, lexical analysis, syntax analysis, semantic analysis, and intermediate
code generation might be grouped into one pass.
If so, the token stream after lexical analysis may be translated directly into
intermediate code.
Reducing the Number of Passes
It is desirable to have relatively few passes, since it takes time to read and write
intermediate files.
If we group several phases into one pass, we may be forced to keep the entire
program in memory, because one phase may need information in a different order
than a previous phase produces it.
The internal form of the program may be considerably larger than either the source
program or the target program, so this space may not be a trivial matter.
For some phases, grouping into one pass presents few problems.
For example, as we mentioned above, the interface between the lexical and syntactic
analyzers can often be limited to a single token.
(ii) Describe the following software tools i) Structure Editors ii) Pretty printers iii)
Interpreters (8)
Structure Editor: A structure editor takes as input a sequence of commands to build a
source program.
The structure editor not only performs the text-creation and modification function of
an ordinary text editor, but it also analyzes the program text, putting an appropriate
hierarchical structure on the source program.
Pretty printer: A pretty printer analyzes a program and prints it in such a way that
the structure of the program becomes clearly visible, for example comment,
indentation.
Static checker: A static checker reads a program, analyzes it and attempts to discover
potential bugs without running the program.
Interpreter: Instead of producing a target program as a translation, an interpreter performs the
operation implied by the source program.
5 Elaborate on grouping of phases in a compiler.
Logically each phase is viewed as a separate program that reads input and produces
output for the next phase. In practice, some phases are combined.
Front and Back Ends
Modern compilers contain two parts, each of which is often subdivided. These two parts are
the front end and back end.
The front end consists of those phases, or parts of phases, that depend primarily on
the source language and are largely independent of the target machine.
These normally include lexical and syntactic analysis, the creation of the symbol
table, semantic analysis, and the generation of intermediate code.
A certain amount of code optimization can be done by the front end as well. The
front end also includes the error handling that goes along with each of these phases.
The back end includes those portions of the compiler that depend on the target
machine, and generally, these portions do not depend on the source language, just
the intermediate language.
In the back end, we find aspects of the code optimization phase, and we find code
generation, along with the necessary error handling and symbol-table operations.
Passes
Several phases of compilation are usually implemented in a single pass consisting of
reading an input file and writing an output file.
It is common for several phases to be grouped into one pass, and for the activity of
these phases to be interleaved during the pass.
For example, lexical analysis, syntax analysis, semantic analysis, and intermediate
code generation might be grouped into one pass.
If so, the token stream after lexical analysis may be translated directly into
intermediate code.
Reducing the Number of Passes
It is desirable to have relatively few passes, since it takes time to read and write
intermediate files.
If we group several phases into one pass, we may be forced to keep the entire
program in memory, because one phase may need information in a different order
than a previous phase produces it.
The internal form of the program may be considerably larger than either the source
program or the target program, so this space may not be a trivial matter.
For some phases, grouping into one pass presents few problems. For example, as we mentioned
above, the interface between the lexical and syntactic analyzers can often be limited to a single token.
6 What is difference between a phase and pass of a compiler? Explain machine dependent
and machine independent phase of compiler.
Phase and Pass are two terms used in the area of compilers.
A pass is a single time the compiler passes over (goes through) the sources code or
some other representation of it.
Typically, most compilers have at least two phases called front end and back end,
while they could be either one-pass or multi-pass.
Phase is used to classify compilers according to their construction, while pass is used to
classify compilers according to how they operate.
7 Describe the various phases of a compiler and trace it with the program segment
(position := initial + rate * 60) (16) (May 2016)
Explain the phases of a compiler (Refer Q.No:1)
The high-level language is converted into binary language in various phases. A compiler is
a program that converts high-level language to assembly language. Similarly, an assembler
is a program that converts the assembly language to machine-level language.
Steps to execute C compiler, in a host machine:
First writes a program in C language (high-level language).
The C compiler compiles the program and translates it to an assembly program (low-level
language).
An assembler then translates the assembly program into machine code (object).
A linker tool is used to link all the parts of the program together for execution
(executable machine code).
A loader loads all of them into memory and then the program is executed.
1. Preprocessor
A preprocessor, generally considered a part of the compiler, is a tool that produces input
for compilers.
It deals with macro processing, augmentation, file inclusion, language extension, etc.
2. Interpreter
An interpreter, like a compiler, translates high-level language into low-level machine
language. The difference lies in the way they read the source code or input.
A compiler reads the whole source code at once, creates tokens, checks semantics,
generates intermediate code, and executes the whole program; this may involve many
passes.
In contrast, an interpreter reads a statement from the input, converts it to an
intermediate code, executes it, then takes the next statement in sequence.
If an error occurs, an interpreter stops execution and reports it, whereas a compiler
reads the whole program even if it encounters several errors.
3. Assembler
An assembler translates assembly language programs into machine code.
The output of an assembler is called an object file, which contains a combination of
machine instructions as well as the data required to place these instructions in memory.
4. Linker
Linker is a computer program that links and merges various object files together in
order to make an executable file.
All these files might have been compiled by separate assemblers.
The major task of a linker is to search and locate referenced modules/routines in a
program and to determine the memory locations where these codes will be loaded,
making the program instructions have absolute references.
5. Loader
Loader is a part of the operating system and is responsible for loading executable files into
memory and executing them.
It calculates the size of a program (instructions and data) and creates memory space for
it. It initializes various registers to initiate execution.
6. Cross-compiler
A compiler that runs on platform (A) and is capable of generating executable code for
platform (B) is called a cross-compiler.
7. Source-to-source Compiler
A compiler that takes the source code of one programming language and translates it
into the source code of another programming language is called a source-to-source
compiler.
These normally include lexical and syntactic analysis, the creation of symbol table,
semantic analysis and intermediate code generation.
The back end includes portions of the compiler that depend on the target machine.
This includes part of code optimization and code generation.
The basic organization of the resulting lexical analyzer is in effect a finite automaton.
3. Syntax-directed translation engines: These produce collections of routines that walk
the parse tree, generating intermediate code.
The basic idea is that one or more "translations" are associated with each node of the
parse tree, and each translation is defined in terms of translations at its neighbor nodes in
the tree.
4. Automatic code generators: Such a tool takes a collection of rules that define the
translation of each operation of the intermediate language into the machine language for
the target machine.
The rules must include sufficient detail that we can handle the different possible access
methods for data; e.g. variables may be in registers, in a fixed (static) location in memory,
or may be allocated a position on a stack. The basic technique is "template matching".
5. Data-flow engines: Much of the information needed to perform good code
optimization involves "data-flow analysis," the gathering of information about how
values are transmitted from one part of a program to each other part. Different tasks of
this nature can be performed by essentially the same routine, with the user supplying
details of the relationship between intermediate code statements and the information
being gathered.
(ii)Explain the various errors encountered in different phases of compiler.(8)
Refer Previous Answer
whitespaces and comments. It generates a stream of tokens from the input. This is
modelled through regular expressions and the structure is recognized through finite state
automata.
6 What are the possible error recovery actions in a lexical analyzer? (May-2012),(May 2015)
i. Panic mode recovery.
Delete successive characters from the remaining input until the lexical analyzer can find a
well-formed token. This technique may confuse the parser.
ii. Other possible error recovery actions:
Deleting an extraneous character.
Inserting missing characters.
Replacing an incorrect character by a correct character.
Transposing two adjacent characters
7 Define Tokens, Patterns and Lexemes (or) Define Lexeme. (May 2013&2014)
Tokens: A token is a syntactic category. Sentences consist of a string of tokens. For
example number, identifier, keyword, string etc are tokens
Lexemes: Sequence of characters in a token is a lexeme. For example 100.01, counter,
const, "How are you?" etc are lexemes.
Patterns: Rule of description is a pattern. For example letter (letter | digit)* is a pattern to
symbolize a set of strings which consist of a letter followed by a letter or digit.
8 Write short note on input buffering. (or).Why is buffering used in lexical analysis ?
What are the commonly used buffering methods? (May 2014)
The lexical analyzer scans the source program line by line. For storing the input string
which is to be read, the lexical analyzer makes use of an input buffer. It maintains two
pointers, a forward pointer and a backward pointer, for scanning the input string. There are
two types of input buffering schemes: the one-buffer scheme and the two-buffer scheme.
9 What are the drawbacks of using buffer pairs?
This buffering scheme works quite well most of the time, but the amount of lookahead
it provides is limited.
Limited lookahead makes it impossible to recognize tokens in situations where the
distance the forward pointer must travel is more than the length of the buffer.
10 Define regular expressions. (or) Write a regular expression for an identifier.
A regular expression is used to define precisely the statements and expressions in the
source language. For example, in Pascal an identifier is denoted by the regular expression
letter(letter|digit)*.
11 What are algebraic properties of regular expressions?
The algebraic law obeyed by regular expressions are called algebraic properties of regular
expression. The algebraic properties are used to check equivalence of two regular
expressions.
S.No Properties Meaning
1 r1|r2 = r2|r1 | is commutative
2 r1|(r2|r3) = (r1|r2)|r3 | is associative
3 (r1 r2)r3 = r1(r2 r3) Concatenation is associative
4 r1(r2|r3) = r1 r2 | r1 r3 Concatenation is distributive
(r2|r3)r1 = r2 r1 | r3 r1 over |
5 εr = rε = r ε is the identity for concatenation
6 r* = (r|ε)* Relation between ε and *
7 r** = r* * is idempotent
12 What is meant by recognizer?
It is the part of the lexical analyzer that identifies the presence of a token in the input; it is
a recognizer for the language defining that token. It is the program which automatically
recognizes the tokens. A recognizer for a language L is a program that takes an input
string x and responds "yes" if x is a sentence of L and "no" otherwise.
13 List the operations on languages. (May 2016)
Union: L ∪ M = { s | s is in L or s is in M }
Concatenation: LM = { st | s is in L and t is in M }
Kleene closure: L* (zero or more concatenations of L)
Positive closure: L+ (one or more concatenations of L)
14 Mention the various notational shorthands for representing regular expressions.
F Set of accepting states or final states (F is a subset of Q)
δ Transition function, defined as δ: Q × Σ → Q
18 What are the conditions to satisfy for NFA?
(i) NFA should have one start state.
(ii) NFA may have one or more accepting states.
(iii) ε-transitions may be present in the NFA.
(iv) There can be more than one transition on the same input symbol from any one state.
(v) In NFA, from any state S:
There can be at most 2 outgoing ε-transitions.
There can be only one transition on a real input symbol.
On a real-input transition it should reach a new state only.
19 Write a short note on LEX.
A LEX source program is a specification of lexical analyzer consisting of set of regular
expressions together with an action for each regular expression. The action is a piece of
code, which is to be executed whenever a token specified by the corresponding regular
expression is recognized. The output of a LEX is a lexical analyzer program constructed
from the LEX source specification.
20 Define ε-closure.
ε-closure of a given state q is defined as the set of all states that can be reached from q
on a path labeled by ε.
21 How are tokens recognized?
Consider the following grammar and try to construct an analyzer that will return <token,
attribute> pairs.
relop → < | <= | = | <> | > | >=
id → letter (letter | digit)*
num → digit+ ('.' digit+)? (E ('+' | '-')? digit+)?
delim → blank | tab | newline
ws → delim+
23 What is the need for separating the analysis phase into lexical analysis and parsing?
Separation of lexical analysis from syntax analysis allows a simplified design.
Efficiency of the compiler gets increased: reading the input source file is a time-consuming
process, and if it is done only once, in the lexical analyzer, efficiency increases.
Lexical analyzer uses buffering techniques to improve the performances.
Input alphabet peculiarities and other device specific anomalies can be restricted to the
lexical analyzer.
24 Draw a transition diagram to represent relational operators.
The relational operators are: <, <=, >, >=, =, !=
25 State any reasons as to why phases of compiler should be grouped. (May 2014)
By keeping the same front end and attaching different back ends, one can produce a
compiler for the same source language on different machines.
By keeping different front ends and the same back end, one can compile several different
languages on the same machine.
26 Draw a NFA for a*|b*. (May 2004,Nov 2005)
DFA: The automaton is in only one state at a time. The transition function returns exactly
one state, i.e. δ: Q × Σ → Q. Construction of a DFA is difficult compared to an NFA and
takes more space, but implementation is easy and speeds up computation.
NFA: The automaton can be in several states at once. The transition function returns zero,
one or more states, i.e. δ: Q × Σ → 2^Q. Construction of an NFA is easy compared to a DFA
and takes less space, but implementation is difficult (slow computation process).
28 What is the time and space complexity of NFA and DFA?
Automaton   Space       Time
NFA         O(|r|)      O(|r| × |x|)
DFA         O(2^|r|)    O(|x|)
(where r is the regular expression and x is the input string)
29 Write a regular definition to represent date and time in the following format: MONTH
DAY YEAR. (May 2015)
month → Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec
day → digit digit?
year → digit digit digit digit
date → month day year
30 Write a grammar for branching statements. (May 2016)
stmt → if expr then stmt
| if expr then stmt else stmt
| while expr do stmt
| other
PART-B
1 Explain in detail about lexical analyzer generator.
Lexical analyzers tokenize input streams.
Tokens are the terminals of a language:
English: words, punctuation marks, ...
Programming language: identifiers, operators, keywords, ...
Regular expressions define terminals/tokens.
Typical program style:
The simplest kind of statement is an expression statement, which is an
expression followed by a semicolon. Expressions are further composed
of operators, objects (variables), and constants.
C source code consists of several lexical elements. Some are words, such
as for, return, main, and i, which are either keywords of the language
(for, return) or identifiers (names) we've chosen for our own functions and
variables (main, i).
There are constants such as 1 and 10 which introduce new values into the
program. There are operators such as =, +, and >, which manipulate
variables and values.
Transition diagram.
A useful intermediate step between regular expressions and lexer
Shows actions taken by lexer
Move from position to position as characters are read, advancing input
(lookahead) pointer
A graph in which nodes are states of lexer, edges are input characters
Special states:
Start : no input
Accept : done / success
Example: scanning >= or >
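The two-state diagram for this case can be transcribed directly into C. In the sketch below the token codes and the function name are our own choices: on '>' followed by '=' the lexer accepts GE and consumes two characters; on '>' followed by anything else it accepts GT and retracts, consuming only one.

```c
/* Token codes for this sketch (hypothetical names). */
enum { TOK_NONE, TOK_GT, TOK_GE };

/* Scan '>' or '>=' at the start of s; *len receives how many
   characters were consumed (the retraction is modeled by *len = 1). */
static int scan_relop_gt(const char *s, int *len)
{
    if (s[0] != '>') { *len = 0; return TOK_NONE; }
    if (s[1] == '=') { *len = 2; return TOK_GE; }   /* state: saw ">=" */
    *len = 1;                                       /* retract: just ">" */
    return TOK_GT;
}
```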
2 Explain about input buffering.(8)
Input Buffering:
o Some efficiency issues concerned with the buffering of input.
o A two-buffer input scheme that is useful when lookahead on the input is
necessary to identify tokens.
o Techniques for speeding up the lexical analyser, such as the use of sentinels
to mark the buffer end.
o There are three general approaches to the implementation of a lexical
analyser:
1. Use a lexical-analyser generator, such as Lex compiler to produce the
lexical analyser from a regular expression based specification. In this, the
generator provides routines for reading and buffering the input.
2. Write the lexical analyser in a conventional systems-programming
language, using I/O facilities of that language to read the input.
3. Write the lexical analyser in assembly language and explicitly manage the
reading of input.
Buffer pairs:
o Because a large amount of time can be consumed moving characters,
specialized buffering techniques have been developed to reduce the amount
of overhead required to process an input character. The scheme to be
discussed:
o Consists of a buffer divided into two N-character halves.
o Most of the time, only one test is performed to see whether forward
points to an eof.
o Only when the forward pointer reaches the end of a buffer half or eof does it
perform more tests.
o Since N input characters are encountered between eofs, the average number
of tests per input character is very close to 1.
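The buffer-pair scheme above can be sketched in C as follows. This is an illustrative model only: it reads from an in-memory string rather than a file, uses '\0' as the sentinel (so the input is assumed to contain no NUL bytes), and the names N, fill_half and buf_next are our own.

```c
#define N 4                     /* half size; tiny, for illustration */

static char buf[2 * (N + 1)];   /* two N-character halves, each with a sentinel slot */
static const char *src;         /* stands in for the input file */
static char *forward;           /* the lookahead pointer */

/* Load up to N characters into one half and place the sentinel. */
static void fill_half(char *half)
{
    int i = 0;
    while (i < N && *src)
        half[i++] = *src++;
    half[i] = '\0';             /* sentinel marks the end of the half */
}

static void buf_init(const char *input)
{
    src = input;
    fill_half(buf);             /* load the first half */
    forward = buf;
}

/* Return the next character, or '\0' at real end of input. */
static char buf_next(void)
{
    if (*forward)                       /* the one cheap test per character */
        return *forward++;
    if (forward == buf + N) {           /* hit end of first half: reload second */
        fill_half(buf + N + 1);
        forward = buf + N + 1;
    } else if (forward == buf + 2 * N + 1) {  /* end of second half: reload first */
        fill_half(buf);
        forward = buf;
    } else {
        return '\0';                    /* sentinel before half end: real eof */
    }
    return *forward ? *forward++ : '\0';
}
```

Only when the sentinel is hit does the code pay for the extra position tests; for the other N - 1 characters of each half, advancing costs a single comparison.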
3 Explain in detail about the role of Lexical analyzer with the possible error recovery
actions.(16) (May 2013)
Lexical analysis is the first phase of a compiler. It takes the modified source code from
language preprocessors that are written in the form of sentences.
The lexical analyzer breaks these syntaxes into a series of tokens, by removing any
whitespace or comments in the source code.
If the lexical analyzer finds a token invalid, it generates an error.
The lexical analyzer works closely with the syntax analyzer.
It reads character streams from the source code, checks for legal tokens, and passes the
data to the syntax analyzer when it demands.
Upon receiving a get-next-token command from the parser, the lexical analyzer reads
input characters until it can identify the next token.
Its secondary tasks are:
One task is stripping out comments and white space (in the form of blank, tab and
newline characters) from the source program.
Another task is correlating error messages from the compiler with the source
program.
Sometimes the lexical analyzer is divided into a cascade of two phases:
1) Scanning
2) Lexical analysis
The scanner is responsible for doing simple tasks, while the lexical analyzer proper does
the more complex operations.
4 Describe the specification of tokens and how to recognize the tokens (16) (May 2013)
Specification of Tokens
An alphabet or a character class is a finite set of symbols. Typical examples of symbols are
letters and characters.
The set {0, 1} is the binary alphabet. ASCII and EBCDIC are two examples of computer
alphabets.
Strings
A string over some alphabet is a finite sequence of symbols taken from that alphabet.
For example, banana is a sequence of six symbols (i.e., a string of length six) taken from
the ASCII computer alphabet. The empty string, denoted by ε, is a special string with zero
symbols (i.e., string length is 0).
If x and y are two strings, then the concatenation of x and y, written xy, is the string
formed by appending y to x.
For example, if x = dog and y = house, then xy = doghouse. For the empty string ε, we have
Sε = εS = S.
String exponentiation concatenates a string with itself a given number of times:
S2 = SS or S.S
S3 = SSS or S.S.S
S4 = SSSS or S.S.S.S and so on
By definition, S0 is the empty string ε, and S1 = S. For example, if x = ba and y = na, then
xy2 = banana.
Languages
A language is a set of strings over some fixed alphabet. The language may contain a finite
or an infinite number of strings.
Let L and M be two languages where L = {dog, ba, na} and M = {house, ba} then
Union: LUM = {dog, ba, na, house}
Concatenation: LM = {doghouse, dogba, bahouse, baba, nahouse, naba}
Exponentiation: L2 = LL
By definition: L0 = {ε} and L1 = L
The Kleene closure of language L, denoted by L*, is "zero or more concatenations of" L.
L* = L0 U L1 U L2 U L3 . . . U Ln . . .
For example, if L = {a, b}, then
L* = {ε, a, b, aa, ab, ba, bb, aaa, aba, baa, . . . }
The positive closure of language L, denoted by L+, is "one or more concatenations of" L.
L+ = L1 U L2 U L3 . . . U Ln . . .
For example, if L = {a, b}, then
L+ = {a, b, aa, ab, ba, bb, aaa, aba, . . . }
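The concatenation operation on languages can be illustrated with a short C sketch that forms LM for the example L = {dog, ba, na} and M = {house, ba} given above. The array sizes and the function name are our own choices.

```c
#include <string.h>

#define MAXW 16   /* longest result here is "doghouse" (8 chars) */

/* Form LM = { xy : x in L, y in M } for two finite languages,
   writing each concatenation into out[] and returning |L| * |M|. */
static int lang_concat(const char *L[], int nl,
                       const char *M[], int nm,
                       char out[][MAXW])
{
    int k = 0;
    for (int i = 0; i < nl; i++)
        for (int j = 0; j < nm; j++) {
            strcpy(out[k], L[i]);   /* xy: append y to x */
            strcat(out[k], M[j]);
            k++;
        }
    return k;
}
```

Running it on the example produces the six strings doghouse, dogba, bahouse, baba, nahouse, naba, matching the concatenation listed above.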
entry on any input. It means that each entry in the transition table is a single state
(as opposed to a set of states in an NFA).
Because of the single transition attached to each state, it is easy to determine whether a
DFA accepts a given input string.
{definitions}
%%
{transition rules}
%%
{user subroutines}
Lex Predefined Variables
yytext -- a string containing the lexeme
yyleng -- the length of the lexeme
yyin -- the input stream pointer
the default input of default main() is stdin
yyout -- the output stream pointer
the default output of default main() is stdout.
E.g.
[a-z]+ printf("%s", yytext);
[a-z]+ ECHO;
[a-zA-Z]+ {words++; chars += yyleng;}
yylex()
The default main() contains a call of yylex(); it returns the next token
yymore()
appends the next matched text to the current yytext
yyless(n)
retains the first n characters in yytext and returns the rest to the input
yywrap()
is called whenever Lex reaches an end-of-file
The default yywrap() always returns 1
6 Prove that the following two regular expressions are equivalent by showing that
minimum state DFAs are same.
(i)(a|b)*#
(ii)(a*|b*)*#
(Refer Class Notes)
7(i) Describe the error recovery schemes in the lexical phase of a compiler.(8) (April / May
2015)
Panic mode
When a parser encounters an error anywhere in the statement, it ignores the rest of the
statement by not processing input from erroneous input to delimiter, such as semi-colon.
This is the easiest way of error-recovery and also, it prevents the parser from developing
infinite loops.
Statement mode
When a parser encounters an error, it tries to take corrective measures so that the rest of
inputs of statement allow the parser to parse ahead. For example, inserting a missing
semicolon, replacing comma with a semicolon etc. Parser designers have to be careful here
because one wrong correction may lead to an infinite loop.
Error productions
Some common errors are known to the compiler designers that may occur in the code. In
addition, the designers can create augmented grammar to be used, as productions that
generate erroneous constructs when these errors are encountered.
Global correction
The parser considers the program in hand as a whole and tries to figure out what the
program is intended to do and tries to find out a closest match for it, which is error-free.
When an erroneous input (statement) X is fed, it creates a parse tree for some closest error-
free statement Y. This may allow the parser to make minimal changes in the source code,
but due to the complexity (time and space) of this strategy, it has not been implemented in
practice yet.
7 (ii) Draw the transition diagram for unsigned numbers. (8)
8 Construct the NFA from (a/b)*a(a/b) using Thompson's construction algorithm.
May/June 2007
(4) Given a regular expression for R and S, assume these boxes represent the finite
automata for R and S:
9 Give the minimized DFA for the following expression.(10) (Nov 2006,2007)
(a/b)*abb
10 Write LEX specifications and necessary C code that reads English words from a text file
and replaces every occurrence of the substring abc with ABC. The program should also
compute the number of characters, words and lines read. It should not consider or count
any line(s) that begin with the symbol #.
%{
#include <stdio.h>
int cno = 0, wno = 0, lno = 0; /* counts of characters, words and lines */
%}
%%
^#.*\n { /* skip lines that begin with #: neither echoed nor counted */ }
abc { wno++; cno += yyleng; fprintf(yyout, "ABC"); }
[a-zA-Z]+ { wno++; cno += yyleng; ECHO; }
\n { lno++; ECHO; }
. { cno++; ECHO; }
%%
int main(void)
{
    yylex();
    fprintf(stderr, "Number of characters: %d; Number of words: %d; Number of lines: %d\n",
            cno, wno, lno);
    return 0;
}
int yywrap(void) { return 1; }
11 Write an algorithm to construct an NFA from a regular expression. (Nov 2010)
INPUT: A regular expression r over alphabet Σ.
OUTPUT: An NFA N accepting L(r).
METHOD: Begin by parsing r into its constituent subexpressions. The rules for
constructing an NFA consist of basis rules for handling subexpressions with no operators,
and inductive rules for constructing larger NFA's from the NFA's for the immediate
subexpressions of a given expression.
BASIS: For the expression ε, construct the NFA with a single ε-transition from i to f.
Here, i is a new state, the start state of this NFA, and f is another new state, the accepting
state for the NFA.
For any subexpression a in Σ, construct the NFA with a single transition labeled a from i
to f, where again i and f are new states, the start and accepting states, respectively. Note that
in both of the basis constructions, we construct a distinct NFA, with new states, for every
occurrence of ε or some a as a subexpression of r.
INDUCTION: Suppose N(s) and N(t) are NFA's for regular expressions s and t,
respectively.
a) Suppose r = s|t. Then N(r), the NFA for r, is constructed as in the following figure. Here,
i and f are new states, the start and accepting states of N(r), respectively. There are ε-
transitions from i to the start states of N(s) and N(t), and each of their accepting states has
an ε-transition to the accepting state f. Note that the accepting states of N(s) and N(t) are not
accepting in N(r). Since any path from i to f must pass through either N(s) or N(t)
exclusively, and since the label of that path is not changed by the ε's leaving i or entering f,
we conclude that N(r) accepts L(s) U L(t), which is the same as L(r).
b) Suppose r = st. Then construct N(r) as in the following figure. The start state of N(s)
becomes the start state of N(r), and the accepting state of N(t) is the only accepting state of
N(r). The accepting state of N(s) and the start state of N(t) are merged into a single state,
with all the transitions in or out of either state. A path from i to f shown in the following
figure must go first through N(s), and therefore its label will begin with some string in L(s).
The path then continues through N(t), so the path's label finishes with a string in L(t). As
we shall soon argue, accepting states never have edges out and start states never have
edges in, so it is not possible for a path to re-enter N(s) after leaving it. Thus, N(r) accepts
exactly L(s)L(t), and is a correct NFA for r = st.
c) Suppose r = s*. Then for r we construct the NFA N(r) shown in the following figure.
Here, i and f are new states, the start state and lone accepting state of N(r). To get from i
to f, we can either follow the introduced path labeled ε, which takes care of the one string
in L(s)0, or we can go to the start state of N(s), through that NFA, then from its accepting
state back to its start state zero or more times. These options allow N(r) to accept all the
strings in L(s)1, L(s)2, and so on, so the entire set of strings accepted by N(r) is L(s*).
d) Finally, suppose r = (s). Then L(r) = L(s), and we can use the NFA N(s) as N(r).
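The four cases above can be sketched as C functions that build and combine NFA fragments. The struct layout, the encoding of ε as label 0, and the function names are our own choices; memory-error handling is omitted for brevity.

```c
#include <stdlib.h>

/* Each state has at most two out-edges; label 0 means epsilon. */
struct state {
    int label1, label2;
    struct state *out1, *out2;
};

/* A fragment has one start state and one accepting state. */
struct frag { struct state *start, *accept; };

static struct state *new_state(void)
{
    return calloc(1, sizeof(struct state));   /* labels 0, edges NULL */
}

/* Basis: NFA for a single symbol a (pass 0 for epsilon). */
static struct frag sym(int a)
{
    struct frag f = { new_state(), new_state() };
    f.start->label1 = a;
    f.start->out1 = f.accept;
    return f;
}

/* r = s|t: new start with epsilon-edges into s and t; epsilon-edges
   from their accepting states into a new accepting state. */
static struct frag alt(struct frag s, struct frag t)
{
    struct frag f = { new_state(), new_state() };
    f.start->out1 = s.start;
    f.start->out2 = t.start;
    s.accept->out1 = f.accept;
    t.accept->out1 = f.accept;
    return f;
}

/* r = st: merge s's accepting state with t's start state. */
static struct frag cat(struct frag s, struct frag t)
{
    *s.accept = *t.start;           /* copy t's start into s's accept */
    struct frag f = { s.start, t.accept };
    return f;
}

/* r = s*: epsilon path around s, plus a loop from accept back to start. */
static struct frag star(struct frag s)
{
    struct frag f = { new_state(), new_state() };
    f.start->out1 = s.start;
    f.start->out2 = f.accept;       /* the path for the empty string */
    s.accept->out1 = s.start;       /* loop back: zero or more times */
    s.accept->out2 = f.accept;
    return f;
}
```

For example, star(alt(sym('a'), sym('b'))) builds the NFA for (a|b)*, with an ε-edge from its start straight to its accepting state for the empty string.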
12 Prove that the following two regular expressions are equivalent by showing that the
minimum state DFAs are same.
(i)(a/b)* (ii)(a*/b*) (16) (May 2015)
(i)(a/b)
(ii) (a*/b*)
Input: DFA
Output: Minimized DFA
Step 1
Draw a table for all pairs of states (Qi, Qj) not necessarily connected directly [All are
unmarked initially]
Step 2
Consider every state pair (Qi, Qj) in the DFA where Qi ∈ F and Qj ∉ F or vice versa,
and mark them. [Here F is the set of final states.]
Step 3
Repeat this step until we cannot mark anymore states
If there is an unmarked pair (Qi, Qj), mark it if the pair (δ(Qi, A), δ(Qj, A)) is
marked for some input symbol A.
Step 4
Combine all the unmarked pair (Qi, Qj) and make them a single state in the reduced
DFA.
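The marking (table-filling) steps above can be sketched in C. The function below is a simplification of our own: it assumes at most MAXS states and two input symbols, and returns only the number of states in the reduced DFA rather than building it.

```c
#define MAXS 8

/* Table-filling minimization: mark pairs with one final and one non-final
   state (Step 2), propagate marks until stable (Step 3), then count the
   equivalence classes of unmarked pairs (Step 4). */
static int num_classes(int ns, int nsym, int trans[][2], const int final[])
{
    int marked[MAXS][MAXS] = {{0}};

    /* Step 2: separate final from non-final states. */
    for (int i = 0; i < ns; i++)
        for (int j = 0; j < i; j++)
            if (final[i] != final[j])
                marked[i][j] = 1;

    /* Step 3: repeat until no more pairs can be marked. */
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int i = 0; i < ns; i++)
            for (int j = 0; j < i; j++) {
                if (marked[i][j]) continue;
                for (int a = 0; a < nsym; a++) {
                    int p = trans[i][a], q = trans[j][a];
                    int hi = p > q ? p : q, lo = p > q ? q : p;
                    if (hi != lo && marked[hi][lo]) {
                        marked[i][j] = 1;
                        changed = 1;
                        break;
                    }
                }
            }
    }

    /* Step 4: merge unmarked pairs and count the resulting classes. */
    int rep[MAXS], classes = 0;
    for (int i = 0; i < ns; i++) {
        rep[i] = i;
        for (int j = 0; j < i; j++)
            if (!marked[i][j] && rep[j] == j) { rep[i] = j; break; }
        if (rep[i] == i)
            classes++;
    }
    return classes;
}
```

For instance, a 3-state DFA in which states 0 and 2 have identical transitions and are both non-final collapses to 2 states.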
13 (i)Differentiate between lexeme, token and pattern.(6) (May 2016)
A lexeme is a sequence of characters in the source program that matches the pattern for a
token and is identified by the lexical analyzer as an instance of that token.
A token is a pair consisting of a token name and an optional attribute value. The token
name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or
sequence of input characters denoting an identifier. The token names are the input symbols
that the parser processes.
A pattern is a description of the form that the lexemes of a token may take. In the case of a
keyword as a token, the pattern is just the sequence of characters that form the keyword. For
identifiers and some other tokens, the pattern is a more complex structure that is matched by
many strings.
Token    Informal Description    Sample Lexemes
if       characters i, f         if
6 r* = (r|ε)*   Relation between ε and *
7 r** = r*      * is idempotent
14 (i)Write notes on regular expression to NFA. Construct Regular expression to NFA for
the sentence (a|b)*a. (10) (May 2016)
Thompson's construction builds an NFA from a regular expression.
The construction is guided by the syntax of the regular expression, with cases
following the cases in the definition of regular expressions.
ε is a regular expression that denotes {ε}, the set containing just the empty string,
where i is a new start state and f is a new accepting state. This NFA recognizes {ε}.
1. If a is a symbol in the alphabet, a ∈ Σ, then regular expression a denotes {a}, the
set containing just the symbol a. This NFA recognizes {a}.
2. Suppose s and t are regular expressions denoting L(s) and L(t) respectively; then
d. (s) is a regular expression denoting L(s) and can be used for putting
parentheses around a regular expression.
FIRST:
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a nonterminal and X → Y1Y2...Yk is a production, then place a in FIRST(X) if for
some i, a is in FIRST(Yi), and ε is in all of FIRST(Y1), ..., FIRST(Yi-1).
FOLLOW:
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in
FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε,
then everything in FOLLOW(A) is in FOLLOW(B).
20 Write the algorithm for the construction of a predictive parsing table?
Input : Grammar G
Output : Parsing table M
Method :
a) For each production A → α of the grammar, do steps b and c.
b) For each terminal a in FIRST(α), add A → α to M[A, a].
c) If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is
in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
d) Make each undefined entry of M be error.
21 Differentiate Kernel and non-Kernel items.
Kernel items include the initial item S' → .S and all items whose dots are not at the
left end, whereas the non-kernel items have their dots at the left end.
22 Define handle pruning.
A technique to obtain the rightmost derivation in reverse (called canonical reduction
sequence) is known as handle pruning
Start with a string of terminals w we wish to parse.
If w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-
sentential form of some as yet unknown rightmost derivation
S = γ0 ⇒ γ1 ⇒ γ2 ⇒ ... ⇒ γn-1 ⇒ γn = w
23 What is shift reduce parsing?
The bottom up style of parsing is called shift reduce parsing. This parsing method is bottom
up because it attempts to construct a parse tree for an input string beginning at the leaves
St. Josephs College of Engineering & St. Josephs Institute of Technology Page 38 of 95
CS6660-Compiler Design Department of CSE &IT 2016-2017
and working up towards the root.
24 What are the four possible action of a shift reduce parser?
Shift: the next input symbol is shifted onto the top of the stack.
Reduce: the parser replaces the handle at the top of the stack by the corresponding
nonterminal.
Accept: the parser announces successful completion of parsing.
Error: the parser discovers a syntax error and calls an error recovery routine.
productions (there is none).
When i = 2 and j = 1, we substitute the S-productions in A → Sd to obtain the A-
productions A → Ac | Aad | bd | ε.
Eliminating immediate left recursion from the A-productions yields the grammar:
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε
32 Eliminate left recursion from the following grammar A → Ac | Aad | bd | ε. (May 2013)
A → bdA' | A'
A' → cA' | adA' | ε
33 Specify the advantages of LALR.
Merging of states with common cores can never produce a shift/reduce conflict that was
not present in any one of the original states, because shift actions depend only on the core,
not the lookahead.
34 Mention the demerits of LALR parser.
Merging may produce reduce/reduce conflicts.
On erroneous input, an LALR parser may proceed to do some reductions after the LR
parser has declared an error, but the LALR parser never shifts a symbol after the LR
parser declares an error.
35 What is the syntax for YACC source specification program?
Declarations
%%
Translation rules
%%
Supporting C-routines
36 Define terminal and non terminal
Terminals are the basic symbols from which the strings are formed.
Non-terminal are syntactic variables that denote set of strings.
37 What are the possible actions of a shift reduce parser?
Shift, Reduce, Accept, Error.
PART-B
1 1.(i) Construct Predictive Parser for the following grammar:(May/June-2012&13)(10)
S->(L)/a
L->L,S/S
Refer Class Notes
(ii) Describe the conflicts that may occur during shift reduce parsing. (may/june-2012).(6)
Hints:
Shift/Reduce conflict: The entire stack contents and the next input symbol cannot decide
whether to shift or reduce.
Reduce/Reduce conflict: The entire stack contents and the next input symbol cannot
decide which of several reductions to make.
2 What is FIRST and FOLLOW? Explain in detail with an example. Write down the necessary
algorithm.(16)
Rules for first( ):
The CFG is
E → TE'
E' → +TE'
E' → ε
T → FT'
T' → *FT'
T' → ε
F → (E)
F → id
Example
FIRST(E) = { (, id }
FIRST(E') = { +, ε }
FIRST(T) = { (, id }
FIRST(T') = { *, ε }
FIRST(F) = { (, id }
Example
FOLLOW(E) = {$,)}
FOLLOW(E') = {$,)}
FOLLOW(T) = {+,$,)}
FOLLOW(T') = {+,$,)}
FOLLOW(F) = {*,+,$,)}
3 Consider the grammar given below:
E E+T |T
T T*F |F
F (E) |id
Construct an LR parsing table for the above grammar. Give the moves of the LR parser on
id*id+id
Solution:
After eliminating left-recursion the grammar is
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
First( ):
FIRST(E) = { (, id }
FIRST(E') = { +, ε }
FIRST(T) = { (, id }
FIRST(T') = { *, ε }
FIRST(F) = { (, id }
Follow( ):
FOLLOW(E) = { $, ) }
FOLLOW(E') = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T') = { +, $, ) }
FOLLOW(F) = { +, *, $, ) }
NONTERMINAL  id        +           *           (        )        $
E            E → TE'                           E → TE'
E'                     E' → +TE'                        E' → ε   E' → ε
T            T → FT'                           T → FT'
T'                     T' → ε      T' → *FT'            T' → ε   T' → ε
F            F → id                            F → (E)
Stack implementation:
I0 : E' → .E
E → .E+T
E → .T
T → .T*F
T → .F
F → .(E)
F → .id
id + * ( ) $ E T F
I0 s5 s4 1 2 3
I1 s6 ACC
I2 r2 s7 r2 r2
I3 r4 r4 r4 r4
I4 s5 s4 8 2 3
I5 r6 r6 r6 r6
I6 s5 s4 9 3
I7 s5 s4 10
I8 s6 s11
I9 r1 s7 r1 r1
I10 r3 r3 r3 r3
I11 r5 r5 r5 r5
5 (i)Construct SLR parsing table for the following grammar (Nov 2012,May 04(8 Marks))
S → L=R | R    L → *R | id    R → L
Refer Class Notes
6 Check whether the following grammar is a LL(1) grammar
S → iEtS | iEtSeS | a
E → b
Refer Class Notes
7 Construct non recursion predictive parsing table for the following grammar.
E → E or E | E and E | not E | (E) | 0 | 1. (16) (Dec 12)
Refer class notes
8 i. Find the language from (4)
S → 0S1 | 0A1    A → 1A0 | 10
S → 0S1 | 0A | 0 | 1B | 1
A → 0A | 0
B → 1B | 1
Answer:
S ⇒ 0S1 ⇒ 001
S ⇒ 0S1 ⇒ 011
S ⇒ 0S1 ⇒ 00S11 ⇒ 000S111 ⇒ 0000A111 ⇒ 00000111
ii.Define Parse tree , Regular Expression , Left most derivation , Right most derivation, and
write example for each. (4)
A parse tree or parsing tree or derivation tree or (concrete) syntax tree is an ordered, rooted
tree that represents the syntactic structure of a string according to some context-free grammar.
Regular Expression
Regular expressions are a mathematical symbolism which describes the set of strings of a
specific language. They provide a convenient and useful notation for representing tokens.
Here are some rules that describe the definition of regular expressions over the input set
denoted by Σ.
Leftmost Derivation
In a top-down parse we always choose the leftmost non-terminal in a sentential form to
apply a production rule to; this is called a leftmost derivation.
Rightmost Derivation
In a bottom-up parse the situation is reversed: we apply the production rules in reverse to
the leftmost symbols; thus we are performing a rightmost derivation in reverse.
We will use the rules which defined a regular expression as a basis for the construction:
The Kleene closure must allow for taking zero or more instances of the letter from the input
9 i. Prove the grammar is ambiguous. (4)
E → E+E | E*E | (E) | id
This parsing technique recursively parses the input to make a parse tree, which may or may
not require back-tracking.
But the grammar associated with it (if not left factored) cannot avoid back-tracking.
Back-tracking
Top- down parsers start from the root node (start symbol) and match the input string against
the production rules to replace them (if matched). To understand this, take the following
example of CFG:
S → rXd | rZd
X → oa | ea
Z → ai
For an input string: read, a top-down parser, will behave like this:
It will start with S from the production rules and will match its yield to the left-most letter
of the input, i.e. r. The first production of S (S → rXd) matches it. So the top-down parser
advances to the next input letter (i.e. e). The parser tries to expand non-terminal X and
checks its production from the left (X → oa). It does not match the next input symbol.
So the top-down parser backtracks to obtain the next production rule of X, (X → ea).
Now the parser matches all the input letters in an ordered manner. The string is accepted.
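The behaviour just described can be sketched as a tiny recursive-descent parser with backtracking for S → rXd | rZd, X → oa | ea, Z → ai. The function names and the cursor-saving scheme are our own choices.

```c
#include <string.h>

static const char *inp;   /* cursor into the input string */

/* Try to consume literal s; advance on success. */
static int match(const char *s)
{
    size_t n = strlen(s);
    if (strncmp(inp, s, n) == 0) { inp += n; return 1; }
    return 0;
}

static int parse_X(void)
{
    const char *save = inp;
    if (match("oa")) return 1;
    inp = save;               /* backtrack, try the next alternative */
    return match("ea");
}

static int parse_Z(void) { return match("ai"); }

static int parse_S(void)
{
    const char *save = inp;
    if (match("r") && parse_X() && match("d")) return 1;
    inp = save;               /* backtrack to try S -> rZd */
    return match("r") && parse_Z() && match("d");
}

static int accepts(const char *w)
{
    inp = w;
    return parse_S() && *inp == '\0';
}
```

On the input read, the first alternative of X fails on "oa", the parser backtracks, "ea" succeeds, and the whole string is accepted, just as in the trace above.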
11 Construct predictive parsing table and parse the string NOT(true OR false)
bexpr → bexpr OR bterm | bterm
bterm → bterm AND bfactor | bfactor
bfactor → NOT bfactor | (bexpr) | true | false
Find FIRST and FOLLOW and
construct table and parse the string.
12 Construct CLR parsing table to parse the sentence id=id*id for the following grammar.
S → L=R | R
L → *R | id
R → L
Find LR(1) items;
Construct CLR table.
Parse the string.
13 Parse the string (a,a) using SLR parsing table.
S → (L) | a
L → L,S | S
Find LR(0) items.
Construct SLR parsing table.
Parse the string.
14 Construct LALR parsing table for the grammar.
E → E+T | T
T → T*F | F
F → (E) | id
Find LR(1) items.
Find items with the same core but different second components, then merge them.
After merging, construct the LALR parsing table.
15 Write algorithms for SLR and CLR string parsing algorithm.
Brief introduction about SLR and CLR parsers.
Algorithm
16 Generate SLR Parsing table for the following grammar.
S-->Aa|bAc|Bc|bBa
A-->d
B-->d and parse the sentences bdc and dd. (12) (May 2015)
1. Construct F = {I0, I1, ... In}, the collection of LR(0) configurating sets for G'.
2. State i is determined from Ii. The parsing actions for the state are determined as follows:
a) If A -> u. is in Ii then set Action[i,a] to reduce A -> u for all a in Follow(A) (A is not S').
b) If S' -> S. is in Ii then set Action[i,$] to accept.
c) If A -> u.av is in Ii and successor(Ii, a) = Ij, then set Action[i,a] to shift j (a must be a terminal).
3. The goto transitions for state i are constructed for all nonterminals A using the rule:
if successor(Ii, A) = Ij, then Goto[i, A] = j.
4. All entries not defined by rules 2 and 3 are errors.
5. The initial state is the one constructed from the configurating set containing S' -> .S.
17 Write the algorithm to eliminate left-recursion and left-factoring and apply both to the
following grammar. (8)
E-->E+T|E-T|T
T-->a|b|(E) (May 2015)
Step 1: Eliminate left recursion and left factoring
Step 2: Find out the FIRST
Step 3: Find out FOLLOW
Step 4: Constructing a parsing table
Step 5: Stack Implementation
18 (i) Construct stack implementation of shift reduce parsing for the grammar (May 2016)
E-> E+E
E -> E*E
E -> (E)
E -> id and the input string id1+id2*id3. (8)
Solution:
(ii) Explain LL(1) grammar for the sentence S → iEtS | iEtSeS | a, E → b. (8)
The input buffer contains the string to be parsed; $ is the end-of-input marker. The stack
contains a sequence of grammar symbols. Initially, the stack contains the start symbol of the
grammar on the top of $. The parser is controlled by a program that behaves as follows:
The program considers X, the symbol on top of the stack, and a, the current input symbol.
These two symbols, X and a, determine the action of the parser. There are three possibilities:
1. If X = a = $, the parser halts and announces successful completion.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next
input symbol.
3. If X is a non-terminal, the program consults entry M[X,a] of parsing table M.
a. If the entry is a production M[X,a] = {X → UVW}, the parser replaces X on top of the
stack by WVU (with U on top). As output, the parser just prints the production used:
X → UVW. However, any other code could be executed here.
b. If M[X,a] = error, the parser calls an error recovery routine.
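The driver just described can be sketched as a C function. In this illustrative sketch (names and encodings are our own) the parsing table for the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id is hard-coded, with single characters standing for symbols: 'i' for id, and 'e', 't' for E', T'.

```c
#include <string.h>

static int is_nonterm(char c)
{
    return c == 'E' || c == 'e' || c == 'T' || c == 't' || c == 'F';
}

/* The parsing table M[X,a]: return the RHS to push, "" for an
   epsilon-production, or NULL for an error entry. */
static const char *prod(char X, char a)
{
    switch (X) {
    case 'E': if (a == 'i' || a == '(') return "Te"; break;
    case 'e': if (a == '+') return "+Te";
              if (a == ')' || a == '$') return "";     /* E' -> epsilon */
              break;
    case 'T': if (a == 'i' || a == '(') return "Ft"; break;
    case 't': if (a == '*') return "*Ft";
              if (a == '+' || a == ')' || a == '$') return "";
              break;
    case 'F': if (a == 'i') return "i";
              if (a == '(') return "(E)";
              break;
    }
    return 0;                                          /* M[X,a] = error */
}

/* Table-driven predictive parser; returns 1 iff w is accepted.
   (Stack overflow is not checked in this sketch.) */
static int ll1_parse(const char *w)
{
    char stack[128];
    int top = 0;
    stack[top++] = '$';                 /* $ below the start symbol */
    stack[top++] = 'E';
    char a = *w ? *w++ : '$';
    while (top > 0) {
        char X = stack[--top];
        if (!is_nonterm(X)) {           /* X is a terminal or $ */
            if (X != a) return 0;
            if (X == '$') return 1;     /* X = a = $: success */
            a = *w ? *w++ : '$';        /* pop X, advance the input */
        } else {
            const char *s = prod(X, a);
            if (!s) return 0;           /* error entry */
            for (int i = (int)strlen(s) - 1; i >= 0; i--)
                stack[top++] = s[i];    /* push the RHS reversed */
        }
    }
    return 0;
}
```

For example, ll1_parse("i+i*i") accepts, tracing exactly the pop/push moves described above, while ll1_parse("i+") fails when T is on top of the stack and the lookahead is $.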
UNIT IV SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT
Syntax directed Definitions-Construction of Syntax Tree-Bottom-up Evaluation of S-Attribute
Definitions- Design of predictive translator - Type Systems-Specification of a simple type
checker-Equivalence of Type Expressions-Type Conversions.
Each grammar production X → α has associated with it a set of semantic rules of the form
a := f(b1, b2, ..., bk), where a is an attribute obtained from the function f.
2 What is meant by syntax directed definition?
It is a generalization of a CFG in which each grammar symbol has an associated set of
attributes, like synthesized attributes and inherited attributes.
3 How the value of synthesized attribute is computed?
It was computed from the values of attributes at the children of that node in the parse tree.
4 How the value of inherited attribute is computed?
It was computed from the value of attributes at the siblings and parent of that node.
5 What is meant by construction of a syntax tree for an expression?
Constructing a syntax tree for an expression means translating the expression into postfix
form. Nodes for each operator and operand are created. Each node can be implemented as a
record with multiple fields. The following functions are used in building a syntax tree for an
expression:
1. Mknode (op, left, right)
2. Mknode (id, entry)
3. Mkleaf (num, val)
6 What are the functions used in construction of a syntax tree for an expression? Explain.
mknode(op, left, right)
This function creates an interior node with the field op holding the operator as its
label, and two pointers, left and right, to its children.
mkleaf(id, entry)
This function creates an identifier leaf with label id and a pointer to the symbol-table
entry given by entry.
mkleaf(num, val)
This function creates a leaf for a number with label num and val holding the value of that
number.
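A minimal sketch of these tree-building functions; node records are plain tuples here, and the two leaf builders are given distinct names to keep the sketch simple:

```python
# Sketch of the tree-building functions above; node records are tuples.
def mknode(op, left, right):
    return ("op", op, left, right)      # interior node with operator label

def mkleaf_id(entry):
    return ("id", entry)                # leaf with symbol-table pointer

def mkleaf_num(val):
    return ("num", val)                 # leaf with numeric value

# Build the syntax tree for a - 4 + c, bottom-up:
p1 = mkleaf_id("entry-a")
p2 = mkleaf_num(4)
p3 = mknode("-", p1, p2)
p4 = mkleaf_id("entry-c")
p5 = mknode("+", p3, p4)
print(p5)
```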
7 What do you mean by DAG? (May 2016)
A DAG is a directed acyclic graph with the following labels on nodes
Leaves are labeled by unique identifiers, either variable names or constants.
Interior nodes are labeled by an operator symbol.
Nodes are given a sequence of identifiers for labels.
8 Construct a syntax tree and DAG for k:=k+5.
parameters of the procedure. Arguments, known as actual parameters, may be passed to
a called procedure, and they are substituted for the formals in the body.
20 Define activation trees.
Each node represents an activation of a procedure,
The root represents the activation of the main program
The node for a is the parent of the node for b if and only if control flows
from activation a to b,
The node for a is to the left of the node for b if and only if the lifetime of a occurs
before the lifetime of b.
21 Write notes on control stack?
A control stack is used to keep track of live procedure activations. The idea is to push the
node for activation onto the control stack as the activation begins and to pop the node
when the activation ends.
22 Write the scope of a declaration?
A portion of the program to which a declaration applies is called the scope of that
declaration. An occurrence of a name in a procedure is said to be local to the
procedure if it is in the scope of a declaration within the procedure; otherwise, the
occurrence is said to be nonlocal.
23 Define binding of names.
When an environment associates storage location s with a name x, we say that x is
bound to s; the association itself is referred to as a binding of x. A binding is the
dynamic counterpart of a declaration.
24 What is the use of run time storage?
The run time storage might be subdivided to hold: a) The generated target code b) Data
objects, and c) A counterpart of the control stack to keep track of procedure activation.
25 What is an activation record? (or) What is frame?
Information needed by a single execution of a procedure is managed using a contiguous
block of storage called an activation record or frame, consisting of the collection of
fields such as a) Return value b) Actual parameters c) Optional control link d) Optional
access link e) Saved machine status f) Local data g) Temporaries.
26 What does the runtime storage hold?
The generated target code
Data objects
A counter part of the control stack to keep track of procedure activations.
27 What are the various ways to pass a parameter in a function?
Call-by-value
Call-by-reference
Call-by-value-result (copy-restore): this method is a hybrid between call-by-value and
call-by-reference.
Call-by-name
28 What are the limitations of static allocation? (Nov 2012)
The size of a data object and constraints on its position in memory must be known at
compile time.
Recursive procedure is restricted.
Data structures cannot be created dynamically.
29 What is stack allocation?
Stack allocation is based on the idea of a control stack; storage is organized as a
stack, and activation records are pushed and popped as activations begin and end
respectively.
30 List the fields in activation record. (Nov 2014)
Actual parameters, Returned Values, Control link, Access link, Saved machine status, Local
data, Temporaries
31 What is dangling references? (May 2016)
Whenever storage can be de-allocated, the problem of dangling references arises. A
dangling reference occurs when there is a reference to storage that has been de-allocated.
32 Constructed a decorated parse tree according to the syntax directed definition, for the
following input statement ( 4+7.5*3)/2. (May 2015)
Indirect:
     Op   Arg1   Arg2   Triple   Statement
11   *    y             (0)      11
12   =    (11)          (1)      12
Triple:
     Op   Arg1   Arg2
(0)  &    x
(1)  =    a      (0)
Indirect:
     Op   Arg1   Arg2   Triple   Statement
11   &    x             (0)      11
12   =    a      (11)   (1)      12
PART B
1 Explain the concept of syntax directed definition.
A syntax-directed definition (SDD) is a context-free grammar with attributes attached
to grammar symbols and semantic rules attached to the productions.
The semantic rules define values for attributes associated with the symbols of the
productions.
These values can be computed by creating a parse tree for the input and then making
a sequence of passes over the parse tree, evaluating some or all of the rules on each
pass.
SDDs are useful for specifying translations.
CFG + semantic rules = Syntax Directed Definitions
A syntax-directed definition (SDD) is a context-free grammar together with attributes
and rules. Attributes are associated with grammar symbols and rules are associated
with productions.
If X is a symbol and a is one of its attributes, then we write X.a to denote the value of a
at a particular parse-tree node labeled X.
If we implement the nodes of the parse tree by records or objects, then the attributes
of X can be implemented by data fields in the records that represent the nodes for X.
Attributes may be of any kind: numbers, types, table references, or strings, for
instance.
The strings may even be long sequences of code, say code in the intermediate
language used by a compiler.
2 Construct parse tree, syntax tree and annotated parse tree for the input string 5*6+7;
Parse tree
Syntax tree
T → ( E )  { T.node = E.node; }
4 Write Syntax Directed Definition and evaluate 9*3+2 with parser stack using LR parsing
method.
Parse tree helps us to visualize the translation specified by SDD.
The rules of an SDD are applied by first constructing a parse tree and then using the
rules to evaluate all of the attributes at each of the nodes of the parse tree.
A parse tree, showing the value(s) of its attribute(s) is called an annotated parse tree.
With synthesized attributes, we can evaluate attributes in any bottom-up order, such
as that of a postorder traversal of the parse tree.
The second class of SDD's is called L-attributed definitions. The idea behind this class is that,
between the attributes associated with a production body, dependency-graph edges can go
from left to right, but not from right to left (hence "L-attributed").
More precisely, each attribute must be either
1. Synthesized, or
2. Inherited, but with the rules limited as follows.
Suppose that there is a production A → X1 X2 ... Xn, and that there is an inherited
attribute Xi.a computed by a rule associated with this production.
Then the rule may use only:
(a) Inherited attributes associated with the head A.
(b) Either inherited or synthesized attributes associated with the occurrences of symbols
X1, X2, ..., Xi-1 located to the left of Xi.
(c) Inherited or synthesized attributes associated with this occurrence of Xi itself, but only in
such a way that there are no cycles in a dependency graph formed by the attributes of this Xi.
7 (i) Given the Syntax-Directed Definition below construct the annotated parse tree for the
input declaration: int a, b, c.
D → T L        L.inh = T.type
T → int        T.type = integer
T → float      T.type = float
L → L1 , id    L1.inh = L.inh; addType(id.entry, L.inh)
L → id         addType(id.entry, L.inh)
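A sketch of how the inherited attribute L.inh flows for the input int a, b, c; the symbol table and the helper decl are simplified stand-ins, not part of the SDD itself:

```python
# Sketch of evaluating the declaration SDD above on "int a, b, c".
# L.inh is passed down unchanged (inherited); addType records each type.
symtab = {}

def addType(entry, t):
    symtab[entry] = t

def decl(type_word, ids):
    # D -> T L : L.inh = T.type
    inh = {"int": "integer", "float": "float"}[type_word]
    # L -> L1 , id | id : every id receives the same inherited L.inh
    for name in ids:
        addType(name, inh)

decl("int", ["a", "b", "c"])
print(symtab)   # every identifier is bound to type integer
```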
(ii) Given the Syntax-Directed Definition below with the synthesized attribute val, draw
the annotated parse tree for the expression (3+4) * (5+6).
L → E          L.val = E.val
E → T          E.val = T.val
E → E1 + T     E.val = E1.val + T.val
T → F          T.val = F.val
T → T1 * F     T.val = T1.val * F.val
F → ( E )      F.val = E.val
F → digit      F.val = digit.lexval
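The bottom-up evaluation of val can be mimicked with a small evaluator; in this sketch, recursive-descent functions stand in for the parse-tree nodes of the SDD above:

```python
# Bottom-up evaluation of the val attribute for (3+4)*(5+6).
def expr(s, i=0):
    """E -> E + T | T ; returns (E.val, next index)."""
    val, i = term(s, i)
    while i < len(s) and s[i] == "+":
        rhs, i = term(s, i + 1)
        val += rhs                      # E.val = E1.val + T.val
    return val, i

def term(s, i):
    """T -> T * F | F."""
    val, i = factor(s, i)
    while i < len(s) and s[i] == "*":
        rhs, i = factor(s, i + 1)
        val *= rhs                      # T.val = T1.val * F.val
    return val, i

def factor(s, i):
    """F -> ( E ) | digit."""
    if s[i] == "(":
        val, i = expr(s, i + 1)
        return val, i + 1               # skip ')' ; F.val = E.val
    return int(s[i]), i + 1             # F.val = digit.lexval

val, _ = expr(list("(3+4)*(5+6)"))
print(val)   # 77
```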
8 Explain the various structures that are used for the symbol table constructions.(May
2012,2014)
A separate array arr_lexemes holds the character strings forming the identifiers. Each
string is terminated by an end-of-string character, denoted by EOS, that may not
appear in identifiers.
Each entry in the symbol-table array arr_symbol_table is a record consisting of two
fields: lexeme_pointer, pointing to the beginning of a lexeme, and token.
Additional fields can hold attribute values. The 0th entry is left empty, because lookup
returns 0 to indicate that there is no entry for a string.
The 1st, 2nd, 3rd, 4th, 5th, 6th, and 7th entries are for a, plus, b, and, c,
minus, and d, where the 2nd, 4th and 6th entries are for reserved keywords.
List
Hash Table
Search Tree
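The array-plus-lexeme-buffer scheme described above can be sketched as follows; insert and the linear-scan lookup correspond to the list organization (hash table and search tree variants differ only in how lookup searches):

```python
# Sketch of the array-based scheme above: one character buffer holds all
# lexemes (EOS-terminated), and each table entry points into it.
EOS = "\0"
arr_lexemes = []                 # character buffer
arr_symbol_table = [None]        # entry 0 left empty: lookup returns 0 on miss

def insert(lexeme, token):
    pos = len(arr_lexemes)              # lexeme_pointer
    arr_lexemes.extend(lexeme + EOS)
    arr_symbol_table.append({"lexeme_pointer": pos, "token": token})
    return len(arr_symbol_table) - 1

def lookup(lexeme):
    for idx in range(1, len(arr_symbol_table)):
        p = arr_symbol_table[idx]["lexeme_pointer"]
        stored = ""
        while arr_lexemes[p] != EOS:    # read the stored lexeme back
            stored += arr_lexemes[p]; p += 1
        if stored == lexeme:
            return idx
    return 0                            # 0 means "no entry"

insert("a", "id"); insert("plus", "keyword"); insert("b", "id")
print(lookup("b"), lookup("z"))   # 3 0
```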
Expression
We prove that int x ; x = x + 5 ; is valid in the empty context ().
x : int => x : int x : int => 5 : int
-----------------------------------------
x : int => x + 5 : int
--------------------------
x : int => x = x + 5 ; valid
-------------------------------
() => int x ; x = x + 5 ; valid
The signature is omitted for simplicity.
Function types
No expression in the language has function types, because functions are never returned as
values or used as arguments.
However, the compiler needs internally a data structure for function types, to hold the types
of the parameters and the return type. E.g. for a function
bool between (int x, double a, double b) {...}
we write
between : (int, double, double) -> bool
to express this internal representation in typing rules.
In a static storage-allocation strategy, it is necessary to be able to decide at compile time
exactly where each data object will reside at run time. In order to make such a decision, at
least two criteria must be met:
1. The size of each object must be known at compile time.
2. Only one occurrence of each object is allowable at a given moment during program
execution.
A static storage-allocation strategy is very simple to implement.
An object address can be either an absolute or a relative address.
DYNAMIC STORAGE ALLOCATION
In a dynamic storage-allocation strategy, the data area requirements for a program are not
known entirely at compilation time. In particular, the two criteria that were given in the
previous section as necessary for static storage allocation do not apply for a dynamic storage-
allocation scheme. The size and number of each object need not be known at compile time;
however, they must be known at run time when a block is entered. Similarly more than one
occurrence of a data object is allowed, provided that each new occurrence is initiated at run
time when a block is entered.
11 Explain about static and stack allocation in storage allocation strategies
In a static storage-allocation strategy, it is necessary to be able to decide at compile time
exactly where each data object will reside at run time. In order to make such a decision, at
least two criteria must be met:
1. The size of each object must be known at compile time.
2. Only one occurrence of each object is allowable at a given moment during program
execution.
In three-address code, a complicated expression is broken down into several separate
instructions. These instructions translate more easily to assembly language. It is also easier
to detect common sub-expressions for shortening the code.
Example (computing x := (sqrt(b*b - 4*a*c) - b) / (2*a)):
t1 := b * b
t2 := 4 * a
t3 := t2 * c
t4 := t1 - t3
t5 := sqrt(t4)
t6 := 0 - b
t7 := t5 + t6
t8 := 2 * a
t9 := t7 / t8
x := t9
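The temporary sequence above computes one root of a*x^2 + b*x + c = 0, i.e. x = (sqrt(b*b - 4*a*c) - b) / (2*a); re-running it for a sample quadratic confirms this:

```python
# Replaying the three-address sequence above for a = 1, b = -3, c = 2
# (roots of x^2 - 3x + 2 are 1 and 2; this sequence yields the larger one).
from math import sqrt

a, b, c = 1.0, -3.0, 2.0
t1 = b * b
t2 = 4 * a
t3 = t2 * c
t4 = t1 - t3
t5 = sqrt(t4)
t6 = 0 - b
t7 = t5 + t6
t8 = 2 * a
t9 = t7 / t8
x = t9
print(x)   # 2.0, and indeed 1*2*2 - 3*2 + 2 == 0
```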
(iii) Determine the address of A[3,5], where all are integer arrays, with the size of A as 10*10
and B as 10*10, with element width k=2 and the start index position of all arrays at 1 (assume
the base addresses). (4) (May 2015)
Array indexing- In order to access the elements of array either single dimension or
multidimension, three address code requires base address and offset value.
Base address consists of the address of first element in an array.
Other elements of the array can be accessed using the base address and offset value.
Example: x = y[i]
Memory location m = base address of y + displacement i
x = contents of memory location m
Similarly, for x[i] = y:
Memory location m = base address of x + displacement i
The value of y is stored in memory location m.
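For part (iii), assuming a base address of 1000 (the question allows the base to be assumed) and row-major layout, the address works out as follows:

```python
# Worked computation for part (iii): A is 10 x 10, element width k = 2,
# indexing starts at 1, base address assumed to be 1000. Row-major formula:
#   addr(A[i][j]) = base + ((i - low1) * n2 + (j - low2)) * k
base, low1, low2, n2, k = 1000, 1, 1, 10, 2
i, j = 3, 5
addr = base + ((i - low1) * n2 + (j - low2)) * k
print(addr)   # 1000 + (2*10 + 4)*2 = 1048
```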
15 Apply Back-patching to generate intermediate code for the following input.
x:=2+y;
If x<y then x:=x+y;
repeat y:=y*2;
while x>10 do x:=x/2;
A key problem when generating code for boolean expressions and flow-of-control
statements is that of matching a jump instruction with the target of the jump. For example,
the translation of the boolean expression B in if (B) S contains a jump, for when B is false, to
the instruction following the code for S. In a one-pass translation, B must be translated
before S is examined. What then is the target of the goto that jumps over the code for S?
This problem is addressed by passing labels as inherited attributes to where the relevant
jump instructions were generated, but a separate pass is then needed to bind labels to
addresses. This section takes a complementary approach, called backpatching, in which lists
of jumps are passed as synthesized attributes. Specifically, when a jump is generated, the
target of the jump is temporarily left unspecified. Each such jump is put on a list of jumps
whose labels are to be filled in when the proper label can be determined. All of the jumps on
a list have the same target label.
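The lists of unfilled jumps can be sketched with makelist, merge and backpatch as follows; the quadruple strings and the if (x < y) fragment are illustrative:

```python
# Sketch of backpatching: jump targets are left blank ("_"), collected on
# lists, and filled in once the proper label is known.
code = []                                  # emitted quadruples

def emit(instr):
    code.append(instr)
    return len(code) - 1                   # index of the new instruction

def makelist(i):
    return [i]                             # list holding one jump to patch

def merge(l1, l2):
    return l1 + l2

def backpatch(lst, target):
    for i in lst:                          # fill in the missing target
        code[i] = code[i].replace("_", str(target))

# if (x < y) then x := x + y : the jumps are generated with holes ...
truelist  = makelist(emit("if x < y goto _"))
falselist = makelist(emit("goto _"))
backpatch(truelist, len(code))             # ... and patched afterwards
emit("x = x + y")                          # code for the then-part
backpatch(falselist, len(code))
print(code)
```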
(ii) What is an Activation Record? Explain how its relevant to the intermediate code
generation phase with respect to procedure declarations. (4) (April/May 2015)
Modern imperative programming languages typically have local variables.
Created upon entry to function.
Destroyed when function returns.
Each invocation of a function has its own instantiation of local variables.
Recursive calls to a function require several instantiations to exist simultaneously.
Functions return only after all functions they call have returned: last-in-first-out
(LIFO) behavior.
A LIFO structure called a stack is used to hold each instantiation.
The portion of the stack used for an invocation of a function is called the function's
stack frame or activation record.
The Stack
Used to hold local variables.
Large array which typically grows downwards in memory toward lower addresses,
shrinks upwards.
Push(r1):
stack_pointer--;
M[stack_pointer] = r1;
r1 = Pop():
r1 = M[stack_pointer];
stack_pointer++;
Previous activation records need to be accessed, so push/pop not sufficient.
Treat stack as array with index off of stack pointer.
Push and pop entire activation records.
16 (i)Construct a syntax directed definition for constructing a syntax tree for assignment
statements (8) (May 2016)
S → id := E     { S.nptr := mknode(':=', mkleaf(id, id.entry), E.nptr) }
E → E1 + E2     { E.nptr := mknode('+', E1.nptr, E2.nptr) }
E → E1 * E2     { E.nptr := mknode('*', E1.nptr, E2.nptr) }
E → - E1        { E.nptr := mknode('uminus', E1.nptr) }
E → ( E1 )      { E.nptr := E1.nptr }
E → id          { E.nptr := mkleaf(id, id.entry) }
44 Write three address code sequence for the assignment statement d:=(a-b)+(a-c)+(a-c)
(May2016)
t1:=a
t2:=t1-b
t3:=t1-c
t4:=t2+t3
t5:=t4+t3
d:=t5
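A quick check that the sequence reuses t3 for the repeated sub-expression (a-c) and still computes d correctly:

```python
# Replaying the three-address sequence above for sample operand values and
# comparing against the source expression d = (a-b) + (a-c) + (a-c).
a, b, c = 10, 4, 7
t1 = a
t2 = t1 - b
t3 = t1 - c        # (a - c), computed once and reused below
t4 = t2 + t3
t5 = t4 + t3       # second use of t3 instead of recomputing (a - c)
d = t5
print(d, (a - b) + (a - c) + (a - c))   # both 12
```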
PART B
1 Explain the principle sources of optimization in detail. (May 2016)
Transformations can be
Local : look within basic block
Global : look across blocks
Transformations should preserve function of program.
Function-preserving transformations include
Common sub expression elimination
Copy propagation
Dead-code elimination
Constant-folding
Common Sub expression Elimination
Occurrence of expression E is called common sub expression if
E was previously computed, and
values of variables in E have not changed since previous Computation
Copy Propagation
Statement of the form f := g is called a copy statement.
The idea is to use g instead of f in subsequent statements.
Doesn't help by itself, but can combine with other transformations to help eliminate code.
Dead-Code Elimination
Variable that is no longer live (subsequently used) is called dead.
Copy propagation often turns copy statement into dead code:
Loop Optimizations
Biggest speedups often come from moving code out of inner loop
Three techniques
Code motion
Induction-variable elimination
Reduction in strength
Code Motion
Expression whose value doesn't change inside loop is called a loop-invariant
Code motion moves loop-invariants outside loop
There is a node in the DAG for each of the initial values of the variables appearing in
the basic block.
There is a node N associated with each statement s within the block.
The children of N are those nodes corresponding to statements that are the last
definitions, prior to s, of the operands used by s.
Node N is labelled by the operator applied at s, and also attached to N is the list of
variables for which it is the last definition within the block.
Certain nodes are designated output nodes. These are the nodes whose variables are
live on exit from the block;
t2 = t1 + j
t3 = t2 * 4
t4 = &c + t3
t12 = t1 + k
t13 = t12 * 4
t14 = &a + t13
t21 = k * N
t22 = t21 + j
t23 = t22 * 4
t24 = &b + t23
t31 = *t14 * *t24
*t4 = *t4 + t31
k=k+1
if( k < N) goto L1
t1 = i * N
t2 = t1 + j
t3 = t2 * 4
t4 = &c + t3
L1: t12 = t1 + k
t13 = t12 * 4
t14 = &a + t13
t21 = k * N
t22 = t21 + j
t23 = t22 * 4
t24 = &b + t23
t31 = *t14 * *t24
*t4 = *t4 + t31
k=k+1
if( k < N) goto L1
6 (i)Explain the issues in design of code generator.
(ii)Explain peephole optimization.
Sophisticated compilers typically perform multiple passes over various intermediate forms.
This multi-stage process is used because many algorithms for code optimization are easier to
apply one at a time, or because the input to one optimization relies on the completed
processing performed by another optimization. This organization also facilitates the creation
of a single compiler that can target multiple architectures, as only the last of the code
generation stages (the backend) needs to change from target to target.
Example
MOV R0, x
MOV R1, y
MUL R0, R1
MOV t1, R0
MOV R0, t1
MOV R1, z
ADD R0, R1
MOV t2, R0
MOV R0, x
MOV R1, x
ADD R0, R1
MOV x, R0
MOV R0, y
MOV R1, y
SUB R0, R1
MOV y, R0
OUT x
MOV z, y
OUT z
8 Write detailed notes on Basic blocks and flow graphs.
A graph representation of intermediate code.
Basic block properties
The flow of control can only enter the basic block through the first instruction in the
block.
No jumps into the middle of the block.
Control leaves the block without halting or branching (except possibly at the last
instruction of the block).
The basic blocks become the nodes of a flow graph, whose edges indicate which blocks
can follow which other blocks.
flow graphs
A control flow graph (CFG) in computer science is a representation, using graph notation, of
all paths that might be traversed through a program during its execution. In a control flow
graph each node in the graph represents a basic block, i.e. a straight-line piece of code without
any jumps or jump targets; jump targets start a block, and jumps end a block. Directed edges
are used to represent jumps in the control flow. There are, in most presentations, two
specially designated blocks: the entry block, through which control enters into the flow graph,
and the exit block, through which all control flow leaves.
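The usual leader-based partition (the first instruction, every jump target, and every instruction following a jump are leaders; each block runs from a leader to the next) can be sketched as follows; the instruction format is illustrative:

```python
# Sketch of partitioning three-address code into basic blocks via leaders.
def basic_blocks(instrs):
    leaders = {0}                          # first instruction is a leader
    for i, ins in enumerate(instrs):
        if "goto" in ins:
            target = int(ins.split("goto")[1])
            leaders.add(target)            # target of a jump starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)         # instruction after a jump too
    cuts = sorted(leaders) + [len(instrs)]
    return [instrs[cuts[k]:cuts[k+1]] for k in range(len(cuts) - 1)]

prog = ["i = 1",             # 0
        "t = i * 8",         # 1  <- jump target, starts the loop block
        "i = i + 1",         # 2
        "if i <= 10 goto 1", # 3
        "halt"]              # 4
print(basic_blocks(prog))
```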
9 Define a Directed Acyclic Graph. Construct a DAG and write the sequence of instructions
for the expression a+a*(b-c)+(b-c)*d. (May 2014)
DAG
A representation to assist in code reordering.
Nodes are operations
Edges represent dependences
Nodes are labeled as follows:
Leaves are labeled with variables or constants; subscript 0 is used to distinguish the
initial value of a variable from other values.
Interior nodes with operators and list of variables whose values are computed by the
node.
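DAG construction with sharing can be sketched by value numbering: an (op, left, right) triple that already exists is reused rather than duplicated, so the two occurrences of (b-c) in a+a*(b-c)+(b-c)*d map to a single node:

```python
# Sketch of DAG construction by value numbering for a + a*(b-c) + (b-c)*d.
nodes, index = [], {}

def node(op, l=None, r=None):
    key = (op, l, r)
    if key not in index:                 # create only if not seen before
        index[key] = len(nodes)
        nodes.append(key)
    return index[key]

a, b, c, d = node("a"), node("b"), node("c"), node("d")
bc1 = node("-", b, c)
bc2 = node("-", b, c)                    # shared: same node as bc1
t1 = node("*", a, bc1)
t2 = node("+", a, t1)
t3 = node("*", bc2, d)
root = node("+", t2, t3)
print(bc1 == bc2, len(nodes))   # True 9
```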
Loop nodes: 2, 3, 5
Header node: 2
Loop back edge: 5 → 2 (tail → head)
(ii).Explain Local optimization.
Loop Invariant Code Motion
If a computation produces the same value in every loop iteration, move it out of the loop.
Before:
for i = 1 to N
x = x + 1
for j = 1 to N
a(i,j) = 100*N + 10*i + j + x
After moving the invariant 100*N out of the loop:
t1 = 100*N
for i = 1 to N
x = x + 1
for j = 1 to N
a(i,j) = t1 + 10*i + j + x
12 (i) Write the code generation algorithm using dynamic programming and generate code for
the statement x=a/(b-c)-s*(e+f) [Assume all instructions to be unit cost] (12)
(ii) What are the advantages of DAG representation ? Give example.(4) (May 2015)
Goal: Generate optimal code for broad class of register machines
Machine Model:
k interchangeable registers r0, r1, ..., r(k-1).
Instructions are of the form ri := E, where E is an expression containing operators,
registers, and memory locations (denoted M).
Every instruction has an associated cost, measured by C().
Cost Vector: C(E) = (c0, c1, ..., cr) is defined for an expression E, where:
C0: cost of computing E into memory, with the use of an unbounded number of regs
Ci: cost of computing E into a register, with the use of up to i regs
(ii)What are the advantages of DAG representation ? Give example.(4) (April/May 2015)
Determining the common sub-expressions.
Determining which names are used inside the block and computed
outside the block.
Determining which statements of the block could have their computed values used
outside the block.
Simplifying the list of quadruples by eliminating the common sub-expressions and not
performing assignments of the form x := y unless necessary.
13 (i) Write the procedure to perform Register Allocation and Assignment with Graph
Coloring. (8)
Two passes are used
Target-machine instructions are selected as though there are an infinite number
of symbolic registers
Assign physical registers to symbolic ones
Create a register-interference graph
Nodes are symbolic registers, and an edge connects two nodes if one is live
at a point where the other is defined.
For example, in the previous example an edge connects a and d in the
graph.
Use a graph coloring algorithm to assign registers.
The Register Interference Graph
Two temporaries that are live simultaneously cannot be allocated in the same register
We construct an undirected graph
A node for each temporary
An edge between t1 and t2 if they are live simultaneously at some point in the program
This is the register interference graph (RIG)
Two temporaries can be allocated to the same register if there is no edge connecting them
example:
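A greedy coloring sketch of the RIG idea above; real allocators pick a better elimination order and add spill heuristics, and the three temporaries here are illustrative:

```python
# Greedy sketch of register allocation by coloring the RIG: temporaries
# are nodes, edges mean "live at the same time".
def color_rig(nodes, edges, k):
    """Assign each node a register 0..k-1, or return None if k won't do."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    assignment = {}
    for n in nodes:                      # (real allocators order this better)
        used = {assignment[m] for m in adj[n] if m in assignment}
        free = [r for r in range(k) if r not in used]
        if not free:
            return None                  # would require a spill
        assignment[n] = free[0]          # lowest register not used by a neighbor
    return assignment

# t1-t2 and t2-t3 interfere; t1 and t3 do not, so 2 registers suffice.
regs = color_rig(["t1", "t2", "t3"], [("t1", "t2"), ("t2", "t3")], 2)
print(regs)
```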
In this section we assume we are using an n-register machine with instructions of the
form
o LD reg, mem
o ST mem, reg
to evaluate expressions.
o Numbers, called Ershov numbers, can be assigned to label the nodes of an expression
tree. A node's label gives the minimum number of registers needed to evaluate, on a
register machine, the expression generated by that node with no spills. A spill is a store
instruction that gets generated when there are no empty registers and a register is needed
to perform a computation. The labels are computed as follows:
1. The label of any leaf is 1.
2. The label of an interior node with one child is the label of its child.
3. The label of an interior node with two children is the larger of the labels of its children
if these labels are different; otherwise, it is one plus the label of the left child.
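The labelling rules can be sketched directly and applied to the expression tree for (a - b) + e * (c + d), an illustrative example:

```python
# Sketch of Ershov-number labelling per the rules above; trees are tuples
# (op, left, right), and any string is a leaf.
def label(node):
    if isinstance(node, str):
        return 1                           # rule 1: every leaf is labelled 1
    op, left, right = node
    l, r = label(left), label(right)
    return max(l, r) if l != r else l + 1  # rule 3: interior, two children

tree = ("+", ("-", "a", "b"), ("*", "e", ("+", "c", "d")))
print(label(tree))   # 3: three registers evaluate this tree with no spills
```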
14 Perform analysis of available expressions on the following code by converting into basic
blocks and compute global common sub expression elimination.
I. I:=0
II. A:=n-3
III. If i<a then loop else end
IV. Label loop
V. B:=i*4
VI. C:=p+b
VII. D:=m[c]
VIII. E:=d-2
IX. F:=i*4
X. G:=p+f
XI. M[g]:=e
XII. I:=i+1
XIII.A:=n-3
XIV. If i<a then loop else end
XV. Label end (April/May 2015)
Two operations are common if they produce the same result. In such a case, it is likely more
efficient to compute the result once and reference it the second time rather than re-evaluate
it. An expression is alive if the operands used to compute the expression have not been
changed. An expression that is no longer alive is dead.
main()
{
int x, y, z;
x = (1+20)* -x;
y = x*x+(x/y);
y = z = (x/y)/(x*x);
}
straight translation:
tmp1 = 1 + 20 ;
tmp2 = -x ;
x = tmp1 * tmp2 ;
tmp3 = x * x ;
tmp4 = x / y ;
y = tmp3 + tmp4 ;
tmp5 = x / y ;
tmp6 = x * x ;
z = tmp5 / tmp6 ;
y=z;
Here is an optimized version, after constant folding and propagation and elimination of
common sub-expressions:
tmp2 = -x ;
x = 21 * tmp2 ;
tmp3 = x * x ;
tmp4 = x / y ;
y = tmp3 + tmp4 ;
tmp5 = x / y ;
z = tmp5 / tmp3 ;
y=z;
m[g]:=e
i:=i+1
a:=n-3
(ii)What are the optimization technique applied on procedure calls? Explain with example
(6) (May 2015)
Optimizations performed exclusively within a basic block are called "local optimizations".
These are typically the easiest to perform since we do not consider any control flow
information, we just work with the statements within the block. Many of the local
optimizations we will discuss have corresponding global optimizations that operate on the
same principle, but require additional analysis to perform.
Induction variable analysis
If a variable in a loop is a simple linear function of the index variable, such as j := 4*i + 1, it can
be updated appropriately each time the loop variable is changed. This is a strength reduction,
and also may allow the index variable's definitions to become dead code. This information is
also useful for bounds-checking elimination and dependence analysis, among other things.
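The j := 4*i + 1 transformation above can be illustrated directly: before computes the multiply on every iteration, while after maintains j incrementally with one addition:

```python
# Strength reduction of j := 4*i + 1 inside a loop, as described above.
def before(n):
    out = []
    for i in range(n):
        j = 4 * i + 1                    # multiply on every iteration
        out.append(j)
    return out

def after(n):
    out = []
    j = 1                                # j tracks 4*i + 1 incrementally
    for i in range(n):
        out.append(j)
        j += 4                           # one addition replaces the multiply
    return out

print(before(5) == after(5))   # True
```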
16 (b) (i) Explain various issues in the design of code generator.(8) (May 2016)
(ii) Write note on simple code generator.(8)
Transformations can be
Local : look within basic block
Global : look across blocks
Transformations should preserve function of program.
Function-preserving transformations include
Common sub expression elimination
Copy propagation
Dead-code elimination
Constant-folding
Common Sub expression Elimination
Occurrence of expression E is called common sub expression if
E was previously computed, and
values of variables in E have not changed since previous Computation
Copy Propagation
Statement of form f := g is called a copy statement
The idea is to use g instead of f in subsequent statements.
Doesn't help by itself, but can combine with other transformations to help eliminate code.
Dead-Code Elimination
Variable that is no longer live (subsequently used) is called dead.
Copy propagation often turns copy statement into dead code:
Loop Optimizations
Biggest speedups often come from moving code out of inner loop
Three techniques
Code motion
Induction-variable elimination
Reduction in strength
Code Motion
Expression whose value doesn't change inside loop is called a loop-invariant
Code motion moves loop-invariants outside loop
Sophisticated compilers typically perform multiple passes over various intermediate forms.
This multi-stage process is used because many algorithms for code optimization are easier to
apply one at a time, or because the input to one optimization relies on the completed
processing performed by another optimization. This organization also facilitates the creation
of a single compiler that can target multiple architectures, as only the last of the code
generation stages (the backend) needs to change from target to target.
Example
MOV R0, x
MOV R1, y
MUL R0, R1
MOV t1, R0
MOV R0, t1
MOV R1, z
ADD R0, R1
MOV t2, R0
MOV R0, x
MOV R1, x
ADD R0, R1
MOV x, R0
MOV R0, y
MOV R1, y
SUB R0,R1
MOV y, R0
OUT x
MOV z,y
OUT z
17 (i)Construct the DAG for the following basic block (8)
d:=b*c
e:=a+b
b:=b*c
a:=e-d
(ii) How to trace data-flow analysis of structured program? (8)
18 Write a Grammar and translate schema for procedure call statements. (12)
19 Generate DAG representation of the following code and list out the applications of DAG
representation
i = 1; while (i <= 10) do
sum += a[i] (8)