
CD Unit I Part I Introduction


Compiler Design

Dr.S.Gopi Krishna
Professor in CSE
Agenda
UNIT I : PART I
1. Language Processors
2. Structure of a Compiler
3. Evolution of Programming Languages
4. The Science of Building a Compiler
5. Applications of Compiler Technology
6. Programming Language Basics
UNIT I : PART I
1. Language Processors
 Compiler
 Language Processing System
Language Processors

 Programming languages are notations for describing computations to people and to machines.

 But, before a program can be run, it first must be translated into a form in which it can be executed by a computer.

 The software systems that do this translation are called compilers.

Compiler

 Simply stated, a compiler is a program that can read a program in one language — the source language — and translate it into an equivalent program in another language — the target language.

 If the target program is an executable machine-language program, it can then be called by the user to process inputs and produce outputs.

 An important role of the compiler is to report any errors in the source program that it detects during the translation process.

Language Processing System

 Several other programs may be required to create an executable target program:

1. Preprocessor
2. Compiler
3. Assembler
4. Linker/Loader

1. Preprocessor

 The task of collecting the source program is sometimes entrusted to a separate program, called a preprocessor.

 The preprocessor may also expand shorthands, called macros, into source-language statements.

 Preprocessors produce input to compilers.

 They may perform the following functions:
1. Macro processing
2. File inclusion
3. "Rational" preprocessing
4. Language extensions
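Macro processing, the first function above, can be illustrated with a minimal Python sketch (not from the slides). It expands parameterless macros declared with a hypothetical C-style `#define` directive before the text reaches the compiler proper:

```python
import re

# Minimal sketch of a text-substitution preprocessor: it records
# "#define NAME body" lines and replaces later whole-word uses of
# NAME with body, producing the expanded input for the compiler.
def preprocess(source: str) -> str:
    macros = {}
    output_lines = []
    for line in source.splitlines():
        m = re.match(r"#define\s+(\w+)\s+(.*)", line)
        if m:
            macros[m.group(1)] = m.group(2)      # record the macro
        else:
            for name, body in macros.items():    # expand known macros
                line = re.sub(rf"\b{name}\b", body, line)
            output_lines.append(line)
    return "\n".join(output_lines)

print(preprocess("#define BUFSIZE 512\nint buf[BUFSIZE];"))
# int buf[512];
```

A real preprocessor (such as C's) also handles parameterized macros, file inclusion, and conditional compilation; this sketch covers only simple substitution.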
2. Compiler

 The (modified or pre-processed) source program is then fed to a compiler.

 The compiler may produce an assembly-language program as its output, because assembly language is easier to produce as output and is easier to debug.

3. Assembler

 The assembly language is then processed by a program called an assembler that produces relocatable machine code as its output.

 Assembly code is a mnemonic version of machine code, in which names are used instead of binary codes for operations, and names are also given to memory addresses.

4. Linker/Loader

 The linker resolves external memory addresses, where the code in one file may refer to a location in another file.

 The loader then puts all of the executable object files together into memory for execution.
UNIT I : PART I
2. Structure of a Compiler
 Analysis - Synthesis
 Phases of a Compiler
Structure of a Compiler

 A compiler maps a source program into a semantically equivalent target program. A compiler breaks into two parts:

1. Analysis and
2. Synthesis.
1. Analysis
The analysis part, often called the front end of the compiler,

 breaks up the source program into constituent pieces and imposes a grammatical structure on them. It then uses this structure to create an intermediate representation of the source program.

 if it detects that the source program is either syntactically ill formed or semantically unsound, provides informative messages, so the user can take corrective action.

 collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part.
2. Synthesis

 The synthesis part, often called the back end of the compiler, constructs the desired target program from the intermediate representation and the information in the symbol table.
Phases of a Compiler

 A compiler operates as a sequence of phases, each of which transforms one representation of the source program to another.

 A typical decomposition of a compiler into phases: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation, with symbol-table management spanning all phases.
1 Lexical Analysis
The first phase of a compiler is called lexical analysis or scanning (scanner).

The lexical analyzer
o reads the stream of characters making up the source program,
o groups the characters into meaningful sequences called lexemes, and
o produces output as a sequence of tokens.

For example, suppose a source program contains the assignment statement

position = initial + rate * 60

After lexical analysis, the assignment statement is represented as the sequence of tokens
(id,1) (=) (id,2) (+) (id,3) (*) (60)
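The grouping of characters into tokens can be sketched in a few lines of Python (an illustration, not the slides' implementation). Identifiers are entered into a symbol table and replaced by (id, index) pairs, reproducing the token sequence above:

```python
import re

# Sketch of a scanner: group characters into lexemes and emit tokens.
# Identifiers become ("id", n) where n is a 1-based symbol-table index.
TOKEN_SPEC = [
    ("id",  r"[A-Za-z_]\w*"),   # names
    ("num", r"\d+"),            # integer constants
    ("op",  r"[=+\-*/]"),       # operators
    ("ws",  r"\s+"),            # whitespace, discarded
]

def scan(source):
    symtab, tokens = [], []
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    for m in re.finditer(pattern, source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ws":
            continue                      # whitespace separates lexemes
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)     # first occurrence: new entry
            tokens.append(("id", symtab.index(lexeme) + 1))
        else:
            tokens.append((lexeme,))      # constants and operators
    return tokens, symtab

tokens, symtab = scan("position = initial + rate * 60")
print(tokens)
# [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('60',)]
```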
2 Syntax Analysis
The second phase of the compiler is syntax analysis or parsing (parser).

 The parser creates a tree-like intermediate representation that depicts the grammatical structure of the token stream.

 A typical representation is a syntax tree in which
• each interior node represents an operation and
• the children of the node represent the arguments of the operation.

 A syntax tree for the token stream is the output of the syntactic analyzer. This tree shows the order in which the operations in the assignment position = initial + rate * 60 are to be performed.
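A small recursive-descent parser (an illustrative Python sketch) can build that syntax tree, with * binding tighter than + so the multiplication is performed first:

```python
# Sketch of a parser for assignments of the form  name = expr , where
# expr uses + and *. Interior tuples are operator nodes; leaves are
# names or numbers. Precedence is encoded by the rule nesting: terms
# (handling *) are parsed inside expressions (handling +).
def parse_expr(toks, i):
    node, i = parse_term(toks, i)
    while i < len(toks) and toks[i] == "+":
        right, i = parse_term(toks, i + 1)
        node = ("+", node, right)            # left-associative addition
    return node, i

def parse_term(toks, i):
    node = toks[i]                           # a name or number is a leaf
    i += 1
    while i < len(toks) and toks[i] == "*":
        node = ("*", node, toks[i + 1])      # * grabs its operands first
        i += 2
    return node, i

def parse_assignment(toks):
    expr, _ = parse_expr(toks, 2)            # skip "name" and "="
    return ("=", toks[0], expr)

tree = parse_assignment(["position", "=", "initial", "+", "rate", "*", "60"])
print(tree)
# ('=', 'position', ('+', 'initial', ('*', 'rate', '60')))
```

The nesting of the output mirrors the syntax tree: rate * 60 is a subtree of the + node, which is in turn the right child of the assignment.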
3 Semantic Analysis

 An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands.

 The language specification may permit some type conversions called coercions.

 Notice that the output of the semantic analyzer has an extra node for the operator inttofloat, which explicitly converts its integer argument into a floating-point number.
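The insertion of that inttofloat node can be sketched as a tree walk (illustrative Python; the type environment is assumed, not part of the slides):

```python
# Sketch of coercion insertion: when an operator mixes an int operand
# with a float operand, wrap the int side in an explicit conversion
# node, as the semantic analyzer does for  rate * 60 .
def coerce(node, types):
    if isinstance(node, tuple):                  # operator node (op, l, r)
        op, l, r = node
        l, lt = coerce(l, types)
        r, rt = coerce(r, types)
        if lt != rt:                             # mixed int/float operands
            if lt == "int":
                l, lt = ("inttofloat", l), "float"
            else:
                r, rt = ("inttofloat", r), "float"
        return (op, l, r), lt
    return node, types.get(node, "int")          # leaf: declared type

types = {"initial": "float", "rate": "float", "60": "int"}
tree, t = coerce(("+", "initial", ("*", "rate", "60")), types)
print(tree)
# ('+', 'initial', ('*', 'rate', ('inttofloat', '60')))
```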
4 Intermediate Code Generation

 In the process of translating a source program into target code, a compiler may construct one or more intermediate representations, which can have a variety of forms.

 We consider an intermediate form called three-address code, which consists of a sequence of assembly-like instructions with three operands per instruction.

 Each operand can act like a register.
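Generating three-address code from the annotated syntax tree can be sketched as another tree walk (illustrative Python): each operator node gets a fresh temporary t1, t2, ... and one instruction.

```python
# Sketch of three-address code generation: post-order walk of the tree,
# emitting one assembly-like instruction per interior node.
def three_address_code(tree):
    code, counter = [], [0]

    def new_temp():
        counter[0] += 1
        return f"t{counter[0]}"

    def gen(node):
        if not isinstance(node, tuple):
            return node                       # leaf: a name or constant
        if node[0] == "inttofloat":           # unary conversion node
            t = new_temp()
            code.append(f"{t} = inttofloat {gen(node[1])}")
            return t
        op, l, r = node
        la, ra = gen(l), gen(r)
        if op == "=":
            code.append(f"{la} = {ra}")       # store into the target name
            return la
        t = new_temp()
        code.append(f"{t} = {la} {op} {ra}")
        return t

    gen(tree)
    return code

for line in three_address_code(
        ("=", "position",
         ("+", "initial", ("*", "rate", ("inttofloat", "60"))))):
    print(line)
# t1 = inttofloat 60
# t2 = rate * t1
# t3 = initial + t2
# position = t3
```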


5 Code Optimization

 The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result.

 Usually better means faster, but other objectives may be desired, such as shorter code, or target code that consumes less power.

 The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 by the floating-point number 60.0.
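That particular deduction is a constant-folding transformation, sketched here in Python (illustrative, over the tuple-shaped trees used above):

```python
# Sketch of constant folding: an inttofloat applied to an integer
# constant is evaluated at compile time, so no conversion remains in
# the code that runs.
def fold(node):
    if not isinstance(node, tuple):
        return node                           # leaf: nothing to fold
    if (node[0] == "inttofloat"
            and isinstance(node[1], str) and node[1].isdigit()):
        return str(float(node[1]))            # 60 -> 60.0, done now
    # otherwise keep the operator and fold each child
    return tuple(fold(c) if i else c for i, c in enumerate(node))

tree = ("=", "position",
        ("+", "initial", ("*", "rate", ("inttofloat", "60"))))
print(fold(tree))
# ('=', 'position', ('+', 'initial', ('*', 'rate', '60.0')))
```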
6 Code Generation

 The code generator takes as input an intermediate representation of the source program and maps it into the target language.

 The intermediate instructions are translated into sequences of machine instructions that perform the same task.

 A crucial aspect of code generation is the judicious assignment of registers to hold variables. For example, using registers R1 and R2, the intermediate code might be translated into machine code.
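For the running example, a translation along the lines of the Dragon Book's illustration can be sketched as follows (the opcode names LDF/MULF/ADDF/STF denote floating-point load, multiply, add, and store on a hypothetical two-register machine, not a real instruction set):

```python
# Sketch: the optimized three-address code mapped onto two registers.
# id1, id2, id3 are the symbol-table entries for position, initial, rate.
def naive_codegen():
    return [
        "LDF  R2, id3",        # R2 <- rate
        "MULF R2, R2, #60.0",  # R2 <- R2 * 60.0 (coercion already folded)
        "LDF  R1, id2",        # R1 <- initial
        "ADDF R1, R1, R2",     # R1 <- R1 + R2
        "STF  id1, R1",        # position <- R1
    ]

for instr in naive_codegen():
    print(instr)
```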
Symbol-Table Management

 The symbol table is a data structure which stores information about all variables/identifiers.

 The data structure should be designed to allow the compiler
 to find the record for each name quickly and
 to store or retrieve data from that record quickly.

 Lexical and syntax analysis fill up the symbol table, and the code optimizer and code generator make use of it.
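A common realization of these requirements is a hash table keyed by name, sketched here in Python (dicts give the fast lookup the slide asks for; the attribute names are illustrative):

```python
# Sketch of a symbol table: names map to records of attributes that
# different phases fill in and read back (type, storage location, ...).
class SymbolTable:
    def __init__(self):
        self._records = {}

    def insert(self, name, **attrs):
        # create the record on first sight, then merge in new attributes
        self._records.setdefault(name, {}).update(attrs)

    def lookup(self, name):
        return self._records.get(name)        # None if undeclared

table = SymbolTable()
table.insert("rate", type="float")            # filled during analysis
table.insert("rate", location="R2")           # refined by code generation
print(table.lookup("rate"))
# {'type': 'float', 'location': 'R2'}
```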
UNIT I : PART I
3. Evolution of Programming Languages
 The Move to Higher-level Languages
 Impacts on Compilers
The Evolution of Programming Languages
The first electronic computers appeared in the 1940's and were programmed in machine language, by sequences of 0's and 1's that explicitly told the computer what operations to execute and in what order.

 The operations themselves were very low level:
 move data from one location to another,
 add the contents of two registers,
 compare two values, and
 so on.

Needless to say, this kind of programming was slow, tedious, and error prone. And once written, the programs were hard to understand and modify.

1. The Move to Higher-level Languages
2. Impacts on Compilers
1. The Move to Higher-level Languages
The first step towards more people-friendly programming languages was the development of mnemonic assembly languages in the early 1950's.
 Initially, the instructions in an assembly language were just mnemonic representations of machine instructions.
 Later, macro instructions were added to assembly languages so that a programmer could define parameterized shorthands for frequently used sequences of machine instructions.

A major step towards higher-level languages was made in the latter half of the 1950's with the development of FORTRAN for scientific computation, COBOL for business data processing, and LISP for symbolic computation.

Today, there are thousands of programming languages with innovative features to help make programming easier, more natural, and more robust. They can be classified in a variety of ways.
1. The Move to Higher-level Languages

1. One classification is by generation:
 First-generation languages are the machine languages.
 Second-generation: the assembly languages.
 Third-generation: the higher-level languages like FORTRAN, COBOL, Lisp, C, C++, C#, and Java.
 Fourth-generation languages are languages designed for specific applications, like NOMAD for report generation, SQL for database queries, and Postscript for text formatting.
1. The Move to Higher-level Languages

2. An object-oriented language is one that supports object-oriented programming, a programming style in which a program consists of a collection of objects that interact with one another.
 Simula 67 and Smalltalk are the earliest major object-oriented languages.
 Languages such as C++, C#, Java, and Ruby are more recent object-oriented languages.

3. Scripting languages are interpreted languages with high-level operators designed for "gluing together" computations.
 These computations were originally called "scripts." Awk, JavaScript, Perl, PHP, Python, Ruby, and Tcl are popular examples of scripting languages.
 Programs written in scripting languages are often much shorter than equivalent programs written in languages like C.
2. Impacts on Compilers
Since the design of programming languages and the design of compilers are intimately related, advances in programming languages placed new demands on compiler writers.

 They had to devise algorithms and representations to translate and support the new language features.

 A compiler must translate correctly the potentially infinite set of programs that could be written in the source language.

 The problem of generating the optimal target code from a source program is undecidable in general; thus, compiler writers must evaluate tradeoffs about what problems to tackle and what heuristics to use to approach the problem of generating efficient code.
UNIT I : PART I
4. The Science of Building a Compiler
 Modelling in Compiler Design and Implementation
 The Science of Code Optimization
The Science of Building a Compiler

 The study of compilers is mainly a study of how we
 design the right mathematical models and
 choose the right algorithms,
while balancing the need for generality and power against simplicity and efficiency.

 Some of the most fundamental models are
 finite-state machines and
 regular expressions.
These models are useful for describing the lexical units of programs (keywords, identifiers, and such) and for describing the algorithms used by the compiler to recognize those units.

 Also among the most fundamental models are context-free grammars, used to describe the syntactic structure of programming languages, such as the nesting of parentheses or control constructs.
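The equivalence of regular expressions and finite-state machines for lexical units can be illustrated with a short Python sketch (assuming ASCII identifiers; not part of the slides):

```python
import re

# The lexical unit "identifier" described two ways: as a regular
# expression, and as a hand-simulated two-state finite-state machine.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def dfa_accepts(s):
    # states: 0 = start, 1 = inside an identifier; reject anything else
    state = 0
    for ch in s:
        if state == 0 and (("a" <= ch.lower() <= "z") or ch == "_"):
            state = 1
        elif state == 1 and (ch.isalnum() and ch.isascii() or ch == "_"):
            state = 1
        else:
            return False
    return state == 1                 # must have consumed at least one char

for word in ["rate", "_tmp1", "60", "x-y"]:
    print(word, bool(IDENT.fullmatch(word)), dfa_accepts(word))
# rate True True
# _tmp1 True True
# 60 False False
# x-y False False
```

Both formulations accept exactly the same strings, which is why a scanner generator can compile the regular expression into the machine automatically.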
1 Modeling in Compiler Design and Implementation

 A compiler must accept all source programs that conform to the specification of the language.

 Any transformation performed by the compiler while translating a source program must preserve the meaning of the program being compiled.
2 The Science of Code Optimization

 The term "optimization" in compiler design refers to the attempts that a compiler makes to produce code that is more efficient than the obvious code.

 Compiler optimizations must meet the following design objectives:
1. The optimization must be correct, that is, preserve the meaning of the compiled program,
2. The optimization must improve the performance of many programs,
3. The compilation time must be kept reasonable, and
4. The engineering effort required must be manageable.
UNIT I : PART I
5. Applications of Compiler Technology
 Implementation of High-Level Programming Languages
 Optimizations for Computer Architectures
 Design of New Computer Architectures
 Program Translations
 Software Productivity Tools
Applications of Compiler Technology

 Compiler design is not only about compilers; it also impacts several other areas of computer science.

 The most important interactions and applications of the technology include:

1. Implementation of High-Level Programming Languages
2. Optimizations for Computer Architectures
3. Design of New Computer Architectures
4. Program Translations
5. Software Productivity Tools
1. Implementation of High-Level Programming Languages

 A high-level programming language defines a programming abstraction: the programmer expresses an algorithm using the language, and the compiler must translate that program to the target language.

 Generally, higher-level programming languages are easier to program in, but are less efficient; that is, the target programs run more slowly.

 Programmers using a low-level language have more control over a computation and can, in principle, produce more efficient code.
 Unfortunately, lower-level programs are harder to write and — worse still — less portable, more prone to errors, and harder to maintain.

 Optimizing compilers include techniques to improve the performance of generated code, thus offsetting the inefficiency introduced by high-level abstractions.
2 Optimizations for Computer Architectures

 The rapid evolution of computer architectures has also led to an insatiable demand for new compiler technology.

 Almost all high-performance systems take advantage of the same two basic techniques:

1. Parallelism, which can be found at several levels:
 at the instruction level, where multiple operations are executed simultaneously, and
 at the processor level, where different threads of the same application are run on different processors.

2. Memory hierarchies, a response to the basic limitation that we can build very fast storage or very large storage, but not storage that is both fast and large.
3 Design of New Computer Architectures

 In the early days of computer architecture design, compilers were developed after the machines were built. That has changed.

 Since programming in high-level languages is the norm, the performance of a computer system is determined not only by its raw speed but also by how well compilers can exploit its features.

 Thus, in modern computer architecture development, compilers are developed in the processor-design stage, and compiled code, running on simulators, is used to evaluate the proposed architectural features.
4 Program Translations

 While we normally think of compiling as a translation from a high-level language to the machine level, the same technology can be applied to translate between different kinds of languages.

 The following are some of the important applications of program-translation techniques:
i. Binary Translation
ii. Database Query Interpreters
iii. Hardware Synthesis
iv. Compiled Simulation
4 Program Translations

i. Binary Translation
Compiler technology can be used to translate the binary code for one machine to that of another, allowing a machine to run programs originally compiled for another instruction set.
Binary translation technology has been used by various computer companies to increase the availability of software for their machines.

ii. Database Query Interpreters
Besides specifying software and hardware, languages are useful in many other applications. For example, query languages, especially SQL (Structured Query Language), are used to search databases.
 Database queries consist of predicates containing relational and boolean operators.
 They can be interpreted or compiled into commands to search a database for records satisfying that predicate.
4 Program Translations

iii. Hardware Synthesis
Not only is most software written in high-level languages; even hardware designs are mostly described in high-level hardware description languages like Verilog and VHDL (Very High Speed Integrated Circuit Hardware Description Language).

 Hardware designs are typically described at the register transfer level (RTL), where variables represent registers and expressions represent combinational logic.

 Hardware-synthesis tools translate RTL descriptions automatically into gates, which are then mapped to transistors and eventually to a physical layout.

 Unlike compilers for programming languages, these tools often take hours optimizing the circuit. Techniques to translate designs at higher levels, such as the behavior or functional level, also exist.
4 Program Translations

iv. Compiled Simulation
Simulation is a general technique used in many scientific and engineering disciplines to understand a phenomenon or to validate a design.

 Inputs to a simulator usually include the description of the design and specific input parameters for that particular simulation run.

 Simulations can be very expensive. Compiled simulation (translating the simulated design into native code once, rather than interpreting it on every run) can run orders of magnitude faster than an interpreter-based simulator.
5 Software Productivity Tools
Errors are rampant in programs;
• errors may crash a system,
• produce wrong results,
• render a system vulnerable to security attacks, or
• even lead to catastrophic failures in critical systems.
Testing is the primary technique for locating errors in programs.

 An interesting and promising complementary approach is to use data-flow analysis to locate errors statically (that is, before the program is run).

 Data-flow analysis can find errors along all the possible execution paths, and not just those exercised by the input data sets, as in the case of program testing.

 Many of the data-flow-analysis techniques, originally developed for compiler optimizations, can be used to create tools that assist programmers in their software engineering tasks.
5 Software Productivity Tools
The problem of finding all program errors is undecidable.
A data-flow analysis may be designed to warn the programmers of all possible statements violating a particular category of errors.

• Type Checking
Type checking is an effective and well-established technique to catch inconsistencies in programs.

It can be used to catch errors, for example, where an operation is applied to the wrong type of object, or if parameters passed to a procedure do not match the signature of the procedure.

Program analysis can go beyond finding type errors by analyzing the flow of data through a program. For example, if a pointer is assigned null and then immediately dereferenced, the program is clearly in error.
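A type checker of the kind described can be sketched over the tuple-shaped expression trees used earlier (illustrative Python; the type environment is an assumption):

```python
# Sketch of a type checker: every operator must receive operands of the
# same type; a mismatch is reported instead of silently accepted.
def check(node, env):
    if not isinstance(node, tuple):
        if node.isdigit():
            return "int"                      # integer literal
        return env[node]                      # declared type of the name
    op, l, r = node
    lt, rt = check(l, env), check(r, env)
    if lt != rt:
        raise TypeError(f"operator {op!r} applied to {lt} and {rt}")
    return lt

env = {"initial": "float", "rate": "float"}
print(check(("+", "initial", "rate"), env))   # float
try:
    check(("*", "rate", "60"), env)           # float operand, int operand
except TypeError as e:
    print(e)                                  # operator '*' applied to float and int
```

A language permitting coercions would insert an inttofloat node here instead of raising an error; a stricter language rejects the program.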
5 Software Productivity Tools

• Bounds Checking
It is easier to make mistakes when programming in a lower-level language than in a higher-level one.

For example, many security breaches in systems are caused by buffer overflows in programs written in C. Because C does not have array bounds checks, it is up to the user to ensure that arrays are not accessed out of bounds.

If a program fails to check that the data supplied by the user can overflow a buffer, it may be tricked into storing user data outside of the buffer.

An attacker can supply input data that causes the program to misbehave and compromise the security of the system. Techniques have been developed to find buffer overflows in programs, but with limited success.
5 Software Productivity Tools

• Memory-Management Tools
Garbage collection is another excellent example of the trade-off between efficiency and a combination of ease of programming and software reliability.

 Automatic memory management obliterates all memory-management errors (e.g., "memory leaks"), which are a major source of problems in C and C++ programs.

 Various tools have been developed to help programmers find memory-management errors. For example, Purify is a widely used tool that dynamically catches memory-management errors as they occur. Tools that help identify some of these problems statically have also been developed.
UNIT I : PART I
6. Programming Language Basics
 The Static/Dynamic Distinction
 Environments and States
 Static Scope and Block Structure
 Explicit Access Control
 Dynamic Scope
 Parameter Passing Mechanisms
 Aliasing
1 The Static/Dynamic Distinction

 An important issue that we face when designing a compiler for a language is what decisions the compiler can make about a program.

1. If a language uses a policy that allows the compiler to decide an issue, then we say that the language uses a static policy, or that the issue can be decided at compile time.

2. On the other hand, a policy that only allows a decision to be made when we execute the program is said to be a dynamic policy, or to require a decision at run time.

 Most languages, such as C and Java, use static scope.


2 Environments and States

 Another important distinction we must make when discussing programming languages is whether changes occurring as the program runs affect
 the values of data elements or
 the interpretation of names for that data.

 For example, the execution of an assignment such as x = y + 1 changes the value denoted by the name x.

 More specifically, the assignment changes the value in whatever location is denoted by x.

 It may be less clear that the location denoted by x can change at run time.
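The two-stage mapping behind this distinction can be made concrete with a small Python sketch (illustrative; locations are modeled as plain integers): an environment maps names to locations, and a state maps locations to values.

```python
# Sketch of the environment/state two-stage mapping:
#   environment: name -> storage location
#   state:       location -> value
environment = {"x": 100, "y": 104}   # x lives at 100, y at 104
state = {100: 7, 104: 2}             # current contents of those locations

# Executing  x = y + 1  changes the STATE at x's location, not the
# environment: the name x still denotes location 100 afterwards.
state[environment["x"]] = state[environment["y"]] + 1
print(state[100])                    # 3
```

A change to the environment itself (a name coming to denote a different location) is what happens, for example, when a recursive procedure's local variable gets a fresh location on each activation.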
3 Static Scope and Block Structure
Most languages, including C and its family, use static scope.

 The scope rules for C are based on program structure;
• the scope of a declaration is determined implicitly by where the declaration appears in the program.

 Later languages, such as C++, Java, and C#, also provide explicit control over scopes through the use of keywords like
1. public
2. private
3. protected.

 C uses braces { and } to delimit a block.
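Static (lexical) scoping can be illustrated with nested functions in Python, which, like C's block structure, resolves a name to its innermost enclosing declaration, decided from the program text alone:

```python
# Sketch of static scope: the use of x inside inner() refers to the
# innermost declaration of x, determined by where the declarations
# appear in the program, not by the order of calls at run time.
def outer():
    x = "outer x"
    def inner():
        x = "inner x"        # this declaration shadows the outer one
        return x
    return inner(), x        # outer's own x is untouched

print(outer())               # ('inner x', 'outer x')
```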


4 Explicit Access Control

 Classes and structures introduce a new scope for their members.

 If p is an object of a class with a field (member) x, then the use of x in p.x refers to field x in the class definition.
5 Dynamic Scope

 Technically, any scoping policy is dynamic if it is based on factor(s) that can be known only when the program executes.

 The term dynamic scope, however, usually refers to the following policy:
• a use of a name x refers to the declaration of x in the most recently called procedure with such a declaration.

 Dynamic scoping of this type appears only in special situations.
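Python is statically scoped, but the dynamic-scope policy can be simulated with an explicit binding stack (an illustrative sketch): the same procedure c sees different values of x depending on who called it.

```python
# Sketch of dynamic scope: a use of x resolves to the binding pushed by
# the most recently called (and not yet returned) procedure.
bindings = []                      # stack of (name, value) pairs

def lookup(name):
    for n, v in reversed(bindings):
        if n == name:
            return v
    raise NameError(name)

def a():
    bindings.append(("x", 1))      # a "declares" x = 1
    result = c()
    bindings.pop()
    return result

def b():
    bindings.append(("x", 2))      # b "declares" x = 2
    result = c()
    bindings.pop()
    return result

def c():
    return lookup("x")             # which x? depends on the caller

print(a(), b())                    # 1 2
```

Under static scope, c's use of x would instead be resolved once, from the program text, regardless of the caller.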


6 Parameter Passing Mechanisms

 All programming languages have a notion of a procedure, but they can differ in how these procedures get their arguments.

 The great majority of languages use either
 call-by-value,
 call-by-reference,
 or both.
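The two mechanisms can be contrasted in a short sketch. Python itself passes object references by value, so call-by-reference is simulated here with a one-cell list standing in for a shared storage location:

```python
# Sketch contrasting the two parameter-passing mechanisms.
def inc_by_value(n):
    n = n + 1                 # rebinds the local copy only

def inc_by_reference(cell):
    cell[0] = cell[0] + 1     # mutates the caller's storage

x = 5
inc_by_value(x)
print(x)          # 5: the callee received a copy of the value

y = [5]
inc_by_reference(y)
print(y[0])       # 6: the callee updated the shared location
```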
7 Aliasing

 There is an interesting consequence of call-by-reference parameter passing or its simulation, as in Java, where references to objects are passed by value.

 It is possible that two formal parameters refer to the same location; such variables are said to be aliases of one another.

 As a result, any two variables, which may appear to take their values from two distinct formal parameters, can become aliases of each other as well.
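Aliasing can be demonstrated with a short sketch in Python, where passing the same list for both formal parameters makes them aliases of one location:

```python
# Sketch of aliasing: if a and b refer to the same object, an update
# through a is visible through b, even though the procedure's text
# makes them look independent.
def scale(a, b):
    a[0] *= 2
    return b[0]               # if b aliases a, this sees the update

xs = [10]
print(scale(xs, xs))          # 20: a and b alias the same list
ys, zs = [10], [10]
print(scale(ys, zs))          # 10: distinct objects, no aliasing
```

Aliasing matters to optimizing compilers: an optimizer that assumed a and b were distinct could cache b[0] across the update and produce wrong code.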
END OF UNIT I : PART I

….. THANK YOU
