
CD Unit I Part I Introduction


Compiler Design

Dr.S.Gopi Krishna
Professor in CSE
Agenda
UNIT I : PART I
1. Language Processors
2. Structure of a Compiler
3. Evolution of Programming Languages
4. The Science of Building a Compiler
5. Applications of Compiler Technology
6. Programming Language Basics
UNIT I : PART I
1. Language Processors
 Compiler
 Language Processing System
Language Processors

 Programming languages are notations for describing computations to people and to machines.

 But, before a program can be run, it first must be translated into a form in which it can be executed by a computer.

 The software systems that do this translation are called compilers.

Compiler

 Simply stated, a compiler is a program that can read a program in one language — the source language — and translate it into an equivalent program in another language — the target language.

 If the target program is an executable machine-language program, it can then be called by the user to process inputs and produce outputs.

 An important role of the compiler is to report any errors in the source program that it detects during the translation process.

Language Processing System

 Several other programs may be required to create an executable target program:

1. Preprocessor
2. Compiler
3. Assembler
4. Linker/Loader

1. Preprocessor

 The task of collecting the source program is sometimes entrusted to a separate program, called a preprocessor.

 The preprocessor may also expand shorthands, called macros, into source-language statements.

 Preprocessors produce input to compilers.

 They may perform the following functions:
1. Macro processing
2. File inclusion
3. "Rational" preprocessing
4. Language extensions
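Macro processing, the first function above, can be illustrated with a minimal Python sketch (not from the slides). It expands parameterless macros declared with a hypothetical C-style `#define` directive before the text reaches the compiler proper:

```python
import re

# Minimal sketch of a text-substitution preprocessor: it records
# "#define NAME body" lines and replaces later whole-word uses of
# NAME with body, producing the expanded input for the compiler.
def preprocess(source: str) -> str:
    macros = {}
    output_lines = []
    for line in source.splitlines():
        m = re.match(r"#define\s+(\w+)\s+(.*)", line)
        if m:
            macros[m.group(1)] = m.group(2)      # record the macro
        else:
            for name, body in macros.items():    # expand known macros
                line = re.sub(rf"\b{name}\b", body, line)
            output_lines.append(line)
    return "\n".join(output_lines)

print(preprocess("#define BUFSIZE 512\nint buf[BUFSIZE];"))
# int buf[512];
```

A real preprocessor (such as C's) also handles parameterized macros, file inclusion, and conditional compilation; this sketch covers only simple substitution.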
2. Compiler

 The (modified or pre-processed) source program is then fed to a compiler.

 The compiler may produce an assembly-language program as its output, because assembly language is easier to produce as output and is easier to debug.

3. Assembler

 The assembly language is then processed by a program called an assembler that produces relocatable machine code as its output.

 Assembly code is a mnemonic version of machine code, in which names are used instead of binary codes for operations, and names are also given to memory addresses.

4. Linker/Loader

 The linker resolves external memory addresses, where the code in one file may refer to a location in another file.

 The loader then puts all of the executable object files together into memory for execution.
UNIT I : PART I
2. Structure of a Compiler
 Analysis - Synthesis
 Phases of a Compiler
Structure of a Compiler

 A compiler maps a source program into a semantically equivalent target program. A compiler breaks into two parts:

1. Analysis and
2. Synthesis.
1. Analysis
The analysis part, often called the front end of the compiler,

 breaks up the source program into constituent pieces and imposes a grammatical structure on them. It then uses this structure to create an intermediate representation of the source program.

 if it detects that the source program is either syntactically ill formed or semantically unsound, provides informative messages, so the user can take corrective action.

 collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part.
2. Synthesis

 The synthesis part, often called the back end of the compiler, constructs the desired target program from the intermediate representation and the information in the symbol table.
Phases of a Compiler

 A compiler operates as a sequence of phases, each of which transforms one representation of the source program to another.

 A typical decomposition of a compiler into phases: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation, with symbol-table management spanning all phases.
1 Lexical Analysis
The first phase of a compiler is called lexical analysis or scanning (scanner).

The lexical analyzer
o reads the stream of characters making up the source program,
o groups the characters into meaningful sequences called lexemes, and
o produces output as a sequence of tokens.

For example, suppose a source program contains the assignment statement

position = initial + rate * 60

After lexical analysis, the assignment statement is represented as the sequence of tokens
(id,1) (=) (id,2) (+) (id,3) (*) (60)
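The grouping of characters into tokens can be sketched in a few lines of Python (an illustration, not the slides' implementation). Identifiers are entered into a symbol table and replaced by (id, index) pairs, reproducing the token sequence above:

```python
import re

# Sketch of a scanner: group characters into lexemes and emit tokens.
# Identifiers become ("id", n) where n is a 1-based symbol-table index.
TOKEN_SPEC = [
    ("id",  r"[A-Za-z_]\w*"),   # names
    ("num", r"\d+"),            # integer constants
    ("op",  r"[=+\-*/]"),       # operators
    ("ws",  r"\s+"),            # whitespace, discarded
]

def scan(source):
    symtab, tokens = [], []
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    for m in re.finditer(pattern, source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ws":
            continue                      # whitespace separates lexemes
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)     # first occurrence: new entry
            tokens.append(("id", symtab.index(lexeme) + 1))
        else:
            tokens.append((lexeme,))      # constants and operators
    return tokens, symtab

tokens, symtab = scan("position = initial + rate * 60")
print(tokens)
# [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('60',)]
```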
2 Syntax Analysis
The second phase of the compiler is syntax analysis or parsing (parser).

 The parser creates a tree-like intermediate representation that depicts the grammatical structure of the token stream.

 A typical representation is a syntax tree in which
• each interior node represents an operation and
• the children of the node represent the arguments of the operation.

 A syntax tree for the token stream is the output of the syntactic analyzer. This tree shows the order in which the operations in the assignment position = initial + rate * 60 are to be performed.
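A small recursive-descent parser (an illustrative Python sketch) can build that syntax tree, with * binding tighter than + so the multiplication is performed first:

```python
# Sketch of a parser for assignments of the form  name = expr , where
# expr uses + and *. Interior tuples are operator nodes; leaves are
# names or numbers. Precedence is encoded by the rule nesting: terms
# (handling *) are parsed inside expressions (handling +).
def parse_expr(toks, i):
    node, i = parse_term(toks, i)
    while i < len(toks) and toks[i] == "+":
        right, i = parse_term(toks, i + 1)
        node = ("+", node, right)            # left-associative addition
    return node, i

def parse_term(toks, i):
    node = toks[i]                           # a name or number is a leaf
    i += 1
    while i < len(toks) and toks[i] == "*":
        node = ("*", node, toks[i + 1])      # * grabs its operands first
        i += 2
    return node, i

def parse_assignment(toks):
    expr, _ = parse_expr(toks, 2)            # skip "name" and "="
    return ("=", toks[0], expr)

tree = parse_assignment(["position", "=", "initial", "+", "rate", "*", "60"])
print(tree)
# ('=', 'position', ('+', 'initial', ('*', 'rate', '60')))
```

The nesting of the output mirrors the syntax tree: rate * 60 is a subtree of the + node, which is in turn the right child of the assignment.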
3 Semantic Analysis

 An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands.

 The language specification may permit some type conversions called coercions.

 Notice that the output of the semantic analyzer has an extra node for the operator inttofloat, which explicitly converts its integer argument into a floating-point number.
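The insertion of that inttofloat node can be sketched as a tree walk (illustrative Python; the type environment is assumed, not part of the slides):

```python
# Sketch of coercion insertion: when an operator mixes an int operand
# with a float operand, wrap the int side in an explicit conversion
# node, as the semantic analyzer does for  rate * 60 .
def coerce(node, types):
    if isinstance(node, tuple):                  # operator node (op, l, r)
        op, l, r = node
        l, lt = coerce(l, types)
        r, rt = coerce(r, types)
        if lt != rt:                             # mixed int/float operands
            if lt == "int":
                l, lt = ("inttofloat", l), "float"
            else:
                r, rt = ("inttofloat", r), "float"
        return (op, l, r), lt
    return node, types.get(node, "int")          # leaf: declared type

types = {"initial": "float", "rate": "float", "60": "int"}
tree, t = coerce(("+", "initial", ("*", "rate", "60")), types)
print(tree)
# ('+', 'initial', ('*', 'rate', ('inttofloat', '60')))
```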
4 Intermediate Code Generation

 In the process of translating a source program into target code, a compiler may construct one or more intermediate representations, which can have a variety of forms.

 We consider an intermediate form called three-address code, which consists of a sequence of assembly-like instructions with three operands per instruction.

 Each operand can act like a register.
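Generating three-address code from the annotated syntax tree can be sketched as another tree walk (illustrative Python): each operator node gets a fresh temporary t1, t2, ... and one instruction.

```python
# Sketch of three-address code generation: post-order walk of the tree,
# emitting one assembly-like instruction per interior node.
def three_address_code(tree):
    code, counter = [], [0]

    def new_temp():
        counter[0] += 1
        return f"t{counter[0]}"

    def gen(node):
        if not isinstance(node, tuple):
            return node                       # leaf: a name or constant
        if node[0] == "inttofloat":           # unary conversion node
            t = new_temp()
            code.append(f"{t} = inttofloat {gen(node[1])}")
            return t
        op, l, r = node
        la, ra = gen(l), gen(r)
        if op == "=":
            code.append(f"{la} = {ra}")       # store into the target name
            return la
        t = new_temp()
        code.append(f"{t} = {la} {op} {ra}")
        return t

    gen(tree)
    return code

for line in three_address_code(
        ("=", "position",
         ("+", "initial", ("*", "rate", ("inttofloat", "60"))))):
    print(line)
# t1 = inttofloat 60
# t2 = rate * t1
# t3 = initial + t2
# position = t3
```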


5 Code Optimization

 The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result.

 Usually better means faster, but other objectives may be desired, such as shorter code, or target code that consumes less power.

 The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 by the floating-point number 60.0.
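That particular deduction is a constant-folding transformation, sketched here in Python (illustrative, over the tuple-shaped trees used above):

```python
# Sketch of constant folding: an inttofloat applied to an integer
# constant is evaluated at compile time, so no conversion remains in
# the code that runs.
def fold(node):
    if not isinstance(node, tuple):
        return node                           # leaf: nothing to fold
    if (node[0] == "inttofloat"
            and isinstance(node[1], str) and node[1].isdigit()):
        return str(float(node[1]))            # 60 -> 60.0, done now
    # otherwise keep the operator and fold each child
    return tuple(fold(c) if i else c for i, c in enumerate(node))

tree = ("=", "position",
        ("+", "initial", ("*", "rate", ("inttofloat", "60"))))
print(fold(tree))
# ('=', 'position', ('+', 'initial', ('*', 'rate', '60.0')))
```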
6 Code Generation

 The code generator takes as input an intermediate representation of the source program and maps it into the target language.

 The intermediate instructions are translated into sequences of machine instructions that perform the same task.

 A crucial aspect of code generation is the judicious assignment of registers to hold variables. For example, using registers R1 and R2, the intermediate code might be translated into machine code.
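For the running example, a translation along the lines of the Dragon Book's illustration can be sketched as follows (the opcode names LDF/MULF/ADDF/STF denote floating-point load, multiply, add, and store on a hypothetical two-register machine, not a real instruction set):

```python
# Sketch: the optimized three-address code mapped onto two registers.
# id1, id2, id3 are the symbol-table entries for position, initial, rate.
def naive_codegen():
    return [
        "LDF  R2, id3",        # R2 <- rate
        "MULF R2, R2, #60.0",  # R2 <- R2 * 60.0 (coercion already folded)
        "LDF  R1, id2",        # R1 <- initial
        "ADDF R1, R1, R2",     # R1 <- R1 + R2
        "STF  id1, R1",        # position <- R1
    ]

for instr in naive_codegen():
    print(instr)
```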
Symbol-Table Management

 The symbol table is a data structure which stores information about all variables/identifiers.

 The data structure should be designed to allow the compiler
 to find the record for each name quickly and
 to store or retrieve data from that record quickly.

 Lexical and syntax analysis fill up the symbol table, and the code optimizer and code generator make use of it.
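A common realization of these requirements is a hash table keyed by name, sketched here in Python (dicts give the fast lookup the slide asks for; the attribute names are illustrative):

```python
# Sketch of a symbol table: names map to records of attributes that
# different phases fill in and read back (type, storage location, ...).
class SymbolTable:
    def __init__(self):
        self._records = {}

    def insert(self, name, **attrs):
        # create the record on first sight, then merge in new attributes
        self._records.setdefault(name, {}).update(attrs)

    def lookup(self, name):
        return self._records.get(name)        # None if undeclared

table = SymbolTable()
table.insert("rate", type="float")            # filled during analysis
table.insert("rate", location="R2")           # refined by code generation
print(table.lookup("rate"))
# {'type': 'float', 'location': 'R2'}
```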
UNIT I : PART I
3. Evolution of Programming Languages
 The Move to Higher-level Languages
 Impacts on Compilers
The Evolution of Programming Languages
The first electronic computers appeared in the 1940's and were programmed in machine language, by sequences of 0's and 1's that explicitly told the computer what operations to execute and in what order.

 The operations themselves were very low level:
 move data from one location to another,
 add the contents of two registers,
 compare two values, and
 so on.

Needless to say, this kind of programming was slow, tedious, and error prone. And once written, the programs were hard to understand and modify.

1. The Move to Higher-level Languages
2. Impacts on Compilers
1. The Move to Higher-level Languages
The first step towards more people-friendly programming languages was the development of mnemonic assembly languages in the early 1950's.
 Initially, the instructions in an assembly language were just mnemonic representations of machine instructions.
 Later, macro instructions were added to assembly languages so that a programmer could define parameterized shorthands for frequently used sequences of machine instructions.

A major step towards higher-level languages was made in the latter half of the 1950's with the development of FORTRAN for scientific computation, COBOL for business data processing, and LISP for symbolic computation.

Today, there are thousands of programming languages with innovative features to help make programming easier, more natural, and more robust. They can be classified in a variety of ways.
1. The Move to Higher-level Languages

1. One classification is by generation:
 First-generation languages are the machine languages.
 Second-generation: the assembly languages.
 Third-generation: the higher-level languages like FORTRAN, COBOL, Lisp, C, C++, C#, and Java.
 Fourth-generation languages are languages designed for specific applications, like NOMAD for report generation, SQL for database queries, and Postscript for text formatting.
1. The Move to Higher-level Languages

2. An object-oriented language is one that supports object-oriented programming, a programming style in which a program consists of a collection of objects that interact with one another.
 Simula 67 and Smalltalk are the earliest major object-oriented languages.
 Languages such as C++, C#, Java, and Ruby are more recent object-oriented languages.

3. Scripting languages are interpreted languages with high-level operators designed for "gluing together" computations.
 These computations were originally called "scripts." Awk, JavaScript, Perl, PHP, Python, Ruby, and Tcl are popular examples of scripting languages.
 Programs written in scripting languages are often much shorter than equivalent programs written in languages like C.
2. Impacts on Compilers
Since the design of programming languages and the design of compilers are intimately related, advances in programming languages placed new demands on compiler writers.

 They had to devise algorithms and representations to translate and support the new language features.

 A compiler must translate correctly the potentially infinite set of programs that could be written in the source language.

 The problem of generating the optimal target code from a source program is undecidable in general; thus, compiler writers must evaluate tradeoffs about what problems to tackle and what heuristics to use to approach the problem of generating efficient code.
UNIT I : PART I
4. The Science of Building a Compiler
 Modelling in Compiler Design and Implementation
 The Science of Code Optimization
The Science of Building a Compiler

 The study of compilers is mainly a study of how we
 design the right mathematical models and
 choose the right algorithms,
while balancing the need for generality and power against simplicity and efficiency.

 Some of the most fundamental models are
 finite-state machines and
 regular expressions.
These models are useful for describing the lexical units of programs (keywords, identifiers, and such) and for describing the algorithms used by the compiler to recognize those units.

 Also among the most fundamental models are context-free grammars, used to describe the syntactic structure of programming languages, such as the nesting of parentheses or control constructs.
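The equivalence of regular expressions and finite-state machines for lexical units can be illustrated with a short Python sketch (assuming ASCII identifiers; not part of the slides):

```python
import re

# The lexical unit "identifier" described two ways: as a regular
# expression, and as a hand-simulated two-state finite-state machine.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def dfa_accepts(s):
    # states: 0 = start, 1 = inside an identifier; reject anything else
    state = 0
    for ch in s:
        if state == 0 and (("a" <= ch.lower() <= "z") or ch == "_"):
            state = 1
        elif state == 1 and (ch.isalnum() and ch.isascii() or ch == "_"):
            state = 1
        else:
            return False
    return state == 1                 # must have consumed at least one char

for word in ["rate", "_tmp1", "60", "x-y"]:
    print(word, bool(IDENT.fullmatch(word)), dfa_accepts(word))
# rate True True
# _tmp1 True True
# 60 False False
# x-y False False
```

Both formulations accept exactly the same strings, which is why a scanner generator can compile the regular expression into the machine automatically.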
1 Modeling in Compiler Design and Implementation

 A compiler must accept all source programs that conform to the specification of the language.

 Any transformation performed by the compiler while translating a source program must preserve the meaning of the program being compiled.
2 The Science of Code Optimization

 The term "optimization" in compiler design refers to the attempts that a compiler makes to produce code that is more efficient than the obvious code.

 Compiler optimizations must meet the following design objectives:
1. The optimization must be correct, that is, preserve the meaning of the compiled program,
2. The optimization must improve the performance of many programs,
3. The compilation time must be kept reasonable, and
4. The engineering effort required must be manageable.
UNIT I : PART I
5. Applications of Compiler Technology
 Implementation of High-Level Programming Languages
 Optimizations for Computer Architectures
 Design of New Computer Architectures
 Program Translations
 Software Productivity Tools
Applications of Compiler Technology

 Compiler design is not only about compilers; it also impacts several other areas of computer science.

 The most important interactions and applications of the technology include:

1. Implementation of High-Level Programming Languages
2. Optimizations for Computer Architectures
3. Design of New Computer Architectures
4. Program Translations
5. Software Productivity Tools
1. Implementation of High-Level Programming Languages

 A high-level programming language defines a programming abstraction: the programmer expresses an algorithm using the language, and the compiler must translate that program to the target language.

 Generally, higher-level programming languages are easier to program in, but are less efficient; that is, the target programs run more slowly.

 Programmers using a low-level language have more control over a computation and can, in principle, produce more efficient code.
 Unfortunately, lower-level programs are harder to write and — worse still — less portable, more prone to errors, and harder to maintain.

 Optimizing compilers include techniques to improve the performance of generated code, thus offsetting the inefficiency introduced by high-level abstractions.
2 Optimizations for Computer Architectures

 The rapid evolution of computer architectures has also led to an insatiable demand for new compiler technology.

 Almost all high-performance systems take advantage of the same two basic techniques:

1. Parallelism, which can be found at several levels:
 at the instruction level, where multiple operations are executed simultaneously, and
 at the processor level, where different threads of the same application are run on different processors.

2. Memory hierarchies, a response to the basic limitation that we can build very fast storage or very large storage, but not storage that is both fast and large.
3 Design of New Computer Architectures

 In the early days of computer architecture design, compilers were developed after the machines were built. That has changed.

 Since programming in high-level languages is the norm, the performance of a computer system is determined not only by its raw speed but also by how well compilers can exploit its features.

 Thus, in modern computer architecture development, compilers are developed in the processor-design stage, and compiled code, running on simulators, is used to evaluate the proposed architectural features.
4 Program Translations

 While we normally think of compiling as a translation from a high-level language to the machine level, the same technology can be applied to translate between different kinds of languages.

 The following are some of the important applications of program-translation techniques:
i. Binary Translation
ii. Database Query Interpreters
iii. Hardware Synthesis
iv. Compiled Simulation
4 Program Translations

i. Binary Translation
Compiler technology can be used to translate the binary code for one machine to that of another, allowing a machine to run programs originally compiled for another instruction set.
Binary translation technology has been used by various computer companies to increase the availability of software for their machines.

ii. Database Query Interpreters
Besides specifying software and hardware, languages are useful in many other applications. For example, query languages, especially SQL (Structured Query Language), are used to search databases.
 Database queries consist of predicates containing relational and boolean operators.
 They can be interpreted or compiled into commands to search a database for records satisfying that predicate.
4 Program Translations

iii. Hardware Synthesis
Not only is most software written in high-level languages; even hardware designs are mostly described in high-level hardware description languages like Verilog and VHDL (Very High Speed Integrated Circuit Hardware Description Language).

 Hardware designs are typically described at the register transfer level (RTL), where variables represent registers and expressions represent combinational logic.

 Hardware-synthesis tools translate RTL descriptions automatically into gates, which are then mapped to transistors and eventually to a physical layout.

 Unlike compilers for programming languages, these tools often take hours optimizing the circuit. Techniques to translate designs at higher levels, such as the behavior or functional level, also exist.
4 Program Translations

iv. Compiled Simulation
Simulation is a general technique used in many scientific and engineering disciplines to understand a phenomenon or to validate a design.

 Inputs to a simulator usually include the description of the design and specific input parameters for that particular simulation run.

 Simulations can be very expensive. Compiled simulation (translating the simulated design into native code once, rather than interpreting it on every run) can run orders of magnitude faster than an interpreter-based simulator.
5 Software Productivity Tools
Errors are rampant in programs;
• errors may crash a system,
• produce wrong results,
• render a system vulnerable to security attacks, or
• even lead to catastrophic failures in critical systems.
Testing is the primary technique for locating errors in programs.

 An interesting and promising complementary approach is to use data-flow analysis to locate errors statically (that is, before the program is run).

 Data-flow analysis can find errors along all the possible execution paths, and not just those exercised by the input data sets, as in the case of program testing.

 Many of the data-flow-analysis techniques, originally developed for compiler optimizations, can be used to create tools that assist programmers in their software engineering tasks.
5 Software Productivity Tools
The problem of finding all program errors is undecidable.
A data-flow analysis may be designed to warn the programmers of all possible statements violating a particular category of errors.

• Type Checking
Type checking is an effective and well-established technique to catch inconsistencies in programs.

It can be used to catch errors, for example, where an operation is applied to the wrong type of object, or if parameters passed to a procedure do not match the signature of the procedure.

Program analysis can go beyond finding type errors by analyzing the flow of data through a program. For example, if a pointer is assigned null and then immediately dereferenced, the program is clearly in error.
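A type checker of the kind described can be sketched over the tuple-shaped expression trees used earlier (illustrative Python; the type environment is an assumption):

```python
# Sketch of a type checker: every operator must receive operands of the
# same type; a mismatch is reported instead of silently accepted.
def check(node, env):
    if not isinstance(node, tuple):
        if node.isdigit():
            return "int"                      # integer literal
        return env[node]                      # declared type of the name
    op, l, r = node
    lt, rt = check(l, env), check(r, env)
    if lt != rt:
        raise TypeError(f"operator {op!r} applied to {lt} and {rt}")
    return lt

env = {"initial": "float", "rate": "float"}
print(check(("+", "initial", "rate"), env))   # float
try:
    check(("*", "rate", "60"), env)           # float operand, int operand
except TypeError as e:
    print(e)                                  # operator '*' applied to float and int
```

A language permitting coercions would insert an inttofloat node here instead of raising an error; a stricter language rejects the program.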
5 Software Productivity Tools

• Bounds Checking
It is easier to make mistakes when programming in a lower-level language than in a higher-level one.

For example, many security breaches in systems are caused by buffer overflows in programs written in C. Because C does not have array bounds checks, it is up to the user to ensure that arrays are not accessed out of bounds.

If a program fails to check that the data supplied by the user can overflow a buffer, it may be tricked into storing user data outside of the buffer.

An attacker can supply input data that causes the program to misbehave and compromise the security of the system. Techniques have been developed to find buffer overflows in programs, but with limited success.
5 Software Productivity Tools

• Memory-Management Tools
Garbage collection is another excellent example of the trade-off between efficiency and a combination of ease of programming and software reliability.

 Automatic memory management obliterates all memory-management errors (e.g., "memory leaks"), which are a major source of problems in C and C++ programs.

 Various tools have been developed to help programmers find memory-management errors. For example, Purify is a widely used tool that dynamically catches memory-management errors as they occur. Tools that help identify some of these problems statically have also been developed.
UNIT I : PART I
6. Programming Language Basics
 The Static/Dynamic Distinction
 Environments and States
 Static Scope and Block Structure
 Explicit Access Control
 Dynamic Scope
 Parameter Passing Mechanisms
 Aliasing
1 The Static/Dynamic Distinction

 An important issue that we face when designing a compiler for a language is what decisions the compiler can make about a program.

1. If a language uses a policy that allows the compiler to decide an issue, then we say that the language uses a static policy, or that the issue can be decided at compile time.

2. On the other hand, a policy that only allows a decision to be made when we execute the program is said to be a dynamic policy, or to require a decision at run time.

 Most languages, such as C and Java, use static scope.


2 Environments and States

 Another important distinction we must make when discussing programming languages is whether changes occurring as the program runs affect
 the values of data elements or
 the interpretation of names for that data.

 For example, the execution of an assignment such as x = y + 1 changes the value denoted by the name x.

 More specifically, the assignment changes the value in whatever location is denoted by x.

 It may be less clear that the location denoted by x can change at run time.
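The two-stage mapping behind this distinction can be made concrete with a small Python sketch (illustrative; locations are modeled as plain integers): an environment maps names to locations, and a state maps locations to values.

```python
# Sketch of the environment/state two-stage mapping:
#   environment: name -> storage location
#   state:       location -> value
environment = {"x": 100, "y": 104}   # x lives at 100, y at 104
state = {100: 7, 104: 2}             # current contents of those locations

# Executing  x = y + 1  changes the STATE at x's location, not the
# environment: the name x still denotes location 100 afterwards.
state[environment["x"]] = state[environment["y"]] + 1
print(state[100])                    # 3
```

A change to the environment itself (a name coming to denote a different location) is what happens, for example, when a recursive procedure's local variable gets a fresh location on each activation.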
3 Static Scope and Block Structure
Most languages, including C and its family, use static scope.

 The scope rules for C are based on program structure;
• the scope of a declaration is determined implicitly by where the declaration appears in the program.

 Later languages, such as C++, Java, and C#, also provide explicit control over scopes through the use of keywords like
1. public
2. private
3. protected.

 C uses braces { and } to delimit a block.
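Static (lexical) scoping can be illustrated with nested functions in Python, which, like C's block structure, resolves a name to its innermost enclosing declaration, decided from the program text alone:

```python
# Sketch of static scope: the use of x inside inner() refers to the
# innermost declaration of x, determined by where the declarations
# appear in the program, not by the order of calls at run time.
def outer():
    x = "outer x"
    def inner():
        x = "inner x"        # this declaration shadows the outer one
        return x
    return inner(), x        # outer's own x is untouched

print(outer())               # ('inner x', 'outer x')
```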


4 Explicit Access Control

 Classes and structures introduce a new scope for their members.

 If p is an object of a class with a field (member) x, then the use of x in p.x refers to field x in the class definition.
5 Dynamic Scope

 Technically, any scoping policy is dynamic if it is based on factor(s) that can be known only when the program executes.

 The term dynamic scope, however, usually refers to the following policy:
• a use of a name x refers to the declaration of x in the most recently called procedure with such a declaration.

 Dynamic scoping of this type appears only in special situations.
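Python is statically scoped, but the dynamic-scope policy can be simulated with an explicit binding stack (an illustrative sketch): the same procedure c sees different values of x depending on who called it.

```python
# Sketch of dynamic scope: a use of x resolves to the binding pushed by
# the most recently called (and not yet returned) procedure.
bindings = []                      # stack of (name, value) pairs

def lookup(name):
    for n, v in reversed(bindings):
        if n == name:
            return v
    raise NameError(name)

def a():
    bindings.append(("x", 1))      # a "declares" x = 1
    result = c()
    bindings.pop()
    return result

def b():
    bindings.append(("x", 2))      # b "declares" x = 2
    result = c()
    bindings.pop()
    return result

def c():
    return lookup("x")             # which x? depends on the caller

print(a(), b())                    # 1 2
```

Under static scope, c's use of x would instead be resolved once, from the program text, regardless of the caller.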


6 Parameter Passing Mechanisms

 All programming languages have a notion of a procedure, but they can differ in how these procedures get their arguments.

 The great majority of languages use either
 call-by-value,
 call-by-reference,
 or both.
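The two mechanisms can be contrasted in a short sketch. Python itself passes object references by value, so call-by-reference is simulated here with a one-cell list standing in for a shared storage location:

```python
# Sketch contrasting the two parameter-passing mechanisms.
def inc_by_value(n):
    n = n + 1                 # rebinds the local copy only

def inc_by_reference(cell):
    cell[0] = cell[0] + 1     # mutates the caller's storage

x = 5
inc_by_value(x)
print(x)          # 5: the callee received a copy of the value

y = [5]
inc_by_reference(y)
print(y[0])       # 6: the callee updated the shared location
```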
7 Aliasing

 There is an interesting consequence of call-by-reference parameter passing or its simulation, as in Java, where references to objects are passed by value.

 It is possible that two formal parameters refer to the same location; such variables are said to be aliases of one another.

 As a result, any two variables, which may appear to take their values from two distinct formal parameters, can become aliases of each other as well.
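Aliasing can be demonstrated with a short sketch in Python, where passing the same list for both formal parameters makes them aliases of one location:

```python
# Sketch of aliasing: if a and b refer to the same object, an update
# through a is visible through b, even though the procedure's text
# makes them look independent.
def scale(a, b):
    a[0] *= 2
    return b[0]               # if b aliases a, this sees the update

xs = [10]
print(scale(xs, xs))          # 20: a and b alias the same list
ys, zs = [10], [10]
print(scale(ys, zs))          # 10: distinct objects, no aliasing
```

Aliasing matters to optimizing compilers: an optimizer that assumed a and b were distinct could cache b[0] across the update and produce wrong code.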
END OF UNIT I : PART I

….. THANK YOU
