
Bedasa


Chapter One

Introduction

Compiler Design

Introduction
Definition

 A compiler is an executable program that can read a program in one high-level language and translate it into an equivalent executable program in machine language.

 A compiler is a computer program that translates a program in a source language into an equivalent program in a target language.
   source program → Compiler → target program (error messages reported along the way)

 A source program/code is a program/code written in the source language, which is usually a high-level language.
 A target program/code is a program/code written in the target language, which often is a machine language or an intermediate code.
Contd.
 As a discipline, compiler design involves multiple computer science and engineering courses, like:
  Programming Languages
  Data Structures and Algorithms
  Theory of Computation (automata and formal language theory)
  Assembly Language
  Software Engineering
  Computer Architecture
  Operating Systems, and
  Discrete Mathematics
Why Study Theory of Compiler?
 Curiosity
 Prerequisite for developing advanced compilers, an area that continues to be active as new computer architectures emerge
 To improve the capabilities of existing compilers/interpreters
 To write more efficient code in a high-level language
 Useful for developing software tools that parse computer code or strings
  E.g., editors, debuggers, interpreters, preprocessors, …
 Important to understand how compilers work in order to program more effectively
 To provide a solid foundation in parsing theory for parser writing
 To make compiler design an excellent “capstone” project
 To apply almost all of the major computer science fields, such as those listed earlier
Classification of Compilers
 Compilers viewed from many perspectives:
  Single Pass
  Multiple Pass
  Load & Go
  Debugging
  Functional
  Optimizing
o Classifying compilers by number of passes has its background in the hardware resource limitations of early computers.
o Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work.
  So compilers were split up into smaller programs which each made a pass over the source (or some representation of it).
 However, all utilize the same basic tasks to accomplish their actions.
Contd.
1. Single (One) Pass Compilers:- a compiler that passes through the source code of each compilation unit only once.
 Also called narrow compilers.
 The ability to compile in a single pass has classically been seen as a benefit because it simplifies the job of writing a compiler.
 Single-pass compilers generally perform compilations faster than multi-pass compilers.
 Due to the resource limitations of early systems, many early languages were specifically designed so that they could be compiled in a single pass (e.g., Pascal).
 Disadvantages of single pass compilers:
  It is not possible to perform many of the sophisticated optimizations needed to generate high quality code.
  It can be difficult to count exactly how many passes an optimizing compiler makes.
Contd.
2. Multi-Pass Compilers:- a type of compiler that processes the source code or abstract syntax tree of a program several times.
 Also called wide compilers.
 Phases are separate "programs", which run sequentially.
 Here, by splitting the compiler up into small programs, correct programs will be produced.
  Proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program.
 Many programming languages cannot be compiled with a single-pass compiler; for example, most recent languages require multiple passes.
Contd.
3. Load and Go Compilers:- generate machine code and then immediately execute it.
 Compilers usually produce either absolute code that is executed immediately upon conclusion of the compilation, or object code that is transformed by a linking loader into absolute code.
 These compiler organizations will be called Load & Go and Link/Load.
 Both Load & Go and Link/Load compilers use a number of passes to translate the source program into absolute code.
Contd.
4. Optimizing Compilers:- a compiler that tries to minimize or maximize some attributes of an executable computer program.
 The most common requirement is to minimize the time taken to execute a program; a less common one is to minimize the amount of memory occupied.
 The growth of portable computers has created a market for minimizing the power consumed by a program.
 Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent program that uses fewer resources.
Cousins of Compilers
A. Assembler:- a translator that converts programs written in assembly language into machine code.
 Translates mnemonic operation codes into their machine language equivalents.
 Assigns machine addresses to symbolic labels.
B. Interpreter:- a computer program that translates high-level instructions/programs into machine code as they are encountered.
 It produces the output of each statement as it is interpreted.
 It generally uses one of the following strategies for program execution:
  i. execute the source code directly
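Strategy (i) can be sketched with a toy direct interpreter: each statement is executed, and its output produced, the moment it is encountered. This is an illustrative sketch only; the statement format (a list of name/expression pairs) and the use of Python's `eval` are assumptions, not anything from the slides.

```python
# Toy illustration of direct execution: the interpreter walks the program
# and executes each statement as it is encountered, producing output
# immediately rather than emitting a translated program first.
def interpret(statements):
    env = {}                                    # run-time variable store
    for name, expr in statements:
        env[name] = eval(expr, {}, dict(env))   # execute the statement now
        print(f"{name} = {env[name]}")          # output as it is interpreted
    return env

env = interpret([("x", "3"), ("y", "x + 1")])
# x = 3
# y = 4
```

Note how, unlike a compiler, no target program ever exists: the source is the only representation of the program.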
Contd.
A language-processing system:
  source program
    ↓ preprocessor
  modified source program
    ↓ compiler
  target assembly program
    ↓ assembler
  relocatable machine code
    ↓ linker/loader ← library files
  target machine code
C. Linker:- a program that takes one or more objects generated by a compiler and combines them into a single executable program.
D. Loader:- the part of an operating system that is responsible for loading programs from executables (i.e., executable files) into memory for execution.
Compiler vs. Interpreter
 Ideal concept:
  Compiler: source code → Compiler → executable; then executable + input data → output data
  Interpreter: source code + input data → Interpreter → output data
 Most languages are usually thought of as using either one or the other:
  Compilers: FORTRAN, COBOL, C, C++, Pascal, PL/1
  Interpreters: Lisp, Scheme, BASIC, APL, Perl, Python, …
Basic Compiler Design
 Write a huge program that takes as input another program in the source language for the compiler, and gives as output an executable that we can run.
 To modify code easily, we usually use a modular design (decomposition) methodology to design a compiler.
 Two design strategies:
1. Write a “front end” of the compiler (i.e., the lexer, parser, semantic analyzer, and assembly tree generator), and write a separate back end for each platform that you want to support.
2. Write an efficient, highly optimized back end, and write a different front end for several languages, such as Fortran, C, C++, and Java.
   source code → Front End → intermediate code → Back End → target code
The Analysis-Synthesis Model of Compilation
 There are two parts to compilation: analysis & synthesis.
 During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree.
 During synthesis, the compiler takes the tree structure and translates the operations into the translated code.
 Analysis (Front End):
  1. Lexical Analysis:- breaks up the source program into constituent pieces
  2. Syntax Analysis:- creates an intermediate representation of the source program
  3. Semantic Analysis
 Synthesis (Back End):
  4. Code Generation:- constructs the target program from the intermediate representation
  5. Optimization
Analysis
 In compiling, analysis has three phases:
1. Linear analysis: a stream of characters is read from left to right and grouped into tokens; known as lexical analysis or scanning.
  Converting input text into a stream of known objects called tokens.
  It simplifies the scanning process.
2. Hierarchical analysis: tokens are grouped hierarchically with collective meaning; known as parsing or syntax analysis.
  Translating code according to the rules of the grammar.
  Building a representation of the code.
3. Semantic analysis: checks whether the program components fit together meaningfully.
  Checks the source program for semantic errors.
Phases of Compilation
General structure of a compiler:
  stream of characters
    ↓ scanner
  stream of tokens
    ↓ parser
  parse/syntax tree
    ↓ semantic analyzer
  annotated tree
    ↓ intermediate code generator
  intermediate code
    ↓ code optimization
  intermediate code
    ↓ code generator
  target code
    ↓ code optimization
  target code
Phase I: Lexical Analysis
 The low-level text processing portion of the compiler.
 The source file, a stream of characters, is broken into larger chunks called tokens.
 For example:
   void main()
   {
     int x;
     x=3;
   }
 It will be broken into 13 tokens as below:
   void main ( ) { int x ; x = 3 ; }
 The lexical analyzer (scanner) reads a stream of characters and puts them together into some meaningful (with respect to the source language) units called tokens.
 Typically, spaces, tabs, end-of-line characters and comments are ignored by the lexical analyzer.
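The scanning step above can be sketched with a minimal regular-expression tokenizer. This is an illustrative sketch, not the course's implementation: the token class names and the regular expressions are assumptions, chosen so that the sample program yields the 13 tokens listed above.

```python
import re

# Token classes for a toy C-like language (names and patterns are illustrative).
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:void|int)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("NUMBER",  r"\d+"),
    ("SYMBOL",  r"[(){};=]"),
    ("SKIP",    r"\s+"),          # spaces, tabs, newlines are ignored
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    """Break a stream of characters into (class, lexeme) tokens."""
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":         # whitespace never becomes a token
            tokens.append((m.lastgroup, m.group()))
    return tokens

src = "void main() { int x; x=3; }"
print(len(tokenize(src)))  # 13 tokens, as on the slide
```

Characters that match no pattern are silently skipped here; a real scanner would report them as lexical errors.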



Phase II: Parsing (Syntax Analysis)
 A parser gets a stream of tokens from the scanner, and
determines if the syntax (structure) of the program is correct
according to the (context-free) grammar of the source
language.
 Then, it produces a data structure, called a parse tree or an
abstract syntax tree, which describes the syntactic structure of
the program.
 The parser ensures that the sequence of tokens returned by the lexical analyzer forms a syntactically correct program.
 It also builds a structured representation of the program, called an abstract syntax tree, that is easier for the type checker to analyze than a stream of tokens.
 It catches syntax errors such as the one in the statement below:
   if if (x > 3) then x = x + 1
Parse Tree
 The output of parsing; shows the top-down description of program syntax.
 The root node is the entire program, and the leaves are the tokens that were identified during lexical analysis.
 Constructed by repeated application of the rules of a Context-Free Grammar (CFG).
 Syntax structures are analyzed by a DPDA (Deterministic Push-Down Automaton).
Example: parse tree for position := initial + rate*60
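A small recursive-descent parser for expressions like the one in the example can illustrate how repeated application of grammar rules builds a tree. The two-rule grammar (`expr → term ('+' term)*`, `term → factor ('*' factor)*`) and the tuple-based tree representation are assumptions for illustration, not the notation used in the course.

```python
import re

def parse(text):
    """Parse 'a + b * c'-style expressions into a nested-tuple tree."""
    tokens = re.findall(r"\w+|[+*]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():                 # factor -> NAME | NUMBER
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def term():                   # term -> factor ('*' factor)*
        nonlocal pos
        node = factor()
        while peek() == "*":
            pos += 1
            node = ("*", node, factor())
        return node

    def expr():                   # expr -> term ('+' term)*
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("+", node, term())
        return node

    return expr()

print(parse("initial + rate * 60"))
# → ('+', 'initial', ('*', 'rate', '60'))
```

Because `*` is handled one rule deeper than `+`, the tree automatically gives multiplication higher precedence, matching the grouping in the slide's example.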
Phase III: Semantic Analysis
 It gets the parse tree from the parser together with information about some syntactic elements.
 It determines if the semantics (meaning) of the program is correct.
 It detects errors in the program, such as using variables before they are declared, assigning an integer value to a Boolean variable, …
 This part deals with static semantics:
  semantics of programs that can be checked by reading off from the program only;
  syntax of the language which cannot be described by a context-free grammar.
 Mostly, a semantic analyzer does type checking (i.e., gathers type information and checks for type compatibility).
Contd.
 The main tool used by the semantic analyzer is a symbol table.
 Symbol table:- a data structure with a record for each identifier and its attributes.
  Attributes include storage allocation, type, scope, etc.
  All the compiler phases insert into and modify the symbol table.
 Discovery of meaning in a program using the symbol table:
  Do static semantics checks.
  Simplify the structure of the parse tree (from parse tree to abstract syntax tree (AST)).
 Static semantics checks:
  Making sure identifiers are declared before use
  Type checking for assignments and operators
  Checking types and numbers of parameters to subroutines
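The first two static checks above can be sketched with a toy symbol table. The class and function names (`SymbolTable`, `declare`, `lookup`, `check_assignment`) are illustrative assumptions; a real table would also record storage allocation and scope, as noted on the slide.

```python
# Minimal symbol-table sketch: one record per identifier, holding its type.
class SymbolTable:
    def __init__(self):
        self.records = {}                    # identifier -> attribute record

    def declare(self, name, typ):
        self.records[name] = {"type": typ}

    def lookup(self, name):
        if name not in self.records:         # "declared before use" check
            raise NameError(f"'{name}' used before declaration")
        return self.records[name]

def check_assignment(table, name, value_type):
    """Type check an assignment against the declared type."""
    declared = table.lookup(name)["type"]
    if declared != value_type:
        raise TypeError(f"cannot assign {value_type} to {declared} '{name}'")

table = SymbolTable()
table.declare("x", "int")
check_assignment(table, "x", "int")          # fine
try:
    check_assignment(table, "flag", "int")
except NameError as e:
    print(e)                                 # 'flag' used before declaration
```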
Phase IV: Intermediate Code Generation
 An intermediate code generator:
  takes a parse tree from the semantic analyzer
  generates a program in the intermediate language.
 In some compilers, a source program is translated into intermediate code first, and then the intermediate code is translated into the target language.
 In other compilers, a source program is translated directly into the target language.
 The compiler makes a second pass over the parse tree to produce the translated code.
 If there are no compile-time errors, the semantic analyzer translates the abstract syntax tree into an intermediate form such as an abstract assembly tree.
Contd.
 Using intermediate code is beneficial when compilers that translate a single source language into many target languages are required:
  the front-end of a compiler:- scanner through intermediate code generator, can be reused for every compiler;
  a different back-end:- code optimizer and code generator, is required for each target language.
 One of the popular intermediate codes is three-address code.
 A three-address code instruction is of the form x = y op z.
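As a sketch, three-address code for the running example `position := initial + rate*60` can be generated by walking an expression tree bottom-up, assigning each operator's result to a fresh temporary. The tuple tree format and the `t1, t2, …` temporary-naming scheme are assumptions for illustration.

```python
def gen_tac(node, code, counter):
    """Emit three-address instructions (x = y op z) for an expression tree."""
    if isinstance(node, str):            # leaf: a variable name or constant
        return node
    op, left, right = node
    l = gen_tac(left, code, counter)     # operands first (bottom-up)
    r = gen_tac(right, code, counter)
    counter[0] += 1                      # fresh temporary: t1, t2, ...
    temp = f"t{counter[0]}"
    code.append(f"{temp} = {l} {op} {r}")
    return temp

code = []
result = gen_tac(("+", "initial", ("*", "rate", "60")), code, counter=[0])
code.append(f"position = {result}")
print("\n".join(code))
# t1 = rate * 60
# t2 = initial + t1
# position = t2
```

Each emitted instruction has at most one operator, which is what makes the form easy for a back end to map onto machine instructions.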
Phase V: Assembly Code Generation
 The code generator converts the abstract assembly tree into the actual assembly code.
 To do code generation:
  the generator covers the abstract assembly tree with tiles (each tile represents a small portion of an abstract assembly tree), and
  outputs the actual assembly code associated with the tiles that we used to cover the tree.
Phase VI: Machine Code Generation and Linking
 The final phase of compilation converts the assembly code into machine code and links in (by a linker) the appropriate language libraries.
Code Optimization
 Replacing an inefficient sequence of instructions with a better sequence of instructions.
 Sometimes called code improvement.
 Code optimization can be done:
  after semantic analysis:- performed on a parse tree
  after intermediate code generation:- performed on intermediate code
  after code generation:- performed on target code
 Two types of optimization:
  1. Local
  2. Global
Local Optimization
 The compiler looks at a very small block of instructions and tries to determine how it can improve the efficiency of this local code block.
 Relatively easy; included as part of most compilers.
 Examples of possible local optimizations:
  1. Constant evaluation
  2. Strength reduction
  3. Eliminating unnecessary operations
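The three local optimizations named above can be sketched as rewrite rules on the small expression trees used earlier. The particular rules shown (folding `+`/`*` on constants, rewriting `x * 2` as `x + x`, dropping `x + 0` and `x * 1`) are illustrative examples, not an exhaustive set.

```python
def optimize(node):
    """Apply a few local rewrites to a nested-tuple expression tree."""
    if isinstance(node, (int, str)):
        return node
    op, left, right = node[0], optimize(node[1]), optimize(node[2])
    # 1. Constant evaluation (folding): compute constant subexpressions now.
    if isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    # 2. Strength reduction: replace x * 2 with the cheaper x + x.
    if op == "*" and right == 2:
        return ("+", left, left)
    # 3. Eliminating unnecessary operations: x + 0 and x * 1 are just x.
    if (op == "+" and right == 0) or (op == "*" and right == 1):
        return left
    return (op, left, right)

print(optimize(("+", ("*", 3, 4), 0)))   # folds 3*4, then drops the +0 → 12
print(optimize(("*", "x", 2)))           # strength reduction → ('+', 'x', 'x')
```

Because each rule inspects only one node and its children, the whole pass stays "local" in the sense defined above.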
Global Optimization
 The compiler looks at large segments of the program to decide how to improve performance.
 Much more difficult; usually omitted from all but the most sophisticated and expensive production-level “optimizing compilers”.
 Optimization cannot make an inefficient algorithm efficient.
The Phases of a Compiler

Phase | Output | Sample
Programmer (source code producer) | Source string | A=B+C;
Scanner (performs lexical analysis) | Token string, and symbol table with names | ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’
Parser (performs syntax analysis based on the grammar of the programming language) | Parse tree or abstract syntax tree | ; over =, with children A and +, where + has children B and C
Semantic analyzer (type checking, etc.) | Annotated parse tree or abstract syntax tree |
Intermediate code generator | Three-address code, quads, or RTL | int2fp B t1 ; + t1 C t2 ; := t2 A
Optimizer | Three-address code, quads, or RTL | int2fp B t1 ; + t1 #2.3 A
Code generator | Assembly code | MOVF #2.3,r1
Summary of Phases of Compiler
Compiler Construction Tools
 Software development tools are available to implement one or more compiler phases:
  Scanner generators
  Parser generators
  Syntax-directed translation engines
  Automatic code generators
  Data flow engines
 Scanner generators for C/C++: Flex, Lex.
 Parser generators for C/C++: Bison, YACC.
 Available scanner generators for Java:
  JLex, a scanner generator for Java, very similar to Lex.
  JFlex, flex for Java.
 Available parser generators for Java:
  CUP, a parser generator for Java, very similar to YACC.
  BYACC/J, a different version of Berkeley YACC for Java. It is an extension of the standard YACC (a -j flag has been added to generate Java code).
 Other compiler tools:
  JavaCC, a parser generator for Java, including a scanner generator and a parser generator. Input specifications are different from those suitable for Lex/YACC. Also, unlike YACC, JavaCC generates a top-down parser.
  ANTLR, a set of language translation tools (formerly PCCTS). Includes scanner/parser generators for C, C++, and Java.
Thank You ...

Assignment I
 Compiler Design Tools
 History of Compilers
