1-Introduction To Compilers

The document provides an overview of a compiler design course, including information about the textbook, assessment criteria, and course contents. The course will cover topics such as scanning, parsing, semantic analysis, code generation, and different compiler implementation approaches like top-down and bottom-up parsing. It defines compilers as programs that translate a program from a source language to a target language like machine code. Interpreters directly execute the source program instead of translating it. Compilation involves multiple phases including preprocessing, compilation to assembly code, assembly, and linking.

Lecture (1)

Introduction to Compilers

Dr. Wafaa Samy

CSE439: Design of Compilers (Spring 2024)


Contents
• About the Course
o Text Books
o Assessment Criteria
• Introduction to Compilers
• Why study Design of Compilers?
• Grouping of Phases

2
About the Course
• Introduction to Compilers
• Scanning
• Context Free Grammar and Parsing
• Top-Down Parsing
• Bottom-Up Parsing
• Semantic Analysis, Runtime Environments, and Code
Generation
3
Text Books
• “Compilers: Principles, Techniques, and Tools” by Aho, Sethi, and Ullman, 2007, 2nd edition.
• Compiler Construction: Principles and Practice,
Kenneth C. Louden, 1997, PWS Publishing Company,
ISBN 0-534-93972-4.

4
Assessment Criteria
• Quizzes 5
o Quiz (1) before the Midterm exam
o Quiz (2) after the Midterm exam

• Project 35
o Lexer and Parser for the C Programming language
• Bonus
o Attendance with In-Class Participation

• Midterm Exam 20

• Final Exam 40
5
Compilers
• Compilation: Translation of a program written in a source language into a semantically
equivalent program written in a target language.
• Compilers are computer programs that translate one language to another.
o A compiler takes as input a program written in a source language and produces an equivalent
program written in a target language.
o Source language is a high level language like C, C++, etc.
o Target language is object code (also called machine code) for the target machine (i.e. code
written in the machine instructions of the computer on which it is to be executed).
o An important role of the compiler is to report any errors in the source program that it detects
during the translation process.
• Simple view of a compiler:
[Figure: Source Program → Compiler → Target Program, with error messages; then Input → Target Program → Output]
• Running the target program: If the target program is an executable machine-language program, it can then be called by the user to process inputs and produce outputs.
6
Interpreters
• Interpretation: Performing the operations implied by the source program.

• An interpreter is another common kind of language processor.


o Instead of producing a target program as a translation, an interpreter directly executes
the operations specified in the source program on inputs supplied by the user.

• Simple view of an interpreter:
[Figure: Source Program + Input → Interpreter → Output, with error messages]
7
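To make the compiler/interpreter contrast concrete, here is a minimal Python sketch (not part of the slides) of an interpreter: it executes assignment statements directly on user-supplied values instead of producing a target program. The statement syntax and the use of Python's `eval` are illustrative shortcuts, not a real interpreter design.

```python
# Minimal interpreter sketch: each "name = expression" statement is
# executed directly; no target program is ever produced.
env = {}  # runtime environment mapping variable names to values

def interpret(stmt):
    # Split "position = initial + rate * 60" into target name and expression.
    name, expr = (part.strip() for part in stmt.split("=", 1))
    # eval() stands in for a real parse-and-execute loop; env supplies
    # the values of variables appearing in the expression.
    env[name] = eval(expr, {}, env)
    return env[name]

interpret("rate = 2")
interpret("initial = 10")
result = interpret("position = initial + rate * 60")  # -> 130
```

Because the statements are re-analyzed every time they run, an interpreter like this can point at the exact statement that failed, which is the error-diagnostics advantage mentioned on the next slide.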
Compilers and Interpreters
• The machine-language target program produced by a compiler is
usually much faster than an interpreter at mapping inputs to
outputs.
o A compiler is to be preferred if speed of execution is a primary consideration,
since compiled object code is faster than interpreted source code.

• An interpreter, however, can usually give better error diagnostics than a compiler, because it executes the source program statement by statement.
• Any programming language can be either interpreted (e.g. Python) or
compiled (e.g. C++). Also, there are hybrid language translators
based on both interpretation and compilation (e.g. Java).
8
Example (1): Java Language Processor
• Java language processors combine compilation and interpretation.
1. A Java source program may first be compiled into an intermediate form called
bytecodes.
2. The bytecodes are then interpreted by a virtual machine.
• What is the advantage of the Java language processor?
[Figure: a hybrid compiler — Source Program → Translator → Intermediate Program; Intermediate Program + Input → Virtual Machine → Output]
9
Example (1): Java Language Processor (Cont.)
• What is the advantage of the Java language
processor?
o Java language processors combine
compilation and interpretation.
o Bytecodes compiled on one machine can be
interpreted on another machine.

• In order to achieve faster processing of inputs to outputs, some Java compilers, called just-in-time compilers, translate the bytecodes into machine language immediately before they run the intermediate program to process the input.
10
A Language-Processing System
• In addition to a compiler, several other programs may
be required to create an executable target program.
1. Preprocessor: a separate program to collect the
source program because sometimes the source
program may be divided into modules stored in
separate files.
• A preprocessor can delete comments, include other
files, and perform macro substitutions (i.e. a macro is
a shorthand description of a repeated sequence of
source code statements).

11
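The macro-substitution job of a preprocessor can be sketched in a few lines of Python (not from the slides; the macro table and the input line are invented for illustration, and real preprocessors also respect token boundaries, comments, and strings):

```python
# Minimal sketch of macro substitution: each macro name is a shorthand
# that the preprocessor replaces by its body wherever it occurs.
def preprocess(source, macros):
    for name, body in macros.items():
        source = source.replace(name, body)
    return source

expanded = preprocess("area = PI * r * r", {"PI": "3.14159"})
# expanded == "area = 3.14159 * r * r"
```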
A Language-Processing System
2. Compiler: the modified source program is then fed
to a compiler.
o Sometimes, the compiler may produce an assembly-
language program as its output (i.e. target program),
because assembly language is easier to produce as
output and is easier to debug.
o Then, rely on an assembler to finish the translation
into object code.
o Assembly language is a symbolic form of the machine
language of the computer and is easy to translate.

12
A Language-Processing System (Cont.)
3. Assembler: the assembly language is then processed
by a program called an assembler that produces
relocatable machine code as its output.
o Although the relocatable machine code is in 0 and 1 form, it cannot be executed.
o Why?
o Relocatable machine code is not ready yet to execute
because this code has not been assigned the actual
memory addresses yet and its memory references are all
made relative to an undetermined starting location that
can be anywhere in memory.
• So, why produce relocatable machine code?
o It allows machine code to be loaded at any point in the computer’s RAM.
o It supports program movement. 13
A Language-Processing System (Cont.)
4. Linker: Large programs are often compiled in pieces,
so the relocatable machine code may have to be
linked together with other relocatable object files and
library files (which the program needs) into the code
that actually runs on the machine.
o The linker resolves external memory addresses, where
the code in one file may refer to a location in another file.
o Makes a single program from several files of relocatable
machine code.

• The loader then puts together all of the executable object files into memory for execution.
o Loads machine code into memory.
o Changes the relocatable addresses.
14
The Phases of a Compiler
• A decomposition of a compiler into phases is shown in figure.
[Figure: Source Code → Scanner → Parser → the remaining phases of the compiler]
• The symbol table, which stores information about the entire source program, is used by all phases of the compiler.
• Some compilers have a machine-independent optimization phase.
o The purpose of this optimization phase is to perform transformations on the intermediate representation, so that the next phases can produce a better target program.
• Since optimization is optional, one or the other of the two optimization phases shown in figure may be missing.
15
(1) Lexical Analysis
• The first phase of a compiler is called lexical analysis or scanning.
• The lexical analyzer reads the stream of characters making up the source
program and groups the characters into meaningful sequences called
lexemes.
• For each lexeme, the lexical analyzer produces as output a token of the form < token-name, attribute-value > that it passes on to the subsequent phase, syntax analysis.
• In the token, the first component token-name is an abstract symbol that is
used during syntax analysis, and the second component attribute-value
points to an entry in the symbol table for this token.
• Information from the symbol-table entry is needed for semantic analysis
and code generation.
16
Example (2): Lexical Analysis
• For example, suppose a source program contains the assignment statement:
position = initial + rate * 60
• The characters in this assignment could be grouped into the following lexemes and mapped into
the following tokens passed on to the syntax analyzer:
1. The position is a lexeme that would be mapped into a token < id, 1 >, where id is an abstract
symbol standing for identifier and 1 points to the symbol table entry for position. The symbol-
table entry for an identifier holds information about the identifier, such as its name and type.
2. The assignment symbol = is a lexeme that is mapped into the token < = >.
o Since this token needs no attribute-value, we have omitted the second component.
3. The initial is a lexeme that is mapped into the token < id, 2 >, where 2 points to the symbol-table
entry for initial.
4. The + is a lexeme that is mapped into the token < + >.
5. The rate is a lexeme that is mapped into the token < id, 3 >, where 3 points to the symbol-table
entry for rate.
6. The * is a lexeme that is mapped into the token < * >.
7. The 60 is a lexeme that is mapped into the token < 60 >.
• Blanks separating the lexemes would be discarded by the lexical analyzer.
17
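The grouping of characters into lexemes and tokens described above can be mimicked in a short Python sketch (illustrative only; the token names, the 1-based symbol-table indices, and a num token carrying the literal's value — standing in for the slide's < 60 > — are assumptions, not a real scanner design):

```python
import re

def scan(source):
    """Group characters into lexemes, map them to <token-name, attribute> pairs."""
    symtab = []   # symbol table: one entry per distinct identifier
    tokens = []
    # Lexeme shapes for this example: identifiers, integers, and = + * operators.
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+*]", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symtab:
                symtab.append(lexeme)          # first sight: new symbol-table entry
            tokens.append(("id", symtab.index(lexeme) + 1))  # attribute = entry number
        elif lexeme.isdigit():
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((lexeme, None))      # operators need no attribute
    return tokens, symtab

tokens, symtab = scan("position = initial + rate * 60")
# tokens: [('id', 1), ('=', None), ('id', 2), ('+', None),
#          ('id', 3), ('*', None), ('num', 60)]
```

Note that blanks are discarded automatically: the regular expression simply never matches them.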
Example (2): Lexical Analysis (Cont.)
• Figure shows the representation of the assignment statement after lexical analysis as the sequence of tokens.
• In this representation, the token names =, +, and * are abstract symbols for the assignment, addition, and multiplication operators, respectively.
[Figure: the tokens stream]
18
(2) Syntax Analysis
• The second phase of the compiler is syntax analysis or parsing.
• The parser uses the first components of the tokens produced by the
lexical analyzer to create a tree-like intermediate representation that
depicts the grammatical structure of the token stream.
o A typical representation is a syntax tree in which each interior node
represents an operation and the children of the node represent the
arguments of the operation.
• The subsequent phases of the compiler use the grammatical structure
to help analyze the source program and generate the target program.

19
Example (2): Syntax Analysis
• A syntax tree for the tokens stream is shown as the output of the syntactic analyzer in figure.
• This tree shows the order in which the operations in the assignment are to be performed:
position = initial + rate * 60
• The tree has an interior node labeled * with (id, 3) as its left child and the integer 60 as its right child.
o The node (id, 3) represents the identifier rate.
o The node labeled * makes it explicit that we must first multiply the value of rate by 60.
• The node labeled + indicates that we must add the result of this multiplication to the value of initial.
• The root of the tree, labeled =, indicates that we must store the result of this addition into the location for the identifier position.
• This ordering of operations is consistent with the usual conventions of arithmetic, which tell us that multiplication has higher precedence than addition, and hence that the multiplication is to be performed before the addition.
20
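The tree described above can be sketched as a small Python structure (built by hand here; a real parser would construct it from the token stream, and the Node class and the parenthesized printout are invented for illustration):

```python
# Hand-built syntax tree for: position = initial + rate * 60
class Node:
    def __init__(self, op, *children):
        self.op, self.children = op, children

    def show(self):
        # Leaves print as themselves; interior nodes as (op child child).
        if not self.children:
            return str(self.op)
        return "(" + " ".join([str(self.op)] + [c.show() for c in self.children]) + ")"

# * has rate as its left child and 60 as its right child; + adds initial;
# the root = stores the result into position.
tree = Node("=", Node("position"),
            Node("+", Node("initial"),
                 Node("*", Node("rate"), Node("60"))))
print(tree.show())  # (= position (+ initial (* rate 60)))
```

The nesting makes the precedence explicit: * sits below +, so the multiplication is performed first, exactly as the slide argues.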
(3) Semantic Analysis
• The semantic analyzer uses the syntax tree and the information in the
symbol table to check the source program for semantic consistency with
the language definition.
• An important part of semantic analysis is type checking, where the
compiler checks that each operator has matching operands.
o For example, many programming language definitions require an array index
to be an integer; the compiler must report an error if a floating-point number
is used to index an array.
• The language specification may permit some type conversions called
coercions.
o For example, a binary arithmetic operator may be applied to either a pair of
integers or to a pair of floating-point numbers.
o If the operator is applied to a floating-point number and an integer, the
compiler may convert or coerce the integer into a floating-point number.
21
Example (2): Semantic Analysis
• A type conversion or coercion appears in figure.
• Suppose that position, initial, and rate have been declared to be floating-point numbers, and that the lexeme 60 by itself forms an integer.
• The type checker in the semantic analyzer in figure
discovers that the operator * is applied to a floating-
point number rate and an integer 60.
• In this case, the integer may be converted into a
floating-point number.
• In figure, notice that the output of the semantic
analyzer has an extra node for the operator inttofloat,
which explicitly converts its integer argument into a
floating-point number.
22
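A hypothetical fragment of such a type checker, in Python, showing only the coercion rule discussed above (the string type names and the tuple-based tree nodes are assumptions for illustration):

```python
# Coercion sketch: if a binary operator mixes float and int operands,
# wrap the int operand in an explicit inttofloat node.
def coerce(op, left, right, types):
    lt, rt = types[left], types[right]
    if lt == "float" and rt == "int":
        return (op, left, ("inttofloat", right)), "float"
    if lt == "int" and rt == "float":
        return (op, ("inttofloat", left), right), "float"
    return (op, left, right), lt   # matching operand types: no conversion

types = {"rate": "float", "60": "int"}
node, result_type = coerce("*", "rate", "60", types)
# node: ('*', 'rate', ('inttofloat', '60')), result_type: 'float'
```

The extra inttofloat node is exactly the one the slide says appears in the semantic analyzer's output.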
(4) Intermediate Code Generation
• In the process of translating a source program into target code, a compiler may
construct one or more intermediate representations, which can have a variety of
forms.
• Syntax trees are a form of intermediate representation; they are commonly
used during syntax and semantic analysis.
• After syntax and semantic analysis of the source program, many compilers
generate an explicit low-level or machine-like intermediate representation,
which we can think of as a program for an abstract machine.
• This intermediate representation should have two important properties:
1. It should be easy to produce and
2. It should be easy to translate into the target machine code.

23
Example (2): Intermediate Code Generation
• For example, consider an intermediate form called three-address code, which consists of a sequence of assembly-like instructions with three operands per instruction.
o Each operand can act like a register.
• The output of the intermediate code generator
in figure consists of the three-address code
sequence:
t1 = inttofloat (60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
24
Three-Address Instructions
• There are several points worth noting about three-address
instructions.
1. Each three-address assignment instruction has at most one
operator on the right side. Thus, these instructions fix the order in
which operations are to be done; the multiplication precedes the
addition in the source program.
2. The compiler must generate a temporary name to hold the value
computed by a three-address instruction.
3. Some "three-address instructions" like the first and last in the
example (2) sequence (i.e. in previous slide), have fewer than three
operands.
25
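These three points can be seen at work in a sketch of how a compiler might flatten the syntax tree into exactly the four-instruction sequence on the previous slide, generating a fresh temporary for each computed value (the tuple-based tree and the t1, t2, ... naming scheme are assumptions):

```python
import itertools

# Sketch: flatten a nested-tuple syntax tree into three-address code.
# Each instruction gets at most one operator, and a fresh temporary
# holds its result.
def gen_tac(node, code, temps):
    if isinstance(node, str):                     # leaf: identifier or constant
        return node
    op, *args = node
    places = [gen_tac(a, code, temps) for a in args]  # children first
    temp = f"t{next(temps)}"
    if op == "inttofloat":                        # unary: fewer than three operands
        code.append(f"{temp} = inttofloat({places[0]})")
    else:
        code.append(f"{temp} = {places[0]} {op} {places[1]}")
    return temp

code = []
tree = ("+", "id2", ("*", "id3", ("inttofloat", "60")))
result = gen_tac(tree, code, itertools.count(1))
code.append(f"id1 = {result}")
# code now holds the four instructions from the previous slide
```

Because children are emitted before their parent, the multiplication necessarily precedes the addition, fixing the order of operations as point 1 describes.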
(5) Code Optimization
• The machine-independent code-optimization phase attempts to improve
the intermediate code so that better target code will result.
o A simple intermediate code generation algorithm followed by code
optimization is a reasonable way to generate good target code.

• Usually better means faster, but other objectives may be desired, such as
shorter code, or target code that consumes less power.

26
Example (2): Code Optimization
• The optimizer can deduce that the conversion
of 60 from integer to floating point can be done
once and for all at compile time, so the
inttofloat operation can be eliminated by
replacing the integer 60 by the floating-point
number 60.0.
• Moreover, t3 is used only once to transmit its value to id1, so the optimizer can transform the code into the shorter sequence shown in figure.

27
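These two improvements can be sketched as rewriting passes over the three-address code (a toy illustration working on plain strings; real optimizers operate on a structured IR, and the pass logic here handles only the shapes in this example):

```python
import re

def optimize(code):
    """Two illustrative passes: fold inttofloat of a constant, then
    eliminate the final copy through a single-use temporary."""
    # Pass 1: constant folding. A line "tX = inttofloat(N)" is removed
    # and later uses of tX are replaced by the float constant N.0.
    folded, subst = [], {}
    for line in code:
        dest, rhs = line.split(" = ", 1)
        for var, val in subst.items():
            rhs = rhs.replace(var, val)
        m = re.fullmatch(r"inttofloat\((\d+)\)", rhs)
        if m:
            subst[dest] = m.group(1) + ".0"    # done once, at compile time
            continue
        folded.append(f"{dest} = {rhs}")
    # Pass 2: copy elimination. If the last line is "x = tY" and tY was
    # just computed, write the result directly into x.
    if len(folded) >= 2:
        dest, rhs = folded[-1].split(" = ", 1)
        prev_dest, prev_rhs = folded[-2].split(" = ", 1)
        if rhs == prev_dest:
            folded[-2:] = [f"{dest} = {prev_rhs}"]
    return folded

shorter = optimize(["t1 = inttofloat(60)", "t2 = id3 * t1",
                    "t3 = id2 + t2", "id1 = t3"])
# shorter: ['t2 = id3 * 60.0', 'id1 = id2 + t2']
```

The sketch keeps the original temporary names, so its two-instruction result is the same shorter sequence up to renaming of temporaries.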
(6) Code Generation
• The code generator takes as input an intermediate representation of
the source program and maps it into the target language.
• If the target language is machine code, registers or memory locations
are selected for each of the variables used by the program.
o A crucial aspect of code generation is the judicious assignment of registers to
hold variables.

• Then, the intermediate instructions are translated into sequences of machine instructions that perform the same task.

28
Example (2): Code Generation
• Using registers R1 and R2, the intermediate code might get
translated into the machine code:
LDF R2, id3
MULF R2 , R2 , #60.0
LDF R1 , id2
ADDF R1 , R1 , R2
STF id1 , R1
o The first operand of each instruction specifies a destination.
o The F in each instruction tells us that it deals with floating-point numbers.
o The code loads the contents of address id3 into register R2, then multiplies it with floating-point constant 60.0. The # signifies that 60.0 is to be treated as an immediate constant.
o The third instruction moves id2 into register R1 and the fourth adds to it the value previously computed in register R2.
o Finally, the value in register R1 is stored into the address of id1, so the code correctly implements the assignment statement.
29
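A naive code generator producing this machine code can be sketched as follows (the register-allocation policy, the assumption that temporaries are named t1, t2, ..., and the instruction syntax are tailored to this one example and do not form a general algorithm):

```python
def codegen(tac):
    """Map optimized three-address lines 'dest = a op b' to load/op/store."""
    asm, reg_of, free = [], {}, ["R1", "R2"]
    for line in tac:
        dest, rhs = line.split(" = ", 1)
        a, op, b = rhs.split()
        opcode = {"*": "MULF", "+": "ADDF"}[op]
        r = reg_of.get(a)
        if r is None:                              # left operand not in a register: load it
            r = free.pop()
            asm.append(f"LDF {r}, {a}")
        # Right operand: already-computed register value, or an immediate constant.
        src = reg_of.get(b) or (f"#{b}" if b[0].isdigit() else b)
        asm.append(f"{opcode} {r}, {r}, {src}")
        reg_of[dest] = r
        if not dest.startswith("t"):               # program variable: store the result
            asm.append(f"STF {dest}, {r}")
    return asm

asm = codegen(["t2 = id3 * 60.0", "id1 = id2 + t2"])
# asm reproduces the five machine instructions shown above
```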
Example (2): Summary
• Translation of an assignment
statement:
position = initial + rate * 60

30
Why study Design of Compilers?
• Open the lid of compilers and see inside:
o Understand what they do.
o Understand how they work.
o Understand how to build them.

• Basic background information for software engineers:
o Increase understanding of programming language semantics.
o Seeing the machine code generated for language constructs helps to understand performance issues for programming languages.
o Teach good language design.
o New devices may need device-specific languages.
o New business fields may need domain-specific languages.
31
Applications of Compiler Technology & Tools
• Processing domain-specific and device-specific languages.
• Natural language processing, for example: spam filtering, search, document comprehension, summary generation.
• Translating from a hardware description language to the schematic of
a circuit.
• Extending an existing programming language.
• Program analysis and improvement tools.

32
Grouping of Phases
[Figure: Source Code → Scanner → Parser → the remaining phases, grouped into passes]
• Activities from several phases may be grouped together into a pass that reads an input file and writes an output file.
• The front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate code generation might be grouped together into one pass.
• Code optimization might be an optional pass.
• Then there could be a back-end pass consisting of code generation for a particular target machine.
33
Grouping of Phases (Cont.)
• There are two major phases of a compiler:
1. Front end: performs analysis (machine independent).
o Covers lexical analysis, syntax analysis, semantic analysis, and intermediate code generation.
o In this phase, an intermediate representation is created from the given source program.
2. Back end: performs synthesis (machine dependent).
o Covers code generation and optimization.
o In this phase, the equivalent target program is created from this intermediate representation.
34
Traditional Two-Pass Compiler
• Also known as the Analysis-Synthesis model of compilation.
• Intermediate representation (IR):
o Front end maps legal source code into IR.
o Back end maps IR onto target machine code.
• Advantages:
o Allows multiple front ends: produce compilers for different source languages for one target machine by combining different front ends with the back end for that target machine.
o Simplifies retargeting: produce compilers for different target machines by combining a front end with back ends for different target machines.
35
M*N vs. M+N Problem
• Compilers are required for all the languages and all the machines.
• For M languages and N machines we need to develop M*N
compilers.
• However, there is a lot of repetition of work because of similar activities in the front ends and back ends.
• Can we design only M front ends and N back ends, and somehow link them to get all M*N compilers?
o By using an intermediate language between the front-end and back-end.
o The front-end produces intermediate representation as output, then this
representation is entered as input to the back-end.
o So, the number of required compilers is M+N instead of M*N.
36
M*N vs. M+N Problem (Cont.)

37
Universal Intermediate Language

• It is impossible to design a single intermediate language to accommodate all programming languages.
• However, common IRs for similar languages, and similar machines
have been designed, and are used for compiler development.

38
The Course Project
• Project Team: 5 students.
• Required Task:
oWrite the two main tasks of a compiler (Scanner and Parser) for
the C programming language.

39
References
• “Compilers: Principles, Techniques, and Tools” by Aho, Sethi, and Ullman, 2007, 2nd edition. (Chapter 1)

• Compiler Construction: Principles and Practice, Kenneth C. Louden, 1997, PWS Publishing Company, ISBN 0-534-93972-4. (Chapter 1)

40
Thank You
Dr. Wafaa Samy

wafaa.elkassas@gmail.com

41
