Programming Languages and Compiler Design
Basic objective
How to translate a program written in a programming language into a
program executable by a machine
Outline for today
Source Program → Compiler → Target Code
Compilers: what you surely already know. . .
Source Program → Pre-processor → Compiler → Assembler → Linker-Loader (with Libraries) → Target Code
Compilers: what do we expect ?
Source program S (language Ls) → Compiler → target program T (language Lt)
Expected Properties?
- correctness: execution of T should preserve the semantics of S
- efficiency: T should be optimized w.r.t. some execution resources (time, memory, energy, etc.)
- “user-friendliness”: errors in S should be accurately reported
- completeness: any correct Ls-program should be accepted
Many programming language paradigms...
Imperative languages
  FORTRAN, Algol-xx, Pascal, C, Ada, Java, etc.
  control structures, (explicit) memory assignment, expressions, types, ...
Functional languages
  ML, CAML, LISP, Scheme, etc.
  term reduction, function evaluation, recursion, ...
Object-oriented languages
  Java, Ada, Eiffel, ...
  objects, classes, types, inheritance, polymorphism, ...
Logic languages
  Prolog
  resolution, unification, predicate calculus, ...
etc.
... and many architectures to target!
We will mainly focus on:
Imperative languages
- data structures
  - basic types (integers, characters, pointers, etc.)
  - user-defined types (enumerations, unions, arrays, ...)
- control structures
  - assignments
  - iterations, conditionals, sequences
  - nested blocks, sub-programs
“Standard” general-purpose machine architecture (e.g., ARM, x86)
- heap, stack and registers
- arithmetic and logical binary operations
- conditional branches
Describing a programming language P
Lexicon L: words of P
→ a regular language over the alphabet of P
Syntax S: sentences of P
→ a context-free language over L
Static semantics (e.g., typing): “meaningful” sentences of P
→ subset of S, defined by inference rules or attribute grammars
Dynamic semantics: the meaning of P programs
Meaning?
But how do we define the meaning of a program?
→ The semantics of programs
Semantics?
- Several notions/visions of semantics
  → transition relations, predicate transformers, partial functions
- Depends on “what we want to do/know about programs”
Compiler architecture: the logical steps
source pgm → lexical analysis → tokens → syntactic analysis → AST + symbol table → semantic analysis → AST + symbol table → optimisation → intermediate code → code generation → target pgm
In practice:
- steps are regrouped into passes
COMPILER: Lexical Analysis → Syntactic Analysis → Semantic Analysis → Intermediate Code Generation → Optimisation → Code Generation
Lexical analysis by a scanner
Input: sequence of characters
Output: sequence of lexical units (tokens), each tagged with its class
Some remarks:
- id is an abstract symbol meaning “identifier”
- In < id, i >, i is a reference to the entry in the symbol table
  (the entry associated with an identifier contains information on the
  identifier, such as its name and type)
- normally < 60 > is represented as < number, 4 >, where 4 refers to the table entry holding the constant 60
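The scanner described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token class names (`number`, `id`, `op`), the regular expressions, and the function name `scan` are hypothetical choices, and the constant is stored directly in the token instead of through a table entry.

```python
import re

# Hypothetical token classes for the running example (assumptions of this sketch).
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+*]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def scan(text):
    """Return (tokens, symbols) for one line of source text."""
    symbols = []   # position i-1 holds the name of identifier number i
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue                       # white space produces no token
        if kind == "id":
            if lexeme not in symbols:
                symbols.append(lexeme)     # first occurrence: new table entry
            tokens.append(("id", symbols.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme,))       # operators are their own class
    return tokens, symbols


tokens, symbols = scan("position = initial+speed*60")
print(tokens)
print(symbols)
```

On the running example, the identifiers are numbered in order of first appearance, which matches the symbol table entries 1, 2, 3 used in the slides.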
Running example: an assignment
position = initial+speed*60
Lexical Analysis
< id, 1 >, <=>, < id, 2 >, < + >, < id, 3 >, < ∗ >, < 60 >
Symbol Table
1 position ...
2 initial ...
3 speed ...
About the symbol table
Some features:
- Data structure containing an entry for each identifier (variable name, ...)
- Fast read/write access
Information stored in an entry:
- type
- scope (locations of the program where the variable can be used)
- for procedure names: number and types of the parameters
- ...
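A symbol table with these features could look like the following Python sketch. The class and field names (`Entry`, `SymbolTable`, `intern`, `param_types`) are hypothetical, and the sketch assumes a single flat scope; a real compiler would typically handle nested scopes as well.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    name: str
    type: Optional[str] = None      # filled in later, e.g. by semantic analysis
    scope: str = "global"           # where the identifier may be used
    param_types: List[str] = field(default_factory=list)  # procedures only


class SymbolTable:
    """Dictionary-backed table: insertion and lookup in average constant time."""

    def __init__(self):
        self._index: Dict[str, int] = {}   # name -> entry number (1-based)
        self._entries: List[Entry] = []

    def intern(self, name: str) -> int:
        """Return the entry number of `name`, creating the entry if needed."""
        if name not in self._index:
            self._entries.append(Entry(name))
            self._index[name] = len(self._entries)
        return self._index[name]

    def entry(self, number: int) -> Entry:
        """Return the entry with the given 1-based number."""
        return self._entries[number - 1]


table = SymbolTable()
for name in ["position", "initial", "speed", "initial"]:
    table.intern(name)              # the second "initial" reuses entry 2
table.entry(3).type = "float"       # assumed type, for illustration only
```

The dictionary gives the fast access mentioned above, while the list preserves the entry numbers referenced by the < id, i > tokens.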
Syntactic Analysis by a parser
Syntactic Analysis of the running example
Example (Syntactic analysis of
< id, 1 >, <=>, < id, 2 >, < + >, < id, 3 >, < ∗ >, < 60 >)
=
├── < id, 1 >
└── +
    ├── < id, 2 >
    └── ∗
        ├── < id, 3 >
        └── 60
The next steps (of analysis and generation) will use the syntactic
structure of the tree
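One possible way to build such a tree is a small recursive-descent parser. The sketch below is an assumption, not the parsing method of the course: the grammar (assignment, expr, term, factor), the tuple encoding of tokens and AST nodes, and the class name `Parser` are all hypothetical.

```python
# Tokens of the running example; the tuple encoding is an assumption.
TOKENS = [("id", 1), ("=",), ("id", 2), ("+",), ("id", 3), ("*",), ("number", 60)]


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Class of the next token, or None at end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self, kind):
        """Consume and return the next token, which must be of class `kind`."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {self.peek()!r}")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def assignment(self):              # assignment -> id '=' expr
        target = self.take("id")
        self.take("=")
        tree = ("=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError("trailing tokens")
        return tree

    def expr(self):                    # expr -> term ('+' term)*
        node = self.term()
        while self.peek() == "+":
            self.take("+")
            node = ("+", node, self.term())
        return node

    def term(self):                    # term -> factor ('*' factor)*
        node = self.factor()
        while self.peek() == "*":
            self.take("*")
            node = ("*", node, self.factor())
        return node

    def factor(self):                  # factor -> id | number
        if self.peek() == "id":
            return self.take("id")
        return self.take("number")


ast = Parser(TOKENS).assignment()
print(ast)
```

Because `term` is called from `expr`, the multiplication ends up deeper in the tree than the addition, which encodes the usual operator precedence.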
Running example: an assignment
position = initial+speed*60
Lexical Analysis
< id, 1 >, <=>, < id, 2 >, < + >, < id, 3 >, < ∗ >, < 60 >
Syntactic Analysis
=
├── < id, 1 >
└── +
    ├── < id, 2 >
    └── ∗
        ├── < id, 3 >
        └── 60
Semantic analysis
1. name identification:
→ bind use-def occurrences
2. type verification and/or type inference
→ type system
(e.g., ∗ uses integers, indexes of arrays are integers,. . . )
3. languages may allow type coercion
⇒ traversals and modifications of the AST
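The type verification and coercion steps can be illustrated by a traversal that annotates and rewrites the AST. This is a minimal sketch under assumptions: only the types `int` and `float` exist, the three identifiers are assumed to be floats, and the node encoding and the function name `check` are hypothetical.

```python
# Assumed types of the identifiers (symbol table entries 1, 2, 3).
SYMBOL_TYPES = {1: "float", 2: "float", 3: "float"}


def check(node):
    """Return (typed_node, type), inserting inttofloat where a coercion is needed."""
    kind = node[0]
    if kind == "id":
        return node, SYMBOL_TYPES[node[1]]
    if kind == "number":
        return node, "int"
    if kind in ("+", "*"):
        left, lt = check(node[1])
        right, rt = check(node[2])
        if lt != rt:                       # mixed int/float: coerce the int side
            if lt == "int":
                left, lt = ("inttofloat", left), "float"
            else:
                right, rt = ("inttofloat", right), "float"
        return (kind, left, right), lt
    if kind == "=":
        target, tt = check(node[1])
        value, vt = check(node[2])
        if tt == "float" and vt == "int":  # widening is allowed on assignment
            value, vt = ("inttofloat", value), "float"
        if tt != vt:
            raise TypeError(f"cannot assign {vt} to {tt}")
        return ("=", target, value), tt
    raise ValueError(f"unknown node {kind!r}")


AST = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("number", 60))))
typed_ast, result_type = check(AST)
print(typed_ast)
```

The traversal both verifies the types and modifies the tree: the integer constant is wrapped in a conversion node because it is multiplied by a float.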
Semantic Analysis of the running example
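[Figure: AST of the running example after semantic analysis, with the constant 60 wrapped in an inttofloat conversion]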
Running example: an assignment
position = initial+speed*60
Lexical Analysis
< id, 1 >, <=>, < id, 2 >, < + >, < id, 3 >, < ∗ >, < 60 >
Symbol Table
1 position ...
2 initial ...
3 speed ...
Syntactic Analysis
=
├── < id, 1 >
└── +
    ├── < id, 2 >
    └── ∗
        ├── < id, 3 >
        └── 60
Semantic Analysis
=
├── < id, 1 >
└── +
    ├── < id, 2 >
    └── ∗
        ├── < id, 3 >
        └── inttofloat
            └── 60
Intermediate Code generation
Input: AST
Output: intermediate code (or directly machine code)
Intermediate Code generation for the running example
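Example
Input: the AST of the running example, after semantic analysis
Output:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Some remarks:
- Every instruction has at most one operator on its right-hand side
- Instructions are emitted in the order described by the AST
- The compiler may create temporary names that receive the values computed by
  one operation: t1, t2, t3
- Some instructions have fewer than 3 operands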
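The remarks above can be illustrated by a post-order traversal that emits one instruction per operator and invents a temporary for each intermediate value. This is a sketch under assumptions: the tuple encoding of the AST and the function name `gen_tac` are hypothetical.

```python
import itertools


def gen_tac(tree):
    """Translate a typed assignment AST into a list of three-address instructions."""
    code = []
    counter = itertools.count(1)

    def new_temp():
        return f"t{next(counter)}"

    def visit(node):
        """Emit code for `node` and return the name holding its value."""
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"
        if kind == "number":
            return str(node[1])
        if kind == "inttofloat":           # unary operation: two operands only
            arg = visit(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        left = visit(node[1])              # binary operator: children first
        right = visit(node[2])
        temp = new_temp()
        code.append(f"{temp} = {left} {kind} {right}")
        return temp

    assert tree[0] == "="
    value = visit(tree[2])
    code.append(f"{visit(tree[1])} = {value}")
    return code


TYPED_AST = ("=", ("id", 1),
             ("+", ("id", 2), ("*", ("id", 3), ("inttofloat", ("number", 60)))))
for line in gen_tac(TYPED_AST):
    print(line)
```

Visiting the children before the parent is what makes the instruction order follow the AST: the conversion is emitted first, then the multiplication, then the addition.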
Intermediate Code Optimization
Intermediate Code Optimization for the running example
Example
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Some remarks:
- The conversion of 60 to a float can be done once and for all at compile time,
  by replacing the inttofloat operation with the number 60.0
- t3 is only used to transmit the value to id1
→ The code can be “shortened”:
t1 = id3 * 60.0
id1 = id2 + t1
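The two remarks correspond to two classic rewrites, which can be sketched as passes over a list of instructions. This is an illustration under assumptions, not the optimizer of the course: instructions are encoded as hypothetical `(dest, op, args)` tuples, `copy` denotes a plain assignment, and temporaries are recognized by the naming convention t1, t2, ...

```python
def is_temp(name):
    """Temporaries are assumed to be named t1, t2, ..."""
    return name.startswith("t") and name[1:].isdigit()


def optimize(code):
    """code: list of (dest, op, args) tuples; returns a shorter equivalent list."""
    # Pass 1: evaluate inttofloat on integer literals at compile time.
    consts, folded = {}, []
    for dest, op, args in code:
        args = tuple(consts.get(a, a) for a in args)
        if op == "inttofloat" and args[0].isdigit():
            consts[dest] = str(float(int(args[0])))    # "60" -> "60.0"
            continue
        folded.append((dest, op, args))

    # Pass 2: merge "t = ...; x = t" into "x = ..." when t is used only once.
    merged = []
    for dest, op, args in folded:
        if op == "copy" and merged:
            prev_dest, prev_op, prev_args = merged[-1]
            uses = sum(a == prev_dest for _, _, ar in folded for a in ar)
            if prev_dest == args[0] and is_temp(prev_dest) and uses == 1:
                merged[-1] = (dest, prev_op, prev_args)
                continue
        merged.append((dest, op, args))

    # Pass 3: renumber the remaining temporaries from t1.
    names, renamed = {}, []
    for dest, op, args in merged:
        args = tuple(names.get(a, a) for a in args)
        if is_temp(dest):
            names[dest] = f"t{len(names) + 1}"
            dest = names[dest]
        renamed.append((dest, op, args))
    return renamed


CODE = [
    ("t1", "inttofloat", ("60",)),
    ("t2", "*", ("id3", "t1")),
    ("t3", "+", ("id2", "t2")),
    ("id1", "copy", ("t3",)),
]
for dest, op, args in optimize(CODE):
    print(f"{dest} = {args[0]} {op} {args[1]}")
```

On the running example the four instructions shrink to the two shown above; the renumbering pass is only there so that the surviving temporary is again called t1.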
Optimization for the running example
position = initial+speed*60
Lexical Analysis
< id, 1 >, <=>, < id, 2 >, < + >, < id, 3 >, < ∗ >, < 60 >
Syntactic Analysis
[Figure: AST of the assignment, with root = and children < id, 1 > and +]
Optimization
t1 = id3 * 60.0
id1 = id2 + t1
(Final) Code Generation
Input: Intermediate code
Output: Machine code
Principles:
- Each intermediate statement is translated into a sequence of
  machine statements that “does the same job”
- Each variable corresponds to a register or a memory address
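These two principles can be sketched for a purely hypothetical register machine. Everything specific here is an assumption: the mnemonics `LDF`, `STF`, `ADDF`, `MULF`, the `#` prefix for immediate constants, the unlimited supply of registers, and the `(dest, op, args)` encoding of the intermediate code. Program variables are assumed to live in memory and temporaries in registers.

```python
OPCODES = {"+": "ADDF", "*": "MULF"}   # mnemonics of a hypothetical machine


def gen_target(code):
    """code: list of (dest, op, (left, right)) three-address instructions."""
    asm = []
    reg_of = {}                        # temporary name -> register holding it

    def place(x, scratch):
        """Return an operand for x, loading it into `scratch` if it is in memory."""
        if x in reg_of:                # temporary: already in a register
            return reg_of[x]
        if x.startswith("id"):         # program variable: load from memory
            asm.append(f"LDF {scratch}, {x}")
            return scratch
        return f"#{x}"                 # literal constant: immediate operand

    for number, (dest, op, (left, right)) in enumerate(code, start=1):
        target = f"R{number}"          # naive allocation: one register per instruction
        l = place(left, target)
        r = place(right, "R0")         # R0 is reserved as a scratch register
        asm.append(f"{OPCODES[op]} {target}, {l}, {r}")
        if dest.startswith("id"):
            asm.append(f"STF {dest}, {target}")   # variables are written back
        else:
            reg_of[dest] = target
    return asm


INTERMEDIATE = [
    ("t1", "*", ("id3", "60.0")),
    ("id1", "+", ("id2", "t1")),
]
for line in gen_target(INTERMEDIATE):
    print(line)
```

Each intermediate statement becomes a short load, compute, store sequence. A realistic generator would also have to deal with a finite register set, which this sketch ignores.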
Final Code Generation for the running example
Example
Input:
t1 = id3 * 60.0
id1 = id2 + t1
Output: [machine-code listing not preserved in the extracted slides]
Example: static vs. dynamic binding
Example: parameters
Overview of the semantics part of the course
Various semantic styles:
Operational semantics: “How is a computation performed?” Meaning is given
in terms of the “computation it induces”
- Natural (big-step): “from a bird's-eye view”
- Structural (small-step): “step by step”
Inductive/Compositional definitions
Let us consider:
- E a set
- f : E × ... × E → E a partial function
- A ⊆ E a subset of E
Definition (closure)
A is closed under f iff f(A × ... × A) ⊆ A, i.e., for all a1, ..., an ∈ A on which f is defined, f(a1, ..., an) ∈ A
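For finite sets, the closure condition can be checked mechanically. The Python sketch below is an illustration under assumptions: the partial function is modelled as a function returning `None` where it is undefined, and the names `is_closed` and `closure` are hypothetical. The second function, which computes the smallest closed superset of a base set, is an addition of this sketch and assumes the process terminates (for instance because E is finite).

```python
from itertools import product


def is_closed(A, f, arity):
    """True iff f(a1, ..., an) is in A for all a1..an in A on which f is defined."""
    for args in product(A, repeat=arity):
        result = f(*args)
        if result is not None and result not in A:   # None models "undefined"
            return False
    return True


def closure(base, f, arity):
    """Smallest set that contains `base` and is closed under f."""
    A = set(base)
    changed = True
    while changed:
        changed = False
        for args in product(list(A), repeat=arity):  # iterate over a snapshot
            result = f(*args)
            if result is not None and result not in A:
                A.add(result)
                changed = True
    return A


def add_mod10(x, y):
    """Total function on E = {0, ..., 9}."""
    return (x + y) % 10


print(is_closed({0, 2, 4, 6, 8}, add_mod10, 2))
print(is_closed({1, 2}, add_mod10, 2))
print(sorted(closure({2}, add_mod10, 2)))
```

With E = {0, ..., 9} and addition modulo 10, the even numbers form a closed set, whereas {1, 2} does not, since 1 + 1 = 2 is in the set but 1 + 2 = 3 is not.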
Inductive definitions: examples
A notation: derivation tree
[Figure: derivation tree built from the nodes 0, 1, 2]
- aba is a palindrome:
   b
  ---
  aba
- ababa is a palindrome:
    a
   ---
   bab
  -----
  ababa
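A derivation tree of this shape can be computed by a recursive function. The sketch below assumes an inductive definition of palindromes that is not spelled out in the surviving slides: the axioms are the empty word and the single letters, and the only rule builds x·w·x from a palindrome w and a letter x. The function name `derivation` and the default alphabet {a, b} are hypothetical.

```python
def derivation(word, alphabet="ab"):
    """Return the chain [axiom, ..., word], or None if word is not derivable."""
    if any(ch not in alphabet for ch in word):
        return None
    if len(word) <= 1:                  # axioms: empty word and single letters
        return [word]
    if word[0] != word[-1]:             # no rule x.w.x can produce this word
        return None
    inner = derivation(word[1:-1], alphabet)
    return None if inner is None else inner + [word]


print(derivation("aba"))
print(derivation("ababa"))
print(derivation("ab"))
```

Reading the returned list from first to last element gives the derivation tree from top to bottom: the axiom first, then one rule application per step.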
(Some simple) Proof techniques
Proof by contradiction (reductio ad absurdum), contraposition, ...
Structural Induction
- Prove the property for the basic elements (atoms) of the set.
- Prove the property for composite elements (created by applying rules):
  - assume it holds for the immediate components (induction hypothesis)
  - prove that it holds for the composite element