Lesson 1: Structure of A Compiler
Risul Islam Rasel
Assistant Professor
Department of Computer Science and Engineering
Email: risul@ciu.edu.bd
Reference book: Compilers: Principles, Techniques, and Tools (2nd Edition), by Aho, Lam, Sethi, Ullman
Compiler vs Interpreter
09/26/2021 2
Compiler vs Interpreter (cont…)
• What is a compiler?
– A program that accepts as input a program text in a certain
language and produces as output a program text in
another language, while preserving the meaning of that
text (Grune et al, 2000).
– A program that reads a program written in one language
(source language) and translates it into an equivalent
program in another language (target language) (Aho et al)
• What is an interpreter?
– A program that reads a source program and produces the
results of executing this source.
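The two definitions above can be contrasted with a minimal sketch. It assumes a hypothetical mini-language in which an expression is a number, a variable name, or a tuple (op, left, right); the names interpret and compile_to_python are illustrative only and do not come from the reference book.

```python
# Hypothetical mini-language: an expression is a number, a variable name,
# or a tuple (op, left, right) with op one of "+", "-", "*", "/".

def interpret(expr, env):
    """Interpreter: read the source expression and produce its result."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]                 # look up the variable's value
    op, left, right = expr
    l, r = interpret(left, env), interpret(right, env)
    if op == "+":
        return l + r
    if op == "-":
        return l - r
    if op == "*":
        return l * r
    return l / r


def compile_to_python(expr):
    """Compiler: translate the expression into equivalent Python source text."""
    if isinstance(expr, (int, float, str)):
        return str(expr)
    op, left, right = expr
    return f"({compile_to_python(left)} {op} {compile_to_python(right)})"


source = ("-", ("*", "b", "b"), ("*", ("*", 4, "a"), "c"))   # b*b - 4*a*c
env = {"a": 1, "b": 5, "c": 2}

result = interpret(source, env)        # the interpreter yields a value
target = compile_to_python(source)     # the compiler yields program text
print(result)
print(target)
print(eval(target, {}, env))           # executing the target program
```

The interpreter returns a value directly, while the compiler returns text in another language whose execution gives the same value; this is the "preserving the meaning" requirement in miniature.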
Qualities of a Good Compiler
What qualities would we want in a compiler?
– generates correct code (first and foremost!)
– generates fast code
– conforms to the specifications of the input language
– copes with essentially arbitrary input size, variables, etc.
– compilation time (linearly) proportional to the size of the source
– good diagnostics
– consistent optimisations
– works well with the debugger
Principles of Compilation
The compiler must:
• preserve the meaning of the program being compiled.
• “improve” the source code in some way.
Other issues (depending on the setting):
• Speed (of compiled code)
• Space (size of compiled code)
• Feedback (information provided to the user)
• Debugging (transformations obscure the relationship between source
code and target code)
• Compilation time efficiency (fast or slow compiler?)
Uses of Compiler Technology
• Most common use: translate a high-level program to object code
– Program Translation: binary translation, hardware synthesis, …
• Optimizations for computer architectures:
– Improve program performance, take into account hardware parallelism, etc…
• Automatic parallelisation or vectorisation
• Performance instrumentation: e.g., -pg option of cc or gcc
• Interpreters: e.g., Python, Ruby, Perl, Matlab, sh, …
• Software productivity tools
– Debugging aids: e.g., Purify
• Security: Java VM uses compiler analysis to prove “safety” of Java code.
• Text formatters, just-in-time compilation for Java, power management,
global distributed computing, …
Key: Ability to extract properties of a source program (analysis) and
transform it to construct a target program (synthesis)
Structure of a Compiler
Structure of a Compiler (cont…)
[Figure: Source code → Front-End → Intermediate Representation → Back-End → Target code]
• Front-end performs the analysis of the source language:
– Recognises legal and illegal programs and reports errors.
– “Understands” the input program and collects its semantics in an IR.
– Produces IR and shapes the code for the back-end.
– Much can be automated.
• Back-end does the target language synthesis:
– Chooses instructions to implement each IR operation.
– Translates IR into target code.
– Needs to conform with system interfaces.
– Automation has been less successful.
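m×n compilers with m+n components!
[Figure: four front-ends (Fortran, Smalltalk, C, Java) feed a shared I.R., which feeds four back-ends (target 1, target 2, target 3, target 4)]
• With a shared I.R., m source languages and n targets need only m front-ends and n back-ends, rather than m×n separate compilers.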
General Structure of a Compiler
[Figure: compiler pipeline, split into front-end and back-end]
Front-end: Source → Lexical Analysis → tokens → Syntax Analysis → Abstract Syntax Tree (AST) → Semantic Analysis → Annotated AST → Intermediate code generation → I.R.
Back-end: I.R. → I.C. Optimisation → IR → Code Generation → symbolic instructions → Target code Optimisation → optimised symbolic instr. → Target code Generation → Target
Lexical Analysis (Scanning)
• Reads characters in the source program and groups them
into words (basic unit of syntax)
• Produces words and recognises what sort they are.
• Each output item is called a token and is a pair of the form <type,
lexeme> or <token_class, attribute>
E.g.: a=b+c becomes <id,a> <=,> <id,b> <+,> <id,c>
• Needs to record each id attribute: keep a symbol table.
• Lexical analysis eliminates white space, etc…
• Speed is important - use a specialised tool: e.g., flex - a
tool for generating scanners: programs which recognise
lexical patterns in text.
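As a rough illustration of the scanning step described above, the following sketch hand-writes what a generated scanner would do. The token classes (const, id, op) and the function name scan are assumptions chosen for this example; a real language defines its own classes, and a tool such as flex would generate the matching code from regular expressions.

```python
import re

# Assumed token classes for illustration only.
TOKEN_SPEC = [
    ("const", r"\d+"),
    ("id",    r"[A-Za-z_]\w*"),
    ("op",    r"[=+\-*/()]"),
    ("skip",  r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text):
    """Group characters into tokens; return (token_class, attribute) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup == "op":
            # An operator is its own token class with an empty attribute: <=,>
            tokens.append((m.group(), ""))
        elif m.lastgroup != "skip":
            tokens.append((m.lastgroup, m.group()))
        # White space (the "skip" class) is matched but never emitted.
        pos = m.end()
    return tokens


print(scan("a = b + c"))
```

For the input a=b+c this produces the same sequence as the slide: <id,a> <=,> <id,b> <+,> <id,c>.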
Syntax (or syntactic) Analysis (Parsing)
• Imposes a hierarchical structure on the token stream.
• This hierarchical structure is usually expressed by
recursive rules.
• Context-free grammars formalise these recursive rules
and guide syntax analysis.
• Example:
expression → expression ‘+’ term | expression ‘-’ term | term
term → term ‘*’ factor | term ‘/’ factor | factor
factor → identifier | constant | ‘(’ expression ‘)’
(this grammar defines simple algebraic expressions)
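One way to turn the expression grammar above into code is a hand-written recursive-descent parser. This is only a sketch under stated assumptions: the left-recursive rules are rewritten as loops (a direct recursive call would never terminate), the tree is built from nested tuples, and the names tokenize, Parser and parse are hypothetical. The minimal tokenizer silently ignores characters it does not recognise.

```python
import re


def tokenize(text):
    # Minimal scanner: integers, identifiers, and single-character operators.
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):
        # expression -> expression ('+' | '-') term | term, written as a loop
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())      # left-associative
        return node

    def term(self):
        # term -> term ('*' | '/') factor | factor, written as a loop
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor -> identifier | constant | '(' expression ')'
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            node = self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return ("const", int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return ("id", tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("b*b-4*a*c"))
```

The nested tuples record only operators and operands, so the result is closer to an abstract syntax tree than to a full parse tree.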
Parsing: parse tree for b*b-4*a*c
[Figure: parse tree for b*b-4*a*c; the root expression expands to expression - term, and the leaves are tokens such as <id,b> and <const,4>]
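AST for b*b-4*a*c
[Figure: AST for b*b-4*a*c; the root is -, its two children are * nodes, and the leaves are tokens such as <id,a> and <const,4>]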
• An Abstract Syntax Tree (AST) is a more useful data structure for
internal representation. It is a compressed version of the parse tree
(summary of grammatical structure without details about its derivation)
• ASTs are one form of IR
Semantic Analysis (context handling)
Intermediate code generation
• Translate language-specific constructs in the AST into more general
constructs.
• A criterion for the level of “generality”: it should be straightforward to
generate the target code from the intermediate representation
chosen.
• Two important properties: the IR should be easy to produce and
easy to translate into code for the target machine.
• Example of a form of IR (3-address code):
tmp1 = 4
tmp2 = tmp1 * a
tmp3 = tmp2 * c
tmp4 = b * b
tmp5 = tmp4 - tmp3
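A minimal sketch of how such 3-address code could be produced from an AST is shown below. It assumes the AST is given as nested tuples, ("id", name), ("const", value) or (op, left, right); that representation and the name gen_tac are illustrative choices, not the course's own. Because this sketch visits the left operand first, the temporaries are numbered in a different order from the example above, although the same five instructions are produced.

```python
from itertools import count


def gen_tac(ast):
    """Return (result_name, instructions) for an AST of nested tuples."""
    code = []
    fresh = count(1)                      # source of temporary numbers

    def new_tmp():
        return f"tmp{next(fresh)}"

    def walk(node):
        kind = node[0]
        if kind == "id":
            return node[1]                # variables are used directly
        if kind == "const":
            tmp = new_tmp()               # constants are loaded into a temporary
            code.append(f"{tmp} = {node[1]}")
            return tmp
        op, left, right = node
        l = walk(left)                    # post-order walk: operands first
        r = walk(right)
        tmp = new_tmp()
        code.append(f"{tmp} = {l} {op} {r}")
        return tmp

    return walk(ast), code


# AST for b*b-4*a*c
ast = ("-",
       ("*", ("id", "b"), ("id", "b")),
       ("*", ("*", ("const", 4), ("id", "a")), ("id", "c")))
result, code = gen_tac(ast)
print("\n".join(code))
```

Each instruction has at most one operator and names its result explicitly, which is what makes the form easy to translate further.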
Code Optimisation
• The goal is to improve the intermediate code and, thus, the effectiveness
of code generation and the performance of the target code.
• Optimisations can range from trivial (e.g., constant folding) to highly
sophisticated (e.g., inlining).
• For example: replace the first two statements in the example of the
previous slide with: tmp2=4*a
• Modern compilers perform such a range of optimisations, that one could
argue for:
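To make the tmp2=4*a rewrite mentioned above concrete, here is a minimal sketch of constant propagation combined with constant folding over 3-address code. It assumes instructions are stored as tuples (dest, arg1, op, arg2), with op set to None for a plain "dest = arg1" copy, that every temporary is assigned exactly once, and that a removed temporary is not needed outside the list. The name propagate_constants is hypothetical.

```python
import operator

# Operators this sketch is willing to fold at compile time.
FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def propagate_constants(code):
    """Replace temporaries that hold a constant by the constant itself."""
    known = {}                            # temporary name -> constant value
    out = []
    for dest, a1, op, a2 in code:
        a1 = known.get(a1, a1)            # substitute known constants
        a2 = known.get(a2, a2)
        if op is None and isinstance(a1, int):
            known[dest] = a1              # drop "tmp = constant", remember it
        elif op in FOLD and isinstance(a1, int) and isinstance(a2, int):
            known[dest] = FOLD[op](a1, a2)    # constant folding
        else:
            out.append((dest, a1, op, a2))
    return out


# The 3-address code for b*b-4*a*c from the intermediate-code example.
code = [
    ("tmp1", 4, None, None),
    ("tmp2", "tmp1", "*", "a"),
    ("tmp3", "tmp2", "*", "c"),
    ("tmp4", "b", "*", "b"),
    ("tmp5", "tmp4", "-", "tmp3"),
]
optimised = propagate_constants(code)
for dest, a1, op, a2 in optimised:
    print(f"{dest} = {a1} {op} {a2}")
```

The first two instructions collapse into tmp2 = 4 * a, and the remaining three are unchanged.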
Code Generation Phase
• Map the AST onto a linear list of target machine
instructions in a symbolic form:
– Instruction selection: a pattern matching problem.
– Register allocation: each value should be in a register when it is
used (but there is only a limited number): NP-Complete problem.
– Instruction scheduling: take advantage of multiple functional
units: NP-Complete problem.
• Target, machine-specific properties may be used to
optimise the code.
• Finally, machine code and associated information
required by the Operating System are generated.
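The instruction-selection step above can be sketched for a hypothetical load/store machine. The mnemonics (LOAD, LOADI, STORE, ADD, SUB, MUL, DIV), the register names R1 and R2, and the function select_instructions are all assumptions made for illustration and do not describe any real instruction set. The scheme is deliberately naive: every operand is loaded from memory and every result is stored back, so it sidesteps register allocation and instruction scheduling entirely.

```python
# Hypothetical mnemonics for a made-up load/store machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def load(reg, operand):
    """Emit a load of a constant (LOADI) or of a named memory location (LOAD)."""
    if isinstance(operand, int):
        return f"LOADI {reg}, {operand}"
    return f"LOAD {reg}, {operand}"


def select_instructions(code):
    """Translate (dest, arg1, op, arg2) tuples into symbolic instructions."""
    asm = []
    for dest, a1, op, a2 in code:
        asm.append(load("R1", a1))
        asm.append(load("R2", a2))
        asm.append(f"{OPCODES[op]} R1, R1, R2")   # R1 := R1 op R2
        asm.append(f"STORE {dest}, R1")           # write the result back
    return asm


tac = [
    ("tmp2", 4, "*", "a"),
    ("tmp3", "tmp2", "*", "c"),
    ("tmp4", "b", "*", "b"),
    ("tmp5", "tmp4", "-", "tmp3"),
]
asm = select_instructions(tac)
print("\n".join(asm))
```

A real back-end would keep values such as tmp2 in a register between instructions instead of storing and reloading them, which is where the register-allocation problem listed above comes in.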
Compiler-Construction Tools
Some commonly used compiler-construction tools include
• Parser generators that automatically produce syntax analyzers from a
grammatical description of a programming language.
• Scanner generators that produce lexical analyzers from a regular-expression
description of the tokens of a language.
• Syntax-directed translation engines that produce collections of routines for
walking a parse tree and generating intermediate code.
• Code-generator generators that produce a code generator from a collection of
rules for translating each operation of the intermediate language into the
machine language for a target machine.
• Data-flow analysis engines that facilitate the gathering of information about
how values are transmitted from one part of a program to each other part.
Data-flow analysis is a key part of code optimization.
• Compiler-construction toolkits that provide an integrated set of routines for
constructing various phases of a compiler.