Unit 2. The Parts of A Compiler
Main parts of a compiler
Lexical Analysis
Stream of characters making up the source program
is read from left to right and grouped into tokens
(sequences of characters having a collective
meaning)
Syntax Analysis
Group the tokens of the source program into
grammatical phrases that are used by the compiler to
synthesize output
Semantic Analysis: Check the source
program for semantic errors and gather
type information for the subsequent code
generation part.
Intermediate Code Generation: Generate
an intermediate representation as a
program for an abstract machine.
Code optimization: Improve the
intermediate code so that faster-running
code will result
Code generation: Generation of target
code, consisting normally of relocatable
machine code or assembly code
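As one small illustration of code optimization, the sketch below performs constant folding: an operation whose operands are both constants is computed at compile time instead of at run time. The tuple-based tree format `(op, left, right)` and the function name `fold` are assumptions made for this example, not part of any particular compiler.

```python
import operator

# Hypothetical operator table for the sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Replace operations on two constants with their computed value."""
    if not isinstance(node, tuple):
        return node                      # identifier or constant: nothing to fold
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)      # both operands known at compile time
    return (op, left, right)

print(fold(("+", "rate", ("*", 2, 30))))  # ('+', 'rate', 60)
```

The folded tree needs one operation at run time instead of two, which is the sense in which the resulting code runs faster.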
Translation of a statement
Details of the parts of a compiler
Part | Output | Sample
Programmer (source code producer) | Source string | position := initial * rate + 60
Scanner (performs lexical analysis) | Token string | 'position', ':=', 'initial', '*', 'rate', '+', '60', and a symbol table with the identifiers
Semantic analyzer (type checking, etc.) | Annotated parse tree or abstract syntax tree | Convert integer (60) to real
Intermediate code generator | Three-address code |
The Grouping of parts
Compiler front and back ends:
Front end: analysis (machine independent)
Back end: synthesis (machine dependent)
Compiler passes:
A pass is a collection of parts that is carried out together; the program is
processed only once (single pass) or multiple times (multi pass)
Single pass: usually requires everything to be defined before
being used in source program
Multi pass: compiler may have to keep entire program
representation in memory
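The difference between the two strategies can be sketched with a toy name check. The program format (a list of `("def", name)` and `("use", name)` pairs) and both function names are hypothetical, chosen only to make the point visible.

```python
# Each hypothetical "statement" is ("def", name) or ("use", name).
PROGRAM = [("use", "helper"), ("def", "helper")]

def single_pass(program):
    """Check uses while reading; a name must be defined before it is used."""
    defined, errors = set(), []
    for kind, name in program:
        if kind == "def":
            defined.add(name)
        elif name not in defined:
            errors.append(name)          # used before its definition was seen
    return errors

def two_pass(program):
    """Pass 1 collects all definitions; pass 2 checks uses against them."""
    defined = {name for kind, name in program if kind == "def"}
    return [name for kind, name in program
            if kind == "use" and name not in defined]

print(single_pass(PROGRAM))  # ['helper']
print(two_pass(PROGRAM))     # []
```

The two-pass version accepts the forward reference, but only because the whole program list stays available for the second pass.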
Part 1: Lexical Analysis
Scanner: Converts the stream of input
characters into a stream of tokens that becomes
the input to the following part (parsing)
Tasks of a scanner
Group characters into tokens
Token: the basic syntactic unit
Categorization of tokens.
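The two scanner tasks, grouping characters and categorizing the resulting tokens, can be sketched with Python's `re` module. The category names (`NUMBER`, `IDENT`, `ASSIGN`, `OP`) and their patterns are assumptions for this illustration; a real language defines its own token classes.

```python
import re

# Hypothetical token categories; the patterns are assumptions for illustration.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP", r"[+\-*/]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Group characters into (category, lexeme) tokens, left to right."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue                     # whitespace separates tokens only
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r}")
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("position := initial * rate + 60"))
```

Each pair in the result carries the token's category first and its characters second, which is the stream the parser consumes.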
Types of tokens
Part 2: Parsing
The process of determining whether a string of
tokens can be generated by a grammar
It is carried out by a parser
Output of a parser:
Parse tree (if the token string can be generated by the grammar)
Error Message (otherwise)
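The two possible outputs can be seen in a minimal recursive-descent sketch. It assumes a small hypothetical grammar (an expression is terms joined by `+`, a term is operands joined by `*`, an operand is an identifier or number) and builds a simplified tree out of nested tuples; the node labels are illustrative only.

```python
def parse(tokens):
    """Return a (simplified) parse tree for `tokens`, or raise SyntaxError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def operand():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isalnum():
            raise SyntaxError(f"expected identifier or number, got {tok!r}")
        pos += 1
        return ("factor", tok)

    def term():
        nonlocal pos
        node = operand()
        while peek() == "*":
            pos += 1
            node = ("term", node, "*", operand())
        return node

    def expr():
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("expr", node, "+", term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse(["b", "+", "c"]))  # ('expr', ('factor', 'b'), '+', ('factor', 'c'))
```

Calling `parse(["b", "+"])` raises `SyntaxError` instead of returning a tree, which corresponds to the error-message case.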
Parse tree of statement a = b + c
Grammars, languages, BNF, syntax diagrams
The parser takes the tokens produced by the scanner as
input and generates a parse tree (or syntax tree).
Token arrangements are checked against the
grammar of the source language.
Notations for grammar:
BNF (Backus-Naur Form) is a metalanguage used to
express the grammars of programming languages
Syntax diagrams: pictorial diagrams showing the rules
for forming an instruction in a programming language, and
how the components of the statement are related. Syntax
diagrams are like directed graphs.
BNF
BNF (and formal grammars) use 2 types of symbols
Terminals:
Tokens of the language
Never appear on the left side of any production
Nonterminals:
Intermediate symbols that express the structures of a language
Must appear on the left side of at least one production
Enclosed in <>
Start symbol
Nonterminal of the first level
Appear at the root of parse tree
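These rules can be checked mechanically. The sketch below stores a small hypothetical grammar as a Python dictionary (the grammar itself, the name `GRAMMAR`, and the function `classify` are assumptions for illustration) and derives the two symbol sets directly from the productions.

```python
# A hypothetical grammar for simple assignments, written in BNF as:
#   <stmt> ::= id = <expr>
#   <expr> ::= <expr> + <term> | <term>
#   <term> ::= id | num
GRAMMAR = {
    "<stmt>": [["id", "=", "<expr>"]],
    "<expr>": [["<expr>", "+", "<term>"], ["<term>"]],
    "<term>": [["id"], ["num"]],
}
START = "<stmt>"  # the nonterminal at the root of every parse tree

def classify(grammar):
    """Split the grammar's symbols into nonterminals and terminals."""
    nonterminals = set(grammar)          # every left-hand side
    rhs_symbols = {sym for alts in grammar.values()
                   for alt in alts for sym in alt}
    terminals = rhs_symbols - nonterminals   # never on a left-hand side
    return nonterminals, terminals

nonterminals, terminals = classify(GRAMMAR)
print(sorted(nonterminals))  # ['<expr>', '<stmt>', '<term>']
print(sorted(terminals))     # ['+', '=', 'id', 'num']
```

Every symbol that never occurs as a dictionary key comes out as a terminal, matching the rule that terminals never appear on the left side of a production.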
Parsing: Concept and Techniques
If a grammar is ambiguous, more than
one parse tree can be created for the same string of tokens
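A classic ambiguous grammar is E -> E + E | E * E | id. The sketch below, with a hypothetical function name and nested tuples standing in for parse trees, enumerates every tree this grammar allows for a token list by trying each operator as the root.

```python
def parse_trees(tokens):
    """Return every parse tree of `tokens` under E -> E + E | E * E | id."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0] == "id" else []
    trees = []
    for i, tok in enumerate(tokens):
        if tok in ("+", "*"):
            # Treat this operator as the root and parse both sides.
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tok, right))
    return trees

for tree in parse_trees(["id", "+", "id", "*", "id"]):
    print(tree)
# ('id', '+', ('id', '*', 'id'))
# (('id', '+', 'id'), '*', 'id')
```

The two trees group the operands differently, so they would assign the expression different values. Grammars for programming languages are usually rewritten (for example with separate term and factor rules) to remove this kind of ambiguity.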
Part 3: Semantic Analysis
Certain checks are performed to
ensure that the components of a
program fit together meaningfully
To generate code, source program
must be syntactically and semantically
correct
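One such check is type checking with implicit conversion, for example converting an integer constant to real when it is combined with a real operand. The sketch below assumes a hypothetical symbol table in which every identifier is declared real, a tuple-based tree `(op, left, right)`, and an illustrative conversion node named `inttoreal`.

```python
# Hypothetical symbol table: every identifier is assumed to be declared real.
SYMBOLS = {"position": "real", "initial": "real", "rate": "real"}

def annotate(node):
    """Return (annotated_node, type) for a tuple-based expression tree."""
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "real"
    if isinstance(node, str):
        if node not in SYMBOLS:
            raise NameError(f"undeclared identifier {node!r}")
        return node, SYMBOLS[node]
    op, left, right = node
    left, ltype = annotate(left)
    right, rtype = annotate(right)
    if ltype != rtype:
        # Mixed int/real operands: insert an explicit conversion node.
        if ltype == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        ltype = "real"
    return (op, left, right), ltype

tree, result_type = annotate(("+", ("*", "initial", "rate"), 60))
print(tree)         # ('+', ('*', 'initial', 'rate'), ('inttoreal', 60))
print(result_type)  # real
```

An undeclared identifier raises `NameError` here, standing in for the semantic error message a compiler would report.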
Part 4: Intermediate code generation
The source program is translated into an
equivalent program in intermediate code by
the intermediate code generator
Low-level intermediate code is close to the target code,
which makes it suitable for register and
memory allocation, instruction set selection,
etc.
It is good for machine-dependent
optimizations.
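Three-address code is one common intermediate form: each instruction has at most one operator and stores its result in a named location. The sketch below flattens a tuple-based expression tree into such instructions; the temporary names `t1`, `t2`, ... and the function `three_address` are assumptions for this illustration.

```python
import itertools

def three_address(target, expr):
    """Flatten a tuple-based expression tree into three-address instructions."""
    code = []
    temps = itertools.count(1)

    def emit(node):
        if not isinstance(node, tuple):
            return str(node)             # identifier or constant: use it directly
        op, left, right = node
        l, r = emit(left), emit(right)
        temp = f"t{next(temps)}"         # fresh temporary for this result
        code.append(f"{temp} := {l} {op} {r}")
        return temp

    result = emit(expr)
    code.append(f"{target} := {result}")
    return code

for line in three_address("position", ("+", ("*", "initial", "rate"), 60.0)):
    print(line)
# t1 := initial * rate
# t2 := t1 + 60.0
# position := t2
```

Because every intermediate result has its own name, later parts of the compiler can decide which of those names live in registers and which in memory.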
Advantages of Intermediate Code
Part 5: Code Generator
Input: Intermediate code of source program
Output: Object program
Assembly code
Virtual machine code
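As a rough sketch of the virtual machine code case, the function below turns an expression tree (used here as the intermediate form) into instructions for a hypothetical stack machine. The instruction names `LOAD`, `PUSH`, `ADD`, and `MUL` are invented for the example and do not belong to any real virtual machine.

```python
# Hypothetical stack-machine opcodes for the two supported operators.
OPCODES = {"+": "ADD", "*": "MUL"}

def gen_stack_code(node):
    """Emit postfix stack-machine instructions for a tuple-based expression tree."""
    if isinstance(node, tuple):
        op, left, right = node
        # Operands are evaluated first, then the operator combines them.
        return gen_stack_code(left) + gen_stack_code(right) + [OPCODES[op]]
    if isinstance(node, str):
        return [f"LOAD {node}"]          # push the value of a variable
    return [f"PUSH {node}"]              # push a constant

for instruction in gen_stack_code(("+", ("*", "initial", "rate"), 60)):
    print(instruction)
# LOAD initial
# LOAD rate
# MUL
# PUSH 60
# ADD
```

A stack machine needs no register allocation, which is one reason virtual machine code is a simpler target than real machine code.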
Problems
Input
Output
Object machine
Instruction set
Register allocation