Compiler Lecture 2
Objectives:
Understand the structure of a language processing system.
Understand the structure of a compiler.
Understand the tools involved (scanner generator, parser generator, etc.).
Outcomes:
Students should be able to understand how a language is processed step by step.
Students will analyze the phases of a compiler.
Language Processing System
Source Program -> Preprocessor -> Compiler -> Assembler -> Linker/Loader -> Target Program (absolute machine code)
The Phases of a Compiler
Source Program
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
5. Code Optimizer
6. Code Generator
Target Program
Two main Phases of a Compiler
position = initial + rate * 60
The characters in this assignment could be grouped into the following lexemes
and mapped into the following tokens passed on to the syntax analyzer:
2. The assignment symbol = is a lexeme that is mapped into the token (=). Since
this token needs no attribute-value, we have omitted the second component.
We could have used any abstract symbol such as assign for the token-name, but
for notational convenience we have chosen to use the lexeme itself as the name
of the abstract symbol.
3. initial is a lexeme that is mapped into the token (id, 2), where 2 points to
the symbol-table entry for initial.
5. rate is a lexeme that is mapped into the token (id, 3), where 3 points to the
symbol-table entry for rate.
After the lexical analyzer has processed the assignment statement above, its
output is the token sequence
(id, 1) (=) (id, 2) (+) (id, 3) (*) (60)
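The grouping of characters into lexemes and tokens described above can be sketched in a few lines of Python. This is a minimal illustration, not a production scanner: the function name `tokenize`, the regular expression, and the use of a plain list as the symbol table are assumptions made for this example. It handles only identifiers, integer literals, and the operators =, + and *.

```python
import re

# One pattern per token class: identifier, integer literal, single-character operator.
TOKEN_RE = re.compile(r"(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[=+*])")

def tokenize(source):
    """Return (tokens, symbol_table) for a simple assignment statement."""
    symbol_table = []  # entry i (1-based) holds the lexeme of identifier i
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():  # whitespace separates lexemes and is discarded
            pos += 1
            continue
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}: {source[pos]!r}")
        pos = match.end()
        if match.group("id"):
            lexeme = match.group("id")
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # The attribute value is the 1-based symbol-table index.
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif match.group("num"):
            tokens.append((match.group("num"),))
        else:
            # Operators need no attribute value; the lexeme itself names the token.
            tokens.append((match.group("op"),))
    return tokens, symbol_table

tokens, table = tokenize("position = initial + rate * 60")
print(tokens)
print(table)
```

Running the sketch on the example statement yields one `("id", n)` pair per identifier, with n indexing into the symbol table, and a one-element tuple for each operator and for the literal 60.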
Syntax tree for position = initial + rate * 60:
          =
         / \
   (id, 1)  +
           / \
     (id, 2)  *
             / \
       (id, 3)  60
The tree has an interior node labeled * with (id, 3) as its left child and the
integer 60 as its right child. The node (id, 3) represents the identifier rate. The
node labeled * makes it explicit that we must first multiply the value of rate by
60. The node labeled + indicates that we must add the result of this
multiplication to the value of initial. The root of the tree, labeled =, indicates
that we must store the result of this addition into the location for the identifier
position. This ordering of operations is consistent with the usual
conventions of arithmetic which tell us that multiplication has higher
precedence than addition, and hence that the multiplication is to be performed
before the addition.
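One way to see how precedence ends up in the shape of the tree is a small recursive-descent parser. The sketch below is an illustration under stated assumptions: the token format (tuples such as `("id", 1)` and `("+",)`), the function name `parse_assignment`, and the nested-tuple tree representation are all hypothetical choices made for this example. Multiplication is handled in `term`, one level below addition in `expr`, which is what gives * the higher precedence.

```python
def parse_assignment(tokens):
    """Parse `id = expr` and return the syntax tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        # factor -> identifier | integer literal
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token[0] == "id" or token[0].isdigit():
            return advance()
        raise SyntaxError(f"unexpected token {token}")

    def term():
        # term -> factor ('*' factor)*
        node = factor()
        while peek() == ("*",):
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        # expr -> term ('+' term)*
        node = term()
        while peek() == ("+",):
            advance()
            node = ("+", node, term())
        return node

    token = peek()
    if token is None or token[0] != "id":
        raise SyntaxError("expected an identifier on the left of '='")
    target = advance()
    if peek() != ("=",):
        raise SyntaxError("expected '='")
    advance()
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()}")
    return tree

tokens = [("id", 1), ("=",), ("id", 2), ("+",), ("id", 3), ("*",), ("60",)]
tree = parse_assignment(tokens)
print(tree)
```

For the example token sequence, the * node becomes the right child of the + node, which in turn is the right child of the root =, matching the tree described above. A token sequence that breaks the grammar, such as one ending in +, raises `SyntaxError`.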
Syntax Error: A syntax (grammatical) error is one that violates the grammatical
rules of the language, for example "if x = 7 y := 4", where the keyword then is missing.
Semantic Analyzer: The semantic analyzer uses the syntax tree and the
information in the symbol table to check the source program for semantic
consistency with the language definition. It also gathers type information and
saves it in either the syntax tree or the symbol table, for subsequent use during
intermediate-code generation.
Semantic analysis checks whether the parse tree that has been constructed follows
the rules of the language. For example, it checks that values are assigned between
compatible data types, and it flags operations such as adding a string to an
integer. The semantic analyzer also keeps track of identifiers, their types, and
expressions, and checks whether identifiers are declared before use. In practice,
semantic analyzers are mainly concerned with type checking and type coercion
based on type rules. The semantic analyzer produces an annotated syntax tree as
its output.
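Type checking and coercion can be illustrated with a short sketch that walks the syntax tree and annotates it. Everything specific here is an assumption for illustration: the declared types in `SYMBOL_TYPES` (all three identifiers are taken to be floats), the function name `annotate`, and the conversion node name `inttofloat`. The sketch knows only the types int and float.

```python
# Hypothetical declared types for symbol-table entries 1..3 (an assumption).
SYMBOL_TYPES = {1: "float", 2: "float", 3: "float"}

def annotate(node, symbol_types):
    """Return (annotated_node, type), inserting int-to-float conversions."""
    if node[0] == "id":
        return node, symbol_types[node[1]]   # look up the declared type
    if len(node) == 1:
        return node, "int"                   # integer literal such as ("60",)
    op, left, right = node
    left, left_type = annotate(left, symbol_types)
    right, right_type = annotate(right, symbol_types)
    if op == "=":
        if left_type == "float" and right_type == "int":
            right = ("inttofloat", right)    # widen the value being stored
        elif left_type != right_type:
            raise TypeError(f"cannot assign {right_type} to {left_type}")
        return (op, left, right), left_type
    # Arithmetic operator: coerce the int operand when the types differ.
    if left_type != right_type:
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (op, left, right), "float"
    return (op, left, right), left_type

tree = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("60",))))
annotated, result_type = annotate(tree, SYMBOL_TYPES)
print(annotated)
```

Under these assumptions the only change to the tree is a conversion node wrapped around the literal 60, because it is the single int operand combined with a float.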
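[Figure: syntax tree after semantic analysis. The root = has children (id, 1) and +; the + node has children (id, 2) and *; a conversion action is applied to the integer operand 60.]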
Semantic Errors: Typical errors detected by the semantic analyzer include:
Datatype mismatch
Undeclared variable
Multiple declaration of a variable in a scope
Actual and formal parameter mismatch
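The four kinds of error listed above can each be detected with simple table lookups. The following sketch assumes a hypothetical, highly simplified program representation (a list of tuples such as `("declare", name, type)`) and a single scope. The function name `check` and the message wording are illustrative only. For simplicity it treats any difference in types as a mismatch and performs no coercion.

```python
def check(statements):
    """Collect semantic errors from a list of simple statement tuples."""
    variables = {}  # name -> declared type, for a single scope
    functions = {}  # name -> list of formal parameter types
    errors = []
    for stmt in statements:
        kind = stmt[0]
        if kind == "declare":          # ("declare", name, type)
            _, name, type_ = stmt
            if name in variables:
                errors.append(f"multiple declaration of '{name}'")
            else:
                variables[name] = type_
        elif kind == "assign":         # ("assign", name, type of the value)
            _, name, value_type = stmt
            if name not in variables:
                errors.append(f"undeclared variable '{name}'")
            elif variables[name] != value_type:
                errors.append(
                    f"datatype mismatch: cannot assign {value_type} "
                    f"to '{name}' of type {variables[name]}"
                )
        elif kind == "define":         # ("define", function, formal types)
            functions[stmt[1]] = stmt[2]
        elif kind == "call":           # ("call", function, actual types)
            _, func, actuals = stmt
            if func not in functions:
                errors.append(f"undeclared function '{func}'")
            elif functions[func] != actuals:
                errors.append(f"parameter mismatch in call to '{func}'")
    return errors

program = [
    ("declare", "x", "int"),
    ("declare", "x", "float"),      # multiple declaration in the same scope
    ("assign", "y", "int"),         # y was never declared
    ("assign", "x", "string"),      # string assigned to an int variable
    ("define", "f", ["int", "int"]),
    ("call", "f", ["int"]),         # one actual argument, two formal parameters
]
errors = check(program)
for message in errors:
    print(message)
```

Each faulty statement in the sample program produces exactly one message, in the order the statements appear.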
Lecture References