Chapter 6 - Intermediate Code Generation

Uploaded by Aschalew Ayele
Copyright © All Rights Reserved

Compiler Design

Chapter six: Intermediate code generation


The objectives of this chapter are as follows:
❖ Describe intermediate code generation and its various representation formats: syntax tree, postfix
notation and three-address code.
❖ Explain syntax-directed translation into three-address code.
❖ Explain the data structures used to implement three-address code (TAC): quadruples, triples and indirect triples.
❖ Describe the processing of declarations and the backpatching technique used in code generation.

Intermediate code generation


• In a compiler, the front end translates a source program into an intermediate representation, and the
back end generates the target code from this intermediate representation.
• Intermediate code generation (ICG) is the final phase of the compiler front end.
• Goal: translate the program into the format expected by the compiler back end.
• Techniques for intermediate code generation can also be used for final code generation.
• Using a machine-independent intermediate code (IC) has two benefits:
o Retargeting to another machine is facilitated.
o Optimization can be done on the machine-independent code.
• Intermediate code can be represented in one of the following forms:
1. Syntax tree
2. Postfix notation
3. Three address code
Syntax tree
o While parsing the input, a syntax tree can be constructed. A syntax tree (abstract syntax tree) is a
condensed form of the parse tree that is useful for representing language constructs.
o For example, for the string a+b, the parse tree in (a) below is represented by the syntax tree shown
in (b); the keywords (syntactic sugar) that appear in the parse tree do not appear in the syntax tree,
so for a+b only the operator + and its operands a and b remain.
[Figure: (a) parse tree and (b) syntax tree for a+b]
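A syntax tree can be built directly during parsing, so the parse-tree-only nodes are never created. The following is a minimal sketch, assuming a tiny expression grammar with only +, * and parentheses and an input that is already split into tokens; Leaf, BinOp and parse are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    name: str                  # identifier or constant


@dataclass
class BinOp:
    op: str                    # operator stored at an interior node
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, BinOp]


def parse(tokens: List[str]) -> Tree:
    """Build a syntax tree for E -> E + T | T, T -> T * F | F, F -> ( E ) | id.

    Left recursion is handled with loops. Only operators and operands become
    nodes, so parentheses and the nonterminals E, T, F never appear.
    """
    pos = 0

    def peek() -> str:
        return tokens[pos] if pos < len(tokens) else ""

    def advance() -> str:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr() -> Tree:
        node = term()
        while peek() == "+":
            advance()
            node = BinOp("+", node, term())
        return node

    def term() -> Tree:
        node = factor()
        while peek() == "*":
            advance()
            node = BinOp("*", node, factor())
        return node

    def factor() -> Tree:
        if peek() == "(":
            advance()
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        return Leaf(advance())

    return expr()


print(parse(["a", "+", "b"]))
# BinOp(op='+', left=Leaf(name='a'), right=Leaf(name='b'))
```

Note that parse(["(", "a", ")"]) returns just Leaf("a"): the parentheses guide the parse but leave no node behind.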

Postfix notation
The postfix notation is practical for an intermediate representation as the operands are found just before
the operator. In fact, the postfix notation is a linearized representation of a syntax tree.
Example: 1 + 2 * 3 is represented in postfix notation as 1 2 3 * + (since * has higher precedence than +);
the postfix form 1 2 + 3 * corresponds instead to (1 + 2) * 3.
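Because postfix is a linearized syntax tree, it can be produced by a post-order walk (left subtree, right subtree, then operator). It can also be evaluated with a single stack, because each operator's operands appear just before it. The sketch below assumes a hypothetical tuple encoding of the tree and handles only integer + and *.

```python
from typing import List, Tuple, Union

# A syntax-tree node is either an operand (str) or a tuple (op, left, right).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def to_postfix(node: Tree) -> List[str]:
    """Post-order traversal: left subtree, right subtree, then the operator."""
    if isinstance(node, str):
        return [node]
    op, left, right = node
    return to_postfix(left) + to_postfix(right) + [op]


def eval_postfix(tokens: List[str]) -> int:
    """Operands are pushed; each operator pops the two operands before it."""
    stack: List[int] = []
    for tok in tokens:
        if tok in ("+", "*"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if tok == "+" else left * right)
        else:
            stack.append(int(tok))
    return stack.pop()


tree = ("+", "1", ("*", "2", "3"))      # syntax tree of 1 + 2 * 3
postfix = to_postfix(tree)
print(" ".join(postfix))                # 1 2 3 * +
print(eval_postfix(postfix))            # 7
```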

Compiled by: Dawit K.



Three address code


The three-address code is a sequence of statements of the form:
X := Y op Z
where X, Y and Z are names, constants or compiler-generated temporaries, and op is an operator, such
as an integer or floating-point arithmetic operator or a logical operator on Boolean data.
Important notes:
o No built-up (compound) arithmetic expressions are permitted.
o Only one operator is allowed on the right side of an assignment; e.g. x + y + z is not allowed and
must be split into two statements using a temporary.
o Like postfix notation, three-address code is a linearized representation of a syntax tree.
It is called three-address code because each instruction usually contains three addresses (two for
the operands and one for the result).
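To illustrate the one-operator rule, here is a minimal sketch (split_chain is a hypothetical helper, and the temporary names t1, t2, ... are assumed) that rewrites a left-associative chain such as w := x + y + z into statements that each have exactly one operator.

```python
from typing import List


def split_chain(target: str, operands: List[str], op: str) -> List[str]:
    """Rewrite target := o1 op o2 op ... op on (left-associative, at least
    two operands) as three-address statements with one operator each."""
    code: List[str] = []
    left = operands[0]
    for i, right in enumerate(operands[1:], start=1):
        last = i == len(operands) - 1
        dest = target if last else f"t{i}"   # hypothetical temporary names
        code.append(f"{dest} := {left} {op} {right}")
        left = dest                          # the result feeds the next step
    return code


print("\n".join(split_chain("w", ["x", "y", "z"], "+")))
# t1 := x + y
# w := t1 + z
```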

Types of three address statements


• As with assembler statements, three-address statements can carry symbolic labels, and there are
statements for flow of control.
• Common three-address code statements:
Statement                           Format                Comments
1. Assignment (binary operation)    X := Y op Z           Arithmetic and logical operators are used
2. Assignment (unary operation)     X := op Y             Unary -, not and conversion operators are used
3. Copy statement                   X := Y
4. Unconditional jump               goto L
5. Conditional jump                 if X relop Y goto L
6. Function call:
   - Parameter specification        param X1              The parameters are specified using param
   - Calling the function           call P, N             Procedure P is called with N parameters
7. Indexed assignments              X := Y[I]             X is assigned the value at the address Y + I
                                    Y[I] := X             The value at the address Y + I is assigned X
8. Address & pointer assignments    X := &Y               X is assigned the address of Y
                                    X := *Y               X is assigned the value at the address held in Y
                                    *X := Y               The value at the address held in X is assigned Y
The choice of allowable operators is an important issue in the design of an intermediate form. The
operator set should be rich enough to implement the operations of the source language, yet simple
enough to be translated easily into the target language.
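To make the semantics of the table concrete, the sketch below is a tiny interpreter for a subset of it: binary assignment, copy, unconditional jump and conditional jump, with symbolic labels. The tuple encoding of statements is a hypothetical choice for illustration. Procedure calls, indexed assignments and pointer assignments are left out.

```python
import operator
from typing import Dict, List, Union

Operand = Union[str, int]          # a variable name or an integer constant

BINOPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
RELOPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
          ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def run(program: List[tuple], env: Dict[str, int]) -> Dict[str, int]:
    """Execute a list of three-address statements and return the variables."""
    # Map each symbolic label to the index of the statement it marks.
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}

    def val(x: Operand) -> int:
        return x if isinstance(x, int) else env[x]

    pc = 0
    while pc < len(program):
        ins = program[pc]
        pc += 1
        kind = ins[0]
        if kind == "binop":                # X := Y op Z
            _, x, y, op, z = ins
            env[x] = BINOPS[op](val(y), val(z))
        elif kind == "copy":               # X := Y
            _, x, y = ins
            env[x] = val(y)
        elif kind == "goto":               # goto L
            pc = labels[ins[1]]
        elif kind == "if":                 # if X relop Y goto L
            _, x, relop, y, target = ins
            if RELOPS[relop](val(x), val(y)):
                pc = labels[target]
        # "label" entries do nothing; they only mark jump targets.
    return env


# s := 0; i := 1; L1: if i > n goto L2; s := s + i; i := i + 1; goto L1; L2:
program = [
    ("copy", "s", 0),
    ("copy", "i", 1),
    ("label", "L1"),
    ("if", "i", ">", "n", "L2"),
    ("binop", "s", "s", "+", "i"),
    ("binop", "i", "i", "+", 1),
    ("goto", "L1"),
    ("label", "L2"),
]
print(run(program, {"n": 5})["s"])     # 15
```

The program sums 1..n. Notice how the loop needs only a conditional jump, an unconditional jump and two labels.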

Syntax directed translation into three address code


o Syntax-directed translation can be used to generate three-address code. Generally, either the
three-address code is generated as an attribute of the attributed parse tree, or the semantic actions
have side effects that write the three-address statements to a file.
o When three-address code is generated, it is often necessary to use temporary variables and labels.
To this end, the following functions are used:
o newtemp() - each time this function is called, it returns a distinct name that can be used for a
temporary variable.


o newlabel() - each time this function is called, it returns a distinct name that can be used as a label.

o In addition, for convenience, we use the notation gen to create a three-address statement from a
number of strings: gen concatenates all its parameters into one three-address statement.
o For example, if id1.lexeme = x, id2.lexeme = y and id3.lexeme = z, then gen(id1.lexeme, ‘:=’,
id2.lexeme, ‘+’, id3.lexeme) produces the three-address statement x := y + z.
Note: gen evaluates variables and attribute values before concatenating them with the other parameters.
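The three helper functions can be sketched in a few lines. This is a minimal version, assuming module-level counters and that gen joins its parts with single spaces. The t/L naming scheme is an assumption; any scheme that never repeats a name would do.

```python
import itertools

_temp_ids = itertools.count(1)      # counter behind newtemp()
_label_ids = itertools.count(1)     # counter behind newlabel()


def newtemp() -> str:
    """Return a fresh temporary name on every call: t1, t2, ..."""
    return f"t{next(_temp_ids)}"


def newlabel() -> str:
    """Return a fresh label name on every call: L1, L2, ..."""
    return f"L{next(_label_ids)}"


def gen(*parts: str) -> str:
    """Concatenate already-evaluated parts into one three-address statement."""
    return " ".join(parts)


id1_lexeme, id2_lexeme, id3_lexeme = "x", "y", "z"
print(gen(id1_lexeme, ":=", id2_lexeme, "+", id3_lexeme))   # x := y + z
print(newtemp(), newtemp(), newlabel())                       # t1 t2 L1
```

Because Python evaluates the arguments before the call, gen receives the values "x", "y", "z" rather than the variable names. This matches the note above.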
Example 1: generation of the three-address code for an assignment statement and an expression.
Syntax rule        Semantic action
S → id := E        S.code := E.code || gen(id.lexeme, ‘:=’, E.place)
E → E1 + E2        E.place := newtemp();
                   E.code := E1.code || E2.code || gen(E.place, ‘:=’, E1.place, ‘+’, E2.place)
E → E1 * E2        E.place := newtemp();
                   E.code := E1.code || E2.code || gen(E.place, ‘:=’, E1.place, ‘*’, E2.place)
E → - E1           E.place := newtemp();
                   E.code := E1.code || gen(E.place, ‘:=’, ‘uminus’, E1.place)
E → (E1)           E.place := E1.place;
                   E.code := E1.code
E → id             E.place := id.lexeme;
                   E.code := ‘’ /* empty code */
o The three-address code for the input a := x + y * z is:
t1 := y * z
t2 := x + t1
a := t2
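The semantic actions above can be sketched as a recursive translator that returns the pair (E.place, E.code) for each node. This assumes a hypothetical tuple encoding of the expression tree: an identifier is a str, and ("+", e1, e2), ("*", e1, e2), ("uminus", e1) and ("paren", e1) are nodes. For E → (E1) the sketch passes E1.place and E1.code through unchanged, so no temporary is wasted. A new temporary is requested only after both operands are translated, which is why y * z gets t1.

```python
import itertools
from typing import List, Tuple, Union

Expr = Union[str, tuple]   # identifier, or (op, ...) node as described above


class Translator:
    def __init__(self) -> None:
        self._temps = itertools.count(1)

    def newtemp(self) -> str:
        return f"t{next(self._temps)}"

    def expr(self, e: Expr) -> Tuple[str, List[str]]:
        """Return (E.place, E.code) following the semantic actions."""
        if isinstance(e, str):                     # E -> id
            return e, []
        if e[0] == "paren":                        # E -> (E1)
            return self.expr(e[1])
        if e[0] == "uminus":                       # E -> - E1
            place1, code1 = self.expr(e[1])
            place = self.newtemp()
            return place, code1 + [f"{place} := uminus {place1}"]
        op, left, right = e                        # E -> E1 + E2 | E1 * E2
        place1, code1 = self.expr(left)
        place2, code2 = self.expr(right)
        place = self.newtemp()                     # allocated after E1 and E2
        return place, code1 + code2 + [f"{place} := {place1} {op} {place2}"]

    def assign(self, name: str, e: Expr) -> List[str]:
        """S -> id := E"""
        place, code = self.expr(e)
        return code + [f"{name} := {place}"]


code = Translator().assign("a", ("+", "x", ("*", "y", "z")))
print("\n".join(code))
# t1 := y * z
# t2 := x + t1
# a := t2
```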
o Three-address code (TAC) can range from high-level to low-level, depending on the choice of operators.
In general, each statement contains at most three addresses (its operands and its result).
o The general form is x := y op z, where op is an operator, x is the result, and y and z are operands;
x, y and z are variables, constants or compiler-generated temporaries.
o The most common implementations of three-address code are quadruples, triples and indirect triples.
Quadruples
A quadruple is a record structure with four fields: one field, op, for the operator; two fields, arg1
and arg2, for the operands; and one field, res, for the result, so that res = arg1 op arg2.
Example: a = b + c
b is stored as arg1, c as arg2, + as op and a as res.
Unary operators like ‘-’ do not use arg2. Operators like param use neither arg2 nor res. For
conditional and unconditional jumps, res holds the target label. arg1, arg2 and res are pointers into
the symbol table or the literal table for the names.
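A quadruple maps directly onto a four-field record. In the minimal sketch below, plain strings stand in for the symbol-table and literal-table pointers described above, which is an assumption made for readability. Unused fields are simply left as None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quad:
    op: str
    arg1: Optional[str] = None
    arg2: Optional[str] = None
    res: Optional[str] = None      # result name, or target label for jumps


quads = [
    Quad("+", "b", "c", "a"),      # a = b + c
    Quad("-", "b", None, "t1"),    # t1 = - b : unary, arg2 unused
    Quad("param", "x"),            # param x : neither arg2 nor res used
    Quad("goto", res="L1"),        # goto L1 : res holds the label
]
for q in quads:
    print(q.op, q.arg1, q.arg2, q.res)
```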


Example: a = -b * d + c + (-b) * d
Three-address code for the above statement:
t1 = - b
t2 = t1 * d
t3 = t2 + c
t4 = - b
t5 = t4 * d
t6 = t3 + t5
a = t6

Quadruples for the above example:
op    arg1   arg2   res
-     b             t1
*     t1     d      t2
+     t2     c      t3
-     b             t4
*     t4     d      t5
+     t3     t5     t6
=     t6            a
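The quadruple table can be produced mechanically by a post-order walk that creates one temporary per operator node. The following sketch assumes a hypothetical tuple encoding in which a variable is a str, unary minus is ("-", e), and a binary operator is (op, e1, e2). Quadruples are plain (op, arg1, arg2, res) tuples.

```python
import itertools
from typing import List, Optional, Tuple, Union

Quad = Tuple[str, str, Optional[str], str]    # (op, arg1, arg2, res)
Expr = Union[str, tuple]    # name, ("-", e) for unary minus, or (op, e1, e2)


def to_quads(target: str, e: Expr) -> List[Quad]:
    """Translate target = e into quadruples, one temporary per operator."""
    quads: List[Quad] = []
    temps = itertools.count(1)

    def walk(node: Expr) -> str:
        """Emit quadruples for node (post-order) and return its result name."""
        if isinstance(node, str):
            return node
        if len(node) == 2:                      # unary minus: arg2 unused
            a1 = walk(node[1])
            res = f"t{next(temps)}"
            quads.append((node[0], a1, None, res))
            return res
        op, left, right = node
        a1, a2 = walk(left), walk(right)
        res = f"t{next(temps)}"
        quads.append((op, a1, a2, res))
        return res

    quads.append(("=", walk(e), None, target))  # final copy into the target
    return quads


# a = -b * d + c + (-b) * d
expr = ("+", ("+", ("*", ("-", "b"), "d"), "c"), ("*", ("-", "b"), "d"))
quads = to_quads("a", expr)
for op, a1, a2, res in quads:
    print(f"{op:<3} {a1:<4} {a2 or '':<4} {res}")
```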


Triples
Triples use only three fields in the record structure: one field for the operator and two fields, named
arg1 and arg2, for the operands. The value of a temporary is referred to by the position of the
statement that computes it, rather than by a temporary name stored in a result field as in quadruples.
Example: the triples for a = -b * d + c + (-b) * d are as follows:

      op    arg1   arg2
(0)   -     b
(1)   *     (0)    d
(2)   +     (1)    c
(3)   -     b
(4)   *     (3)    d
(5)   +     (2)    (4)
(6)   =     a      (5)

arg1 and arg2 may be pointers into the symbol table (for program variables), into the literal table (for
constants), or into the triple structure itself (for intermediate results).
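Triples can be generated by the same post-order walk as quadruples. The difference is that an intermediate result is named by the position "(i)" of the triple that computes it, not by a temporary. The sketch assumes the same hypothetical tuple encoding of the expression and places the left operand in arg1.

```python
from typing import List, Optional, Tuple, Union

Triple = Tuple[str, Optional[str], Optional[str]]   # (op, arg1, arg2)
Expr = Union[str, tuple]    # name, ("-", e) for unary minus, or (op, e1, e2)


def to_triples(target: str, e: Expr) -> List[Triple]:
    """Translate target = e into triples; '(i)' refers to the result of triple i."""
    triples: List[Triple] = []

    def walk(node: Expr) -> str:
        if isinstance(node, str):               # a variable: use its name
            return node
        if len(node) == 2:                      # unary minus: arg2 unused
            args = (walk(node[1]), None)
        else:                                   # binary operator
            args = (walk(node[1]), walk(node[2]))
        triples.append((node[0], *args))
        return f"({len(triples) - 1})"          # a position, not a temporary

    triples.append(("=", target, walk(e)))
    return triples


expr = ("+", ("+", ("*", ("-", "b"), "d"), "c"), ("*", ("-", "b"), "d"))
triples = to_triples("a", expr)
for i, (op, a1, a2) in enumerate(triples):
    print(f"({i}) {op:<2} {a1 or '':<4} {a2 or ''}")
```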
Indirect triples
These consist of a listing of pointers to triples, rather than a listing of the triples themselves.
An optimizing compiler can move an instruction by reordering the instruction list, without affecting the
triples themselves.
Instruction   op    arg1   arg2
(0)           -     b
(1)           *     (0)    d
(2)           +     (1)    c
(3)           -     b
(4)           *     (3)    d
(5)           +     (2)    (4)
(6)           =     a      (5)
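The sketch below shows why reordering is safe with indirect triples. The triples and their "(i)" references stay untouched, and only the list of pointers (the order in which triples are executed) changes. It assumes triples stored as tuples and an instruction list of integer positions. is_valid_order is a hypothetical helper that checks that each triple runs after the triples whose results it uses.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, Optional[str], Optional[str]]

# Triples for a = -b * d + c + (-b) * d; "(i)" refers to triple i.
triples: List[Triple] = [
    ("-", "b", None), ("*", "(0)", "d"), ("+", "(1)", "c"),
    ("-", "b", None), ("*", "(3)", "d"), ("+", "(2)", "(4)"),
    ("=", "a", "(5)"),
]


def refs(t: Triple) -> List[int]:
    """Positions of other triples used as operands by t."""
    return [int(a[1:-1]) for a in t[1:] if a is not None and a.startswith("(")]


def is_valid_order(instructions: List[int]) -> bool:
    """Each triple must come after the triples whose results it uses."""
    done = set()
    for p in instructions:
        if any(r not in done for r in refs(triples[p])):
            return False
        done.add(p)
    return True


original = list(range(len(triples)))       # instruction list: pointers 0..6
reordered = [0, 3, 1, 4, 2, 5, 6]          # an optimizer moved triple 3 up
print(is_valid_order(original), is_valid_order(reordered))   # True True
print(is_valid_order([1, 0, 2, 3, 4, 5, 6]))                 # False
```

Moving triple 3 earlier only permutes the pointer list. The triples themselves, including the "(3)" reference inside triple 4, are unchanged.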

Declarations
Declarations are used by the compiler as a source of type information, which it stores in the symbol
table. While processing a declaration, the compiler reserves memory for the declared variables and
stores the relative address of each variable in the symbol table. The relative address is an offset
from the start of the static data area.
This section uses a number of variables, attributes and procedures that help in processing
declarations. The compiler maintains a global variable offset that holds the first address not
yet allocated. Initially, offset is 0. Each time an address is allocated to a variable, offset is
incremented by the width of the data object denoted by the name.
The procedure enter(name, type, address) creates a symbol table entry for name, giving it the type
type and the relative address address.
The synthesized attributes type and width of the nonterminal T indicate the type and the
number of memory units taken by objects of that type.
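The offset and enter mechanism can be sketched as follows. The widths (int = 4, real = 8 memory units) and the tuple encoding of array types are assumptions chosen for illustration. The text does not fix them.

```python
from typing import Dict, Tuple, Union

Type = Union[str, tuple]            # "int", "real" or ("array", n, element_type)
WIDTH = {"int": 4, "real": 8}       # assumed widths in memory units

symtab: Dict[str, Tuple[Type, int]] = {}
offset = 0                          # first address not yet allocated


def width(t: Type) -> int:
    """T.width: memory units taken by an object of type t."""
    if isinstance(t, str):
        return WIDTH[t]
    _, n, elem = t
    return n * width(elem)


def enter(name: str, typ: Type, address: int) -> None:
    """Create a symbol-table entry for name with its type and relative address."""
    symtab[name] = (typ, address)


def declare(name: str, typ: Type) -> None:
    """Process the declaration name : typ."""
    global offset
    enter(name, typ, offset)
    offset += width(typ)


for name, typ in [("x", "int"), ("y", "real"), ("a", ("array", 10, "int"))]:
    declare(name, typ)

for name, (typ, addr) in symtab.items():
    print(name, typ, addr)
print("next free offset:", offset)      # next free offset: 52
```

Under these assumed widths, x gets address 0, y gets 4 and the 10-element array a gets 12, leaving offset at 52.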

Backpatching
The main problem in generating code for control statements in a single pass is that we may not know
the labels where control must go at the time the jump statements are generated. We can solve this
problem by generating jump statements whose targets are temporarily left unspecified. Each such
statement is put on a list of jumps whose targets will be filled in once they are determined. We call
this backpatching, and it is widely used in three-address code generation. Backpatching is a technique
for generating code for boolean expressions and statements in one pass. The idea is to maintain lists
of incomplete jumps, where all the jump instructions on a list have the same target. When the target
becomes known, all the instructions on its list are completed by filling in the target.
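Here is a minimal one-pass sketch of backpatching for if a < b or c < d then x := 1. The helper names emit, makelist, merge and backpatch follow common textbook usage but are assumptions here, as is representing an instruction by its text plus an optional numeric target. Each incomplete jump starts with no target. Lists of such jumps are merged when they share a destination, and they are filled in once that destination's index is known.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    text: str                       # e.g. "if a < b goto" or "x := 1"
    is_jump: bool = False
    target: Optional[int] = None    # None means "target not known yet"


code: List[Instr] = []


def emit(text: str, is_jump: bool = False) -> int:
    """Append an instruction and return its index."""
    code.append(Instr(text, is_jump))
    return len(code) - 1


def nextinstr() -> int:
    """Index that the next emitted instruction will get."""
    return len(code)


def makelist(i: int) -> List[int]:
    """A new list containing only the incomplete jump i."""
    return [i]


def merge(*lists: List[int]) -> List[int]:
    """Concatenate lists of incomplete jumps that share a target."""
    return [i for lst in lists for i in lst]


def backpatch(lst: List[int], target: int) -> None:
    """Fill in target as the destination of every jump on lst."""
    for i in lst:
        code[i].target = target


# if a < b or c < d then x := 1   (one pass, targets filled in later)
b1_true = makelist(emit("if a < b goto", is_jump=True))
b1_false = makelist(emit("goto", is_jump=True))
backpatch(b1_false, nextinstr())          # if a < b fails, test c < d next
b2_true = makelist(emit("if c < d goto", is_jump=True))
b2_false = makelist(emit("goto", is_jump=True))
truelist = merge(b1_true, b2_true)        # both true exits reach the then-part
backpatch(truelist, nextinstr())
emit("x := 1")
backpatch(b2_false, nextinstr())          # false exit skips to whatever follows

listing = [
    f"{i}: {ins.text} {ins.target}" if ins.is_jump else f"{i}: {ins.text}"
    for i, ins in enumerate(code)
]
print("\n".join(listing))
# 0: if a < b goto 4
# 1: goto 2
# 2: if c < d goto 4
# 3: goto 5
# 4: x := 1
```

Target 5 is the index of the first instruction after the if-statement, which has not been emitted yet at the moment it is patched.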
