Chapter 6 - Intermediate Code Generation
Postfix notation
The postfix notation is practical for an intermediate representation as the operands are found just before
the operator. In fact, the postfix notation is a linearized representation of a syntax tree.
Example: 1 + 2 * 3 is represented in postfix notation as 1 2 3 * +, because * has higher precedence
than + and is therefore applied first.
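To illustrate why postfix notation is a linearized syntax tree, here is a minimal Python sketch that produces postfix by a post-order walk of the tree. The class names Leaf and BinOp and the function to_postfix are assumptions made for this example; they do not come from the notes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: str  # an operand such as "1" or "x"


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Leaf, BinOp]


def to_postfix(node: Node) -> str:
    """Post-order walk: left operand, right operand, then the operator."""
    if isinstance(node, Leaf):
        return node.value
    return f"{to_postfix(node.left)} {to_postfix(node.right)} {node.op}"


# Syntax tree of 1 + 2 * 3: '*' binds tighter, so it is the right child of '+'.
tree = BinOp("+", Leaf("1"), BinOp("*", Leaf("2"), Leaf("3")))
print(to_postfix(tree))  # prints: 1 2 3 * +
```

Each operator is written only after both of its subtrees, which is exactly the property that places the operands just before the operator.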
o newlabel() - each call returns a new, distinct name that can be used as a label.
o In addition, for convenience, we use the notation gen to build a three-address statement from a number
of strings: gen concatenates all of its parameters and produces the resulting statement.
o For example, if id1.lexeme = x, id2.lexeme = y and id3.lexeme = z, then gen (id1.lexeme, ‘:=’,
id2.lexeme, ‘+’, id3.lexeme) will produce the three-address code: x := y + z
Note: variables and attribute values are evaluated by gen before being concatenated with the other parameters.
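The helper functions can be sketched in Python as follows. This is only an illustration under stated assumptions: the naming schemes t1, t2, ... and L1, L2, ... are assumed, and newtemp (used by the semantic rules in Example 1) is assumed to behave like newlabel but for temporary names.

```python
import itertools

# Assumed naming schemes: temporaries t1, t2, ... and labels L1, L2, ...
_temp_ids = itertools.count(1)
_label_ids = itertools.count(1)


def newtemp() -> str:
    """Return a fresh temporary name on every call."""
    return f"t{next(_temp_ids)}"


def newlabel() -> str:
    """Return a fresh label name on every call."""
    return f"L{next(_label_ids)}"


def gen(*parts: object) -> str:
    """Concatenate the already-evaluated parameters into one statement."""
    return " ".join(str(p) for p in parts)


# The arguments stand for id1.lexeme, id2.lexeme and id3.lexeme.
print(gen("x", ":=", "y", "+", "z"))  # prints: x := y + z
```

Because Python evaluates the arguments before the call, the variables and attribute values are already replaced by their contents when gen joins them.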
Example 1: generation of the three-address code for an assignment statement and an expression.
Syntax Rule       Semantic action
S → id := E       S.code := E.code || gen (id.lexeme, ‘:=’, E.place)
E → E1 + E2       E.place := newtemp();
                  E.code := E1.code || E2.code || gen (E.place, ‘:=’, E1.place, ‘+’, E2.place)
E → E1 * E2       E.place := newtemp();
                  E.code := E1.code || E2.code || gen (E.place, ‘:=’, E1.place, ‘*’, E2.place)
E → - E1          E.place := newtemp();
                  E.code := E1.code || gen (E.place, ‘:= uminus ’, E1.place)
E → (E1)          E.place := E1.place;
                  E.code := E1.code
E → id            E.place := id.lexeme;
                  E.code := ‘’ /* empty code */
o The three-address code for the input a:= x + y * z will be:
t1 := y * z
t2 := x + t1
a := t2
o TAC (Three Address Code) can range from high- to low-level, depending on the choice of operators. In
general, each statement contains at most 3 addresses or operands.
o The general form is x := y op z, where “op” is an operator, x is the result, and y and z are operands. x,
y and z are variables, constants, or “temporaries”.
o The most common implementations of three-address code are quadruples, triples and indirect triples.
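As an illustration of the semantic rules in Example 1, the following Python sketch walks an expression tree and returns the pair (place, code) for each node. The tuple encoding of expressions and the class name Translator are assumptions made for this example; they are not part of the notes.

```python
from typing import List, Tuple, Union

# Assumed encoding: an identifier is a string; an operation is a tuple
# ("+", e1, e2), ("*", e1, e2) or ("uminus", e1).
Expr = Union[str, tuple]


class Translator:
    def __init__(self) -> None:
        self.count = 0

    def newtemp(self) -> str:
        self.count += 1
        return f"t{self.count}"

    def expr(self, e: Expr) -> Tuple[str, List[str]]:
        """Return (E.place, E.code) for the expression e."""
        if isinstance(e, str):            # E -> id: no code, place is the name
            return e, []
        if e[0] == "uminus":              # E -> - E1
            p1, c1 = self.expr(e[1])
            place = self.newtemp()
            return place, c1 + [f"{place} := uminus {p1}"]
        op, left, right = e               # E -> E1 + E2  or  E -> E1 * E2
        p1, c1 = self.expr(left)
        p2, c2 = self.expr(right)
        place = self.newtemp()
        return place, c1 + c2 + [f"{place} := {p1} {op} {p2}"]

    def assign(self, name: str, e: Expr) -> List[str]:
        """S -> id := E"""
        place, code = self.expr(e)
        return code + [f"{name} := {place}"]


for line in Translator().assign("a", ("+", "x", ("*", "y", "z"))):
    print(line)
# prints:
# t1 := y * z
# t2 := x + t1
# a := t2
```

Parentheses need no case of their own here because the tree already encodes grouping, which corresponds to the rule E → (E1) simply passing the place and code of E1 upward.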
Quadruples
A quadruple is a record structure with four fields: one field for the operator op, two fields for the
operands (arguments) arg1 and arg2, and one field for the result res, so that res = arg1 op arg2.
Example 1: a = b + c
b is represented as arg1, c is represented as arg2, + as op and a as res.
Unary operators like ‘-’ do not use arg2. Operators like param use neither arg2 nor res. For
conditional and unconditional jumps, res holds the target label. arg1, arg2 and res are pointers to the
symbol table or the literal table entries for the names.
Example: a = -b * d + c + (-b) * d
The three-address code for the above statement is as follows:
t1 = - b
t2 = t1 * d
t3 = t2 + c
t4 = - b
t5 = t4 * d
t6 = t3 + t5
a = t6
The corresponding quadruples are:
op    arg1    arg2    res
-     b               t1
*     t1      d       t2
+     t2      c       t3
-     b               t4
*     t4      d       t5
+     t3      t5      t6
=     t6              a
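The quadruple table can be written down as a small data structure. The following Python sketch is an illustration only: the names Quad, QUADS and run are assumptions, the operator name uminus is used for the unary minus to keep it apart from a binary minus, and plain strings stand in for the symbol table pointers. The small interpreter is there only to check that the quadruples compute the original expression.

```python
from typing import Dict, List, NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]  # None for unary operators and copies
    res: str


# Quadruples for a = -b * d + c + (-b) * d
QUADS: List[Quad] = [
    Quad("uminus", "b", None, "t1"),
    Quad("*", "t1", "d", "t2"),
    Quad("+", "t2", "c", "t3"),
    Quad("uminus", "b", None, "t4"),
    Quad("*", "t4", "d", "t5"),
    Quad("+", "t3", "t5", "t6"),
    Quad("=", "t6", None, "a"),
]


def run(quads: List[Quad], env: Dict[str, int]) -> Dict[str, int]:
    """Execute the quadruples in order over a copy of env."""
    env = dict(env)
    for q in quads:
        if q.op == "uminus":
            env[q.res] = -env[q.arg1]
        elif q.op == "*":
            env[q.res] = env[q.arg1] * env[q.arg2]
        elif q.op == "+":
            env[q.res] = env[q.arg1] + env[q.arg2]
        elif q.op == "=":
            env[q.res] = env[q.arg1]
        else:
            raise ValueError(f"unknown operator {q.op!r}")
    return env


print(run(QUADS, {"b": 2, "c": 4, "d": 3})["a"])  # prints: -8
```

With b = 2, c = 4 and d = 3 the source expression evaluates to -6 + 4 - 6 = -8, which matches the value left in a.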
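Triples
A triple has only three fields: op, arg1 and arg2. There is no result field; the value computed by a
triple is referred to by the position of that triple, written in parentheses. The triples for the
statement a = -b * d + c + (-b) * d are:
       op    arg1    arg2
(0)    -     b
(1)    *     (0)     d
(2)    +     (1)     c
(3)    -     b
(4)    *     (3)     d
(5)    +     (2)     (4)
(6)    =     a       (5)
arg1 and arg2 may be pointers to the symbol table for program variables, to the literal table for
constants, or into the triple structure for intermediate results.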
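The relationship between temporaries and triple positions can be sketched in Python. The function to_triples below is a hypothetical helper, not something defined in the notes: it rewrites quadruples into triples by replacing each temporary with the index of the triple that computes it. It assumes that every non-copy quadruple stores its result in a temporary, and it uses the operator name uminus for the unary minus.

```python
from typing import Dict, List, Optional, Tuple, Union

Quad = Tuple[str, str, Optional[str], str]         # (op, arg1, arg2, res)
Arg = Union[str, int]                              # a name, or the index of a triple
Triple = Tuple[str, Optional[Arg], Optional[Arg]]  # (op, arg1, arg2)


def to_triples(quads: List[Quad]) -> List[Triple]:
    defined_at: Dict[str, int] = {}  # temporary name -> index of its triple
    triples: List[Triple] = []

    def ref(arg: Optional[str]) -> Optional[Arg]:
        # A temporary becomes a reference to the triple that computes it;
        # program variables (and None) are passed through unchanged.
        return defined_at.get(arg, arg)

    for op, arg1, arg2, res in quads:
        if op == "=":
            # Copy: the target name goes into arg1, the value into arg2.
            triples.append(("=", res, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            defined_at[res] = len(triples) - 1
    return triples


quads: List[Quad] = [
    ("uminus", "b", None, "t1"),
    ("*", "t1", "d", "t2"),
    ("+", "t2", "c", "t3"),
    ("uminus", "b", None, "t4"),
    ("*", "t4", "d", "t5"),
    ("+", "t3", "t5", "t6"),
    ("=", "t6", None, "a"),
]

for index, triple in enumerate(to_triples(quads)):
    print(f"({index})", triple)
```

In the printed result an integer argument plays the role of a parenthesized position such as (0), while a string is a name from the symbol table.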
INDIRECT TRIPLES
These consist of a listing of pointers to triples, rather than a listing of the triples themselves.
An optimizing compiler can move an instruction by reordering the instruction list, without affecting the
triples themselves.
Instruction    op    arg1    arg2
(0)            -     b
(1)            *     (0)     d
(2)            +     (1)     c
(3)            -     b
(4)            *     (3)     d
(5)            +     (2)     (4)
(6)            =     a       (5)
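The benefit of the extra level of indirection can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: triples are plain tuples, an integer argument refers to another triple, uminus names the unary minus, and the instruction list is a Python list of indices into the triple table. The sketch reorders the instruction list and checks that the result is unchanged while the triples are never touched.

```python
from typing import Dict, List, Optional, Tuple, Union

Arg = Union[str, int]  # a variable name, or the index of a triple
Triple = Tuple[str, Arg, Optional[Arg]]

TRIPLES: List[Triple] = [
    ("uminus", "b", None),  # (0)
    ("*", 0, "d"),          # (1)
    ("+", 1, "c"),          # (2)
    ("uminus", "b", None),  # (3)
    ("*", 3, "d"),          # (4)
    ("+", 2, 4),            # (5)
    ("=", "a", 5),          # (6)
]


def execute(instructions: List[int], triples: List[Triple],
            env: Dict[str, int]) -> Dict[str, int]:
    """Run the triples in the order given by the instruction list."""
    env = dict(env)
    value: Dict[int, int] = {}  # triple index -> computed value

    def val(arg: Arg) -> int:
        return value[arg] if isinstance(arg, int) else env[arg]

    for i in instructions:
        op, a1, a2 = triples[i]
        if op == "uminus":
            value[i] = -val(a1)
        elif op == "*":
            value[i] = val(a1) * val(a2)
        elif op == "+":
            value[i] = val(a1) + val(a2)
        elif op == "=":
            env[a1] = value[i] = val(a2)
        else:
            raise ValueError(f"unknown operator {op!r}")
    return env


original = [0, 1, 2, 3, 4, 5, 6]
reordered = [0, 3, 1, 4, 2, 5, 6]  # both negations are moved to the front

env = {"b": 2, "c": 4, "d": 3}
print(execute(original, TRIPLES, env)["a"])   # prints: -8
print(execute(reordered, TRIPLES, env)["a"])  # prints: -8
```

With plain triples, moving an instruction would change its position and therefore every reference to it; here only the list of indices changes.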
Compiled by: Dawit K.
Compiler Design
Declarations
A declaration gives the compiler the type information that it stores in the symbol table. While
processing a declaration, the compiler reserves memory for each variable and stores the relative
address of the variable in the symbol table. The relative address is an offset from the start of the
static data area.
This section uses a number of variables, attributes and procedures that help in processing
declarations. The compiler maintains a global variable offset that holds the first address not
yet allocated. Initially, offset is 0. Each time an address is allocated to a variable, offset is
incremented by the width of the data object denoted by the name.
The procedure enter (name, type, address) creates a symbol table entry for name and gives it the type
type and the relative address address.
The synthesized attributes type and width of non-terminal T give the type and the number of memory
units taken by objects of that type.
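The bookkeeping described above can be sketched in Python. The type widths (4 units for integer, 8 for real) and the names SymbolTable, WIDTHS and declare are assumptions made for this example; the notes themselves only define offset and enter, and keep offset as a global variable rather than a field.

```python
from typing import Dict, Tuple

# Assumed widths in memory units; real compilers take these from the target.
WIDTHS: Dict[str, int] = {"integer": 4, "real": 8}


class SymbolTable:
    def __init__(self) -> None:
        self.offset = 0  # first address not yet allocated
        self.entries: Dict[str, Tuple[str, int]] = {}

    def enter(self, name: str, type_: str, address: int) -> None:
        """Create an entry for name with the given type and relative address."""
        if name in self.entries:
            raise KeyError(f"{name} is already declared")
        self.entries[name] = (type_, address)

    def declare(self, name: str, type_: str) -> None:
        """Process one declaration: allocate at offset, then advance offset."""
        width = WIDTHS[type_]  # plays the role of T.width
        self.enter(name, type_, self.offset)
        self.offset += width


table = SymbolTable()
table.declare("x", "integer")
table.declare("y", "real")
table.declare("z", "integer")
print(table.entries)
# prints: {'x': ('integer', 0), 'y': ('real', 4), 'z': ('integer', 12)}
print(table.offset)  # prints: 16
```

Each variable receives the value of offset at the moment it is declared, so the relative addresses follow directly from the widths of the declarations that precede it.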
Backpatching
The main problem in generating code for control statements in a single pass is that, at the time a
jump statement is generated, we may not yet know the label to which control must go. We can solve
this problem by generating jump statements whose targets are temporarily left unspecified. Each such
statement is put on a list of goto statements whose labels will be filled in once they are determined.
This is called backpatching, and it is widely used in three-address code generation for boolean
expressions and flow-of-control statements. The idea is to maintain lists of incomplete jumps, where
all the jump instructions on a list have the same target. When the target becomes known, all the
instructions on its list are completed by filling in the target.
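The list handling can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names CodeBuffer, emit, emit_jump, makelist, merge and backpatch are chosen for this example, instructions are numbered from 0, and jump targets are instruction numbers rather than symbolic labels. The example translates "if a < b then x := 1" in one pass.

```python
from typing import Dict, List, Optional


def makelist(i: int) -> List[int]:
    """Create a new list containing only the jump at index i."""
    return [i]


def merge(l1: List[int], l2: List[int]) -> List[int]:
    """Concatenate two lists of incomplete jumps with the same target."""
    return l1 + l2


class CodeBuffer:
    def __init__(self) -> None:
        self.text: List[str] = []
        # Jump index -> target; None while the target is still unknown.
        self.target: Dict[int, Optional[int]] = {}

    def emit(self, text: str) -> int:
        self.text.append(text)
        return len(self.text) - 1

    def emit_jump(self, text: str) -> int:
        """Emit a jump whose target is left unspecified for now."""
        i = self.emit(text)
        self.target[i] = None
        return i

    def backpatch(self, jumps: List[int], target: int) -> None:
        """Fill in target for every jump on the list."""
        for i in jumps:
            self.target[i] = target

    def listing(self) -> List[str]:
        lines = []
        for i, text in enumerate(self.text):
            if i in self.target:
                dest = self.target[i]
                text = f"{text} {'_' if dest is None else dest}"
            lines.append(f"{i}: {text}")
        return lines


buf = CodeBuffer()
truelist = makelist(buf.emit_jump("if a < b goto"))
falselist = makelist(buf.emit_jump("goto"))
buf.backpatch(truelist, len(buf.text))   # the body starts at the next instruction
buf.emit("x := 1")
buf.backpatch(falselist, len(buf.text))  # the exit is the instruction after the body
print("\n".join(buf.listing()))
# prints:
# 0: if a < b goto 2
# 1: goto 3
# 2: x := 1
```

Both jumps are emitted before their targets exist and are completed later, which is what allows the code to be generated in a single pass.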