Chapter 7 Intermediate Representation
Intermediate Representation
Motivation
• ASTs are too high level and grammar dependent
– Different languages entail different implementations.
– Different machines entail different implementations.
– We need something lower-level, closer to machine code, so that:
• The ASTs from various languages can be translated into this uniform IR.
• Translation to the various machine codes can be done from the IR.
What are the differences between ASTs and low-level IR?
• Conditionals
– If-then-else does not exist in machine-level instructions. Instead, there are comparisons and conditional jumps (to only one target).
• Array and field references
– At low level, we need to think about heap/stack, and decide the corresponding addressing mechanism.
• Method calls
– In the AST, calls may have various numbers of arguments.
– At low level, we have only one “call” instruction.
Low Level Tree Representations
• Such tree representations are also used in compilers such as GCC (called RTL and RTX there).
• Translation to intermediate code is indeed a process of tree rewriting.
IR trees: Expressions
• CONST(i): integer constant i
• NAME(n): symbolic constant n [a code label]
• TEMP(t): temporary t [one of any number of “registers”]
• BINOP(o, e1, e2): application of binary operator o:
– ADD, SUB, MUL, DIV [arithmetic]
– AND, OR, XOR [bitwise logical]
– SLL, SRL [logical shifts]
– SRA [arithmetic right-shift]
to integer operands e1 (evaluated first) and e2 (evaluated second)
• MEM(e): contents of a word of memory starting at address e
• CALL(f, [e1, . . . , en]): procedure call; expression f is evaluated before arguments e1, . . . , en
• ESEQ(s, e): expression sequence; evaluate s for side effects, then e for the result
IR trees: Statements
• MOVE(TEMP t, e): evaluate e into temporary t
• MOVE(MEM(e1), e2): evaluate e1 yielding address a, then e2 into the word at a
• EXP(e): evaluate e and discard the result
• JUMP(e, [l1, . . . , ln]): transfer control to address e; l1, . . . , ln are all possible values for e
• CJUMP(o, e1, e2, t, f): evaluate e1 then e2, yielding a and b, respectively; compare a with b using relational operator o:
– BEQ, BNE [signed and unsigned integers]
– BLT, BGT, BLE, BGE [signed]
jump to t if true, f if false
• SEQ(s1, s2): statement s1 followed by s2
• LABEL(n): define the constant value of name n as the current code address; NAME(n) can be used as the target of jumps, calls, etc.
CS352 Translating ASTs to IR trees 3
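To make the node list concrete, here is a minimal Python sketch (an illustration, not the course's implementation) that encodes a subset of the nodes as dataclasses and evaluates straight-line trees. The class names mirror the node names above; the `Machine` class, its `eval_exp`/`exec_stm` methods, and the dictionary-based memory are assumptions made for the example. CALL, jumps, and labels are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class CONST:
    value: int

@dataclass
class TEMP:
    name: str

@dataclass
class BINOP:
    op: str           # "ADD", "SUB" or "MUL" in this sketch
    left: "Exp"
    right: "Exp"

@dataclass
class MEM:
    addr: "Exp"       # byte address of a word

@dataclass
class ESEQ:
    stm: "Stm"
    exp: "Exp"

@dataclass
class MOVE:
    dst: "Exp"        # must be TEMP or MEM
    src: "Exp"

@dataclass
class SEQ:
    first: "Stm"
    second: "Stm"

Exp = Union[CONST, TEMP, BINOP, MEM, ESEQ]
Stm = Union[MOVE, SEQ]

OPS = {"ADD": lambda a, b: a + b,
       "SUB": lambda a, b: a - b,
       "MUL": lambda a, b: a * b}

@dataclass
class Machine:
    temps: Dict[str, int] = field(default_factory=dict)    # temporary -> value
    memory: Dict[int, int] = field(default_factory=dict)   # address -> word

    def eval_exp(self, e: Exp) -> int:
        if isinstance(e, CONST):
            return e.value
        if isinstance(e, TEMP):
            return self.temps[e.name]
        if isinstance(e, BINOP):
            a = self.eval_exp(e.left)    # e1 is evaluated first
            b = self.eval_exp(e.right)   # then e2
            return OPS[e.op](a, b)
        if isinstance(e, MEM):
            return self.memory[self.eval_exp(e.addr)]
        if isinstance(e, ESEQ):
            self.exec_stm(e.stm)         # side effects first
            return self.eval_exp(e.exp)  # then the result
        raise TypeError(f"unsupported expression: {e!r}")

    def exec_stm(self, s: Stm) -> None:
        if isinstance(s, SEQ):
            self.exec_stm(s.first)
            self.exec_stm(s.second)
        elif isinstance(s, MOVE) and isinstance(s.dst, TEMP):
            self.temps[s.dst.name] = self.eval_exp(s.src)
        elif isinstance(s, MOVE) and isinstance(s.dst, MEM):
            a = self.eval_exp(s.dst.addr)          # evaluate e1 to address a first
            self.memory[a] = self.eval_exp(s.src)  # then store e2 at a
        else:
            raise TypeError(f"unsupported statement: {s!r}")

m = Machine(temps={"t1": 5})
m.exec_stm(MOVE(MEM(CONST(100)), BINOP("ADD", TEMP("t1"), CONST(2))))
print(m.memory[100])  # 7
```

The evaluation order in `BINOP` and `MOVE(MEM(e1), e2)` follows the left-to-right rules stated in the list.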
Some Examples
• A[i]=x+y;
• if (x>y) x=2 else x=3
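One possible translation of the two examples, written as a small Python sketch that renders IR trees as text. It assumes that A, i, x, and y already live in temporaries named after them, a word size of 4, and hypothetical labels `T`, `F`, and `JOIN`; the array bounds check is omitted.

```python
W = 4  # assumed word size in bytes

# Hypothetical helpers: each renders one IR node as text.
def CONST(i): return f"CONST {i}"
def TEMP(t): return f"TEMP {t}"
def NAME(n): return f"NAME {n}"
def BINOP(op, a, b): return f"BINOP({op}, {a}, {b})"
def MEM(e): return f"MEM({e})"
def MOVE(dst, src): return f"MOVE({dst}, {src})"
def LABEL(n): return f"LABEL {n}"
def JUMP(n): return f"JUMP({NAME(n)}, [{n}])"
def CJUMP(op, a, b, t, f): return f"CJUMP({op}, {a}, {b}, {t}, {f})"

def SEQ(*stms):
    # Fold one or more statements into right-nested SEQ nodes.
    if len(stms) == 1:
        return stms[0]
    return f"SEQ({stms[0]}, {SEQ(*stms[1:])})"

# A[i] = x + y;  store x + y into the word at A + i*W
assign = MOVE(
    MEM(BINOP("ADD", TEMP("A"), BINOP("MUL", TEMP("i"), CONST(W)))),
    BINOP("ADD", TEMP("x"), TEMP("y")),
)

# if (x > y) x = 2 else x = 3
if_else = SEQ(
    CJUMP("BGT", TEMP("x"), TEMP("y"), "T", "F"),
    LABEL("T"), MOVE(TEMP("x"), CONST(2)), JUMP("JOIN"),
    LABEL("F"), MOVE(TEMP("x"), CONST(3)),
    LABEL("JOIN"),
)
print(assign)
print(if_else)
```

The if-else shows the point made earlier: the structured conditional becomes one two-way CJUMP plus labels and an unconditional JUMP.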
Things are Not That Easy
• The translation of (x>3) differs between:
– y = x>3
– if (x>3) s1 else s2
• The translation of x=3 differs between:
– x=3; …
– if (x=3)
• Solution:
– Let expressions, statements, and conditionals share
the same base class Translate.exp so that one can be
converted to the other in various contexts.
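A minimal Python sketch of the shared base class idea, loosely following the Ex/Nx naming and the unEx/unNx/unCx conversions used later in this chapter. IR trees are rendered as strings, and the class name `TranslateExp` and the snake_case method names are hypothetical stand-ins for Translate.exp and its methods.

```python
class TranslateExp:
    """Hypothetical base class: one translated AST node, viewable three ways."""

    def un_ex(self) -> str:
        """As an IR expression that yields a value."""
        raise NotImplementedError

    def un_nx(self) -> str:
        """As an IR statement that yields no value."""
        raise NotImplementedError

    def un_cx(self, t: str, f: str) -> str:
        """As an IR statement that jumps to label t if true, f if false."""
        raise NotImplementedError


class Ex(TranslateExp):
    """Wraps an IR expression that computes a value."""

    def __init__(self, exp: str):
        self.exp = exp

    def un_ex(self) -> str:
        return self.exp

    def un_nx(self) -> str:
        return f"EXP({self.exp})"  # evaluate and discard the result

    def un_cx(self, t: str, f: str) -> str:
        # Any nonzero value counts as true.
        return f"CJUMP(BNE, {self.exp}, CONST 0, {t}, {f})"


class Nx(TranslateExp):
    """Wraps an IR statement that computes no value."""

    def __init__(self, stm: str):
        self.stm = stm

    def un_ex(self) -> str:
        raise TypeError("a statement has no value")

    def un_nx(self) -> str:
        return self.stm

    def un_cx(self, t: str, f: str) -> str:
        raise TypeError("a statement cannot be used as a condition")


b = Ex("TEMP b")
print(b.un_ex())            # y = b;      -> TEMP b
print(b.un_cx("T", "F"))    # if (b) ...  -> CJUMP(BNE, TEMP b, CONST 0, T, F)
print(b.un_nx())            # b;          -> EXP(TEMP b)
```

The translator for each AST node returns one of these wrappers, and the consumer (an assignment, an if, an expression statement) picks the view it needs.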
Kinds of expressions
A variable held in temporary t translates as:
Ex(TEMP t)
Array elements: the array expression is a reference to an array in the heap.
For expressions e and i, translate e[i] as:
Ex(MEM(ADD(e.unEx(), MUL(i.unEx(), CONST(w)))))
where w is the target machine’s word size: all values are word-sized (scalar) in MiniJava.
Array bounds check: the index must satisfy 0 ≤ i < e.size; the runtime puts the size in the word preceding the array base.
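A minimal Python sketch of the address arithmetic plus the bounds check, assuming a byte-addressed memory modeled as a dict, a word size w = 4, and the layout described above (the length stored in the word just before the base). The name `load_element` and the use of `IndexError` for a failed check are illustrative assumptions.

```python
W = 4  # assumed word size in bytes

def load_element(memory: dict, base: int, i: int) -> int:
    """Evaluate MEM(ADD(base, MUL(i, CONST W))) after a bounds check."""
    size = memory[base - W]          # the runtime keeps the length just before the base
    if not 0 <= i < size:
        raise IndexError(f"index {i} out of bounds for length {size}")
    return memory[base + i * W]

# An array of length 3 at base address 1000 holding 10, 20, 30.
mem = {996: 3, 1000: 10, 1004: 20, 1008: 30}
print(load_element(mem, 1000, 2))   # 30
```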
Object fields: the object expression is a reference to an object in the heap.
For expression e and field f, translate e.f as:
Ex(MEM(ADD(e.unEx(), CONST(o))))
where o is the byte offset of the field f in the object.
Null pointer check: the object expression must be non-null (i.e., non-zero).
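The same idea for field access, as a Python sketch. It assumes fields get one word each, in declaration order, starting at offset 0; a real object layout may reserve header words (for example, a class or vtable pointer), so these offsets are illustrative only. `field_offsets` and `load_field` are hypothetical names.

```python
W = 4  # assumed word size in bytes

def field_offsets(fields):
    """Assign each field a byte offset o: one word per field, in declaration order."""
    return {name: k * W for k, name in enumerate(fields)}

def load_field(memory: dict, obj: int, offset: int) -> int:
    """Evaluate MEM(ADD(obj, CONST offset)) after a null check."""
    if obj == 0:
        raise RuntimeError("null pointer dereference")
    return memory[obj + offset]

offsets = field_offsets(["x", "y", "next"])
mem = {2000: 7, 2004: 8, 2008: 0}     # an object at address 2000
print(offsets["y"], load_field(mem, 2000, offsets["y"]))   # 4 8
```

Because the offset is a compile-time constant, the whole access is a single MEM of an ADD with a CONST.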
Translating MiniJava
Basic blocks:
while (c) s:
1. evaluate c
2. if false jump to next statement after loop
3. evaluate loop body s
4. evaluate c
5. if true jump back to loop body
e.g.,
        if not(c) jump done
body:
        s
        if c jump body
done:
With b as the body label and x as the done label, this is:
Nx(SEQ(SEQ(c.unCx(b, x), SEQ(LABEL(b), s.unNx())),
       SEQ(c.unCx(b, x), LABEL(x))))
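The while-loop pattern can be sketched in Python as a function that takes the condition as a callback producing a conditional jump (playing the role of c.unCx) and the body statement (playing the role of s.unNx()), then emits the SEQ tree above with fresh labels. Rendering trees as strings, the label counter, and the name `translate_while` are assumptions for illustration.

```python
import itertools

_fresh = itertools.count()

def new_label(prefix: str) -> str:
    return f"{prefix}{next(_fresh)}"

def SEQ(a: str, b: str) -> str:
    return f"SEQ({a}, {b})"

def translate_while(un_cx, body: str) -> str:
    """un_cx(t, f) must return a statement jumping to t if the condition holds, else f."""
    b, x = new_label("body"), new_label("done")
    return SEQ(SEQ(un_cx(b, x), SEQ(f"LABEL {b}", body)),
               SEQ(un_cx(b, x), f"LABEL {x}"))

# while (i < n) i = i + 1;
cond = lambda t, f: f"CJUMP(BLT, TEMP i, TEMP n, {t}, {f})"
stm = "MOVE(TEMP i, BINOP(ADD, TEMP i, CONST 1))"
tree = translate_while(cond, stm)
print(tree)
```

Note that the condition's code is emitted twice; this costs some code size but leaves only one conditional jump per iteration.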
for loops
for (i, c, u) s
e0.m(e1, . . . , en):
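Translate a op b as:
RelCx.op(a.unEx(), b.unEx())
When used as a conditional, unCx(t, f) yields:
CJUMP(op, a.unEx(), b.unEx(), t, f)
where t and f are labels. When used as a value, unEx() yields:
ESEQ(SEQ(MOVE(TEMP(r), CONST(1)),
         SEQ(unCx(t, f),
             SEQ(LABEL(f),
                 SEQ(MOVE(TEMP(r), CONST(0)), LABEL(t))))),
     TEMP(r))
where r is a fresh temporary.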
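In Python, a RelCx-like class can produce both forms: `un_cx` gives the CJUMP directly, while `un_ex` wraps it in the ESEQ above so the comparison leaves 1 or 0 in a fresh temporary r. Rendering trees as strings, the counter used for fresh names, and the class and method spellings are assumptions of this sketch.

```python
import itertools

_ids = itertools.count()

def SEQ(*stms: str) -> str:
    """Right-nested SEQ of one or more statements."""
    return stms[0] if len(stms) == 1 else f"SEQ({stms[0]}, {SEQ(*stms[1:])})"

class RelCx:
    def __init__(self, op: str, a: str, b: str):
        self.op, self.a, self.b = op, a, b   # a and b are already a.unEx(), b.unEx()

    def un_cx(self, t: str, f: str) -> str:
        return f"CJUMP({self.op}, {self.a}, {self.b}, {t}, {f})"

    def un_ex(self) -> str:
        n = next(_ids)
        r, t, f = f"r{n}", f"T{n}", f"F{n}"          # fresh temporary and labels
        body = SEQ(f"MOVE(TEMP {r}, CONST 1)",      # assume true
                   self.un_cx(t, f),
                   f"LABEL {f}",
                   f"MOVE(TEMP {r}, CONST 0)",      # reached only when false
                   f"LABEL {t}")
        return f"ESEQ({body}, TEMP {r})"

gt = RelCx("BGT", "TEMP x", "CONST 3")
print(gt.un_cx("Then", "Else"))   # context: if (x > 3) s1 else s2
print(gt.un_ex())                 # context: y = x > 3
```

This is exactly the (x>3) situation from "Things are Not That Easy": one translated object, two different IR trees depending on the context that consumes it.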
Conditionals
or more optimally:
translates to:
where k is the offset of the static array from the frame pointer FP and w is the word size
Array allocation:
constant bounds
• allocate in static area, stack, or heap
• no run-time descriptor is needed
dynamic arrays: bounds fixed at run-time
• allocate in stack or heap
• descriptor is needed
dynamic arrays: bounds can change at run-time
• allocate in heap
• descriptor is needed
Array layout:
Contiguous:
1. Row major
Rightmost subscript varies most quickly:
A[1,1], A[1,2], ...
A[2,1], A[2,2], ...
Used in PL/1, Algol, Pascal, C, Ada, Modula, Modula-2, Modula-3
2. Column major
Leftmost subscript varies most quickly:
A[1,1], A[2,1], ...
A[1,2], A[2,2], ...
Used in FORTRAN
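To see the difference concretely, this Python sketch computes the zero-based element index of A[i, j] for an array [1..N, 1..M] under each contiguous layout; multiplying by the element size gives a byte offset. The function names are hypothetical.

```python
def row_major_index(i: int, j: int, M: int) -> int:
    """A[1..N, 1..M] row by row: j (rightmost) varies fastest."""
    return (i - 1) * M + (j - 1)

def column_major_index(i: int, j: int, N: int) -> int:
    """A[1..N, 1..M] column by column: i (leftmost) varies fastest."""
    return (j - 1) * N + (i - 1)

N, M = 2, 3
# Visiting elements in storage order gives consecutive indices in each layout.
print([row_major_index(i, j, M) for i in (1, 2) for j in (1, 2, 3)])     # [0, 1, 2, 3, 4, 5]
print([column_major_index(i, j, N) for j in (1, 2, 3) for i in (1, 2)])  # [0, 1, 2, 3, 4, 5]
```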
By vectors
Contiguous vector of pointers to (non-contiguous) subarrays
array [1..N,1..M] of T
≡ array [1..N] of array [1..M] of T
Number of elements in dimension j: $D_j = U_j - L_j + 1$
Position of $A[i_1, \ldots, i_n]$:
$$(i_n - L_n) + (i_{n-1} - L_{n-1})D_n + (i_{n-2} - L_{n-2})D_nD_{n-1} + \cdots + (i_1 - L_1)D_n \cdots D_2$$
which can be rewritten as
$$\underbrace{i_1D_2\cdots D_n + i_2D_3\cdots D_n + \cdots + i_{n-1}D_n + i_n}_{\text{variable part}} - \underbrace{(L_1D_2\cdots D_n + L_2D_3\cdots D_n + \cdots + L_{n-1}D_n + L_n)}_{\text{constant part}}$$
Address of $A[i_1, \ldots, i_n]$:
$$\text{address}(A) + (\text{variable part} - \text{constant part}) \times \text{element size}$$
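The split matters because the constant part depends only on the bounds, so it can be computed once per array (at compile time for constant bounds, or when the descriptor is built for dynamic arrays), leaving only the variable part to evaluate on each access. A minimal Python sketch, assuming row-major layout, bounds given as (L_j, U_j) pairs, and address(A) meaning the address of the first element; both sums are evaluated by Horner's rule. All function names are hypothetical.

```python
from typing import List, Tuple

Bounds = List[Tuple[int, int]]   # (L_j, U_j) for each dimension j

def extents(bounds: Bounds) -> List[int]:
    """D_j = U_j - L_j + 1."""
    return [u - l + 1 for l, u in bounds]

def horner(values: List[int], dims: List[int]) -> int:
    """v1*D2*...*Dn + v2*D3*...*Dn + ... + vn, evaluated by Horner's rule."""
    acc = values[0]
    for v, d in zip(values[1:], dims[1:]):
        acc = acc * d + v
    return acc

def constant_part(bounds: Bounds) -> int:
    """Depends only on the bounds, so it can be computed once per array."""
    return horner([l for l, _ in bounds], extents(bounds))

def element_address(base: int, index: List[int], bounds: Bounds, elem_size: int) -> int:
    """address(A) + (variable part - constant part) * element size."""
    variable_part = horner(index, extents(bounds))
    return base + (variable_part - constant_part(bounds)) * elem_size

# A: array [1..2, 0..2] of 4-byte elements at base address 100
print(element_address(100, [2, 1], [(1, 2), (0, 2)], 4))  # 116
```

Here D = [2, 3], so A[2, 1] lies at 100 + ((2·3 + 1) − (1·3 + 0)) × 4 = 116.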
case (switch) statements
A little complicated!
Resolving references to labels multiply-defined in different scopes:
begin
    L: begin
        goto L;
        . . . { possible definition of L }
    end
end
• Scope labels like variables
• On use, label definition is either resolved or unresolved
• On definition, backpatch previous unresolved label uses
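A small Python sketch of this scheme over a flat instruction list. Scopes are a stack of per-block tables; a jump to a label not yet defined in the current block is emitted with a placeholder and recorded; a later definition in the same block backpatches it; and uses still pending when a block closes are resolved against (or passed on to) the enclosing block. The class and method names, and this exact shadowing policy, are assumptions for illustration.

```python
class LabelResolver:
    """Hypothetical one-pass label resolver with per-block scopes and backpatching."""

    def __init__(self):
        self.code = []              # instructions: ["LABEL", name] or ["JUMP", target]
        self.scopes = [({}, [])]    # per block: (label -> code position, pending uses)

    def enter_block(self):
        self.scopes.append(({}, []))

    def define(self, name):
        defined, pending = self.scopes[-1]
        defined[name] = len(self.code)
        self.code.append(["LABEL", name])
        # Backpatch every earlier jump in this block that was waiting for this label.
        waiting = []
        for pos, label in pending:
            if label == name:
                self.code[pos][1] = defined[name]
            else:
                waiting.append((pos, label))
        pending[:] = waiting

    def goto(self, name):
        defined, pending = self.scopes[-1]
        if name in defined:                 # resolved: already defined in this block
            self.code.append(["JUMP", defined[name]])
        else:                               # unresolved: a later definition here may win
            pending.append((len(self.code), name))
            self.code.append(["JUMP", None])

    def exit_block(self):
        # Uses still pending when a block closes refer to an enclosing definition.
        _, pending = self.scopes.pop()
        outer_defined, outer_pending = self.scopes[-1]
        for pos, name in pending:
            if name in outer_defined:
                self.code[pos][1] = outer_defined[name]
            else:
                outer_pending.append((pos, name))


def run(inner_defines_l: bool):
    r = LabelResolver()
    r.enter_block()          # begin
    r.define("L")            #   L: begin
    r.enter_block()
    r.goto("L")              #     goto L;
    if inner_defines_l:
        r.define("L")        #     possible definition of L
    r.exit_block()           #   end
    r.exit_block()           # end
    return r.code

print(run(False))  # [['LABEL', 'L'], ['JUMP', 0]]
print(run(True))   # [['LABEL', 'L'], ['JUMP', 2], ['LABEL', 'L']]
```

The two runs correspond to the example: without an inner definition, goto L reaches the outer L; with one, the inner definition shadows it, just as an inner variable would.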
Jumping out of blocks or procedures:
1. Pop run-time stack
2. Fix display (if used); static chain needs no fixing
3. Restore registers if jumping out of a procedure
Parameter passing
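Value: the formal is a local initialized with the value of the actual; assignments to it do not affect the actual.
Result: the formal is an uninitialized local whose final value is copied back to the actual on return.
Value-result: the formal is initialized with the value of the actual, and its final value is copied back on return.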
Implementation:
Scalars:
• result/value-result ⇒
pass address of actual, copy value to/from local copy
• value ⇒ simply pass value directly
Arrays:
• pass dope vector
• static arrays ⇒ pass pointer to base of array
• result/value-result ⇒ two local dope vectors
• value ⇒ one local dope vector
Records:
• handle as scalar (since fixed in size)
• best to pass address, let callee copy
(more compact calling sequences)
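To contrast the modes, here is a Python sketch that runs one hypothetical procedure body, `x := x + 1; g := g * 10`, with the global g itself passed as the actual. A `Cell` object stands in for a memory word, and handing over the cell plays the role of "pass address of actual". Reference passing (the topic of the next slide) is included only for comparison. All names here are illustrative assumptions.

```python
class Cell:
    """A mutable storage location standing in for one memory word."""
    def __init__(self, value: int):
        self.value = value

def body(x: Cell, env: dict) -> None:
    """Hypothetical procedure body: x := x + 1; g := g * 10."""
    x.value = x.value + 1
    env["g"].value = env["g"].value * 10

def call_by_value(env: dict, actual: str) -> None:
    local = Cell(env[actual].value)    # copy the value in
    body(local, env)                   # nothing is copied back

def call_by_value_result(env: dict, actual: str) -> None:
    addr = env[actual]                 # pass address of actual
    local = Cell(addr.value)           # copy in to the local copy
    body(local, env)
    addr.value = local.value           # copy out on return

def call_by_reference(env: dict, actual: str) -> None:
    body(env[actual], env)             # the formal aliases the actual

for call in (call_by_value, call_by_value_result, call_by_reference):
    env = {"g": Cell(1)}
    call(env, "g")
    print(call.__name__, env["g"].value)
# call_by_value 10
# call_by_value_result 2
# call_by_reference 20
```

The aliasing case shows why value-result is not just an implementation of reference: the copy-out on return overwrites the callee's direct update of g.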
Reference and read-only parameters
Scalars: