Unit 4 PCD

Code Generation
The final phase of a compiler is the code generator.
It receives as input an intermediate representation (IR), together with
supplementary information in the symbol table.
It produces a semantically equivalent target program as output.
The main tasks of a code generator are:
Instruction selection
Register allocation and assignment
Instruction ordering
Optimizing Compiler
If the code generator takes its intermediate code from a code optimization
phase, the compiler is called an optimizing compiler; it produces more
efficient target code.

Position of Code generator


Issues in the design of a Code Generator
Since the code generation phase is system dependent, the following issues
arise during code generation:
1. Input to the code generator (intermediate code)
2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Choice of evaluation order
7. Approaches to code generation
Dr.V.D.Ambeth Kumar, Asso.Prof/CSE, PEC.

Input to the code generator


The input to the code generator consists of the intermediate representation
of the source program produced by the front end, together with the
information in the symbol table:
Intermediate Representation + Symbol Table
The intermediate code given to the code generator may take any of several
forms:
1. Linear representation: postfix notation
2. Three-address representation: quadruples
3. Virtual machine representation: stack machine code
4. Graphical representation: syntax trees, DAGs
We assume that, prior to code generation, the front end has scanned, parsed
and translated the source program into the intermediate representation.
We also assume that the necessary type checking and detection of semantic
errors have already been performed.
Thus the input to the code generator is assumed to be free of errors. In some
compilers, this semantic checking is done together with code generation.
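As a rough illustration of the three-address (quadruple) form listed above, the following Python sketch stores each statement as an (op, arg1, arg2, result) record. The Quad class, its field names and the show helper are assumptions made for this example; they do not come from the notes.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """One three-address statement: result := arg1 op arg2 (hypothetical layout)."""
    op: str
    arg1: str
    arg2: str
    result: str


def show(q: Quad) -> str:
    """Render a quadruple in the usual three-address notation."""
    return f"{q.result} := {q.arg1} {q.op} {q.arg2}"


# The two statements a := b + c and d := a + e as a quadruple array.
block = [Quad("+", "b", "c", "a"), Quad("+", "a", "e", "d")]

for number, quad in enumerate(block):
    print(number, show(quad))
```

The position of a quadruple in the list plays the role of the quadruple number that jump labels refer to.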
Target programs
The output of the code generator is the target program. The output may take on a
variety of forms:
Absolute machine language,
Relocatable machine language,
Assembly language.
Producing an absolute machine language program as output has the advantage that
it can be placed in a fixed memory location and immediately executed.
Producing a relocatable machine language program as output allows subprograms
to be compiled separately. A set of relocatable object modules can be linked
together and loaded for execution by a linking loader. This gives a great deal of
flexibility.
Producing an assembly language program as output makes the process of code
generation somewhat easier. We can generate symbolic instructions and use the
macro facilities of the assembler to help generate code.
In what follows, we use assembly code as the target language.

Memory Management
Mapping names in the source program to addresses of data objects in run time
memory is done cooperatively by the front end and the code generator. We assume
that a name in a three-address statement refers to a symbol table entry for the
name.
If machine code is being generated, labels in three-address statements have to be
converted to addresses of instructions. This process is analogous to
backpatching. Suppose that labels refer to quadruple numbers in a quadruple array.
Example: j: goto i
if i<j (backward jump), generate a jump instruction with the target address
= machine location of the first instruction in the code for quadruple i
if i>j (forward jump), the address of the code for quadruple i is not yet
known, so we store on a list for quadruple i the location of the first machine
instruction generated for quadruple j. Then, when we process quadruple i, we
fill in the proper machine location in all instructions that are forward
jumps to i.
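The backward/forward jump handling described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: every quadruple produces exactly one machine instruction, the opcode name JMP and the instruction sizes are invented for the example, and a jump whose target lies beyond the end of the array is simply left unpatched.

```python
from collections import defaultdict


def generate(quads, sizes):
    """Turn quadruples into [address, opcode, target] entries, resolving labels.

    quads: list of ("goto", i) or ("action", name); the list index is the
           quadruple number that labels refer to.
    sizes: assumed size in bytes of the instruction for each quadruple kind.
    """
    start = {}                    # quadruple number -> address of its first instruction
    pending = defaultdict(list)   # quadruple number -> indices of jumps waiting for it
    code = []
    addr = 0
    for j, quad in enumerate(quads):
        start[j] = addr
        # Fill in every forward jump that was waiting for quadruple j.
        for k in pending.pop(j, []):
            code[k][2] = addr
        if quad[0] == "goto":
            i = quad[1]
            if i <= j:            # backward jump: target address already known
                code.append([addr, "JMP", start[i]])
            else:                 # forward jump: remember it on the list for i
                pending[i].append(len(code))
                code.append([addr, "JMP", None])
        else:
            code.append([addr, quad[1], None])
        addr += sizes[quad[0]]
    return code


quads = [("action", "A"), ("goto", 3), ("action", "B"), ("action", "C"), ("goto", 0)]
code = generate(quads, {"action": 4, "goto": 2})
for entry in code:
    print(entry)
```

Here the jump emitted for quadruple 1 is patched when quadruple 3 is reached, while the jump in quadruple 4 is resolved immediately because quadruple 0 has already been placed.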
Instruction Selection
The nature of the instruction set of the target machine determines the difficulty of
instruction selection. The important factors are

Uniformity
Completeness of the instruction set
Instruction execution speed
Machine Idioms

For each type of three-address statement we can design a code skeleton that
outlines the target code to be generated for that construct.
Example: every three-address statement of the form x := y + z, where x, y, and z
are statically allocated, can be translated into the code sequence

MOV y, R0    /* load y into register R0 */
ADD z, R0    /* add z to R0 */
MOV R0, x    /* store R0 into x */

This kind of statement-by-statement code generation often produces poor code.
For example, the sequence of statements
a = b + c
d = a + e
would be translated into
MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d
Here the fourth instruction is redundant, and so is the third if a is not subsequently
used.
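To make the redundancy concrete, here is a small Python sketch of the fixed code skeleton together with a clean-up pass that drops a load immediately following a store of the same value. The function names translate and drop_redundant_loads are hypothetical, and the clean-up is only valid for straight-line code with no label between the two instructions.

```python
def translate(stmt):
    """Translate (x, y, op, z), meaning x := y op z, using one fixed skeleton."""
    x, y, op, z = stmt
    opcode = {"+": "ADD", "-": "SUB"}[op]
    return [f"MOV {y}, R0", f"{opcode} {z}, R0", f"MOV R0, {x}"]


def drop_redundant_loads(code):
    """Remove a MOV that merely undoes the MOV directly before it."""
    out = []
    for ins in code:
        if out and ins.startswith("MOV ") and out[-1].startswith("MOV "):
            src, dst = [s.strip() for s in ins[4:].split(",")]
            prev_src, prev_dst = [s.strip() for s in out[-1][4:].split(",")]
            if src == prev_dst and dst == prev_src:
                continue          # e.g. MOV a, R0 right after MOV R0, a
        out.append(ins)
    return out


naive = translate(("a", "b", "+", "c")) + translate(("d", "a", "+", "e"))
improved = drop_redundant_loads(naive)
print(naive)
print(improved)
```

The naive list contains the six instructions shown above; the improved list omits MOV a, R0. Removing MOV R0, a as well would require knowing that a is not used later, which this sketch does not track.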
The quality of the generated code is determined by its speed and size.
For example, if the target machine has an increment instruction (INC), then the
three-address statement a = a+1 may be implemented more efficiently by the
single instruction INC a than by the more obvious sequence that loads a into a
register, adds one to the register, and then stores the result back into a:
MOV a, R0
ADD #1, R0
MOV R0, a
Register Allocation
Instructions involving register operands are usually shorter and faster than those
involving operands in memory. Therefore, efficient utilization of registers is
particularly important in generating good code. The use of registers is often
subdivided into two subproblems:
1. During register allocation, we select the set of variables that will reside in
registers at a point in the program.
2. During a subsequent register assignment phase, we pick the specific
register that a variable will reside in.

Finding an optimal assignment of registers to variables is difficult.
Certain machines require register pairs (an even and the next odd-numbered register)
for some operands and results.
For example, integer multiplication and integer division involve register pairs. The
multiplication instruction is of the form
M x, y
where x, the multiplicand, is the even register of an even/odd register pair.
The multiplicand value itself is taken from the odd register of the pair. The multiplier y is a
single register. The product occupies the entire even/odd register pair.
The division instruction is of the form
D x, y
where the 64-bit dividend occupies an even/odd register pair whose even register
is x; y represents the divisor. After division, the even register holds the remainder
and the odd register the quotient.
Now consider the two three-address code sequences (a) and (b), in which the only
difference is the operator in the second statement.
Ri stands for register i. L, ST, A, M and D stand for load, store, add, multiply
and divide respectively.
SRDA R0, 32 shifts the dividend into R1 and clears R0 so that all bits equal its
sign bit.

Three-address code
(a)                   (b)
t := a + b            t := a + b
t := t * c            t := t + c
t := t / d            t := t / d

Shortest assembly code sequences
(a)                   (b)
L    R1, a            L    R0, a
A    R1, b            A    R0, b
M    R0, c            A    R0, c
D    R0, d            SRDA R0, 32
ST   R1, t            D    R0, d
                      ST   R1, t
Choice of evaluation order


The order in which computations are performed can affect the efficiency of
the target code.
Some computation orders require fewer registers to hold intermediate
results than others.
Picking the best order is another difficult, NP-complete problem.
Initially, we shall avoid the problem by generating code for the three-address
statements in the order in which they have been produced by the
intermediate code generator.
Approaches to code generation
The most important criterion for a code generator is that it produces correct
code.
Correctness takes on special significance because of the number of special
cases that a code generator must face.
Given the premium on correctness, designing a code generator so that it can be
easily implemented, tested, and maintained is an important design goal.

A simple target machine model


Our target computer models a three-address machine with load and store
operations, computation operations, jump operations and conditional jumps.
It is a byte-addressable machine with n general-purpose
registers R0, R1, ..., Rn-1.
We shall use a very limited set of instructions and assume that all operands
are integers.
Most instructions consist of an operator, followed by a target, followed by
a list of source operands.
The following kinds of instructions are available:
Load operations:
LD dst, addr - loads the value in location addr into location dst
(dst = addr)
LD r, x - loads the value in location x into register r
LD r1, r2 - copies the contents of register r2 into register r1 (register-to-register copy)
Store operations:
ST x, r - stores the value in register r into location x (x = r)
Computation operations:
OP dst, src1, src2 - where OP is an operator like ADD or SUB, and dst, src1 and
src2 are locations.
SUB r1, r2, r3 computes r1 = r2 - r3
Unconditional jumps:
BR L - causes control to branch to the machine instruction
with label L
Conditional jumps:
Bcond r, L - where r is a register, L is a label, and cond stands for any of the
common tests on the value in register r.
BLTZ r, L causes a jump to label L if the value in register r is less
than zero, and allows control to pass to the next instruction if not.
The target machine has a variety of addressing modes:

Mode                 Form     Address                      Associated cost
Absolute             M        M                            1
Register             R        R                            0
Indexed              c(R)     c + contents(R)              1
Indirect register    *R       contents(R)                  0
Indirect indexed     *c(R)    contents(c + contents(R))    1
Immediate constant   #c       c                            1
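The addressing modes can be illustrated with a short Python sketch that fetches the value of a source operand and reports its associated cost. The function name fetch, the dictionary-based register file and memory, and the textual operand syntax accepted by the regular expressions are all assumptions made for this example.

```python
import re

REG = re.compile(r"R\d+$")                    # register operand such as R0
INDEXED = re.compile(r"(-?\d+)\((R\d+)\)$")   # indexed operand such as 4(R0)


def fetch(operand, regs, mem):
    """Return (value, associated_cost) for a source operand in one of the six modes."""
    if operand.startswith("#"):               # immediate constant #c
        return int(operand[1:]), 1
    indirect = operand.startswith("*")
    body = operand[1:] if indirect else operand
    indexed = INDEXED.match(body)
    if REG.match(body):                       # R or *R
        value, cost = regs[body], 0
    elif indexed:                             # c(R) or *c(R)
        value, cost = mem[int(indexed.group(1)) + regs[indexed.group(2)]], 1
    elif indirect:
        raise ValueError("*M is not one of the six modes")
    else:                                     # absolute M
        value, cost = mem[body], 1
    # For the indirect modes the value found so far is itself an address.
    return (mem[value] if indirect else value), cost


regs = {"R0": 100}
mem = {100: 5, 104: 200, 200: 7, "M": 9}
for operand in ["M", "R0", "4(R0)", "*R0", "*4(R0)", "#1"]:
    print(operand, fetch(operand, regs, mem))
```

With R0 holding 100, the operand 4(R0) reads location 104, while *4(R0) reads the location whose address is stored at 104.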

Instructions in the examples that follow have the form: opcode source, destination
(note that this is the reverse of the LD/ST operand order above)

Example
MOV R0, M - stores the contents of register R0 into memory location M
MOV 4(R0), M - stores the value contents(4 + contents(R0)) into memory location M
MOV *4(R0), M - stores the value contents(contents(4 + contents(R0))) into memory location M
MOV #1, R0 - loads the constant 1 into register R0
Instruction costs
Instruction cost = 1 + associated costs of the source and destination addressing modes
A register operand adds cost 0
A memory-location operand adds cost 1
A literal operand adds cost 1
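The cost rule can be checked mechanically. The Python sketch below is an illustration only: the helper names added_cost and instruction_cost are hypothetical, and operands are assumed to be separated by commas as in the examples in these notes.

```python
import re


def added_cost(operand):
    """Associated cost of one operand: 0 for R and *R, 1 for every other mode."""
    return 0 if re.fullmatch(r"\*?R\d+", operand.strip()) else 1


def instruction_cost(instruction):
    """Cost = 1 for the instruction itself + associated cost of each operand."""
    _opcode, _, rest = instruction.strip().partition(" ")
    operands = [o for o in rest.split(",") if o.strip()]
    return 1 + sum(added_cost(o) for o in operands)


# Cost of the skeleton for a := b + c
sequence = ["MOV b, R0", "ADD c, R0", "MOV R0, a"]
total = sum(instruction_cost(ins) for ins in sequence)
print(total)
```

Under this rule a register-to-register move costs 1, an instruction with one memory or literal operand costs 2, and an instruction with two such operands costs 3, so the three-instruction skeleton for a := b + c costs 6.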
Run-time Storage Management
Information needed for the execution of procedure is kept in the block of
storage called activation record.
Each program runs in a logical address space and it is divided into four
areas
1.Code : A statically determined area Code that holds the executable target
code. The size of the target code can be determined at compile time.
2.Static data : A statically determined data area Static for holding global
constants and other data generated by the compiler. The size of the global
constants and compiler data can also be determined at compile time.
3.Heap : A dynamically managed area Heap for holding data objects that
are allocated and freed during program execution. The size of the Heap
cannot be determined at compile time.
4.Stack : A dynamically managed area Stack for holding activation records
as they are created and destroyed during procedure calls and returns. Like
the Heap, the size of the Stack cannot be determined at compile time.
There are two standard storage allocation strategies. They are:
1.Static allocation
2.Stack allocation
In static allocation, the position of activation record is fixed at compile
time.
In stack allocation, a new activation record is pushed onto the stack for
each execution of a procedure
Since run-time allocation and deallocation of activation records occur as
part of the procedure call and return sequences, we focus on the following
three-address statements:
call
return
halt
action
For example, consider the three-address code for the procedures c and p as
follows
The sizes and layouts of the activation records that are communicated to the code
generator are as follows
Static allocation :
Consider the code needed to implement static allocation.
Implementation of call statement : A call statement in the intermediate code
is implemented by a sequence of two target-machine instructions. A MOV
instruction saves the return address, and a GOTO transfers control to the
target code for the called procedure.
MOV #here+20, callee.static_area    /* cost = 1+1+1 = 3 words */
GOTO callee.code_area               /* cost = 1+1 = 2 words */
callee.static_area - address of the activation record of the called procedure
callee.code_area - address of the first instruction of the called procedure
#here+20 - the literal return address, i.e. the address of the instruction
following the GOTO
The cost of the two instructions is 5 words or 20 bytes, which is why the
return address is here+20.


Implementation of return statement :
GOTO *callee.static_area
which transfers control to the address saved at the beginning of the
activation record.
Example:
We use the pseudo-instruction ACTION to implement the statement
action, which represents three-address code.
We arbitrarily start the code for procedures c and p at addresses 100 and 200
respectively.
Each ACTION takes 20 bytes.
The activation records for the procedures are statically allocated starting at
locations 300 and 364 respectively.
The instructions starting at address 100 implement the statements
action1; call p; action2; halt
of the first procedure c.
Execution therefore starts with the instruction ACTION1 at address 100.
The MOV instruction at address 120 saves the return address 140 in the
machine-status field, which is the first word in the activation record of p.
The GOTO instruction at address 132 transfers control to the first
instruction in the target code of the called procedure.
Since 140 was saved at address 364 by the call sequence, *364 represents
140 when the GOTO statement at address 220 is executed.
Control therefore returns to address 140 and execution of procedure c
resumes.
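The addresses quoted in this example can be reproduced with a small layout calculation. The Python sketch below is a reconstruction under assumptions, not the original listing: words are taken to be 4 bytes, the MOV and GOTO of the call sequence are taken to occupy 3 and 2 words as computed earlier, HALT is assumed to take one word, and the body of p is assumed to be a single ACTION3 followed by the return.

```python
WORD = 4  # bytes per word (assumed; consistent with "5 words or 20 bytes")


def layout(start, instructions):
    """Assign byte addresses to (text, size_in_bytes) pairs, beginning at start."""
    listing, addr = [], start
    for text, size in instructions:
        listing.append((addr, text))
        addr += size
    return listing


# Procedure c: action1; call p; action2; halt
code_c = layout(100, [
    ("ACTION1", 20),
    ("MOV #140, 364", 3 * WORD),   # save return address 140 in p's activation record
    ("GOTO 200", 2 * WORD),        # transfer control to the code for p
    ("ACTION2", 20),
    ("HALT", WORD),
])

# Procedure p (body assumed): action3; return
code_p = layout(200, [
    ("ACTION3", 20),
    ("GOTO *364", 2 * WORD),       # return through the address saved at 364
])

for addr, text in code_c + code_p:
    print(f"{addr}: {text}")
```

The computed addresses agree with the text: the MOV lands at 120, the GOTO at 132, the return address is 140, and the return jump of p sits at 220.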
STACK ALLOCATION:
Static allocation can become stack allocation by using relative addresses
for storage in activation records.
The position of the record for an activation of a procedure is not known
until run time.
In stack allocation, this position is usually stored in a register, so words in
the activation record can be accessed as offsets from the value in the
register.
We maintain in a register SP a pointer to the beginning
of the activation record on top of the stack; words in that record are
accessed at positive offsets from SP.
When a procedure call occurs, the calling procedure increments SP and
transfers control to the called procedure.
After control returns to the caller, it decrements SP, thereby deallocating
the activation record of the called procedure.
Initialization of Stack
The code for the first procedure initializes the stack by setting SP to the start
of the stack area in memory.
MOV #stackstart, SP          /* initialize the stack */
code for the first procedure
HALT                         /* terminate execution */
Implementation of call statement : A procedure call sequence increments
SP, saves the return address, and transfers control to the called procedure:
ADD #caller.recordsize, SP
MOV #here+16, *SP            /* save return address */
GOTO callee.code_area
Implementation of return statement : The called procedure transfers control to
the return address, and the caller then restores SP:
GOTO *0(SP)                  /* return to caller (executed by the callee) */
SUB #caller.recordsize, SP   /* restore SP (executed by the caller) */
Example : Consider the three-address code for the procedures s, p and q
Suppose that the sizes of the activation records for procedures s, p, and q
have been determined at compile time to be ssize, psize, and qsize,
respectively. The first word in each activation record will hold a return
address. We arbitrarily assume that the code for these procedures starts at
addresses 100,200 and 300 respectively, and that the stack starts at 600.
/*code for s*/
100: MOV #600, SP      /*initialize the stack*/
108: ACTION1
128: ADD #ssize, SP    /*call sequence begins*/
136: MOV #152, *SP     /*push return address*/
144: GOTO 300          /*call q*/
152: SUB #ssize, SP    /*restore SP*/
160: ACTION2
180: HALT

/*code for p*/
200: ACTION3
220: GOTO *0(SP)       /*return*/

/*code for q*/
300: ACTION4           /*conditional jump to 456*/
320: ADD #qsize, SP
328: MOV #344, *SP     /*push return address*/
336: GOTO 200          /*call p*/
344: SUB #qsize, SP
352: ACTION5
372: ADD #qsize, SP
380: MOV #396, *SP     /*push return address*/
388: GOTO 300          /*call q*/
396: SUB #qsize, SP
404: ACTION6
424: ADD #qsize, SP
432: MOV #448, *SP     /*push return address*/
440: GOTO 300          /*call q*/
448: SUB #qsize, SP
456: GOTO *0(SP)       /*return*/

600:                   /*stack starts here*/
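The effect of the call and return sequences on SP can be traced with a few lines of Python. This is a simplified model, not a simulator for the listing: the functions call and ret are hypothetical, memory is a dictionary, and the record sizes ssize = 20 and qsize = 40 are values assumed for the illustration.

```python
def call(sp, mem, caller_recordsize, return_address):
    """Caller's call sequence: ADD #size, SP then MOV #return_address, *SP."""
    sp += caller_recordsize
    mem[sp] = return_address
    return sp


def ret(sp, mem, caller_recordsize):
    """Callee's GOTO *0(SP), followed by the caller's SUB #size, SP."""
    target = mem[sp]                     # address the callee jumps back to
    return sp - caller_recordsize, target


ssize, qsize = 20, 40                    # assumed activation record sizes
mem = {}
sp = 600                                 # MOV #600, SP

sp = call(sp, mem, ssize, 152)           # s calls q (code at 128..144)
sp_in_q = sp
sp = call(sp, mem, qsize, 344)           # q calls p (code at 320..336)
sp_in_p = sp

sp, back_in_q = ret(sp, mem, qsize)      # p returns, q restores SP at 344
sp, back_in_s = ret(sp, mem, ssize)      # q returns, s restores SP at 152
print(sp_in_q, sp_in_p, back_in_q, back_in_s, sp)
```

Each activation record begins at the SP value in force while its procedure runs, and its first word holds the return address, matching the convention stated before the listing.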

Simple code Generator


The code generation strategy generates target code for a sequence of three-address
statements. During code generation an operand may reside in a machine register if
it is the result of a previous operation, or it may reside in a memory location which
can be accessed by direct or indirect addressing.
In this method, computed results are kept in registers as long as possible.
For example, for
x := y + z
if register R0 contains y and register R1 contains z:
ADD R1, R0      /* R0 := R0 + R1 */
MOV R0, x

Cost = 3

if register R0 contains z and y is in memory:
ADD y, R0
MOV R0, x

Cost = 4

One of the primary issues during code generation is deciding how to use registers
to best advantage. There are four principal uses of registers:
1. Some or all of the operands of an operation must be in registers in order to
perform the operation
2. Registers make good temporaries
3. Registers are used to hold global values
4. Registers are used to help with runtime storage management
Register and Address Descriptors
The code generation algorithm uses descriptors to keep track of register contents
and of the addresses of names.
There are two kinds: a register descriptor and an address descriptor.
Register descriptor
A register descriptor keeps track of what is currently in each register.
Initially, the register descriptor shows that all the registers are empty. As code
generation for the block progresses, the registers come to hold the values of
computations. The descriptor is consulted whenever a new register is needed.
The fields in the register descriptor are:
Status
Operand descriptor

Address descriptor
The address descriptor stores the location where the current value of the name can
be found at run time. The information about locations can be stored in the symbol
table and is used to access the variables. The location can be a register, a stack
location, a memory address.
The fields in the address descriptor are:
Attributes
Addressing mode
Storage location / register (address)

The addressing mode may be
DS - value of the operand is in storage (direct access to storage location)
DR - value of the operand is in a register (direct access to register)
IS - address of the operand is in storage (indirect access to storage location)
IR - address of the operand is in a register (indirect access to register)
Example
Name    Addressing mode    Location
a       DS                 MEM
b       DS                 MEM
T1      IS                 R0


A code generation Algorithm


A code generation algorithm takes as input a sequence of three-address statements
forming a basic block. For each statement of the form x = y op z we perform the
following actions:
1) Invoke a function getreg to determine the location L where the result of
the computation y op z should be stored. L will usually be a register, but it
can also be a memory location.
2) Consult the address descriptor of y to determine y', the current location of
y. Prefer the register for y' if the value of y is currently both in memory and
in a register. If the value of y is not already in L, generate the instruction
MOV y', L to place a copy of y in L.
3) Generate the instruction op z', L where z' is the current location of z. Again
prefer a register to a memory location if z is in both. Update the address
descriptor of x to indicate that x is in location L. If L is a register, update
its descriptor to indicate that it contains the value of x.
4) If the current values of y and/or z have no next uses and are in registers,
alter the register descriptor to indicate that, after execution of x = y op z,
those registers no longer contain y and/or z respectively.
Function getreg
The function getreg returns the location L to hold the value of x for the statement
x = y op z.
1. If y is in a register that holds the value of no other names, and y is not
live and has no next use after execution of x = y op z, then return the
register of y for L.
2. Failing (1), return an empty register for L if there is one.
3. Failing (2), if x has a next use in the block, or op is an operator, such as
indexing, that requires a register, find an occupied register R. Store the
value of R into a memory location M, update the address descriptor for M, and
return R.
4. If x is not used in the block, or no suitable occupied register can be found,
select the memory location of x as L.
Code Generation Example
Consider the statement d = (a - b) + (a - c) + (a - c)
It is translated into the following three-address code:
t := a - b
u := a - c
v := t + u
d := v + u

Assume that d is live at end of block


Assume that a, b, and c are always in memory
Assume that t, u, and v, being temporaries, are not in memory unless
explicitly stored with a MOV instruction
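The algorithm and getreg can be sketched in Python and run on the block t := a - b, u := a - c, v := t + u, d := v + u. This is a simplified sketch, not the full algorithm: it assumes two registers R0 and R1, tracks only a register descriptor plus a set of names whose memory copy is current, always spills the first register in rule 3, and omits rule 4 (choosing a memory location for L). All function and variable names are hypothetical.

```python
OPCODES = {"+": "ADD", "-": "SUB"}


def has_next_use(name, i, block, live_out):
    """True if name is used after statement i before being redefined, or is live on exit."""
    for x, y, _, z in block[i + 1:]:
        if name in (y, z):
            return True
        if x == name:
            return False
    return name in live_out


def generate(block, live_out, in_memory, registers=("R0", "R1")):
    regs = {r: set() for r in registers}     # register descriptor
    in_memory = set(in_memory)               # names whose memory copy is current
    code = []

    def reg_of(name):
        return next((r for r, names in regs.items() if name in names), None)

    def getreg(i, y):
        r = reg_of(y)
        if r and regs[r] == {y} and not has_next_use(y, i, block, live_out):
            return r                         # rule 1: reuse the register of y
        for r, names in regs.items():
            if not names:
                return r                     # rule 2: an empty register
        r = registers[0]                     # rule 3: spill (naive choice)
        for name in regs[r] - in_memory:
            code.append(f"MOV {r}, {name}")
        in_memory.update(regs[r])
        regs[r] = set()
        return r

    for i, (x, y, op, z) in enumerate(block):
        L = getreg(i, y)
        y_loc = reg_of(y) or y               # prefer a register over memory
        if y_loc != L:
            code.append(f"MOV {y_loc}, {L}")
        z_loc = reg_of(z) or z
        code.append(f"{OPCODES[op]} {z_loc}, {L}")
        for names in regs.values():
            names.discard(x)
        regs[L] = {x}                        # L now holds x only
        in_memory.discard(x)                 # any memory copy of x is stale
        for name in (y, z):                  # free registers of dead operands
            r = reg_of(name)
            if r and not has_next_use(name, i, block, live_out):
                regs[r].discard(name)
    for name in sorted(live_out):            # store live values at block end
        r = reg_of(name)
        if r and name not in in_memory:
            code.append(f"MOV {r}, {name}")
    return code


block = [("t", "a", "-", "b"), ("u", "a", "-", "c"),
         ("v", "t", "+", "u"), ("d", "v", "+", "u")]
code = generate(block, live_out={"d"}, in_memory={"a", "b", "c"})
print("\n".join(code))
```

For this block the sketch emits seven instructions: two MOV/SUB pairs that compute t in R0 and u in R1, two ADD R1, R0 instructions that form v and then d in R0, and a final MOV R0, d because d is live at the end of the block.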
