Unit 4 PCD
Code Generation
The final phase of a compiler is the code generator.
It receives an intermediate representation (IR), together with supplementary
information in the symbol table, as input.
It produces a semantically equivalent target program as output.
Code generator main tasks:
Instruction selection
Register allocation and assignment
Instruction ordering
Optimizing Compiler
If a code generator takes the intermediate code after code optimization has
been performed, the compiler is called an optimizing compiler; it
produces more efficient target code.
Memory Management
Mapping names in the source program to addresses of data objects in run time
memory is done cooperatively by the front end and the code generator. We assume
that a name in a three-address statement refers to a symbol table entry for the
name.
If machine code is being generated, labels in three-address statements have to be
converted to addresses of instructions. This process is analogous to
backpatching. Suppose that labels refer to quadruple numbers in a quadruple array.
Example: j: goto i
If i<j (backward jump), generate a jump instruction with the target address
= machine location of the first instruction in the code for quadruple i.
If i>j (forward jump), we must store on a list for quadruple i the location of
the first machine instruction generated for quadruple j. Then, when we process
quadruple i, we fill in the proper machine location for all instructions that
are forward jumps to i.
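The handling of backward and forward jumps described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the quadruple encoding (("goto", i) or ("code", n)), the JMP and NOP mnemonics, and the function name translate_jumps are all assumptions made for the example.

```python
def translate_jumps(quads):
    """Translate quadruples to machine instructions, resolving jump targets.

    quads: list of ("goto", target_quad) or ("code", n), where n is the
    (hypothetical) number of machine instructions the quadruple expands to.
    """
    machine = []   # generated machine instructions
    start = {}     # quadruple number -> location of its first machine instruction
    pending = {}   # quadruple number -> locations of forward jumps waiting for it

    for j, quad in enumerate(quads):
        start[j] = len(machine)
        # Quadruple j is now being processed: fill in all forward jumps to it.
        for loc in pending.pop(j, []):
            machine[loc][1] = start[j]

        if quad[0] == "goto":
            i = quad[1]
            if i <= j:
                # Backward jump: the target location is already known.
                machine.append(["JMP", start[i]])
            else:
                # Forward jump: emit a placeholder and remember where to patch.
                machine.append(["JMP", None])
                pending.setdefault(i, []).append(len(machine) - 1)
        else:
            machine.extend(["NOP"] for _ in range(quad[1]))
    return machine


quads = [("code", 2), ("goto", 3), ("code", 3), ("goto", 0)]
machine = translate_jumps(quads)
```

In this run, the forward jump in quadruple 1 is emitted with an empty target and patched when quadruple 3 is reached, while the backward jump in quadruple 3 is resolved immediately.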
Instruction Selection
The nature of the instruction set of the target machine determines the difficulty of
instruction selection. The important factors are
Uniformity
Completeness of the instruction set
Instruction execution speed
Machine Idioms
For each type of three-address statement we can design a code skeleton that
outlines the target code to be generated for that construct.
Example: every three-address statement of the form x := y + z, where x, y, and z
are statically allocated, can be translated into the code sequence
MOV y, R0
ADD z, R0
MOV R0, x
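A code skeleton of this kind amounts to a template that is filled in with the operand names. The sketch below shows the idea for the addition skeleton only; the function name gen_add and the fixed choice of R0 as the working register are assumptions made for illustration.

```python
def gen_add(x, y, z, reg="R0"):
    """Fill in the code skeleton for x := y + z with statically allocated operands."""
    return [
        f"MOV {y}, {reg}",   # load y into the register
        f"ADD {z}, {reg}",   # add z to it
        f"MOV {reg}, {x}",   # store the result into x
    ]


for line in gen_add("x", "y", "z"):
    print(line)
```

Because the template looks at one statement at a time, it has no knowledge of what earlier statements left in the register.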
This kind of statement-by-statement code generation often produces poor code.
For example, the sequence of statements
a=b+c
d=a+e
would be translated into
MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d
Here the fourth statement is redundant, and so is the third if a is not subsequently
used.
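The redundant fourth instruction can be removed mechanically. The following sketch is one possible way to do it, assuming instructions are plain strings of the form "OP src, dst" and that the second instruction of a pair is not the target of a jump; the function name remove_redundant_loads is hypothetical.

```python
def remove_redundant_loads(code):
    """Drop 'MOV a, R' when it immediately follows 'MOV R, a'."""
    out = []
    for instr in code:
        if out and out[-1].startswith("MOV ") and instr.startswith("MOV "):
            prev_src, prev_dst = [s.strip() for s in out[-1][4:].split(",")]
            src, dst = [s.strip() for s in instr[4:].split(",")]
            # The destination already holds the value that would be copied.
            if prev_src == dst and prev_dst == src:
                continue
        out.append(instr)
    return out


code = [
    "MOV b, R0", "ADD c, R0", "MOV R0, a",
    "MOV a, R0", "ADD e, R0", "MOV R0, d",
]
improved = remove_redundant_loads(code)
```

Removing the third instruction as well would need liveness information about a, which this sketch does not attempt.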
The quality of the generated code is determined by its speed and size.
For example, if the target machine has an increment instruction (INC), then the
three-address statement a = a+1 may be implemented more efficiently by the
single instruction INC a than by a more obvious sequence that loads a into a
register, adds one to the register, and then stores the result back into a:
MOV a, R0
ADD #1, R0
MOV R0, a
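Choosing a machine idiom such as INC is a special case checked before falling back to the general sequence. A minimal sketch, assuming the statement has the form x := y + z with operands given as strings; the function name gen_add_or_inc is hypothetical.

```python
def gen_add_or_inc(x, y, z):
    """Generate code for x := y + z, using INC when the statement is x := x + 1."""
    if x == y and z == "1":
        return [f"INC {x}"]                      # machine idiom: one instruction
    operand = f"#{z}" if z.isdigit() else z      # constants become literal operands
    return [f"MOV {y}, R0", f"ADD {operand}, R0", f"MOV R0, {x}"]


print(gen_add_or_inc("a", "a", "1"))
print(gen_add_or_inc("a", "b", "1"))
```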
Register Allocation
Instructions involving register operands are usually shorter and faster than those
involving operands in memory. Therefore, efficient utilization of registers is
particularly important in generating good code. The use of registers is often
subdivided into two subproblems:
1. During register allocation, we select the set of variables that will reside in
registers at a point in the program.
2. During a subsequent register assignment phase, we pick the specific
register that a variable will reside in.
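Example: on a machine where division requires the dividend in an even/odd
register pair, the three-address code
t := a + b
t := t + c
t := t / d
can be implemented by
L R0, a
A R0, b
A R0, c
SRDA R0, 32
D R0, d
ST R1, t
Here SRDA R0, 32 shifts the dividend into R1, and the quotient is left in R1.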
Dr.V.D.Ambeth Kumar, Asso.Prof/CSE, PEC.
Mode                 Form     Address                      Associated cost
Absolute             M        M                            1
Register             R        R                            0
Indexed              c(R)     c + contents(R)              1
Indirect register    *R       contents(R)                  0
Indirect indexed     *c(R)    contents(c + contents(R))    1
Immediate constant   #c       c                            1
Example
MOV R0, M - Stores the contents of register R0 into memory location M
MOV 4(R0), M - Stores the value contents(4 + contents(R0)) into memory location M
MOV *4(R0), M - Stores the value contents(contents(4 + contents(R0))) into memory location M
MOV #1, R0 - Loads the constant 1 into register R0
Instruction costs
Instruction cost = 1 + costs associated with the source and destination operands
Operand in a register: cost = 0
Operand in a memory location: cost = 1
Literal operand: cost = 1
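The cost rule can be checked with a small calculator. This is a sketch under stated assumptions: instructions are strings of the form "OP src, dst", registers are written R0, R1, and so on, and only the register and indirect register modes (R and *R) are treated as free. The function names are hypothetical.

```python
import re


def operand_cost(operand):
    """Added cost of one operand: 0 for R or *R, 1 for every other mode."""
    if re.fullmatch(r"\*?R\d+", operand.strip()):
        return 0
    return 1   # absolute, indexed, indirect indexed, or literal


def instruction_cost(instr):
    """Cost = 1 for the instruction itself plus the cost of its operands."""
    parts = instr.split(None, 1)
    if len(parts) == 1:
        return 1
    return 1 + sum(operand_cost(op) for op in parts[1].split(","))


sequence = ["MOV b, R0", "ADD c, R0", "MOV R0, a"]
total = sum(instruction_cost(i) for i in sequence)   # 2 + 2 + 2
```

Summing per-instruction costs like this makes it easy to compare alternative code sequences for the same statement.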
Run-time Storage Management
Information needed for the execution of procedure is kept in the block of
storage called activation record.
Each program runs in a logical address space and it is divided into four
areas
1.Code : A statically determined area Code that holds the executable target
code. The size of the target code can be determined at compile time.
2.Static data : A statically determined data area Static for holding global
constants and other data generated by the compiler. The size of the global
constants and compiler data can also be determined at compile time.
3.Heap : A dynamically managed area Heap for holding data objects that
are allocated and freed during program execution. The size of the Heap
cannot be determined at compile time.
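In static allocation, a call is implemented by the following sequence, where
#here+20 is the literal return address (the address of the instruction
following the GOTO):
MOV #here+20, callee.static_area /*save return address*/
GOTO callee.code_area /*transfer control to the callee*/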
                          /*code for q*/
300: ACTION4              /*conditional jump to 456*/
320: ADD #qsize, SP
328: MOV #344, *SP        /*push return address*/
336: GOTO 200             /*call p*/
344: SUB #qsize, SP
352: ACTION5
372: ADD #qsize, SP
380: MOV #396, *SP        /*push return address*/
388: GOTO 300             /*call q*/
396: SUB #qsize, SP
404: ACTION6
424: ADD #qsize, SP
432: MOV #448, *SP        /*push return address*/
440: GOTO 300             /*call q*/
448: SUB #qsize, SP
456: GOTO *0(SP)          /*return*/
600:                      /*stack starts here*/
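Each call in the listing above follows the same four-instruction pattern: advance SP past the caller's activation record, push the return address, jump to the callee, and restore SP on return. The sketch below generates that pattern. It assumes that every instruction in the sequence occupies 8 bytes, as the addresses in the listing suggest; the function name call_sequence is hypothetical.

```python
def call_sequence(addr, caller_size, callee_addr, instr_size=8):
    """Return (address, instruction) pairs for one stack-allocated call.

    addr: address of the first instruction of the sequence.
    caller_size: symbolic size of the caller's activation record, e.g. "qsize".
    callee_addr: address of the first instruction of the callee.
    """
    return_addr = addr + 3 * instr_size   # the instruction after the GOTO
    return [
        (addr,                  f"ADD #{caller_size}, SP"),     # move SP to callee's record
        (addr + instr_size,     f"MOV #{return_addr}, *SP"),    # push return address
        (addr + 2 * instr_size, f"GOTO {callee_addr}"),         # call
        (return_addr,           f"SUB #{caller_size}, SP"),     # restore SP after return
    ]


for address, instr in call_sequence(320, "qsize", 200):
    print(f"{address}: {instr}")
```

Called with 320, "qsize", and 200, this reproduces the first call to p in the code for q, including the return address 344.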
Cost = 3
If register R0 contains z
ADD y, R0
MOV R0, x
Cost = 4
One of the primary issues during code generation is deciding how to use registers
to best advantage. There are four principal uses of registers:
1. Some or all of the operands of an operation must be in registers in order to
perform the operation
2. Registers make good temporaries
3. Registers are used to hold global values
4. Registers are used to help with runtime storage management
Register and Address Descriptors
The code generation algorithm uses descriptors to keep track of register contents
and addresses for names.
There are two kinds: a register descriptor and an address descriptor.
Register descriptor
A register descriptor is used to keep track of what is currently in each register. The
register descriptor shows that initially all the registers are empty. As the code
generation for the block progresses the registers will hold the values of
computation. It is consulted whenever a new register is needed. The fields in the
register descriptor are
Status
Operand descriptor
Address descriptor
The address descriptor stores the location where the current value of the name can
be found at run time. The information about locations can be stored in the symbol
table and is used to access the variables. The location can be a register, a stack
location, or a memory address.
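The two descriptors can be modelled as a pair of dictionaries that are updated together as instructions are emitted. This is a minimal sketch, not the notes' own data layout: the class name Descriptors, its method names, and the convention that a name's memory home is recorded under the name itself are all assumptions.

```python
class Descriptors:
    """Register descriptor and address descriptor for one basic block."""

    def __init__(self, registers, names):
        # Register descriptor: initially every register is empty.
        self.reg = {r: set() for r in registers}
        # Address descriptor: initially each name lives only in its memory home.
        self.addr = {n: {n} for n in names}

    def _evict(self, r):
        # The names held in r can no longer be found there.
        for name in self.reg[r]:
            self.addr[name].discard(r)
        self.reg[r] = set()

    def empty_register(self):
        """Consulted whenever a new register is needed."""
        return next((r for r, held in self.reg.items() if not held), None)

    def load(self, r, name):
        """Record the effect of 'MOV name, r'."""
        self._evict(r)
        self.reg[r] = {name}
        self.addr.setdefault(name, {name}).add(r)

    def define(self, r, name):
        """Record that r now holds a newly computed value of name."""
        self._evict(r)
        for held in self.reg.values():
            held.discard(name)        # older copies of name are stale
        self.reg[r] = {name}
        self.addr[name] = {r}         # the only current copy is in r


d = Descriptors(["R0", "R1"], ["a", "b"])
d.load("R0", "a")      # a is now in memory and in R0
d.define("R0", "t")    # e.g. after SUB b, R0 computing t := a - b
```

After the two calls, the register descriptor shows R0 holding t, and the address descriptor shows a back in memory only.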
[Table: columns "Attributes" and "Addressing Mode"; recovered entries: DS, DS, IS, MEM, MEM, R0]
The location L that holds the result of x := y op z is chosen as follows:
1. If y is in a register, and y is not live and has no next use after execution of
x := y op z, then return the register of y for L.
2. Failing (1), return an empty register for L if there is one.
3. Failing (2), if x has a next use in the block, or op is an operator, such as
indexing, that requires a register, find an occupied register R. Store the
value of R into a memory location M, update the address descriptor for M, and
return R.
4. If x is not used in the block, or no suitable occupied register can be found,
select the memory location of x as L.
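The four rules can be written as a single function. The sketch below is illustrative only and rests on several assumptions: the function is named getreg here for convenience, descriptors are plain dictionaries (register to set of names, name to set of locations, with a name's memory home recorded under the name itself), liveness and next-use information is supplied as a ready-made set, rule 1 conservatively requires that the register hold only y, and rule 3 simply spills the first register rather than choosing one by a cost heuristic.

```python
def getreg(x, y, reg_desc, addr_desc, live_after, needs_register=False):
    """Choose the location L for x := y op z; return (L, spill_instructions).

    live_after: names that are live or have a next use after the statement.
    """
    # Rule 1: reuse y's register if y is dead after this statement.
    for r, held in reg_desc.items():
        if held == {y} and y not in live_after:
            return r, []

    # Rule 2: otherwise take an empty register if there is one.
    for r, held in reg_desc.items():
        if not held:
            return r, []

    # Rule 3: otherwise free an occupied register, saving its contents first.
    if reg_desc and (x in live_after or needs_register):
        r = next(iter(reg_desc))             # simplistic choice of victim
        spills = []
        for name in reg_desc[r]:
            if name not in addr_desc[name]:  # memory copy is not up to date
                spills.append(f"MOV {r}, {name}")
                addr_desc[name].add(name)
            addr_desc[name].discard(r)
        reg_desc[r] = set()
        return r, spills

    # Rule 4: fall back to the memory location of x.
    return x, []


reg_desc = {"R0": {"t"}, "R1": {"u"}}
addr_desc = {"t": {"R0"}, "u": {"R1", "u"}}
location, spills = getreg("v", "a", reg_desc, addr_desc, live_after={"t", "u", "v"})
```

In this call both registers are occupied and v is needed later, so rule 3 applies: R0 is chosen, and because t exists only in R0 it is first stored with MOV R0, t.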
Code Generation Example
Consider the statement d = (a - b) + (a - c) + (a - c)
Translated into the following three-address code:
t := a - b
u := a - c
v := t + u
d := v + u
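One way to check that the three-address code computes the same value as the original statement is to interpret it directly. The sketch below is a small interpreter written for this purpose only; the function name run_three_address and the sample values for a, b, and c are assumptions.

```python
def run_three_address(code, env):
    """Execute statements of the form 'x := y op z' with op in {+, -}."""
    env = dict(env)   # work on a copy of the initial values
    ops = {"+": lambda p, q: p + q, "-": lambda p, q: p - q}
    for line in code:
        target, expr = [s.strip() for s in line.split(":=")]
        left, op, right = expr.split()
        env[target] = ops[op](env[left], env[right])
    return env


code = ["t := a - b", "u := a - c", "v := t + u", "d := v + u"]
a, b, c = 10, 3, 4
env = run_three_address(code, {"a": a, "b": b, "c": c})
assert env["d"] == (a - b) + (a - c) + (a - c)
```

Note that the common subexpression a - c is computed once, into u, and used twice.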