
Compiler Design (Unit-5)


UNIT-5

CODE GENERATION
The final phase in compiler model is the code generator. It takes as input an intermediate
representation of the source program and produces as output an equivalent target program. The
code generation techniques presented below can be used whether or not an optimizing phase
occurs before code generation.

Position of code generator

source program --> front end --> intermediate code --> code optimizer --> intermediate code --> code generator --> target program

(The symbol table is consulted and updated throughout.)

ISSUES IN THE DESIGN OF A CODE GENERATOR

The following issues arise during the code generation phase :

1. Input to code generator


2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Evaluation order

1. Input to code generator:


 The input to the code generator consists of the intermediate representation of the source
program produced by the front end, together with information in the symbol table that is
used to determine the run-time addresses of the data objects denoted by the names in the
intermediate representation.

 Intermediate representation can be :


a. Linear representation such as postfix notation
b. Three address representation such as quadruples
c. Virtual machine representation such as stack machine code
d. Graphical representations such as syntax trees and dags.
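
For instance, the expression b + c * d might appear as:

a. Postfix notation: b c d * +
b. Quadruples: (*, c, d, t1) followed by (+, b, t1, t2)
c. Stack machine code: push b; push c; push d; multiply; add
d. Syntax tree: a + node whose left child is the leaf b and whose right child is a * node over the leaves c and d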

 Prior to code generation, the source program has been scanned, parsed and translated into
intermediate representation, and the necessary type checking has been performed.
Therefore, the input to code generation is assumed to be error-free.

2. Target program:
 The output of the code generator is the target program. The output may be :
a. Absolute machine language
- It can be placed in a fixed memory location and can be executed immediately.
b. Re-locatable machine language
- It allows subprograms to be compiled separately.

c. Assembly language
- Code generation is made easier.

3. Memory management:
 Names in the source program are mapped to addresses of data objects in run-time
memory by the front end and code generator.

 It makes use of symbol table, that is, a name in a three-address statement refers to a
symbol-table entry for the name.

 Labels in three-address statements have to be converted to addresses of instructions.

4. Instruction selection:
 The instructions of target machine should be complete and uniform.

 Instruction speeds and machine idioms are important factors when efficiency of target
program is considered.

 The quality of the generated code is determined by its speed and size.

 For example, every three-address statement of the form x := y + z can be translated into the target-code sequence shown below:
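
A minimal sketch, assuming the two-address target machine described later (op source, destination):

MOV y, R0      /* load y into register R0 */
ADD z, R0      /* add z to R0 */
MOV R0, x      /* store R0 into x */

Naive statement-by-statement translation of this kind often produces redundant code. For instance, translating a := b + c followed by d := a + e this way yields

MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0      /* redundant: a is already in R0 */
ADD e, R0
MOV R0, d

where the fourth instruction is redundant, and so is the third if a is not subsequently used.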

5. Register allocation
 Instructions involving register operands are shorter and faster than those involving
operands in memory.

 The use of registers is subdivided into two sub-problems:


 Register allocation– the set of variables that will reside in registers at a point in
the program is selected.
 Register assignment– the specific register that a variable will reside in is
picked.

 Certain machines require even-odd register pairs for some operands and results.
For example, consider the division instruction of the form:
D x, y
where x, the dividend, occupies the even register of an even/odd register pair,
and y is the divisor. After the division,
the even register holds the remainder and
the odd register holds the quotient.
6. Evaluation order
 The order in which the computations are performed can affect the efficiency of the
target code. Some computation orders require fewer registers to hold intermediate
results than others.

TARGET MACHINE

 Familiarity with the target machine and its instruction set is a prerequisite for designing a
good code generator.
 The target computer is a byte-addressable machine with 4 bytes to a word.
 It has n general-purpose registers, R0, R1, . . . , Rn-1.
 It has two-address instructions of the form:
op source, destination
where, op is an op-code, and source and destination are data fields.
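
For instance (an illustrative sketch of this format; the op-codes follow common usage):

MOV M, R0      /* load the contents of memory location M into register R0 */
ADD R1, R0     /* R0 := R0 + R1 */
MOV R0, M      /* store the contents of R0 into memory location M */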

RUN-TIME STORAGE MANAGEMENT


 Information needed during an execution of a procedure is kept in a block of storage
called an activation record, which includes storage for names local to the procedure.
 The two standard storage allocation strategies are:
1. Static allocation
2. Stack allocation
 In static allocation, the position of an activation record in memory is fixed at compile
time.
 In stack allocation, a new activation record is pushed onto the stack for each execution of
a procedure. The record is popped when the activation ends.
 The following three-address statements are associated with the run-time allocation and
deallocation of activation records:
1. Call,
2. Return,
3. Halt, and
4. Action, a placeholder for other statements.
 We assume that the run-time memory is divided into areas for:
1. Code
2. Static data
3. Stack
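
As a sketch of how call and return might be implemented under static allocation (following the standard treatment; callee.static_area and callee.code_area stand for the addresses of the callee's activation record and first instruction, and #here + 20 for the address of the instruction following the GOTO):

/* code for the call */
MOV #here + 20, callee.static_area    /* save the return address */
GOTO callee.code_area                 /* transfer control to the callee */

/* code for the return */
GOTO *callee.static_area              /* jump to the saved return address */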
BASIC BLOCKS AND FLOW GRAPHS
Basic Blocks

 A basic block is a sequence of consecutive statements in which flow of control enters at
the beginning and leaves at the end without any halt or possibility of branching except at
the end.
 The following sequence of three-address statements forms a basic block:
t1 : = a * a
t2 : = a * b
t3 : = 2 * t2
t4 : = t1 + t3
t5 : = b * b
t6 : = t4 + t5

Basic Block Construction:

Algorithm: Partition into basic blocks

Input: A sequence of three-address statements

Output: A list of basic blocks with each three-address statement in exactly one block

Method:

1. We first determine the set of leaders, the first statements of basic blocks. The
rules we use are the following:
a. The first statement is a leader.
b. Any statement that is the target of a conditional or unconditional goto is a
leader.
c. Any statement that immediately follows a goto or conditional goto statement
is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but not
including the next leader or the end of the program.
 Consider the following source code for dot product of two vectors a and b of length 20

begin

prod :=0;

i:=1;

do begin

prod :=prod+ a[i] * b[i];

i :=i+1;

end

while i <= 20

end

 The three-address code for the above source program is given as :


(1) prod := 0

(2) i := 1

(3) t1 := 4* i

(4) t2 := a[t1] /*compute a[i] */

(5) t3 := 4* i

(6) t4 := b[t3] /*compute b[i] */

(7) t5 := t2*t4

(8) t6 := prod+t5

(9) prod := t6

(10) t7 := i+1

(11) i := t7

(12) if i<=20 goto (3)

By rule (a), statement (1) is a leader, and by rule (b), statement (3) is a leader because it is the target of the jump in statement (12). Hence:

Basic block 1: Statement (1) to (2)

Basic block 2: Statement (3) to (12)


Transformations on Basic Blocks:

A number of transformations can be applied to a basic block without changing the set of
expressions computed by the block. Two important classes of transformation are :

 Structure-preserving transformations

 Algebraic transformations

1. Structure preserving transformations:

a) Common sub expression elimination:

a:=b+c        a:=b+c
b:=a-d        b:=a-d
c:=b+c        c:=b+c
d:=a-d        d:=b

Since the second and fourth statements compute the same expression (a-d), the basic block can
be transformed as above.

b) Dead-code elimination:

Suppose x is dead, that is, never subsequently used, at the point where the statement
x := y + z appears in a basic block. Then this statement may be safely removed without
changing the value of the basic block.

c) Renaming temporary variables:

A statement t := b + c (where t is a temporary) can be changed to u := b + c (where u is a new
temporary), and all uses of this instance of t can be changed to u, without changing the value
of the basic block.
Such a block is called a normal-form block.

d) Interchange of statements:

Suppose a block has the following two adjacent statements:

t1 : = b + c
t2 : = x + y

We can interchange the two statements without affecting the value of the block if and
only if neither x nor y is t1 and neither b nor c is t2.

2. Algebraic transformations:

Algebraic transformations can be used to change the set of expressions computed by a basic
block into an algebraically equivalent set.
Examples:
i) x : = x + 0 or x : = x * 1 can be eliminated from a basic block without changing the set of
expressions it computes.
ii) The exponential statement x : = y * * 2 can be replaced by x : = y * y.
Flow Graphs

 Flow graph is a directed graph containing the flow-of-control information for the set of
basic blocks making up a program.
 The nodes of the flow graph are basic blocks. It has a distinguished initial node.
 E.g.: Flow graph for the vector dot product is given as follows:

B1:  prod := 0
     i := 1

B2:  t1 := 4 * i
     t2 := a [ t1 ]
     t3 := 4 * i
     t4 := b [ t3 ]
     t5 := t2 * t4
     t6 := prod + t5
     prod := t6
     t7 := i + 1
     i := t7
     if i <= 20 goto B2

B1 is the initial node. B2 immediately follows B1, so there is an edge from B1 to B2. The
jump in the last statement of B2 targets the first statement of B2, so there is also an edge
from B2 (last statement) to B2 (first statement).
 B1 is a predecessor of B2, and B2 is a successor of B1 (B2 is also its own predecessor and successor).

Loops

 A loop is a collection of nodes in a flow graph such that


1. All nodes in the collection are strongly connected.
2. The collection of nodes has a unique entry.
 A loop that contains no other loops is called an inner loop.

THE DAG REPRESENTATION FOR BASIC BLOCKS


 A DAG for a basic block is a directed acyclic graph with the following labels on nodes:
1. Leaves are labeled by unique identifiers, either variable names or constants.
2. Interior nodes are labeled by an operator symbol.
3. Nodes are also optionally given a sequence of identifiers for labels to store the
computed values.
 DAGs are useful data structures for implementing transformations on basic blocks.
 It gives a picture of how the value computed by a statement is used in subsequent
statements.
 It provides a good way of determining common sub-expressions.
Algorithm for construction of DAG

Input: A basic block

Output: A DAG for the basic block containing the following information:

1. A label for each node. For leaves, the label is an identifier. For interior nodes, an
operator symbol.
2. For each node a list of attached identifiers to hold the computed values.
Case (i) x : = y OP z

Case (ii) x : = OP y

Case (iii) x : = y

Method:

Step 1: If node(y) is undefined, create a leaf labeled y and let node(y) be this node.
For case (i), if node(z) is undefined, create a leaf labeled z and let node(z) be this node.

Step 2: For case (i), determine whether there is a node labeled OP whose left child is node(y)
and whose right child is node(z). (This is the check for a common sub-expression.) If not,
create such a node. Let n be this node.

For case (ii), determine whether there is a node labeled OP with one child node(y). If not,
create such a node. Let n be this node.

For case (iii), let n be node(y).

Step 3: Delete x from the list of attached identifiers for node(x). Append x to the list of
attached identifiers for the node n found in step 2 and set node(x) to n.

Example: Consider the block of three-address statements:

1. t1 := 4* i
2. t2 := a[t1]
3. t3 := 4* i
4. t4 := b[t3]
5. t5 := t2*t4
6. t6 := prod+t5
7. prod := t6
8. t7 := i+1
9. i := t7
10. if i<=20 goto (1)
Stages in DAG Construction (the step-by-step figure is omitted here; the final DAG is summarized below)
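
Applying the algorithm to these ten statements yields a final DAG that can be described as follows (a textual sketch of the usual figure; i0 and prod0 denote the initial values of i and prod):

1. The leaves are 4, i0, a, b, prod0, 1 and 20.
2. Statements (1) and (3) both compute 4*i, so a single * node over 4 and i0 carries both identifiers t1 and t3; the common sub-expression is detected automatically.
3. Two [ ] nodes, over (a, the * node) and (b, the * node), carry t2 and t4, and a * node over them carries t5.
4. A + node over prod0 and the t5 node carries t6 and prod; a + node over i0 and 1 carries t7 and i.
5. A <= node over the t7 node and 20 represents the conditional jump.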
Application of DAGs:

1. We can automatically detect common sub expressions.


2. We can determine which identifiers have their values used in the block.
3. We can determine which statements compute values that could be used outside the block.
CODE OPTIMIZATION

INTRODUCTION

 The code produced by the straightforward compiling algorithms can often be made to run
faster or take less space, or both. This improvement is achieved by program transformations
that are traditionally called optimizations. Compilers that apply code-improving
transformations are called optimizing compilers.

 Optimizations are classified into two categories. They are


 Machine independent optimizations:
 Machine dependent optimizations:

Machine independent optimizations:

 Machine independent optimizations are program transformations that improve the target code
without taking into consideration any properties of the target machine.

Machine dependent optimizations:

 Machine dependent optimizations are based on register allocation and utilization of special
machine-instruction sequences.

The criteria for code improvement transformations:

 Simply stated, the best program transformations are those that yield the most benefit for the
least effort.

 The transformation must preserve the meaning of programs. That is, the optimization must
not change the output produced by a program for a given input, or cause an error such as
division by zero, that was not present in the original source program. At all times we take the
“safe” approach of missing an opportunity to apply a transformation rather than risk
changing what the program does.

 A transformation must, on the average, speed up programs by a measurable amount. We are


also interested in reducing the size of the compiled code although the size of the code has
less importance than it once had. Not every transformation succeeds in improving every
program; occasionally an “optimization” may slow down a program slightly.

 The transformation must be worth the effort. It does not make sense for a compiler writer to
expend the intellectual effort to implement a code improving transformation and to have the
compiler expend the additional time compiling source programs if this effort is not repaid
when the target programs are executed. However, “peephole” transformations, discussed later,
are simple enough and beneficial enough to be included in any compiler.
PRINCIPAL SOURCES OF OPTIMIZATION

 A transformation of a program is called local if it can be performed by looking only at the
statements in a basic block; otherwise, it is called global.
 Many transformations can be performed at both the local and global levels. Local
transformations are usually performed first.

Function-Preserving Transformations
 There are a number of ways in which a compiler can improve a program without
changing the function it computes.
 The transformations

 Common sub expression elimination,


 Copy propagation,
 Dead-code elimination, and
 Constant folding

are common examples of such function-preserving transformations. The other
transformations come up primarily when global optimizations are performed.

Frequently, a program will include several calculations of the same value, such as an offset in an
array. Some of the duplicate calculations cannot be avoided by the programmer because they lie
below the level of detail accessible within the source language.

 Common sub-expression elimination:


 An occurrence of an expression E is called a common sub-expression if E was previously
computed, and the values of variables in E have not changed since the previous
computation. We can avoid recomputing the expression if we can use the previously
computed value.
 For example
t1: = 4*i
t2: = a [t1]
t3: = 4*j
t4: = 4*i
t5 : = n
t6: = b [t4] +t5

The above code can be optimized using the common sub-expression elimination as
t1: = 4*i
t2: = a [t1]
t3: = 4*j
t5 : = n
t6: = b [t1] +t5

The common sub-expression t4 := 4*i is eliminated, as its value is already computed into t1
and the value of i has not changed between that computation and this use.
 Copy Propagation:
 Assignments of the form f := g are called copy statements, or copies for short. The idea
behind the copy-propagation transformation is to use g for f wherever possible after the
copy statement f := g. Copy propagation means the use of one variable instead of another.
This may not appear to be an improvement, but as we shall see in the example below, it
gives us an opportunity to eliminate the copy variable x.
 For example:

x=Pi;
……
A=x*r*r;

The optimization using copy propagation can be done as follows:

A=Pi*r*r;

Here the variable x is eliminated.

 Dead-Code Eliminations:
 A variable is live at a point in a program if its value can be used subsequently;
otherwise, it is dead at that point. A related idea is dead or useless code: statements that
compute values that never get used. While the programmer is unlikely to introduce any dead
code intentionally, it may appear as the result of previous transformations. An optimization
can be done by eliminating dead code.
Example:

i = 0;
if (i == 1)
{
    a = b + 5;
}

Here, the 'if' statement is dead code because its condition can never be satisfied.

 Constant folding:
 Deducing at compile time that the value of an expression is a constant, and using the
constant instead, is known as constant folding. It can, for example, eliminate from the
object code both a test whose outcome is known at compile time and the code the test guards.

 One advantage of copy propagation is that it often turns the copy statement into dead
code.
 For example,
a = 3.14157/2 can be replaced by
a = 1.570785, thereby eliminating a division operation.
 Loop Optimizations:
 We now give a brief introduction to a very important place for optimizations, namely
loops, especially the inner loops where programs tend to spend the bulk of their time. The
running time of a program may be improved if we decrease the number of instructions in
an inner loop, even if we increase the amount of code outside that loop.
 Three techniques are important for loop optimization:

 code motion, which moves code outside a loop;


 Induction-variable elimination, which removes redundant induction variables from inner loops.
 Reduction in strength, which replaces an expensive operation by a cheaper one, such as
a multiplication by an addition.

 Code Motion:
 An important modification that decreases the amount of code in a loop is code motion.
This transformation takes an expression that yields the same result independent of the
number of times a loop is executed ( a loop-invariant computation) and places the
expression before the loop. Note that the notion “before the loop” assumes the existence
of an entry for the loop. For example, evaluation of limit-2 is a loop-invariant
computation in the following while-statement:

while (i <= limit-2) /* statement does not change limit*/

Code motion will result in the equivalent of

t= limit-2;
while (i<=t) /* statement does not change limit or t */

 Induction Variables :
 Loops are usually processed inside out. For example, consider the loop around B3 (the
flow graph referred to here is not reproduced in these notes).
 Note that the values of j and t4 remain in lock-step: every time the value of j decreases by
1, that of t4 decreases by 4, because 4*j is assigned to t4. Such identifiers are called
induction variables.
 When there are two or more induction variables in a loop, it may be possible to get rid of
all but one by the process of induction-variable elimination. For the inner loop around
B3, we cannot get rid of either j or t4 completely; t4 is used in B3 and j in B4.
However, we can illustrate reduction in strength and a part of the process of
induction-variable elimination. Eventually j will be eliminated when the outer loop of B2
- B5 is considered.

Example:
Since the relationship t4 = 4*j holds after the assignment t4 := 4*j, and t4 is not
changed elsewhere in the inner loop around B3, it follows that just after the statement
j := j-1 the relationship t4 = 4*j - 4 must hold. We may therefore replace the assignment
t4 := 4*j by t4 := t4 - 4. The only problem is that t4 does not have a value when we enter
block B3 for the first time. Since we must maintain the relationship t4 = 4*j on entry to
B3, we place an initialization of t4 at the end of the block where j itself is initialized
(shown in the original figure as a dashed addition to block B1).
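
The effect on the inner loop can be sketched as follows (the statements shown follow the example this passage describes, where a[t4] is the array reference and v the value being compared against):

before:                         after:
B3: j := j - 1                  B3: j := j - 1
    t4 := 4 * j                     t4 := t4 - 4
    t5 := a [ t4 ]                  t5 := a [ t4 ]
    if t5 > v goto B3               if t5 > v goto B3

with the initialization t4 := 4 * j placed at the end of block B1, just after j is initialized.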


 The replacement of a multiplication by a subtraction will speed up the object code if
multiplication takes more time than addition or subtraction, as is the case on many
machines.

 Reduction In Strength:
 Reduction in strength replaces expensive operations by equivalent cheaper ones on the
target machine. Certain machine instructions are considerably cheaper than others and
can often be used as special cases of more expensive operators.
 For example, x² is invariably cheaper to implement as x*x than as a call to an
exponentiation routine. Fixed-point multiplication or division by a power of two is
cheaper to implement as a shift. Floating-point division by a constant can be implemented
as multiplication by a constant, which may be cheaper.
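
For instance (an illustrative sketch), on a machine with shift instructions the fixed-point multiplication

t := x * 8

can be implemented as a left shift of x by 3 bit positions, and x / 4 as a right shift by 2 (for non-negative x); both shifts are typically much cheaper than a general multiplication or division.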

OPTIMIZATION OF BASIC BLOCKS


There are two types of basic block optimizations. They are :

 Structure-Preserving Transformations
 Algebraic Transformations

1) Structure-Preserving Transformations:

The primary Structure-Preserving Transformation on basic blocks are:

 Common sub-expression elimination


 Dead code elimination
 Renaming of temporary variables
 Interchange of two independent adjacent statements.

 Common sub-expression elimination:


Common sub-expressions need not be computed over and over again. Instead they can be
computed once and kept in store, and referenced when encountered again, provided the
values of the variables in the expression have not changed in between.

Example:

a: =b+c
b: =a-d
c: =b+c
d: =a-d

The 2nd and 4th statements compute the same expression, a-d. (The 1st and 3rd statements
both read b+c, but they do not form a common sub-expression, because b is redefined in between.)

The basic block can therefore be transformed to

a: = b+c
b: = a-d
c: = b+c
d: = b

 Dead code elimination:


It is possible that a large amount of dead (useless) code may exist in a program. This is
especially likely when variables and procedures are introduced during construction or
error-correction of a program: once declared and defined, one forgets to remove them even
when they no longer serve any purpose. Eliminating these will definitely optimize the code.

 Renaming of temporary variables:


 A statement t:=b+c where t is a temporary name can be changed to u:=b+c where u is
another temporary name, and change all uses of t to u.
 In this we can transform a basic block to its equivalent block called normal-form block.
 Interchange of two independent adjacent statements:

 Two statements

t1:=b+c

t2:=x+y

can be interchanged or reordered in the basic block when the value of t1 does not affect
the value of t2, that is, when neither x nor y is t1 and neither b nor c is t2.

2) Algebraic Transformations:

 Algebraic identities represent another important class of optimizations on basic blocks.


This includes simplifying expressions or replacing expensive operation by cheaper ones
i.e. reduction in strength.
 Another class of related optimizations is constant folding. Here we evaluate constant
expressions at compile time and replace the constant expressions by their values. Thus
the expression 2*3.14 would be replaced by 6.28.
 The relational operators <=, >=, <, > and = sometimes generate unexpected common
sub-expressions, for example because comparisons may be compiled into subtractions.
 Associative laws may also be applied to expose common sub expressions. For example, if
the source code has the assignments

a :=b+c
e :=c+d+b

the following intermediate code may be generated:

a :=b+c
t :=c+d
e :=t+b
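
Since e = (b+c) + d by the associative law, and b+c is already computed into a, the intermediate code could instead be generated as

a :=b+c
e :=a+d

exposing and reusing the common sub-expression b+c.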

 Example:

x:=x+0 can be removed

x:=y**2 can be replaced by a cheaper statement x:=y*y

The compiler writer should examine the language carefully to determine what
rearrangements of computations are permitted, since computer arithmetic does not always
obey the algebraic identities of mathematics. Thus, a compiler may evaluate x*y-x*z as
x*(y-z) but it may not evaluate a+(b-c) as (a+b)-c.
PEEPHOLE OPTIMIZATION

 A statement-by-statement code-generation strategy often produces target code that
contains redundant instructions and suboptimal constructs. The quality of such target
code can be improved by applying “optimizing” transformations to the target program.
 A simple but effective technique for improving the target code is peephole optimization,
a method for trying to improve the performance of the target program by examining a
short sequence of target instructions (called the peephole) and replacing these
instructions by a shorter or faster sequence, whenever possible.
 The peephole is a small, moving window on the target program. The code in the peephole
need not be contiguous, although some implementations do require this. It is characteristic
of peephole optimization that each improvement may spawn opportunities for additional
improvements.
 We shall give the following examples of program transformations that are characteristic
of peephole optimizations:

 Redundant-instructions elimination
 Flow-of-control optimizations
 Algebraic simplifications
 Use of machine idioms
 Unreachable Code
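
As a small illustration of redundant-instruction elimination, consider the target-code sequence

MOV R0, a
MOV a, R0

The second instruction can be deleted, since whenever it is executed the value of a is already in R0. This is safe provided the second instruction does not have a label, for in that case control could reach it other than by falling through the first.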
INTRODUCTION TO GLOBAL DATAFLOW ANALYSIS

 In order to do code optimization and a good job of code generation, compiler needs to
collect information about the program as a whole and to distribute this information to
each block in the flow graph.

 A compiler could take advantage of “reaching definitions”, such as knowing where a
variable like debug was last defined before reaching a given block, in order to perform
transformations such as dead-code elimination. Reaching definitions are just one example
of the data-flow information that an optimizing compiler collects by a process known as
data-flow analysis.

 Data-flow information can be collected by setting up and solving systems of equations of


the form :

out [S] = gen [S] U ( in [S] – kill [S] )

This equation can be read as “ the information at the end of a statement is either generated
within the statement , or enters at the beginning and is not killed as control flows through
the statement.”
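
As a small worked instance for the reaching-definitions problem (the labels d, d1 and d2 below are illustrative): for a definition d: a := b + c,

gen [S] = { d }
kill [S] = all other definitions of a in the program

So if the definitions { d1, d2 } reach the beginning of S and d1 is a definition of a, then out [S] = { d } U ( { d1, d2 } - { d1 } ) = { d, d2 }.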

 The details of how data-flow equations are set and solved depend on three factors.

 The notions of generating and killing depend on the desired information, i.e., on the data
flow analysis problem to be solved. Moreover, for some problems, instead of proceeding
along with flow of control and defining out[s] in terms of in[s], we need to proceed
backwards and define in[s] in terms of out[s].

 Since data flows along control paths, data-flow analysis is affected by the constructs in a
program. In fact, when we write out[s] we implicitly assume that there is unique end
point where control leaves the statement; in general, equations are set up at the level of
basic blocks rather than statements, because blocks do have unique end points.

 There are subtleties that go along with such statements as procedure calls, assignments
through pointer variables, and even assignments to array variables.

CODE IMPROVING TRANSFORMATIONS


 Algorithms for performing the code improving transformations rely on data-flow
information. Here we consider common sub-expression elimination, copy propagation and
transformations for moving loop invariant computations out of loops and for eliminating
induction variables.

 Global transformations are not a substitute for local transformations; both must be performed.

Elimination of global common sub expressions:

 The available expressions data-flow problem discussed in the last section allows us to
determine if an expression at point p in a flow graph is a common sub-expression. The
standard algorithm for eliminating common sub-expressions formalizes the intuitive ideas
presented earlier using this information.
Copy propagation:

 Various algorithms introduce copy statements such as x := y; copies may also be generated
directly by the intermediate code generator, although most of these involve temporaries
local to one block and can be removed by the DAG construction. We may substitute y for x
wherever x is used, provided the following conditions are met for every such use u of x:

 Statement s must be the only definition of x reaching u.

 On every path from s to u, including paths that go through u several times, there are no
assignments to y.

 Condition (1) can be checked using ud-chaining information. We shall set up a new data-
flow analysis problem in which in[B] is the set of copies s: x := y such that every path
from the initial node to the beginning of B contains the statement s, and subsequent to the
last occurrence of s there are no assignments to y.

Detection of loop-invariant computations:

 Ud-chains can be used to detect those computations in a loop that are loop-invariant, that
is, whose value does not change as long as control stays within the loop. A loop is a region
consisting of a set of blocks with a header that dominates all the other blocks, so the only
way to enter the loop is through the header.

 If an assignment x := y+z is at a position in the loop where all possible definitions of y


and z are outside the loop, then y+z is loop-invariant because its value will be the same
each time x := y+z is encountered. Having recognized that the value of x will not change,
it follows that if a statement v := x+w appears, where w could only have been defined
outside the loop, then x+w is also loop-invariant.

Performing code motion:

 Having found the invariant statements within a loop, we can apply to some of them an
optimization known as code motion, in which the statements are moved to pre-header of
the loop. The following three conditions ensure that code motion does not change what
the program computes. Consider s: x: =y+z.

 The block containing s dominates all exit nodes of the loop, where an exit of a loop is a
node with a successor not in the loop.

 There is no other statement in the loop that assigns to x. Again, if x is a temporary
assigned only once, this condition is surely satisfied and need not be checked.
 No use of x in the loop is reached by any definition of x other than s. This condition too
will normally be satisfied if x is a temporary.
Elimination of induction variable:

 A variable x is called an induction variable of a loop L if every time the variable x
changes value, it is incremented or decremented by some constant. Often, an induction
variable is incremented by the same constant each time around the loop, as in a loop
headed by for i := 1 to 10.

 However, our methods deal with variables that are incremented or decremented zero, one,
two, or more times as we go around a loop. The number of changes to an induction
variable may even differ at different iterations.

 A common situation is one in which an induction variable, say i, indexes an array, and
some other induction variable, say t, whose value is a linear function of i, is the actual
offset used to access the array. Often, the only use made of i is in the test for loop
termination. We can then get rid of i by replacing its test by one on t.

 We shall look for basic induction variables, which are those variables i whose only
assignments within loop L are of the form i := i+c or i-c, where c is a constant.
