UNIT VIII
MACHINE DEPENDENT CODE OPTIMIZATION
PEEPHOLE OPTIMIZATION
A common strategy is to generate code and then improve the quality of the target code by applying "optimizing" transformations to the target program. Many simple transformations can significantly improve the running time or space requirement of the target program.
A simple technique for locally improving the target code is peephole optimization, which is done by examining a sliding window of target instructions (called the peephole) and replacing instruction sequences within the peephole by a shorter or faster sequence, whenever possible.
Peephole optimization can also be applied directly after intermediate code generation to
improve the intermediate representation.
The peephole is a small, sliding window on a program. The code in the peephole need not be contiguous, although some implementations do require it. In general, repeated passes over the target code are necessary to get the maximum benefit.
The following are examples of program transformations that are characteristic of peephole optimizations:
Redundant-instruction elimination
Flow-of-control optimizations
Algebraic simplifications
Use of machine idioms
1) Eliminating Redundant Loads and Stores:
If we see the instruction sequence
LD a, R0
ST R0, a
in a target program, we can delete the store instruction because whenever it is executed, the first instruction will ensure that the value of a has already been loaded into register R0.
Note that if the store instruction had a label, we could not be sure that the first instruction is always executed before the second, so we could not remove the store instruction.
The two instructions have to be in the same basic block for this transformation to be safe.
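This load/store pattern can be recognized mechanically. A minimal sketch in Python, assuming instructions are plain strings in the "OP operand, operand" form used above and that the window never crosses a label or basic-block boundary (real passes must check for labels before deleting):

```python
def eliminate_redundant_store(instrs):
    """Remove 'ST R, a' that immediately follows 'LD a, R'.

    instrs: list of strings like 'LD a, R0' or 'ST R0, a'.
    Assumes all instructions lie in one basic block (no labels).
    """
    out = []
    for ins in instrs:
        if out and ins.startswith("ST ") and out[-1].startswith("LD "):
            # 'ST reg, addr' stores reg into addr; 'LD addr, reg' loads addr.
            reg, addr = [s.strip() for s in ins[3:].split(",")]
            laddr, lreg = [s.strip() for s in out[-1][3:].split(",")]
            if reg == lreg and addr == laddr:
                continue  # the store is redundant: memory already holds the value
        out.append(ins)
    return out
```

For example, the sequence ["LD a, R0", "ST R0, a", "ADD R0, R1"] is reduced to ["LD a, R0", "ADD R0, R1"].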
VIEW A.N.Suresh
2) Eliminating Unreachable Code
Another opportunity for peephole optimization is the removal of unreachable
instructions. An unlabeled instruction immediately following an unconditional jump may be
removed. This operation can be repeated to eliminate a sequence of instructions.
For example, for debugging purposes, a large program may have within it certain code
fragments that are executed only if a variable debug is equal to 1.
In the intermediate representation, this code may look like
if debug == 1 goto L1
goto L2
L1: print debugging information
L2:
One obvious peephole optimization is to eliminate jumps over jumps:
if debug != 1 goto L2
print debugging information
L2:
If debug is set to 0 at the beginning of the program, constant propagation would transform this
sequence into
if 0 != 1 goto L2
print debugging information
L2:
Now the argument of the first statement always evaluates to true, so the statement can be
replaced by goto L2. Then all statements that print debugging information are unreachable and
can be eliminated one at a time.
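The basic rule — delete unlabeled instructions following an unconditional jump, until the next label — can be sketched as follows (instructions modeled as hypothetical (label, text) pairs, with label None when absent):

```python
def remove_unreachable(instrs):
    """Delete unlabeled instructions that follow an unconditional goto.

    instrs: list of (label, text) pairs; label is None if the
    instruction is unlabeled.
    """
    out = []
    skipping = False
    for label, text in instrs:
        if label is not None:
            skipping = False       # a label makes the code reachable again
        if skipping:
            continue               # unreachable: drop the instruction
        out.append((label, text))
        if text.startswith("goto "):
            skipping = True        # everything until the next label is dead
    return out
```

Applied to the debugging example after constant propagation, the unlabeled print statement between "goto L2" and "L2:" is removed.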
3) Flow-of-Control Optimizations
Simple intermediate code-generation algorithms frequently produce jumps to jumps,
jumps to conditional jumps, or conditional jumps to jumps. These unnecessary jumps can be
eliminated in either the intermediate code or the target code by the following types of peephole
optimizations.
We can replace the sequence
goto L1
L1: goto L2
by the sequence
goto L2
L1: goto L2
If there are now no jumps to L1, then it may be possible to eliminate the statement
L1: goto L2 provided it is preceded by an unconditional jump.
Similarly, the sequence
if a < b goto L1
L1: goto L2
can be replaced by the sequence
if a < b goto L2
L1: goto L2
Suppose there is only one jump to L1 and L1 is preceded by an unconditional goto. Then
the sequence
goto L1
L1: if a < b goto L2
L3:
may be replaced by the sequence
if a < b goto L2
goto L3
L3:
While the number of instructions in the two sequences is the same, we sometimes skip
the unconditional jump in the second sequence, but never in the first. Thus, the second
sequence is superior to the first in execution time.
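The jump-to-jump replacement for unconditional gotos can be sketched as a chain-following rewrite (same hypothetical (label, text) representation; conditional jumps and the one-predecessor analysis above are omitted for brevity):

```python
def collapse_jumps(instrs):
    """Rewrite 'goto L1' as 'goto L2' when L1 labels 'goto L2',
    following chains of such jumps to their final destination.

    instrs: list of (label, text) pairs; label is None if absent.
    """
    labeled = {label: text for label, text in instrs if label is not None}

    def final_target(lbl):
        seen = set()
        # Follow chains of 'Lx: goto Ly', guarding against goto cycles.
        while lbl in labeled and labeled[lbl].startswith("goto ") and lbl not in seen:
            seen.add(lbl)
            lbl = labeled[lbl][5:]
        return lbl

    out = []
    for label, text in instrs:
        if text.startswith("goto "):
            text = "goto " + final_target(text[5:])
        out.append((label, text))
    return out
```

After this pass, a later dead-code pass can remove "L1: goto L2" once no jumps to L1 remain.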
4) Algebraic Simplification and Reduction in Strength
Algebraic identities can be used to simplify DAGs. These identities can also be used by a peephole optimizer to eliminate three-address statements such as
x = x + 0
or
x = x * 1
in the peephole.
Reduction-in-strength transformations can be applied in the peephole to replace
expensive operations by equivalent cheaper ones on the target machine.
Certain machine instructions are considerably cheaper than others.
Example:
x^2 is invariably cheaper to implement as x * x than as a call to an exponentiation routine.
Fixed-point multiplication or division by a power of two is cheaper to implement as a
shift.
Floating-point division by a constant can be approximated as multiplication by a
constant, which may be cheaper.
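Both ideas can be applied to a single three-address statement, represented here as a hypothetical (dst, op, a, b) tuple; the sketch assumes non-negative integer operands, so replacing multiplication by a power of two with a left shift is exact:

```python
def peephole_simplify(inst):
    """Simplify one three-address statement (dst, op, a, b).

    Returns None when the statement can be deleted, a 'copy'
    statement when it reduces to an assignment, or a cheaper
    equivalent statement (strength reduction).
    """
    dst, op, a, b = inst
    if op == "+" and b == 0:               # x = a + 0
        return None if dst == a else (dst, "copy", a, None)
    if op == "*" and b == 1:               # x = a * 1
        return None if dst == a else (dst, "copy", a, None)
    if op == "*" and isinstance(b, int) and b > 0 and b & (b - 1) == 0:
        # Multiplication by a power of two becomes a shift.
        return (dst, "<<", a, b.bit_length() - 1)
    return inst
```

For instance, ("x", "+", "x", 0) is deleted outright, while ("y", "*", "x", 8) becomes the shift ("y", "<<", "x", 3).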
5) Use of Machine Idioms
The target machine may have hardware instructions to implement certain specific
operations efficiently. For example,
Some machines have auto-increment and auto-decrement addressing modes.
These add or subtract one from an operand before or after using its value. The use of the
modes greatly improves the quality of code when pushing or popping a stack, as in
parameter passing.
REGISTER ALLOCATION
Instructions involving only register operands are faster than those involving memory
operands. Therefore, efficient utilization of registers is vitally important in generating good
code.
One approach to register allocation and assignment is to assign specific values in the
target program to certain registers. For example, we could decide to assign base addresses to one
group of registers, arithmetic computations to another, the top of the stack to a fixed register, and
so on.
Advantage: It simplifies the design of a code generator.
Disadvantage : It uses registers inefficiently; certain registers may go unused over
substantial portions of code, while unnecessary loads and stores are generated into the
other registers.
Global Register Allocation
All live variables were stored at the end of each block. To save some of these stores and
corresponding loads, we might arrange to assign registers to frequently used variables and keep
these registers consistent across block boundaries (globally).
Since programs spend most of their time in inner loops, a natural approach to global
register assignment is to try to keep a frequently used value in a fixed register throughout a loop.
One strategy for global register allocation is to assign some fixed number of registers to
hold the most active values in each inner loop. The selected values may be different in different
loops. Registers not already allocated may be used to hold values local to one block.
Drawback : The fixed number of registers is not always the right number to make
available for global register allocation.
Usage Counts
The savings to be realized by keeping a variable x in a register for the duration of a loop L
is one unit of cost for each reference to x if x is already in a register.
To generate code for a block, there is a good chance that after x has been computed in a
block it will remain in a register if there are subsequent uses of x in that block.
Thus we count a savings of one for each use of x in loop L that is not preceded by an
assignment to x in the same block. We also save two units if we can avoid a store of x at the end
of a block.
Thus, if x is allocated a register, we count a savings of two for each block in loop L for
which x is live on exit and in which x is assigned a value.
If x is live on entry to the loop header, we must load x into its register just before entering
loop L. This load costs two units. Similarly, for each exit block B of loop L at which x is live on
entry to some successor of B outside of L, we must store x at a cost of two.
Thus, an approximate formula for the benefit to be realized from allocating a register to x within loop L is

    sum over blocks B in L of ( use(x, B) + 2 * live(x, B) )

where use(x, B) is the number of times x is used in B prior to any definition of x; live(x, B) is 1 if x is live on exit from B and is assigned a value in B, and live(x, B) is 0 otherwise.
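The savings formula is straightforward to evaluate once the per-block counts are known. A small sketch, assuming use(x, B) and live(x, B) have already been computed for each block (the sample figures are illustrative, chosen to be consistent with the benefit of 4 computed for a in the example):

```python
def register_benefit(blocks):
    """Approximate savings from keeping x in a register throughout a loop:
    sum over blocks B of use(x, B) + 2 * live(x, B).

    blocks maps a block name to a (use, live) pair, where use is the
    count of uses of x before any definition, and live is 1 iff x is
    live on exit from the block and assigned a value in it.
    """
    return sum(use + 2 * live for use, live in blocks.values())
```

With uses of a in two blocks and a live-and-assigned in B1 only, the benefit is 2 + 2 = 4.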
Example:
Consider the basic blocks in the inner loop depicted in Fig. A. Assume registers R0, R1, and R2 are allocated to hold values throughout the loop. The variables live on entry into and on exit from each block are shown in the figure.
Fig. A
To evaluate the benefit for a, we observe that a is live on exit from B1 and is assigned a value there, but is not live on exit from B2, B3, or B4. Thus, the sum over blocks B in L of use(a, B) is 2, and the live term contributes 2 for B1. Hence the benefit for a is 4. That is, four units of cost can be saved by selecting a for one of the global registers. The values for b, c, d, e, and f are 5, 3, 6, 4, and 4, respectively.
Register Assignment for Outer Loops
If an outer loop L1 contains an inner loop L2, the names allocated registers in L2 need not
be allocated registers in L1 L2. Similarly, if we choose to allocate x a register in L2 but not L1,
we must load x on entrance to L2 and store x on exit from L2.
Register Allocation by Graph Coloring
When a register is needed for a computation but all available registers are in use, the
contents of one of the used registers must be stored (spilled) into a memory location in order to
free up a register. Graph coloring is a simple technique for allocating registers and managing
register spills.
In the method, two passes are used.
1) Target-machine instructions are selected as though there are an infinite number of
symbolic registers; in effect, names used in the intermediate code become names of
registers and the three-address instructions become machine-language instructions.
2) Assign physical registers to symbolic ones. The goal is to find an assignment that
minimizes the cost of spills.
In the second pass, for each procedure a register-interference graph is constructed in
which the nodes are symbolic registers and an edge connects two nodes if one is live at a point
where the other is defined.
A graph is said to be colored if each node has been assigned a color in such a way that no
two adjacent nodes have the same color. A color represents a register, and the color makes sure
that no two symbolic registers that can interfere with each other are assigned the same physical
register.
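The coloring idea can be illustrated with a greedy sketch. This is not the full Chaitin-style simplify/spill algorithm (which orders nodes using a worklist and may rebuild the graph after spilling); it simply colors nodes in a fixed order and marks a node for spilling when no color is free:

```python
def color_registers(interference, k):
    """Greedy coloring of a register-interference graph.

    interference: dict mapping each symbolic register to the set of
    registers it interferes with (its neighbours).
    k: number of physical registers (colors).
    Returns a dict mapping each node to a color in range(k), or to
    None when the node must be spilled to memory.
    """
    colors = {}
    for node in sorted(interference):          # fixed order for determinism
        taken = {colors.get(n) for n in interference[node]}
        free = [c for c in range(k) if c not in taken]
        colors[node] = free[0] if free else None   # None => spill
    return colors
```

For a triangle of mutually interfering registers and k = 2 colors, exactly one of the three must be spilled, while an isolated register colors freely.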
Garbage collection via reference count
Data that cannot be referenced is generally known as garbage. Many high-level programming languages remove the burden of manual memory management from the programmer by offering automatic garbage collection, which deallocates unreachable data.
Design Goals for Garbage Collectors
Garbage collection is the reclamation of chunks of storage holding objects that can no
longer be accessed by a program. Objects have a type that can be determined by the garbage
collector at run time. From the type information, we can tell how large the object is and which
components of the object contain references to other objects.
A Basic Requirement: Type Safety
A language in which the type of any data component can be determined is said to be type
safe. There are type-safe languages like ML, for which we can determine types at compile time.
There are other type-safe languages, like Java, whose types cannot be determined at compile time but can be determined at run time.
The latter are called dynamically typed languages. If a language is neither statically nor
dynamically type safe, then it is said to be unsafe.
Unsafe languages, which unfortunately include some of the most important languages such as C and C++, are bad candidates for automatic garbage collection.
Performance Metrics
The performance metrics that must be considered when designing a garbage collector are:
Overall Execution Time
Garbage collection can be very slow. It is important that it not significantly increase the
total run time of an application. Since the garbage collector necessarily must touch a lot of data,
its performance is determined greatly by how it leverages the memory subsystem.
Space Usage.
It is important that garbage collection avoid fragmentation and make the best use of the
available memory.
Pause Time.
Simple garbage collectors are notorious for causing programs (the mutators) to pause suddenly for an extremely long time, as garbage collection kicks in without warning. Thus,
besides minimizing the overall execution time, it is desirable that the maximum pause time be
minimized. As an important special case, real-time applications require certain computations to
be completed within a time limit. We must either suppress garbage collection while performing
real-time tasks, or restrict maximum pause time. Thus, garbage collection is seldom used in real-
time applications.
Program Locality.
We cannot evaluate the speed of a garbage collector solely by its running time. The
garbage collector controls the placement of data and thus influences the data locality of the
mutator program. It can improve a mutator's temporal locality by freeing up space and reusing it;
it can improve the mutator's spatial locality by relocating data used together in the same cache or
pages.
Reachability
We refer to all the data that can be accessed directly by a program, without having to dereference any pointer, as the root set. For example, in Java the root set of a program consists of all the static field members and all the variables on its stack. A program obviously can reach any member of its root set at any time.
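Reachability beyond the root set is transitive: an object is reachable if a root references it, or if some reachable object references it. A minimal sketch, modeling the heap as a hypothetical dict from object names to the names they reference:

```python
def reachable(root_set, fields):
    """Compute the set of reachable objects by transitive traversal.

    root_set: iterable of object names directly accessible to the program.
    fields: dict mapping an object name to the names it references.
    """
    seen = set()
    stack = list(root_set)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(fields.get(obj, ()))   # follow references in the object
    return seen
```

Anything not in the returned set is garbage; a tracing collector reclaims exactly that complement.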
Reachability becomes a bit more complex when the program has been optimized by the
compiler.
First, a compiler may keep reference variables in registers. These references must also
be considered part of the root set.
Second, even though in a type-safe language programmers do not get to manipulate
memory addresses directly, a compiler often does so for the sake of speeding up the
code.
Here are some things an optimizing compiler can do to enable the garbage collector
to find the correct root set:
The compiler can restrict the invocation of garbage collection to only certain code points
in the program, when no "hidden" references exist.
The compiler can write out information that the garbage collector can use to recover all
the references, such as specifying which registers contain references, or how to compute
the base address of an object that is given an internal address.
The compiler can assure that there is a reference to the base address of all reachable objects whenever the garbage collector may be invoked.
There are four basic operations that a mutator performs to change the set of reachable objects:
Object Allocations. These are performed by the memory manager, which returns a
reference to each newly allocated chunk of memory. This operation adds members to the
set of reachable objects.
Parameter Passing and Return Values. References to objects are passed from the actual input parameter to the corresponding formal parameter, and from the returned result back to the caller. Objects pointed to by these references remain reachable.
Reference Assignments. Assignments of the form u = v, where u and v are references,
have two effects. First, u is now a reference to the object referred to by v. As long as u is
reachable, the object it refers to is surely reachable. Second, the original reference in u is
lost. If this reference is the last to some reachable object, then that object becomes
unreachable. Any time an object becomes unreachable, all objects that are reachable only
through references contained in that object also become unreachable.
Procedure Returns. As a procedure exits, the frame holding its local variables is popped
off the stack. If the frame holds the only reachable reference to any object, that object
becomes unreachable. Again, if the now unreachable objects hold the only references to
other objects, they too become unreachable, and so on.
Reference Counting Garbage Collectors
Consider a simple garbage collector based on reference counting, which identifies garbage as an object changes from being reachable to unreachable; the object can be deleted when its count drops to zero. With a reference-counting garbage collector, every object must have a field for the reference count.
Reference counts can be maintained as follows:
1. Object Allocation. The reference count of the new object is set to 1.
2. Parameter Passing. The reference count of each object passed into a
procedure is incremented.
3. Reference Assignments. For statement u = v, where u and v are references,
the reference count of the object referred to by v goes up by one,
and the count for the old object referred to by u goes down by one.
4. Procedure Returns. As a procedure exits, all the references held by the
local variables of that procedure activation record must also be decremented.
If several local variables hold references to the same object, that
object's count must be decremented once for each such reference.
5. Transitive Loss of Reachability. Whenever the reference count of an object
becomes zero, we must also decrement the count of each object pointed
to by a reference within the object.
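Rules 1, 3, and 5 above can be sketched in a toy heap model. All names here (RefCountHeap, set_field, and so on) are hypothetical; note that, like any pure reference counter, this sketch cannot reclaim cyclic garbage:

```python
class RefCountHeap:
    """Toy reference-counting collector (sketch; cannot reclaim cycles)."""

    def __init__(self):
        self.count = {}    # object name -> reference count
        self.fields = {}   # object name -> names of objects it references

    def allocate(self, obj):
        self.count[obj] = 1          # rule 1: new object starts with count 1
        self.fields[obj] = []

    def set_field(self, parent, child):
        """Store a reference to child inside parent, bumping child's count."""
        self.fields[parent].append(child)
        self.count[child] += 1

    def decrement(self, obj):
        """Drop one reference to obj (rules 3 and 4); rule 5 cascades."""
        self.count[obj] -= 1
        if self.count[obj] == 0:
            # Transitive loss of reachability: children lose a reference too.
            for child in self.fields.pop(obj):
                self.decrement(child)
            del self.count[obj]      # object reclaimed
```

For example, after allocating A and B and storing a reference to B inside A, dropping the direct reference to B leaves it alive (A still points to it); dropping the reference to A then reclaims both.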