A Model-Based Framework: An Approach For Profit-Driven Optimization
1. Introduction
The field of optimization has been extremely successful over the past 40+ years. As new languages and new architectures have been introduced, novel and effective optimizations have been developed to target and exploit both the software and hardware innovations. Many reports from research and commercial projects have indicated that the performance of software improves significantly through aggressive optimization. Most successes in the field have come from the development of particular optimizations, such as loop optimizations and path-sensitive optimizations. Several long-standing problems with optimizations, however, have not been adequately addressed, because optimizations continued to yield performance improvements. These problems include knowing which optimizations to apply and when, where, in which order (i.e., phase ordering) and in which configuration (e.g., the tile size in loop tiling) to apply them for the best improvement.

A number of events are occurring that demand solutions to these problems. First, because of the continued growth of embedded systems and the critical importance of time-to-market in this domain, there is an energetic movement to write embedded software in high-level languages. The use of high-level languages in this area requires a high-quality optimizing compiler that can intelligently apply optimizations to achieve the highest performance improvement. Another activity that has brought optimization problems to the forefront is the trend toward dynamic optimization. To be effective, dynamic optimization requires a good understanding of certain properties of optimizations. Currently, it is unclear when and where to apply optimizations dynamically, and how aggressive optimization can be while remaining profitable after factoring in the cost of applying it. Last, although new optimizations continue to be developed and applied, the performance improvement is shrinking. The question, then, is whether the optimization field has reached its limit, or whether further improvements depend on solutions to these problems. We believe the latter is true.

Traditionally, heuristics have been used to address some of the challenges of applying optimizations. However, heuristics tend to be ad hoc and focus on a single optimization or a small class of optimizations. Heuristics also require tuning parameters to select appropriate threshold values. The success of a heuristic can depend on these values, and the best choice can vary for different optimizations and code contexts. To systematically tackle these problems, we need to better understand the properties of optimizations, especially operational properties. We define optimization properties as either semantic or operational. Semantic properties deal with the semantics of the optimizations and include correctness, soundness and optimization specification. Operational properties target the application
of optimizations and include profitability and interaction of optimizations. Although research on many of these properties has been limited, there has recently been a flurry of activity focusing on optimization properties.

There are two approaches to exploring the properties of optimizations. One is through formal techniques, which include developing formal specifications, analytic models, and proofs through model checking and theorem provers [20, 24, 33, 15, 32]. The other approach is experimental: the properties are evaluated by actually applying optimizations and executing the optimized code. This approach is mostly used for exploring operational properties, which are useful for determining when, where and how to apply optimizations [31, 9, 6, 17, 1, 18]. Because of the high cost of applying optimizations and experimentally evaluating their properties [9, 18], our research focuses on formally investigating operational properties of optimizations through analytic models. With analytic models, we can study, for example, the profitability of optimizations. Our goal is also to model the interactions among optimizations and then use the models to predict the impact of a sequence of optimizations without actually applying them.

In this paper, as a step toward our goal, we present a framework of analytic models for exploring the profitability of optimizations. In particular, we address the specific problems of how scalar optimizations impact registers, computation (i.e., functional units) and overall performance. A number of research efforts have shown that applying an optimization can degrade performance [4, 36]. To avoid this degradation, we use our framework to first predict the profitability of applying an optimization at a program point. Then, based on whether there is a profit, we either apply it or not. The profitability of optimizations depends on the code context, the particular optimizations and the machine resources, all of which need to be modeled. Thus, the framework includes models for code context, optimizations and resources. As part of the framework, we have a Profitability Engine that uses the models to predict the profitability of applying an optimization at any code point where it is applicable.

We developed models for a number of optimizations, including copy propagation, constant propagation, dead code elimination, Partial Redundancy Elimination (PRE) and Loop Invariant Code Motion (LICM). In this paper, we focus on the models for PRE and LICM. Models for the other optimizations are useful when considering the impact of a sequence of optimizations, which is beyond the scope of this paper. We implemented the models and the profitability engine for both optimizations and compared profit-driven PRE and LICM with a heuristic-driven approach. Our experiments demonstrate that a model-based approach is effective and efficient in that it can accurately predict the profitability of optimizations
with reasonable overhead. By determining the profitability, we can intelligently select profitable optimizations to apply in a systematic way. The contributions of this paper include:

- A conceptual framework for investigating optimization profitability. The framework includes analytic code models, optimization models, and resource models for cache, registers and computation, and a profitability engine that uses the models to determine the performance profit.
- An implementation of the framework for scalar optimizations (in particular PRE and LICM) that uses the profitability of PRE and LICM to determine when to apply them.
- An experimental evaluation demonstrating that the model-based approach for predicting the profitability of optimizations is effective and efficient.
- A general model-based technique that can be used to study properties of optimizations.
[Figure 1. Profit-driven optimization framework. Code models (the computation instruction list and register live ranges), optimization models (changes on the cache access sequence, on the instruction list, and on live ranges, plus a register allocation model producing spills on live ranges) and resource models (cache, computation and register configuration & cost) all feed the Profitability Engine.]

Our framework, given in Figure 1, has three types of analytic models (code, optimization and resource models)
and a profitability engine that processes the models and computes the profit. The resources considered are cache, registers and functional units. The focus of this paper is the performance profit (i.e., execution time). However, other resources, such as code size and power/energy, can also be modeled and included in the framework. Register allocation is an optimization, but it also plays a part in determining the impact of other optimizations on registers. Thus, an optimization model for register allocation is shown separately in Figure 1.
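To make the engine's role concrete, the following sketch shows how profit-driven application could be driven in code. This is a minimal illustration assuming Python; the function names are ours, not the authors' implementation.

```python
# A minimal sketch of profit-driven optimization, assuming Python.
# predict_profit stands in for the Profitability Engine, which evaluates
# the code, optimization, and resource models at a candidate code point.
def apply_if_profitable(site, predict_profit, apply_opt):
    """Apply an optimization at 'site' only when a profit is predicted."""
    profit = predict_profit(site)  # predicted cycles saved (may be negative)
    if profit > 0:
        apply_opt(site)            # profitable: perform the transformation
        return True
    return False                   # unprofitable: leave the code unchanged
```

This loop is what distinguishes profit-driven optimization from always applying: the transformation itself never runs unless the models predict a benefit.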
An optimization model expresses the effect of an optimization as a series of basic edits. For example, constant propagation can be expressed as: Delete variable v at statement P; Insert constant c at statement P. To determine the impact of the optimizations on registers, an optimization model for the register allocator must be developed. The characteristics of the register allocator that need to be modeled are whether the allocator is local or global and how it spills the live ranges (i.e., the number of additional loads and stores that are inserted into the code). A model for the register allocator can be constructed that approximates a particular register allocation scheme, say graph coloring [7, 11] or linear scan [26]. In this work, we are interested in the impact of other optimizations on registers rather than the impact of a particular register allocation scheme. Hence, we only need a representative register allocation model, such as coloring.
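Returning to the basic-edit encoding above, a small sketch makes it concrete. This is our own encoding, assuming Python; the field names are inferred from the paper's edit notation (e.g., <Ins v Def at 4>), not taken from its implementation.

```python
from dataclasses import dataclass
from typing import Literal

# One basic edit: insert or delete a definition or use of an operand at a
# statement. This mirrors the paper's notation such as <Ins v Def at 4>.
@dataclass(frozen=True)
class Edit:
    action: Literal["Ins", "Del"]  # insert or delete
    operand: str                   # variable, constant, or temporary
    kind: Literal["DEF", "USE"]    # definition or use of the operand
    block: int                     # basic block of the affected statement
    stmt: int                      # statement position within the block

# Constant propagation of constant c into a use of v at statement (2, 5):
constant_propagation = [
    Edit("Del", "v", "USE", block=2, stmt=5),
    Edit("Ins", "c", "USE", block=2, stmt=5),
]
```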
For example, assume the impact of an optimization on registers is desired. The engine inputs the code model for registers, a model for this optimization, an optimization model for register allocation, and a resource model of registers. It then determines the changes to the live ranges (i.e., the code model for registers) using an incremental data flow algorithm [25]. Since an optimization models its changes by basic edits [3], the engine takes the edits and computes the changes in live ranges using Table 1. The table describes how the code changes of an optimization affect live ranges. In this table, pre-s means the point immediately before statement s, while post-s means the point immediately after statement s. For example, the effects on live ranges of inserting a use of v (first row) depend on the current code. If v is already live at post-s, there is no change. If there is a use of v in the block before s, then the only change is to the local live range of v. Otherwise, the live range of v has changed, and v has to be added to IN, the set of live variables at the beginning of the block, and to all reaching predecessors. The profitability engine then uses a register allocation model to determine the spills (i.e., loads and stores) caused by these live range changes. The last step for the engine is to use the number of spills, and where these loads and stores are inserted or deleted, to compute the profit.
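The first row of Table 1 can be read as the following procedure. The sketch assumes Python and a simplified CFG encoding of our own; for brevity it propagates through all predecessors without stopping at blocks that define v.

```python
# Live-range update for the basic edit "Ins v USE [block, stmt]".
def insert_use(v, block, stmt, live_post, local_uses, IN, preds):
    """live_post[(b, s)]: variables live immediately after statement s in b.
    local_uses[b]: statement positions in block b that already use v.
    IN[b]: variables live at the beginning of block b.
    preds[b]: predecessor blocks of b."""
    if v in live_post[(block, stmt)]:
        return "no change"                    # v already live at post-s
    if any(u < stmt for u in local_uses.get(block, ())):
        return "local live range only"        # earlier use of v in this block
    # Otherwise v becomes live on entry to the block: add v to IN and
    # propagate the change to all reaching predecessors.
    worklist = [block]
    while worklist:
        b = worklist.pop()
        if v not in IN.setdefault(b, set()):
            IN[b].add(v)
            worklist.extend(preds.get(b, ()))
    return "live range extended to IN and reaching predecessors"
```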
[Figure 2. An example of PRE impacting registers: PRE decreases the number of register spills by one (assuming there are 5 available hardware registers).]

The impact of PRE and LICM on computation is clear: they insert or delete instructions at some program points. Their impact on registers is more complicated and depends on the code context. Sometimes PRE or LICM may introduce more register spills, while in other cases they may decrease the number of spills. In Figure 2, we show an example where PRE improves the register pressure by removing one register spill. In the figure, PRE moves the last uses of a and b up in the code and thus shortens their live ranges, but it introduces a new live range
[Figure 3. Models for the example in Figure 2. The figure shows: the code model for registers before and after PRE (IN/OUT live-variable sets per block); the Register Allocation OPT Model (1) global graph coloring, (2) inserting or deleting a live range within a critical region will cost or save a spill, (3) record the defs and uses in critical regions, (4) choose the least costly live range to spill; the Register Resource Model (5 hardware registers; average memory access time of 3 cycles); and the Profitability Engine's trace of live range changes, in which <Del a Use at 8> yields the critical region (6, 7, 8) with its recorded def-uses (1 use of a, 1 use of b, 1 use of c, 2 uses of d, 1 use of g, 1 use of f) and decreases one spill (a), while the other edits (<Ins a Use at 4>, <Ins b Use at 4>, <Ins v Def at 4>, <Del b Use at 8>, <Ins v Use at 8>, <Del c Def at 3>, <Ins v Def at 3>, <Ins v Use at 3>, <Ins c Def at 3>) cause no spill change. The output is the profitability on registers.]
for the temporary variable, v. However, if a and b were used later, their live ranges would remain the same; in that case, the total number of live ranges increases by one because of the temporary variable. In the next sections, we present models for PRE and LICM. Figure 3 shows our framework predicting the impact of PRE on registers for the example in Figure 2. The models and the profitability engine in this example are explained below. The code model for computation represents the type and locations of the instructions involved in the optimization. We represent each instruction element as
⟨op, (B1, N1), ..., (Bm, Nm)⟩, where op is the opcode of the instruction, Bi represents a block number, and Ni expresses the static number of op instructions in block Bi.
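As a small hedged example of this representation (the opcode and counts below are invented for illustration):

```python
# <op, (B1, N1), ..., (Bm, Nm)>: "add" occurs three times statically in
# block 2 and once in block 5.
add_element = ("add", [(2, 3), (5, 1)])
```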
each variable and temporary, and choose to spill the live range that is the least costly to spill, under the assumption that the register allocator typically performs well. That is, we choose the live range that spans the critical region with the smallest number of definitions and uses. This register allocation optimization model is input to the profitability engine (see the next section), which then computes the critical regions for each basic code change, records the definitions and uses of variables, and determines the spills. In Figure 3, the box labeled Register Allocation OPT Model shows the optimization model for the register allocator described above.
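The spill-selection rule just described can be sketched as follows, assuming Python; the data structures are illustrative stand-ins for the allocator's internal state.

```python
# Representative register-allocation model: spill the live range that spans
# the critical region with the fewest definitions and uses.
def choose_spill(critical_region, live_ranges, def_use_counts):
    """critical_region: statement ids where demand exceeds the registers.
    live_ranges[v]: set of statement ids spanned by the live range of v.
    def_use_counts[v]: defs and uses of v inside the critical region."""
    # Only a live range covering the whole region relieves the pressure.
    candidates = [v for v, span in live_ranges.items()
                  if critical_region <= span]
    if not candidates:
        return None  # no spanning live range; a real allocator would split
    # Assume the allocator performs well: pick the cheapest candidate,
    # i.e., the one with the fewest defs and uses (least reload traffic).
    return min(candidates, key=lambda v: def_use_counts.get(v, 0))
```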
PRE Optimization Model:

IF the partially redundant computation exp (X op Y) is met:
    exp is partially redundant at [Bs, Ss]; move it to [Bd, Sd] and assign a new temporary V;
    T1 at [Ba1, Sa1], ..., Tn at [Ban, San] are the redundant expressions
THEN
    step 1: Ins exp USE [Bd, Sd]
    ...
    for each redundant expression Ti at [Bai, Sai]:
        Del Ti DEF [Bai, Sai]
        Ins V DEF [Bai, Sai]
        Ins Ti DEF [Bai+1, Sai+1]
        Ins V USE [Bai+1, Sai+1]
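The rule above can be flattened into a basic-edit list mechanically. The sketch below, in Python, is our own rendering: per the Figure 3 trace, the temporary V is also defined at the motion point, and the copy of V into Ti is placed at the statement following each redundant site (written [Bai+1, Sai+1] in the model).

```python
# Generate PRE's basic edits as (action, operand, kind, block, stmt) tuples.
def pre_edits(exp, V, dest, redundant):
    """dest = (Bd, Sd); redundant = [(Ti, Bai, Sai), ...]."""
    Bd, Sd = dest
    edits = [("Ins", exp, "USE", Bd, Sd),  # step 1: evaluate exp at [Bd, Sd]
             ("Ins", V, "DEF", Bd, Sd)]    # V holds the hoisted result
    for Ti, Ba, Sa in redundant:
        edits += [
            ("Del", Ti, "DEF", Ba, Sa),      # stop recomputing exp into Ti
            ("Ins", V, "DEF", Ba, Sa),       # V takes over at the old site
            ("Ins", Ti, "DEF", Ba, Sa + 1),  # copy Ti = V at the following
            ("Ins", V, "USE", Ba, Sa + 1),   # statement
        ]
    return edits
```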
LICM Optimization Models: LICM moves a statement from the body of a loop and places it outside the loop. There are certain conditions that must be met to safely apply LICM. The actions are similar to PRE (in fact, LICM can be thought of as a subcase of PRE), and the resulting optimization models for registers and computation are similar. Based on code movements, the models can predict the register impact (with live ranges, as described for PRE) and the computation impact (with code edits and motions, as described for PRE). We do not show these models because they are similar to those for PRE.
engine inputs the code model for registers, a model for this optimization, a register allocation optimization model, and a resource model for registers. It determines the changes in the live ranges according to the optimization model. Then it computes the benefit/cost in terms of the spills (i.e., loads and stores) changed by the optimization, according to the register allocation optimization model. That is, for each live range change, the engine finds the critical regions and records the number of definitions and uses in the critical regions. When a live range is inserted or deleted within a critical region, the engine chooses the least costly live range to spill and computes the cost or benefit associated with the spill. An example is shown in Figure 3 (see the box labeled Profitability Engine). The Profitability Engine determines the changes on the code model and, for each change, determines how the spills are affected. For brevity, only the detailed actions for deleting the use of a at statement 8 are shown. The critical region is from line 6 to line 8. The uses and definitions in the critical region are also recorded. When this live range is deleted, one spill is removed, and a is chosen. The benefit of this PRE on registers comes from saving this spill. If the profits for all the resources, namely registers and computation, are to be combined, they must use the same metric. The computation profit considers the frequency of a node; therefore, the register profit also needs to consider the execution frequency of the loads or stores, based on either profiling or input from the user.
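Putting the pieces together, the profit computation just described amounts to frequency-weighting both kinds of change into a single cycle metric. A sketch, assuming Python; the input encodings and the 3-cycle memory cost (from the register resource model in Figure 3) are illustrative.

```python
# Combine register profit and computation profit into one metric (cycles).
def total_profit(spill_changes, inst_changes, freq, mem_cycles=3):
    """spill_changes: (block, delta_spills) pairs; +1 means a spill added.
    inst_changes: (block, delta_insts, cycles_per_inst) per code change.
    freq[block]: execution frequency from node profiling (or the user)."""
    # Each avoided spill saves a load/store every time its block executes.
    reg = sum(-d * mem_cycles * freq[b] for b, d in spill_changes)
    # Deleted instructions (negative delta) save their cycles per execution.
    comp = sum(-d * cyc * freq[b] for b, d, cyc in inst_changes)
    return reg + comp  # positive => the optimization is predicted profitable
```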
4. Experimental Results

To evaluate the effectiveness of our framework and the usefulness of profit-driven optimization, we implemented models for PRE and LICM, described in Section 3.3, and integrated them into the Mach SUIF compiler [30]. We compared our profit-driven PRE and LICM with always applying an optimization and with a heuristic-driven PRE and LICM that takes register pressure into consideration. We extended the PRE pass implemented by Rolaz [19] in Mach SUIF and implemented LICM [23]. To enable more PRE and LICM opportunities, we also applied passes of copy propagation, constant propagation, and dead code elimination before PRE or LICM. For the experiments, we used the SPEC2K benchmarks that can be compiled by the currently available Mach SUIF compiler (gzip, vpr, mcf, parser, vortex, bzip2, and twolf). We used a dual-processor AMD Athlon 1.4 GHz machine with 2 GB of memory running RedHat Linux. Using the training data sets, we performed node profiling with the HALT library (included in Mach SUIF) to get the frequency counts used in our computation models. In all experiments, each benchmark was run three times on a lightly loaded machine, and the average execution time was computed to factor out system effects. Section 4.1 presents the performance improvement of heuristic-driven PRE and LICM. In Section 4.2, we compare our profit-driven PRE and LICM with always applying and with heuristic-driven PRE and LICM in terms of performance improvement and compile time. Section 4.3 describes the verification of our models in terms of their prediction accuracy.
approach, we cannot always achieve the best performance improvement.

Table 2. Performance improvement (%) of heuristic-driven PRE with different limits
Benchmark   Limit 4   Limit 8
gzip          3.75      0.88
vpr           3.78      1.81
mcf           0.75      2.31
parser        2.35      1.70
vortex        1.50      4.66
bzip2         5.25      8.19
twolf         7.52      1.14
[Figure 6. Performance benefit of profit-driven PRE compared with heuristic-driven PRE: performance improvement (%) of A-PRE, Best-heuristic, Heuristic-8, and P-PRE on gzip, vpr, mcf, parser, vortex, bzip2, twolf, and the geomean.]
4.2. Comparing Profit-driven PRE and LICM with Heuristic-driven PRE and LICM
4.2.1. Performance Benefit

Using our model-based framework, we can determine the profitability of an optimization and selectively apply it. The cases where optimizations degrade performance can be avoided. Figures 6 and 7 show the performance benefit of profit-driven PRE and LICM over the baseline, compared with always applying and with heuristic-driven PRE and LICM. In Figure 6, A-PRE is the improvement from always applying PRE when it is applicable. Heuristic-driven PRE is as described above and has two versions based on the register pressure allowed: Best-heuristic is the best-case performance across the various register pressure limits for each benchmark, and Heuristic-8 uses a fixed limit of eight. Lastly, P-PRE represents the performance benefit of our profit-driven PRE. Figure 7 shows the same configurations applied to LICM.

In Figure 6, the performance benefit of the different approaches for deciding when to apply PRE is shown. The problem with always applying PRE when it is applicable is that it may increase register pressure, which may incur more spills and thus degrade performance. Both the heuristic approach and our approach can avoid the unprofitable PREs. However, the selection of limits in the heuristics plays an important role in the performance benefit, as described in Section 4.1. Our P-PRE considers both register pressure and computation to predict the profitability of PRE and applies it accordingly, without requiring parameters to be tuned. It consistently performs as well as or better than the Best-heuristic for PRE, except for bzip2, where predictions are sometimes incorrect.
[Figure 7. Performance benefit of profit-driven LICM compared with heuristic-driven LICM: performance improvement (%) of A-LICM, Best-heuristic, Heuristic-8, and P-LICM on gzip, vpr, mcf, parser, vortex, bzip2, twolf, and the geomean.]
Figure 7 shows the performance benefit of the different approaches for applying LICM. Due to the register pressure increase caused by some applications of LICM, the overall performance of A-LICM can be improved by not applying the unprofitable ones. Although heuristic-driven LICM can achieve a performance improvement over always applying in some cases, it is very important to choose the right limit on the allowed register pressure. For example, in parser, with a register pressure limit of 8, heuristic-driven LICM is worse than always applying, while with the best heuristic it is better than always applying. Our profit-driven LICM performs at least as well as best-heuristic LICM in most cases. However, in one case (gzip), due to incorrect predictions, profit-driven
LICM has worse performance than the heuristic-driven approach. Thus, our experiments show that a model-based approach can be used to explore and determine the profitability of optimizations, and this profitability property can be useful in deciding when to apply optimizations. The profitability measure predicted by our framework also has other uses, such as serving as the fitness value in a heuristic search for the best optimization sequence [1, 18].

4.2.2. Compile-time

Because our approach uses analytic models to make decisions about applying optimizations, we investigated how compile time is impacted by profit-driven optimization. Tables 4 and 5 show the compile time of the different optimization strategies for PRE and LICM.

Table 4. Compile time for PRE (in seconds)
Benchmark   A-PRE    Heuristic PRE   P-PRE
gzip        42       45.14           48.78
vpr         128.38   193             216
mcf         20.89    29              30.5
parser      100.67   123             136.31
vortex      490.48   575.33          633.1
bzip2       33.77    42.6            44.1
twolf       755.55   1087            1187.16
From Table 4, the A-PRE compile time varies from approximately 20 seconds to 755 seconds. The compile time shown for the heuristic approach is the average over the different limits. It increases 7% to 50% over A-PRE, with an average of 30%, because heuristic-driven PRE has to compute and update live range information. The compile time for profit-driven PRE increases over A-PRE by 16% to 68%, with an average of 40%. Because P-PRE considers computation and register pressure in a more precise way than the heuristic-driven PRE, it incurs a modest overhead over the heuristic approach, by an average of 8.3%. From Table 5, similar compile-time trends can be seen for A-LICM, heuristic LICM and P-LICM. The A-LICM compile time varies from approximately 20 seconds to 591 seconds. Heuristic LICM increases compile time over A-LICM by 11% to 38% (average 24%), and P-LICM increases compile time over A-LICM by 15% to 55% (average 31%). Finally, in comparison to heuristic LICM, our P-LICM increases compile time by an average of 6.2%. As the tables show, the increase in compile time of our profit-driven approach is modest and about the same as that of the heuristic-driven approach. These small increases show that our approach is feasible and efficient. Moreover, our prototype has several implementation artifacts that hurt performance; a production implementation could decrease the compile time further. We conclude that the modest compile-time increase is worth the benefit of applying the optimizations more effectively.
4.3. Prediction Accuracy

Table 6 shows the prediction accuracy of our framework for PRE and LICM. In the table, TP is the total number of predictions and CP is the number of
correct predictions when using our framework. %Acc is the overall percentage accuracy of our framework. As the table shows, the prediction accuracy varies from 78% to 96%, with an average of 88%. The results demonstrate that our models are indeed accurate: they can correctly predict the profit (or cost), and the profit-driven optimizations can achieve a performance benefit. On average, 12% of the time our framework made inaccurate predictions. The inaccuracy comes primarily from a simplifying assumption in the register optimization model about how the register allocator spills registers. The model assumes that the allocator selects the spill priority based solely on the number of uses and definitions in a live range. However, the Mach SUIF register allocator also uses the number of conflicting edges in the interference graph to make its decisions. Note that even without this detailed implementation information, our models achieve good prediction accuracy. If more accuracy is needed, our models can be improved by incorporating such implementation information.
5. Related Work
In the introduction, we indicated prior work on optimization properties. In this section, we discuss prior work that relates to the profitability of optimizations. To our knowledge, ours is the first work that focuses on predicting the impact of scalar optimizations, and in particular their impact on registers and computation. Our previous paper developed a framework with code, loop optimization and cache models, and demonstrated that the benefit of applying loop optimizations to the cache could be predicted [36]. That work relied on models that had already been developed for the cache and array access sequences [12, 14]. It did not consider scalar optimizations, registers or computation. In this paper, we develop a more powerful and general framework that has a profitability engine as well as models, and thus can be used for many types of optimizations. There have been several approaches to addressing the problems of applying optimizations. One approach to discovering the best optimization configuration uses an analytic model of machine resources to statically estimate the performance of the optimized code instead of executing it [31]. However, because optimizations are not modeled in this approach, they still need to be applied. Another approach selects an optimization level for recompiling methods based on an experimental resource model [2, 21]. The optimizer uses a simple benefit-cost analysis to decide whether to recompile a method at a higher optimization level. The benefit of an optimization level is estimated as a constant by offline experiments. However, this model does not include some aspects of
optimization behavior (e.g., the effect of optimizations depends on the code context). The last approach is based on analytic models of code, optimizations and resources [34, 35, 27, 22, 8, 29, 5, 16, 12, 28]. The idea is to use a resource cost model (e.g., cache cost) and optimization models (e.g., unimodular matrix transformations) to select a program-specific sequence or configuration of optimizations that maximizes the benefit. These techniques demonstrate that analytic models are efficient in driving the application of optimizations. However, all of these techniques use models that express only a small set of optimizations (loop optimizations and data optimizations) and mainly attack a single problem, namely improving cache performance [9]. Research on register pressure sensitive PRE [13] sets upper limits on the allowable register pressure and then performs redundancy elimination within these limits. In this paper, we develop independent models of optimizations, whereas register pressure sensitive PRE uses a data flow analysis to determine register pressure that is integrated with the PRE algorithm and works only for PRE. It also does not consider the impact of PRE on computation.
6. Conclusions
In this paper, we presented a novel model-based framework that can be used to determine the profitability of optimizations. This work, coupled with prior work that considered loop optimizations, has a wide range of applicability in terms of optimizations and resources. Here, we demonstrated the value of our framework for the scalar optimizations PRE and LICM. Our model-based technique can make accurate predictions without applying the optimizations and executing the optimized code. As such, the potential exists for faster searches over different optimization sequences to determine an effective optimization order, since we do not have to actually apply the optimizations or run the resulting code. Although our focus was on exploring the profitability property, other properties can be explored using the model-based approach. For example, we believe that models can be used to explore the interaction property. Using models, a good sequence of optimizations could be found without the added expense of applying and then removing optimizations (undoing an optimization or storing two versions of the code).
7. Acknowledgements
This research is supported in part by the National Science Foundation, Next Generation Software, grants CNS-0305198 and CNS-0203945. We would also like to thank John Regehr and the anonymous reviewers for their useful suggestions on how to improve the paper.
8. References
[1] L. Almagor, K. Cooper, A. Grosul, T. Harvey, S. Reeves, D. Subramanian, L. Torczon and T. Waterman. Finding effective compilation sequences. ACM 2004 Conf. on Languages, Compilers, and Tools for Embedded Systems.
[2] M. Arnold, S. Fink, D. Grove, M. Hind and P. F. Sweeney. Adaptive optimization in the Jalapeño JVM. ACM 2000 Conf. on Object-Oriented Programming, Systems, Languages, and Applications.
[3] M. P. Bivens and M. L. Soffa. Incremental register reallocation. Software Practice & Experience, 20(10), 1990.
[4] P. Briggs and K. D. Cooper. Effective partial redundancy elimination. SIGPLAN '94 Conf. on Programming Language Design and Implementation.
[5] B. Chandramouli, J. Carter, W. Hsieh and S. McKee. A cost framework for evaluating integrated restructuring optimizations. Intl. Conf. on Parallel Architectures and Compilation Techniques, September 2001.
[6] K. Cooper, T. J. Harvey, D. Subramanian and L. Torczon. Compilation order matters. Technical Report, Rice University, 2002.
[7] G. Chaitin. Register allocation and spilling via graph coloring. ACM SIGPLAN Symp. on Compiler Construction, June 1982.
[8] S. Coleman and K. S. McKinley. Tile size selection using cache organization and data layout. SIGPLAN '95 Conf. on Programming Language Design and Implementation, June 1995.
[9] K. Cooper, D. Subramanian and L. Torczon. Adaptive optimizing compilers for the 21st century. The Journal of Supercomputing, August 2002.
[10] E. Duesterwald, R. Gupta and M. L. Soffa. A practical framework for demand-driven interprocedural data flow analysis. ACM Transactions on Programming Languages and Systems, November 1997.
[11] L. George and A. Appel. Iterated register coalescing. ACM Transactions on Programming Languages and Systems, May 1996.
[12] S. Ghosh, M. Martonosi and S. Malik. Cache miss equations: a compiler framework for analyzing and tuning memory behavior. ACM Transactions on Programming Languages and Systems, July 1999.
[13] R. Gupta and R. Bodík. Register pressure sensitive redundancy elimination. 8th Intl. Conf. on Compiler Construction, 1999.
[14] J. S. Hu, M. Kandemir, N. Vijaykrishnan, M. J. Irwin, H. Saputra and W. Zhang. Compiler-directed cache polymorphism. Proc. of LCTES/SCOPES, June 2002.
[15] C. Jaramillo, R. Gupta and M. L. Soffa. Comparison checking: an approach to avoid debugging of optimized code. Proc. of Foundations of Software Engineering, 1999.
[16] M. Kandemir, J. Ramanujam and A. Choudhary. Improving cache locality by a combination of loop and data transformations. IEEE Transactions on Computers, February 1999.
[17] T. Kisuki, P. M. W. Knijnenburg and M. F. P. O'Boyle. Combined selection of tile size and unroll factors using iterative compilation. Intl. Conf. on Parallel Architectures and Compilation Techniques, 2000.
[18] P. Kulkarni, S. Hines, J. Hiser, D. Whalley, J. Davidson and D. Jones. Fast searches for effective optimization phase sequences. SIGPLAN '04 Conf. on Programming Language Design and Implementation, 2004.
[19] L. Rolaz. An implementation of lazy code motion for MachSUIF. URL: http://lapwww.epfl.ch/dev/machsuif/opt_passes/lcm.pdf
[20] S. Lerner, T. Millstein and C. Chambers. Automatically proving the correctness of compiler optimizations. SIGPLAN '03 Conf. on Programming Language Design and Implementation, 2003.
[21] U. Hölzle and D. Ungar. Reconciling responsiveness with performance in pure object-oriented languages. ACM Transactions on Programming Languages and Systems, July 1996.
[22] K. McKinley, S. Carr and C. Tseng. Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems, July 1996.
[23] S. S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers, 1997.
[24] G. C. Necula. Translation validation for an optimizing compiler. SIGPLAN 2000 Conf. on Programming Language Design and Implementation.
[25] L. Pollock and M. L. Soffa. An incremental version of iterative data flow analysis. IEEE Transactions on Software Engineering, December 1989.
[26] M. Poletto and V. Sarkar. Linear scan register allocation. ACM Transactions on Programming Languages and Systems, September 1999.
[27] W. Pugh. Uniform techniques for loop optimization. 5th Intl. Conf. on Supercomputing, 1991.
[28] V. Sarkar and N. Megiddo. An analytic model for loop tiling and its solution. Intl. Symp. on Performance Analysis of Systems and Software, 2000.
[29] V. Sarkar. Automatic selection of high-order transformations in the IBM XL FORTRAN compilers. IBM Journal of Research and Development, May 1997.
[30] M. D. Smith and G. Holloway. An Introduction to Machine SUIF and Its Portable Libraries for Analysis and Optimization. URL: http://www.eecs.harvard.edu/hube/software
[31] S. Triantafyllis, M. Vachharajani, N. Vachharajani and D. I. August. Compiler optimization-space exploration. Intl. Symp. on Code Generation and Optimization, 2003.
[32] D. Whitfield and M. L. Soffa. An approach to ordering optimizing transformations. ACM SIGPLAN Symp. on Principles & Practice of Parallel Programming, 1990.
[33] D. Whitfield and M. L. Soffa. An approach for exploring code improving transformations. ACM Transactions on Programming Languages and Systems, November 1997.
[34] M. Wolf and M. Lam. A data locality optimizing algorithm. SIGPLAN '91 Conf. on Programming Language Design and Implementation.
[35] M. E. Wolf, D. E. Maydan and D. Chen. Combining loop transformations considering caches and scheduling. Intl. Symp. on Microarchitecture, 1996.
[36] M. Zhao, B. R. Childers and M. L. Soffa. Predicting the impact of optimizations for embedded systems. ACM Conf. on Languages, Compilers, and Tools for Embedded Systems, 2003.