Improving register allocation for subscripted variables

Published: 01 April 2004

Abstract

Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of individual array elements. This deficiency is particularly troublesome for floating-point registers, which are most often used as temporary repositories for subscripted variables. In this paper, we present a source-to-source transformation, called scalar replacement, that finds opportunities for reuse of subscripted variables and replaces the references involved by references to temporary scalar variables. The objective is to increase the likelihood that these elements will be assigned to registers by the coloring-based register allocators found in most compilers. In addition, we present transformations to improve the overall effectiveness of scalar replacement and show how these transformations can be applied in a variety of loop nest types. Finally, we present experimental results showing that these techniques are extremely effective, capable of achieving integer-factor speedups over code generated by good optimizing compilers of conventional design.
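
As a rough illustration of the idea (not code taken from the paper), the sketch below applies scalar replacement by hand to a simple recurrence. Python is used only to show the shape of the source-to-source rewrite. The function names and the temporary `t` are assumptions made for this example, and the register benefit itself would only appear in a compiled language such as Fortran or C.

```python
def recurrence_original(a, b):
    """Loop as written: every iteration re-reads a[i - 1] from the array,
    even though the previous iteration just computed and stored it."""
    a = list(a)  # work on a copy so the caller's list is untouched
    for i in range(1, len(a)):
        a[i] = a[i - 1] + b[i]
    return a


def recurrence_scalar_replaced(a, b):
    """Same loop after a hand-applied scalar replacement: the reused
    element is carried across iterations in the scalar t, so the loop
    body no longer reads from a at all."""
    a = list(a)
    if not a:
        return a
    t = a[0]  # hypothetical temporary holding the value of a[i - 1]
    for i in range(1, len(a)):
        t = t + b[i]  # the use of a[i - 1] becomes a use of t
        a[i] = t      # the store to a[i] is kept; t now holds a[i]
    return a
```

Both versions compute the same result. The second exposes the reuse as an ordinary scalar, which is the form a coloring-based register allocator can keep in a register.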


    Published In

    ACM SIGPLAN Notices  Volume 39, Issue 4
    20 Years of the ACM SIGPLAN Conference on Programming Language Design and Implementation 1979-1999: A Selection
    April 2004
    673 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/989393

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Cited By

    • (2021) Measurement and Analysis of GPU-Accelerated OpenCL Computations on Intel GPUs. 2021 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools), pages 26-35. DOI: 10.1109/ProTools54808.2021.00009. Online publication date: Nov-2021
    • (2010) Parallel hyperbolic PDE simulation on clusters: Cell versus GPU. Computer Physics Communications, 181:12, pages 2164-2179. DOI: 10.1016/j.cpc.2010.07.049. Online publication date: Dec-2010
    • (2010) On minimizing register usage of linearly scheduled algorithms with uniform dependencies. Computer Languages, Systems and Structures, 36:3, pages 250-267. DOI: 10.1016/j.cl.2009.12.001. Online publication date: 1-Oct-2010
    • (2009) Compute Unified Device Architecture Application Suitability. Computing in Science and Engineering, 11:3, pages 16-26. DOI: 10.1109/MCSE.2009.48. Online publication date: 1-May-2009
    • (2008) Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, pages 73-82. DOI: 10.1145/1345206.1345220. Online publication date: 20-Feb-2008
