Memory-hierarchy management
Publisher:
  • Rice University
  • 6100 S. Main Houston, TX
  • United States
Order Number: UMI Order No. GAX94-08602
Abstract

The trend in high-performance microprocessor design is toward increasing computational power on the chip. Microprocessors can now process dramatically more data per machine cycle than previous models. Unfortunately, memory speeds have not kept pace. The result is an imbalance between computation speed and memory speed. This imbalance is leading machine designers to use more complicated memory hierarchies. In turn, programmers are explicitly restructuring codes to perform well on particular memory systems, leading to machine-specific programs.

It is our belief that machine-specific programming is a step in the wrong direction. Compilers, not programmers, should handle machine-specific implementation details. To this end, this thesis develops and experiments with compiler algorithms that manage the memory hierarchy of a machine for floating-point intensive numerical codes. Specifically, we address the following issues:

Scalar replacement. Standard data-flow analysis lacks information about the flow of array values, which prevents array reuse from being captured in registers. We develop and experiment with a technique to perform scalar replacement in the presence of conditional control flow to expose array reuse to standard data-flow algorithms.
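
As an illustration of the idea only, the sketch below applies scalar replacement by hand to a small loop whose body contains a branch. The loop, the function names, and the arrays `a`, `b`, `c`, and `mask` are hypothetical and do not come from the thesis, and the thesis's analysis for deciding when the rewrite is legal is not reproduced. A Python local stands in for a machine register here, so the sketch shows the shape of the transformation rather than any speedup.

```python
def recurrence_original(a, b, c, mask):
    """Reference loop: a[i - 1] is re-read from the array whenever the branch is taken."""
    a = list(a)  # work on a copy so the caller's list is untouched
    for i in range(1, len(a)):
        if mask[i]:
            a[i] = a[i - 1] + b[i]
        else:
            a[i] = c[i]
    return a


def recurrence_scalar_replaced(a, b, c, mask):
    """Same loop after hand-applied scalar replacement.

    Every path through iteration i - 1 defines a[i - 1], so the value can be
    carried in the scalar t instead of being reloaded from the array.
    """
    a = list(a)
    if not a:
        return a
    t = a[0]  # initial value that the first iteration would have loaded
    for i in range(1, len(a)):
        if mask[i]:
            t = t + b[i]  # reuses the value produced one iteration earlier
        else:
            t = c[i]      # the other branch also defines the carried value
        a[i] = t          # the store to the array is kept; only the load is removed
    return a
```

Both functions return the same list for the same inputs; the second simply never reads `a[i - 1]` inside the loop.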

Unroll-and-jam. Many loops require more data per cycle than can be processed by the target machine. We present and experiment with an automatic technique to apply unroll-and-jam to such loops to reduce their memory requirements.
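
To make the transformation concrete, here is a minimal hand-written sketch of unroll-and-jam on a matrix-vector product. The kernel, the function names, and the unroll factor of 2 are assumptions chosen for illustration; the thesis's automatic method for choosing when and how far to unroll is not shown. The outer `j` loop is unrolled by two and the two resulting copies of the inner loop are fused ("jammed") into one body.

```python
def matvec_original(M, x):
    """y = M x with the column loop outermost: per multiply-add, two loads and one store."""
    rows, cols = len(M), len(x)
    y = [0] * rows
    for j in range(cols):
        for i in range(rows):
            y[i] = y[i] + M[i][j] * x[j]
    return y


def matvec_unroll_and_jam(M, x):
    """Same computation with the outer loop unrolled by 2 and the inner loops jammed."""
    rows, cols = len(M), len(x)
    y = [0] * rows
    main = cols - cols % 2  # largest even prefix of the column range
    for j in range(0, main, 2):
        xj0, xj1 = x[j], x[j + 1]  # invariant in the inner loop
        for i in range(rows):
            yi = y[i]                     # one load of y[i] now serves two updates
            yi = yi + M[i][j] * xj0
            yi = yi + M[i][j + 1] * xj1
            y[i] = yi                     # one store instead of two
    for j in range(main, cols):           # cleanup loop for an odd column count
        for i in range(rows):
            y[i] = y[i] + M[i][j] * x[j]
    return y
```

Counting array accesses in the inner loop, the original performs two loads and one store per multiply-add, while the jammed body performs three loads and one store per two multiply-adds. The updates to each `y[i]` happen in the same order in both versions, so the results match.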

Loop interchange. Cache locality in programs run on advanced microprocessors is critical to performance. We develop and experiment with a technique to order loops within a nest to attain good cache locality.
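
The effect of loop order on locality can be illustrated with a toy count, sketched below. This is not the thesis's technique for ordering loops: it is a hypothetical helper that assumes a row-major `rows x cols` array and a made-up cache line holding `line_elems` elements, and it only counts how often consecutive accesses land on a different line. It is not a cache simulation.

```python
def line_switches(rows, cols, order, line_elems=8):
    """Count moves to a different cache line while visiting a row-major array.

    order="ij" puts the row loop outermost (unit stride in memory);
    order="ji" puts the column loop outermost (stride of `cols` elements).
    """
    if order == "ij":
        visits = ((i, j) for i in range(rows) for j in range(cols))
    elif order == "ji":
        visits = ((i, j) for j in range(cols) for i in range(rows))
    else:
        raise ValueError("order must be 'ij' or 'ji'")

    switches = 0
    last_line = None
    for i, j in visits:
        line = (i * cols + j) // line_elems  # line holding element (i, j)
        if line != last_line:
            switches += 1
            last_line = line
    return switches
```

For a 16 x 16 array with 8 elements per line, the `"ij"` order changes line 32 times (once per line), while the `"ji"` order changes line on all 256 accesses. Interchanging the loops so that the unit-stride loop is innermost is what removes that difference.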

Blocking. Iteration-space blocking is a technique used to attain temporal locality within cache. Although it has been applied to "simple" kernels, there has been no investigation into its applicability over a range of algorithmic styles. We show how to apply blocking to loops with trapezoidal-, rhomboidal-, and triangular-shaped iteration spaces. In addition, we show how to overcome certain complex dependence patterns.
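
For the triangular case, a minimal sketch of what blocking looks like is given below, using a lower-triangular matrix-vector product. The kernel, the function names, and the `block` parameter are illustrative assumptions, not code from the thesis, and the complex dependence patterns the thesis addresses do not arise in this kernel. The `i` loop is strip-mined into blocks and the `j` loop is moved outside the strip, with `max` and `min` bounds keeping the iteration space triangular.

```python
def lower_tri_matvec(L, x):
    """y = L x for lower-triangular L: iterates over all pairs with 0 <= j <= i < n."""
    n = len(x)
    y = [0] * n
    for i in range(n):
        for j in range(i + 1):
            y[i] = y[i] + L[i][j] * x[j]
    return y


def lower_tri_matvec_blocked(L, x, block=2):
    """Same iteration space, blocked over i so each x[j] is reused across a block of rows."""
    n = len(x)
    y = [0] * n
    for ii in range(0, n, block):          # strip of rows [ii, hi)
        hi = min(ii + block, n)
        for j in range(hi):                # columns needed by any row in the strip
            xj = x[j]                      # loaded once per strip
            for i in range(max(ii, j), hi):  # rows in the strip with i >= j
                y[i] = y[i] + L[i][j] * xj
    return y
```

Each row `i` belongs to exactly one strip and still accumulates its terms in increasing `j`, so the blocked version visits the same `(i, j)` pairs and produces the same result as the original for any `block >= 1`.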

Experiments with the above techniques have shown that integer-factor speedups on a single chip are possible. These results reveal that many numerical algorithms can be expressed in a natural, machine-independent form while retaining good memory performance through the use of compiler optimizations.

Contributors
  • Western Michigan University
