The trend in high-performance microprocessor design is toward increasing computational power on the chip. Microprocessors can now process dramatically more data per machine cycle than previous models. Unfortunately, memory speeds have not kept pace. The result is an imbalance between computation speed and memory speed. This imbalance is leading machine designers to use more complicated memory hierarchies. In turn, programmers are explicitly restructuring codes to perform well on particular memory systems, leading to machine-specific programs.
It is our belief that machine-specific programming is a step in the wrong direction. Compilers, not programmers, should handle machine-specific implementation details. To this end, this thesis develops and experiments with compiler algorithms that manage the memory hierarchy of a machine for floating-point intensive numerical codes. Specifically, we address the following issues:
Scalar replacement. Standard data-flow analysis carries no information about the flow of individual array values, so it cannot capture the reuse of array elements in registers. We develop and experiment with a technique that performs scalar replacement in the presence of conditional control flow, exposing array reuse to standard data-flow algorithms.
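As a minimal sketch of the transformation (illustrative code, not taken from the thesis; the function names are hypothetical), consider a first-order recurrence in which the value loaded as a[i] in one iteration is reused as a[i-1] in the next. Scalar replacement carries that value across iterations in a scalar, which the register allocator can keep in a register:

```c
#include <stddef.h>

/* Original loop: two array loads per iteration.  To standard
 * data-flow analysis the references a[i] and a[i-1] are opaque,
 * so the redundant reload of a[i-1] is never removed. */
void smooth_naive(const double *a, double *b, size_t n) {
    for (size_t i = 1; i < n; i++)
        b[i] = a[i] + a[i - 1];
}

/* After scalar replacement: the cross-iteration value lives in the
 * scalar `prev`, leaving one array load per iteration. */
void smooth_scalar_replaced(const double *a, double *b, size_t n) {
    if (n < 2) return;
    double prev = a[0];          /* value flowing in from iteration i-1 */
    for (size_t i = 1; i < n; i++) {
        double cur = a[i];
        b[i] = cur + prev;
        prev = cur;              /* carry the value to the next iteration */
    }
}
```

Once the reuse is expressed through `prev`, ordinary scalar data-flow analysis and register allocation eliminate the redundant memory reference; no special array-flow machinery is needed downstream.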
Unroll-and-jam. Many loops demand more data per cycle than the target machine can deliver. We present and experiment with an automatic technique that applies unroll-and-jam to such loops to reduce their memory requirements.
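The idea can be sketched on matrix multiplication (a hypothetical example, not drawn from the thesis): unrolling the `j` loop by two and jamming the resulting copies into a single inner loop lets one load of `a[i][k]` feed two accumulators, halving the traffic on that reference stream. The sketch assumes `n` is even.

```c
/* Straightforward matrix multiply: every inner-loop iteration
 * issues a load of a[i][k] and a load of b[k][j]. */
void matmul_naive(int n, double a[n][n], double b[n][n], double c[n][n]) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += a[i][k] * b[k][j];
            c[i][j] = s;
        }
}

/* After unroll-and-jam: the j loop is unrolled by 2 and the two
 * copies of the k loop are jammed together, so each load of
 * a[i][k] now serves two columns of b. */
void matmul_unroll_jam(int n, double a[n][n], double b[n][n], double c[n][n]) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j += 2) {
            double s0 = 0.0, s1 = 0.0;
            for (int k = 0; k < n; k++) {
                double aik = a[i][k];    /* one load feeds both sums */
                s0 += aik * b[k][j];
                s1 += aik * b[k][j + 1];
            }
            c[i][j]     = s0;
            c[i][j + 1] = s1;
        }
}
```

The transformation trades a lower load-per-flop ratio for higher register pressure, which is why the thesis couples it with a model of the target machine's memory requirements.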
Loop interchange. On advanced microprocessors, cache locality is critical to performance. We develop and experiment with a technique that orders the loops within a nest to attain good cache locality.
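A small sketch of the payoff (illustrative, not from the thesis): C stores arrays in row-major order, so a nest whose inner loop varies the row index strides through memory, touching a new cache line on nearly every access. Interchanging the loops makes the inner loop stride-1:

```c
/* Poor order for C: the inner loop varies the row index i, so
 * consecutive accesses are `cols` doubles apart in memory. */
double sum_column_order(int rows, int cols, double m[rows][cols]) {
    double s = 0.0;
    for (int j = 0; j < cols; j++)
        for (int i = 0; i < rows; i++)
            s += m[i][j];
    return s;
}

/* After interchange: the inner loop varies j, walking contiguous
 * memory and using every element of each cache line it fetches. */
double sum_row_order(int rows, int cols, double m[rows][cols]) {
    double s = 0.0;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            s += m[i][j];
    return s;
}
```

Both nests compute the same sum; the interchange is legal here because the loop body carries no dependence between iterations, and only the memory-access order changes.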
Blocking. Iteration-space blocking is a technique used to attain temporal locality within cache. Although it has been applied to "simple" kernels, there has been no investigation into its applicability over a range of algorithmic styles. We show how to apply blocking to loops with trapezoidal-, rhomboidal-, and triangular-shaped iteration spaces. In addition, we show how to overcome certain complex dependence patterns.
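The rectangular base case of blocking can be sketched on matrix transposition (an illustrative example, not code from the thesis, which extends the technique to trapezoidal, rhomboidal, and triangular spaces). Processing the iteration space in b-by-b tiles keeps the working set of both arrays small enough to stay in cache; the sketch assumes `b` divides `n`:

```c
/* Unblocked transpose: writes to t walk a column, so for large n
 * every write of a row of a touches a different cache line of t,
 * and lines are evicted before their other elements are used. */
void transpose_naive(int n, double a[n][n], double t[n][n]) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            t[j][i] = a[i][j];
}

/* Blocked transpose: the ii/jj loops walk b-by-b tiles of the
 * iteration space; within a tile, the lines of both a and t fit
 * in cache and are fully used before moving on. */
void transpose_blocked(int n, int b, double a[n][n], double t[n][n]) {
    for (int ii = 0; ii < n; ii += b)           /* tile loops */
        for (int jj = 0; jj < n; jj += b)
            for (int i = ii; i < ii + b; i++)   /* intra-tile loops */
                for (int j = jj; j < jj + b; j++)
                    t[j][i] = a[i][j];
}
```

Non-rectangular iteration spaces complicate only the tile-loop bounds, not the intra-tile body, which is what makes blocking extensible to the triangular and trapezoidal shapes the thesis addresses.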
Experiments with the above techniques have shown that integer-factor speedups on a single chip are possible. These results reveal that many numerical algorithms can be expressed in a natural, machine-independent form while retaining good memory performance through the use of compiler optimizations.