A super-scalar processor is one that is capable of sustaining an instruction-execution rate of more than one instruction per clock cycle. Maintaining this execution rate is primarily a problem of scheduling processor resources (such as functional units) for high utilization. A number of scheduling algorithms have been published, with wide-ranging claims of performance over the single-instruction issue of a scalar processor. However, a number of these claims are based on idealizations or on special-purpose applications.
This study uses trace-driven simulation to evaluate many different super-scalar hardware organizations. It uses general-purpose benchmark programs executed with a typical RISC instruction set. Highly-optimized versions of the benchmark programs are used, to avoid measuring concurrency that is due to a lack of compiler optimization. However, the compiler performs no optimizations specifically for the super-scalar processor, to provide the fairest measure of super-scalar hardware performance. In contrast to previous studies, this study examines a wide range of cost and performance tradeoffs, rather than focusing on one specific processor organization or scheduling algorithm. Furthermore, the results are not based on idealizations; for example, they include the effects of realistic functional-unit latencies, instruction and data caches, and multi-tasking.
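The trace-driven approach described above can be sketched in miniature: replay a recorded instruction trace against a model of the issue logic and count cycles. The trace format, latencies, and greedy in-order policy below are illustrative assumptions for exposition, not the simulator used in the study.

```python
# Toy trace-driven scheduler: issue up to issue_width instructions per
# cycle, stalling on unresolved register dependences. Latencies and the
# trace format are assumed for illustration.
from collections import namedtuple

Instr = namedtuple("Instr", "op dst srcs")

LATENCY = {"add": 1, "load": 2, "mul": 3}  # assumed functional-unit latencies

def schedule(trace, issue_width=1):
    """Return the cycle count for a greedy in-order schedule of the trace."""
    ready = {}  # register -> cycle its value becomes available
    cycle, issued_this_cycle = 0, 0
    for ins in trace:
        # Earliest start: current cycle, or when all source operands are ready.
        start = max([cycle] + [ready.get(r, 0) for r in ins.srcs])
        if start > cycle or issued_this_cycle == issue_width:
            cycle, issued_this_cycle = max(start, cycle + 1), 0
        ready[ins.dst] = cycle + LATENCY[ins.op]
        issued_this_cycle += 1
    return cycle + 1  # total cycles consumed
```

Comparing cycle counts for `issue_width=1` against wider decoders on the same trace is, in essence, how scalar and super-scalar organizations are contrasted.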
Within this framework, super-scalar performance is limited primarily by instruction-fetch inefficiencies caused by both branch delays and instruction misalignment. Because of this instruction-fetch limitation, it is not worthwhile to explore highly-concurrent execution hardware. Rather, it is more appropriate to explore economical execution hardware that more closely matches the instruction throughput provided by the instruction fetcher. This study examines techniques for reducing the instruction-fetch inefficiencies and explores the resulting hardware organizations.
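One standard technique for reducing the branch-delay component of fetch inefficiency is a two-bit saturating-counter branch predictor. The sketch below is a generic textbook version with an assumed table size and trivial index hashing, not the specific prediction hardware evaluated in the study.

```python
# Two-bit saturating-counter branch predictor. Counters run 0..3;
# values >= 2 predict taken, giving hysteresis so a single mispredicted
# branch does not flip the prediction. Table size is an assumption.
class TwoBitPredictor:
    def __init__(self, entries=1024):
        self.table = [1] * entries  # start weakly not-taken
        self.mask = entries - 1     # entries must be a power of two

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

The two-bit hysteresis matters for loops: the single not-taken branch at loop exit does not cause a misprediction on the next entry to the loop.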
This study concludes that a super-scalar processor can have nearly twice the performance of a scalar processor, but this requires four major hardware features: out-of-order execution, register renaming, branch prediction, and a four-instruction decoder. These features are interdependent, and removing any single feature reduces average performance by 18% or more. However, there are many hardware simplifications that cause only a small performance reduction.
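Of the four features named above, register renaming is perhaps the least self-explanatory: it maps each architectural destination register to a fresh physical register so that write-after-write and write-after-read hazards no longer constrain out-of-order execution. The following is a minimal sketch with assumed register-file sizes, not the renaming organization evaluated in the study.

```python
# Register renaming via a map table and free list. Each new destination
# gets a fresh physical register, so only true (read-after-write)
# dependences remain. Sizes (32 architectural, 64 physical) are assumed.
class Renamer:
    def __init__(self, n_arch=32, n_phys=64):
        self.map = list(range(n_arch))           # arch reg -> current phys reg
        self.free = list(range(n_arch, n_phys))  # unallocated phys regs

    def rename(self, dst, srcs):
        """Return (phys_dst, phys_srcs) for one decoded instruction."""
        phys_srcs = tuple(self.map[s] for s in srcs)  # read current mappings first
        phys_dst = self.free.pop(0)                   # fresh dest kills WAW/WAR hazards
        self.map[dst] = phys_dst
        return phys_dst, phys_srcs
```

Because two writes to the same architectural register receive distinct physical registers, instructions between them that read the old value can execute in any order relative to the second write.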