Super-scalar processor design
Publisher: Stanford University, 408 Panama Mall, Suite 217, Stanford, CA, United States
Order Number: UMI Order No. GAX89-25892
Abstract

A super-scalar processor is one that is capable of sustaining an instruction-execution rate of more than one instruction per clock cycle. Maintaining this execution rate is primarily a problem of scheduling processor resources (such as functional units) for high utilization. A number of scheduling algorithms have been published, with wide-ranging claims of performance over the single-instruction issue of a scalar processor. However, a number of these claims are based on idealizations or on special-purpose applications.

This study uses trace-driven simulation to evaluate many different super-scalar hardware organizations. It uses general-purpose benchmark programs executed with a typical RISC instruction set. Highly-optimized versions of the benchmark programs are used, to avoid measuring concurrency that is due to a lack of compiler optimization. However, the compiler performs no optimizations specifically for the super-scalar processor, to provide the fairest measure of super-scalar hardware performance. In contrast to previous studies, this study examines a wide range of cost and performance tradeoffs, rather than focusing on one specific processor organization or scheduling algorithm. Furthermore, the results are not based on idealizations; for example, they include the effects of realistic functional-unit latencies, instruction and data caches, and multi-tasking.

Within this framework, super-scalar performance is limited primarily by instruction-fetch inefficiencies caused by both branch delays and instruction misalignment. Because of this instruction-fetch limitation, it is not worthwhile to explore highly-concurrent execution hardware. Rather, it is more appropriate to explore economical execution hardware that more closely matches the instruction throughput provided by the instruction fetcher. This study examines techniques for reducing the instruction-fetch inefficiencies and explores the resulting hardware organizations.
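
The effect of misalignment and taken branches on fetch throughput can be illustrated with a toy model. The sketch below is not the trace-driven simulator used in the study; it assumes a hypothetical fetcher that reads one aligned block of `width` instructions per cycle and stops at the end of the block or at the first taken branch. The names `fetch_cycles` and `fetch_rate`, the trace format, and the example addresses are all illustrative assumptions.

```python
def fetch_cycles(trace, width=4):
    """Count fetch cycles for a dynamic trace under a simplified model.

    trace: list of (address, taken) pairs in execution order, where
           address is an instruction index and taken marks a taken branch.
    Assumption: one aligned block of `width` instructions is read per
    cycle, and fetching stops at the block end or after a taken branch.
    """
    cycles = 0
    i = 0
    while i < len(trace):
        cycles += 1
        block = trace[i][0] // width  # aligned block holding the current PC
        while i < len(trace):
            address, taken = trace[i]
            if address // width != block:
                break  # next instruction lies in another block
            i += 1
            if taken:
                break  # branch target must be fetched in a later cycle
    return cycles


def fetch_rate(trace, width=4):
    """Average instructions delivered per fetch cycle."""
    return len(trace) / fetch_cycles(trace, width)


# A three-instruction loop executed four times, ending in a taken branch.
# Misaligned: the loop body straddles two aligned blocks (addresses 6..8).
misaligned = [(6, False), (7, False), (8, True)] * 4
# Aligned: the same loop fits inside one aligned block (addresses 4..6).
aligned = [(4, False), (5, False), (6, True)] * 4

print(fetch_rate(misaligned))  # 1.5
print(fetch_rate(aligned))     # 3.0
```

Under these assumptions, the same loop delivers half as many instructions per cycle when it straddles a block boundary, and in neither case does the fetcher reach the full width of four, which is the kind of mismatch between fetch and execution capacity the paragraph above describes.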

This study concludes that a super-scalar processor can have nearly twice the performance of a scalar processor, but that this requires four major hardware features: out-of-order execution, register renaming, branch prediction, and a four-instruction decoder. These features are interdependent, and removing any single feature reduces average performance by 18% or more. However, there are many hardware simplifications that cause only a small performance reduction.

