Superscalar multiprocessor design

January 1991

Author:
Mike Johnson
Advanced Micro Devices

Publisher:

Prentice-Hall, Inc.
Division of Simon and Schuster, One Lake Street, Upper Saddle River, NJ
United States

ISBN: 978-0-13-875634-5

Published: 03 January 1991

Pages: 288


Abstract

No abstract available.

Cited By

Gebregiorgis A, Du Nguyen H, Yu J, Bishnoi R, Taouil M, Catthoor F and Hamdioui S (2022). A Survey on Memory-centric Computer Architectures, ACM Journal on Emerging Technologies in Computing Systems, 18:4, (1-50), Online publication date: 31-Oct-2022.
Yu J, Yan M, Khyzha A, Morrison A, Torrellas J and Fletcher C (2021). Speculative taint tracking (STT), Communications of the ACM, 64:12, (105-112), Online publication date: 1-Dec-2021.
Christie D, Clark M and Schulte M (2021). What Made Us Stronger: An Inside Look Back at the History of AMD Microprocessor Development, IEEE Micro, 41:6, (29-36), Online publication date: 1-Nov-2021.
Aşılıoğlu G, Jin Z, Köksal M, Javeri O and Önder S (2015). LaZy superscalar, ACM SIGARCH Computer Architecture News, 43:3S, (260-271), Online publication date: 4-Jan-2016.
Aşılıoğlu G, Jin Z, Köksal M, Javeri O and Önder S LaZy superscalar Proceedings of the 42nd Annual International Symposium on Computer Architecture, (260-271)
Jin Z, Aşilioğlu G and Önder S Mower Proceedings of the 29th ACM on International Conference on Supercomputing, (285-294)
Dubey P, O'Brien K, O'Brien K and Barton C Single-program speculative multithreading (SPSM) architecture Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, (109-121)
Farrens M, Tyson G and Pleszkun A A study of single-chip processor/cache organizations for large numbers of transistors Proceedings of the 21st annual international symposium on Computer architecture, (338-347)
Farrens M, Tyson G and Pleszkun A (1994). A study of single-chip processor/cache organizations for large numbers of transistors, ACM SIGARCH Computer Architecture News, 22:2, (338-347), Online publication date: 1-Apr-1994.

Contributors

Mike Johnson
Advanced Micro Devices, Inc.
- Publication years: 1991 - 1991
- Publication count: 1
- Citation count: 9
- Available for download: 0
- Downloads (cumulative): 0
- Downloads (12 months): 0
- Downloads (6 weeks): 0
- Average downloads per article: 0
- Average citations per article: 9

Index Terms

Superscalar multiprocessor design

Reviews

Reviewer: Ashoke Deb

Miniaturization of computers, and the resulting proximity of their building blocks, usually shortens data transfer time. To maximize the execution speed of programs, many interrelated parameters need to be considered. The total execution time of a program can be minimized by minimizing the number of instructions per program, the average number of cycles per instruction, and the processor clock cycle time. Achieving these goals requires investigating issues such as the size and complexity (packing) of an instruction (RISC, CISC, and VLIW); parallelism (data parallelism and instruction parallelism); multiplicity of operations (multiple load/store, parallel fetch, parallel decode, and multiple functional units); detection and exploitation of parallelism (out-of-order issue and out-of-order execution); design and control of the pipelines; memory architecture; and the bus or communication network among the elements of the machine. Johnson focuses on the design issues as they relate to general-purpose superscalar RISC machines.

The book contains 12 chapters, more than 150 figures and charts, 10 tables, an appendix, and a bibliography containing more than 80 recent papers. The tables and the figures are insightful: for example, tables show a “Comparison of Scalar and Superscalar Pipelines” and a “Critical Path for Central Window Issuing.” Figures include such items as “Performance Growth of Scalar and Superscalar Processor,” “Performance of Scoreboarding Compared to Renaming,” and “Relative Contribution of Reservation Stations to Lost Instruction Bandwidth.”

Chapter 1 is “Beyond Pipelining, CISC, and RISC.” Chapter 2, “An Introduction to Superscalar Concepts,” covers their fundamental limitations, instruction issues and machine parallelism, the related concepts of VLIW and superpipelined processors, and unrelated parallel schemes. Chapter 3, “Developing an Execution Model,” discusses the simulation technique, benchmarking performance, basic observations on hardware design, the design of the standard processor, the real performance limit, and background. In Chapter 4, “Instruction Fetching and Decoding,” Johnson presents branches and instruction-fetch inefficiencies, improving fetch efficiency, implementing hardware branch prediction, implementing a four-instruction decoder, implementing branches, and reducing the penalty of procedural dependencies. Chapter 5, “The Role of Exception Recovery,” includes buffering state information for restart, restart implementation and its effect on performance, and observations on processor restart. Chapter 6, “Register Dataflow,” covers dependency mechanisms, result buses and arbitration, result forwarding, and supplying instruction operands.

Chapter 7, “Out-of-Order Issue,” discusses reservation stations and implementing a central instruction window, and offers some observations. Chapter 8, “Memory Dataflow,” presents the ordering of loads and stores, and addressing and dependencies. Johnson then asks, “What Is More Load/Store Parallelism Worth?” and discusses “Esoterica: Multiprocessing Considerations” and some observations on accessing external data. Chapter 9, “Complexity and Controversy,” contains a brief glimpse at design complexity, major hardware features, hardware simplifications, and a section that asks “Is the Complexity Worth It?” Chapter 10, “Basic Software Scheduling,” covers the benefits of scheduling, program information needed for scheduling, the relationship of the scheduler and the compiler, and algorithms for scheduling basic blocks. It concludes by revisiting the hardware. Chapter 11, “Software Scheduling Across Branches,” discusses trace scheduling, loop unrolling, software pipelining, global code motion, and out-of-order issue and scheduling across branches. Chapter 12, “Evaluating Alternatives: A Perspective on Superscalar Microprocessors,” is divided into two sections: “The Case for Software Solutions” and “The Case for Hardware Solutions.” An appendix presents the architecture and implementation of a superscalar 386.

It should be apparent that this work is not a general-purpose book on computer organization or computer architecture. According to the author, “This book is intended as a technical tutorial and introduction for engineers and computer scientists as well as a graduate-level text for students who have a strong background in computer architecture.” It is both specialized and special. In this extremely well-written text, Johnson systematically guides the reader through the issues, problems, and choices of a machine designer with a pragmatic viewpoint. The pragmatism is derived from extensive simulation studies using a collection of general-purpose programs, such as awk, simple, LINPACK, yacc, Whetstone, and LaTeX. I recommend this book highly for anyone interested in computer architecture.
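
The review's opening point, that total execution time is the product of instruction count, average cycles per instruction, and clock cycle time, can be illustrated with a minimal sketch. The function name and every number below are assumptions chosen for illustration; none of them come from the book or its simulations.

```python
# Illustrative sketch only: the function name and all values are hypothetical,
# not measurements from the book or the review.

def execution_time_ns(instructions: int,
                      cycles_per_instruction: float,
                      cycle_time_ns: float) -> float:
    """Total execution time = instructions x cycles/instruction x cycle time."""
    return instructions * cycles_per_instruction * cycle_time_ns


# Same program and same clock; only the average cycles per instruction differ.
scalar = execution_time_ns(1_000_000, 1.0, 10.0)       # one instruction per cycle
superscalar = execution_time_ns(1_000_000, 0.5, 10.0)  # two instructions per cycle on average

speedup = scalar / superscalar
print(f"scalar: {scalar:.0f} ns, superscalar: {superscalar:.0f} ns, speedup: {speedup:.1f}x")
```

Under these assumed values, halving the average cycles per instruction halves the execution time, provided the extra hardware does not lengthen the clock cycle or change the instruction count.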


Recommendations

Complexity-effective superscalar processors
ISCA '97: Proceedings of the 24th annual international symposium on Computer architecture

The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are ...
Register renaming for x86 superscalar design
ICPADS '96: Proceedings of the 1996 International Conference on Parallel and Distributed Systems

Register renaming eliminates storage conflicts for registers to allow more instruction level parallelism. This idea requires nontrivial implementation, however, especially when registers are accessible with different fields and data lengths. As a result,...
