WISQ: a restartable architecture using queues

Authors:

A. R. Pleszkun,

P. B. Schechter

ISCA '87: Proceedings of the 14th annual international symposium on Computer architecture

June 1987

Pages 290 - 299

https://doi.org/10.1145/30350.30383

Published: 01 June 1987

Abstract

This paper describes the WISQ architecture, which is designed to achieve high performance by exploiting new compiler technology and a highly segmented pipeline. A highly segmented pipeline permits a very-high-speed clock, but it also means a relatively long pipeline, so a way must be provided to minimize the effects of pipeline bubbles that form due to data and control dependencies. It is also important to support precise interrupts. These goals are met, in part, by providing a reorder buffer to help restore the machine to a precise state. The architecture then makes the pipelining visible to the programmer/compiler by making the reorder buffer accessible and by explicitly providing that issued instructions cannot be affected by immediately preceding ones. Compiler techniques have been identified that can take advantage of the reorder buffer and permit a sustained execution rate approaching or exceeding one instruction per clock. These techniques include trace scheduling and a relatively easy way to “undo” instructions if the predicted branch path is not taken. We have also studied ways to further reduce the effects of branches by keeping them out of the execution unit: branches are detected and resolved in the instruction fetch unit, so the execution unit is sent a stream of instructions (without branches) that are guaranteed to execute.
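
The abstract does not spell out how the WISQ reorder buffer is organized, so the following is only a generic sketch of the idea it relies on: results may finish out of order, but they are committed to the register file in program order, and the youngest entries can be discarded to "undo" instructions issued past a mispredicted branch. The names ReorderBuffer, issue, complete, retire, undo and demo are illustrative assumptions, not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    dest: str                    # destination register name
    value: Optional[int] = None  # result, filled in when execution finishes
    done: bool = False


class ReorderBuffer:
    """Toy model (hypothetical, not the WISQ design): in-order commit."""

    def __init__(self):
        self.regs = {}        # committed (precise) register state
        self.queue = deque()  # in-flight instructions, oldest at the left

    def issue(self, dest):
        """Allocate an entry at the tail in program order."""
        entry = Entry(dest)
        self.queue.append(entry)
        return entry

    def complete(self, entry, value):
        """Record a result; completion may happen in any order."""
        entry.value = value
        entry.done = True

    def retire(self):
        """Commit finished entries from the head; stop at the first unfinished one."""
        retired = 0
        while self.queue and self.queue[0].done:
            head = self.queue.popleft()
            self.regs[head.dest] = head.value
            retired += 1
        return retired

    def undo(self, count):
        """Discard the `count` youngest entries, e.g. after a mispredicted branch."""
        for _ in range(min(count, len(self.queue))):
            self.queue.pop()


def demo():
    rob = ReorderBuffer()
    a = rob.issue("r1")
    b = rob.issue("r2")
    rob.issue("r3")       # issued speculatively past a predicted branch
    rob.complete(b, 20)   # r2 finishes before r1
    rob.retire()          # commits nothing: the oldest entry is still pending
    rob.complete(a, 10)
    rob.undo(1)           # prediction was wrong: drop the r3 entry
    rob.retire()          # r1 and r2 now commit in program order
    return rob.regs
```

Because uncommitted results live only in the queue, the committed register state is always precise: an interrupt or a wrong branch prediction can be handled by discarding queue entries rather than by repairing registers.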

Published In

ISCA '87: Proceedings of the 14th annual international symposium on Computer architecture

June 1987

321 pages

ISBN:0818607769

DOI:10.1145/30350

Editor:
D. St. Clair

Copyright © 1987 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISCA87

Sponsor:

SIGARCH

ISCA87: The 14th Annual International Symposium on Computer Architecture

June 2 - 5, 1987

Pittsburgh, Pennsylvania, USA

Cited By

Sohi G, Vajapeyam S (1998). Instruction issue logic for high-performance, interruptable pipelined processors. 25 years of the international symposia on Computer architecture (selected papers), pp. 329-336. DOI: 10.1145/285930.285992. Online publication date: 1-Aug-1998
https://dl.acm.org/doi/10.1145/285930.285992
Lozano L, Gao G, Mudge T, Ebcioğlu K (1995). Exploiting short-lived variables in superscalar processors. Proceedings of the 28th annual international symposium on Microarchitecture, pp. 292-302. DOI: 10.5555/225160.225206. Online publication date: 1-Dec-1995
https://dl.acm.org/doi/10.5555/225160.225206
Lozano L, Gao G (1995). Exploiting short-lived variables in superscalar processors. Proceedings of the 28th Annual International Symposium on Microarchitecture, pp. 292-302. DOI: 10.1109/MICRO.1995.476839. Online publication date: Nov-1995
https://doi.org/10.1109/MICRO.1995.476839
Hwu W, Chang P (1992). Efficient Instruction Sequencing with Inline Target Insertion. IEEE Transactions on Computers 41:12, pp. 1537-1551. DOI: 10.1109/12.214662. Online publication date: 1-Dec-1992
https://dl.acm.org/doi/10.1109/12.214662
Sohi G (1990). Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers. IEEE Transactions on Computers 39:3, pp. 349-359. DOI: 10.1109/12.48865. Online publication date: 1-Mar-1990
https://dl.acm.org/doi/10.1109/12.48865
González A, Llaberia J, Paul G, Papatheodorou T, Gannon D, Pudue E (1989). Instruction fetch unit for parallel execution of branch instructions. Proceedings of the 3rd international conference on Supercomputing, pp. 417-426. DOI: 10.1145/318789.318884. Online publication date: 1-Jun-1989
https://dl.acm.org/doi/10.1145/318789.318884
Sohi G, Vajapeyam S (1987). Instruction issue logic for high-performance, interruptable pipelined processors. Proceedings of the 14th annual international symposium on Computer architecture, pp. 27-34. DOI: 10.1145/30350.30354. Online publication date: 1-Jun-1987
https://dl.acm.org/doi/10.1145/30350.30354
