Available instruction-level parallelism for superscalar and superpipelined machines

Authors:

N. P. Jouppi and

D. W. Wall

ACM SIGARCH Computer Architecture News, Volume 17, Issue 2

Pages 272 - 282

https://doi.org/10.1145/68182.68207

Published: 01 April 1989

Abstract

Superscalar machines can issue several instructions per cycle. Superpipelined machines can issue only one instruction per cycle, but they have cycle times shorter than the latency of any functional unit. In this paper these two techniques are shown to be roughly equivalent ways of exploiting instruction-level parallelism. A parameterizable code reorganization and simulation system was developed and used to measure instruction-level parallelism for a series of benchmarks. Results of these simulations in the presence of various compiler optimizations are presented. The average degree of superpipelining metric is introduced. Our simulations suggest that this metric is already high for many machines. These machines already exploit all of the instruction-level parallelism available in many non-numeric applications, even without parallel instruction issue or higher degrees of pipelining.
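
The abstract names the average degree of superpipelining but does not define it here. The sketch below assumes the metric is a frequency-weighted mean of operation latencies, measured in machine cycles. The formula, the function name, and the instruction mix are illustrative assumptions, not values or code from the paper.

```python
def average_degree_of_superpipelining(mix):
    """Frequency-weighted mean latency, in cycles, over an instruction mix.

    `mix` maps an instruction class name to (dynamic frequency, latency in
    cycles). This formula is an assumed reading of the metric named in the
    abstract, not a definition taken from it.
    """
    total_frequency = sum(freq for freq, _ in mix.values())
    if total_frequency <= 0:
        raise ValueError("instruction mix must have positive total frequency")
    weighted_latency = sum(freq * latency for freq, latency in mix.values())
    # Normalise so the result does not depend on the frequencies summing to 1.
    return weighted_latency / total_frequency


# Hypothetical instruction mix: the frequencies and latencies are invented
# for illustration and do not describe any real machine.
hypothetical_mix = {
    "alu": (0.50, 1),
    "load": (0.20, 2),
    "store": (0.10, 1),
    "branch": (0.15, 2),
    "floating_point": (0.05, 4),
}

if __name__ == "__main__":
    degree = average_degree_of_superpipelining(hypothetical_mix)
    print(f"average degree of superpipelining: {degree:.2f}")
```

Under this reading, a value above 1 means the typical operation takes longer than one cycle. A machine in that position already overlaps operations the way a superpipelined machine does, even though it issues only one instruction per cycle.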

Published In

ACM SIGARCH Computer Architecture News Volume 17, Issue 2

Special issue: Proceedings of ASPLOS-III: the third international conference on architecture support for programming languages and operating systems

April 1989

291 pages

ISSN:0163-5964

DOI:10.1145/68182

Editor:
Joel Emer

ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems
April 1989
303 pages
ISBN:0897913000
DOI:10.1145/70082
Chairman:
Joel Emer
General Chair:
John Hennessy, Stanford University

Copyright © 1989 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States
