Architecture and compiler tradeoffs for a long instruction word processor

Authors:

Robert Cohn,

Thomas Gross,

Monica Lam

ACM SIGARCH Computer Architecture News, Volume 17, Issue 2

Pages 2 - 14

https://doi.org/10.1145/68182.68183

Published: 01 April 1989

Abstract

A very long instruction word (VLIW) processor exploits parallelism by controlling multiple operations in a single instruction word. This paper describes the architecture and compiler tradeoffs in the design of iWarp, a VLIW single-chip microprocessor developed in a joint project with Intel Corp. The iWarp processor is capable of specifying up to nine operations in an instruction word and has a peak performance of 20 million floating-point operations and 20 million integer operations per second. An optimizing compiler has been constructed and used as a tool to evaluate the different architectural proposals in the development of iWarp. We present here the analysis and compiler optimizations for those architectural features that address two key issues in the design of a VLIW microprocessor: code density and a streamlined execution cycle. We support the results of our analysis with performance data for the Livermore Loops and a selection of programs from the LINPACK library.
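To make the code-density issue concrete, here is a minimal sketch of how operations might be packed into long instruction words. It is not the iWarp compiler's algorithm or the iWarp instruction format: the `pack` and `slot_utilization` helpers, the operation names, and the one-cycle latency are all assumptions made for illustration. Only the limit of nine operations per word comes from the abstract.

```python
MAX_OPS_PER_WORD = 9  # from the abstract: up to nine operations per instruction word


def pack(ops, deps, width=MAX_OPS_PER_WORD):
    """Greedily place each operation in the earliest word that follows all of
    its predecessors and still has a free slot. `ops` must be listed in
    dependence order; every operation is assumed to take one cycle."""
    word_of = {}   # operation name -> index of the word that holds it
    words = []     # each word is a list of operation names
    for op in ops:
        # Earliest legal word: one past the latest predecessor (0 if none).
        w = max((word_of[d] + 1 for d in deps.get(op, ())), default=0)
        # Skip words whose slots are already full.
        while w < len(words) and len(words[w]) >= width:
            w += 1
        while len(words) <= w:
            words.append([])
        words[w].append(op)
        word_of[op] = w
    return words


def slot_utilization(words, width=MAX_OPS_PER_WORD):
    """Fraction of operation slots that hold real work rather than no-ops."""
    return sum(len(w) for w in words) / (len(words) * width)


# Hypothetical loop body: c[i] = a[i] * b[i] + c[i]
ops = ["load_a", "load_b", "load_c", "mul", "add", "store_c", "incr_i"]
deps = {"mul": ["load_a", "load_b"], "add": ["mul", "load_c"], "store_c": ["add"]}

words = pack(ops, deps)
print(words)
print(f"{len(words)} words, utilization {slot_utilization(words):.2f}")
```

In this toy model a dependence chain leaves most slots of each word empty, which suggests why a wide instruction format puts pressure on code density and why the compiler needs to find independent operations to fill the slots.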

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multiple instruction, multiple data
      2. Very long instruction word
    2. Serial architectures
      1. Complex instruction set computing
      2. Reduced instruction set computing
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Published In

ACM SIGARCH Computer Architecture News Volume 17, Issue 2

Special issue: Proceedings of ASPLOS-III: the third international conference on architectural support for programming languages and operating systems

April 1989

291 pages

ISSN:0163-5964

DOI:10.1145/68182

Editor:
Joel Emer

ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems
April 1989
303 pages
ISBN:0897913000
DOI:10.1145/70082
Chairman: Joel Emer
General Chair: John Hennessy (Stanford University)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Bibliometrics

Total Citations: 38
Total Downloads: 587
Downloads (last 12 months): 80
Downloads (last 6 weeks): 11

Reflects downloads up to 15 Oct 2024

Cited By

Leung A, Palem K, Ungureanu C (1997). Run-Time versus Compile-Time Instruction Scheduling in Superscalar (RISC) Processors. Journal of Parallel and Distributed Computing 45:1 (13-28). Online publication date: 25-Aug-1997.
https://dl.acm.org/doi/10.1006/jpdc.1997.1329
Matthes W (1991). How many operation units are adequate? ACM SIGARCH Computer Architecture News 19:4 (94-108). Online publication date: 1-Jul-1991.
https://dl.acm.org/doi/10.1145/122576.122586
Haghighi S, Gaudiot J, Franklin M, Jacob B, Batina L, Mathew B, Asanović K, Sakiyama K, Verbauwhede I, Quammen D, Anantha Ramakrishna S, Grzegorczyk T (2009). Computer Architecture and Design. Physics and Applications of Negative Refractive Index Materials. Online publication date: 16-Nov-2009.
https://doi.org/10.1201/9781420068764.sec1
Shrivastava A, Sanghyun P, Earlie E, Dutt N, Nicolau A, Yunheung P (2007). Automatic Design Space Exploration of Register Bypasses in Embedded Processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 26:12 (2102-2115). Online publication date: 1-Dec-2007.
https://dl.acm.org/doi/10.1109/TCAD.2007.907066
Shrivastava A, Earlie E, Dutt N, Nicolau A (2006). Retargetable pipeline hazard detection for partially bypassed processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 14:8 (791-801). Online publication date: Aug-2006.
https://doi.org/10.1109/TVLSI.2006.878468
Shrivastava A, Dutt N, Nicolau A, Earlie E (2005). PBExplore. Proceedings of the conference on Design, Automation and Test in Europe - Volume 2 (1264-1269). Online publication date: 7-Mar-2005.
https://dl.acm.org/doi/10.1109/DATE.2005.236
Kudlur M, Fan K, Chu M, Ravindran R, Clark N, Mahlke S (2004). FLASH. Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization. Online publication date: 20-Mar-2004.
https://dl.acm.org/doi/10.5555/977395.977671
Kudlur M, Fan K, Chu M, Ravindran R, Clark N, Mahlke S (2004). FLASH: foresighted latency-aware scheduling heuristic for processors with customized datapaths. International Symposium on Code Generation and Optimization, 2004. CGO 2004 (201-212). Online publication date: 2004.
https://doi.org/10.1109/CGO.2004.1281675
Chang P, Mahlke S, Chen W, Warter N, Hwu W (1998). IMPACT. 25 years of the international symposia on Computer architecture (selected papers) (408-417). Online publication date: 1-Aug-1998.
https://dl.acm.org/doi/10.1145/285930.286000
