Architecture and compiler tradeoffs for a long instruction word processor

Published: 01 April 1989

Abstract

A very long instruction word (VLIW) processor exploits parallelism by controlling multiple operations in a single instruction word. This paper describes the architecture and compiler tradeoffs in the design of iWarp, a VLIW single-chip microprocessor developed in a joint project with Intel Corp. The iWarp processor is capable of specifying up to nine operations in an instruction word and has a peak performance of 20 million floating-point operations and 20 million integer operations per second. An optimizing compiler has been constructed and used as a tool to evaluate the different architectural proposals in the development of iWarp. We present here the analysis and compiler optimizations for those architectural features that address two key issues in the design of a VLIW microprocessor: code density and a streamlined execution cycle. We support the results of our analysis with performance data for the Livermore Loops and a selection of programs from the LINPACK library.
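
The code-density issue named in the abstract can be illustrated with a minimal sketch. It is a toy model, not iWarp's actual instruction encoding: it assumes a fixed-width word with nine operation slots (taken from the abstract's "up to nine operations"), and the operation names, `InstructionWord`, `make_word`, and `slot_utilization` are all hypothetical. It only shows why a wide instruction word wastes space when the compiler cannot fill every slot.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: a fixed nine-slot word, based on the abstract's "up to nine operations".
SLOTS_PER_WORD = 9


@dataclass
class InstructionWord:
    # None marks a slot the compiler could not fill (an implicit no-op).
    ops: List[Optional[str]]

    def used_slots(self) -> int:
        return sum(op is not None for op in self.ops)


def make_word(*ops: str) -> InstructionWord:
    """Pack the given operations into one word, padding unused slots."""
    if len(ops) > SLOTS_PER_WORD:
        raise ValueError("too many operations for one instruction word")
    padded: List[Optional[str]] = list(ops) + [None] * (SLOTS_PER_WORD - len(ops))
    return InstructionWord(padded)


def slot_utilization(program: List[InstructionWord]) -> float:
    """Fraction of all slots in the program that hold a real operation."""
    total = len(program) * SLOTS_PER_WORD
    used = sum(word.used_slots() for word in program)
    return used / total if total else 0.0


# Hypothetical two-word program: a busy loop-body word and a lone branch.
program = [
    make_word("fadd", "fmul", "load", "load", "addr_inc"),
    make_word("branch"),
]
print(f"slots used: {slot_utilization(program):.2f}")  # 6 of 18 slots are filled
```

In this model, sparsely filled words such as the lone branch cost as much instruction memory as fully packed ones, which is the kind of cost a VLIW design has to weigh against the parallelism gained in dense inner loops.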

Published In

ACM SIGARCH Computer Architecture News, Volume 17, Issue 2
Special issue: Proceedings of ASPLOS-III: the third international conference on architecture support for programming languages and operating systems
April 1989
291 pages
ISSN: 0163-5964
DOI: 10.1145/68182

ASPLOS III: Proceedings of the third international conference on Architectural support for programming languages and operating systems
April 1989
303 pages
ISBN: 0897913000
DOI: 10.1145/70082

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Cited By

  • (1997) Run-Time versus Compile-Time Instruction Scheduling in Superscalar (RISC) Processors. Journal of Parallel and Distributed Computing 45:1 (13-28). DOI: 10.1006/jpdc.1997.1329. Online publication date: 25-Aug-1997
  • (1991) How many operation units are adequate? ACM SIGARCH Computer Architecture News 19:4 (94-108). DOI: 10.1145/122576.122586. Online publication date: 1-Jul-1991
  • (2009) Computer Architecture and Design. Physics and Applications of Negative Refractive Index Materials. DOI: 10.1201/9781420068764.sec1. Online publication date: 16-Nov-2009
  • (2007) Automatic Design Space Exploration of Register Bypasses in Embedded Processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 26:12 (2102-2115). DOI: 10.1109/TCAD.2007.907066. Online publication date: 1-Dec-2007
  • (2006) Retargetable pipeline hazard detection for partially bypassed processors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 14:8 (791-801). DOI: 10.1109/TVLSI.2006.878468. Online publication date: Aug-2006
  • (2005) PBExplore. Proceedings of the conference on Design, Automation and Test in Europe - Volume 2 (1264-1269). DOI: 10.1109/DATE.2005.236. Online publication date: 7-Mar-2005
  • (2004) FLASH. Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization. DOI: 10.5555/977395.977671. Online publication date: 20-Mar-2004
  • (2004) FLASH: foresighted latency-aware scheduling heuristic for processors with customized datapaths. International Symposium on Code Generation and Optimization, 2004. CGO 2004. (201-212). DOI: 10.1109/CGO.2004.1281675. Online publication date: 2004
  • (1998) IMPACT. 25 years of the international symposia on Computer architecture (selected papers) (408-417). DOI: 10.1145/285930.286000. Online publication date: 1-Aug-1998
