DOI: 10.1145/36206.36191

A study of scalar compilation techniques for pipelined supercomputers

Published: 01 October 1987

Abstract

This paper studies two compilation techniques for enhancing scalar performance in high-speed scientific processors: software pipelining and loop unrolling. We study the impact of the architecture (size of the register file) and of the hardware (size of instruction buffer) on the efficiency of loop unrolling. We also develop a methodology for classifying software pipelining techniques. For loop unrolling, a straightforward scheduling algorithm is shown to produce near-optimal results when not inhibited by recurrences or memory hazards. Software pipelining requires less hardware but also achieves less speedup. Finally, we show that the performance produced with a modified CRAY-1S scalar architecture and a code scheduler utilizing loop unrolling is comparable to the performance achieved by the CRAY-1S with a vector unit and the CFT vectorizing compiler.
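
As a rough illustration of the two transformations named in the abstract, the sketch below applies each to a small kernel, `y[i] = a * x[i] + b`. It is not the paper's scheduler or its CRAY-1S code. The kernel, the function names, the unroll factor of 4, and the choice of Python are all assumptions made for readability. The unrolled version keeps four loaded values live at once, which hints at why register-file size matters for unrolling; the pipelined version keeps only two.

```python
def baseline(x, a, b):
    """Reference loop: one load, one multiply-add, one store per iteration."""
    y = [0.0] * len(x)
    for i in range(len(x)):
        y[i] = a * x[i] + b
    return y


def unrolled_by_4(x, a, b):
    """Loop unrolling (hypothetical factor 4): four iterations per trip."""
    n = len(x)
    y = [0.0] * n
    i = 0
    while i + 4 <= n:
        # Group the four independent loads first so a scheduler could
        # overlap them with the arithmetic that follows.
        x0, x1, x2, x3 = x[i], x[i + 1], x[i + 2], x[i + 3]
        y[i] = a * x0 + b
        y[i + 1] = a * x1 + b
        y[i + 2] = a * x2 + b
        y[i + 3] = a * x3 + b
        i += 4
    while i < n:
        # Remainder loop for trip counts that are not a multiple of 4.
        y[i] = a * x[i] + b
        i += 1
    return y


def software_pipelined(x, a, b):
    """Software pipelining: the load for iteration i+1 is issued in the
    same loop trip as the compute and store for iteration i."""
    n = len(x)
    y = [0.0] * n
    if n == 0:
        return y
    loaded = x[0]                  # prologue: first load
    for i in range(n - 1):         # steady state
        upcoming = x[i + 1]        # load stage of iteration i+1
        y[i] = a * loaded + b      # compute/store stage of iteration i
        loaded = upcoming
    y[n - 1] = a * loaded + b      # epilogue: drain the last iteration
    return y
```

Both transformed loops compute the same result as `baseline`; only the order in which loads and arithmetic are issued differs, and that ordering is what a code scheduler for a pipelined machine would exploit.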


Published In

ASPLOS II: Proceedings of the second international conference on Architectural support for programming languages and operating systems
October 1987
205 pages
ISBN: 0818608056
DOI: 10.1145/36206
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 77
  • Downloads (Last 6 weeks): 17
Reflects downloads up to 13 Sep 2024

Cited By

  • (2011) Effects of loop unrolling and use of instruction buffer on processor energy consumption. 2011 International Symposium on System on Chip (SoC), pp. 82-85. DOI: 10.1109/ISSOC.2011.6089224. Online publication date: Oct-2011
  • (2004) Software pipelining. ACM SIGPLAN Notices 39:4, pp. 244-256. DOI: 10.1145/989393.989420. Online publication date: 1-Apr-2004
  • (2001) Optimized Unrolling of Nested Loops. International Journal of Parallel Programming 29:5, pp. 545-581. DOI: 10.1023/A:1012246031671. Online publication date: 1-Oct-2001
  • (1998) The effect of instruction fetch bandwidth on value prediction. ACM SIGARCH Computer Architecture News 26:3, pp. 272-281. DOI: 10.1145/279361.278058. Online publication date: 16-Apr-1998
  • (1998) Experiences with Cooperating Register Allocation and Instruction Scheduling. International Journal of Parallel Programming 26:3, pp. 241-283. DOI: 10.1023/A:1018738112639. Online publication date: 1-Jun-1998
  • (1997) Resource-bounded partial evaluation. ACM SIGPLAN Notices 32:12, pp. 179-192. DOI: 10.1145/258994.259017. Online publication date: 1-Dec-1997
  • (1997) Resource-bounded partial evaluation. Proceedings of the 1997 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation, pp. 179-192. DOI: 10.1145/258993.259017. Online publication date: Dec-1997
  • (1995) Improving instruction-level parallelism by loop unrolling and dynamic memory disambiguation. Proceedings of the 28th annual international symposium on Microarchitecture, pp. 125-132. DOI: 10.5555/225160.225184. Online publication date: 1-Dec-1995
  • (1995) Register allocation sensitive region scheduling. Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, pp. 1-10. DOI: 10.5555/224659.224668. Online publication date: 27-Jun-1995
  • (1995) Three Architectural Models for Compiler-Controlled Speculative Execution. IEEE Transactions on Computers 44:4, pp. 481-494. DOI: 10.1109/12.376164. Online publication date: 1-Apr-1995