A study of scalar compilation techniques for pipelined supercomputers

Authors:

ASPLOS II: Proceedings of the second international conference on Architectural support for programming languages and operating systems

Pages 105 - 109

https://doi.org/10.1145/36206.36191

Published: 01 October 1987

Abstract

This paper studies two compilation techniques for enhancing scalar performance in high-speed scientific processors: software pipelining and loop unrolling. We study the impact of the architecture (size of the register file) and of the hardware (size of instruction buffer) on the efficiency of loop unrolling. We also develop a methodology for classifying software pipelining techniques. For loop unrolling, a straightforward scheduling algorithm is shown to produce near-optimal results when not inhibited by recurrences or memory hazards. Software pipelining requires less hardware but also achieves less speedup. Finally, we show that the performance produced with a modified CRAY-1S scalar architecture and a code scheduler utilizing loop unrolling is comparable to the performance achieved by the CRAY-1S with a vector unit and the CFT vectorizing compiler.
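As a rough source-level illustration of the two techniques named in the abstract, the sketch below applies each to a simple multiply-add loop. It is not the paper's scheduler or its CRAY-1S code: the function names (`axpy_rolled`, `axpy_unrolled4`, `axpy_pipelined`), the unroll factor of 4, and the choice of Python are all assumptions made for illustration only.

```python
def axpy_rolled(a, x, y):
    """Reference loop: one load/compute/store group per iteration."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


def axpy_unrolled4(a, x, y):
    """Loop unrolling by an assumed factor of 4.

    The body holds four independent multiply-adds, so a scheduler has more
    operations to interleave, at the cost of more live temporaries
    (register pressure) and a larger loop body (instruction buffer space).
    """
    n = len(x)
    out = [0.0] * n
    main = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        # Group the loads first, then the arithmetic, then the stores.
        x0, x1, x2, x3 = x[i], x[i + 1], x[i + 2], x[i + 3]
        y0, y1, y2, y3 = y[i], y[i + 1], y[i + 2], y[i + 3]
        r0 = a * x0 + y0
        r1 = a * x1 + y1
        r2 = a * x2 + y2
        r3 = a * x3 + y3
        out[i], out[i + 1], out[i + 2], out[i + 3] = r0, r1, r2, r3
    for i in range(main, n):  # cleanup loop for the leftover iterations
        out[i] = a * x[i] + y[i]
    return out


def axpy_pipelined(a, x, y):
    """Software pipelining with a two-stage overlap.

    Each kernel iteration issues the loads for iteration i + 1 before
    finishing the compute and store of iteration i. The loop body stays
    small; a prologue and an epilogue fill and drain the pipeline.
    """
    n = len(x)
    out = [0.0] * n
    if n == 0:
        return out
    xi, yi = x[0], y[0]  # prologue: loads for iteration 0
    for i in range(n - 1):
        nx, ny = x[i + 1], y[i + 1]  # loads for the next iteration
        out[i] = a * xi + yi         # compute and store the current one
        xi, yi = nx, ny
    out[n - 1] = a * xi + yi  # epilogue: finish the last iteration
    return out
```

All three functions compute the same result. The unrolled version needs more temporaries and a longer body, while the pipelined version keeps the body short but overlaps only adjacent iterations. That loosely mirrors the abstract's observation that software pipelining requires less hardware but achieves less speedup.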

Published In

ASPLOS II: Proceedings of the second international conference on Architectural support for programming languages and operating systems

October 1987

205 pages

ISBN:0818608056

DOI:10.1145/36206

Editor: Randy Katz (Univ. of California, Berkeley)
General Chair: Martin Freeman (Stanford University and Philips/Signetics)

ACM SIGARCH Computer Architecture News, Volume 15, Issue 5
Oct. 1987
189 pages
ISSN: 0163-5964
DOI: 10.1145/36177
Editor: Randy H. Katz (Univ. of California, Berkeley)

ACM SIGPLAN Notices, Volume 22, Issue 10
Oct. 1987
189 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/36205
Editor: Randy Katz (Univ. of California, Berkeley)

ACM SIGOPS Operating Systems Review, Volume 21, Issue 4
Oct. 1987
189 pages
ISSN: 0163-5980
DOI: 10.1145/36204
Editor: Randy Katz (Univ. of California, Berkeley)

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

ASPLOS II

Sponsor:

SIGARCH

ASPLOS II: Architectural support for programming languages and operating systems

October 5 - 8, 1987

Palo Alto, California, USA

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics

Article Metrics

Total Citations: 39
Total Downloads: 1,043
Downloads (Last 12 months): 77
Downloads (Last 6 weeks): 17

Reflects downloads up to 13 Sep 2024

Cited By

Guzma V, Pitkanen T, Takala J (2011). Effects of loop unrolling and use of instruction buffer on processor energy consumption. 2011 International Symposium on System on Chip (SoC), pp. 82-85. Online publication date: Oct-2011.
https://doi.org/10.1109/ISSOC.2011.6089224

Lam M (2004). Software pipelining. ACM SIGPLAN Notices 39:4, pp. 244-256. Online publication date: 1-Apr-2004.
https://dl.acm.org/doi/10.1145/989393.989420

Sarkar V (2001). Optimized Unrolling of Nested Loops. International Journal of Parallel Programming 29:5, pp. 545-581. Online publication date: 1-Oct-2001.
https://dl.acm.org/doi/10.1023/A%3A1012246031671

Gabbay F, Mendelson A (1998). The effect of instruction fetch bandwidth on value prediction. ACM SIGARCH Computer Architecture News 26:3, pp. 272-281. Online publication date: 16-Apr-1998.
https://dl.acm.org/doi/10.1145/279361.278058

Norris C, Pollock L (1998). Experiences with Cooperating Register Allocation and Instruction Scheduling. International Journal of Parallel Programming 26:3, pp. 241-283. Online publication date: 1-Jun-1998.
https://dl.acm.org/doi/10.1023/A%3A1018738112639

Debray S (1997). Resource-bounded partial evaluation. ACM SIGPLAN Notices 32:12, pp. 179-192. Online publication date: 1-Dec-1997.
https://dl.acm.org/doi/10.1145/258994.259017

Debray S, Gallagher J, Consel C (1997). Resource-bounded partial evaluation. Proceedings of the 1997 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation, pp. 179-192. Online publication date: Dec-1997.
https://doi.org/10.1145/258993.259017

Davidson J, Jinturkar S, Mudge T, Ebcioğlu K (1995). Improving instruction-level parallelism by loop unrolling and dynamic memory disambiguation. Proceedings of the 28th annual international symposium on Microarchitecture, pp. 125-132. Online publication date: 1-Dec-1995.
https://dl.acm.org/doi/10.5555/225160.225184

Norris C, Pollock L, Bic L, Evripidou P, Böhm W, Gaudiot J (1995). Register allocation sensitive region scheduling. Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, pp. 1-10. Online publication date: 27-Jun-1995.
https://dl.acm.org/doi/10.5555/224659.224668

Warter N, Chang P, Mahlke S, Chen W, Hwu W (1995). Three Architectural Models for Compiler-Controlled Speculative Execution. IEEE Transactions on Computers 44:4, pp. 481-494. Online publication date: 1-Apr-1995.
https://dl.acm.org/doi/10.1109/12.376164
