Article

Single-Dimension Software Pipelining for Multi-Dimensional Loops

Authors:

R. Govindarajan,

Alban Douillet,

Guang R. Gao

CGO '04: Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization

Page 163

Published: 20 March 2004

Abstract

Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to outer loops. In this paper, we propose a three-step approach, called Single-dimension Software Pipelining (SSP), to software pipeline a loop nest at an arbitrary loop level. The first step identifies the most profitable loop level for software pipelining in terms of initiation rate or data reuse potential. The second step simplifies the multi-dimensional data-dependence graph (DDG) into a 1-dimensional DDG and constructs a 1-dimensional schedule for the selected loop level. The third step derives a simple mapping function which specifies the schedule time for the operations of the multi-dimensional loop, based on the 1-dimensional schedule. We prove that the SSP method is correct and at least as efficient as other modulo scheduling methods. We establish the feasibility and correctness of our approach by implementing it on the IA-64 architecture. Experimental results on a small number of loops show significant performance improvements over existing modulo scheduling methods that software pipeline a loop nest from the innermost loop.
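
The second step can be pictured with a small sketch. The Python below is an illustration only, not the paper's algorithm. It assumes that the 1-dimensional DDG keeps just the dependences whose distance is zero at every loop level other than the selected one, and it checks a hand-written schedule against the standard modulo scheduling constraint offset[dst] + distance * T >= offset[src] + latency. The names Dep, simplify_ddg, is_valid, and schedule_time are hypothetical, resource constraints are ignored, and the paper's mapping function for the full multi-dimensional loop is not reproduced.

```python
from typing import NamedTuple


class Dep(NamedTuple):
    """A dependence edge; `distance` has one entry per loop level, outermost first."""
    src: str
    dst: str
    latency: int
    distance: tuple


def simplify_ddg(deps, level):
    """ASSUMED simplification rule (not taken from the abstract): keep only the
    dependences whose distance is zero at every level except `level`, and
    reduce each one to a scalar distance at that level."""
    one_d = []
    for d in deps:
        others = d.distance[:level] + d.distance[level + 1:]
        if all(x == 0 for x in others):
            one_d.append((d.src, d.dst, d.latency, d.distance[level]))
    return one_d


def is_valid(one_d, offset, T):
    """Standard modulo scheduling dependence check for initiation interval T."""
    return all(offset[dst] + dist * T >= offset[src] + lat
               for src, dst, lat, dist in one_d)


def schedule_time(offset, op, i, T):
    """Issue time of operation `op` in iteration i of the selected loop level."""
    return offset[op] + i * T


# Hypothetical 2-deep loop nest with three operations.
deps = [
    Dep("load", "add", 2, (0, 0)),    # same iteration
    Dep("add", "store", 1, (0, 0)),   # same iteration
    Dep("add", "add", 1, (1, 0)),     # carried by the outer loop
    Dep("store", "load", 1, (0, 1)),  # carried by the inner loop only
]
one_d = simplify_ddg(deps, level=0)   # pipeline the outer loop
offset = {"load": 0, "add": 2, "store": 3}
T = 1
print(one_d)
print(is_valid(one_d, offset, T))
print(schedule_time(offset, "store", 2, T))
```

In this sketch the dependence carried only by the inner loop is dropped from the 1-dimensional DDG, so a later stage would still have to honor it when the inner iterations are sequenced.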

Index Terms

Single-Dimension Software Pipelining for Multi-Dimensional Loops
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Patterns
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Scheduling
    2. Software system structures
      1. Distributed systems organizing principles
        Client-server architectures
      2. Software architectures

Information

Published In

CGO '04: Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization

March 2004

301 pages

ISBN:0769521029

Copyright © 2004 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Conference

CGO04

Sponsor:

CGO04: Second Annual IEEE / ACM International Symposium on Code Generation and Optimization

March 20 - 24, 2004

Palo Alto, California

Acceptance Rates

CGO '04 Paper Acceptance Rate 25 of 79 submissions, 32%;

Overall Acceptance Rate 312 of 1,061 submissions, 29%

Bibliometrics

Total Citations: 18
Total Downloads: 422
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 26 Jul 2024

Cited By

Sim H, Lee H, Seo S, Lee J (2016). Mapping Imperfect Loops to Coarse-Grained Reconfigurable Architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 35:7, pp. 1092-1104. Online publication date: 1-Jul-2016.
https://dl.acm.org/doi/10.1109/TCAD.2015.2504918
Lee J, Seo S, Lee H, Sim H, Marculescu R, Nicolescu G (2014). Flattening-based mapping of imperfect loop nests for CGRAs. Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, pp. 1-10. Online publication date: 12-Oct-2014.
https://dl.acm.org/doi/10.1145/2656075.2656085
Kim Y, Lee J, Mai T, Paek Y (2012). Improving performance of nested loops on reconfigurable array processors. ACM Transactions on Architecture and Code Optimization, 8:4, pp. 1-23. Online publication date: 26-Jan-2012.
https://dl.acm.org/doi/10.1145/2086696.2086711
Liu Q, Todman T, Luk W, De Micheli G, Al-Hashimi B, Mueller W, Macii E (2010). Combining optimizations in automated low power design. Proceedings of the Conference on Design, Automation and Test in Europe, pp. 1791-1796. Online publication date: 8-Mar-2010.
https://dl.acm.org/doi/10.5555/1870926.1871358
Choi Y, Lin Y, Chong N, Mahlke S, Mudge T (2009). Stream Compilation for Real-Time Embedded Multicore Systems. Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 210-220. Online publication date: 22-Mar-2009.
https://dl.acm.org/doi/10.1109/CGO.2009.27
Turkington K, Constantinides G, Masselos K, Cheung P (2008). Outer loop pipelining for application specific datapaths in FPGAs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 16:10, pp. 1268-1280. Online publication date: 1-Oct-2008.
https://dl.acm.org/doi/10.5555/1515843.1515846
Fellahi M, Cohen A, Touati S, Foglia P, Prete C, Bartolini S, Giorgi R (2007). Code-size conscious pipelining of imperfectly nested loops. Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, pp. 49-55. Online publication date: 16-Sep-2007.
https://dl.acm.org/doi/10.1145/1327171.1327177
Lin Y, Kudlur M, Mahlke S, Mudge T, Kim T, Sainrat P, Lumetta S, Navarro N (2007). Hierarchical coarse-grained stream compilation for software defined radio. Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, pp. 115-124. Online publication date: 30-Sep-2007.
https://dl.acm.org/doi/10.1145/1289881.1289903
Rong H, Tang Z, Govindarajan R, Douillet A, Gao G (2007). Single-dimension software pipelining for multidimensional loops. ACM Transactions on Architecture and Code Optimization, 4:1, article 7-es. Online publication date: 1-Mar-2007.
https://dl.acm.org/doi/10.1145/1216544.1216550
Gao G, Sterling T, Stevens R, Hereld M, Zhu W (2006). Hierarchical multithreading. Proceedings of the 20th international conference on Parallel and distributed processing, pp. 281-281. Online publication date: 25-Apr-2006.
https://dl.acm.org/doi/10.5555/1898699.1898800
