DOI: 10.5555/977395.977657 · CGO Conference Proceedings
Article

Single-Dimension Software Pipelining for Multi-Dimensional Loops

Published: 20 March 2004

    Abstract

    Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to outer loops. In this paper, we propose a three-step approach, called Single-dimension Software Pipelining (SSP), to software pipeline a loop nest at an arbitrary loop level.

    The first step identifies the most profitable loop level for software pipelining in terms of initiation rate or data reuse potential. The second step simplifies the multi-dimensional data-dependence graph (DDG) into a 1-dimensional DDG and constructs a 1-dimensional schedule for the selected loop level. The third step derives a simple mapping function which specifies the schedule time for the operations of the multi-dimensional loop, based on the 1-dimensional schedule.

    We prove that the SSP method is correct and at least as efficient as other modulo scheduling methods. We establish the feasibility and correctness of our approach by implementing it on the IA-64 architecture. Experimental results on a small number of loops show significant performance improvements over existing modulo scheduling methods that software pipeline a loop nest from the innermost loop.
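    The first two steps can be illustrated with a small Python sketch. It is not the authors' algorithm, only an illustration under stated assumptions. Loop-level selection here considers initiation rate alone. It uses the standard modulo-scheduling lower bound MII = max(ResMII, RecMII) and ignores data reuse. The projection of each dependence onto the selected level follows a rule of our own. A dependence carried by an enclosing loop is dropped. A dependence carried only by deeper loops is also dropped, on the assumption that in-order execution of one outer iteration's inner iterations honors it. Every other dependence keeps its distance component at the selected level. The names `Dep`, `simplify`, `rec_mii`, `res_mii`, and `select_level`, as well as the example loop nest, are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, int, int]  # (src, dst, latency, 1-D distance)


@dataclass(frozen=True)
class Dep:
    """One edge of the multi-dimensional DDG (hypothetical representation)."""
    src: str
    dst: str
    latency: int
    distance: Tuple[int, ...]  # one component per loop level, outermost first


def simplify(deps: List[Dep], level: int) -> List[Edge]:
    """Project the n-D DDG onto `level` (0 = outermost) using an ASSUMED rule."""
    edges: List[Edge] = []
    for d in deps:
        outer, here, inner = d.distance[:level], d.distance[level], d.distance[level + 1:]
        if any(c != 0 for c in outer):
            continue  # carried by an enclosing loop: dropped
        if here == 0 and any(c != 0 for c in inner):
            continue  # carried only by deeper loops: ASSUMED honored by in-order inner iterations
        edges.append((d.src, d.dst, d.latency, here))
    return edges


def has_positive_cycle(nodes: List[str], edges: List[Edge], ii: int) -> bool:
    """Floyd-Warshall-style closure on weights latency - ii * distance.

    A positive diagonal entry means some cycle has latency > ii * distance.
    """
    idx = {name: k for k, name in enumerate(nodes)}
    n = len(nodes)
    w = [[float("-inf")] * n for _ in range(n)]
    for k in range(n):
        w[k][k] = 0.0
    for src, dst, lat, dist in edges:
        i, j = idx[src], idx[dst]
        w[i][j] = max(w[i][j], lat - ii * dist)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if w[i][k] + w[k][j] > w[i][j]:
                    w[i][j] = w[i][k] + w[k][j]
    return any(w[k][k] > 0 for k in range(n))


def rec_mii(nodes: List[str], edges: List[Edge]) -> Optional[int]:
    """Smallest II such that no dependence cycle has latency > II * distance."""
    bound = max(1, sum(lat for _, _, lat, _ in edges))
    for ii in range(1, bound + 1):
        if not has_positive_cycle(nodes, edges, ii):
            return ii
    return None  # a zero-distance cycle with positive latency: no II works


def res_mii(op_resource: Dict[str, str], units: Dict[str, int]) -> int:
    """Assumes each operation occupies one unit of its resource for one cycle."""
    uses = Counter(op_resource.values())
    return max(math.ceil(count / units[r]) for r, count in uses.items())


def select_level(op_resource: Dict[str, str], units: Dict[str, int],
                 deps: List[Dep], depth: int) -> Optional[Tuple[int, int]]:
    """Return (level, MII) for the smallest MII; ties favor the outer level."""
    nodes = sorted(op_resource)
    res = res_mii(op_resource, units)
    best: Optional[Tuple[int, int]] = None
    for level in range(depth):
        rec = rec_mii(nodes, simplify(deps, level))
        if rec is None:
            continue  # this level cannot be pipelined under the assumed rule
        mii = max(res, rec)
        if best is None or mii < best[1]:
            best = (level, mii)
    return best


if __name__ == "__main__":
    # Hypothetical 2-deep nest: a load feeds an fma that accumulates across inner iterations.
    ops = {"load": "mem", "fma": "fpu"}
    units = {"mem": 1, "fpu": 1}
    deps = [Dep("load", "fma", 2, (0, 0)), Dep("fma", "fma", 4, (0, 1))]
    for level in range(2):
        print(level, rec_mii(sorted(ops), simplify(deps, level)))
    print(select_level(ops, units, deps, depth=2))
```

    In the example, the inner-loop accumulation is a recurrence with latency 4 and distance 1. Pipelining the inner loop is therefore bound by RecMII = 4, while the outer level reaches II = 1 under the assumed rule. The script prints `0 1`, `1 4`, and then `(0, 1)`. A real implementation would still have to check that the final multi-dimensional schedule produced by the third step honors the dependences this sketch drops.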




    Information

    Published In

    CGO '04: Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
    March 2004
    301 pages
    ISBN:0769521029

    Publisher

    IEEE Computer Society, United States


    Conference

    CGO04

    Acceptance Rates

    CGO '04 Paper Acceptance Rate 25 of 79 submissions, 32%;
    Overall Acceptance Rate 312 of 1,061 submissions, 29%

    Cited By

    • (2016) Mapping Imperfect Loops to Coarse-Grained Reconfigurable Architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 35(7):1092-1104. DOI: 10.1109/TCAD.2015.2504918. Online publication date: 1-Jul-2016.
    • (2014) Flattening-based mapping of imperfect loop nests for CGRAs. Proceedings of the 2014 International Conference on Hardware/Software Codesign and System Synthesis, pages 1-10. DOI: 10.1145/2656075.2656085. Online publication date: 12-Oct-2014.
    • (2012) Improving performance of nested loops on reconfigurable array processors. ACM Transactions on Architecture and Code Optimization, 8(4):1-23. DOI: 10.1145/2086696.2086711. Online publication date: 26-Jan-2012.
    • (2010) Combining optimizations in automated low power design. Proceedings of the Conference on Design, Automation and Test in Europe, pages 1791-1796. DOI: 10.5555/1870926.1871358. Online publication date: 8-Mar-2010.
    • (2009) Stream Compilation for Real-Time Embedded Multicore Systems. Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization, pages 210-220. DOI: 10.1109/CGO.2009.27. Online publication date: 22-Mar-2009.
    • (2008) Outer loop pipelining for application specific datapaths in FPGAs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 16(10):1268-1280. DOI: 10.5555/1515843.1515846. Online publication date: 1-Oct-2008.
    • (2007) Code-size conscious pipelining of imperfectly nested loops. Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, pages 49-55. DOI: 10.1145/1327171.1327177. Online publication date: 16-Sep-2007.
    • (2007) Hierarchical coarse-grained stream compilation for software defined radio. Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, pages 115-124. DOI: 10.1145/1289881.1289903. Online publication date: 30-Sep-2007.
    • (2007) Single-dimension software pipelining for multidimensional loops. ACM Transactions on Architecture and Code Optimization, 4(1):7-es. DOI: 10.1145/1216544.1216550. Online publication date: 1-Mar-2007.
    • (2006) Hierarchical multithreading. Proceedings of the 20th international conference on Parallel and distributed processing, page 281. DOI: 10.5555/1898699.1898800. Online publication date: 25-Apr-2006.