DOI: 10.1145/77726.255177

Towards efficient fine-grain software pipelining

Published: 01 June 1990

    Abstract

    Dataflow software pipelining was proposed as a means of structuring fine-grain parallelism and has been studied mostly under an idealized dataflow architecture model with infinite resources [9]. In this paper, we investigate the effects of software pipelining under realistic architecture models with finite resources. Our target architecture is the McGill Dataflow Architecture, which employs conventional pipelined techniques to achieve fast instruction execution while exploiting fine-grain parallelism via a data-driven instruction scheduler. To achieve optimal execution efficiency, the compiled code must make balanced use of both the parallelism in the instruction execution unit and the fine-grain synchronization power of the machine.
    A detailed analysis based on simulation results is presented, focusing on two key architectural factors: the fine-grain synchronization capacity and the scheduling mechanism for enabling instructions. On the one hand, our results provide experimental evidence that software pipelining is an effective method for exploiting fine-grain parallelism in loops. On the other hand, the experiments have also revealed the (somewhat pessimistic) fact that even fully software-pipelined code may not achieve good performance if the overhead for fine-grain synchronization exceeds the capacity of the machine.
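
    The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the paper's simulator and does not model the McGill Dataflow Architecture; it assumes a loop whose body is a chain of one-cycle stages, and it uses a hypothetical `capacity` parameter (the maximum number of stage operations that may fire per cycle) as a crude stand-in for finite synchronization capacity. The function name `pipelined_cycles` is likewise illustrative.

```python
def pipelined_cycles(iterations: int, stages: int, capacity: int) -> int:
    """Count cycles needed to run a software-pipelined loop in a toy model.

    Assumptions (illustrative only, not taken from the paper):
      * each iteration passes through `stages` one-cycle stages in order;
      * a stage handles one iteration at a time, so iteration i may enter
        stage s only after iteration i-1 has completed stage s;
      * at most `capacity` stage operations may fire in any one cycle.
    """
    if iterations < 0 or stages < 1 or capacity < 1:
        raise ValueError("need iterations >= 0, stages >= 1, capacity >= 1")

    next_stage = [0] * iterations  # next stage each iteration must execute
    cycles = 0
    while any(s < stages for s in next_stage):
        snapshot = list(next_stage)  # state at the start of this cycle
        fired = 0
        for i in range(iterations):  # oldest iteration gets priority
            if fired == capacity:
                break
            s = snapshot[i]
            if s == stages:
                continue  # this iteration has already finished
            # Stage s is free once the previous iteration has passed it.
            if i == 0 or snapshot[i - 1] > s:
                next_stage[i] = s + 1
                fired += 1
        cycles += 1
    return cycles


print(pipelined_cycles(8, 3, capacity=3))  # 10: full overlap, N + S - 1
print(pipelined_cycles(8, 3, capacity=1))  # 24: no overlap, N * S
```

    With enough capacity the iterations overlap completely and the loop finishes in N + S - 1 cycles; when the per-cycle capacity is the bottleneck, the same pipelined schedule degrades toward sequential execution, which mirrors the abstract's qualitative observation under these simplified assumptions.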

    References

    [1]
    Arvind and D.E. Culler. Dataflow architectures. Annual Reviews in Computer Science, 1:225-253, 1986.
    [2]
    M. Babu et al. An enable memory controller chip. Technical report, McGill University, Nov. 1989. In the Proceedings of the VLSI Research Review, Centre de recherche informatique de Montreal.
    [3]
    J. Backus. Can programming be liberated from the von Neumann style? A functional style and its algebra of programs. CACM, 21(8):613-641, Aug. 1978.
    [4]
    J. Cocke. The search for performance in scientific processors. Communications of the ACM, 31(3), March 1988.
    [5]
    D.E. Culler and Arvind. Resource requirements of dataflow programs. In Proc. of the 15th Annual International Symp. on Computer Architecture, pages 141-150, 1988.
    [6]
    J.B. Dennis and G.R. Gao. An efficient pipelined dataflow processor architecture. In Joint Conf. on Supercomputing, pages 368-373, Florida, Nov. 1988. IEEE Computer Society and ACM SIGARCH.
    [7]
    G.R. Gao. A pipelined code mapping scheme for static dataflow computers. Technical Report TR-371, Laboratory for Computer Science, MIT, 1986.
    [8]
    G.R. Gao. A maximally pipelined tridiagonal linear equation solver. Journal of Parallel and Distributed Computing, 3(2):215-235, June 1986.
    [9]
    G.R. Gao. Aspects of balancing techniques for pipelined data flow code generation. Journal of Parallel and Distributed Computing, 6:39-61, 1989.
    [10]
    G.R. Gao. A flexible architecture model for hybrid dataflow and control-flow evaluation. In Proc. of the International Workshop: Dataflow - A Status Report, Israel, May 1989. In conjunction with the ACM Annual Symposium on Computer Architecture. To be published by Prentice-Hall.
    [11]
    G.R. Gao, H.H.J. Hum, and Y.B. Wong. Parallel function invocation in a dynamic argument-fetching dataflow architecture. In PARBASE '90, Miami Beach, Florida, March 1990.
    [12]
    G.R. Gao and Z. Paraskevas. Dataflow software pipelining: A case study. ACAPS Design Note 06, School of Computer Science, McGill University, Montreal, Que., Feb. 1989. Presented as a short paper at the International Conference on Supercomputing '89, Crete, Greece, June 1989.
    [13]
    G.R. Gao and R. Tio. Instruction set design of an efficient pipelined dataflow architecture. In Proceedings of the 22nd International Conf. of System Science, pages 383-393, Hawaii, Jan. 1989.
    [14]
    G.R. Gao, R. Tio, and H.J. Hum. Design of an efficient dataflow architecture without dataflow. In Proc. of the International Conf. on Fifth Generation Computers, pages 861-868, Tokyo, Japan, Dec. 1988.
    [15]
    J.R. Gurd, C.C. Kirkham, and I. Watson. The Manchester prototype dataflow computer. CACM, 28(1):34-52, Jan. 1985.
    [16]
    W.-K. Hung. IF1 parser for HDDG. ACAPS Design Note 01, School of Computer Science, McGill University, Montreal, Que., June 1988.
    [17]
    P. Hudak. Arrays, non-determinism, and parallelism: A functional perspective. In Graph Reduction, pages 312-327. Springer-Verlag, LNCS-279, 1987.
    [18]
    M. Lam. Software pipelining: An effective scheduling technique for VLIW machines. In Proc. of the 1988 ACM SIGPLAN Conf. on Programming Languages Design and Implementation, pages 318-328, Atlanta, Georgia, June 1988.
    [19]
    I. Little. A hierarchical data dependency graph viewer. ACAPS Design Note 08, School of Computer Science, McGill University, Montreal, Que., Feb. 1989.
    [20]
    Z. Paraskevas. Code generation for dataflow software pipelining. Master's thesis, McGill University, Montreal, Quebec, June 1989.
    [21]
    B.R. Rau and C.D. Glaeser. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In Proc. of the 14th Annual Workshop on Microprogramming, pages 183-198, 1981.
    [22]
    C.A. Ruggiero and J. Sargeant. Control of parallelism in the Manchester dataflow machine. In Functional Prog. Lang. and Comp. Arch., pages 1-15. Springer-Verlag, LNCS-274, 1987.
    [23]
    R. Tio. The A-code assembly language reference manual. ACAPS Design Note 02, School of Computer Science, McGill University, Montreal, Que., July 1988.
    [24]
    R. Tio. DASM: The A-code data-driven assembler program reference manual. ACAPS Design Note 03, School of Computer Science, McGill University, Montreal, Que., July 1988.
    [25]
    R.F. Touzeau. A FORTRAN compiler for the FPS-164 scientific computer. In Proc. of the ACM SIGPLAN '84 Symp. on Compiler Construction, pages 48-57, June 1984.
    [26]
    P.L. Wadler. A new array operation. In Graph Reduction, pages 328-335. Springer-Verlag, LNCS-279, 1987.

    Cited By

    • (2005) A novel high-speed memory organization for fine-grain multi-thread computing. PARLE '91 Parallel Architectures and Languages Europe, pages 34-51. DOI: 10.1007/BFb0035095. Online publication date: 23-Jun-2005
    • (2005) Minimizing loop storage allocation for an argument-fetching dataflow architecture model. PARLE '92 Parallel Architectures and Languages Europe, pages 585-600. DOI: 10.1007/3-540-55599-4_112. Online publication date: 14-Jul-2005
    • (2005) An efficient scheme for fine-grain software pipelining. CONPAR 90 — VAPP IV, pages 709-720. DOI: 10.1007/3-540-53065-7_147. Online publication date: 2-Jun-2005

                      Published In

                      ICS '90: Proceedings of the 4th international conference on Supercomputing
                      June 1990
                      492 pages
                      ISBN:0897913698
                      DOI:10.1145/77726
                      • ACM SIGARCH Computer Architecture News, Volume 18, Issue 3b
                        Special Issue: Proceedings of the 4th international conference on Supercomputing
                        Sept. 1990
                        489 pages
                        ISSN:0163-5964
                        DOI:10.1145/255129
                      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

                      Publisher

                      Association for Computing Machinery

                      New York, NY, United States

                      Qualifiers

                      • Article

                      Conference

                      ICS '90: ACM SIGARCH International Conference on Supercomputing
                      June 11 - 15, 1990
                      Amsterdam, The Netherlands

                      Acceptance Rates

                      Overall Acceptance Rate 629 of 2,180 submissions, 29%

                      Citations

                      • (1991) Efficient support of concurrent threads in a hybrid dataflow/von Neumann architecture. Proceedings of the 1991 Third IEEE Symposium on Parallel and Distributed Processing, pages 190-193. DOI: 10.1109/SPDP.1991.218280. Online publication date: 2-Dec-1991
                      • (1991) A Novel High-Speed Memory Organization for Fine-Grain Multi-Thread Computing. PARLE '91 Parallel Architectures and Languages Europe, pages 34-51. DOI: 10.1007/978-3-662-25209-3_4. Online publication date: 1991
                      • (1993) Literaturverzeichnis. Datenflußrechner, pages 357-389. DOI: 10.1007/978-3-322-94688-1_9. Online publication date: 1993
