DOI: 10.1145/77726.255177

Towards efficient fine-grain software pipelining

Published: 01 June 1990

Abstract

Dataflow software pipelining was proposed as a means of structuring fine-grain parallelism and has been studied mostly under an idealized dataflow architecture model with infinite resources [9]. In this paper, we investigate the effects of software pipelining under realistic architecture models with finite resources. Our target architecture is the McGill Dataflow Architecture, which employs conventional pipelined techniques to achieve fast instruction execution, while exploiting fine-grain parallelism via a data-driven instruction scheduler. To achieve optimal execution efficiency, the compiled code must be able to make balanced use of both the parallelism in the instruction execution unit and the fine-grain synchronization power of the machine.
A detailed analysis based on simulation results is presented, focusing on two key architectural factors: the fine-grain synchronization capacity and the scheduling mechanism for enabling instructions. On one hand, our results provide experimental evidence that software pipelining is an effective method for exploiting fine-grain parallelism in loops. On the other, the experiments have also revealed the (somewhat pessimistic) fact that even fully software-pipelined code may not achieve good performance if the overhead for fine-grain synchronization exceeds the capacity of the machine.
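
The contrast between the idealized model and a machine with finite resources can be illustrated with a toy simulation. The sketch below is not the paper's simulator and does not model the McGill Dataflow Architecture. It assumes a loop body that is a linear chain of one-cycle instructions, fired in a data-driven way, and it uses a hypothetical per-cycle firing limit (`max_firings_per_cycle`) as a crude stand-in for a finite scheduling or synchronization capacity. All names here are illustrative assumptions.

```python
def simulate_pipeline(num_iterations, num_stages, max_firings_per_cycle=None):
    """Count the cycles needed to push all loop iterations through a pipeline.

    Toy model (an assumption for illustration only):
    - each iteration passes through `num_stages` one-cycle instructions in order;
    - an instruction can serve only one iteration per cycle, so iteration i may
      advance only if iteration i-1 is strictly ahead of it;
    - at most `max_firings_per_cycle` instructions fire per cycle
      (None means the idealized, unlimited case).
    """
    if num_iterations < 1 or num_stages < 1:
        raise ValueError("num_iterations and num_stages must be at least 1")
    if max_firings_per_cycle is not None and max_firings_per_cycle < 1:
        raise ValueError("max_firings_per_cycle must be at least 1")

    done = [0] * num_iterations  # stages completed so far by each iteration
    cycles = 0
    while done[-1] < num_stages:
        before = list(done)  # state at the start of the cycle
        budget = num_iterations if max_firings_per_cycle is None else max_firings_per_cycle
        for i in range(num_iterations):  # earlier iterations get priority
            if budget == 0:
                break
            if before[i] == num_stages:
                continue  # this iteration has already finished
            if i == 0 or before[i - 1] > before[i]:
                done[i] += 1
                budget -= 1
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Idealized case versus a machine that can fire only 2 instructions per cycle.
    print(simulate_pipeline(8, 4))
    print(simulate_pipeline(8, 4, max_firings_per_cycle=2))
```

In this toy model the unlimited case finishes in `num_iterations + num_stages - 1` cycles, the usual fill-plus-steady-state count for a pipeline. Once the firing limit drops below the number of stages, the run takes longer even though the code is still fully pipelined, which mirrors the qualitative point of the abstract.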

References

[1]
Arvind and D.E. Culler. Dataflow architectures. Annual Reviews in Computer Science, 1:225-253, 1986.
[2]
M. Babu et al. An enable memory controller chip. Technical report, McGill University, Nov. 1989. In the Proceedings of the VLSI Research Review, Centre de recherche informatique de Montreal.
[3]
J. Backus. Can programming be liberated from the von Neumann style? A functional style and its algebra of programs. CACM, 21(8):613-641, Aug. 1978.
[4]
J. Cocke. The search for performance in scientific processors. Communications of the ACM, 31(3), March 1988.
[5]
D.E. Culler and Arvind. Resource requirements of dataflow programs. In Proc. of the 15th Annual International Symp. on Computer Architecture, pages 141-150, 1988.
[6]
J.B. Dennis and G.R. Gao. An efficient pipelined dataflow processor architecture. In Joint Conf. on Supercomputing, pages 368-373, Florida, Nov. 1988. IEEE Computer Society and ACM SIGARCH.
[7]
G.R. Gao. A pipelined code mapping scheme for static dataflow computers. Technical Report TR-371, Laboratory for Computer Science, MIT, 1986.
[8]
G.R. Gao. A maximally pipelined tridiagonal linear equation solver. Journal of Parallel and Distributed Computing, 3(2):215-235, June 1986.
[9]
G.R. Gao. Aspects of balancing techniques for pipelined data flow code generation. Journal of Parallel and Distributed Computing, 6:39-61, 1989.
[10]
G.R. Gao. A flexible architecture model for hybrid dataflow and control-flow evaluation. In Proc. of the International Workshop: Dataflow - A Status Report, Israel, May 1989. In conjunction with the ACM Annual Symposium on Computer Architecture. To be published by Prentice-Hall.
[11]
G.R. Gao, H.H.J. Hum, and Y.B. Wong. Parallel function invocation in a dynamic argument-fetching dataflow architecture. In PARBASE '90, Miami Beach, Florida, March 1990.
[12]
G.R. Gao and Z. Paraskevas. Dataflow software pipelining: A case study. ACAPS Design Note 06, School of Computer Science, McGill University, Montreal, Que., Feb. 1989. Presented as a short paper at the International Conference on Supercomputing '89, Crete, Greece, June 1989.
[13]
G.R. Gao and R. Tio. Instruction set design of an efficient pipelined dataflow architecture. In Proceedings of the 22nd International Conf. of System Science, pages 383-393, Hawaii, Jan. 1989.
[14]
G.R. Gao, R. Tio, and H.J. Hum. Design of an efficient dataflow architecture without dataflow. In Proc. of the International Conf. on Fifth Generation Computers, pages 861-868, Tokyo, Japan, Dec. 1988.
[15]
J.R. Gurd, C.C. Kirkham, and I. Watson. The Manchester prototype dataflow computer. CACM, 28(1):34-52, Jan. 1985.
[16]
W.-K. Hung. IF1 parser for HDDG. ACAPS Design Note 01, School of Computer Science, McGill University, Montreal, Que., June 1988.
[17]
P. Hudak. Arrays, non-determinism, and parallelism: A functional perspective. In Graph Reduction, pages 312-327. Springer-Verlag, LNCS-279, 1987.
[18]
M. Lam. Software pipelining: An effective scheduling technique for VLIW machines. In Proc. of the 1988 ACM SIGPLAN Conf. on Programming Languages Design and Implementation, pages 318-328, Atlanta, Georgia, June 1988.
[19]
I. Little. A hierarchical data dependency graph viewer. ACAPS Design Note 08, School of Computer Science, McGill University, Montreal, Que., Feb. 1989.
[20]
Z. Paraskevas. Code generation for dataflow software pipelining. Master's thesis, McGill University, Montreal, Quebec, June 1989.
[21]
B.R. Rau and C.D. Glaeser. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In Proc. of the 14th Annual Workshop on Microprogramming, pages 183-198, 1981.
[22]
C.A. Ruggiero and J. Sargeant. Control of parallelism in the Manchester dataflow machine. In Functional Prog. Lang. and Comp. Arch., pages 1-15. Springer-Verlag, LNCS-274, 1987.
[23]
R. Tio. The A-code assembly language reference manual. ACAPS Design Note 02, School of Computer Science, McGill University, Montreal, Que., July 1988.
[24]
R. Tio. DASM: The A-code data-driven assembler program reference manual. ACAPS Design Note 03, School of Computer Science, McGill University, Montreal, Que., July 1988.
[25]
R.F. Touzeau. A FORTRAN compiler for the FPS-164 scientific computer. In Proc. of the ACM SIGPLAN '84 Symp. on Compiler Construction, pages 48-57, June 1984.
[26]
P.L. Wadler. A new array operation. In Graph Reduction, pages 328-335. Springer-Verlag, LNCS-279, 1987.


                    Published In

                    ICS '90: Proceedings of the 4th international conference on Supercomputing
                    June 1990
                    492 pages
                    ISBN:0897913698
                    DOI:10.1145/77726
                    • ACM SIGARCH Computer Architecture News, Volume 18, Issue 3b
                      Special Issue: Proceedings of the 4th international conference on Supercomputing
                      Sept. 1990
                      489 pages
                      ISSN:0163-5964
                      DOI:10.1145/255129

                    Publisher

                    Association for Computing Machinery

                    New York, NY, United States

                    Conference

                    IC'90: ACM SIGARCH International Conference on Supercomputing
                    June 11 - 15, 1990
                    Amsterdam, The Netherlands

                    Acceptance Rates

                    Overall Acceptance Rate 629 of 2,180 submissions, 29%

                    Cited By

                    • (2005) A novel high-speed memory organization for fine-grain multi-thread computing. PARLE '91 Parallel Architectures and Languages Europe, pages 34-51. DOI: 10.1007/BFb0035095. Online publication date: 23-Jun-2005
                    • (2005) Minimizing loop storage allocation for an argument-fetching dataflow architecture model. PARLE '92 Parallel Architectures and Languages Europe, pages 585-600. DOI: 10.1007/3-540-55599-4_112. Online publication date: 14-Jul-2005
                    • (2005) An efficient scheme for fine-grain software pipelining. CONPAR 90 — VAPP IV, pages 709-720. DOI: 10.1007/3-540-53065-7_147. Online publication date: 2-Jun-2005
                    • (1991) Efficient support of concurrent threads in a hybrid dataflow/von Neumann architecture. Proceedings of the 1991 Third IEEE Symposium on Parallel and Distributed Processing, pages 190-193. DOI: 10.1109/SPDP.1991.218280. Online publication date: 2-Dec-1991
                    • (1991) A Novel High-Speed Memory Organization for Fine-Grain Multi-Thread Computing. Parle ’91 Parallel Architectures and Languages Europe, pages 34-51. DOI: 10.1007/978-3-662-25209-3_4. Online publication date: 1991
                    • (1993) Literaturverzeichnis. Datenflußrechner, pages 357-389. DOI: 10.1007/978-3-322-94688-1_9. Online publication date: 1993
