Towards efficient fine-grain software pipelining

Authors:

Herbert H. J. Hum,

Yue-Bong Wong

ICS '90: Proceedings of the 4th international conference on Supercomputing

Pages 369 - 379

https://doi.org/10.1145/77726.255177

Published: 01 June 1990

Abstract

Dataflow software pipelining was proposed as a means of structuring fine-grain parallelism and has been studied mostly under an idealized dataflow architecture model with infinite resources [9]. In this paper, we investigate the effects of software pipelining under realistic architecture models with finite resources. Our target architecture is the McGill Dataflow Architecture, which employs conventional pipelined techniques to achieve fast instruction execution while exploiting fine-grain parallelism via a data-driven instruction scheduler. To achieve optimal execution efficiency, the compiled code must make balanced use of both the parallelism in the instruction execution unit and the fine-grain synchronization power of the machine.

A detailed analysis based on simulation results is presented, focusing on two key architectural factors: the fine-grain synchronization capacity and the scheduling mechanism for enabling instructions. On the one hand, our results provide experimental evidence that software pipelining is an effective method for exploiting fine-grain parallelism in loops. On the other hand, the experiments also reveal the (somewhat pessimistic) fact that even fully software-pipelined code may not achieve good performance if the overhead for fine-grain synchronization exceeds the capacity of the machine.

References

[1]

Arvind and D.E. Culler. Dataflow architectures. Annual Reviews in Computer Science, 1:225-253, 1986.

[2]

M. Babu et al. An enable memory controller chip. Technical report, McGill University, Nov. 1989. In the Proceedings of the VLSI Research Review, Centre de recherche informatique de Montreal.

[3]

J. Backus. Can programming be liberated from the von Neumann style? A functional style and its algebra of programs. CACM, 21(8):613-641, Aug. 1978.

[4]

J. Cocke. The search for performance in scientific processors. Communications of the ACM, 31(3), March 1988.

[5]

D.E. Culler and Arvind. Resource requirements of dataflow programs. In Proc. of the 15th Annual International Symp. on Computer Architecture, pages 141-150, 1988.

[6]

J.B. Dennis and G.R. Gao. An efficient pipelined dataflow processor architecture. In Joint Conf. on Supercomputing, pages 368-373, Florida, Nov. 1988. IEEE Computer Society and ACM SIGARCH.

[7]

G.R. Gao. A pipelined code mapping scheme for static dataflow computers. Technical Report TR-371, Laboratory for Computer Science, MIT, 1986.

[8]

G.R. Gao. A maximally pipelined tridiagonal linear equation solver. Journal of Parallel and Distributed Computing, 3(2):215-235, June 1986.

[9]

G.R. Gao. Aspects of balancing techniques for pipelined data flow code generation. Journal of Parallel and Distributed Computing, 6:39-61, 1989.

[10]

G.R. Gao. A flexible architecture model for hybrid dataflow and control-flow evaluation. In Proc. of the International Workshop: Dataflow - A Status Report, Israel, May 1989. In conjunction with the ACM Annual Symposium on Computer Architecture. To be published by Prentice-Hall.

[11]

G.R. Gao, H.H.J. Hum, and Y.B. Wong. Parallel function invocation in a dynamic argument-fetching dataflow architecture. In PARBASE '90, Miami Beach, Florida, March 1990.

[12]

G.R. Gao and Z. Paraskevas. Dataflow software pipelining: A case study. ACAPS Design Note 06, School of Computer Science, McGill University, Montreal, Que., Feb. 1989. Presented as a short paper at the International Conference on Supercomputing '89, Crete, Greece, June 1989.

[13]

G.R. Gao and R. Tio. Instruction set design of an efficient pipelined dataflow architecture. In Proceedings of the 22nd International Conf. of System Science, pages 383-393, Hawaii, Jan. 1989.

[14]

G.R. Gao, R. Tio, and H.J. Hum. Design of an efficient dataflow architecture without dataflow. In Proc. of the International Conf. on Fifth-Generation Computers, pages 861-868, Tokyo, Japan, Dec. 1988.

[15]

J.R. Gurd, C.C. Kirkham, and I. Watson. The Manchester prototype dataflow computer. CACM, 28(1):34-52, Jan. 1985.

[16]

W.-K. Hung. IF1 parser for HDDG. ACAPS Design Note 01, School of Computer Science, McGill University, Montreal, Que., June 1988.

[17]

P. Hudak. Arrays, non-determinism, and parallelism: A functional perspective. In Graph Reduction, pages 312-327. Springer-Verlag, LNCS-279, 1987.

[18]

M. Lam. Software pipelining: An effective scheduling technique for VLIW machines. In Proc. of the 1988 ACM SIGPLAN Conf. on Programming Languages Design and Implementation, pages 318-328, Atlanta, Georgia, June 1988.

[19]

I. Little. A hierarchical data dependency graph viewer. ACAPS Design Note 08, School of Computer Science, McGill University, Montreal, Que., Feb. 1989.

[20]

Z. Paraskevas. Code generation for dataflow software pipelining. Master's thesis, McGill University, Montreal, Quebec, June 1989.

[21]

B.R. Rau and C.D. Glaeser. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In Proc. of the 14th Annual Workshop on Microprogramming, pages 183-198, 1981.

[22]

C.A. Ruggiero and J. Sargeant. Control of parallelism in the Manchester dataflow machine. In Functional Prog. Lang. and Comp. Arch., pages 1-15. Springer-Verlag, LNCS-274, 1987.

[23]

R. Tio. The A-code assembly language reference manual. ACAPS Design Note 02, School of Computer Science, McGill University, Montreal, Que., July 1988.

[24]

R. Tio. DASM: The A-code data-driven assembler program reference manual. ACAPS Design Note 03, School of Computer Science, McGill University, Montreal, Que., July 1988.

[25]

R.F. Touzeau. A FORTRAN compiler for the FPS-164 scientific computer. In Proc. of the ACM SIGPLAN '84 Symp. on Compiler Construction, pages 48-57, June 1984.

[26]

P.L. Wadler. A new array operation. In Graph Reduction, pages 328-335. Springer-Verlag, LNCS-279, 1987.

Published In

ICS '90: Proceedings of the 4th international conference on Supercomputing

June 1990, 492 pages

ISBN: 0897913698

DOI: 10.1145/77726

Chairmen: Ahmed Sameh (Univ. of Illinois at Urbana-Champaign, Urbana), Henk van der Vorst (Delft Univ. of Technology and CWI, The Netherlands)

ACM SIGARCH Computer Architecture News, Volume 18, Issue 3b
Special Issue: Proceedings of the 4th international conference on Supercomputing
Sept. 1990, 489 pages
ISSN: 0163-5964
DOI: 10.1145/255129

Copyright © 1990 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers: Article

Conference

IC'90: ACM SIGARCH International Conference on Supercomputing

June 11 - 15, 1990

Amsterdam, The Netherlands

Sponsor: SIGARCH

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Cited By

H. Hum and G. Gao (2005). A novel high-speed memory organization for fine-grain multi-thread computing. PARLE '91 Parallel Architectures and Languages Europe, pages 34-51. Online publication date: 23-Jun-2005.
https://doi.org/10.1007/BFb0035095
Q. Ning and G. Gao (2005). Minimizing loop storage allocation for an argument-fetching dataflow architecture model. PARLE '92 Parallel Architectures and Languages Europe, pages 585-600. Online publication date: 14-Jul-2005.
https://doi.org/10.1007/3-540-55599-4_112
G. Gao, H. Hum, and Y. Wong (2005). An efficient scheme for fine-grain software pipelining. CONPAR 90 - VAPP IV, pages 709-720. Online publication date: 2-Jun-2005.
https://doi.org/10.1007/3-540-53065-7_147
Hum and Gao (1991). Efficient support of concurrent threads in a hybrid dataflow/von Neumann architecture. Proceedings of the 1991 Third IEEE Symposium on Parallel and Distributed Processing, pages 190-193. Online publication date: 2-Dec-1991.
https://dl.acm.org/doi/10.1109/SPDP.1991.218280
H. Hum and G. Gao (1991). A Novel High-Speed Memory Organization for Fine-Grain Multi-Thread Computing. Parle ’91 Parallel Architectures and Languages Europe, pages 34-51. Online publication date: 1991.
https://doi.org/10.1007/978-3-662-25209-3_4
T. Ungerer (1993). Literaturverzeichnis. Datenflußrechner, pages 357-389. Online publication date: 1993.
https://doi.org/10.1007/978-3-322-94688-1_9
