Generating Local Addresses and Communication Sets for Data-Parallel Programs

Authors:

S.H. Teng

Journal of Parallel and Distributed Computing, Volume 26, Issue 1

Pages 72 - 84

https://doi.org/10.1006/jpdc.1995.1049

Published: 01 April 1995

Abstract

Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance Fortran. For an array A affinely aligned to a template that is distributed across p processors with a cyclic(k) distribution, we demonstrate a storage scheme that does not waste any storage. We show that, under this storage scheme, the local memory access sequence of any processor for a computation involving the regular section A(l:h:s) is characterized by a finite state machine of at most k states. We present fast algorithms for computing the essential information about these state machines, and we extend the framework to handle multidimensional arrays. We also show how to generate communication sets using the state machine approach. Performance results show that this solution requires very little runtime overhead and acceptable preprocessing time.
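
The finite-state-machine claim in the abstract can be illustrated with a small brute-force sketch. It is not the paper's fast algorithm: it simply enumerates the regular section and tabulates what happens. It assumes an identity alignment (the abstract covers general affine alignments), a positive stride s, and a packed per-processor layout in which each processor stores its blocks contiguously. The function names `owner_and_local`, `local_addresses`, and `transition_table` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List, Tuple


def owner_and_local(i: int, p: int, k: int) -> Tuple[int, int]:
    """Map global index i to (processor, local address) under cyclic(k) on p processors.

    Assumption: identity alignment and packed local storage (no holes).
    """
    block = i // k                      # global block number containing i
    proc = block % p                    # blocks are dealt round-robin to processors
    local = (block // p) * k + (i % k)  # blocks owned by proc are stored back to back
    return proc, local


def local_addresses(l: int, h: int, s: int, p: int, k: int, m: int) -> List[int]:
    """Local addresses on processor m touched by the section l:h:s, in access order."""
    addrs = []
    for i in range(l, h + 1, s):        # assumes s > 0
        proc, loc = owner_and_local(i, p, k)
        if proc == m:
            addrs.append(loc)
    return addrs


def transition_table(l: int, h: int, s: int, p: int, k: int, m: int) -> Dict[int, Tuple[int, int]]:
    """Tabulate state -> (memory gap, next state) for processor m.

    The state is the offset of the current element within its block, so there
    can be at most k states. A ValueError signals that one state produced two
    different transitions, which would contradict the state-machine view.
    """
    addrs = local_addresses(l, h, s, p, k, m)
    table: Dict[int, Tuple[int, int]] = {}
    for cur, nxt in zip(addrs, addrs[1:]):
        state = cur % k                 # offset within the block
        step = (nxt - cur, nxt % k)     # gap in local memory, next offset
        if table.setdefault(state, step) != step:
            raise ValueError(f"state {state} has inconsistent transitions")
    return table


if __name__ == "__main__":
    # Section 0:24:3 on p=2 processors with cyclic(2), viewed from processor 0.
    print(local_addresses(0, 24, 3, p=2, k=2, m=0))   # [0, 5, 6, 11, 12]
    print(transition_table(0, 24, 3, p=2, k=2, m=0))  # {0: (5, 1), 1: (1, 0)}
```

In this small case processor 0 alternates between two states with memory gaps 5 and 1, so a node loop only needs the start address and the gap table rather than a per-element ownership test.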

Index Terms

Generating Local Addresses and Communication Sets for Data-Parallel Programs
1. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
  2. Integrated circuits
    1. Semiconductor memory
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Object oriented languages

Information & Contributors

Information

Published In

Journal of Parallel and Distributed Computing Volume 26, Issue 1

April 1, 1995

135 pages

ISSN:0743-7315

Editors:
Allan Gottlieb, New York Univ., New York, NY
Kai Hwang, Univ. of Southern California, Los Angeles
Sartaj Sahni, Univ. of Florida, Gainesville

Copyright © Academic Press.

Publisher

Academic Press, Inc.

United States

Qualifiers

Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 41
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 29 Jul 2024

Citations

Cited By

Wang Y, Li P, Cong J, Betz V, Constantinides G (2014). Theory and algorithm for generalized memory partitioning in high-level synthesis. Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays. 10.1145/2554688.2554780 (199-208). Online publication date: 26-Feb-2014
https://dl.acm.org/doi/10.1145/2554688.2554780
Wang Y, Li P, Zhang P, Zhang C, Cong J (2013). Memory partitioning for multidimensional arrays in high-level synthesis. Proceedings of the 50th Annual Design Automation Conference. 10.1145/2463209.2488748 (1-8). Online publication date: 29-May-2013
https://dl.acm.org/doi/10.1145/2463209.2488748
Kandemir M, Zhang Y, Muralidhara S, Ozturk O, Narayanan S, Henkel J, Parameswaran S (2009). Slicing based code parallelization for minimizing inter-processor communication. Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems. 10.1145/1629395.1629409 (87-96). Online publication date: 11-Oct-2009
https://dl.acm.org/doi/10.1145/1629395.1629409
Huang J, Chu C (2008). A flexible processor mapping technique toward data localization for block-cyclic data redistribution. The Journal of Supercomputing 45:2. 10.1007/s11227-007-0166-9 (151-172). Online publication date: 1-Aug-2008
https://dl.acm.org/doi/10.1007/s11227-007-0166-9
Hsu C, Chen M, Yang C, Li K (2006). Optimizing Communications of Dynamic Data Redistribution on Symmetrical Matrices in Parallelizing Compilers. IEEE Transactions on Parallel and Distributed Systems 17:11. 10.1109/TPDS.2006.162 (1226-1241). Online publication date: 1-Nov-2006
https://dl.acm.org/doi/10.1109/TPDS.2006.162
Huang J, Chu C (2006). An Efficient Communication Scheduling Method for the Processor Mapping Technique Applied Data Redistribution. The Journal of Supercomputing 37:3. 10.1007/s11227-006-6615-z (297-318). Online publication date: 1-Sep-2006
https://dl.acm.org/doi/10.1007/s11227-006-6615-z
Tseng E, Gaudiot J (2005). Automatic array partitioning based on the Smith normal form. International Journal of Parallel Programming 33:1. 10.1007/s10766-004-1460-2 (35-56). Online publication date: 1-Feb-2005
https://dl.acm.org/doi/10.1007/s10766-004-1460-2
Seinstra F, Koelma D, Bagdanov A (2004). Finite State Machine-Based Optimization of Data Parallel Regular Domain Problems Applied in Low-Level Image Processing. IEEE Transactions on Parallel and Distributed Systems 15:10. 10.1109/TPDS.2004.55 (865-877). Online publication date: 1-Oct-2004
https://dl.acm.org/doi/10.1109/TPDS.2004.55
Hsu C, Yu K (2004). A Compressed Diagonals Remapping Technique for Dynamic Data Redistribution on Banded Sparse Matrix. The Journal of Supercomputing 29:2. 10.1023/B:SUPE.0000026846.74050.18 (125-143). Online publication date: 1-Aug-2004
https://dl.acm.org/doi/10.1023/B%3ASUPE.0000026846.74050.18
Hsu C, Yu K (2003). A compressed diagonals remapping technique for dynamic data redistribution on banded sparse matrix. Proceedings of the 2003 international conference on Parallel and distributed processing and applications. 10.5555/1761566.1761577 (53-64). Online publication date: 2-Jul-2003
https://dl.acm.org/doi/10.5555/1761566.1761577
