research-article

Efficient Computation of Address Sequences in Data Parallel Programs Using Closed Forms for Basis Vectors

Authors:

Ashwath Thirumalai,

J. Ramanujam

Journal of Parallel and Distributed Computing, Volume 38, Issue 2

Pages 188 - 203

https://doi.org/10.1006/jpdc.1996.0140

Published: 01 November 1996

Abstract

In data-parallel languages such as High Performance Fortran (HPF), arrays are mapped to processors through a two-step process: alignment followed by distribution. This mapping creates disjoint pieces of the array that are locally owned by each processor. An HPF compiler that generates code for array statements must compute the sequence of local memory addresses accessed by each processor and the sequence of sends and receives needed for a given processor to access nonlocal data. In this paper, we present an approach to the address sequence generation problem using the theory of integer lattices. The set of elements referenced can be generated by integer linear combinations of basis vectors. Unlike other work on this problem, we derive closed-form expressions for the basis vectors as a function of the mapping of data. Using these basis vectors and exploiting the fact that there is a repeating pattern in the access sequence, we derive highly optimized code that generates the pattern at runtime. The generated code uses table lookup of the pattern. Experimental results show that our approach is faster than other solutions to this problem.
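To make the address sequence problem concrete, the following is a minimal Python sketch of the repeating pattern the abstract refers to. It is not the paper's method: it finds the pattern by brute-force scanning of one period rather than from closed-form lattice basis vectors. It assumes a one-dimensional array with identity alignment, a CYCLIC(k) distribution over p processors, zero-based indices, and a positive stride s. All function names are hypothetical.

```python
from math import gcd

def owner_and_local(i, p, k):
    """Map global index i to (processor, local address) under CYCLIC(k) on p processors."""
    block = i // k                      # which block of size k holds element i
    return block % p, (block // p) * k + i % k

def local_addresses(l, u, s, p, k, m):
    """Brute force: local addresses on processor m touched by the section l:u:s."""
    return [addr
            for i in range(l, u + 1, s)
            for proc, addr in [owner_and_local(i, p, k)]
            if proc == m]

def gap_table(l, s, p, k, m):
    """Return (first local address, one period of address gaps) for processor m.

    Ownership repeats after lcm(s, p*k) global elements, which corresponds
    to lcm(s, p*k) / p local memory locations on every processor.
    """
    period = s * p * k // gcd(s, p * k)            # lcm(s, p*k)
    addrs = local_addresses(l, l + period - 1, s, p, k, m)
    if not addrs:                                   # processor m owns no accessed element
        return None, []
    gaps = [b - a for a, b in zip(addrs, addrs[1:])]
    gaps.append(addrs[0] + period // p - addrs[-1])  # wrap-around gap into the next period
    return addrs[0], gaps

def first_n_addresses(l, s, p, k, m, n):
    """Generate the first n local addresses on processor m by table lookup of the gaps."""
    addr, gaps = gap_table(l, s, p, k, m)
    out = []
    j = 0
    while addr is not None and len(out) < n:
        out.append(addr)
        addr += gaps[j % len(gaps)]
        j += 1
    return out
```

For example, with p = 2, k = 4, s = 3, and l = 0, `gap_table(0, 3, 2, 4, 0)` gives a first address of 0 and the gap table `[3, 2, 5, 2]`, so processor 0 visits local addresses 0, 3, 5, 10, 12, 15, and so on. The scan here costs time proportional to the period; the point of the paper's closed forms is to avoid that kind of enumeration.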



Index Terms

1. Software and its engineering
  1. Software notations and tools
    1. Compilers


Copyright © Academic Press.

Publisher

Academic Press, Inc.

United States
