Efficient Computation of Address Sequences in Data Parallel Programs Using Closed Forms for Basis Vectors

Published: 01 November 1996

Abstract

In data-parallel languages such as High Performance Fortran (HPF), arrays are mapped to processors through a two-step process: alignment followed by distribution. This mapping creates disjoint pieces of the array, each of which is locally owned by one processor. An HPF compiler that generates code for array statements must compute, for each processor, the sequence of local memory addresses it accesses and the sequence of sends and receives it needs to access nonlocal data. In this paper, we present an approach to the address sequence generation problem based on the theory of integer lattices. The set of elements referenced can be generated by integer linear combinations of basis vectors. Unlike other work on this problem, we derive closed-form expressions for the basis vectors as a function of the data mapping. Using these basis vectors and exploiting the fact that the access sequence contains a repeating pattern, we derive highly optimized code that generates the pattern at runtime. The generated code uses table lookup of the pattern. Experimental results show that our approach is faster than other solutions to this problem.
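
The repeating pattern and the table lookup mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the closed-form basis-vector derivation from the paper: it finds the pattern by brute-force enumeration over one period, which is enough to show why a small table of address gaps suffices. The sketch assumes a one-dimensional array with identity alignment, distributed block-cyclically with block size `k` over `p` processors, and a regular section starting at `l` with stride `s`. The layout formulas and all function names are assumptions made for illustration.

```python
from itertools import cycle, islice
from math import gcd


def owner(i, p, k):
    """Processor owning global index i under an assumed cyclic(k) layout."""
    return (i // k) % p


def local_addr(i, p, k):
    """Local memory address of global index i on its owning processor."""
    return (i // (p * k)) * k + i % k


def local_addresses_brute(l, h, s, p, k, m):
    """Reference answer: scan the whole section l:h:s, keep what m owns."""
    return [local_addr(i, p, k)
            for i in range(l, h + 1, s) if owner(i, p, k) == m]


def gap_table(l, s, p, k, m):
    """First local address on processor m and the gaps over one period.

    The ownership pattern repeats every lcm(s, p*k) global indices, so
    enumerating a single period captures the whole access pattern.
    """
    period = s * p * k // gcd(s, p * k)          # lcm(s, p*k)
    addrs = [local_addr(i, p, k)
             for i in range(l, l + period, s) if owner(i, p, k) == m]
    if not addrs:
        return None, []                          # m owns no section element
    # One period spans period/(p*k) rows of k local elements each.
    wrap = addrs[0] + period // p
    gaps = [b - a for a, b in zip(addrs, addrs[1:] + [wrap])]
    return addrs[0], gaps


def addresses_from_table(start, gaps):
    """Generate local addresses by cycling through the gap table."""
    addr = start
    for g in cycle(gaps):                        # yields nothing if gaps is empty
        yield addr
        addr += g


# Example: p = 2 processors, block size k = 4, section 0:...:3, processor 0.
start, gaps = gap_table(0, 3, 2, 4, 0)
print(start, gaps)                               # 0 [3, 2, 5, 2]
print(list(islice(addresses_from_table(start, gaps), 6)))
```

The table holds at most `k` entries per processor, so after it is built the runtime cost per address is one lookup and one addition. The paper's contribution, as the abstract describes it, is obtaining the pattern from closed-form basis vectors rather than by the enumeration used here.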

Publisher: Academic Press, Inc., United States
