
Generating Local Addresses and Communication Sets for Data-Parallel Programs

Published: 01 April 1995

    Abstract

    Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance Fortran. We demonstrate a storage scheme for an array A affinely aligned to a template that is distributed across p processors with a cyclic(k) distribution that does not waste any storage, and show that, under this storage scheme, the local memory access sequence of any processor for a computation involving the regular section A(l:h:s) is characterized by a finite state machine of at most k states. We present fast algorithms for computing the essential information about these state machines, and we extend the framework to handle multidimensional arrays. We also show how to generate communication sets using the state machine approach. Performance results show that this solution requires very little runtime overhead and acceptable preprocessing time.
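
As a rough illustration of the claim about cyclic(k) storage and the state machine, the following Python sketch enumerates one processor's accesses by brute force. It is not the fast algorithm the abstract refers to. It assumes a one-dimensional array with 0-based indices, an identity alignment (the abstract allows any affine alignment), and a positive stride s. The function names `owner`, `local_address`, and `access_table` are hypothetical, chosen only for this example.

```python
def owner(i, p, k):
    """Processor that holds global element i under a cyclic(k) distribution."""
    return (i // k) % p


def local_address(i, p, k):
    """Packed local address of element i on its owner (no storage is wasted).

    Assumption: identity alignment, 0-based indexing.
    """
    return (i // (p * k)) * k + i % k


def access_table(l, h, s, p, k, m):
    """Brute-force transition table for processor m over the section l:h:s.

    Maps the offset of an element within its block (the state) to a pair
    (offset of the next owned element, gap between their local addresses).
    Assumption: s > 0; h is treated as an inclusive upper bound.
    """
    table = {}
    prev = None
    for i in range(l, h + 1, s):
        if owner(i, p, k) != m:
            continue
        if prev is not None:
            step = (i % k, local_address(i, p, k) - local_address(prev, p, k))
            # The same offset must always lead to the same transition,
            # which is what bounds the machine at k states.
            assert table.setdefault(prev % k, step) == step
        prev = i
    return table


if __name__ == "__main__":
    # p = 2 processors, blocks of k = 2, section 0:47:3, processor 0.
    print(access_table(0, 47, 3, 2, 2, 0))
```

The table has at most k entries because a transition depends only on the offset of the current element within its block. A processor can therefore walk its local memory by repeatedly adding the stored gap, without recomputing global indices.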



    Published In

    Journal of Parallel and Distributed Computing, Volume 26, Issue 1
    April 1, 1995
    135 pages
    ISSN: 0743-7315

    Publisher

    Academic Press, Inc.

    United States

    Qualifiers

    • Research-article

    Cited By

    • (2014) Theory and algorithm for generalized memory partitioning in high-level synthesis. Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays, pp. 199-208. DOI: 10.1145/2554688.2554780. Online publication date: 26-Feb-2014
    • (2013) Memory partitioning for multidimensional arrays in high-level synthesis. Proceedings of the 50th Annual Design Automation Conference, pp. 1-8. DOI: 10.1145/2463209.2488748. Online publication date: 29-May-2013
    • (2009) Slicing based code parallelization for minimizing inter-processor communication. Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, pp. 87-96. DOI: 10.1145/1629395.1629409. Online publication date: 11-Oct-2009
    • (2008) A flexible processor mapping technique toward data localization for block-cyclic data redistribution. The Journal of Supercomputing 45:2, pp. 151-172. DOI: 10.1007/s11227-007-0166-9. Online publication date: 1-Aug-2008
    • (2006) Optimizing Communications of Dynamic Data Redistribution on Symmetrical Matrices in Parallelizing Compilers. IEEE Transactions on Parallel and Distributed Systems 17:11, pp. 1226-1241. DOI: 10.1109/TPDS.2006.162. Online publication date: 1-Nov-2006
    • (2006) An Efficient Communication Scheduling Method for the Processor Mapping Technique Applied Data Redistribution. The Journal of Supercomputing 37:3, pp. 297-318. DOI: 10.1007/s11227-006-6615-z. Online publication date: 1-Sep-2006
    • (2005) Automatic array partitioning based on the Smith normal form. International Journal of Parallel Programming 33:1, pp. 35-56. DOI: 10.1007/s10766-004-1460-2. Online publication date: 1-Feb-2005
    • (2004) Finite State Machine-Based Optimization of Data Parallel Regular Domain Problems Applied in Low-Level Image Processing. IEEE Transactions on Parallel and Distributed Systems 15:10, pp. 865-877. DOI: 10.1109/TPDS.2004.55. Online publication date: 1-Oct-2004
    • (2004) A Compressed Diagonals Remapping Technique for Dynamic Data Redistribution on Banded Sparse Matrix. The Journal of Supercomputing 29:2, pp. 125-143. DOI: 10.1023/B:SUPE.0000026846.74050.18. Online publication date: 1-Aug-2004
    • (2003) A compressed diagonals remapping technique for dynamic data redistribution on banded sparse matrix. Proceedings of the 2003 international conference on Parallel and distributed processing and applications, pp. 53-64. DOI: 10.5555/1761566.1761577. Online publication date: 2-Jul-2003
