Efficient Methods for Multi-Dimensional Array Redistribution

Published: 01 August 2000

    Abstract

    In many scientific applications, array redistribution is required to enhance data locality and reduce remote memory access on distributed memory multicomputers. Since redistribution is performed at run-time, there is a performance tradeoff between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present efficient methods for multi-dimensional array redistribution. Based on previous work, the basic-cycle calculation technique, we present a basic-block calculation (BBC) technique and a complete-dimension calculation (CDC) technique. We also develop a theoretical model to analyze the computation costs of these two techniques. The model shows that the BBC method has smaller indexing costs and performs well for redistribution when the array size is small, while the CDC method has smaller packing/unpacking costs and performs well when the array size is large. We implemented these two techniques on an IBM SP2 parallel machine along with the PITFALLS method and Prylli's method. The experimental results show that, of these four algorithms, the BBC method has the smallest execution time when the array size is small, and the CDC method has the smallest execution time when the array size is large.
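
    The abstract does not spell out the BBC or CDC algorithms, so the following is only a sketch of the underlying problem, not of either technique. It assumes a one-dimensional array redistributed from one block-cyclic layout to another, a case the abstract does not name explicitly, and uses a naive element-by-element scan to work out which indices each processor must send to which other processor. The function names `owner` and `send_sets` and all parameters are hypothetical. The per-element scan is the kind of indexing cost that redistribution methods aim to reduce.

```python
from collections import defaultdict


def owner(i, block, nprocs):
    """Processor that owns global index i under a block-cyclic layout
    with the given block size (assumed layout, for illustration only)."""
    return (i // block) % nprocs


def send_sets(n, src_block, dst_block, nprocs):
    """Map each (source, destination) processor pair to the sorted list of
    global indices that move between them. Naive O(n) scan."""
    sets = defaultdict(list)
    for i in range(n):
        src = owner(i, src_block, nprocs)
        dst = owner(i, dst_block, nprocs)
        sets[(src, dst)].append(i)
    return dict(sets)


# 12 elements, 2 processors, block size 2 redistributed to block size 3.
plan = send_sets(n=12, src_block=2, dst_block=3, nprocs=2)
for pair in sorted(plan):
    print(pair, plan[pair])
```

    For a multi-dimensional array, the same owner computation can be applied independently along each distributed dimension; how the index sets are then generated and packed efficiently is where the methods compared in the paper differ.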


    Published In

    The Journal of Supercomputing, Volume 17, Issue 1
    Aug. 2000
    105 pages
    ISSN: 0920-8542

    Publisher

    Kluwer Academic Publishers

    United States

    Author Tags

    1. array redistribution
    2. distributed memory multicomputers
    3. the basic-block calculation technique
    4. the complete-dimension calculation technique

    Qualifiers

    • Article

    Cited By

    • (2019) A Compressed Diagonals Remapping Technique for Dynamic Data Redistribution on Banded Sparse Matrix. The Journal of Supercomputing, 29:2 (125-143). DOI: 10.1023/B:SUPE.0000026846.74050.18. Online publication date: 1-Jun-2019
    • (2018) Sparse Matrix Block-Cyclic Realignment on Distributed Memory Machines. The Journal of Supercomputing, 33:3 (175-196). DOI: 10.1007/s11227-005-0247-6. Online publication date: 30-Dec-2018
    • (2006) Memory efficient parallel matrix multiplication operation for irregular problems. Proceedings of the 3rd conference on Computing frontiers (229-240). DOI: 10.1145/1128022.1128054. Online publication date: 3-May-2006
    • (2006) Optimizing Communications of Dynamic Data Redistribution on Symmetrical Matrices in Parallelizing Compilers. IEEE Transactions on Parallel and Distributed Systems, 17:11 (1226-1241). DOI: 10.1109/TPDS.2006.162. Online publication date: 1-Nov-2006
    • (2003) A compressed diagonals remapping technique for dynamic data redistribution on banded sparse matrix. Proceedings of the 2003 international conference on Parallel and distributed processing and applications (53-64). DOI: 10.5555/1761566.1761577. Online publication date: 2-Jul-2003
