
Efficient Methods for Multi-Dimensional Array Redistribution

Authors:

Ching-Hsien Hsu,

Yeh-Ching Chung,

Chyi-Ren Dow

The Journal of Supercomputing, Volume 17, Issue 1

Pages 23 - 46

https://doi.org/10.1023/A:1008167621154

Published: 01 August 2000

Abstract

In many scientific applications, array redistribution is usually required to enhance data locality and reduce remote memory access on distributed memory multicomputers. Since the redistribution is performed at run-time, there is a performance tradeoff between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present efficient methods for multi-dimensional array redistribution. Building on previous work, the basic-cycle calculation technique, we present two techniques: basic-block calculation (BBC) and complete-dimension calculation (CDC). We also develop a theoretical model to analyze the computation costs of these two techniques. The theoretical model shows that the BBC method has smaller indexing costs and performs well when the array size is small, whereas the CDC method has smaller packing/unpacking costs and performs well when the array size is large. We implemented these two techniques on an IBM SP2 parallel machine, along with the PITFALLS method and Prylli's method. The experimental results show that, of these four algorithms, the BBC method has the smallest execution time when the array size is small and the CDC method has the smallest execution time when the array size is large.
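To make concrete what "redistributing data among processors" involves, the following is a minimal sketch for a one-dimensional array. It assumes an HPF-style BLOCK-CYCLIC(x) distribution, in which global index i on P processors belongs to processor (i // x) % P; the abstract does not state this mapping, and the names `owner` and `send_sets` are hypothetical. The scan is a naive element-by-element enumeration for illustration only. It is not the BBC or CDC method, whose indexing and packing schemes are not described in the abstract.

```python
from collections import defaultdict


def owner(i, block, nprocs):
    """Processor owning global index i under BLOCK-CYCLIC(block) on nprocs processors.

    Assumed HPF-style mapping: blocks of `block` elements are dealt out round-robin.
    """
    return (i // block) % nprocs


def send_sets(n, src_block, dst_block, nprocs):
    """Map (source_proc, dest_proc) -> global indices that move between them.

    Naive O(n) scan over all elements, for illustration only.
    """
    sets = defaultdict(list)
    for i in range(n):
        src = owner(i, src_block, nprocs)
        dst = owner(i, dst_block, nprocs)
        sets[(src, dst)].append(i)
    return dict(sets)


# Redistribute 12 elements from BLOCK-CYCLIC(2) to BLOCK-CYCLIC(1) on 2 processors.
sets = send_sets(n=12, src_block=2, dst_block=1, nprocs=2)
for (src, dst), indices in sorted(sets.items()):
    print(f"P{src} -> P{dst}: {indices}")
```

Entries with equal source and destination stay local; the others must be packed into messages, sent, and unpacked. For a multi-dimensional array distributed independently along each dimension, the same per-dimension mapping can be applied to each index of an element.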




Information

Published In

The Journal of Supercomputing, Volume 17, Issue 1

Aug. 2000

105 pages

ISSN: 0920-8542

Editor: Hamid Arabnia, Univ. of Georgia, Atlanta, GA

Copyright © 2000 Kluwer Academic Publishers.

Publisher

Kluwer Academic Publishers, United States


Cited By

Hsu C, Yu K (2019). A Compressed Diagonals Remapping Technique for Dynamic Data Redistribution on Banded Sparse Matrix. The Journal of Supercomputing, 29:2 (125-143). DOI: 10.1023/B:SUPE.0000026846.74050.18. Online publication date: 1-Jun-2019.
https://dl.acm.org/doi/10.1023/B%3ASUPE.0000026846.74050.18

Hsu C (2018). Sparse Matrix Block-Cyclic Realignment on Distributed Memory Machines. The Journal of Supercomputing, 33:3 (175-196). DOI: 10.1007/s11227-005-0247-6. Online publication date: 30-Dec-2018.
https://dl.acm.org/doi/10.1007/s11227-005-0247-6

Krishnan M, Nieplocha J, Alderighi M, Salapura V, McKee S (2006). Memory efficient parallel matrix multiplication operation for irregular problems. Proceedings of the 3rd conference on Computing frontiers (229-240). DOI: 10.1145/1128022.1128054. Online publication date: 3-May-2006.
https://dl.acm.org/doi/10.1145/1128022.1128054

Hsu C, Chen M, Yang C, Li K (2006). Optimizing Communications of Dynamic Data Redistribution on Symmetrical Matrices in Parallelizing Compilers. IEEE Transactions on Parallel and Distributed Systems, 17:11 (1226-1241). DOI: 10.1109/TPDS.2006.162. Online publication date: 1-Nov-2006.
https://dl.acm.org/doi/10.1109/TPDS.2006.162

Hsu C, Yu K (2003). A compressed diagonals remapping technique for dynamic data redistribution on banded sparse matrix. Proceedings of the 2003 international conference on Parallel and distributed processing and applications (53-64). DOI: 10.5555/1761566.1761577. Online publication date: 2-Jul-2003.
https://dl.acm.org/doi/10.5555/1761566.1761577
