research-article

Processor Mapping Techniques Toward Efficient Data Redistribution

Authors:

Edgar T. Kalns,

Lionel M. Ni

IEEE Transactions on Parallel and Distributed Systems, Volume 6, Issue 12

Pages 1234 - 1247

https://doi.org/10.1109/71.476166

Published: 01 December 1995

Abstract

Run-time data redistribution can enhance algorithm performance on distributed-memory machines. Data can be explicitly redistributed between algorithm phases when a different data decomposition is expected to improve the performance of a subsequent phase of computation. Redistribution, however, adds program overhead, because computation is suspended while data are exchanged among processor memories. In this paper, we present a technique that minimizes the amount of data exchanged in BLOCK to CYCLIC(c) (or vice versa) redistributions of an arbitrary number of dimensions. While preserving the semantics of the target (destination) distribution pattern, the technique manipulates the data-to-logical-processor mapping of the target pattern. Implemented on an IBM SP, the mapping technique improves redistribution performance by approximately 40% over the traditional data-to-processor mapping. Relative to the traditional mapping technique, the proposed method also affords greater flexibility in specifying precisely which data elements are redistributed and which remain on-processor.
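The idea of manipulating the data-to-logical-processor mapping can be made concrete with a small one-dimensional sketch. It assumes N elements on P processors, a BLOCK distribution with block size ⌈N/P⌉, and a CYCLIC(c) distribution that assigns global element g to logical processor (g div c) mod P. Because the target pattern only fixes which elements share a logical processor, choosing which physical processor plays each logical role preserves the distribution while changing how many elements must move. The sketch counts on-processor elements under the identity mapping and under the best permutation found by exhaustive search. It illustrates the optimization target only and is not presented as the authors' algorithm; all function names are hypothetical.

```python
from itertools import permutations


def block_owner(g: int, n: int, p: int) -> int:
    """Physical owner of global index g under BLOCK (block size ceil(n/p))."""
    b = -(-n // p)  # ceiling division
    return g // b


def cyclic_owner(g: int, c: int, p: int) -> int:
    """Logical owner of global index g under CYCLIC(c) (assumed convention)."""
    return (g // c) % p


def overlap_matrix(n: int, p: int, c: int) -> list[list[int]]:
    """m[src][dst]: elements held by physical processor src under BLOCK
    that belong to logical processor dst under CYCLIC(c)."""
    m = [[0] * p for _ in range(p)]
    for g in range(n):
        m[block_owner(g, n, p)][cyclic_owner(g, c, p)] += 1
    return m


def local_count(m: list[list[int]], mapping: tuple[int, ...]) -> int:
    """Elements that need not move when logical target processor j is
    placed on physical processor mapping[j]."""
    return sum(m[mapping[j]][j] for j in range(len(mapping)))


def best_mapping(m: list[list[int]]) -> tuple[int, ...]:
    """Exhaustive search over all logical-to-physical permutations.
    Exponential in p, so only suitable for small illustrative cases."""
    p = len(m)
    return max(permutations(range(p)), key=lambda perm: local_count(m, perm))


if __name__ == "__main__":
    m = overlap_matrix(n=16, p=4, c=2)
    identity = tuple(range(4))
    best = best_mapping(m)
    print("identity:", local_count(m, identity), "of 16 elements stay local")
    print("best:", best, local_count(m, best), "of 16 elements stay local")
```

With N = 16, P = 4, and c = 2, the identity mapping leaves only 4 of the 16 elements in place, while the permutation (0, 2, 1, 3) keeps 8 on-processor, reducing the elements that must move from 12 to 8. For realistic processor counts, the exhaustive search would normally be replaced by a linear assignment solver over the same overlap matrix.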

References

[1] High Performance Fortran Forum, “High performance Fortran language specification (version 1.0, draft),” May 1993.

[2] G. Fox, S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, C.-W. Tseng, and M.-Y. Wu, “Fortran D language specification,” Tech. Rep. COMP TR90-141, Rice Univ., Dept. of Computer Science, Dec. 1990.

[3] H. Zima, P. Brezany, B. Chapman, P. Mehrotra, and A. Schwald, “Vienna Fortran: A language specification (version 1.1),” 1991.

[4] P. Mehrotra and J.V. Rosendale, Programming Distributed Memory Architectures Using Kali, ch. 19, pp. 364-384. MIT Press, 1991.

[5] M. Rosing, R.B. Schnabel, and R.P. Weaver, “The DINO parallel programming language,” J. of Parallel and Distributed Computing, vol. 13, pp. 30-42, Sept. 1991.

[6] M.W. Hall, S. Hiranandani, K. Kennedy, and C.-W. Tseng, “Interprocedural compilation of Fortran D for MIMD machines,” Proc. of Supercomputing’92, pp. 522-534, Nov. 1992.

[7] M. Baber, “Hypertasking support for dynamically redistributable and resizeable arrays on the iPSC,” Proc. Sixth Distributed Memory Computing Conf., pp. 59-66, Apr. 1991.

[8] B. Chapman, P. Mehrotra, H. Moritsch, and H. Zima, “Dynamic data distributions in Vienna Fortran,” Proc. of Supercomputing’93, pp. 284-293, Nov. 1993.

[9] C.-T. Ho and S.L. Johnsson, “Distributed routing algorithms for broadcasting and personalized communication in hypercubes,” Proc. 1986 Int’l Conf. on Parallel Processing, pp. 640-648, Aug. 1986.

[10] S. Gupta, S. Kaushik, S. Mufti, S. Sharma, C.-H. Huang, and P. Sadayappan, “On the generation of efficient data communications for distributed-memory machines,” Proc. 1992 Int’l Computer Symp., Taichung, Taiwan, pp. 504-513, Dec. 1992.

[11] R. Thakur, A. Choudhary, and G. Fox, “Runtime array redistribution in HPF programs,” Proc. 1994 Scalable High Performance Computing Conf., pp. 309-316, May 1994.

[12] J. Stichnoth, D. O’Hallaron, and T. Gross, “Generating communication for array statements: Design, implementation, and evaluation,” J. of Parallel and Distributed Computing, vol. 21, no. 1, pp. 150-159, 1994.

[13] S. Ramaswamy and P. Banerjee, “Automatic generation of efficient array redistribution routines for distributed memory multicomputers,” Tech. Rep. CRHC-94-09, Center for Reliable and High Performance Computing, Computer Systems and Research Laboratory, Univ. of Illinois, 1994.

[14] D.J. Palermo, E. Su, J.A. Chandy, and P. Banerjee, “Communication optimizations used in the PARADIGM compiler for distributed-memory multicomputers,” Proc. 1994 Int’l Conf. on Parallel Processing, vol. 2, pp. 1-10, Aug. 1994.

[15] E.T. Kalns and L.M. Ni, “DaReL: A portable data redistribution library for distributed-memory machines,” Proc. 1994 Scalable Parallel Libraries Conf. 2, Oct. 1994.

[16] J. Bruck, R. Cypher, C.-T. Ho, and S. Kipnis, “Efficient algorithms for the index operation in message-passing systems,” Tech. Rep. RJ 9300 (82230), IBM Research Division, Almaden Research Center, San Jose, Calif., and T.J. Watson Research Center, Yorktown Heights, N.Y., Apr. 1993.

[17] P.K. McKinley, Y.-J. Tsai, and D.F. Robinson, “A survey of collective communication in wormhole-routed massively parallel computers,” Tech. Rep. MSU-CPS-94-35, Dept. of Computer Science, Michigan State Univ., 1994.

[18] S.D. Kaushik, C.-H. Huang, R.W. Johnson, and P. Sadayappan, “An approach to communication-efficient data redistribution,” Proc. 1994 Int’l Conf. on Supercomputing, pp. 364-373, July 1994.

[19] S.L. Johnsson and C.-T. Ho, “The complexity of reshaping arrays on boolean cubes,” Proc. Fifth Distributed Memory Computing Conf., vol. 1, pp. 370-377, Apr. 1990.

[20] E.T. Kalns and L.M. Ni, “Processor mapping techniques toward efficient data redistribution,” Proc. of the Eighth Int’l Parallel Processing Symp., pp. 469-476, Apr. 1994.

[21] A. Wakatani and M. Wolfe, “Optimization of the redistribution of arrays for distributed memory multicomputers,” Tech. Rep., Matsushita Electric Industrial Co., Ltd., Osaka, Japan, and Dept. of Computer Science and Eng., Oregon Graduate Inst., Portland, Dec. 1993.

[22] H. Franke, P. Hochschild, P. Pattnaik, J.-P. Prost, and M. Snir, “MPI on IBM SP1/SP2: Current status and future directions,” Proc. 1994 Scalable Parallel Libraries Conf., Oct. 1994.

[23] W. Gropp and E. Lusk, “User’s guide for the ANL IBM SPx system,” URL http://www.mcs.anl.gov/Projects/sp1/guide-r2.html, 1995.

Cited By

Herrmann J, Bosilca G, Hérault T, Marchal L, Robert Y, Dongarra J (2016). Assessing the cost of redistribution followed by a computational kernel. Parallel Computing 52:C, pp. 22-41. DOI: 10.1016/j.parco.2015.09.005. Online publication date: 1-Feb-2016.
https://dl.acm.org/doi/10.1016/j.parco.2015.09.005
Souravlas S, Roumeliotis M (2015). Scheduling array redistribution with virtual channel support. The Journal of Supercomputing 71:11, pp. 4215-4234. DOI: 10.1007/s11227-015-1519-4. Online publication date: 1-Nov-2015.
https://dl.acm.org/doi/10.1007/s11227-015-1519-4
Hsu C, Chen S (2012). Efficient selection strategies towards processor reordering techniques for improving data locality in heterogeneous clusters. The Journal of Supercomputing 60:3, pp. 284-300. DOI: 10.1007/s11227-010-0463-6. Online publication date: 1-Jun-2012.
https://dl.acm.org/doi/10.1007/s11227-010-0463-6

Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 6, Issue 12, December 1995 (118 pages)

ISSN: 1045-9219

Editor: Duncan Lawrie, Univ. of Illinois, Urbana

Copyright © 1995 IEEE. All Rights Reserved.

Publisher: IEEE Press

Bibliometrics

Total Citations: 35
Total Downloads: 0
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 12 Nov 2024

Citations

Cited By

Wang C, Chen S, Hsu C, Yang C (2008). Optimizing communications of data parallel programs in scalable cluster systems. Proceedings of the 3rd international conference on Advances in grid and pervasive computing, pp. 29-37. DOI: 10.5555/1788754.1788762. Online publication date: 25-May-2008.
https://dl.acm.org/doi/10.5555/1788754.1788762
Souravlas S, Roumeliotis M (2008). A message combining approach for efficient array redistribution in non-all-to-all communication networks. International Journal of Computer Mathematics 85:11, pp. 1609-1619. DOI: 10.1080/00207160701537873. Online publication date: 1-Nov-2008.
https://dl.acm.org/doi/10.1080/00207160701537873
Souravlas S, Roumeliotis M (2008). A message passing strategy for array redistributions in a torus network. The Journal of Supercomputing 46:1, pp. 40-57. DOI: 10.1007/s11227-008-0185-1. Online publication date: 1-Oct-2008.
https://dl.acm.org/doi/10.1007/s11227-008-0185-1
Huang J, Chu C (2008). A flexible processor mapping technique toward data localization for block-cyclic data redistribution. The Journal of Supercomputing 45:2, pp. 151-172. DOI: 10.1007/s11227-007-0166-9. Online publication date: 1-Aug-2008.
https://dl.acm.org/doi/10.1007/s11227-007-0166-9
Yu C (2007). On the complexity of the max-edge-coloring problem with its variants. Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies, pp. 350-361. DOI: 10.5555/2399256.2399288. Online publication date: 7-Apr-2007.
https://dl.acm.org/doi/10.5555/2399256.2399288
Sudarsan R, Ribbens C (2007). Efficient multidimensional data redistribution for resizable parallel computations. Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications, pp. 182-194. DOI: 10.5555/2395970.2395991. Online publication date: 29-Aug-2007.
https://dl.acm.org/doi/10.5555/2395970.2395991
Lin C (2007). A dominant input stream for LUD incremental computing on a contention network. Proceedings of the 7th international conference on Algorithms and architectures for parallel processing, pp. 403-414. DOI: 10.5555/1769115.1769158. Online publication date: 11-Jun-2007.
https://dl.acm.org/doi/10.5555/1769115.1769158
