research-article

Processor Mapping Techniques Toward Efficient Data Redistribution

Published: 01 December 1995

Abstract

Run-time data redistribution can enhance algorithm performance on distributed-memory machines. Data can be explicitly redistributed between algorithm phases when a different data decomposition is expected to improve the performance of a subsequent phase of computation. Redistribution, however, adds program overhead, because computation is suspended while data are exchanged among processor memories. In this paper, we present a technique that minimizes the amount of data exchanged for BLOCK to CYCLIC(c) (or vice versa) redistributions of arrays with an arbitrary number of dimensions. While preserving the semantics of the target (destination) distribution pattern, the technique manipulates the data-to-logical-processor mapping of the target pattern. When implemented on an IBM SP, the mapping technique demonstrates redistribution performance improvements of approximately 40% over the traditional data-to-processor mapping. Relative to the traditional mapping technique, the proposed method affords greater flexibility in specifying precisely which data elements are redistributed and which elements remain on-processor.
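As a rough illustration of the idea in the abstract, the following Python sketch counts how many elements of a one-dimensional array change processor during a BLOCK to CYCLIC(c) redistribution, first with the identity assignment of logical target processors to physical processors and then with the best assignment found by exhaustive search. It is not the paper's algorithm: the function names, the ceiling-based block size, and the brute-force search over permutations are all assumptions made for this example, and the search is only practical for small processor counts.

```python
from itertools import permutations


def block_owner(i, n, p):
    """Physical owner of element i under BLOCK (assumed block size ceil(n/p))."""
    block_size = -(-n // p)  # ceiling division
    return i // block_size


def cyclic_owner(i, c, p):
    """Logical owner of element i under CYCLIC(c)."""
    return (i // c) % p


def elements_moved(n, p, c, mapping):
    """Count elements that change physical processor in BLOCK -> CYCLIC(c).

    mapping[q] is the physical processor that hosts logical target
    processor q (a hypothetical representation chosen for this sketch).
    """
    return sum(
        block_owner(i, n, p) != mapping[cyclic_owner(i, c, p)]
        for i in range(n)
    )


def best_mapping(n, p, c):
    """Exhaustively search logical-to-physical permutations (small p only)."""
    return min(
        permutations(range(p)),
        key=lambda m: elements_moved(n, p, c, m),
    )


if __name__ == "__main__":
    n, p, c = 16, 4, 2
    identity = tuple(range(p))
    chosen = best_mapping(n, p, c)
    print("identity mapping moves", elements_moved(n, p, c, identity), "elements")
    print("mapping", chosen, "moves", elements_moved(n, p, c, chosen), "elements")
```

With 16 elements, 4 processors, and c = 2, the identity mapping moves 12 elements, while a permuted mapping moves 8. The target pattern is still CYCLIC(2) over the logical processors; only the assignment of logical to physical processors changes.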

Published In

IEEE Transactions on Parallel and Distributed Systems  Volume 6, Issue 12
December 1995
118 pages
ISSN:1045-9219

Publisher

IEEE Press

Author Tags

  1. Distributed-memory architectures
  2. High Performance Fortran
  3. data decomposition
  4. data redistribution
  5. data-parallel programming
  6. processor mapping

Qualifiers

  • Research-article
