research-article

Geometric Mapping of Tasks to Processors on Parallel Computers with Mesh or Torus Networks

Authors:

Karen D. Devine,

Kevin Pedretti,

Mark A. Taylor,

Sivasankaran Rajamanickam,

Ümit V. Çatalyürek

IEEE Transactions on Parallel and Distributed Systems, Volume 30, Issue 9

Pages 2018 - 2032

https://doi.org/10.1109/TPDS.2019.2900043

Published: 01 September 2019

Abstract

We present a new method for reducing parallel applications’ communication time by mapping their MPI tasks to processors in a way that lowers both the distance messages travel and the amount of congestion in the network. Assuming that geometric proximity among the tasks is a good approximation of their communication interdependence, we use a geometric partitioning algorithm to order both the tasks and the processors, and then assign each task part to the corresponding processor part. In this way, interdependent tasks are assigned to “nearby” cores in the network. We also present a number of algorithmic optimizations that exploit specific features of the network or application to further improve the quality of the mapping. We specifically address the case of sparse node allocation, where the nodes assigned to a job are not necessarily located in a contiguous block or in close proximity to each other in the network. Our methods also generalize to contiguous allocations, and results are shown for both contiguous and non-contiguous allocations. We show that, for the structured finite difference mini-application MiniGhost, our mapping methods reduced communication time by up to 75% relative to MiniGhost's default mapping on 128K cores of a Cray XK7 with sparse allocation. For the atmospheric modeling code E3SM/HOMME, our methods reduced communication time by up to 31% on 16K cores of an IBM BlueGene/Q with contiguous allocation.
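The core idea in the abstract, ordering tasks and processors with the same geometric partitioner and pairing corresponding parts, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain recursive coordinate bisection as a stand-in for the geometric partitioner, one task per processor, and Manhattan distance on a mesh (no torus wraparound) as the hop metric. The names `rcb_order`, `geometric_map`, and `total_hops` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def rcb_order(coords: Sequence[Point], ids: Optional[List[int]] = None) -> List[int]:
    """Order point indices by recursive coordinate bisection (assumed partitioner)."""
    if ids is None:
        ids = list(range(len(coords)))
    if len(ids) <= 1:
        return list(ids)
    dim = len(coords[ids[0]])
    # Cut along the dimension with the largest extent among the current points.
    axis = max(
        range(dim),
        key=lambda d: max(coords[i][d] for i in ids) - min(coords[i][d] for i in ids),
    )
    # Sort along that axis (index breaks ties) and split into two halves.
    ordered = sorted(ids, key=lambda i: (coords[i][axis], i))
    mid = len(ordered) // 2
    return rcb_order(coords, ordered[:mid]) + rcb_order(coords, ordered[mid:])


def geometric_map(task_coords: Sequence[Point], proc_coords: Sequence[Point]) -> Dict[int, int]:
    """Pair the k-th task in partition order with the k-th processor in partition order."""
    if len(task_coords) != len(proc_coords):
        raise ValueError("this sketch assumes one task per processor")
    return dict(zip(rcb_order(task_coords), rcb_order(proc_coords)))


def total_hops(
    edges: Sequence[Tuple[int, int]],
    mapping: Dict[int, int],
    proc_coords: Sequence[Point],
) -> float:
    """Sum of mesh (Manhattan) distances between the processors of communicating tasks."""
    return sum(
        sum(abs(a - b) for a, b in zip(proc_coords[mapping[u]], proc_coords[mapping[v]]))
        for u, v in edges
    )


if __name__ == "__main__":
    # 4x4 grid of tasks placed on four scattered rows of a larger mesh (a sparse allocation).
    tasks = [(x, y) for x in range(4) for y in range(4)]
    procs = [(x, y) for x in (0, 3, 4, 9) for y in range(4)]
    mapping = geometric_map(tasks, procs)
    print(mapping)
```

Because both point sets are split into halves of identical size at every level, each task part lines up with a processor part of the same size, which is what lets the two orderings be zipped together directly. A production mapper would also need to handle multiple cores per node, torus wraparound links, and the network-specific optimizations the abstract mentions.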

Information

Published In

IEEE Transactions on Parallel and Distributed Systems Volume 30, Issue 9

Sept. 2019

228 pages

ISSN:1045-9219

1045-9219 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Publisher

IEEE Press
