research-article

Geometric Mapping of Tasks to Processors on Parallel Computers with Mesh or Torus Networks

Published: 01 September 2019

Abstract

We present a new method for reducing parallel applications’ communication time by mapping their MPI tasks to processors in a way that lowers both the distance messages travel and the amount of congestion in the network. Assuming geometric proximity among the tasks is a good approximation of their communication interdependence, we use a geometric partitioning algorithm to order both the tasks and the processors, assigning task parts to the corresponding processor parts. In this way, interdependent tasks are assigned to “nearby” cores in the network. We also present a number of algorithmic optimizations that exploit specific features of the network or application to further improve the quality of the mapping. We specifically address the case of sparse node allocation, where the nodes assigned to a job are not necessarily located in a contiguous block or in close proximity to each other in the network. However, our methods generalize to contiguous allocations as well, and results are shown for both contiguous and non-contiguous allocations. We show that, for the structured finite difference mini-application MiniGhost, our mapping methods reduced communication time by up to 75 percent relative to MiniGhost’s default mapping on 128K cores of a Cray XK7 with sparse allocation. For the atmospheric modeling code E3SM/HOMME, our methods reduced communication time by up to 31 percent on 16K cores of an IBM BlueGene/Q with contiguous allocation.
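
The core idea of the abstract, ordering tasks and processors with the same geometric partitioner and pairing corresponding parts, can be illustrated with a minimal sketch. It is not the implementation evaluated in the article: it assumes recursive coordinate bisection as a stand-in partitioner, assumes one task per processor, and uses hypothetical names (`rcb_order`, `geometric_map`, `torus_hops`) that do not come from the article.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def rcb_order(points: Sequence[Point], ids: List[int]) -> List[int]:
    """Order ids by recursive coordinate bisection (stand-in partitioner)."""
    if len(ids) <= 1:
        return list(ids)
    dims = len(points[ids[0]])

    def extent(d: int) -> int:
        vals = [points[i][d] for i in ids]
        return max(vals) - min(vals)

    # Cut along the dimension with the largest extent (first one on ties).
    cut_dim = max(range(dims), key=extent)
    # Sort along that dimension; the index breaks ties deterministically.
    ordered = sorted(ids, key=lambda i: (points[i][cut_dim], i))
    half = len(ordered) // 2
    return rcb_order(points, ordered[:half]) + rcb_order(points, ordered[half:])


def geometric_map(task_coords: Sequence[Point],
                  proc_coords: Sequence[Point]) -> List[int]:
    """Return mapping[task] = processor by pairing the two orderings."""
    if len(task_coords) != len(proc_coords):
        raise ValueError("this sketch assumes one task per processor")
    task_order = rcb_order(task_coords, list(range(len(task_coords))))
    proc_order = rcb_order(proc_coords, list(range(len(proc_coords))))
    mapping = [0] * len(task_coords)
    for task, proc in zip(task_order, proc_order):
        mapping[task] = proc
    return mapping


def torus_hops(a: Point, b: Point, dims: Point) -> int:
    """Shortest hop count between two nodes of a torus with the given sizes."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y, n in zip(a, b, dims))


# Assumed example: a 4x4 task grid mapped onto a sparse allocation that
# holds every second node of an 8x8 torus.
tasks = [(x, y) for x in range(4) for y in range(4)]
procs = [(x, y) for x in range(0, 8, 2) for y in range(0, 8, 2)]
mapping = geometric_map(tasks, procs)
```

Because both orderings come from the same partitioner, tasks that are close in the application's geometry tend to land on processors that are close in the network, even when the allocated nodes are scattered. `torus_hops` can then be used to compare the hop counts of communicating task pairs under different mappings.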

Cited By

  • (2024) Efficient Fault-Tolerant Path Embedding for 3D Torus Network Using Locally Faulty Blocks. IEEE Transactions on Computers 73:9 (2305-2319). DOI: 10.1109/TC.2024.3416695. Online publication date: 1-Sep-2024
  • (2021) EXAGRAPH. International Journal of High Performance Computing Applications 35:6 (553-571). DOI: 10.1177/10943420211029299. Online publication date: 1-Nov-2021
  • (2020) Spatially Bursty I/O on Supercomputers: Causes, Impacts and Solutions. IEEE Transactions on Parallel and Distributed Systems 31:12 (2908-2922). DOI: 10.1109/TPDS.2020.3005572. Online publication date: 9-Jul-2020

Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 30, Issue 9
Sept. 2019
228 pages

Publisher

IEEE Press
