Avoiding hot-spots on two-level direct networks

Research article, DOI: 10.1145/2063384.2063486

Published: 12 November 2011

Abstract

A low-diameter, fast interconnection network will be a prerequisite for building exascale machines. Several groups have proposed a two-level direct network as a scalable design for future machines; IBM's PERCS topology and the dragonfly network discussed in the DARPA exascale hardware study are examples of this design. The presence of multiple levels in this design leads to hot-spots on a few links when processes are grouped together at the lowest level to minimize total communication volume. This is especially true for communication graphs with a small number of neighbors per task. Routing and mapping choices can impact the communication performance of parallel applications running on a machine with a two-level direct topology. This paper explores intelligent topology-aware mappings of different communication patterns to the physical topology to identify cases that minimize link utilization. We also analyze the trade-offs between using direct and indirect routing with different mappings. Because no two-level direct networks have been installed yet, we use simulations to study the communication and overall performance of applications. This study raises interesting issues regarding the choice of job scheduling, routing and mapping for future machines.
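The hot-spot effect and the direct-versus-indirect trade-off described above can be illustrated with a toy model. The Python sketch below is not the simulation methodology used in the paper. It assumes a fully connected two-level network with exactly one global link between every pair of groups, counts only bytes on global (inter-group) links, and models indirect routing by its expected load: each message is split evenly over all intermediate groups, in the style of Valiant routing. The helpers `stencil_2d`, `blocked_mapping`, `global_link_loads` and `summarize` are hypothetical, and the 8 x 8 periodic stencil mapped one grid row per group is an illustrative choice.

```python
from collections import defaultdict
from fractions import Fraction


def stencil_2d(nx, ny, msg_bytes):
    """Hypothetical periodic 5-point stencil: each task sends msg_bytes to 4 neighbors.

    Task id = y * nx + x, so consecutive ids run along one grid row.
    Returns a list of (src_task, dst_task, nbytes) tuples.
    """
    msgs = []
    for y in range(ny):
        for x in range(nx):
            src = y * nx + x
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                dst = ((y + dy) % ny) * nx + (x + dx) % nx
                msgs.append((src, dst, msg_bytes))
    return msgs


def blocked_mapping(num_tasks, tasks_per_group):
    """Pack consecutive task ids into the same group (minimizes inter-group volume)."""
    return [t // tasks_per_group for t in range(num_tasks)]


def global_link_loads(msgs, mapping, num_groups, routing="direct"):
    """Bytes carried by each directed global link (src_group, dst_group).

    Assumes one global link per pair of groups; intra-group traffic is ignored.
    direct:   one global hop, gs -> gd.
    indirect: expected load of Valiant-style routing; each message is split
              evenly over all intermediate groups gi not in {gs, gd} and
              uses two global hops, gs -> gi and gi -> gd.
    """
    if routing not in ("direct", "indirect"):
        raise ValueError(f"unknown routing scheme: {routing!r}")
    loads = defaultdict(Fraction)  # exact arithmetic so split shares add up exactly
    for src, dst, nbytes in msgs:
        gs, gd = mapping[src], mapping[dst]
        if gs == gd:
            continue  # stays inside the group: no global link used
        if routing == "direct":
            loads[(gs, gd)] += nbytes
            continue
        mids = [g for g in range(num_groups) if g not in (gs, gd)]
        if not mids:  # only two groups: no detour exists, fall back to direct
            loads[(gs, gd)] += nbytes
            continue
        share = Fraction(nbytes, len(mids))
        for gi in mids:
            loads[(gs, gi)] += share
            loads[(gi, gd)] += share
    return loads


def summarize(loads):
    """Return (peak link load, total bytes on global links, number of links used)."""
    return max(loads.values()), sum(loads.values()), len(loads)


if __name__ == "__main__":
    msgs = stencil_2d(8, 8, msg_bytes=1024)
    mapping = blocked_mapping(64, tasks_per_group=8)  # one grid row per group
    for routing in ("direct", "indirect"):
        peak, total, used = summarize(global_link_loads(msgs, mapping, 8, routing))
        print(f"{routing:<8} peak={float(peak):.0f} B  total={float(total):.0f} B  links used={used}")
```

With one grid row per group, all inter-group messages of a group go to just two neighboring groups, so direct routing loads only 16 of the 56 directed global links, each with 8192 bytes. The expected-load indirect model spreads traffic over all 56 links and lowers the peak to about 5461 bytes, but it doubles the total global traffic from 131072 to 262144 bytes, which is the kind of mapping and routing trade-off the abstract refers to.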

Published In

SC '11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
November 2011
866 pages
ISBN:9781450307710
DOI:10.1145/2063384

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. communication
  2. dragonfly network
  3. exascale
  4. mapping
  5. performance

Conference

SC '11

Acceptance Rates

SC '11 paper acceptance rate: 74 of 352 submissions (21%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 1

Reflects downloads up to 12 Sep 2024.

Cited By

  • (2023) GRAP: Group-level Resource Allocation Policy for Reconfigurable Dragonfly Network in HPC. Proceedings of the 37th International Conference on Supercomputing, pp. 437-449. DOI: 10.1145/3577193.3593732. Online publication date: 21-Jun-2023.
  • (2023) Workload Interference Prevention with Intelligent Routing and Flexible Job Placement on Dragonfly. Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, pp. 23-33. DOI: 10.1145/3573900.3591119. Online publication date: 21-Jun-2023.
  • (2022) Study of Workload Interference with Intelligent Routing on Dragonfly. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. DOI: 10.1109/SC41404.2022.00025. Online publication date: Nov-2022.
  • (2021) Performance Evaluation of Adaptive Routing on Dragonfly-based Production Systems. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 340-349. DOI: 10.1109/IPDPS49936.2021.00042. Online publication date: May-2021.
  • (2020) Fast Modeling of Network Contention in Batch Point-to-point Communications by Packet-level Simulation with Dynamic Time-stepping. Workshop Proceedings of the 49th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3409390.3409398. Online publication date: 17-Aug-2020.
  • (2019) Fault-Tolerant Adaptive Routing in Dragonfly Networks. IEEE Transactions on Dependable and Secure Computing, 16(2), pp. 259-271. DOI: 10.1109/TDSC.2017.2693372. Online publication date: 1-Mar-2019.
  • (2018) RM-replay. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-13. DOI: 10.5555/3291656.3291690. Online publication date: 11-Nov-2018.
  • (2018) Brief Announcement. Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures, pp. 91-93. DOI: 10.1145/3210377.3210665. Online publication date: 11-Jul-2018.
  • (2018) RM-replay. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-13. DOI: 10.1109/SC.2018.00028. Online publication date: 11-Nov-2018.
  • (2018) Megafly: A Topology for Exascale Systems. High Performance Computing, pp. 289-310. DOI: 10.1007/978-3-319-92040-5_15. Online publication date: 29-May-2018.