HPDC Conference Proceedings, DOI: 10.1145/2600212.2600225
Research article

Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks

Published: 23 June 2014

Abstract

Dragonflies are recent network designs and one of the most promising topologies for the Exascale effort, owing to their scalability and cost-effectiveness. Although they achieve very high throughput under uniform random all-to-all traffic, they can suffer significant performance degradation under other common high performance computing workloads, such as stencil (multi-dimensional nearest neighbor) patterns. This shortfall from peak performance is often caused by an insufficient understanding of how the workload interacts with the network, and of how application-specific task-to-node mapping strategies can serve as optimization vehicles.
To address these issues, we propose a theoretical performance analysis framework. It takes as input a network specification and a traffic demand matrix characterizing an arbitrary workload. From these, it predicts where bottlenecks will occur in the network and how they will affect the effective sustainable injection bandwidth. We then focus our analysis on a specific communication pattern of high interest, the multi-dimensional Cartesian nearest neighbor exchange. For this pattern we derive analytic bounds on its expected performance across a multitude of possible mapping strategies; these bounds are determined by bottlenecks in the remote links of the Dragonfly.
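The kind of bottleneck analysis described above can be illustrated with a small sketch. The Python below is not the authors' framework; it is a simplified model under explicit assumptions.

- The dragonfly is canonical and fully connected, with g = a*h + 1 groups, so each ordered pair of groups shares exactly one global link.
- There are p nodes per router and a routers per group.
- Routing is direct (minimal).
- Global links have the same bandwidth as node injection links.
- Only global (remote) links are modeled; local links and the placement of global links on specific routers are ignored.

The helper names (global_link_loads, sustainable_injection, stencil_2d, uniform_random) are hypothetical. Given a traffic demand matrix, the sketch proceeds in three steps. Each row of the matrix gives how a node splits its injected traffic. The sketch sums the load on each global link, then reports the largest per-node injection rate the most loaded link can sustain.

```python
import random
from collections import defaultdict


def group_of(node, a, p):
    # Assumption: nodes are numbered router by router within a group and
    # group by group, so each group holds a*p consecutive node ids.
    return node // (a * p)


def global_link_loads(traffic, a, p):
    """Load on each directed global link under direct (minimal) routing.

    traffic[s][d] is the fraction of node s's injection rate sent to node d.
    In a fully connected dragonfly each ordered group pair (gs, gd) has one
    global link, so every inter-group flow crosses exactly that link.
    """
    loads = defaultdict(float)
    for s, row in traffic.items():
        gs = group_of(s, a, p)
        for d, frac in row.items():
            gd = group_of(d, a, p)
            if gs != gd:
                loads[(gs, gd)] += frac
    return loads


def sustainable_injection(traffic, a, p):
    """Largest per-node injection rate (fraction of link bandwidth) before the
    most loaded global link saturates; capped at 1 (full injection)."""
    worst = max(global_link_loads(traffic, a, p).values(), default=0.0)
    return 1.0 if worst <= 1.0 else 1.0 / worst


def uniform_random(n):
    # Each node spreads its traffic evenly over all other nodes.
    return {s: {d: 1.0 / (n - 1) for d in range(n) if d != s} for s in range(n)}


def stencil_2d(nx, ny, mapping):
    """2D periodic nearest neighbor exchange on an nx-by-ny task grid.
    Each task splits its traffic evenly over its 4 neighbors;
    mapping[t] is the node that hosts task t."""
    traffic = defaultdict(dict)
    for y in range(ny):
        for x in range(nx):
            src = mapping[y * nx + x]
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                dst = mapping[((y + dy) % ny) * nx + (x + dx) % nx]
                traffic[src][dst] = traffic[src].get(dst, 0.0) + 0.25
    return traffic


p, a, h = 2, 4, 2          # nodes/router, routers/group, global links/router
g = a * h + 1              # 9 groups: one global link per group pair
n = g * a * p              # 72 nodes
nx, ny = a * p, g          # 8 x 9 task grid, one grid row per group

linear = list(range(n))                    # consecutive placement
shuffled = linear[:]
random.Random(0).shuffle(shuffled)         # random placement (seeded)

print(sustainable_injection(uniform_random(n), a, p))           # 1.0
print(sustainable_injection(stencil_2d(nx, ny, linear), a, p))  # 0.5
print(sustainable_injection(stencil_2d(nx, ny, shuffled), a, p))
```

With consecutive placement, each row of the 8 x 9 grid occupies exactly one group. All of a group's inter-group traffic therefore funnels through just two global links, each loaded to twice its capacity, and the sustainable injection rate halves. Uniform random traffic on the same network stays below global link capacity. The result for the shuffled placement depends on the seed, so the sketch prints it without claiming a particular value. Because only direct routing is modeled, the sketch says nothing about indirect routing, which spreads traffic through intermediate groups.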
Finally, using a comprehensive set of simulation results, we validate the correctness of the theoretical approach. In the process, we address some misconceptions regarding Dragonfly network behavior and evaluation, such as choosing throughput maximization rather than workload completion time minimization as the optimization objective. We also examine whether the standard notion of Dragonfly balance can be extended to workloads other than uniform random traffic.
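For reference, the standard notion of Dragonfly balance comes from Kim et al. (2008). With p terminals per router, a routers per group and h global links per router, the condition a = 2p = 2h balances channel load under uniform random traffic. The sketch below uses hypothetical helper names and computes four quantities:

- whether the balance condition holds;
- the size of a maximum, fully connected configuration: g = a*h + 1 groups and N = a*p*(a*h + 1) nodes;
- the router radix, p + (a - 1) + h;
- the per-direction global link load under uniform random traffic with direct routing.

The load calculation assumes every node injects at full rate and global links have the same bandwidth as node links.

```python
def dragonfly_size(p, a, h):
    """Groups and nodes in a maximum-size (fully connected) dragonfly with p
    nodes per router, a routers per group and h global links per router."""
    g = a * h + 1            # every group links directly to every other group
    return g, g * a * p


def router_radix(p, a, h):
    # Terminal ports + local ports (to the other a-1 routers) + global ports.
    return p + (a - 1) + h


def is_balanced(p, a, h):
    # Balance condition from Kim et al. (2008), derived for uniform random traffic.
    return a == 2 * p == 2 * h


def uniform_global_load(p, a, h):
    """Per-direction load on each global link, as a fraction of its bandwidth,
    under uniform random traffic with direct routing and full-rate injection.
    Each ordered group pair carries (a*p)**2 / (N - 1): a*p senders, each
    sending 1/(N-1) of its traffic to each of the a*p nodes in the other group."""
    _, n = dragonfly_size(p, a, h)
    return (a * p) ** 2 / (n - 1)


for p, a, h in ((2, 4, 2), (16, 32, 16)):
    g, n = dragonfly_size(p, a, h)
    print(f"p={p} a={a} h={h}: {g} groups, {n} nodes, "
          f"radix {router_radix(p, a, h)}, balanced={is_balanced(p, a, h)}, "
          f"global load={uniform_global_load(p, a, h):.3f}")
```

For p = 16, a = 32, h = 16 this gives a radix-63 router, 513 groups and 262,656 nodes. The load formula applies only to uniform random traffic; it does not by itself say whether the same balance condition carries over to other workloads.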

Cited By

  • (2022) Noise in the Clouds. Proceedings of the ACM on Measurement and Analysis of Computing Systems 6(3), pp. 1-27. DOI: 10.1145/3570609. Online publication date: 8 Dec 2022.
  • (2022) HammingMesh: A Network Topology for Large-Scale Deep Learning. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-18. DOI: 10.1109/SC41404.2022.00016. Online publication date: Nov 2022.
  • (2022) The Case for Disjoint Job Mapping on High-Radix Networked Parallel Computers. Algorithms and Architectures for Parallel Processing, pp. 123-143. DOI: 10.1007/978-3-030-95388-1_9. Online publication date: 23 Feb 2022.

Information

Published In

HPDC '14: Proceedings of the 23rd international symposium on High-performance parallel and distributed computing
June 2014
334 pages
ISBN: 9781450327497
DOI: 10.1145/2600212
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cartesian and random task placement
  2. direct and indirect routing
  3. dragonfly networks
  4. nearest neighbor exchanges
  5. stencil computation

Conference

HPDC'14

Acceptance Rates

HPDC '14 paper acceptance rate: 21 of 130 submissions, 16%
Overall acceptance rate: 166 of 966 submissions, 17%

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 14
  • Downloads (last 6 weeks): 0
Reflects downloads up to 30 Aug 2024

Citations

  • (2021) Flare. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-16. DOI: 10.1145/3458817.3476178. Online publication date: 14 Nov 2021.
  • (2021) Performance Evaluation of Adaptive Routing on Dragonfly-based Production Systems. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 340-349. DOI: 10.1109/IPDPS49936.2021.00042. Online publication date: May 2021.
  • (2021) A scheduling policy to save 10% of communication time in parallel fast Fourier transform. Concurrency and Computation: Practice and Experience 35(15). DOI: 10.1002/cpe.6508. Online publication date: 18 Jul 2021.
  • (2020) An In-Depth Analysis of the Slingshot Interconnect. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. DOI: 10.1109/SC41405.2020.00039. Online publication date: Nov 2020.
  • (2020) FatPaths: Routing in Supercomputers and Data Centers when Shortest Paths Fall Short. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-18. DOI: 10.1109/SC41405.2020.00031. Online publication date: Nov 2020.
  • (2019) Analyzing Cost-Performance Tradeoffs of HPC Network Designs under Different Constraints using Simulations. Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, pp. 1-12. DOI: 10.1145/3316480.3325516. Online publication date: 29 May 2019.
  • (2019) Mitigating network noise on Dragonfly networks through application-aware routing. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-32. DOI: 10.1145/3295500.3356196. Online publication date: 17 Nov 2019.