
Optimization of MPI_Allreduce on the Blue Gene/Q supercomputer

Published: 15 September 2013
DOI: 10.1145/2488551.2488557

Abstract

The IBM Blue Gene/Q supercomputer has a 5D torus network in which each node is connected to ten bi-directional links. In this paper we present techniques to optimize the MPI_Allreduce collective operation by building ten edge-disjoint spanning trees, one over each of the ten torus links. We accelerate the summing of network packets into local buffers by using the Quad Processing SIMD unit in the BG/Q cores and by executing the sums on multiple communication threads created by the PAMI library. The net gain is a peak throughput of 6.3 GB/s for a double-precision floating-point sum allreduce, a 3.75x speedup over the collective-network-based algorithm in the product MPI stack on BG/Q.
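The multi-tree idea can be sketched at the MPI level. The C sketch below is an illustrative analogue, not the paper's implementation (which builds the ten edge-disjoint spanning trees directly on the torus links inside PAMI): it splits the reduce vector into ten chunks and issues one concurrent non-blocking allreduce per chunk, so each chunk can travel a different route. The names multi_tree_allreduce and sum_packet and the NUM_TREES constant are hypothetical; the sum_packet loop stands in for the QPX-accelerated packet summation that the paper runs on PAMI communication threads.

    /* Illustrative sketch only: emulates the multi-spanning-tree allreduce
     * with one non-blocking MPI allreduce per chunk. The paper's native
     * implementation builds ten edge-disjoint spanning trees on the 5D
     * torus links inside PAMI; MPI_Iallreduce is a portable stand-in. */
    #include <mpi.h>
    #include <stdlib.h>

    #define NUM_TREES 10  /* one per bi-directional torus link on BG/Q */

    /* Stand-in for the QPX-accelerated packet summation: on BG/Q this
     * loop would use vector4double intrinsics and run on several PAMI
     * communication threads, each owning a slice of the buffer. */
    static void sum_packet(double *restrict acc,
                           const double *restrict pkt, int n)
    {
        for (int i = 0; i < n; ++i)
            acc[i] += pkt[i];
    }

    /* Hypothetical helper: split `count` doubles into NUM_TREES chunks
     * and reduce each chunk concurrently, one request per "tree". */
    static void multi_tree_allreduce(const double *sendbuf, double *recvbuf,
                                     int count, MPI_Comm comm)
    {
        MPI_Request reqs[NUM_TREES];
        int offset = 0;

        for (int t = 0; t < NUM_TREES; ++t) {
            /* Spread the remainder over the first few chunks. */
            int chunk = count / NUM_TREES + (t < count % NUM_TREES ? 1 : 0);
            MPI_Iallreduce(sendbuf + offset, recvbuf + offset, chunk,
                           MPI_DOUBLE, MPI_SUM, comm, &reqs[t]);
            offset += chunk;
        }
        MPI_Waitall(NUM_TREES, reqs, MPI_STATUSES_IGNORE);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        enum { N = 1 << 20 };
        double *src = malloc(N * sizeof *src);
        double *dst = malloc(N * sizeof *dst);
        for (int i = 0; i < N; ++i)
            src[i] = 1.0;
        multi_tree_allreduce(src, dst, N, MPI_COMM_WORLD);
        sum_packet(dst, src, N);  /* local summation example */
        free(src);
        free(dst);
        MPI_Finalize();
        return 0;
    }

Splitting the vector this way is what lets all ten links carry traffic at once, which is the effect the paper's tree construction achieves natively on the torus while the SIMD unit and the extra communication threads keep the per-packet summation off the critical path.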




Information

Published In

EuroMPI '13: Proceedings of the 20th European MPI Users' Group Meeting
September 2013
289 pages
ISBN:9781450319034
DOI:10.1145/2488551
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • ARCOS: Computer Architecture and Technology Area, Universidad Carlos III de Madrid

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 September 2013

Permissions

Request permissions for this article.

Author Tags

  1. MPI
  2. blue gene
  3. collective optimization algorithms

Qualifiers

  • Research-article

Funding Sources

  • US Government

Conference

EuroMPI '13
Sponsor:
  • ARCOS
EuroMPI '13: 20th European MPI Users' Group Meeting
September 15 - 18, 2013
Madrid, Spain

Acceptance Rates

EuroMPI '13 Paper Acceptance Rate 22 of 47 submissions, 47%;
Overall Acceptance Rate 66 of 139 submissions, 47%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 7
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 12 Sep 2024

Cited By

  • (2024) Near-Optimal Wafer-Scale Reduce. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 10.1145/3625549.3658693, pp. 334-347. Online publication date: 3-Jun-2024.
  • (2018) Efficient Training of Convolutional Neural Nets on Large Distributed Systems. 2018 IEEE International Conference on Cluster Computing (CLUSTER), 10.1109/CLUSTER.2018.00057, pp. 392-401. Online publication date: Sep-2018.
  • (2016) Adaptive Impact-Driven Detection of Silent Data Corruption for HPC Applications. IEEE Transactions on Parallel and Distributed Systems, 10.1109/TPDS.2016.2517639, 27(10), pp. 2809-2823. Online publication date: 1-Oct-2016.
  • (2016) Optimization and Analysis of MPI Collective Communication on Fat-Tree Networks. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 10.1109/IPDPS.2016.85, pp. 1031-1040. Online publication date: May-2016.
  • (2015) Improving Communication Throughput by Multipath Load Balancing on Blue Gene/Q. Proceedings of the 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), 10.1109/HiPC.2015.44, pp. 115-124. Online publication date: 16-Dec-2015.
  • (2015) Multipath Load Balancing for M × N Communication Patterns on the Blue Gene/Q Supercomputer Interconnection Network. Proceedings of the 2015 IEEE International Conference on Cluster Computing, 10.1109/CLUSTER.2015.140, pp. 833-840. Online publication date: 8-Sep-2015.
  • (2014) Optimization of MPI collective operations on the IBM Blue Gene/Q supercomputer. The International Journal of High Performance Computing Applications, 10.1177/1094342014552086, 28(4), pp. 450-464. Online publication date: 7-Nov-2014.
  • (2014) Improving Data Movement Performance for Sparse Data Patterns on the Blue Gene/Q Supercomputer. Proceedings of the 2014 43rd International Conference on Parallel Processing Workshops, 10.1109/ICPPW.2014.47, pp. 302-311. Online publication date: 9-Sep-2014.
