Research Article
DOI: 10.1145/2966884.2966918

Using InfiniBand Hardware Gather-Scatter Capabilities to Optimize MPI All-to-All

Published: 25 September 2016

Abstract

The MPI all-to-all algorithm is a data-intensive, high-cost collective operation used by many scientific High Performance Computing applications. Optimizations for small data exchanges use aggregation techniques, such as the Bruck algorithm, to minimize the number of messages sent and thereby the overall operation latency. This paper presents three variants of the Bruck algorithm, which differ in the way data is laid out in memory at intermediate steps of the algorithm. Mellanox's InfiniBand support for Host Channel Adapter (HCA) hardware scatter/gather is used selectively to replace CPU-based buffer packing and unpacking. Using this offload capability reduces the eight- and sixteen-byte all-to-all latency on 1024 MPI processes by 9.7% and 9.1%, respectively, and decreases the total memory-handling time by 40.6% and 57.9%, respectively.
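For context on the baseline being optimized, the sketch below (not the authors' implementation; function and buffer names are illustrative) shows a small-message Bruck all-to-all in C over MPI point-to-point calls, with explicit CPU-side packing and unpacking around each exchange. It is this memory-handling step that the paper replaces with HCA hardware gather/scatter.

/*
 * Minimal sketch of a small-message Bruck all-to-all with CPU-based
 * packing/unpacking (illustrative only; not the paper's implementation).
 * Each rank contributes 'bytes' bytes for every other rank.
 */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

static void bruck_alltoall(const char *sendbuf, char *recvbuf,
                           int bytes, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    char *work = malloc((size_t)size * bytes);   /* rotated working copy   */
    char *pack = malloc((size_t)size * bytes);   /* staging for each round */

    /* Phase 1: local rotation so block i holds data destined for
     * rank (rank + i) % size. */
    for (int i = 0; i < size; i++)
        memcpy(work + (size_t)i * bytes,
               sendbuf + (size_t)((rank + i) % size) * bytes, bytes);

    /* Phase 2: log2(size) rounds; in the round for bit k, every block
     * whose index has bit k set is packed, exchanged, and unpacked.
     * This packing/unpacking is the memory-handling cost the paper
     * offloads to the HCA gather/scatter engine. */
    for (int k = 1; k < size; k <<= 1) {
        int sendto   = (rank + k) % size;
        int recvfrom = (rank - k + size) % size;

        int nblocks = 0;
        for (int i = 0; i < size; i++)
            if (i & k)
                memcpy(pack + (size_t)(nblocks++) * bytes,
                       work + (size_t)i * bytes, bytes);

        MPI_Sendrecv_replace(pack, nblocks * bytes, MPI_BYTE,
                             sendto, 0, recvfrom, 0,
                             comm, MPI_STATUS_IGNORE);

        nblocks = 0;
        for (int i = 0; i < size; i++)
            if (i & k)
                memcpy(work + (size_t)i * bytes,
                       pack + (size_t)(nblocks++) * bytes, bytes);
    }

    /* Phase 3: after all rounds, block i holds the data that originated
     * on rank (rank - i) % size; rotate it into the final layout. */
    for (int i = 0; i < size; i++)
        memcpy(recvbuf + (size_t)((rank - i + size) % size) * bytes,
               work + (size_t)i * bytes, bytes);

    free(work);
    free(pack);
}

In the offloaded variants described in the abstract, the memcpy-based packing and unpacking loops around the exchange would instead be expressed as scatter/gather entries handed to the InfiniBand HCA.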

References

[1]
http://www.mpi-forum.org
[2]
http://mvapich.cse.ohio-state.edu/overview/
[3]
https://www.mpich.org/
[4]
http://www.ana-gainaru.com/eurompi16/pingpong.xlsx
[5]
G. Almási, P. Heidelberger, C. J. Archer, X. Martorell, C. C. Erway, J. E. Moreira, B. Steinmacher-Burow, and Y. Zheng. Optimization of MPI collective communication on BlueGene/L systems. In Proceedings of the 19th Annual International Conference on Supercomputing, ICS '05, pages 253--262, New York, NY, USA, 2005. ACM.
[6]
J. Bruck, C.-T. Ho, S. Kipnis, E. Upfal, and D. Weathersby. Efficient algorithms for all-to-all communications in multiport message-passing systems. IEEE Transactions on Parallel and Distributed Systems, 8(11):1143--1156, 1997.
[7]
G. Fagg, G. Bosilca, J. Pješivac-Grbović, T. Angskun, and J. Dongarra. Tuned: An Open MPI collective communications component. In P. Kacsuk, T. Fahringer, and Z. Németh, editors, Distributed and Parallel Systems, chapter 7, pages 65--72. Springer US, Boston, MA, 2007.
[8]
G. Santhanaraman, J. Wu, and D. K. Panda. Zero-copy MPI derived datatype communication over InfiniBand. In Proceedings of the 11th European PVM/MPI Users' Group Meeting, September 2004.
[9]
E. Gabriel, G. Fagg, G. Bosilca, T. Angskun, J. J. Dongarra, J. Squyres, V. Sahay, P. Kambadur, B. Barrett, A. Lumsdaine, R. H. Castain, D. Daniel, R. L. Graham, and T. S. Woodall. Open MPI: Goals, concept, and design of a next generation MPI implementation. In Proceedings of the 11th European PVM/MPI Users' Group Meeting, pages 97--104, 2004.
[10]
InfiniBand Trade Association. The InfiniBand Architecture. http://www.infinibandta.org/specs.
[11]
K. Kandalla, H. Subramoni, K. Tomko, D. Pekurovsky, S. Sur, and D. K. Panda. High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT. Comput. Sci., 26:237--246, June 2011.
[12]
A. R. Mamidala, R. Kumar, D. De, and D. K. Panda. MPI collectives on modern multicore clusters: Performance optimizations and communication characteristics. In IEEE International Symposium on Cluster Computing and the Grid (CCGRID), pages 130--137, 2008.
[13]
Y. Qian. Design and evaluation of efficient collective communications on modern interconnects and multi-core clusters. PhD thesis, Queen's University, Kingston, Ontario, Canada, 2010.
[14]
Y. Qian and A. Afsahi. Process arrival pattern aware alltoall and allgather on InfiniBand clusters. International Journal of Parallel Programming, 39(4):473--493, 2011.
[15]
R. Thakur, R. Rabenseifner, and W. Gropp. Optimization of collective communication operations in MPICH. IJHPCA, 19(1):49--66, 2005.
[16]
J. L. Träff and A. Rougier. MPI collectives and datatypes for hierarchical all-to-all communication. In Proceedings of the 21st European MPI Users' Group Meeting, EuroMPI/ASIA '14, pages 27:27--27:32, New York, NY, USA, 2014. ACM.
[17]
M. G. Venkata, R. L. Graham, J. Ladd, and P. Shamis. Exploring the all-to-all collective optimization space with ConnectX CORE-Direct. In 41st International Conference on Parallel Processing (ICPP), pages 289--298, 2012.
[18]
A. Venkatesh, S. Potluri, R. Rajachandrasekar, M. Luo, K. Hamidouche, and D. Panda. High performance alltoall and allgather designs for InfiniBand MIC clusters. May 2014.
[19]
E. Zahavi, G. Johnson, D. J. Kerbyson, and M. Lang. Optimized InfiniBand fat-tree routing for shift all-to-all communication patterns. Concurrency and Computation: Practice and Experience, 22(2):217--231, 2010.



Published In

EuroMPI '16: Proceedings of the 23rd European MPI Users' Group Meeting
September 2016
225 pages
ISBN:9781450342346
DOI:10.1145/2966884

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 September 2016


Author Tags

  1. All-to-All
  2. Collective Communication
  3. MPI
  4. Network Offload

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroMPI 2016
EuroMPI 2016: The 23rd European MPI Users' Group Meeting
September 25 - 28, 2016
Edinburgh, United Kingdom

Acceptance Rates

Overall Acceptance Rate: 66 of 139 submissions, 47%


Article Metrics

  • Downloads (last 12 months): 27
  • Downloads (last 6 weeks): 2
Reflects downloads up to 01 Sep 2024


Cited By

  • (2024) Configurable Algorithms for All-to-All Collectives. ISC High Performance 2024 Research Paper Proceedings (39th International Conference), pages 1-12. DOI: 10.23919/ISC.2024.10528936. Online publication date: May 2024.
  • (2023) PetPS: Supporting Huge Embedding Models with Persistent Memory. Proceedings of the VLDB Endowment, 16(5):1013-1022. DOI: 10.14778/3579075.3579077. Online publication date: 6 March 2023.
  • (2022) Hybrid Approach to Optimize MPI Collectives by In-network-computation and Point-to-Point Messages. 2022 7th International Conference on Computer and Communication Systems (ICCCS), pages 773-783. DOI: 10.1109/ICCCS55155.2022.9846190. Online publication date: 22 April 2022.
  • (2022) Hierarchical Communication Optimization for FFT. 2022 IEEE/ACM International Workshop on Hierarchical Parallelism for Exascale Computing (HiPar), pages 12-21. DOI: 10.1109/HiPar56574.2022.00007. Online publication date: November 2022.
  • (2021) Breakfast of champions. Proceedings of the Workshop on Hot Topics in Operating Systems, pages 199-205. DOI: 10.1145/3458336.3465287. Online publication date: 1 June 2021.
  • (2020) Using Arm Scalable Vector Extension to Optimize OPEN MPI. 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), pages 222-231. DOI: 10.1109/CCGrid49817.2020.00-71. Online publication date: May 2020.
