DOI: 10.1145/3416315.3416321
Research article

Implementation and performance evaluation of MPI persistent collectives in MPC: a case study

Published: 07 October 2020

Abstract

Persistent collective communications have recently been voted into the MPI standard, opening the door to many optimizations that reduce the cost of collectives, in particular for recurring operations. Indeed, the persistent semantics includes an initialization phase that is called only once for a given collective. This phase can absorb the setup costs of the collective, so that they are not paid each time the operation is performed.
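For illustration only (not taken from the paper), the sketch below shows this usage pattern with the interface as later standardized in MPI 4.0; pre-standard implementations typically exposed the same calls under an MPIX_ prefix, and the buffers and iteration count here are arbitrary.

    #include <mpi.h>

    /* Minimal sketch of the persistent-collective usage pattern: the
     * (potentially costly) initialization is performed once, then the same
     * operation is started and completed many times. */
    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        double in = 1.0, out = 0.0;
        MPI_Request req;

        /* One-time initialization phase: setup costs can be paid here. */
        MPI_Allreduce_init(&in, &out, 1, MPI_DOUBLE, MPI_SUM,
                           MPI_COMM_WORLD, MPI_INFO_NULL, &req);

        for (int iter = 0; iter < 1000; iter++) {
            MPI_Start(&req);                    /* launch the recurring collective */
            MPI_Wait(&req, MPI_STATUS_IGNORE);  /* complete this occurrence */
        }

        MPI_Request_free(&req);
        MPI_Finalize();
        return 0;
    }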
We give an overview of the implementation of persistent collectives in the MPC MPI runtime. We first present a naïve implementation suitable for any MPI runtime that already provides nonblocking collectives, and then improve it with two levels of caching optimizations. We report the performance of the naïve and optimized versions and discuss their impact on different collective algorithms. On a repetitive benchmark, the optimized versions outperform the naïve one, with up to a 3x speedup for the reduce collective.
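As a further illustration of the naïve approach built on nonblocking collectives, the following hypothetical sketch (names such as naive_allreduce_* are invented and do not reflect MPC's internal API) records the arguments at initialization and launches the existing nonblocking collective at every start, so the collective's schedule is rebuilt each time; this recurring cost is what the caching optimizations aim to remove.

    #include <mpi.h>

    /* Hypothetical sketch of a naïve persistent allreduce layered on top of an
     * existing nonblocking collective: *_init only records the arguments, and
     * every start launches MPI_Iallreduce, rebuilding its schedule each time.
     * Illustrative only; this is not MPC's actual implementation. */
    typedef struct {
        const void  *sendbuf;
        void        *recvbuf;
        int          count;
        MPI_Datatype datatype;
        MPI_Op       op;
        MPI_Comm     comm;
        MPI_Request  inner;   /* request of the underlying nonblocking collective */
    } naive_allreduce_t;

    static void naive_allreduce_init(naive_allreduce_t *p, const void *sendbuf,
                                     void *recvbuf, int count, MPI_Datatype datatype,
                                     MPI_Op op, MPI_Comm comm)
    {
        /* Initialization phase: store the arguments, build nothing yet. */
        p->sendbuf = sendbuf; p->recvbuf = recvbuf; p->count = count;
        p->datatype = datatype; p->op = op; p->comm = comm;
        p->inner = MPI_REQUEST_NULL;
    }

    static void naive_allreduce_start(naive_allreduce_t *p)
    {
        /* Start phase: fall back to the nonblocking collective each time. */
        MPI_Iallreduce(p->sendbuf, p->recvbuf, p->count, p->datatype,
                       p->op, p->comm, &p->inner);
    }

    static void naive_allreduce_wait(naive_allreduce_t *p)
    {
        MPI_Wait(&p->inner, MPI_STATUS_IGNORE);
    }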

Cited By

  • (2022) Towards leveraging collective performance with the support of MPI 4.0 features in MPC. Parallel Computing, vol. 109(C). https://doi.org/10.1016/j.parco.2021.102860. Online publication date: 1 March 2022.

Information

Published In

EuroMPI/USA '20: Proceedings of the 27th European MPI Users' Group Meeting
September 2020
88 pages
ISBN: 9781450388801
DOI: 10.1145/3416315
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 October 2020

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroMPI/USA '20: 27th European MPI Users' Group Meeting
September 21 - 24, 2020
Austin, TX, USA

Acceptance Rates

Overall acceptance rate: 66 of 139 submissions (47%)
