
Quantifying the Performance Benefits of Partitioned Communication in MPI

Published: 13 September 2023

Abstract

Partitioned communication was introduced in MPI 4.0 as a user-friendly interface for pipelined communication patterns, which are particularly common in MPI+threads settings. It lets the user divide a global buffer into smaller independent chunks, called partitions, which can then be communicated independently. In this work, we first model the performance gain that can be expected when using partitioned communication. Next, we describe the improvements we made to MPICH to enable those gains and to provide a high-quality implementation of MPI partitioned communication. We then evaluate partitioned communication in various common use cases and compare its performance with other MPI point-to-point and one-sided approaches. Specifically, we first investigate two scenarios commonly encountered for small partition sizes in a multithreaded environment: thread contention and the overhead of using many partitions. We propose two solutions to alleviate the measured penalty and demonstrate their use. We then focus on large messages and the gain obtained when exploiting the delay resulting from computation or load imbalance. We conclude with our perspectives on the benefits of partitioned communication and the various results obtained.
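The pipelining idea behind partitioned communication can be illustrated with a toy analytical model (an illustrative sketch only, not the model developed in the paper): if a computation of total time C produces a message whose total transfer time is M, splitting the buffer into n equal partitions lets each partition's transfer overlap with the computation of the next one.

```python
def pipelined_time(C, M, n):
    """Toy pipeline model (illustrative assumption, not the paper's model).

    C: total time to compute the full buffer
    M: total time to transfer the full buffer
    n: number of equal partitions

    With n == 1 this recovers the fully serialized time C + M.
    """
    c, m = C / n, M / n  # per-partition compute and transfer time
    # The first partition must be computed before anything can be sent,
    # the slower of (compute, transfer) dominates each of the n-1 middle
    # stages, and the last transfer cannot overlap with any computation.
    return c + (n - 1) * max(c, m) + m

# With C == M, pipelining can hide almost all of the transfer time:
serial = pipelined_time(10.0, 10.0, 1)  # 20.0 (no overlap)
piped = pipelined_time(10.0, 10.0, 8)   # 11.25 (most transfers hidden)
```

In this simple model the achievable saving is bounded by min(C, M), and the benefit of adding partitions saturates once the per-partition overhead (ignored here) becomes comparable to c and m, which is consistent with the small-partition penalties studied in the paper.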


Cited By

  • (2024) Partitioned Reduction for Heterogeneous Environments. 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 285-289. https://doi.org/10.1109/PDP62718.2024.00047
  • (2024) CMB: A Configurable Messaging Benchmark to Explore Fine-Grained Communication. IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 28-38. https://doi.org/10.1109/CCGrid59990.2024.00013
  • (2024) To Share or Not to Share: A Case for MPI in Shared-Memory. Recent Advances in the Message Passing Interface, 89-102. https://doi.org/10.1007/978-3-031-73370-3_6
  • (2023) Towards Correctness Checking of MPI Partitioned Communication in MUST. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 224-227. https://doi.org/10.1145/3624062.3624089

Published In

ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing
August 2023
858 pages
ISBN:9798400708435
DOI:10.1145/3605573
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. MPI
  2. distributed systems
  3. partitioned communication

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2023
ICPP 2023: 52nd International Conference on Parallel Processing
August 7-10, 2023
Salt Lake City, UT, USA

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%


Article Metrics

  • Downloads (last 12 months): 94
  • Downloads (last 6 weeks): 7
Reflects downloads up to 24 December 2024.

