Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming

Published: 01 February 2010

Abstract

As high-end computing systems continue to grow in scale, recent advances in multi- and many-core architectures have pushed such growth toward more dense architectures, that is, more processing elements per physical node rather than more physical nodes themselves. Although a large number of scientific applications have so far relied on an MPI-everywhere model for programming high-end parallel systems, this model may not be sufficient for future machines, given their physical constraints such as decreasing amounts of memory per processing element and shared caches. As a result, application and computer scientists are exploring alternative programming models that involve using MPI between address spaces and some other threaded model, such as OpenMP, Pthreads, or Intel TBB, within an address space. Such hybrid models require efficient support from an MPI implementation for MPI messages sent from multiple threads simultaneously. In this paper, we explore the issues involved in designing such an implementation. We present four approaches to building a fully thread-safe MPI implementation, with decreasing levels of critical-section granularity (from coarse-grain locks to fine-grain locks to lock-free operations) and correspondingly increasing levels of complexity, and we present performance results that demonstrate the implications of the different approaches.
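
To make the hybrid model concrete, the following is a minimal sketch (not taken from the paper) of an MPI+OpenMP program in which every thread issues its own MPI calls. This is precisely the usage pattern that requires the MPI_THREAD_MULTIPLE level of thread support; the program itself is illustrative, with error handling omitted.

```c
/* Minimal MPI+OpenMP hybrid sketch: each OpenMP thread communicates
 * concurrently, which requires MPI_THREAD_MULTIPLE from the library. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nprocs;

    /* Request full thread support; the implementation reports the
     * level it actually provides, which may be lower. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Every thread exchanges a message with the matching thread on a
     * neighboring rank; per-thread tags keep the pairs separate. */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int partner = (rank + 1) % nprocs;
        int sendbuf = rank * 100 + tid, recvbuf;

        MPI_Sendrecv(&sendbuf, 1, MPI_INT, partner, tid,
                     &recvbuf, 1, MPI_INT, MPI_ANY_SOURCE, tid,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```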
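The granularity spectrum the abstract describes can likewise be illustrated with a deliberately simplified Pthreads sketch. The names below (mpi_send_coarse, mpi_send_fine, send_queue_t) are hypothetical stand-ins for an implementation's internal send path, not the code evaluated in the paper: a single global lock serializes all threads, whereas per-object locks serialize only threads that actually share state.

```c
/* Hypothetical sketch of coarse-grain vs. fine-grain critical sections
 * inside an MPI library; illustration only. */
#include <pthread.h>

/* Coarse grain: one global lock serializes every MPI call. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

void mpi_send_coarse(void *msg)
{
    (void)msg;
    pthread_mutex_lock(&global_lock);   /* all threads contend here */
    /* ... enqueue message, advance shared progress state ... */
    pthread_mutex_unlock(&global_lock);
}

/* Fine grain: each shared object (here, a per-destination send queue)
 * carries its own lock, so threads targeting different queues proceed
 * in parallel and only true sharing is serialized. */
typedef struct {
    pthread_mutex_t lock;
    /* ... queue storage ... */
} send_queue_t;

void mpi_send_fine(send_queue_t *q, void *msg)
{
    (void)msg;
    pthread_mutex_lock(&q->lock);       /* contention only per queue */
    /* ... enqueue message on this queue only ... */
    pthread_mutex_unlock(&q->lock);
}
```

The paper's lock-free approaches push this one step further, replacing even the per-object locks with atomic operations.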


      Published In

      International Journal of High Performance Computing Applications, Volume 24, Issue 1
      February 2010
      102 pages

      Publisher

      Sage Publications, Inc.

      United States

      Author Tags

      1. MPI
      2. fine-grained locks
      3. hybrid programming
      4. threads
