Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming

Published: 01 February 2010

Abstract

As high-end computing systems continue to grow in scale, recent advances in multi- and many-core architectures have pushed such growth toward more dense architectures, that is, more processing elements per physical node rather than more physical nodes themselves. Although a large number of scientific applications have so far relied on an MPI-everywhere model for programming high-end parallel systems, this model may not be sufficient for future machines, given their physical constraints such as decreasing amounts of memory per processing element and shared caches. As a result, application and computer scientists are exploring alternative programming models that involve using MPI between address spaces and some other threaded model, such as OpenMP, Pthreads, or Intel TBB, within an address space. Such hybrid models require efficient support from an MPI implementation for MPI messages sent from multiple threads simultaneously. In this paper, we explore the issues involved in designing such an implementation. We present four approaches to building a fully thread-safe MPI implementation, with decreasing levels of critical-section granularity (from coarse-grain locks to fine-grain locks to lock-free operations) and correspondingly increasing levels of complexity, and we present performance results that demonstrate the implications of the different approaches.
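
To make the hybrid model concrete, the following is a minimal sketch (not taken from the paper) of an MPI+OpenMP program in which every thread issues its own MPI calls. This is precisely the usage pattern that requires the MPI_THREAD_MULTIPLE level of thread support; the program itself is illustrative, with error handling omitted.

```c
/* Minimal MPI+OpenMP hybrid sketch: each OpenMP thread communicates
 * concurrently, which requires MPI_THREAD_MULTIPLE from the library. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nprocs;

    /* Request full thread support; the implementation reports the
     * level it actually provides, which may be lower. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Every thread exchanges a message with the matching thread on a
     * neighboring rank; per-thread tags keep the pairs separate. */
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int partner = (rank + 1) % nprocs;
        int sendbuf = rank * 100 + tid, recvbuf;

        MPI_Sendrecv(&sendbuf, 1, MPI_INT, partner, tid,
                     &recvbuf, 1, MPI_INT, MPI_ANY_SOURCE, tid,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```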
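The granularity spectrum the abstract describes can likewise be illustrated with a deliberately simplified Pthreads sketch. The names below (mpi_send_coarse, mpi_send_fine, send_queue_t) are hypothetical stand-ins for an implementation's internal send path, not the code evaluated in the paper: a single global lock serializes all threads, whereas per-object locks serialize only threads that actually share state.

```c
/* Hypothetical sketch of coarse-grain vs. fine-grain critical sections
 * inside an MPI library; illustration only. */
#include <pthread.h>

/* Coarse grain: one global lock serializes every MPI call. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

void mpi_send_coarse(void *msg)
{
    (void)msg;
    pthread_mutex_lock(&global_lock);   /* all threads contend here */
    /* ... enqueue message, advance shared progress state ... */
    pthread_mutex_unlock(&global_lock);
}

/* Fine grain: each shared object (here, a per-destination send queue)
 * carries its own lock, so threads targeting different queues proceed
 * in parallel and only true sharing is serialized. */
typedef struct {
    pthread_mutex_t lock;
    /* ... queue storage ... */
} send_queue_t;

void mpi_send_fine(send_queue_t *q, void *msg)
{
    (void)msg;
    pthread_mutex_lock(&q->lock);       /* contention only per queue */
    /* ... enqueue message on this queue only ... */
    pthread_mutex_unlock(&q->lock);
}
```

The paper's lock-free approaches push this one step further, replacing even the per-object locks with atomic operations.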


      Published In

      International Journal of High Performance Computing Applications, Volume 24, Issue 1
      February 2010
      102 pages

      Publisher

      Sage Publications, Inc.

      United States

      Author Tags

      1. MPI
      2. fine-grained locks
      3. hybrid programming
      4. threads
