This paper presents an overview of MPI, a proposed standard message passing interface for MIMD distributed-memory concurrent computers. The design of MPI has been a collective effort involving researchers in the United States and Europe from many organizations and institutions. MPI includes point-to-point and collective communication routines, as well as support for process groups, communication contexts, and application topologies. While making use of new ideas where appropriate, the MPI standard is based largely on current practice.