Remote Memory Access Programming in MPI-3

Authors: Torsten Hoefler, Keith Underwood

ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 2

Article No.: 9, Pages 1 - 26

https://doi.org/10.1145/2780584

Published: 29 June 2015

Abstract

The Message Passing Interface (MPI) 3.0 standard, introduced in September 2012, includes a significant update to the one-sided communication interface, also known as remote memory access (RMA). In particular, the interface has been extended to better support popular one-sided and global-address-space parallel programming models, to provide better access to hardware performance features, and to enable new data-access modes. We present the new RMA interface and specify formal axiomatic models for data consistency and access semantics. Such models can help users reason about details of the semantics that are hard to extract from the English prose in the standard. They also foster the development of tools and compilers, enabling them to automatically analyze, optimize, and debug RMA programs.

Index Terms

Remote Memory Access Programming in MPI-3
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Concurrent programming structures
      2. Language types
        Parallel programming languages

Published In

ACM Transactions on Parallel Computing, Volume 2, Issue 2

July 2015

160 pages

ISSN: 2329-4949

EISSN: 2329-4957

DOI: 10.1145/2798443

Editor: Phillip B. Gibbons, Intel Labs, Pittsburgh, USA

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 June 2015

Accepted: 01 December 2014

Received: 01 March 2013

Published in TOPC Volume 2, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

Eidgenössische Technische Hochschule Zürich
