research-article

Remote Memory Access Programming in MPI-3

Published: 29 June 2015
    Abstract

    The Message Passing Interface (MPI) 3.0 standard, introduced in September 2012, includes a significant update to the one-sided communication interface, also known as remote memory access (RMA). In particular, the interface has been extended to better support popular one-sided and global-address-space parallel programming models, to provide better access to hardware performance features, and to enable new data-access modes. We present the new RMA interface and specify formal axiomatic models for data consistency and access semantics. Such models can help users reason about details of the semantics that are hard to extract from the English prose in the standard. They also foster the development of tools and compilers, enabling them to automatically analyze, optimize, and debug RMA programs.


    Information

    Published In

    ACM Transactions on Parallel Computing, Volume 2, Issue 2
    July 2015
    160 pages
    ISSN: 2329-4949
    EISSN: 2329-4957
    DOI: 10.1145/2798443
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 June 2015
    Accepted: 01 December 2014
    Received: 01 March 2013
    Published in TOPC Volume 2, Issue 2

    Author Tags

    1. MPI
    2. RMA
    3. one-sided communication

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (last 12 months): 62
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 10 Aug 2024

    Cited By

    • (2024) RMASanitizer: Generalized Runtime Detection of Data Races in Remote Memory Access Applications. Proceedings of the 53rd International Conference on Parallel Processing, 833-844. DOI: 10.1145/3673038.3673109. Online publication date: 12-Aug-2024.
    • (2024) Improving the MPI Remote Memory Access Model for Distributed-memory Systems by Implementing One-sided Broadcast. 2024 XXVII International Conference on Soft Computing and Measurements (SCM), 17-21. DOI: 10.1109/SCM62608.2024.10554130. Online publication date: 22-May-2024.
    • (2024) Decentralized lock-free distributed queue in MPI remote memory access model. E3S Web of Conferences 548, 03007. DOI: 10.1051/e3sconf/202454803007. Online publication date: 12-Jul-2024.
    • (2023) RMARaceBench: A Microbenchmark Suite to Evaluate Race Detection Tools for RMA Programs. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 205-214. DOI: 10.1145/3624062.3624087. Online publication date: 12-Nov-2023.
    • (2023) Rethinking Data Race Detection in MPI-RMA Programs. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 196-204. DOI: 10.1145/3624062.3624086. Online publication date: 12-Nov-2023.
    • (2023) Itoyori: Reconciling Global Address Space and Global Fork-Join Task Parallelism. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-15. DOI: 10.1145/3581784.3607049. Online publication date: 12-Nov-2023.
    • (2023) Performance Insights into Device-initiated RMA Using Kokkos Remote Spaces. 2023 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), 66-67. DOI: 10.1109/CLUSTERWorkshops61457.2023.00028. Online publication date: 31-Oct-2023.
    • (2023) EMPI: Enhanced Message Passing Interface in Modern C++. 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 141-153. DOI: 10.1109/CCGrid57682.2023.00023. Online publication date: May-2023.
    • (2023) A general approach for supporting nonblocking data structures on distributed-memory systems. Journal of Parallel and Distributed Computing 173, 48-60. DOI: 10.1016/j.jpdc.2022.11.006. Online publication date: Mar-2023.
    • (2022) Cost-aware Programming on Page-based Distributed Shared Memory. Journal of Information Processing 30, 464-475. DOI: 10.2197/ipsjjip.30.464. Online publication date: 2022.