Lock Contention Management in Multithreaded MPI

Authors:

Abdelhalim Amer,

Satoshi Matsuoka

ACM Transactions on Parallel Computing (TOPC), Volume 5, Issue 3

Article No.: 12, Pages 1 - 21

https://doi.org/10.1145/3275443

Published: 08 January 2019

Abstract

In this article, we investigate contention management in lock-based thread-safe MPI libraries. Specifically, we make two assumptions: (1) locks are the only form of synchronization when protecting communication paths; and (2) contention occurs, and thus serialization is unavoidable. Our work classifies lock acquisitions by the work performed inside the critical section: productive vs. unproductive. Waiting for message reception without doing anything else inside a critical section is an example of unproductive lock acquisition. We show that the high-throughput nature of modern scalable locking protocols translates into better communication progress for throughput-intensive MPI communication but negatively impacts latency-sensitive communication because of overzealous unproductive lock acquisition. To reduce unproductive lock acquisitions, we devised a method that promotes threads with productive work using a generic two-level priority locking protocol. Our results show that using a high-throughput protocol for productive work and a fair protocol for less productive code paths ensures the best tradeoff for fine-grained communication, whereas a fair protocol is sufficient for more coarse-grained communication. Although these efforts have been rewarding, scalability degradation remains significant. We discuss techniques that diverge from the pure locking model and offer the potential to further improve scalability.
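The two-level priority idea in the abstract can be illustrated with a small sketch. This is not the article's implementation: the class name `TwoLevelPriorityLock`, its methods, the thread names, and the use of a Python condition variable are all assumptions made for illustration. The sketch models only the ordering between the two classes (a waiting thread with productive work is admitted before a waiting thread that would only poll) and does not reproduce the high-throughput or fair protocols the abstract mentions for use within each level.

```python
import threading
import time


class TwoLevelPriorityLock:
    """Hypothetical sketch: a mutex with two priority classes.

    A low-priority (unproductive) waiter is admitted only when the lock
    is free AND no high-priority (productive) thread is waiting.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._held = False
        self._high_waiting = 0
        self._low_waiting = 0

    def acquire(self, high_priority):
        with self._cond:
            if high_priority:
                self._high_waiting += 1
                while self._held:
                    self._cond.wait()
                self._high_waiting -= 1
            else:
                self._low_waiting += 1
                # Yield to any waiting high-priority thread.
                while self._held or self._high_waiting > 0:
                    self._cond.wait()
                self._low_waiting -= 1
            self._held = True

    def release(self):
        with self._cond:
            self._held = False
            # Wake every waiter; each one rechecks its own admission rule.
            self._cond.notify_all()

    def waiters(self):
        """Return (high_waiting, low_waiting); used only by the demo."""
        with self._cond:
            return (self._high_waiting, self._low_waiting)


def demo():
    """Queue a polling thread first, then a sending thread; report who ran first."""
    lock = TwoLevelPriorityLock()
    order = []

    def worker(name, high):
        lock.acquire(high_priority=high)
        order.append(name)  # protected by the lock
        lock.release()

    lock.acquire(high_priority=True)  # hold the lock so both workers must queue
    poller = threading.Thread(target=worker, args=("poll", False))
    sender = threading.Thread(target=worker, args=("send", True))

    poller.start()
    while lock.waiters() != (0, 1):  # wait until the poller is queued
        time.sleep(0.001)
    sender.start()
    while lock.waiters() != (1, 1):  # wait until the sender is queued too
        time.sleep(0.001)

    lock.release()
    poller.join()
    sender.join()
    return order
```

Calling `demo()` queues the low-priority poller before the high-priority sender, yet the sender enters the critical section first. A condition variable keeps the sketch short; a production MPI runtime would more plausibly build such a lock from spin-based primitives, which is outside the scope of this illustration.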

Index Terms

Lock Contention Management in Multithreaded MPI
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent algorithms
  2. Parallel computing methodologies
    1. Parallel algorithms
      1. Massively parallel algorithms
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Concurrent programming languages
        Distributed programming languages
        Parallel programming languages

Published In

ACM Transactions on Parallel Computing Volume 5, Issue 3

September 2018

89 pages

ISSN:2329-4949

EISSN:2329-4957

DOI:10.1145/3305217

Editor:
David Bader
Georgia Institute of Technology, USA

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 January 2019

Accepted: 01 July 2018

Revised: 01 May 2018

Received: 01 May 2016

Published in TOPC Volume 5, Issue 3

Qualifiers

Research-article
Research
Refereed

Funding Sources

Exascale Computing Project
Science Technology and Innovation Committee of Shenzhen Municipality
JSPS KAKENHI
U.S. Department of Energy Office of Science
National Nuclear Security Administration
