Scalability Techniques for Practical Synchronization Primitives: Designing locking primitives with performance in mind

Author: Davidlohr Bueso

Queue, Volume 12, Issue 11

Pages 40 - 53

https://doi.org/10.1145/2693193.2698990

Published: 22 November 2014

Abstract

In an ideal world, applications are expected to scale automatically when executed on increasingly larger systems. In practice, however, not only does this scaling not occur, but it is common to see performance actually worsen on those larger systems.

Published In

Queue, Volume 12, Issue 11: Concurrency

November 2014, 34 pages

ISSN: 1542-7730

EISSN: 1542-7749

DOI: 10.1145/2693193

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States
