Everything you always wanted to know about synchronization but were afraid to ask

Authors:

Rachid Guerraoui,

Vasileios Trigonakis

SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

Pages 33 - 48

https://doi.org/10.1145/2517349.2522714

Published: 03 November 2013

Abstract

This paper presents the most exhaustive study of synchronization to date. We span multiple layers, from hardware cache-coherence protocols up to high-level concurrent software. We do so on different types of architectures, from single-socket (uniform and non-uniform) to multi-socket (directory- and broadcast-based) many-cores. We draw a set of observations that, roughly speaking, imply that scalability of synchronization is mainly a property of the hardware.

Supplementary Material

MP4 File (d1-03-vasileios-trigonakis.mp4), 1178.02 MB

Published In

SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles

November 2013

498 pages

ISBN: 9781450323888

DOI: 10.1145/2517349

General Chair: Michael Kaminsky (Intel Labs)

Program Chair: Mike Dahlin (Google and UT Austin)

Copyright © 2013 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SOSP '13

Sponsor:

SIGOPS

SOSP '13: ACM SIGOPS 24th Symposium on Operating Systems Principles

November 3 - 6, 2013

Farmington, Pennsylvania

Acceptance Rates

Overall Acceptance Rate 174 of 961 submissions, 18%
