DOI: 10.1145/2517349.2522714
Research article, open access

Everything you always wanted to know about synchronization but were afraid to ask

Published: 03 November 2013

Abstract

This paper presents the most exhaustive study of synchronization to date. We span multiple layers, from hardware cache-coherence protocols up to high-level concurrent software. We do so on different types of architectures, from single-socket (uniform and non-uniform) to multi-socket (directory- and broadcast-based) many-cores. We draw a set of observations that, roughly speaking, imply that scalability of synchronization is mainly a property of the hardware.

Supplementary Material

MP4 File (d1-03-vasileios-trigonakis.mp4)

Published In

SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
November 2013
498 pages
ISBN:9781450323888
DOI:10.1145/2517349
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

SOSP '13

Acceptance Rates

Overall Acceptance Rate 174 of 961 submissions, 18%
