research-article

Fast concurrent queues for x86 processors

Authors:

Yehuda Afek

PPoPP '13: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming

Pages 103 - 112

https://doi.org/10.1145/2442516.2442527

Published: 23 February 2013

Abstract

Conventional wisdom in designing concurrent data structures is to use the most powerful synchronization primitive, namely compare-and-swap (CAS), and to avoid contended hot spots. In building concurrent FIFO queues, this reasoning has led researchers to propose combining-based concurrent queues.

This paper takes a different approach. It shows how to rely on fetch-and-add (F&A), a less powerful primitive that is available on x86 processors, to construct a nonblocking (lock-free) linearizable concurrent FIFO queue. Although the F&A is a contended hot spot, the queue outperforms combining-based implementations by 1.5x to 2.5x at all concurrency levels on an x86 server with four multicore processors, in both single-processor and multi-processor executions.
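To make the role of F&A concrete, the following is a minimal sketch of how a single fetch-and-add can hand each operation a private slot index, so that threads contend only on the counter and never retry because another thread changed it first. It is an illustration under stated assumptions, not the algorithm the paper evaluates: the class name `FaaSlotQueue`, the sentinels `kEmpty` and `kTaken`, the fixed capacity, and the restriction to non-negative values are all hypothetical choices made here. The fixed array stands in for an unbounded one, and this toy version does not guarantee lock-free progress, because a dequeuer can keep invalidating the slots that an enqueuer claims.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy F&A-based FIFO queue over a fixed array of slots (illustrative only).
class FaaSlotQueue {
 public:
  explicit FaaSlotQueue(std::size_t capacity) : slots_(capacity) {
    for (auto& s : slots_) s.store(kEmpty);
  }

  // Enqueue a non-negative value. Returns false once every slot index
  // has been handed out (the array is a stand-in for an unbounded one).
  bool enqueue(long value) {
    assert(value >= 0);  // negative values are reserved for the sentinels
    for (;;) {
      // F&A gives this call a tail index that no other enqueuer receives.
      std::size_t t = tail_.fetch_add(1);
      if (t >= slots_.size()) return false;
      // Deposit the value; success if no dequeuer invalidated the slot.
      if (slots_[t].exchange(value) == kEmpty) return true;
      // Otherwise a dequeuer got here first: claim a fresh index.
    }
  }

  // Dequeue into `out`. Returns false if the queue appears empty
  // (or if the slot array is exhausted).
  bool dequeue(long& out) {
    for (;;) {
      // F&A gives this call a head index that no other dequeuer receives.
      std::size_t h = head_.fetch_add(1);
      if (h >= slots_.size()) return false;
      // Take whatever is in the slot and mark it as consumed.
      long v = slots_[h].exchange(kTaken);
      if (v != kEmpty) {
        out = v;
        return true;
      }
      // The slot was still empty. If no enqueuer has claimed a later
      // index, report empty; otherwise try the next index.
      if (tail_.load() <= h + 1) return false;
    }
  }

 private:
  static constexpr long kEmpty = -1;  // slot never written
  static constexpr long kTaken = -2;  // slot invalidated by a dequeuer
  std::vector<std::atomic<long>> slots_;
  std::atomic<std::size_t> head_{0};
  std::atomic<std::size_t> tail_{0};
};
```

On x86, compilers typically lower `fetch_add` on an integer to a single `lock xadd` instruction, which always completes. A CAS-based counter needs a retry loop instead, and that loop can fail repeatedly under contention. The sketch spends one array slot whenever a dequeuer overtakes an enqueuer, which is tolerable only because the array is treated as effectively unbounded.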

Published In

PPoPP '13: Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
February 2013, 332 pages
ISBN: 9781450319225
DOI: 10.1145/2442516

General Chairs: Alex Nicolau (University of California, Irvine, USA), Xiaowei Shen (IBM Research, China)
Program Chairs: Saman Amarasinghe (Massachusetts Institute of Technology, USA), Richard Vuduc (Georgia Institute of Technology, USA)

ACM SIGPLAN Notices, Volume 48, Issue 8 (PPoPP '13)
August 2013, 309 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2517327

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PPoPP '13: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 23 - 27, 2013
Shenzhen, China
Sponsor: SIGPLAN

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%
