A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems

Authors:

Philippas Tsigas,

Yi Zhang

SPAA '01: Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures

Pages 134 - 143

https://doi.org/10.1145/378580.378611

Published: 03 July 2001

Abstract

A non-blocking FIFO queue algorithm for multiprocessor shared memory systems is presented in this paper. The algorithm is very simple, fast and scales very well in both symmetric and non-symmetric multiprocessor shared memory systems. Experiments on a 64-node SUN Enterprise 10000 (a symmetric multiprocessor system) and on a 64-node SGI Origin 2000 (a cache-coherent non-uniform memory access multiprocessor system) indicate that our algorithm considerably outperforms the best of the known alternatives on both multiprocessors at every level of multiprogramming. This work introduces two new, simple algorithmic mechanisms. The first lowers the contention on key variables used by the concurrent enqueue and/or dequeue operations, which in turn yields the good performance of the algorithm; the second deals with the pointer recycling problem, an inconsistency problem that all non-blocking algorithms based on the compare-and-swap synchronisation primitive have to address. In our construction we chose compare-and-swap because it is an atomic primitive that scales well under contention and is either supported by modern multiprocessors or can be implemented efficiently on them.
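
The pointer recycling problem mentioned in the abstract is commonly known as the ABA hazard of compare-and-swap: a thread reads a shared word, is delayed, and its later CAS succeeds because the word holds the same value again, even though it was changed and changed back in the meantime. The sketch below is a minimal illustration under stated assumptions, not the mechanism this paper introduces. Python exposes no hardware compare-and-swap, so a lock-protected `AtomicCell` (a hypothetical helper) models the primitive, and the two-thread interleaving is replayed sequentially so the outcome is deterministic. The remedy shown, pairing the value with a version counter (the hypothetical `Tagged` type), is a widely used generic technique; the abstract does not say that this algorithm adopts it.

```python
import threading
from dataclasses import dataclass


class AtomicCell:
    """Models a shared word with an atomic compare-and-swap.

    Python has no hardware CAS, so a lock stands in for the atomic
    instruction; only the semantics matter for this illustration.
    """

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Write `new` only if the cell still holds `expected`; report success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


@dataclass(frozen=True)
class Tagged:
    """Hypothetical pair: a slot index plus a version bumped on every update."""
    index: int
    version: int


def tagged_cas(cell, expected, new_index):
    # The version must match as well, so an A -> B -> A change is detected.
    return cell.compare_and_swap(expected, Tagged(new_index, expected.version + 1))


def aba_demo():
    plain = AtomicCell(5)               # bare index, e.g. a head position
    tagged = AtomicCell(Tagged(5, 0))   # same index, with a version attached

    # "Thread 1" reads both shared words, then is delayed.
    seen_plain = plain.load()
    seen_tagged = tagged.load()

    # "Thread 2" moves each word away and back again (index 5 is recycled).
    plain.compare_and_swap(5, 7)
    plain.compare_and_swap(7, 5)
    tagged_cas(tagged, tagged.load(), 7)
    tagged_cas(tagged, tagged.load(), 5)

    # "Thread 1" resumes and tries to install index 9 using its stale reads.
    plain_ok = plain.compare_and_swap(seen_plain, 9)
    tagged_ok = tagged_cas(tagged, seen_tagged, 9)
    return plain_ok, tagged_ok


print(aba_demo())  # (True, False)
```

With a bare index, the delayed thread's CAS succeeds even though index 5 was recycled in between; with the version attached, the CAS fails and the thread would have to re-read and retry. In a real implementation the index and the counter must fit in a single word that the hardware can compare-and-swap, so the counter has finite width and can, in principle, wrap around.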

Information & Contributors

Information

Published In

SPAA '01: Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures

July 2001

340 pages

ISBN: 1581134096

DOI: 10.1145/378580

Chairman: Arnold Rosenberg, Univ. of Massachusetts

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 July 2001

Qualifiers

Article

Conference

SPAA01: 13th ACM Symposium on Parallel Algorithms and Architectures

Crete Island, Greece

Acceptance Rates

SPAA '01 Paper Acceptance Rate: 34 of 93 submissions, 37%

Overall Acceptance Rate: 447 of 1,461 submissions, 31%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 109

Total Downloads: 1,450

Downloads (last 12 months): 58

Downloads (last 6 weeks): 6

Reflects downloads up to 10 Nov 2024.

Citations

Cited By

Naderibeni H, Ruppert E (2024). A wait-free queue with polylogarithmic step complexity. Distributed Computing, 37:4, 309-334. Online publication date: 17-Aug-2024.
https://doi.org/10.1007/s00446-024-00471-7

Naderibeni H, Ruppert E, Oshman R, Nolin A, Halldorsson M, Balliu A (2023). A Wait-free Queue with Polylogarithmic Step Complexity. Proceedings of the 2023 ACM Symposium on Principles of Distributed Computing, 124-134. Online publication date: 19-Jun-2023.
https://dl.acm.org/doi/10.1145/3583668.3594565

Romanov R, Koval N, Dehnavi M, Kulkarni M, Krishnamoorthy S (2023). The State-of-the-Art LCRQ Concurrent Queue Algorithm Does NOT Require CAS2. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 14-26. Online publication date: 25-Feb-2023.
https://dl.acm.org/doi/10.1145/3572848.3577485

Milman-Sela G, Kogan A, Lev Y, Luchangco V, Petrank E (2022). BQ: A Lock-Free Queue with Batching. ACM Transactions on Parallel Computing, 9:1, 1-49. Online publication date: 23-Mar-2022.
https://dl.acm.org/doi/10.1145/3512757

Nikolaev R, Ravindran B, Agrawal K, Lee I (2022). wCQ. Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, 307-319. Online publication date: 11-Jul-2022.
https://dl.acm.org/doi/10.1145/3490148.3538572

Zhao M, Troendle D, Jang B (2022). A Concurrent Relaxed Queue for Unordered Parallel Accesses on GPUs. 2022 International Conference on Computational Science and Computational Intelligence (CSCI), 1352-1358. Online publication date: Dec-2022.
https://doi.org/10.1109/CSCI58124.2022.00243

Dehnavi S, Goswami D, Goossens K (2021). Analyzable Publish-Subcribe Communication Through a Wait-Free FIFO Channel for MPSoC Real-Time Applications. 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), 388-395. Online publication date: Dec-2021.
https://doi.org/10.1109/MCSoC51149.2021.00064

Zhao Z, Jiang Z, Chen Y, Gong X, Wang W, Yew P, Lee J (2021). Enhancing atomic instruction emulation for cross-ISA dynamic binary translation. Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, 351-362. Online publication date: 27-Feb-2021.
https://dl.acm.org/doi/10.1109/CGO51591.2021.9370312

Tommasi F, De Luca V, Melle C (2021). QoS monitoring in real-time streaming overlays based on lock-free data structures. Multimedia Tools and Applications. Online publication date: 11-Mar-2021.
https://doi.org/10.1007/s11042-020-10198-9

Patel M, Amritha P (2021). Binary Decision Tree Based Packet Queuing Schema for Next Generation Firewall. Advances in Computing and Data Sciences, 224-233. Online publication date: 23-Oct-2021.
https://doi.org/10.1007/978-3-030-81462-5_21
