A stealing mechanism for delegation methods

Yi, Zhengming; Yao, Yiping

doi:10.1007/s11227-021-03719-2

Published: 12 March 2021

Volume 77, pages 10827–10849, (2021)

The Journal of Supercomputing

Zhengming Yi¹ &
Yiping Yao²

Abstract

Modern multi-core architectures exhibit non-uniform memory access (NUMA) behavior: a core accesses data cached locally on its NUMA node much faster than data cached on a remote node. Prior work has shown that, on NUMA multi-core systems, delegation is a highly efficient way to address the generally poor performance of locking methods on highly contended locks, because it delegates the execution of critical sections to a single thread and thereby reduces the movement of shared data between cores. However, we observe that delegation methods exhibit sub-par single-thread performance, mainly due to the overhead of communication between the server and client threads and of request array traversal. To address this problem, this paper proposes a stealing mechanism: under no contention, lock stealing is enabled and clients enter the critical section directly; under contention, lock stealing is disabled and the delegation protocol is used. We apply the stealing mechanism to two state-of-the-art delegation methods, FFWD and RCL. The evaluation shows that delegation augmented with lock stealing significantly improves performance in the non-contended case while matching the performance of the original methods under contention.
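The idea in the abstract can be illustrated with a small toy model. The sketch below is not the paper's algorithm and does not reproduce FFWD or RCL; it only shows the general shape of "steal when uncontended, delegate otherwise". The class name `StealingDelegationLock`, the `_steal_enabled` flag, the queue-based mailbox (standing in for a request array), and the rule for re-enabling stealing are all assumptions made for illustration.

```python
import queue
import threading


class StealingDelegationLock:
    """Toy model (hypothetical): one server thread runs delegated critical
    sections; clients may steal the lock and run the section themselves."""

    def __init__(self):
        self._lock = threading.Lock()         # every critical section runs under it
        self._requests = queue.SimpleQueue()  # client-to-server mailbox
        self._steal_enabled = True            # assumed heuristic flag
        self.steals = 0                       # sections run directly by clients
        self.delegated = 0                    # sections run by the server
        self._server = threading.Thread(target=self._serve, daemon=True)
        self._server.start()

    def execute(self, critical_section):
        # Fast path: stealing is enabled and the lock is free, so run directly.
        if self._steal_enabled and self._lock.acquire(blocking=False):
            try:
                self.steals += 1
                return critical_section()
            finally:
                self._lock.release()
        # Slow path: hand the section to the server and wait for the result.
        done = threading.Event()
        box = {}
        self._requests.put((critical_section, box, done))
        done.wait()
        return box["result"]

    def _serve(self):
        while True:
            request = self._requests.get()
            if request is None:               # shutdown sentinel
                return
            critical_section, box, done = request
            self._steal_enabled = False       # contention seen: disable stealing
            with self._lock:
                self.delegated += 1
                box["result"] = critical_section()
            # Assumed rule: re-enable stealing once no requests are pending.
            self._steal_enabled = self._requests.empty()
            done.set()

    def close(self):
        self._requests.put(None)
        self._server.join()


def run_demo(num_threads, ops_per_thread):
    """Increment a shared counter and report (count, steals, delegated)."""
    lock = StealingDelegationLock()
    state = {"count": 0}

    def increment():
        state["count"] += 1

    def worker():
        for _ in range(ops_per_thread):
            lock.execute(increment)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    lock.close()
    return state["count"], lock.steals, lock.delegated
```

With a single client, `run_demo(1, 1000)` never finds the lock held, so every operation takes the stealing path and the server stays idle. With several clients, the split between stolen and delegated operations depends on scheduling, but the counter is still exact because both paths run the critical section under the same lock.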

Author information

Authors and Affiliations

College of Computer, NUDT, Changsha, China
Zhengming Yi
State Key Laboratory of High Performance Computing, NUDT, Changsha, China
Yiping Yao

Corresponding author

Correspondence to Zhengming Yi.

About this article

Cite this article

Yi, Z., Yao, Y. A stealing mechanism for delegation methods. J Supercomput 77, 10827–10849 (2021). https://doi.org/10.1007/s11227-021-03719-2

Accepted: 25 February 2021
Published: 12 March 2021
Issue Date: October 2021
DOI: https://doi.org/10.1007/s11227-021-03719-2
