DOI: 10.1145/3503221.3508420

Asymmetry-aware scalable locking

Published: 28 March 2022

Abstract

The pursuit of power efficiency is popularizing asymmetric multicore processors (AMPs), such as ARM big.LITTLE, Apple M1, and the recent Intel Alder Lake, which combine big and little cores. However, we find that existing scalable locks fail to scale on AMPs and cause collapses in throughput, latency, or both, because their implicit assumption of symmetric cores no longer holds. To address this issue, we propose LibASL, the first asymmetry-aware scalable lock. LibASL provides a new lock ordering guided by applications' latency requirements: big cores may overtake little cores in the lock order for higher throughput, provided that the applications' latency requirements are preserved. Using LibASL only requires linking the application with it and, if the application is latency-critical, inserting a few lines of code to annotate the coarse-grained latency requirement. We evaluate LibASL on various benchmarks, including five popular databases, on Apple M1. The results show that LibASL can improve throughput by up to 5 times while precisely preserving the tail latency designated by applications.
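
To make the idea of latency-bounded reordering concrete, the following toy simulation may help. It is only an illustrative sketch and not LibASL's algorithm or API: the `Waiter` record, the `budget` field (a stand-in for an application's latency requirement), the `pick_next` policy, and the critical-section costs are all assumptions introduced here. The policy lets big-core waiters overtake little-core waiters until a little-core waiter has used up its budget.

```python
from dataclasses import dataclass


@dataclass
class Waiter:
    name: str
    is_big: bool    # True if the thread runs on a big core
    arrival: float  # time at which the thread started waiting
    budget: float   # hypothetical maximum tolerated wait; only used for little cores


def pick_next(waiters, now):
    """Pick the next lock holder from a non-empty list of waiters."""
    # A little-core waiter whose budget is exhausted must not be overtaken again.
    overdue = [w for w in waiters if not w.is_big and now - w.arrival >= w.budget]
    if overdue:
        return min(overdue, key=lambda w: w.arrival + w.budget)
    # Otherwise prefer big-core waiters (FIFO among them) for throughput.
    big = [w for w in waiters if w.is_big]
    pool = big if big else waiters
    return min(pool, key=lambda w: w.arrival)


def simulate(waiters, cs_big=1.0, cs_little=3.0):
    """Return the order in which waiters acquire the lock.

    cs_big and cs_little are assumed critical-section durations.
    """
    pending = sorted(waiters, key=lambda w: w.arrival)
    now, order = 0.0, []
    while pending:
        ready = [w for w in pending if w.arrival <= now]
        if not ready:
            now = pending[0].arrival  # lock is idle until the next arrival
            continue
        nxt = pick_next(ready, now)
        pending.remove(nxt)
        order.append(nxt.name)
        now += cs_big if nxt.is_big else cs_little
    return order


def demo(little_budget):
    """One little-core waiter and three big-core waiters, all arriving at time 0."""
    return simulate([
        Waiter("little-0", False, 0.0, little_budget),
        Waiter("big-0", True, 0.0, 0.0),
        Waiter("big-1", True, 0.0, 0.0),
        Waiter("big-2", True, 0.0, 0.0),
    ])


if __name__ == "__main__":
    print(demo(2.0))  # ['big-0', 'big-1', 'little-0', 'big-2']
```

With a budget of 2.0 time units, two big-core waiters overtake `little-0` before it is served; a strict FIFO lock would have served `little-0` first, and a lock that always favors big cores would have served it last.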

Cited By

  • (2024) CAL: Core-Aware Lock for the big.LITTLE Multicore Architecture. Applied Sciences 14:15 (6449). DOI: 10.3390/app14156449. Online publication date: 24-Jul-2024
  • (2024) TCSA: Efficient Localization of Busy-Wait Synchronization Bugs for Latency-Critical Applications. IEEE Transactions on Parallel and Distributed Systems 35:2 (297-309). DOI: 10.1109/TPDS.2023.3342573. Online publication date: 1-Feb-2024
  • (2023) HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors. 2023 IEEE International Conference on Cluster Computing (CLUSTER) (209-220). DOI: 10.1109/CLUSTER52292.2023.00025. Online publication date: 31-Oct-2023

Published In

PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
April 2022
495 pages
ISBN:9781450392044
DOI:10.1145/3503221
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. asymmetric multicore processor
  2. lock
  3. scalability
  4. synchronization primitives

Qualifiers

  • Research-article

Funding Sources

  • China National Natural Science Foundation
  • High-Tech Support Program from Shanghai Committee of Science and Technology

Conference

PPoPP '22

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Article Metrics

  • Downloads (Last 12 months): 86
  • Downloads (Last 6 weeks): 15
Reflects downloads up to 22 Sep 2024
