research-article

Asymmetry-aware scalable locking

Authors:

Haibo Chen

PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

Pages 294 - 308

https://doi.org/10.1145/3503221.3508420

Published: 28 March 2022

Abstract

The pursuit of power efficiency is popularizing asymmetric multicore processors (AMPs), such as ARM big.LITTLE, Apple M1, and the recent Intel Alder Lake, which combine big and little cores. However, we find that existing scalable locks fail to scale on AMPs and cause collapses in throughput, latency, or both, because their implicit assumption of symmetric cores no longer holds. To address this issue, we propose LibASL, the first asymmetry-aware scalable lock. LibASL provides a new lock ordering guided by applications' latency requirements: big cores may be reordered ahead of little cores for higher throughput, provided that the applications' latency requirements are still met. Using LibASL only requires linking the application with it and, if the application is latency-critical, inserting a few lines of code to annotate the coarse-grained latency requirement. We evaluate LibASL on various benchmarks, including five popular databases, on Apple M1. The results show that LibASL improves throughput by up to 5 times while precisely preserving the tail latency designated by applications.
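The ordering idea in the abstract can be illustrated with a toy model. The sketch below is not LibASL's algorithm or API; it is a minimal single-threaded simulation under stated assumptions: time is counted in abstract ticks, each waiter carries a hypothetical `budget` (the longest it may wait before it must be served), and critical sections take `cs_big` ticks on a big core and `cs_little` ticks on a little core. All names (`Waiter`, `next_holder`, `simulate`) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Waiter:
    name: str
    is_big: bool   # True if the waiter runs on a big core
    arrival: int   # arrival time, in abstract ticks
    budget: int    # hypothetical: max ticks the waiter may wait before being served


def next_holder(queue, now):
    """Pick the next lock holder among the waiters queued at time `now`."""
    # 1. A waiter whose budget is used up goes first (earliest deadline wins).
    expired = [w for w in queue if now - w.arrival >= w.budget]
    if expired:
        return min(expired, key=lambda w: w.arrival + w.budget)
    # 2. Otherwise big-core waiters may overtake little-core waiters.
    big = [w for w in queue if w.is_big]
    # 3. With no big-core waiter, fall back to plain FIFO order.
    return min(big or queue, key=lambda w: w.arrival)


def simulate(waiters, cs_big=1, cs_little=3):
    """Serve every waiter once and return the names in service order."""
    pending = sorted(waiters, key=lambda w: w.arrival)
    queue, order, now = [], [], 0
    while pending or queue:
        # Admit everyone who has arrived by `now`.
        while pending and pending[0].arrival <= now:
            queue.append(pending.pop(0))
        if not queue:
            now = pending[0].arrival  # lock is idle: jump to the next arrival
            continue
        holder = next_holder(queue, now)
        queue.remove(holder)
        order.append(holder.name)
        # Assumed cost model: little cores hold the lock longer.
        now += cs_big if holder.is_big else cs_little
    return order
```

In this model, a budget of 0 for every little-core waiter degenerates to FIFO for that waiter, while a very large budget lets all queued big-core waiters run first. Intermediate budgets trade throughput against the little core's waiting time, which is the kind of trade-off the abstract describes.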

Cited By

Nie S, Liu Y, Niu J, Wu W (2024). CAL: Core-Aware Lock for the big.LITTLE Multicore Architecture. Applied Sciences 14:15 (6449). Online publication date: 24-Jul-2024.
https://doi.org/10.3390/app14156449
Li N, Guo J, Huang B, Li Y, Zhang Y, Li C, Huang W (2024). TCSA: Efficient Localization of Busy-Wait Synchronization Bugs for Latency-Critical Applications. IEEE Transactions on Parallel and Distributed Systems 35:2 (297-309). Online publication date: 1-Feb-2024.
https://dl.acm.org/doi/10.1109/TPDS.2023.3342573
Li W, Cheng H, Lu Z, Lu Y, Liu W (2023). HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors. 2023 IEEE International Conference on Cluster Computing (CLUSTER) (209-220). Online publication date: 31-Oct-2023.
https://doi.org/10.1109/CLUSTER52292.2023.00025

Index Terms

Asymmetry-aware scalable locking
1. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Multithreading
        Mutual exclusion
    2. Extra-functional properties
      1. Software performance

Published In

PPoPP '22: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

April 2022

495 pages

ISBN:9781450392044

DOI:10.1145/3503221

General Chair: Jaejin Lee (Seoul National University)

Program Chairs: Kunal Agrawal (Washington University), Michael Spear (Lehigh University)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

China National Natural Science Foundation
High-Tech Support Program from Shanghai Committee of Science and Technology

Conference

PPoPP '22

PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

April 2 - 6, 2022

Seoul, Republic of Korea

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Bibliometrics

Total Citations: 3
Total Downloads: 330
Downloads (Last 12 months): 86
Downloads (Last 6 weeks): 15

Reflects downloads up to 22 Sep 2024
