research-article

SyncPerf: Categorizing, Detecting, and Diagnosing Synchronization Performance Bugs

Authors:

Mohammad Mejbah ul Alam,

Guangming Zeng, and

Abdullah Muzahid

EuroSys '17: Proceedings of the Twelfth European Conference on Computer Systems

April 2017

Pages 298 - 313

https://doi.org/10.1145/3064176.3064186

Published: 23 April 2017

Abstract

Despite their obvious importance, performance issues related to synchronization primitives still lack adequate attention. No prior work extensively investigates the categories, root causes, and fixing strategies of such performance issues. Existing work primarily focuses on one type of problem while ignoring other important categories. Moreover, it leaves the burden of identifying root causes to programmers. This paper first conducts an extensive study of the categories, root causes, and fixing strategies of performance issues related to explicit synchronization primitives. Based on this study, we develop two tools to identify the root causes of a range of performance issues. Compared with existing work, our proposal, SyncPerf, has three unique advantages. First, SyncPerf's detection is very lightweight, with 2.3% performance overhead on average. Second, SyncPerf integrates information based on callsites, lock variables, and types of threads; this integration helps identify more latent problems. Last but not least, when multiple root causes generate the same behavior, SyncPerf provides a second analysis tool that collects detailed accesses inside critical sections and helps identify possible root causes. SyncPerf discovers many previously unknown but significant synchronization performance issues. Fixing them yields performance gains ranging from 2.5% to 42%. Low overhead, better coverage, and informative reports make SyncPerf an effective tool for finding synchronization performance bugs in production environments.

Index Terms
1. Software and its engineering
  1. Software creation and management
    1. Software verification and validation
      1. Software defect analysis

Information

Published In

EuroSys '17: Proceedings of the Twelfth European Conference on Computer Systems

April 2017

648 pages

ISBN:9781450349383

DOI:10.1145/3064176

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Science Foundation

Conference

EuroSys '17

Sponsor:

SIGOPS

EuroSys '17: Twelfth EuroSys Conference 2017

April 23 - 26, 2017

Belgrade, Serbia

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%
