DOI: 10.1145/2610384.2610393

Performance regression testing of concurrent classes

Published: 21 July 2014

Abstract

Developers of thread-safe classes struggle with two opposing goals. The class must be correct, which requires synchronizing concurrent accesses, and the class should provide reasonable performance, which is difficult to realize in the presence of unnecessary synchronization. Validating the performance of a thread-safe class is challenging because it requires diverse workloads that use the class, because existing performance analysis techniques focus on individual bottleneck methods, and because reliably measuring the performance of concurrent executions is difficult. This paper presents SpeedGun, an automatic performance regression testing technique for thread-safe classes. The key idea is to generate multi-threaded performance tests and to compare two versions of a class with each other. The analysis notifies developers when changing a thread-safe class significantly influences the performance of clients of this class. An evaluation with 113 pairs of classes from popular Java projects shows that the analysis effectively identifies 13 performance differences, including performance regressions that the respective developers were not aware of.
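
To make the key idea concrete, the following is a minimal Java sketch of what a generated multi-threaded performance test and its measurement harness might look like. It is not the authors' SpeedGun implementation: the Counter interface and its two versions (SynchronizedCounter with coarse-grained locking, AtomicCounter with a lock-free update) are hypothetical stand-ins for the old and new revision of a thread-safe class, and the statistics are reduced to a simple mean over repeated runs.

import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// A minimal sketch of a generated multi-threaded performance test; NOT the authors'
// SpeedGun implementation. Counter, SynchronizedCounter, and AtomicCounter are
// hypothetical stand-ins for the old and new revision of a thread-safe class.
public class ConcurrentPerfTestSketch {

    interface Counter {
        void increment();
        long get();
    }

    // Hypothetical "old" version: coarse-grained locking.
    static class SynchronizedCounter implements Counter {
        private long value;
        public synchronized void increment() { value++; }
        public synchronized long get() { return value; }
    }

    // Hypothetical "new" version: lock-free update.
    static class AtomicCounter implements Counter {
        private final AtomicLong value = new AtomicLong();
        public void increment() { value.incrementAndGet(); }
        public long get() { return value.get(); }
    }

    // One generated performance test: several threads concurrently call methods of a
    // shared instance; the harness measures wall-clock time from a common start
    // barrier until all workers have finished.
    static long runOnceNanos(Supplier<Counter> version, int threads, int opsPerThread)
            throws Exception {
        Counter shared = version.get();
        CyclicBarrier start = new CyclicBarrier(threads + 1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    start.await();                      // start all workers together
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
                for (int i = 0; i < opsPerThread; i++) {
                    shared.increment();
                    shared.get();
                }
            });
            workers[t].start();
        }
        start.await();                                  // release the workers
        long begin = System.nanoTime();
        for (Thread w : workers) {
            w.join();
        }
        return System.nanoTime() - begin;
    }

    // Repeat the measurement several times to smooth out scheduler and JIT noise,
    // then report the mean execution time of each version.
    public static void main(String[] args) throws Exception {
        int threads = 4, ops = 1_000_000, repetitions = 10;
        for (int i = 0; i < 3; i++) {                   // rough JIT warm-up
            runOnceNanos(SynchronizedCounter::new, threads, ops);
            runOnceNanos(AtomicCounter::new, threads, ops);
        }
        double oldMeanMs = 0, newMeanMs = 0;
        for (int r = 0; r < repetitions; r++) {
            oldMeanMs += runOnceNanos(SynchronizedCounter::new, threads, ops) / 1e6;
            newMeanMs += runOnceNanos(AtomicCounter::new, threads, ops) / 1e6;
        }
        oldMeanMs /= repetitions;
        newMeanMs /= repetitions;
        System.out.printf("old version: %.1f ms, new version: %.1f ms%n", oldMeanMs, newMeanMs);
        // A real analysis would apply a statistical test (e.g., comparing confidence
        // intervals across repetitions) before reporting a regression or speedup.
    }
}

In the same spirit as the paper's approach, the sketch exercises both versions under an identical concurrent workload and repeats the measurement, since single runs of concurrent code are too noisy to compare reliably.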

Published In

ISSTA 2014: Proceedings of the 2014 International Symposium on Software Testing and Analysis
July 2014, 460 pages
ISBN: 9781450326452
DOI: 10.1145/2610384
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Performance measurement
  2. Test generation
  3. Thread safety

Qualifiers

  • Research-article

Conference

ISSTA '14

Acceptance Rates

Overall acceptance rate: 58 of 213 submissions (27%)


Article Metrics

  • Downloads (last 12 months): 24
  • Downloads (last 6 weeks): 1
Reflects downloads up to 25 Jan 2025.

Cited By

  • (2024) Enhancing Performance Bug Prediction Using Performance Code Metrics. Proceedings of the 21st International Conference on Mining Software Repositories, 50-62. DOI: 10.1145/3643991.3644920. Online publication date: 15-Apr-2024
  • (2024) A Platform-Agnostic Framework for Automatically Identifying Performance Issue Reports with Heuristic Linguistic Patterns. IEEE Transactions on Software Engineering, 1-22. DOI: 10.1109/TSE.2024.3390623. Online publication date: 2024
  • (2024) Evaluating Search-Based Software Microbenchmark Prioritization. IEEE Transactions on Software Engineering, 50(7), 1687-1703. DOI: 10.1109/TSE.2024.3380836. Online publication date: 1-Jul-2024
  • (2023) When Database Meets New Storage Devices: Understanding and Exposing Performance Mismatches via Configurations. Proceedings of the VLDB Endowment, 16(7), 1712-1725. DOI: 10.14778/3587136.3587145. Online publication date: 8-May-2023
  • (2023) A Large-Scale Empirical Study of Real-Life Performance Issues in Open Source Projects. IEEE Transactions on Software Engineering, 49(2), 924-946. DOI: 10.1109/TSE.2022.3167628. Online publication date: 1-Feb-2023
  • (2023) Understanding Software Performance Challenges: An Empirical Study on Stack Overflow. 2023 International Conference on Code Quality (ICCQ), 1-15. DOI: 10.1109/ICCQ57276.2023.10114662. Online publication date: 22-Apr-2023
  • (2023) A Systematic Mapping Study of Software Performance Research. Software: Practice and Experience, 53(5), 1249-1270. DOI: 10.1002/spe.3185. Online publication date: 2-Jan-2023
  • (2022) FADATest. Proceedings of the 44th International Conference on Software Engineering, 896-908. DOI: 10.1145/3510003.3510169. Online publication date: 21-May-2022
  • (2022) PerfJIT: Test-Level Just-in-Time Prediction for Performance Regression Introducing Commits. IEEE Transactions on Software Engineering, 48(5), 1529-1544. DOI: 10.1109/TSE.2020.3023955. Online publication date: 1-May-2022
  • (2022) Automated Identification of Performance Changes at Code Level. 2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS), 916-925. DOI: 10.1109/QRS57517.2022.00096. Online publication date: Dec-2022
