DOI: 10.1145/3196398.3196407

An evaluation of open-source software microbenchmark suites for continuous performance assessment

Published: 28 May 2018

Abstract

Continuous integration (CI) emphasizes quick feedback to developers. This is at odds with the current practice of performance testing, which predominantly focuses on long-running tests against entire systems in production-like environments. Alternatively, software microbenchmarking attempts to establish a performance baseline for small code fragments in a short time. This paper investigates the quality of microbenchmark suites with a focus on their suitability for delivering quick performance feedback and for CI integration. We study ten open-source libraries written in Java and Go with benchmark suite sizes ranging from 16 to 983 tests, and runtimes between 11 minutes and 8.75 hours. We show that our study subjects include benchmarks with result variability of 50% or higher, indicating that not all benchmarks are useful for reliable discovery of slowdowns. We further artificially inject actual slowdowns into public API methods of the study subjects and test whether the benchmark suites are able to discover them. We introduce a performance-test quality metric called the API benchmarking score (ABS). ABS represents a benchmark suite's ability to find slowdowns among a set of defined core API methods. The resulting benchmarking scores (i.e., the fraction of discovered slowdowns) vary between 10% and 100% for the study subjects. This paper's methodology and results can be used to (1) assess the quality of existing microbenchmark suites, (2) select a set of tests to be run as part of CI, and (3) suggest or generate benchmarks for currently untested parts of an API.
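
To make the ABS metric concrete, the following is a minimal sketch of how such a score can be computed. It is our own illustration under simple assumptions (one boolean detection outcome per core API method with an injected slowdown), not the authors' measurement tooling; all names are hypothetical.

```go
// Package abs sketches the API benchmarking score (ABS): the fraction of
// core API methods whose injected slowdowns are discovered by at least one
// benchmark in the suite.
package abs

// DetectionResult records, for one core API method with an injected
// slowdown, whether any benchmark in the suite flagged a slowdown for it.
// The type and field names are illustrative assumptions.
type DetectionResult struct {
	Method   string
	Detected bool
}

// Score returns the fraction of injected slowdowns that were discovered,
// ranging from 0.0 (none found) to 1.0 (all found).
func Score(results []DetectionResult) float64 {
	if len(results) == 0 {
		return 0
	}
	detected := 0
	for _, r := range results {
		if r.Detected {
			detected++
		}
	}
	return float64(detected) / float64(len(results))
}
```

Under this reading, a suite that discovers 13 of 20 injected slowdowns scores 0.65; the 10% to 100% range reported above corresponds to scores between 0.10 and 1.00.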

Published In

MSR '18: Proceedings of the 15th International Conference on Mining Software Repositories
May 2018
627 pages
ISBN:9781450357166
DOI:10.1145/3196398
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Go
  2. Java
  3. continuous integration
  4. empirical study
  5. microbenchmarking
  6. software performance testing

Qualifiers

  • Research-article

Conference

ICSE '18

Cited By

  • (2024) An Empirical Study on Code Coverage of Performance Testing. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 48-57. DOI: 10.1145/3661167.3661196. Online publication date: 18-Jun-2024
  • (2024) VAMP: Visual Analytics for Microservices Performance. Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, 1209-1218. DOI: 10.1145/3605098.3636069. Online publication date: 8-Apr-2024
  • (2024) Evaluating Search-Based Software Microbenchmark Prioritization. IEEE Transactions on Software Engineering 50:7, 1687-1703. DOI: 10.1109/TSE.2024.3380836. Online publication date: 1-Jul-2024
  • (2023) The Early Microbenchmark Catches the Bug -- Studying Performance Issues Using Micro- and Application Benchmarks. Proceedings of the IEEE/ACM 16th International Conference on Utility and Cloud Computing, 1-10. DOI: 10.1145/3603166.3632128. Online publication date: 4-Dec-2023
  • (2023) GraalVM Compiler Benchmark Results Dataset (Data Artifact). Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, 65-69. DOI: 10.1145/3578245.3585025. Online publication date: 15-Apr-2023
  • (2023) Enhancing Trace Visualizations for Microservices Performance Analysis. Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, 283-287. DOI: 10.1145/3578245.3584729. Online publication date: 15-Apr-2023
  • (2023) A Study of Java Microbenchmark Tail Latencies. Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, 77-81. DOI: 10.1145/3578245.3584690. Online publication date: 15-Apr-2023
  • (2023) An HPC-Container Based Continuous Integration Tool for Detecting Scaling and Performance Issues in HPC Applications. IEEE Transactions on Services Computing, 1-12. DOI: 10.1109/TSC.2023.3337662. Online publication date: 2023
  • (2023) Faster or Slower? Performance Mystery of Python Idioms Unveiled with Empirical Evidence. Proceedings of the 45th International Conference on Software Engineering, 1495-1507. DOI: 10.1109/ICSE48619.2023.00130. Online publication date: 14-May-2023
  • (2023) Towards effective assessment of steady state performance in Java software: are we there yet? Empirical Software Engineering 28:1. DOI: 10.1007/s10664-022-10247-x. Online publication date: 1-Jan-2023
