Abstract
A goal of performance testing is to find situations in which applications unexpectedly exhibit worsened characteristics for certain combinations of input values. A fundamental question of performance testing is how to select a manageable subset of the input data in order to find performance bottlenecks in applications automatically and quickly. We propose FOREPOST, a novel solution for automatically finding performance bottlenecks in applications using black-box software testing. Our solution is an adaptive, feedback-directed learning testing system that learns rules from execution traces of applications. These rules are then used to automatically select test input data for performance testing. We hypothesize that FOREPOST can find more performance bottlenecks than random testing. We implemented our solution and applied it to a medium-size industrial application at a major insurance company and to two open-source applications. Performance bottlenecks were found automatically and confirmed by experienced testers and developers. We also thoroughly studied the factors (or independent variables) that impact the results of FOREPOST.
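The feedback-directed loop summarized above can be sketched in miniature: profile an initial random sample of inputs, learn rules characterizing the slow executions, and steer subsequent test selection toward inputs matching those rules. This is a toy simulation only; the application, its injected bottleneck, and the threshold-based "rule learner" below are illustrative stand-ins (FOREPOST learns rules such as JRip rules from real execution traces, as the notes indicate).

```python
import random

# Hypothetical application under test: execution cost grows sharply for
# one specific combination of input values (the injected "bottleneck").
def run_application(order_size, customer_type):
    cost = order_size  # baseline work proportional to input size
    if customer_type == "wholesale" and order_size > 80:
        cost *= 50     # injected performance bottleneck
    return cost

def profile(inputs):
    """Execute each input and record its simulated execution cost."""
    return [(i, run_application(*i)) for i in inputs]

def learn_rules(traces):
    """Toy stand-in for a rule learner: label the slowest quartile of
    traces as 'bad' and derive a simple rule from the attribute values
    those inputs share (a minimum size and a set of customer types)."""
    traces = sorted(traces, key=lambda t: t[1], reverse=True)
    bad = [inp for inp, _ in traces[: max(1, len(traces) // 4)]]
    min_size = min(size for size, _ in bad)
    types = {ctype for _, ctype in bad}
    return min_size, types

def select_inputs(rule, pool):
    """Feedback step: prefer inputs that match the learned 'bad' rule."""
    min_size, types = rule
    return [i for i in pool if i[0] >= min_size and i[1] in types]

random.seed(0)
pool = [(random.randint(1, 100), random.choice(["retail", "wholesale"]))
        for _ in range(200)]
initial = pool[:40]                       # small random bootstrap sample
rule = learn_rules(profile(initial))      # learn rules from traces
focused = select_inputs(rule, pool)       # steer testing toward bottlenecks
```

Running the loop once yields a `focused` subset whose average execution cost is substantially higher than that of the full random pool, which is the intuition behind the hypothesis that feedback-directed selection exposes bottlenecks faster than purely random testing.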
Notes
http://eclipse.org/tptp, last checked August 12, 2015
http://weka.sourceforge.net/doc.stable/weka/classifiers/rules/JRip.html, last checked Apr 10, 2015
http://sourceforge.net/projects/ibatisjpetstore, last checked Apr 10, 2015
http://en.community.dell.com/techcenter/extras/w/wiki/dvd-store.aspx, last checked Apr 10, 2015
http://linux.dell.com/dvdstore/, last checked Apr 10, 2015
Acknowledgments
We are grateful to the anonymous ICSE'12 and EMSE journal reviewers for their relevant and useful comments and suggestions, which helped us to significantly improve an earlier version of this paper. We would like to thank Bogdan Dit and Kevin Moran for reading early drafts of the paper and providing feedback. We would also like to thank Du Shen for his pertinent feedback on improving the current version of FOREPOST and for pointing out areas for improvement. We also thank Chen Fu and Qin Xie for their contributions to the earlier version of this work. This work is supported by NSF CCF-0916139, NSF CCF-1017633, NSF CCF-1218129, NSF CCF-1525902, a major insurance company, and Accenture.
Additional information
Communicated by: Ahmed E. Hassan
Appendix
Cite this article
Luo, Q., Nair, A., Grechanik, M. et al. FOREPOST: finding performance problems automatically with feedback-directed learning software testing. Empir Software Eng 22, 6–56 (2017). https://doi.org/10.1007/s10664-015-9413-5