Abstract
A goal of performance testing is to find situations in which applications unexpectedly exhibit worsened characteristics for certain combinations of input values. A fundamental question of performance testing is how to select a manageable subset of the input data in order to find performance bottlenecks in applications automatically and quickly. We propose FOREPOST, a novel solution for automatically finding performance bottlenecks in applications using black-box software testing. Our solution is an adaptive, feedback-directed learning testing system that learns rules from execution traces of applications. These rules are then used to automatically select test input data for performance testing. We hypothesize that FOREPOST can find more performance bottlenecks than random testing. We implemented our solution and applied it to a medium-size industrial application at a major insurance company and to two open-source applications. Performance bottlenecks were found automatically and confirmed by experienced testers and developers. We also thoroughly studied the factors (or independent variables) that impact the results of FOREPOST.
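The feedback-directed loop summarized above can be sketched in miniature: profile an initial random sample of inputs, learn rules characterizing the slow executions, and steer subsequent test selection toward inputs matching those rules. This is a toy simulation only; the application, its injected bottleneck, and the threshold-based "rule learner" below are illustrative stand-ins (FOREPOST learns rules such as JRip rules from real execution traces, as the notes indicate).

```python
import random

# Hypothetical application under test: execution cost grows sharply for
# one specific combination of input values (the injected "bottleneck").
def run_application(order_size, customer_type):
    cost = order_size  # baseline work proportional to input size
    if customer_type == "wholesale" and order_size > 80:
        cost *= 50     # injected performance bottleneck
    return cost

def profile(inputs):
    """Execute each input and record its simulated execution cost."""
    return [(i, run_application(*i)) for i in inputs]

def learn_rules(traces):
    """Toy stand-in for a rule learner: label the slowest quartile of
    traces as 'bad' and derive a simple rule from the attribute values
    those inputs share (a minimum size and a set of customer types)."""
    traces = sorted(traces, key=lambda t: t[1], reverse=True)
    bad = [inp for inp, _ in traces[: max(1, len(traces) // 4)]]
    min_size = min(size for size, _ in bad)
    types = {ctype for _, ctype in bad}
    return min_size, types

def select_inputs(rule, pool):
    """Feedback step: prefer inputs that match the learned 'bad' rule."""
    min_size, types = rule
    return [i for i in pool if i[0] >= min_size and i[1] in types]

random.seed(0)
pool = [(random.randint(1, 100), random.choice(["retail", "wholesale"]))
        for _ in range(200)]
initial = pool[:40]                       # small random bootstrap sample
rule = learn_rules(profile(initial))      # learn rules from traces
focused = select_inputs(rule, pool)       # steer testing toward bottlenecks
```

Running the loop once yields a `focused` subset whose average execution cost is substantially higher than that of the full random pool, which is the intuition behind the hypothesis that feedback-directed selection exposes bottlenecks faster than purely random testing.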
Notes
http://eclipse.org/tptp, last checked August 12, 2015
http://weka.sourceforge.net/doc.stable/weka/classifiers/rules/JRip.html, last checked Apr 10, 2015
http://sourceforge.net/projects/ibatisjpetstore, last checked Apr 10, 2015
http://en.community.dell.com/techcenter/extras/w/wiki/dvd-store.aspx, last checked Apr 10, 2015
http://linux.dell.com/dvdstore/, last checked Apr 10, 2015
Acknowledgments
We are grateful to the anonymous ICSE'12 and EMSE journal reviewers for their relevant and useful comments and suggestions, which helped us to significantly improve an earlier version of this paper. We would like to thank Bogdan Dit and Kevin Moran for reading early drafts of the paper and providing feedback. We would also like to thank Du Shen for his pertinent feedback on improving the current version of FOREPOST and for pointing out areas for improvement. We also thank Chen Fu and Qin Xie for their contributions to the earlier version of this work. This work is supported by NSF CCF-0916139, NSF CCF-1017633, NSF CCF-1218129, NSF CCF-1525902, a major insurance company, and Accenture.
Additional information
Communicated by: Ahmed E. Hassan
Appendix
Cite this article
Luo, Q., Nair, A., Grechanik, M. et al. FOREPOST: finding performance problems automatically with feedback-directed learning software testing. Empir Software Eng 22, 6–56 (2017). https://doi.org/10.1007/s10664-015-9413-5