
Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study

Published: 02 September 2015

Abstract

Work on automated test generation has produced several tools capable of generating test data that achieves high structural coverage of a program. In the absence of a specification, developers are expected to manually construct or verify the test oracle for each test input. Nevertheless, it is assumed that these generated tests ease the developer's testing task, since testing is reduced to checking the results of the tests. While this assumption has persisted for decades, there has been no conclusive evidence to date confirming it. Indeed, the limited adoption of such tools in industry suggests that the assumption may not hold, calling into question the practical value of test generation tools. To investigate this issue, we performed two controlled experiments comparing a total of 97 subjects split between writing tests manually and writing tests with the aid of an automated unit test generation tool, EvoSuite. We found that, on the one hand, tool support leads to clear improvements in commonly applied quality metrics such as code coverage (increases of up to 300%). On the other hand, there was no measurable improvement in the number of bugs actually found by developers. Our results not only cast some doubt on how the research community evaluates test generation tools, but also point to improvements and future work necessary before automated test generation tools will be widely adopted by practitioners.
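To make the oracle problem described above concrete, here is a minimal, hypothetical sketch (invented for illustration, not actual EvoSuite output). A generated JUnit test can cover both branches of the invented Account class below, but its assertions merely record what the current, buggy implementation returns; the developer must still judge whether each asserted value is correct.

import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical class under test, invented for illustration.
class Account {
    private int balance;

    Account(int balance) { this.balance = balance; }

    // Bug: a withdrawal of the full balance is silently refused,
    // because the guard should be 'amount <= balance'.
    int withdraw(int amount) {
        if (amount < balance) {
            balance -= amount;
        }
        return balance;
    }
}

// Representative of the style of a generated test suite: the inputs
// achieve branch coverage, but each assertion simply captures the
// observed behavior of the implementation as it stands.
public class AccountTest {

    @Test
    public void testWithdrawTakesTrueBranch() {
        Account account = new Account(100);
        int result = account.withdraw(40);
        assertEquals(60, result); // generated oracle: matches observed behavior
    }

    @Test
    public void testWithdrawTakesFalseBranch() {
        Account account = new Account(100);
        int result = account.withdraw(100);
        // The tool asserts 100 because that is what the buggy code returns;
        // only a developer who inspects this value can notice that withdrawing
        // the full balance should have succeeded and left a balance of 0.
        assertEquals(100, result);
    }
}

Both tests pass and together cover both branches, which is exactly the kind of coverage gain such tools deliver automatically; deciding whether the second assertion encodes a bug or the intended behavior is the manual oracle-checking step that the study's subjects still had to perform.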




    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 24, Issue 4
    Special Issue on ISSTA 2013
    August 2015
    177 pages
    ISSN: 1049-331X
    EISSN: 1557-7392
    DOI: 10.1145/2820114
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 September 2015
    Accepted: 01 September 2014
    Revised: 01 July 2014
    Received: 01 January 2014
    Published in TOSEM Volume 24, Issue 4


    Author Tags

    1. Unit testing
    2. automated test generation
    3. branch coverage
    4. empirical software engineering

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Research Fund, Luxembourg
    • EPSRC
    • Google Focused Research Award on “Test Amplification”
    • Norwegian Research Council


    Cited By

    • (2024) Large Language Models for Software Engineering: A Systematic Literature Review. ACM Transactions on Software Engineering and Methodology 33, 8, 1-79. DOI: 10.1145/3695988. Online publication date: 20-Sep-2024.
    • (2024) Test-Driven Development and LLM-based Code Generation. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 1583-1594. DOI: 10.1145/3691620.3695527. Online publication date: 27-Oct-2024.
    • (2024) Using Large Language Models to Generate JUnit Tests: An Empirical Study. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 313-322. DOI: 10.1145/3661167.3661216. Online publication date: 18-Jun-2024.
    • (2024) Practitioners’ Expectations on Automated Test Generation. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1618-1630. DOI: 10.1145/3650212.3680386. Online publication date: 11-Sep-2024.
    • (2024) Using GitHub Copilot for Test Generation in Python: An Empirical Study. Proceedings of the 5th ACM/IEEE International Conference on Automation of Software Test (AST 2024), 45-55. DOI: 10.1145/3644032.3644443. Online publication date: 15-Apr-2024.
    • (2024) Understandable Test Generation Through Capture/Replay and LLMs. Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, 261-263. DOI: 10.1145/3639478.3639789. Online publication date: 14-Apr-2024.
    • (2024) Shaken, Not Stirred: How Developers Like Their Amplified Tests. IEEE Transactions on Software Engineering 50, 5, 1264-1280. DOI: 10.1109/TSE.2024.3381015. Online publication date: 22-Mar-2024.
    • (2024) BugOut: Automated Test Generation and Bug Detection for Low-Code. 2024 IEEE Conference on Software Testing, Verification and Validation (ICST), 373-382. DOI: 10.1109/ICST60714.2024.00041. Online publication date: 27-May-2024.
    • (2024) Toward granular search-based automatic unit test case generation. Empirical Software Engineering 29, 4. DOI: 10.1007/s10664-024-10451-x. Online publication date: 17-May-2024.
    • (2023) NaNofuzz: A Usable Tool for Automatic Test Generation. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1114-1126. DOI: 10.1145/3611643.3616327. Online publication date: 30-Nov-2023.
