DOI: 10.1145/2771783.2771801
Research Article

Automated unit test generation during software development: a controlled experiment and think-aloud observations

Published: 13 July 2015

Abstract

Automated unit test generation tools can produce tests that are superior to manually written ones in terms of code coverage, but are these tests helpful to developers while they are writing code? A developer would first need to know when and how to apply such a tool, and would then need to understand the resulting tests in order to provide test oracles and to diagnose and fix any faults that the tests reveal. Considering all this, does automatically generating unit tests provide any benefit over simply writing unit tests manually? We empirically investigated the effects of using an automated unit test generation tool (EvoSuite) during development. A controlled experiment with 41 students shows that using EvoSuite leads to an average branch coverage increase of +13%, and 36% less time is spent on testing compared to writing unit tests manually. However, there is no clear effect on the quality of the implementations, as it depends on how the test generation tool and the generated tests are used. In-depth analysis, using five think-aloud observations with professional programmers, confirms the necessity to increase the usability of automated unit test generation tools, to integrate them better during software development, and to educate software developers on how to best use those tools.
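
The abstract's point that developers must understand generated tests and supply the oracles is easiest to see on a concrete example. Below is a minimal sketch, in the style of tests emitted by a search-based tool such as EvoSuite, for a hypothetical BoundedStack class; the class, test names, and assertion values are illustrative assumptions, not taken from the study. Note how the assertions merely record observed behavior, so the developer must confirm they describe intended behavior.

    import org.junit.Test;
    import static org.junit.Assert.*;

    // Hypothetical class under test (not from the study): a minimal bounded stack.
    class BoundedStack {
        private final int[] data = new int[16];
        private int size = 0;

        void push(int x) {
            if (size == data.length) throw new IllegalStateException("full");
            data[size++] = x;
        }

        int pop() {
            if (size == 0) throw new IllegalStateException("empty");
            return data[--size];
        }
    }

    // Tests in the style of tool-generated output: mechanically named methods,
    // inputs chosen to cover branches, and assertions that capture what the
    // code currently does rather than what it should do.
    public class BoundedStack_GeneratedStyleTest {

        @Test
        public void test0() {
            BoundedStack s = new BoundedStack();
            s.push(42);
            int result = s.pop();
            assertEquals(42, result); // recorded from a run; valid only if LIFO order is intended
        }

        @Test(expected = IllegalStateException.class)
        public void test1() {
            BoundedStack s = new BoundedStack();
            s.pop(); // covers the empty-stack branch
        }
    }

In practice a developer would point the tool at compiled classes (for EvoSuite, roughly java -jar evosuite.jar -class com.example.BoundedStack -projectCP target/classes, per its documentation) and then review each generated assertion as a candidate oracle.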




Published In

ISSTA 2015: Proceedings of the 2015 International Symposium on Software Testing and Analysis
July 2015, 447 pages
ISBN: 9781450336208
DOI: 10.1145/2771783
General Chair: Michal Young
Program Chair: Tao Xie


Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Automated unit test generation
    2. controlled experiment
    3. think-aloud observations
    4. unit testing

Conference

ISSTA '15

Acceptance Rates

Overall acceptance rate: 58 of 213 submissions, 27%
