research-article

Optimizing Search-Based Unit Test Generation with Large Language Models: An Empirical Study

Authors:

Lin ChenAuthors Info & Claims

Internetware '24: Proceedings of the 15th Asia-Pacific Symposium on Internetware

Pages 71 - 80

https://doi.org/10.1145/3671016.3674813

Published: 24 July 2024 Publication History

Abstract

Search-based unit test generation methods have been considered effective and widely applied, and Large Language Models (LLMs) have also demonstrated their powerful generation ability. Therefore, some scholars have proposed using LLMs to enhance search-based unit test generation methods and have preliminarily confirmed that LLMs can help alleviate the problem of test coverage plateaus. However, it is still unclear when and how LLMs should intervene in the time-consuming test generation process. This paper explores the application of LLMs at various stages of search-based test generation (SBTG) (including the initial stage, the test generation period, and the test coverage plateaus), as well as strategies for controlling the frequency of LLM intervention. A comprehensive empirical study was conducted on 486 Python benchmark modules from 27 projects. The experimental results show that 1) LLM intervention has a positive effect at any stage, whether to improve coverage over a fixed period or to reduce the time to reach a specific coverage; 2) a reasonable intervention frequency is crucial for LLMs to have a positive effect on SBTG. This work can better help understand when and how LLMs should be applied in SBTG and provide valuable suggestions for developers in practice.

References

[1]

Nasser Albunian, Gordon Fraser, and Dirk Sudholt. 2020. Causes and effects of fitness landscapes in unit test generation. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference. 1204–1212.

Digital Library

[2]

Aldeida Aleti, Irene Moser, and Lars Grunske. 2017. Analysing the fitness landscape of search-based software testing problems. Automated Software Engineering 24 (2017), 603–621.

Digital Library

[3]

Shaukat Ali, Lionel C Briand, Hadi Hemmati, and Rajwinder Kaur Panesar-Walawege. 2009. A systematic review of the application and empirical investigation of search-based test case generation. IEEE Transactions on Software Engineering 36, 6 (2009), 742–762.

Digital Library

[4]

Andrea Arcuri and Xin Yao. 2008. Search based software testing of object-oriented containers. Information Sciences 178, 15 (2008), 3075–3095.

Digital Library

[5]

Arthur Baars, Mark Harman, Youssef Hassoun, Kiran Lakhotia, Phil McMinn, Paolo Tonella, and Tanja Vos. 2011. Symbolic search-based testing. In 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011). IEEE, 53–62.

Digital Library

[6]

Thomas Back. 1996. Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford university press.

[7]

Patrick Bareiß, Beatriz Souza, Marcelo d’Amorim, and Michael Pradel. 2022. Code generation tools (almost) for free? a study of few-shot, pre-trained language models on code. arXiv preprint arXiv:2206.01335 (2022).

[8]

Luciano Baresi, Pier Luca Lanzi, and Matteo Miraz. 2010. Testful: An evolutionary test approach for java. In 2010 Third International Conference on Software Testing, Verification and Validation. IEEE, 185–194.

Digital Library

[9]

Benoit Baudry, Franck Fleurey, J-M Jézéquel, and Yves Le Traon. 2005. Automatic test case optimization: A bacteriologic algorithm. IEEE software 22, 2 (2005), 76–82.

[10]

Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, and Weizhu Chen. 2022. Codet: Code generation with generated tests. arXiv preprint arXiv:2207.10397 (2022).

[11]

Arghavan Moradi Dakhel, Amin Nikanjam, Vahid Majdinasab, Foutse Khomh, and Michel C Desmarais. 2024. Effective test generation using pre-trained large language models and mutation testing. Information and Software Technology (2024), 107468.

[12]

Elizabeth Dinella, Gabriel Ryan, Todd Mytkowicz, and Shuvendu K Lahiri. 2022. Toga: A neural method for test oracle generation. In Proceedings of the 44th International Conference on Software Engineering. 2130–2141.

Digital Library

[13]

Gordon Fraser and Andrea Arcuri. 2011. Evosuite: automatic test suite generation for object-oriented software. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering. 416–419.

Digital Library

[14]

Gordon Fraser and Andrea Arcuri. 2012. Whole test suite generation. IEEE Transactions on Software Engineering 39, 2 (2012), 276–291.

Digital Library

[15]

Fred Glover. 1990. Tabu search: A tutorial. Interfaces 20, 4 (1990), 74–94.

Digital Library

[16]

Mark Harman, Sung Gon Kim, Kiran Lakhotia, Phil McMinn, and Shin Yoo. 2010. Optimizing for the number of tests generated in search based test data generation with an application to the oracle cost problem. In 2010 Third International Conference on Software Testing, Verification, and Validation Workshops. IEEE, 182–191.

Digital Library

[17]

Mark Harman, S Afshin Mansouri, and Yuanyuan Zhang. 2009. Search based software engineering: A comprehensive analysis and review of trends techniques and applications. (2009).

[18]

John H Holland. 1992. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT press.

[19]

Kobi Inkumsah and Tao Xie. 2008. Improving structural testing of object-oriented programs via integrating evolutionary testing and symbolic execution. In 2008 23rd IEEE/ACM International Conference on Automated Software Engineering. IEEE, 297–306.

Digital Library

[20]

Sungmin Kang, Juyeon Yoon, and Shin Yoo. 2023. Large language models are few-shot testers: Exploring llm-based general bug reproduction. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 2312–2323.

Digital Library

[21]

James Kennedy and Russell Eberhart. 1995. Particle swarm optimization. In Proceedings of ICNN’95-international conference on neural networks, Vol. 4. ieee, 1942–1948.

[22]

John R Koza. 1994. Genetic programming as a means for programming computers by natural selection. Statistics and computing 4 (1994), 87–112.

[23]

Shuvendu K Lahiri, Aaditya Naik, Georgios Sakkas, Piali Choudhury, Curtis von Veh, Madanlal Musuvathi, Jeevana Priya Inala, Chenglong Wang, and Jianfeng Gao. 2022. Interactive code generation via test-driven user-intent formalization. arXiv preprint arXiv:2208.05950 (2022).

[24]

Caroline Lemieux, Jeevana Priya Inala, Shuvendu K Lahiri, and Siddhartha Sen. 2023. Codamosa: Escaping coverage plateaus in test generation with pre-trained large language models. In 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 919–931.

Digital Library

[25]

Tsz-On Li, Wenxi Zong, Yibo Wang, Haoye Tian, Ying Wang, and Shing-Chi Cheung. 2023. Finding failure-inducing test cases with chatgpt. arXiv preprint arXiv:2304.11686 (2023).

[26]

Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, and Qing Wang. 2023. Chatting with gpt-3 for zero-shot human-like mobile automated gui testing. arXiv preprint arXiv:2305.09434 (2023).

[27]

Stephan Lukasczyk and Gordon Fraser. 2022. Pynguin: Automated unit test generation for python. In Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings. 168–172.

Digital Library

[28]

Jan Malburg and Gordon Fraser. 2011. Combining search-based and constraint-based testing. In 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011). IEEE, 436–439.

Digital Library

[29]

Phil McMinn. 2004. Search-based software test data generation: a survey. Software testing, Verification and reliability 14, 2 (2004), 105–156.

Digital Library

[30]

Webb Miller and David L. Spooner. 1976. Automatic generation of floating-point test data. IEEE Transactions on Software Engineering3 (1976), 223–226.

Digital Library

[31]

Annibale Panichella, Fitsum Meshesha Kifetew, and Paolo Tonella. 2017. Automated test case generation as a many-objective optimisation problem with dynamic selection of the targets. IEEE Transactions on Software Engineering 44, 2 (2017), 122–158.

[32]

Max Schäfer, Sarah Nadi, Aryaz Eghbali, and Frank Tip. 2023. An empirical evaluation of using large language models for automated unit test generation. IEEE Transactions on Software Engineering (2023).

[33]

Mohammed Latif Siddiq, Joanna Santos, Ridwanul Hasan Tanvir, Noshin Ulfat, Fahmid Al Rifat, and Vinicius Carvalho Lopes. 2023. Exploring the effectiveness of large language models in generating unit tests. arXiv preprint arXiv:2305.00418 (2023).

[34]

Dimitri Stallenberg, Mitchell Olsthoorn, and Annibale Panichella. 2022. Guess what: Test case generation for Javascript with unsupervised probabilistic type inference. In International Symposium on Search Based Software Engineering. Springer, 67–82.

Digital Library

[35]

Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, Shao Kun Deng, and Neel Sundaresan. 2020. Unit test case generation with transformers and focal context. arXiv preprint arXiv:2009.05617 (2020).

[36]

PT Van Larrhoven and EH Aarts. 1988. Simulated annealing: theory and practice.

[37]

Cody Watson, Michele Tufano, Kevin Moran, Gabriele Bavota, and Denys Poshyvanyk. 2020. On learning meaningful assert statements for unit test cases. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. 1398–1409.

Digital Library

[38]

Ratnadira Widyasari, Sheng Qin Sim, Camellia Lok, Haodi Qi, Jack Phan, Qijin Tay, Constance Tan, Fiona Wee, Jodie Ethelda Tan, Yuheng Yieh, 2020. Bugsinpy: a database of existing bugs in python programs to enable controlled testing and debugging studies. In Proceedings of the 28th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering. 1556–1560.

Digital Library

[39]

Zhuokui Xie, Yinghao Chen, Chen Zhi, Shuiguang Deng, and Jianwei Yin. 2023. ChatUniTest: a ChatGPT-based automated unit test generation tool. arXiv preprint arXiv:2305.04764 (2023).

[40]

Zhiqiang Yuan, Yiling Lou, Mingwei Liu, Shiji Ding, Kaixin Wang, Yixuan Chen, and Xin Peng. 2023. No more manual tests? evaluating and improving chatgpt for unit test generation. arXiv preprint arXiv:2305.04207 (2023).

Index Terms

Optimizing Search-Based Unit Test Generation with Large Language Models: An Empirical Study
1. Software and its engineering
  1. Software creation and management
    1. Search-based software engineering

Recommendations

Achieving scalable mutation-based generation of whole test suites

Without complete formal specification, automatically generated software tests need to be manually checked in order to detect faults. This makes it desirable to produce the strongest possible test set while keeping the number of tests as small as ...
Compressing Automatically Generated Unit Test Suites Through Test Parameterization
Fundamentals of Software Engineering
Abstract
Test maintenance has recently gained increasing attention from the software testing research community. When using automated unit test generation tools, the tests are typically created by random test generation or search-based algorithms. Although ...
An industrial evaluation of unit test generation: finding real faults in a financial application
ICSE-SEIP '17: Proceedings of the 39th International Conference on Software Engineering: Software Engineering in Practice Track

Automated unit test generation has been extensively studied in the literature in recent years. Previous studies on open source systems have shown that test generation tools are quite effective at detecting faults, but how effective and applicable are ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

Internetware '24: Proceedings of the 15th Asia-Pacific Symposium on Internetware

July 2024

518 pages

ISBN:9798400707056

DOI:10.1145/3671016

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 July 2024

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Natural Science Foundation of China

Conference

Internetware 2024

Sponsor:

SIGSOFT

Internetware 2024: 15th Asia-Pacific Symposium on Internetware

July 24 - 26, 2024

Macau, China

Acceptance Rates

Overall Acceptance Rate 55 of 111 submissions, 50%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
156
Total Downloads

Downloads (Last 12 months)156
Downloads (Last 6 weeks)28

Reflects downloads up to 03 Feb 2025

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Figures

Tables

Media

View Table of Conten