
An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation

Published: 28 November 2023, in IEEE Transactions on Software Engineering, Volume 50, Issue 1, January 2024 (IEEE Press)

Abstract

Unit tests play a key role in ensuring the correctness of software. However, manually creating unit tests is a laborious task, motivating the need for automation. Large Language Models (LLMs) have recently been applied to various aspects of software development, including the automated generation of unit tests, but existing approaches require additional training or few-shot learning on examples of existing tests. This paper presents a large-scale empirical evaluation of the effectiveness of LLMs for automated unit test generation without requiring additional training or manual effort. Concretely, we consider an approach where the LLM is provided with prompts that include the signature and implementation of a function under test, along with usage examples extracted from documentation. Furthermore, if a generated test fails, our approach attempts to generate a new test that fixes the problem by re-prompting the model with the failing test and the error message. We implement our approach in TestPilot, an adaptive LLM-based test generation tool for JavaScript that automatically generates unit tests for the methods in a given project's API. We evaluate TestPilot using OpenAI's gpt3.5-turbo LLM on 25 npm packages with a total of 1,684 API functions. The generated tests achieve a median statement coverage of 70.2% and branch coverage of 52.8%. In contrast, the state-of-the-art feedback-directed JavaScript test generation technique, Nessie, achieves only 51.3% statement coverage and 25.6% branch coverage. Furthermore, experiments with excluding parts of the information included in the prompts show that all components contribute towards the generation of effective test suites. We also find that 92.8% of TestPilot's generated tests have ≤ 50% similarity with existing tests (as measured by normalized edit distance), with none of them being exact copies. Finally, we run TestPilot with two additional LLMs, OpenAI's older code-cushman-002 LLM and StarCoder, an LLM for which the training process is publicly documented. Overall, we observe similar results with the former (68.2% median statement coverage) and somewhat worse results with the latter (54.0% median statement coverage), suggesting that the effectiveness of the approach is influenced by the size and training set of the LLM but does not fundamentally depend on the specific model.
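The abstract describes TestPilot's adaptive loop only in prose. For concreteness, below is a minimal sketch of such a generate-and-repair loop, assuming the OpenAI Node SDK (v4) and a locally installed Mocha; the prompt layout, the helpers buildPrompt, runMocha, and generateTest, and the repair budget are illustrative assumptions, not TestPilot's actual implementation.

    import OpenAI from "openai";
    import { execFile } from "node:child_process";
    import { writeFile } from "node:fs/promises";

    const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

    // Combine the information sources the paper evaluates: usage examples
    // mined from documentation, plus the signature and implementation of
    // the function under test.
    function buildPrompt(fn, failed) {
      let p =
        `// Usage examples from the docs:\n${fn.docExamples}\n` +
        `// Implementation of ${fn.signature}:\n${fn.implementation}\n` +
        `// Write a Mocha unit test for ${fn.signature}:\n`;
      if (failed) {
        // Adaptive step: re-prompt with the failing test and error message.
        p += `// This test failed:\n${failed.test}\n` +
             `// Error: ${failed.error}\n// Write a corrected test:\n`;
      }
      return p;
    }

    // Run a candidate test file with Mocha; any nonzero exit counts as failure.
    function runMocha(file) {
      return new Promise((resolve) =>
        execFile("npx", ["mocha", file], (err, _stdout, stderr) =>
          resolve({ passed: !err, error: stderr })));
    }

    async function generateTest(fn, repairBudget = 2) {
      let failed = null;
      for (let i = 0; i <= repairBudget; i++) {
        const res = await client.chat.completions.create({
          model: "gpt-3.5-turbo",
          messages: [{ role: "user", content: buildPrompt(fn, failed) }],
        });
        const test = res.choices[0].message.content;
        await writeFile("candidate.test.js", test);
        const result = await runMocha("candidate.test.js");
        if (result.passed) return test; // keep the passing test in the suite
        failed = { test, error: result.error };
      }
      return null; // repair budget exhausted; discard this candidate
    }

A real tool would additionally extract the code from the model's reply, deduplicate candidates, and validate the surviving tests against the package under test; the sketch omits these steps.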
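The similarity measure the abstract reports ("≤ 50% similarity ... as measured by normalized edit distance") can also be stated concretely. The sketch below computes the Levenshtein distance between two test sources and normalizes it by the longer string's length; whether TestPilot normalizes exactly this way is an assumption here, but under this common choice, "≤ 50% similarity" corresponds to a normalized distance of at least 0.5.

    // Classic dynamic-programming Levenshtein distance: the minimum number
    // of single-character insertions, deletions, and substitutions needed
    // to turn string a into string b.
    function levenshtein(a, b) {
      const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
      for (let j = 1; j <= b.length; j++) dp[0][j] = j;
      for (let i = 1; i <= a.length; i++) {
        for (let j = 1; j <= b.length; j++) {
          dp[i][j] = Math.min(
            dp[i - 1][j] + 1,                                   // deletion
            dp[i][j - 1] + 1,                                   // insertion
            dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
          );
        }
      }
      return dp[a.length][b.length];
    }

    // Similarity in [0, 1]: 1 means identical, 0 means maximally different.
    function similarity(a, b) {
      const maxLen = Math.max(a.length, b.length) || 1; // avoid 0/0 on empty inputs
      return 1 - levenshtein(a, b) / maxLen;
    }

    // Example: two generated tests that share a skeleton but differ in values.
    console.log(similarity("assert.equal(f(1), 2);", "assert.equal(f(2), 4);"));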

