DOI: 10.1145/3605098.3636058

From Fine-tuning to Output: An Empirical Investigation of Test Smells in Transformer-Based Test Code Generation

Published: 21 May 2024

Abstract

Researchers have recently leveraged transformer-based test code generation models to improve testing-related tasks such as assert completion and test method generation. One such model, AthenaTest, has been well received by developers because it generates test cases that resemble developer-written ones. Although AthenaTest provides adequate test coverage and improves test code readability, concerns remain about the quality of its generated test code, particularly the presence of test smells, which can degrade the comprehension, readability, performance, and maintainability of test code. In this paper, we investigate whether the test cases generated by a transformer-based test code generation model, AthenaTest, contain test smells, and whether such smells are also present in its fine-tuning dataset, Methods2Test. We evaluate seven test smells in both AthenaTest's and Methods2Test's test cases. Our results reveal that 65% of Methods2Test's test cases contain test smells, which influences the output of AthenaTest: 62% of its test cases contain at least one test smell. We also examine the design (test size and assertion frequency) of AthenaTest's and Methods2Test's test cases. Our findings show that AthenaTest tends to generate more assertions than Methods2Test's test cases contain, which increases the occurrence rate of the Assertion Roulette and Duplicate Assert smells in its output.
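To make the two smells named above concrete, here is a minimal JUnit 4 sketch; the ShoppingCart class and its methods are hypothetical illustrations, not taken from AthenaTest's output or the Methods2Test dataset.

```java
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.util.HashMap;
import java.util.Map;

import org.junit.Test;

// Hypothetical example: ShoppingCart is illustrative only, included as a
// stub so the test compiles on its own.
public class ShoppingCartTest {

    static class ShoppingCart {
        private final Map<String, Integer> items = new HashMap<>();

        void addItem(String name, int quantity) {
            items.merge(name, quantity, Integer::sum);
        }

        int distinctItems() { return items.size(); }

        int totalQuantity() {
            return items.values().stream().mapToInt(Integer::intValue).sum();
        }

        boolean contains(String name) { return items.containsKey(name); }
    }

    @Test
    public void testAddItem() {
        ShoppingCart cart = new ShoppingCart();
        cart.addItem("apple", 2);

        // Assertion Roulette: several assertions with no explanatory
        // message, so a failing run does not indicate which check broke.
        assertEquals(1, cart.distinctItems());
        assertEquals(2, cart.totalQuantity());
        assertTrue(cart.contains("apple"));

        // Duplicate Assert: the same assertion repeated verbatim, adding
        // no new information about the behavior under test.
        assertEquals(2, cart.totalQuantity());
    }
}
```

Assertion Roulette obscures which check failed in a multi-assertion test, while Duplicate Assert inflates test size without adding behavioral coverage; these are the kinds of design effects the study attributes to AthenaTest's tendency to over-generate assertions.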



    Published In

    SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing
    April 2024
    1898 pages
ISBN: 9798400702433
DOI: 10.1145/3605098


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. test smells
    2. AthenaTest
    3. Methods2Test
    4. unit test
    5. transformer-based test code

    Qualifiers

    • Research-article

    Conference

    SAC '24

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

