research-article

TOGA: a neural method for test oracle generation

Authors:

Elizabeth Dinella,

Todd Mytkowicz,

Shuvendu K. LahiriAuthors Info & Claims

ICSE '22: Proceedings of the 44th International Conference on Software Engineering

Pages 2130 - 2141

https://doi.org/10.1145/3510003.3510141

Published: 05 July 2022 Publication History

Abstract

Testing is widely recognized as an important stage of the software development lifecycle. Effective software testing can provide benefits such as bug finding, preventing regressions, and documentation. In terms of documentation, unit tests express a unit's intended functionality, as conceived by the developer. A test oracle, typically expressed as an condition, documents the intended behavior of a unit under a given test prefix. Synthesizing a functional test oracle is a challenging problem, as it must capture the intended functionality rather than the implemented functionality.

In this paper, we propose TOGA (a neural method for <u>T</u>est <u>O</u>racle <u>G</u>ener<u>A</u>tion), a unified transformer-based neural approach to infer both exceptional and assertion test oracles based on the context of the focal method. Our approach can handle units with ambiguous or missing documentation, and even units with a missing implementation. We evaluate our approach on both oracle inference accuracy and functional bug-finding. Our technique improves accuracy by 33% over existing oracle inference approaches, achieving 96% overall accuracy on a held out test dataset. Furthermore, we show that when integrated with a automated test generation tool (EvoSuite), our approach finds 57 real world bugs in large-scale Java programs, including 30 bugs that are not found by any other automated testing method in our evaluation.

References

[1]

[n.d.]. Methods2Test. https://github.com/microsoft/methods2test.

[2]

Arianna Blasi, Alberto Goffi, Konstantin Kuznetsov, Alessandra Gorla, Michael D. Ernst, Mauro Pezzè, and Sergio Delgado Castellanos. 2018. Translating Code Comments to Procedure Specifications. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (Amsterdam, Netherlands) (ISSTA 2018). Association for Computing Machinery, New York, NY, USA, 242--253.

Digital Library

[3]

Arianna Blasi, Alessandra Gorla, Michael D. Ernst, Mauro Pezzè, and Antonio Carzaniga. 2021. MeMo: Automatically identifying metamorphic relations in Javadoc comments for test automation. Journal of Systems and Software 181 (2021), 111041.

Digital Library

[4]

Colin B. Clement, Dawn Drain, Jonathan Timcheck, Alexey Svyatkovskiy, and Neel Sundaresan. 2020. PyMT5: multi-mode translation of natural language and Python code with transformers. arXiv:2010.03150 [cs.LG]

[5]

Henry Coles, Thomas Laurent, Christopher Henard, Mike Papadakis, and Anthony Ventresque. 2016. PIT: A Practical Mutation Testing Tool for Java (Demo). In Proceedings of the 25th International Symposium on Software Testing and Analysis (Saarbrücken, Germany) (ISSTA 2016). Association for Computing Machinery, New York, NY, USA, 449--452.

Digital Library

[6]

Christoph Csallner, Nikolai Tillmann, and Yannis Smaragdakis. 2008. DySy. In 2008 ACM/IEEE 30th International Conference on Software Engineering. IEEE, 281--290.

[7]

Ermira Daka and Gordon Fraser. 2014. A Survey on Unit Testing Practices and Problems. In 2014 IEEE 25th International Symposium on Software Reliability Engineering. 201--211.

Digital Library

[8]

Michael D Ernst, Jeff H Perkins, Philip J Guo, Stephen McCamant, Carlos Pacheco, Matthew S Tschantz, and Chen Xiao. 2007. The Daikon system for dynamic detection of likely invariants. Science of computer programming 69, 1--3 (2007), 35--45.

[9]

Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. 2020. CodeBERT: A Pre-Trained Model for Programming and Natural Languages. arXiv:2002.08155 [cs.CL]

[10]

Gordon Fraser and Andrea Arcuri. 2011. Evolutionary Generation of Whole Test Suites. In International Conference On Quality Software (QSIC). IEEE Computer Society, Los Alamitos, CA, USA, 31--40.

Digital Library

[11]

Gordon Fraser and Andrea Arcuri. 2014. A large-scale evaluation of automated unit test generation using evosuite. ACM Transactions on Software Engineering and Methodology (TOSEM) 24, 2 (2014), 1--42.

Digital Library

[12]

Patrice Godefroid, Nils Klarlund, and Koushik Sen. 2005. DART: directed automated random testing. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, June 12--15, 2005, Vivek Sarkar and Mary W. Hall (Eds.). ACM, 213--223.

Digital Library

[13]

Alberto Goffi, Alessandra Gorla, Michael D. Ernst, and Mauro Pezzè. 2016. Automatic Generation of Oracles for Exceptional Behaviors. In Proceedings of the 25th International Symposium on Software Testing and Analysis (Saarbrücken, Germany) (ISSTA 2016). Association for Computing Machinery, New York, NY, USA, 213--224.

Digital Library

[14]

René Just, Darioush Jalali, and Michael D. Ernst. 2014. Defects4J: A Database of existing faults to enable controlled testing studies for Java programs. In ISSTA 2014, Proceedings of the 2014 International Symposium on Software Testing and Analysis. San Jose, CA, USA, 437--440. Tool demo.

[15]

Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, and Kensen Shi. 2020. Learning and Evaluating Contextual Embedding of Source Code. arXiv:2001.00059 [cs.SE]

[16]

Stephan Lukasczyk, Florian Kroiß, and Gordon Fraser. 2020. Automated Unit Test Generation for Python. In Proceedings of the 12th Symposium on Search-based Software Engineering (SSBSE 2020, Bari, Italy, October 7--8) (Lecture Notes in Computer Science, Vol. 12420). Springer, 9--24.

Digital Library

[17]

Antonio Mastropaolo, Simone Scalabrino, Nathan Cooper, David Nader Palacio, Denys Poshyvanyk, Rocco Oliveto, and Gabriele Bavota. 2021. Studying the usage of text-to-text transfer transformer to support code-related tasks. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 336--347.

Digital Library

[18]

Aleksandar Milicevic, Sasa Misailovic, Darko Marinov, and Sarfraz Khurshid. 2007. Korat: A Tool for Generating Structurally Complex Test Inputs. In 29th International Conference on Software Engineering (ICSE 2007), Minneapolis, MN, USA, May 20--26, 2007. IEEE Computer Society, 771--774.

Digital Library

[19]

Facundo Molina. 2020. Applying learning techniques to oracle synthesis. In 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 1153--1157.

Digital Library

[20]

Facundo Molina, Pablo Ponzio, Nazareno Aguirre, and Marcelo Frias. 2021. EvoSpex: An evolutionary algorithm for learning postconditions. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 1223--1235.

Digital Library

[21]

Carlos Pacheco and Michael D Ernst. 2007. Randoop: feedback-directed random testing for Java. In Companion to the 22nd ACM SIGPLAN conference on Object-oriented programming systems and applications companion. 815--816.

[22]

Carlos Pacheco, Shuvendu K. Lahiri, Michael D. Ernst, and Thomas Ball. 2007. Feedback-directed random test generation. In ICSE 2007, Proceedings of the 29th International Conference on Software Engineering. Minneapolis, MN, USA, 75--84.

Digital Library

[23]

Rahul Pandita, Xusheng Xiao, Hao Zhong, Tao Xie, Stephen Oney, and Amit Paradkar. 2012. Inferring method specifications from natural language API descriptions. In 2012 34th International Conference on Software Engineering (ICSE). 815--825.

[24]

Koushik Sen, Darko Marinov, and Gul Agha. 2005. CUTE: a concolic unit testing engine for C. In Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005, Lisbon, Portugal, September 5--9, 2005, Michel Wermelinger and Harald C. Gall (Eds.). ACM, 263--272.

Digital Library

[25]

Sina Shamshiri, René Just, José Miguel Rojas, Gordon Fraser, Phil McMinn, and Andrea Arcuri. 2015. Do automatically generated unit tests find real faults? an empirical study of effectiveness and challenges (t). In 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 201--211.

Digital Library

[26]

Alexey Svyatkovskiy, Shao Kun Deng, Shengyu Fu, and Neel Sundaresan. 2020. IntelliCode Compose: Code Generation Using Transformer. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Virtual Event, USA) (ESEC/FSE 2020). Association for Computing Machinery, New York, NY, USA, 1433--1443.

Digital Library

[27]

Alexey Svyatkovskiy, Todd Mytkowicz, Negar Ghorbani, Sarah Fakhoury, Elizabeth Dinella, Christian Bird, Neel Sundaresan, and Shuvendu Lahiri. 2021. MergeBERT: Program Merge Conflict Resolution via Neural Transformers. arXiv:2109.00084 [cs.SE]

[28]

Shin Hwei Tan, Darko Marinov, Lin Tan, and Gary T. Leavens. 2012. @tComment: Testing Javadoc Comments to Detect Comment-Code Inconsistencies. In Proceedings of the Fifth IEEE International Conference on Software Testing, Verification and Validation (ICST 2012). Montreal, Canada, 260--269.

[29]

Gregory Tassey. 2002. The Economic Impacts of Inadequate Infrastructure for Software Testing. (05 2002).

[30]

Valerio Terragni, Gunel Jahangirova, Paolo Tonella, and Mauro Pezzè. 2020. Evolutionary Improvement of Assertion Oracles (ESEC/FSE 2020). Association for Computing Machinery, New York, NY, USA, 1178--1189.

Digital Library

[31]

Nikolai Tillmann and Jonathan de Halleux. 2008. Pex-White Box Test Generation for .NET. In Tests and Proofs - 2nd International Conference, TAP 2008, Prato, Italy, April 9--11, 2008. Proceedings (Lecture Notes in Computer Science, Vol. 4966), Bernhard Beckert and Reiner Hähnle (Eds.). Springer, 134--153.

[32]

Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, Shao Kun Deng, and Neel Sundaresan. 2021. Unit Test Case Generation with Transformers and Focal Context. arXiv:2009.05617 [cs.SE]

[33]

Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, and Neel Sundaresan. 2020. Generating Accurate Assert Statements for Unit Test Cases using Pretrained Transformers. arXiv:2009.05634 [cs.SE]

[34]

Cody Watson, Michele Tufano, Kevin Moran, Gabriele Bavota, and Denys Poshyvanyk. 2020. On learning meaningful assert statements for unit test cases. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (Jun 2020).

Digital Library

[35]

Robert White and Jens Krinke. 2020. Reassert: Deep learning for assert generation. arXiv preprint arXiv:2011.09784 (2020).

[36]

Michal Zalewski. 2015. American Fuzzy Lop (AFL). http://lcamtuf.coredump.cx/afl/

[37]

Juan Zhai, Yu Shi, Minxue Pan, Guian Zhou, Yongxiang Liu, Chunrong Fang, Shiqing Ma, Lin Tan, and Xiangyu Zhang. 2020. C2S: Translating Natural Language Comments to Formal Program Specifications. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Virtual Event, USA) (ESEC/FSE 2020). Association for Computing Machinery, New York, NY, USA, 25--37.

Digital Library

Cited By

Kotsiantis SVerykios VTzagarakis M(2024)AI-Assisted Programming Tasks Using Code Embeddings and TransformersElectronics10.3390/electronics1304076713:4(767)Online publication date: 15-Feb-2024
https://doi.org/10.3390/electronics13040767
Yang LYang CGao SWang WWang BZhu QChu XZhou JLiang GWang QChen JFilkov VRay BZhou M(2024)On the Evaluation of Large Language Models in Unit Test GenerationProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695529(1607-1619)Online publication date: 27-Oct-2024
https://dl.acm.org/doi/10.1145/3691620.3695529
Xiao DGuo YLi YChen L(2024)Optimizing Search-Based Unit Test Generation with Large Language Models: An Empirical StudyProceedings of the 15th Asia-Pacific Symposium on Internetware10.1145/3671016.3674813(71-80)Online publication date: 24-Jul-2024
https://dl.acm.org/doi/10.1145/3671016.3674813
Show More Cited By

Recommendations

A Static Approach to Prioritizing JUnit Test Cases

Test case prioritization is used in regression testing to schedule the execution order of test cases so as to expose faults earlier in testing. Over the past few years, many test case prioritization techniques have been proposed in the literature. Most ...
Faster mutation testing inspired by test prioritization and reduction
ISSTA 2013: Proceedings of the 2013 International Symposium on Software Testing and Analysis

Mutation testing is a well-known but costly approach for determining test adequacy. The central idea behind the approach is to generate mutants, which are small syntactic transformations of the program under test, and then to measure for a given test ...
Spartan: a spectral and entropy-based partial-scan and test point insertion algorithm

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

ICSE '22: Proceedings of the 44th International Conference on Software Engineering

May 2022

2508 pages

ISBN:9781450392211

DOI:10.1145/3510003

General Chair:
Matthew B Dwyer
University of Virginia
,
Program Chairs:
Daniela Damian
University of Victoria, Canada
,
Andreas Zeller
CISPA, Germany

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 July 2022

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Research-article

Conference

ICSE '22

Sponsor:

SIGSOFT

ICSE '22: 44th International Conference on Software Engineering

May 21 - 29, 2022

Pennsylvania, Pittsburgh

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%

Upcoming Conference

ICSE 2025

2025 IEEE/ACM 46th International Conference on Software Engineering

April 26 - May 3, 2025

Ottawa , ON , Canada

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

42
Total Citations
View Citations
894
Total Downloads

Downloads (Last 12 months)433
Downloads (Last 6 weeks)68

Reflects downloads up to 10 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Kotsiantis SVerykios VTzagarakis M(2024)AI-Assisted Programming Tasks Using Code Embeddings and TransformersElectronics10.3390/electronics1304076713:4(767)Online publication date: 15-Feb-2024
https://doi.org/10.3390/electronics13040767
Yang LYang CGao SWang WWang BZhu QChu XZhou JLiang GWang QChen JFilkov VRay BZhou M(2024)On the Evaluation of Large Language Models in Unit Test GenerationProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering10.1145/3691620.3695529(1607-1619)Online publication date: 27-Oct-2024
https://dl.acm.org/doi/10.1145/3691620.3695529
Xiao DGuo YLi YChen L(2024)Optimizing Search-Based Unit Test Generation with Large Language Models: An Empirical StudyProceedings of the 15th Asia-Pacific Symposium on Internetware10.1145/3671016.3674813(71-80)Online publication date: 24-Jul-2024
https://dl.acm.org/doi/10.1145/3671016.3674813
Endres MFakhoury SChakraborty SLahiri S(2024)Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions?Proceedings of the ACM on Software Engineering10.1145/36607911:FSE(1889-1912)Online publication date: 12-Jul-2024
https://dl.acm.org/doi/10.1145/3660791
He YHuang JYu HXie T(2024)An Empirical Study on Focal Methods in Deep-Learning-Based Approaches for Assertion GenerationProceedings of the ACM on Software Engineering10.1145/36607851:FSE(1750-1771)Online publication date: 12-Jul-2024
https://dl.acm.org/doi/10.1145/3660785
Yang ZLiu FYu ZKeung JLi JLiu SHong YMa XJin ZLi G(2024)Exploring and Unleashing the Power of Large Language Models in Automated Code TranslationProceedings of the ACM on Software Engineering10.1145/36607781:FSE(1585-1608)Online publication date: 12-Jul-2024
https://dl.acm.org/doi/10.1145/3660778
Yu XLiu LHu XKeung JXia XLo DChristakis MPradel M(2024)Practitioners’ Expectations on Automated Test GenerationProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680386(1618-1630)Online publication date: 11-Sep-2024
https://dl.acm.org/doi/10.1145/3650212.3680386
Shin JHashtroudi SHemmati HWang SChristakis MPradel M(2024)Domain Adaptation for Code Model-Based Unit Test Case GenerationProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680354(1211-1222)Online publication date: 11-Sep-2024
https://dl.acm.org/doi/10.1145/3650212.3680354
He YHuang JRong YGuo YWang EChen HChristakis MPradel M(2024)UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program TestingProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis10.1145/3650212.3680342(1061-1072)Online publication date: 11-Sep-2024
https://dl.acm.org/doi/10.1145/3650212.3680342
Zhang JCambronero JGulwani SLe VPiskac RSoares GVerbruggen G(2024)PyDex: Repairing Bugs in Introductory Python Assignments using LLMsProceedings of the ACM on Programming Languages10.1145/36498508:OOPSLA1(1100-1124)Online publication date: 29-Apr-2024
https://dl.acm.org/doi/10.1145/3649850
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents