research-article

Automatically Learning a Precise Measurement for Fault Diagnosis Capability of Test Cases

Authors:

Qingyuan Liang,

Lu Zhang

ACM Transactions on Software Engineering and Methodology

Accepted on 17 December 2024

https://doi.org/10.1145/3712189

Online AM: 15 January 2025

Abstract

Prevalent Fault Localization (FL) techniques rely on tests to localize buggy program elements. Tests can be treated as fuel that further boosts FL by providing more debugging information. It is therefore highly valuable to measure the Fault Diagnosis Capability (FDC) of a test, so that tests can be selected or generated to better help FL (i.e., FL-oriented test selection or FL-oriented test generation). To this end, researchers have proposed many FDC metrics, which serve as the selection criterion in FL-oriented test selection or as the fitness function in FL-oriented test generation. Existing FDC metrics can be classified as result-agnostic or result-aware, depending on whether they take test results (i.e., passing or failing) as input. Although result-aware metrics perform better in test selection, their reliance on test results restricts their application; e.g., they cannot be applied to guide test generation. Moreover, all existing FDC metrics are designed from predefined heuristics, and their inaccuracy limits the FL performance they achieve. To address these issues, in this paper we revisit result-agnostic metrics and propose a novel result-agnostic metric, RLFDC, which predicts the FDC values of tests through reinforcement learning. In particular, we treat FL results as reward signals and train an FDC prediction model on this direct FL feedback, so that a more accurate measurement is learned automatically rather than designed from predefined heuristics. Finally, we evaluate RLFDC on Defects4J by applying the studied metrics to test selection and generation. According to the experimental results, RLFDC outperforms all the result-agnostic metrics in both test selection and generation; e.g., when applied to selecting human-written tests, RLFDC achieves 28.2% and 21.6% higher acc@1 and mAP values, respectively, than the state-of-the-art result-agnostic metric TfD. Moreover, RLFDC even achieves performance competitive with the state-of-the-art result-aware metric FDG in test selection.
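The roles the abstract assigns to an FDC metric (selection criterion) and to FL results (reward signal) can be illustrated with a minimal sketch. It is not the authors' RLFDC model. It assumes Ochiai as a stand-in spectrum-based FL technique, uses a hypothetical coverage-diversity heuristic (`jaccard_diversity`) in place of a learned result-agnostic metric, and defines the reward as the rank improvement of a known faulty element; all function names and the toy data are illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Set

Coverage = Dict[str, Set[str]]  # test name -> program elements it covers


def ochiai(coverage: Coverage, failing: Dict[str, bool]) -> Dict[str, float]:
    """Ochiai suspiciousness for every element covered by at least one test."""
    total_failed = sum(1 for t in coverage if failing[t])
    scores = {}
    for e in set().union(*coverage.values()):
        ef = sum(1 for t, cov in coverage.items() if e in cov and failing[t])
        ep = sum(1 for t, cov in coverage.items() if e in cov and not failing[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores[e] = ef / denom if denom > 0 else 0.0
    return scores


def rank_of(scores: Dict[str, float], element: str) -> int:
    """Worst-case rank of an element (1 = most suspicious, ties count against it)."""
    target = scores.get(element, 0.0)
    return sum(1 for s in scores.values() if s >= target)


def jaccard_diversity(candidate: Set[str], selected: List[Set[str]]) -> float:
    """Hypothetical result-agnostic score: Jaccard distance to the nearest selected test."""
    def distance(a: Set[str], b: Set[str]) -> float:
        union = a | b
        return 1.0 - len(a & b) / len(union) if union else 0.0
    return min((distance(candidate, s) for s in selected), default=1.0)


def select_tests(selected: Coverage, pool: Coverage, budget: int,
                 fdc: Callable[[Set[str], List[Set[str]]], float]) -> List[str]:
    """Greedy selection: repeatedly pick the pool test with the highest FDC score."""
    selected, pool, order = dict(selected), dict(pool), []
    for _ in range(min(budget, len(pool))):
        # sorted() makes tie-breaking deterministic (alphabetical by test name).
        best = max(sorted(pool), key=lambda t: fdc(pool[t], list(selected.values())))
        selected[best] = pool.pop(best)
        order.append(best)
    return order


def rank_reward(coverage: Coverage, failing: Dict[str, bool], name: str,
                cov: Set[str], fails: bool, faulty: str) -> int:
    """Assumed reward: how many positions the faulty element climbs after adding a test."""
    before = rank_of(ochiai(coverage, failing), faulty)
    after = rank_of(ochiai({**coverage, name: cov}, {**failing, name: fails}), faulty)
    return before - after


# Toy data: one initial failing test; the (known) faulty element is "e3".
initial = {"t_fail": {"e1", "e2", "e3"}}
failing = {"t_fail": True}
pool = {"t_a": {"e1", "e2"}, "t_b": {"e1", "e2", "e3"}, "t_c": {"e4"}}

order = select_tests(initial, pool, budget=2, fdc=jaccard_diversity)
print(order)  # the heuristic prefers t_c, then t_a
print(rank_reward(initial, failing, "t_c", pool["t_c"], False, "e3"))
print(rank_reward(initial, failing, "t_a", pool["t_a"], False, "e3"))
```

In this toy setting the heuristic ranks `t_c` first even though adding it yields no rank improvement, while `t_a` moves the faulty element to the top. A gap of this kind between a hand-designed score and the resulting FL outcome is what training on FL feedback is meant to close.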

Index Terms

Software and its engineering
  Software creation and management
    Software verification and validation
      Software defect analysis
        Software testing and debugging

Published In

ACM Transactions on Software Engineering and Methodology, Just Accepted

EISSN:1557-7392

Copyright © 2025 Copyright held by the owner/author(s).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Online AM: 15 January 2025

Accepted: 17 December 2024

Revised: 01 December 2024

Received: 14 March 2024
