Just Accepted

Automatically Learning a Precise Measurement for Fault Diagnosis Capability of Test Cases

Online AM: 15 January 2025


Prevalent Fault Localization (FL) techniques rely on tests to localize buggy program elements. Tests can serve as fuel to further boost FL by providing more debugging information. It is therefore valuable to measure the Fault Diagnosis Capability (FDC) of a test, i.e., how much it helps diagnose faults, so that tests can be selected or generated to better support FL (FL-oriented test selection or FL-oriented test generation). To this end, researchers have proposed many FDC metrics, which serve as the selection criterion in FL-oriented test selection or as the fitness function in FL-oriented test generation. Existing FDC metrics can be classified into result-agnostic and result-aware metrics, depending on whether they take test results (i.e., passing or failing) as input. Although result-aware metrics perform better in test selection, their need for test results as input restricts their applications; for example, they cannot be applied to guide test generation. Moreover, all existing FDC metrics are designed from predefined heuristics, and their inaccuracy limits the FL performance they achieve. To address these issues, in this paper we reconsider result-agnostic metrics (i.e., metrics that do not take test results as input) and propose RLFDC, a novel result-agnostic metric that predicts the FDC values of tests through reinforcement learning. In particular, we treat FL results as reward signals and train an FDC prediction model with direct FL feedback, so that a more accurate measurement is learned automatically rather than designed from predefined heuristics. Finally, we evaluate RLFDC on Defects4J by applying the studied metrics to test selection and test generation.
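
To make the idea of treating FL results as reward signals concrete, here is a minimal sketch under stated assumptions; it is not RLFDC's actual reward function or prediction model, neither of which is specified in this abstract. It assumes spectrum-based FL with the Ochiai formula, a pessimistic tie-breaking rank for the best-ranked faulty element, and a hypothetical reward equal to the number of positions that element moves up once a new test (with its pass/fail outcome) is added. The data model and the names `ochiai_scores`, `best_faulty_rank`, and `rank_reward` are illustrative only.

```python
import math
from typing import Dict, List, Set

# Hypothetical data model (not from the paper):
#   coverage: test name -> set of program elements the test executes
#   results:  test name -> True if the test passes, False if it fails


def ochiai_scores(elements: List[str],
                  coverage: Dict[str, Set[str]],
                  results: Dict[str, bool]) -> Dict[str, float]:
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep))."""
    total_failed = sum(1 for passed in results.values() if not passed)
    scores: Dict[str, float] = {}
    for e in elements:
        # ef / ep: number of failing / passing tests that execute element e
        ef = sum(1 for t, cov in coverage.items() if e in cov and not results[t])
        ep = sum(1 for t, cov in coverage.items() if e in cov and results[t])
        denom = math.sqrt(total_failed * (ef + ep))
        scores[e] = ef / denom if denom > 0 else 0.0
    return scores


def best_faulty_rank(scores: Dict[str, float], faulty: Set[str]) -> int:
    """Rank of the best-ranked faulty element, breaking ties pessimistically."""
    # An element's rank = number of elements scoring at least as high (itself included).
    return min(sum(1 for s in scores.values() if s >= scores[f]) for f in faulty)


def rank_reward(elements: List[str],
                coverage: Dict[str, Set[str]],
                results: Dict[str, bool],
                faulty: Set[str],
                new_test: str,
                new_cov: Set[str],
                new_passed: bool) -> int:
    """Hypothetical reward: positions the fault moves up after adding new_test."""
    before = best_faulty_rank(ochiai_scores(elements, coverage, results), faulty)
    extended_cov = {**coverage, new_test: new_cov}
    extended_res = {**results, new_test: new_passed}
    after = best_faulty_rank(ochiai_scores(elements, extended_cov, extended_res), faulty)
    return before - after  # > 0 means the new test improved the FL result


if __name__ == "__main__":
    elements = ["e1", "e2", "e3"]  # e2 is the (known) faulty element
    coverage = {"t1": {"e1", "e2", "e3"}, "t2": {"e1"}}
    results = {"t1": False, "t2": True}
    # A passing test that covers only e3 separates e3 from the faulty e2.
    print(rank_reward(elements, coverage, results, {"e2"}, "t3", {"e3"}, True))  # prints 1
```

In this reading, computing the reward needs the new test's outcome and the known faulty elements, which a benchmark of real bugs can supply during training; the learned result-agnostic metric would then score candidate tests without their results.
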
According to the experimental results, RLFDC outperforms all result-agnostic metrics in both test selection and test generation. For example, when applied to selecting human-written tests, RLFDC achieves 28.2% higher acc@1 and 21.6% higher mAP than TfD, the state-of-the-art result-agnostic metric. Moreover, RLFDC even performs competitively with FDG, the state-of-the-art result-aware metric, in test selection.
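
The abstract reports results in acc@1 and mAP without defining them. A minimal sketch, assuming the definitions common in the FL literature: acc@n counts the bugs for which at least one faulty element appears in the top n of the suspiciousness ranking, and mAP is the mean over bugs of the ranking's average precision. The tie-breaking strategy and program-element granularity (e.g., statements or methods) used in the paper are not stated here, so the inputs are assumed to be already-sorted lists; the function names are hypothetical.

```python
from typing import List, Sequence, Set


def acc_at_n(rankings: Sequence[List[str]], faults: Sequence[Set[str]], n: int = 1) -> int:
    """Number of bugs with at least one faulty element among the top-n ranked elements."""
    return sum(1 for ranked, faulty in zip(rankings, faults)
               if any(e in faulty for e in ranked[:n]))


def average_precision(ranked: List[str], faulty: Set[str]) -> float:
    """Mean of precision@k over the positions k at which faulty elements appear."""
    hits, precision_sum = 0, 0.0
    for k, element in enumerate(ranked, start=1):
        if element in faulty:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(faulty) if faulty else 0.0


def mean_average_precision(rankings: Sequence[List[str]], faults: Sequence[Set[str]]) -> float:
    """mAP: average precision averaged over all bugs."""
    aps = [average_precision(r, f) for r, f in zip(rankings, faults)]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Two hypothetical bugs, each with an already-sorted suspiciousness ranking.
    rankings = [["a", "b", "c"], ["x", "y", "z"]]
    faults = [{"a"}, {"y", "z"}]
    print(acc_at_n(rankings, faults, n=1))                    # prints 1
    print(f"{mean_average_precision(rankings, faults):.4f}")  # prints 0.7917
```
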


References

2020. PyTorch. https://pytorch.org.
Rui Abreu, Peter Zoeteweij, Rob Golsteijn, and Arjan JC Van Gemund. 2009. A practical evaluation of spectrum-based fault localization. Journal of Systems and Software 82, 11 (2009), 1780–1792.
Rui Abreu, Peter Zoeteweij, and Arjan JC Van Gemund. 2006. An evaluation of similarity coefficients for software fault localization. In 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC’06). IEEE, 39–46.
Mohammad Amin Alipour, August Shi, Rahul Gopinath, Darko Marinov, and Alex Groce. 2016. Evaluating non-adequate test-case reduction. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. 16–26.
Gabin An and Shin Yoo. 2022. FDG: a precise measurement of fault diagnosability gain of test cases. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. 14–26.
Andrea Arcuri and Lionel Briand. 2011. A practical guide for using statistical tests to assess randomized algorithms in software engineering. In Proceedings of the 33rd international conference on software engineering. 1–10.
Shay Artzi, Julian Dolby, Frank Tip, and Marco Pistoia. 2010. Directed test generation for effective fault localization. In Proceedings of the 19th international symposium on Software testing and analysis. 49–60.
Aritra Bandyopadhyay and Sudipto Ghosh. 2011. Proximity based weighting of test cases to improve spectrum based fault localization. In 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011). IEEE, 420–423.
Benoit Baudry, Franck Fleurey, and Yves Le Traon. 2006. Improving test suites for efficient fault localization. In Proceedings of the 28th international conference on Software engineering. 82–91.
Borja Calvo and Guzmán Santafé Rodrigo. 2016. scmamp: Statistical comparison of multiple algorithms in multiple problems. The R Journal, Vol. 8/1, Aug. 2016 (2016).
José Campos, Rui Abreu, Gordon Fraser, and Marcelo d’Amorim. 2013. Entropy-based test generation for improved fault localization. In 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 257–267.
Elizabeth Dinella, Gabriel Ryan, Todd Mytkowicz, and Shuvendu K Lahiri. 2022. TOGA: A neural method for test oracle generation. In Proceedings of the 44th International Conference on Software Engineering. 2130–2141.
Gordon Fraser and Andrea Arcuri. 2011. EvoSuite: automatic test suite generation for object-oriented software. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering. 416–419.
Liang Gong, David Lo, Lingxiao Jiang, and Hongyu Zhang. 2012. Diversity maximization speedup for fault localization. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering. 30–39.
Alberto Gonzalez-Sanchez, Rui Abreu, Hans-Gerhard Gross, and Arjan JC van Gemund. 2011. Prioritizing tests for fault localization through ambiguity group reduction. In 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011). IEEE, 83–92.
Dan Hao, Tao Xie, Lu Zhang, Xiaoyin Wang, Jiasu Sun, and Hong Mei. 2010. Test input reduction for result inspection to facilitate fault localization. Automated software engineering 17 (2010), 5–31.
Hado van Hasselt. 2010. Double Q-learning. Advances in neural information processing systems 23 (2010).
Shin Hong, Byeongcheol Lee, Taehoon Kwak, Yiru Jeon, Bongsuk Ko, Yunho Kim, and Moonzoo Kim. 2015. Mutation-based fault localization for real-world multilingual programs (T). In 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 464–475.
Soneya Binta Hossain, Antonio Filieri, Matthew B Dwyer, Sebastian Elbaum, and Willem Visser. 2023. Neural-based test oracle generation: A large-scale evaluation and lessons learned. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 120–132.
Ronald L Iman and James M Davenport. 1980. Approximations of the critical region of the Friedman statistic. Communications in Statistics-Theory and Methods 9, 6 (1980), 571–595.
Wei Jin and Alessandro Orso. 2012. BugRedux: Reproducing field failures for in-house debugging. In 2012 34th International Conference on Software Engineering (ICSE). IEEE, 474–484.
Wei Jin and Alessandro Orso. 2013. F3: Fault localization for field failures. In Proceedings of the 2013 International Symposium on Software Testing and Analysis. 213–223.
James A Jones and Mary Jean Harrold. 2005. Empirical evaluation of the tarantula automatic fault-localization technique. In Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering. 273–282.
Lou Jost. 2006. Entropy and diversity. Oikos 113, 2 (2006), 363–375.
René Just. [n. d.]. Defects4J GitHub repository. https://github.com/rjust/defects4j. Accessed on: 2024-08-18.
René Just, Darioush Jalali, and Michael D Ernst. 2014. Defects4J: A database of existing faults to enable controlled testing studies for Java programs. In Proceedings of the 2014 international symposium on software testing and analysis. 437–440.
Sungmin Kang, Gabin An, and Shin Yoo. 2024. A quantitative and qualitative evaluation of LLM-based explainable fault localization. Proceedings of the ACM on Software Engineering 1, FSE (2024), 1424–1446.
Xia Li, Wei Li, Yuqun Zhang, and Lingming Zhang. 2019. DeepFL: Integrating multiple fault diagnosis dimensions for deep fault localization. In Proceedings of the 28th ACM SIGSOFT international symposium on software testing and analysis. 169–180.
Xiangyu Li, Shaowei Zhu, Marcelo d’Amorim, and Alessandro Orso. 2018. Enlightened debugging. In Proceedings of the 40th International Conference on Software Engineering. 82–92.
Yi Li, Shaohua Wang, and Tien Nguyen. 2021. Fault localization with code coverage representation learning. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE). IEEE, 661–673.
Yun Lin, Jun Sun, Yinxing Xue, Yang Liu, and Jinsong Dong. 2017. Feedback-based debugging. In 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE). IEEE, 393–403.
Zhongxin Liu, Kui Liu, Xin Xia, and Xiaohu Yang. 2023. Towards more realistic evaluation for neural test oracle generation. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 589–600.
Yiling Lou, Ali Ghanbari, Xia Li, Lingming Zhang, Haotian Zhang, Dan Hao, and Lu Zhang. 2020. Can automated program repair refine fault localization? A unified debugging approach. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. 75–87.
Yiling Lou, Qihao Zhu, Jinhao Dong, Xia Li, Zeyu Sun, Dan Hao, Lu Zhang, and Lingming Zhang. 2021. Boosting coverage-based fault localization via graph-based representation learning. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 664–676.
Seokhyeon Moon, Yunho Kim, Moonzoo Kim, and Shin Yoo. 2014. Ask the mutants: Mutating faulty programs for fault localization. In 2014 IEEE Seventh International Conference on Software Testing, Verification and Validation. IEEE, 153–162.
Lee Naish, Hua Jie Lee, and Kotagiri Ramamohanarao. 2011. A model for spectra-based software diagnosis. ACM Transactions on software engineering and methodology (TOSEM) 20, 3 (2011), 1–32.
Carlos Pacheco and Michael D Ernst. 2007. Randoop: feedback-directed random testing for Java. In Companion to the 22nd ACM SIGPLAN conference on Object-oriented programming systems and applications companion. 815–816.
Mike Papadakis and Yves Le Traon. 2012. Using mutants to locate "unknown" faults. In 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation. IEEE, 691–700.
Mike Papadakis and Yves Le Traon. 2015. Metallaxis-FL: mutation-based fault localization. Software Testing, Verification and Reliability 25, 5-7 (2015), 605–628.
Alexandre Perez, Rui Abreu, and Arie Van Deursen. 2017. A test-suite diagnosability metric for spectrum-based fault localization approaches. In 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE). IEEE, 654–664.
Jeremias Rößler, Gordon Fraser, Andreas Zeller, and Alessandro Orso. 2012. Isolating failure causes through test case generation. In Proceedings of the 2012 international symposium on software testing and analysis. 309–319.
Sina Shamshiri, René Just, José Miguel Rojas, Gordon Fraser, Phil McMinn, and Andrea Arcuri. 2015. Do automatically generated unit tests find real faults? An empirical study of effectiveness and challenges (T). In 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 201–211.
Jeongju Sohn and Shin Yoo. 2017. FLUCCS: Using code and change metrics to improve fault localization. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis. 273–283.
Hado van Hasselt, Arthur Guez, and David Silver. 2016. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI conference on artificial intelligence, Vol. 30.
Iris Vessey. 1985. Expertise in debugging computer programs: A process analysis. International Journal of Man-Machine Studies 23, 5 (1985), 459–494.
W Eric Wong, Vidroha Debroy, Ruizhi Gao, and Yihao Li. 2013. The DStar method for effective software fault localization. IEEE Transactions on Reliability 63, 1 (2013), 290–308.
W Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, and Franz Wotawa. 2016. A survey on software fault localization. IEEE Transactions on Software Engineering 42, 8 (2016), 707–740.
Huan Xie, Yan Lei, Meng Yan, Yue Yu, Xin Xia, and Xiaoguang Mao. 2022. A universal data augmentation approach for fault localization. In Proceedings of the 44th International Conference on Software Engineering. 48–60.
Huaqing Xiong, Lin Zhao, Yingbin Liang, and Wei Zhang. 2020. Finite-time analysis for double Q-learning. Advances in neural information processing systems 33 (2020), 16628–16638.
Zhaogui Xu, Shiqing Ma, Xiangyu Zhang, Shuofei Zhu, and Baowen Xu. 2018. Debugging with intelligence via probabilistic inference. In Proceedings of the 40th International Conference on Software Engineering. 1171–1181.
Jifeng Xuan and Martin Monperrus. 2014. Test case purification for improving fault localization. In Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering. 52–63.
Aidan ZH Yang, Claire Le Goues, Ruben Martins, and Vincent Hellendoorn. 2024. Large language models for test-free fault localization. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering. 1–12.
Shin Yoo and Mark Harman. 2010. Using hybrid algorithm for pareto efficient multi-objective test suite minimisation. Journal of Systems and Software 83, 4 (2010), 689–701.
Shin Yoo, Mark Harman, and David Clark. 2013. Fault localization prioritization: Comparing information-theoretic and coverage-based approaches. ACM Transactions on software engineering and methodology (TOSEM) 22, 3 (2013), 1–29.
Yanbing Yu, James A Jones, and Mary Jean Harrold. 2008. An empirical study of the effects of test-suite reduction on fault localization. In Proceedings of the 30th international conference on Software engineering. 201–210.
Zhuo Zhang, Yan Lei, Xiaoguang Mao, and Panpan Li. 2019. CNN-FL: An effective approach for localizing faults using convolutional neural networks. In 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 445–455.
Daming Zou, Jingjing Liang, Yingfei Xiong, Michael D Ernst, and Lu Zhang. 2019. An empirical study of fault localization families and their combinations. IEEE Transactions on Software Engineering 47, 2 (2019), 332–347.






    Published In

    ACM Transactions on Software Engineering and Methodology Just Accepted
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Online AM: 15 January 2025
    Accepted: 17 December 2024
    Revised: 01 December 2024
    Received: 14 March 2024


    Author Tags

    1. Fault localization
    2. Fault diagnosability
    3. Reinforcement learning


    • Research-article



    Article Metrics

    • Total Citations: 0
    • Total Downloads: 41
    • Downloads (Last 12 months): 41
    • Downloads (Last 6 weeks): 41
    Reflects downloads up to 31 Jan 2025
