short-paper

Evaluating Fault Localization and Program Repair Capabilities of Existing Closed-Source General-Purpose LLMs

Authors:

Shengbei Jiang,

Jie Zhang

LLM4Code '24: Proceedings of the 1st International Workshop on Large Language Models for Code

Pages 75 - 78

https://doi.org/10.1145/3643795.3648390

Published: 10 September 2024

Abstract

Automated debugging is an emerging research field that aims to find and repair bugs automatically. Within this field, Fault Localization (FL) and Automated Program Repair (APR) attract the most research effort. Recently, researchers have adopted pre-trained Large Language Models (LLMs) to facilitate FL and APR, with promising results. However, the LLMs they used are either no longer available (such as Codex) or outdated (such as early versions of GPT). In this paper, we evaluate the performance of three popular, recent commercial closed-source general-purpose LLMs on FL and APR: ChatGPT 3.5, ERNIE Bot 3.5, and IFlytek Spark 2.0. We evaluate them on 120 real-world Java bugs from the Defects4J benchmark. For each of FL and APR, we design three kinds of prompts that provide different kinds of information. The results show that these LLMs could successfully locate 53.3% and correctly fix 12.5% of these bugs.
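The abstract reports its results as percentages of the 120 bugs rather than as absolute counts. As a reading aid, the sketch below back-computes the counts those percentages imply. It assumes that both percentages are taken over all 120 bugs and rounded to one decimal place; the function and variable names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Assumption: both percentages are computed over the full set of 120 bugs.
TOTAL_BUGS = 120


def implied_count(percentage: float, total: int = TOTAL_BUGS) -> int:
    """Return the whole number of bugs closest to `percentage` of `total`."""
    return round(percentage / 100 * total)


located = implied_count(53.3)  # bugs successfully located
fixed = implied_count(12.5)    # bugs correctly fixed

print(f"located: {located} of {TOTAL_BUGS}")
print(f"fixed:   {fixed} of {TOTAL_BUGS}")
```

Under these assumptions the figures correspond to 64 located bugs and 15 fixed bugs out of 120.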

Cited By

Widyasari R, Zhang T, Bouraffa A, Maalej W, Lo D (2024). Explaining Explanations: An Empirical Study of Explanations in Code Reviews. ACM Transactions on Software Engineering and Methodology. Online publication date: 18-Dec-2024.
https://doi.org/10.1145/3708518

Yang L, Yang C, Gao S, Wang W, Wang B, Zhu Q, Chu X, Zhou J, Liang G, Wang Q, Chen J, Filkov V, Ray B, Zhou M (2024). On the Evaluation of Large Language Models in Unit Test Generation. Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (1607-1619). Online publication date: 27-Oct-2024.
https://dl.acm.org/doi/10.1145/3691620.3695529

Li J, Zhang M, Li N, Weyns D, Jin Z, Tei K (2024). Generative AI for Self-Adaptive Systems: State of the Art and Research Roadmap. ACM Transactions on Autonomous and Adaptive Systems 19:3 (1-60). Online publication date: 30-Sep-2024.
https://dl.acm.org/doi/10.1145/3686803

Wang B, Wei J, Chen M, Chen C, Lin Y, Zhang J (2024). A Systematic Exploration of Mutation‐Based Fault Localization Formulae. Software Testing, Verification and Reliability. Online publication date: 11-Nov-2024.
https://doi.org/10.1002/stvr.1905

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Search-based software engineering
    2. Software verification and validation
      1. Software defect analysis
        Software testing and debugging

Published In

LLM4Code '24: Proceedings of the 1st International Workshop on Large Language Models for Code

April 2024

144 pages

ISBN:9798400705793

DOI:10.1145/3643795

Copyright © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

In-Cooperation

Faculty of Engineering of University of Porto

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

LLM4Code '24

Sponsor:

SIGSOFT

LLM4Code '24: 1st International Workshop on Large Language Models for Code

April 20, 2024

Lisbon, Portugal


