DOI: 10.1109/ASE.2015.73
research-article

Combining deep learning with information retrieval to localize buggy files for bug reports

Published: 09 November 2015

Abstract

Bug localization refers to the automated process of locating the potentially buggy files for a given bug report. Helping developers focus their attention on those files is crucial. Several existing automated approaches for bug localization from a bug report face a key challenge, called lexical mismatch, in which the terms used in bug reports to describe a bug differ from the terms and code tokens used in source files. This paper presents a novel approach that uses a deep neural network (DNN) in combination with rVSM, an information retrieval (IR) technique. rVSM collects the feature on the textual similarity between bug reports and source files. The DNN is used to learn to relate the terms in bug reports to potentially different code tokens and terms in source files and documentation if they appear frequently enough in the pairs of reports and buggy files. Our empirical evaluation on real-world projects shows that DNN and IR complement each other well, achieving higher bug localization accuracy than the individual models. Importantly, our new model, HyLoc, which combines the features built from DNN, rVSM, and the project's bug-fixing history, achieves higher accuracy than the state-of-the-art IR and machine learning techniques. In half of the cases, it is correct with just a single suggested file. In two out of three cases, a correct buggy file is in the list of three suggested files.
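To make the textual-similarity idea and the top-k accuracy figures concrete, here is a minimal sketch in Python. It is not rVSM and not HyLoc: it uses a plain TF-IDF cosine similarity as a stand-in for the textual-similarity feature, and it has no DNN or bug-fixing-history component. The function names (`rank_files`, `top_k_hit`) and the toy file names and contents are assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Smoothed IDF; an assumed weighting, not the one used by rVSM.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_files(report, files):
    """Rank source files by textual similarity to a bug report (hypothetical helper)."""
    names = list(files)
    docs = [tokenize(report)] + [tokenize(files[name]) for name in names]
    vectors = tfidf_vectors(docs)
    report_vec, file_vecs = vectors[0], vectors[1:]
    scored = [(name, cosine(report_vec, vec)) for name, vec in zip(names, file_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def top_k_hit(ranked, buggy_files, k):
    """True if at least one truly buggy file appears among the top k suggestions."""
    return any(name in buggy_files for name, _ in ranked[:k])


# Toy data, invented for this sketch.
files = {
    "Parser.java": "class Parser parse token stream syntax error",
    "Cache.java": "class Cache evict entry memory",
    "Logger.java": "class Logger write log message",
}
ranked = rank_files("Syntax error when parsing token stream", files)
print(ranked[0][0], top_k_hit(ranked, {"Parser.java"}, k=1))
```

In this toy example the report says "parsing" while the file says "parse", so that pair of terms contributes nothing to the score. This is a small instance of the lexical mismatch that the abstract says the DNN is meant to bridge. Averaging `top_k_hit` over many reports gives the kind of top-1 and top-3 accuracy figures quoted above.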


Published In

ASE '15: Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering
November 2015
935 pages
ISBN:9781509000241

In-Cooperation

  • IEEE CS

Publisher

IEEE Press

Qualifiers

  • Research-article

Conference

ASE '15
Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%
