DOI: 10.1145/2351676.2351687

Duplicate bug report detection with a combination of information retrieval and topic modeling

Published: 03 September 2012

Abstract

Detecting duplicate bug reports helps reduce triaging effort and saves developers from spending time fixing the same issue more than once. Among several automated detection approaches, text-based information retrieval (IR) approaches have been shown to outperform others in terms of both accuracy and time efficiency. However, those IR-based approaches do not reliably detect duplicate reports that describe the same technical issue in different terms.
This paper introduces DBTM, a duplicate bug report detection approach that takes advantage of both IR-based features and topic-based features. DBTM models a bug report as a textual document describing certain technical issue(s), and models duplicate bug reports as the ones about the same technical issue(s). Trained with historical data including identified duplicate reports, it is able to learn the sets of different terms describing the same technical issues and to detect other not-yet-identified duplicate ones. Our empirical evaluation on real-world systems shows that DBTM improves the state-of-the-art approaches by up to 20% in accuracy.
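The idea of combining term-based and topic-based evidence can be illustrated with a minimal sketch. It is not the DBTM implementation: the TF-IDF cosine similarity stands in for the IR component, the per-report topic proportions are hard-coded hypothetical values rather than the output of a trained topic model, and the function names and the mixing weight `alpha` are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF so that every term keeps a positive weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine_sparse(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cosine_dense(p, q):
    """Cosine similarity between two equal-length lists of topic proportions."""
    dot = sum(x * y for x, y in zip(p, q))
    norm_p = math.sqrt(sum(x * x for x in p))
    norm_q = math.sqrt(sum(y * y for y in q))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def rank_candidates(query, reports, topics, alpha=0.5):
    """Rank all other reports against reports[query].

    The score is a weighted average of a term-based similarity and a
    topic-based similarity; alpha is a hypothetical mixing weight.
    """
    vectors = tfidf_vectors([r.lower().split() for r in reports])
    scored = []
    for i in range(len(reports)):
        if i == query:
            continue
        text_sim = cosine_sparse(vectors[query], vectors[i])
        topic_sim = cosine_dense(topics[query], topics[i])
        scored.append((i, alpha * text_sim + (1 - alpha) * topic_sim))
    return sorted(scored, key=lambda s: s[1], reverse=True)


# Toy data: reports 0 and 1 describe the same issue in different words.
reports = [
    "application crashes when saving file",
    "app terminates unexpectedly on document save",
    "toolbar icon rendered with wrong colour",
]
# Hypothetical topic proportions (topic 0: crash on save, topic 1: UI rendering).
topics = [[0.90, 0.10], [0.85, 0.15], [0.10, 0.90]]

ranked = rank_candidates(1, reports, topics, alpha=0.5)
```

In this toy data, report 1 shares no terms with either of the other reports, so the term-based score alone cannot separate them. The topic-based score is what places report 0 ahead of report 2 in `ranked`.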

Published In

ASE '12: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering
September 2012
409 pages
ISBN:9781450312042
DOI:10.1145/2351676

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Duplicate Bug Reports
  2. Information Retrieval
  3. Topic Model

Qualifiers

  • Article

Conference

ASE'12

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%
