DOI: 10.1145/2351676.2351687
Article

Duplicate bug report detection with a combination of information retrieval and topic modeling

Published: 03 September 2012
    Abstract

    Detecting duplicate bug reports helps reduce triaging effort and saves developers from spending time fixing the same issue more than once. Among the various automated detection approaches, text-based information retrieval (IR) approaches have been shown to outperform the others in terms of both accuracy and time efficiency. However, these IR-based approaches do not perform well at detecting duplicate reports that describe the same technical issue in different terms.
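To make the vocabulary-mismatch problem concrete, the following sketch scores report pairs with plain TF-IDF cosine similarity, a simplified stand-in for the more refined IR scoring functions (such as BM25F) used by state-of-the-art approaches. The function names and example reports are hypothetical. A report that reuses the wording of another gets a positive score, while a report describing the same crash in entirely different words scores exactly zero.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on whitespace; real systems would also stem and drop stop words.
    return text.lower().split()


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Weight each term by raw frequency times log(N / document frequency)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors (0.0 if either has zero norm)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical reports: 1 reuses the wording of 0; 2 describes the same crash in other words.
    reports = [
        "editor crashes when saving large file",
        "editor crashes on save of large file",
        "app terminates unexpectedly during save of huge document",
    ]
    vecs = tfidf_vectors(reports)
    print("0 vs 1:", round(cosine(vecs[0], vecs[1]), 3))  # positive: shared terms
    print("0 vs 2:", cosine(vecs[0], vecs[2]))  # 0.0: no shared terms
```

Because purely lexical scores depend on term overlap, "saving" and "save", or "crashes" and "terminates", contribute nothing unless the system has some other way to relate them.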
    This paper introduces DBTM, a duplicate bug report detection approach that takes advantage of both IR-based features and topic-based features. DBTM models a bug report as a textual document describing one or more technical issues, and models duplicate bug reports as reports about the same technical issue(s). Trained on historical data that includes already-identified duplicate reports, DBTM learns the different sets of terms used to describe the same technical issues and uses them to detect duplicates that have not yet been identified. Our empirical evaluation on real-world systems shows that DBTM improves on state-of-the-art approaches by up to 20% in accuracy.
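One way to picture how IR-based and topic-based features can complement each other is a weighted average of two similarity scores, with the weight tuned on historical reports whose duplicates are already known. The sketch below is a generic illustration under stated assumptions, not DBTM's actual model: it assumes each candidate already has an IR similarity normalized to [0, 1] and that every report has a topic-proportion vector (for example, inferred with LDA); it measures topic similarity as 1 minus the Hellinger distance and picks the weight `alpha` by grid search on recall@1. All names and numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical training record: (ir_sims, query_topics, candidate_topics, index_of_known_duplicate)
TrainingQuery = Tuple[List[float], Sequence[float], List[Sequence[float]], int]


def topic_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """1 - Hellinger distance between two topic-proportion vectors that each sum to 1."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))  # clamp guards against rounding above 1


def combined_score(ir_sim: float, topic_sim: float, alpha: float) -> float:
    """Weighted average of an IR score and a topic score; assumes both lie in [0, 1]."""
    return alpha * ir_sim + (1.0 - alpha) * topic_sim


def rank_candidates(ir_sims: List[float], query_topics: Sequence[float],
                    candidate_topics: List[Sequence[float]], alpha: float) -> List[int]:
    """Return candidate indices ordered from most to least likely duplicate."""
    scores = [
        combined_score(ir, topic_similarity(query_topics, topics), alpha)
        for ir, topics in zip(ir_sims, candidate_topics)
    ]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def tune_alpha(training: List[TrainingQuery], steps: int = 11) -> float:
    """Grid-search alpha in [0, 1] to maximize recall@1 on historical duplicates."""
    best_alpha, best_hits = 0.0, -1
    for k in range(steps):
        alpha = k / (steps - 1)
        hits = sum(
            rank_candidates(ir, qt, ct, alpha)[0] == dup_index
            for ir, qt, ct, dup_index in training
        )
        if hits > best_hits:  # strict '>' keeps the smallest alpha among ties
            best_alpha, best_hits = alpha, hits
    return best_alpha


if __name__ == "__main__":
    query = [0.8, 0.1, 0.1]
    candidates = [[0.75, 0.15, 0.10], [0.1, 0.1, 0.8]]
    ir = [0.0, 0.5]  # candidate 0 shares no terms with the query; candidate 1 shares some
    print(rank_candidates(ir, query, candidates, alpha=1.0))  # [1, 0]
    print(rank_candidates(ir, query, candidates, alpha=0.0))  # [0, 1]
    print(tune_alpha([(ir, query, candidates, 0)]))  # 0.0
```

With `alpha = 1.0` the ranking relies on term overlap alone, so the lexically similar but unrelated candidate wins; lowering `alpha` lets a candidate with no shared terms but similar topic proportions move up, which is the kind of duplicate the abstract says pure IR approaches miss.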

    Published In

    ASE '12: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering
    September 2012
    409 pages
    ISBN:9781450312042
    DOI:10.1145/2351676

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 September 2012

    Author Tags

    1. Duplicate Bug Reports
    2. Information Retrieval
    3. Topic Model

    Qualifiers

    • Article

    Conference

    ASE'12

    Acceptance Rates

    Overall Acceptance Rate 82 of 337 submissions, 24%

    Bibliometrics

    • Downloads (last 12 months): 53
    • Downloads (last 6 weeks): 7

    Cited By

    • (2024) Semi-supervised Crowdsourced Test Report Clustering via Screenshot-Text Binding Rules. Proceedings of the ACM on Software Engineering 1:FSE, 1540-1563. DOI: 10.1145/3660776. Online publication date: 12-Jul-2024.
    • (2024) Architecture-Based Cross-Component Issue Management and Propagation Analysis. Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, 145-149. DOI: 10.1145/3639478.3639814. Online publication date: 14-Apr-2024.
    • (2024) Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 1-13. DOI: 10.1145/3597503.3639163. Online publication date: 20-May-2024.
    • (2024) Vulnerability discovery based on source code patch commit mining: a systematic literature review. International Journal of Information Security 23:2, 1513-1526. DOI: 10.1007/s10207-023-00795-8. Online publication date: 6-Jan-2024.
    • (2024) A Similarity Approach for the Classification of Mitigations in Public Cybersecurity Repositories into NIST-SP 800-53 Catalog. Information Security Theory and Practice, 64-79. DOI: 10.1007/978-3-031-60391-4_5. Online publication date: 18-Jun-2024.
    • (2023) Code Reviewer Intelligent Prediction in Open Source Industrial Software Project. Computer Modeling in Engineering & Sciences 137:1, 687-704. DOI: 10.32604/cmes.2023.027466. Online publication date: 2023.
    • (2023) Mobile App Crowdsourced Test Report Consistency Detection via Deep Image-and-Text Fusion Understanding. IEEE Transactions on Software Engineering, 1-20. DOI: 10.1109/TSE.2023.3285787. Online publication date: 2023.
    • (2023) DupHunter: Detecting Duplicate Pull Requests in Fork-Based Development. IEEE Transactions on Software Engineering 49:4, 2920-2940. DOI: 10.1109/TSE.2023.3235942. Online publication date: 1-Apr-2023.
    • (2023) To Follow or Not to Follow: Understanding Issue/Pull-Request Templates on GitHub. IEEE Transactions on Software Engineering 49:4, 2530-2544. DOI: 10.1109/TSE.2022.3224053. Online publication date: 1-Apr-2023.
    • (2023) Enhancing Mobile App Bug Reporting via Real-Time Understanding of Reproduction Steps. IEEE Transactions on Software Engineering 49:3, 1246-1272. DOI: 10.1109/TSE.2022.3174028. Online publication date: 1-Mar-2023.