DOI: 10.1145/1368088.1368151
research-article

An approach to detecting duplicate bug reports using natural language and execution information

Published: 10 May 2008

Abstract

An open source project typically maintains an open bug repository so that bug reports from all over the world can be gathered. When a new bug report is submitted to the repository, a person, called a triager, examines whether it is a duplicate of an existing bug report. If it is, the triager marks it as DUPLICATE and the bug report is removed from consideration for further work. In the literature, there are approaches exploiting only natural language information to detect duplicate bug reports. In this paper we present a new approach that further involves execution information. In our approach, when a new bug report arrives, its natural language information and execution information are compared with those of the existing bug reports. Then, a small number of existing bug reports are suggested to the triager as the most similar bug reports to the new bug report. Finally, the triager examines the suggested bug reports to determine whether the new bug report duplicates an existing bug report. We calibrated our approach on a subset of the Eclipse bug repository and evaluated our approach on a subset of the Firefox bug repository. The experimental results show that our approach can detect 67%-93% of duplicate bug reports in the Firefox bug repository, compared to 43%-72% using natural language information alone.
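
The suggestion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure is used or how the two kinds of information are combined, so everything specific here is an assumption made for illustration: bag-of-words cosine similarity for the text, cosine similarity over the set of executed method names for the execution information, a plain average to combine the two, and the hypothetical report fields `id`, `text`, and `trace`.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_duplicates(new_report, existing_reports, top_k=3):
    """Return the ids of the top_k existing reports most similar to new_report.

    Each report is a dict with hypothetical keys "id", "text" (free-form
    description) and "trace" (names of methods executed while reproducing).
    """
    new_text = Counter(new_report["text"].lower().split())
    new_trace = Counter(set(new_report["trace"]))
    scored = []
    for report in existing_reports:
        nl_sim = cosine(new_text, Counter(report["text"].lower().split()))
        ex_sim = cosine(new_trace, Counter(set(report["trace"])))
        # Assumption: combine the two similarities with a plain average.
        scored.append(((nl_sim + ex_sim) / 2, report["id"]))
    # Highest combined similarity first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [report_id for _, report_id in scored[:top_k]]


# Made-up reports, for illustration only.
existing = [
    {"id": 101, "text": "browser crashes when opening bookmark menu",
     "trace": ["openMenu", "loadBookmarks", "render"]},
    {"id": 102, "text": "page fails to print in landscape mode",
     "trace": ["print", "layoutPage"]},
    {"id": 103, "text": "toolbar icons are misaligned",
     "trace": ["render", "layoutToolbar"]},
]
new_report = {"id": 200, "text": "crash on opening the bookmark menu",
              "trace": ["openMenu", "loadBookmarks"]}

print(suggest_duplicates(new_report, existing, top_k=2))
```

In this sketch the triager would see report 101 ranked first, because it shares both vocabulary and executed methods with the new report. The final duplicate decision still rests with the triager, as the abstract describes.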


Published In

ICSE '08: Proceedings of the 30th international conference on Software engineering
May 2008
558 pages
ISBN:9781605580791
DOI:10.1145/1368088

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. duplicate bug report
  2. execution information
  3. information retrieval

Qualifiers

  • Research-article

Conference

ICSE '08

Acceptance Rates

ICSE '08 Paper Acceptance Rate 56 of 370 submissions, 15%;
Overall Acceptance Rate 276 of 1,856 submissions, 15%
