DOI: 10.1109/APSEC.2010.49
Article

Detecting Duplicate Bug Report Using Character N-Gram-Based Features

Published: 30 November 2010

Abstract

We present an approach to identifying duplicate bug reports written in free-form text. Duplicate reports need to be identified so that the same issue is not assigned to multiple developers. Duplicate reports can also contain complementary information that is useful for fixing the bug. Automatically identifying duplicates among the thousands of existing reports in a bug repository can increase a Triager's productivity by reducing the time spent searching for duplicates of an incoming report. The proposed method uses a character N-gram-based model for duplicate bug report detection. Previous approaches are word-based, whereas this study investigates low-level character-based features, which have certain inherent advantages over word-based features for this problem: natural-language independence, robustness to noisy data, and effective handling of domain-specific term variations. The proposed solution is evaluated on a publicly available dataset of more than 200 thousand bug reports from the open-source Eclipse project. The dataset includes ground truth, that is, bug reports tagged as duplicates by the Triager. Empirical results and evaluation metrics quantifying retrieval performance indicate that the approach is effective.
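
The abstract does not give the exact features or scoring function, so the following is only a minimal sketch of the general idea of comparing reports by character N-grams. The Jaccard overlap score, the default N of 4, the normalization step, and all function names and sample reports are assumptions made here for illustration, not details taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def char_ngrams(text: str, n: int = 4) -> Counter:
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def ngram_similarity(a: str, b: str, n: int = 4) -> float:
    """Jaccard overlap of the two n-gram sets (an illustrative score, assumed here)."""
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def rank_candidates(query: str, reports: Dict[str, str],
                    n: int = 4, top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (report_id, score) pairs most similar to the incoming report."""
    scored = [(rid, ngram_similarity(query, text, n)) for rid, text in reports.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical reports, invented for this example.
existing = {
    "r1": "NullPointerException when opening the editor",
    "r2": "Toolbar icons are missing after update",
}
ranking = rank_candidates("NPE: null pointer exception on opening editors", existing)
```

Because the comparison works on character sequences rather than whole words, variants such as "editor" and "editors" still share most of their N-grams, which is the kind of term variation the abstract says character-level features handle well.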

Published In

APSEC '10: Proceedings of the 2010 Asia Pacific Software Engineering Conference
November 2010
452 pages
ISBN: 9780769542669

Publisher

IEEE Computer Society

United States

Author Tags

  1. Bug Report Analysis
  2. Duplicate Bug Detection
  3. Maintenance
  4. Software Engineering Task Automation
  5. Software Testing
  6. Text Classification

Qualifiers

  • Article

Cited By

  • (2024) Semi-supervised Crowdsourced Test Report Clustering via Screenshot-Text Binding Rules. Proceedings of the ACM on Software Engineering 1:FSE (1540-1563). DOI: 10.1145/3660776. Online publication date: 12-Jul-2024
  • (2024) Foliage: Nourishing Evolving Software by Characterizing and Clustering Field Bugs. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (1325-1337). DOI: 10.1145/3650212.3680363. Online publication date: 11-Sep-2024
  • (2022) BuildSheriff. Proceedings of the 44th International Conference on Software Engineering (312-324). DOI: 10.1145/3510003.3510132. Online publication date: 21-May-2022
  • (2021) It Takes Two to Tango. Proceedings of the 43rd International Conference on Software Engineering (957-969). DOI: 10.1109/ICSE43902.2021.00091. Online publication date: 22-May-2021
  • (2021) Prioritize Crowdsourced Test Reports via Deep Screenshot Understanding. Proceedings of the 43rd International Conference on Software Engineering (946-956). DOI: 10.1109/ICSE43902.2021.00090. Online publication date: 22-May-2021
  • (2021) A Novel Method for Automated Suggestion of Similar Software Incidents Using 2-Stage Filtering: Findings on Primary Data. Document Analysis and Recognition – ICDAR 2021 (667-682). DOI: 10.1007/978-3-030-86331-9_43. Online publication date: 5-Sep-2021
  • (2020) Duplicate Bug Report Detection Using Dual-Channel Convolutional Neural Networks. Proceedings of the 28th International Conference on Program Comprehension (117-127). DOI: 10.1145/3387904.3389263. Online publication date: 13-Jul-2020
  • (2020) A Soft Alignment Model for Bug Deduplication. Proceedings of the 17th International Conference on Mining Software Repositories (43-53). DOI: 10.1145/3379597.3387470. Online publication date: 29-Jun-2020
  • (2020) Collaborative bug finding for Android apps. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (1335-1347). DOI: 10.1145/3377811.3380349. Online publication date: 27-Jun-2020
  • (2019) Duplicate Pull Request Detection. Proceedings of the 11th Asia-Pacific Symposium on Internetware (1-10). DOI: 10.1145/3361242.3361254. Online publication date: 28-Oct-2019
