
Revisiting the performance of automated approaches for the retrieval of duplicate reports in issue tracking systems that perform just-in-time duplicate retrieval

Published: 01 October 2018

Abstract

Issue tracking systems (ITSs) allow software end-users and developers to file issue reports and change requests. Multiple reports are frequently filed for the same software issue. The retrieval of these duplicate issue reports is a tedious manual task. Prior research proposed several automated approaches for the retrieval of duplicate issue reports. Recent versions of ITSs added a feature that performs basic retrieval of duplicate issue reports at the filing time of an issue report, in an effort to avoid the filing of duplicates as early as possible. This paper investigates the impact of this just-in-time duplicate retrieval on the duplicate reports that end up in the ITS of an open source project. In particular, we study the differences between duplicate reports for open source projects before and after the activation of this new feature. We show how the experimental results of prior research would vary given the new data after the activation of the just-in-time duplicate retrieval feature. We study duplicate issue reports from the Mozilla-Firefox, Mozilla-Core and Eclipse-Platform projects. In addition, we compare the performance of the state of the art in the automated retrieval of duplicate reports using two popular approaches (i.e., BM25F and REP). We find that duplicate issue reports after the activation of the just-in-time duplicate retrieval feature are less textually similar, have a greater identification delay and require more discussion to be retrieved as duplicate reports than duplicates before the activation of the feature. Prior work showed that REP outperforms BM25F in terms of recall rate and mean average precision. We observe that the performance gap between BM25F and REP becomes even larger after the activation of the just-in-time duplicate retrieval feature.
We recommend that future studies focus on duplicates that were reported after the activation of the just-in-time duplicate retrieval feature, as these duplicates are more representative of future incoming issue reports and therefore give a better representation of the future performance of proposed approaches.
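
The two measures named in the abstract, recall rate and mean average precision, can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that each duplicate report has exactly one master report and that an approach returns a ranked list of candidate report IDs per duplicate; under that assumption the average precision of a query reduces to the reciprocal rank of the master. The function names and the data layout are hypothetical.

```python
from typing import Dict, List


def recall_rate_at_k(rankings: Dict[str, List[str]],
                     masters: Dict[str, str], k: int) -> float:
    """Fraction of duplicates whose master appears in the top-k candidates."""
    hits = sum(1 for dup, ranked in rankings.items()
               if masters[dup] in ranked[:k])
    return hits / len(rankings)


def mean_average_precision(rankings: Dict[str, List[str]],
                           masters: Dict[str, str]) -> float:
    """Mean of 1/rank of the master over all duplicates.

    Assumption: one relevant report (the master) per duplicate, so the
    average precision of a single query equals the reciprocal rank.
    A master that is not retrieved at all contributes 0.
    """
    total = 0.0
    for dup, ranked in rankings.items():
        master = masters[dup]
        if master in ranked:
            total += 1.0 / (ranked.index(master) + 1)
    return total / len(rankings)


if __name__ == "__main__":
    # Hypothetical report IDs, for illustration only.
    rankings = {"dup-1": ["m-1", "x-7", "x-9"],
                "dup-2": ["x-3", "x-4", "m-2"]}
    masters = {"dup-1": "m-1", "dup-2": "m-2"}
    print(recall_rate_at_k(rankings, masters, k=1))
    print(mean_average_precision(rankings, masters))
```

Computing both measures separately on duplicates filed before and after the activation of the just-in-time feature would be one way to reproduce the kind of comparison the abstract describes.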


Information & Contributors

Information

Published In

Empirical Software Engineering, Volume 23, Issue 5
October 2018
551 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 October 2018

Author Tags

  1. Duplicate bug reports
  2. Duplicate issue report retrieval
  3. Just-in-time duplicate issue report retrieval

Qualifiers

  • Article

Cited By

  • (2023) Duplicate Bug Report Detection: How Far Are We? ACM Transactions on Software Engineering and Methodology 32:4 (1-32). doi:10.1145/3576042. Online publication date: 27-May-2023
  • (2023) Analyzing Techniques for Duplicate Question Detection on Q&A Websites for Game Developers. Empirical Software Engineering 28:1. doi:10.1007/s10664-022-10256-w. Online publication date: 1-Jan-2023
  • (2022) Bugsby: a tool support for bug triage automation. Proceedings of the 2nd ACM International Workshop on AI and Software Testing/Analysis (17-20). doi:10.1145/3536168.3543301. Online publication date: 18-Jul-2022
  • (2022) BuildSheriff. Proceedings of the 44th International Conference on Software Engineering (312-324). doi:10.1145/3510003.3510132. Online publication date: 21-May-2022
  • (2021) It Takes Two to Tango. Proceedings of the 43rd International Conference on Software Engineering (957-969). doi:10.1109/ICSE43902.2021.00091. Online publication date: 22-May-2021
  • (2020) A Soft Alignment Model for Bug Deduplication. Proceedings of the 17th International Conference on Mining Software Repositories (43-53). doi:10.1145/3379597.3387470. Online publication date: 29-Jun-2020
  • (2019) CTRAS. Proceedings of the 41st International Conference on Software Engineering (900-910). doi:10.1109/ICSE.2019.00096. Online publication date: 25-May-2019
  • (2019) Preventing duplicate bug reports by continuously querying bug reports. Empirical Software Engineering 24:2 (902-936). doi:10.1007/s10664-018-9643-4. Online publication date: 1-Apr-2019
