
An empirical study on the issue reports with questions raised during the issue resolving process

Published: 01 April 2019

Abstract

An issue report describes a bug or a feature request for a software system. When resolving an issue report, developers may discuss it with other developers and/or the reporter to clarify and resolve the reported issue. During this process, developers can raise questions in issue reports. Unnecessary questions may reduce the efficiency of resolving the reported issues, since developers may have to wait a considerable amount of time before receiving answers. In this paper, we perform an empirical study of the questions raised in the issue resolution process to understand the additional delay caused by these questions. Our goal is to gain insights into the factors that may trigger questions in issue reports. We build prediction models to identify such issue reports when they are submitted. Our results indicate that it is feasible to give developers an early warning, at the time an issue report is filed, as to whether questions will be raised in it. We examine the raised questions in 154,493 issue reports of three large-scale systems (i.e., Linux, Firefox and Eclipse). First, we explore the topics of the raised questions. Then, we investigate four characteristics of issue reports with raised questions: (i) resolving time, (ii) number of developers, (iii) comments, and (iv) reassignments. Finally, we build a prediction model to predict whether a developer is likely to raise questions in an issue report. We apply random forest, logistic regression and Naïve Bayes models to predict the likelihood of questions being raised in issue reports. Our prediction models obtain Area Under Curve (AUC) values of 0.78, 0.65, and 0.70 in the Linux, Firefox, and Eclipse systems, respectively. The most important variables according to our prediction models are the number of Carbon Copies (CC), the issue severity and priority, and the reputation of the issue reporter.
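To make the reported AUC values easier to interpret, the following is a minimal sketch of how an AUC can be computed from a model's scores. It is not the authors' implementation: the function name `auc`, the labels, and the scores are all hypothetical and chosen only for illustration. The sketch uses the pairwise interpretation of AUC, namely the probability that a randomly chosen issue report with a raised question receives a higher score than a randomly chosen one without.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = a question was raised in the issue report, 0 = none.
labels = [1, 0, 1, 1, 0, 0]
# Hypothetical model scores (predicted probability that a question is raised).
scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2]

print(round(auc(labels, scores), 3))  # prints 0.778
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so under this reading a value such as 0.78 means that a report with a raised question is scored above a report without one in roughly 78% of such pairs.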

Cited By

  • (2023) Predicting the Change Impact of Resolving Defects by Leveraging the Topics of Issue Reports in Open Source Software Systems. ACM Transactions on Software Engineering and Methodology 32:6 (1-34). DOI: 10.1145/3593802. Online publication date: 30-Sep-2023
  • (2022) Recommending good first issues in GitHub OSS projects. Proceedings of the 44th International Conference on Software Engineering (1830-1842). DOI: 10.1145/3510003.3510196. Online publication date: 21-May-2022
  • (2020) A first look at good first issues on GitHub. Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (398-409). DOI: 10.1145/3368089.3409746. Online publication date: 8-Nov-2020

Published In

Empirical Software Engineering  Volume 24, Issue 2
April 2019
519 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Bug fixing
  2. Empirical study
  3. Issue reports
  4. Questions

Qualifiers

  • Article
