
Cleaning ground truth data in software task assignment

Published: 01 September 2022

Abstract

Context:

In collaborative software development, task assignment has many applications, such as assigning a developer to fix a bug or assigning a code reviewer to a pull request. Most task assignment techniques in the literature build and evaluate their models on datasets collected from real projects. These techniques invariably presume that the datasets reliably represent the “ground truth”. In a project dataset used to build an automated task assignment system, the recorded assignee for a task is usually assumed to be the best assignee for that task. In practice, however, the actual assignee may not be the best possible choice, or even a sufficiently qualified one.

Objective:

We aim to clean up the ground truth by removing samples that are potentially problematic or suspect, on the assumption that removing such samples would reduce any systematic labeling bias in the dataset and lead to performance improvements.

Method:

We devised a debiasing method to detect potentially problematic samples in task assignment datasets. We then evaluated the method’s impact on the performance of seven task assignment techniques by comparing the Mean Reciprocal Rank (MRR) scores before and after debiasing. We used two different task assignment applications for this purpose: Code Reviewer Recommendation (CRR) and Bug Assignment (BA).
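The evaluation metric above, Mean Reciprocal Rank, averages the reciprocal of the rank at which the true assignee appears in each recommendation list. The sketch below is a generic illustration of that computation, not the authors' evaluation code; the candidate and assignee names are made up.

```python
def mean_reciprocal_rank(recommendations, actual):
    """recommendations: one ranked candidate list per task.
    actual: the ground-truth assignee for each task.
    A task whose true assignee is missing from its list contributes 0."""
    total = 0.0
    for ranked, truth in zip(recommendations, actual):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(actual)

recs = [["alice", "bob"], ["carol", "alice", "bob"], ["bob"]]
truth = ["alice", "bob", "carol"]
print(mean_reciprocal_rank(recs, truth))  # (1 + 1/3 + 0) / 3 ≈ 0.444
```

Comparing this score before and after removing suspect samples is how the paper quantifies the effect of debiasing.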

Results:

In the CRR application, we achieved an average MRR improvement of 18.17% for the three learning-based techniques tested on two datasets. No significant improvements were observed for the two optimization-based techniques tested on the same datasets. In the BA application, we achieved a similar average MRR improvement of 18.40% for the two learning-based techniques tested on four different datasets.

Conclusion:

Debiasing the ground truth data by removing suspect samples can help improve the performance of learning-based techniques in software task assignment applications.

Highlights

Devised a debiasing method to clean task assignment datasets.
Conducted experiments in two task assignment applications.
Debiasing the ground truth data improves learning-based techniques’ performance.
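The kind of suspect-sample filtering the highlights describe can be sketched as below. The criterion used here (dropping samples whose labeled assignee rarely appears in the dataset) is a hypothetical proxy for an unreliable label, chosen only for illustration; the paper's actual detection method may differ.

```python
from collections import Counter

def remove_suspect_samples(samples, min_history=3):
    """Drop samples whose labeled assignee appears fewer than
    `min_history` times across the dataset -- one plausible, illustrative
    signal that the ground-truth label may be unreliable."""
    counts = Counter(s["assignee"] for s in samples)
    return [s for s in samples if counts[s["assignee"]] >= min_history]

data = [{"task": t, "assignee": a} for t, a in
        [(1, "alice"), (2, "alice"), (3, "alice"), (4, "bob")]]
print(remove_suspect_samples(data))  # keeps only alice's three samples
```

A model would then be retrained on the cleaned dataset and re-scored, e.g. with MRR, to measure the improvement.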


Cited By

(2024) Sharing is Caring: A Practical Guide to FAIR(ER) Open Data Release. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 6513–6522. https://doi.org/10.1145/3637528.3671468. Online publication date: 25-Aug-2024.


Published In

Information and Software Technology, Volume 149, Issue C
Sep 2022
142 pages

Publisher

Butterworth-Heinemann

United States


Author Tags

  1. Task assignment
  2. Code reviewer recommendation
  3. Bug assignment
  4. Ground truth
  5. Labeling bias elimination
  6. Systematic labeling bias
  7. Data cleaning

Qualifiers

  • Research-article


