DOI: 10.1145/3661167.3661195

Improving classifier-based effort-aware software defect prediction by reducing ranking errors

Published: 18 June 2024

Abstract

Context: Software defect prediction uses historical data to direct software quality assurance resources toward potentially problematic components. Effort-aware (EA) defect prediction takes cost-effectiveness into account, prioritizing the components most likely to be defective; in other words, it is a ranking problem. However, existing classification-based ranking strategies give limited consideration to ranking errors. Objective: To improve the performance of classifier-based EA ranking methods by focusing on ranking errors. Method: We propose a ranking score calculation strategy called EA-Z, which sets a lower bound on predicted scores to avoid near-zero ranking errors. We investigate four primary EA ranking strategies with 16 classification learners, and conduct experiments comparing EA-Z with the four existing strategies. Results: Experimental results from 72 data sets show that EA-Z is the best ranking score calculation strategy in terms of Recall@20% and Popt when considering all 16 learners. Among individual learners, the imbalanced ensemble learners UBag-svm and UBst-rf achieve the best performance with EA-Z. Conclusion: Our study demonstrates the effectiveness of reducing ranking errors in classifier-based effort-aware defect prediction. We recommend using EA-Z with imbalanced ensemble learning.
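The abstract describes EA-Z only at a high level. The sketch below shows one plausible reading of a lower-bounded, effort-normalized ranking score, together with common formulations of Recall@20% and Popt from the effort-aware literature. The function names, the lower bound z = 0.05, the score form max(p, z) / effort, and the Popt construction are illustrative assumptions, not the paper's actual definitions.

```python
import numpy as np

def ea_z_score(p_defect, effort, z=0.05):
    # Hypothetical reading of EA-Z: clamp the classifier's predicted
    # defect probability at a lower bound z so near-zero predictions
    # cannot collapse the score, then normalize by inspection effort.
    p = np.maximum(np.asarray(p_defect, dtype=float), z)
    return p / np.asarray(effort, dtype=float)

def _alberg_curve(labels, effort, order):
    # x: cumulative fraction of total effort spent; y: cumulative
    # fraction of defective modules found, for a given inspection order.
    e = np.asarray(effort, dtype=float)[order]
    d = np.asarray(labels, dtype=float)[order]
    x = np.concatenate(([0.0], np.cumsum(e) / e.sum()))
    y = np.concatenate(([0.0], np.cumsum(d) / max(d.sum(), 1.0)))
    return x, y

def recall_at_effort(labels, effort, scores, budget=0.20):
    # Recall@20%: defects found when inspecting modules in descending
    # score order until 20% of the total effort is spent.
    order = np.argsort(-np.asarray(scores, dtype=float))
    x, y = _alberg_curve(labels, effort, order)
    return float(np.interp(budget, x, y))

def p_opt(labels, effort, scores):
    # Popt, in one common formulation: 1 minus the normalized area
    # between the optimal inspection curve and the model's curve.
    labels = np.asarray(labels, dtype=float)
    effort = np.asarray(effort, dtype=float)

    def area(order):
        x, y = _alberg_curve(labels, effort, order)
        return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))

    model = area(np.argsort(-np.asarray(scores, dtype=float)))
    density = labels / effort              # defect density per module
    optimal = area(np.argsort(-density))   # densest modules first
    worst = area(np.argsort(density))      # densest modules last
    return 1.0 - (optimal - model) / (optimal - worst)
```

For example, given predicted probabilities p, per-module effort loc (e.g., lines of code), and true labels y, `scores = ea_z_score(p, loc)` followed by `recall_at_effort(y, loc, scores)` and `p_opt(y, loc, scores)` evaluates the ranking under a 20% effort budget.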



Published In

EASE '24: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, June 2024, 728 pages. ISBN: 9798400717017. DOI: 10.1145/3661167.

Publisher: Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Effort-aware
    2. Ranking error
    3. Ranking strategy
    4. Software defect prediction

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    EASE 2024

    Acceptance Rates

    Overall Acceptance Rate 71 of 232 submissions, 31%

