Abstract
Background. ROC (Receiver Operating Characteristic) curves are widely used to represent the performance (i.e., the degree of correctness) of fault proneness models. AUC, the Area Under the ROC Curve, is a popular performance metric that summarizes the goodness of the predictions represented by a ROC curve in a single number. Alternative techniques have been proposed for evaluating the performance represented by a ROC curve, among them RRA (Ratio of Relevant Areas) and \(\phi \) (also known as the Matthews Correlation Coefficient).
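For reference, \(\phi \) can be computed directly from the confusion matrix. The following is its standard, widely used definition (the notation TP, TN, FP, FN for true/false positives/negatives is ours, not taken from this paper):
\[
\phi = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)\,(TP+FN)\,(TN+FP)\,(TN+FN)}}
\]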
Objectives. In this paper, we aim to evaluate AUC as a performance metric, also by comparing it with the alternative proposals.
Method. We carry out an empirical study by replicating a previously published fault prediction study and measuring the performance of the obtained faultiness models using AUC, RRA, and a recently proposed way of relating a specific kind of ROC curve to \(\phi \), based on iso-\(\phi \) ROC curves, i.e., ROC curves with constant \(\phi \). We take into account prevalence, i.e., the proportion of faulty modules in the dataset that is the object of the predictions.
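To illustrate how AUC and \(\phi \) can diverge as prevalence changes, here is a minimal sketch in Python with scikit-learn. It is not the experimental setup of the study: the synthetic datasets, the logistic regression model, and the default 0.5 classification threshold are all our assumptions for illustration only.

```python
# Minimal sketch (not the authors' setup): contrast AUC and phi (MCC)
# on a roughly balanced vs. a heavily imbalanced synthetic dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

for weights in [(0.5, 0.5), (0.95, 0.05)]:  # ~50% vs. ~5% faulty modules
    X, y = make_classification(n_samples=2000, weights=list(weights),
                               random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                              random_state=42)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    scores = model.predict_proba(X_te)[:, 1]   # faultiness scores for AUC
    preds = model.predict(X_te)                # crisp labels (0.5 threshold)
    prevalence = y_te.mean()                   # proportion of faulty modules
    print(f"prevalence={prevalence:.2f}  "
          f"AUC={roc_auc_score(y_te, scores):.2f}  "
          f"phi={matthews_corrcoef(y_te, preds):.2f}")
```

On imbalanced data such a comparison typically shows AUC remaining high while \(\phi \) drops, which is the kind of divergence the study investigates.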
Results. AUC appears to provide indications that are concordant with \(\phi \) for fairly balanced datasets, while it is much more optimistic than \(\phi \) for heavily imbalanced datasets. RRA’s indications appear to be only moderately affected by the degree of balance of a dataset. In addition, RRA appears to agree with \(\phi \).
Conclusions. Based on the collected evidence, AUC does not seem to be suitable for evaluating the performance of fault proneness models when used with imbalanced datasets. In these cases, using RRA can be a better choice. At any rate, more research is needed to generalize these conclusions.
Partly supported by Fondo di Ricerca d’Ateneo dell’Università degli Studi dell’Insubria.