Abstract
Background. ROC (Receiver Operating Characteristic) curves are widely used to represent the performance (i.e., the degree of correctness) of fault proneness models. AUC, the Area Under the ROC Curve, is a popular performance metric that summarizes the goodness of the predictions represented by a ROC curve in a single number. Alternative techniques have been proposed for evaluating the performance represented by a ROC curve, among them RRA (Ratio of Relevant Areas) and \(\phi \) (also known as the Matthews Correlation Coefficient).
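For reference, \(\phi \) can be computed directly from the confusion matrix. The following is its standard, widely used definition (the notation TP, TN, FP, FN for true/false positives/negatives is ours, not taken from this paper):
\[
\phi = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)\,(TP+FN)\,(TN+FP)\,(TN+FN)}}
\]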
Objectives. In this paper, we aim to evaluate AUC as a performance metric, also by comparing it with the alternative proposals.
Method. We carry out an empirical study by replicating a previously published fault prediction study and measuring the performance of the obtained faultiness models using AUC, RRA, and a recently proposed way of relating a specific kind of ROC curve to \(\phi \), based on iso-\(\phi \) ROC curves, i.e., ROC curves with constant \(\phi \). We take into account prevalence, i.e., the proportion of faulty modules in the dataset that is the object of the predictions.
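To illustrate how AUC and \(\phi \) can diverge as prevalence changes, here is a minimal sketch in Python with scikit-learn. It is not the experimental setup of the study: the synthetic datasets, the logistic regression model, and the default 0.5 classification threshold are all our assumptions for illustration only.

```python
# Minimal sketch (not the authors' setup): contrast AUC and phi (MCC)
# on a roughly balanced vs. a heavily imbalanced synthetic dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

for weights in [(0.5, 0.5), (0.95, 0.05)]:  # ~50% vs. ~5% faulty modules
    X, y = make_classification(n_samples=2000, weights=list(weights),
                               random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                              random_state=42)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    scores = model.predict_proba(X_te)[:, 1]   # faultiness scores for AUC
    preds = model.predict(X_te)                # crisp labels (0.5 threshold)
    prevalence = y_te.mean()                   # proportion of faulty modules
    print(f"prevalence={prevalence:.2f}  "
          f"AUC={roc_auc_score(y_te, scores):.2f}  "
          f"phi={matthews_corrcoef(y_te, preds):.2f}")
```

On imbalanced data such a comparison typically shows AUC remaining high while \(\phi \) drops, which is the kind of divergence the study investigates.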
Results. AUC appears to provide indications that are concordant with \(\phi \) for fairly balanced datasets, while it is much more optimistic than \(\phi \) for heavily imbalanced datasets. RRA’s indications appear to be only moderately affected by the degree of balance of a dataset. In addition, RRA appears to agree with \(\phi \).
Conclusions. Based on the collected evidence, AUC does not seem to be suitable for evaluating the performance of fault proneness models when used with imbalanced datasets. In these cases, using RRA can be a better choice. At any rate, more research is needed to generalize these conclusions.
Partly supported by Fondo di Ricerca d’Ateneo dell’Università degli Studi dell’Insubria.