
Taking the Confusion Out of Multinomial Confusion Matrices and Imbalanced Classes

  • Conference paper
  • Data Mining (AusDM 2021)

Abstract

Classification is a fundamental task in machine learning, and the principled design and evaluation of classifiers is vital to create effective classification systems and to characterise their strengths and limitations in different contexts. Binary classifiers have a range of well-known measures to summarise performance, but characterising the performance of multinomial classifiers (systems that classify instances into one of many classes) is an open problem. While confusion matrices can summarise the empirical performance of multinomial classifiers, they are challenging to interpret at a glance—challenges compounded when classes are imbalanced.

We present a way to decompose multinomial confusion matrices into components that represent the prior and posterior probabilities of correctly classifying each class, and the intrinsic ability of the classifier to discriminate each class: the Bayes factor or likelihood ratio of a positive (or negative) outcome. This approach uses the odds formulation of Bayes’ rule and leads to compact, informative visualisations of confusion matrices, able to accommodate far more classes than existing methods. We call this method confusR and demonstrate its utility on 2-, 17-, and 379-class confusion matrices. We describe how confusR could be used in the formative assessment of classification systems, investigation of algorithmic fairness, and algorithmic auditing.
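The decomposition described above can be sketched for a single class in a one-vs-rest view. The code below is a minimal illustration, not the authors' confusR implementation: the 3-class confusion matrix is made up, and the row/column orientation (rows = true class, columns = predicted class) is an assumption. It computes, for one class, the prior odds, the likelihood ratio of a positive prediction (sensitivity over false-positive rate), and their product, the posterior odds of correct classification.

```python
import numpy as np

# Hypothetical 3-class confusion matrix:
# rows = true class, columns = predicted class.
cm = np.array([
    [50,  3,  2],
    [ 5, 30, 10],
    [ 1,  4, 95],
])

def odds_decomposition(cm, k):
    """One-vs-rest decomposition for class k via the odds form of
    Bayes' rule: posterior odds = prior odds x likelihood ratio
    of a positive prediction."""
    total = cm.sum()
    tp = cm[k, k]
    fn = cm[k].sum() - tp        # class-k instances predicted as another class
    fp = cm[:, k].sum() - tp     # other instances predicted as class k
    tn = total - tp - fn - fp
    prior_odds = (tp + fn) / (fp + tn)            # P(class k) / P(not class k)
    lr_pos = (tp / (tp + fn)) / (fp / (fp + tn))  # sensitivity / false positive rate
    posterior_odds = prior_odds * lr_pos
    return prior_odds, lr_pos, posterior_odds

prior, lr, post = odds_decomposition(cm, 0)
# Converting the posterior odds back to a probability recovers the
# positive predictive value (precision) for class 0: tp / column sum.
ppv = post / (1 + post)
```

This separation is the point of the odds formulation: the prior odds reflect class imbalance alone, while the likelihood ratio isolates the classifier's intrinsic ability to discriminate that class, so the two can be inspected independently across many classes.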



Change history

  • 09 December 2021

    In the originally published version of chapter 2, a formula in Table 1 contained an error; the formula has been corrected.


Author information

Correspondence to David Lovell.


Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Lovell, D., McCarron, B., Langfield, B., Tran, K., Bradley, A.P. (2021). Taking the Confusion Out of Multinomial Confusion Matrices and Imbalanced Classes. In: Xu, Y., et al. Data Mining. AusDM 2021. Communications in Computer and Information Science, vol 1504. Springer, Singapore. https://doi.org/10.1007/978-981-16-8531-6_2


  • DOI: https://doi.org/10.1007/978-981-16-8531-6_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-8530-9

  • Online ISBN: 978-981-16-8531-6

  • eBook Packages: Computer Science, Computer Science (R0)
