Abstract
In multi-class classification, the output of a probabilistic classifier is a probability distribution over the classes. In this work, we focus on a statistical assessment of the reliability of probabilistic classifiers for multi-class problems. Our approach generates a Pearson \(\chi ^2\) statistic based on the k-nearest neighbors in the prediction space. Further, we develop a Bayesian approach for estimating the expected power of the reliability test, which can be used to choose an appropriate sample size k. We propose a sampling algorithm and demonstrate that this algorithm yields a valid prior distribution. The effectiveness of the proposed reliability test and expected power is evaluated through a simulation study. We also provide illustrative examples of the proposed methods with practical applications.
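To make the idea concrete, the following is a minimal sketch of a k-nearest-neighbor Pearson \(\chi ^2\) reliability check, not the authors' exact construction: for each point, the k nearest predictions in probability space form a local group whose observed class counts are compared against the counts expected under the predicted probabilities. The function name `knn_chi2_reliability` and its arguments are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def knn_chi2_reliability(probs, labels, k):
    """Local Pearson chi-square reliability check.

    probs:  (n, c) array of predicted class-probability vectors
    labels: (n,)   array of observed class indices in {0, ..., c-1}
    k:      neighborhood size (sample size of each local test)

    Returns per-point chi-square statistics and p-values.
    """
    n, c = probs.shape
    stats = np.empty(n)
    for i in range(n):
        # k nearest neighbors of point i in the prediction space
        d = np.linalg.norm(probs - probs[i], axis=1)
        idx = np.argsort(d)[:k]
        # expected counts: sum of predicted probabilities in the group
        # (clipped to avoid division by zero for near-degenerate classes)
        expected = np.clip(probs[idx].sum(axis=0), 1e-9, None)
        observed = np.bincount(labels[idx], minlength=c)
        stats[i] = np.sum((observed - expected) ** 2 / expected)
    # counts in each neighborhood sum to k, so c - 1 free categories
    pvals = chi2.sf(stats, df=c - 1)
    return stats, pvals

# Usage: a well-calibrated classifier (labels drawn from the predicted
# probabilities themselves) should produce roughly uniform p-values.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=200)
labels = np.array([rng.choice(3, p=p) for p in probs])
stats, pvals = knn_chi2_reliability(probs, labels, k=50)
```

Small neighborhoods give the local test little power while large ones blur distinct prediction patterns together, which is why a principled choice of k, as developed in the paper, matters.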
Notes
Because \(\hat{\textbf{p}}\) is a user-defined vector, one can choose \(\hat{\textbf{p}}\) to meet the necessary conditions. Another solution to ensure that \(p_j - \epsilon >0\) is to merge classes with low probabilities.
The number of clusters was set to six to illustrate diverse reliability test results without being redundant.
In this section, the true difference between each representative pattern and the corresponding underlying probability vector was used to empirically demonstrate the effectiveness of the proposed expected power compared with the actual rejection rate.
Acknowledgements
We gratefully acknowledge funding from the Natural Sciences and Engineering Research Council of Canada (NSERC) through grant RGPIN-2022-04698 (PI: Gweon).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix A Proof of Theorem 1
We show that the total area under \(f_{\textbf{r}}(r_1,\ldots ,r_c)\) equals 1. Because there are \(\left( {\begin{array}{c}c\\ h\end{array}}\right) \) cases in which exactly h of the \(r_i\) \((i=1,\ldots ,c)\) values are negative, the total area can be expressed as
where \(\text {A}_h\) represents the probability such that
and thus the support of \((r_1,\ldots ,r_c)\) becomes
Using a change of variables, we define \(w_i = - \epsilon /2 - \sum _{j=0}^{i}r_{h-j}\) \((i=0,\ldots ,h-2)\) and \(v_i = \epsilon /2 - \sum _{j=0}^{i}r_{c-j}\) \((i=0,\ldots ,c-h-2)\). Then, we have
where the Jacobian \(\vert J \vert = 1\) due to the property of the determinant of a triangular matrix.
We first prove by induction that
for any positive integer n. When \(n=1\), we have
Assuming that
we have
From the binomial theorem, which is given as
\[(x+y)^n = \sum _{k=0}^{n}\left( {\begin{array}{c}n\\ k\end{array}}\right) x^{k}y^{n-k},\]
we have
Hence,
Then, using the result in Eq. (A1),
Similarly, we can show that
Therefore,
Gweon, H. A power-controlled reliability assessment for multi-class probabilistic classifiers. Adv Data Anal Classif 17, 927–949 (2023). https://doi.org/10.1007/s11634-022-00528-0