Optimal ROC-Based Classification and Performance Analysis under Bayesian Uncertainty Models

Author:

Lori A. Dalton

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), Volume 13, Issue 4

Pages 719 - 729

https://doi.org/10.1109/TCBB.2015.2465966

Published: 01 July 2016

Abstract

Popular tools for evaluating classifier performance are the false positive rate (FPR), true positive rate (TPR), receiver operating characteristic (ROC) curve, and area under the curve (AUC). Typically, these quantities are estimated from training data using simple resampling and counting methods, which have been shown to perform poorly when the sample size is small, as is typical in many applications. This work takes a model-based approach to classifier training and performance analysis, in which we assume the true population densities are members of an uncertainty class of distributions. Given a prior over the uncertainty class and data, we form a posterior and derive optimal mean-squared-error (MSE) FPR and TPR estimators, as well as the sample-conditioned MSE performance of these estimators. The theory also leads naturally to optimal ROC and AUC estimators. Finally, we develop a Neyman-Pearson-based approach to optimal classifier design, which maximizes the estimated TPR for a given estimated FPR. These tools are optimal over the uncertainty class of distributions given the sample, and are available in closed form or can be easily approximated for many models. Applications are demonstrated on both synthetic and real genomic data. MATLAB code and simulation results are available in the online supplementary material.
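To make the idea of a posterior-based FPR/TPR estimator concrete, the following is a minimal sketch and not the paper's MATLAB code. It assumes a discrete feature space with a fixed number of bins, an independent Dirichlet prior on each class's bin probabilities, and a fixed classifier that labels each bin positive or negative; the function name `posterior_rate` and all numbers are hypothetical. Under these assumptions the rate (the probability mass a class places on positive-labeled bins) has a Beta posterior, so the MSE-optimal estimate is its posterior mean and the sample-conditioned MSE of that estimate is its posterior variance.

```python
def posterior_rate(counts, alpha, predicted_positive):
    """Posterior mean and variance of the mass a class puts on positive bins.

    counts: observed bin counts for one class (assumed multinomial sample)
    alpha: Dirichlet prior hyperparameters, one per bin (an assumption)
    predicted_positive: True where the fixed classifier labels the bin positive
    """
    # Dirichlet posterior parameters: prior hyperparameters plus observed counts.
    post = [c + a for c, a in zip(counts, alpha)]
    a0 = sum(post)
    # Total posterior weight on bins the classifier calls positive.
    a_s = sum(p for p, pos in zip(post, predicted_positive) if pos)
    # The summed mass is Beta(a_s, a0 - a_s): mean and variance in closed form.
    mean = a_s / a0
    var = a_s * (a0 - a_s) / (a0 ** 2 * (a0 + 1))
    return mean, var


# Hypothetical four-bin example with a uniform Dirichlet prior.
classifier = [False, False, True, True]
prior = [1, 1, 1, 1]
class0_counts = [3, 1, 0, 0]  # negative-class sample
class1_counts = [0, 1, 2, 3]  # positive-class sample

fpr_hat, fpr_mse = posterior_rate(class0_counts, prior, classifier)
tpr_hat, tpr_mse = posterior_rate(class1_counts, prior, classifier)
print(f"FPR estimate {fpr_hat:.3f}, conditional MSE {fpr_mse:.4f}")
print(f"TPR estimate {tpr_hat:.3f}, conditional MSE {tpr_mse:.4f}")
```

Applying the same function to the class-0 sample gives an FPR estimate and to the class-1 sample a TPR estimate. Note that the counting estimate of the FPR here would be 0/4 = 0, while the posterior mean stays away from zero because the prior keeps some mass on every bin.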

Cited By

Hassan S, Huttunen H, Niemi J, Tohka J (2019). Bayesian receiver operating characteristic metric for linear classifiers. Pattern Recognition Letters 128:C (52-59). doi:10.1016/j.patrec.2019.07.016. Online publication date: 1-Dec-2019.
https://dl.acm.org/doi/10.1016/j.patrec.2019.07.016
Ferri C, Hernández-Orallo J, Flach P (2019). Setting decision thresholds when operating conditions are uncertain. Data Mining and Knowledge Discovery 33:4 (805-847). doi:10.1007/s10618-019-00613-7. Online publication date: 20-Jul-2019.
https://dl.acm.org/doi/10.1007/s10618-019-00613-7

Published In

IEEE/ACM Transactions on Computational Biology and Bioinformatics Volume 13, Issue 4

July/August 2016

211 pages

ISSN:1545-5963

Publisher

IEEE Computer Society Press

Washington, DC, United States
