
Optimal ROC-Based Classification and Performance Analysis under Bayesian Uncertainty Models

Published: 01 July 2016

Abstract

Popular tools for evaluating classifier performance are the false positive rate (FPR), the true positive rate (TPR), the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). Typically, these quantities are estimated from training data using simple resampling and counting methods. Such methods have been shown to perform poorly when the sample size is small, as it is in many applications. This work takes a model-based approach to classifier training and performance analysis, in which the true population densities are assumed to be members of an uncertainty class of distributions. Given a prior over the uncertainty class and data, we form a posterior and derive optimal mean-squared-error (MSE) FPR and TPR estimators, as well as the sample-conditioned MSE performance of these estimators. The theory also naturally leads to optimal ROC and AUC estimators. Finally, we develop a Neyman-Pearson-based approach to optimal classifier design, which maximizes the estimated TPR for a given estimated FPR. These tools are optimal over the uncertainty class of distributions given the sample. They are available in closed form for many models, or can be easily approximated. Applications are demonstrated on both synthetic and real genomic data. MATLAB code and simulation results are available in the online supplementary material.
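The following is a minimal Python sketch of how these estimators can look in the simplest setting. It rests on assumptions of our own:

- The feature is discretized into b bins.
- Each class's bin probabilities get an independent Dirichlet prior.
- A classifier ψ is a fixed labeling of the bins.

In this model, the FPR (TPR) of ψ is the sum of the class-0 (class-1) bin probabilities over the bins that ψ labels positive. Its MMSE estimate is therefore the posterior mean of that sum. Its sample-conditioned MSE is the posterior variance. Both have closed forms, because a subset sum of a Dirichlet vector is Beta distributed.

The ROC estimate and the Neyman-Pearson step here are simplifications, not necessarily the paper's construction. Bins are ranked by the ratio of posterior-mean class probabilities, and the Neyman-Pearson search covers only that nested family of classifiers. All function names are hypothetical, and this is not the authors' MATLAB code.

```python
import numpy as np


def posterior_subset_moments(counts, prior, mask):
    """Posterior mean and variance of sum_{i in mask} p_i, where the bin
    probabilities p have a Dirichlet(prior) prior and `counts` are observed
    multinomial bin counts. A subset sum of a Dirichlet vector is
    Beta(a_S, A - a_S), which gives both moments in closed form."""
    a = np.asarray(counts, dtype=float) + np.asarray(prior, dtype=float)
    total = a.sum()
    a_s = a[np.asarray(mask, dtype=bool)].sum()
    mean = a_s / total
    var = a_s * (total - a_s) / (total ** 2 * (total + 1.0))
    return float(mean), float(var)


def bayes_rates(counts0, counts1, prior0, prior1, psi):
    """MMSE estimates of FPR and TPR for a fixed bin labeling psi
    (True = predict class 1), plus their sample-conditioned MSEs,
    which equal the posterior variances."""
    fpr, fpr_mse = posterior_subset_moments(counts0, prior0, psi)
    tpr, tpr_mse = posterior_subset_moments(counts1, prior1, psi)
    return fpr, tpr, fpr_mse, tpr_mse


def counting_rates(counts0, counts1, psi):
    """Plain counting estimates of FPR and TPR, for comparison."""
    psi = np.asarray(psi, dtype=bool)
    c0 = np.asarray(counts0, dtype=float)
    c1 = np.asarray(counts1, dtype=float)
    return float(c0[psi].sum() / c0.sum()), float(c1[psi].sum() / c1.sum())


def bayes_roc(counts0, counts1, prior0, prior1):
    """Estimated ROC vertices and AUC: bins are labeled positive one at a time
    in decreasing order of the ratio of posterior-mean bin probabilities."""
    a0 = np.asarray(counts0, dtype=float) + np.asarray(prior0, dtype=float)
    a1 = np.asarray(counts1, dtype=float) + np.asarray(prior1, dtype=float)
    m0, m1 = a0 / a0.sum(), a1 / a1.sum()
    order = np.argsort(-(m1 / m0), kind="stable")
    fpr = np.concatenate([[0.0], np.cumsum(m0[order])])
    tpr = np.concatenate([[0.0], np.cumsum(m1[order])])
    # Trapezoidal area under the piecewise-linear estimated ROC curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return order, fpr, tpr, auc


def neyman_pearson(counts0, counts1, prior0, prior1, alpha):
    """Largest estimated TPR within the nested ROC family, subject to
    estimated FPR <= alpha (alpha must be >= 0)."""
    order, fpr, tpr, _ = bayes_roc(counts0, counts1, prior0, prior1)
    k = int(np.nonzero(fpr <= alpha)[0].max())
    psi = np.zeros(len(order), dtype=bool)
    psi[order[:k]] = True
    return psi, float(fpr[k]), float(tpr[k])


# Toy example with b = 4 bins and a uniform Dirichlet prior (an assumption).
counts0 = [5, 3, 1, 1]  # class-0 (negative) training counts per bin
counts1 = [1, 1, 3, 5]  # class-1 (positive) training counts per bin
prior = np.ones(4)

psi, fpr_hat, tpr_hat = neyman_pearson(counts0, counts1, prior, prior, alpha=0.3)
print(psi)  # [False False  True  True]
print(counting_rates(counts0, counts1, psi))  # (0.2, 0.8)
print(bayes_rates(counts0, counts1, prior, prior, psi))
print(bayes_roc(counts0, counts1, prior, prior)[3])
```

With these toy counts, counting gives (FPR, TPR) = (0.2, 0.8). The Bayesian estimates are pulled toward the prior: 4/14 ≈ 0.286 and 10/14 ≈ 0.714, each with posterior variance 40/2940 ≈ 0.0136. The estimated AUC is 36/49 ≈ 0.735.

Ranking bins by the likelihood ratio is a natural choice here because both MMSE estimates are linear in the set of positively labeled bins. As a result, each prefix of the ranking maximizes the estimated TPR among deterministic bin labelings whose estimated FPR does not exceed that prefix's estimated FPR.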


Cited By

  • (2019) Bayesian receiver operating characteristic metric for linear classifiers. Pattern Recognition Letters, vol. 128, issue C, pp. 52-59. DOI: 10.1016/j.patrec.2019.07.016. Online publication date: 1-Dec-2019.
  • (2019) Setting decision thresholds when operating conditions are uncertain. Data Mining and Knowledge Discovery, vol. 33, no. 4, pp. 805-847. DOI: 10.1007/s10618-019-00613-7. Online publication date: 20-Jul-2019.

      Published In

      IEEE/ACM Transactions on Computational Biology and Bioinformatics, Volume 13, Issue 4
      July/August 2016
      211 pages

      Publisher

      IEEE Computer Society Press

      Washington, DC, United States

      Publication History

      Published: 01 July 2016
      Published in TCBB Volume 13, Issue 4

      Author Tags

      1. Bayesian estimation
      2. Classification
      3. area under the curve
      4. error estimation
      5. receiver operating characteristic

      Qualifiers

      • Article

      Article Metrics

      • Downloads (Last 12 months)1
      • Downloads (Last 6 weeks)0
      Reflects downloads up to 04 Feb 2025
