Data complexity assessment in undersampled classification of high-dimensional biomedical data

R Baumgartner, RL Somorjai - Pattern Recognition Letters, 2006 - Elsevier
Regularized linear classifiers have been successfully applied in undersampled (i.e., small sample size, high dimensionality) biomedical classification problems. Additionally, the design of data complexity measures has been proposed in order to assess the competence of a classifier in a particular context. Our work was motivated by the analysis of ill-posed regression problems by Elden and by the interpretation of linear discriminant analysis as a mean square error classifier. Using Singular Value Decomposition analysis, we define a discriminatory power spectrum and show that it provides a useful means of data complexity assessment for undersampled classification problems. In five real-life biomedical data sets of increasing difficulty, we demonstrate how the data complexity of a classification problem can be related to the performance of regularized linear classifiers. We show that the concentration of the discriminatory power, as manifested in the discriminatory power spectrum, is a deciding factor for the success of regularized linear classifiers in undersampled classification problems. As a practical outcome of our work, the proposed data complexity assessment may facilitate the choice of a classifier for a given undersampled problem.
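
The abstract does not give the formula for the discriminatory power spectrum, so the following is only a minimal sketch under an assumed definition, not the authors' implementation. It assumes that, in the mean-square-error view of linear discriminant analysis, the centered class-label vector is projected onto the left singular vectors of the centered data matrix, and that the squared projections (normalized to sum to one) form the spectrum. The function name `discriminatory_power_spectrum`, the rank tolerance, and the synthetic data are all hypothetical.

```python
import numpy as np


def discriminatory_power_spectrum(X, y, rel_tol=1e-10):
    """Assumed definition: squared projections of the centered label
    vector onto the left singular vectors of the centered data matrix.

    X : array of shape (n_samples, n_features)
    y : array of shape (n_samples,), two-class labels coded 0/1
    Returns (singular_values, spectrum), with spectrum normalized to sum to 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    Xc = X - X.mean(axis=0)   # center each feature
    t = y - y.mean()          # centered regression target

    # Thin SVD: columns of U span the sample space of the centered data.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)

    # Drop numerically null directions (centering removes one degree of freedom).
    keep = s > rel_tol * s.max()
    U, s = U[:, keep], s[keep]

    power = (U.T @ t) ** 2
    total = power.sum()
    spectrum = power / total if total > 0 else power
    return s, spectrum


# Synthetic undersampled example: 20 samples, 200 features (illustrative only).
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 10)
data = rng.normal(size=(20, 200))
data[:, 0] += 50.0 * labels   # class difference along a high-variance direction

sing_vals, spectrum = discriminatory_power_spectrum(data, labels)
cumulative = np.cumsum(spectrum)   # how quickly discriminatory power accumulates
```

Under this assumed definition, a spectrum whose mass sits in the first few components (those with the largest singular values) would correspond to the "concentrated" case described above, while a spectrum spread over many small-singular-value components would indicate a harder problem for regularized linear classifiers.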