Abstract
Feature extraction is an important step prior to learning. Although many feature extraction methods have been proposed for clustering, classification, and regression, very limited work has addressed multi-class classification problems. This paper proposes a novel feature extraction method, called orientation distance–based discriminative (ODD) feature extraction, designed particularly for multi-class classification problems. The proposed method works in two steps. In the first step, we extend the Fisher discriminant idea to determine an appropriate kernel function and map the input data of all classes into a feature space where the classes are well separated. In the second step, we put forward two variants of ODD features, namely one-vs-all-based and one-vs-one-based ODD features. We first construct SVM hyperplanes in the feature space under the one-vs-all or one-vs-one scheme; we then extract the corresponding ODD features between each sample and each hyperplane. These newly extracted ODD features serve as the representative features and are used in the subsequent classification phase. Extensive experiments have been conducted to investigate the performance of one-vs-all-based and one-vs-one-based ODD features for multi-class classification. The statistical results show that classification based on ODD features achieves higher accuracy than state-of-the-art feature extraction methods.
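For concreteness, the following minimal Python sketch illustrates the second step in the spirit described above: per-class (one-vs-all) and pairwise (one-vs-one) SVM hyperplanes are fitted in a kernel-induced feature space, and each sample's signed decision values with respect to these hyperplanes are used as new features for a downstream classifier. The RBF kernel, scikit-learn estimators, and helper names (ova_features, ovo_features) are illustrative assumptions; the paper's kernel selection via the Fisher discriminant criterion and its exact orientation-distance definition are not reproduced here.

```python
# A minimal sketch, assuming scikit-learn and an RBF kernel chosen in advance.
# Signed SVM decision values stand in for the orientation-distance (ODD) features.
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def ova_features(X_train, y_train, X, kernel="rbf"):
    """One-vs-all features: one signed distance per class hyperplane."""
    cols = []
    for c in np.unique(y_train):
        svm = SVC(kernel=kernel).fit(X_train, (y_train == c).astype(int))
        cols.append(svm.decision_function(X))
    return np.column_stack(cols)


def ovo_features(X_train, y_train, X, kernel="rbf"):
    """One-vs-one features: one signed distance per pairwise hyperplane."""
    cols = []
    for a, b in combinations(np.unique(y_train), 2):
        mask = np.isin(y_train, [a, b])
        svm = SVC(kernel=kernel).fit(X_train[mask], (y_train[mask] == a).astype(int))
        cols.append(svm.decision_function(X))
    return np.column_stack(cols)


X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Extract ODD-style features on train and test, then classify on them.
F_tr, F_te = ova_features(X_tr, y_tr, X_tr), ova_features(X_tr, y_tr, X_te)
clf = LogisticRegression(max_iter=1000).fit(F_tr, y_tr)
print("one-vs-all features, test accuracy:", clf.score(F_te, y_te))

G_tr, G_te = ovo_features(X_tr, y_tr, X_tr), ovo_features(X_tr, y_tr, X_te)
clf2 = LogisticRegression(max_iter=1000).fit(G_tr, y_tr)
print("one-vs-one features, test accuracy:", clf2.score(G_te, y_te))
```

In this sketch the one-vs-all variant yields one feature per class and the one-vs-one variant yields one feature per class pair; the final classifier is interchangeable, and logistic regression is used only to keep the example self-contained.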
Acknowledgments
The authors would like to thank the anonymous reviewers for their valuable comments. This work is supported in part by US NSF through grants IIS-0905215, CNS-1115234, IIS-0914934, DBI-0960443, and OISE-1129076, US Department of Army through grant W911NF-12-1-0066, Google Mobile 2014 Program, HUAWEI and KAU grants, Natural Science Foundation of China (61070033, 61203280, 61202270), Natural Science Foundation of Guangdong province (9251009001000005, S2011040004187, S2012040007078), Specialized Research Fund for the Doctoral Program of Higher Education (20124420120004), Australian Research Council Discovery Grant (DP1096218, DP130102691) and ARC Linkage Grant (LP100200774 and LP120100566).
Cite this article
Liu, B., Xiao, Y., Yu, P.S. et al. An efficient orientation distance–based discriminative feature extraction method for multi-classification. Knowl Inf Syst 39, 409–433 (2014). https://doi.org/10.1007/s10115-013-0613-2