Abstract
Reducing the dimensionality of data is a long-standing challenge in data mining and machine learning applications. In these applications, the presence of irrelevant and redundant features negatively affects the efficiency and effectiveness of learning algorithms. Feature selection is a dimension reduction technique that allows a better understanding of the data and improves the performance of other learning tasks. While the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels remains a challenging task. This paper proposes a novel method for unsupervised feature selection, which selects features efficiently in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix from the selected subset of features. It then presents a novel algorithm that greedily minimizes this reconstruction error given the features selected so far, based on an efficient recursive formula for updating the error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison with state-of-the-art methods for unsupervised feature selection.
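As a rough illustration of this criterion, the following is a minimal NumPy sketch of greedy selection by reconstruction error. It is not the authors' implementation: the function name and interface are illustrative, and it recomputes the residual Gram matrix at every step instead of applying the paper's recursive update.

```python
import numpy as np

def greedy_feature_selection(A, k, eps=1e-12):
    """Greedily select k columns (features) of A (samples x features) that
    minimize the Frobenius-norm error of reconstructing A from the span of
    the selected columns.

    For the residual matrix E = A - (projection of A onto the selected
    columns), adding feature i reduces the squared error by
    ||E^T e_i||^2 / ||e_i||^2, where e_i is the i-th column of E.
    """
    E = np.array(A, dtype=float)        # residual matrix; initially A itself
    selected = []
    for _ in range(k):
        G = E.T @ E                     # Gram matrix of residual columns
        d = np.diag(G)                  # d[i] = ||e_i||^2
        scores = (G ** 2).sum(axis=0) / np.maximum(d, eps)
        scores[d < eps] = -np.inf       # columns already (nearly) reconstructed
        scores[selected] = -np.inf      # never re-pick a selected feature
        j = int(np.argmax(scores))
        selected.append(j)
        u = E[:, j] / np.sqrt(d[j])     # unit vector along the chosen column
        E = E - np.outer(u, u @ E)      # deflate: project E off that direction
    return selected, float((E ** 2).sum())  # indices and final squared error

# Toy usage: rank-3 data, so three well-chosen features reconstruct it exactly.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 20))
idx, err = greedy_feature_selection(A, 3)
print(idx, err)  # err is near zero
```

Each iteration of this sketch rebuilds the m-by-m Gram matrix at O(nm^2) cost; the recursive formula in the paper updates the required quantities incrementally, which is what makes the greedy method efficient in practice.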
Notes
\(\Vert A\Vert_F^2 = \mathrm{trace}(A^TA)\).
Data sets are available in MATLAB format at:
http://www.zjucadcg.cn/dengcai/Data/FaceData.html.
The following implementations were used:
FSFS: http://www.facweb.iitkgp.ernet.in/~pabitra/paper/fsfs.tar.gz.
LS: http://www.zjucadcg.cn/dengcai/Data/code/LaplacianScore.m.
SPEC: http://featureselection.asu.edu/algorithms/fs_uns_spec.zip.
The CPFA method was not included in the comparison as its implementation details were not completely specified in [20].
The experiments on the first four data sets were conducted on an Intel P4 3.6 GHz machine with 2 GB RAM, while those on the last two data sets were conducted on an Intel Core i5 650 3.2 GHz machine with 8 GB RAM.
The implementations of the AP and SPEC algorithms do not scale to the USPS data set, and those of AP, PCA-LRG, FSFS, and SPEC do not scale to the TDT2-30 and 20NG data sets on the machines used in the experiments.
References
Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A (2012) A review of feature selection methods on synthetic data. Knowl Inf Syst 1–37. doi:10.1007/s10115-012-0487-8
Boutsidis C, Mahoney M, Drineas P (2009) Unsupervised feature selection for the \(k\)-means clustering problem. In: Proceedings of advances in neural information processing systems (NIPS), vol 22. Curran Associates, Red Hook, pp 153–161
Boutsidis C, Mahoney MW, Drineas P (2008) Unsupervised feature selection for principal components analysis. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, New York, pp 61–69
Cai D, Zhang C, He X (2010) Unsupervised feature selection for multi-cluster data. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), ACM, New York, pp 333–342
Cieri C, Graff D, Liberman M, Martey N, Strassel S (1999) The TDT-2 text and speech corpus. In: Proceedings of the DARPA Broadcast News Workshop, pp 57–60
Cole R, Fanty M (1990) Spoken letter recognition. In: Proceedings of the third DARPA workshop on speech and natural language, pp 385–390
Cui Y, Dy J (2008) Orthogonal principal feature selection. In: Sparse optimization and variable selection workshop at the international conference on machine learning (ICML)
Dhillon I, Modha D (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42(1):143–175
Dhir C, Lee J, Lee S-Y (2012) Extraction of independent discriminant features for data with asymmetric distribution. Knowl Inf Syst 30:359–375
Farahat A, Ghodsi A, Kamel M (2011) An efficient greedy method for unsupervised feature selection. In: Proceedings of the 2011 IEEE 11th international conference on data mining (ICDM), pp 161–170
Frey B, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972
Guyon I (2006) Feature extraction: foundations and applications. Springer, Berlin
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
He X, Cai D, Niyogi P (2006) Laplacian score for feature selection. In: Proceedings of advances in neural information processing systems (NIPS) 18, MIT Press, Cambridge, pp 507–514
Hull J (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554
Jain A, Dubes R (1988) Algorithms for clustering data. Prentice-Hall, Upper Saddle River
Jolliffe I (2002) Principal component analysis, 2nd edn. Springer, Berlin
Lu Y, Cohen I, Zhou X, Tian Q (2007) Feature selection using principal feature analysis. In: Proceedings of the 15th international conference on multimedia. ACM, New York, pp 301–304
Lütkepohl H (1996) Handbook of matrices. Wiley, New York
Masaeli M, Yan Y, Cui Y, Fung G, Dy J (2010) Convex principal feature selection. In: Proceedings of SIAM international conference on data mining (SDM). SIAM, Philadelphia, pp 619–628
Mitra P, Murthy C, Pal S (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24(3):301–312
Nene S, Nayar S, Murase H (1996) Columbia object image library (COIL-20). Technical report CUCS-005-96, Columbia University
Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Proceedings of advances in neural information processing systems (NIPS), vol 14, MIT Press, Cambridge, pp 849–856
Samaria F, Harter A (1994) Parameterisation of a stochastic model for human face identification. In: Proceedings of the second IEEE workshop on applications of computer vision, pp 138–142
Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
Wolf L, Shashua A (2005) Feature selection for unsupervised and supervised inference: the emergence of sparsity in a weight-based approach. J Mach Learn Res 6:1855–1887
Zelnik-Manor L, Perona P (2004) Self-tuning spectral clustering. In: Proceedings of advances in neural information processing systems (NIPS), vol 16. MIT Press, Cambridge, pp 1601–1608
Zhao Z, Liu H (2007) Spectral feature selection for supervised and unsupervised learning. In: Proceedings of the 24th international conference on machine learning (ICML), ACM, New York, pp 1151–1157
Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J Comput Graph Stat 15(2):265–286
Additional information
A preliminary version of this paper appeared as Farahat et al. [10].
About this article
Cite this article
Farahat, A.K., Ghodsi, A. & Kamel, M.S. Efficient greedy feature selection for unsupervised learning. Knowl Inf Syst 35, 285–310 (2013). https://doi.org/10.1007/s10115-012-0538-1