
Efficient greedy feature selection for unsupervised learning

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Reducing the dimensionality of the data has been a challenging task in data mining and machine learning applications. In these applications, the existence of irrelevant and redundant features negatively affects the efficiency and effectiveness of different learning algorithms. Feature selection is one such dimensionality reduction technique; it has been used to allow a better understanding of the data and to improve the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels remains a challenging task. This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix based on the selected subset of features. It then presents a novel algorithm that greedily minimizes this reconstruction error given the features selected so far, using an efficient recursive formula for calculating the reconstruction error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison with state-of-the-art methods for unsupervised feature selection.
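As a rough illustration of the selection criterion described above, the following is a minimal Python/NumPy sketch, not the paper's algorithm: it greedily adds the feature (column) whose inclusion most reduces the Frobenius-norm reconstruction error of the data matrix, but naively recomputes the projection at every step, whereas the paper's contribution is a recursive formula that avoids this recomputation. Function and variable names are illustrative.

```python
import numpy as np

def greedy_select(A, k):
    """Greedily pick k columns (features) of A that minimize the
    reconstruction error ||A - P_S A||_F^2, where P_S is the orthogonal
    projection onto the span of the selected columns.  Each candidate is
    evaluated by recomputing the projection from scratch (illustrative only)."""
    n, m = A.shape
    selected, remaining = [], list(range(m))
    for _ in range(k):
        best_err, best_j = np.inf, None
        for j in remaining:
            S = A[:, selected + [j]]
            # Residual of A after projecting onto the column space of S
            residual = A - S @ np.linalg.pinv(S) @ A
            err = np.linalg.norm(residual, 'fro') ** 2
            if err < best_err:
                best_err, best_j = err, j
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Hypothetical usage: select 5 of 20 features from a random 100 x 20 matrix.
A = np.random.randn(100, 20)
print(greedy_select(A, k=5))
```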



Notes

  1. \(\Vert A \Vert_{F}^{2} = \mathrm{trace}(A^{T}A)\).

  2. Data sets are available in MATLAB format at:

    http://www.zjucadcg.cn/dengcai/Data/FaceData.html.

    http://www.zjucadcg.cn/dengcai/Data/MLData.html.

    http://www.zjucadcg.cn/dengcai/Data/TextData.html.

  3. http://people.csail.mit.edu/jrennie/20Newsgroups/.

  4. The following implementations were used:

    FSFS: http://www.facweb.iitkgp.ernet.in/~pabitra/paper/fsfs.tar.gz.

    LS: http://www.zjucadcg.cn/dengcai/Data/code/LaplacianScore.m.

    SPEC: http://featureselection.asu.edu/algorithms/fs_uns_spec.zip.

    MCFS: http://www.zjucadcg.cn/dengcai/Data/code/MCFS_p.m.

  5. The CPFA method was not included in the comparison as its implementation details were not completely specified in [20].

  6. The experiments on the first four data sets were conducted on an Intel P4 3.6 GHz machine with 2 GB RAM, while the experiments on the last two data sets were conducted on an Intel Core i5 650 3.2 GHz machine with 8 GB RAM.

  7. The implementations of the AP and SPEC algorithms do not scale to the USPS data set, and those of AP, PCA-LRG, FSFS, and SPEC do not scale to the TDT2-30 and 20NG data sets on the machines used for the experiments.

References

  1. Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A (2012) A review of feature selection methods on synthetic data. Knowl Inf Syst 1–37. doi:10.1007/s10115-012-0487-8

  2. Boutsidis C, Mahoney M, Drineas P (2009) Unsupervised feature selection for the \(k\)-means clustering problem. In: Proceedings of advances in neural information processing systems (NIPS), vol 22. Curran Associates, Red Hook, pp 153–161

  3. Boutsidis C, Mahoney MW, Drineas P (2008) Unsupervised feature selection for principal components analysis. In: Proceeding of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, New York, pp 61–69

  4. Cai D, Zhang C, He X (2010) Unsupervised feature selection for multi-cluster data. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), ACM, New York, pp 333–342

  5. Cieri C, Graff D, Liberman M, Martey N, Strassel S (1999) The TDT-2 text and speech corpus. In: Proceedings of the DARPA broadcast news workshop, pp 57–60

  6. Cole R, Fanty M (1990) Spoken letter recognition. In: Proceedings of the third DARPA workshop on speech and natural language, pp 385–390

  7. Cui Y, Dy J (2008) Orthogonal principal feature selection. In: The sparse optimization and variable selection workshop at the international conference on machine learning (ICML)

  8. Dhillon I, Modha D (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42(1):143–175

  9. Dhir C, Lee J, Lee S-Y (2012) Extraction of independent discriminant features for data with asymmetric distribution. Knowl Inf Syst 30:359–375

  10. Farahat A, Ghodsi A, Kamel M (2011) An efficient greedy method for unsupervised feature selection. In: Proceedings of the 2011 IEEE 11th international conference on data mining (ICDM), pp 161–170

  11. Frey B, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972

  12. Guyon I (2006) Feature extraction: foundations and applications. Springer, Berlin

  13. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  14. He X, Cai D, Niyogi P (2006) Laplacian score for feature selection. In: Proceedings of advances in neural information processing systems (NIPS) 18, MIT Press, Cambridge, pp 507–514

  15. Hull J (1994) A database for handwritten text recognition research. IEEE Trans Pattern Anal Mach Intell 16(5):550–554

  16. Jain A, Dubes R (1988) Algorithms for clustering data. Prentice-Hall, Upper Saddle River

  17. Jolliffe I (2002) Principal component analysis, 2nd edn. Springer, Berlin

  18. Lu Y, Cohen I, Zhou X, Tian Q (2007) Feature selection using principal feature analysis. In: Proceedings of the 15th international conference on multimedia. ACM, New York, pp 301–304

  19. Lütkepohl H (1996) Handbook of matrices. Wiley, New York

  20. Masaeli M, Yan Y, Cui Y, Fung G, Dy J (2010) Convex principal feature selection. In: Proceedings of SIAM international conference on data mining (SDM). SIAM, Philadelphia, pp 619–628

  21. Mitra P, Murthy C, Pal S (2002) Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 24(3):301–312

  22. Nene S, Nayar S, Murase H (1996) Columbia object image library (COIL-20), technical report CUCS-005-96, Columbia University

  23. Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Proceedings of advances in neural information processing systems (NIPS), vol 14, MIT Press, Cambridge, pp 849–856

  24. Samaria F, Harter A (1994) Parameterisation of a stochastic model for human face identification. In: Proceedings of the second IEEE workshop on applications of computer vision, pp 138–142

  25. Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

  26. Wolf L, Shashua A (2005) Feature selection for unsupervised and supervised inference: the emergence of sparsity in a weight-based approach. J Mach Learn Res 6:1855–1887

  27. Zelnik-Manor L, Perona P (2004) Self-tuning spectral clustering. In: Proceedings of advances in neural information processing systems (NIPS), vol 16. MIT Press, Cambridge, pp 1601–1608

  28. Zhao Z, Liu H (2007) Spectral feature selection for supervised and unsupervised learning. In: Proceedings of the 24th international conference on machine learning (ICML), ACM, New York, pp 1151–1157

  29. Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J Comput Graph Stat 15(2):265–286

Author information

Corresponding author

Correspondence to Ahmed K. Farahat.

Additional information

A preliminary version of this paper appeared as Farahat et al. [10].

About this article

Cite this article

Farahat, A.K., Ghodsi, A. & Kamel, M.S. Efficient greedy feature selection for unsupervised learning. Knowl Inf Syst 35, 285–310 (2013). https://doi.org/10.1007/s10115-012-0538-1
