
Pass-efficient unsupervised feature selection

Published: 05 December 2013 · DOI: 10.5555/2999792.2999794

Abstract

The goal of unsupervised feature selection is to identify a small number of important features that can represent the data. We propose a new algorithm, a modification of the classical pivoted QR algorithm of Businger and Golub, that requires a small number of passes over the data. The improvements are based on two ideas: keeping track of multiple features in each pass, and skipping calculations that can be shown not to affect the final selection. Our algorithm selects exactly the same features as the classical pivoted QR algorithm, and has the same favorable numerical stability. We describe experiments on real-world datasets that sometimes show improvements of several orders of magnitude over the classical algorithm. These results appear to be competitive with recently proposed randomized algorithms in terms of pass efficiency and run time. On the other hand, the randomized algorithms may produce more accurate features, at the cost of a small probability of failure.
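    For readers unfamiliar with the baseline the paper modifies, here is a minimal Python sketch of classical Businger-Golub pivoted QR column selection: greedily pick the column with the largest residual norm, orthogonalize the remaining columns against it, and repeat. The function name, the parameter k, and the full norm recomputation (instead of the standard norm downdating) are illustrative choices, not from the paper. Note that every iteration touches the entire residual matrix, which is exactly the per-feature pass cost that the paper's modification reduces.

    ```python
    import numpy as np

    def pivoted_qr_select(A, k):
        """Greedy column selection via classical pivoted QR (Businger & Golub).

        At each step, pick the column whose residual (the part not explained
        by previously selected columns) has the largest norm, then project
        that direction out of every remaining column.
        """
        R = A.astype(float).copy()          # residual copies of the columns
        norms = np.sum(R * R, axis=0)       # squared residual column norms
        selected = []
        for _ in range(k):
            j = int(np.argmax(norms))       # pivot: largest residual column
            selected.append(j)
            q = R[:, j] / np.linalg.norm(R[:, j])  # unit pivot direction
            R -= np.outer(q, q @ R)         # orthogonalize all columns vs. q
            norms = np.sum(R * R, axis=0)   # one full pass over the data
            norms[selected] = -1.0          # never re-select a chosen column
        return selected

    # Toy usage: select 3 of 20 features from 100 samples.
    A = np.random.randn(100, 20)
    print(pivoted_qr_select(A, 3))
    ```

    The sketch requires one pass over the data per selected feature; the paper's two ideas (tracking multiple candidate features per pass and skipping provably irrelevant updates) reduce the number of passes while provably returning the same selection.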

    References

    [1] M. Gu and S. C. Eisenstat. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM J. Sci. Comput., 17(4):848-869, 1996.
    [2] C. Boutsidis, M. W. Mahoney, and P. Drineas. An improved approximation algorithm for the column subset selection problem. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2009), pages 968-977. SIAM, 2009.
    [3] C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Near-optimal column-based matrix reconstruction. arXiv e-print arXiv:1103.0995, February 2011.
    [4] A. Dasgupta, P. Drineas, B. Harb, V. Josifovski, and M. W. Mahoney. Feature selection methods for text classification. In KDD, pages 230-239. ACM, 2007.
    [5] C. Boutsidis, P. Drineas, and M. Magdon-Ismail. Sparse features for PCA-like linear regression. In NIPS, pages 2285-2293, 2011.
    [6] V. Guruswami and A. K. Sinop. Optimal column-based low-rank matrix reconstruction. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2012), pages 1207-1214. SIAM, 2012.
    [7] Z. Li, Y. Yang, J. Liu, X. Zhou, and H. Lu. Unsupervised feature selection using nonnegative spectral analysis. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2012). AAAI Press, 2012.
    [8] S. Zhang, H. S. Wong, Y. Shen, and D. Xie. A new unsupervised feature ranking method for gene expression data based on consensus affinity. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(4):1257-1263, July 2012.
    [9] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, third edition, 1996.
    [10] P. Businger and G. H. Golub. Linear least squares solutions by Householder transformations. Numerische Mathematik, 7:269-276, 1965.
    [11] A. Çivril and M. Magdon-Ismail. Column subset selection via sparse approximation of SVD. Theoretical Computer Science, 421:1-14, March 2012.
    [12] A. M. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. In IEEE Symposium on Foundations of Computer Science (FOCS), pages 370-378, 1998.
    [13] A. M. Frieze, R. Kannan, and S. Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. Journal of the ACM, 51(6):1025-1041, 2004.
    [14] A. Deshpande, L. Rademacher, S. Vempala, and G. Wang. Matrix approximation and projective clustering via volume sampling. Theory of Computing, 2(12):225-247, 2006.
    [15] A. Deshpande and L. Rademacher. Efficient volume sampling for row/column subset selection. In FOCS, pages 329-338. IEEE Computer Society Press, 2010.
    [16] M. W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 106(3):697-702, 2009.
    [17] P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff. Fast approximation of matrix coherence and statistical leverage. Journal of Machine Learning Research, 13:3441-3472, 2012.
    [18] K. L. Clarkson and D. P. Woodruff. Low rank approximation and regression in input sparsity time. arXiv e-print arXiv:1207.6365v4, April 2013.
    [19] N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217-288, 2011.
    [20] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, third edition, 2009.

    Cited By

    • (2017) Feature Selection. ACM Computing Surveys, 50(6):1-45. DOI: 10.1145/3136625. Online publication date: 6 December 2017.


    Published In

    NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2
    December 2013, 3236 pages

    Publisher

    Curran Associates Inc., Red Hook, NY, United States
