short-paper

Spectral feature selection and its application in high dimensional gene expression studies

Published: 20 September 2014

    Abstract

    Many variable selection techniques have been proposed for clustering analysis of gene expression data. Motivated by spectral learning, we propose a new filtering method that uses the correlation between features and the eigenspace of the sample similarity matrix as the variable selection criterion. Spectral clustering theory states that a sample similarity matrix with q strongly connected components tends to have q piecewise, almost constant eigenvectors that represent a specific partition of the sample space. Using the distance correlation metric, our proposed method, spectral correlation (Scorrelation), measures each feature's correlation with the top q eigenvectors of the sample similarity matrix and from this infers the feature's ability to differentiate the underlying clusters of samples. We applied the method to large-scale gene expression datasets. Compared with other filtering methods, it is more effective and provides better clustering results in terms of clustering error rate and the reliability of the selected features. The framework can be easily extended to other types of datasets to address clustering and classification problems.
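
    The abstract does not give implementation details, so the following is only a minimal sketch of the idea it describes, not the authors' code. Everything specific in it is an assumption made for illustration: the Gaussian kernel with a median-distance bandwidth, the symmetric normalization of the similarity matrix, the use of the maximum distance correlation over the top q eigenvectors as the per-feature score, and the hypothetical function names `distance_correlation`, `top_eigenvectors`, and `spectral_correlation_scores`.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute-difference matrices.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double centering: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()   # squared distance covariance
    dvar_x = (A * A).mean()  # squared distance variance of x
    dvar_y = (B * B).mean()  # squared distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom <= 0.0:
        return 0.0  # a constant input carries no information
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def top_eigenvectors(X, q):
    """Top-q eigenvectors of a normalized Gaussian similarity matrix.

    X has shape (n_samples, n_features). The kernel, the bandwidth rule and
    the normalization are assumptions of this sketch.
    """
    X = np.asarray(X, dtype=float)
    sq_norms = (X ** 2).sum(axis=1)
    # Squared Euclidean distances via the Gram matrix; clip rounding noise.
    sq_dist = np.clip(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0, None)
    off_diag = ~np.eye(len(X), dtype=bool)
    sigma2 = np.median(sq_dist[off_diag])  # median heuristic (assumption)
    W = np.exp(-sq_dist / (2.0 * sigma2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    # Symmetric normalization D^{-1/2} W D^{-1/2}.
    L = W / np.sqrt(np.outer(d, d))
    # eigh returns eigenvalues in ascending order, so the last q columns
    # are the eigenvectors with the largest eigenvalues.
    _, vecs = np.linalg.eigh(L)
    return vecs[:, -q:]


def spectral_correlation_scores(X, q):
    """Score each feature by its largest distance correlation with any of
    the top-q eigenvectors (the 'max' aggregation is an assumption)."""
    X = np.asarray(X, dtype=float)
    V = top_eigenvectors(X, q)
    return np.array([
        max(distance_correlation(X[:, j], V[:, k]) for k in range(q))
        for j in range(X.shape[1])
    ])
```

    Under these assumptions, features would be ranked by `spectral_correlation_scores(X, q)` in descending order and the highest-scoring ones kept before clustering. The sketch computes a full n-by-n distance matrix per feature, so it is meant to illustrate the criterion rather than to scale to very large sample sizes.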


    Cited By

    • (2015) Systematic and Integrative Analysis of Gene Expression to Identify Feature Genes Underlying Human Diseases. Transcriptomics and Gene Regulation, pp. 161-185. DOI: 10.1007/978-94-017-7450-5_7. Online publication date: 17-Oct-2015

      Information

      Published In

      BCB '14: Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
      September 2014
      851 pages
      ISBN:9781450328944
      DOI:10.1145/2649387
      General Chairs: Pierre Baldi, Wei Wang

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. clustering
      2. distance correlation
      3. spectral feature selection
      4. unsupervised

      Qualifiers

      • Short-paper


      Conference

      BCB '14: ACM-BCB '14
      September 20 - 23, 2014
      Newport Beach, California

      Acceptance Rates

      Overall Acceptance Rate 254 of 885 submissions, 29%

