DOI: 10.1145/1014052.1014149
Article

Redundancy based feature selection for microarray data

Published: 22 August 2004

    Abstract

    In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes. The problem is particularly challenging because of the large number of features (genes) and the small sample size. Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power without handling the high degree of redundancy among the genes. Recent research shows that removing redundant genes among the selected ones can achieve a better representation of the characteristics of the targeted phenotypes and lead to improved classification accuracy. Hence, in this paper we study the relationship between feature relevance and redundancy and propose an efficient method that can effectively remove redundant genes. The efficiency and effectiveness of our method in comparison with representative methods are demonstrated through an empirical study using public microarray data sets.
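The contrast the abstract draws, between ranking genes individually and also removing redundant ones, can be illustrated with a minimal sketch. This is not the method proposed in the paper, which the abstract does not specify. It assumes absolute Pearson correlation as a stand-in for both relevance (gene vs. class label) and redundancy (gene vs. gene), and the function names, the `max_redundancy` threshold, and the toy data are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_nonredundant(features, labels, max_redundancy=0.9):
    """Greedy sketch: rank genes by relevance, then skip redundant ones.

    Assumption: relevance and redundancy are both measured by absolute
    Pearson correlation; the threshold 0.9 is an arbitrary illustration.
    """
    # Step 1: rank genes by individual relevance to the class label.
    ranked = sorted(
        features,
        key=lambda gene: abs(pearson(features[gene], labels)),
        reverse=True,
    )
    # Step 2: keep a gene only if it is not highly correlated
    # with any gene that has already been kept.
    selected = []
    for gene in ranked:
        if all(
            abs(pearson(features[gene], features[kept])) < max_redundancy
            for kept in selected
        ):
            selected.append(gene)
    return selected


# Hypothetical toy data: six samples, three genes, two classes.
expression = {
    "gene_a": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
    "gene_b": [2.0, 4.0, 6.0, 14.0, 16.0, 18.0],  # exact multiple of gene_a
    "gene_c": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
}
labels = [0, 0, 0, 1, 1, 1]

selected = select_nonredundant(expression, labels)
print(selected)
```

On this toy data, a pure top-2 ranking would pick both `gene_a` and `gene_b`, although they carry the same information. The greedy filter keeps only one of the pair and retains `gene_c` because it is not strongly correlated with the gene already kept.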

    Information & Contributors

    Information

    Published In

    KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2004
    874 pages
    ISBN:1581138881
    DOI:10.1145/1014052
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. feature redundancy
    2. gene selection
    3. microarray data

    Qualifiers

    • Article

    Conference

    KDD04

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)12
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 27 Jul 2024

    Citations

    Cited By

    • (2024) Surrogate-Assisted and Filter-Based Multiobjective Evolutionary Feature Selection for Deep Learning. IEEE Transactions on Neural Networks and Learning Systems, 35:7 (9591-9605). DOI: 10.1109/TNNLS.2023.3234629. Online publication date: Jul-2024
    • (2024) Human local field potentials in motor and non-motor brain areas encode upcoming movement direction. Communications Biology, 7:1. DOI: 10.1038/s42003-024-06151-3. Online publication date: 27-Apr-2024
    • (2024) Filter unsupervised spectral feature selection method for mixed data based on a new feature correlation measure. Neurocomputing, 571 (127111). DOI: 10.1016/j.neucom.2023.127111. Online publication date: Feb-2024
    • (2024) Three-phases hybrid feature selection for facial expression recognition. The Journal of Supercomputing, 80:6 (8094-8128). DOI: 10.1007/s11227-023-05758-3. Online publication date: 1-Apr-2024
    • (2024) Feature subset selection algorithm based on symmetric uncertainty and interaction factor. Multimedia Tools and Applications, 83:4 (11247-11260). DOI: 10.1007/s11042-023-15821-z. Online publication date: 1-Jan-2024
    • (2024) Feature selection techniques for machine learning: a survey of more than two decades of research. Knowledge and Information Systems, 66:3 (1575-1637). DOI: 10.1007/s10115-023-02010-5. Online publication date: 1-Mar-2024
    • (2024) Enhancing age-related postural sway classification using partial least squares-discriminant analysis and hybrid feature set. Neural Computing and Applications, 36:10 (5621-5643). DOI: 10.1007/s00521-024-09557-6. Online publication date: 1-Apr-2024
    • (2023) A Comprehensive Review of Feature Selection and Feature Selection Stability in Machine Learning. Gazi University Journal of Science, 36:4 (1506-1520). DOI: 10.35378/gujs.993763. Online publication date: 1-Dec-2023
    • (2023) Deep Learning Techniques for Biomedical Research and Significant Gene Identification using Next Generation Sequencing (NGS) Data: - A Review. Data Science and Interdisciplinary Research: Recent Trends and Applications (172-216). DOI: 10.2174/9789815079005123050011. Online publication date: 25-Sep-2023
    • (2023) RETRACTED ARTICLE: Features optimization selection in hidden layers of deep learning based on graph clustering. EURASIP Journal on Wireless Communications and Networking, 2023:1. DOI: 10.1186/s13638-023-02292-x. Online publication date: 21-Aug-2023
