Parallel Selection of Informative Genes for Classification

Slavik, Michael; Zhu, Xingquan; Mahgoub, Imad; Shoaib, Muhammad

doi:10.1007/978-3-642-00727-9_36

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5462)

Included in the following conference series:

International Conference on Bioinformatics and Computational Biology

Abstract

In this paper, we argue that existing gene selection methods are not effective for selecting important genes when the number of samples and the data dimensions grow sufficiently large. As a solution, we propose two approaches for parallel gene selection, both based on the well-known ReliefF feature selection method. In the first design, denoted PReliefF_p, the input data are split into non-overlapping subsets that are assigned to cluster nodes. Each node carries out gene selection by running the ReliefF method on its own subset, without interacting with other nodes. The final ranking of the genes is generated by gathering the weight vectors from all nodes. In the second design, PReliefF_g, each node dynamically updates the global weight vectors, so that the gene selection results in one node can be used to boost the selection in the other nodes. Experimental results on real-world microarray expression data show that PReliefF_p and PReliefF_g achieve a speedup factor nearly equal to the number of nodes. When combined with several popular classification methods, classifiers built from the genes selected by either method have the same or even better accuracy than classifiers built from the genes selected by the original ReliefF method.
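
The partition-based design (PReliefF_p) can be illustrated with a minimal single-process sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: a simplified two-class Relief with one nearest hit and one nearest miss stands in for full ReliefF, every instance in a subset is visited once instead of being sampled at random, the subsets are formed round-robin, and the gathered weight vectors are combined by averaging. The names relief_weights, partitioned_relief, rank_genes, X_toy and y_toy are hypothetical.

```python
def relief_weights(X, y):
    """Simplified two-class Relief: one nearest hit and one nearest miss.

    Assumption: every instance is visited once (no random sampling),
    which keeps the sketch deterministic.
    """
    n, n_features = len(X), len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for j in range(n_features):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(n_features))

    w = [0.0] * n_features
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # this subset cannot provide a hit/miss pair
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(n_features):
            # Reward features that separate classes, penalise features
            # that differ between same-class neighbours.
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return w


def partitioned_relief(X, y, n_nodes):
    """Run Relief independently on non-overlapping subsets, then gather.

    Each loop iteration stands in for one cluster node. Averaging the
    gathered weight vectors is an assumption of this sketch.
    """
    total = [0.0] * len(X[0])
    for node in range(n_nodes):
        idx = range(node, len(X), n_nodes)  # round-robin subset (assumed)
        w = relief_weights([X[i] for i in idx], [y[i] for i in idx])
        total = [t + wj for t, wj in zip(total, w)]
    return [t / n_nodes for t in total]


def rank_genes(weights):
    """Return gene (feature) indices ordered from most to least relevant."""
    return sorted(range(len(weights)), key=lambda j: -weights[j])


# Toy data: gene 0 tracks the class label, gene 1 does not.
X_toy = [(0.0, 0.1), (0.1, 0.9), (0.9, 0.2), (1.0, 0.8),
         (0.2, 0.4), (0.1, 0.6), (0.8, 0.5), (0.9, 0.7)]
y_toy = [0, 0, 1, 1, 0, 0, 1, 1]

weights = partitioned_relief(X_toy, y_toy, n_nodes=2)
print(rank_genes(weights))
```

Because the nodes never exchange information in this design, the loop over subsets could be dispatched to separate processes or machines unchanged, with only the final gathering step requiring communication.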

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA
Michael Slavik, Xingquan Zhu, Imad Mahgoub & Muhammad Shoaib

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of Connecticut, 257 ITE Building, 371 Fairfield Way, CT 06269-2155, Storrs, USA
Sanguthevar Rajasekaran

Cite this paper

Slavik, M., Zhu, X., Mahgoub, I., Shoaib, M. (2009). Parallel Selection of Informative Genes for Classification. In: Rajasekaran, S. (eds) Bioinformatics and Computational Biology. BICoB 2009. Lecture Notes in Computer Science, vol 5462. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00727-9_36

DOI: https://doi.org/10.1007/978-3-642-00727-9_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00726-2
Online ISBN: 978-3-642-00727-9
eBook Packages: Computer Science, Computer Science (R0)
