Feature Elimination Approach Based on Random Forest for Cancer Diagnosis

Nguyen, Ha-Nam; Vu, Trung-Nghia; Ohn, Syng-Yup; Park, Young-Mee; Han, Mi Young; Kim, Chul Woo

doi:10.1007/11925231_50

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4293)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

The performance of learning tasks is very sensitive to the characteristics of the training data. There are several ways to improve learning performance, including standardization, normalization, signal enhancement, and linear or non-linear space embedding methods. Among these, determining the relevant and informative features is one of the key steps in the data analysis process: it helps to improve performance, reduce the generation of data, and understand the characteristics of the data. Researchers have developed various methods to extract the set of relevant features, but no single method prevails. Random Forest, an ensemble classifier built from a set of tree classifiers, delivers good classification performance. Taking advantage of Random Forest and using the wrapper approach first introduced by Kohavi et al., we propose a new algorithm to find the optimal subset of features. Random Forest is used to obtain the feature ranking values, and these values are used to decide which features are eliminated in each iteration of the algorithm. We conducted experiments with two public datasets: colon cancer and leukemia. The experimental results on real-world data showed that the proposed method achieves a higher prediction rate than a baseline method for certain data sets and shows comparable, and sometimes better, performance than widely used feature selection methods.
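
The rank-then-eliminate loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not specify how many features are dropped per iteration, how candidate subsets are scored, or when the loop stops, so `drop_per_round`, the "keep the best-scoring subset" rule, and all function names here are hypothetical. To stay dependency-free, the ranking step uses a toy class-mean-gap score and the evaluation step uses a leave-one-out nearest-centroid classifier; both are stand-ins where the paper uses Random Forest.

```python
import random
from statistics import mean


def mean_gap_ranking(X, y, features):
    """Toy stand-in ranker (NOT Random Forest importance):
    score each feature by the absolute gap between its class means."""
    scores = {}
    for f in features:
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        scores[f] = abs(mean(pos) - mean(neg))
    return scores


def loo_centroid_accuracy(X, y, features):
    """Toy stand-in evaluator: leave-one-out accuracy of a
    nearest-centroid classifier restricted to `features`."""
    correct = 0
    for i, (point, label) in enumerate(zip(X, y)):
        centroids = {}
        for c in (0, 1):
            rows = [r for j, (r, l) in enumerate(zip(X, y)) if j != i and l == c]
            centroids[c] = [mean(r[f] for r in rows) for f in features]

        def dist(c):
            return sum((point[f] - m) ** 2 for f, m in zip(features, centroids[c]))

        correct += min((0, 1), key=dist) == label
    return correct / len(X)


def eliminate_features(X, y, rank, evaluate, drop_per_round=1):
    """Backward elimination: rank the surviving features, drop the
    lowest-ranked ones, re-evaluate, and remember the best subset seen.
    The stopping and selection rules are assumptions for illustration."""
    features = list(range(len(X[0])))
    best_subset, best_score = list(features), evaluate(X, y, features)
    while len(features) > 1:
        scores = rank(X, y, features)
        features.sort(key=lambda f: scores[f], reverse=True)
        n_drop = min(drop_per_round, len(features) - 1)
        features = features[: len(features) - n_drop]
        score = evaluate(X, y, features)
        if score >= best_score:  # ties favour the smaller subset
            best_subset, best_score = list(features), score
    return sorted(best_subset), best_score


# Synthetic demo data (fabricated for illustration only): feature 0
# carries the class signal, features 1-5 are pure noise.
rng = random.Random(0)
y = [i % 2 for i in range(20)]
X = [[10.0 * label + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(5)]
     for label in y]

subset, score = eliminate_features(X, y, mean_gap_ranking, loo_centroid_accuracy)
print("selected features:", subset, "accuracy:", score)
```

Because `rank` and `evaluate` are passed in as callables, a Random Forest importance measure and a Random Forest classifier could be substituted for the toy stand-ins without changing the loop itself.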

References

Kohavi, R., John, G.H.: Wrappers for Feature Subset Selection. Artificial Intelligence, 273–324 (1997)
Blum, A.L., Langley, P.: Selection of Relevant Features and Examples in Machine Learning. Artificial Intelligence, 245–271 (1997)
Breiman, L.: Random Forests. Machine Learning 45, 5–32 (2001)
Torkkola, K., Venkatesan, S., Liu, H.: Sensor Selection for Maneuver Classification. In: Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems, pp. 636–641 (2004)
Wu, Y., Zhang, A.: Feature Selection for Classifying High-Dimensional Numerical Data. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 251–258 (2004)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. John Wiley & Sons, Chichester (2001)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Chapman and Hall, New York (1984)
Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.: Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 286, 531–537 (1999)
Fröhlich, H., Chapelle, O., Schölkopf, B.: Feature Selection for Support Vector Machines by Means of Genetic Algorithms. In: 15th IEEE International Conference on Tools with Artificial Intelligence, p. 142 (2003)
Chen, X.-w.: Gene Selection for Cancer Classification Using Bootstrapped Genetic Algorithms and Support Vector Machines. In: IEEE Computer Society Bioinformatics Conference, p. 504 (2003)
Zhang, H., Yu, C.-Y., Singer, B.: Cell and Tumor Classification Using Gene Expression Data: Construction of Forests. Proceedings of the National Academy of Sciences of the United States of America 100, 4168–4172 (2003)
Doak, J.: An Evaluation of Feature Selection Methods and Their Application to Computer Security. Technical Report CSE-92-18, Department of Computer Science and Engineering, University of California (1992)
Das, S.: Filters, Wrappers and a Boosting-Based Hybrid for Feature Selection. In: Proceedings of the 18th ICML (2001)
Ng, A.Y.: On Feature Selection: Learning with Exponentially Many Irrelevant Features as Training Examples. In: Proceedings of the Fifteenth International Conference on Machine Learning (1998)
Xing, E., Jordan, M., Karp, R.: Feature Selection for High-Dimensional Genomic Microarray Data. In: Proceedings of the 18th ICML (2001)
Mehta, M., Agrawal, R., Rissanen, J.: SLIQ: A Fast Scalable Classifier for Data Mining. In: Proceedings of the International Conference on Extending Database Technology, pp. 18–32 (1996)
Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., Levine, A.: Broad Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays. Proceedings of the National Academy of Sciences of the United States of America 96, 6745–6750 (1999)
Nguyen, H.-N., Ohn, S.-Y., Park, J., Park, K.-S.: Combined Kernel Function Approach in SVM for Diagnosis of Cancer. In: Proceedings of the First International Conference on Natural Computation (2005)
Su, T., Basu, M., Toure, A.: Multi-Domain Gating Network for Classification of Cancer Cells Using Gene Expression Data. In: Proceedings of the International Joint Conference on Neural Networks, pp. 286–289 (2002)

Author information

Authors and Affiliations

Dept. of Computer and Information Engineering, Hankuk Aviation University, Seoul, Korea
Ha-Nam Nguyen, Trung-Nghia Vu & Syng-Yup Ohn
Dept. of Cell Stress Biology, Roswell Park Cancer Institute, SUNY Buffalo, NY, USA
Young-Mee Park
Bioinfra Inc., Seoul, Korea
Mi Young Han
Dept. of Pathology, Tumor Immunity Medical Research Center, Seoul National University College of Medicine, Seoul, Korea
Chul Woo Kim

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, 07738, Mexico City, México
Alexander Gelbukh
Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro No. 1, Sta. Ma. Tonanzintla, 72840, Puebla, México
Carlos Alberto Reyes-Garcia

Cite this paper

Nguyen, HN., Vu, TN., Ohn, SY., Park, YM., Han, M.Y., Kim, C.W. (2006). Feature Elimination Approach Based on Random Forest for Cancer Diagnosis. In: Gelbukh, A., Reyes-Garcia, C.A. (eds) MICAI 2006: Advances in Artificial Intelligence. MICAI 2006. Lecture Notes in Computer Science, vol 4293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11925231_50

DOI: https://doi.org/10.1007/11925231_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49026-5
Online ISBN: 978-3-540-49058-6
eBook Packages: Computer Science, Computer Science (R0)
