Abstract
Many feature selection algorithms are limited in that they attempt to identify relevant feature subsets by examining each feature individually. This paper introduces a technique for determining feature relevance from the average information gain achieved during the construction of decision tree ensembles. The technique introduces a node complexity measure and a statistical method, based on confidence intervals, for updating the feature sampling distribution so as to control the rate of convergence. A feature selection threshold is also derived from the expected performance of an irrelevant feature. Experiments demonstrate the potential of these methods and illustrate the need for both feature weighting and feature selection.
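To make the overall idea concrete, the following is a minimal sketch of scoring features by the average information gain they contribute across a randomized decision tree ensemble, and thresholding against the score of a deliberately irrelevant "probe" feature. It is not the authors' algorithm: the node complexity measure and the confidence-interval update of the feature sampling distribution described in the paper are not reproduced here. The use of scikit-learn's ExtraTreesClassifier, the function name ensemble_relevance_selection, and the noise-probe threshold are illustrative assumptions.

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def ensemble_relevance_selection(X, y, n_trees=200, random_state=0):
    rng = np.random.default_rng(random_state)

    # Append a pure-noise probe column: by construction it is irrelevant,
    # so its score estimates the gain an irrelevant feature achieves by chance.
    probe = rng.normal(size=(X.shape[0], 1))
    X_aug = np.hstack([X, probe])

    # Randomized tree ensemble; criterion="entropy" so impurity reduction
    # corresponds to information gain.
    forest = ExtraTreesClassifier(n_estimators=n_trees,
                                  criterion="entropy",
                                  random_state=random_state)
    forest.fit(X_aug, y)

    # feature_importances_ averages each feature's impurity reduction
    # (here, information gain) over all trees in the ensemble.
    scores = forest.feature_importances_
    threshold = scores[-1]      # score of the irrelevant probe feature
    relevance = scores[:-1]     # scores of the real features
    selected = np.flatnonzero(relevance > threshold)
    return relevance, selected

In this sketch, relevance, selected = ensemble_relevance_selection(X, y) returns a per-feature weight that could be used for feature weighting, and the indices of features exceeding the probe threshold for feature selection, mirroring the distinction drawn in the abstract.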
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Rogers, J.D., Gunn, S.R. (2005). Ensemble Algorithms for Feature Selection. In: Winkler, J., Niranjan, M., Lawrence, N. (eds.) Deterministic and Statistical Methods in Machine Learning. DSMML 2004. Lecture Notes in Computer Science, vol. 3635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11559887_11
DOI: https://doi.org/10.1007/11559887_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29073-5
Online ISBN: 978-3-540-31728-9