Ensemble Algorithms for Feature Selection

  • Conference paper
Deterministic and Statistical Methods in Machine Learning (DSMML 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3635)

Abstract

Many feature selection algorithms are limited in that they attempt to identify relevant feature subsets by examining the features individually. This paper introduces a technique for determining feature relevance using the average information gain achieved during the construction of decision tree ensembles. The technique introduces a node complexity measure and a statistical method for updating the feature sampling distribution based upon confidence intervals to control the rate of convergence. A feature selection threshold is also derived, using the expected performance of an irrelevant feature. Experiments demonstrate the potential of these methods and illustrate the need for both feature weighting and selection.
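The abstract's core idea admits a compact illustration. The sketch below, in Python with scikit-learn, is an approximation under stated assumptions rather than the authors' algorithm: it averages entropy-based split gain per feature over a random forest (scikit-learn's mean decrease in impurity with criterion="entropy") and takes the score of an appended random probe column as an empirical stand-in for the paper's analytically derived threshold for an irrelevant feature. The node complexity measure and the confidence-interval update of the feature sampling distribution are not reproduced, and all function and variable names here are illustrative.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    def score_features(X, y, n_trees=100, seed=0):
        """Average entropy-based split gain per feature over a tree ensemble.

        A random probe column (carrying no class signal) is appended; its
        score estimates what an irrelevant feature achieves, giving an
        empirical selection threshold.
        """
        rng = np.random.default_rng(seed)
        probe = rng.standard_normal((X.shape[0], 1))  # irrelevant by construction
        X_aug = np.hstack([X, probe])
        forest = RandomForestClassifier(
            n_estimators=n_trees, criterion="entropy", random_state=seed
        )
        forest.fit(X_aug, y)
        scores = forest.feature_importances_          # mean impurity decrease
        return scores[:-1], scores[-1]                # (relevance, threshold)

    # Toy run on synthetic data: 3 of 10 features carry class information.
    X, y = make_classification(n_samples=500, n_features=10, n_informative=3,
                               n_redundant=0, random_state=0)
    relevance, tau = score_features(X, y)
    print(np.flatnonzero(relevance > tau))  # indices of retained features

A single probe gives a noisy threshold estimate; averaging the scores of several independent probes would tighten it, whereas the paper derives the threshold from the expected performance of an irrelevant feature directly.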

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rogers, J.D., Gunn, S.R. (2005). Ensemble Algorithms for Feature Selection. In: Winkler, J., Niranjan, M., Lawrence, N. (eds) Deterministic and Statistical Methods in Machine Learning. DSMML 2004. Lecture Notes in Computer Science, vol. 3635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11559887_11

  • DOI: https://doi.org/10.1007/11559887_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29073-5

  • Online ISBN: 978-3-540-31728-9

  • eBook Packages: Computer Science (R0)
