A Proposal to Estimate the Variable Importance Measures in Predictive Models Using Results from a Wrapper

  • Conference paper
Mining Intelligence and Knowledge Exploration (MIKE 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11308)

Abstract

Methods for measuring variable importance and selecting features in classification and regression tasks in data mining and Big Data make it possible to remove the noise caused by irrelevant or redundant variables, reduce the computational cost of building models, and make those models easier to understand. This paper presents a proposal for measuring the importance of the input variables in a classification or regression problem, taking as input the solutions evaluated by a wrapper and the performance information (classification quality expressed, for example, as accuracy, precision, recall, or F-measure) associated with each of these solutions. The proposed method quantifies the effect on classification or regression performance produced by the presence or absence of each input variable in the subsets evaluated by the wrapper. This measure has the advantage of being specific to each classifier, which makes it possible to distinguish the effects an input variable can produce depending on the model built. The proposed method was evaluated using the results of three wrappers (one based on genetic algorithms (GA), another on particle swarm optimization (PSO), and a new proposal based on covering arrays (CA)) and compared against two filters and the variable importance measure of Random Forest. The experiments were performed with three classifiers (Naive Bayes, Random Forest, and Multi-Layer Perceptron) on seven data sets from the UCI repository. The comparisons were made using Friedman's Aligned Ranks test, and the results indicate that the proposed measure stands out: keeping the top-ranked input variables preserves higher classification quality, and its rankings approximate more closely the variables found by the feature selection methods.
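
The abstract does not give the estimator in closed form, so the sketch below is an illustrative assumption rather than the authors' exact formula: it treats the wrapper's evaluation history as a list of (feature mask, performance) pairs and scores each input variable as the mean performance of the evaluated subsets that include it minus the mean performance of those that exclude it. The function name importance_from_wrapper and the toy data are hypothetical.

```python
import numpy as np

def importance_from_wrapper(masks, scores):
    """Estimate per-variable importance from a wrapper's evaluation history.

    masks  : (n_subsets, n_vars) 0/1 array; masks[i, j] == 1 when variable j
             was present in the i-th subset evaluated by the wrapper.
    scores : (n_subsets,) performance of the classifier on each subset
             (e.g. accuracy, precision, recall, or F-measure).

    Illustrative estimator (an assumption, not the paper's formula):
    importance of variable j = mean score with j present
                             - mean score with j absent.
    """
    masks = np.asarray(masks, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    importances = np.full(masks.shape[1], np.nan)
    for j in range(masks.shape[1]):
        present = scores[masks[:, j]]
        absent = scores[~masks[:, j]]
        if present.size and absent.size:  # both cases needed to compare
            importances[j] = present.mean() - absent.mean()
    return importances

# Toy wrapper history: 4 evaluated subsets over 3 input variables.
masks = [[1, 0, 1],
         [1, 1, 0],
         [0, 1, 1],
         [1, 0, 0]]
scores = [0.90, 0.85, 0.70, 0.88]
print(importance_from_wrapper(masks, scores))
# approx. [ 0.1767 -0.115  -0.065 ]
```

Because the scores come from runs of one specific classifier, the resulting importances are, as the abstract notes, specific to that classifier.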

Author information

Correspondence to Hugo Dorado.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Dorado, H., Cobos, C., Torres-Jimenez, J., Jimenez, D., Mendoza, M. (2018). A Proposal to Estimate the Variable Importance Measures in Predictive Models Using Results from a Wrapper. In: Groza, A., Prasath, R. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2018. Lecture Notes in Computer Science (LNAI), vol 11308. Springer, Cham. https://doi.org/10.1007/978-3-030-05918-7_33

  • DOI: https://doi.org/10.1007/978-3-030-05918-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05917-0

  • Online ISBN: 978-3-030-05918-7

  • eBook Packages: Computer Science, Computer Science (R0)
