Abstract
Variable importance measures and feature selection methods for classification and regression in data mining and Big Data make it possible to remove the noise introduced by irrelevant or redundant variables, to reduce the computational cost of model construction, and to make the resulting models easier to understand. This paper proposes a method for measuring the importance of the input variables in a classification/regression problem, taking as input the solutions evaluated by a wrapper together with the performance information associated with each of these solutions (quality of classification expressed, for example, as accuracy, precision, recall, or F-measure). The proposed method quantifies the effect on classification/regression performance produced by the presence or absence of each input variable in the subsets evaluated by the wrapper. This measure has the advantage of being specific to each classifier, which makes it possible to differentiate the effects that each input variable can generate depending on the model built. The proposed method was evaluated using the results of three wrappers - one based on genetic algorithms (GA), another on particle swarm optimization (PSO), and a new proposal based on covering arrays (CA) - and compared with two filters and with the variable importance measure of Random Forest. The experiments were performed on three classifiers (Naive Bayes, Random Forest, and Multi-Layer Perceptron) and seven data sets from the UCI repository. Comparisons were made using the Friedman Aligned Ranks test, and the results indicate that the proposed measure stands out for concentrating higher classification quality in the top-ranked input variables, approximating more closely the variables found by the feature selection methods.
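The core idea can be illustrated with a minimal sketch: given the binary subsets a wrapper has already evaluated and the performance score of each, estimate each variable's importance as the difference between the mean performance of the subsets that include it and the mean performance of those that exclude it. This is a hedged interpretation of "the effect of presence or absence", not the authors' exact formulation; the function name and the fallback for always/never-selected variables are assumptions for illustration.

```python
import numpy as np

def wrapper_variable_importance(subsets, scores):
    """Estimate per-variable importance from wrapper results.

    subsets: list of binary masks (1 = variable included) for each
             feature subset evaluated by the wrapper.
    scores:  classification/regression performance of each subset
             (e.g. accuracy or F-measure).
    Importance of variable j = mean score of subsets containing j
    minus mean score of subsets lacking j.
    """
    X = np.asarray(subsets, dtype=bool)   # rows: subsets, cols: variables
    s = np.asarray(scores, dtype=float)
    importance = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        included, excluded = s[X[:, j]], s[~X[:, j]]
        if included.size == 0 or excluded.size == 0:
            # Variable always (or never) selected: no contrast available.
            importance[j] = 0.0
        else:
            importance[j] = included.mean() - excluded.mean()
    return importance

# Toy example: four subsets over three variables and their accuracies.
subsets = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
scores = [0.90, 0.85, 0.70, 0.88]
print(wrapper_variable_importance(subsets, scores))
```

A positive value suggests the classifier tends to perform better when the variable is present; ranking variables by this value yields the ordering compared against the filters and the Random Forest importance in the experiments.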
© 2018 Springer Nature Switzerland AG
Cite this paper
Dorado, H., Cobos, C., Torres-Jimenez, J., Jimenez, D., Mendoza, M. (2018). A Proposal to Estimate the Variable Importance Measures in Predictive Models Using Results from a Wrapper. In: Groza, A., Prasath, R. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2018. Lecture Notes in Computer Science(), vol 11308. Springer, Cham. https://doi.org/10.1007/978-3-030-05918-7_33
Print ISBN: 978-3-030-05917-0
Online ISBN: 978-3-030-05918-7