
Automatic recommendation of feature selection algorithms based on dataset characteristics

Published: 15 December 2021

Abstract

Feature selection in real-world data mining problems is essential to make the learning task efficient and more accurate. Identifying the best feature selection algorithm among the many available is a complex activity that still relies heavily on human experts or on random trial-and-error procedures. The automated machine learning community has therefore taken steps toward automating this process. In this paper, we address the metalearning challenge of recommending feature selection algorithms by proposing a novel meta-feature engineering model. Our model considers a broad collection of meta-features that enable the study of the relationship between dataset properties and feature selection algorithm performance in terms of several criteria. We arrange the input meta-features into eight categories: (i) simple, (ii) statistical, (iii) information-theoretical, (iv) complexity, (v) landmarking, (vi) based on symbolic models, (vii) based on images, and (viii) based on complex networks (graphs). The target meta-features emerge from a multi-criteria performance measure, built on five individual performance indexes, that assesses feature selection methods grounded in information, distance, dependence, consistency, and precision measures. We evaluate our proposal using a recently developed framework that extracts the input meta-features from 213 benchmark datasets and ranks the assessed feature selection algorithms to fill in the target meta-features in meta-bases. This evaluation uses five state-of-the-art classification methods to induce recommendation models from the meta-bases: C4.5, Random Forest, XGBoost, ANN, and SVM. The results show that an average accuracy of up to 90% can be reached by applying our meta-feature engineering model. This work is the first to provide, through an extensive empirical evaluation, a careful discussion of the strengths and limitations of more than 160 meta-features. These meta-features, while designed to aid the task of feature selection algorithm recommendation, can readily be employed in other metalearning scenarios. We therefore believe our findings are a valuable contribution to the fields of automated machine learning and data mining, as well as to the feature extraction and pattern recognition communities.
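The recommendation loop the abstract describes, extracting input meta-features from datasets, labeling each dataset with its best-performing feature selection algorithm (the target meta-feature), and inducing a recommendation model from the resulting meta-base, can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the function names (`meta_features`, `recommend`), the four meta-features computed, the 1-nearest-neighbor meta-model, and the algorithm names in the toy meta-base are all assumptions made for the sketch.

```python
# Minimal sketch of a metalearning recommendation loop (illustrative only).
import math
from statistics import pvariance

def meta_features(X, y):
    """Extract a few 'simple', 'statistical', and 'information-theoretical'
    input meta-features from a dataset (X: list of rows, y: class labels)."""
    n, d = len(X), len(X[0])
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    # Class entropy (information-theoretical) and mean feature variance (statistical).
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    mean_var = sum(pvariance(col) for col in zip(*X)) / d
    return [n, d, entropy, mean_var]

def recommend(meta_base, new_meta):
    """1-NN meta-model: recommend the feature selection algorithm that
    performed best on the meta-base dataset most similar to the new one."""
    best = min(meta_base, key=lambda entry: math.dist(entry[0], new_meta))
    return best[1]  # the winning algorithm's name (target meta-feature)

# Meta-base: (input meta-features, best feature selection algorithm) pairs,
# filled here with made-up entries purely for illustration.
meta_base = [
    ([100, 5, 1.0, 0.2], "relief"),
    ([5000, 200, 2.3, 4.1], "info_gain"),
    ([300, 20, 0.9, 0.5], "cfs"),
]
print(recommend(meta_base, [250, 18, 1.1, 0.4]))  # nearest entry wins: cfs
```

In the paper, the meta-model is a full classifier (C4.5, Random Forest, XGBoost, ANN, or SVM) trained over 160+ meta-features from 213 datasets; the 1-NN stand-in above only makes the data flow of meta-base construction and querying concrete.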

Highlights

A novel meta-feature engineering model recommends feature selection algorithms.
The proposal obtains promising results from 213 datasets with hit rates of up to 90%.
Simple, landmarking, image-based, and graph-based input meta-features stand out.
A multi-criteria performance measure rigorously assesses candidate algorithms.
Chains of binary or multiclass classifiers can efficiently rank candidate algorithms.
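The last highlight, ranking candidate algorithms with a chain of classifiers, can be sketched as follows. This is a hypothetical illustration of the chaining idea only: the classifier at each rank position predicts the next algorithm given the meta-features and the partial ranking built so far. The stub lambdas stand in for trained multiclass models, and all names are assumptions of the sketch, not the paper's code.

```python
# Hypothetical sketch: ranking candidate algorithms via a classifier chain.
def rank_with_chain(chain, meta_feats, candidates):
    ranking, remaining = [], list(candidates)
    for clf in chain:
        # Each classifier sees the meta-features plus the partial ranking.
        choice = clf(meta_feats, ranking)
        if choice not in remaining:   # fall back if the prediction is already placed
            choice = remaining[0]
        ranking.append(choice)
        remaining.remove(choice)
    return ranking + remaining        # append any algorithms left unranked

# Stub "classifiers" standing in for trained multiclass models.
chain = [
    lambda mf, partial: "info_gain" if mf["n_features"] > 50 else "relief",
    lambda mf, partial: "cfs",
]
print(rank_with_chain(chain, {"n_features": 120},
                      ["relief", "info_gain", "cfs"]))
# -> ['info_gain', 'cfs', 'relief']
```

The design point of the chain is that each position conditions on the choices already made, so a full ranking over k algorithms decomposes into k tractable classification problems instead of one prediction over k! permutations.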




            Published In

Expert Systems with Applications: An International Journal, Volume 185, Issue C (December 2021), 1550 pages

            Publisher

Pergamon Press, Inc., United States


            Author Tags

            1. Feature engineering
            2. Characterization measures
            3. Algorithm selection
            4. Recommendation system
            5. Filter
            6. Wrapper

            Qualifiers

            • Research-article


            Cited By

• (2024) Meta-learning for vessel time series data imputation method recommendation. Expert Systems with Applications 251(C). doi:10.1016/j.eswa.2024.124016. Online publication date: 24-Jul-2024.
• (2023) Hybrid Feature Selection Framework for Building Resource Efficient Intrusion Detection Systems Model in the Internet of Things. Proceedings of the 8th International Conference on Sustainable Information Engineering and Technology, pp. 16–22. doi:10.1145/3626641.3626923. Online publication date: 24-Oct-2023.
• (2023) Eight years of AutoML: categorisation, review and trends. Knowledge and Information Systems 65(12), 5097–5149. doi:10.1007/s10115-023-01935-1. Online publication date: 1-Dec-2023.
• (2022) Empirical study on meta-feature characterization for multi-objective optimization problems. Neural Computing and Applications 34(19), 16255–16273. doi:10.1007/s00521-022-07302-5. Online publication date: 1-Oct-2022.
• (2022) An Ontological Approach for Recommending a Feature Selection Algorithm. Web Engineering, pp. 300–314. doi:10.1007/978-3-031-09917-5_20. Online publication date: 5-Jul-2022.
