Boosting meta-learning with simulated data complexity measures

Published: 01 January 2020

Abstract

Meta-Learning has been widely used in recent years to support the recommendation of the most suitable machine learning algorithm(s) and hyperparameters for new datasets. Traditionally, a meta-base is created containing meta-features extracted from several datasets, along with the performance of a pool of machine learning algorithms when applied to these datasets. The meta-features must describe essential aspects of the dataset and distinguish different problems and solutions. However, for Meta-Learning to be computationally efficient, the extraction of the meta-feature values must also have a low computational cost, given the trade-off between the time spent running all the algorithms and the time required to extract the meta-features. One class of measures with successful results in the characterization of classification datasets is concerned with estimating the underlying complexity of the classification problem. These data complexity measures take into account the overlap between classes imposed by the feature values, the separability of the classes, and the distribution of the instances within the classes. However, extracting these measures from datasets usually has a high computational cost. In this paper, we propose an empirical approach designed to decrease the computational cost of computing the data complexity measures while preserving their descriptive ability. The proposal consists of a novel Meta-Learning system able to predict the values of the data complexity measures for a dataset using simpler meta-features as input. In an extensive set of experiments, we show that the predictive performance achieved by Meta-Learning systems that use the predicted data complexity measures is similar to the performance obtained with the original data complexity measures, while the computational cost involved in their computation is significantly reduced.
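To illustrate the idea described in the abstract, the sketch below shows how such a system could be assembled: a regressor is trained on a meta-base that maps simple, cheap meta-features to complexity-measure values computed offline, and is then used to predict the measures for a new dataset instead of computing them directly. This is a minimal sketch under stated assumptions; the meta-features, measure names, regressor choice, and random data are illustrative placeholders, not the authors' exact pipeline.

```python
# Minimal sketch (not the authors' exact pipeline): train a meta-regressor that
# predicts data complexity measures (e.g., an overlap measure such as F1 and a
# neighborhood measure such as N1) from cheap, simple meta-features
# (e.g., number of instances, attributes, classes, class entropy).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical meta-base: one row per training dataset.
# Columns of X_simple: [n_instances, n_attributes, n_classes, class_entropy]
X_simple = rng.random((200, 4))
# Targets: values of two complexity measures computed offline, once,
# on the training datasets (placeholders here).
y_complexity = rng.random((200, 2))

# One regressor per complexity measure; Random Forest is an illustrative choice.
meta_regressors = []
for j in range(y_complexity.shape[1]):
    reg = RandomForestRegressor(n_estimators=200, random_state=0)
    scores = cross_val_score(reg, X_simple, y_complexity[:, j],
                             scoring="neg_mean_absolute_error", cv=5)
    print(f"measure {j}: CV MAE = {-scores.mean():.3f}")
    meta_regressors.append(reg.fit(X_simple, y_complexity[:, j]))

# For a new dataset, extract only the cheap meta-features and predict the
# complexity measures, avoiding their expensive direct computation.
x_new = rng.random((1, 4))
predicted_measures = [reg.predict(x_new)[0] for reg in meta_regressors]
print("predicted complexity measures:", predicted_measures)
```

In practice, the predicted measures would then serve as meta-features in a downstream Meta-Learning system for algorithm recommendation, which is the setting the paper evaluates.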


Cited By

  • (2024) Data Complexity: A New Perspective for Analyzing the Difficulty of Defect Prediction Tasks. ACM Transactions on Software Engineering and Methodology 33(6), 1–45. DOI: 10.1145/3649596. Online publication date: 27-Jun-2024.
  • (2023) Investigating the Performance of Data Complexity & Instance Hardness Measures as A Meta-Feature in Overlapping Classes Problem. Proceedings of the 2023 7th International Conference on Cloud and Big Data Computing, 1–9. DOI: 10.1145/3616131.3616132. Online publication date: 17-Aug-2023.


        Published In

        Intelligent Data Analysis, Volume 24, Issue 5, 2020, 259 pages

        Publisher

        IOS Press, Netherlands

        Author Tags

        1. Meta-learning
        2. meta-features
        3. complexity measures

        Qualifiers

        • Research-article
