Abstract
Metalearning has been widely used in recent years to recommend machine learning algorithms for new problems based on past experience. The first step is the creation of a metabase, or metadataset, containing metafeatures extracted from several datasets along with the performance of a pool of candidate algorithms. The next step is the induction of machine learning metamodels using the metabase as input. These models can recommend the most suitable algorithms for new datasets based on their metafeature values. An effective metalearning system must employ metafeatures that characterize essential aspects of the datasets while also distinguishing different problems and solutions. The characterization process should also have a low computational cost; otherwise, the recommendation system can be replaced by a standard trial-and-error approach. This paper proposes the use of an unsupervised correlation-based feature selection strategy to identify a reduced subset of metafeatures for metalearning systems. Empirically, the predictive performance achieved by metalearning systems using the subset of selected metafeatures is similar to or better than the performance obtained using the whole set of metafeatures. In addition, a noteworthy reduction in the number of metafeatures needed is observed, implying a reduction in computational cost.
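The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one common unsupervised correlation-based filter, not the authors' method. It assumes a metabase stored as a NumPy array with one row per dataset and one column per metafeature, uses absolute Pearson correlation, and keeps a metafeature only if it is not strongly correlated with any metafeature already kept. The function name `select_uncorrelated`, the threshold of 0.9, and the synthetic metafeature names are all hypothetical choices made for illustration.

```python
import numpy as np


def select_uncorrelated(metabase, names, threshold=0.9):
    """Greedy unsupervised filter (illustrative assumption, not the paper's method).

    A metafeature is kept only if its absolute Pearson correlation with every
    previously kept metafeature is below `threshold`. No algorithm performance
    values are used, which is what makes the filter unsupervised.
    """
    # Columns are metafeatures, rows are datasets.
    corr = np.abs(np.corrcoef(metabase, rowvar=False))
    kept = []
    for j in range(metabase.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Synthetic metabase: 50 datasets, 4 metafeatures, two of them redundant.
rng = np.random.default_rng(0)
n_datasets = 50
a = rng.normal(size=n_datasets)
b = rng.normal(size=n_datasets)
metabase = np.column_stack([a, 2.0 * a + 1.0, b, -b])
names = ["mf_a", "mf_a_scaled", "mf_b", "mf_b_negated"]

selected = select_uncorrelated(metabase, names)
print(selected)
```

In this toy case the two redundant columns are dropped and `mf_a` and `mf_b` remain. Fewer retained metafeatures means fewer characterization measures to compute for each new dataset, which is the cost reduction the abstract refers to. A rank correlation could be substituted for Pearson without changing the structure of the filter.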
Acknowledgements
The authors would like to thank the São Paulo Research Foundation (FAPESP), grant 2013/07375-0 (CEPID CeMEAI).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Rivolli, A., Garcia, L.P.F., Lorena, A.C., de Carvalho, A.C.P.L.F. (2021). A Study of the Correlation of Metafeatures Used for Metalearning. In: Rojas, I., Joya, G., Català, A. (eds) Advances in Computational Intelligence. IWANN 2021. Lecture Notes in Computer Science, vol 12861. Springer, Cham. https://doi.org/10.1007/978-3-030-85030-2_39
DOI: https://doi.org/10.1007/978-3-030-85030-2_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85029-6
Online ISBN: 978-3-030-85030-2
eBook Packages: Computer Science, Computer Science (R0)