Abstract
Feature selection plays an important role in machine learning and data mining. Removing irrelevant features can increase model accuracy and reduce computational cost. However, selecting important features is not simple, because no single feature selection algorithm performs well on every dataset of interest. This paper addresses the recommendation of a feature selection algorithm based on dataset characteristics and quality. The research uses three types of dataset characteristics along with data quality metrics. The main contribution of the work is the use of Semantic Web techniques to develop a novel system that supports robust feature selection algorithm recommendations. The system’s strength lies in assisting users of machine learning algorithms by suggesting more relevant feature selection algorithms for a dataset, using an ontology called Feature Selection algorithm recommendation based on Data Characteristics and Quality (FSDCQ). Results are generated using six feature selection algorithms and four types of classifiers on ten datasets from the UCI repository. Recommendations take the form: “Feature selection algorithm X is recommended for dataset i, as it performed better on dataset j, which is similar to dataset i in terms of class overlap 0.3, label noise 0.2, completeness 0.9, conciseness 0.8 units”. While the domain-specific ontology FSDCQ was created to aid algorithm recommendation for feature selection, it is easily applicable to other meta-learning scenarios.
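The recommendation pattern quoted in the abstract (recommend for dataset i whatever performed best on the most similar dataset j) can be illustrated with a minimal sketch. This is not the authors' ontology-based system: it swaps in a plain nearest-neighbour lookup using Euclidean distance over the four quality metrics named in the abstract. The class `DatasetProfile`, the functions `distance` and `recommend`, the dataset and algorithm names, and every metric value other than the four figures quoted for dataset j are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class DatasetProfile:
    """Hypothetical container for the four metrics named in the abstract."""
    name: str
    class_overlap: float
    label_noise: float
    completeness: float
    conciseness: float
    best_algorithm: Optional[str] = None  # known only for already evaluated datasets

    def vector(self) -> Tuple[float, float, float, float]:
        return (self.class_overlap, self.label_noise,
                self.completeness, self.conciseness)


def distance(a: DatasetProfile, b: DatasetProfile) -> float:
    """Euclidean distance between two metric vectors (an assumed similarity measure)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a.vector(), b.vector())))


def recommend(target: DatasetProfile,
              known: Sequence[DatasetProfile]) -> Tuple[Optional[str], str]:
    """Return (algorithm, name of most similar dataset) for the target dataset."""
    nearest = min(known, key=lambda d: distance(target, d))
    return nearest.best_algorithm, nearest.name


# Illustrative data: dataset_j uses the metric values quoted in the abstract;
# everything else is made up for the example.
known = [
    DatasetProfile("dataset_j", 0.3, 0.2, 0.9, 0.8, best_algorithm="algorithm_X"),
    DatasetProfile("dataset_k", 0.7, 0.05, 0.6, 0.95, best_algorithm="algorithm_Y"),
]
target = DatasetProfile("dataset_i", 0.28, 0.22, 0.88, 0.81)

algorithm, similar_to = recommend(target, known)
print(f"Feature selection algorithm {algorithm} is recommended for "
      f"{target.name}, as it performed better on {similar_to}.")
```

In the paper's setting the similarity judgement and the explanation are driven by the FSDCQ ontology; the distance function here only stands in for that step so that the shape of the recommendation is visible.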
Acknowledgements
This publication has emanated from research supported in part by a grant from Science Foundation Ireland under Grant number 18/CRT/6183. For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Nayak, A., Božić, B., Longo, L. (2022). An Ontological Approach for Recommending a Feature Selection Algorithm. In: Di Noia, T., Ko, IY., Schedl, M., Ardito, C. (eds) Web Engineering. ICWE 2022. Lecture Notes in Computer Science, vol 13362. Springer, Cham. https://doi.org/10.1007/978-3-031-09917-5_20
DOI: https://doi.org/10.1007/978-3-031-09917-5_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-09916-8
Online ISBN: 978-3-031-09917-5
eBook Packages: Computer Science, Computer Science (R0)