Abstract
In pattern recognition and data mining, feature selection is a preprocessing step that reduces the dimensionality of data by removing redundant, irrelevant, and noisy features before a machine learning task. Identifying the most informative features within a reasonable computational time remains one of the main challenges for existing feature selection methods. This paper introduces a multivariate filter feature selection method based on feature clustering, called interaction-based feature clustering (IFC), which achieves high classification accuracy at low computational cost. In the proposed method, the features are first ranked according to the symmetric uncertainty criterion and are then clustered using their interaction weight as a similarity measure. To evaluate its performance, the IFC algorithm is compared with six well-known multivariate filter methods on sixteen benchmark datasets using three classifiers: SVM, NB, and kNN. For further evaluation, the methods are also compared using the Akaike Information Criterion (AIC) and Pareto front curves. Experimental results show that the IFC algorithm is often more efficient than the comparable methods in terms of classification accuracy and computational time and can be considered a suitable preprocessing method.
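As a rough illustration of the first step described in the abstract, the sketch below ranks discretized features by symmetric uncertainty with the class label, SU(X, Y) = 2·I(X;Y) / (H(X) + H(Y)). This is a minimal, generic sketch and not the authors' IFC implementation; the function names and toy data are our own assumptions, and the interaction-weight clustering step of IFC is not shown.

```python
# Minimal sketch (not the authors' IFC code): symmetric-uncertainty ranking
# of discretized features against a class label, using only NumPy.
import numpy as np

def entropy(x):
    # Shannon entropy (in bits) of a 1-D array of discrete values.
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def joint_entropy(x, y):
    # H(X, Y) estimated from the empirical joint distribution of two discrete arrays.
    pairs = np.stack([x, y], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetric_uncertainty(x, y):
    # SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), with I(X; Y) = H(X) + H(Y) - H(X, Y).
    hx, hy = entropy(x), entropy(y)
    mi = hx + hy - joint_entropy(x, y)
    denom = hx + hy
    return 0.0 if denom == 0 else 2.0 * mi / denom

def rank_features_by_su(X, y):
    # Rank the (discretized) feature columns of X by decreasing SU with the label y.
    scores = np.array([symmetric_uncertainty(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-scores)
    return order, scores[order]

# Toy usage (hypothetical data): 100 samples, 5 discretized features, binary label.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)
X = rng.integers(0, 3, size=(100, 5))
X[:, 2] = y  # make one feature strongly class-informative
order, scores = rank_features_by_su(X, y)
print(order, np.round(scores, 3))
```

In a filter pipeline such as the one the abstract outlines, a ranking of this kind would typically precede the clustering stage, where similarity between features (here, the authors' interaction weight) is used to group and prune them.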
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Esfandiari, A., Khaloozadeh, H. & Farivar, F. Interaction-based clustering algorithm for feature selection: a multivariate filter approach. Int. J. Mach. Learn. & Cyber. 14, 1769–1782 (2023). https://doi.org/10.1007/s13042-022-01726-0