
Interaction-based clustering algorithm for feature selection: a multivariate filter approach

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

In pattern recognition and data mining, feature selection is a preprocessing step in which the dimensionality of the data is reduced by removing redundant, irrelevant, and noisy features before a machine learning task. Identifying the most informative features within a reasonable computational time is one of the central challenges for existing feature selection methods. This paper introduces a multivariate filter feature selection method based on a feature clustering technique, called interaction-based feature clustering (IFC), which is computationally inexpensive while achieving high classification accuracy. In the proposed method, the features are first ranked by the symmetric uncertainty criterion, and the features are then clustered using their interactive weight as a similarity measure. To evaluate its performance, the IFC algorithm is compared with six well-known multivariate filter methods on sixteen benchmark datasets using three classifiers: SVM, NB, and kNN. For further evaluation, a comparison is also made using the Akaike Information Criterion (AIC) and Pareto front curves. Experimental results show that the IFC algorithm is often more efficient than the comparable methods in terms of classification accuracy and computational time, and can be considered a suitable method for the preprocessing step.
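The symmetric uncertainty criterion used for the initial ranking is a normalised form of mutual information, SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)). As a minimal sketch of that criterion for discrete features (not the authors' implementation, and the example arrays are hypothetical):

```python
from collections import Counter
from math import log2

def entropy(xs):
    # Shannon entropy (in bits) of a discrete sequence
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def symmetric_uncertainty(xs, ys):
    # SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)), normalised to [0, 1]
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    return 2.0 * mutual_information(xs, ys) / (hx + hy)

# A feature identical to the class labels scores SU = 1;
# an uninformative feature scores close to 0.
y = [0, 0, 1, 1, 0, 1]
f_copy = list(y)
f_noise = [0, 1, 0, 1, 1, 0]
print(symmetric_uncertainty(f_copy, y))   # 1.0
print(symmetric_uncertainty(f_noise, y))
```

Ranking all features by this score against the class variable gives the first stage of a filter such as the one described above; the clustering stage would then group features by a pairwise similarity (here, the paper's interactive weight, whose exact form is given in the full text).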



Author information


Corresponding author

Correspondence to Hamid Khaloozadeh.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Esfandiari, A., Khaloozadeh, H. & Farivar, F. Interaction-based clustering algorithm for feature selection: a multivariate filter approach. Int. J. Mach. Learn. & Cyber. 14, 1769–1782 (2023). https://doi.org/10.1007/s13042-022-01726-0
