research-article

A novel grey wolf optimization algorithm based on geometric transformations for gene selection and cancer classification

Published: 21 September 2023

Abstract

Cancer classification based on microarray data plays an important role in cancer diagnosis and detection. However, microarray data contain a huge number of genes but only a small number of samples, and they are also nonlinear and noisy, so the dimensionality of the data must be reduced. An effective way of doing this would help biologists and medical researchers. This paper proposes a new bio-inspired algorithm for gene selection in cancer classification, called the Binary Grey Wolf Optimization Algorithm (BGWOA), which hybridizes Minimum Redundancy-Maximum Relevance (MRMR) with a novel binary grey wolf algorithm. BGWOA has two stages. The first stage applies MRMR as a pre-filter to obtain a set of relevant genes, which reduces the dimensionality of the datasets. The second stage uses a new binary grey wolf algorithm that updates the positions of the grey wolves with two concepts from geometry, direct similarity and the centroid, in order to explore and exploit the search space. Candidate solutions are evaluated with a fitness function that depends on the classification performance of a support vector machine (SVM) evaluated with leave-one-out cross-validation (LOOCV) and on the rate of unselected genes. The goal of this second stage is to identify the most relevant subset of genes among those retained in the first stage. Eight microarray datasets were used to evaluate the proposed method and compare it with other existing algorithms. The experimental results show higher classification accuracy with fewer genes than many recently published algorithms. Specifically, the proposed method achieves 100% classification accuracy on five of the reference datasets using between 12 and 25 genes. These results indicate that the proposed approach is promising.
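
The first stage's MRMR pre-filter can be illustrated with a small greedy sketch. It assumes the expression values have already been discretized (for example into low/normal/high levels), measures mutual information in bits, and scores each candidate gene by its relevance to the class label minus its mean redundancy with the genes already chosen (the difference, or MID, form of MRMR). The function names `mutual_information` and `mrmr_select` and the parameter `k` are hypothetical; the abstract does not state which MRMR variant, discretization, or number of retained genes the authors use.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))  # counts of (x value, y value) pairs
    px, py = Counter(x), Counter(y)  # marginal counts
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_select(genes, labels, k):
    """Greedy MRMR, difference (MID) form: relevance minus mean redundancy.

    genes:  list of gene columns, each a list of discretized expression levels
    labels: class label of each sample
    k:      number of genes to keep (hypothetical parameter)
    Returns the indices of the selected genes in the order they were picked.
    """
    relevance = [mutual_information(g, labels) for g in genes]
    # Start with the single most relevant gene (the first one wins ties).
    selected = [max(range(len(genes)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(genes)):
        best_j, best_score = None, float("-inf")
        for j, gene in enumerate(genes):
            if j in selected:
                continue
            # Mean mutual information with the genes already chosen.
            redundancy = sum(mutual_information(gene, genes[s]) for s in selected) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    gene_a = [0, 0, 0, 1, 1, 1, 1, 1]  # fairly relevant
    gene_c = [0, 0, 0, 0, 0, 0, 1, 1]  # less relevant, less redundant with gene_a
    # Index 1 is an exact copy of gene_a, so it is highly redundant.
    print(mrmr_select([gene_a, gene_a, gene_c], labels, k=2))  # [0, 2]
```

In the toy data, a ranking by relevance alone would keep the duplicated gene at index 1, whereas the redundancy term steers the selection to the less relevant but non-redundant gene at index 2.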

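The fitness evaluation described in the abstract can be sketched as a weighted sum of leave-one-out accuracy and the fraction of genes left out, where a candidate solution is a binary mask over the genes kept by the pre-filter (1 = gene selected) and higher fitness is better. Two assumptions should be stated plainly: the weight `alpha` (0.9 below) is a hypothetical value, since the abstract does not give the authors' weighting, and a 1-nearest-neighbour classifier stands in for the SVM so that the sketch runs with the Python standard library only. The names `loocv_accuracy_1nn` and `fitness` are illustrative.

```python
from math import dist


def loocv_accuracy_1nn(samples, labels, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    Only genes whose mask bit is 1 are used. The 1-NN rule is a stand-in
    for the SVM named in the abstract, chosen so the sketch needs no libraries.
    """
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty gene subset cannot classify anything
    projected = [[row[i] for i in cols] for row in samples]
    correct = 0
    for i, x in enumerate(projected):
        # Hold sample i out and predict it from its nearest remaining neighbour.
        nearest = min((j for j in range(len(projected)) if j != i),
                      key=lambda j: dist(x, projected[j]))
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / len(projected)


def fitness(samples, labels, mask, alpha=0.9):
    """Hypothetical weighted fitness: alpha * accuracy + (1 - alpha) * unselected rate."""
    accuracy = loocv_accuracy_1nn(samples, labels, mask)
    unselected_rate = 1 - sum(mask) / len(mask)
    return alpha * accuracy + (1 - alpha) * unselected_rate


if __name__ == "__main__":
    # Toy data: 4 samples x 3 genes; only gene 0 separates the two classes.
    samples = [[0.1, 5.0, 3.0], [0.2, 1.0, 3.1], [0.9, 4.8, 2.9], [1.0, 1.2, 3.0]]
    labels = [0, 0, 1, 1]
    for mask in ([1, 0, 0], [0, 1, 0], [1, 1, 1]):
        print(mask, round(fitness(samples, labels, mask), 3))
```

On the toy data, the mask that keeps only the first gene scores about 0.967, and the masks using the noisy second gene or all three genes score lower. With scikit-learn installed, the stand-in accuracy could instead be computed as `cross_val_score(SVC(), X[:, mask], y, cv=LeaveOneOut()).mean()`, where `X` is a NumPy array and `mask` a boolean array.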

Published In

The Journal of Supercomputing, Volume 80, Issue 4
Mar 2024
1310 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 21 September 2023
Accepted: 01 September 2023

Author Tags

1. Gene selection
2. Grey wolf optimization
3. Geometric transformation
4. Direct similarity
5. Centroid
6. Microarray data
7. Cancer classification
8. Bio-inspired algorithms
9. Molecular biology
10. Minimum redundancy-maximum relevance
