An Efficient Cancer Classification Model Using Microarray and High-Dimensional Data

Published: 01 January 2021

Abstract

Cancer is one of the leading causes of death worldwide. One of the most effective tools for cancer diagnosis, prognosis, and treatment is expression profiling based on microarray gene data. For each data point (sample), gene expression data typically contain tens of thousands of genes; as a result, the data are large-scale, high-dimensional, and highly redundant. The classification of gene expression profiles is considered an NP-hard problem, and feature (gene) selection is one of the most effective ways to handle it. This paper presents a hybrid cancer classification approach that combines several machine learning techniques: Pearson’s correlation coefficient as a correlation-based feature selector and reducer, a decision tree classifier, which is easy to interpret and nonparametric, and grid search with cross-validation (Grid Search CV) to optimize the maximum-depth hyperparameter. Seven standard microarray cancer datasets are used to evaluate the model. To assess how well the proposed model identifies the most informative and relevant features, various performance measures are employed, including classification accuracy, specificity, sensitivity, F1-score, and AUC. The results show that the suggested strategy greatly reduces the number of genes required for classification, selects the most informative features, and increases classification accuracy.
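As a rough illustration of the pipeline described in the abstract, the sketch below applies a Pearson-correlation gene filter and then tunes a decision tree's maximum depth with grid search cross-validation. It assumes a scikit-learn implementation; the dataset file name, the "label" column, the 0.3 correlation threshold, and the max_depth search range are hypothetical placeholders, not values reported in the paper.

import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical microarray dataset: rows are samples, columns are genes,
# plus a "label" column holding the cancer class (names are placeholders).
data = pd.read_csv("microarray.csv")
X = data.drop(columns=["label"])
y = pd.factorize(data["label"])[0]  # numeric class codes for correlation

# Step 1: correlation-based gene selection. Keep genes whose absolute
# Pearson correlation with the class exceeds an illustrative 0.3 threshold.
corr = X.apply(lambda gene: np.corrcoef(gene, y)[0, 1])
X_reduced = X.loc[:, corr.abs() > 0.3]

# Step 2: decision tree classifier with Grid Search CV tuning max_depth.
X_train, X_test, y_train, y_test = train_test_split(
    X_reduced, y, test_size=0.3, stratify=y, random_state=42
)
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": list(range(2, 11))},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)
print("Best max_depth:", grid.best_params_["max_depth"])
print("Test accuracy:", grid.best_estimator_.score(X_test, y_test))

The other metrics reported in the paper (specificity, sensitivity, F1-score, and AUC) could be computed in the same way from the predictions of grid.best_estimator_ on the held-out samples.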

Published In

Computational Intelligence and Neuroscience, Volume 2021
8452 pages
ISSN: 1687-5265
EISSN: 1687-5273
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

Hindawi Limited

London, United Kingdom

Qualifiers

  • Research-article
