Abstract
Breast cancer is the most frequently diagnosed cancer among women worldwide and ranks second after lung cancer. Early diagnosis allows earlier treatment, which can improve the chance of survival for women suffering from this disease. Recently, microarray technology has made it possible to diagnose cancer faster and more easily. However, the most common challenge of gene expression data is its high dimensionality, i.e., thousands of genes measured on only a few tens of patients, which makes any prediction approach difficult to apply. To address this challenge, we propose a C5.0-based feature selection approach. The main strength of our approach lies in the combination of two feature selection techniques: the Fisher score-based filter method and the built-in feature selection ability of C5.0. The classification algorithms used to assess our approach in terms of prediction accuracy are Artificial Neural Networks, C5.0 Decision Tree, Logistic Regression, and Support Vector Machine. Compared to the state-of-the-art models, our approach can predict breast cancer with the highest accuracy using a minimal number of genes.
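The Fisher score filter mentioned above can be illustrated with a minimal sketch. It assumes the standard definition of the score for gene j, namely the between-class scatter sum_k n_k (mu_jk - mu_j)^2 divided by the within-class scatter sum_k n_k sigma_jk^2, where n_k is the number of samples in class k. The function names, the toy expression matrix, and the choice of k are hypothetical and are not taken from the chapter; this is not the authors' implementation.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher score per column (gene) of X, given class labels y.

    X is a list of samples, each a list of expression values.
    """
    n_features = len(X[0])

    # Group the sample rows by class label.
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)

        between = 0.0  # between-class scatter
        within = 0.0   # within-class scatter
        for rows in rows_by_class.values():
            values = [row[j] for row in rows]
            n_k = len(values)
            mean_k = sum(values) / n_k
            var_k = sum((v - mean_k) ** 2 for v in values) / n_k
            between += n_k * (mean_k - overall_mean) ** 2
            within += n_k * var_k

        if within > 0:
            scores.append(between / within)
        else:
            # Assumption: a gene with zero within-class variance is scored
            # infinity if the class means differ, and 0 otherwise.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Return the indices of the k genes with the highest Fisher score."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 6 samples, 3 genes, 2 classes.
# Only gene 0 separates the two classes.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 3.0, 0.1],
    [0.9, 4.0, 0.3],
    [3.0, 5.0, 0.2],
    [3.1, 3.0, 0.1],
    [2.9, 4.0, 0.3],
]
y = [0, 0, 0, 1, 1, 1]

scores = fisher_scores(X, y)
selected = top_k_features(X, y, k=1)
```

In a two-stage scheme like the one the abstract describes, the genes kept by this filter would then be passed to a decision tree, and the genes the tree actually uses in its splits would form the final subset. That second stage is not shown here because the abstract gives no details of it.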
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Hamim, M., El Moudden, I., Moutachaouik, H., Hain, M. (2020). Decision Tree Model Based Gene Selection and Classification for Breast Cancer Risk Prediction. In: Hamlich, M., Bellatreche, L., Mondal, A., Ordonez, C. (eds) Smart Applications and Data Analysis. SADASC 2020. Communications in Computer and Information Science, vol 1207. Springer, Cham. https://doi.org/10.1007/978-3-030-45183-7_12
DOI: https://doi.org/10.1007/978-3-030-45183-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45182-0
Online ISBN: 978-3-030-45183-7
eBook Packages: Computer Science, Computer Science (R0)