
Decision Tree Model Based Gene Selection and Classification for Breast Cancer Risk Prediction

  • Conference paper
Smart Applications and Data Analysis (SADASC 2020)

Abstract

Breast cancer is considered the most frequently diagnosed cancer among women worldwide and is ranked second after lung cancer. Early diagnosis may increase the chance of early treatment, which in turn can increase the chance of survival for women with this disease. Recently, microarray technology has created a great opportunity to make cancer diagnosis faster and easier. However, the most common challenge with gene expression data is high dimensionality, i.e., thousands of genes measured for only a few tens of patients, which makes any prediction approach difficult to apply. To address this challenge, a C5.0-based feature selection approach is proposed. The main strength of our approach lies in the combination of two feature selection techniques: the Fisher score-based filter method and the built-in feature selection ability of C5.0. The classification algorithms used to assess our approach in terms of prediction accuracy are Artificial Neural Networks, C5.0 Decision Tree, Logistic Regression, and Support Vector Machine. Compared to state-of-the-art models, our approach can predict breast cancer with the highest accuracy using a strict minimum of genes.
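
The two-stage idea in the abstract (a Fisher score filter followed by the features a decision tree actually splits on) can be illustrated with a minimal sketch. This is not the authors' implementation and it is not C5.0: the tree below is a toy stand-in that uses plain information gain. The function names (`fisher_scores`, `top_k_features`, `tree_used_features`), the cutoff `k`, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def fisher_scores(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall_mean = sum(col) / len(col)
        between, within = 0.0, 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            mean_c = sum(vals) / len(vals)
            between += len(vals) * (mean_c - overall_mean) ** 2
            within += sum((v - mean_c) ** 2 for v in vals)
        if within > 0:
            scores.append(between / within)
        else:
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Stage 1 (filter): indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def tree_used_features(X, y, features):
    """Stage 2 (embedded): grow a small entropy-based tree restricted to
    `features` and return the set of feature indices it splits on.
    Assumption: plain information gain, a simplification of C5.0."""
    if len(set(y)) <= 1:
        return set()
    base, best = entropy(y), None  # best = (gain, feature, threshold)
    for j in features:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if best is None or base - weighted > best[0]:
                best = (base - weighted, j, t)
    if best is None or best[0] <= 1e-12:
        return set()
    _, j, t = best
    used = {j}
    for keep_left in (True, False):
        idx = [i for i, row in enumerate(X) if (row[j] <= t) == keep_left]
        used |= tree_used_features([X[i] for i in idx], [y[i] for i in idx], features)
    return used


# Toy data (invented): 6 "patients" x 4 "genes"; only gene 0 separates the classes.
X = [
    [5.1, 0.2, 1.0, 3.3],
    [4.9, 0.1, 1.2, 3.1],
    [5.0, 0.3, 0.9, 3.4],
    [1.0, 0.2, 1.1, 3.2],
    [1.2, 0.1, 1.0, 3.3],
    [0.9, 0.3, 1.2, 3.1],
]
y = [1, 1, 1, 0, 0, 0]

selected = top_k_features(X, y, k=2)       # filter stage keeps 2 candidate genes
used = tree_used_features(X, y, selected)  # tree keeps only the genes it splits on
print("filtered genes:", selected)
print("genes used by the tree:", sorted(used))
```

The filter stage cheaply discards genes whose class means barely differ, and the tree stage then reduces the surviving candidates to the few that contribute to a split, which mirrors the "strict minimum of genes" goal stated in the abstract.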



Author information

Corresponding author

Correspondence to Mohammed Hamim.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Hamim, M., El Moudden, I., Moutachaouik, H., Hain, M. (2020). Decision Tree Model Based Gene Selection and Classification for Breast Cancer Risk Prediction. In: Hamlich, M., Bellatreche, L., Mondal, A., Ordonez, C. (eds) Smart Applications and Data Analysis. SADASC 2020. Communications in Computer and Information Science, vol 1207. Springer, Cham. https://doi.org/10.1007/978-3-030-45183-7_12

  • DOI: https://doi.org/10.1007/978-3-030-45183-7_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-45182-0

  • Online ISBN: 978-3-030-45183-7

  • eBook Packages: Computer Science, Computer Science (R0)
