Feature Selection in Cancer Classification: Utilizing Explainable Artificial Intelligence to Uncover Influential Genes in Machine Learning Models
Abstract
1. Introduction
- Application of the SHAP technique as a method for input feature dimensionality reduction.
- Utilization of an XAI method to explain the behavior of classifiers based on the SHAP library in the Python programming language.
- Development of high-performing models using the most influential RNA-seq gene expression values selected by SHAP.
- Analysis of the key genes identified by the SHAP technique.
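The idea behind the first contribution, ranking input features by their Shapley-based attribution and keeping only the top ones, can be illustrated with a small self-contained sketch. This is not the authors' pipeline and does not use the SHAP library: it computes exact Shapley values by enumerating feature coalitions, which is only feasible for a handful of features. The names `shapley_values`, `select_top_k`, and `toy_model`, as well as the toy data, are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values of model(x) relative to a background sample."""
    n = len(x)

    def value(subset):
        # Expected prediction when the features in `subset` are fixed to
        # their values in x and the rest are taken from background rows.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += model(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def select_top_k(model, samples, background, k):
    """Rank features by mean absolute Shapley value and keep the top k."""
    n = len(samples[0])
    importance = [0.0] * n
    for x in samples:
        for i, p in enumerate(shapley_values(model, x, background)):
            importance[i] += abs(p) / len(samples)
    ranked = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return ranked[:k], importance


def toy_model(x):
    # Hypothetical stand-in for a trained classifier score:
    # feature 1 is deliberately irrelevant.
    return 3.0 * x[0] + 0.0 * x[1] - 1.0 * x[2] + 0.5 * x[3]


# Hypothetical "expression" matrix: 4 samples x 4 features.
data = [
    [1.0, 5.0, 2.0, 0.0],
    [2.0, 1.0, 0.0, 1.0],
    [0.0, 3.0, 1.0, 2.0],
    [3.0, 2.0, 3.0, 1.0],
]

selected, importance = select_top_k(toy_model, data, data, k=2)
print("selected feature indices:", selected)
print("mean |Shapley value| per feature:", importance)
```

In this toy setting the irrelevant feature receives zero importance and is discarded. With tens of thousands of genes, exact enumeration is impractical, which is why approximations such as those in the SHAP library are typically used instead.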
2. Related Works
3. Materials and Methods
3.1. Database
3.2. Data Preprocessing
3.3. Machine Learning Algorithms
3.3.1. Decision Trees
3.3.2. Random Forest
3.3.3. Extreme Gradient Boosting—XGBoost
3.3.4. Naive Bayes—NB
3.3.5. Explainable Artificial Intelligence
3.4. Model Training
4. Results
4.1. Training Machine Learning Models to Predict Cancer Types Using RNA-Seq Data Based on the Full Gene List
4.2. Feature Selection Using SHAP and ML Model Performance Evaluation
4.3. SHAP Genes
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Classifier | Metric | Original Features | SHAP-Selected Features |
---|---|---|---|
Decision Tree | Accuracy | 97.60% | 98.69% |
Decision Tree | Precision | 97.74% | 98.74% |
Decision Tree | Recall | 97.70% | 98.70% |
Decision Tree | F1 Score | 97.64% | 98.70% |
Random Forest | Accuracy | 99.40% | 99.76% |
Random Forest | Precision | 99.43% | 99.77% |
Random Forest | Recall | 99.40% | 99.87% |
Random Forest | F1 Score | 99.40% | 99.86% |
XGBoost | Accuracy | 99.34% | 99.64% |
XGBoost | Precision | 99.36% | 99.66% |
XGBoost | Recall | 99.34% | 99.79% |
XGBoost | F1 Score | 99.34% | 99.80% |
Gaussian Naive Bayes | Accuracy | 98.93% | 99.63% |
Gaussian Naive Bayes | Precision | 98.88% | 99.55% |
Gaussian Naive Bayes | Recall | 98.82% | 99.65% |
Gaussian Naive Bayes | F1 Score | 98.84% | 99.60% |
Bernoulli Naive Bayes | Accuracy | 97.94% | 99.22% |
Bernoulli Naive Bayes | Precision | 97.14% | 99.19% |
Bernoulli Naive Bayes | Recall | 97.41% | 99.26% |
Bernoulli Naive Bayes | F1 Score | 97.20% | 99.22% |

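
For readers who want to reproduce the four metrics reported in the table above, the following minimal sketch computes accuracy together with per-class precision, recall, and F1 score averaged across classes. The averaging scheme is an assumption: macro-averaging is used here, and the source does not state which scheme the authors applied. The function name `macro_metrics` and the example labels are hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Guard against division by zero for classes never predicted or absent.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for six samples across three tumor types.
y_true = ["breast", "breast", "lung", "lung", "thyroid", "thyroid"]
y_pred = ["breast", "lung", "lung", "lung", "thyroid", "thyroid"]

metrics = macro_metrics(y_true, y_pred)
for name, score in metrics.items():
    print(f"{name}: {score:.2%}")
```

Under macro-averaging every class contributes equally regardless of its sample count, so accuracy and recall can differ, as they do in several rows of the table.
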
Gene Abbreviation | Gene Name |
---|---|
TG | Thyroglobulin |
CDX1 | Caudal-type homeobox 1 |
SOX17 | SRY-box transcription factor 17 |
HKDC1 | Hexokinase domain-containing 1 |
SFTPA2 | Surfactant protein A2 |
FABP1 | Fatty acid binding protein 1 |
LMX1B | LIM homeobox transcription factor 1 beta |
NAPSA | Napsin A aspartic peptidase |
MEIS1 | Meis homeobox 1 |
TBX4 | T-box transcription factor 4 |
SFTPA1 | Surfactant protein A1 |
GATA3 | GATA binding protein 3 |
EMX2 | Empty spiracles homeobox 2 |
TRPS1 | Transcriptional repressor GATA binding 1 |
FOXA2 | Forkhead box A2 |
HAND2 | Heart and neural crest derivatives expressed 2 |
GPA33 | Glycoprotein A33 |
TFPI | Tissue factor pathway inhibitor |
RPL10AP6 | Ribosomal protein L10a pseudogene 6 |
FGG | Fibrinogen gamma chain |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dalmolin, M.; Azevedo, K.S.; Souza, L.C.d.; de Farias, C.B.; Lichtenfels, M.; Fernandes, M.A.C. Feature Selection in Cancer Classification: Utilizing Explainable Artificial Intelligence to Uncover Influential Genes in Machine Learning Models. AI 2025, 6, 2. https://doi.org/10.3390/ai6010002