Abstract
Coronary artery disease (CAD) is a major cause of mortality worldwide. Early risk prediction of CAD could reduce the death rate by enabling early, targeted treatment. In healthcare, several studies have applied data mining techniques and machine learning algorithms to CAD risk prediction using patient data collected by hospitals and medical centers. However, most of these studies used all the attributes in their datasets, which may reduce the performance of prediction models because of data redundancy. The objective of this research is to identify significant features for building models that predict the risk level of patients with CAD. Significant features were selected using three methods: the Chi-squared test, recursive feature elimination, and an embedded decision tree. The Synthetic Minority Over-sampling Technique (SMOTE) was applied to address class imbalance in the dataset. The prediction models were built from the identified significant features using eight machine learning algorithms on Acute Coronary Syndrome (ACS) datasets provided by the National Cardiovascular Disease Database (NCVD) Malaysia. The models were evaluated and compared using six performance evaluation metrics, and the top-performing models achieved an AUC of more than 90%.
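To make the oversampling step concrete, the following is a minimal sketch of the SMOTE idea in plain Python: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. This is an illustration under stated assumptions, not the implementation used in the study. The function names `smote` and `balance`, the two-feature toy records, the binary labels, and the choice of k are all hypothetical, and the base sample is drawn at random rather than iterating over every minority sample as in the original SMOTE formulation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority neighbours."""
    rng = random.Random(seed)
    # A sample cannot have more neighbours than there are other minority samples.
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base sample (Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes are the same size.

    Assumes a binary problem: every label other than minority_label is
    treated as the majority class.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new, y + [minority_label] * len(new)


# Hypothetical toy records (two numeric features each); label 1 is the minority class.
X = [[60.0, 1.2], [55.0, 0.9], [70.0, 1.5], [65.0, 1.1], [58.0, 1.0],
     [72.0, 2.8], [75.0, 3.1], [80.0, 3.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = balance(X, y, minority_label=1, k=2)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class. In practice, oversampling of this kind is normally applied to the training split only, so that synthetic records do not leak into the data used for evaluation.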
Change history
19 January 2022
A Correction to this paper has been published: https://doi.org/10.1007/s11517-022-02506-2
Acknowledgments
This work was supported by the Ministry of Education Malaysia (Higher Education)’s Fundamental Research Grant Scheme (FRGS), Project Code: FP057-2017A. The authors would like to thank the Governance Board member of the Malaysian National Cardiovascular Disease Database (NCVD) Registry for providing us with the Acute Coronary Syndrome (ACS) dataset to be used for the research. Acknowledgement also goes to the Ministry of Health Malaysia and National Heart Association of Malaysia for funding the NCVD Registry database. Thanks to all the NCVD investigators and to all source data providers for their contribution to this registry.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Md Idris, N., Chiam, Y.K., Varathan, K.D. et al. Feature selection and risk prediction for patients with coronary artery disease using data mining. Med Biol Eng Comput 58, 3123–3140 (2020). https://doi.org/10.1007/s11517-020-02268-9
DOI: https://doi.org/10.1007/s11517-020-02268-9