Feature selection and risk prediction for patients with coronary artery disease using data mining

  • Original Article
Medical & Biological Engineering & Computing

A Correction to this article was published on 19 January 2022

This article has been updated

Abstract

Coronary artery disease (CAD) is an important cause of mortality across the globe. Early risk prediction of CAD could reduce the death rate by allowing early and targeted treatment. In healthcare, some studies have applied data mining techniques and machine learning algorithms to CAD risk prediction using patient data collected by hospitals and medical centers. However, most of these studies used all the attributes in the datasets, which might reduce the performance of prediction models because of data redundancy. The objective of this research is to identify significant features for building models that predict the risk level of patients with CAD. Significant features were selected using three methods: the Chi-squared test, recursive feature elimination, and an embedded decision tree. The Synthetic Minority Over-sampling Technique (SMOTE) was applied to address class imbalance in the dataset. Prediction models were built from the identified significant features using eight machine learning algorithms and Acute Coronary Syndrome (ACS) datasets provided by the National Cardiovascular Disease Database (NCVD) Malaysia. The models were evaluated and compared using six performance evaluation metrics, and the top-performing models achieved an AUC of more than 90%.
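
Three of the building blocks named in the abstract (Chi-squared feature scoring, SMOTE-style oversampling, and AUC) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `chi2_score`, `smote`, and `auc` are hypothetical, the toy values are invented rather than taken from the NCVD-ACS dataset, and a real study would more likely rely on an established machine learning library.

```python
import math
import random
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-squared statistic between a categorical feature and the class label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for y, y_count in label_totals.items():
            expected = f_count * y_count / n
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority points by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def auc(scores, labels):
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
feature_a = [1, 1, 1, 0, 0, 0, 0, 0]  # aligned with the label
feature_b = [1, 0, 1, 0, 1, 0, 1, 0]  # weakly related to the label
print(chi2_score(feature_a, labels), chi2_score(feature_b, labels))

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote(minority, n_new=3, k=2))

print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

A higher Chi-squared score suggests a stronger association between a feature and the outcome, which is the basis for ranking features under that method. In a typical workflow, synthetic samples are generated only from the training split so that evaluation metrics such as AUC are computed on real, untouched cases.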

Acknowledgments

This work was supported by the Ministry of Education Malaysia (Higher Education)’s Fundamental Research Grant Scheme (FRGS), Project Code: FP057-2017A. The authors would like to thank the Governance Board member of the Malaysian National Cardiovascular Disease Database (NCVD) Registry for providing us with the Acute Coronary Syndrome (ACS) dataset to be used for the research. Acknowledgement also goes to the Ministry of Health Malaysia and National Heart Association of Malaysia for funding the NCVD Registry database. Thanks to all the NCVD investigators and to all source data providers for their contribution to this registry.

Author information

Corresponding author

Correspondence to Yin Kia Chiam.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 9 Description of attributes from NCVD-ACS Dataset

About this article

Cite this article

Md Idris, N., Chiam, Y.K., Varathan, K.D. et al. Feature selection and risk prediction for patients with coronary artery disease using data mining. Med Biol Eng Comput 58, 3123–3140 (2020). https://doi.org/10.1007/s11517-020-02268-9

  • DOI: https://doi.org/10.1007/s11517-020-02268-9
