Abstract
Pulmonary Embolism (PE) is a life-threatening condition with no specific clinical symptoms, and Computed Tomography Angiography (CTA) is used for its diagnosis. Clinical decision support scoring systems based on PE risk factors, such as Wells and rGeneva, have been developed to estimate the pre-test probability, but they are underused, which leads to continued overuse of CTA imaging. This diagnostic study proposes a novel approach for efficient management of PE diagnosis: a two-step interconnected machine learning framework that analyzes patients' Electronic Health Records (EHR) data directly. In the first step, we performed feature importance analysis with LightGBM, selected because of its superior performance in PE prediction. In the second step, four state-of-the-art machine learning methods were applied to predict PE from the features identified in the first step, enabling a swift and accurate pre-test diagnosis. Patient data were collected from different departments of Sina Educational Hospital, affiliated with Tehran University of Medical Sciences, Iran. Overall, the Ridge classification method achieved the best performance, with an F1 score of 0.96. Extensive experiments showed that this diagnostic process is effective and simple in comparison with the existing scoring systems. The main strength of the approach lies in PE disease management: it could reduce avoidable invasive CTA imaging and serve as a primary prognosis of PE, thereby assisting the healthcare system, clinicians, and patients by reducing costs and improving treatment quality and patient satisfaction.
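The F1 score reported above is the harmonic mean of precision and recall, which can equivalently be written as 2TP / (2TP + FP + FN). The following is a minimal sketch of how such a figure is computed from predicted and true labels. The function name and the ten example labels are illustrative assumptions; they are not the study's code or data.

```python
def f1_score(true_labels, predicted_labels, positive=1):
    """Return the F1 score for the positive class.

    Uses the identity F1 = 2*TP / (2*TP + FP + FN), which is equal to the
    harmonic mean of precision and recall.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # No positive cases and no positive predictions: define F1 as 0.0.
        return 0.0
    return 2 * tp / denominator


# Hypothetical labels for ten patients (1 = PE, 0 = no PE); NOT study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]

# Here TP = 4, FP = 1, FN = 1, so F1 = 8 / 10.
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Because F1 ignores true negatives, it is often preferred over plain accuracy when the positive class (here, confirmed PE) is the one of clinical interest.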
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships between the authors and any organization that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Laffafchi, S., Ebrahimi, A. & Kafan, S. Efficient management of pulmonary embolism diagnosis using a two-step interconnected machine learning model based on electronic health records data. Health Inf Sci Syst 12, 17 (2024). https://doi.org/10.1007/s13755-024-00276-9
DOI: https://doi.org/10.1007/s13755-024-00276-9