A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care

Gao, Junyi; Zhu, Yinghao; Wang, Wenqing; Wang, Yasha; Tang, Wen; Harrison, Ewen M.; Ma, Liantao

arXiv:2209.07805 (cs)

COVID-19 e-print

Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field.

[Submitted on 16 Sep 2022 (v1), last revised 23 Jan 2024 (this version, v4)]

Abstract: The COVID-19 pandemic has placed a heavy burden on healthcare systems worldwide and caused huge social disruption and economic loss. Many deep learning models have been proposed for clinical predictive tasks, such as mortality prediction for COVID-19 patients in intensive care units, using Electronic Health Record (EHR) data. Despite their initial success in certain clinical applications, there is currently a lack of benchmarking results that would allow a fair comparison and the selection of the optimal model for clinical use. Furthermore, there is a discrepancy between the formulation of traditional prediction tasks and real-world clinical practice in intensive care. To fill these gaps, we propose two clinical prediction tasks for COVID-19 patients in intensive care units: Outcome-specific length-of-stay prediction and Early mortality prediction. The two tasks are adapted from the naive length-of-stay and mortality prediction tasks to accommodate clinical practice for COVID-19 patients. We propose fair, detailed, open-source data-preprocessing pipelines and evaluate 17 state-of-the-art predictive models on the two tasks, including 5 machine learning models, 6 basic deep learning models and 6 deep learning predictive models specifically designed for EHR data. We provide fair, reproducible benchmarking results using data from two real-world COVID-19 EHR datasets. One dataset is publicly available without any inquiry, and the other can be accessed on request. We deploy all experiment results and models on an online platform, where clinicians and researchers can also upload their own data and get quick prediction results from our trained models. We hope our efforts can further facilitate deep learning and machine learning research for COVID-19 predictive modeling.

Comments:	Junyi Gao, Yinghao Zhu and Wenqing Wang contributed equally
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2209.07805 [cs.LG]
	(or arXiv:2209.07805v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2209.07805

Submission history

From: Junyi Gao
[v1] Fri, 16 Sep 2022 09:09:15 UTC (6,966 KB)
[v2] Wed, 19 Oct 2022 20:17:59 UTC (6,966 KB)
[v3] Wed, 7 Jun 2023 21:03:43 UTC (8,104 KB)
[v4] Tue, 23 Jan 2024 17:14:20 UTC (8,438 KB)