Machine learning to predict venous thrombosis in acutely ill medical patients

Tarek Nafee; C Michael Gibson; Ryan Travis; Megan K Yee; Mathieu Kerneis; Gerald Chi; Fahad AlKhalfan; Adrian F Hernandez; Russell D Hull; Ander T Cohen; Robert A Harrington; Samuel Z Goldhaber

doi:10.1002/rth2.12292

Res Pract Thromb Haemost. 2020 Jan 21;4(2):230-237. doi: 10.1002/rth2.12292. eCollection 2020 Feb.

Affiliations

¹ The Cardiovascular Division, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts.
² Duke University, The Duke Clinical Research Institute, Durham, North Carolina.
³ Foothills Hospital, University of Calgary, Calgary, AB, Canada.
⁴ Guy's and St Thomas' Hospitals, London, UK.
⁵ Department of Medicine, Stanford University, Stanford, California.
⁶ Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts.

Abstract

Background: Acutely ill patients at high risk for venous thromboembolism (VTE) may be identified clinically or by use of integer-based scoring systems. These scores have demonstrated modest performance in external data sets.

Objectives: To evaluate the performance of machine learning models compared to the IMPROVE score.

Methods: The APEX trial randomized 7513 acutely ill medical patients to extended-duration betrixaban vs. enoxaparin. A super learner model (ML) incorporating 68 variables was built to predict VTE by combining estimates from 5 families of candidate models. A "reduced" model (rML) was also developed using 16 variables that were thought, a priori, to be associated with VTE. The IMPROVE score was calculated for each patient. Model performance in predicting a composite VTE end point was assessed by discrimination and calibration. The frequency distribution of predicted VTE risks was plotted and divided into tertiles, and VTE risks were compared across tertiles.
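
As an illustration of the evaluation steps described in the Methods, the following minimal Python sketch computes a c-statistic (discrimination) by pairwise concordance and the observed event rate within tertiles of predicted risk. The function names and the toy data are hypothetical and are not taken from the APEX analysis; the authors' actual software and tertile cut points are not stated in this abstract.

```python
from itertools import product


def c_statistic(risks, events):
    """Concordance: share of (event, non-event) pairs in which the patient
    with the event has the higher predicted risk. Ties count as 0.5."""
    pos = [r for r, e in zip(risks, events) if e == 1]
    neg = [r for r, e in zip(risks, events) if e == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p, n in product(pos, neg):
        if p > n:
            score += 1.0
        elif p == n:
            score += 0.5
    return score / (len(pos) * len(neg))


def tertile_event_rates(risks, events):
    """Sort patients by predicted risk, split them into three near-equal
    groups, and return the observed event rate in each group."""
    if len(risks) < 3:
        raise ValueError("need at least three patients")
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    n = len(order)
    cuts = [0, n // 3, (2 * n) // 3, n]
    rates = []
    for lo, hi in zip(cuts, cuts[1:]):
        group = order[lo:hi]
        rates.append(sum(events[i] for i in group) / len(group))
    return rates


# Hypothetical toy data: predicted VTE risk and observed outcome (1 = VTE).
toy_risks = [0.02, 0.03, 0.05, 0.06, 0.08, 0.10, 0.12, 0.15, 0.20]
toy_events = [0, 0, 0, 0, 1, 0, 0, 1, 1]

print(c_statistic(toy_risks, toy_events))
print(tertile_event_rates(toy_risks, toy_events))
```

A c-statistic of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the models and the IMPROVE score are compared.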

Results: The ML and rML algorithms outperformed the IMPROVE score in predicting VTE (c-statistic: 0.69, 0.68 and 0.59, respectively). The Hosmer-Lemeshow goodness-of-fit P-value was 0.06 for ML, 0.44 for rML, and <0.001 for the IMPROVE score. The observed event rate was 2.5% in the lowest tertile, 4.8% in the middle tertile, and 11.4% in the highest tertile. Patients in the highest tertile of VTE risk had 5-fold higher odds of VTE than those in the lowest tertile.
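
The 5-fold figure can be checked roughly against the reported event rates. The sketch below computes a crude, unadjusted odds ratio from the rounded percentages above; it is an arithmetic check only, assumes nothing about how the authors derived their estimate, and uses hypothetical helper names.

```python
def odds(p):
    """Convert a probability (event rate) to odds."""
    return p / (1.0 - p)


def odds_ratio(p_exposed, p_reference):
    """Unadjusted odds ratio between two event rates."""
    return odds(p_exposed) / odds(p_reference)


# Observed event rates reported in the Results (rounded percentages).
lowest, middle, highest = 0.025, 0.048, 0.114

print(round(odds_ratio(highest, lowest), 2))  # highest vs. lowest tertile
print(round(odds_ratio(middle, lowest), 2))   # middle vs. lowest tertile
```

With these rounded inputs the highest-versus-lowest odds ratio comes out at about 5.0, consistent with the reported 5-fold increase.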

Conclusion: The super learner algorithms improved discrimination and calibration compared to the IMPROVE score for predicting VTE in acutely ill medical patients.

Keywords: acute medically ill; machine learning; personalized medicine; super learner; venous thromboembolism.