Background: Acutely ill patients at high risk for venous thromboembolism (VTE) may be identified clinically or with integer-based scoring systems. These scores have demonstrated modest performance in external data sets.
Objectives: To evaluate the performance of machine learning models compared to the IMPROVE score.
Methods: The APEX trial randomized 7513 acutely medically ill patients to extended-duration betrixaban vs. enoxaparin. A super learner model (ML) including 68 variables was built to predict VTE by combining estimates from 5 families of candidate models. A "reduced" model (rML) was also developed using 16 variables that were thought, a priori, to be associated with VTE. The IMPROVE score was calculated for each patient. Model performance in predicting a composite VTE end point was assessed by discrimination and calibration. The frequencies of predicted VTE risks were plotted and divided into tertiles, and VTE risks were compared across tertiles.
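As an illustration of the two evaluation steps named above, the following minimal Python sketch computes a c-statistic (discrimination) and the observed event rate per tertile of predicted risk. It is not the study's code: the function names `c_statistic` and `tertile_event_rates` and the toy data are hypothetical, and the tertile split here simply cuts the risk-sorted patients into three near-equal groups, which may differ from how the authors defined their tertiles.

```python
from itertools import product


def c_statistic(risks, events):
    """Share of (event, non-event) pairs in which the event case has the
    higher predicted risk; tied pairs count as 0.5."""
    pos = [r for r, e in zip(risks, events) if e == 1]
    neg = [r for r, e in zip(risks, events) if e == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    score = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return score / (len(pos) * len(neg))


def tertile_event_rates(risks, events):
    """Sort patients by predicted risk, cut into three near-equal groups,
    and return the observed event rate in each group (lowest risk first)."""
    if len(risks) < 3:
        raise ValueError("need at least three patients")
    ordered = [e for _, e in sorted(zip(risks, events), key=lambda pair: pair[0])]
    n = len(ordered)
    cuts = [0, n // 3, 2 * n // 3, n]
    return [sum(ordered[a:b]) / (b - a) for a, b in zip(cuts, cuts[1:])]


# Hypothetical toy data: predicted risks and observed outcomes (1 = VTE).
toy_risks = [0.02, 0.03, 0.05, 0.06, 0.10, 0.20]
toy_events = [0, 0, 1, 0, 0, 1]

toy_c = c_statistic(toy_risks, toy_events)
toy_rates = tertile_event_rates(toy_risks, toy_events)
```

A c-statistic of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking; comparing event rates across tertiles shows whether higher predicted risk corresponds to more observed events.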
Results: The ML and rML algorithms outperformed the IMPROVE score in predicting VTE (c-statistic: 0.69, 0.68, and 0.59, respectively). The Hosmer-Lemeshow goodness-of-fit P-value was 0.06 for ML, 0.44 for rML, and <0.001 for the IMPROVE score. The observed event rate was 2.5% in the lowest tertile, 4.8% in the middle tertile, and 11.4% in the highest tertile. Patients in the highest tertile of VTE risk had 5-fold higher odds of VTE than those in the lowest tertile.
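The 5-fold figure can be checked against the reported tertile event rates with a short calculation. The sketch below computes a crude (unadjusted) odds ratio from the two percentages; the abstract does not state how the authors estimated the odds, so this is only an arithmetic consistency check, and the helper names `odds` and `odds_ratio` are illustrative.

```python
def odds(p):
    """Convert a probability (0 < p < 1) to odds."""
    return p / (1.0 - p)


def odds_ratio(p_high, p_low):
    """Crude odds ratio comparing two event probabilities."""
    return odds(p_high) / odds(p_low)


# Observed event rates reported for the highest and lowest tertiles.
or_high_vs_low = odds_ratio(0.114, 0.025)
print(round(or_high_vs_low, 2))
```

With event rates of 11.4% and 2.5%, the crude odds ratio is approximately 5, consistent with the reported 5-fold increase.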
Conclusion: The super learner algorithms improved discrimination and calibration compared with the IMPROVE score for predicting VTE in acutely medically ill patients.
Keywords: acute medically ill; machine learning; personalized medicine; super learner; venous thromboembolism.
© 2020 The Authors. Research and Practice in Thrombosis and Haemostasis published by Wiley Periodicals, Inc on behalf of International Society on Thrombosis and Haemostasis.