DOI: 10.1145/3580305.3599800
Research article | Open access

Deep Offline Reinforcement Learning for Real-world Treatment Optimization Applications

Published: 04 August 2023
  • Abstract

    There is increasing interest in data-driven approaches for recommending optimal treatment strategies in many chronic disease management and critical care applications. Reinforcement learning methods are well-suited to this sequential decision-making problem, but must be trained and evaluated exclusively on retrospective medical record datasets as direct online exploration is unsafe and infeasible. Despite this requirement, the vast majority of treatment optimization studies use off-policy RL methods (e.g., Double Deep Q Networks (DDQN) or its variants) that are known to perform poorly in purely offline settings. Recent advances in offline RL, such as Conservative Q-Learning (CQL), offer a suitable alternative. But there remain challenges in adapting these approaches to real-world applications where suboptimal examples dominate the retrospective dataset and strict safety constraints need to be satisfied. In this work, we introduce a practical and theoretically grounded transition sampling approach to address action imbalance during offline RL training. We perform extensive experiments on two real-world tasks for diabetes and sepsis treatment optimization to compare performance of the proposed approach against prominent off-policy and offline RL baselines (DDQN and CQL). Across a range of principled and clinically relevant metrics, we show that our proposed approach enables substantial improvements in expected health outcomes and in consistency with relevant practice and safety guidelines.
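
    The abstract does not spell out the transition sampling scheme itself. As a rough illustration of the general idea, and not the authors' method, the sketch below reweights minibatch sampling by inverse action frequency so that transitions with under-represented treatment actions are drawn more often during offline Q-learning updates; the function name, smoothing constant, and toy data are all our own assumptions.

        import numpy as np

        def action_balanced_weights(actions, n_actions, smoothing=1.0):
            # Inverse-frequency weights: transitions whose logged action is
            # rare get proportionally higher probability of being sampled.
            counts = np.bincount(actions, minlength=n_actions).astype(float)
            weights = 1.0 / (counts[actions] + smoothing)  # smoothing guards against unseen actions
            return weights / weights.sum()                 # normalize to a sampling distribution

        # Toy retrospective dataset dominated by action 0 (e.g., "no treatment change").
        rng = np.random.default_rng(0)
        actions = rng.choice(4, size=10_000, p=[0.85, 0.05, 0.05, 0.05])
        probs = action_balanced_weights(actions, n_actions=4)

        # Draw one minibatch of transition indices for an offline RL update
        # (e.g., feeding a CQL or DDQN learner).
        batch_idx = rng.choice(len(actions), size=64, replace=False, p=probs)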

    Supplementary Material

    MP4 File (adfp620-2min-promo.mp4)
    There is growing interest in applying deep reinforcement learning to recommend optimal medical treatments in critical care and chronic disease management settings. But this is challenging as treatment optimization applications don't permit learning through direct exploration of an environment. For safety reasons, recommendations must instead be learned from retrospective data, where suboptimal treatments can be overrepresented. To address these challenges, we introduce a practical and theoretically grounded transition sampling approach for deep offline reinforcement learning. We give a preview of our main findings on both diabetes and sepsis treatment optimization tasks. Namely, our proposed solution outperforms baselines in terms of expected health outcomes and consistency with clinical safety guidelines.
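
    The promo also stresses that policies must be evaluated, not just trained, on retrospective data. The page does not state which off-policy evaluation estimator the authors use; weighted importance sampling over logged trajectories is one standard choice for estimating a target policy's expected outcome, sketched below under that assumption (the function and its inputs are hypothetical):

        import numpy as np

        def wis_value(trajectories, gamma=0.99, clip=100.0):
            # Weighted importance sampling: each trajectory is a list of
            # (p_target, p_behavior, reward) tuples, i.e., the target and
            # (estimated) behavior probabilities of the logged action.
            ratios, returns = [], []
            for traj in trajectories:
                rho, g = 1.0, 0.0
                for t, (p_tgt, p_beh, r) in enumerate(traj):
                    rho *= p_tgt / max(p_beh, 1e-8)  # cumulative importance ratio
                    g += (gamma ** t) * r            # discounted return
                ratios.append(min(rho, clip))        # clipping tames variance
                returns.append(g)
            ratios = np.asarray(ratios)
            return float(np.dot(ratios, returns) / ratios.sum())

        # Toy usage: two logged trajectories with terminal rewards only.
        trajs = [[(0.9, 0.5, 0.0), (0.8, 0.6, 1.0)],
                 [(0.2, 0.7, 0.0), (0.1, 0.8, -1.0)]]
        print(wis_value(trajs))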


    Cited By

    • (2024) End-to-end offline reinforcement learning for glycemia control. Artificial Intelligence in Medicine 154 (Aug 2024), 102920. https://doi.org/10.1016/j.artmed.2024.102920
    • (2024) Personalization for web-based services using offline reinforcement learning. Machine Learning 113, 5 (28 Mar 2024), 3049-3071. https://doi.org/10.1007/s10994-024-06525-y
    • (2023) A comprehensive review of machine learning algorithms and their application in geriatric medicine: present and future. Aging Clinical and Experimental Research 35, 11 (8 Sep 2023), 2363-2397. https://doi.org/10.1007/s40520-023-02552-2

        Information

        Published In

        KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
        August 2023
        5996 pages
        ISBN: 9798400701030
        DOI: 10.1145/3580305
        This work is licensed under a Creative Commons Attribution 4.0 International License.


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. offline reinforcement learning
        2. safety constraints
        3. sampling
        4. sepsis treatment
        5. treatment optimization
        6. type 2 diabetes treatment


        Funding Sources

        • A*STAR, Singapore

        Conference

        KDD '23

        Acceptance Rates

        Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


        Article Metrics

        • Downloads (last 12 months): 563
        • Downloads (last 6 weeks): 69
        Reflects downloads up to 27 Jul 2024
