DOI: 10.1145/3580305.3599800
Research article | Open access

Deep Offline Reinforcement Learning for Real-world Treatment Optimization Applications

Published: 04 August 2023
  • Abstract

    There is increasing interest in data-driven approaches for recommending optimal treatment strategies in many chronic disease management and critical care applications. Reinforcement learning methods are well-suited to this sequential decision-making problem, but must be trained and evaluated exclusively on retrospective medical record datasets as direct online exploration is unsafe and infeasible. Despite this requirement, the vast majority of treatment optimization studies use off-policy RL methods (e.g., Double Deep Q Networks (DDQN) or its variants) that are known to perform poorly in purely offline settings. Recent advances in offline RL, such as Conservative Q-Learning (CQL), offer a suitable alternative. But there remain challenges in adapting these approaches to real-world applications where suboptimal examples dominate the retrospective dataset and strict safety constraints need to be satisfied. In this work, we introduce a practical and theoretically grounded transition sampling approach to address action imbalance during offline RL training. We perform extensive experiments on two real-world tasks for diabetes and sepsis treatment optimization to compare performance of the proposed approach against prominent off-policy and offline RL baselines (DDQN and CQL). Across a range of principled and clinically relevant metrics, we show that our proposed approach enables substantial improvements in expected health outcomes and in consistency with relevant practice and safety guidelines.
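
    The abstract does not spell out the transition sampling scheme itself. As a rough illustration of the general idea, and not the authors' method, the sketch below reweights minibatch sampling by inverse action frequency so that transitions with under-represented treatment actions are drawn more often during offline Q-learning updates; the function name, smoothing constant, and toy data are all our own assumptions.

        import numpy as np

        def action_balanced_weights(actions, n_actions, smoothing=1.0):
            # Inverse-frequency weights: transitions whose logged action is
            # rare get proportionally higher probability of being sampled.
            counts = np.bincount(actions, minlength=n_actions).astype(float)
            weights = 1.0 / (counts[actions] + smoothing)  # smoothing guards against unseen actions
            return weights / weights.sum()                 # normalize to a sampling distribution

        # Toy retrospective dataset dominated by action 0 (e.g., "no treatment change").
        rng = np.random.default_rng(0)
        actions = rng.choice(4, size=10_000, p=[0.85, 0.05, 0.05, 0.05])
        probs = action_balanced_weights(actions, n_actions=4)

        # Draw one minibatch of transition indices for an offline RL update
        # (e.g., feeding a CQL or DDQN learner).
        batch_idx = rng.choice(len(actions), size=64, replace=False, p=probs)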

    Supplementary Material

    MP4 File (adfp620-2min-promo.mp4)
    There is growing interest in applying deep reinforcement learning to recommend optimal medical treatments in critical care and chronic disease management settings. But this is challenging as treatment optimization applications don't permit learning through direct exploration of an environment. For safety reasons, recommendations must instead be learned from retrospective data, where suboptimal treatments can be overrepresented. To address these challenges, we introduce a practical and theoretically grounded transition sampling approach for deep offline reinforcement learning. We give a preview of our main findings on both diabetes and sepsis treatment optimization tasks. Namely, our proposed solution outperforms baselines in terms of expected health outcomes and consistency with clinical safety guidelines.
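
    The promo also stresses that policies must be evaluated, not just trained, on retrospective data. The page does not state which off-policy evaluation estimator the authors use; weighted importance sampling over logged trajectories is one standard choice for estimating a target policy's expected outcome, sketched below under that assumption (the function and its inputs are hypothetical):

        import numpy as np

        def wis_value(trajectories, gamma=0.99, clip=100.0):
            # Weighted importance sampling: each trajectory is a list of
            # (p_target, p_behavior, reward) tuples, i.e., the target and
            # (estimated) behavior probabilities of the logged action.
            ratios, returns = [], []
            for traj in trajectories:
                rho, g = 1.0, 0.0
                for t, (p_tgt, p_beh, r) in enumerate(traj):
                    rho *= p_tgt / max(p_beh, 1e-8)  # cumulative importance ratio
                    g += (gamma ** t) * r            # discounted return
                ratios.append(min(rho, clip))        # clipping tames variance
                returns.append(g)
            ratios = np.asarray(ratios)
            return float(np.dot(ratios, returns) / ratios.sum())

        # Toy usage: two logged trajectories with terminal rewards only.
        trajs = [[(0.9, 0.5, 0.0), (0.8, 0.6, 1.0)],
                 [(0.2, 0.7, 0.0), (0.1, 0.8, -1.0)]]
        print(wis_value(trajs))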


    Cited By

    • (2024) End-to-end offline reinforcement learning for glycemia control. Artificial Intelligence in Medicine 154 (Aug 2024), 102920. https://doi.org/10.1016/j.artmed.2024.102920
    • (2024) Personalization for web-based services using offline reinforcement learning. Machine Learning 113, 5 (28 Mar 2024), 3049-3071. https://doi.org/10.1007/s10994-024-06525-y
    • (2023) A comprehensive review of machine learning algorithms and their application in geriatric medicine: present and future. Aging Clinical and Experimental Research 35, 11 (8 Sep 2023), 2363-2397. https://doi.org/10.1007/s40520-023-02552-2

        Information

        Published In

        KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
        August 2023
        5996 pages
        ISBN: 9798400701030
        DOI: 10.1145/3580305
        This work is licensed under a Creative Commons Attribution 4.0 International License.


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Author Tags

        1. offline reinforcement learning
        2. safety constraints
        3. sampling
        4. sepsis treatment
        5. treatment optimization
        6. type 2 diabetes treatment


        Funding Sources

        • A*STAR, Singapore

        Conference

        KDD '23

        Acceptance Rates

        Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


        Article Metrics

        • Downloads (last 12 months): 563
        • Downloads (last 6 weeks): 69
        Reflects downloads up to 27 Jul 2024
