
Adversarial Cooperative Imitation Learning for Dynamic Treatment Regimes

Published: 20 April 2020

Abstract

Recent developments in discovering dynamic treatment regimes (DTRs) have heightened the importance of deep reinforcement learning (DRL), which is used to recover the doctor’s treatment policies. However, existing DRL-based methods have the following limitations: 1) supervised methods based on behavior cloning suffer from compounding errors; 2) the self-defined reward signals in reinforcement learning models are either too sparse or require clinical guidance; 3) current imitation learning models consider only positive trajectories (e.g., survived patients), while negative trajectories (e.g., deceased patients) are largely ignored, even though they are examples of what not to do and could help the learned policy avoid repeating mistakes. To address these limitations, in this paper we propose the adversarial cooperative imitation learning model, ACIL, to deduce optimal dynamic treatment regimes that mimic the positive trajectories while differing from the negative trajectories. Specifically, two discriminators are used to achieve this goal: an adversarial discriminator is designed to minimize the discrepancy between the trajectories generated by the policy and the positive trajectories, and a cooperative discriminator is used to distinguish the negative trajectories from the positive and generated trajectories. The reward signals from the two discriminators are used to refine the policy for dynamic treatment regimes. Experiments on publicly available real-world medical data demonstrate that ACIL improves the likelihood of patient survival and provides better dynamic treatment regimes by exploiting information from both positive and negative trajectories.
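The two-discriminator reward described in the abstract lends itself to a short sketch. Below is a minimal, illustrative PyTorch rendering of the reward computation only, under stated assumptions: the network sizes, the output conventions of the two discriminators, and the unweighted additive combination of their log signals are illustrative choices, not the paper's exact architecture or loss weights.

```python
# Hedged sketch of ACIL's two-discriminator reward (all design choices here
# are assumptions for illustration, not the paper's exact formulation).
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Maps a (state, action) pair to a probability in (0, 1)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


def acil_reward(adv_disc: Discriminator,
                coop_disc: Discriminator,
                state: torch.Tensor,
                action: torch.Tensor,
                eps: float = 1e-8) -> torch.Tensor:
    """Combine the two discriminator signals into a policy reward.

    Convention assumed here: adv_disc outputs the probability that the pair
    resembles a positive (survivor) trajectory, coop_disc the probability that
    it resembles a negative (deceased) trajectory. The reward is high when the
    generated pair looks positive AND does not look negative.
    """
    d_adv = adv_disc(state, action)    # P(pair looks positive)
    d_coop = coop_disc(state, action)  # P(pair looks negative)
    return torch.log(d_adv + eps) + torch.log(1.0 - d_coop + eps)


if __name__ == "__main__":
    # Toy usage: a batch of 32 states (10-dim) and actions (4-dim).
    s, a = torch.randn(32, 10), torch.randn(32, 4)
    adv, coop = Discriminator(10, 4), Discriminator(10, 4)
    r = acil_reward(adv, coop, s, a)  # shape (32, 1); fed to any RL policy update
    print(r.shape)
```

In this sketch the reward is simply handed to whatever policy-gradient update the agent uses; the discriminators themselves would be trained in the usual GAN fashion, the adversarial one against positive demonstrations and the cooperative one against negative demonstrations.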




Published In

WWW '20: Proceedings of The Web Conference 2020
April 2020
3143 pages
ISBN:9781450370233
DOI:10.1145/3366423

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 April 2020

Author Tags

  1. dynamic treatment regimes
  2. generative adversarial networks
  3. imitation learning
  4. reinforcement learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '20: The Web Conference 2020
April 20 - 24, 2020
Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2024) PAIL: Performance based Adversarial Imitation Learning Engine for Carbon Neutral Optimization. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 6148-6157. DOI: 10.1145/3637528.3671611
  • (2024) Interpretable Imitation Learning with Dynamic Causal Relations. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 967-975. DOI: 10.1145/3616855.3635827
  • (2024) Label-Free Adaptive Gaussian Sample Consensus Framework for Learning From Perfect and Imperfect Demonstrations. IEEE Transactions on Medical Robotics and Bionics 6, 3, 1093-1103. DOI: 10.1109/TMRB.2024.3422652
  • (2024) Reinforced Sequential Decision-Making for Sepsis Treatment: The PosNegDM Framework With Mortality Classifier and Transformer. IEEE Journal of Biomedical and Health Informatics 28, 5, 3114-3122. DOI: 10.1109/JBHI.2024.3377214
  • (2023) ILRoute: A Graph-based Imitation Learning Method to Unveil Riders' Routing Strategies in Food Delivery Service. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4024-4034. DOI: 10.1145/3580305.3599844
  • (2023) FedSkill: Privacy Preserved Interpretable Skill Learning via Imitation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1010-1019. DOI: 10.1145/3580305.3599349
  • (2023) Interpretable Skill Learning for Dynamic Treatment Regimes through Imitation. In 2023 57th Annual Conference on Information Sciences and Systems (CISS), 1-6. DOI: 10.1109/CISS56502.2023.10089648
  • (2023) Adversarial reinforcement learning for dynamic treatment regimes. Journal of Biomedical Informatics 137, 104244. DOI: 10.1016/j.jbi.2022.104244
  • (2023) Self-adaptive Inverse Soft-Q Learning for Imitation. In Neural Information Processing, 3-14. DOI: 10.1007/978-981-99-8138-0_1
  • (2022) Learning and Assessing Optimal Dynamic Treatment Regimes Through Cooperative Imitation Learning. IEEE Access 10, 78148-78158. DOI: 10.1109/ACCESS.2022.3193494
