
Adversarial Cooperative Imitation Learning for Dynamic Treatment Regimes

Published: 20 April 2020

Abstract

Recent developments in discovering dynamic treatment regimes (DTRs) have heightened the importance of deep reinforcement learning (DRL), which is used to recover the doctor’s treatment policies. However, existing DRL-based methods have the following limitations: 1) supervised methods based on behavior cloning suffer from compounding errors; 2) the self-defined reward signals in reinforcement learning models are either too sparse or require clinical guidance; 3) current imitation learning models consider only positive trajectories (e.g., survived patients), while negative trajectories (e.g., deceased patients) are largely ignored, even though they are examples of what not to do and could help the learned policy avoid repeating mistakes. To address these limitations, in this paper we propose the adversarial cooperative imitation learning model, ACIL, to deduce optimal dynamic treatment regimes that mimic the positive trajectories while differing from the negative trajectories. Specifically, two discriminators are used to achieve this goal: an adversarial discriminator is designed to minimize the discrepancy between the trajectories generated by the policy and the positive trajectories, and a cooperative discriminator is used to distinguish the negative trajectories from the positive and generated trajectories. The reward signals from the two discriminators are used to refine the policy for dynamic treatment regimes. Experiments on publicly available real-world medical data demonstrate that ACIL improves the likelihood of patient survival and provides better dynamic treatment regimes by exploiting information from both positive and negative trajectories.
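The two-discriminator reward described in the abstract lends itself to a short sketch. Below is a minimal, illustrative PyTorch rendering of the reward computation only, under stated assumptions: the network sizes, the output conventions of the two discriminators, and the unweighted additive combination of their log signals are illustrative choices, not the paper's exact architecture or loss weights.

```python
# Hedged sketch of ACIL's two-discriminator reward (all design choices here
# are assumptions for illustration, not the paper's exact formulation).
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Maps a (state, action) pair to a probability in (0, 1)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


def acil_reward(adv_disc: Discriminator,
                coop_disc: Discriminator,
                state: torch.Tensor,
                action: torch.Tensor,
                eps: float = 1e-8) -> torch.Tensor:
    """Combine the two discriminator signals into a policy reward.

    Convention assumed here: adv_disc outputs the probability that the pair
    resembles a positive (survivor) trajectory, coop_disc the probability that
    it resembles a negative (deceased) trajectory. The reward is high when the
    generated pair looks positive AND does not look negative.
    """
    d_adv = adv_disc(state, action)    # P(pair looks positive)
    d_coop = coop_disc(state, action)  # P(pair looks negative)
    return torch.log(d_adv + eps) + torch.log(1.0 - d_coop + eps)


if __name__ == "__main__":
    # Toy usage: a batch of 32 states (10-dim) and actions (4-dim).
    s, a = torch.randn(32, 10), torch.randn(32, 4)
    adv, coop = Discriminator(10, 4), Discriminator(10, 4)
    r = acil_reward(adv, coop, s, a)  # shape (32, 1); fed to any RL policy update
    print(r.shape)
```

In this sketch the reward is simply handed to whatever policy-gradient update the agent uses; the discriminators themselves would be trained in the usual GAN fashion, the adversarial one against positive demonstrations and the cooperative one against negative demonstrations.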




Published In

WWW '20: Proceedings of The Web Conference 2020
April 2020
3143 pages
ISBN:9781450370233
DOI:10.1145/3366423

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 April 2020

Author Tags

  1. dynamic treatment regimes
  2. generative adversarial networks
  3. imitation learning
  4. reinforcement learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '20: The Web Conference 2020
April 20 - 24, 2020
Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2024) PAIL: Performance based Adversarial Imitation Learning Engine for Carbon Neutral Optimization. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 6148-6157. DOI: 10.1145/3637528.3671611
  • (2024) Interpretable Imitation Learning with Dynamic Causal Relations. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 967-975. DOI: 10.1145/3616855.3635827
  • (2024) Label-Free Adaptive Gaussian Sample Consensus Framework for Learning From Perfect and Imperfect Demonstrations. IEEE Transactions on Medical Robotics and Bionics 6, 3, 1093-1103. DOI: 10.1109/TMRB.2024.3422652
  • (2024) Reinforced Sequential Decision-Making for Sepsis Treatment: The PosNegDM Framework With Mortality Classifier and Transformer. IEEE Journal of Biomedical and Health Informatics 28, 5, 3114-3122. DOI: 10.1109/JBHI.2024.3377214
  • (2023) ILRoute: A Graph-based Imitation Learning Method to Unveil Riders' Routing Strategies in Food Delivery Service. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4024-4034. DOI: 10.1145/3580305.3599844
  • (2023) FedSkill: Privacy Preserved Interpretable Skill Learning via Imitation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1010-1019. DOI: 10.1145/3580305.3599349
  • (2023) Interpretable Skill Learning for Dynamic Treatment Regimes through Imitation. In 2023 57th Annual Conference on Information Sciences and Systems (CISS), 1-6. DOI: 10.1109/CISS56502.2023.10089648
  • (2023) Adversarial reinforcement learning for dynamic treatment regimes. Journal of Biomedical Informatics 137, 104244. DOI: 10.1016/j.jbi.2022.104244
  • (2023) Self-adaptive Inverse Soft-Q Learning for Imitation. In Neural Information Processing, 3-14. DOI: 10.1007/978-981-99-8138-0_1
  • (2022) Learning and Assessing Optimal Dynamic Treatment Regimes Through Cooperative Imitation Learning. IEEE Access 10, 78148-78158. DOI: 10.1109/ACCESS.2022.3193494
