
Bias in Reinforcement Learning: A Review in Healthcare Applications

Authors:

Benjamin Smith,

Anahita Khojandi,

Rama Vasudevan

ACM Computing Surveys, Volume 56, Issue 2

Article No.: 52, Pages 1 - 17

https://doi.org/10.1145/3609502

Published: 15 September 2023

Abstract

Reinforcement learning (RL) can assist in medical decision making using patient data collected in electronic health record (EHR) systems. RL, a type of machine learning, can use these data to develop treatment policies. However, RL models are typically trained using imperfect retrospective EHR data. Therefore, if care is not taken in training, RL policies can propagate existing bias in healthcare. Literature that considers and addresses the issues of bias and fairness in sequential decision making is reviewed. The major themes that emerge for mitigating bias relate to (1) data management; (2) algorithmic design; and (3) clinical understanding of the resulting policies.



Index Terms

Bias in Reinforcement Learning: A Review in Healthcare Applications
1. Applied computing
  1. Life and medical sciences
    1. Health care information systems
    2. Health informatics


Published In


ACM Computing Surveys Volume 56, Issue 2

February 2024

974 pages

EISSN:1557-7341

DOI:10.1145/3613559

Editor:
Albert Zomaya
University of Sydney, Australia


Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 September 2023

Online AM: 18 July 2023

Accepted: 11 July 2023

Revised: 16 November 2022

Received: 17 February 2022

Published in CSUR Volume 56, Issue 2


Qualifiers

Survey

Funding Sources

Science Alliance, The University of Tennessee, and the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy
