Abstract
An important problem in reinforcement learning is the need for greater sample efficiency. One approach to this problem is to incorporate external information elicited from a domain expert into the learning process. Indeed, it has been shown that incorporating expert advice can improve the rate at which an agent’s policy converges. However, these approaches typically assume a single, infallible expert; learning from multiple and/or unreliable experts is considered an open problem in assisted reinforcement learning. We present CLUE (cautiously learning with unreliable experts), a framework for learning single-stage decision problems with action advice from multiple, potentially unreliable experts. CLUE augments an unassisted learning agent with a model of expert reliability and a Bayesian method of pooling advice to select actions during exploration. Our results show that CLUE retains the benefits of traditional approaches when advised by reliable experts, but is robust to the presence of unreliable experts. When learning with multiple experts, CLUE is able to rank them by reliability and to differentiate reliable from unreliable advice.
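To make the two ingredients described above concrete, the following is a minimal sketch, not the authors' implementation, of (i) a per-expert reliability model and (ii) Bayesian pooling of action advice during exploration. The class and function names (`BetaReliabilityModel`, `pool_advice`), the Beta-distribution update, and the specific pooling rule are illustrative assumptions; the actual method is described in the paper and implemented in the linked repository.

```python
import numpy as np


class BetaReliabilityModel:
    """Illustrative estimate of one expert's reliability as a Beta distribution.

    Assumption: reliability is updated by counting how often the expert's
    advice agreed with what the agent later judged to be the better action.
    """

    def __init__(self, prior_a=1.0, prior_b=1.0):
        self.a, self.b = prior_a, prior_b  # pseudo-counts of agreements/disagreements

    def update(self, advice_was_correct):
        if advice_was_correct:
            self.a += 1.0
        else:
            self.b += 1.0

    def mean(self):
        # Posterior mean estimate of the expert's reliability.
        return self.a / (self.a + self.b)


def pool_advice(advice, reliabilities, n_actions):
    """Naive Bayesian pooling of action advice from several experts.

    `advice` maps expert id -> advised action index; `reliabilities` maps
    expert id -> estimated probability that the expert advises the best action.
    Returns a probability distribution over actions.
    """
    log_post = np.zeros(n_actions)  # uniform prior over actions, in log space
    for expert, action in advice.items():
        rho = float(np.clip(reliabilities[expert], 1e-6, 1 - 1e-6))
        for a in range(n_actions):
            if a == action:
                log_post[a] += np.log(rho)
            else:
                # Assume an unreliable expert spreads mass over the other actions.
                log_post[a] += np.log((1 - rho) / (n_actions - 1))
    post = np.exp(log_post - log_post.max())  # subtract max to avoid underflow
    return post / post.sum()


# Hypothetical usage: two experts advise different actions; advice is pooled
# according to the current reliability estimates before the agent explores.
experts = {"e1": BetaReliabilityModel(), "e2": BetaReliabilityModel()}
advice = {"e1": 2, "e2": 0}
probs = pool_advice(advice, {k: m.mean() for k, m in experts.items()}, n_actions=3)
action = int(np.argmax(probs))
```

Pooling in log space and normalising at the end keeps the computation stable when many experts advise at once; a reliability estimate near 0.5 makes an expert's advice effectively uninformative, which is one way a cautious agent can discount unreliable advisors.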
Code availability
All code is available at https://github.com/tamlinlove/CLUE_SSDP.
Funding
The authors did not receive support from any organisation for the submitted work.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The electronic supplementary material is available in the online version of this article.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Love, T., Ajoodha, R. & Rosman, B. Who should I trust? Cautiously learning with unreliable experts. Neural Comput & Applic 35, 16865–16875 (2023). https://doi.org/10.1007/s00521-022-07808-y