
Who should I trust? Cautiously learning with unreliable experts

  • S.I.: Human-aligned Reinforcement Learning for Autonomous Agents and Robots
  • Published in: Neural Computing and Applications

Abstract

An important problem in reinforcement learning is the need for greater sample efficiency. One approach to this problem is to incorporate external information, elicited from a domain expert, into the learning process. Indeed, it has been shown that incorporating expert advice can improve the rate at which an agent’s policy converges. However, these approaches typically assume a single, infallible expert; learning from multiple and/or unreliable experts is considered an open problem in assisted reinforcement learning. We present CLUE (cautiously learning with unreliable experts), a framework for learning single-stage decision problems with action advice from multiple, potentially unreliable experts. CLUE augments an unassisted learning agent with a model of expert reliability and a Bayesian method of pooling advice to select actions during exploration. Our results show that CLUE maintains the benefits of traditional advice-taking approaches when advised by reliable experts, while remaining robust to the presence of unreliable experts. When learning with multiple experts, CLUE is able to rank experts by their reliability, differentiating reliable experts from unreliable ones.
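To make the idea concrete, the sketch below is a minimal illustration of the two ingredients the abstract names, not the paper's implementation: it assumes each expert's reliability is tracked with a Beta posterior updated from Bernoulli-style feedback on past advice, and that pooled advice is a reliability-weighted distribution over actions. All names here (`BetaReliability`, `pool_advice`, the example experts) are hypothetical.

```python
import numpy as np


class BetaReliability:
    """Beta posterior over the probability that an expert's advice is good.
    Illustrative only; not the API of the CLUE_SSDP repository."""

    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        self.alpha, self.beta = alpha, beta  # uniform prior by default

    def update(self, advice_was_good: bool) -> None:
        # Each observed piece of advice is treated as a Bernoulli trial.
        if advice_was_good:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def pool_advice(advice: dict, reliability: dict, n_actions: int) -> np.ndarray:
    """Pool single-action recommendations from several experts into one
    distribution over actions, weighting each expert by its estimated
    reliability (a simplified stand-in for the paper's Bayesian pooling)."""
    weights = np.full(n_actions, 1e-6)  # keep every action possible
    for expert, action in advice.items():
        weights[action] += reliability[expert].mean
    return weights / weights.sum()


# Example: two experts, one reliable and one not, advising on 4 actions.
reliability = {"e1": BetaReliability(9.0, 1.0), "e2": BetaReliability(2.0, 8.0)}
pooled = pool_advice({"e1": 2, "e2": 0}, reliability, n_actions=4)
action = int(np.argmax(pooled))  # follows the reliable expert's advice
```

In the full framework, whether the agent follows the pooled advice or falls back on its own unassisted exploration policy depends on its confidence in the experts; that cautious-selection logic is deliberately elided here.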




Code availability

All code is available at https://github.com/tamlinlove/CLUE_SSDP.


Funding

The authors did not receive support from any organisation for the submitted work.

Author information


Corresponding author

Correspondence to Tamlin Love.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF, 927 KB)

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Love, T., Ajoodha, R. & Rosman, B. Who should I trust? Cautiously learning with unreliable experts. Neural Comput & Applic 35, 16865–16875 (2023). https://doi.org/10.1007/s00521-022-07808-y

