Human Feedback as Action Assignment in Interactive Reinforcement Learning

Published: 04 August 2020

Abstract

Teaching by demonstration and teaching by assigning rewards are two popular methods of knowledge transfer among humans. However, showing the right behaviour (by demonstration) may come more naturally to a human teacher than assessing the learner's performance and assigning it a reward or punishment. In the context of robot learning, the preference between these two approaches has not been studied extensively. In this article, we propose a method for interactive reinforcement learning that replaces the traditional reward assignment with action assignment, which resembles providing a demonstration. The suggested action serves mainly to compute a reward, based on whether or not the self-acting agent followed the suggestion. We compared action assignment with reward assignment in a web-based user study built around a two-dimensional maze game. The interaction logs showed that action assignment significantly improved users' ability to teach the right behaviour. The survey results showed that while both action and reward assignment seemed highly natural and usable, reward assignment required more mental effort; repeatedly assigning rewards while watching the agent disobey caused frustration, and many users wanted to control the agent's behaviour directly.
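The abstract gives only the core idea, so the following is a minimal sketch of that reward-from-action-assignment loop, assuming a tabular Q-learning agent on a toy five-cell corridor. The task, the +1/-1 shaping values, and the `suggested_action` stand-in for the human trainer are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import defaultdict

# Sketch: human feedback as action assignment. The agent always acts on
# its own; the trainer's suggested action is only used to derive a
# shaping reward from whether the suggestion was followed.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = ["left", "right"]
GOAL = 4  # rightmost cell of a five-cell corridor

Q = defaultdict(float)  # Q[(state, action)] -> value

def step(state, action):
    """Toy corridor: reaching cell 4 yields +10 and ends the episode."""
    nxt = max(0, state - 1) if action == "left" else min(GOAL, state + 1)
    return nxt, (10.0 if nxt == GOAL else 0.0), nxt == GOAL

def choose(state):
    """Epsilon-greedy action selection by the self-acting agent."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def suggested_action(state):
    """Hypothetical stand-in for the human trainer's suggestion."""
    return "right"

for episode in range(200):
    state, done = 0, False
    while not done:
        action = choose(state)  # the agent picks its own action
        # Action assignment turned into a reward: agreement with the
        # suggestion is rewarded, disagreement penalised (values assumed).
        shaping = 1.0 if action == suggested_action(state) else -1.0
        nxt, env_reward, done = step(state, action)
        future = 0.0 if done else GAMMA * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (env_reward + shaping + future
                                       - Q[(state, action)])
        state = nxt
```

In the paper's setting the suggestion comes interactively from a human rather than a fixed policy; the point the sketch preserves is that the agent continues to act autonomously, and the feedback signal is derived from its agreement with the suggested action.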

References

[1]
Alejandro Agostini, Carme Torras, and Florentin Wörgötter. 2015. Efficient interactive decision-making framework for robotic applications. Artific. Intell. 247, C (2015), 187--212.
[2]
Tom Anthony, Daniel Polani, and Chrystopher L. Nehaniv. 2014. General self-motivation and strategy identification: Case studies based on Sokoban and Pac-Man. IEEE Trans. Comput. Intell. AI Games 6, 1 (2014), 1--17.
[3]
Riku Arakawa, Sosuke Kobayashi, Yuya Unno, Yuta Tsuboi, and Shin-ichi Maeda. 2018. DQN-TAMER: Human-in-the-loop reinforcement learning with intractable feedback. CoRR abs/1810.11748 (2018). arXiv:1810.11748. http://arxiv.org/abs/1810.11748.
[4]
Brenna D. Argall, Brett Browning, and Manuela Veloso. 2008. Learning robot motion control with demonstration and advice-operators. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 399--404.
[5]
Brenna D. Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. 2009. A survey of robot learning from demonstration. Robot. Auton. Syst. 57, 5 (2009), 469--483.
[6]
Merwan Barlier, Romain Laroche, and Olivier Pietquin. 2018. Training dialogue systems with human advice. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 999--1007.
[7]
Christoph Bartneck, Dana Kulić, Elizabeth Croft, and Susana Zoghbi. 2009. Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots. Int. J. Soc. Robot. 1, 1 (2009), 71--81.
[8]
Michael R. Berthold, Nicolas Cebron, Fabian Dill, Thomas R. Gabriel, Tobias Kötter, Thorsten Meinl, Peter Ohl, Christoph Sieb, Kilian Thiel, and Bernd Wiswedel. 2007. KNIME: The Konstanz information miner. In Studies in Classification, Data Analysis, and Knowledge Organization. Springer.
[9]
Colleen M. Carpinella, Alisa B. Wyman, Michael A. Perez, and Steven J. Stroessner. 2017. The robotic social attributes scale (rosas): Development and validation. In Proceedings of the ACM/IEEE International Conference on Human-robot Interaction. ACM, 254--262.
[10]
Sonia Chernova and Andrea L. Thomaz. 2014. Robot learning from human teachers. Synth. Lect. Artific. Intell. Mach. Learn. 8, 3 (2014), 1--121.
[11]
Francisco Cruz, German I. Parisi, and Stefan Wermter. 2018. Multi-modal feedback for affordance-driven interactive reinforcement learning. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’18). IEEE, 1--8.
[12]
Francisco Cruz, Johannes Twiefel, Sven Magg, Cornelius Weber, and Stefan Wermter. 2015. Interactive reinforcement learning through speech guidance in a domestic scenario. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’15). IEEE, 1--8.
[13]
M. M. de Graaf and Bertram F. Malle. 2017. How people explain action (and autonomous intelligent systems should too). In Proceedings of the AAAI Fall Symposium on Artificial Intelligence for Human-Robot Interaction.
[14]
Richard Evans. 2002. Varieties of learning. AI Game Programming Wisdom 2 (2002), 15.
[15]
Rachel Gockley, Allison Bruce, Jodi Forlizzi, Marek Michalowski, Anne Mundell, Stephanie Rosenthal, Brennan Sellner, Reid Simmons, Kevin Snipes, Alan C. Schultz, et al. 2005. Designing robots for long-term social interaction. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’05). IEEE, 1338--1343.
[16]
Vinicius G. Goecks, Gregory M. Gremillion, Vernon J. Lawhern, John Valasek, and Nicholas R. Waytowich. 2018. Efficiently combining human demonstrations and interventions for safe training of autonomous systems in real-time. CoRR abs/1810.11545 (2018). arXiv:1810.11545. http://arxiv.org/abs/1810.11545.
[17]
Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles L. Isbell, and Andrea L. Thomaz. 2013. Policy shaping: Integrating human feedback with reinforcement learning. In Advances in Neural Information Processing Systems. MIT Press, 2625--2633.
[18]
Kao-Shing Hwang, Jin-Ling Lin, Haobin Shi, and Yu-Ying Chen. 2016. Policy learning with human reinforcement. Int. J. Fuzzy Syst. 18, 4 (2016), 618--629.
[19]
Charles Isbell, Christian R. Shelton, Michael Kearns, Satinder Singh, and Peter Stone. 2001. A social reinforcement learning agent. In Proceedings of the 5th International Conference on Autonomous Agents. ACM, 377--384.
[20]
Charles Lee Isbell, Michael Kearns, Dave Kormann, Satinder Singh, and Peter Stone. 2000. Cobot in LambdaMOO: A social statistics agent. In Proceedings of the AAAI International Conference on Artificial Intelligence (AAAI/IAAI’00). 36--41.
[21]
Petr Jarušek and Radek Pelánek. 2010. Difficulty rating of sokoban puzzle. In Proceedings of the 5th Starting AI Researchers’ Symposium (STAIRS’10). 140--150.
[22]
Taemie Kim and Pamela Hinds. 2006. Who should I blame? Effects of autonomy and transparency on attributions in human-robot interaction. In Proceedings of the 15th IEEE International Symposium on Robot and Human Interactive Communication (ROMAN’06). IEEE, 80--85.
[23]
W. Bradley Knox and Peter Stone. 2009. Interactively shaping agents via human reinforcement: The TAMER framework. In Proceedings of the Fifth International Conference on Knowledge Capture. ACM, 9--16.
[24]
W. Bradley Knox and Peter Stone. 2010. Combining manual feedback with subsequent MDP reward signals for reinforcement learning. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 5--12.
[25]
W. Bradley Knox and Peter Stone. 2012. Reinforcement learning from simultaneous human and MDP reward. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 475--482.
[26]
W. Bradley Knox, Peter Stone, and Cynthia Breazeal. 2013. Training a robot via human feedback: A case study. In Proceedings of the International Conference on Social Robotics. Springer, 460--470.
[27]
Samantha Krening. 2018. Newtonian action advice: Integrating human verbal instruction with reinforcement learning. CoRR abs/1804.05821 (2018). arXiv:1804.05821. http://arxiv.org/abs/1804.05821.
[28]
Samantha Krening and Karen M. Feigh. 2018. Interaction algorithm effect on human experience with reinforcement learning. ACM Trans. Hum.-Robot Interact. 7, 2 (2018), 16.
[29]
Samantha Krening, Brent Harrison, Karen M. Feigh, Charles Lee Isbell, Mark Riedl, and Andrea Thomaz. 2017. Learning from explanations using sentiment and advice in RL. IEEE Trans. Cogn. Dev. Syst. 9, 1 (2017), 44--55.
[30]
Gautam Kunapuli, Phillip Odom, Jude W. Shavlik, and Sriraam Natarajan. 2013. Guiding autonomous agents to better behaviors through human advice. In Proceedings of the IEEE 13th International Conference on Data Mining (ICDM’13). IEEE, 409--418.
[31]
Adrián León, Eduardo Morales, Leopoldo Altamirano, and Jaime Ruiz. 2011. Teaching a robot to perform task through imitation and on-line feedback. Progr. Pattern Recogn., Image Anal., Comput. Vision, Appl. (2011), 549--556.
[32]
L. Adrián León, Ana C. Tenorio, and Eduardo F. Morales. 2013. Human interaction for effective reinforcement learning. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD’13).
[33]
Guangliang Li, Hayley Hung, Shimon Whiteson, and W. Bradley Knox. 2014. Learning from human reward benefits from socio-competitive feedback. In Proceedings of the Joint IEEE International Conferences on Development and Learning and Epigenetic Robotics (ICDL-Epirob’14). IEEE, 93--100.
[34]
Guangliang Li, Shimon Whiteson, W. Bradley Knox, and Hayley Hung. 2017. Social interaction for efficient agent learning from human reward. Auton. Agents Multi-Agent Syst. 32, 1 (2017), 1--25.
[35]
Jamy Li. 2015. The benefit of being physically present: A survey of experimental works comparing copresent robots, telepresent robots and virtual agents. Int. J. Hum.-Comput. Studies 77 (2015), 23--37.
[36]
Henry B. Mann and Donald R. Whitney. 1947. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18, 1 (1947), 50--60.
[37]
Matthew Marge, Satanjeev Banerjee, and Alexander I. Rudnicky. 2010. Using the Amazon mechanical turk for transcription of spoken language. In Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP’10). IEEE, 5270--5273.
[38]
Andrew Y. Ng, Daishi Harada, and Stuart Russell. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the International Conference on machine Learning (ICML’99), Vol. 99. 278--287.
[39]
J. Ross Quinlan. 2014. C4. 5: Programs for Machine Learning. Elsevier.
[40]
Syed Ali Raza, Jesse Clark, and Mary-Anne Williams. 2016. On designing socially acceptable reward shaping. In Proceedings of the International Conference on Social Robotics. Springer, 860--869.
[41]
Syed Ali Raza, Benjamin Johnston, and Mary-Anne Williams. 2016. Reward from demonstration in interactive reinforcement learning. In The Twenty-Ninth International Flairs Conference.
[42]
Jon Sprouse. 2011. A validation of Amazon mechanical turk for the collection of acceptability judgments in linguistic theory. Behav. Res. Methods 43, 1 (2011), 155--167.
[43]
Andrew Stern, Adam Frank, and Ben Resner. 1998. Virtual petz (video session): A hybrid approach to creating autonomous, lifelike dogz and catz. In Proceedings of the 2nd International Conference on Autonomous Agents. ACM, 334--335.
[44]
Sidney Strauss and Margalit Ziv. 2012. Teaching is a natural cognitive ability for humans. Mind, Brain Educat. 6, 4 (2012), 186--196.
[45]
Halit Bener Suay and Sonia Chernova. 2011. Effect of human guidance and state space size on interactive reinforcement learning. In Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (ROMAN’11). IEEE, 1--6.
[46]
Halit Bener Suay, Russell Toris, and Sonia Chernova. 2012. A practical comparison of three robot learning from demonstration algorithm. Int. J. Soc. Robot. 4, 4 (2012), 319--330.
[47]
Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. Vol. 1. MIT Press, Cambridge.
[48]
Dag Sverre Syrdal, Kerstin Dautenhahn, Kheng Lee Koay, and Michael L. Walters. 2009. The negative attitudes towards robots scale and reactions to robot behaviour in a live human-robot interaction study. In Adaptive and Emergent Behaviour and Complex Systems. SSAISB.
[49]
Ana C. Tenorio-Gonzalez, Eduardo F. Morales, and Luis Villaseñor-Pineda. 2010. Dynamic reward shaping: Training a robot by voice. In Ibero-American Conference on Artificial Intelligence. Springer, 483--492.
[50]
Andrea Thomaz, Guy Hoffman, Maya Cakmak, et al. 2016. Computational human-robot interaction. Found. Trends Robot. 4, 2--3 (2016), 105--223.
[51]
Andrea L. Thomaz, Guy Hoffman, and Cynthia Breazeal. 2006. Reinforcement learning with human teachers: Understanding how people want to teach robots. In Proceedings of the 15th IEEE International Symposium on Robot and Human Interactive Communication (ROMAN’06). IEEE, 352--357.
[52]
Ngo Anh Vien, Wolfgang Ertel, and Tae Choong Chung. 2013. Learning via human feedback in continuous state and action spaces. Appl. Intell. 39, 2 (2013), 267--278.
[53]
Joshua Wainer, David J. Feil-Seifer, Dylan A. Shell, and Maja J. Mataric. 2007. Embodiment and human-robot interaction: A task-based perspective. In Proceedings of the 16th IEEE International Symposium on Robot and Human Interactive Communication (ROMAN’07). IEEE, 872--877.
[54]
Garrett Warnell, Nicholas Waytowich, Vernon Lawhern, and Peter Stone. 2017. Deep TAMER: Interactive agent shaping in high-dimensional state spaces. CoRR abs/1709.10163 (2017). arXiv:1709.10163. http://arxiv.org/abs/1709.10163.
[55]
Christopher John Cornish Hellaby Watkins. 1989. Learning from Delayed Rewards. Ph.D. Dissertation. University of Cambridge, England.
[56]
Nicholas R. Waytowich, Vinicius G. Goecks, and Vernon J. Lawhern. 2018. Cycle-of-learning for autonomous systems from human interaction. CoRR abs/1808.09572 (2018). arXiv:1808.09572. http://arxiv.org/abs/1808.09572.
[57]
Theophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adrià Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter W. Battaglia, David Silver, and Daan Wierstra. 2017. Imagination-Augmented Agents for Deep Reinforcement Learning. CoRR abs/1707.06203 (2017). arXiv:1707.06203. http://arxiv.org/abs/1707.06203.
[58]
Frank Wilcoxon. 1945. Individual comparisons by ranking methods. Biometr. Bull. 1, 6 (1945), 80--83.

Published In

ACM Transactions on Autonomous and Adaptive Systems, Volume 14, Issue 4
December 2019
88 pages
ISSN: 1556-4665
EISSN: 1556-4703
DOI: 10.1145/3415348

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 August 2020
Accepted: 01 May 2020
Revised: 01 January 2020
Received: 01 January 2019
Published in TAAS Volume 14, Issue 4

Author Tags

  1. Interactive machine learning
  2. learning from human teachers
  3. reinforcement learning
  4. reward shaping

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Australian Research Council

Cited By

  • (2023) Calibrated Human-Robot Teaching: What People Do When Teaching Norms to Robots. 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 1308-1314. DOI: 10.1109/RO-MAN57019.2023.10309635. Online publication date: 28-Aug-2023.
  • (2022) Instruct or Evaluate. Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction, 718-722. DOI: 10.5555/3523760.3523862. Online publication date: 7-Mar-2022.
  • (2022) Correct Me If I'm Wrong: Using Non-Experts to Repair Reinforcement Learning Policies. Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction, 493-501. DOI: 10.5555/3523760.3523825. Online publication date: 7-Mar-2022.
  • (2022) Norm Learning with Reward Models from Instructive and Evaluative Feedback. 2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 1634-1640. DOI: 10.1109/RO-MAN53752.2022.9900563. Online publication date: 29-Aug-2022.
  • (2022) Research on Human-in-the-loop Traffic Adaptive Decision Making Method. 2022 4th International Conference on Robotics and Computer Vision (ICRCV), 272-276. DOI: 10.1109/ICRCV55858.2022.9953216. Online publication date: 25-Sep-2022.
  • (2022) Correct Me If I'm Wrong: Using Non-Experts to Repair Reinforcement Learning Policies. 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 493-501. DOI: 10.1109/HRI53351.2022.9889604. Online publication date: 7-Mar-2022.
  • (2022) Instruct or Evaluate: How People Choose to Teach Norms to Social Robots. 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 718-722. DOI: 10.1109/HRI53351.2022.9889555. Online publication date: 7-Mar-2022.
  • (2021) Designing Human-Robot Interaction with Social Intelligence. Proceedings of the 2021 ACM/IEEE International Conference on Human-Robot Interaction, 3-4. DOI: 10.1145/3434073.3444865. Online publication date: 8-Mar-2021.
  • (2021) Human-Augmented Prescriptive Analytics With Interactive Multi-Objective Reinforcement Learning. IEEE Access, 9, 100677-100693. DOI: 10.1109/ACCESS.2021.3096662. Online publication date: 2021.
