Human Feedback as Action Assignment in Interactive Reinforcement Learning

Published: 04 August 2020

Abstract

Teaching by demonstration and teaching by assigning rewards are two popular methods of knowledge transfer among humans. However, showing the right behaviour (by demonstration) may come more naturally to a human teacher than assessing the learner's performance and assigning it a reward or punishment. In the context of robot learning, the preference between these two approaches has not been studied extensively. In this article, we propose a method for interactive reinforcement learning that replaces the traditional reward assignment with action assignment, which resembles providing a demonstration. The suggested action serves mainly to compute a reward, based on whether or not the self-acting agent followed the suggestion. We compared action assignment with reward assignment in a web-based user study built around a two-dimensional maze game. The interaction logs showed that action assignment significantly improved users' ability to teach the right behaviour. The survey results showed that while both action and reward assignment seemed highly natural and usable, reward assignment required more mental effort; repeatedly assigning rewards while watching the agent disobey caused frustration, and many users wanted to control the agent's behaviour directly.
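The abstract gives only the core idea, so the following is a minimal sketch of that reward-from-action-assignment loop, assuming a tabular Q-learning agent on a toy five-cell corridor. The task, the +1/-1 shaping values, and the `suggested_action` stand-in for the human trainer are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import defaultdict

# Sketch: human feedback as action assignment. The agent always acts on
# its own; the trainer's suggested action is only used to derive a
# shaping reward from whether the suggestion was followed.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = ["left", "right"]
GOAL = 4  # rightmost cell of a five-cell corridor

Q = defaultdict(float)  # Q[(state, action)] -> value

def step(state, action):
    """Toy corridor: reaching cell 4 yields +10 and ends the episode."""
    nxt = max(0, state - 1) if action == "left" else min(GOAL, state + 1)
    return nxt, (10.0 if nxt == GOAL else 0.0), nxt == GOAL

def choose(state):
    """Epsilon-greedy action selection by the self-acting agent."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def suggested_action(state):
    """Hypothetical stand-in for the human trainer's suggestion."""
    return "right"

for episode in range(200):
    state, done = 0, False
    while not done:
        action = choose(state)  # the agent picks its own action
        # Action assignment turned into a reward: agreement with the
        # suggestion is rewarded, disagreement penalised (values assumed).
        shaping = 1.0 if action == suggested_action(state) else -1.0
        nxt, env_reward, done = step(state, action)
        future = 0.0 if done else GAMMA * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (env_reward + shaping + future
                                       - Q[(state, action)])
        state = nxt
```

In the paper's setting the suggestion comes interactively from a human rather than a fixed policy; the point the sketch preserves is that the agent continues to act autonomously, and the feedback signal is derived from its agreement with the suggested action.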

References

[1]
Alejandro Agostini, Carme Torras, and Florentin Wörgötter. 2015. Efficient interactive decision-making framework for robotic applications. Artific. Intell. 247, C (2015), 187--212.
[2]
Tom Anthony, Daniel Polani, and Chrystopher L. Nehaniv. 2014. General self-motivation and strategy identification: Case studies based on Sokoban and Pac-Man. IEEE Trans. Comput. Intell. AI Games 6, 1 (2014), 1--17.
[3]
Riku Arakawa, Sosuke Kobayashi, Yuya Unno, Yuta Tsuboi, and Shin-ichi Maeda. 2018. DQN-TAMER: Human-in-the-loop reinforcement learning with intractable feedback. CoRR abs/1810.11748 (2018). arXiv:1810.11748. http://arxiv.org/abs/1810.11748.
[4]
Brenna D. Argall, Brett Browning, and Manuela Veloso. 2008. Learning robot motion control with demonstration and advice-operators. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 399--404.
[5]
Brenna D. Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. 2009. A survey of robot learning from demonstration. Robot. Auton. Syst. 57, 5 (2009), 469--483.
[6]
Merwan Barlier, Romain Laroche, and Olivier Pietquin. 2018. Training dialogue systems with human advice. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 999--1007.
[7]
Christoph Bartneck, Dana Kulić, Elizabeth Croft, and Susana Zoghbi. 2009. Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots. Int. J. Soc. Robot. 1, 1 (2009), 71--81.
[8]
Michael R. Berthold, Nicolas Cebron, Fabian Dill, Thomas R. Gabriel, Tobias Kötter, Thorsten Meinl, Peter Ohl, Christoph Sieb, Kilian Thiel, and Bernd Wiswedel. 2007. KNIME: The Konstanz information miner. In Studies in Classification, Data Analysis, and Knowledge Organization. Springer.
[9]
Colleen M. Carpinella, Alisa B. Wyman, Michael A. Perez, and Steven J. Stroessner. 2017. The robotic social attributes scale (rosas): Development and validation. In Proceedings of the ACM/IEEE International Conference on Human-robot Interaction. ACM, 254--262.
[10]
Sonia Chernova and Andrea L. Thomaz. 2014. Robot learning from human teachers. Synth. Lect. Artific. Intell. Mach. Learn. 8, 3 (2014), 1--121.
[11]
Francisco Cruz, German I. Parisi, and Stefan Wermter. 2018. Multi-modal feedback for affordance-driven interactive reinforcement learning. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’18). IEEE, 1--8.
[12]
Francisco Cruz, Johannes Twiefel, Sven Magg, Cornelius Weber, and Stefan Wermter. 2015. Interactive reinforcement learning through speech guidance in a domestic scenario. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’15). IEEE, 1--8.
[13]
M. M. de Graaf and Bertram F. Malle. 2017. How people explain action (and autonomous intelligent systems should too). In Proceedings of the AAAI Fall Symposium on Artificial Intelligence for Human-Robot Interaction.
[14]
Richard Evans. 2002. Varieties of learning. AI Game Programming Wisdom 2 (2002), 15.
[15]
Rachel Gockley, Allison Bruce, Jodi Forlizzi, Marek Michalowski, Anne Mundell, Stephanie Rosenthal, Brennan Sellner, Reid Simmons, Kevin Snipes, Alan C. Schultz, et al. 2005. Designing robots for long-term social interaction. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’05). IEEE, 1338--1343.
[16]
Vinicius G. Goecks, Gregory M. Gremillion, Vernon J. Lawhern, John Valasek, and Nicholas R. Waytowich. 2018. Efficiently combining human demonstrations and interventions for safe training of autonomous systems in real-time. CoRR abs/1810.11545 (2018). arXiv:1810.11545. http://arxiv.org/abs/1810.11545.
[17]
Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles L. Isbell, and Andrea L. Thomaz. 2013. Policy shaping: Integrating human feedback with reinforcement learning. In Advances in Neural Information Processing Systems. MIT Press, 2625--2633.
[18]
Kao-Shing Hwang, Jin-Ling Lin, Haobin Shi, and Yu-Ying Chen. 2016. Policy learning with human reinforcement. Int. J. Fuzzy Syst. 18, 4 (2016), 618--629.
[19]
Charles Isbell, Christian R. Shelton, Michael Kearns, Satinder Singh, and Peter Stone. 2001. A social reinforcement learning agent. In Proceedings of the 5th International Conference on Autonomous Agents. ACM, 377--384.
[20]
Charles Lee Isbell, Michael Kearns, Dave Kormann, Satinder Singh, and Peter Stone. 2000. Cobot in LambdaMOO: A social statistics agent. In Proceedings of the AAAI International Conference on Artificial Intelligence (AAAI/IAAI’00). 36--41.
[21]
Petr Jarušek and Radek Pelánek. 2010. Difficulty rating of sokoban puzzle. In Proceedings of the 5th Starting AI Researchers’ Symposium (STAIRS’10). 140--150.
[22]
Taemie Kim and Pamela Hinds. 2006. Who should I blame? Effects of autonomy and transparency on attributions in human-robot interaction. In Proceedings of the 15th IEEE International Symposium on Robot and Human Interactive Communication (ROMAN’06). IEEE, 80--85.
[23]
W. Bradley Knox and Peter Stone. 2009. Interactively shaping agents via human reinforcement: The TAMER framework. In Proceedings of the Fifth International Conference on Knowledge Capture. ACM, 9--16.
[24]
W. Bradley Knox and Peter Stone. 2010. Combining manual feedback with subsequent MDP reward signals for reinforcement learning. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 5--12.
[25]
W. Bradley Knox and Peter Stone. 2012. Reinforcement learning from simultaneous human and MDP reward. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 475--482.
[26]
W. Bradley Knox, Peter Stone, and Cynthia Breazeal. 2013. Training a robot via human feedback: A case study. In Proceedings of the International Conference on Social Robotics. Springer, 460--470.
[27]
Samantha Krening. 2018. Newtonian action advice: Integrating human verbal instruction with reinforcement learning. CoRR abs/1804.05821 (2018). arXiv:1804.05821. http://arxiv.org/abs/1804.05821.
[28]
Samantha Krening and Karen M. Feigh. 2018. Interaction algorithm effect on human experience with reinforcement learning. ACM Trans. Hum.-Robot Interact. 7, 2 (2018), 16.
[29]
Samantha Krening, Brent Harrison, Karen M. Feigh, Charles Lee Isbell, Mark Riedl, and Andrea Thomaz. 2017. Learning from explanations using sentiment and advice in RL. IEEE Trans. Cogn. Dev. Syst. 9, 1 (2017), 44--55.
[30]
Gautam Kunapuli, Phillip Odom, Jude W. Shavlik, and Sriraam Natarajan. 2013. Guiding autonomous agents to better behaviors through human advice. In Proceedings of the IEEE 13th International Conference on Data Mining (ICDM’13). IEEE, 409--418.
[31]
Adrián León, Eduardo Morales, Leopoldo Altamirano, and Jaime Ruiz. 2011. Teaching a robot to perform task through imitation and on-line feedback. Progr. Pattern Recogn., Image Anal., Comput. Vision, Appl. (2011), 549--556.
[32]
L. Adrián León, Ana C. Tenorio, and Eduardo F. Morales. 2013. Human interaction for effective reinforcement learning. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD’13).
[33]
Guangliang Li, Hayley Hung, Shimon Whiteson, and W. Bradley Knox. 2014. Learning from human reward benefits from socio-competitive feedback. In Proceedings of the Joint IEEE International Conferences on Development and Learning and Epigenetic Robotics (ICDL-Epirob’14). IEEE, 93--100.
[34]
Guangliang Li, Shimon Whiteson, W. Bradley Knox, and Hayley Hung. 2017. Social interaction for efficient agent learning from human reward. Auton. Agents Multi-Agent Syst. 32, 1 (2017), 1--25.
[35]
Jamy Li. 2015. The benefit of being physically present: A survey of experimental works comparing copresent robots, telepresent robots and virtual agents. Int. J. Hum.-Comput. Studies 77 (2015), 23--37.
[36]
Henry B. Mann and Donald R. Whitney. 1947. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18, 1 (1947), 50--60.
[37]
Matthew Marge, Satanjeev Banerjee, and Alexander I. Rudnicky. 2010. Using the Amazon mechanical turk for transcription of spoken language. In Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP’10). IEEE, 5270--5273.
[38]
Andrew Y. Ng, Daishi Harada, and Stuart Russell. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the International Conference on machine Learning (ICML’99), Vol. 99. 278--287.
[39]
J. Ross Quinlan. 2014. C4. 5: Programs for Machine Learning. Elsevier.
[40]
Syed Ali Raza, Jesse Clark, and Mary-Anne Williams. 2016. On designing socially acceptable reward shaping. In Proceedings of the International Conference on Social Robotics. Springer, 860--869.
[41]
Syed Ali Raza, Benjamin Johnston, and Mary-Anne Williams. 2016. Reward from demonstration in interactive reinforcement learning. In The Twenty-Ninth International Flairs Conference.
[42]
Jon Sprouse. 2011. A validation of Amazon mechanical turk for the collection of acceptability judgments in linguistic theory. Behav. Res. Methods 43, 1 (2011), 155--167.
[43]
Andrew Stern, Adam Frank, and Ben Resner. 1998. Virtual petz (video session): A hybrid approach to creating autonomous, lifelike dogz and catz. In Proceedings of the 2nd International Conference on Autonomous Agents. ACM, 334--335.
[44]
Sidney Strauss and Margalit Ziv. 2012. Teaching is a natural cognitive ability for humans. Mind, Brain Educat. 6, 4 (2012), 186--196.
[45]
Halit Bener Suay and Sonia Chernova. 2011. Effect of human guidance and state space size on interactive reinforcement learning. In Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (ROMAN’11). IEEE, 1--6.
[46]
Halit Bener Suay, Russell Toris, and Sonia Chernova. 2012. A practical comparison of three robot learning from demonstration algorithm. Int. J. Soc. Robot. 4, 4 (2012), 319--330.
[47]
Richard S. Sutton and Andrew G. Barto. 1998. Reinforcement Learning: An Introduction. Vol. 1. MIT Press, Cambridge.
[48]
Dag Sverre Syrdal, Kerstin Dautenhahn, Kheng Lee Koay, and Michael L. Walters. 2009. The negative attitudes towards robots scale and reactions to robot behaviour in a live human-robot interaction study. In Adaptive and Emergent Behaviour and Complex Systems. SSAISB.
[49]
Ana C. Tenorio-Gonzalez, Eduardo F. Morales, and Luis Villaseñor-Pineda. 2010. Dynamic reward shaping: Training a robot by voice. In Ibero-American Conference on Artificial Intelligence. Springer, 483--492.
[50]
Andrea Thomaz, Guy Hoffman, Maya Cakmak, et al. 2016. Computational human-robot interaction. Found. Trends Robot. 4, 2--3 (2016), 105--223.
[51]
Andrea L. Thomaz, Guy Hoffman, and Cynthia Breazeal. 2006. Reinforcement learning with human teachers: Understanding how people want to teach robots. In Proceedings of the 15th IEEE International Symposium on Robot and Human Interactive Communication (ROMAN’06). IEEE, 352--357.
[52]
Ngo Anh Vien, Wolfgang Ertel, and Tae Choong Chung. 2013. Learning via human feedback in continuous state and action spaces. Appl. Intell. 39, 2 (2013), 267--278.
[53]
Joshua Wainer, David J. Feil-Seifer, Dylan A. Shell, and Maja J. Mataric. 2007. Embodiment and human-robot interaction: A task-based perspective. In Proceedings of the 16th IEEE International Symposium on Robot and Human Interactive Communication (ROMAN’07). IEEE, 872--877.
[54]
Garrett Warnell, Nicholas Waytowich, Vernon Lawhern, and Peter Stone. 2017. Deep TAMER: Interactive agent shaping in high-dimensional state spaces. CoRR abs/1709.10163 (2017). arXiv:1709.10163. http://arxiv.org/abs/1709.10163.
[55]
Christopher John Cornish Hellaby Watkins. 1989. Learning from Delayed Rewards. Ph.D. Dissertation. University of Cambridge, England.
[56]
Nicholas R. Waytowich, Vinicius G. Goecks, and Vernon J. Lawhern. 2018. Cycle-of-learning for autonomous systems from human interaction. CoRR abs/1808.09572 (2018). arXiv:1808.09572. http://arxiv.org/abs/1808.09572.
[57]
Theophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adrià Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter W. Battaglia, David Silver, and Daan Wierstra. 2017. Imagination-Augmented Agents for Deep Reinforcement Learning. CoRR abs/1707.06203 (2017). arXiv:1707.06203. http://arxiv.org/abs/1707.06203.
[58]
Frank Wilcoxon. 1945. Individual comparisons by ranking methods. Biometr. Bull. 1, 6 (1945), 80--83.

Published In

ACM Transactions on Autonomous and Adaptive Systems, Volume 14, Issue 4
December 2019
88 pages
ISSN: 1556-4665
EISSN: 1556-4703
DOI: 10.1145/3415348

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 August 2020
Accepted: 01 May 2020
Revised: 01 January 2020
Received: 01 January 2019
Published in TAAS Volume 14, Issue 4

Author Tags

  1. Interactive machine learning
  2. learning from human teachers
  3. reinforcement learning
  4. reward shaping

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Australian Research Council

Cited By

  • (2023) Calibrated Human-Robot Teaching: What People Do When Teaching Norms to Robots. 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 1308-1314. DOI: 10.1109/RO-MAN57019.2023.10309635. Online publication date: 28-Aug-2023.
  • (2022) Instruct or Evaluate. Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction, 718-722. DOI: 10.5555/3523760.3523862. Online publication date: 7-Mar-2022.
  • (2022) Correct Me If I'm Wrong: Using Non-Experts to Repair Reinforcement Learning Policies. Proceedings of the 2022 ACM/IEEE International Conference on Human-Robot Interaction, 493-501. DOI: 10.5555/3523760.3523825. Online publication date: 7-Mar-2022.
  • (2022) Norm Learning with Reward Models from Instructive and Evaluative Feedback. 2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 1634-1640. DOI: 10.1109/RO-MAN53752.2022.9900563. Online publication date: 29-Aug-2022.
  • (2022) Research on Human-in-the-loop Traffic Adaptive Decision Making Method. 2022 4th International Conference on Robotics and Computer Vision (ICRCV), 272-276. DOI: 10.1109/ICRCV55858.2022.9953216. Online publication date: 25-Sep-2022.
  • (2022) Correct Me If I'm Wrong: Using Non-Experts to Repair Reinforcement Learning Policies. 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 493-501. DOI: 10.1109/HRI53351.2022.9889604. Online publication date: 7-Mar-2022.
  • (2022) Instruct or Evaluate: How People Choose to Teach Norms to Social Robots. 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 718-722. DOI: 10.1109/HRI53351.2022.9889555. Online publication date: 7-Mar-2022.
  • (2021) Designing Human-Robot Interaction with Social Intelligence. Proceedings of the 2021 ACM/IEEE International Conference on Human-Robot Interaction, 3-4. DOI: 10.1145/3434073.3444865. Online publication date: 8-Mar-2021.
  • (2021) Human-Augmented Prescriptive Analytics With Interactive Multi-Objective Reinforcement Learning. IEEE Access, 9, 100677-100693. DOI: 10.1109/ACCESS.2021.3096662. Online publication date: 2021.
