DOI: 10.5555/3306127.3331761
Research Article | Public Access

Newtonian Action Advice: Integrating Human Verbal Instruction with Reinforcement Learning

Published: 08 May 2019

Abstract

A goal of Interactive Machine Learning is to enable people without specialized training to teach agents how to perform tasks. Many of the existing algorithms that learn from human instructions are evaluated using simulated feedback and focus on how quickly the agent learns. While this is valuable information, it ignores important aspects of the human-agent interaction such as frustration. To correct this, we propose a method for the design and verification of interactive algorithms that includes a human-subject study that measures the human's experience working with the agent. In this paper, we present Newtonian Action Advice, a method of incorporating human verbal action advice with Reinforcement Learning in a way that improves the human-agent interaction. In addition to simulations, we validated the Newtonian Action Advice algorithm by conducting a human-subject experiment. The results show that Newtonian Action Advice can perform better than Policy Shaping, a state-of-the-art IML algorithm, both in terms of RL metrics like cumulative reward and human factors metrics like frustration.
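The abstract describes the approach only at a high level. As a rough, hypothetical sketch of the interaction pattern it names (verbal action advice steering a reinforcement learner), the Python snippet below shows a tabular Q-learner in which a piece of advice dominates action selection for a fixed window before normal epsilon-greedy behavior resumes. The class name, the fixed-length persistence window, and all hyperparameters are illustrative assumptions; the paper itself defines the actual Newtonian Action Advice dynamics.

```python
import random
from collections import defaultdict

class AdvisedQLearner:
    """Illustrative Q-learning agent where human action advice
    overrides exploration for a short window (hypothetical
    persistence rule, not the paper's actual NAA update)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 advice_persistence=10):
        self.q = defaultdict(float)           # (state, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.advice_persistence = advice_persistence  # steps advice stays "in motion"
        self.advised_action = None
        self.advice_steps_left = 0

    def give_advice(self, action):
        """Receive parsed verbal advice, e.g. "go left" -> an action id."""
        self.advised_action = action
        self.advice_steps_left = self.advice_persistence

    def act(self, state):
        # Advice, once given, keeps driving action selection until its
        # window expires -- loosely analogous to an object in motion
        # staying in motion until friction stops it.
        if self.advice_steps_left > 0:
            self.advice_steps_left -= 1
            return self.advised_action
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In use, a speech front end would map an utterance such as "go left" to a discrete action and call give_advice with it; the agent then follows that advice for a while instead of exploring, which is the kind of human-agent interaction the paper evaluates against Policy Shaping.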

Published In

AAMAS '19: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems
May 2019
2518 pages
ISBN: 9781450363099

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC

Author Tags

  1. human-subject experiment
  2. interactive machine learning
  3. learning from human teachers
  4. natural language interface
  5. reinforcement learning
  6. verification

Qualifiers

  • Research-article

Funding Sources

  • Office of Naval Research

Conference

AAMAS '19

Acceptance Rates

AAMAS '19 paper acceptance rate: 193 of 793 submissions, 24%
Overall acceptance rate: 1,155 of 5,036 submissions, 23%
