DOI: 10.5555/3306127.3331761
Research Article | Public Access

Newtonian Action Advice: Integrating Human Verbal Instruction with Reinforcement Learning

Published: 08 May 2019

Abstract

A goal of Interactive Machine Learning is to enable people without specialized training to teach agents how to perform tasks. Many of the existing algorithms that learn from human instructions are evaluated using simulated feedback and focus on how quickly the agent learns. While this is valuable information, it ignores important aspects of the human-agent interaction such as frustration. To correct this, we propose a method for the design and verification of interactive algorithms that includes a human-subject study that measures the human's experience working with the agent. In this paper, we present Newtonian Action Advice, a method of incorporating human verbal action advice with Reinforcement Learning in a way that improves the human-agent interaction. In addition to simulations, we validated the Newtonian Action Advice algorithm by conducting a human-subject experiment. The results show that Newtonian Action Advice can perform better than Policy Shaping, a state-of-the-art IML algorithm, both in terms of RL metrics like cumulative reward and human factors metrics like frustration.
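The abstract describes the approach only at a high level. As a rough, hypothetical sketch of the interaction pattern it names (verbal action advice steering a reinforcement learner), the Python snippet below shows a tabular Q-learner in which a piece of advice dominates action selection for a fixed window before normal epsilon-greedy behavior resumes. The class name, the fixed-length persistence window, and all hyperparameters are illustrative assumptions; the paper itself defines the actual Newtonian Action Advice dynamics.

```python
import random
from collections import defaultdict

class AdvisedQLearner:
    """Illustrative Q-learning agent where human action advice
    overrides exploration for a short window (hypothetical
    persistence rule, not the paper's actual NAA update)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 advice_persistence=10):
        self.q = defaultdict(float)           # (state, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.advice_persistence = advice_persistence  # steps advice stays "in motion"
        self.advised_action = None
        self.advice_steps_left = 0

    def give_advice(self, action):
        """Receive parsed verbal advice, e.g. "go left" -> an action id."""
        self.advised_action = action
        self.advice_steps_left = self.advice_persistence

    def act(self, state):
        # Advice, once given, keeps driving action selection until its
        # window expires -- loosely analogous to an object in motion
        # staying in motion until friction stops it.
        if self.advice_steps_left > 0:
            self.advice_steps_left -= 1
            return self.advised_action
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In use, a speech front end would map an utterance such as "go left" to a discrete action and call give_advice with it; the agent then follows that advice for a while instead of exploring, which is the kind of human-agent interaction the paper evaluates against Policy Shaping.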

Published In

AAMAS '19: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems
May 2019
2518 pages
ISBN: 9781450363099

Publisher

International Foundation for Autonomous Agents and Multiagent Systems

Richland, SC

Author Tags

  1. human-subject experiment
  2. interactive machine learning
  3. learning from human teachers
  4. natural language interface
  5. reinforcement learning
  6. verification

Qualifiers

  • Research-article

Funding Sources

  • Office of Naval Research

Conference

AAMAS '19

Acceptance Rates

AAMAS '19 paper acceptance rate: 193 of 793 submissions, 24%
Overall acceptance rate: 1,155 of 5,036 submissions, 23%
