DOI: 10.1109/ICRA.2018.8460937

Learning to Parse Natural Language to Grounded Reward Functions with Weak Supervision

Published: 21 May 2018

Abstract

In order to intuitively and efficiently collaborate with humans, robots must learn to complete tasks specified using natural language. We represent natural language instructions as goal-state reward functions specified using lambda calculus. Using reward functions as language representations allows robots to plan efficiently in stochastic environments. To map sentences to such reward functions, we learn a weighted linear Combinatory Categorial Grammar (CCG) semantic parser. The parser, including both its parameters and the CCG lexicon, is learned through a validation procedure that, unlike prior approaches, does not require planner execution, annotated reward functions, or labeled parse trees. To learn the CCG lexicon and parse weights, we use coarse lexical generation and validation-driven perceptron weight updates, following the approach of Artzi and Zettlemoyer [4]. We present results on the Cleanup World domain [18] to demonstrate the potential of our approach. On a collected corpus of 23 tasks with 2037 corresponding sentences, containing combinations of nested referential expressions, comparators, and object properties, we report an F1 score of 0.82. Our goal-condition learning approach achieves results comparable to a baseline that performs planning during learning while reducing computation time by orders of magnitude. Further, we conduct an experiment with just 6 labeled demonstrations to show the ease of teaching a robot behaviors using our method, and we show that parsing models learned from small data sets can generalize to commands not seen during training.
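
To make the representation concrete, the sketch below shows, in ordinary Python, what a goal-state reward function grounded from a lambda-calculus parse might look like for a Cleanup World command such as "take the red block to the green room". The state encoding, object fields, and specific reward values are illustrative assumptions for exposition, not the paper's actual implementation (which builds on the BURLAP framework [17] and the Cornell SPF parser [2]).

    # Illustrative sketch only: a goal-state reward function for a Cleanup World
    # command, with the lambda-calculus goal condition rendered as a Python
    # predicate over states. Class names, fields, and reward values are
    # assumptions, not the paper's actual API.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Obj:
        name: str    # e.g. "block1"
        kind: str    # e.g. "block", "agent"
        color: str   # e.g. "red"
        room: str    # name of the room currently containing the object

    State = frozenset  # a state is a set of Obj instances

    def goal_condition(state: State) -> bool:
        # Grounding of a parse such as
        #   lambda s . exists b . block(b) and red(b) and in(b, green_room)
        # for the command "take the red block to the green room".
        return any(o.kind == "block" and o.color == "red" and o.room == "green_room"
                   for o in state)

    def reward(state: State, action: str, next_state: State) -> float:
        # Goal-state reward: positive on reaching the goal, a small step cost
        # otherwise (one common convention; the exact shaping is an assumption here).
        return 1.0 if goal_condition(next_state) else -0.01

    # A stock MDP planner can optimize this reward directly.
    start = frozenset({Obj("block1", "block", "red", "blue_room")})
    goal  = frozenset({Obj("block1", "block", "red", "green_room")})
    print(reward(start, "move_north", goal))  # 1.0

In the paper's setup, such reward functions are produced by the learned CCG parser rather than written by hand; because a goal condition can be checked against states directly, the validation signal used during training does not require running a planner.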

References

[1]
Ken R. Anderson, Timothy J. Hickey, and Peter Norvig. The JScheme language and implementation, 2013. URL http://jscheme.sourceforge.net/jscheme/main.html.
[2]
Yoav Artzi. Cornell SPF: Cornell Semantic Parsing Framework, 2016.
[3]
Yoav Artzi and Luke Zettlemoyer. Bootstrapping semantic parsers from conversations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011.
[4]
Yoav Artzi and Luke Zettlemoyer. Weakly supervised learning of semantic parsers for mapping instructions to actions. In Annual Meeting of the Association for Computational Linguistics, 2013.
[5]
Yoav Artzi, Dipanjan Das, and Slav Petrov. Learning compact lexicons for CCG semantic parsing. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1273–1283. Association for Computational Linguistics, October 2014.
[6]
Dilip Arumugam, Siddharth Karamcheti, Nakul Gopalan, Lawson L. S. Wong, and Stefanie Tellex. Accurately and efficiently interpreting human-robot instructions of varying granularities. In Robotics: Science and Systems XIII, 2017.
[7]
R. Bellman. A Markovian decision process. Indiana University Mathematics Journal, 6:679–684, 1957.
[8]
B. Carpenter. Type-Logical Semantics. MIT Press, Cambridge, MA, USA, 1997.
[9]
Stephen Clark and James R. Curran. Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4):493–552, 2007.
[10]
Carlos Diuk, Andre Cohen, and Michael L. Littman. An object-oriented representation for efficient reinforcement learning. In International Conference on Machine Learning, 2008.
[11]
Nakul Gopalan, Marie desJardins, Michael L. Littman, James MacGlashan, Shawn Squire, Stefanie Tellex, John Winder, and Lawson L. S. Wong. Planning with abstract Markov decision processes. In Proceedings of the Twenty-Seventh International Conference on Automated Planning and Scheduling, pages 480–488, 2017.
[12]
Kelvin Guu, Panupong Pasupat, Evan Zheran Liu, and Percy Liang. From language to programs: Bridging reinforcement learning and maximum marginal likelihood. CoRR, abs/1704.07926, 2017.
[13]
Thomas M. Howard, Stefanie Tellex, and Nicholas Roy. A natural language planner interface for mobile manipulators. In IEEE International Conference on Robotics and Automation, 2014.
[14]
Michael Janner, Karthik Narasimhan, and Regina Barzilay. Representation learning for grounded spatial reasoning. arXiv preprint, 2017.
[15]
Jayant Krishnamurthy and Tom M. Mitchell. Weakly supervised training of semantic parsers. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012.
[16]
Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. Lexical generalization in CCG grammar induction for semantic parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1512–1523. Association for Computational Linguistics, 2011.
[17]
James MacGlashan. Brown-UMBC Reinforcement Learning and Planning (BURLAP) project page. http://burlap.cs.brown.edu/, 2014.
[18]
James MacGlashan, Monica Babeş-Vroman, Marie desJardins, Michael L. Littman, Smaranda Muresan, Shawn Squire, Stefanie Tellex, Dilip Arumugam, and Lei Yang. Grounding English commands to reward functions. In Robotics: Science and Systems, 2015.
[19]
Dipendra Misra, John Langford, and Yoav Artzi. Mapping instructions and visual observations to actions with reinforcement learning. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2017.
[20]
Andrew Y. Ng and Stuart J. Russell. Algorithms for inverse reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning, ICML'00, pages 663–670, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc. ISBN 1-55860-707-2.
[21]
Amir Pnueli. The temporal logic of programs. In Proceedings of the 18th Annual Symposium on Foundations of Computer Science, SFCS'77, pages 46–57, Washington, DC, USA, 1977. IEEE Computer Society. https://doi.org/10.1109/SFCS.1977.32.
[22]
Morgan Quigley, Josh Faust, Tully Foote, and Jeremy Leibs. ROS: an open-source robot operating system.
[23]
Pierre Sermanet, Kelvin Xu, and Sergey Levine. Unsupervised perceptual rewards for imitation learning. CoRR, abs/1612.06699, 2016.
[24]
Mark Steedman. The Syntactic Process. MIT Press, Cambridge, MA, USA, 2000. ISBN 0-262-19420-1.
[25]
Stefanie Tellex, Thomas Kollar, Steven Dickerson, Matthew R. Walter, Ashis Gopal Banerjee, Seth Teller, and Nicholas Roy. Understanding natural language commands for robotic navigation and mobile manipulation. In AAAI Conference on Artificial Intelligence, 2011.
[26]
Edward C. Williams, Mina Rhee, Nakul Gopalan, and Stefanie Tellex. Learning to parse natural language to grounded reward functions with weak supervision. In AAAI Fall Symposium on Natural Communication for Human-Robot Collaboration, 2017.
[27]
Luke Zettlemoyer and Michael Collins. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2005.
[28]
Luke Zettlemoyer and Michael Collins. Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007.

Published In

2018 IEEE International Conference on Robotics and Automation (ICRA), May 2018, 5954 pages
Publisher: IEEE Press
