research-article

Interpreting Natural Language Instructions Using Language, Vision, and Behavior

Authors:

Luciana Benotti,

Martín Villalba

ACM Transactions on Interactive Intelligent Systems (TiiS), Volume 4, Issue 3

Article No.: 13, Pages 1 - 22

https://doi.org/10.1145/2629632

Published: 11 August 2014

Abstract

We define the problem of automatic instruction interpretation as follows: given a natural language instruction, can we automatically predict what an instruction follower, such as a robot, should do in the environment to follow that instruction? Previous approaches to automatic instruction interpretation have required either extensive domain-dependent rule writing or extensive manually annotated corpora. This article presents a novel approach that leverages a large amount of unannotated, easy-to-collect data from humans interacting in a game-like environment. The approach relies on an automatic annotation phase based on artificial intelligence planning, for which we compare two annotation strategies: one based on behavioral information and the other based on visibility information. The resulting annotations serve as training data for several automatic classifiers. This method rests on the intuition that interpreting a situated instruction can be cast as a classification problem: choosing among the actions that are possible in the situation. Classification combines language, vision, and behavior information. Our empirical analysis shows that machine learning classifiers achieve 77% accuracy on this task on the available English corpora and 74% on similar German corpora. Finally, including human feedback in the interpretation process boosts performance to 92% for the English corpus and 90% for the German corpus.
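The central intuition above, that interpreting a situated instruction means choosing among the actions possible in the situation, can be illustrated with a small sketch. This is not the authors' system: the `Candidate` record, its fields, and the features are hypothetical. Word overlap stands in for the language information, target visibility for the vision information, and the follower's distance to the target for the behavior information, and a simple ranking perceptron is swapped in for the machine learning classifiers the article evaluates. The hand-written gold labels play the role of the automatic, planning-based annotations.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one action that is possible in the situation."""
    name: str        # the action, e.g. "press(b1)"
    words: str       # words describing the action's target, e.g. "red button"
    visible: bool    # vision cue: is the target in the follower's view?
    distance: float  # behavior cue (assumed): follower's distance to the target


# (instruction, candidate actions, name of the gold action)
Example = Tuple[str, List[Candidate], str]


def features(instruction: str, c: Candidate) -> List[float]:
    """Language, vision, and behavior cues for one (instruction, action) pair."""
    instr = set(instruction.lower().split())
    desc = c.words.lower().split()
    overlap = sum(tok in instr for tok in desc) / len(desc) if desc else 0.0
    return [overlap, 1.0 if c.visible else 0.0, -c.distance]


def score(w: Sequence[float], f: Sequence[float]) -> float:
    """Linear score: dot product of weights and features."""
    return sum(wi * fi for wi, fi in zip(w, f))


def predict(w: Sequence[float], instruction: str, candidates: List[Candidate]) -> Candidate:
    """Classify by picking the best-scoring candidate; max() keeps the first on ties."""
    return max(candidates, key=lambda c: score(w, features(instruction, c)))


def train(examples: List[Example], epochs: int = 10) -> List[float]:
    """Ranking perceptron: on a mistake, shift w toward the gold action's features."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        mistakes = 0
        for instruction, candidates, gold in examples:
            guess = predict(w, instruction, candidates)
            if guess.name != gold:
                mistakes += 1
                g = next(c for c in candidates if c.name == gold)
                fg, fp = features(instruction, g), features(instruction, guess)
                w = [wi + a - b for wi, a, b in zip(w, fg, fp)]
        if mistakes == 0:  # all training examples are now classified correctly
            break
    return w


# Toy situations; gold labels stand in for automatically produced annotations.
room_a = [
    Candidate("press(b2)", "blue button", visible=True, distance=1.0),
    Candidate("press(b3)", "red button", visible=False, distance=1.0),
    Candidate("press(b1)", "red button", visible=True, distance=1.0),
]
room_b = [
    Candidate("press(b4)", "blue button", visible=True, distance=3.0),
    Candidate("press(b5)", "blue button", visible=True, distance=1.0),
]
w = train([("press the red button", room_a, "press(b1)"),
           ("press the blue button", room_b, "press(b5)")])
print(w)  # [0.5, 1.0, 2.0]
print(predict(w, "press the blue button", room_a).name)  # press(b2)
```

On each mistake the perceptron adds the gold action's features and subtracts those of the wrongly chosen action; on this toy data it makes no mistakes on the third pass and stops. The learned weights then pick the visible blue button in `room_a`, an instruction and situation pairing not seen in training. A linear ranker keeps the example short; any classifier that scores (instruction, action) pairs could take its place.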

Index Terms

Interpreting Natural Language Instructions Using Language, Vision, and Behavior
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published In

ACM Transactions on Interactive Intelligent Systems Volume 4, Issue 3

Special Issue on Multiple Modalities in Interactive Systems and Robots

October 2014

115 pages

ISSN:2160-6455

EISSN:2160-6463

DOI:10.1145/2660857

Editors:
Anthony Jameson, German Research Center for Artificial Intelligence (DFKI), Germany
Krzysztof Gajos, Harvard University, U.S.A.

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 August 2014

Accepted: 01 April 2014

Revised: 01 March 2014

Received: 01 March 2013

Published in TIIS Volume 4, Issue 3

Qualifiers

Research-article
Research
Refereed

Bibliometrics & Citations

Article Metrics

Total Citations: 4
Total Downloads: 288
Downloads (last 12 months): 10
Downloads (last 6 weeks): 0

Reflects downloads up to 15 Oct 2024

Cited By

Stoev T, Yordanova K (2021). BehavE: Behaviour Understanding Through Automated Generation of Situation Models. KI 2021: Advances in Artificial Intelligence, 362-369. DOI: 10.1007/978-3-030-87626-5_27. Online publication date: 27-Sep-2021.
https://dl.acm.org/doi/10.1007/978-3-030-87626-5_27

Yordanova K (2020). Towards Automated Generation of Semantic Annotation for Activity Recognition Problems. 2020 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), 1-6. DOI: 10.1109/PerComWorkshops48775.2020.9156147. Online publication date: Mar-2020.
https://doi.org/10.1109/PerComWorkshops48775.2020.9156147

Yordanova K (2018). Extracting Planning Operators from Instructional Texts for Behaviour Interpretation. KI 2018: Advances in Artificial Intelligence, 215-228. DOI: 10.1007/978-3-030-00111-7_19. Online publication date: 30-Aug-2018.
https://doi.org/10.1007/978-3-030-00111-7_19

Cuayáhuitl H, Frommberger L, Dethlefs N, Raux A, Marge M, Zender H (2014). Introduction to the Special Issue on Machine Learning for Multiple Modalities in Interactive Systems and Robots. ACM Transactions on Interactive Intelligent Systems (TiiS) 4, 3, 1-6. DOI: 10.1145/2670539. Online publication date: 14-Oct-2014.
https://dl.acm.org/doi/10.1145/2670539
