research-article

Leveraging commonsense reasoning and multimodal perception for robot spoken dialog systems

Authors:

Xiaoping Chen

2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Pages 6582 - 6588

https://doi.org/10.1109/IROS.2017.8206570

Published: 24 September 2017

Abstract

Probabilistic graphical models, such as partially observable Markov decision processes (POMDPs), have been used in stochastic spoken dialog systems to handle the inherent uncertainty in speech recognition and language understanding. Such dialog systems are limited in that the model can include only a relatively small number of domain variables if it is to yield good-quality dialog policies. At the same time, the non-language perception modalities available on robots, such as vision-based facial expression recognition and Lidar-based distance detection, are difficult to integrate into this process. In this paper, we use a probabilistic commonsense reasoner to “guide” our POMDP-based dialog manager, and present a principled, multimodal dialog management (MDM) framework that allows the robot's dialog belief state to be seamlessly updated both by observations of human spoken language and by exogenous events such as changes in human facial expressions. The MDM approach has been implemented and evaluated both in simulation and on a real mobile robot using guidance tasks.
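
As a rough illustration of the idea in the abstract, the following minimal sketch shows a Bayesian belief update over a hidden dialog state that is driven first by a noisy speech observation and then by a non-language event. It is not the MDM framework from the paper: the room names, the probability values, the `update` helper, and the use of a fixed prior as a stand-in for the commonsense reasoner's output are all assumptions made for illustration.

```python
from typing import Dict

Belief = Dict[str, float]


def normalize(weights: Belief) -> Belief:
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    if total <= 0.0:
        raise ValueError("belief has no probability mass")
    return {state: w / total for state, w in weights.items()}


def update(belief: Belief, likelihood: Dict[str, float]) -> Belief:
    """Bayes rule for a static hidden state: b'(s) is proportional to P(o | s) * b(s)."""
    return normalize({s: belief[s] * likelihood[s] for s in belief})


# Hypothetical guidance task: which room does the person want to reach?
# The prior stands in for what a commonsense reasoner might supply (assumed values).
prior = normalize({"lab": 2.0, "office": 1.0, "kitchen": 1.0})

# Speech observation: the recognizer heard "office" (assumed likelihoods P(o | s)).
speech_likelihood = {"lab": 0.1, "office": 0.8, "kitchen": 0.1}
after_speech = update(prior, speech_likelihood)

# Exogenous event: the person frowns after the robot proposes "office".
# Assumption: a frown is more likely when the proposed room is wrong.
frown_likelihood = {"lab": 0.7, "office": 0.2, "kitchen": 0.7}
after_frown = update(after_speech, frown_likelihood)

for name, b in [("prior", prior), ("speech", after_speech), ("frown", after_frown)]:
    print(name, {s: round(p, 3) for s, p in b.items()})
```

In this toy setup the speech observation raises the belief in "office", and the subsequent frown lowers it again without any spoken input, which is the kind of multimodal update the abstract describes. A full POMDP belief update would also include a state-transition model, which is omitted here because the hidden goal is assumed not to change during the dialog.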

Copyright © 2017.

Publisher: IEEE Press