DOI: 10.1109/IROS.2017.8206570
research-article

Leveraging commonsense reasoning and multimodal perception for robot spoken dialog systems

Published: 24 September 2017

Abstract

Probabilistic graphical models, such as partially observable Markov decision processes (POMDPs), have been used in stochastic spoken dialog systems to handle the inherent uncertainty in speech recognition and language understanding. Such dialog systems are limited in that the model can include only a relatively small number of domain variables if good-quality dialog policies are to be generated. At the same time, the robot's non-language perception modalities, such as vision-based facial expression recognition and lidar-based distance detection, are difficult to integrate into this process. In this paper, we use a probabilistic commonsense reasoner to “guide” our POMDP-based dialog manager, and present a principled multimodal dialog management (MDM) framework that allows the robot's dialog belief state to be seamlessly updated by both observations of human spoken language and exogenous events, such as changes in human facial expressions. The MDM approach has been implemented and evaluated on guidance tasks, both in simulation and on a real mobile robot.
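
To make the idea of a dialog belief state concrete, the sketch below shows the standard textbook POMDP belief update, b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s), applied first to a noisy speech observation and then to a facial-expression cue. It is not the MDM framework from the paper. The state names, the identity transition model, and all probabilities are hypothetical values chosen for illustration, and treating the facial-expression cue as one more likelihood term is a simplifying assumption rather than the paper's mechanism.

```python
from typing import List


def normalize(weights: List[float]) -> List[float]:
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [w / total for w in weights]


def belief_update(
    belief: List[float],
    transition: List[List[float]],
    obs_likelihood: List[float],
) -> List[float]:
    """One Bayesian belief update for a fixed action and observation.

    transition[s][s2] is T(s2 | s, a); obs_likelihood[s2] is O(o | s2, a).
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [
        sum(transition[s][s2] * belief[s] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight by the observation likelihood, then renormalize.
    return normalize([obs_likelihood[s2] * predicted[s2] for s2 in range(n)])


# Hypothetical two-state guidance task: which room does the person want?
STATES = ["wants_room_a", "wants_room_b"]
# Assumption: the person's goal does not change during the dialog.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
# Assumption: the recognizer reports "room A" correctly 80% of the time.
HEARD_ROOM_A = [0.8, 0.2]
# Assumption: a frown is more likely when the robot's current guess is wrong.
SAW_FROWN = [0.3, 0.7]

belief = [0.5, 0.5]  # uniform prior over the two goals
after_speech = belief_update(belief, IDENTITY, HEARD_ROOM_A)
after_frown = belief_update(after_speech, IDENTITY, SAW_FROWN)

for name, p_speech, p_frown in zip(STATES, after_speech, after_frown):
    print(f"{name}: after speech {p_speech:.3f}, after frown {p_frown:.3f}")
```

In this toy setting the speech observation shifts the belief toward room A, and the frown then shifts part of it back toward room B, which illustrates how a non-language cue can revise a belief built from spoken input.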

Cited By

  • (2024) A Survey of Multimodal Perception Methods for Human–Robot Interaction in Social Environments. ACM Transactions on Human-Robot Interaction 13:4 (1-50). DOI: 10.1145/3657030. Online publication date: 29-Apr-2024
  • (2024) A Survey on Dialogue Management in Human-robot Interaction. ACM Transactions on Human-Robot Interaction 13:2 (1-22). DOI: 10.1145/3648605. Online publication date: 14-Jun-2024
  • (2020) Logic Enhanced Commonsense Inference with Chain Transformer. Proceedings of the 29th ACM International Conference on Information & Knowledge Management (1763-1772). DOI: 10.1145/3340531.3411895. Online publication date: 19-Oct-2020

Published In

2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Sep 2017
10678 pages

Publisher

IEEE Press
