DOI: 10.1145/1228716.1228727

Article

Using vision, acoustics, and natural language for disambiguation

Published: 10 March 2007

Abstract

    Creating a human-robot interface is a daunting experience. The capabilities and functionality of the interface depend on the robustness of many different sensor and input modalities. For example, object recognition poses problems for state-of-the-art vision systems. Speech recognition in noisy environments remains problematic for acoustic systems. Natural language understanding and dialog are often limited to specific domains and baffled by ambiguous or novel utterances. Plans based on domain-specific tasks limit the applicability of dialog managers. The types of sensors used limit spatial knowledge and understanding, and constrain cognitive issues, such as perspective-taking.

    In this research, we are integrating several modalities, such as vision, audition, and natural language understanding, to leverage the existing strengths of each modality and overcome individual weaknesses. We are using visual, acoustic, and linguistic inputs in various combinations to solve such problems as the disambiguation of referents (objects in the environment), the localization of human speakers, and the determination of the source of utterances and the appropriateness of responses when humans and robots interact. For this research, we limit our consideration to the interaction of two humans and one robot in a retrieval scenario. This paper describes the system and the integration of the various modules prior to future testing.
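    The combination of modalities the abstract describes can be sketched in miniature. The following is purely an illustration, not the paper's method: the object names, modality labels, scores, and weights are all hypothetical, and real systems would derive such scores from vision, gesture, and language processing rather than hard-code them.

    ```python
    # Illustrative sketch (not from the paper): resolving an ambiguous spoken
    # referent by fusing per-modality confidence scores. All names, scores,
    # and weights below are hypothetical.

    def disambiguate(candidates, weights):
        """Return the candidate object with the highest weighted multimodal score.

        candidates: dict mapping object name -> dict of per-modality scores
                    in [0, 1] (e.g. vision match, gesture alignment, language fit).
        weights:    dict mapping modality name -> relative weight.
        """
        def fused_score(obj_scores):
            # Missing modalities contribute nothing rather than raising an error.
            return sum(weights[m] * obj_scores.get(m, 0.0) for m in weights)
        return max(candidates, key=lambda name: fused_score(candidates[name]))

    # Two red objects are equally good visual matches for "the red one";
    # the speaker's gesture toward the cup breaks the tie.
    candidates = {
        "red_cup":  {"vision": 0.9, "gesture": 0.8, "language": 0.7},
        "red_ball": {"vision": 0.9, "gesture": 0.2, "language": 0.7},
    }
    weights = {"vision": 0.4, "gesture": 0.3, "language": 0.3}
    print(disambiguate(candidates, weights))  # -> red_cup
    ```

    The point of the sketch is the one the abstract makes: no single modality decides; a referent that is ambiguous to vision alone becomes unambiguous once acoustic or gestural evidence is weighed in.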



    Published In

    HRI '07: Proceedings of the ACM/IEEE international conference on Human-robot interaction
    March 2007
    392 pages
    ISBN: 9781595936172
    DOI: 10.1145/1228716

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. acoustics
    2. artificial intelligence
    3. auditory perspective-taking
    4. dialog
    5. human-robot interaction
    6. natural language understanding
    7. spatial reasoning
    8. vision


    Conference

    HRI07: International Conference on Human Robot Interaction
    March 10-12, 2007
    Arlington, Virginia, USA

    Acceptance Rates

    HRI '07 Paper Acceptance Rate: 22 of 101 submissions, 22%
    Overall Acceptance Rate: 268 of 1,124 submissions, 24%


    Cited By

    • (2022) Language-Driven Robot Manipulation With Perspective Disambiguation and Placement Optimization. IEEE Robotics and Automation Letters 7(2), 4188-4195. DOI: 10.1109/LRA.2022.3146955
    • (2021) Decision-Theoretic Question Generation for Situated Reference Resolution: An Empirical Study and Computational Model. Proceedings of the 2021 International Conference on Multimodal Interaction, 150-158. DOI: 10.1145/3462244.3479925
    • (2019) Miscommunication Detection and Recovery in Situated Human-Robot Dialogue. ACM Transactions on Interactive Intelligent Systems 9(1), 1-40. DOI: 10.1145/3237189
    • (2018) Situated Human-Robot Collaboration: Predicting Intent from Grounded Natural Language. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 827-833. DOI: 10.1109/IROS.2018.8593942
    • (2016) Computational Human-Robot Interaction. Foundations and Trends in Robotics 4(2-3), 105-223. DOI: 10.1561/2300000049
    • (2014) Collaborative Effort towards Common Ground in Situated Human-Robot Dialogue. Proceedings of the 2014 ACM/IEEE International Conference on Human-Robot Interaction, 33-40. DOI: 10.1145/2559636.2559677
    • (2013) Auditory Perspective Taking. IEEE Transactions on Cybernetics 43(3), 957-969. DOI: 10.1109/TSMCB.2012.2219524
    • (2012) Towards Mediating Shared Perceptual Basis in Situated Dialogue. Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 140-149. DOI: 10.5555/2392800.2392827
    • (2012) Referent Identification Process in Human-Robot Multimodal Communication. Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction, 197-198. DOI: 10.1145/2157689.2157753
    • (2012) Robust Multiperson Detection and Tracking for Mobile Service and Social Robots. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 42(5), 1398-1412. DOI: 10.1109/TSMCB.2012.2192107
