DOI: 10.1145/1228716.1228727

Article

Using vision, acoustics, and natural language for disambiguation

Published: 10 March 2007

Abstract

    Creating a human-robot interface is a daunting experience. The capabilities and functionality of the interface depend on the robustness of many different sensor and input modalities. For example, object recognition poses problems for state-of-the-art vision systems. Speech recognition in noisy environments remains problematic for acoustic systems. Natural language understanding and dialog are often limited to specific domains and baffled by ambiguous or novel utterances. Plans based on domain-specific tasks limit the applicability of dialog managers. The types of sensors used limit spatial knowledge and understanding, and constrain cognitive issues, such as perspective-taking.

    In this research, we are integrating several modalities, such as vision, audition, and natural language understanding, to leverage the existing strengths of each modality and overcome individual weaknesses. We are using visual, acoustic, and linguistic inputs in various combinations to solve such problems as the disambiguation of referents (objects in the environment), the localization of human speakers, and the determination of the source of utterances and the appropriateness of responses when humans and robots interact. For this research, we limit our consideration to the interaction of two humans and one robot in a retrieval scenario. This paper describes the system and the integration of the various modules prior to future testing.
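    The combination of modalities the abstract describes can be sketched in miniature. The following is purely an illustration, not the paper's method: the object names, modality labels, scores, and weights are all hypothetical, and real systems would derive such scores from vision, gesture, and language processing rather than hard-code them.

    ```python
    # Illustrative sketch (not from the paper): resolving an ambiguous spoken
    # referent by fusing per-modality confidence scores. All names, scores,
    # and weights below are hypothetical.

    def disambiguate(candidates, weights):
        """Return the candidate object with the highest weighted multimodal score.

        candidates: dict mapping object name -> dict of per-modality scores
                    in [0, 1] (e.g. vision match, gesture alignment, language fit).
        weights:    dict mapping modality name -> relative weight.
        """
        def fused_score(obj_scores):
            # Missing modalities contribute nothing rather than raising an error.
            return sum(weights[m] * obj_scores.get(m, 0.0) for m in weights)
        return max(candidates, key=lambda name: fused_score(candidates[name]))

    # Two red objects are equally good visual matches for "the red one";
    # the speaker's gesture toward the cup breaks the tie.
    candidates = {
        "red_cup":  {"vision": 0.9, "gesture": 0.8, "language": 0.7},
        "red_ball": {"vision": 0.9, "gesture": 0.2, "language": 0.7},
    }
    weights = {"vision": 0.4, "gesture": 0.3, "language": 0.3}
    print(disambiguate(candidates, weights))  # -> red_cup
    ```

    The point of the sketch is the one the abstract makes: no single modality decides; a referent that is ambiguous to vision alone becomes unambiguous once acoustic or gestural evidence is weighed in.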



    Published In

    HRI '07: Proceedings of the ACM/IEEE international conference on Human-robot interaction
    March 2007
    392 pages
    ISBN: 9781595936172
    DOI: 10.1145/1228716

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. acoustics
    2. artificial intelligence
    3. auditory perspective-taking
    4. dialog
    5. human-robot interaction
    6. natural language understanding
    7. spatial reasoning
    8. vision


    Conference

    HRI07: International Conference on Human Robot Interaction
    March 10-12, 2007
    Arlington, Virginia, USA

    Acceptance Rates

    HRI '07 Paper Acceptance Rate: 22 of 101 submissions, 22%
    Overall Acceptance Rate: 268 of 1,124 submissions, 24%


    Cited By

    • (2022) Language-Driven Robot Manipulation With Perspective Disambiguation and Placement Optimization. IEEE Robotics and Automation Letters 7(2), 4188-4195. DOI: 10.1109/LRA.2022.3146955
    • (2021) Decision-Theoretic Question Generation for Situated Reference Resolution: An Empirical Study and Computational Model. Proceedings of the 2021 International Conference on Multimodal Interaction, 150-158. DOI: 10.1145/3462244.3479925
    • (2019) Miscommunication Detection and Recovery in Situated Human-Robot Dialogue. ACM Transactions on Interactive Intelligent Systems 9(1), 1-40. DOI: 10.1145/3237189
    • (2018) Situated Human-Robot Collaboration: Predicting Intent from Grounded Natural Language. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 827-833. DOI: 10.1109/IROS.2018.8593942
    • (2016) Computational Human-Robot Interaction. Foundations and Trends in Robotics 4(2-3), 105-223. DOI: 10.1561/2300000049
    • (2014) Collaborative Effort towards Common Ground in Situated Human-Robot Dialogue. Proceedings of the 2014 ACM/IEEE International Conference on Human-Robot Interaction, 33-40. DOI: 10.1145/2559636.2559677
    • (2013) Auditory Perspective Taking. IEEE Transactions on Cybernetics 43(3), 957-969. DOI: 10.1109/TSMCB.2012.2219524
    • (2012) Towards Mediating Shared Perceptual Basis in Situated Dialogue. Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 140-149. DOI: 10.5555/2392800.2392827
    • (2012) Referent Identification Process in Human-Robot Multimodal Communication. Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction, 197-198. DOI: 10.1145/2157689.2157753
    • (2012) Robust Multiperson Detection and Tracking for Mobile Service and Social Robots. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 42(5), 1398-1412. DOI: 10.1109/TSMCB.2012.2192107
