DOI: 10.1145/1027933.1027964
Article

Implementation and evaluation of a constraint-based multimodal fusion system for speech and 3D pointing gestures

Published: 13 October 2004
  Abstract

    This paper presents an architecture for fusing multimodal input streams for natural interaction with a humanoid robot, together with results from a user study of our system. The fusion architecture consists of an application-independent parser of input events and application-specific rules. In the user study, people interacted with a robot in a kitchen scenario using speech and gesture input. We observed that our fusion approach is highly tolerant of falsely detected pointing gestures, because speech is the main modality and pointing gestures serve mainly to disambiguate object references. We also report on the temporal correlation of speech and gesture events as observed in the user study.
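    The abstract describes a speech-driven fusion scheme in which pointing gestures are consulted only to disambiguate object references, within a temporal window around the utterance. Below is a minimal, illustrative Python sketch of that idea; the event types, field names, and the 1.5 s window are assumptions made for illustration and do not reproduce the paper's typed-feature-structure representation or its application-specific rules.

from dataclasses import dataclass
from typing import Dict, List

# Hypothetical event types for illustration; the paper's actual input
# representation (typed feature structures) is not reproduced here.

@dataclass
class SpeechEvent:
    t_start: float           # utterance start time (seconds)
    t_end: float             # utterance end time (seconds)
    intent: str              # e.g. "bring"
    object_type: str         # e.g. "cup"
    deictic: bool = False    # utterance contains "this"/"that"

@dataclass
class PointingEvent:
    t: float                 # time of the pointing gesture
    target_id: str           # object hit by the 3D pointing ray

def fuse(speech: SpeechEvent,
         gestures: List[PointingEvent],
         window: float = 1.5) -> Dict[str, str]:
    """Speech-driven fusion: the speech hypothesis fixes the interpretation;
    pointing is consulted only to disambiguate a deictic object reference,
    so falsely detected gestures outside that role are simply ignored."""
    result = {"intent": speech.intent, "object": speech.object_type}
    if speech.deictic:
        # consider only gestures within a temporal window around the utterance
        candidates = [g for g in gestures
                      if speech.t_start - window <= g.t <= speech.t_end + window]
        if candidates:
            mid = (speech.t_start + speech.t_end) / 2.0
            closest = min(candidates, key=lambda g: abs(g.t - mid))
            result["object"] = closest.target_id
    return result

if __name__ == "__main__":
    s = SpeechEvent(t_start=0.0, t_end=1.2, intent="bring",
                    object_type="cup", deictic=True)
    g = [PointingEvent(t=0.4, target_id="cup_3"),      # accompanies the utterance
         PointingEvent(t=5.0, target_id="plate_1")]    # late, spurious detection
    print(fuse(s, g))   # -> {'intent': 'bring', 'object': 'cup_3'}

    In this toy run the deictic utterance is resolved to the object pointed at while speaking, and the spurious later gesture has no effect, mirroring the tolerance to false gesture detections described above.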

      Published In

      ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces
      October 2004
      368 pages
      ISBN: 1581139950
      DOI: 10.1145/1027933

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 13 October 2004

      Author Tags

      1. gesture
      2. multimodal architectures
      3. multimodal fusion and multisensory integration
      4. natural language
      5. speech
      6. vision

      Qualifiers

      • Article

      Conference

      ICMI '04

      Acceptance Rates

      Overall Acceptance Rate 453 of 1,080 submissions, 42%

      Article Metrics

      • Downloads (last 12 months): 25
      • Downloads (last 6 weeks): 5
      Reflects downloads up to 10 Aug 2024

      Cited By

      • (2023) A Parallel Multimodal Integration Framework and Application for Cake Shopping. Applied Sciences 14(1):299. DOI: 10.3390/app14010299. Online publication date: 29-Dec-2023.
      • (2023) MFIRA: Multimodal Fusion Intent Recognition Algorithm for AR Chemistry Experiments. Applied Sciences 13(14):8200. DOI: 10.3390/app13148200. Online publication date: 14-Jul-2023.
      • (2023) Multimodal Error Correction with Natural Language and Pointing Gestures. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 1968-1978. DOI: 10.1109/ICCVW60793.2023.00212. Online publication date: 2-Oct-2023.
      • (2023) Interactive Multimodal Robot Dialog Using Pointing Gesture Recognition. Computer Vision – ECCV 2022 Workshops, pp. 640-657. DOI: 10.1007/978-3-031-25075-0_43. Online publication date: 19-Feb-2023.
      • (2021) An intention understanding algorithm based on multimodal fusion. Second IYSF Academic Symposium on Artificial Intelligence and Computer Engineering, p. 82. DOI: 10.1117/12.2623101. Online publication date: 1-Dec-2021.
      • (2020) Research on Multimodal Perceptual Navigational Virtual and Real Fusion Intelligent Experiment Equipment and Algorithm. IEEE Access 8:43375-43390. DOI: 10.1109/ACCESS.2020.2978089. Online publication date: 2020.
      • (2019) Introducing NarRob, a Robotic Storyteller. Digital Forensics and Watermarking, pp. 387-396. DOI: 10.1007/978-3-030-11548-7_36. Online publication date: 22-Jan-2019.
      • (2018) Semantic Fusion for Natural Multimodal Interfaces using Concurrent Augmented Transition Networks. Multimodal Technologies and Interaction 2(4):81. DOI: 10.3390/mti2040081. Online publication date: 6-Dec-2018.
      • (2018) Up to the Finger Tip. Proceedings of the 2018 Annual Symposium on Computer-Human Interaction in Play, pp. 477-488. DOI: 10.1145/3242671.3242675. Online publication date: 23-Oct-2018.
      • (2018) A robust user interface for IoT using context-aware Bayesian fusion. 2018 IEEE 15th International Conference on Wearable and Implantable Body Sensor Networks (BSN), pp. 126-131. DOI: 10.1109/BSN.2018.8329675. Online publication date: Mar-2018.
