DOI: 10.1145/1180995.1181061
Article

Multimodal fusion: a new hybrid strategy for dialogue systems

Published: 02 November 2006

Abstract

This paper presents a new hybrid fusion strategy based primarily on the implementation of two earlier, distinct approaches to multimodal fusion [11] in multimodal dialogue systems. Both approaches, their predecessors, and their respective advantages and disadvantages are described in order to illustrate how the new strategy merges them into a more solid and coherent solution. The first approach was largely based on Johnston's work [5] and involved the inclusion of multimodal grammar entries and temporal constraints. The second fused information coming from the different channels at the dialogue level. The hybrid strategy described here combines the multimodal grammar entries and temporal constraints of the first approach with the additional dialogue-level information used by the second. Under this new approach, therefore, the fusion process is initiated at the grammar level and culminates at the dialogue level.
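The two-stage pipeline the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the event structure, the `MAX_GAP` window, and the dictionary-based unification are all illustrative assumptions standing in for the paper's multimodal grammar entries, temporal constraints, and dialogue-level fusion.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModalityEvent:
    """Hypothetical input event from one channel (e.g. speech or gesture)."""
    modality: str    # e.g. "speech" or "gesture"
    content: dict    # partial semantic frame produced by that channel
    t_start: float   # timestamps in seconds
    t_end: float

MAX_GAP = 4.0  # illustrative temporal constraint between modalities

def unify(frame_a: dict, frame_b: dict) -> Optional[dict]:
    """Naive feature-structure unification: merge unless keys clash."""
    merged = dict(frame_a)
    for key, value in frame_b.items():
        if key in merged and merged[key] != value:
            return None  # conflicting values: unification fails
        merged[key] = value
    return merged

def fuse_at_grammar_level(a: ModalityEvent, b: ModalityEvent) -> Optional[dict]:
    """Early fusion: only events close enough in time may unify."""
    gap = max(a.t_start, b.t_start) - min(a.t_end, b.t_end)
    if gap > MAX_GAP:
        return None  # violates the temporal constraint; defer the decision
    return unify(a.content, b.content)

def fuse(a: ModalityEvent, b: ModalityEvent, dialogue_context: dict) -> dict:
    """Hybrid strategy sketch: start fusion at the grammar level and,
    when it cannot complete there, finish it at the dialogue level."""
    result = fuse_at_grammar_level(a, b)
    if result is not None:
        return result
    # Late fusion: fill the remaining gaps from dialogue-level information.
    return {**dialogue_context, **a.content, **b.content}
```

For example, a spoken command `{"action": "move", "object": "lamp"}` and a pointing gesture `{"target": "table"}` arriving within the window unify at the grammar level; events too far apart in time fall through to the dialogue-level merge.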

References

[1] Gabriel Amores & José Francisco Quesada (1997). Episteme. Procesamiento del Lenguaje Natural 21, pp. 1--16.
[2] Gabriel Amores & José Francisco Quesada (2000). Diseño e Implementación de Sistemas de Traducción Automática. Sevilla: Secretariado de Publicaciones de la Universidad de Sevilla.
[3] Michael Johnston, Philip R. Cohen, David McGee, Sharon L. Oviatt, James A. Pittman & Ira A. Smith (1997). Unification-based Multimodal Integration. ACL 1997, pp. 281--288.
[4] Michael Johnston (1998). Unification-based Multimodal Parsing. COLING-ACL 1998, pp. 624--630.
[5] Michael Johnston & Srinivas Bangalore (2000). Finite State Multimodal Parsing and Understanding. Proceedings of the 18th Conference on Computational Linguistics, Volume 1, pp. 369--375.
[6] Michael Johnston & Srinivas Bangalore (2001). Finite-state Methods for Multimodal Parsing and Integration. Finite State Methods in Natural Language Processing, August 2001.
[7] Pilar Manchón, Guillermo Pérez & Gabriel Amores (2005). WOZ Experiments in Multimodal Dialogue Systems. Proceedings of Dialor'05, Nancy, France, pp. 131--135.
[8] Sharon Oviatt (1999). Ten Myths of Multimodal Interaction. Communications of the ACM, Vol. 42, No. 11, pp. 74--81.
[9] Sharon Oviatt (2003). Multimodal Interfaces. In The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications, J. Jacko & A. Sears, Eds. Lawrence Erlbaum Assoc., Mahwah, NJ, chap. 14, pp. 286--304.
[10] José Francisco Quesada, Doroteo Torre & Gabriel Amores (2000). Design of a Natural Command Language Dialogue System. Deliverable 3.2, Siridus Project.
[11] Guillermo Pérez, Gabriel Amores & Pilar Manchón (2005). Proceedings of ICMI'05 Workshop on Multimodal Interaction for the Visualisation and Exploration of Scientific Data. Trento, Italy.
[12] Dafydd Gibbon, Inge Mertins & Roger Moore, Eds. (2000). Handbook of Multimodal and Spoken Dialogue Systems. Kluwer Academic Publishers, Norwell, MA.
[13] Laurence Nigay & Joëlle Coutaz (1995). A Generic Platform for Addressing the Multimodal Challenge. International Conference on Human-Computer Interaction, Denver, CO, pp. 98--105. ACM.
[14] Minh Tue Vo (1998). A Framework and Toolkit for the Construction of Multimodal Learning Interfaces. PhD Thesis, Carnegie Mellon University, Pittsburgh, USA.
[15] Minh Tue Vo & Alex Waibel (1997). Modelling and Interpreting Multimodal Inputs: A Semantic Integration Approach. Technical Report CMU-CS-97-192, Carnegie Mellon University, Pittsburgh, USA.
[16] Minh Tue Vo & C. Wood (1996). Building an Application Framework for Speech and Pen Input Integration in Multimodal Learning Interfaces. International Conference on Acoustics, Speech and Signal Processing, Atlanta, GA. IEEE.



Published In

ICMI '06: Proceedings of the 8th international conference on Multimodal interfaces
November 2006
404 pages
ISBN:159593541X
DOI:10.1145/1180995
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. NLP
  2. dialogue systems
  3. multimodal fusion


Conference

ICMI06

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%


Cited By

  • (2020) A Multi-Stream Recurrent Neural Network for Social Role Detection in Multiparty Interactions. IEEE Journal of Selected Topics in Signal Processing 14(3), 554--567. DOI: 10.1109/JSTSP.2020.2992394.
  • (2017) Multimodal Speech and Pen Interfaces. The Handbook of Multimodal-Multisensor Interfaces, 403--447. DOI: 10.1145/3015783.3015795.
  • (2017) Management of Multimodal User Interaction in Companion-Systems. Companion Technology, 187--207. DOI: 10.1007/978-3-319-43665-4_10.
  • (2015) Multimodal Systems: An Excursus of the Main Research Questions. On the Move to Meaningful Internet Systems: OTM 2015 Workshops, 546--558. DOI: 10.1007/978-3-319-26138-6_59.
  • (2014) Review Article. Pattern Recognition Letters 36, 189--195. DOI: 10.1016/j.patrec.2013.07.003.
  • (2014) An Italian Multimodal Corpus. Proceedings of the Confederated International Workshops on On the Move to Meaningful Internet Systems: OTM 2014 Workshops, Volume 8842, 557--566. DOI: 10.1007/978-3-662-45550-0_57.
  • (2011) A Multiagent Approach to Teaching Complex Systems Development. Multi-Agent Systems for Education and Interactive Entertainment, 70--87. DOI: 10.4018/978-1-60960-080-8.ch004.
  • (2010) Semi-Automatically Generated High-Level Fusion for Multimodal User Interfaces. Proceedings of the 2010 43rd Hawaii International Conference on System Sciences, 1--10. DOI: 10.1109/HICSS.2010.335.
  • (2009) Fusion Engines for Multimodal Input. Proceedings of the 2009 International Conference on Multimodal Interfaces, 153--160. DOI: 10.1145/1647314.1647343.
  • (2009) Managing Multiple Speech-Enabled Applications in a Mobile Handheld Device. International Journal of Pervasive Computing and Communications 5(3), 332--359. DOI: 10.1108/17427370910991884.
