DOI: 10.1007/978-3-642-00437-7_1

Multimodal Interfaces: A Survey of Principles, Models and Frameworks

Published: 27 March 2009

Abstract

The grand challenge of multimodal interface creation is to build reliable processing systems able to analyze and understand multiple communication means in real time. This raises a number of associated issues covered by this chapter, such as the fusion of heterogeneous data types, architectures for real-time processing, dialog management, machine learning for multimodal interaction, modeling languages, and frameworks. The chapter does not attempt to cover exhaustively every issue related to the creation of multimodal interfaces, and some hot topics, such as error handling, have been left aside. It starts with the features and advantages associated with multimodal interaction, focusing on particular findings and guidelines as well as the cognitive foundations underlying multimodal interaction. It then turns to the driving theoretical principles, time-sensitive software architectures, and multimodal fusion and fission issues. Modeling of multimodal interaction, together with tools allowing the rapid creation of multimodal interfaces, is presented next. The chapter concludes with an outline of the current state of multimodal interaction research in Switzerland and a summary of the major future challenges in the field.
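
One topic the chapter surveys, multimodal fusion, can be made concrete with a small sketch. The fragment below is not from the chapter: it is a minimal, hypothetical illustration of time-window fusion, in the spirit of the classic "put that there" interaction, where a spoken command containing a deictic reference is paired with the pointing gesture closest to it in time. The Event structure, the fuse function, and the 1.5-second window are all assumptions made for illustration.

    from dataclasses import dataclass

    # Hypothetical sketch of time-window multimodal fusion (not the
    # chapter's algorithm): pair a deictic speech command with the
    # temporally closest pointing gesture.

    @dataclass
    class Event:
        modality: str     # e.g. "speech" or "gesture"
        payload: str      # recognized content
        timestamp: float  # seconds since session start

    def fuse(speech, gestures, window=1.5):
        # Keep only gestures that fall inside the fusion time window.
        candidates = [g for g in gestures
                      if abs(g.timestamp - speech.timestamp) <= window]
        if not candidates:
            return None  # no gesture close enough: fusion fails
        # Resolve the deictic reference with the closest gesture in time.
        nearest = min(candidates,
                      key=lambda g: abs(g.timestamp - speech.timestamp))
        return {"command": speech.payload, "referent": nearest.payload}

    # "put that there" spoken at t = 10.0 s, with pointing events nearby
    gestures = [Event("gesture", "object#3", 9.8),
                Event("gesture", "location#7", 10.4)]
    print(fuse(Event("speech", "put that there", 10.0), gestures))
    # -> {'command': 'put that there', 'referent': 'object#3'}

Fusion engines of the kind the chapter surveys typically operate on richer structures (frames, typed feature structures), but the pairing above captures the basic temporal alignment problem they all face.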


Published In

Human Machine Interaction: Research Results of the MMI Program
March 2009, 301 pages
ISBN: 9783642004360
Editors: Denis Lalanne, Jürg Kohlas

Publisher

Springer-Verlag, Berlin, Heidelberg

Cited By

  • (2024) AI as Modality in Human Augmentation: Toward New Forms of Multimodal Interaction with AI-Embodied Modalities. Proceedings of the 26th International Conference on Multimodal Interaction, pp. 591-595. DOI: 10.1145/3678957.3678958. Online publication date: 4-Nov-2024.
  • (2024) Towards Multimodal Interaction with AI-Infused Shape-Changing Interfaces. Adjunct Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, pp. 1-3. DOI: 10.1145/3672539.3686315. Online publication date: 13-Oct-2024.
  • (2024) Exploiting Semantic Search and Object-Oriented Programming to Ease Multimodal Interface Development. Companion Proceedings of the 16th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, pp. 74-80. DOI: 10.1145/3660515.3664244. Online publication date: 24-Jun-2024.
  • (2024) Towards the Automatic Construction of Multimodal Graphical and Voice Interfaces. Pattern Recognition, pp. 297-307. DOI: 10.1007/978-3-031-62836-8_28. Online publication date: 19-Jun-2024.
  • (2024) User Interaction Mode Selection and Preferences in Different Driving States of Automotive Intelligent Cockpit. Design, User Experience, and Usability, pp. 262-274. DOI: 10.1007/978-3-031-61353-1_18. Online publication date: 29-Jun-2024.
  • (2023) A Qualitative Study on the Expectations and Concerns Around Voice and Gesture Interactions in Vehicles. Proceedings of the 2023 ACM Designing Interactive Systems Conference, pp. 2155-2171. DOI: 10.1145/3563657.3596040. Online publication date: 10-Jul-2023.
  • (2023) Semantic Scene Builder: Towards a Context Sensitive Text-to-3D Scene Framework. Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management, pp. 461-479. DOI: 10.1007/978-3-031-35748-0_32. Online publication date: 23-Jul-2023.
  • (2022) Enhancing interaction of people with quadriplegia. Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments, pp. 223-229. DOI: 10.1145/3529190.3529218. Online publication date: 29-Jun-2022.
  • (2022) Multimodality in VR: A Survey. ACM Computing Surveys 54(10s), pp. 1-36. DOI: 10.1145/3508361. Online publication date: 13-Sep-2022.
  • (2022) ONYX - User Interfaces for Assisting in Interactive Task Learning for Natural Language Interfaces of Data Visualization Tools. Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, pp. 1-7. DOI: 10.1145/3491101.3519793. Online publication date: 27-Apr-2022.
