Multimodal speech and pen interfaces

Published: 24 April 2017

Abstract

This chapter describes interfaces that enable users to combine digital pen and speech input for interacting with computing systems. Such interfaces promise natural and efficient interaction, taking advantage of skills that users have developed over many years. Many applications for such systems have been explored, such as speech and pen systems for computer-aided design (CAD), with which an architect can sketch to create and position entities while speaking information about them. For instance, a user could draw a hardwood floor outline while saying "three-fourths inch thick heart pine." In response, the CAD system would create a floor of the correct shape, thickness, and material, while also updating the list of materials to purchase for the job. The user could then touch the floor and say "finish with polyurethane." The user of such a system could concentrate on creating the planned building, without breaking off to navigate a complex menu system. In fact, multimodal CAD systems like Think3 are preferred by users, and have been documented to significantly increase their productivity by speeding up interaction by 23% [Engineer Live 2013, Price 2004].
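To make the floor example concrete, here is a minimal, hypothetical sketch (in Python) of how such a multimodal command might be assembled: the pen stroke contributes the floor's shape, the speech recognizer contributes its attributes, and the two are fused only if they occur close together in time. The class names, fields, and the four-second time window are illustrative assumptions, not the implementation of any particular CAD system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PenStroke:
    """A sketched outline: time-stamped (x, y) points from the pen."""
    points: List[Tuple[float, float]]
    t_start: float  # seconds
    t_end: float


@dataclass
class SpokenAttributes:
    """Attributes extracted from the concurrent utterance,
    e.g. "three-fourths inch thick heart pine"."""
    thickness_in: float
    material: str
    t_start: float
    t_end: float


@dataclass
class FloorEntity:
    outline: List[Tuple[float, float]]
    thickness_in: float
    material: str


def fuse_floor_command(stroke: PenStroke, speech: SpokenAttributes,
                       max_gap_s: float = 4.0) -> Optional[FloorEntity]:
    """Fuse the drawn outline with the spoken attributes if the two
    inputs overlap, or nearly overlap, in time; otherwise do not fuse."""
    gap = max(stroke.t_start, speech.t_start) - min(stroke.t_end, speech.t_end)
    if gap > max_gap_s:
        return None  # inputs too far apart to form one multimodal command
    return FloorEntity(outline=stroke.points,
                       thickness_in=speech.thickness_in,
                       material=speech.material)
```

Under these assumptions, a drawn outline ending at t = 3.1 s and an utterance spanning 1.8--4.0 s would fuse into a single FloorEntity, whereas the same utterance arriving ten seconds later would be treated as a separate, incomplete command.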
This chapter will discuss how speech and pen multimodal systems have been built, and also how well they have performed. By pen input we include such devices as light pens, styluses, wireless digital pens, and digital pens that can write on paper while either storing digital data or streaming it to a receiver [Anoto 2016]. We will also occasionally refer to other devices that can, like digital pens, provide a continuous stream of ⟨x, y⟩ coordinates, such as tracked laser pointers, finger input on touchscreens, and the ubiquitous mouse. Pen input devices can be used for a number of communicative functions, such as handwriting letters and numbers, drawing symbols, sketching diagrams or shapes, pointing, or gesturing (e.g., drawing an arrow to scroll a map). See the Glossary for defined terms.
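As an illustration of the shared representation such devices provide, the following hypothetical sketch models a stroke as a time-stamped stream of ⟨x, y⟩ samples and applies a deliberately crude heuristic to guess its communicative function. The field names and threshold are assumptions made for illustration; deployed systems use trained handwriting, symbol, and gesture recognizers rather than rules like these.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InkSample:
    x: float
    y: float
    t: float               # seconds since pen-down
    pressure: float = 1.0  # not every device reports pressure


@dataclass
class InkStroke:
    samples: List[InkSample]

    def path_length(self) -> float:
        """Total distance traveled by the pen tip during the stroke."""
        return sum(((b.x - a.x) ** 2 + (b.y - a.y) ** 2) ** 0.5
                   for a, b in zip(self.samples, self.samples[1:]))


def rough_function_guess(stroke: InkStroke, tap_threshold: float = 5.0) -> str:
    """Crude guess at a stroke's communicative function: a short dwell
    looks like pointing/selection, while anything longer is handed to
    sketch, symbol, or handwriting recognition downstream."""
    if len(stroke.samples) < 2 or stroke.path_length() < tap_threshold:
        return "point"
    return "sketch_symbol_or_handwriting"
```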
This chapter begins by discussing users' multimodal speech and pen interaction patterns, and the documented advantages of this type of multimodal system (Section 10.2). Section 10.3 describes the simulation infrastructure that is ideally required for prototyping new systems, and the process of collecting multimodal data resources. In terms of system development, Sections 10.4 and 10.5 outline general signal processing and information flow, and major architectural components. Section 10.6 describes implemented approaches to multimodal fusion and semantic integration. Section 10.7 presents examples of multimodal speech and pen systems, some of which are commercial applications [Tumuluri 2017], with the Sketch-Thru-Plan system provided as a walk-through case study. The chapter concludes in Section 10.8 with a discussion of future directions for research and development. As an aid to comprehension, readers are referred to the Glossary for newly introduced terms throughout the chapter, and to the Focus Questions at the end of the chapter.
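As a preview of the fusion and semantic integration ideas developed in Sections 10.4--10.6, the hypothetical sketch below combines n-best speech and pen hypotheses by checking that they are close in time and that their partial meanings do not conflict, a toy stand-in for the unification-based integration used in systems such as QuickSet [Johnston et al. 1997]. The slot dictionaries, scoring rule, and time window are illustrative assumptions rather than any particular system's algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Hypothesis:
    """One scored interpretation from a single recognizer (speech or pen)."""
    semantics: Dict[str, str]  # partial meaning, e.g. {"action": "create", "object": "floor"}
    score: float               # recognizer confidence in [0, 1]
    t_start: float
    t_end: float


def compatible(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """Two partial meanings 'unify' here only if no shared slot has conflicting values."""
    return all(b.get(key, value) == value for key, value in a.items())


def integrate(speech: List[Hypothesis], pen: List[Hypothesis],
              max_gap_s: float = 4.0) -> Optional[Dict[str, str]]:
    """Late fusion: choose the best-scoring pair of temporally close,
    semantically compatible speech and pen hypotheses and merge them."""
    best: Optional[Dict[str, str]] = None
    best_score = 0.0
    for s in speech:
        for p in pen:
            gap = max(s.t_start, p.t_start) - min(s.t_end, p.t_end)
            if gap > max_gap_s or not compatible(s.semantics, p.semantics):
                continue
            combined_score = s.score * p.score
            if combined_score > best_score:
                best, best_score = {**s.semantics, **p.semantics}, combined_score
    return best
```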

References

[1]
A. Adler and R. Davis. 2007. Speech and sketching: an empirical study of multimodal interaction. In Proceedings of the Eurographics Workshop on Sketch-Based Interfaces and Modeling (SBIM), pp. 83--90. 410
[2]
J. Alexandersson, T. Becker, and N. Pfleger. 2006. Overlay: the basic operation for discourse processing. In W. Wahlster, editor, SmartKom: Foundations of Multimodal Dialogue Systems, pp. 255--267. Springer, Berlin. 430
[3]
E. Alpaydin. 2017. Classifying multimodal data. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition. Morgan & Claypool Publishers, San Rafael, CA. 418
[4]
R. Anderson, C. Hoyer, C. Prince, J. Su, F. Videon, and S. Wolfman. 2004. Speech, ink, and slides: the interaction of content channels. In Proceedings of the Annual ACM International Conference on Multimedia, pp. 796--803. 410
[5]
Anoto. 2016. http://www.anoto.com/ (accessed November 5, 2016). 403, 406, 626
[6]
H. Aras, V. Chandrasekhara, S. Krüger, R. Malaka, and R. Porzel. 2006. Intelligent integration of external data and services into SmartKom. In W. Wahlster, editor, SmartKom: Foundations of Multimodal Dialogue Systems, pp. 363--378. Springer, Berlin. 417
[7]
A. M. Arthur, R. Lunsford, M. Wesson, and S. Oviatt. 2006. Prototyping novel collaborative multimodal systems: Simulation, data collection and analysis tools for the next decade. In Proceedings of the International Conference on Multimodal Interfaces (ICMI), pp. 209--216. 412
[8]
S. Bangalore and M. Johnston. 2009. Robust understanding in multimodal interfaces. Computational Linguistics, 35(3): 345--397. 408
[9]
J. Barnett, R. Akolkar, R. J. Auburn, M. Bodell, D. C. Burnett, J. Carter, S. McGlashan, T. Lager, M. Helbing, R. Hosn, T. V. Raman, K. Reifenrath, N. Rosenthal, and J. Roxendal. 2015. State Chart XML (SCXML): State machine notation for control abstraction, W3C Recommendation. Available at http://www.w3.org/TR/scxml 431
[10]
A. W. Black and K. A. Lenzo. 2001. Flite: a small fast run-time synthesis engine. In Proceedings of the ISCA Tutorial and Research Workshop on Speech Synthesis (SSW), pp. 204:1--204:6. 424
[11]
P. Boersma and D. Weenink. 2016. Praat: Doing phonetics by computer. Available at http://www.fon.hum.uva.nl/praat/ (accessed November 4, 2016). 414
[12]
R. A. Bolt. 1980. "Put-that-there": Voice and gesture at the graphics interface. ACM SIGGRAPH Computer Graphics, 14(3):262--270. 424
[13]
A. Cheyer and L. Julia. 1998. Multimodal maps: An agent-based approach. In H. Bunt, R. J. Beun, and T. Borghuis, editors, Multimodal Human-Computer Communication (CMC'95), vol. LNCS 1374, pp. 111--121, Springer, Berlin. 416, 423, 425, 426
[14]
A. Cheyer, L. Julia and J.-C. Martin. 2001. A unified framework for constructing multimodal experiments and applications. In H. Bunt and R. J. Beun, editors, Cooperative Multimodal Communication (CMC'98), vol. LNCS 2155, pp. 234--242, Springer, Berlin. 413
[15]
J. Clow and S. L. Oviatt. 1998. STAMP: A suite of tools for analyzing multimodal system processing. In Proceedings of International Conference on Spoken Language Processing (ICSLP), vol. 2, pp. 277--280. 414
[16]
P. R. Cohen. 1992. The role of natural language in a multimodal interface. In Proceedings of the Annual ACM Symposium on User Interface Software and Technology (UIST), pp. 143--149. 425
[17]
P. R. Cohen. 1994. Natural language techniques for multimodal interaction, IEICE Transactions on Information and Systems (Japanese Edition), J94-D77(8):1403--1411. 425
[18]
P. R. Cohen, A. Cheyer, M. Wang, and S. C. Baeg. 1994. An open agent architecture. In Proceedings of the AAAI Spring Symposium, pp. 1--8. 406, 416, 418, 423, 425, 426, 624
[19]
P. R. Cohen, M. Dalrymple, D. B. Moran, F. C. Pereira, J. W. Sullivan, R. A. Gargan, J. L. Schlossberg, and S. W. Tyler. 1989. Synergistic use of direct manipulation and natural language. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 227--233. 404, 405, 424, 611
[20]
P. R. Cohen, M. Johnston, D. McGee, S. Oviatt, J. Pittman, I. Smith, L. Chen, and J. Clow. 1997. QuickSet: multimodal interaction for distributed applications. In Proceedings of the ACM International Conference on Multimedia, pp. 31--40. 418, 425, 426
[21]
P. R. Cohen, E. C. Kaiser, M. C. Buchanan, S. Lind, M. J. Corrigan, and R. M. Wesson. 2015. Sketch-Thru-Plan: a multimodal interface for command and control. Communications of the ACM, 58(4):56--65. 432, 433, 434, 435
[22]
P. R. Cohen, D. R. McGee, and J. Clow. 2000. The efficiency of multimodal interaction for a map-based task. In Proceedings of the Conference on Applied Natural Language Processing (ANLC), pp. 331--338. 426, 427
[23]
P. R. Cohen and D. R. McGee. 2004. Tangible multimodal interfaces for safety-critical applications. Communications of the ACM, 47(1):41--46. 426
[24]
P. Cohen, C. Swindells, S. Oviatt, and A. Arthur. 2008. A high-performance dual-wizard infrastructure for designing speech, pen, and multimodal interfaces. In Proceedings of the International Conference on Multimodal Interfaces (ICMI), pp. 137--140. 413
[25]
H. D. Crane and D. Rtischev. 1993. Pen and voice unite. Byte, 18(11):98--102. 404
[26]
F. Cuenca, J. Van den Bergh, K. Luyten, and K. Coninx. 2014. A domain-specific textual language for rapid prototyping of multimodal interactive systems. In Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS), pp. 97--106. 414
[27]
D. Dahl, F. Paterno, R. Tumuluri, and M. Zancanaro. 2017. Standardized representations and markup languages for multimodal interaction. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Multimodal Language Processing, Software Tools, Commercial Applications, and Emerging Directions. Morgan & Claypool Publishers, San Rafael, CA. 418, 430, 431
[28]
N. Dahlbäck, A. Jönsson, and L. Ahrenberg. 1993. Wizard of Oz studies: why and how. In Proceedings of the International Conference on Intelligent User Interfaces (IUI), pp. 193--200. 412
[29]
M. Denecke and J. Yang. 2000. Partial information in multimodal dialogue. In T. Tan, Y. Shi, and W. Gao, editors, Advances in Multimodal Interfaces (ICMI'00), vol. LNCS 1948, pp. 624--633. Springer, Berlin. 422
[30]
P. Ehlen, M. Johnston, and G. Vasireddy. 2002. Collecting mobile multimodal data for MATCH. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), pp. 2557--2560. 413
[31]
P. Ehlen and M. Johnston. 2012. Multimodal interaction patterns in mobile local search. In Proceedings of the ACM International Conference on Intelligent User Interfaces (IUI), pp. 21--24. 409, 425, 428, 429
[32]
R. Engel and N. Pfleger. 2006. Modality fusion. In W. Wahlster, editor, SmartKom: Foundations of Multimodal Dialogue Systems, pp. 223--235. Springer-Verlag, Berlin. 422
[33]
Engineer Live. 2013. Speech recognition technology can dramatically improve productivity. February 21. Available at http://www.engineerlive.com/content/15114 (accessed February 12, 2015). 403
[34]
T. Finin, R. Fritzson, D. McKay, and R. McEntire. 1994. KQML as an agent communication language. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), pp. 456--463. 417
[35]
A. Fouse, N. Weibel, E. Hutchins, and J. D. Hollan. 2011. ChronoViz: a system for supporting navigation of time-coded data. In Proceedings of the ACM Conference on Human Factors in Computing Systems---Extended Abstracts (CHI EA), pp. 299--304. 414
[36]
A. Goldschen and D. Loehr. 1999. The role of the DARPA Communicator architecture as a human computer interface for distributed simulations. In Spring Simulation Interoperability Workshop. Simulation Interoperability Standards Organization, Orlando, FL. 417
[37]
H. Greene, L. Stotts, R. Patterson, and J. Greenberg. 2010. Command post of the future: Successful transition of a science and technology initiative to a program of record. Defense Acquisition Research Journal, 17(1): 3--26. 434
[38]
G. Herzog, H. Kirchmann, S. Merten, A. Ndiaye, and P. Poller. 2003. MULTIPLATFORM Testbed: An integration platform for multimodal dialog systems. In Proceedings of the HLT-NAACL Workshop on Software Engineering and Architecture of Language Technology Systems (SEALTS), pp. 75--82. 417
[39]
H. Holzapfel, K. Nickel, and R. Stiefelhagen. 2004. Implementation and evaluation of a constraint-based multimodal fusion system for speech and 3D pointing gestures. In Proceedings of the International Conference on Multimodal Interfaces (ICMI), pp. 175--182. 422
[40]
X. Huang, S. Oviatt, and R. Lunsford. 2006. Combining user modeling and machine learning to predict users' multimodal integration patterns. In S. Renals, S. Bengio, and J. G. Fiscus, editors, Machine Learning for Multimodal Interaction (MLMI'06), vol. LNCS 4299, pp. 50--62, Springer-Verlag, Berlin. 411
[41]
X. Huang and S. Oviatt. 2006. Toward adaptive information fusion in multimodal systems. In S. Renals and S. Bengio, editors, Machine Learning for Multimodal Interaction (MLMI'05), vol. LNCS 3869, pp. 15--27. Springer-Verlag, Berlin. 411
[42]
M. Johnston. 1998. Unification-based multimodal parsing. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the International Conference on Computational Linguistics (COLING-ACL 98), pp. 624--630. 421, 423, 426, 433
[43]
M. Johnston. 2017. Multimodal integration for interactive conversational systems. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Multimodal Language Processing, Software Tools, Commercial Applications, and Emerging Directions. Morgan & Claypool Publishers, San Rafael, CA. 420, 421, 422, 423, 426, 430, 436
[44]
M. Johnston and S. Bangalore. 2000. Finite-state multimodal parsing and understanding. In Proceedings of the Conference on Computational Linguistics (COLING), vol. 1, pp. 369--375. 417, 420
[45]
M. Johnston and S. Bangalore. 2005. Finite-state multimodal integration and understanding. Natural Language Engineering, 11(2):159--187. 420
[46]
M. Johnston, P. R. Cohen, D. McGee, S. L. Oviatt, J. A. Pittman, and I. Smith. 1997. Unification-based multimodal integration. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 281--288. 418, 420, 426, 433
[47]
M. Johnston, S. Bangalore, G. Vasireddy, A. Stent, P. Ehlen, M. Walker, S. Whittaker, and P. Maloor. 2002. MATCH: an architecture for multimodal dialogue systems. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 376--383. 425, 428
[48]
M. Johnston, P. Baggia, D. C. Burnett, J. Carter, D. A. Dahl, G. McCobb, and D. Raggett. 2009. EMMA: Extensible MultiModal Annotation markup language. W3C Recommendation. Available at http://www.w3.org/TR/2009/REC-emma-20090210. 418, 430
[49]
M. Johnston, J. Chen, P. Ehlen, H. Jung, J. Lieske, A. Reddy, E. Selfridge, S. Stoyanchev, B. Vasilieff, and J. Wilpon. 2014. MVA: the multimodal virtual assistant. In Proceedings of the Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pp. 257--259. 425, 428, 430
[50]
L. Julia and C. Faure. 1995. Pattern recognition and beautification for a pen based interface. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 58--63. 425
[51]
D. Jurafsky and J. H. Martin. 2009. Speech and Language Processing, 2nd ed. Prentice Hall, Inc., Upper Saddle River, NJ. 423
[52]
E. C. Kaiser. 2005. Multimodal new vocabulary recognition through speech and handwriting in a whiteboard scheduling application. In Proceedings of the ACM Intelligent User Interfaces (IUI), pp. 51--58. 410, 424
[53]
E. C. Kaiser. 2006. Using redundant speech and handwriting for learning new vocabulary and understanding abbreviations. In Proceedings of the International Conference on Multimodal Interfaces (ICMI), pp. 347--356. 424
[54]
E. C. Kaiser, P. Barthelmess, C. Erdmann, and P. Cohen. 2007. Multimodal redundancy across handwriting and speech during computer mediated human-human interactions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 1009--1018. 423, 424
[55]
E. Kaiser, A. Olwal, D. McGee, H. Benko, A. Corradini, X. Li, P. Cohen, and S. Feiner. 2003. Mutual disambiguation of 3D multimodal interaction in augmented and virtual reality. In Proceedings of the International Conference on Multimodal Interfaces (ICMI), pp. 12--19. 409, 423
[56]
R. M. Kaplan and J. Bresnan. 1982. Lexical functional grammar: A formal system for grammatical representation. In J. Bresnan, editor, The Mental Representation of Grammatical Relations, pp. 173--281. MIT Press, Cambridge, MA. 405, 615
[57]
M. Kay. 1973. The MIND System. In R. Rustin, editor, Natural Language Processing, pp. 155--188. Algorithmics Press, New York. 423
[58]
M. Kipp. 2014. ANVIL---The video annotation research tool. In J. Durand, U. Gut, and G. Kristoffersen, editors, The Oxford Handbook of Corpus Phonology, pp. 420--436. Oxford University Press, Oxford, UK. 413
[59]
J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas. 1998. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3):226--239. 418, 422
[60]
A. Kobsa, J. Allgayer, C. Reddig, N. Reithinger, D. Schmauks, K. Harbusch, and W. Wahlster. 1986. Combining deictic gestures and natural language for referent identification. In Proceedings of the Conference on Computational Linguistics (COLING), pp. 356--361. 424
[61]
G. Kondrak. 2000. A new algorithm for the alignment of phonetic sequences. In Proceedings of the North American Chapter of the Association for Computational Linguistics Conference (NAACL), pp. 288--295. 424
[62]
S. Kumar, P. R. Cohen, and H. J. Levesque. 2000. The Adaptive Agent Architecture: Achieving fault-tolerance using persistent broker teams. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, pp. 459--466. 417, 426
[63]
S. Kumar, P. R. Cohen, and R. Coulston. 2004. Multimodal interaction under exerted conditions in a natural field setting. In Proceedings of the ACM International Conference on Multimodal Interfaces (ICMI), pp. 227--234. 410, 413
[64]
D. Lalanne, L. Nigay, P. Palanque, P. Robinson, J. Vanderdonckt, and J.-F. Ladry. 2009. Fusion engines for multimodal input: a survey. In Proceedings of the International Conference on Multimodal Interfaces (ICMI-MLMI), pp. 153--160. 420
[65]
D. L. Martin, A. J. Cheyer, and D. B. Moran. 1999. The Open Agent Architecture: A framework for building distributed software systems. Applied Artificial Intelligence, 13:91--128. 406, 416, 426, 624
[66]
H. P. Martínez and G. N. Yannakakis. 2014. Deep multimodal fusion: Combining discrete events and continuous signals. In Proceedings of the International Conference on Multimodal Interaction (ICMI), pp. 34--41. 420
[67]
D. R. McGee, P. R. Cohen, R. M. Wesson, and S. Horman. 2002. Comparing paper and tangible, multimodal tools. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 407--414. 426, 428
[68]
D. B. Moran, A. J. Cheyer, L. E. Julia, D. L. Martin, and S. Park. 1997. Multimodal user interfaces in the Open Agent Architecture. In Proceedings of the International Conference on Intelligent User Interfaces (IUI), pp. 61--68. 416, 418, 425, 426
[69]
S. Nakagawa and J. X. Zhang. 1994. An input interface with speech and touch screen. Transactions of the Institute of Electrical Engineers of Japan, 114-C(10):1009--1017. 425
[70]
J. G. Neal and S. C. Shapiro. 1991. Intelligent multimedia interface technology. In J. W. Sullivan and S. W. Tyler, editors, Intelligent User Interfaces, pp. 11--43, ACM Press, New York. 424
[71]
R. Neßelrath and M. Feld. 2014. SiAM-dp: A platform for the model based development of context-aware multimodal dialogue applications. In Proceedings of the International Conference on Intelligent Environments (IE), pp. 162--169. 414
[72]
L. Nigay and J. Coutaz. 1993. A design space for multimodal systems: Concurrent processing and data fusion. In Proceedings of INTERACT'93 and CHI'93 Conference on Human Factors in Computing Systems, pp. 172--178. 420
[73]
L. Nigay and J. Coutaz. 1995. A generic platform for addressing the multimodal challenge. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 98--105. 420, 425
[74]
T. Nishimoto, N. Shida, T. Kobayashi, and K. Shirai. 1995. Improving human interface in drawing tool using speech, mouse, and key-board. In Proceedings of the IEEE International Workshop on Robot and Human Communication (RO-MAN), pp. 107--112. 425
[75]
Northrop Grumman. 2016. Command and Control Personal Computer (C2PC). Available at http://www.northropgrumman.com/capabilities/c2pc/Pages/default.aspx (accessed May 3, 2016). 435
[76]
Openstream. Cue-me™. 2015. http://www.openstream.com/cueme.html (accessed February 12, 2015). 430
[77]
S. Oviatt. 1992. Pen/voice: Complementary multimodal communication. In Proceedings of Speech Tech'92, pp. 238--241. 404, 412, 425, 434
[78]
S. Oviatt. 1997. Multimodal interactive maps: Designing for human performance. Human-Computer Interaction (Special issue on Multimodal Interfaces), 12(1):93--129. 404, 407, 408, 410, 422, 425, 426, 434
[79]
S. Oviatt. 1998. The CHAM model of hyperarticulate adaptation during human-computer error resolution. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), pp. 2311--2314. 409
[80]
S. L. Oviatt, M. MacEachern, and G. Levow. 1998. Predicting hyperarticulate speech during human-computer error resolution. Speech Commun., 24(2):1--23. 405
[81]
S. Oviatt. 1999a. Mutual disambiguation of recognition errors in a multimodal architecture. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 576--583. 409, 410, 426
[82]
S. Oviatt. 1999b. Ten myths of multimodal interaction. Communications of the ACM, 42(11):74--81. 408, 412
[83]
S. Oviatt. 2000. Multimodal system processing in mobile environments. In Proceedings of the Annual ACM Symposium on User Interface Software Technology (UIST), pp. 21--30. 409, 413, 426
[84]
S. Oviatt. 2002. Breaking the robustness barrier: Recent progress on the design of robust multimodal systems. Advances in Computers, 56: 305--341. 409, 410
[85]
S. Oviatt. 2013. The Design of Future Educational Interfaces. Routledge Press, New York. 404, 436
[86]
S. Oviatt and P. R. Cohen. 2015. The Paradigm Shift to Multimodality in Contemporary Computer Interfaces. Morgan & Claypool Publishers, San Rafael, CA. 405, 406, 408, 409, 411, 418, 425, 618, 624
[87]
S. Oviatt, P. Cohen, M. Fong, and M. Frank. 1992. A rapid semi-automatic simulation technique for investigating interactive speech and handwriting. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), vol. 2, pp. 1351--1354.
[88]
S. Oviatt, A. Cohen, N. Weibel, K. Hang, and K. Thompson. 2014. Multimodal learning analytics data resources: Description of math data corpus and coded documents. In Proceedings of the Third International Data-Driven Grand Challenge Workshop on Multimodal Learning Analytics. 414
[89]
S. Oviatt, R. Coulston, and R. Lunsford. 2004. When do we interact multimodally?: cognitive load and multimodal communication patterns. In Proceedings of the International Conference on Multimodal Interfaces (ICMI), pp. 129--136. 408
[90]
S. Oviatt, A. DeAngeli, and K. Kuhn. 1997. Integration and synchronization of input modes during multimodal human-computer interaction. In Proceedings of Conference on Human Factors in Computing Systems (CHI), pp. 415--422.
[91]
S. L. Oviatt, J. Grafsgaard, L. Chen, and X. Ochoa. 2017. Multimodal learning analytics: Assessing learners' mental state during the process of learning. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition. Morgan & Claypool Publishers, San Rafael, CA. 414
[92]
S. Oviatt, R. Lunsford, and R. Coulston. 2005. Individual differences in multimodal integration patterns: what are they and why do they exist? In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 241--249. 410, 411
[93]
S. Oviatt and R. VanGent. 1996. Error resolution during multimodal human-computer interaction. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), vol. 1, pp 204--207. 409
[94]
J. A. Pittman, L. I. Smith, P. R. Cohen, S. L. Oviatt, and T.-C. Yang. 1996. QuickSet: A multimodal interface for military simulation. In Proceedings of the Conference on Computer-Generated Forces and Behavioral Representation, pp. 217--224. 425
[95]
C. Pollard and I. A. Sag. 1994. Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago. 405, 421, 615
[96]
P. M. Portillo, G. P. García, and G. A. Carredano. 2006. Multimodal fusion: a new hybrid strategy for dialogue systems. In Proceedings of the International Conference on Multimodal Interfaces (ICMI), pp. 357--363. 422
[97]
G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior. 2003. Recent advances in the automatic recognition of audiovisual speech. Proceedings of the IEEE, 91(9): 1306--1326. 418
[98]
G. Potamianos, E. Marcheret, Y. Mroueh, V. Goel, A. Koumbaroulis, A. Vartholomaios, and S. Thermos. 2017. Audio and visual modality combination in speech processing applications. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 1: Foundations, User Modeling, and Common Modality Combinations. Morgan & Claypool Publishers, San Rafael, CA. 418
[99]
P. Price. 2004. Matching technology and application. Tutorial given at the Annual Meeting of the American Voice I/O Society. 403
[100]
K. Rohlfing, D. Loehr, S. Duncan, A. Brown, A. Franklin, I. Kimbara, J.-T. Milde, F. Parrill, T. Rose, T. Schmidt, H. Sloetjes, A. Thies, and S. Wellinghoff. 2006. Comparison of multimodal annotation tools---Workshop report, Gesprächsforschung - Online-Zeitschrift zur verbalen Interaktion, 7:99--123. 414
[101]
M. W. Salisbury, J. H. Hendrickson, T. L. Lammers, C. Fu, and S. A. Moody. 1990. Talk and draw: bundling speech and graphics. IEEE Computer, 23(8):59--65. 424
[102]
F. Schiel and U. Türk. 2006. Wizard-of-Oz recordings. In W. Wahlster, editor, SmartKom: Foundations of Multimodal Dialogue Systems, pp. 541--570. Springer, Berlin. 413
[103]
S. Seneff, R. Lau, and J. Polifroni. 1999. Organization, communication and control in the GALAXY-II conversational system. In Proceedings of the European Conference on Speech Communication and Technology (EUROSPEECH), pp. 1271--1274. 424
[104]
H. Shimazu, S. Arita, and Y. Takashima. 1994. Multi-modal definite clause grammar. In Proceedings of the Conference on Computational Linguistics (COLING), pp. 832--836. 418, 422, 425
[105]
J. Siroux, M. Guyomard, F. Multon, and C. Remondeau. 1998. Modeling and processing of the oral and tactile activities in the GEORAL tactile system. In H. Bunt, R.-J. Beun, and T. Borghuis, editors, Multimodal Human-Computer Communication (CMC'95), vol. LNCS 1374, pp. 101--110. Springer, Berlin. 424
[106]
S. Steininger, F. Schiel, and S. Rabold. 2006. Annotation of multimodal data. In W. Wahlster, editor, SmartKom: Foundations of Multimodal Dialogue Systems, pp. 571--596. Springer, Berlin. 413
[107]
B. Suhm, B. Myers, and A. Waibel. 2001. Multimodal error correction for speech user interfaces. ACM Transactions on Computer-Human Interaction, 8(1):60--98. 409
[108]
R. Tumuluri. 2017. Commercialization of multimodal systems. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Multimodal Language Processing, Software Tools, Commercial Applications, and Emerging Directions. Morgan & Claypool Publishers, San Rafael, CA. 404, 431
[109]
B. Xiao, C. Girand, and S. Oviatt. 2002. Multimodal integration patterns in children. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), pp. 629--632. 410
[110]
B. Xiao, R. Lunsford, R. Coulston, M. Wesson, and S. Oviatt. 2003. Modeling multimodal integration patterns and performance in seniors: toward adaptive processing of individual differences. In Proceedings of the International Conference on Multimodal Interfaces (ICMI), pp. 265--272. 410
[111]
M. T. Vo. 1998. A framework and toolkit for the construction of multimodal learning interfaces. Ph.D. thesis, Technical Report CMU-CS-98-129, School of Computer Science, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA. 423, 425
[112]
M. T. Vo and A. Waibel. 1993. Multimodal human-computer interaction. In Proceedings of the International Symposium on Spoken Dialogue: New Directions in Human and Man-Machine Communication, pp. 95--101. 420, 425
[113]
W. Wahlster. 1991. User and discourse modeling for multimodal communication. In J. W. Sullivan and S. W. Tyler, editors, Intelligent User Interfaces, pp. 45--67, Ch. 3. ACM Press, New York. 424
[114]
W. Wahlster, N. Reithinger, and A. Blocher. 2001. SmartKom: Multimodal communication with a life-like character. In Proceedings of the European Conference on Speech Communication and Technology (EUROSPEECH), pp. 1547--1550. 422
[115]
W. Wahlster. 2002. SmartKom: Fusion and fission of speech, gestures, and facial expressions. In Proceedings of the International Workshop on Man-Machine Symbiotic Systems, pp. 213--225. 431
[116]
W. Wahlster, editor. 2006. SmartKom: Foundations of Multimodal Dialogue Systems. Springer, Berlin. 422, 425, 429, 430
[117]
A. Waibel, M. T. Vo, P. Duchnowski, and S. Manke. 1996. Multimodal interfaces. Artificial Intelligence Review, 10(3):299--319. 420, 425
[118]
D. H. D. Warren and F. C. N. Pereira. 1982. An efficient easily adaptable system for interpreting natural language queries. American Journal of Computational Linguistics, 8(3--4): 110--122. 418
[119]
S. M. Watt, Y.-M. Chee, K. Franke, M. Froumentin, S. Madhvanath, J.-A. Magaña, G. Pakosz, G. Russel, M. Selvaraj, G. Seni, C. Tremblay, and L. Yaeger. 2011. Ink Markup Language (InkML), W3C Recommendation. http://www.w3.org/TR/InkML/ 431
[120]
K. Wauchope. 1994. Eucalyptus: Integrating natural language input with a graphical user interface. US Naval Research Laboratory, NRL technical report NRL/FR/5510-949711. 424
[121]
K. Wittenburg, L. Weitzman, and J. Talley. 1991. Unification-based grammars and tabular parsing for graphical languages. Journal of Visual Languages & Computing, 2(4):347--370. 423
[122]
L. Wu, S. L. Oviatt, and P. R. Cohen. 1999a. Multimodal integration---a statistical view. IEEE Transactions on Multimedia, 1(4):334--341. 421
[123]
L. Wu, S. L. Oviatt, and P. R. Cohen. 1999b. Statistical multimodal integration for intelligent HCI. In Proceedings of the IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing (NNSP), pp. 487--496. 421
[124]
M. Zancanaro, O. Stock, and C. Strapparava. 1997. Multimodal interaction for information access: Exploiting cohesion. Computational Intelligence, 13(4):439--464.


Published In

cover image ACM Books
The Handbook of Multimodal-Multisensor Interfaces: Foundations, User Modeling, and Common Modality Combinations - Volume 1
April 2017
662 pages
ISBN:9781970001679
DOI:10.1145/3015783

Publisher

Association for Computing Machinery and Morgan & Claypool

Publication History

Published: 24 April 2017

Qualifiers

  • Chapter

Appears in

ACM Books

Cited By

  • (2024) Pen-Based Interaction. Handbook of Human Computer Interaction, 10.1007/978-3-319-27648-9_102-1, pp. 1-22. Online publication date: 16-Oct-2024.
  • (2023) A Qualitative Study on the Expectations and Concerns Around Voice and Gesture Interactions in Vehicles. Proceedings of the 2023 ACM Designing Interactive Systems Conference, 10.1145/3563657.3596040, pp. 2155-2171. Online publication date: 10-Jul-2023.
  • (2021) Studying Natural User Interfaces for Smart Video Annotation towards Ubiquitous Environments. Proceedings of the 20th International Conference on Mobile and Ubiquitous Multimedia, 10.1145/3490632.3490672, pp. 158-168. Online publication date: 5-Dec-2021.
  • (2019) Standardized representations and markup languages for multimodal interaction. The Handbook of Multimodal-Multisensor Interfaces, 10.1145/3233795.3233806, pp. 347-392. Online publication date: 1-Jul-2019.
  • (2019) Software platforms and toolkits for building multimodal systems and applications. The Handbook of Multimodal-Multisensor Interfaces, 10.1145/3233795.3233801, pp. 145-190. Online publication date: 1-Jul-2019.
  • (2019) Multimodal integration for interactive conversational systems. The Handbook of Multimodal-Multisensor Interfaces, 10.1145/3233795.3233798, pp. 21-76. Online publication date: 1-Jul-2019.
  • (2019) Introduction. The Handbook of Multimodal-Multisensor Interfaces, 10.1145/3233795.3233797, pp. 1-20. Online publication date: 1-Jul-2019.
  • (2018) Multimodal learning analytics. The Handbook of Multimodal-Multisensor Interfaces, 10.1145/3107990.3108003, pp. 331-374. Online publication date: 1-Oct-2018.
  • (2017) Multimodal gesture recognition. The Handbook of Multimodal-Multisensor Interfaces, 10.1145/3015783.3015796, pp. 449-487. Online publication date: 24-Apr-2017.
  • (2017) Multimodal feedback in HCI. The Handbook of Multimodal-Multisensor Interfaces, 10.1145/3015783.3015792, pp. 277-317. Online publication date: 24-Apr-2017.
