
Multimodal integration for interactive conversational systems

Published: 01 July 2019

References

[1]
S. P. Abney. 1991. Parsing by chunks. In R. Berwick, S. Abney, and C. Tenny, editors, Principle-Based Parsing. Kluwer Academic Publishers, Dordrecht. pp. 257--278. 44
[2]
J. Alexandersson and T. Becker. 2001. Overlay as the basic operation for discourse processing in a multimodal dialogue system. In Proceedings of 2nd IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems. pp. 1--7. 36, 52, 59, 60, 61
[3]
J. Alexandersson, T. Becker, and N. Pfleger. 2004. Scoring for overlay based on informational distance. In Proceedings of KONVENS-04. Vienna, Austria. pp. 1--4.
[4]
C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, and M. Mohri. 2007. OpenFst: A general and efficient weighted finite-state transducer library. In Proceedings of the Ninth International Conference on Implementation and Application of Automata (CIAA 2007). Lecture Notes in Computer Science Vol. 4783, pp. 11--23. Springer, Berlin, Heidelberg. 45
[5]
J. Allgayer, R. M. Jansen-Winkeln, C. Reddig, N. Reithinger. 1989. Bidirectional use of knowledge in the multimodal NL access system XTRA. In Proceedings of IJCAI 1989, pp. 1492--1497. 32
[6]
H. Alshawi. 1987. Memory and Context for Language Interpretation. Cambridge University Press, Cambridge, UK. 50
[7]
M. Al-Hames, A. Dielmann, D. Gatica-Perez, S. Reiter, S. Renals, G. Rigoll, and D. Zhang. 2006. Multimodal integration for meeting group action segmentation and recognition. In S. Renals and S. Bengio, editors, MLMI 2005, LNCS 3869, pp. 52--63. Springer, Berlin, Heidelberg. 53
[8]
D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, E. Elsen, J. Engel, L. Fan, C. Fougner, T. Han, A. Y. Hannun, B. Jun, P. LeGresley, L. Lin, S. Narang, A. Ng, S. Ozair, R. Prenger, J. Raiman, S. Satheesh, D. Seetapun, S. Sengupta, Y. Wang, Z. Wang, C. Wang, B. Xiao, D. Yogatama, J. Zhan, and Z. Zhu. 2016. Deep Speech 2: End-to-end speech recognition in English and Mandarin. In Proceedings of the 33rd International Conference on Machine Learning, New York. 62
[9]
A. H. Anderson, M. Bader, E. Gurman Bard, E. Boyle, G. Doherty, S. Garrod, S. Isard, J. Kowtko, J. McAllister, J. Miller, C. Sotillo, and H. S. Thompson. 1991. The HCRC Map Task corpus. Language and Speech, 34(4). 54
[10]
E. André. 2002. Natural language in multimedia/multimodal systems. In Ruslan Mitkov, editor, Handbook of Computational Linguistics. Oxford University Press, New York. 33
[11]
O. Aran and D. Gatica-Perez. 2010. Fusing audio-visual nonverbal cues to detect dominant people in group conversations. In Proceedings of 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey. pp. 3687--3690. 25
[12]
T. Baltrušaitis, C. Ahuja, and L.-P. Morency. 2018. Challenges and applications in multimodal machine learning. In S. Oviatt, B. Schuller, P. R. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition. Morgan & Claypool Publishers, San Rafael, CA. 23
[13]
S. Bangalore and G. Riccardi. 2002. Stochastic finite-state models of spoken language machine translation. Machine Translation. 17(3): 165--184. 44
[14]
S. Bangalore and M. Johnston. 2004. Balancing data-driven and rule-based approaches in the context of a multimodal conversational system. In Proceedings of the North American Association for Computational Linguistics/Human Language Technology (NAACL/SLT), pp. 33--40. Boston, MA. 45
[15]
S. Bangalore and M. Johnston. 2000. Tight-coupling of multimodal language processing with speech recognition. In Proceedings of the International Conference on Spoken Language Processing, Beijing. pp. 126--129.
[16]
S. Bangalore and M. Johnston. 2009. Robust understanding in multimodal interfaces. Computational Linguistics 35(3): 345--397. 41, 42, 44, 45, 59, 60, 63, 64
[17]
R. A. Bolt. 1980. "Put-that-there": Voice and gesture at the graphics interface. Computer Graphics 14(3): 262--270. 24, 32
[18]
R. J. Brachman, D. L. McGuinness, P. F. Patel-Schneider, and L. A. Resnick. 1991. Living with CLASSIC: When and how to use a KL-ONE-like language. In J. Sowa, editor, Principles of Semantic Networks. Morgan Kaufmann, San Mateo, CA. 33
[19]
R. Carpenter. 1992. The Logic of Typed Feature Structures. Cambridge University Press, Cambridge, UK. 30, 33, 34, 788
[20]
J. Cassell. 1998. A framework for gesture generation and interpretation. In R. Cipolla and A. Pentland, editors, Computer Vision in Human-Machine Interaction, pp. 191--215. Cambridge University Press, Cambridge, UK. 54
[21]
M. Chatterjee, S. Park, L-P. Morency, and S. Scherer. 2015. Combining two perspectives on classifying multimodal data for recognizing speaker traits. In Proceedings of ICMI 2015, pp. 7--14. Seattle, WA. 25, 53
[22]
J. Chai, P. Hong, and M. Zhou. 2004. A probabilistic approach to reference resolution in multimodal user interfaces. In Proceedings of 9th International Conference on Intelligent User Interfaces (IUI), Madeira, Portugal. pp. 70--77. 26, 51, 59, 60, 61, 762
[23]
C. Chao and A. L. Thomaz. 2012. Timed petri nets for multimodal interaction modeling. In Proceedings of ICMI 2012 Workshop on Speech and Gesture Production in Virtually and Physically Embodied Conversational Agents, Santa Monica, CA. 49
[24]
L. Chen and B. Di Eugenio. 2013. Multimodality and dialog act classification in the RoboHelper project. In Proceedings of the SIGDIAL Conference, pp. 183--192. Association for Computational Linguistics. Metz, France. 53, 54, 60, 61
[25]
J. Cocke and J. T. Schwartz. 1970. Programming languages and their compilers: Preliminary notes (Technical report) (2nd revised ed.). Courant Institute of Mathematical Sciences. New York University, New York. 38
[26]
P. R. Cohen, M. Dalrymple, D. B. Moran, F. C. N. Pereira, J. W. Sullivan, R. A. Gargan, J. L. Schlossberg, and S. W. Tyler. 1989. Synergistic use of direct manipulation and natural language. In Proceedings of the Conference on Human Factors in Computing Systems (CHI'89), 227--234. New York: ACM Press. (Reprinted in Maybury & Wahlster editors, 1998. Readings in Intelligent User Interfaces pp. 29--37. San Francisco: Morgan Kaufmann.) 32
[27]
P. R. Cohen. 1992. The role of natural language in a multimodal interface. In Proceedings of the 5th Annual ACM Symposium on User Interface Software and Technology. Monterey, CA. pp. 143--149. ACM Press. 31
[28]
P. R. Cohen, M. Johnston, D. McGee, S. L. Oviatt, J. Pittman, I. Smith, L. Chen, and J. Clow. 1997. Multimodal interaction for distributed interactive simulation. In Proceedings of Innovative Applications of Artificial Intelligence Conference. AAAI/MIT Press, Menlo Park, CA. 23, 34, 56
[29]
P. R. Cohen, M. Johnston, D. McGee, S. L. Oviatt, J. Clow, and I. Smith. 1998. The efficiency of multimodal interaction: A case study. In Proceedings of the International Conference on Spoken Language Processing (ICSLP). Sydney, Australia. 31
[30]
P. R. Cohen, D. McGee, S. Oviatt, L. Wu, and J. Clow. 1999. Multimodal Interaction for 2D and 3D environments. L. Rosenblum and M. Macedonia, editors, IEEE Computer Graphics and Applications. IEEE Press, New York. 24
[31]
P. R. Cohen, E. C. Kaiser, C. M. Buchanan, and S. Lind. 2015. Sketch-Thru-Plan: A multimodal interface for command and control. Communications of the ACM. April 2015. 58(4): pp. 56--65. 24, 32, 34
[32]
P. R. Cohen, and S. Oviatt. 2017. Multimodal speech and pen interfaces. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, A. Krüger, editors, Handbook of Multimodal-Multisensor Interfaces, Volume 1: Foundations, User Modeling, and Common Modality Combinations. Morgan & Claypool Publishers, San Rafael, CA. 24
[33]
A. Corradini, R. M. Wesson, and P. R. Cohen. 2002. A map-based system using speech and 3D gestures for pervasive computing. In Proceedings of International Conference on Multimodal Interfaces (ICMI). pp. 191--196. 24
[34]
C. Cortes, and V. Vapnik. 1995. Support-vector networks. Machine Learning 20.3, pp. 273--297. 53
[35]
A. Crimi, A. Guercio, G. Nota, G. Pacini, G. Tortora, and M. Tucci. 1991. Relation grammars and their application to multi-dimensional languages. Journal of Visual Languages and Computing, 2:333--346. 39
[36]
L. Duncan, W. Brown, C. Esposito, H. Holmback, and P. Xue. 1999. Enhancing Virtual Maintenance Environments with Speech Understanding. Boeing M&CT TechNet. Seattle, WA. 24
[37]
J. Earley. 1970. An efficient context-free parsing algorithm. Communications of the ACM. 13: pp. 94--102. 39
[38]
P. Ehlen and M. Johnston. 2010. Location grounding in multimodal local search. In Proceedings of the International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI '10), Beijing, China. 53, 55, 60, 61
[39]
P. Ehlen and M. Johnston. 2012. Multimodal dialogue in mobile local search. In Proceedings of the 14th ACM International Conference on Multimodal Interaction, Santa Monica, CA, pp. 303--304. 52, 59, 60
[40]
P. Ehlen and M. Johnston. 2013. A multimodal dialogue interface for mobile local search. In Proceedings of the ACM Conference on Intelligent User Interfaces (IUI), Santa Monica, CA. pp. 63--64. 52, 61, 64
[41]
J. Eisenstein and R. Davis. 2004. Visual and linguistic information in gesture classification. In Proceedings of the International Conference on Multimodal Interaction (ICMI). State College, PA, USA. pp. 113--120. 53, 60, 61
[42]
A. L. Gorin, S. Levinson, A. Gertner, E. Goldman. 1991. Adaptive acquisition of language. Computer Speech and Language. 5:2, pp. 101--132. 57
[43]
A. L. Gorin, G. Riccardi, and J. H. Wright. 1997. How may I help you? Speech Communication. 23, pp. 113--127.
[44]
D. Harel. 1987. Statecharts: A visual formalism for complex systems. Science of Computer Programming. 8. pp. 231--274. North Holland. 30, 785
[45]
A. Hauptmann. 1989. Speech and gesture for graphic image manipulation. In Proceedings of CHI'89. pp. 241--245, Austin, TX. 31
[46]
R. Helm, K. Marriott, and M. Odersky. 1991. Building visual language parsers. In Proceedings of the Conference on Human Factors in Computing Systems: CHI '91, ACM Press, New York. pp. 105--112. 39
[47]
L. Hetherington. 2004. The MIT finite-state transducer toolkit for speech and language processing. In Proceedings of the International Conference on Spoken Language Processing (ICSLP). Jeju Island, Korea. 45
[48]
C. Huls, E. Bos, and W. Claassen. 1995. Automatic referent resolution of deictic and anaphoric expressions. Computational Linguistics 21: 59--79. 50, 55
[49]
M. Johnston. 1998. Unification-based multimodal parsing. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Montreal, Canada. pp. 624--630. 30, 37, 38, 47, 49, 52, 59, 60, 786
[50]
M. Johnston. 2000. Deixis and conjunction in multimodal systems. In Proceedings of the 18th Conference on Computational Linguistics (COLING), Saarbrücken, Germany. pp. 362--368. 41
[51]
M. Johnston, P. R. Cohen, D. McGee, S. L. Oviatt, J. A. Pittman, and I. Smith. 1997. Unification-based multimodal integration. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics. pp. 281--288. 34, 52, 55, 56, 59, 60, 61
[52]
M. Johnston, S. Bangalore, G. Vasireddy, A. Stent, P. Ehlen, M. Walker, S. Whittaker, and P. Maloor. 2002a. MATCH: An architecture for multimodal dialog systems. In Proceedings of the Association of Computational Linguistics, Philadelphia, PA. pp. 376--383. 23, 42, 46
[53]
M. Johnston, S. Bangalore, A. Stent, G. Vasireddy, and P. Ehlen. 2002b. Multimodal language processing for mobile information access. In Proceedings of the International Conference on Spoken Language Processing, Denver, CO. pp. 2237--2240. 41, 59
[54]
M. Johnston and S. Bangalore. 2005. Finite-state multimodal integration and understanding. Journal of Natural Language Engineering, 11(2): 159--187. 32, 41, 43, 44, 45, 46, 59, 60, 63
[55]
M. Johnston and S. Bangalore. 2001. Finite-state methods for multimodal parsing and integration. In Proceedings of the ESSLLI Workshop on Finite-state Methods, Helsinki, Finland. 41, 43, 44, 45, 59, 60, 63
[56]
M. Johnston and P. Ehlen. 2010. Speak4It™: Multimodal interaction in the wild. In Proceedings of the IEEE Spoken Language Technology Workshop, Berkeley, CA. pp. 59--60. 23, 24, 55
[57]
A. Joshi and P. Hopely. 1997. A parser from antiquity. Journal of Natural Language Engineering, 2(4): 6--15. 44
[58]
J. Lafferty, A. McCallum, and F. C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Departmental Paper CIS, UPENN. June 2001. 53
[59]
E. Kaiser, A. Olwal, D. McGee, H. Benko, A. Corradini, X. Li, P. Cohen, and S. Feiner. 2003. Mutual disambiguation of 3D multimodal interaction in augmented and virtual reality. In Proceedings of the 5th International Conference on Multimodal Interfaces (ICMI). New York. pp. 12--19. 24, 32, 51
[60]
R. M. Kaplan and J. Bresnan. 1995. Lexical-functional grammar: A formal system for grammatical representation. In J. Bresnan, editor, The Mental Representation of Grammatical Relations, pp. 173--181. MIT Press, Cambridge, MA. 27, 771
[61]
R. M. Kaplan and M. Kay. 1994. Regular models of phonological rule systems. Computational Linguistics, 20(3): 331--378. 44
[62]
L. Karttunen. 1991. Finite-state constraints. In Proceedings of the International Conference on Current Issues in Computational Linguistics, Universiti Sains Malaysia, Penang. 44
[63]
T. Kasami. 1965. An efficient recognition and syntax-analysis algorithm for context-free languages (Technical report). AFCRL. 65--758. 38
[64]
A. Kehler, J. C. Martin, A. Cheyer, L. Julia, J. R. Hobbs, and J. Bear. 1998. On representing salience and reference in multimodal human-computer interaction. In Proceedings of the AAAI-98 Workshop on Representations for Multimodal Human-Computer Interaction, Madison, WI. 50, 55, 59
[65]
A. Kehler. 2000. Cognitive status and form of reference in multimodal human-computer interaction. In Proceedings of the AAAI'00. pp. 685--689. Austin TX. 50
[66]
K. Koskenniemi. 1984. Two-level morphology: A general computational model for word-form recognition and production. Ph.D. thesis, University of Helsinki. 44
[67]
D. B. Koons, C. J. Sparrell, and K. R. Thorisson. 1993. Integrating simultaneous input from speech, gaze, and hand gestures. In M. T. Maybury, editor, Intelligent Multimedia Interfaces. AAAI Press/MIT Press, Cambridge, MA, pp. 257--276. 33
[68]
F. Lakin. 1986. Spatial parsing for visual languages. In S. K. Chang, T. Ichikawa, and P. A. Ligomenides, editors, Visual Languages. Plenum Press. pp. 35--85. 39
[69]
M. E. Latoschik. 2002. Designing transition networks for multimodal VR-interactions using a markup language. In Proceedings of the Fourth ACM International Conference on Multimodal Interfaces (ICMI), Pittsburgh, PA. pp. 411--416. 32
[70]
X. Ma and E. Hovy. 2016. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the ACL. pp. 1064--1074. Berlin, Germany. 53
[71]
A. McCallum, D. Freitag, and F. Pereira. 2000. Maximum entropy Markov models for information extraction and segmentation. In Proceedings of ICML 2000, pp. 591--598. Stanford, CA. 58
[72]
D. McNeill. 1992. Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press, Chicago. 31, 53
[73]
G. Mehlmann and E. André. 2012. Modeling multimodal integration with event logic charts. In Proceedings of the International Conference on Multimodal Interfaces (ICMI). pp. 125--132. Santa Monica, CA. 47, 48, 59, 60, 61
[74]
M. Minsky. 1974. A framework for representing knowledge. MIT-AI Laboratory Memo 306. http://web.media.mit.edu/~minsky/papers/Frames/frames.html. Accessed June 17 2017. 28, 773
[75]
M. Mohri, F. C. N. Pereira, and M. Riley. 1998. A rational design for a weighted finite-state transducer library. Lecture Notes in Computer Science, 1436: 144--158. 45
[76]
L-P. Morency, C. Sidner, C. Lee, T. Darrell. 2007. Head gestures for perceptual interfaces: The role of context in improving recognition. Artificial Intelligence, 171: 568--585. 53
[77]
J. G. Neal and S. C. Shapiro. 1991. Intelligent multi-media interface technology. In J. W. Sullivan and S. W. Tyler, editors. Intelligent User Interfaces. Addison Wesley, New York. pp. 45--68. 32
[78]
M. J. Nederhof. 1997. Regular approximations of CFLs: A grammatical view. In Proceedings of the International Workshop on Parsing Technology. pp. 159--170, Boston, MA. 45, 63
[79]
T. Nishimoto, N. Shida, T. Kobayashi, and K. Shirai. 1995. Improving human interface in drawing tool using speech, mouse, and keyboard. In Proceedings of the 4th IEEE International Workshop on Robot and Human Communication, ROMAN95. pp. 107--112. Tokyo. 31
[80]
Openstream. 2018. EVA: Enterprise Virtual Assistant. www.openstream.com. Accessed August 31, 2018. 24
[81]
S. Oviatt and R. VanGent. 1996. Error resolution during multimodal human-computer interaction. In Proceedings of the International Conference on Spoken Language Processing (ICSLP). pp. 204--207. Philadelphia, PA. 31, 606
[82]
S. L. Oviatt. 1997a. Multimodal interactive maps: Designing for human performance. Human-Computer Interaction. 12(1): 93--129. 59
[83]
S. Oviatt, A. DeAngeli, and K. Kuhn. 1997b. Integration and synchronization of input modes during multimodal human-computer interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '97. pp. 415--422, New York. 38
[84]
S. L. Oviatt. 1999. Mutual disambiguation of recognition errors in a multimodal architecture. In Proceedings of the Conference on Human Factors in Computing Systems: CHI'99, Pittsburgh, PA. pp. 576--583. 31, 32, 36, 46, 56
[85]
S. Oviatt and P. Cohen. 2000. Perceptual User Interfaces: Multimodal Interfaces that process what comes naturally. Communications of the ACM 43.3, pp. 45--53. 56
[86]
F. C. N. Pereira and M. D. Riley. 1997. Speech recognition by composition of weighted finite automata. In E. Roche and Y. Schabes, editors, Finite State Devices for Natural Language Processing. MIT Press, Cambridge, MA. pp. 431--456. 44
[87]
C. Pollard and I. A. Sag. 1994. Head-Driven Phrase Structure Grammar. Center for the Study of Language and Information, University of Chicago Press, Chicago, IL. 27, 30, 33, 36, 39, 771, 786
[88]
G. Potamianos, C. Neti, G. Gravier, A. Garg, A. W. Senior. 2003. Recent advances in the automatic recognition of audio-visual speech. In Proceedings of the IEEE 91:9, pp. 1306--1326. 24
[89]
G. Potamianos, E. Marcheret, Y. Mroueh, V. Goel, A. Loumbaroulis, A. Vartholomaios, S. Thermos. 2017. Audio and visual modality combination in speech processing applications. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, A. Krüger, editors, Handbook of Multimodal-Multisensor Interfaces: Volume 1: Foundations, User Modeling, and Common Modality Combinations. Morgan & Claypool Publishers, San Rafael, CA. 24
[90]
L. R. Rabiner, A. E. Rosenberg, and S. E. Levinson. 1978. Considerations in dynamic time-warping algorithms for discrete word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, ICASSP-26, October 1978. 58
[91]
O. Rambow, S. Bangalore, T. Butt, A. Nasr, and R. Sproat. 2002. Creating a finite-state parser with application semantics. In Proceedings of the International Conference on Computational Linguistics (COLING), Taipei. pp. 1--5.
[92]
G. Riccardi, R. Pieraccini, and E. Bocchieri. 1996. Stochastic automata for language modeling. Computer Speech and Language, 10:(4): 265--293. 44
[93]
E. Roche. 1999. Finite-state transducers: parsing free and frozen sentences. In A. Kornai, editor, Extended Finite-State Models of Language. Cambridge University Press, Cambridge, UK. pp. 108--120. 44
[94]
A. L. Rosenberg. 1967. Multi-tape finite automata with rewind instructions. Journal of Computer and System Sciences, 1(3): 299--315. 45
[95]
A. Rudnicky and A. Hauptmann. 1992. Multimodal interactions in speech systems. In M. Blattner & R. Dannenberg, editors, Multimedia Interface Design. pp. 147--172. New York: ACM Press. 31
[96]
E. Selfridge and M. Johnston. 2015. Interact: tightly coupling multimodal dialog with an interactive virtual assistant. In Proceedings of the 17th ACM International Conference on Multimodal Interaction (ICMI), Seattle, WA. pp. 381--382. 23, 52
[97]
R. Sharma, M. Yeasin, N. Krahnstoever, I. Rauschert, G. Cai, I. Brewer, A. M. MacEachren, K. Sengupta. 2003. Speech-gesture driven multimodal interfaces for crisis management. In Proceedings of the IEEE. 91(9): 1327--1354. 24, 32
[98]
M. Steedman. 1996. Surface Structure and Interpretation. MIT Press, Cambridge, MA. 39
[99]
A. J. Viterbi. 1967. Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory IT-13: 260--269. 58
[100]
M. T. Vo and A. Waibel. 1997. Modeling and Interpreting Multimodal Inputs: A Semantic Integration Approach. CMU Technical Report. CMU-CS-97--192. 57, 60, 61
[101]
M. T. Vo. 1998. A Framework and Toolkit for the Construction of Multimodal Learning Interfaces. Ph.D. Thesis, Carnegie Mellon University, CMU-CS-98--129. 53, 57, 58, 61
[102]
Y. Wang, R. J. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio, Q. Le, Y. Agiomyrgiannakis, R. Clark, and R. A. Saurous. 2017. Tacotron: Towards end-to-end speech synthesis. In Proceedings of Interspeech 2017. pp. 4006--4010. 62
[103]
K. Wauchope. 1994. Eucalyptus: Integrating Natural Language Input with a Graphical User Interface. Naval Research Laboratory, Report NRL/FR/5510-94-9711.
[104]
W. Wahlster, editor. 2006. SmartKom: Foundations of Multimodal Dialogue Systems. Springer. 23, 52
[105]
A. Waibel, M. Vo, P. Duchnowski, S. Manke. 1996. Multimodal interfaces. Artificial Intelligence Review, pp. 299--319. 33
[106]
S. Watt, T. Underhill, Y-M. Chee, K. Franke, M. Froumentin, S. Madhvanath, J-A. Magana, G. Pakosz, G. Russell, M. Selvaraj, G. Seni, C. Tremblay, L. Yaeger. September 2011. Ink Markup Language (InkML). W3C Recommendation. https://www.w3.org/TR/2011/REC-InkML-20110920/. 42
[107]
I. H. Witten and E. Frank. 2009. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann. 54, 55
[108]
K. Wittenburg, L. Weitzman, and J. Talley. 1991. Unification-based grammars and tabular parsing for graphical languages. Journal of Visual Languages and Computing, 2:347--370. 39
[109]
K. Wittenburg. 1993. F-PATR: Functional constraints for unification-based grammars. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics. pp. 216--223.
[110]
W. A. Woods. 1970. Transition network grammars for natural language analysis. Communications of the ACM, 13(10): 591--606. 26, 49, 762
[111]
M. Worsley and M. Johnston. 2010. Multimodal interactive spaces: MagicTV and MagicMAP. In Proceedings of the IEEE Spoken Language Technology Workshop, Berkeley, CA. pp. 161--162. 24
[112]
L. Wu, S. L. Oviatt, and P. R. Cohen. 1999. Multimodal integration---A statistical view. IEEE Transactions on Multimedia, 1(4): 334--341. 36, 55, 56, 60, 61
[113]
L. Wu, S. L. Oviatt, and P. R. Cohen. 2002. From members to teams to committee---A robust approach to gestural and multimodal recognition. IEEE Transactions on Neural Networks, 13(4): 72--82. 36, 53, 55, 56, 60, 61
[114]
D. H. Younger. 1967. Recognition and parsing of context-free languages in time n³. Information and Control, 10(2): 189--208. 38

Published In

The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions. ACM Books. Association for Computing Machinery and Morgan & Claypool, July 2019. 813 pages. ISBN 9781970001754. DOI 10.1145/3233795.
