
Multimodal integration for interactive conversational systems

Published: 01 July 2019

References

[1]
S. P. Abney. 1991. Parsing by chunks. In R. Berwick, S. Abney, and C. Tenny, editors, Principle-Based Parsing. Kluwer Academic Publishers, Dordrecht. pp. 257--278. 44
[2]
J. Alexandersson and T. Becker. 2001. Overlay as the basic operation for discourse processing in a multimodal dialogue system. In Proceedings of 2nd IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems. pp. 1--7. 36, 52, 59, 60, 61
[3]
J. Alexandersson, T. Becker, and N. Pfleger. 2004. Scoring for overlay based on informational distance. In Proceedings of KONVENS-04. Vienna, Austria. pp. 1--4.
[4]
C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, and M. Mohri. 2007. OpenFst: A general and efficient weighted finite-state transducer library. In Proceedings of the Ninth International Conference on Implementation and Application of Automata (CIAA 2007). Lecture Notes in Computer Science Vol. 4783, pp. 11--23. Springer, Berlin, Heidelberg. 45
[5]
J. Allgayer, R. M. Jansen-Winkeln, C. Reddig, N. Reithinger. 1989. Bidirectional use of knowledge in the multimodal NL access system XTRA. In Proceedings of IJCAI 1989, pp. 1492--1497. 32
[6]
H. Alshawi. 1987. Memory and Context for Language Interpretation. Cambridge University Press, Cambridge, UK. 50
[7]
M. Al-Hames, A. Dielmann, D. Gatica-Perez, S. Reiter, S. Renals, G. Rigoll, and D. Zhang. 2006. Multimodal integration for meeting group action segmentation and recognition. In S. Renals and S. Bengio, editors, MLMI 2005, LNCS 3869, pp. 52--63. Springer, Berlin, Heidelberg. 53
[8]
D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, E. Elsen, J. Engel, L. Fan, C. Fougner, T. Han, A. Y. Hannun, B. Jun, P. LeGresley, L. Lin, S. Narang, A. Ng, S. Ozair, R. Prenger, J. Raiman, S. Satheesh, D. Seetapun, S. Sengupta, Y. Wang, Z. Wang, C. Wang, B. Xiao, D. Yogatama, J. Zhan, and Z. Zhu. 2016. Deep Speech 2: End-to-end speech recognition in English and Mandarin. In Proceedings of the 33rd International Conference on Machine Learning, New York. 62
[9]
A. H. Anderson, M. Bader, E. Gurman Bard, E. Boyle, G. Doherty, S. Garrod, S. Isard, J. Kowtko, J. McAllister, J. Miller, C. Sotillo, and H. S. Thompson. 1991. The HCRC Map Task corpus. Language and Speech, 34(4). 54
[10]
E. André. 2002. Natural language in multimedia/multimodal systems. In Ruslan Mitkov, editor, Handbook of Computational Linguistics. Oxford University Press, New York. 33
[11]
O. Aran and D. Gatica-Perez. 2010. Fusing audio-visual nonverbal cues to detect dominant people in group conversations. In Proceedings of 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey. pp. 3687--3690. 25
[12]
T. Baltrušaitis, C. Ahuja, and L.-P. Morency. 2018. Challenges and applications in multimodal machine learning. In S. Oviatt, B. Schuller, P. R. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition. Morgan & Claypool Publishers, San Rafael, CA. 23
[13]
S. Bangalore and G. Riccardi. 2002. Stochastic finite-state models of spoken language machine translation. Machine Translation. 17(3): 165--184. 44
[14]
S. Bangalore and M. Johnston. 2004. Balancing data-driven and rule-based approaches in the context of a multimodal conversational system. In Proceedings of the North American Association for Computational Linguistics/Human Language Technology (NAACL/SLT), pp. 33--40. Boston, MA. 45
[15]
S. Bangalore and M. Johnston. 2000. Tight-coupling of multimodal language processing with speech recognition. In Proceedings of the International Conference on Spoken Language Processing, Beijing. pp. 126--129.
[16]
S. Bangalore and M. Johnston. 2009. Robust understanding in multimodal interfaces. Computational Linguistics 35(3): 345--397. 41, 42, 44, 45, 59, 60, 63, 64
[17]
R. A. Bolt. 1980. "Put-that-there": Voice and gesture at the graphics interface. Computer Graphics 14(3): 262--270. 24, 32
[18]
R. J. Brachman, D. L. McGuinness, P. F. Patel-Schneider, and L. A. Resnick. 1991. Living with CLASSIC: When and how to use a KL-ONE-like language. In J. Sowa, editor, Principles of Semantic Networks. Morgan Kaufmann, San Mateo, CA. 33
[19]
R. Carpenter. 1992. The Logic of Typed Feature Structures. Cambridge University Press, Cambridge, UK. 30, 33, 34, 788
[20]
J. Cassell. 1998. A framework for gesture generation and interpretation. In R. Cipolla and A. Pentland, editors, Computer Vision in Human-Machine Interaction, pp. 191--215. Cambridge University Press, Cambridge, UK. 54
[21]
M. Chatterjee, S. Park, L-P. Morency, and S. Scherer. 2015. Combining two perspectives on classifying multimodal data for recognizing speaker traits. In Proceedings of ICMI 2015, pp. 7--14. Seattle, WA. 25, 53
[22]
J. Chai, P. Hong, and M. Zhou. 2004. A probabilistic approach to reference resolution in multimodal user interfaces. In Proceedings of 9th International Conference on Intelligent User Interfaces (IUI), Madeira, Portugal. pp. 70--77. 26, 51, 59, 60, 61, 762
[23]
C. Chao and A. L. Thomaz. 2012. Timed petri nets for multimodal interaction modeling. In Proceedings of ICMI 2012 Workshop on Speech and Gesture Production in Virtually and Physically Embodied Conversational Agents, Santa Monica, CA. 49
[24]
L. Chen and B. Di Eugenio. 2013. Multimodality and dialog act classification in the RoboHelper project. In Proceedings of the SIGDIAL Conference, pp. 183--192. Association for Computational Linguistics. Metz, France. 53, 54, 60, 61
[25]
J. Cocke and J. T. Schwartz. 1970. Programming languages and their compilers: Preliminary notes (Technical report) (2nd revised ed.). Courant Institute of Mathematical Sciences. New York University, New York. 38
[26]
P. R. Cohen, M. Dalrymple, D. B. Moran, F. C. N. Pereira, J. W. Sullivan, R. A. Gargan, J. L. Schlossberg, and S. W. Tyler. 1989. Synergistic use of direct manipulation and natural language. In Proceedings of the Conference on Human Factors in Computing Systems (CHI'89), 227--234. New York: ACM Press. (Reprinted in Maybury & Wahlster editors, 1998. Readings in Intelligent User Interfaces pp. 29--37. San Francisco: Morgan Kaufmann.) 32
[27]
P. R. Cohen. 1992. The role of natural language in a multimodal interface. In Proceedings of the 5th Annual ACM Symposium on User Interface Software and Technology. Monterey, CA. pp. 143--149. ACM Press. 31
[28]
P. R. Cohen, M. Johnston, D. McGee, S. L. Oviatt, J. Pittman, I. Smith, L. Chen, and J. Clow. 1997. Multimodal interaction for distributed interactive simulation. In Proceedings of Innovative Applications of Artificial Intelligence Conference. AAAI/MIT Press, Menlo Park, CA. 23, 34, 56
[29]
P. R. Cohen, M. Johnston, D. McGee, S. L. Oviatt, J. Clow, and I. Smith. 1998. The efficiency of multimodal interaction: A case study. In Proceedings of the International Conference on Spoken Language Processing (ICSLP). Sydney, Australia. 31
[30]
P. R. Cohen, D. McGee, S. Oviatt, L. Wu, and J. Clow. 1999. Multimodal Interaction for 2D and 3D environments. L. Rosenblum and M. Macedonia, editors, IEEE Computer Graphics and Applications. IEEE Press, New York. 24
[31]
P. R. Cohen, E. C. Kaiser, C. M. Buchanan, and S. Lind. 2015. Sketch-Thru-Plan: A multimodal interface for command and control. Communications of the ACM. April 2015. 58(4): pp. 56--65. 24, 32, 34
[32]
P. R. Cohen, and S. Oviatt. 2017. Multimodal speech and pen interfaces. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, A. Krüger, editors, Handbook of Multimodal-Multisensor Interfaces, Volume 1: Foundations, User Modeling, and Common Modality Combinations. Morgan & Claypool Publishers, San Rafael, CA. 24
[33]
A. Corradini, R. M. Wesson, and P. R. Cohen. 2002. A map-based system using speech and 3D gestures for pervasive computing. In Proceedings of International Conference on Multimodal Interfaces (ICMI). pp. 191--196. 24
[34]
C. Cortes, and V. Vapnik. 1995. Support-vector networks. Machine Learning 20.3, pp. 273--297. 53
[35]
A. Crimi, A. Guercio, G. Nota, G. Pacini, G. Tortora, and M. Tucci. 1991. Relation grammars and their application to multi-dimensional languages. Journal of Visual Languages and Computing, 2:333--346. 39
[36]
L. Duncan, W. Brown, C. Esposito, H. Holmback, and P. Xue. 1999. Enhancing Virtual Maintenance Environments with Speech Understanding. Boeing M&CT TechNet. Seattle, WA. 24
[37]
J. Earley. 1970. An efficient context-free parsing algorithm. Communications of the ACM. 13: pp. 94--102. 39
[38]
P. Ehlen and M. Johnston. 2010. Location grounding in multimodal local search. In Proceedings of the International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI '10), Beijing, China. 53, 55, 60, 61
[39]
P. Ehlen and M. Johnston. 2012. Multimodal dialogue in mobile local search. In Proceedings of the 14th ACM International Conference on Multimodal Interaction, Santa Monica, CA, pp. 303--304. 52, 59, 60
[40]
P. Ehlen and M. Johnston. 2013. A multimodal dialogue interface for mobile local search. In Proceedings of the ACM Conference on Intelligent User Interfaces (IUI), Santa Monica, CA. pp. 63--64. 52, 61, 64
[41]
J. Eisenstein and R. Davis. 2004. Visual and linguistic information in gesture classification. In Proceedings of the International Conference on Multimodal Interaction (ICMI). State College, PA, USA. pp. 113--120. 53, 60, 61
[42]
A. L. Gorin, S. Levinson, A. Gertner, E. Goldman. 1991. Adaptive acquisition of language. Computer Speech and Language. 5:2, pp. 101--132. 57
[43]
A. L. Gorin, G. Riccardi, and J. H. Wright. 1997. How may I help you? Speech Communication. 23, pp. 113--127.
[44]
D. Harel. 1987. Statecharts: A visual formalism for complex systems. Science of Computer Programming. 8. pp. 231--274. North Holland. 30, 785
[45]
A. Hauptmann. 1989. Speech and gesture for graphic image manipulation. In Proceedings of CHI'89. pp. 241--245, Austin, TX. 31
[46]
R. Helm, K. Marriott, and M. Odersky. 1991. Building visual language parsers. In Proceedings of the Conference on Human Factors in Computing Systems: CHI '91, ACM Press, New York. pp. 105--112. 39
[47]
L. Hetherington. 2004. The MIT finite-state transducer toolkit for speech and language processing. In Proceedings of the International Conference on Spoken Language Processing (ICSLP). Jeju Island, Korea. 45
[48]
C. Huls, E. Bos, and W. Claassen. 1995. Automatic referent resolution of deictic and anaphoric expressions. Computational Linguistics 21: 59--79. 50, 55
[49]
M. Johnston. 1998. Unification-based multimodal parsing. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Montreal, Canada. pp. 624--630. 30, 37, 38, 47, 49, 52, 59, 60, 786
[50]
M. Johnston. 2000. Deixis and conjunction in multimodal systems. In Proceedings of the 18th Conference on Computational Linguistics (COLING), Saarbrücken, Germany. pp. 362--368. 41
[51]
M. Johnston, P. R. Cohen, D. McGee, S. L. Oviatt, J. A. Pittman, and I. Smith. 1997. Unification-based multimodal integration. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics. pp. 281--288. 34, 52, 55, 56, 59, 60, 61
[52]
M. Johnston, S. Bangalore, G. Vasireddy, A. Stent, P. Ehlen, M. Walker, S. Whittaker, and P. Maloor. 2002a. MATCH: An architecture for multimodal dialog systems. In Proceedings of the Association of Computational Linguistics, Philadelphia, PA. pp. 376--383. 23, 42, 46
[53]
M. Johnston, S. Bangalore, A. Stent, G. Vasireddy, and P. Ehlen. 2002b. Multimodal language processing for mobile information access. In Proceedings of the International Conference on Spoken Language Processing, Denver, CO. pp. 2237--2240. 41, 59
[54]
M. Johnston and S. Bangalore. 2005. Finite-state multimodal integration and understanding. Journal of Natural Language Engineering, 11(2): 159--187. 32, 41, 43, 44, 45, 46, 59, 60, 63
[55]
M. Johnston and S. Bangalore. 2001. Finite-state methods for multimodal parsing and integration. In Proceedings of the ESSLLI Workshop on Finite-state Methods, Helsinki, Finland. 41, 43, 44, 45, 59, 60, 63
[56]
M. Johnston and P. Ehlen. 2010. Speak4It™: Multimodal interaction in the wild. In Proceedings of the IEEE Spoken Language Technology Workshop, Berkeley, CA. pp. 59--60. 23, 24, 55
[57]
A. Joshi and P. Hopely. 1997. A parser from antiquity. Journal of Natural Language Engineering, 2(4): 6--15. 44
[58]
J. Lafferty, A. McCallum, and F. C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Departmental Paper CIS, UPENN. June 2001. 53
[59]
E. Kaiser, A. Olwal, D. McGee, H. Benko, A. Corradini, X. Li, P. Cohen, and S. Feiner. 2003. Mutual disambiguation of 3D multimodal interaction in augmented and virtual reality. In Proceedings of the 5th International Conference on Multimodal Interfaces (ICMI). New York. pp. 12--19. 24, 32, 51
[60]
R. M. Kaplan and J. Bresnan. 1995. Lexical-functional grammar: A formal system for grammatical representation. In J. Bresnan, editor, The Mental Representation of Grammatical Relations, pp. 173--181. MIT Press, Cambridge, MA. 27, 771
[61]
R. M. Kaplan and M. Kay. 1994. Regular models of phonological rule systems. Computational Linguistics, 20(3): 331--378. 44
[62]
L. Karttunen. 1991. Finite-state constraints. In Proceedings of the International Conference on Current Issues in Computational Linguistics, Universiti Sains Malaysia, Penang. 44
[63]
T. Kasami. 1965. An efficient recognition and syntax-analysis algorithm for context-free languages (Technical report). AFCRL. 65--758. 38
[64]
A. Kehler, J. C. Martin, A. Cheyer, L. Julia, J. R. Hobbs, and J. Bear. 1998. On representing salience and reference in multimodal human-computer interaction. In Proceedings of the AAAI-98 Workshop on Representations for Multimodal Human-Computer Interaction, Madison, WI. 50, 55, 59
[65]
A. Kehler. 2000. Cognitive status and form of reference in multimodal human-computer interaction. In Proceedings of the AAAI'00. pp. 685--689. Austin TX. 50
[66]
K. Koskenniemi. 1984. Two-level morphology: A general computational model for word-form recognition and production. Ph.D. thesis, University of Helsinki. 44
[67]
D. B. Koons, C. J. Sparrell, and K. R. Thorisson. 1993. Integrating simultaneous input from speech, gaze, and hand gestures. In M. T. Maybury, editor, Intelligent Multimedia Interfaces. AAAI Press/MIT Press, Cambridge, MA, pp. 257--276. 33
[68]
F. Lakin. 1986. Spatial parsing for visual languages. In S. K. Chang, T. Ichikawa, and P. A. Ligomenides, editors, Visual Languages. Plenum Press. pp. 35--85. 39
[69]
M. E. Latoschik. 2002. Designing transition networks for multimodal VR-interactions using a markup language. In Proceedings of the Fourth ACM International Conference on Multimodal Interfaces (ICMI), Pittsburgh, PA. pp. 411--416. 32
[70]
X. Ma and E. Hovy. 2016. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the ACL. pp. 1064--1074. Berlin, Germany. 53
[71]
A. McCallum, D. Freitag, and F. Pereira. 2000. Maximum entropy Markov models for information extraction and segmentation. In Proceedings of ICML 2000, pp. 591--598. Stanford, CA. 58
[72]
D. McNeill. 1992. Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press, Chicago. 31, 53
[73]
G. Mehlmann and E. André. 2012. Modeling multimodal integration with event logic charts. In Proceedings of the International Conference on Multimodal Interfaces (ICMI). pp. 125--132. Santa Monica, CA. 47, 48, 59, 60, 61
[74]
M. Minsky. 1974. A framework for representing knowledge. MIT-AI Laboratory Memo 306. http://web.media.mit.edu/~minsky/papers/Frames/frames.html. Accessed June 17 2017. 28, 773
[75]
M. Mohri, F. C. N. Pereira, and M. Riley. 1998. A rational design for a weighted finite-state transducer library. Lecture Notes in Computer Science, 1436: 144--158. 45
[76]
L-P. Morency, C. Sidner, C. Lee, T. Darrell. 2007. Head gestures for perceptual interfaces: The role of context in improving recognition. Artificial Intelligence, 171: 568--585. 53
[77]
J. G. Neal and S. C. Shapiro. 1991. Intelligent multi-media interface technology. In J. W. Sullivan and S. W. Tyler, editors. Intelligent User Interfaces. Addison Wesley, New York. pp. 45--68. 32
[78]
M. J. Nederhof. 1997. Regular approximations of CFLs: A grammatical view. In Proceedings of the International Workshop on Parsing Technology. pp. 159--170, Boston, MA. 45, 63
[79]
T. Nishimoto, N. Shida, T. Kobayashi, and K. Shirai. 1995. Improving human interface in drawing tool using speech, mouse, and keyboard. In Proceedings of the 4th IEEE International Workshop on Robot and Human Communication, ROMAN95. pp. 107--112. Tokyo. 31
[80]
Openstream. 2018. EVA: Enterprise Virtual Assistant. www.openstream.com. Accessed August 31, 2018. 24
[81]
S. Oviatt and R. VanGent. 1996. Error resolution during multimodal human-computer interaction. In Proceedings of the International Conference on Spoken Language Processing (ICSLP). pp. 204--207. Philadelphia, PA. 31, 606
[82]
S. L. Oviatt. 1997a. Multimodal interactive maps: Designing for human performance. Human-Computer Interaction. 12(1): 93--129. 59
[83]
S. Oviatt, A. DeAngeli, and K. Kuhn. 1997b. Integration and synchronization of input modes during multimodal human-computer interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '97. pp. 415--422, New York. 38
[84]
S. L. Oviatt. 1999. Mutual disambiguation of recognition errors in a multimodal architecture. In Proceedings of the Conference on Human Factors in Computing Systems: CHI'99, Pittsburgh, PA. pp. 576--583. 31, 32, 36, 46, 56
[85]
S. Oviatt and P. Cohen. 2000. Perceptual User Interfaces: Multimodal Interfaces that process what comes naturally. Communications of the ACM 43.3, pp. 45--53. 56
[86]
F. C. N. Pereira and M. D. Riley. 1997. Speech recognition by composition of weighted finite automata. In E. Roche and Y. Schabes, editors, Finite State Devices for Natural Language Processing. MIT Press, Cambridge, MA. pp. 431--456. 44
[87]
C. Pollard and I. A. Sag. 1994. Head-Driven Phrase Structure Grammar. Center for the Study of Language and Information, University of Chicago Press, Chicago, IL. 27, 30, 33, 36, 39, 771, 786
[88]
G. Potamianos, C. Neti, G. Gravier, A. Garg, A. W. Senior. 2003. Recent advances in the automatic recognition of audio-visual speech. In Proceedings of the IEEE 91:9, pp. 1306--1326. 24
[89]
G. Potamianos, E. Marcheret, Y. Mroueh, V. Goel, A. Loumbaroulis, A. Vartholomaios, S. Thermos. 2017. Audio and visual modality combination in speech processing applications. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, A. Krüger, editors, Handbook of Multimodal-Multisensor Interfaces: Volume 1: Foundations, User Modeling, and Common Modality Combinations. Morgan & Claypool Publishers, San Rafael, CA. 24
[90]
L. R. Rabiner, A. E. Rosenberg, and S. E. Levinson. 1978. Considerations in dynamic time-warping algorithms for discrete word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, ICASSP-26, October 1978. 58
[91]
O. Rambow, S. Bangalore, T. Butt, A. Nasr, and R. Sproat. 2002. Creating a finite-state parser with application semantics. In Proceedings of the International Conference on Computational Linguistics (COLING), Taipei. pp. 1--5.
[92]
G. Riccardi, R. Pieraccini, and E. Bocchieri. 1996. Stochastic automata for language modeling. Computer Speech and Language, 10:(4): 265--293. 44
[93]
E. Roche. 1999. Finite-state transducers: parsing free and frozen sentences. In A. Kornai, editor, Extended Finite-State Models of Language. Cambridge University Press, Cambridge, UK. pp. 108--120. 44
[94]
A. L. Rosenberg. 1967. Multi-tape finite automata with rewind instructions. Journal of Computer and System Sciences, 1(3): 299--315. 45
[95]
A. Rudnicky and A. Hauptmann. 1992. Multimodal interactions in speech systems. In M. Blattner & R. Dannenberg, editors, Multimedia Interface Design. pp. 147--172. New York: ACM Press. 31
[96]
E. Selfridge and M. Johnston. 2015. Interact: tightly coupling multimodal dialog with an interactive virtual assistant. In Proceedings of the 17th ACM International Conference on Multimodal Interaction (ICMI), Seattle, WA. pp. 381--382. 23, 52
[97]
R. Sharma, M. Yeasin, N. Krahnstoever, I. Rauschert, G. Cai, I. Brewer, A. M. MacEachren, K. Sengupta. 2003. Speech-gesture driven multimodal interfaces for crisis management. In Proceedings of the IEEE. 91(9): 1327--1354. 24, 32
[98]
M. Steedman. 1996. Surface Structure and Interpretation. MIT Press, Cambridge, MA. 39
[99]
A. J. Viterbi. 1967. Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory IT-13: 260--269. 58
[100]
M. T. Vo and A. Waibel. 1997. Modeling and Interpreting Multimodal Inputs: A Semantic Integration Approach. CMU Technical Report. CMU-CS-97--192. 57, 60, 61
[101]
M. T. Vo. 1998. A Framework and Toolkit for the Construction of Multimodal Learning Interfaces. Ph.D. Thesis, Carnegie Mellon University, CMU-CS-98--129. 53, 57, 58, 61
[102]
Y. Wang, R. J. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio, Q. Le, Y. Agiomyrgiannakis, R. Clark, and R. A. Saurous. 2017. Tacotron: Towards end-to-end speech synthesis. In Proceedings of Interspeech 2017. pp. 4006--4010. 62
[103]
K. Wauchope. 1994. Eucalyptus: Integrating Natural Language Input with a Graphical User Interface. Naval Research Laboratory, Report NRL/FR/5510-94-9711.
[104]
W. Wahlster, editor. 2006. SmartKom: Foundations of Multimodal Dialogue Systems. Springer. 23, 52
[105]
A. Waibel, M. Vo, P. Duchnowski, S. Manke. 1996. Multimodal interfaces. Artificial Intelligence Review, pp. 299--319. 33
[106]
S. Watt, T. Underhill, Y-M. Chee, K. Franke, M. Froumentin, S. Madhvanath, J-A. Magana, G. Pakosz, G. Russell, M. Selvaraj, G. Seni, C. Tremblay, L. Yaeger. September 2011. Ink Markup Language (InkML). W3C Recommendation. https://www.w3.org/TR/2011/REC-InkML-20110920/. 42
[107]
I. H. Witten and E. Frank. 2009. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann. 54, 55
[108]
K. Wittenburg, L. Weitzman, and J. Talley. 1991. Unification-based grammars and tabular parsing for graphical languages. Journal of Visual Languages and Computing, 2:347--370. 39
[109]
K. Wittenburg. 1993. F-PATR: Functional constraints for unification-based grammars. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics. pp. 216--223.
[110]
W. A. Woods. 1970. Transition network grammars for natural language analysis. Communications of the ACM, 13(10): 591--606. 26, 49, 762
[111]
M. Worsley and M. Johnston. 2010. Multimodal interactive spaces: MagicTV and MagicMAP. In Proceedings of the IEEE Spoken Language Technology Workshop, Berkeley, CA. pp. 161--162. 24
[112]
L. Wu, S. L. Oviatt, and P. R. Cohen. 1999. Multimodal integration---A statistical view. IEEE Transactions on Multimedia, 1(4): 334--341. 36, 55, 56, 60, 61
[113]
L. Wu, S. L. Oviatt, and P. R. Cohen. 2002. From members to teams to committee---A robust approach to gestural and multimodal recognition. IEEE Transactions on Neural Networks, 13(4): 72--82. 36, 53, 55, 56, 60, 61
[114]
D. H. Younger. 1967. Recognition and parsing of context-free languages in time n³. Information and Control, 10(2): 189--208. 38

Published In

The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions. ACM Books. Association for Computing Machinery and Morgan & Claypool, July 2019. 813 pages. ISBN 9781970001754. DOI 10.1145/3233795.
