
Software platforms and toolkits for building multimodal systems and applications

Published: 01 July 2019

References

[1] J. Allen and M. Core. 1997. Draft of DAMSL: Dialog act markup in several layers. Unpublished manuscript. https://www.cs.rochester.edu/research/speech/damsl/RevisedManual/.
[2] J. Allen, G. Ferguson, and A. Stent. 2001. An architecture for more realistic conversational systems. In Proceedings of the 6th International Conference on Intelligent User Interfaces, IUI '01, pp. 1--8. ACM.
[3] P. K. Atrey, M. Hossain, A. El Saddik, and M. S. Kankanhalli. 2010. Multimodal fusion for multimedia analysis: A survey. Multimedia Systems, 16(6): 345--379.
[4] H. U. Block, R. Caspari, and S. Schachtl. 2004. Callable manuals - Access to product documentation via voice (Anrufbare Bedienungsanleitungen - Zugang zu Produktdokumentation über Sprache). Information Technology, 46(6): 299--305.
[5] D. Bobbert and M. Wolska. 2007. Dialog OS: An extensible platform for teaching spoken dialogue systems. In Proceedings of the 11th Workshop on the Semantics and Pragmatics of Dialogue, Trento, pp. 159--160.
[6] D. G. Bobrow, R. M. Kaplan, M. Kay, D. A. Norman, H. Thompson, and T. Winograd. 1977. GUS, a frame-driven dialog system. Artificial Intelligence, 8(2): 155--173.
[7] R. A. Bolt. 1980. "Put-that-there": Voice and gesture at the graphics interface. In Proceedings of the 7th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '80, pp. 262--270. ACM, New York.
[8] A. Bordes, Y.-L. Boureau, and J. Weston. 2017. Learning end-to-end goal-oriented dialog. In Proceedings of the 5th International Conference on Learning Representations.
[9] T. H. Bui. 2006. Multimodal dialogue management - State of the art. Technical Report TR-CTIT-06-01, Centre for Telematics and Information Technology, University of Twente, Enschede, The Netherlands.
[10] H. Bunt. 2000. Dialogue pragmatics and context specification. In Abduction, Belief and Context in Dialogue: Studies in Computational Pragmatics, pp. 81--150. John Benjamins.
[11] H. Bunt. 2009. The DIT++ taxonomy for functional dialogue markup. In D. Heylen, C. Pelachaud, R. Catizone, and D. Traum, editors, AAMAS 2009 Workshop, Towards a Standard Markup Language for Embodied Dialogue Acts, pp. 13--24.
[12] H. Bunt. 2011a. Multifunctionality in dialogue. Computer Speech and Language, 25(2): 222--245.
[13] H. Bunt. 2011b. The semantics of dialogue acts. In Proceedings of the 9th International Conference on Computational Semantics, IWCS '11, pp. 1--13. Association for Computational Linguistics, Stroudsburg, PA. http://portal.acm.org/citation.cfm?id=2002670.
[14] H. Bunt, M. Kipp, M. T. Maybury, and W. Wahlster. 2005. Fusion and coordination for multimodal interactive information presentation. In O. Stock and M. Zancanaro, editors, Multimodal Intelligent Information Presentation, vol. 27 of Text, Speech and Language Technology, pp. 325--339. Springer, Dordrecht, The Netherlands.
[15] J. Cassell. 2000. More than just another pretty face: Embodied conversational interface agents. Communications of the ACM, 43(4): 70--78.
[16] S. Castronovo, A. Mahr, M. Pentcheva, and C. Müller. September 2010. Multimodal dialog in the car: Combining speech and turn-and-push dial to control comfort functions. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), pp. 510--513. ISCA, Makuhari, Japan.
[17] P. R. Cohen and S. Oviatt. 2017. Multimodal speech and pen interfaces. In S. Oviatt, B. Schuller, P. R. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 1: Foundations, User Modeling, and Common Modality Combinations. Morgan & Claypool Publishers, San Rafael, CA.
[18] P. R. Cohen and C. R. Perrault. 1979. Elements of a plan-based theory of speech acts. Cognitive Science, 3(3): 177--212.
[19] P. R. Cohen, M. Johnston, D. McGee, S. Oviatt, J. Pittman, I. Smith, L. Chen, and J. Clow. 1997. QuickSet: Multimodal interaction for distributed applications. In Proceedings of the 5th ACM International Conference on Multimedia, pp. 31--40. ACM.
[20] D. Costa and C. Duarte. 2009. Improving interaction with TV-based applications through adaptive multimodal fission. In K. Blashki and P. Isaias, editors, Emerging Research and Trends in Interactivity and the Human-Computer Interface, ch. 3, pp. 54--73. IGI Global, Hershey, PA.
[21] J. Coutaz, L. Nigay, D. Salber, A. Blandford, J. May, and R. M. Young. 1995. Four easy pieces for assessing the usability of multimodal interaction: The CARE properties. In Proceedings of the INTERACT '95 IFIP TC13 Fifth International Conference on Human-Computer Interaction, vol. 95, pp. 115--120. Springer US, Boston, MA.
[22] D. A. Dahl. November 2013. The W3C multimodal architecture and interfaces standard. Journal on Multimodal User Interfaces, 7(3): 171--182.
[23] G. Di Fabbrizio, J. Wilpon, and T. Okken. 2009. A speech mashup framework for multimodal mobile services. In Proceedings of the 11th International Conference on Multimodal Interfaces and the 6th Workshop on Machine Learning for Multimodal Interfaces, ICMI-MLMI, pp. 71--78. Cambridge, MA.
[24] C. Endres. 2012a. PresTK: Situation-aware presentation of messages and infotainment content for drivers. Ph.D. thesis, Saarland University, Saarbrücken, Germany.
[25] C. Endres. 2012b. Real-time assessment of driver cognitive load as a prerequisite for the situation-aware presentation toolkit PresTK. In Adjunct Proceedings of the 4th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, AutomotiveUI 2012, pp. 76--79.
[26] C. Endres, T. Schwartz, M. Feld, and C. Müller. February 2010. Cinematic analysis of automotive personalization. In Proceedings of the 15th International Conference on Intelligent User Interfaces, IUI 2010, pp. 1--6. ACM, Hong Kong, China.
[27] M. E. Foster. 2002. COMIC project deliverable: State of the art review: Multimodal fission. Available at http://groups.inf.ed.ac.uk/comic/documents/deliverables/Del6-1.pdf. Last accessed January 2019.
[28] A. Gruenstein, I. McGraw, and I. Badr. 2008. The WAMI toolkit for developing, deploying, and evaluating web-accessible multimodal interfaces. In Proceedings of the 10th International Conference on Multimodal Interfaces, ICMI '08, pp. 141--148. ACM, New York, NY.
[29] F. Honold, F. Schüssel, and M. Weber. 2012. Adaptive probabilistic fission for multimodal systems. In Proceedings of the 2012 Conference of the Computer-Human Interaction Special Interest Group (CHISIG) of Australia on Computer-Human Interaction, pp. 222--231. ACM.
[30] ISO 24617-2:2012. 2012. Language Resource Management -- Semantic Annotation Framework (SemAF) -- Part 2: Dialogue Acts. ISO, Geneva, Switzerland.
[31] A. Jaimes and N. Sebe. 2007. Multimodal human-computer interaction: A survey. Computer Vision and Image Understanding, 108(1--2): 116--134.
[32] M. Johnston. 2019. Multimodal integration for interactive conversational systems. In S. Oviatt, B. Schuller, P. R. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Language Processing, Software, Commercialization, and Emerging Directions. Morgan & Claypool Publishers, San Rafael, CA.
[33] M. Johnston, S. Bangalore, G. Vasireddy, A. Stent, P. Ehlen, M. Walker, S. Whittaker, and P. Maloor. 2002. MATCH: An architecture for multimodal dialogue systems. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 376--383. Association for Computational Linguistics.
[34] M. Johnston, P. Baggia, D. C. Burnett, J. Carter, D. A. Dahl, G. McCobb, and D. Raggett. February 2009. EMMA: Extensible MultiModal Annotation Markup Language. W3C Recommendation. http://www.w3.org/TR/emma/.
[35] D. Jurafsky and J. H. Martin. 2009. Speech and Language Processing, 2nd edition, ch. Dialogue and Conversational Agents, pp. 863--891. Pearson, Upper Saddle River, NJ.
[36] G.-J. M. Kruijff, J. D. Kelleher, and N. Hawes. 2006. Information fusion for visual reference resolution in dynamic situated dialogue. In E. André, L. Dybkjær, W. Minker, H. Neumann, and M. Weber, editors, Perception and Interactive Technologies, pp. 117--128. Springer, Berlin, Heidelberg.
[37] P. Lison. 2012. Probabilistic dialogue models with prior domain knowledge. In Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL '12, pp. 179--188. Association for Computational Linguistics.
[38] L. Lucignano, F. Cutugno, S. Rossi, and A. Finzi. 2013. A dialogue system for multimodal human-robot interaction. In Proceedings of the 15th ACM International Conference on Multimodal Interaction, pp. 197--204. ACM.
[39] M. T. Maybury and W. Wahlster. 1998. Intelligent user interfaces: An introduction. In M. T. Maybury and W. Wahlster, editors, Readings in Intelligent User Interfaces, pp. 1--13. Morgan Kaufmann, San Francisco, CA.
[40] K. McKeown. 1985. Text Generation: Using Discourse Strategies and Focus Constraints to Generate Natural Language Text. Cambridge University Press, New York, NY.
[41] M. M. Moniri, M. Feld, and C. Müller. June 2012. Personalized in-vehicle information systems: Building an application infrastructure for smart cars in smart spaces. In Proceedings of the 8th International Conference on Intelligent Environments, IE '12, pp. 379--382. IEEE, Guanajuato, Mexico.
[42] J. D. Moore. 1995. Participating in Explanatory Dialogues: Interpreting and Responding to Questions in Context. MIT Press, Cambridge, MA.
[43] R. Neßelrath and M. Feld. 2013. Towards a cognitive load ready multimodal dialogue system for in-vehicle human-machine interaction. In Adjunct Proceedings of the 5th International Conference on Automotive User Interfaces and Interactive Vehicular Applications, AutomotiveUI 2013, pp. 49--52. Eindhoven.
[44] R. Neßelrath and M. Feld. July 2014. SiAM-dp: A platform for the model-based development of context-aware multimodal dialogue applications. In Proceedings of the 10th International Conference on Intelligent Environments. IEEE.
[45] R. Neßelrath and D. Porta. July 2011. Rapid development of multimodal dialogue applications with semantic models. In Proceedings of the 7th IJCAI Workshop on Knowledge and Reasoning in Practical Dialogue Systems, KRPD-11, at the 22nd International Joint Conference on Artificial Intelligence, IJCAI-11. Barcelona, Spain.
[46] R. Neßelrath, M. M. Moniri, and M. Feld. 2016. Combining speech, gaze, and micro gestures for the multimodal control of in-car functions. In Proceedings of the International Conference on Intelligent Environments, IE-16. IEEE, London.
[47] L. Nigay and J. Coutaz. 1993. A design space for multimodal systems: Concurrent processing and data fusion. In Proceedings of the INTERACT '93 and CHI '93 Conference on Human Factors in Computing Systems, CHI '93, pp. 172--178. ACM, New York.
[48] S. Oviatt. 1999a. Mutual disambiguation of recognition errors in a multimodal architecture. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 576--583. ACM.
[49] S. Oviatt. 1999b. Ten myths of multimodal interaction. Communications of the ACM, 42(11): 74--81.
[50] S. Oviatt. 2012. Multimodal interfaces. In J. A. Jacko and A. Sears, editors, The Human-Computer Interaction Handbook, ch. 18, pp. 405--430. CRC Press, Boca Raton, FL.
[51] S. Oviatt, A. DeAngeli, and K. Kuhn. 1997. Integration and synchronization of input modes during multimodal human-computer interaction. In Referring Phenomena in a Multimedia Context and Their Computational Treatment, ReferringPhenomena '97, pp. 1--13. Association for Computational Linguistics, Stroudsburg, PA.
[52] S. L. Oviatt and P. R. Cohen. 2015a. The Paradigm Shift to Multimodality in Contemporary Computer Interfaces, ch. 9. Morgan & Claypool Publishers, San Rafael, CA.
[53] S. L. Oviatt and P. R. Cohen. 2015b. The Paradigm Shift to Multimodality in Contemporary Computer Interfaces, ch. 7. Morgan & Claypool Publishers, San Rafael, CA.
[54] S. L. Oviatt, R. Lunsford, and R. Coulston. 2005. Individual differences in multimodal integration patterns: What are they and why do they exist? In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '05, pp. 241--249. ACM, New York.
[55] C. R. Perrault and J. F. Allen. 1980. A plan-based analysis of indirect speech acts. Computational Linguistics, 6(3--4): 167--182.
[56] V. Petukhova. 2011. Multidimensional dialogue modelling. Ph.D. thesis, Tilburg University.
[57] V. Petukhova and H. Bunt. May 2012. The coding and annotation of multimodal dialogue acts. In Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC '12. European Language Resources Association (ELRA), Istanbul.
[58] N. Pfleger. 2007. Context-based multimodal interpretation: An integrated approach to multimodal fusion and discourse processing. Ph.D. thesis, Universität des Saarlandes.
[59] D. Porta, M. Deru, S. Bergweiler, G. Herzog, and P. Poller. October 2014. Building multimodal dialog user interfaces in the context of the Internet of Services. Cognitive Technologies, pp. 145--162. Springer, Cham.
[60] Z. Prasov and J. Y. Chai. 2010. Fusing eye gaze with speech recognition hypotheses to resolve exophoric references in situated dialogue. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 471--481. Association for Computational Linguistics.
[61] B. Schuller, R. Müller, F. Eyben, J. Gast, B. Hörnler, M. Wöllmer, G. Rigoll, A. Höthker, and H. Konosu. 2009. Being bored? Recognising natural interest by extensive audiovisual integration for real-life application. Image and Vision Computing, 27(12): 1760--1774.
[62] N. Schutte, J. Kelleher, and B. Mac Namee. 2010. Visual salience and reference resolution in situated dialogues: A corpus-based evaluation. In Proceedings of the AAAI Symposium on Dialog with Robots, pp. 109--114. Dublin Institute of Technology.
[63] S. Schwärzler, A. Bannat, J. Gast, F. Wallhoff, M. Giuliani, M. Kassecker, C. Mayer, M. Wimmer, C. Wendt, and S. Schmidt. 2008. MuDiS - A multimodal dialogue system for human-robot interaction. Technical report, CoTeSys.
[64] M. Serrano and L. Nigay. 2009. Temporal aspects of CARE-based multimodal fusion: From a fusion mechanism to composition components and WoZ components. In Proceedings of the 2009 International Conference on Multimodal Interfaces, ICMI-MLMI '09, pp. 177--184. ACM, New York, NY.
[65] D. Song. December 2006. Combining speech user interfaces of different applications. Ph.D. thesis, Ludwig-Maximilians-Universität München. http://nbn-resolving.de/urn:nbn:de:bvb:19-62088.
[66] D. Sonntag, R. Neßelrath, G. Sonnenberg, and G. Herzog. December 2009. Supporting a rapid dialogue engineering process. Paper presented at the First International Workshop on Spoken Dialogue Systems Technology, IWSDS 2009, Kloster Irsee, Germany.
[67] S. Tamura, K. Iwano, and S. Furui. 2004. Multimodal speech recognition using optical-flow analysis for lip images. In J.-F. Wang, S. Furui, and B.-H. Juang, editors, Real World Speech Processing, pp. 43--50. Springer US.
[68] D. Traum and S. Larsson. 2003. The information state approach to dialogue management. In J. van Kuppevelt and R. W. Smith, editors, Current and New Directions in Discourse and Dialogue, vol. 22 of Text, Speech and Language Technology, pp. 325--353. Springer, Dordrecht, The Netherlands.
[69] R. Tumuluri and P. R. Cohen. 2019. Commercialization of multimodal systems. In S. Oviatt, B. Schuller, P. R. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Language Processing, Software, Commercialization, and Emerging Directions. Morgan & Claypool Publishers, San Rafael, CA.
[70] R. Tumuluri, D. Dahl, F. Paternò, and M. Zancanaro. 2019. Standardized representations and markup languages for MMI. In S. Oviatt, B. Schuller, P. R. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Language Processing, Software, Commercialization, and Emerging Directions. Morgan & Claypool Publishers, San Rafael, CA.
[71] M. Turk and M. Kölsch. 2003. Perceptual interfaces. Technical report, University of California, Santa Barbara. https://www.cs.ucsb.edu/research/tech_reports/reports/2003-33.pdf.
[72] W. Wahlster. 2003. SmartKom: Symmetric multimodality in an adaptive and reusable dialogue shell. In R. Krahl and D. Günther, editors, Proceedings of the Human Computer Interaction Status Conference 2003, pp. 47--62. DLR.
[73] W. Wahlster, editor. 2006. SmartKom: Foundations of Multimodal Dialogue Systems. Cognitive Technologies. Springer, Berlin, Heidelberg.
[74] W. Wahlster. 2014. Multiadaptive interfaces to cyber-physical environments. In Proceedings of the 19th International Conference on Intelligent User Interfaces, pp. 1--2. ACM. Keynote.
[75] W. Wahlster, N. Reithinger, and A. Blocher. 2001. SmartKom: Multimodal communication with a life-like character. In Proceedings of the 7th European Conference on Speech Communication and Technology, Eurospeech 2001, vol. 3, pp. 1547--1550.
[76] R. Wasinger. 2006. Multimodal Interaction with Mobile Devices: Fusing a Broad Spectrum of Modality Combinations. Aka Verlag, Heidelberg, Germany.
[77] R. Wasinger, C. Kray, and C. Endres. 2003. Controlling multiple devices. In Physical Interaction (PI03) - Workshop on Real World User Interfaces, in conjunction with MobileHCI '03, pp. 60--63.
[78] B. L. Webber. 1978. Description formation and discourse model synthesis. In Proceedings of the 1978 Workshop on Theoretical Issues in Natural Language Processing, pp. 42--50. Association for Computational Linguistics.
[79] T. Wen, D. Vandyke, N. Mrkšić, M. Gašić, L. Rojas-Barahona, P. Su, S. Ultes, and S. Young. 2017. A network-based end-to-end trainable task-oriented dialogue system. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, vol. 1, pp. 438--449.
[80] C. D. Wickens, D. L. Sandry, and M. Vidulich. 1983. Compatibility and resource competition between modalities of input, central processing, and output. Human Factors: The Journal of the Human Factors and Ergonomics Society, 25(2): 227--248.
[81] J. D. Williams and S. Young. 2007. Partially observable Markov decision processes for spoken dialog systems. Computer Speech & Language, 21(2): 393--422.
[82] J. D. Williams, K. Asadi, and G. Zweig. 2017. Hybrid code networks: Practical and efficient end-to-end dialog control with supervised and reinforcement learning. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 665--677.
[83] B. Xiao, C. Girand, and S. L. Oviatt. 2002. Multimodal integration patterns in children. In J. H. L. Hansen and B. L. Pellom, editors, Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP 2002, pp. 629--632. ISCA.
[84] B. Xiao, R. Lunsford, R. Coulston, M. Wesson, and S. Oviatt. 2003. Modeling multimodal integration patterns and performance in seniors: Toward adaptive processing of individual differences. In Proceedings of the 5th International Conference on Multimodal Interfaces, ICMI '03, pp. 265--272. ACM, New York, NY.
[85] S. Young, M. Gašić, B. Thomson, and J. D. Williams. 2013. POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE, 101(5): 1160--1179.
[86] T. Zhao and M. Eskenazi. 2016. Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning. In Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue.
[87] J. Zhou, K. Yu, F. Chen, Y. Wang, and S. Z. Arshad. 2018. Multimodal behavioural and physiological signals as indicators of cognitive load. In S. Oviatt, B. Schuller, P. Cohen, D. Sonntag, G. Potamianos, and A. Krüger, editors, The Handbook of Multimodal-Multisensor Interfaces, Volume 2: Signal Processing, Architectures, and Detection of Emotion and Cognition. Morgan & Claypool Publishers, San Rafael, CA.

Cited By

  • (2019) Standardized representations and markup languages for multimodal interaction. The Handbook of Multimodal-Multisensor Interfaces, pp. 347--392. https://doi.org/10.1145/3233795.3233806. Online publication date: 1-Jul-2019.


Published In

The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions
July 2019
813 pages
ISBN:9781970001754
DOI:10.1145/3233795

Publisher

Association for Computing Machinery and Morgan & Claypool



Qualifiers

  • Research-article

Appears in

ACM Books

