Abstract
We present a computational model of incremental grounding, including state updates and action selection. The model is inspired by corpus-based examples of several sorts of overlapping utterances, including backchannels and completions. The model has also been partially implemented within a virtual human system that includes incremental understanding. It can be used to track grounding and to provide overlapping verbal and non-verbal listener behaviors before a speaker has completed her utterance.


Notes
It is sometimes useful to distinguish further between the explicit or predicted surface form and the explicit or predicted meaning.
Sometimes, an utterance that includes an acknowledgment will also proceed to initiate a new CGU (as in “okay, so let’s talk about the other matter”).
Acknowledgments
Some of the effort described here has been sponsored by the US Army. Any opinions, content or information presented do not necessarily reflect the position or the policy of the United States Government, and no official endorsement should be inferred.
Cite this article
Visser, T., Traum, D., DeVault, D. et al. A model for incremental grounding in spoken dialogue systems. J Multimodal User Interfaces 8, 61–73 (2014). https://doi.org/10.1007/s12193-013-0147-7