Abstract
We present a corpus-based prosodic analysis that aims to uncover the relationship between dialogue acts, personality and prosody, with a view to providing guidelines for the text-to-speech system of the ECA Greta. The corpus used is the SEMAINE corpus, which features four different personalities and has been further annotated for dialogue acts and prosodic features. To show the importance of the choice of dialogue act taxonomy, two different taxonomies were used: the first corresponds to Searle’s taxonomy of speech acts, and the second, inspired by Bunt’s DIT++, includes a division of directive acts into finer categories. Our results show that finer-grained distinctions are important when choosing a taxonomy. Preliminary results also show that the prosodic correlates of dialogue acts are not always as cited in the literature and prove more complex and variable. By studying the realisation of different directive acts, we also observe differences in the communicative strategies of the ECA depending on personality, which can serve as input to a speech system.
Notes
Note that some of these values, notably the pitch and intensity values, are sensitive to recording conditions, so comparisons with previous work should be made with care. Here we compare values from the SEMAINE corpus only with one another.
Although this study makes no comparison with other accents, the Belfast accent is known to be associated with high rising terminal inflections in declarative sentences, which could mean that a higher percentage of assertives and expressives are associated with a rising intonation than, for example, with SSE accents.
This phenomenon was previously noted in a corpus study of Dutch dialogues by Beun (1989), in which only 48% of declarative sentences had rising intonation, and in which the use of the second person personal pronoun and of particles such as ‘and’ and ‘so’ was found to correlate with the identification of questions among declarative sentences with falling intonation. See Ŝafárová (2006) for an example of the use of the ‘you’-pronoun and particles to indicate response-seeking acts in American English. A further semantic explanation for the lack of a rising intonation in declarative questions is provided by Beun (2000), who suggests that a greater degree of certainty in the demand for confirmation is linked with a greater probability of a descending final contour.
References
Allwood, J. (1995). An activity based approach to pragmatics. Technical report, Department of Linguistics, University of Göteborg.
Anderson, A., Bader, M., Bard, E., Boyle, E., Doherty, G. M., Garrod, S., et al. (1991). The HCRC map task corpus. Language and Speech, 34, 351–366.
Anderson, K., Andre, E., Baur, T., Bernardini, S., Chollet, M., Chryssafidou, E., et al. (2013). The TARDIS framework: Intelligent virtual agents for social coaching in job interviews. In Proceedings of the 10th international conference on advances in computer entertainment. Heidelberg: Springer.
Austin, J. L. (1962). How to do things with words. Oxford: Clarendon Press.
Beun, R.-J. (1989). Declarative question acts: Two experiments on identification. In M. Taylor, F. Neel, & D. Bouwhuis (Eds.), The structure of multimodal dialogues. Amsterdam: North Holland.
Beun, R.-J. (2000). Context and form: Declarative or interrogative, that is the question. In H. Bunt & W. Black (Eds.), Abduction, belief and context in dialogue. Studies in computational pragmatics. Amsterdam: John Benjamins.
Bevacqua, E., Prépin, K., Niewiadomski, R., De Sevin, E., & Pelachaud, C. (2010). Greta: Towards an interactive conversational virtual companion. In Y. Wilks (Ed.), Artificial companions in society: Perspectives on the present and future (pp. 143–156). Amsterdam: John Benjamins.
Boersma, P., & Weenink, D. (2014). Praat: Doing phonetics by computer [Computer program]. Version 5.3.51, retrieved 31 May 2014 from http://www.praat.org.
Bunt, H., Alexandersson, J., Choe, J.-W., Fang, A., Hasida, K., Lee, K., et al. (2012). Semantically-based standard for dialogue annotation. In Proceedings of the 8th international conference on language resources and evaluation (LREC 2012). Istanbul, Turkey.
Bunt, H. (2000). Dynamic interpretation and dialogue theory. In M. M. Taylor, D. G. Bouwhuis, & F. Neel (Eds.), The structure of multimodal dialogue (Vol. 2, pp. 139–166). Amsterdam: John Benjamins.
Bunt, H. (2011). Multifunctionality in dialogue. Computer Speech & Language, 25(2), 222–245.
Campano, S., Clavel, C., & Pelachaud, C. (2015). I like this painting too: When an ECA shares appreciations to engage users. In Proceedings of the 2015 international conference on autonomous agents and multiagent systems (pp. 1649–1650).
Core, M. G., & Allen, J. (1997). Coding dialogs with the DAMSL annotation scheme. In Working notes of the AAAI fall symposium on communicative action in humans and machines. Cambridge, MA.
De Carolis, B., Pelachaud, C., Poggi, I., & Steedman, M. (2004). APML, a mark-up language for believable behavior generation. In H. Prendinger & M. Ishizuka (Eds.), Life-like characters. Tools, affective functions and applications (pp. 65–85). Heidelberg: Springer.
Gravano, A., Benus, S., Hirschberg, J., German, E. S., & Ward, G. (2008). The effect of prosody and semantic modality on the assessment of speaker certainty. In Proceedings of 4th speech prosody conference. Campinas, Brazil.
Hirschberg, J. (2004). Pragmatics and intonation. In L. R. Horn & G. Ward (Eds.), The handbook of pragmatics (pp. 515–537). Oxford: Blackwell.
Hoque, M. E., Sorower, M. S., Yeasin, M., & Louwerse, M. M. (2007). What speech tells us about discourse: The role of prosodic and discourse features in speech act classification. In IEEE international joint conference on neural networks (IJCNN 2007) (pp. 2999–3004). Orlando, FL.
Klatt, J., Marsella, S., & Kramer, N. (2011). Negotiations in the context of AIDS prevention: An agent-based model using theory of mind. In H. H. Vilhjálmsson, S. Kopp, S. Marsella, & K. R. Thórisson (Eds.), Intelligent virtual agents (pp. 209–215). Heidelberg: Springer.
Laukka, P., Juslin, P. N., & Bresin, R. (2005). A dimensional approach to vocal expression of emotion. Cognition and Emotion, 19(5), 633–653.
McKeown, G., Valstar, M., Cowie, R., Pantic, M., & Schröder, M. (2012). The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent. IEEE Transactions on Affective Computing, 3(1), 5–17.
Palmer, H. E. (1922). English intonation with systematic exercises. Heffer.
Pierrehumbert, J. (1980). The phonology and phonetics of English intonation. Ph.D thesis, MIT.
Popescu-Belis, A. (2003). Dialogue act tagsets for meeting understanding: An abstraction based on the DAMSL, Switchboard and ICSI-MR tagsets. Technical report, IM2.MDM-09.
Ŝafárová, M. (2006). Rises and falls: Studies in the semantics and pragmatics of intonation. Ph.D thesis, Institute for Logic, Language and Computation (pp. 59–74).
Scherer, K. (2003). Vocal communication of emotion: A review of research paradigms. Speech Communication, 40(1–2), 227–256.
Searle, J. (1979). Expression and meaning: Studies in the theory of speech acts. Cambridge: Cambridge University Press.
Shriberg, E., Bates, R., Stolcke, A., Taylor, P., Jurafsky, D. F., Ries, K., et al. (1998). Can prosody aid the automatic classification of dialog acts in conversational speech? Language and Speech, 41(3–4), 439–487.
Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P., et al. (1992). ToBI: A standard for labeling English prosody. In J. J. Ohala, T. M. Nearey, B. L. Derwing, M. M. Hodge, & G. E. Wiebe (Eds.), ICSLP 92 proceedings: 1992 international conference on spoken language processing (pp. 867–870). Department of Linguistics, University of Alberta.
Suignard, P. (2010). NaviQuest: Un outil pour naviguer dans une base de questions posées à un Agent Conversationnel. In Workshop sur les agents conversationnels Animés, Lille.
Syrdal, A., & Kim, Y.-J. (2008). Dialog speech acts and prosody: Considerations for TTS. In Proceedings of speech prosody (pp. 661–665). Campinas, Brazil.
Yoon, T., Chavarria, S., Cole, J., & Hasegawa-Johnson, M. (2004). Intertranscriber reliability of prosodic labeling on telephone conversation using ToBI. In Proceedings of the international conference on spoken language processing (pp. 2729–2732). Nara, Japan.
Cite this article
Bawden, R., Clavel, C. & Landragin, F. Towards the generation of dialogue acts in socio-affective ECAs: a corpus-based prosodic analysis. Lang Resources & Evaluation 50, 821–838 (2016). https://doi.org/10.1007/s10579-015-9312-9