Abstract
This work presents a multimodal, cross-cultural corpus of affective behavior. The corpus construction process, including the design and implementation of the recording experiment, is discussed along with the resulting acoustic prosody, facial expression, and gesture expressivity features. The emphasis lies on the cross-cultural aspect of gestural behavior: a common corpus construction protocol is defined with the aim of identifying cultural patterns in non-verbal behavior across three cultures, namely German, Greek, and Italian. Culture-specific findings regarding gesture expressivity are derived from the affective analysis performed. In addition, the multimodal aspect, covering prosody and facial expressions, is investigated in terms of fusion techniques. Finally, a plan to release the corpus to the public domain is discussed, aiming to establish it as a benchmark multimodal, cross-cultural standard and reference point.
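To make the notion of gesture expressivity features concrete, the minimal Python sketch below computes rough proxies for three expressivity dimensions commonly used in this line of work (spatial extent, overall activation, power) from a tracked hand trajectory. The function name, sampling rate, and synthetic trajectory are assumptions for illustration only and do not reproduce the corpus's actual feature extraction pipeline.

```python
import numpy as np

def expressivity_features(hand_xy, fps=25.0):
    """Illustrative expressivity proxies from a 2-D hand trajectory.

    hand_xy: array of shape (T, 2), tracked hand position per frame.
    Returns rough proxies for spatial extent, overall activation and power;
    names follow the expressivity dimensions in the literature, not the
    paper's exact implementation.
    """
    hand_xy = np.asarray(hand_xy, dtype=float)
    # Spatial extent: diagonal of the bounding box swept by the hand.
    extent = np.linalg.norm(hand_xy.max(axis=0) - hand_xy.min(axis=0))
    # Overall activation: mean speed of the hand over the gesture.
    velocity = np.diff(hand_xy, axis=0) * fps
    activation = np.linalg.norm(velocity, axis=1).mean()
    # Power: mean acceleration magnitude (how forceful the movement is).
    acceleration = np.diff(velocity, axis=0) * fps
    power = np.linalg.norm(acceleration, axis=1).mean()
    return {"spatial_extent": extent, "activation": activation, "power": power}

# Example with a short synthetic trajectory sampled at 25 fps.
trajectory = np.cumsum(np.random.randn(50, 2), axis=0)
print(expressivity_features(trajectory))
```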



Notes
Initially, we had expected that emotions would be expressed more or less homogeneously across the three modalities. However, a first analysis, still based exclusively on annotations obtained for the audio channel, showed surprisingly small improvements when information from the other modalities was added to the audio channel [26]. This was taken as a first hint of a possible discrepancy between the modalities.
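To illustrate the kind of comparison referred to in this note, the sketch below contrasts an audio-only classifier with a simple feature-level fusion of audio and visual features. The data, feature dimensions, and classifier (scikit-learn logistic regression) are synthetic assumptions for illustration and do not correspond to the fusion schemes evaluated in [26].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical per-segment features for each modality and shared labels.
rng = np.random.default_rng(0)
n = 200
audio_feats = rng.normal(size=(n, 20))   # e.g. prosodic features
video_feats = rng.normal(size=(n, 15))   # e.g. facial / gesture features
labels = rng.integers(0, 2, size=n)      # binary affect labels (synthetic)

# Audio-only baseline vs. a naive feature-level fusion of both modalities.
audio_only = cross_val_score(LogisticRegression(max_iter=1000),
                             audio_feats, labels, cv=5).mean()
fused = cross_val_score(LogisticRegression(max_iter=1000),
                        np.hstack([audio_feats, video_feats]), labels, cv=5).mean()
print(f"audio only: {audio_only:.3f}  audio+video fusion: {fused:.3f}")
```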
References
Abrilian S, Devillers L, Buisine S, Martin JC (2005) EmoTV1: annotation of real-life emotions for the specification of multimodal affective interfaces. In: International proceedings of HCI
Amazon Web Services: Public Data Sets (2012) http://aws.amazon.com/publicdatasets/. Accessed 31 Jan 2012
Amir N, Weiss A, Hadad R (2009) Is there a dominant channel in perception of emotions? In: 3rd International conference on affective computing and intelligent interaction and workshops, 2009 (ACII 2009). IEEE, Amsterdam, pp 1–6
Bänziger T, Pirker H, Scherer K (2006) GEMEP-GEneva multimodal emotion portrayals: a corpus for the study of multimodal emotional expressions. In: The workshop programme corpora for research on emotion and affect, 23 May 2006. Citeseer, p 15
Battocchi A, Pianesi F, Goren-Bar D (2005) A first evaluation study of a database of kinetic facial expressions (DaFEx). In: Proceedings of the 7th international conference on multimodal interfaces (ICMI ’05). ACM, New York, pp 214–221
Burkhardt F, Paeschke A, Rolfes M, Sendlmeier W, Weiss B (2005) A database of German emotional speech. In: Proceedings of interspeech, Lissabon, pp 1517–1520
Busso C, Bulut M, Lee C, Kazemzadeh A, Mower E, Kim S, Chang J, Lee S, Narayanan S (2008) IEMOCAP: interactive emotional dyadic motion capture database. Lang Resour Eval 42(4):335–359
Caridakis G, Raouzaiou A, Bevacqua E, Mancini M, Karpouzis K, Malatesta L, Pelachaud C (2007) Virtual agent multimodal mimicry of humans. Lang Resour Eval 41(3–4):367–388 (special issue on multimodal corpora). http://www.image.ece.ntua.gr/publications.php
Caridakis G, Raouzaiou A, Karpouzis K, Kollias S (2006) Synthesizing gesture expressivity based on real sequences. Workshop on multimodal corpora: from multimodal behaviour theories to usable models. In: LREC 2006 conference, Genoa, Italy, 24–26 May 2006. http://www.image.ece.ntua.gr/publications.php
Caridakis G, Wagner J, Raouzaiou A, Curto Z, Andre E, Karpouzis K (2010) A multimodal corpus for gesture expressivity analysis. In: Multimodal corpora: advances in capturing, coding and analyzing multimodality, LREC, Malta, 17–23 May 2010
Castellano G, Leite I, Pereira A, Martinho C, Paiva A, McOwan P (2010) Inter-act: an affective and contextually rich multimodal video corpus for studying interaction with robots. In: Proceedings of the international conference on multimedia. ACM, New York, pp 1031–1034
Cowie R, Douglas-Cowie E, Savvidou S, McMahon E, Sawey M, Schröder M (2000) ‘FEELTRACE’: an instrument for recording perceived emotion in real time. In: ISCA tutorial and research workshop (ITRW) on speech and emotion, Citeseer
Creative Commons: BY-NC-SA 3.0 (2012) http://creativecommons.org/licenses/by-nc-sa/3.0/. Accessed 2 Feb 2012
Douglas-Cowie E, Campbell N, Cowie R, Roach P (2003) Emotional speech: towards a new generation of databases. Speech Commun 40(1–2):33–60
Douglas-Cowie E, Cowie R, Sneddon I, Cox C, Lowry O, McRorie M, Martin J, Devillers L, Abrilian S, Batliner A et al (2007) The HUMAINE database: addressing the collection and annotation of naturalistic and induced emotional data. In: Affective computing and intelligent interaction, pp 488–500
Douglas-Cowie E, Devillers L, Martin JC, Cowie R, Savvidou S, Abrilian S, Cox C (2005) Multimodal databases of everyday emotion: facing up to complexity. In: INTERSPEECH 2005, pp 813–816
Velten E (1968) A laboratory task for induction of mood states. Behav Res Ther 6:473–482
Ekman P et al (1971) Universals and cultural differences in facial expressions of emotion. University of Nebraska Press, Lincoln
Elfenbein H, Beaupré M, Lévesque M, Hess U (2007) Toward a dialect theory: cultural differences in the expression and recognition of posed facial expressions. Emotion 7(1):131
Fanelli G, Gall J, Romsdorfer H, Weise T, Van Gool L (2010) A 3-d audio-visual corpus of affective communication. IEEE Trans Multimedia 12(6):591–598
Fleiss J, Levin B, Paik M (2003) Statistical methods for rates and proportions. Wiley series in probability and mathematical statistics. Probability and mathematical statistics. Wiley, New York
Kessous L, Castellano G, Caridakis G (2009) Multimodal emotion recognition in speech-based interaction using facial expression, body gesture and acoustic analysis. J Multimodal User Interfaces. doi:10.1007/s12193-009-0025-5. http://www.image.ece.ntua.gr/publications.php
Kipp M (2001) Anvil: a generic annotation tool for multimodal dialogue. In: Seventh European conference on speech communication and technology. ISCA
Küblbeck C, Ernst A (2006) Face detection and tracking in video sequences using the modified census transformation. Image Vis Comput 24:564–572
Leite I, Pereira A, Martinho C, Paiva A (2008) Are emotional robots more fun to play with? In: 17th IEEE international symposium on robot and human interactive communication, 2008, RO-MAN 2008. IEEE, pp 77–82
Lingenfelser F, Wagner J, André E (2011) A systematic discussion of fusion techniques for multi-modal affect recognition tasks. In: ICMI, pp 19–26
Plutchik R (1994) The psychology and biology of emotion. HarperCollins College Publishers
Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39:1161–1178. doi:10.1037/h0077714
Shami M, Verhelst W (2007) Automatic classification of expressiveness in speech: a multi-corpus study. Speaker classification II, pp 43–56
Soleymani M, Lichtenauer J, Pun T, Pantic M (2011) A multi-modal affective database for affect recognition and implicit tagging. IEEE Trans Affect Comput 99(PrePrints):1
Vogt T, André E (2009) Exploring the benefits of discretization of acoustic features for speech emotion recognition. In: Proceedings of 10th conference of the international speech communication association (INTERSPEECH). ISCA, Brighton, UK, pp 328–331
Wagner J, Lingenfelser F, André E (2011) The social signal interpretation framework (SSI) for real time signal processing and recognition. In: Proceedings of Interspeech 2011
Wagner J, Lingenfelser F, André E, Kim J (2011) Exploring fusion methods for multimodal emotion recognition with missing data. IEEE Trans Affect Comput 99(PrePrints)
Zara A, Maffiolo V, Martin J, Devillers L (2007) Collection and annotation of a corpus of human-human multimodal interactions: emotion and others anthropomorphic characteristics. In: Affective computing and intelligent interaction, pp 464–475
Zeng Z, Pantic M, Roisman G, Huang T (2009) A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans Pattern Anal Mach Intell 31(1):39–58
Acknowledgments
This work was partially funded by the European Commission under the grant agreements eCute (FP7-ICT-2009-5), ILHAIRE (FP7-ICT-2009.8.0) and CEEDS (FP7-ICT-2009-5).
Cite this article
Caridakis, G., Wagner, J., Raouzaiou, A. et al. A cross-cultural, multimodal, affective corpus for gesture expressivity analysis. J Multimodal User Interfaces 7, 121–134 (2013). https://doi.org/10.1007/s12193-012-0112-x