Abstract
Affective and human-centered computing have attracted considerable attention in recent years, mainly due to the abundance of devices and environments that can exploit multimodal input from users and adapt their functionality to individual preferences and habits. In the quest to receive feedback from users in an unobtrusive manner, the combination of facial and hand gestures with prosodic information allows us to infer the user's emotional state, relying on the best-performing modality when another modality suffers from noise or poor sensing conditions. In this paper, we describe a multi-cue, dynamic approach to detecting emotion in naturalistic video sequences. In contrast to strictly controlled recording conditions for audiovisual material, the proposed approach focuses on sequences taken from near-real-world situations. Recognition is performed via a 'Simple Recurrent Network', which lends itself well to modeling dynamic events in both the user's facial expressions and speech. Moreover, this approach differs from existing work in that it models user expressivity using a dimensional representation of activation and valence, instead of detecting discrete 'universal emotions', which are scarce in everyday human-machine interaction. The algorithm is deployed on an audiovisual database recorded to simulate human-human discourse, which therefore contains less extreme expressivity and subtle variations of a number of emotion labels.
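The core idea of a 'Simple Recurrent Network' (Elman network) is that the hidden layer receives, alongside the current input frame, a copy of its own previous activations via context units, letting the network model the temporal dynamics of expressions rather than static snapshots. The following is a minimal illustrative sketch of such a forward pass, not the authors' implementation: all dimensions, weights, and the fused 12-D feature vector are hypothetical placeholders for the facial, gestural, and prosodic cues described above, and the 2-D output stands in for the (activation, valence) representation.

```python
import numpy as np

# Hypothetical dimensions: fused audio-visual features -> hidden -> (activation, valence)
N_IN, N_HID, N_OUT = 12, 8, 2

rng = np.random.default_rng(0)
W_ih = rng.normal(0, 0.3, (N_HID, N_IN))    # input -> hidden weights
W_hh = rng.normal(0, 0.3, (N_HID, N_HID))   # context -> hidden weights (the recurrent copy)
W_ho = rng.normal(0, 0.3, (N_OUT, N_HID))   # hidden -> output weights

def elman_forward(sequence):
    """Run the Elman network over a sequence of fused feature vectors.

    Returns one (activation, valence) estimate in [-1, 1] per frame; the
    context units carry the previous hidden state forward, so earlier
    frames influence later estimates.
    """
    context = np.zeros(N_HID)               # context units start at zero
    outputs = []
    for x in sequence:
        hidden = np.tanh(W_ih @ x + W_hh @ context)
        outputs.append(np.tanh(W_ho @ hidden))
        context = hidden                    # copy hidden state for the next step
    return np.array(outputs)

# A toy sequence of 20 frames of 12-D fused features (random stand-ins).
frames = rng.normal(size=(20, N_IN))
estimates = elman_forward(frames)
print(estimates.shape)  # (20, 2): one (activation, valence) pair per frame
```

In practice the weights would be trained (e.g. by backpropagation through time) against human-annotated activation/valence traces; the tanh output squashing is one simple way to keep both dimensions in a bounded range.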
Copyright information
© 2007 Springer Berlin Heidelberg
Cite this paper
Karpouzis, K. et al. (2007). Modeling Naturalistic Affective States Via Facial, Vocal, and Bodily Expressions Recognition. In: Huang, T.S., Nijholt, A., Pantic, M., Pentland, A. (eds) Artifical Intelligence for Human Computing. Lecture Notes in Computer Science(), vol 4451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72348-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72346-2
Online ISBN: 978-3-540-72348-6
eBook Packages: Computer Science (R0)