Abstract
Affective and human-centered computing have attracted considerable attention in recent years, mainly due to the abundance of devices and environments that can exploit multimodal input from users and adapt their functionality to individual preferences and habits. In the quest to receive feedback from users in an unobtrusive manner, the combination of facial and hand gestures with prosodic information allows us to infer the user's emotional state, relying on the best-performing modality when another modality suffers from noise or poor sensing conditions. In this paper, we describe a multi-cue, dynamic approach to detecting emotion in naturalistic video sequences. In contrast to strictly controlled recording conditions for audiovisual material, the proposed approach focuses on sequences taken from near-real-world situations. Recognition is performed via a 'Simple Recurrent Network', which lends itself well to modeling dynamic events in both the user's facial expressions and speech. Moreover, this approach differs from existing work in that it models user expressivity using a dimensional representation of activation and valence, instead of detecting discrete 'universal emotions', which are scarce in everyday human-machine interaction. The algorithm is deployed on an audiovisual database recorded to simulate human-human discourse, which therefore contains less extreme expressivity and subtle variations of a number of emotion labels.
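The core idea of a 'Simple Recurrent Network' (Elman network) is that the hidden layer receives, alongside the current input frame, a copy of its own previous activations via context units, letting the network model the temporal dynamics of expressions rather than static snapshots. The following is a minimal illustrative sketch of such a forward pass, not the authors' implementation: all dimensions, weights, and the fused 12-D feature vector are hypothetical placeholders for the facial, gestural, and prosodic cues described above, and the 2-D output stands in for the (activation, valence) representation.

```python
import numpy as np

# Hypothetical dimensions: fused audio-visual features -> hidden -> (activation, valence)
N_IN, N_HID, N_OUT = 12, 8, 2

rng = np.random.default_rng(0)
W_ih = rng.normal(0, 0.3, (N_HID, N_IN))    # input -> hidden weights
W_hh = rng.normal(0, 0.3, (N_HID, N_HID))   # context -> hidden weights (the recurrent copy)
W_ho = rng.normal(0, 0.3, (N_OUT, N_HID))   # hidden -> output weights

def elman_forward(sequence):
    """Run the Elman network over a sequence of fused feature vectors.

    Returns one (activation, valence) estimate in [-1, 1] per frame; the
    context units carry the previous hidden state forward, so earlier
    frames influence later estimates.
    """
    context = np.zeros(N_HID)               # context units start at zero
    outputs = []
    for x in sequence:
        hidden = np.tanh(W_ih @ x + W_hh @ context)
        outputs.append(np.tanh(W_ho @ hidden))
        context = hidden                    # copy hidden state for the next step
    return np.array(outputs)

# A toy sequence of 20 frames of 12-D fused features (random stand-ins).
frames = rng.normal(size=(20, N_IN))
estimates = elman_forward(frames)
print(estimates.shape)  # (20, 2): one (activation, valence) pair per frame
```

In practice the weights would be trained (e.g. by backpropagation through time) against human-annotated activation/valence traces; the tanh output squashing is one simple way to keep both dimensions in a bounded range.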
Copyright information
© 2007 Springer Berlin Heidelberg
Cite this paper
Karpouzis, K. et al. (2007). Modeling Naturalistic Affective States Via Facial, Vocal, and Bodily Expressions Recognition. In: Huang, T.S., Nijholt, A., Pantic, M., Pentland, A. (eds) Artifical Intelligence for Human Computing. Lecture Notes in Computer Science(), vol 4451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72348-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72346-2
Online ISBN: 978-3-540-72348-6
eBook Packages: Computer Science (R0)