Modeling Naturalistic Affective States Via Facial, Vocal, and Bodily Expressions Recognition

  • Conference paper
Artificial Intelligence for Human Computing

Abstract

Affective and human-centered computing have attracted considerable attention in recent years, mainly due to the abundance of devices and environments able to exploit multimodal input on the part of users and adapt their functionality to user preferences or individual habits. In the quest to receive feedback from users in an unobtrusive manner, the combination of facial and hand gestures with prosody information allows us to infer the users’ emotional state, relying on the best-performing modality in cases where another modality suffers from noise or bad sensing conditions. In this paper, we describe a multi-cue, dynamic approach to detecting emotion in naturalistic video sequences. Contrary to the strictly controlled recording conditions of most audiovisual material, the proposed approach focuses on sequences taken from nearly real-world situations. Recognition is performed via a ’Simple Recurrent Network’, which lends itself well to modeling dynamic events in both the user’s facial expressions and speech. Moreover, this approach differs from existing work in that it models user expressivity using a dimensional representation of activation and valence, instead of detecting discrete ’universal emotions’, which are scarce in everyday human-machine interaction. The algorithm is deployed on an audiovisual database recorded to simulate human-human discourse, which therefore contains less extreme expressivity and subtle variations of a number of emotion labels.
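The recognition step rests on a Simple Recurrent (’Elman’) Network that maps per-frame multimodal features onto the activation/valence plane rather than onto discrete emotion categories. As a rough illustration of that architecture only (not the authors’ implementation), the following minimal NumPy sketch runs an Elman-style forward pass over a sequence of fused facial, gesture, and prosody feature vectors and scores each frame against the four activation/valence quadrants; the feature dimensionality, hidden-layer size, and quadrant labels are assumptions made for the example.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the quadrant scores."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


class ElmanSRN:
    """Minimal Simple Recurrent (Elman) network: the hidden layer at time t
    is fed back as a 'context' input at time t+1, so the network can model
    the temporal evolution of facial, gestural, and prosodic cues."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(0.0, 0.1, (n_hidden, n_in))      # input   -> hidden
        self.W_hh = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context -> hidden
        self.W_hy = rng.normal(0.0, 0.1, (n_out, n_hidden))     # hidden  -> output
        self.b_h = np.zeros(n_hidden)
        self.b_y = np.zeros(n_out)

    def forward(self, sequence):
        """sequence: (T, n_in) array of per-frame fused feature vectors.
        Returns a (T, n_out) array of per-frame quadrant probabilities."""
        h = np.zeros(self.b_h.shape[0])  # context units start at zero
        outputs = []
        for x in sequence:
            h = np.tanh(self.W_xh @ x + self.W_hh @ h + self.b_h)
            outputs.append(softmax(self.W_hy @ h + self.b_y))
        return np.array(outputs)


# Hypothetical usage: 120 frames of 30-dimensional fused facial/gesture/prosody
# features, classified into the four activation/valence quadrants
# (e.g. positive-active, positive-passive, negative-active, negative-passive).
net = ElmanSRN(n_in=30, n_hidden=20, n_out=4)
frames = np.random.rand(120, 30)
quadrant_probs = net.forward(frames)  # shape (120, 4)
```

Training (e.g. backpropagation through time) and the actual multimodal feature extraction are omitted from this sketch.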

Editor information

Thomas S. Huang, Anton Nijholt, Maja Pantic, Alex Pentland

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Karpouzis, K. et al. (2007). Modeling Naturalistic Affective States Via Facial, Vocal, and Bodily Expressions Recognition. In: Huang, T.S., Nijholt, A., Pantic, M., Pentland, A. (eds) Artificial Intelligence for Human Computing. Lecture Notes in Computer Science, vol 4451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72348-6_5

  • DOI: https://doi.org/10.1007/978-3-540-72348-6_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72346-2

  • Online ISBN: 978-3-540-72348-6

  • eBook Packages: Computer Science, Computer Science (R0)
