
Emotion Recognition During Speech Using Dynamics of Multiple Regions of the Face

Published: 21 October 2015

Abstract

The need for human-centered, affective multimedia interfaces has motivated research in automatic emotion recognition. In this article, we focus on facial emotion recognition. Specifically, we target a domain in which speakers produce emotional facial expressions while speaking. The main challenge of this domain is the presence of modulations due to both emotion and speech. For example, an individual's mouth movement may be similar when he smiles and when he pronounces the phoneme /IY/, as in “cheese”. The result of this confusion is a decrease in the performance of facial emotion recognition systems. In our previous work, we investigated the joint effects of emotion and speech on facial movement. We found that it is critical to employ proper temporal segmentation and to leverage knowledge of spoken content to improve classification performance. In the current work, we investigate the temporal characteristics of specific regions of the face, such as the forehead, eyebrow, cheek, and mouth. We present a methodology that uses the temporal patterns of specific facial regions in the context of a facial emotion recognition system. We test our proposed approaches on two emotion datasets, IEMOCAP and SAVEE. Our results demonstrate that combining emotion recognition systems based on different facial regions improves overall accuracy relative to systems that do not leverage the distinct characteristics of individual regions.
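The abstract describes a score-level combination of region-specific recognizers. As a rough illustration of that idea only, the sketch below trains one classifier per facial region and averages their class posteriors at test time; the classifier choice (scikit-learn logistic regression), the feature shapes, and the averaging rule are assumptions for illustration, not the authors' actual models or fusion scheme.

```python
# Hypothetical sketch of region-based late fusion (not the authors' code).
# One classifier per facial region; class posteriors are averaged at test time.
import numpy as np
from sklearn.linear_model import LogisticRegression

REGIONS = ["forehead", "eyebrow", "cheek", "mouth"]  # regions named in the abstract

def train_region_classifiers(features_by_region, labels):
    """Fit one classifier per region on that region's temporal features."""
    return {region: LogisticRegression(max_iter=1000).fit(X, labels)
            for region, X in features_by_region.items()}

def fuse_predictions(models, features_by_region):
    """Average the per-region class posteriors and pick the top class."""
    posteriors = np.mean([models[r].predict_proba(features_by_region[r])
                          for r in REGIONS], axis=0)
    return posteriors.argmax(axis=1)

# Toy usage with random stand-in features (10-dim per region, 4 emotion classes).
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=100)
feats = {r: rng.normal(size=(100, 10)) for r in REGIONS}
models = train_region_classifiers(feats, labels)
print(fuse_predictions(models, feats)[:10])
```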

Supplementary Material

a25-kim-app.pdf (kim.zip)
Supplemental movie, appendix, image, and software files for “Emotion Recognition During Speech Using Dynamics of Multiple Regions of the Face”.




Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 12, Issue 1s
Special Issue on Smartphone-Based Interactive Technologies, Systems, and Applications and Special Issue on Extended Best Papers from ACM Multimedia 2014
October 2015
317 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/2837676
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 October 2015
Accepted: 01 July 2015
Revised: 01 March 2015
Received: 01 February 2015
Published in TOMM Volume 12, Issue 1s


Author Tags

  1. Emotion
  2. emotion recognition
  3. facial movement
  4. segmentation

Qualifiers

  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 16
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 09 Nov 2024


Cited By
  • (2023) Designing Behaviors of Robots Based on the Artificial Emotion Expression Method in Human–Robot Interactions. Machines 11, 5, 533. DOI: 10.3390/machines11050533. Online publication date: 6-May-2023.
  • (2023) Exploring Different Techniques for Emotion Detection Through Face Recognition. In 2023 International Conference on Advanced Computing & Communication Technologies (ICACCTech), 779--786. DOI: 10.1109/ICACCTech61146.2023.00128. Online publication date: 23-Dec-2023.
  • (2023) Attention-based multimodal sentiment analysis and emotion recognition using deep neural networks. Applied Soft Computing 144, 110494. DOI: 10.1016/j.asoc.2023.110494. Online publication date: Sep-2023.
  • (2022) Spontaneous Facial Behavior Analysis Using Deep Transformer-based Framework for Child–computer Interaction. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 2, 1--17. DOI: 10.1145/3539577. Online publication date: 26-May-2022.
  • (2022) A Multimodal Framework for Large-Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals. ACM Transactions on Multimedia Computing, Communications, and Applications 18, 3, 1--23. DOI: 10.1145/3490686. Online publication date: 4-Mar-2022.
  • (2021) A Benchmark Database for the Comparison of Face Morphing Detection Methods. In 2021 International Conference on Electronic Information Technology and Smart Agriculture (ICEITSA), 393--401. DOI: 10.1109/ICEITSA54226.2021.00082. Online publication date: Dec-2021.
  • (2020) Attention-Based Modality-Gated Networks for Image-Text Sentiment Analysis. ACM Transactions on Multimedia Computing, Communications, and Applications 16, 3, 1--19. DOI: 10.1145/3388861. Online publication date: 5-Jul-2020.
  • (2020) Style Extractor for Facial Expression Recognition in the Presence of Speech. In 2020 IEEE International Conference on Image Processing (ICIP), 1806--1810. DOI: 10.1109/ICIP40778.2020.9191330. Online publication date: Oct-2020.
  • (2019) Facial Expression Recognition Using Computer Vision: A Systematic Review. Applied Sciences 9, 21, 4678. DOI: 10.3390/app9214678. Online publication date: 2-Nov-2019.
  • (2019) Detecting Happiness Using Hyperspectral Imaging Technology. Computational Intelligence and Neuroscience 2019. DOI: 10.1155/2019/1965789. Online publication date: 15-Jan-2019.
