
Emotion Recognition During Speech Using Dynamics of Multiple Regions of the Face

Published: 21 October 2015

Abstract

The need for human-centered, affective multimedia interfaces has motivated research in automatic emotion recognition. In this article, we focus on facial emotion recognition. Specifically, we target a domain in which speakers produce emotional facial expressions while speaking. The main challenge of this domain is the presence of modulations due to both emotion and speech. For example, an individual's mouth movement may be similar when he smiles and when he pronounces the phoneme /IY/, as in “cheese”. The result of this confusion is a decrease in the performance of facial emotion recognition systems. In our previous work, we investigated the joint effects of emotion and speech on facial movement. We found that it is critical to employ proper temporal segmentation and to leverage knowledge of spoken content to improve classification performance. In the current work, we investigate the temporal characteristics of specific regions of the face, such as the forehead, eyebrow, cheek, and mouth. We present a methodology that uses the temporal patterns of specific facial regions in the context of a facial emotion recognition system. We test our proposed approaches on two emotion datasets, IEMOCAP and SAVEE. Our results demonstrate that combining emotion recognition systems based on different facial regions improves overall accuracy relative to systems that do not leverage the distinct characteristics of individual regions.
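The abstract describes a score-level combination of region-specific recognizers. As a rough illustration of that idea only, the sketch below trains one classifier per facial region and averages their class posteriors at test time; the classifier choice (scikit-learn logistic regression), the feature shapes, and the averaging rule are assumptions for illustration, not the authors' actual models or fusion scheme.

```python
# Hypothetical sketch of region-based late fusion (not the authors' code).
# One classifier per facial region; class posteriors are averaged at test time.
import numpy as np
from sklearn.linear_model import LogisticRegression

REGIONS = ["forehead", "eyebrow", "cheek", "mouth"]  # regions named in the abstract

def train_region_classifiers(features_by_region, labels):
    """Fit one classifier per region on that region's temporal features."""
    return {region: LogisticRegression(max_iter=1000).fit(X, labels)
            for region, X in features_by_region.items()}

def fuse_predictions(models, features_by_region):
    """Average the per-region class posteriors and pick the top class."""
    posteriors = np.mean([models[r].predict_proba(features_by_region[r])
                          for r in REGIONS], axis=0)
    return posteriors.argmax(axis=1)

# Toy usage with random stand-in features (10-dim per region, 4 emotion classes).
rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=100)
feats = {r: rng.normal(size=(100, 10)) for r in REGIONS}
models = train_region_classifiers(feats, labels)
print(fuse_predictions(models, feats)[:10])
```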

Supplementary Material

a25-kim-app.pdf (kim.zip)
Supplemental movie, appendix, image, and software files for “Emotion Recognition During Speech Using Dynamics of Multiple Regions of the Face”.




Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 12, Issue 1s
Special Issue on Smartphone-Based Interactive Technologies, Systems, and Applications and Special Issue on Extended Best Papers from ACM Multimedia 2014
October 2015
317 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/2837676
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 October 2015
Accepted: 01 July 2015
Revised: 01 March 2015
Received: 01 February 2015
Published in TOMM Volume 12, Issue 1s


Author Tags

  1. Emotion
  2. emotion recognition
  3. facial movement
  4. segmentation

Qualifiers

  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 16
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 09 Nov 2024


Cited By
  • (2023) Designing Behaviors of Robots Based on the Artificial Emotion Expression Method in Human–Robot Interactions. Machines 11, 5, 533. DOI: 10.3390/machines11050533. Online publication date: 6-May-2023.
  • (2023) Exploring Different Techniques for Emotion Detection Through Face Recognition. In 2023 International Conference on Advanced Computing & Communication Technologies (ICACCTech), 779--786. DOI: 10.1109/ICACCTech61146.2023.00128. Online publication date: 23-Dec-2023.
  • (2023) Attention-based multimodal sentiment analysis and emotion recognition using deep neural networks. Applied Soft Computing 144, 110494. DOI: 10.1016/j.asoc.2023.110494. Online publication date: Sep-2023.
  • (2022) Spontaneous Facial Behavior Analysis Using Deep Transformer-based Framework for Child–computer Interaction. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 2, 1--17. DOI: 10.1145/3539577. Online publication date: 26-May-2022.
  • (2022) A Multimodal Framework for Large-Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals. ACM Transactions on Multimedia Computing, Communications, and Applications 18, 3, 1--23. DOI: 10.1145/3490686. Online publication date: 4-Mar-2022.
  • (2021) A Benchmark Database for the Comparison of Face Morphing Detection Methods. In 2021 International Conference on Electronic Information Technology and Smart Agriculture (ICEITSA), 393--401. DOI: 10.1109/ICEITSA54226.2021.00082. Online publication date: Dec-2021.
  • (2020) Attention-Based Modality-Gated Networks for Image-Text Sentiment Analysis. ACM Transactions on Multimedia Computing, Communications, and Applications 16, 3, 1--19. DOI: 10.1145/3388861. Online publication date: 5-Jul-2020.
  • (2020) Style Extractor for Facial Expression Recognition in the Presence of Speech. In 2020 IEEE International Conference on Image Processing (ICIP), 1806--1810. DOI: 10.1109/ICIP40778.2020.9191330. Online publication date: Oct-2020.
  • (2019) Facial Expression Recognition Using Computer Vision: A Systematic Review. Applied Sciences 9, 21, 4678. DOI: 10.3390/app9214678. Online publication date: 2-Nov-2019.
  • (2019) Detecting Happiness Using Hyperspectral Imaging Technology. Computational Intelligence and Neuroscience 2019. DOI: 10.1155/2019/1965789. Online publication date: 15-Jan-2019.
