DOI: 10.1145/2993148.2997626

Wild wild emotion: a multimodal ensemble approach

Published: 31 October 2016
    Abstract

    Automatic emotion recognition from audio-visual data is a topic that has been broadly explored using data captured in the laboratory. However, these data are not necessarily representative of how emotion is manifested in the real world. In this paper, we describe our system for the 2016 Emotion Recognition in the Wild challenge. We use the Acted Facial Expressions in the Wild database 6.0 (AFEW 6.0), which contains short clips of popular TV shows and movies and has more variability than laboratory recordings. We explore a set of features that incorporate information from facial expressions and speech, in addition to cues from the background music and overall scene. In particular, we propose the use of a feature set composed of dimensional emotion estimates trained on outside acoustic corpora. We design sets of multiclass and pairwise (one-versus-one) classifiers and fuse the resulting systems. Our fusion increases the performance from a baseline of 38.81% to 43.86% and from 40.47% to 46.88% for the validation and test sets, respectively. While the video features alone perform better than the audio features alone, a combination of the two modalities achieves the greatest performance, with gains of 4.4% and 1.4% with and without information gain, respectively. Because of the flexible design of the fusion, it is easily adaptable to other multimodal learning problems.
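
    The abstract describes a two-stage design: per-modality classifiers (a multiclass model plus pairwise one-versus-one models), optional information-gain feature selection, and decision-level fusion across audio and video. Below is a minimal illustrative sketch of that pattern, not the authors' released code: scikit-learn SVMs are used only as convenient stand-in classifiers, and the feature matrices, number of selected features, score weights, and fusion weights are all assumptions.

```python
# Minimal illustrative sketch (not the authors' code) of the ensemble the abstract
# describes: per-modality multiclass and pairwise (one-versus-one) classifiers,
# mutual-information ("information gain") feature selection, and weighted
# decision-level fusion of audio and video scores. All names, feature matrices,
# labels, and weights below are hypothetical placeholders.
import numpy as np
from itertools import combinations
from sklearn.svm import SVC
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline


def fit_modality(X, y, k_best=200):
    """Fit an information-gain selector, one multiclass SVM, and all pairwise SVMs."""
    selector = SelectKBest(mutual_info_classif, k=min(k_best, X.shape[1])).fit(X, y)
    Xs = selector.transform(X)
    multiclass = make_pipeline(StandardScaler(), SVC(probability=True)).fit(Xs, y)
    pairwise = {}
    for a, b in combinations(np.unique(y), 2):
        mask = np.isin(y, [a, b])
        pairwise[(a, b)] = make_pipeline(StandardScaler(), SVC()).fit(Xs[mask], y[mask])
    return selector, multiclass, pairwise


def modality_scores(selector, multiclass, pairwise, X):
    """Combine multiclass posteriors with normalized pairwise votes for one modality."""
    Xs = selector.transform(X)
    classes = multiclass.classes_            # label order used by predict_proba columns
    probs = multiclass.predict_proba(Xs)
    votes = np.zeros_like(probs)
    for clf in pairwise.values():
        pred = clf.predict(Xs)
        for col, label in enumerate(classes):
            votes[:, col] += (pred == label)
    votes /= max(len(classes) - 1, 1)        # each class appears in n-1 pairwise models
    return 0.5 * probs + 0.5 * votes         # equal weighting is an assumption


def fuse(audio_scores, video_scores, classes, w_audio=0.4, w_video=0.6):
    """Decision-level fusion across modalities; the weights are illustrative only."""
    fused = w_audio * audio_scores + w_video * video_scores
    return classes[fused.argmax(axis=1)]
```

    In the paper's setting, the audio scores would be computed from acoustic features (including the dimensional emotion estimates mentioned in the abstract) and the video scores from facial-expression features; here both are placeholders for whatever per-modality features are available.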




    Published In

    ICMI '16: Proceedings of the 18th ACM International Conference on Multimodal Interaction
    October 2016
    605 pages
    ISBN: 9781450345569
    DOI: 10.1145/2993148


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Emotion Recognition
    2. Emotion in the Wild
    3. Ensemble Learning
    4. Multimodal Learning

    Qualifiers

    • Short-paper

    Conference

    ICMI '16

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%


    Cited By

    • (2022) A Survey on Databases for Multimodal Emotion Recognition and an Introduction to the VIRI (Visible and InfraRed Image) Database. Multimodal Technologies and Interaction, 6(6):47. DOI: 10.3390/mti6060047. Online publication date: 17-Jun-2022.
    • (2021) Jointly Aligning and Predicting Continuous Emotion Annotations. IEEE Transactions on Affective Computing, 12(4):1069-1083. DOI: 10.1109/TAFFC.2019.2917047. Online publication date: 1-Oct-2021.
    • (2020) A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared Images. Multimodal Technologies and Interaction, 4(3):46. DOI: 10.3390/mti4030046. Online publication date: 6-Aug-2020.
    • (2020) Training Strategies to Handle Missing Modalities for Audio-Visual Expression Recognition. Companion Publication of the 2020 International Conference on Multimodal Interaction, pages 400-404. DOI: 10.1145/3395035.3425202. Online publication date: 25-Oct-2020.
    • (2019) Facial Expression Recognition Using Computer Vision: A Systematic Review. Applied Sciences, 9(21):4678. DOI: 10.3390/app9214678. Online publication date: 2-Nov-2019.
    • (2019) Feature-Level and Model-Level Audiovisual Fusion for Emotion Recognition in the Wild. 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pages 443-448. DOI: 10.1109/MIPR.2019.00089. Online publication date: Mar-2019.
    • (2019) Bimodal recognition of affective states with the features inspired from human visual and auditory perception system. International Journal of Imaging Systems and Technology, 29(4):584-598. DOI: 10.1002/ima.22338. Online publication date: 18-May-2019.
    • (2017) Multimodal fusion based on information gain for emotion recognition in the wild. 2017 Intelligent Systems Conference (IntelliSys), pages 814-823. DOI: 10.1109/IntelliSys.2017.8324224. Online publication date: Sep-2017.
