DOI: 10.1145/2993148.2997626

Wild wild emotion: a multimodal ensemble approach

Published: 31 October 2016
    Abstract

    Automatic emotion recognition from audio-visual data is a topic that has been broadly explored using data captured in the laboratory. However, these data are not necessarily representative of how emotion is manifested in the real world. In this paper, we describe our system for the 2016 Emotion Recognition in the Wild challenge. We use the Acted Facial Expressions in the Wild database 6.0 (AFEW 6.0), which contains short clips of popular TV shows and movies and has more variability than laboratory recordings. We explore a set of features that incorporate information from facial expressions and speech, in addition to cues from the background music and overall scene. In particular, we propose the use of a feature set composed of dimensional emotion estimates trained on outside acoustic corpora. We design sets of multiclass and pairwise (one-versus-one) classifiers and fuse the resulting systems. Our fusion increases the performance from a baseline of 38.81% to 43.86% and from 40.47% to 46.88% for the validation and test sets, respectively. While the video features alone perform better than the audio features alone, a combination of the two modalities achieves the greatest performance, with gains of 4.4% and 1.4% with and without information gain, respectively. Because of the flexible design of the fusion, it is easily adaptable to other multimodal learning problems.
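
    The abstract describes a two-stage design: per-modality classifiers (a multiclass model plus pairwise one-versus-one models), optional information-gain feature selection, and decision-level fusion across audio and video. Below is a minimal illustrative sketch of that pattern, not the authors' released code: scikit-learn SVMs are used only as convenient stand-in classifiers, and the feature matrices, number of selected features, score weights, and fusion weights are all assumptions.

```python
# Minimal illustrative sketch (not the authors' code) of the ensemble the abstract
# describes: per-modality multiclass and pairwise (one-versus-one) classifiers,
# mutual-information ("information gain") feature selection, and weighted
# decision-level fusion of audio and video scores. All names, feature matrices,
# labels, and weights below are hypothetical placeholders.
import numpy as np
from itertools import combinations
from sklearn.svm import SVC
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline


def fit_modality(X, y, k_best=200):
    """Fit an information-gain selector, one multiclass SVM, and all pairwise SVMs."""
    selector = SelectKBest(mutual_info_classif, k=min(k_best, X.shape[1])).fit(X, y)
    Xs = selector.transform(X)
    multiclass = make_pipeline(StandardScaler(), SVC(probability=True)).fit(Xs, y)
    pairwise = {}
    for a, b in combinations(np.unique(y), 2):
        mask = np.isin(y, [a, b])
        pairwise[(a, b)] = make_pipeline(StandardScaler(), SVC()).fit(Xs[mask], y[mask])
    return selector, multiclass, pairwise


def modality_scores(selector, multiclass, pairwise, X):
    """Combine multiclass posteriors with normalized pairwise votes for one modality."""
    Xs = selector.transform(X)
    classes = multiclass.classes_            # label order used by predict_proba columns
    probs = multiclass.predict_proba(Xs)
    votes = np.zeros_like(probs)
    for clf in pairwise.values():
        pred = clf.predict(Xs)
        for col, label in enumerate(classes):
            votes[:, col] += (pred == label)
    votes /= max(len(classes) - 1, 1)        # each class appears in n-1 pairwise models
    return 0.5 * probs + 0.5 * votes         # equal weighting is an assumption


def fuse(audio_scores, video_scores, classes, w_audio=0.4, w_video=0.6):
    """Decision-level fusion across modalities; the weights are illustrative only."""
    fused = w_audio * audio_scores + w_video * video_scores
    return classes[fused.argmax(axis=1)]
```

    In the paper's setting, the audio scores would be computed from acoustic features (including the dimensional emotion estimates mentioned in the abstract) and the video scores from facial-expression features; here both are placeholders for whatever per-modality features are available.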




    Published In

    ICMI '16: Proceedings of the 18th ACM International Conference on Multimodal Interaction
    October 2016
    605 pages
    ISBN: 9781450345569
    DOI: 10.1145/2993148


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Emotion Recognition
    2. Emotion in the Wild
    3. Ensemble Learning
    4. Multimodal Learning

    Qualifiers

    • Short-paper

    Conference

    ICMI '16

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%


    Cited By

    • (2022) A Survey on Databases for Multimodal Emotion Recognition and an Introduction to the VIRI (Visible and InfraRed Image) Database. Multimodal Technologies and Interaction, 6(6):47. DOI: 10.3390/mti6060047. Online publication date: 17-Jun-2022.
    • (2021) Jointly Aligning and Predicting Continuous Emotion Annotations. IEEE Transactions on Affective Computing, 12(4):1069-1083. DOI: 10.1109/TAFFC.2019.2917047. Online publication date: 1-Oct-2021.
    • (2020) A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared Images. Multimodal Technologies and Interaction, 4(3):46. DOI: 10.3390/mti4030046. Online publication date: 6-Aug-2020.
    • (2020) Training Strategies to Handle Missing Modalities for Audio-Visual Expression Recognition. Companion Publication of the 2020 International Conference on Multimodal Interaction, pages 400-404. DOI: 10.1145/3395035.3425202. Online publication date: 25-Oct-2020.
    • (2019) Facial Expression Recognition Using Computer Vision: A Systematic Review. Applied Sciences, 9(21):4678. DOI: 10.3390/app9214678. Online publication date: 2-Nov-2019.
    • (2019) Feature-Level and Model-Level Audiovisual Fusion for Emotion Recognition in the Wild. 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pages 443-448. DOI: 10.1109/MIPR.2019.00089. Online publication date: Mar-2019.
    • (2019) Bimodal recognition of affective states with the features inspired from human visual and auditory perception system. International Journal of Imaging Systems and Technology, 29(4):584-598. DOI: 10.1002/ima.22338. Online publication date: 18-May-2019.
    • (2017) Multimodal fusion based on information gain for emotion recognition in the wild. 2017 Intelligent Systems Conference (IntelliSys), pages 814-823. DOI: 10.1109/IntelliSys.2017.8324224. Online publication date: Sep-2017.
