DOI: 10.1145/3577190.3616544

Audio-Visual Group-based Emotion Recognition using Local and Global Feature Aggregation based Multi-Task Learning

Published: 09 October 2023

Abstract

Audio-video group emotion recognition is a challenging task that has attracted increasing attention in recent decades. Deep learning models have shown tremendous advances in analyzing human emotion, yet the task remains difficult: it is hard to gather a broad enough range of potential information to obtain meaningful emotional representations, and hard to associate implicit contextual knowledge the way humans do. To tackle these problems, in this paper we propose the Local and Global Feature Aggregation based Multi-Task Learning (LGFAM) method for group emotion recognition. The framework consists of three parallel feature extraction networks that were verified in previous work, followed by an attention network with an MLP backbone and specially designed loss functions that fuses the features from the different modalities. In the experiment section, we report performance on the EmotiW 2023 Audio-Visual Group-based Emotion Recognition sub-challenge, which requires classifying a video into one of three emotions. Our best submission achieved 70.63 WAR and 70.38 UAR on the test set, demonstrating the effectiveness of our method.
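The page does not include implementation details, but the fusion step the abstract describes, an attention network with an MLP backbone that weights features from parallel modality branches before classification, can be sketched as follows. This is a minimal illustration only: the class name, the 512-dimensional clip-level embeddings, and the simple softmax weighting are assumptions, and the paper's specially designed multi-task losses are not reproduced here.

```python
import torch
import torch.nn as nn

class MLPAttentionFusion(nn.Module):
    """Hypothetical sketch of MLP-based attention fusion over parallel
    modality features; dimensions and structure are assumptions, not
    the paper's published specification."""

    def __init__(self, feat_dim: int = 512, num_classes: int = 3):
        super().__init__()
        # MLP backbone that scores each modality's contribution.
        self.attn_mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 2),
            nn.ReLU(),
            nn.Linear(feat_dim // 2, 1),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, feat_dim), one row per branch.
        scores = self.attn_mlp(feats)            # (B, M, 1)
        weights = torch.softmax(scores, dim=1)   # attention over modalities
        fused = (weights * feats).sum(dim=1)     # (B, feat_dim)
        return self.classifier(fused)            # (B, num_classes) logits

# Usage with three hypothetical modality branches (e.g., face, scene,
# audio), each already reduced to a clip-level embedding:
face, scene, audio = (torch.randn(4, 512) for _ in range(3))
model = MLPAttentionFusion()
logits = model(torch.stack([face, scene, audio], dim=1))  # (4, 3)
```

The softmax over the modality axis lets the network down-weight an uninformative branch per clip, which is one plausible reading of how an MLP attention module would fuse the three extractors; the actual LGFAM design may differ.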




Published In

ICMI '23: Proceedings of the 25th International Conference on Multimodal Interaction
October 2023 · 858 pages
ISBN: 9798400700552
DOI: 10.1145/3577190


Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. Multimodal emotion recognition
      2. feature fusion
      3. label revision
      4. modality robustness
      5. neural networks

      Qualifiers

      • Short-paper
      • Research
      • Refereed limited

      Funding Sources

      • NSFC

      Conference

      ICMI '23

      Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)

