DOI: 10.1145/3551876.3554804

Improving Dimensional Emotion Recognition via Feature-wise Fusion

Published: 10 October 2022

Abstract

This paper introduces the RiHNU team's solution for the MuSe-Stress sub-challenge of the Multimodal Sentiment Analysis Challenge (MuSe) 2022. MuSe-Stress is a task that aims to discern human emotional states from internal and external responses (e.g., audio, physiological signals, and facial expressions) in a job-interview setting. Multimodal learning is widely regarded as an effective approach to multimodal sentiment analysis. However, most multimodal models fail to capture the associations among modalities, resulting in limited generalizability. We argue that such methods cannot establish discriminative features, mainly because they typically neglect fine-grained information. To address this problem, we first encode spatio-temporal features via a feature-wise fusion mechanism to learn more informative representations. We then exploit a late fusion strategy to capture fine-grained relations between the modalities. An ensemble strategy is also used to enhance the final performance. Our method achieves a Concordance Correlation Coefficient (CCC) of 0.6803 for valence and 0.6689 for physiological arousal on the test set.
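For readers who want to sanity-check the reported numbers, below is a minimal NumPy sketch of the Concordance Correlation Coefficient, the MuSe-Stress evaluation metric, together with a toy weighted late-fusion step. The `late_fuse` helper, its weights, and the synthetic traces are illustrative assumptions made for this page, not the authors' implementation.

```python
import numpy as np

def ccc(preds, labels):
    """Concordance Correlation Coefficient (CCC):
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    mean_p, mean_l = preds.mean(), labels.mean()
    cov = ((preds - mean_p) * (labels - mean_l)).mean()
    return 2.0 * cov / (preds.var() + labels.var() + (mean_p - mean_l) ** 2)

def late_fuse(modality_preds, weights):
    """Hypothetical late fusion: weighted average of per-modality predictions."""
    weights = np.asarray(weights, dtype=float)
    stacked = np.stack(modality_preds)          # (n_modalities, n_frames)
    return (weights / weights.sum()) @ stacked  # normalised weighted sum

# Toy usage: fuse noisy audio- and video-based valence traces, then score them.
rng = np.random.default_rng(0)
gold = np.sin(np.linspace(0.0, 6.0, 200))       # synthetic valence trace
audio = gold + 0.3 * rng.standard_normal(200)
video = gold + 0.2 * rng.standard_normal(200)
fused = late_fuse([audio, video], weights=[0.4, 0.6])
print(f"audio={ccc(audio, gold):.3f} video={ccc(video, gold):.3f} fused={ccc(fused, gold):.3f}")
```

Because CCC penalises scale and location mismatches in addition to low correlation, fused predictions typically score higher than either modality alone when the per-modality errors are uncorrelated.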

Supplementary Material

MP4 File (MuSe22-fp006.mp4)
Presentation video of the RiHNU solution


Cited By

  • (2024) A multimodal shared network with a cross-modal distribution constraint for continuous emotion recognition. Engineering Applications of Artificial Intelligence 133:PD. https://doi.org/10.1016/j.engappai.2024.108413. Online publication date: 1-Jul-2024.
  • (2023) The MuSe 2023 Multimodal Sentiment Analysis Challenge: Mimicked Emotions, Cross-Cultural Humour, and Personalisation. Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, Humour and Personalisation, 1-10. https://doi.org/10.1145/3606039.3613114. Online publication date: 1-Nov-2023.
  • (2023) Accommodating Missing Modalities in Time-Continuous Multimodal Emotion Recognition. 2023 11th International Conference on Affective Computing and Intelligent Interaction (ACII), 1-8. https://doi.org/10.1109/ACII59096.2023.10388079. Online publication date: 10-Sep-2023.


Information

Published In

MuSe '22: Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge
October 2022
118 pages
ISBN: 9781450394840
DOI: 10.1145/3551876
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 October 2022


Author Tags

  1. dimensional emotion recognition
  2. feature-wise fusion
  3. multimodal learning
  4. multiple modalities

Qualifiers

  • Research-article

Conference

MM '22

Acceptance Rates

MuSe '22 Paper Acceptance Rate: 14 of 17 submissions, 82%;
Overall Acceptance Rate: 14 of 17 submissions, 82%


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 66
  • Downloads (last 6 weeks): 4
Reflects downloads up to 13 Jan 2025

