MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition

Xiang, Peihao; Lin, Chaohao; Wu, Kaida; Bai, Ou

arXiv:2404.18327v1 (cs)

[Submitted on 28 Apr 2024 (this version), latest version 16 May 2024 (v2)]

Abstract: This paper presents a novel approach to processing multimodal data for dynamic emotion recognition, named the Multimodal Masked Autoencoder for Dynamic Emotion Recognition (MultiMAE-DER). MultiMAE-DER leverages the closely correlated representation information within spatiotemporal sequences across the visual and audio modalities. Because it builds on a pre-trained masked autoencoder model, MultiMAE-DER is obtained through simple, straightforward fine-tuning. Its performance is enhanced by optimizing six fusion strategies for multimodal input sequences. These strategies address dynamic feature correlations within cross-domain data across spatial, temporal, and spatiotemporal sequences. Compared with state-of-the-art multimodal supervised learning models for dynamic emotion recognition, MultiMAE-DER improves the weighted average recall (WAR) by 4.41% on the RAVDESS dataset and by 2.06% on the CREMA-D dataset. Furthermore, compared with the state-of-the-art multimodal self-supervised learning model, MultiMAE-DER achieves a 1.86% higher WAR on the IEMOCAP dataset.
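The abstract reports results as weighted average recall (WAR) but does not define the metric. As a minimal sketch, assuming the common definition in which each class's recall is weighted by its share of the ground-truth samples, WAR could be computed as below. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_average_recall(y_true, y_pred):
    """Per-class recall, averaged with weights proportional to class support.

    Assumed definition (not quoted from the paper):
        WAR = sum_c (n_c / N) * recall_c
    where n_c is the number of ground-truth samples of class c.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    total = len(y_true)
    support = Counter(y_true)  # n_c for every ground-truth class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    # correct[c] / support[c] is the recall of class c.
    return sum((support[c] / total) * (correct[c] / support[c]) for c in support)


# Toy labels, invented for illustration only.
y_true = ["happy", "happy", "happy", "sad"]
y_pred = ["happy", "happy", "sad", "sad"]
war = weighted_average_recall(y_true, y_pred)
```

Under this definition the class weights cancel the per-class denominators, so WAR equals overall accuracy: three of the four toy samples are classified correctly.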

Comments:	Accepted by ICPRS 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.18327 [cs.CV]
	(or arXiv:2404.18327v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.18327

Submission history

From: Peihao Xiang
[v1] Sun, 28 Apr 2024 21:53:42 UTC (909 KB)
[v2] Thu, 16 May 2024 13:54:39 UTC (1,078 KB)
