Temporal Multimodal Fusion for Video Emotion Classification in the Wild

Vielzeuf, Valentin; Pateux, Stéphane; Jurie, Frédéric

arXiv:1709.07200 (cs)

[Submitted on 21 Sep 2017]

Abstract: This paper addresses the question of emotion classification. The task consists of predicting the emotion labels (taken from a set of possible labels) that best describe the emotions contained in short video clips. Building on a standard framework, in which videos are described by audio and visual features that a supervised classifier uses to infer the labels, this paper investigates several novel directions. First, improved face descriptors based on 2D and 3D Convolutional Neural Networks are proposed. Second, the paper explores several fusion methods, temporal and multimodal, including a novel hierarchical method combining features and scores. In addition, we carefully reviewed the different stages of the pipeline and designed a CNN architecture adapted to the task; this is important because the training set is small relative to the difficulty of the problem, which makes generalization difficult. The resulting model ranked 4th at the 2017 Emotion in the Wild challenge with an accuracy of 58.8%.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
Cite as:	arXiv:1709.07200 [cs.CV]
	(or arXiv:1709.07200v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1709.07200
Journal reference:	ACM - ICMI 2017, Nov 2017, Glasgow, United Kingdom

Submission history

From: Valentin Vielzeuf [via CCSD proxy]
[v1] Thu, 21 Sep 2017 08:14:40 UTC (1,325 KB)
