Guided Interpretable Facial Expression Recognition via Spatial Action Unit Cues

Belharbi, Soufiane; Pedersoli, Marco; Koerich, Alessandro Lameiras; Bacon, Simon; Granger, Eric

arXiv:2402.00281 (cs)

[Submitted on 1 Feb 2024 (v1), last revised 14 May 2024 (this version, v5)]

Abstract: Although state-of-the-art classifiers for facial expression recognition (FER) can achieve a high level of accuracy, they lack interpretability, an important feature for end-users. Experts typically associate spatial action units (AUs) from a codebook with facial regions for the visual interpretation of expressions. In this paper, the same expert steps are followed. A new learning strategy is proposed to explicitly incorporate AU cues into classifier training, which allows deep interpretable models to be trained. During training, this AU codebook is used, along with the expression label of the input image and its facial landmarks, to construct an AU heatmap that indicates the most discriminative image regions of interest with respect to the facial expression. This valuable spatial cue is leveraged to train a deep interpretable classifier for FER. This is achieved by constraining the spatial layer features of a classifier to be correlated with AU heatmaps. Using a composite loss, the classifier is trained to correctly classify an image while yielding interpretable visual layer-wise attention correlated with AU maps, simulating the expert decision process. Our strategy relies only on the image expression class for supervision, without additional manual annotations. Our new strategy is generic, and can be applied to any deep CNN- or transformer-based classifier without requiring any architectural change or significant additional training time. Our extensive evaluation on two public benchmarks, the RAF-DB and AffectNet datasets, shows that our proposed strategy can improve layer-wise interpretability without degrading classification performance. In addition, we explore a common type of interpretable classifier that relies on class activation mapping (CAM) methods, and show that our approach can also improve CAM interpretability.
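
The two steps described in the abstract (building an AU heatmap from a codebook, an expression label, and facial landmarks, then training with a composite loss) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the codebook entries, the AU-to-landmark mapping, the Gaussian spread `sigma`, the weight `lam`, and the use of cosine similarity as the alignment term. The abstract only states that layer-wise attention is constrained to be "correlated" with the AU heatmap.

```python
import math

# Hypothetical codebook: expression label -> action unit ids (illustrative only).
AU_CODEBOOK = {"happiness": [6, 12], "surprise": [1, 2, 26]}
# Hypothetical mapping: action unit id -> indices into the landmark list.
AU_LANDMARKS = {6: [0, 1], 12: [2, 3], 1: [4], 2: [5], 26: [6]}


def au_heatmap(label, landmarks, height, width, sigma=2.0):
    """Sum Gaussian bumps at the landmarks tied to the label's AUs, scaled to [0, 1]."""
    heat = [[0.0] * width for _ in range(height)]
    for au in AU_CODEBOOK[label]:
        for idx in AU_LANDMARKS[au]:
            cx, cy = landmarks[idx]
            for y in range(height):
                for x in range(width):
                    dist2 = (x - cx) ** 2 + (y - cy) ** 2
                    heat[y][x] += math.exp(-dist2 / (2.0 * sigma ** 2))
    peak = max(max(row) for row in heat)
    if peak == 0.0:
        return heat
    return [[v / peak for v in row] for row in heat]


def cosine_similarity(a, b):
    """Cosine similarity between two 2D maps of the same shape."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    return dot / norm if norm > 0.0 else 0.0


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def composite_loss(logits, target, attention, heatmap, lam=1.0):
    """Classification term plus an alignment penalty (assumed form: 1 - cosine)."""
    alignment = 1.0 - cosine_similarity(attention, heatmap)
    return cross_entropy(logits, target) + lam * alignment


# Usage: toy (x, y) landmarks on an 8x8 grid; `attention` stands in for a layer's attention map.
landmarks = [(2, 3), (5, 3), (2, 6), (5, 6), (3, 1), (4, 1), (4, 7)]
heatmap = au_heatmap("happiness", landmarks, height=8, width=8)
attention = [[0.5] * 8 for _ in range(8)]
loss = composite_loss([2.0, 0.5, -1.0], target=0, attention=attention, heatmap=heatmap)
```

In this sketch the alignment penalty vanishes when the attention map matches the heatmap up to a positive scale, so the classification term is left unchanged in that case. A real implementation would apply the same idea to tensors at one or more layers of the classifier.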

Comments:	15 pages, 11 figures, 3 tables, International Conference on Automatic Face and Gesture Recognition (FG 2024)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2402.00281 [cs.CV]
	(or arXiv:2402.00281v5 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2402.00281

Submission history

From: Soufiane Belharbi
[v1] Thu, 1 Feb 2024 02:13:49 UTC (11,455 KB)
[v2] Fri, 2 Feb 2024 02:56:43 UTC (11,455 KB)
[v3] Thu, 25 Apr 2024 16:55:46 UTC (11,455 KB)
[v4] Mon, 13 May 2024 14:54:17 UTC (11,455 KB)
[v5] Tue, 14 May 2024 12:26:54 UTC (11,455 KB)
