
Texture and Geometry Scattering Representation-Based Facial Expression Recognition in 2D+3D Videos

Published: 06 March 2018

Abstract

Facial Expression Recognition (FER) is one of the most important topics in computer vision and pattern recognition, and it has attracted increasing attention for its scientific challenges and application potential. In this article, we propose a novel and effective approach to FER using multi-modal two-dimensional (2D) and 3D videos, which encodes both static and dynamic clues through a scattering convolution network. First, a shape-based detection method is introduced to locate the start and end of an expression in a video, segment its onset, apex, and offset states, and sample the important frames for emotion analysis. Second, the apex frames of 2D videos are represented by scattering, conveying static texture details. Those of 3D videos are processed in a similar way; however, to highlight static shape details, several geometric maps derived from multiple-order differential quantities, i.e., Normal Maps and Shape Index Maps, are generated as the input of scattering, instead of the original smooth facial surfaces. Third, the average of neighboring samples centred at each key texture frame or shape map in the onset stage is computed, and the scattering features extracted from all the averaged samples of the 2D and 3D videos are then concatenated to capture dynamic texture and shape cues, respectively. Finally, Multiple Kernel Learning is adopted to combine the features of the 2D and 3D modalities and compute similarities to predict the expression label. Thanks to the scattering descriptor, the proposed approach not only encodes distinct local texture and shape variations of different expressions, as several milestone operators such as SIFT and HOG do, but also captures subtle information hidden in the high frequencies of both channels, which is crucial for distinguishing expressions that are easily confused. Validation is conducted on the BU-4DFE and BP-4D databases, and the accuracies reached are highly competitive, demonstrating the competency of the approach for this task.
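The Shape Index Maps mentioned in the abstract follow the Koenderink and van Doorn formulation, which condenses the two principal curvatures at a surface point into a single value in [-1, 1]. A minimal sketch of that formula is shown below; this is not the authors' implementation, and the function name and the convention of mapping planar points to 0 are assumptions:

```python
import numpy as np

def shape_index(k1, k2):
    """Koenderink & van Doorn (1992) shape index from principal curvatures.

    Values lie in [-1, 1]: -1 spherical cup, -0.5 rut, 0 saddle,
    +0.5 ridge, +1 spherical cap. Planar points (k1 == k2 == 0) map
    to 0 here, which is a convention assumed for this sketch.
    """
    k1 = np.asarray(k1, dtype=float)
    k2 = np.asarray(k2, dtype=float)
    hi = np.maximum(k1, k2)   # larger principal curvature
    lo = np.minimum(k1, k2)   # smaller principal curvature
    # hi - lo >= 0, so arctan2 stays within [-pi/2, pi/2], matching the
    # arctan((hi + lo) / (hi - lo)) in the original formula while also
    # handling umbilic points (hi == lo) without division by zero
    return (2.0 / np.pi) * np.arctan2(hi + lo, hi - lo)
```

Applied per pixel to a range image's curvature maps, this yields the Shape Index Map that the paper feeds to the scattering network in place of the raw facial surface.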
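The final step fuses per-modality similarities before prediction. The paper uses Multiple Kernel Learning with an SVM to learn the combination weights; the sketch below is a simplification that keeps only the kernel-fusion idea, with fixed weights standing in for the learned MKL coefficients and a nearest-class-mean rule in the fused kernel's feature space standing in for the SVM (all names are assumptions):

```python
import numpy as np

def fuse_kernels(kernels, weights):
    """Weighted sum of base kernels; MKL would learn `weights` instead."""
    return sum(w * K for w, K in zip(weights, kernels))

def fused_kernel_predict(K_train, K_test, k_test_diag, y_train):
    """Nearest-class-mean classification in the fused kernel feature space.

    K_train: (n, n) fused train/train kernel; K_test: (m, n) fused
    test/train kernel; k_test_diag: (m,) fused K(x, x) per test point.
    Uses ||phi(x) - mean_c||^2 = K(x,x) - 2*mean_i K(x,x_i) + mean_ij K(x_i,x_j).
    """
    classes = np.unique(y_train)
    dists = np.empty((len(k_test_diag), len(classes)))
    for j, c in enumerate(classes):
        idx = np.flatnonzero(y_train == c)
        cross = K_test[:, idx].mean(axis=1)        # mean K(test, class members)
        within = K_train[np.ix_(idx, idx)].mean()  # mean K within the class
        dists[:, j] = k_test_diag - 2.0 * cross + within
    return classes[np.argmin(dists, axis=1)]
```

With linear base kernels built from toy 2D-texture and 3D-shape feature vectors, fusing the two Gram matrices and classifying against the fused kernel behaves like a single classifier over the concatenated, weighted features, which is the intuition behind combining the 2D and 3D channels.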



    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 14, Issue 1s
    Special Section on Representation, Analysis and Recognition of 3D Humans and Special Section on Multimedia Computing and Applications of Socio-Affective Behaviors in the Wild
    March 2018
    234 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3190503
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 06 March 2018
    Accepted: 01 August 2017
    Revised: 01 June 2017
    Received: 01 February 2017
    Published in TOMM Volume 14, Issue 1s


    Author Tags

    1. 2D and 3D videos
    2. facial expression recognition
    3. multi-modal fusion
    4. scattering descriptor

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Research Program of State Key Laboratory of Software Development Environment
    • Partner University Foundation
    • PUF 4D Vision project
    • National Natural Science Foundation of China
    • Microsoft Research Asia Collaborative Program
    • French Research Agency
    • National Key Research and Development Plan
    • l'Agence Nationale de Recherche (ANR)
    • Jemime project


    Cited By

    • (2024) Suitable and Style-Consistent Multi-Texture Recommendation for Cartoon Illustrations. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 7, 1--26. DOI: 10.1145/3652518
    • (2024) A Comprehensive Survey on Affective Computing: Challenges, Trends, Applications, and Future Directions. IEEE Access 12, 96150--96168. DOI: 10.1109/ACCESS.2024.3422480
    • (2024) Cross-Domain Facial Expression Recognition by Combining Transfer Learning and Face-Cycle Generative Adversarial Network. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-18713-y
    • (2023) Facial expression recognition method based on PSA-YOLO network. Frontiers in Neurorobotics 16. DOI: 10.3389/fnbot.2022.1057983
    • (2023) Spiking-Fer: Spiking Neural Network for Facial Expression Recognition With Event Cameras. Proceedings of the 20th International Conference on Content-based Multimedia Indexing, 1--7. DOI: 10.1145/3617233.3617235
    • (2023) A comprehensive survey on techniques to handle face identity threats: challenges and opportunities. Multimedia Tools and Applications 82, 2, 1669--1748. DOI: 10.1007/s11042-022-13248-6
    • (2022) A systematic review on affective computing: emotion models, databases, and recent advances. Information Fusion 83--84, 19--52. DOI: 10.1016/j.inffus.2022.03.009
    • (2021) A Survey on Various Deep Learning Algorithms for an Efficient Facial Expression Recognition System. International Journal of Image and Graphics 23, 3. DOI: 10.1142/S0219467822400058
    • (2021) Multi-angle face expression recognition based on generative adversarial networks. Computational Intelligence 38, 1, 20--37. DOI: 10.1111/coin.12437
    • (2021) Spatio-Temporal Encoder-Decoder Fully Convolutional Network for Video-Based Dimensional Emotion Recognition. IEEE Transactions on Affective Computing 12, 3, 565--578. DOI: 10.1109/TAFFC.2019.2940224
