DOI: 10.1145/3508546.3508581

Learning Dynamics for Video Facial Expression Recognition

Published: 25 February 2022
Abstract

    Video-based facial expression recognition has been a focus of the computer vision community for decades. It aims to automatically identify which of several emotions a video expresses from the input audio or visual information. Capturing the dynamics, namely the motion pattern, plays an important role in video-based facial expression recognition. In this paper, we explore an effective and efficient motion pattern for modeling temporal relationships, called the Diff-based Canny Operator (DCO), which guides inter-frame aggregation and generates a novel feature modality. The proposed DCO adds little computational overhead and can be easily inserted into any framework, so we incorporate it into existing networks to form a unified structure for video-based facial expression recognition, enabling the network to extract temporal information effectively. In extensive experiments on the CK+ and AFEW datasets, our method shows its superiority, achieving better or comparable performance to state-of-the-art approaches at low FLOPs.
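    The paper's DCO implementation is not reproduced on this page. As a rough illustration only, the sketch below assumes "Diff-based Canny Operator" means applying a Canny edge detector to the absolute difference of consecutive frames, yielding a cheap motion-edge map that could serve as an extra feature modality; the function name `diff_canny_motion` and the threshold values are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a "diff-based Canny" motion map, assuming DCO means
# running a Canny edge detector over the absolute difference of consecutive
# frames. Thresholds and names here are illustrative, not the authors' code.
import cv2
import numpy as np

def diff_canny_motion(prev_frame: np.ndarray, curr_frame: np.ndarray,
                      low: int = 50, high: int = 150) -> np.ndarray:
    """Return a binary edge map (uint8, 0/255) of the inter-frame motion."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    # Absolute difference highlights the regions that moved between frames.
    diff = cv2.absdiff(curr_gray, prev_gray)
    # Canny on the difference keeps only the contours of that motion,
    # which is far cheaper to compute than dense optical flow.
    return cv2.Canny(diff, low, high)

# Usage: one motion-edge map per consecutive frame pair in a clip.
# frames = [cv2.imread(p) for p in sorted(glob.glob("clip/*.png"))]
# motion_maps = [diff_canny_motion(a, b) for a, b in zip(frames, frames[1:])]
```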


    Cited By

    • (2022) Smart Classroom Monitoring Using Novel Real-Time Facial Expression Recognition System. Applied Sciences 12(23), 12134. DOI: 10.3390/app122312134. Online publication date: 27-Nov-2022.
    • (2022) Emotion Recognition Method based on Guided Fusion of Facial Expression and Bodily Posture. 2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS), 628–632. DOI: 10.1109/CCIS57298.2022.10016324. Online publication date: 26-Nov-2022.

    Published In

    ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
    December 2021
    699 pages
    ISBN:9781450385053
    DOI:10.1145/3508546
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 February 2022

    Permissions

    Request permissions for this article.


    Author Tags

    1. Diff-based Canny Operator
    2. dynamics
    3. motion pattern
    4. video-based facial expression recognition

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACAI'21

    Acceptance Rates

    Overall Acceptance Rate 173 of 395 submissions, 44%
