DOI: 10.1145/3508546.3508581

Learning Dynamics for Video Facial Expression Recognition

Published: 25 February 2022
Abstract

    Video-based facial expression recognition has been a focus of the computer vision community for decades. It aims to automatically identify which of several emotions a video expresses from the input audio or visual information. Capturing the dynamics, namely the motion pattern, plays an important role in video-based facial expression recognition. In this paper, we explore an effective and efficient motion pattern for modeling temporal relationships, called the Diff-based Canny Operator (DCO), which guides inter-frame aggregation and generates a novel feature modality. The proposed DCO adds little computational overhead and can be easily inserted into any framework, so we incorporate it into existing networks to form a unified structure for video-based facial expression recognition, enabling the network to extract temporal information effectively. In extensive experiments on the CK+ and AFEW datasets, our method shows its superiority, achieving better or comparable performance to state-of-the-art approaches at low FLOPs.
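    The paper's DCO implementation is not reproduced on this page. As a rough illustration only, the sketch below assumes "Diff-based Canny Operator" means applying a Canny edge detector to the absolute difference of consecutive frames, yielding a cheap motion-edge map that could serve as an extra feature modality; the function name `diff_canny_motion` and the threshold values are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a "diff-based Canny" motion map, assuming DCO means
# running a Canny edge detector over the absolute difference of consecutive
# frames. Thresholds and names here are illustrative, not the authors' code.
import cv2
import numpy as np

def diff_canny_motion(prev_frame: np.ndarray, curr_frame: np.ndarray,
                      low: int = 50, high: int = 150) -> np.ndarray:
    """Return a binary edge map (uint8, 0/255) of the inter-frame motion."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    # Absolute difference highlights the regions that moved between frames.
    diff = cv2.absdiff(curr_gray, prev_gray)
    # Canny on the difference keeps only the contours of that motion,
    # which is far cheaper to compute than dense optical flow.
    return cv2.Canny(diff, low, high)

# Usage: one motion-edge map per consecutive frame pair in a clip.
# frames = [cv2.imread(p) for p in sorted(glob.glob("clip/*.png"))]
# motion_maps = [diff_canny_motion(a, b) for a, b in zip(frames, frames[1:])]
```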


    Cited By

    • (2022) Smart Classroom Monitoring Using Novel Real-Time Facial Expression Recognition System. Applied Sciences 12(23), 12134. DOI: 10.3390/app122312134. Online publication date: 27-Nov-2022.
    • (2022) Emotion Recognition Method based on Guided Fusion of Facial Expression and Bodily Posture. 2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS), 628–632. DOI: 10.1109/CCIS57298.2022.10016324. Online publication date: 26-Nov-2022.

    Published In

    ACAI '21: Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence
    December 2021
    699 pages
    ISBN:9781450385053
    DOI:10.1145/3508546
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 February 2022

    Permissions

    Request permissions for this article.


    Author Tags

    1. Diff-based Canny Operator
    2. dynamics
    3. motion pattern
    4. video-based facial expression recognition

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ACAI'21

    Acceptance Rates

    Overall Acceptance Rate 173 of 395 submissions, 44%
