Adaptive key-frame selection-based facial expression recognition via multi-cue dynamic features hybrid fusion

Published: 01 March 2024

Abstract

A multi-cue dynamic features hybrid fusion (MDF-HF) method for video-based facial expression recognition is presented. It is composed of key-frame selection, multi-cue dynamic feature extraction, and information fusion components. An adaptive key-frame selection strategy is first designed in the training procedure to extract pivotal facial images from video sequences, addressing the challenge of imbalanced data distribution and improving data quality. The similarity threshold used for key-frame selection is automatically adjusted according to the number of image frames in each expression category, yielding a flexible frame-processing procedure. Multi-cue spatio-temporal feature descriptors are then designed to acquire diverse dynamic feature representations from the selected key-frame sequences. Through parallel computation, different levels of semantic information are extracted simultaneously to explore facial expression deformation in video clips. To integrate features from multiple cues, a weighted stacking ensemble strategy is devised that preserves the unique characteristics of each feature while exploring the interrelationships among the multi-cue features. The proposed method is evaluated on three benchmark datasets, eNTERFACE'05, BAUM-1s, and AFEW, achieving average accuracies of 59.7%, 57.5%, and 54.7%, respectively. MDF-HF exhibits superior performance compared with state-of-the-art facial expression recognition methods, offering a robust solution for recognizing facial expressions in dynamic and unconstrained video scenarios.
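
The adaptive key-frame selection step can be pictured with a short sketch. The Python below is a hypothetical reading of the abstract, not the paper's implementation: the cosine similarity measure, the square-root scaling rule, and the parameters `base_tau` and `ref_count` are all assumptions, since the abstract states only that the threshold varies with each category's frame count.

```python
import numpy as np

def adaptive_threshold(n_class_frames, base_tau=0.9, ref_count=1000,
                       lo=0.5, hi=0.99):
    """Assumed scaling rule: over-represented classes get a lower
    similarity threshold (more frames discarded), under-represented
    classes a higher one (more frames kept)."""
    tau = base_tau * (ref_count / n_class_frames) ** 0.5
    return float(np.clip(tau, lo, hi))

def select_key_frames(frames, tau):
    """Greedy selection: keep a frame only if its cosine similarity to
    the most recently kept frame falls below the threshold tau."""
    kept = [frames[0]]
    for f in frames[1:]:
        a = kept[-1].ravel().astype(np.float64)
        b = f.ravel().astype(np.float64)
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        if sim < tau:  # dissimilar enough to carry new expression information
            kept.append(f)
    return kept
```

The weighted stacking fusion admits a similarly hedged sketch: each cue-specific base model's class-probability output is scaled by a normalised weight and the results are concatenated into a meta-feature for a second-stage classifier. The accuracy-proportional weighting and the generic meta-classifier mentioned in the comments are plausible stand-ins; the abstract does not specify the paper's actual base or meta learners.

```python
import numpy as np

def fuse_weighted_stacking(cue_probs, cue_weights):
    """Scale each cue's (n_samples, n_classes) probability matrix by its
    normalised weight and concatenate the results column-wise into the
    meta-feature matrix that a meta-classifier is trained on."""
    w = np.asarray(cue_weights, dtype=np.float64)
    w = w / w.sum()  # normalise so the cue weights sum to one
    return np.hstack([wi * np.asarray(p) for wi, p in zip(w, cue_probs)])

# Hypothetical usage: cue_probs holds each base model's held-out
# probability outputs; cue_weights could be their validation accuracies.
# Any meta-classifier (e.g. logistic regression) is then fitted on
# fuse_weighted_stacking(cue_probs, cue_weights) against the labels.
```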



Published In

Information Sciences: an International Journal, Volume 660, Issue C, March 2024, 599 pages

Publisher

Elsevier Science Inc., United States

Publication History

Published: 01 March 2024

Author Tags

  1. Facial expression recognition
  2. Key-frame selection
  3. Dynamic feature learning
  4. Multi-cue information fusion
  5. Stacking ensemble

Qualifiers

  • Research-article
