Contrastive Learning of View-invariant Representations for Facial Expressions Recognition

Published: 11 December 2023
Abstract

    Although there has been much progress in the area of facial expression recognition (FER), most existing methods suffer when presented with images captured from viewing angles that are non-frontal and substantially different from those used in the training process. In this article, we propose ViewFX, a novel view-invariant FER framework based on contrastive learning, capable of accurately classifying facial expressions regardless of the input viewing angles during inference. ViewFX learns view-invariant features of expressions using a proposed self-supervised contrastive loss, which brings together different views of the same subject with a particular expression in the embedding space. We also introduce a supervised contrastive loss to push the learned view-invariant features of each expression away from other expressions. Since facial expressions are often distinguished by very subtle differences in the learned feature space, we incorporate the Barlow twins loss to reduce redundancy and correlations in the learned representations. The proposed method is a substantial extension of our previously proposed CL-MEx, which only had a self-supervised loss. We test the proposed framework on two public multi-view facial expression recognition datasets, KDEF and DDCF. The experiments demonstrate that our approach outperforms previous works in the area and sets a new state-of-the-art for both datasets, while showing considerably less sensitivity to challenging angles and to the number of output labels used for training. We also perform detailed sensitivity and ablation experiments to evaluate the impact of different components of our model as well as its sensitivity to different parameters.
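    The two label-aware objectives the abstract mentions can be sketched in a few lines. The following is a minimal NumPy illustration of the *generic* supervised contrastive loss (Khosla et al.) and Barlow twins loss (Zbontar et al.), not the paper's exact ViewFX formulation; the function names, temperature, and λ values are illustrative assumptions.

    ```python
    import numpy as np

    def supcon_loss(z, labels, temp=0.1):
        """Generic supervised contrastive loss: for each anchor, pull
        same-label embeddings together and push other labels apart.
        z: (n, d) embeddings; labels: (n,) integer array."""
        z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
        sim = z @ z.T / temp                               # scaled pairwise similarities
        n = len(labels)
        self_mask = np.eye(n, dtype=bool)
        sim[self_mask] = -1e9                              # exclude self-pairs
        log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
        pos = (labels[:, None] == labels[None, :]) & ~self_mask
        # average -log p over each anchor's positives, then over anchors
        return float((-(log_prob * pos).sum(axis=1) / pos.sum(axis=1)).mean())

    def barlow_twins_loss(z_a, z_b, lam=5e-3):
        """Barlow twins redundancy reduction: drive the cross-correlation
        matrix of two embedding views toward the identity."""
        z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)    # per-dimension standardize
        z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
        n = z_a.shape[0]
        c = z_a.T @ z_b / n                                # (d, d) cross-correlation
        on_diag = np.sum((np.diag(c) - 1.0) ** 2)          # invariance term
        off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)  # decorrelation term
        return float(on_diag + lam * off_diag)
    ```

    ViewFX additionally uses a self-supervised view-invariance term over multiple views of the same subject and combines the losses; the paper's specific weighting is not reproduced here.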


    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 4
    April 2024
    676 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3613617
    Editor: Abdulmotaleb El Saddik

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 December 2023
    Online AM: 14 November 2023
    Accepted: 09 November 2023
    Revised: 15 October 2023
    Received: 06 May 2023
    Published in TOMM Volume 20, Issue 4


    Author Tags

    1. Affective computing
    2. contrastive learning
    3. expression recognition

    Qualifiers

    • Research-article

