
Spontaneous Facial Behavior Analysis Using Deep Transformer-based Framework for Child–computer Interaction

Published: 26 September 2023

Abstract

A fascinating challenge in human-robot interaction is endowing robots with the human capability of emotion recognition, with the aim of making the interaction natural, genuine, and intuitive. To achieve natural interaction in affective robots, human-machine interfaces, and autonomous vehicles, understanding users' attitudes and opinions is essential, and it offers a practical, feasible path to connecting machines and humans. A multimodal interface that combines voice with facial expression can convey a far wider range of nuanced emotions than a purely textual interface, and it is of great value in improving the intelligence of affective communication. Interfaces that ignore or fail to reflect user emotions can significantly degrade performance and risk being perceived as cold, socially inept, untrustworthy, and incompetent. To equip children well for life, we need to help them identify their feelings, manage them well, and express their needs in healthy, respectful, and direct ways; early identification of emotional deficits can help prevent low social functioning in children. In this work, we analyze children's spontaneous behavior from multimodal facial expression and voice signals, presenting a multimodal transformer-based late feature fusion framework for facial behavior analysis in children. The framework extracts contextualized representations from the RGB video sequence and the voice signal, then applies pairwise concatenation of these contextualized representations through a cross-feature fusion technique to predict users' emotions. To validate the proposed framework, we experimented with different pairwise concatenations of contextualized representations, which performed significantly better than state-of-the-art methods. In addition, we use t-distributed stochastic neighbor embedding (t-SNE) to visualize the discriminative features in a lower-dimensional space and probability density estimation to visualize the prediction capability of the proposed model.
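To make the fusion step concrete, below is a minimal sketch, not the authors' implementation, of transformer-based late fusion with cross-feature attention over two modality streams. The encoder depth, embedding size, head count, six-class emotion head, and toy input shapes are all illustrative assumptions.

    # Hedged sketch: each modality is contextualized by its own transformer
    # encoder, cross-attended against the other modality, mean-pooled, and the
    # pooled pair is concatenated (late fusion) for emotion classification.
    # All dimensions and module choices are assumptions, not the paper's spec.
    import torch
    import torch.nn as nn

    class CrossFeatureFusion(nn.Module):
        def __init__(self, dim=256, heads=4, num_classes=6):
            super().__init__()
            def make_encoder():
                layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                   batch_first=True)
                return nn.TransformerEncoder(layer, num_layers=2)
            self.video_encoder = make_encoder()  # contextualizes RGB frame tokens
            self.voice_encoder = make_encoder()  # contextualizes voice tokens
            self.video_to_voice = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.voice_to_video = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.classifier = nn.Linear(2 * dim, num_classes)  # late-fusion head

        def forward(self, video_tokens, voice_tokens):
            v = self.video_encoder(video_tokens)      # (B, Tv, dim)
            a = self.voice_encoder(voice_tokens)      # (B, Ta, dim)
            v_att, _ = self.video_to_voice(v, a, a)   # video attends to voice
            a_att, _ = self.voice_to_video(a, v, v)   # voice attends to video
            # Pairwise concatenation of pooled cross-attended representations.
            fused = torch.cat([v_att.mean(dim=1), a_att.mean(dim=1)], dim=-1)
            return self.classifier(fused)

    model = CrossFeatureFusion()
    logits = model(torch.randn(2, 16, 256), torch.randn(2, 32, 256))
    print(logits.shape)  # torch.Size([2, 6])

The two diagnostics named at the end of the abstract are standard library calls; an equally hedged sketch follows, with random placeholders standing in for the model's fused features and prediction scores.

    # t-SNE projection of fused features and a kernel density estimate of
    # predicted probabilities; `features` and `scores` are placeholders.
    import numpy as np
    from sklearn.manifold import TSNE
    from scipy.stats import gaussian_kde

    features = np.random.randn(200, 512)   # stand-in for fused embeddings
    scores = np.random.rand(200)           # stand-in for predicted probabilities

    embedded = TSNE(n_components=2, perplexity=30).fit_transform(features)
    kde = gaussian_kde(scores)
    grid = np.linspace(0, 1, 100)
    print(embedded.shape, kde(grid).shape)  # (200, 2) (100,)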


Cited By

• Multi Fine-Grained Fusion Network for Depression Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 8 (2024), 1–23. DOI: 10.1145/3665247. Online publication date: 29-Jun-2024.
• Deep Recurrent Regression with a Heatmap Coupling Module for Facial Landmarks Detection. Cognitive Computation 16, 4 (2022), 1964–1978. DOI: 10.1007/s12559-022-10065-9. Online publication date: 27-Oct-2022.


Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 2
February 2024, 548 pages
EISSN: 1551-6865
DOI: 10.1145/3613570
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      Published: 26 September 2023
      Online AM: 26 May 2022
      Accepted: 16 May 2022
      Revised: 14 April 2022
      Received: 16 October 2021
      Published in TOMM Volume 20, Issue 2


      Author Tags

1. datasets
2. neural networks
3. gaze detection
4. text tagging

      Qualifiers

      • Research-article

Article Metrics

• Downloads (last 12 months): 156
• Downloads (last 6 weeks): 15

Reflects downloads up to 04 Oct 2024.

