DOI: 10.1145/3242969.3264987
Research article (Public Access)

Group-Level Emotion Recognition using Deep Models with A Four-stream Hybrid Network

Published: 02 October 2018

Abstract

Group-level Emotion Recognition (GER) in the wild is a challenging task that has attracted increasing attention. Most recent works utilize two channels of information to solve this problem: a channel involving only faces and a channel containing the whole image. However, modeling the relationship between the faces and the scene in a global image remains challenging. In this paper, we propose a novel face-location aware global network, which captures face location information in the form of an attention heatmap to better model such relationships. We also propose a multi-scale face network that infers the group-level emotion from individual faces and explicitly handles the high variance in image and face size, since images in the wild are collected from different sources at different resolutions. In addition, a global blurred stream is developed to explicitly learn and extract scene-only features. Finally, we propose a four-stream hybrid network, consisting of the face-location aware global stream, the multi-scale face stream, the global blurred stream, and a global stream, to address the GER task, and show the effectiveness of our method on the GER sub-challenge, part of the sixth Emotion Recognition in the Wild (EmotiW 2018) Challenge [10]. The proposed method achieves 65.59% and 78.39% accuracy on the testing and validation sets, respectively, and ranked third on the leaderboard.
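To make the four-stream architecture described above more concrete, the following is a minimal PyTorch-style sketch (the paper's experiments build on PyTorch [29]). It is an illustrative sketch, not the authors' implementation: the ResNet-18 backbones, the three-class output (positive/neutral/negative, as in the GER sub-challenge), the score-averaging fusion, and all module and variable names are assumptions made for illustration.

# Hypothetical sketch of a four-stream hybrid network in the spirit of the abstract:
# a face-location aware global stream (RGB image + face attention heatmap), a plain
# global stream, a blurred global (scene-only) stream, and a multi-scale face stream,
# fused by averaging class scores. Backbones, fusion rule, and names are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


def resnet_backbone(in_channels: int, num_classes: int = 3) -> nn.Module:
    """ResNet-18 with an adjustable number of input channels (hypothetical helper)."""
    net = models.resnet18()
    if in_channels != 3:
        net.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3, bias=False)
    net.fc = nn.Linear(net.fc.in_features, num_classes)  # positive / neutral / negative
    return net


class FourStreamGER(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # 4 input channels: RGB plus the face-location attention heatmap.
        self.face_aware_global = resnet_backbone(4, num_classes)
        self.global_stream = resnet_backbone(3, num_classes)
        self.blurred_global = resnet_backbone(3, num_classes)  # faces blurred -> scene-only cues
        self.face_stream = resnet_backbone(3, num_classes)     # resized individual face crops

    def forward(self, image, heatmap, blurred_image, face_crops):
        # image:         (B, 3, H, W) whole scene
        # heatmap:       (B, 1, H, W) attention map marking detected face locations
        # blurred_image: (B, 3, H, W) scene with faces blurred out
        # face_crops:    (B, N, 3, h, w) N face crops per image, resized to a common size
        s1 = self.face_aware_global(torch.cat([image, heatmap], dim=1))
        s2 = self.global_stream(image)
        s3 = self.blurred_global(blurred_image)

        b, n = face_crops.shape[:2]
        face_scores = self.face_stream(face_crops.flatten(0, 1)).view(b, n, -1)
        s4 = face_scores.mean(dim=1)  # aggregate per-face scores into a group-level score

        # Simple late fusion by averaging stream scores (assumed; weights could be learned).
        return (s1 + s2 + s3 + s4) / 4

In this sketch the face-location aware global stream simply concatenates the attention heatmap with the RGB image as a fourth input channel, and the streams are fused by unweighted score averaging; the per-stream architectures and fusion weights used in the paper may differ.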

References

[1] Dan Antoshchenko. 2017. mtcnn-pytorch. https://github.com/TropComplique/mtcnn-pytorch.
[2] T. Baltrusaitis, M. Mahmoud, and P. Robinson. 2015. Cross-dataset learning and person-specific normalisation for automatic Action Unit detection. In FG, Vol. 6. 1--6.
[3] M. S. Bartlett, G. Littlewort, M. G. Frank, C. Lainscsek, I. Fasel, and J. R. Movellan. 2005. Recognizing Facial Expression: Machine Learning and Application to Spontaneous Behavior. In CVPR. 568--573.
[4] C. Benitez-Quiroz, Y. Wang, and A. Martinez. 2017. Recognition of Action Units in the Wild with Deep Nets and a New Global-Local Loss. In ICCV. IEEE, 3990--3999.
[5] J. Cai, Z. Meng, A. Khan, Z. Li, J. O'Reilly, and Y. Tong. 2018. Island Loss for Learning Discriminative Features in Facial Expression Recognition. In FG. IEEE, 302--309.
[6] W. Chu, F. De la Torre, and J. Cohn. 2016. Selective transfer machine for personalized facial expression analysis. IEEE T-PAMI (2016).
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR.
[8] Abhinav Dhall, Roland Goecke, and Tom Gedeon. 2015. Automatic group happiness intensity analysis. IEEE Transactions on Affective Computing 6, 1 (2015), 13--26.
[9] Abhinav Dhall, Jyoti Joshi, Karan Sikka, Roland Goecke, and Nicu Sebe. 2015. The more the merrier: Analysing the affect of a group of people in images. In FG, Vol. 1. IEEE, 1--8.
[10] Abhinav Dhall, Amanjot Kaur, Roland Goecke, and Tom Gedeon. 2018. EmotiW 2018: Audio-Video, Student Engagement and Group-Level Affect Prediction. In ICMI. ACM.
[11] H. Ding, S. Zhou, and R. Chellappa. 2017. FaceNet2ExpNet: Regularizing a deep face recognition net for expression recognition. In FG. IEEE, 118--126.
[12] Y. Fan, X. Lu, D. Li, and Y. Liu. 2016. Video-based emotion recognition using CNN-RNN and C3D hybrid networks. In ICMI. 445--450.
[13] I. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner, W. Cukierski, Y. Tang, D. Thaler, D. Lee, et al. 2013. Challenges in representation learning: A report on three machine learning contests. In ICML. Springer, 117--124.
[14] X. Guo, L. Polanía, and K. Barner. 2017. Group-level emotion recognition using deep models on image scene, faces, and skeletons. In ICMI. ACM, 603--608.
[15] S. Han, Z. Meng, A. S. Khan, and Y. Tong. 2016. Incremental Boosting Convolutional Neural Network for Facial Action Unit Recognition. In NIPS. 109--117.
[16] S. Han, Z. Meng, J. O'Reilly, J. Cai, X. Wang, and Y. Tong. 2018. Optimizing Filter Size in Convolutional Neural Networks for Facial Action Unit Recognition. In CVPR.
[17] K. He, X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. In CVPR. 770--778.
[18] B. Jiang, B. Martinez, M. F. Valstar, and M. Pantic. 2014. Decision level fusion of domain specific regions for facial action recognition. In ICPR. 1776--1781.
[19] B. Jiang, M. F. Valstar, and M. Pantic. 2011. Action Unit detection using sparse appearance descriptors in space-time video volumes. In FG.
[20] H. Kaya, F. Gürpinar, S. Afshar, and A. A. Salah. 2015. Contrasting and Combining Least Squares Based Learners for Emotion Recognition in the Wild. In ICMI. 459--466.
[21] A. Klaser, M. Marszałek, and C. Schmid. 2008. A spatio-temporal descriptor based on 3D-gradients. In BMVC. 275--1.
[22] J. Li, S. Roy, J. Feng, and T. Sim. 2016. Happiness level prediction with sequential inputs via multiple regressions. In ICMI. ACM, 487--493.
[23] S. Li, W. Deng, and J. Du. 2017. Reliable Crowdsourcing and Deep Locality-Preserving Learning for Expression Recognition in the Wild. In CVPR.
[24] P. Liu, S. Han, Z. Meng, and Y. Tong. 2014. Facial expression recognition via a boosted deep belief network. In CVPR. 1805--1812.
[25] Z. Meng, P. Liu, J. Cai, S. Han, and Y. Tong. 2017. Identity-Aware Convolutional Neural Network for Facial Expression Recognition. In FG. IEEE, 558--565.
[26] M. R. Mohammadi, E. Fatemizadeh, and M. H. Mahoor. 2014. Simultaneous recognition of facial expression and identity via sparse representation. In WACV. 1066--1073.
[27] S. Moore and R. Bowden. 2011. Local binary patterns for multi-view facial expression recognition. CVIU 115, 4 (2011), 541--558.
[28] Wenxuan Mou, Oya Celiktutan, and Hatice Gunes. 2015. Group-level arousal and valence recognition in static images: Face, body and context. In FG, Vol. 5. IEEE, 1--6.
[29] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. In NIPS-W.
[30] R. Ptucha, G. Tsagkatakis, and A. Savakis. 2011. Manifold based sparse representation for robust expression recognition without neutral subtraction. In ICCV Workshops. 2136--2143.
[31] Evangelos Sariyanidi, Hatice Gunes, and Andrea Cavallaro. 2015. Automatic analysis of facial affect: A survey of registration, representation, and recognition. IEEE T-PAMI 37, 6 (2015), 1113--1133.
[32] P. Scovanner, S. Ali, and M. Shah. 2007. A 3-dimensional SIFT descriptor and its application to action recognition. In ACM MM. 357--360.
[33] K. Simonyan and A. Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In NIPS.
[34] B. Sun, Q. Wei, L. Li, Q. Xu, J. He, and L. Yu. 2016. LSTM for dynamic emotion and group emotion recognition in the wild. In ICMI. ACM, 451--457.
[35] L. Tan, K. Zhang, K. Wang, X. Zeng, X. Peng, and Y. Qiao. 2017. Group emotion recognition with individual facial emotion CNNs and global image based CNNs. In ICMI. ACM, 549--552.
[36] M. F. Valstar, M. Mehu, B. Jiang, M. Pantic, and K. Scherer. 2012. Meta-analysis of the first facial expression recognition challenge. IEEE T-SMC-B 42, 4 (2012), 966--979.
[37] V. Vonikakis, Y. Yazici, V. Nguyen, and S. Winkler. 2016. Group happiness assessment using geometric features and dataset balancing. In ICMI. ACM, 479--486.
[38] Q. Wei, Y. Zhao, Q. Xu, L. Li, J. He, L. Yu, and B. Sun. 2017. A new deep-learning framework for group emotion recognition. In ICMI. ACM, 587--592.
[39] J. Xue, H. Zhang, and K. Dana. 2018. Deep Texture Manifold for Ground Terrain Recognition. In CVPR.
[40] J. Xue, H. Zhang, K. Dana, and K. Nishino. 2017. Differential angular imaging for material recognition. In CVPR.
[41] P. Yang, Q. Liu, and D. N. Metaxas. 2009. Boosting Encoded Dynamic Features for Facial Expression Recognition. Pattern Recognition Letters 30, 2 (Jan. 2009), 132--139.
[42] A. Yuce, H. Gao, and J. Thiran. 2015. Discriminant Multi-Label Manifold Embedding for Facial Action Unit Detection. In FG.
[43] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. 2016. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters 23, 10 (2016), 1499--1503.
[44] G. Zhao, X. Huang, M. Taini, S. Li, and M. Pietikäinen. 2011. Facial expression recognition from near-infrared videos. Image and Vision Computing 29, 9 (2011), 607--619.
[45] G. Zhao and M. Pietikäinen. 2007. Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions. IEEE T-PAMI 29, 6 (June 2007), 915--928.
[46] L. Zhong, Q. Liu, P. Yang, J. Huang, and D. N. Metaxas. 2015. Learning Multiscale Active Facial Patches for Expression Analysis. IEEE Trans. on Cybernetics 45, 8 (Aug. 2015), 1499--1510.

Information & Contributors

Information

Published In

ICMI '18: Proceedings of the 20th ACM International Conference on Multimodal Interaction
October 2018
687 pages
ISBN: 9781450356923
DOI: 10.1145/3242969

Sponsors

  • SIGCHI: ACM Special Interest Group on Computer-Human Interaction

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 October 2018

Author Tags

  1. affect analysis
  2. attention heatmap
  3. emotion recognition
  4. group-level emotion recognition

Qualifiers

  • Research-article

Funding Sources

Conference

ICMI '18
Sponsor:
  • SIGCHI

Acceptance Rates

ICMI '18 Paper Acceptance Rate: 63 of 149 submissions, 42%
Overall Acceptance Rate: 453 of 1,080 submissions, 42%

Contributors

Cited By

  • (2024) Group-Level Emotion Recognition Using Hierarchical Dual-Branch Cross Transformer with Semi-Supervised Learning. 2024 IEEE 4th International Conference on Software Engineering and Artificial Intelligence (SEAI), 10.1109/SEAI62072.2024.10674336, 252-256. Online publication date: 21-Jun-2024.
  • (2024) Comparative Analysis of Neonatal Facial Expression from Images using Deep Learning. 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), 10.1109/ICCCNT61001.2024.10725190, 1-7. Online publication date: 24-Jun-2024.
  • (2023) Probabilistic Attribute Tree Structured Convolutional Neural Networks for Facial Expression Recognition in the Wild. IEEE Transactions on Affective Computing, 10.1109/TAFFC.2022.3156920, 14:3, 1927-1941. Online publication date: 1-Jul-2023.
  • (2023) Audio-Visual Automatic Group Affect Analysis. IEEE Transactions on Affective Computing, 10.1109/TAFFC.2021.3104170, 14:2, 1056-1069. Online publication date: 1-Apr-2023.
  • (2023) Automatic Emotion Recognition for Groups: A Review. IEEE Transactions on Affective Computing, 10.1109/TAFFC.2021.3065726, 14:1, 89-107. Online publication date: 1-Jan-2023.
  • (2023) Improved Group Facial Expression Recognition Using Super-Resolved Local Facial Multi Scale Features. 2023 11th International Conference on Intelligent Systems and Embedded Design (ISED), 10.1109/ISED59382.2023.10444587, 1-6. Online publication date: 15-Dec-2023.
  • (2023) A recent survey on perceived group sentiment analysis. Journal of Visual Communication and Image Representation, 10.1016/j.jvcir.2023.103988, 97, 103988. Online publication date: Dec-2023.
  • (2023) A multimodal fusion-based deep learning framework combined with keyframe extraction and spatial and channel attention for group emotion recognition from videos. Pattern Analysis and Applications, 10.1007/s10044-023-01178-4, 26:3, 1493-1503. Online publication date: 18-Jun-2023.
  • (2022) Group Emotion Detection Based on Social Robot Perception. Sensors, 10.3390/s22103749, 22:10, 3749. Online publication date: 14-May-2022.
  • (2021) Regional Attention Networks with Context-aware Fusion for Group Emotion Recognition. 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 10.1109/WACV48630.2021.00119, 1149-1158. Online publication date: Jan-2021.