One-Shot Only Real-Time Video Classification: A Case Study in Facial Emotion Recognition

Basbrain, Arwa; Gan, John Q.

doi:10.1007/978-3-030-62362-3_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12489)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

Video classification is an important research field, with applications ranging from human action recognition for video surveillance to emotion recognition for human-computer interaction. This paper proposes a new method called One-Shot Only (OSO) for real-time video classification, with a case study in facial emotion recognition. Instead of using 3D convolutional neural networks (CNNs) or multiple 2D CNNs with decision fusion as in previous studies, the OSO method treats video classification as a single-image classification problem: it spatially rearranges video frames, chosen by frame selection or clustering strategies, into a simple representative storyboard that fuses the spatio-temporal information of the video. Because it uses a single 2D CNN for video classification, it can be optimised end-to-end directly in terms of classification accuracy. Experimental results show that the OSO method outperformed multiple 2D CNNs with decision fusion by a large margin in classification accuracy (by up to 13%) on the AFEW 7.0 dataset for video classification. It is also very fast, up to ten times faster than the commonly used 2D CNN architectures for video classification.
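The storyboard idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `select_frames` and `make_storyboard`, the evenly spaced frame selection, and the 2x2 grid layout are all assumptions chosen for illustration, and the abstract does not specify the selection or clustering strategies actually used.

```python
import numpy as np


def select_frames(video: np.ndarray, k: int) -> np.ndarray:
    """Pick k frames at evenly spaced positions from a (N, H, W, C) video.

    Assumption: uniform sampling is only a simple stand-in for the frame
    selection or clustering strategies mentioned in the abstract.
    """
    n = video.shape[0]
    idx = np.linspace(0, n - 1, num=k).round().astype(int)
    return video[idx]


def make_storyboard(frames: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Tile rows*cols frames of shape (H, W, C) into one (rows*H, cols*W, C) image.

    Frames are placed left to right, top to bottom, so temporal order is
    encoded as spatial position in the resulting image.
    """
    k, h, w, c = frames.shape
    if k != rows * cols:
        raise ValueError(f"expected {rows * cols} frames, got {k}")
    grid = frames.reshape(rows, cols, h, w, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # -> (rows, H, cols, W, C)
    return grid.reshape(rows * h, cols * w, c)


if __name__ == "__main__":
    # Hypothetical input: 16 random RGB frames of 48x48 pixels.
    rng = np.random.default_rng(0)
    video = rng.integers(0, 256, size=(16, 48, 48, 3), dtype=np.uint8)
    board = make_storyboard(select_frames(video, 4), rows=2, cols=2)
    print(board.shape)  # (96, 96, 3)
```

The resulting storyboard is an ordinary image, so under these assumptions it could be passed to any single 2D CNN image classifier, which is what allows the whole pipeline to be trained end-to-end on one network.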

Author information

Authors and Affiliations

School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
Arwa Basbrain & John Q. Gan
Faculty of Computing and Information Technology, King Abdul-Aziz University, Jeddah, Kingdom of Saudi Arabia
Arwa Basbrain

Corresponding authors

Correspondence to Arwa Basbrain or John Q. Gan.

Editor information

Editors and Affiliations

University of Minho, Braga, Portugal
Cesar Analide
University of Minho, Braga, Portugal
Paulo Novais
Technical University of Madrid, Madrid, Spain
David Camacho
University of Manchester, Manchester, UK
Hujun Yin

About this paper

Cite this paper

Basbrain, A., Gan, J.Q. (2020). One-Shot Only Real-Time Video Classification: A Case Study in Facial Emotion Recognition. In: Analide, C., Novais, P., Camacho, D., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2020. Lecture Notes in Computer Science, vol 12489. Springer, Cham. https://doi.org/10.1007/978-3-030-62362-3_18

DOI: https://doi.org/10.1007/978-3-030-62362-3_18
Published: 27 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62361-6
Online ISBN: 978-3-030-62362-3
eBook Packages: Computer Science, Computer Science (R0)
