DOI: 10.1145/2818346.2830586
Research article · ICMI '15 Conference Proceedings

Combining Multimodal Features within a Fusion Network for Emotion Recognition in the Wild

Published: 09 November 2015

Abstract

In this paper, we describe our work in the third Emotion Recognition in the Wild (EmotiW 2015) Challenge. For each video clip, we extract MSDF, LBP-TOP, HOG, LPQ-TOP and acoustic features to recognize the emotions of film characters. For static facial expression recognition on individual video frames, we extract MSDF, DCNN and RCNN features. We train linear SVM classifiers for each kind of feature on the AFEW and SFEW datasets, and we propose a novel fusion network that combines all the extracted features at the decision level. Our final recognition accuracy is 51.02% on the AFEW testing set and 51.08% on the SFEW testing set, which is much better than the baseline recognition rates of 39.33% and 39.13%.
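
As a rough illustration of the pipeline described above (not the paper's actual implementation), the sketch below trains one linear SVM per feature type and then learns decision-level fusion weights over the stacked SVM scores. All function and variable names are hypothetical, and a simple logistic-regression layer stands in for the paper's fusion network.

```python
# Minimal sketch of per-feature SVMs with decision-level fusion, assuming
# precomputed feature matrices. A logistic-regression layer is used here as a
# stand-in for the paper's fusion network; names below are hypothetical.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

def train_decision_level_fusion(train_sets, y_train, val_sets, y_val):
    """train_sets / val_sets map a feature name (e.g. 'LBP-TOP', 'HOG',
    'acoustic') to an (n_samples, n_dims) array of precomputed features."""
    svms, val_scores = {}, []
    for name, X in train_sets.items():
        clf = LinearSVC(C=1.0)            # one linear SVM per feature type
        clf.fit(X, y_train)
        svms[name] = clf
        # Per-class decision scores on held-out data: shape (n_val, n_classes).
        val_scores.append(clf.decision_function(val_sets[name]))
    # Learn fusion weights over the stacked per-classifier class scores.
    fusion = LogisticRegression(max_iter=1000)
    fusion.fit(np.hstack(val_scores), y_val)
    return svms, fusion

def predict_fused(svms, fusion, test_sets):
    scores = [svms[name].decision_function(test_sets[name]) for name in svms]
    return fusion.predict(np.hstack(scores))
```

Fusing at the decision level keeps each modality's classifier independent, so the fusion stage can down-weight a weak or noisy modality instead of letting it corrupt a single joint feature space.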


        Published In

        ICMI '15: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction
        November 2015
        678 pages
        ISBN:9781450339124
        DOI:10.1145/2818346

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 09 November 2015

        Author Tags

        1. emotion recognition
        2. fusion network
        3. multimodal features

        Qualifiers

        • Research-article

        Funding Sources

        • the Fundamental Research Funds for the Central Universities of China
        • the National Education Science Twelfth Five-Year Plan Key Issues of the Ministry of Education

        Conference

        ICMI '15
        Sponsor:
        ICMI '15: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
        November 9 - 13, 2015
        Seattle, Washington, USA

        Acceptance Rates

        ICMI '15 Paper Acceptance Rate 52 of 127 submissions, 41%;
        Overall Acceptance Rate 453 of 1,080 submissions, 42%
