DOI: 10.1145/2818346.2830586
Research article · ICMI '15 Conference Proceedings

Combining Multimodal Features within a Fusion Network for Emotion Recognition in the Wild

Published: 09 November 2015

Abstract

In this paper, we describe our work in the third Emotion Recognition in the Wild (EmotiW 2015) Challenge. For each video clip, we extract MSDF, LBP-TOP, HOG, LPQ-TOP and acoustic features to recognize the emotions of film characters. For static facial expression recognition on individual video frames, we extract MSDF, DCNN and RCNN features. We train linear SVM classifiers for each kind of feature on the AFEW and SFEW datasets, and we propose a novel fusion network that combines all the extracted features at the decision level. Our final recognition accuracy is 51.02% on the AFEW testing set and 51.08% on the SFEW testing set, which is much better than the baseline recognition rates of 39.33% and 39.13%.
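
As a rough illustration of the pipeline described above (not the paper's actual implementation), the sketch below trains one linear SVM per feature type and then learns decision-level fusion weights over the stacked SVM scores. All function and variable names are hypothetical, and a simple logistic-regression layer stands in for the paper's fusion network.

```python
# Minimal sketch of per-feature SVMs with decision-level fusion, assuming
# precomputed feature matrices. A logistic-regression layer is used here as a
# stand-in for the paper's fusion network; names below are hypothetical.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

def train_decision_level_fusion(train_sets, y_train, val_sets, y_val):
    """train_sets / val_sets map a feature name (e.g. 'LBP-TOP', 'HOG',
    'acoustic') to an (n_samples, n_dims) array of precomputed features."""
    svms, val_scores = {}, []
    for name, X in train_sets.items():
        clf = LinearSVC(C=1.0)            # one linear SVM per feature type
        clf.fit(X, y_train)
        svms[name] = clf
        # Per-class decision scores on held-out data: shape (n_val, n_classes).
        val_scores.append(clf.decision_function(val_sets[name]))
    # Learn fusion weights over the stacked per-classifier class scores.
    fusion = LogisticRegression(max_iter=1000)
    fusion.fit(np.hstack(val_scores), y_val)
    return svms, fusion

def predict_fused(svms, fusion, test_sets):
    scores = [svms[name].decision_function(test_sets[name]) for name in svms]
    return fusion.predict(np.hstack(scores))
```

Fusing at the decision level keeps each modality's classifier independent, so the fusion stage can down-weight a weak or noisy modality instead of letting it corrupt a single joint feature space.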


        Published In

        ICMI '15: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction
        November 2015
        678 pages
        ISBN:9781450339124
        DOI:10.1145/2818346

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 09 November 2015

        Author Tags

        1. emotion recognition
        2. fusion network
        3. multimodal features

        Qualifiers

        • Research-article

        Funding Sources

        • the Fundamental Research Funds for the Central Universities of China
        • the National Education Science Twelfth Five-Year Plan Key Issues of the Ministry of Education

        Conference

        ICMI '15
        Sponsor:
        ICMI '15: INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION
        November 9 - 13, 2015
        Seattle, Washington, USA

        Acceptance Rates

        ICMI '15 Paper Acceptance Rate 52 of 127 submissions, 41%;
        Overall Acceptance Rate 453 of 1,080 submissions, 42%
