
A social emotion classification approach using multi-model fusion

Published: 01 January 2020

Abstract

With the proliferation of online video publishing, the amount of multimodal content on the Internet has grown exponentially, and emotion analysis research has developed from traditional single-modality analysis to complex multimodal analysis. However, although some recent studies consider multiple modalities, most pay little attention to merging visual and audio emotional information at the feature or decision level. In this paper, we extract visual, textual, and audio information from video and propose a multimodal emotion classification framework to capture the emotions of users in social networks. We design a 3DCLS (3D Convolutional Long Short-Term Memory) hybrid model to classify visual emotions and a CNN–RNN hybrid model to classify text-based emotions. Finally, the visual, audio, and text modalities are fused to generate the final classification results. Experiments on the MOUD and IEMOCAP emotion datasets show that the proposed framework outperforms existing models in multimodal emotion analysis.
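The abstract does not detail the 3DCLS architecture, but the pattern it names, a deep 3D convolutional front-end feeding a convolutional LSTM (ConvLSTM) whose final hidden state is classified, can be sketched in PyTorch as below. This is a minimal illustration, not the authors' implementation: the class names (ConvLSTMCell, ThreeDCLS), layer widths, kernel sizes, and the six-class output are all assumptions.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    # Minimal convolutional LSTM cell in the style of Shi et al. (2015):
    # all four gate pre-activations come from one convolution over [input, hidden].
    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel,
                               padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class ThreeDCLS(nn.Module):
    # Hypothetical 3D-CNN + ConvLSTM pipeline; all sizes are illustrative only.
    def __init__(self, n_classes=6, hid_ch=64):
        super().__init__()
        self.hid_ch = hid_ch
        self.c3d = nn.Sequential(
            nn.Conv3d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),               # pool space, keep time
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d((2, 2, 2)),               # pool space and time
        )
        self.clstm = ConvLSTMCell(64, hid_ch)
        self.head = nn.Linear(hid_ch, n_classes)

    def forward(self, clips):                       # clips: (B, 3, T, H, W)
        feats = self.c3d(clips)                     # (B, 64, T', H', W')
        B, _, T, H, W = feats.shape
        h = feats.new_zeros(B, self.hid_ch, H, W)
        c = feats.new_zeros(B, self.hid_ch, H, W)
        for t in range(T):                          # unroll ConvLSTM over time
            h, c = self.clstm(feats[:, :, t], (h, c))
        return self.head(h.mean(dim=(2, 3)))        # pool space -> class logits

logits = ThreeDCLS()(torch.randn(2, 3, 16, 64, 64))  # two 16-frame RGB clips
```

The design point worth noting is that the ConvLSTM keeps its hidden state spatial (a feature map rather than a vector), so spatial structure survives across time steps until the final pooling.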

Highlights

Proposes a 3DCLS model that captures spatio-temporal information for emotion recognition by cascading a deep 3D convolutional network with a convolutional long short-term memory (ConvLSTM) recurrent neural network.
Introduces a CNN–RNN hybrid model in which a CNN extracts emotional features from text and an RNN classifies the extracted features (see the sketch after this list).
Constructs a multimodal fusion framework that uses multiple kernel learning (MKL) to integrate heterogeneous visual, textual, and audio information.
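The highlights describe the text branch only at this level: a CNN extracts emotional features and an RNN classifies them. A minimal sketch of that pattern, in the same hedged spirit as the 3DCLS sketch above (the class name, embedding size, filter count, LSTM width, and class count are assumptions, not values from the paper):

```python
import torch
import torch.nn as nn

class CnnRnnTextClassifier(nn.Module):
    # Hypothetical CNN -> RNN text pipeline: a 1D convolution extracts local
    # n-gram features, and an LSTM classifies the resulting feature sequence.
    def __init__(self, vocab_size=10000, emb_dim=128, n_filters=100,
                 hidden=128, n_classes=6):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.rnn = nn.LSTM(n_filters, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, tokens):                    # tokens: (B, L) word ids
        x = self.emb(tokens).transpose(1, 2)      # (B, emb_dim, L)
        x = torch.relu(self.conv(x))              # (B, n_filters, L)
        _, (h, _) = self.rnn(x.transpose(1, 2))   # h: (1, B, hidden)
        return self.head(h[-1])                   # final hidden state -> logits

logits = CnnRnnTextClassifier()(torch.randint(0, 10000, (2, 40)))
```

For the MKL fusion stage the excerpt gives no kernel details. True MKL learns the per-modality kernel weights jointly with the classifier; the snippet below is only a fixed-weight stand-in that combines one linear kernel per modality and feeds the result to a precomputed-kernel SVM, with random arrays as placeholder features.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Placeholder visual/text/audio features for 50 samples; dims are arbitrary.
Xv, Xt, Xa = (rng.normal(size=(50, d)) for d in (64, 128, 32))
y = rng.integers(0, 2, size=50)                  # placeholder labels

# One linear kernel per modality; true MKL would learn these weights.
K = 0.4 * Xv @ Xv.T + 0.3 * Xt @ Xt.T + 0.3 * Xa @ Xa.T
clf = SVC(kernel="precomputed").fit(K, y)
```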



Published In

Future Generation Computer Systems, Volume 102, Issue C, January 2020, 1062 pages

Publisher

Elsevier Science Publishers B.V., Netherlands

Author Tags

1. Multimodal fusion
2. Emotion analysis
3. 3D convolutional neural network
4. Recurrent neural network

Qualifiers

• Research-article


            Cited By

• (2024) A Survey on Variational Autoencoders in Recommender Systems. ACM Computing Surveys 56(10), 1–40. https://doi.org/10.1145/3663364
• (2024) EmoWear: Exploring Emotional Teasers for Voice Message Interaction on Smartwatches. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–16. https://doi.org/10.1145/3613904.3642101
• (2024) Amount-Based Covert Communication Over Blockchain. IEEE Transactions on Network and Service Management 21(3), 3095–3111. https://doi.org/10.1109/TNSM.2024.3358013
• (2023) Ensemble Convolution Neural Network for Robust Video Emotion Recognition Using Deep Semantics. Scientific Programming 2023. https://doi.org/10.1155/2023/6859284
• (2023) A Survey of Textual Emotion Recognition and Its Challenges. IEEE Transactions on Affective Computing 14(1), 49–67. https://doi.org/10.1109/TAFFC.2021.3053275
• (2023) Speech emotion recognition approaches. Speech Communication 154(C). https://doi.org/10.1016/j.specom.2023.102974
• (2023) Modality-invariant temporal representation learning for multimodal sentiment classification. Information Fusion 91(C), 504–514. https://doi.org/10.1016/j.inffus.2022.10.031
• (2023) Textual emotion detection utilizing a transfer learning approach. The Journal of Supercomputing 79(12), 13075–13089. https://doi.org/10.1007/s11227-023-05168-5
• (2022) Cross-modal distillation with audio–text fusion for fine-grained emotion classification using BERT and Wav2vec 2.0. Neurocomputing 506(C), 168–183. https://doi.org/10.1016/j.neucom.2022.07.035
• (2022) Automated emotion recognition. Computer Methods and Programs in Biomedicine 215(C). https://doi.org/10.1016/j.cmpb.2022.106646
