DOI: 10.1145/3338533.3366591

Multi-Feature Fusion for Multimodal Attentive Sentiment Analysis

Published: 10 January 2020

Abstract

Sentiment analysis is an interesting and challenging task. Researchers have mostly focused on single-modal (image or text) emotion recognition, and far less attention has been paid to the joint analysis of multi-modal data. Moreover, most existing multi-modal sentiment analysis algorithms that incorporate an attention mechanism attend only to local regions of an image and ignore the emotional information carried by the image's global features. Motivated by this, we propose a novel multi-modal sentiment analysis model that attends to the local attentive features as well as the global contextual feature of the image, and a novel feature fusion mechanism fuses the features from the different modalities. In the proposed model, a convolutional neural network (CNN) extracts region maps of the image, an attention mechanism computes the attention coefficients over those regions, a CNN with fewer hidden layers extracts the global feature, and a long short-term memory network (LSTM) extracts the textual feature. Finally, a tensor fusion network (TFN) fuses all the features from the different modalities. Extensive experiments on both weakly labeled and manually labeled datasets demonstrate the superiority of the proposed method.
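
The model is described above only at a high level; as a rough illustration of that pipeline (attention-weighted local CNN region features, a shallow CNN for the global contextual feature, an LSTM text encoder, and outer-product tensor fusion), the following PyTorch sketch shows one possible reading. The class name, layer sizes, and the exact attention and fusion details are assumptions made for illustration, not the authors' implementation.

# Hypothetical sketch of the pipeline described in the abstract, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiFeatureFusionNet(nn.Module):
    def __init__(self, vocab_size=10000, txt_dim=128, vis_dim=128, num_classes=2):
        super().__init__()
        # Deeper CNN producing region feature maps (stand-in for a pretrained backbone).
        self.region_cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, vis_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # CNN with fewer hidden layers for the global contextual feature.
        self.global_cnn = nn.Sequential(
            nn.Conv2d(3, vis_dim, 5, stride=4, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.attn = nn.Linear(vis_dim, 1)                    # attention coefficient per region
        self.text_embed = nn.Embedding(vocab_size, txt_dim)
        self.text_lstm = nn.LSTM(txt_dim, txt_dim, batch_first=True)
        # Tensor fusion: outer product of (visual features, 1) and (textual feature, 1).
        fused_dim = (2 * vis_dim + 1) * (txt_dim + 1)
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, image, tokens):
        b = image.size(0)
        regions = self.region_cnn(image).flatten(2).transpose(1, 2)  # (B, R, vis_dim)
        alpha = F.softmax(self.attn(regions), dim=1)                 # (B, R, 1) attention weights
        local_feat = (alpha * regions).sum(dim=1)                    # attended local feature
        global_feat = self.global_cnn(image).flatten(1)              # global contextual feature
        visual = torch.cat([local_feat, global_feat], dim=1)         # (B, 2 * vis_dim)
        _, (h, _) = self.text_lstm(self.text_embed(tokens))
        textual = h[-1]                                              # last LSTM hidden state, (B, txt_dim)
        one = torch.ones(b, 1, device=image.device)
        v = torch.cat([visual, one], dim=1).unsqueeze(2)             # (B, 2 * vis_dim + 1, 1)
        t = torch.cat([textual, one], dim=1).unsqueeze(1)            # (B, 1, txt_dim + 1)
        fused = torch.bmm(v, t).flatten(1)                           # tensor-fusion outer product
        return self.classifier(fused)

For example, MultiFeatureFusionNet()(torch.randn(4, 3, 64, 64), torch.randint(0, 10000, (4, 20))) returns a (4, 2) tensor of sentiment logits.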



      Information

      Published In

      MMAsia '19: Proceedings of the 1st ACM International Conference on Multimedia in Asia
      December 2019
      403 pages
      ISBN:9781450368414
      DOI:10.1145/3338533

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 10 January 2020


      Author Tags

      1. Attention mechanism
      2. Feature fusion
      3. Natural language processing
      4. Sentiment analysis

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      MMAsia '19
      Sponsor:
      MMAsia '19: ACM Multimedia Asia
      December 15 - 18, 2019
      Beijing, China

      Acceptance Rates

      MMAsia '19 Paper Acceptance Rate 59 of 204 submissions, 29%;
      Overall Acceptance Rate 59 of 204 submissions, 29%


