Abstract
As multimodal data become increasingly common on social media platforms, it is desirable to augment text-based approaches with other important data sources (e.g., images) for the sentiment classification of social media posts. However, existing approaches rely primarily on textual content or are designed for coarse-grained multimodal sentiment classification. In this paper, we propose a recurrent attention network (called SaliencyBERT) built on the BERT architecture for Target-oriented Multimodal Sentiment Classification (TMSC). Specifically, we first adopt BERT and ResNet to capture the intra-modality dynamics of the textual content and the visual information, respectively. Then, we design a recurrent attention mechanism, which derives target-sensitive visual representations, to capture the inter-modality dynamics. With recurrent attention, our model progressively optimizes the alignment between target-sensitive textual features and visual features and produces an output after a fixed number of time steps. Finally, we combine the losses from all time steps for deep supervision to mitigate slow convergence and overfitting. Our empirical results show that the proposed model consistently outperforms single-modality methods and achieves performance comparable to, or even better than, several highly competitive methods on two multimodal datasets from Twitter.
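The recurrent attention and deep supervision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the attention form, the update rule, or the classifier, so the scaled dot-product attention, the residual query update, the linear classifier, and all names and sizes below (`attend`, `recurrent_attention_loss`, 49 regions, 3 classes, 3 steps) are assumptions chosen for illustration. Random vectors stand in for the BERT target representation and the ResNet region features.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, regions):
    """Scaled dot-product attention of one query over image-region vectors."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, r) / scale for r in regions])
    context = [sum(w * r[i] for w, r in zip(weights, regions))
               for i in range(len(query))]
    return context, weights


def recurrent_attention_loss(target, regions, classifier, label, steps=3):
    """Run a fixed number of attention steps and sum the per-step losses."""
    query = list(target)
    step_losses = []
    for _ in range(steps):
        context, _ = attend(query, regions)
        # Assumed residual update: the refined query drives the next step.
        query = [q + c for q, c in zip(query, context)]
        # Assumed linear classifier applied at every step (deep supervision).
        probs = softmax([dot(row, query) for row in classifier])
        step_losses.append(-math.log(probs[label]))
    return sum(step_losses), step_losses


random.seed(0)
dim, n_regions, n_classes = 8, 49, 3  # hypothetical sizes
# Random stand-ins for the target (text) vector and image-region features.
target = [random.gauss(0, 1) for _ in range(dim)]
regions = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_regions)]
classifier = [[random.gauss(0, 0.1) for _ in range(dim)]
              for _ in range(n_classes)]

total, per_step = recurrent_attention_loss(target, regions, classifier, label=1)
print(len(per_step), total > 0)
```

Summing the per-step cross-entropy terms means every intermediate step receives a training signal, which is the general idea behind deep supervision; how the paper weights or combines the steps is not stated in the abstract.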
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, J., Liu, Z., Sheng, V., Song, Y., Qiu, C. (2021). SaliencyBERT: Recurrent Attention Network for Target-Oriented Multimodal Sentiment Classification. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13021. Springer, Cham. https://doi.org/10.1007/978-3-030-88010-1_1
DOI: https://doi.org/10.1007/978-3-030-88010-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88009-5
Online ISBN: 978-3-030-88010-1
eBook Packages: Computer Science (R0)