Multimodal representation with embedded visual guiding objects for named entity recognition in social media posts

Z Wu, C Zheng, Y Cai, J Chen, H Leung… - Proceedings of the 28th …, 2020 - dl.acm.org
Visual contexts often help to recognize named entities more precisely in short texts such as tweets or Snapchat posts. For example, one can identify "Charlie" as the name of a dog from the accompanying user posts. Previous work on multimodal named entity recognition ignores the corresponding relations between visual objects and entities. Visual objects can serve as fine-grained image representations: for a sentence containing multiple entity types, objects in the relevant image can be used to capture information about different entities. In this paper, we propose a neural network that combines object-level image information and character-level text information to predict entities. Vision and language are bridged by leveraging object labels as embeddings, and a dense co-attention mechanism is introduced for fine-grained interactions. Experimental results on a Twitter dataset demonstrate that our method outperforms state-of-the-art methods.
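The abstract's key idea of bridging the two modalities through a dense co-attention over text features and object-label embeddings can be illustrated with a minimal NumPy sketch. This is a generic bilinear co-attention, not the paper's exact architecture; the function names, the bilinear weight `W`, and the concatenation-based fusion are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dense_coattention(text, objects, W):
    """Sketch of a dense co-attention between token features and
    object-label embeddings (illustrative, not the paper's exact model).

    text:    (n_tokens, d)  text-side features (e.g. character-level)
    objects: (n_objects, d) embeddings of detected object labels
    W:       (d, d)         bilinear interaction weights (assumed learned)
    """
    # Affinity matrix: similarity of every token to every object.
    affinity = text @ W @ objects.T                      # (n_tokens, n_objects)
    # Attend over objects for each token, and over tokens for each object.
    text_to_obj = softmax(affinity, axis=1) @ objects    # (n_tokens, d)
    obj_to_text = softmax(affinity, axis=0).T @ text     # (n_objects, d)
    # Fuse attended visual context back into the text representation,
    # which a downstream tagger (e.g. CRF) would consume per token.
    fused_text = np.concatenate([text, text_to_obj], axis=1)  # (n_tokens, 2d)
    return fused_text, obj_to_text

# Toy usage: 5 tokens, 3 detected objects, feature dimension 8.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))
labels = rng.normal(size=(3, 8))
fused, obj_ctx = dense_coattention(tokens, labels, np.eye(8))
```

Because the attention runs in both directions over the full token-object affinity matrix, every token can draw on every detected object label, which is one plausible reading of the "fine-grained interactions" the abstract describes.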