Multimodal representation with embedded visual guiding objects for named entity recognition in social media posts

Z Wu, C Zheng, Y Cai, J Chen, H Leung… - Proceedings of the 28th …, 2020 - dl.acm.org
Visual contexts often help to recognize named entities more precisely in short texts such as tweets or Snapchat posts. For example, one can identify "Charlie" as the name of a dog from the accompanying user posts. Previous work on multimodal named entity recognition ignores the corresponding relations between visual objects and entities. Visual objects can serve as fine-grained image representations: for a sentence containing multiple entity types, objects in the relevant image can be used to capture information about different entities. In this paper, we propose a neural network that combines object-level image information and character-level text information to predict entities. Vision and language are bridged by leveraging object labels as embeddings, and a dense co-attention mechanism is introduced for fine-grained interactions. Experimental results on a Twitter dataset demonstrate that our method outperforms state-of-the-art methods.
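The abstract's key idea of bridging the two modalities through a dense co-attention over text features and object-label embeddings can be illustrated with a minimal NumPy sketch. This is a generic bilinear co-attention, not the paper's exact architecture; the function names, the bilinear weight `W`, and the concatenation-based fusion are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dense_coattention(text, objects, W):
    """Sketch of a dense co-attention between token features and
    object-label embeddings (illustrative, not the paper's exact model).

    text:    (n_tokens, d)  text-side features (e.g. character-level)
    objects: (n_objects, d) embeddings of detected object labels
    W:       (d, d)         bilinear interaction weights (assumed learned)
    """
    # Affinity matrix: similarity of every token to every object.
    affinity = text @ W @ objects.T                      # (n_tokens, n_objects)
    # Attend over objects for each token, and over tokens for each object.
    text_to_obj = softmax(affinity, axis=1) @ objects    # (n_tokens, d)
    obj_to_text = softmax(affinity, axis=0).T @ text     # (n_objects, d)
    # Fuse attended visual context back into the text representation,
    # which a downstream tagger (e.g. CRF) would consume per token.
    fused_text = np.concatenate([text, text_to_obj], axis=1)  # (n_tokens, 2d)
    return fused_text, obj_to_text

# Toy usage: 5 tokens, 3 detected objects, feature dimension 8.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))
labels = rng.normal(size=(3, 8))
fused, obj_ctx = dense_coattention(tokens, labels, np.eye(8))
```

Because the attention runs in both directions over the full token-object affinity matrix, every token can draw on every detected object label, which is one plausible reading of the "fine-grained interactions" the abstract describes.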