Multimodal Neural Machine Translation with Search Engine Based Image Retrieval

Tang, ZhenHao; Zhang, XiaoBing; Long, Zi; Fu, XiangHua

arXiv:2208.00767 (cs)

[Submitted on 26 Jul 2022 (v1), last revised 3 Sep 2022 (this version, v2)]

Abstract: Recently, a number of works have shown that the performance of neural machine translation (NMT) can be improved to a certain extent by using visual information. However, most of these conclusions are drawn from experimental results on a limited set of bilingual sentence-image pairs, such as Multi30K. In such datasets, the content of each bilingual parallel sentence pair must be well represented by a manually annotated image, which differs from real translation scenarios. Some previous works address this problem by retrieving images from existing sentence-image pairs with a topic model. However, because the collection of sentence-image pairs they use is limited, their image retrieval method has difficulty handling out-of-vocabulary words, and it can hardly show that it is the visual information that enhances NMT rather than the co-occurrence of images and sentences. In this paper, we propose an open-vocabulary image retrieval method that collects descriptive images for a bilingual parallel corpus using an image search engine. Next, we propose a text-aware attentive visual encoder to filter out incorrectly collected noisy images. Experimental results on Multi30K and two other translation datasets show that our proposed method achieves significant improvements over strong baselines.
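
The abstract does not give the encoder's architecture, so the following is only a minimal sketch of the general idea behind text-conditioned attention over retrieved images, not the authors' model. It assumes a sentence is summarized by a single vector and each retrieved image by a feature vector of the same size; the function name `text_aware_attention` and the toy vectors are hypothetical. Scaled dot-product scores followed by a softmax give off-topic images a smaller weight in the pooled visual representation.

```python
import math


def text_aware_attention(text_vec, image_vecs):
    """Pool image features with weights derived from their similarity to the text.

    Illustrative assumption: plain scaled dot-product attention with the
    sentence vector as the query. The paper's encoder may differ.
    """
    scale = math.sqrt(len(text_vec))
    # One relevance score per image: scaled dot product with the text vector.
    scores = [
        sum(t * v for t, v in zip(text_vec, img)) / scale for img in image_vecs
    ]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of image features; low-weight (noisy) images contribute little.
    pooled = [
        sum(w * img[d] for w, img in zip(weights, image_vecs))
        for d in range(len(text_vec))
    ]
    return weights, pooled


# Toy data (made up): two images aligned with the sentence, one off-topic.
text_vec = [1.0, 0.0, 1.0, 0.0]
image_vecs = [
    [0.9, 0.1, 0.8, 0.0],
    [1.0, 0.0, 0.9, 0.1],
    [0.0, 1.0, 0.0, 0.9],  # off-topic image
]
weights, pooled = text_aware_attention(text_vec, image_vecs)
print(weights)
```

In this toy setting the third image receives the smallest weight, which is the soft filtering behavior the abstract alludes to; a trained model would learn projections for the query and the image features rather than compare raw vectors.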

Comments:	9 pages, 5 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:2208.00767 [cs.CV]
	(or arXiv:2208.00767v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2208.00767

Submission history

From: ZhenHao Tang [view email]
[v1] Tue, 26 Jul 2022 08:42:06 UTC (1,035 KB)
[v2] Sat, 3 Sep 2022 09:22:51 UTC (2,532 KB)
